How to write fast Java code – thinking about memory by Anders Peterson

In this talk I'll discuss things that affect (CPU-bound) performance. A key message is that Java developers DO need to worry about memory, even if garbage collection is rarely a problem.

Much of the talk revolves around a demo of a simple benchmark.

Anders Peterson, Optimatika

Recorded at Jfokus 2023 in Stockholm, 7th of February.
Comments
Author

The method loopICJ runs much faster than loopJCI because of better cache locality and a more favourable memory access pattern.
Here's a detailed explanation:
1. Cache Locality:
- Modern CPUs have multiple levels of cache (L1, L2, L3) to speed up memory access.
- When data is accessed, it is loaded into the cache in blocks (cache lines).
- Accessing data sequentially (in a linear fashion) is faster because it takes advantage of spatial locality, meaning consecutive memory accesses are likely to be in the same cache line.

2. Memory Access Patterns:
- In loopICJ, the innermost loop varies j, so it walks along one row of the right matrix (right[c][j]), touching elements that sit next to each other in memory (row-major, sequential access).
- In loopJCI, the innermost loop varies i, so it walks down a column of the left matrix (left[i][c]); in Java a double[][] is an array of separate row arrays, so every step lands in a different array object and the accesses are effectively column-major (non-sequential). A Java sketch of both loop orders follows the summary below.

3. Impact on Performance:
- loopICJ benefits from better cache utilization because it accesses memory in a more cache-friendly manner.
- loopJCI suffers from poor cache performance because it accesses memory in a way that causes more cache misses (accessing columns instead of rows).

In summary, loopICJ is faster because it accesses memory in a way that is more efficient for the CPU cache, leading to fewer cache misses and better overall performance.
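
To make the two access patterns concrete, here is a minimal, self-contained Java sketch of the two loop orders. The method names loopICJ and loopJCI follow the comment above, but everything else (the class name LoopOrderDemo, the matrix size n = 512, the main method and the crude System.nanoTime() timing) is an illustrative assumption, not the speaker's actual benchmark code.

public final class LoopOrderDemo {

    // i-c-j order: the innermost loop varies j, so right[c][j] and product[i][j]
    // are read and written sequentially within one row array.
    static void loopICJ(double[][] left, double[][] right, double[][] product) {
        int rows = left.length, inner = right.length, cols = right[0].length;
        for (int i = 0; i < rows; i++) {
            for (int c = 0; c < inner; c++) {
                double l = left[i][c]; // loop-invariant for the inner loop
                for (int j = 0; j < cols; j++) {
                    product[i][j] += l * right[c][j]; // sequential, cache-friendly
                }
            }
        }
    }

    // j-c-i order: the innermost loop varies i, so every step jumps to a
    // different row array of left and product; strided, cache-unfriendly.
    static void loopJCI(double[][] left, double[][] right, double[][] product) {
        int rows = left.length, inner = right.length, cols = right[0].length;
        for (int j = 0; j < cols; j++) {
            for (int c = 0; c < inner; c++) {
                double r = right[c][j];
                for (int i = 0; i < rows; i++) {
                    product[i][j] += left[i][c] * r; // jumps between row arrays
                }
            }
        }
    }

    public static void main(String[] args) {
        int n = 512; // assumed size, large enough that the matrices exceed the caches
        double[][] a = new double[n][n], b = new double[n][n];
        double[][] p1 = new double[n][n], p2 = new double[n][n];
        java.util.Random rnd = new java.util.Random(42);
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                a[i][j] = rnd.nextDouble();
                b[i][j] = rnd.nextDouble();
            }
        }
        long t0 = System.nanoTime();
        loopICJ(a, b, p1);
        long t1 = System.nanoTime();
        loopJCI(a, b, p2);
        long t2 = System.nanoTime();
        System.out.printf("loopICJ: %d ms, loopJCI: %d ms%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
    }
}

A single timed run like this is only a rough illustration; for trustworthy numbers the comparison should be run in a proper micro-benchmark harness such as JMH, with warm-up iterations so the JIT compiler has compiled both methods before measurement.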
