java – Why does inverting the array indices in a for loop make populating the array faster?


I was doing some tests in Java and noticed some very strange behavior. I created a two-dimensional array and populated it in two different ways, filling it with random numbers up to 2^31-1 and changing only the order of the indices. Doing this gave a considerable performance gain, and I don't understand why.

Here's the code:

public class Array {

    static int tamanhoX = 12000;
    static int tamanhoY = 12000;
    static int array[][] = new int[tamanhoX][tamanhoY];
    static long tempoInicio = 0, tempoFim = 0;
    static double mediaTempo = 0.0;

    public static void main(String[] args) {
        // Stores the duration of each run
        long tempos[] = new long[10];

        // Measures the operation several times:
        for (int i = 0; i < tempos.length; i++) {
            // Stores the returned value in tempos[] at position i
            tempos[i] = calculaTempo();
        }

        // Sums all the values stored in tempos[]
        for (int i = 0; i < tempos.length; i++) {
            mediaTempo += tempos[i];
        }

        System.out.println("Total time: " + (mediaTempo / 1000) + "s");
        mediaTempo = mediaTempo / tempos.length; // Computes the average
        System.out.println("Average time: " + mediaTempo + "ms");
    }

    static long calculaTempo() {
        // Stores the operation's start time in milliseconds
        tempoInicio = System.currentTimeMillis();
        for (int i = 0; i < tamanhoX; i++) {
            for (int j = 0; j < tamanhoY; j++) {
                // Fill with a random value:
                // Slower
                //array[j][i] = (int) (Math.random() * Integer.MAX_VALUE);
                // Faster
                array[i][j] = (int) (Math.random() * Integer.MAX_VALUE);
            }
        }
        // Stores the operation's end time
        tempoFim = System.currentTimeMillis();
        System.out.println("Time: " + (tempoFim - tempoInicio) + "ms");
        return (tempoFim - tempoInicio);
    }
}

Note that I changed only a single line: the array assignment. In my tests I averaged 90 seconds for:

array[j][i] = (int) (Math.random() * Integer.MAX_VALUE);

and 48 seconds for:

array[i][j] = (int) (Math.random() * Integer.MAX_VALUE);

I would love to understand the reason for this. Why does merely swapping the array indices cause such a time difference?

Heads up! Do not run this code on computers with little memory: the array alone occupies roughly 550 MB (144 million 4-byte ints). There is a risk of freezing your operating system.


On modern processors the bottleneck is not the CPU but the cache. This means that the fastest program is not [necessarily] the one that executes the fewest instructions, but the one that accesses memory most efficiently (i.e., causes the fewest cache misses).

In Java, a multidimensional array is represented as an "array of arrays". This means that the outer array is an array of references, and each inner array is an array of integers. Each of these arrays occupies contiguous memory on its own, but the inner arrays need not be adjacent to one another:

int[][] array = new int[4][4];
// Is equivalent to:
int[] a = { 0, 0, 0, 0 };
int[] b = { 0, 0, 0, 0 };
int[] c = { 0, 0, 0, 0 };
int[] d = { 0, 0, 0, 0 };
int[][] array = { a, b, c, d };
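One consequence of this "array of arrays" representation can be checked directly. The sketch below (hypothetical class and variable names) shows that each row is a separate object on the heap, and that rows may even have different lengths:

```java
public class ArrayOfArrays {
    public static void main(String[] args) {
        int[][] array = new int[4][4];
        // Each row is a distinct int[] object on the heap:
        System.out.println(array[0] == array[1]); // false: distinct objects

        // Rows may even have different lengths (a "jagged" array):
        int[][] jagged = new int[3][];
        jagged[0] = new int[]{1};
        jagged[1] = new int[]{1, 2};
        jagged[2] = new int[]{1, 2, 3};
        System.out.println(jagged[2].length); // 3
    }
}
```

Because the rows are independent objects, nothing guarantees they sit next to each other in memory, which is exactly why jumping between rows can be costly.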

Let's assume, for simplicity, that each cache line holds 4 elements and that your L1 cache has only two lines available. Initially, array (the array of references) is in the cache. When your loop does:

array[0][0] = ...;

It tries to load array (great, it's already in the cache!) and then a (not in the cache, so it must be fetched from the next level and/or from RAM). Then the assignment proceeds normally.

If it then does:

array[0][1] = ...;

It will find array in the cache, and a in the cache too! The operation is quite fast. The same goes for array[0][2] and array[0][3]. Only when it tries to assign array[1][0] will it fail to find b in the cache – and it will have to fetch it.

But if instead it does:

array[1][0] = ...;

Then b won't be in the cache, and it will have to be fetched right away. Next comes array[2][0], and look: c is not in the cache either, so off it goes again. By the time it gets back to array[1][1], it will have loaded and evicted your program's entire memory region from the cache, and it will have to do it all over again…

In practice, the difference is not quite so glaring, because real caches are not that small and there are several levels (L1, L2…) – so even if your data structure is not in L1 it may still be in L2 (i.e., it does not have to be fetched from scratch). But the larger the data, the larger this difference becomes – especially once physical memory runs out and the system starts paging (i.e., using virtual/swap memory).
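The effect described above can be reproduced with a minimal sketch (hypothetical class name; the size 2000 is chosen so it runs quickly) that times both traversal orders with System.nanoTime():

```java
public class TraversalOrder {
    static final int N = 2000;

    // Row-major fill: consecutive j values hit consecutive positions
    // of the same inner array (cache-friendly).
    static long fillRowMajor(int[][] m) {
        long start = System.nanoTime();
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                m[i][j] = i + j;
        return System.nanoTime() - start;
    }

    // Column-major fill: every step jumps to a different inner array
    // (cache-hostile).
    static long fillColumnMajor(int[][] m) {
        long start = System.nanoTime();
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                m[j][i] = i + j;
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int[][] m = new int[N][N];
        System.out.println("Row-major:    " + fillRowMajor(m) / 1_000_000 + " ms");
        System.out.println("Column-major: " + fillColumnMajor(m) / 1_000_000 + " ms");
    }
}
```

On typical hardware the column-major fill takes noticeably longer, and the gap widens as N grows, for exactly the cache-miss reasons described above. (For serious measurements, a harness like JMH avoids JIT warm-up distortions that a one-shot timing like this one ignores.)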
