Can't find error in ACM module for C

Question:

I am writing a module for a c-program on nasm – bubble sort. For some reason, the program crashes when executed. The whole day I tried to understand the reason – it did not work out. Tried different implementations of this when – useless. NASM gives no errors, and the module crashes on execution.

ACM module:

global _bubblesort_asm
section .text

_bubblesort_asm:
    mov ebx, [esp+4] ; EBX = *m
    mov esi, [esp+8] ; ESI = size
    dec esi ; ESI = size-1 = I
    loop_i:
        xor edi, edi ; J = 0
        loop_j:
            mov ecx, [ebx+edi*4] ; ecx = value of m[J]
            mov edx, [ebx+edi*4+4] ; edx = value of m[j+1]
            cmp edx, ecx ; if(m[j] <= m[j+1])  goto skip
            jns skip
                mov [ebx+edi*4], edx ; mov value m[j+1] to m[j] adress
                mov [ebx+edi*4+4], ecx ; mov value m[j] to m[j+1] adress
            skip:
        inc edi ; j++
        cmp edi, esi ; if(j!=i) goto loop_j
        jnz loop_j
    dec esi ; I--
    cmp esi, 0 ; if(i>=0) goto loop_i
    jns loop_i
retn
;    for(i=size-1; i>=0; i--)
;        for(j=0; j<i; j++)
;            if(m[j] > m[j+1])
;                {temp = m[j]; m[j] = m[j+1]; m[j+1] = temp;}

main.c:

int main()
{
    srand(time(NULL));
    int i,size=100,m[size];
    for(i=0;i<size;i++)
    {
        m[i]=rand()%100;
    }

    bubblesort_asm(m,size);

    for(i=0;i<size;i++)
    {
        printf("%d ",m[i]);
    }
    return 0;

}

Answer:

I compiled a C-based sort using mingw gcc with -O2 optimization, disassembled it, and it turned out like this:

.text:004013C5                 mov     edx, [ebp+m]
.text:004013C8                 mov     esi, [ebp+size]
.text:004013CB                 dec     esi
.text:004013CC                 js      short loc_4013F5
.text:004013CE                 db      66h ; выравнивание
.text:004013CE                 nop         ; выравнивание
.text:004013D0
.text:004013D0 loc_4013D0:                             ; CODE XREF: bubblesort+33j
.text:004013D0                 test    esi, esi
.text:004013D2                 jle     short loc_4013EF
.text:004013D4                 xor     eax, eax
.text:004013D6                 db      66h ; выравнивание
.text:004013D6                 nop         ; выравнивание
.text:004013D8
.text:004013D8 loc_4013D8:                             ; CODE XREF: bubblesort+2Dj
.text:004013D8                 mov     ecx, [edx+eax*4]
.text:004013DB                 mov     ebx, [edx+eax*4+4]
.text:004013DF                 cmp     ecx, ebx
.text:004013E1                 jle     short loc_4013EA
.text:004013E3                 mov     [edx+eax*4], ebx
.text:004013E6                 mov     [edx+eax*4+4], ecx
.text:004013EA
.text:004013EA loc_4013EA:                             ; CODE XREF: bubblesort+21j
.text:004013EA                 inc     eax
.text:004013EB                 cmp     eax, esi
.text:004013ED                 jnz     short loc_4013D8
.text:004013EF
.text:004013EF loc_4013EF:                             ; CODE XREF: bubblesort+12j
.text:004013EF                 dec     esi
.text:004013F0                 cmp     esi, -1
.text:004013F3                 jnz     short loc_4013D0
.text:004013F5
.text:004013F5 loc_4013F5:                             ; CODE XREF: bubblesort+Cj

The code is almost identical to the one in the question, differs in some details, plus gcc added alignment (66h is a prefix modifier for the size of the operand, it does nothing before nop).


Addition about the difference between js and jl.

In essence, cmp is a subtraction that does not store the result, but sets flags. If you compare small modulo numbers, then there will be no difference in the results. But if we compare numbers that, when subtracted, give a result modulo greater than half the dimension of the variable (register), then sf will have a different value, which could be when comparing numbers smaller in modulus.

On the example of values ​​within a byte. Compare -10 and 10 inside byte registers:

mov al, -10
mov cl, 10
cmp al, cl
; результат вычитания -20, флаг SF установлен
; и по js и по jl переход произойдет

Now on the example of numbers -100 and 100:

mov al, -100
mov cl, 100
cmp al, cl
; результат вычитания "на бумаге" -200
; но в пределах байта это уже не -200, а 56, флаг знака сброшен,
; поэтому по js переход не произойдет.
; Зато выставлен флаг переполнения - OF
; По jl, которое проверяет и SF, и OF, переход произойдет.

Addition 2. By half the bit depth, I mean this: for example, signed numbers from -128 to +127 can be stored in a byte register. In Word – from -2^15 to +2^15-1, in Court – from -2^31 to +2^31-1. Accordingly, if the subtraction result goes beyond these limits, then the sign of the subtraction result turns out to be incorrect (and the result itself will be incorrect, but in the case of cmp it does not interest us) and the overflow flag is set. Therefore, to determine that one number is greater than another, both the sign flag and the overflow flag must be taken into account, which is what the jump commands jl (jnge), jnl (jge), jg (jnle), jng (jle) do.

Concerning departure from the outer loop cmp esi, 0 ; jns loop_i. The last iteration of the outer loop must be executed at esi==0. Then the following happens in the inner loop:

  1. edi will reset
  2. if necessary, the operation of permutation of adjacent elements of the array will be performed
  3. edi will increase by 1
  4. will compare edi (1) with esi (0)
  5. since they are not equal (edi is already greater than esi), the loop will not exit
  6. then edi will grow and grow until it reaches the end of the writable memory area.

If we replace cmp esi,0 with cmp esi,1, then the last iteration of the outer loop will be executed at esi==1, and the inner loop will end with one iteration. But there are still problems here, if the array consists of one element or is completely empty, then esi will be less than edi when checking the exit condition from the inner loop and the exit condition will not work again.

To avoid such "jumping" of the inner loop, you need to add a check before entering it ("precondition").

Scroll to Top