Question:
I am writing a module for a c-program on nasm – bubble sort. For some reason, the program crashes when executed. The whole day I tried to understand the reason – it did not work out. Tried different implementations of this when – useless. NASM gives no errors, and the module crashes on execution.
ACM module:
global _bubblesort_asm
section .text
_bubblesort_asm:
mov ebx, [esp+4] ; EBX = *m
mov esi, [esp+8] ; ESI = size
dec esi ; ESI = size-1 = I
loop_i:
xor edi, edi ; J = 0
loop_j:
mov ecx, [ebx+edi*4] ; ecx = value of m[J]
mov edx, [ebx+edi*4+4] ; edx = value of m[j+1]
cmp edx, ecx ; if(m[j] <= m[j+1]) goto skip
jns skip
mov [ebx+edi*4], edx ; mov value m[j+1] to m[j] adress
mov [ebx+edi*4+4], ecx ; mov value m[j] to m[j+1] adress
skip:
inc edi ; j++
cmp edi, esi ; if(j!=i) goto loop_j
jnz loop_j
dec esi ; I--
cmp esi, 0 ; if(i>=0) goto loop_i
jns loop_i
retn
; for(i=size-1; i>=0; i--)
; for(j=0; j<i; j++)
; if(m[j] > m[j+1])
; {temp = m[j]; m[j] = m[j+1]; m[j+1] = temp;}
main.c:
int main()
{
srand(time(NULL));
int i,size=100,m[size];
for(i=0;i<size;i++)
{
m[i]=rand()%100;
}
bubblesort_asm(m,size);
for(i=0;i<size;i++)
{
printf("%d ",m[i]);
}
return 0;
}
Answer:
I compiled a C-based sort using mingw gcc with -O2 optimization, disassembled it, and it turned out like this:
.text:004013C5 mov edx, [ebp+m]
.text:004013C8 mov esi, [ebp+size]
.text:004013CB dec esi
.text:004013CC js short loc_4013F5
.text:004013CE db 66h ; выравнивание
.text:004013CE nop ; выравнивание
.text:004013D0
.text:004013D0 loc_4013D0: ; CODE XREF: bubblesort+33j
.text:004013D0 test esi, esi
.text:004013D2 jle short loc_4013EF
.text:004013D4 xor eax, eax
.text:004013D6 db 66h ; выравнивание
.text:004013D6 nop ; выравнивание
.text:004013D8
.text:004013D8 loc_4013D8: ; CODE XREF: bubblesort+2Dj
.text:004013D8 mov ecx, [edx+eax*4]
.text:004013DB mov ebx, [edx+eax*4+4]
.text:004013DF cmp ecx, ebx
.text:004013E1 jle short loc_4013EA
.text:004013E3 mov [edx+eax*4], ebx
.text:004013E6 mov [edx+eax*4+4], ecx
.text:004013EA
.text:004013EA loc_4013EA: ; CODE XREF: bubblesort+21j
.text:004013EA inc eax
.text:004013EB cmp eax, esi
.text:004013ED jnz short loc_4013D8
.text:004013EF
.text:004013EF loc_4013EF: ; CODE XREF: bubblesort+12j
.text:004013EF dec esi
.text:004013F0 cmp esi, -1
.text:004013F3 jnz short loc_4013D0
.text:004013F5
.text:004013F5 loc_4013F5: ; CODE XREF: bubblesort+Cj
The code is almost identical to the one in the question, differs in some details, plus gcc added alignment (66h is a prefix modifier for the size of the operand, it does nothing before nop).
Addition about the difference between js and jl.
In essence, cmp is a subtraction that does not store the result, but sets flags. If you compare small modulo numbers, then there will be no difference in the results. But if we compare numbers that, when subtracted, give a result modulo greater than half the dimension of the variable (register), then sf will have a different value, which could be when comparing numbers smaller in modulus.
On the example of values within a byte. Compare -10 and 10 inside byte registers:
mov al, -10
mov cl, 10
cmp al, cl
; результат вычитания -20, флаг SF установлен
; и по js и по jl переход произойдет
Now on the example of numbers -100 and 100:
mov al, -100
mov cl, 100
cmp al, cl
; результат вычитания "на бумаге" -200
; но в пределах байта это уже не -200, а 56, флаг знака сброшен,
; поэтому по js переход не произойдет.
; Зато выставлен флаг переполнения - OF
; По jl, которое проверяет и SF, и OF, переход произойдет.
Addition 2. By half the bit depth, I mean this: for example, signed numbers from -128 to +127 can be stored in a byte register. In Word – from -2^15 to +2^15-1, in Court – from -2^31 to +2^31-1. Accordingly, if the subtraction result goes beyond these limits, then the sign of the subtraction result turns out to be incorrect (and the result itself will be incorrect, but in the case of cmp it does not interest us) and the overflow flag is set. Therefore, to determine that one number is greater than another, both the sign flag and the overflow flag must be taken into account, which is what the jump commands jl (jnge), jnl (jge), jg (jnle), jng (jle) do.
Concerning departure from the outer loop cmp esi, 0 ; jns loop_i. The last iteration of the outer loop must be executed at esi==0. Then the following happens in the inner loop:
- edi will reset
- if necessary, the operation of permutation of adjacent elements of the array will be performed
- edi will increase by 1
- will compare edi (1) with esi (0)
- since they are not equal (edi is already greater than esi), the loop will not exit
- then edi will grow and grow until it reaches the end of the writable memory area.
If we replace cmp esi,0 with cmp esi,1, then the last iteration of the outer loop will be executed at esi==1, and the inner loop will end with one iteration. But there are still problems here, if the array consists of one element or is completely empty, then esi will be less than edi when checking the exit condition from the inner loop and the exit condition will not work again.
To avoid such "jumping" of the inner loop, you need to add a check before entering it ("precondition").