I have a problem with my assembly code: I need to multiply two arrays, add up the results, and take the square root of the sum. I've written the code and it seems to work, but I expect to get 9.16 and I'm getting 9.0 instead.
I guess the problem is somewhere in the loop or in addpd, but I don't know how to fix it.
include /masm64/include64/masm64rt.inc
INCLUDELIB MSVCRT
option casemap:none
.data
array1 dq 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0
array2 dq 7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0
result dq 0.0
res dq 0.0
tit1 db "Result of using the SSE", 0
buf BYTE 260 dup(?)
bufp QWORD buf, 0
loop_count dq 7
.code
entry_point proc
; Load the two arrays into SSE registers
movupd xmm1, [array1]
movupd xmm2, [array2]
mov rcx, loop_count ; Number of function iterations
loop1:
mulpd xmm1, xmm2
addpd xmm3, xmm1
movupd xmm1, [array1 + 8]
movupd xmm2, [array2 + 8]
loop loop1
; Add the result and store to xmm1
addpd xmm1, xmm3
; Compute the square root of the sum of products in xmm1
sqrtpd xmm1, xmm1
; Store the low double of the result to memory (res) for output
movsd res, xmm1
invoke fptoa, res, bufp
invoke MessageBox, 0, bufp, addr tit1, MB_OK
invoke ExitProcess, 0
entry_point endp
end
I've also tried multiplying the two arrays without the loop, using just mulpd, but I guess that's not the best solution.
`movupd xmm1, [array1 + 8]` loads from the same place every iteration. You need a pointer or index in a register (e.g. in RCX, if you count up towards `loop_count` instead of using the slow `loop` instruction). Also, why are you loading 2 elements at once with `pd` (packed double) instead of `sd` (scalar double) instructions? At the end you use `movsd` to store just the low `double` element, so the upper halves were useless. If you wanted to use SSE for SIMD instead of scalar, you'd advance a pointer by 16 bytes (2 elements), but you'd need scalar cleanup if the array length is odd.
The code uses `pd` instead for everything except the final `movsd`, which only saves the low element. Since scalar SSE is the simplest and standard way to do FP math on x86-64, I don't think we should assume they intended SIMD, especially when the bugs are with even more basic things.