I've been trying to teach myself how to accomplish certain tasks in assembly.
Right now, I am working on trying to detect palindromes. I know I could use a stack, or possibly compare strings using Irvine's library, but I'm trying to do it via registers.
The problem is, when it comes to using registers, I'm more than a bit confused.
The following compiles, but when I get to the CMP line, the program breaks and gives me this message:
Unhandled exception at 0x004033FC in Project.exe: 0xC0000005: Access violation reading location 0x0000000F.
I'm assuming it has something to do with how I set the registers, but even using the registers while debugging isn't helping me much.
Any help would be appreciated.
INCLUDE Irvine32.inc
.data
enteredWord BYTE "Please enter the string to check: ", 0
presetWord BYTE "Step on no pets", 0
isAPalindrome BYTE "The word is a palindrome. ", 0
isNotAPalindrome BYTE "The word is not a palindrome. ", 0
.code
main proc
    mov ecx, SIZEOF presetWord - 1
    mov esi,OFFSET presetWord
checkWord:
    MOV eax,[esi]
    CMP [ecx],eax
    JNE NOTPALIN
    inc esi
    dec ecx
    loop checkWord
    mov edx, offset isAPalindrome
    call WriteString
    jmp _exit
main endp
NOTPALIN PROC
    mov edx, offset isNotAPalindrome
    call WriteString
    ret
NOTPALIN endp
_exit:
    exit
end main
A CPU register is a small piece of storage located directly inside the CPU core. It holds a fixed number of bits (0/1); on a 64-bit x86 CPU the general-purpose registers are 64 bits wide and are named rax, rcx, rdx, rbx, and so on.
ecx is the lower 32 bits of rcx (the upper 32 bits have no name of their own and are reachable only through instructions that use rcx). The lower 16 bits are accessible as cx, which in turn consists of two 8-bit halves: ch (upper) and cl (lower).
Since you are using ecx, you have 32 bits that can each be 0 or 1. They can be interpreted as an unsigned number from 0 to 2^32-1 (in hex 0 .. 0xFFFFFFFF), or as a signed number from -2^31 to +2^31-1 (0x80000000 .. 0x7FFFFFFF). Or you can give those bits any other meaning you wish and write code accordingly.
In your code you can make use of three common ways to interpret the bits in a CPU register: as a number (such as a size or a count), as a memory address, or as a character.
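To watch this aliasing in a debugger, here is a small sketch that assumes the same 32-bit MASM/Irvine32 setup as the question. The constants are arbitrary, and each comment gives the value ecx should show once that instruction has been stepped over.

```asm
INCLUDE Irvine32.inc

.code
main PROC
    mov ecx, 0AABBCCDDh   ; all 32 bits written:       ecx = AABBCCDD
    mov cl, 11h           ; lowest byte only:          ecx = AABBCC11
    mov ch, 22h           ; second-lowest byte only:   ecx = AABB2211
    mov cx, 0             ; low 16 bits cleared:       ecx = AABB0000
    exit                  ; Irvine32 macro that ends the process
main ENDP
END main
```

Nothing is printed; the point is to single-step and watch only part of ecx change each time.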
In your example, cmp [ecx],eax uses the value in ecx as an address, so it references memory at address 15. Fortunately for you that address is illegal, so the program crashes. If it had happened to be a legal address for your process (just not the one you meant), the code would have silently carried on and produced an unexpected result.
You probably wanted cmp [esi+ecx],eax, which references memory at address presetWord+15. Note that presetWord+15 holds the terminating 0; the 15 characters sit at offsets 0 to 14, so the last character is at presetWord+14. Even that only holds on the first iteration: after inc esi, esi points at presetWord+1 (the second character), so esi+ecx is no longer an offset from the start of the string.
You also probably wanted to compare single characters, so change eax to al to fetch and compare one byte at a time, because the string is ASCII-encoded (8 bits per character). eax would be right for UTF-32.
To check for a palindrome, load one register ("r1") with the address of the first character and another ("r2") with the address (!) of the last character, then loop: while r1 is below r2, compare byte [r1] with byte [r2]; if they differ, it is not a palindrome, otherwise increment r1, decrement r2 and repeat. If the two meet without a mismatch, it is a palindrome.
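A minimal sketch of that loop in 32-bit MASM with Irvine32. The register choice is my assumption (esi plays the role of r1, edi the role of r2), and the labels isPalin, notPalin and report are made up for the illustration. It also handles the "not a palindrome" case with plain labels inside main instead of jumping into a separate PROC.

```asm
INCLUDE Irvine32.inc

.data
presetWord       BYTE "Step on no pets", 0
isAPalindrome    BYTE "The word is a palindrome. ", 0
isNotAPalindrome BYTE "The word is not a palindrome. ", 0

.code
main PROC
    mov esi, OFFSET presetWord      ; r1: address of the first character
    mov edi, OFFSET presetWord
    add edi, SIZEOF presetWord - 2  ; r2: address of the last character (skips the 0)
checkWord:
    cmp esi, edi
    jae isPalin                     ; pointers met or crossed: every pair matched
    mov al, [esi]                   ; fetch one byte from the front
    cmp al, [edi]                   ; compare it with one byte from the back
    jne notPalin
    inc esi                         ; front pointer moves forward
    dec edi                         ; back pointer moves backward
    jmp checkWord
isPalin:
    mov edx, OFFSET isAPalindrome
    jmp report
notPalin:
    mov edx, OFFSET isNotAPalindrome
report:
    call WriteString                ; Irvine32: prints the 0-terminated string at edx
    call Crlf
    exit
main ENDP
END main
```

Both pointers hold addresses here, so no separate counter is needed and the loop instruction (which would decrement ecx on its own) is not used.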
This will produce "false" for presetWord, as 'S' != 's', so you may want to make the byte comparison case-insensitive, but I would first make it work without that.
While debugging, you should be able to recognize the "class" of some of the numbers in registers. A size loaded into a register will very likely be a small number, like 0000000F (15). An address will very likely be a large number, like 8040506E. A single ASCII character usually falls in the range 20-7F, but after mov al,... the debugger still displays the whole of eax, and the upper three bytes keep their previous value. For example, reading a space character into al when eax holds 12345678 changes eax to 12345620 (space ' ' == 0x20 in ASCII).
You can also use the memory view to check the content of a particular address. If you changed that cmp to cmp [esi+ecx],eax and looked up the address in the memory view, you would see which byte is actually being compared, and that it is not the one you intended (on the first pass, the terminating 0 at presetWord+15 rather than the final 's').
All of this is visible and checkable in the debugger. It is sometimes a bit tedious, but often easier than asking on SO or just staring at the source code, especially if you have been stuck for a while.
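If you would rather see those classes of values side by side, a short sketch like the following can help. It assumes Irvine32's DumpRegs procedure for printing the registers; the starting value 12345678h is arbitrary and only there to make the untouched upper bytes visible.

```asm
INCLUDE Irvine32.inc

.data
presetWord BYTE "Step on no pets", 0

.code
main PROC
    mov ecx, SIZEOF presetWord - 1  ; a size: small number, ecx = 0000000F
    mov esi, OFFSET presetWord      ; an address: large number, depends on where the program is loaded
    mov eax, 12345678h              ; arbitrary old content of eax
    mov al, [esi+ecx-1]             ; a character: 's' (73h) at presetWord+14, eax = 12345673
    call DumpRegs                   ; Irvine32: prints the general registers and flags
    exit
main ENDP
END main
```

The esi value differs from machine to machine, which is itself a hint that you are looking at an address and not at data.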
Finally ... why registers at all? Because computer memory is a separate chip. It may look innocent, but an instruction like mov al,[presetWord] can stall for hundreds of CPU cycles while the CPU waits for the memory chip to read the data and send it over the bus. al and ecx, by contrast, live directly inside the CPU and are available in the same cycle the CPU needs them.
So you may want to keep values you use often in registers, to avoid being slowed down by memory (although once the data is cached in the L1/L2/L3 caches, those "hundreds" of cycles shrink to a reasonable amount, sometimes effectively none when the data sits in a cache level on the CPU itself). You also want to access memory in a predictable pattern (so the cache can read ahead) and in reasonable amounts (caches work in small fixed-size lines, commonly 64 bytes on current x86 CPUs, and each level has limited capacity). If a handful of instructions touch, say, 16 different 8k memory pages, you may run out of available cache lines, and then at least one access will stall in full while waiting for a real memory read.