How long does each machine language instruction take to execute?


Do operations like set, read, move and compare all take the same time to execute?

If not, is there any way to find out how long each one takes?

Is there a name for what I mean, i.e. how fast a specific type of CPU executes the different assembly language instructions (move, read, etc.)?

4

There are 4 answers

2
Mysticial On BEST ANSWER

The key terms you're probably looking for are:

  • Instruction Latency
  • Instruction Throughput

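These should be easy to google for. But basically, an instruction takes a certain number of cycles from start to result (latency). The CPU can often have several instructions in flight at once, though, so the rate at which it completes them (throughput) can be much better than the latency alone suggests.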
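To make the difference between the two terms concrete, here is a minimal sketch of an idealized pipeline. The function names and the numbers (a 3-cycle latency, 2 instructions started per clock) are made up for illustration and do not describe any real CPU.

```python
import math


def chain_cycles(n, latency):
    """Cycles for n instructions where each one needs the previous result.

    Nothing can overlap, so the latencies simply add up.
    """
    return n * latency


def independent_cycles(n, latency, per_cycle):
    """Cycles for n independent instructions on an idealized pipeline
    that can start `per_cycle` of them every clock (hypothetical model)."""
    if n == 0:
        return 0
    issue_cycles = math.ceil(n / per_cycle)  # clocks needed to start them all
    return issue_cycles - 1 + latency        # last group still needs `latency` clocks


if __name__ == "__main__":
    # Assumed figures: 100 instructions, 3-cycle latency, 2 started per clock.
    print("dependent chain:", chain_cycles(100, 3))
    print("independent    :", independent_cycles(100, 3, 2))
```

In this model the dependent chain is limited by latency and the independent stream by throughput, which is why both numbers are usually listed for each instruction.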

Do operations like set, read, move and compare all take the same time to execute?

In general no. Different instructions have different latencies and throughputs. For example, an addition is typically much faster than a division.


If you're interested in the actual values of different assembly instructions on modern processors, you can take a look at Agner Fog's tables.


That said, there are about a gazillion other factors that affect the performance of a computer,
most of which are arguably more important than instruction latencies/throughputs:

  • Cache
  • Memory
  • Disk
  • Bloat (this seems to be a big one... :D)
  • etc... the list goes on and on...
0
Rob Smyth On

How fast does each assembly language instruction take? Do operations like set, read, move and compare all take the same time to execute?

You will find this information in the CPU's assembly language manual from the CPU's manufacturer (e.g. Intel). Each CPU instruction usually has a page or two, which will tell you how many "cycles" it takes to execute. The manual will define "cycles" elsewhere. Instructions can take different times to execute depending on what they are given, e.g. a conditional jump may or may not jump, and a multiply by zero may (I assume) be faster than a multiply by 7.

2
old_timer On

Pipelining, caches, and the CPU itself no longer being the primary bottleneck have done two things to your question. First, CPUs today generally execute one instruction per clock. Second, it can take many (dozens to hundreds of) clocks to feed the CPU an instruction. More modern processors, even those whose instruction sets are old, rarely bother to document per-instruction clock counts, because the nominal figure is one clock and the "real" execution speed is too hard to describe.

The cache and pipeline try to let the CPU run at this one-instruction-per-clock rate, but a read from memory, for example, has to wait for the response to come back. If the item is not in the cache, this can take hundreds of clock cycles, because the system has to read a number of locations to fill a cache line and then spend more clocks getting the data through the caches back to the processor.
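As a rough illustration of why a cache miss dominates, the usual back-of-the-envelope estimate is average cost = hit cost + miss rate × miss penalty. The sketch below uses that formula; the specific numbers (4-cycle hit, 200-cycle miss penalty, 5% miss rate) are assumptions picked for the example, not measurements.

```python
def average_access_cycles(hit_cycles, miss_penalty_cycles, miss_rate):
    """Average cycles per memory read under a simple one-level cache model.

    hit_cycles          -- cost when the data is already in the cache
    miss_penalty_cycles -- extra cost of filling the cache line from memory
    miss_rate           -- fraction of reads that miss (0.0 to 1.0)
    """
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return hit_cycles + miss_rate * miss_penalty_cycles


if __name__ == "__main__":
    # Assumed figures, for illustration only.
    for rate in (0.0, 0.05, 0.5):
        print(rate, average_access_cycles(4, 200, rate))
```

Even a small miss rate adds more cycles than the instruction itself nominally takes, which is the point being made above.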

Now if you go back in time, or stay in the present but look at the microcontroller world or any other system where the memory can respond in one clock, or at least in a very deterministic number of clocks (say two clocks for EEPROM and one for RAM, that kind of thing), then you can very easily count the exact number of clocks. Processors like these often do publish a table of cycles per instruction. A two-word read instruction, for example, would take two clocks to fetch the instruction and then another clock to perform the read: 3 clocks minimum. Some instructions take more than one clock to execute, so that would be added in as well.
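On a deterministic system like that, counting clocks is just addition. Here is a minimal sketch of the arithmetic from the read example above (two clocks to fetch, one to perform the read). The instruction names and per-instruction figures in the table are hypothetical; a real part's datasheet would supply them.

```python
def instruction_cycles(fetch_words, execute_cycles, fetch_cycles_per_word=1):
    """Clocks for one instruction: fetch every word, then execute."""
    return fetch_words * fetch_cycles_per_word + execute_cycles


# Hypothetical cycle table: (name, instruction words, execute clocks).
PROGRAM = [
    ("read", 2, 1),   # the two-word read from the text: 2 + 1 = 3 clocks
    ("add", 1, 1),
    ("write", 2, 1),
]


def program_cycles(program):
    """Total clocks for a straight-line sequence with no stalls or interrupts."""
    return sum(instruction_cycles(words, execute) for _, words, execute in program)


if __name__ == "__main__":
    for name, words, execute in PROGRAM:
        print(name, instruction_cycles(words, execute))
    print("total", program_cycles(PROGRAM))
```

This only works because every fetch and execute step has a fixed cost; once caches or variable-latency memory are involved, the sum stops being predictable.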

I highly recommend finding a (used) copy of Zen of Assembly Language by Michael Abrash. It was dated when it came out, but it is still an important work. Learning to juggle the relatively simple 8088/86 was tough enough; today's x86 and other systems are quite a bit more complicated.

If you are running Windows or Linux or something like that, trying to time your code won't necessarily get you where you want. As a simple example of how complicated the problem is: adding or removing a nop, which shifts the alignment of the code in memory by as little as a byte, can have dramatic effects on the performance of the remainder of the code, even though nothing about that code has changed other than its location in RAM.

What processor or system are you interested in? The STM32F4 Discovery board, about $20, contains an ARM (Cortex-M) processor with instruction and data caches. It has the complications of a bigger system, but at the same time it is simple enough (relative to a bigger system) to allow controlled experiments.

If you are familiar with the Microchip PIC world, people there often count cycles to produce precise delays between events. It is a very deterministic environment (so long as you don't use interrupts).
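The cycle-counting approach to delays boils down to converting a target time into a number of loop iterations. The sketch below assumes a hypothetical delay loop costing 3 instruction cycles per iteration and a 1 MHz instruction clock; both numbers are placeholders, so check the real figures for your part and your loop.

```python
def delay_loop_iterations(target_us, instruction_clock_hz, cycles_per_iteration):
    """Whole loop iterations that fit in `target_us` microseconds.

    Uses integer arithmetic and rounds down, so the loop never overshoots.
    """
    total_cycles = target_us * instruction_clock_hz // 1_000_000
    return total_cycles // cycles_per_iteration


def actual_delay_us(iterations, instruction_clock_hz, cycles_per_iteration):
    """Delay actually produced by that many iterations, in microseconds."""
    return iterations * cycles_per_iteration * 1_000_000 / instruction_clock_hz


if __name__ == "__main__":
    # Assumed: 1 ms target, 1 MHz instruction clock, 3 cycles per iteration.
    n = delay_loop_iterations(1000, 1_000_000, 3)
    print(n, "iterations give", actual_delay_us(n, 1_000_000, 3), "us")
```

The leftover cycles after rounding down are typically padded with nops, and an interrupt firing mid-loop would break the calculation, which is why the caveat about interrupts matters.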

0
David On

The answer is MIPS (millions of instructions per second), or just IPS (instructions per second), since you are talking about embedded systems.
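For reference, a MIPS figure follows from the clock rate and the average number of clocks per instruction (CPI): MIPS = clock / (CPI × 1,000,000). A minimal sketch, with made-up clock rates and CPI values:

```python
def mips(clock_hz, cycles_per_instruction):
    """Millions of instructions per second for a given clock and average CPI."""
    if cycles_per_instruction <= 0:
        raise ValueError("cycles_per_instruction must be positive")
    return clock_hz / (cycles_per_instruction * 1_000_000)


if __name__ == "__main__":
    # Assumed example parts, not real datasheet values.
    print(mips(16_000_000, 1))      # one instruction per clock at 16 MHz
    print(mips(100_000_000, 2.5))   # 100 MHz with an average of 2.5 clocks each
```

Note that this is an average over a whole instruction mix; it says nothing about how long any single instruction takes.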