Is a picosecond timer possible?


Given

I want to build a device that measures distance from the time an electromagnetic wave takes to travel from one point (a computer or microcontroller) to another. I am not considering the phase-shift method.

I want centimeter precision (at least).

For that, both computers need a clock with a resolution of at least 33.3 ps, which is roughly the time light takes to travel 1 cm.

If I think of the clock as a counter loop that just increments a uint64_t variable on each iteration, that loop has to run 30 billion iterations per second, i.e. one iteration every 33.3 ps.
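The requirement above can be checked with a few lines of arithmetic. This is only a sketch: it assumes propagation in vacuum, and the function names are made up for illustration. With the exact value of c the result is 33.4 ps rather than the rounded 33.3 ps.

```python
C = 299_792_458.0  # speed of light in vacuum, m/s

def flight_time_ps(distance_m: float) -> float:
    """One-way travel time of an EM wave in vacuum, in picoseconds."""
    return distance_m / C * 1e12

def required_tick_rate_hz(resolution_m: float) -> float:
    """Tick rate a counter needs so that one tick corresponds to resolution_m."""
    return C / resolution_m

print(f"1 cm of travel takes {flight_time_ps(0.01):.1f} ps")
print(f"needed tick rate: {required_tick_rate_hz(0.01) / 1e9:.2f} GHz")
```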

Problem

In this question, the commenters and the user who answered state that CPU performance is limited by the CPU's size, because information can't propagate faster than the speed of light. That makes sense.

Thoughts

However, there is the table of CPUs' millions of instructions per second (MIPS).

Consider the Intel Core i5-11600K: it has 6 cores and 346 350 MIPS at 4.92 GHz. Obviously, those 346 billion instructions per second are for a multithreaded program (or maybe not even that?); per core, though, it should be 346/6 ≈ 57, i.e. about 57 000 MIPS.

57 000 MIPS means 57 billion instructions per second per core, so the period is about 17 ps.

An electromagnetic wave travels about 5 mm in 17 ps.
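The arithmetic used in the question can be reproduced as follows. This is a sketch of the naive reading only (one instruction finishing before the next begins, propagation in vacuum); the function names are hypothetical.

```python
C = 299_792_458.0  # speed of light in vacuum, m/s

def naive_period_ps(mips: float) -> float:
    """Period per instruction if instructions ran strictly one after another."""
    return 1e12 / (mips * 1e6)

def light_distance_mm(period_ps: float) -> float:
    """Distance light covers in vacuum during period_ps, in millimetres."""
    return C * period_ps * 1e-12 * 1e3

period = naive_period_ps(57_000)  # per-core figure from the question
print(f"{period:.1f} ps per instruction, {light_distance_mm(period):.1f} mm of travel")
```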

Threadripper has 36 trillion instructions per second per core, which means only 8 µm of wave propagation.

So by that logic none of this should work, yet it does. What is the trick? Is a picosecond timer possible after all?


Update

If you are going to use multithreading as an argument, please look, in the table I linked above, at the Intel Core 2 Extreme X6800, which has only 2 cores and 27 000 MIPS, i.e. 13 500 MIPS per core. That is still pretty fast and should be impossible (light travels only about 2.2 cm in that time).

3

There are 3 answers

0
chrslg On BEST ANSWER

Have you ever visited a factory? Let's say a car factory.

A car passes through many different stages. Some workers paint the bodywork, then it has to dry, then other workers mount the wheels, others add the seats, and so on.

A Tesla (I use that example only because the data are public) takes 12 weeks to go through the entire process: 12 weeks pass between the moment fabrication starts and the moment the working car exits the factory.

Yet Tesla builds 1000 cars per day.

It would be a mistake to conclude that Tesla builds 1/(12*7) = 0.0119 cars per day. Conversely, it would be a mistake to expect that, because a factory builds 1000 cars per day, a car is built in 86.4 seconds.

And that is not just because of threads, or parallelism. With my example, one could likewise object: "but a single factory doesn't build 1000 cars a day; there is more than one Tesla factory". Indeed, there are 6 of them. So 6 threads, if you will. Still, it would be a mistake to deduce that it takes 518 seconds (86.4×6) to build a Tesla rather than 12 weeks. The other reason is the process itself: the worker painting bodywork is already painting the second car while another worker mounts the wheels of the first. It takes time for a car to propagate through the whole chain. The computation 86400/1000*6 would be valid only if a single worker were active at any given time (nobody working on the wheels while the car is being painted, or while it dries).

So, no, you can't deduce the duration of an instruction from the number of instructions per second, any more than you can deduce the time it takes to build a Tesla (12 weeks) from the rate at which the factories produce them (1000/6 per day per factory).
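The factory numbers can be put side by side in a small sketch. It assumes a steady state, uses Little's law (items in progress = throughput × latency) to connect the two quantities, and the function names are invented for illustration.

```python
SECONDS_PER_DAY = 86_400

def seconds_between_completions(throughput_per_day: float) -> float:
    """Average gap between two finished cars; this says nothing about build time."""
    return SECONDS_PER_DAY / throughput_per_day

def items_in_progress(throughput_per_day: float, latency_days: float) -> float:
    """Little's law for a steady-state pipeline: L = throughput * latency."""
    return throughput_per_day * latency_days

throughput = 1000       # cars per day, all factories together
latency_days = 12 * 7   # 12 weeks from start of fabrication to finished car

print(seconds_between_completions(throughput))      # gap between cars, in seconds
print(items_in_progress(throughput, latency_days))  # cars being built at once
```

The small gap between completions and the long build time are reconciled by the tens of thousands of cars that are in progress at the same moment.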

(I could have used many other analogies. For example, since you mention signal propagation: you may know that when you switch on the light, it takes hours for an electron to get from the switch you've just closed to the lightbulb. Yet quintillions of them pass through the lightbulb per second :D)

9
Employed Russian On

As others have commented, you can't expect any CPU to complete a loop iteration in 33 picoseconds.

To address your confusion:

57 000 MIPS means 57 billions instructions per second per one core, and period equals about 17 ps.

57,000 MIPS does not mean that the CPU can retire 57e9 instructions per second. The MIPS figure is mostly meaningless, because it is measured relative to the VAX unit of performance. IOW, the CPU does work equivalent to about 57,000 million VAX instructions per second.

It's more useful to look at the clock. At 5GHz, a single clock cycle takes 200 picoseconds, and no instruction actually takes a single clock cycle -- they all take multiple (but due to pipelining, you may be able to retire one instruction / cycle).
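A toy model makes the difference between instruction latency and retire rate concrete. It assumes an idealized in-order pipeline with no stalls, and the depth of 14 stages is an arbitrary value chosen for illustration, not a figure for any real CPU.

```python
def clock_period_ps(freq_hz: float) -> float:
    """Length of one clock cycle in picoseconds."""
    return 1e12 / freq_hz

def pipeline_timing(n_instructions: int, depth: int, freq_hz: float):
    """Ideal pipeline: one instruction enters per cycle and each one
    needs `depth` cycles to travel through all stages."""
    period = clock_period_ps(freq_hz)
    latency_ps = depth * period                # time for a single instruction
    total_cycles = depth + n_instructions - 1  # last one exits depth-1 cycles after entering
    average_ps = total_cycles * period / n_instructions
    return latency_ps, average_ps

latency, average = pipeline_timing(1_000_000, depth=14, freq_hz=5e9)
print(f"latency of one instruction: {latency:.0f} ps")
print(f"average time per retired instruction: {average:.2f} ps")
```

In this model the average approaches one clock period while every individual instruction still takes many cycles, and nothing ever happens on a finer grid than the 200 ps clock.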

0
Peter Cordes On

A 5GHz CPU has a clock period of 200 ps. One core being able to run 4 to 6 instructions per clock cycle means they happen in parallel overlapping during that 200 ps, not that they run one after the other to subdivide that 200 ps clock period. (Modern Microprocessors A 90-Minute Guide! is excellent for an intro to superscalar CPU pipelines).

(Multiple cores don't help either; their clocks aren't necessarily even in phase with each other or at the same clock speed, although Intel client (non-server) CPUs do tie all the cores to the same frequency at least for non-turbo. And they're certainly independent of each other in terms of executing instructions.)

Two or three (cached) loads per clock cycle are possible on modern CPUs, but they all read cache at the same time, not 3 different times within the clock period. (L1d cache is multi-ported so 3 different reads can happen in parallel, along with a write.) Off-core traffic takes way longer.


Besides that, I/O takes vastly longer than 1 clock cycle. It's completely impossible to poll an I/O pin even once per 200 ps on a modern x86 using just CPU instructions. Traffic between a core and any I/O pin has to go over a ring bus which connects all the cores, the memory controllers, and the "system agent" (where the PCIe lanes connect). Due to contention, there can be variability in latency there. And the fastest off-chip connections are the PCI-express lanes, which are much slower than 5GHz.

You might possibly factor out most of the variability in rdtsc and stuff (which takes about 20 clock cycles), but at best you have 200 ps precision on a 5GHz i7 even if you could factor out all the I/O timing variability and do I/O at the frequency of a CPU core.

In practice it'd be much less precise.
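To relate these figures back to the centimeter goal in the question, the timing numbers can be converted into one-way distance. This is a rough sketch assuming propagation in vacuum, the 5 GHz clock and the roughly 20-cycle rdtsc cost mentioned above; the helper name is hypothetical.

```python
C = 299_792_458.0  # speed of light in vacuum, m/s

def distance_per_interval_cm(interval_ps: float) -> float:
    """Distance light covers in vacuum during interval_ps, in centimetres."""
    return C * interval_ps * 1e-12 * 100

clock_tick_ps = 1e12 / 5e9          # 200 ps at 5 GHz
rdtsc_cost_ps = 20 * clock_tick_ps  # about 20 cycles per rdtsc, per the answer

print(f"one clock tick: {distance_per_interval_cm(clock_tick_ps):.1f} cm")
print(f"one rdtsc:      {distance_per_interval_cm(rdtsc_cost_ps):.0f} cm")
```

Even the best-case 200 ps tick corresponds to about 6 cm of travel, several times coarser than the 1 cm target.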