Sending data from a slow clock domain to a fast one

Suppose I want to send a stream of data from a slow clock domain to a fast domain, and the latency is important. Is there some way of establishing a lower bound on the latency?

The standard solution is a FIFO, and its latency would provide a tight upper bound. It seems clear that the data will need to be registered in both domains, and some time will be needed for the cross-domain path and for metastability to resolve. I can probably implement a FIFO with no overhead beyond this, though its timing constraints would be a pain to specify (and perhaps to meet). I can certainly do it safely given one extra cycle in the receiving domain.

However, "seems clear" is not a cast-iron argument. Maybe there is a non-obvious implementation that does not involve connecting two synchronous circuits together. That seems like a long shot, so perhaps there is some rigorous argument that would provide a tight lower bound on the latency? Many thanks.

Edit: when I say lower bound, I am referring to the least amount of time that any correct solution to the problem must take, not to the delay of any particular implementation. An analogy: a ripple-carry adder has delay O(n), and this is an upper bound on the cost of adding two binary numbers (because we know how to do it at that speed, the problem can be no harder than that). We also know that binary addition must take Omega(log n) time, because the top sum bit depends on all 2n input bits, and a tree of bounded-fan-in gates with those inputs at its leaves (the best we can possibly do) has depth at least log(2n).
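To make the adder analogy concrete, here is a small Python sketch (assuming 2-input gates; the function name is just for illustration) that computes the minimum depth of a gate tree whose output depends on 2n inputs, next to the roughly n stages of a ripple-carry chain:

```python
def min_tree_depth(num_inputs: int, fan_in: int = 2) -> int:
    """Smallest depth of a tree of fan_in-input gates whose root sees num_inputs leaves."""
    depth = 0
    covered = 1  # number of leaves a tree of the current depth can reach
    while covered < num_inputs:
        covered *= fan_in
        depth += 1
    return depth


for n in (4, 16, 64):
    # The top sum bit of an n-bit adder depends on all 2n operand bits.
    print(f"{n}-bit add: ripple-carry chain ~{n} stages, "
          f"tree lower bound {min_tree_depth(2 * n)} gate levels")
```

The gap between the two columns is exactly the gap between an upper bound (a known implementation) and a lower bound (what any implementation must pay).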

Best answer, by EEliaz:

Some comments before I try to answer:

If you want to take an output of a flop that is toggled in one clock domain and sample it in another flop in a fast clock domain, then by definition your solution involves connecting two synchronous circuits together.

Also, I will say that "slow to fast" is too abstract a definition to give an accurate answer, and a lot of variables are missing from your description. Clock domain crossing is a very wide topic with many scenarios, and each one can be handled and optimized differently.

Theoretical lower bound:

The minimal time such a synchronization can take is tSetup + tHold + tCQ of a single flop in the fast clock domain. This is because the data coming from the slow clock domain must remain stable throughout the setup/hold window around the fast flop's clock edge for the flop to sample it correctly, and the sampled value then appears at the flop's output tCQ after that edge.

If you guarantee this, there is no metastability, and you can say the data is fully synchronized.
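As a rough illustration of that bound, here is a Python sketch with made-up flop parameters (the picosecond values are hypothetical, not from any real cell library, and the function names are illustrative). It adds up the window plus clock-to-Q and checks whether a given input transition lands inside the setup/hold window:

```python
def min_sync_time_ps(t_setup: float, t_hold: float, t_cq: float) -> float:
    """The answer's theoretical bound: the setup/hold window plus clock-to-Q."""
    return t_setup + t_hold + t_cq


def sample_is_safe(transition_ps: float, edge_ps: float,
                   t_setup: float, t_hold: float) -> bool:
    """True unless the input changes strictly inside (edge - t_setup, edge + t_hold)."""
    return transition_ps <= edge_ps - t_setup or transition_ps >= edge_ps + t_hold


# Hypothetical flop parameters in picoseconds.
T_SETUP, T_HOLD, T_CQ = 50.0, 20.0, 80.0
print(min_sync_time_ps(T_SETUP, T_HOLD, T_CQ))        # 150.0
print(sample_is_safe(900.0, 1000.0, T_SETUP, T_HOLD))  # True: settles before setup
print(sample_is_safe(970.0, 1000.0, T_SETUP, T_HOLD))  # False: inside the window
```

With unrelated clocks you cannot guarantee `sample_is_safe` for every transition, which is why the practical cases below need more than this.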

Practical best lower bound:

Excluding really complex physical circuits, the minimal latency is a single cycle of the fast clock, but this can be achieved only in specific scenarios.

For example, if I go from a slow clock domain running at X MHz to a fast clock domain running at n*X MHz (where n is a natural number), and both clocks come from the same PLL, then generally speaking you can just sample the slow signal with a flop in the fast clock domain, without even writing extra timing constraints.

This is because the synthesis tool treats this as a synchronous timing path: it knows the relationship and phase between the two clocks' edges and can guarantee that timing is met. So in this scenario the synchronizer is a single flop in the fast clock domain, the lower (and upper) bound on latency is 1 cycle of the fast clock, and you don't need a FIFO. You just need to know that the output of this flop carries a new value only once every n cycles.
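A cycle-based Python sketch of this case, assuming a 1:n ratio with aligned edges, zero skew, and a crossing path that fits in one fast period (the function and variable names are hypothetical). The slow register launches a word on every n-th fast edge and a single fast flop samples it:

```python
def simulate_integer_ratio(words, n):
    """Return {word_index: fast_cycle_when_captured} for a 1:n same-PLL clock pair.

    Cycle-based model: every fast edge is numbered; slow edges fall on
    fast edges 0, n, 2n, ... because both clocks come from the same PLL.
    """
    slow_q = None  # output of the slow-domain register: (index, word)
    fast_q = None  # output of the single fast-domain sampling flop
    captured = {}
    for cycle in range(len(words) * n + 1):
        # Both flops sample their pre-edge inputs, then update together.
        next_fast = slow_q
        next_slow = slow_q
        if cycle % n == 0 and cycle // n < len(words):
            next_slow = (cycle // n, words[cycle // n])  # slow edge launches a word
        slow_q, fast_q = next_slow, next_fast
        if fast_q is not None and fast_q[0] not in captured:
            captured[fast_q[0]] = cycle
    return captured


n = 4
for k, cycle in simulate_integer_ratio(["A", "B", "C"], n).items():
    print(f"word {k}: launched at fast edge {k * n}, captured at {cycle}, "
          f"latency {cycle - k * n} fast cycle(s)")
```

Every word shows a latency of 1 fast cycle, and the captured value only changes on edges 1, n+1, 2n+1, and so on, which is the "once every n cycles" point above.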

Practical general lower bound:

I will assume the most general case:

  • Two clocks without any known relationship, just that the driver is slow and the receiver is fast
  • Wide bus, where the relation between every two bits must be kept
  • A new input word arrives on every slow-clock cycle

The answer to this scenario is hidden in your question: "It seems clear that the data will need to be registered in both domains, and some time will be needed for the crossdomain path and for metastability to resolve."

Metastability does not have to happen on every cycle. If it does happen, then (in the general case of a chain of synchronizer flops) it takes one more cycle for it to resolve. Therefore, the lower bound for the synchronization process is the latency of the circuit when no metastability occurs.
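A toy Monte Carlo model of that argument in Python. The assumptions here are hypothetical simplifications, not device physics: the setup/hold aperture is treated as a fixed fraction of the fast period just before the edge, and a metastable flop settles to the old or new value with equal probability. The point is only to show that the clean case sets the minimum and one extra cycle sets the maximum:

```python
import random


def chain_latency(stages: int, phase: float, window: float, rng: random.Random) -> int:
    """Fast edges from an input change until the last synchronizer flop shows it.

    phase:  time from the input change to the next fast edge, as a fraction of
            the fast period (0 = right at the edge).
    window: setup/hold aperture as a fraction of the period (simplification:
            modelled as lying just before the edge).
    """
    if phase < window:
        # Metastable: the first flop may settle to the old or the new value.
        settled_to_new = rng.random() < 0.5
        return stages if settled_to_new else stages + 1
    return stages  # clean capture: the new value moves one flop per edge


rng = random.Random(1)
samples = [chain_latency(2, rng.random(), 0.05, rng) for _ in range(10_000)]
print(min(samples), max(samples))
```

For a 2-flop chain the minimum over the samples is 2 fast cycles (no metastability) and the maximum is 3 (metastability resolved to the old value).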

For the general case I've described above, and again excluding more complex solutions that depend on the exact scenario, you need a CDC FIFO synchronizer.
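For reference, the usual way a CDC (asynchronous) FIFO keeps a wide bus coherent is to hold the data in a dual-port memory and pass only the read/write pointers across, Gray-coded, through synchronizer chains. This scheme is described in Cummings' FIFO papers; it is not spelled out in this answer. Because consecutive Gray codes differ in exactly one bit, a synchronizer that samples a pointer mid-change sees either the old or the new value, never a mix. A small Python check of that property, assuming a hypothetical 16-entry FIFO:

```python
def bin_to_gray(b: int) -> int:
    """Binary-reflected Gray code of a pointer value."""
    return b ^ (b >> 1)


def gray_to_bin(g: int) -> int:
    """Inverse of bin_to_gray: XOR of all right-shifts of g."""
    b = 0
    while g:
        b ^= g
        g >>= 1
    return b


DEPTH = 16  # hypothetical FIFO depth (power of two)
for ptr in range(DEPTH):
    nxt = (ptr + 1) % DEPTH
    changed = bin_to_gray(ptr) ^ bin_to_gray(nxt)
    # Exactly one bit flips per increment, including the wrap from 15 to 0.
    assert bin(changed).count("1") == 1
    assert gray_to_bin(bin_to_gray(ptr)) == ptr
print("every increment flips exactly one pointer bit")
```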

The lower bound on the latency of the FIFO depends on the implementation of the synchronizer chain inside it. This is sync2 in the drawing below, taken from the excellent paper by Clifford E. Cummings (link below):

[Figure: CDC synchronizer diagram from the Cummings paper, showing the wput and rrdy signals and the sync2 synchronizer chain]

In this drawing the synchronizer chain is 2 flops, so the minimal latency of this circuit from wput to rrdy is 1 cycle of the slow clock (to sample the wput signal) plus 2 cycles of the fast clock. For the upper bound, we need to allow for possible metastability in the synchronizer chain, which gives 1 cycle of the slow clock plus 3 cycles of the fast clock.
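To put numbers on those cycle counts, a small Python calculation. The 100 MHz slow clock and 400 MHz fast clock are hypothetical examples, not values from the paper, and the function name is illustrative:

```python
def mcp_latency_ns(f_slow_mhz: float, f_fast_mhz: float, sync_stages: int = 2):
    """(best, worst) wput-to-rrdy latency using the answer's cycle counts."""
    t_slow = 1000.0 / f_slow_mhz  # slow clock period in ns
    t_fast = 1000.0 / f_fast_mhz  # fast clock period in ns
    best = t_slow + sync_stages * t_fast         # no metastability
    worst = t_slow + (sync_stages + 1) * t_fast  # one extra fast cycle to resolve
    return best, worst


print(mcp_latency_ns(100.0, 400.0))  # (15.0, 17.5)
```

Increasing `sync_stages` (for better MTBF) raises both bounds by one fast period per extra flop.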

http://www.sunburst-design.com/papers/CummingsSNUG2008Boston_CDC.pdf