In advanced FPGA systems that require different clock frequencies for different parts of the design, global clock buffers are often in short supply. Frequently several of the clocks are related (see below), and it becomes possible to use a single clock plus several clock enable signals instead of several dedicated clocks. This article tries to shed some light on the impact these two alternatives can have on an FPGA system.

For the rest of this article, let’s assume that all clocks C_i with frequency F_i are derived from the same reference clock C_ref with frequency F_ref and are strongly related, i.e. they satisfy F_i = F_ref / D_i for a positive integer D_i. In other words, the frequency of each related clock is the reference clock frequency divided by an integer.
I use the terms strongly and weakly related to distinguish the two basic ways in which the relationship between two clocks can be established. Weakly related clocks are those linked by the equation F_i = F_ref * M_i / D_i, with positive integers M_i and D_i. All weakly related clocks therefore have frequencies that are rational multiples of the reference clock frequency, i.e. F_ref divided by a possibly non-integer factor D_i / M_i.
Note that asynchronous clocks and weakly related clocks have to be treated differently; the clock enable technique described here is not applicable to them.
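The distinction can be checked mechanically from the nominal frequencies. The sketch below is a hypothetical helper (the name classify is mine), assuming frequencies are given as integer Hz so that Python's Fraction gives the exact ratio M_i / D_i in lowest terms. It can only look at nominal ratios: whether two clocks actually come from the same reference or from separate, asynchronous oscillators has to be known from the design itself.

```python
from fractions import Fraction


def classify(f_ref_hz: int, f_i_hz: int) -> str:
    """Classify clock C_i relative to C_ref using the definitions above.

    strongly related: F_i = F_ref / D_i        (ratio is 1 / D_i)
    weakly related:   F_i = F_ref * M_i / D_i  (any positive rational ratio)
    Frequencies are integer Hz, so the ratio is exact.
    """
    ratio = Fraction(f_i_hz, f_ref_hz)  # reduced to M_i / D_i in lowest terms
    if ratio.numerator == 1:
        return f"strongly related, D_i = {ratio.denominator}"
    return f"weakly related, M_i/D_i = {ratio.numerator}/{ratio.denominator}"


print(classify(200_000_000, 50_000_000))  # strongly related, D_i = 4
print(classify(200_000_000, 75_000_000))  # weakly related, M_i/D_i = 3/8
```

Strongly related clocks are the special case M_i = 1 of weakly related ones, which is why the helper tests for that case first.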

Main Part

Virtually all FPGAs offer D flip-flops with an enable input, also called a clock enable (CE), since it controls the effect an active (rising or falling) clock edge has on the content of the flip-flop. If CE is deasserted, changes on the D input are not propagated to the Q output when an active clock edge arrives. Only if CE is asserted does the value on the D input propagate to the Q output on an active clock edge.
When N related clocks must be derived from one common reference clock, there are two major options:
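As a behavioral sketch of this rule, here is a cycle-level Python model of a CE flip-flop. It is not HDL, and the class and method names are hypothetical; each call to clock_edge stands for one active clock edge.

```python
class EnabledDFF:
    """Cycle-level model of a D flip-flop with a clock enable (CE) input."""

    def __init__(self, init: int = 0) -> None:
        self.q = init  # current value of the Q output

    def clock_edge(self, d: int, ce: bool) -> int:
        """Apply one active clock edge; Q takes the value of D only if CE is asserted."""
        if ce:
            self.q = d
        # With CE deasserted, Q keeps its previous value regardless of D.
        return self.q


ff = EnabledDFF()
print(ff.clock_edge(d=1, ce=False))  # 0: edge has no effect, Q unchanged
print(ff.clock_edge(d=1, ce=True))   # 1: D propagates to Q
print(ff.clock_edge(d=0, ce=False))  # 1: Q holds its value
```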

(1) Instantiate a PLL or DCM (a Xilinx FPGA clocking primitive) that uses the reference clock C_ref to generate all required clocks C_i. The reference clock C_ref has to be the clock with the highest frequency (see the constraints and assumptions above). The related clocks are generated by dividing the reference clock by an integer value. If a large number of related clocks is required, this can lead to a dead end, because a PLL/DCM can only generate a limited number of clocks (usually somewhere between 4 and 8).
This limitation could be circumvented by using one of the clocks generated by the first PLL/DCM, C_i,1, as the reference clock C_ref,2 of a second PLL/DCM, which in turn generates additional related clocks C_i,2. However, this only works if the phase relationship between the reference clock of a PLL/DCM and its generated clock outputs can be adjusted: in general, this phase offset must be brought to 0, or to an integer multiple of the period of the reference clock with the highest frequency.
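The phase condition can be made concrete with a small timing model. The sketch below works in integer picoseconds and assumes that the rising edges of C_ref fall at integer multiples of its period T_ref, and that a clock produced further down a PLL/DCM cascade has period D * T_ref plus some accumulated phase offset (phase_ps, a hypothetical parameter; the numbers are illustrative only). It checks whether every rising edge of that clock coincides with a rising edge of C_ref.

```python
def edges_aligned(t_ref_ps: int, d: int, phase_ps: int, n_edges: int = 16) -> bool:
    """Return True if the first n_edges rising edges of a cascaded clock
    coincide with rising edges of C_ref.

    C_ref rising edges:    m * t_ref_ps                 (m = 0, 1, 2, ...)
    cascaded clock edges:  phase_ps + k * d * t_ref_ps  (k = 0, 1, 2, ...)
    """
    period_ps = d * t_ref_ps
    return all((phase_ps + k * period_ps) % t_ref_ps == 0 for k in range(n_edges))


T_REF_PS = 5_000  # hypothetical 200 MHz reference clock
print(edges_aligned(T_REF_PS, d=4, phase_ps=0))       # True
print(edges_aligned(T_REF_PS, d=4, phase_ps=350))     # False: edges fall between C_ref edges
print(edges_aligned(T_REF_PS, d=4, phase_ps=10_000))  # True: offset of exactly 2 * T_ref
```

Because k * d * t_ref_ps is always a multiple of t_ref_ps, the edge-by-edge check reduces to phase_ps % t_ref_ps == 0, which is exactly the condition on the phase offset stated above.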

Figure: Clock generation using one clock buffer per clock

(2) Generate only one clock signal, with the highest frequency that is required. All other related clocks would be obtained by dividing this reference clock frequency by an integer D_i (as explained in (1) above). However, instead of actually dividing the reference clock, a clock enable signal CE_i is created that is active only once every D_i clock cycles. This clock enable serves as the enable for all flip-flops that would otherwise be located in the domain of the corresponding related clock C_i. In this way, one clock enable signal CE_i is created for each related clock C_i.
So there is only one primary clock signal, and all other “clocks” are represented logically by enable signals, each asserted only once every D_i clock cycles.
The tricky part is then to tell the timing analyzer to treat the clock enables correctly, so that the place and route tool is aware of the actual timing requirements (typically by declaring multicycle paths for the logic driven by a clock enable). Otherwise the design will be over-constrained, because all clock domains are treated as if they ran at the same frequency, namely the maximum frequency. This results in unnecessarily strict timing requirements for all the logic that would normally run at a lower frequency, which makes timing closure harder to achieve.
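A cycle-level sketch of how such an enable can be generated: the model below assumes a modulo-D_i counter that starts at 0 after reset and asserts CE_i in the cycle where it reaches D_i - 1. The function name clock_enable and this particular counter convention are illustrative assumptions, not the only way to build it. The usage part shows a register in the "C_3 domain" that is clocked by the fast reference clock but only takes a new value every third cycle.

```python
from itertools import islice
from typing import Iterator


def clock_enable(d_i: int) -> Iterator[bool]:
    """Yield CE_i for successive cycles of the single reference clock.

    Models a modulo-D_i counter that starts at 0 after reset; CE_i is
    asserted in the cycle where the counter reaches D_i - 1, i.e. once
    every D_i cycles (D_i = 1 gives a permanently asserted enable).
    """
    if d_i < 1:
        raise ValueError("D_i must be a positive integer")
    count = 0
    while True:
        yield count == d_i - 1
        count = (count + 1) % d_i


ce_3 = list(islice(clock_enable(3), 9))
print(ce_3)  # [False, False, True, False, False, True, False, False, True]

# A register in the "C_3 domain": clocked by C_ref, enabled by CE_3.
q = 0
samples = []
for cycle, ce in enumerate(islice(clock_enable(3), 9)):
    if ce:
        q = cycle  # the D input here is simply the cycle number
    samples.append(q)
print(samples)  # [0, 0, 2, 2, 2, 5, 5, 5, 8]
```

In hardware this corresponds to a small counter plus a comparator per enable; the model only captures which reference clock cycles the enable is active in.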
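To see what "treating the clock enables correctly" means numerically, the sketch below computes the worst-case setup budget, in reference clock periods, for a register-to-register path whose launching flip-flop is enabled by one clock enable and whose capturing flip-flop is enabled by another. It assumes all enable counters are reset together, so that CE_x is asserted on the reference clock edges n with n % D_x == D_x - 1; a different reset alignment would change the cross-domain numbers. The function name is hypothetical.

```python
from math import gcd


def setup_budget_cycles(d_launch: int, d_capture: int) -> int:
    """Worst-case setup budget, in C_ref periods, for a path launched by a
    flip-flop enabled by CE_launch and captured by one enabled by CE_capture.

    Assumption: all enable counters are reset together, so CE_x is
    asserted on the C_ref edges n with n % D_x == D_x - 1.
    """
    hyper = d_launch * d_capture // gcd(d_launch, d_capture)  # enable pattern repeats
    gaps = []
    for n in range(hyper):
        if n % d_launch != d_launch - 1:
            continue  # launching flip-flop does not update on edge n
        m = n + 1
        while m % d_capture != d_capture - 1:
            m += 1  # advance to the next edge on which the capturing flip-flop samples
        gaps.append(m - n)
    return min(gaps)


for d_l, d_c in [(1, 1), (4, 4), (4, 2), (2, 3)]:
    print(f"D_launch={d_l}, D_capture={d_c}: {setup_budget_cycles(d_l, d_c)} cycle(s)")
```

Within one domain the budget is D_i periods, which is what a multicycle path constraint of D_i expresses. Across domains the budget can be smaller than either divider (only 1 period for D = 2 launching into D = 3 under this alignment), so blanket relaxation of every enabled path would be unsafe. Hold checks and the exact constraint syntax are tool-specific and are not modeled here.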

Figure: Clock generation with multiple clock enables using a single clock buffer


The decision whether to use multiple clocks or a single clock plus clock enables boils down to a resource trade-off.
On the one hand, multiple PLLs/DCMs and multiple (global) clock buffers are used to generate multiple related clocks. This requires more (global) clocking resources, but no additional fabric resources at all. Each clock domain is defined by a physical clock signal.
On the other hand, only one global clock with one global clock buffer is used. The different clock domains are defined logically by enable signals, each asserted only once every D_i clock cycles. This approach requires more fabric resources (LUTs, CLBs, FFs) to generate and distribute the enable signal nets that define the various clock domains. In return, only a single clock must be generated, which saves global clocking resources.
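A back-of-the-envelope tally makes the trade-off explicit. The cost model below is a hypothetical simplification for illustration: option (1) needs one global clock buffer per clock and no fabric logic, while option (2) needs one global clock buffer plus a binary counter of ceil(log2(D_i)) flip-flops per enable. It deliberately ignores PLL/DCM instances, comparator LUTs, and the routing and fan-out of the enable nets, all of which depend on the device and the synthesis tool.

```python
def clocking_cost(dividers: list[int]) -> dict[str, dict[str, int]]:
    """Rough resource tally for strongly related clocks F_i = F_ref / D_i.

    Hypothetical cost model:
      option 1: one global clock buffer per clock, no counter flip-flops;
      option 2: one global clock buffer in total, plus a modulo-D_i counter
                of ceil(log2(D_i)) flip-flops per enable (D_i = 1 needs none,
                since its enable is permanently asserted).
    """
    # (d - 1).bit_length() equals ceil(log2(d)) for every positive integer d.
    counter_ffs = sum((d - 1).bit_length() for d in dividers)
    return {
        "option 1 (one clock per domain)": {"global_buffers": len(dividers), "counter_ffs": 0},
        "option 2 (one clock + CEs)": {"global_buffers": 1, "counter_ffs": counter_ffs},
    }


for option, usage in clocking_cost([1, 2, 4, 8, 10]).items():
    print(option, usage)
```

Even under this simplified model, the shape of the trade-off is visible: option (2) trades N - 1 global clock buffers for a handful of fabric flip-flops whose number grows only logarithmically with each divider.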
Bingo bango, there you have it.