# An Analytical Model for Interdependent Setup/Hold-Time Characterization of Flip-flops

Hadi Ahmadi Balef<sup>1</sup>, Hailong Jiao<sup>1</sup>, José Pineda de Gyvez<sup>1</sup>, Kees Goossens<sup>1,2</sup> <sup>1</sup>Eindhoven University of Technology, The Netherlands <sup>2</sup>Topic Embedded Products, The Netherlands Email: {H.Ahmadi.Balef, H.Jiao, J.Pineda.de.Gyvez, K.G.W.Goossens}@tue.nl

Abstract—Accurate timing characterization of flip-flops is critical for robust circuit design. Conventionally, setup time and hold time are characterized independently, which results in pessimistic/optimistic designs. To reduce this pessimism/optimism, the interdependency between setup/hold-time has to be taken into account. Fast and accurate characterization of the setup/hold-time interdependency is however a challenging task. In this paper, an analytical model is proposed to capture the setup/hold-time interdependency in a conventional master-slave flip-flop. The accuracy of the proposed model is  $\sim$ 10X higher than the previously published characterization methods. Furthermore, a flow is proposed to find the parameters of the model with  $\sim$ 2.5X shorter computation time compared to the existing methods.

## I. INTRODUCTION

As CMOS technology scales down, the uncertainty in the specifications of integrated circuits (ICs) increases dramatically. This uncertainty is attributed to different sources of variations, such as process, environmental, and temporal variations. Another main source of uncertainty in the design flow is the limited accuracy of the models that are used in design tools. Traditionally, IC designers reserve significant margins in their designs to deal with this uncertainty and guarantee the required design specifications, *e.g.* by designing the circuits at the over-pessimistic worst-case corners. The advantages of technology scaling however diminish due to the large design margins. More accurate models in IC design tools are therefore essential to reduce the design margins as well as to benefit from technology scaling.

In static timing analysis (STA), which is the standard approach for timing closure, the circuit timing is analyzed by checking if timing constraints are violated at any sequential element. Flip-flops are the most commonly used sequential elements. A flip-flop captures its input data only if its timing constraints are met. The timing constraints of a flip-flop are illustrated in Fig. 1(a). By definition, the setup (hold) skew is the time between the data change before (after) the clock capturing edge and that edge. As the setup skew and/or hold skew decreases, the clock-to-q delay of the flip-flop increases. The minimum allowed skews are defined based on an upper bound on the clock-to-q delay. The setup time is the minimum allowed setup skew, *i.e.* the minimum time by which the data must arrive before the clock capturing edge. Similarly, the hold time is the minimum allowed hold skew, *i.e.* the minimum time for which the data must remain stable after the clock capturing edge. Depending on the setup/hold-skew values, the flip-flop either captures the input data or has a timing violation.



Fig. 1. Representation of setup time and hold time, (a) by using the waveform, and (b) in the setup-hold skew space.

The setup/hold-skew space is therefore divided into the capturing and violation regions (see Fig. 1(b)). All the points on the boundary between the capturing and the violation regions are valid setup time and hold time pairs. As illustrated with the blue curve in Fig. 1(b), the actual value of hold time (setup time) increases as the setup skew (hold skew) decreases. This is called the interdependency between the setup time and hold time of flip-flops [1], [2], [3]. Traditionally, setup time and hold time are characterized independently. The independent characterization methods result in a constant setup (hold) time that is independent of the hold (setup) skew. As illustrated in Fig. 1(b), the independently characterized setup/hold-times are either larger (the green curve) or smaller (the red curve) than the actual values. A design that is based on larger/smaller than actual setup/hold-times is either pessimistic or optimistic. Therefore, employing more accurate setup/hold-times by considering their interdependency results in a design that is less pessimistic or optimistic.

The setup/hold-time interdependency has been considered in the literature to improve the accuracy of STA (*e.g.* see [3], [4], [5], [6]). However, characterizing the interdependency curve is an extremely computation intensive task. In this paper, we introduce an analytical model for the setup/hold-time interdependency of conventional master-slave flip-flops. The model has the following advantages:

- The proposed model is highly accurate in a wide range of input slopes and requires ~2.5X shorter characterization time compared to the state-of-the-art setup/hold-time interdependency characterization methods for the same level of accuracy.
- The analytical form of the model, which is based on circuit-level parameters rather than an empirical form, is essential to improve the accuracy and computation efficiency of setup/hold-time interdependency based STA frameworks.
- The model characterization flow is less complex to implement compared to the state-of-the-art setup/hold-time interdependency characterization methods.

This paper is organized as follows. In Section II, the preliminaries and related works are reviewed. The proposed analytical model for setup/hold-time interdependency is introduced in Section III. The model parameter characterization flow is explained in Section IV. In Section V, experimental results are provided for the accuracy, speed, and application of the proposed model. Finally, the paper is concluded in Section VI.

## II. PRELIMINARIES AND RELATED WORKS

In this section, the flip-flop timing definitions are introduced to identify the setup/hold-time interdependency. The importance of considering this interdependency in STA as well as the state-of-the-art characterization methods in related works are reviewed.

The input data of a flip-flop should be stable sufficiently long before and after the clock capturing edge at the flip-flop. The longest path and the shortest path that end into a flip-flop can cause timing violation by violating the aforementioned criteria. As shown in Fig. 2(a), the longest path and the shortest path include clock buffers, flip-flops (the source flip-flops and the destination flip-flop), and combinational logic. The waveforms at the destination flip-flops are shown in Fig. 2(b). The delay of the longest path  $d_{lp}$  (shortest path  $d_{sp}$ ) determines for how long before (after) the clock capturing edge the data has to be stable. Therefore, the setup skew  $d_{ss}$  and hold skew  $d_{hs}$  are

$$d_{ss} = T - d_{lp},\tag{1}$$

and

$$d_{hs} = d_{sp},\tag{2}$$

where T is the clock period. Therefore, the flip-flop input data is a pulse that is stable for  $d_{ss}$  before the CK capturing edge, and remains stable for  $d_{hs}$  after the CK capturing edge, with a width

$$W_D = d_{ss} + d_{hs}.\tag{3}$$

Fig. 2. The shortest and longest paths in a circuit. (a) Circuit representation, and (b) waveforms at the destination flip-flop.

If the input data changes close to the clock capturing edge (*i.e.* small $d_{ss}$ and/or $d_{hs}$), the flip-flop delay from its clock input to its data output $d_{cq}$ increases, *i.e.* the flip-flop enters into metastability. In the extreme case where $d_{ss}$ and/or $d_{hs}$ are too small, the input data is not captured by the flip-flop. To guarantee the capturing of input data, the minimum allowed $d_{ss}$ and $d_{hs}$ are specified as the timing constraints of flip-flops. The former is the setup time $d_{st}$, while the latter is the hold time $d_{ht}$ of the flip-flop. The definitions of $d_{st}$ and $d_{ht}$ are based on a specific criterion on the $d_{cq}$ increase (e.g. 10%) compared to $d_{cq}$ when $d_{ss}$ and $d_{hs}$ are sufficiently large. Traditionally, $d_{st}$ and $d_{ht}$ are found by sweeping the corresponding skew while the counterpart skew is fixed. $d_{st}$ and $d_{ht}$ are stored in timing libraries to be used in STA. STA reports the setup (hold) slack for all flip-flops as the difference between $d_{ss}$ ($d_{hs}$) and $d_{st}$ ($d_{ht}$) at each flip-flop.
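The skew, pulse-width, and slack definitions in (1) to (3) can be sketched as below. This is only an illustration: the `PathTiming` container, the function names, and the example numbers (in picoseconds) are assumptions made for the sketch and do not come from the paper's tooling.

```python
from dataclasses import dataclass


@dataclass
class PathTiming:
    """Hypothetical container for the path delays that end at one flip-flop."""
    clock_period: float  # T
    d_lp: float          # delay of the longest path
    d_sp: float          # delay of the shortest path


def setup_skew(p: PathTiming) -> float:
    """Setup skew d_ss = T - d_lp, as in (1)."""
    return p.clock_period - p.d_lp


def hold_skew(p: PathTiming) -> float:
    """Hold skew d_hs = d_sp, as in (2)."""
    return p.d_sp


def pulse_width(p: PathTiming) -> float:
    """Width of the data pulse W_D = d_ss + d_hs, as in (3)."""
    return setup_skew(p) + hold_skew(p)


def slacks(p: PathTiming, d_st: float, d_ht: float) -> tuple:
    """Setup and hold slack: skew minus the corresponding timing constraint."""
    return setup_skew(p) - d_st, hold_skew(p) - d_ht


# Illustrative numbers only (ps).
example = PathTiming(clock_period=1000.0, d_lp=950.0, d_sp=40.0)
setup_slack, hold_slack = slacks(example, d_st=30.0, d_ht=20.0)
```

With independently characterized constraints, `d_st` and `d_ht` are constants; the interdependent model of Section III makes each of them a function of the counterpart skew.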

Due to the combined effect of $d_{hs}$ and $d_{ss}$ on the $d_{cq}$ increase, $d_{st}$ and $d_{ht}$ are interdependent. This interdependency has been used to improve the accuracy of STA. The initial work on this subject was done by Salman *et al.* [1], [4], [7]. In [4], the uncertainty in the critical path (that determines the setup time requirement) and the shortest path (that determines the hold time requirement) of a 90nm design is reduced by 100% and 50%, respectively. Chen *et al.* propose an iterative timing analysis framework that considers the interdependency between the setup/hold-time and could reduce the clock period by up to 4.9% for a 90nm circuit [5]. In [3], Kahng *et al.* improve the work of Chen *et al.* in [5] by performing a path based STA and linear programming based optimization to find the best combination of flip-flop timing specifications in the circuit. The critical path slack is increased by up to 130ps for a variety of 65nm test circuits. Seo *et al.* propose a method to optimize clock skew using setup/hold-time interdependency, shortening the clock periods of various circuits by 4.2% on average [6]. For all of these methods, a relationship between the setup time and hold time of each flip-flop is required. Obtaining this relationship is a computationally expensive task.

There are efforts to characterize the interdependency between $d_{st}$ and $d_{ht}$ efficiently. The characterization methods in [1] and [5] are based on generating the $d_{cq}$ surface as a function of $d_{ss}$ and $d_{hs}$. The setup/hold-time interdependency curve is then approximated by the intersection of the $d_{cq}$ surface and the constant $d_{cq}$ constraint. In [5] the relationship between setup/hold-time is obtained from an empirical model for clock-to-q delay as a function of setup/hold-skew. However, it is observed in [5] that obtaining the parameters of the model is computationally expensive. Instead of generating the setup/hold-time interdependency curve from the $d_{cq}$ surface, which is a time-consuming brute-force method, the curve is characterized directly in [2] and [8], and the characterization is accelerated. In [2], the characterization is accelerated by increasing the convergence rate in a sufficiently narrow interval. In [8], the search range is limited based on a heuristic slope estimation for the sought curve. Given some points on the setup/hold-time interdependency curve, piecewise linear approximation is used to compose the curve. The accuracy of these characterization methods increases by identifying more setup/hold-time pairs, at the cost of increased characterization time. Despite all these efforts, an analytical model is required to increase the accuracy and computation efficiency of setup/hold-time interdependency based STA frameworks [3], [4]. We therefore propose an analytical model with higher accuracy and shorter characterization time compared to the existing piecewise linear approximations in this paper.

Fig. 3. A master-slave flip-flop. (a) The circuit schematic. (b) The input D and the middle point m pulses generated according to the shortest and longest paths for capturing a rising input.

## III. THE PROPOSED MODEL FOR SETUP/HOLD-TIME INTERDEPENDENCY

In this section, the proposed model for setup/hold-time interdependency is introduced. The model equations are derived based on circuit level parameters, specifically the voltage of an internal node in the flip-flop. The most commonly used master-slave flip-flop is shown in Fig. 3(a). Consider the CK-to-Q path of the flip-flop,

$$d_{cq} = d_{CKI} + d_{mQ},\tag{4}$$

where $d_{CKI}$ is the delay from the CK capturing edge until the slave latch becomes transparent, and $d_{mQ}$ is the delay of the slave latch from the start of transparency to the switching of the flip-flop output. In (4), $d_{CKI}$ does not depend on the arrival time of the input data. On the other hand, a small setup skew $d_{ss}$ and/or hold skew $d_{hs}$ lowers the effective voltage driving the slave latch at the onset of transparency ($V_{m,eff}$), thereby increasing $d_{mQ}$. By using the RC delay model in [9],

$$d_{mQ} \approx \frac{kV_{DD}}{(V_{m,eff} - V_{TH,eff})^{\alpha}},\tag{5}$$

where k and $\alpha$ are technology and sizing dependent parameters, and $V_{TH,eff}$ is the effective threshold voltage of the corresponding transistors. If $d_{hs}$ and $d_{ss}$ are sufficiently large, $V_m$ has settled to 0 or $V_{DD}$ at the onset of the slave latch transparency. In this case, $d_{cq}$ is minimized because the slave latch is driven with the maximum effort, *i.e.* $V_{m,eff}$ is maximized. The minimum $d_{cq}$ is called the clock-to-q contamination delay $d_{ccq}$. As mentioned before, $d_{cq}$ increases when the flip-flop enters into metastability due to a small $d_{ss}$ and/or $d_{hs}$. The upper bound for $d_{cq}$, which is called the clock-to-q propagation delay $d_{pcq}$, is normally chosen to be 10% longer than $d_{ccq}$. According to (4) and (5), the lower boundary for $V_{m,eff}$ is therefore

$$V_{m,eff,L} = V_{TH,eff} + \left(\frac{kV_{DD}}{d_{pcq} - d_{CKI}}\right)^{1/\alpha}.\tag{6}$$

The interdependent setup/hold-time values are  $d_{ss}$  and  $d_{hs}$  pairs that make  $d_{cq} = d_{pcq}$ , thereby  $V_{m,eff} = V_{m,eff,L}$ . Therefore, the setup/hold-time interdependency curve can be modeled by having  $V_{m,eff}$  as a function of  $d_{ss}$  and  $d_{hs}$ .
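As a rough numerical illustration of (4) to (6), the sketch below evaluates the slave-latch delay and the resulting lower boundary of the effective voltage. The function names and every parameter value are hypothetical placeholders chosen for the example; they are not fitted to any technology.

```python
def slave_delay(v_m_eff: float, v_dd: float, v_th_eff: float,
                k: float, alpha: float) -> float:
    """Slave-latch delay d_mQ from (5)."""
    return k * v_dd / (v_m_eff - v_th_eff) ** alpha


def min_effective_voltage(d_pcq: float, d_cki: float, v_dd: float,
                          v_th_eff: float, k: float, alpha: float) -> float:
    """Lower boundary V_m,eff,L from (6): the voltage at which d_cq = d_pcq."""
    return v_th_eff + (k * v_dd / (d_pcq - d_cki)) ** (1.0 / alpha)


# Hypothetical values: volts for voltages, ps for delays, k in matching units.
V_DD, V_TH, K, ALPHA, D_CKI = 1.1, 0.4, 10.0, 1.3, 20.0

# Contamination delay: node m fully settled, so V_m,eff = V_DD in (4) and (5).
d_ccq = D_CKI + slave_delay(V_DD, V_DD, V_TH, K, ALPHA)
# Propagation delay chosen 10% longer than the contamination delay.
d_pcq = 1.1 * d_ccq
v_m_eff_l = min_effective_voltage(d_pcq, D_CKI, V_DD, V_TH, K, ALPHA)
```

Substituting `v_m_eff_l` back into `slave_delay` and adding `D_CKI` returns `d_pcq`, which is the consistency condition that (6) encodes.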

Without loss of generality, consider the rise capturing waveforms shown in Fig. 3(b). As discussed in Section II, due to the delay requirements for the shortest and the longest paths in the combinational logic (see Fig. 2(a)), the captured data is a pulse at the input of the flip-flop (see Fig. 2(b)). $V_{m,eff}$ is approximated by $V_m$ at time $t_{CKI} = t_{CK} + d_{CKI}$, where $t_{CK}$ is the capturing edge of the CK signal, since $V_m$ at $t = t_{CKI}$ determines the effective voltage that drives the slave latch. A reshaped version of the input data pulse (*i.e.* with changed pulse width and slopes) appears at m while the master latch is transparent to the input. Because $t_{CKI}$ is the end of transparency of the master latch, the reshaped input pulse is assumed to be observable at node m only for $t \le t_{CKI}$; for $t > t_{CKI}$, $V_m$ settles to $V_{DD}$ or 0, as shown with dashed lines in Fig. 3(b). A pulse can be modeled as the sum of two exponential terms whose time constants set its rise and fall times. In a rise capturing scenario, $V_m$ is modeled for $V_m \geq \frac{V_{DD}}{2}$ as

$$V_m = V_{DD} - \frac{V_{DD}}{2} \left[\exp\frac{t_{ss,m} - t}{\tau_r} + \exp\frac{t - t_{hs,m}}{\tau_f}\right] \quad \text{for} \quad t \le t_{CKI},\tag{7}$$

where

$$\begin{aligned} t_{CKI} &= t_{CK} + d_{CKI},\\ t_{ss,m} &= t_{ss} + d_{Dm,r},\\ t_{hs,m} &= t_{hs} + d_{Dm,f}. \end{aligned}\tag{8}$$

In the above equations,  $t_{ss} = t_{CK} - d_{ss}$  and  $t_{hs} = t_{CK} + d_{hs}$  as shown in Fig. 3(b). Furthermore,  $d_{Dm,r}$  and  $d_{Dm,f}$  are the rising and falling delays from D to m, respectively. According to (7) and (8),

$$V_{m,eff} = V_m(t = t_{CKI}) = V_{DD} - \frac{V_{DD}}{2} \left[\exp \frac{d_{ss0} - d_{ss}}{\tau_r} + \exp \frac{d_{hs0} - d_{hs}}{\tau_f}\right] \quad \text{for } d_{ss0} < d_{ss} \text{ and } d_{hs0} < d_{hs},\tag{9}$$

where  $\tau_r$  and  $\tau_f$  are time constants. According to (8),

$$\begin{aligned} d_{ss0} &= t_{ss,m} - t_{CKI} + d_{ss} = d_{Dm,r} - d_{CKI},\\ d_{hs0} &= t_{CKI} - t_{hs,m} + d_{hs} = d_{CKI} - d_{Dm,f}. \end{aligned}\tag{10}$$

Note that  $d_{ss0}$  ( $d_{hs0}$ ) is the lower boundary of  $d_{st}$  ( $d_{ht}$ ) for  $V_{m,eff} > \frac{V_{DD}}{2}$ , assuming  $d_{hs}$  ( $d_{ss}$ ) is sufficiently large. The model in (9) expresses  $V_{m,eff}$  as a function of the input skews  $d_{ss}$  and  $d_{hs}$ . Given the lower boundary of  $V_{m,eff}$  in (6), setup/hold-time interdependency is obtained from (9) as

$$\begin{aligned} d_{ht} &= d_{ht,L} - \tau_{hs} \ln\left(1 - \exp \frac{d_{st,L} - d_{ss}}{\tau_{ss}}\right) && \text{for } d_{ss} > d_{ss,minW},\\ d_{st} &= d_{st,L} - \tau_{ss} \ln\left(1 - \exp \frac{d_{ht,L} - d_{hs}}{\tau_{hs}}\right) && \text{for } d_{hs} > d_{hs,minW}, \end{aligned}\tag{11}$$

where $\tau_{ss}$ and $\tau_{hs}$ are the setup and hold time constants, respectively. The parameters $d_{st,L}$ and $d_{ht,L}$ in (11) are the independently characterized $d_{st}$ and $d_{ht}$ when $d_{hs}$ and $d_{ss}$ are sufficiently large, respectively. Furthermore, $d_{ss,minW}$ and $d_{hs,minW}$ are the input skews at the minimum capturing input pulse width, which follow from (3) and (11) as

$$\begin{aligned} \frac{\partial W_D}{\partial d_{ss}} = 0 &\Rightarrow d_{ss,minW} = d_{st,L} + \tau_{ss} \ln\left(1 + \frac{\tau_{hs}}{\tau_{ss}}\right),\\ \frac{\partial W_D}{\partial d_{hs}} = 0 &\Rightarrow d_{hs,minW} = d_{ht,L} + \tau_{hs} \ln\left(1 + \frac{\tau_{ss}}{\tau_{hs}}\right). \end{aligned}\tag{12}$$

Note that due to the form of the equations, the lower boundary of the skews in the model is chosen to be the corresponding input skews for the minimum capturing input pulse width.
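The relations (9), (11), and (12) can be checked numerically with a short sketch such as the one below. The helper `independent_limit` is not stated in the text: it comes from setting (9) equal to $V_{m,eff,L}$ and rearranging, under the assumption that $\tau_{ss} = \tau_r$ and $\tau_{hs} = \tau_f$. All function names are illustrative.

```python
import math


def v_m_eff(d_ss, d_hs, v_dd, d_ss0, d_hs0, tau_r, tau_f):
    """Effective voltage at node m from (9)."""
    return v_dd - 0.5 * v_dd * (math.exp((d_ss0 - d_ss) / tau_r)
                                + math.exp((d_hs0 - d_hs) / tau_f))


def hold_time(d_ss, d_st_l, d_ht_l, tau_ss, tau_hs):
    """Interdependent hold time from the first line of (11)."""
    return d_ht_l - tau_hs * math.log(1.0 - math.exp((d_st_l - d_ss) / tau_ss))


def setup_time(d_hs, d_st_l, d_ht_l, tau_ss, tau_hs):
    """Interdependent setup time from the second line of (11)."""
    return d_st_l - tau_ss * math.log(1.0 - math.exp((d_ht_l - d_hs) / tau_hs))


def min_width_skews(d_st_l, d_ht_l, tau_ss, tau_hs):
    """Skew pair at the minimum capturing pulse width, from (12)."""
    d_ss = d_st_l + tau_ss * math.log(1.0 + tau_hs / tau_ss)
    d_hs = d_ht_l + tau_hs * math.log(1.0 + tau_ss / tau_hs)
    return d_ss, d_hs


def independent_limit(d_0, tau, v_dd, v_m_eff_l):
    """Assumed link between (9) and (11): d_L = d_0 - tau * ln(2 (V_DD - V_L) / V_DD).

    This rearrangement is an inference for the sketch, not a formula from the text.
    """
    return d_0 - tau * math.log(2.0 * (v_dd - v_m_eff_l) / v_dd)
```

Both lines of (11) describe the same curve, so `setup_time(hold_time(d_ss, ...), ...)` returns `d_ss`, and `hold_time` evaluated at the first value returned by `min_width_skews` equals the second.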

## IV. CHARACTERIZATION FLOW FOR SETUP/HOLD-TIME INTERDEPENDENCY

In this section, a flow is introduced to find the parameters of the model in (11):

- $d_{st,L}$ : independently characterized  $d_{st}$  at large  $d_{hs}$ ,
- $d_{ht,L}$ : independently characterized  $d_{ht}$  at large  $d_{ss}$ ,
- $\tau_{ss}$ : setup time constant, and
- $\tau_{hs}$ : hold time constant.

The proposed flow is shown in Algorithm 1. The parameters $d_{st,L}$ and $d_{ht,L}$ are characterized by using an independent characterization method. For example, $d_{st,L}$ is characterized using binary search to find the $d_{ss}$ that makes $d_{cq} = d_{pcq}$, assuming a large $d_{hs}$ (e.g. half of the clock period). The two other model parameters to be characterized are the time constants $\tau_{ss}$ and $\tau_{hs}$. The lower boundaries of the time constants, $\tau_{ss,L}$ and $\tau_{hs,L}$, are found by performing an initial SPICE level simulation. In the initial simulation, a sufficiently wide pulse (e.g. a third of the clock period) is applied to the input of the flip-flop during the low phase of the CK signal. The slopes of the pulse transferred to node m are then used to calculate the initial values of the time constants $\tau_{ss,L} = \frac{tr_{ss}}{\ln 5}$ and $\tau_{hs,L} = \frac{tr_{hs}}{\ln 5}$, where $tr_{ss}$ and $tr_{hs}$ are the corresponding transition times from 50% to 90% of the voltage change at node m. Since the effect of the switch current induced on node m is not considered in this simulation, the calculated time constants are lower than the actual ones in $V_{m,eff}$. Employing the lower than actual time constants in (9) results in a curve that is steeper than the actual curve (see Fig. 4). To find more accurate estimations for $\tau_{ss}$ and $\tau_{hs}$, the inverse of (9) is used for the points near the $d_{ss}$ and $d_{hs}$ pair corresponding to the minimum capturing input pulse width.
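The independent characterization step and the initial time-constant estimate described above can be sketched as follows. The callable `measure_dcq` is a hypothetical stand-in for one SPICE run that returns $d_{cq}$ at a given setup skew with the hold skew held large; `toy_dcq` is a synthetic substitute used only to make the sketch runnable, and the search bounds and tolerance are arbitrary choices.

```python
import math
from typing import Callable


def time_constant_from_transition(t_50_to_90: float) -> float:
    """Initial time constant from the 50%-to-90% transition time: tau = tr / ln 5."""
    return t_50_to_90 / math.log(5.0)


def characterize_setup_time(measure_dcq: Callable[[float], float],
                            d_pcq: float, lo: float, hi: float,
                            tol: float = 0.01) -> float:
    """Binary search for the smallest d_ss in [lo, hi] with d_cq <= d_pcq.

    Assumes d_cq decreases monotonically as d_ss grows and that the
    constraint is already met at d_ss = hi.
    """
    if measure_dcq(hi) > d_pcq:
        raise ValueError("d_cq constraint is not met even at the upper bound")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if measure_dcq(mid) <= d_pcq:
            hi = mid   # constraint met: the setup time is at or below mid
        else:
            lo = mid   # constraint violated: the setup time is above mid
    return hi


def toy_dcq(d_ss: float) -> float:
    """Synthetic d_cq(d_ss) in ps; a placeholder for a circuit simulation."""
    return 100.0 + 50.0 * math.exp((30.0 - d_ss) / 10.0)


d_st_l = characterize_setup_time(toy_dcq, d_pcq=110.0, lo=0.0, hi=200.0)
```

The same search applies to $d_{ht,L}$ with the roles of the two skews exchanged. Each probe costs one transient simulation, which is why the number of characterized points dominates the run time.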

| Algorithm 1 The flow of extracting the model parameters. |
|---|
| 1: procedure MODEL_PARAMETERS_EXTRACTION |
| 2: $d_{st,L} \leftarrow \text{characterize } d_{st} \text{ at large } d_{hs}$ |
| 3: $d_{ht,L} \leftarrow \text{characterize } d_{ht} \text{ at large } d_{ss}$ |
| 4: $\tau_{ss,L}, \tau_{hs,L} \leftarrow$ result of initial simulation |
| 5: $D_{st}[1] = d_{st,L} + \tau_{ss,L} \ln(1 + \frac{\tau_{hs,L}}{\tau_{ss,L}})$ |
| 6: $D_{ht}[1] \leftarrow \text{characterize } d_{ht} \text{ at } d_{ss} = D_{st}[1]$ |
| 7: $D_{ht}[2] = d_{ht,L} + \tau_{hs,L} \ln(1 + \frac{\tau_{ss,L}}{\tau_{hs,L}})$ |
| 8: $D_{st}[2] \leftarrow \text{characterize } d_{st} \text{ at } d_{hs} = D_{ht}[2]$ |
| 9: $D_{st}[3] = d_{st,L} + 2 \times \tau_{ss,L} \ln(1 + \frac{\tau_{hs,L}}{\tau_{ss,L}})$ |
| 10: $D_{ht}[3] \leftarrow \text{characterize } d_{ht} \text{ at } d_{ss} = D_{st}[3]$ |
| 11: $D_{ht}[4] = d_{ht,L} + 2 \times \tau_{hs,L} \ln(1 + \frac{\tau_{ss,L}}{\tau_{hs,L}})$ |
| 12: $D_{st}[4] \leftarrow \text{characterize } d_{st} \text{ at } d_{hs} = D_{ht}[4]$ |
| 13: $DS_{st} = \text{sort } D_{st} \text{ from low to high}$ |
| 14: $DS_{ht} = \text{sort } D_{ht} \text{ from low to high}$ |
| 15: $R = (DS_{ht}[1] - DS_{ht}[4]) / (DS_{st}[1] - DS_{st}[4])$ |
| 16: $\tau_{ss} = \tau_{ss,L}$ |
| 17: $e_{tau} = -1$ |
| 18: while $e_{tau} \neq 0$ do |
| 19: $\quad \tau_{ss,prev} = \tau_{ss}$ |
| 20: $\quad \tau_{hs} = R \times \tau_{ss}$ |
| 21: $\quad C = 1 + \exp((DS_{ht}[1] - DS_{ht}[4])/\tau_{hs})$ |
| 22: $\quad \tau_{ss} = \frac{DS_{st}[1] - DS_{st}[2]}{\ln(C - \exp((DS_{ht}[1] - DS_{ht}[3])/\tau_{hs}))}$ |
| 23: $\quad \tau_{hs} = R \times \tau_{ss}$ |
| 24: $\quad C = 1 + \exp((DS_{ht}[1] - DS_{ht}[4])/\tau_{hs})$ |
| 25: $\quad \tau_{ss} = \frac{DS_{st}[1] - DS_{st}[3]}{\ln(C - \exp((DS_{ht}[1] - DS_{ht}[2])/\tau_{hs}))}$ |
| 26: $\quad e_{tau} = \tau_{ss} - \tau_{ss,prev}$ |
| 27: end while |
| 28: <b>return</b> $d_{st,L}$, $d_{ht,L}$, $\tau_{ss}$, $\tau_{hs}$ |
| 29: end procedure |

Due to the symmetric form of the setup/hold-time interdependency curve, the skews extracted from the following equations with i = 1 are close to the  $d_{ss}$  and  $d_{hs}$  pair corresponding to the minimum capturing input pulse width

$$d_{ss,i} = d_{st,L} + i \times \tau_{ss,L} \ln\left(1 + \frac{\tau_{hs,L}}{\tau_{ss,L}}\right),\tag{13}$$

$$d_{hs,i} = d_{ht,L} + i \times \tau_{hs,L} \ln\left(1 + \frac{\tau_{ss,L}}{\tau_{hs,L}}\right).\tag{14}$$

Fig. 4. The effect of lower/higher than actual time constants on the setup/hold-time interdependency curve. The initial setup/hold-time pairs that are employed for time constants characterization are shown with black dots.

The corresponding $d_{st}$ and $d_{ht}$ are extracted using an independent characterization method. The obtained pairs are illustrated in Fig. 4. Furthermore, the skews obtained at i = 2 in (13) and (14) and the corresponding setup/hold-times are used as two other pairs that are close to the $d_{ss}$ and $d_{hs}$ pair of the minimum capturing input pulse width.

The extracted pairs are then sorted in $DS_{ht}$ and $DS_{st}$ to find the bounding pairs, *i.e.* the pairs indexed with 1 and 4. The remaining pairs (*i.e.* indexed with 2 and 3) are the middle pairs. Note that since the setup/hold-time interdependency is a monotonically decreasing function, the $i^{th}$ element in $DS_{st}$ is paired with element number (5 - i) in $DS_{ht}$ on the curve, as used in lines 22 and 25 of Algorithm 1. According to (9), the interdependency curve that crosses the boundary pairs is

$$\begin{aligned} d_{st} &= DS_{st}[1] - \tau_{ss} \ln\left(C - \exp{\frac{DS_{ht}[1] - d_{hs}}{\tau_{hs}}}\right),\\ C &= 1 + \exp((DS_{ht}[1] - DS_{ht}[4])/\tau_{hs}),\\ \tau_{hs} &= \tau_{ss}(DS_{ht}[1] - DS_{ht}[4])/(DS_{st}[1] - DS_{st}[4]). \end{aligned}\tag{15}$$

Finally, the middle points are employed to find  $\tau_{hs}$  and  $\tau_{ss}$  iteratively, as shown by the while loop in Algorithm 1.
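The probe-point selection in (13) and (14) and the iterative part of Algorithm 1 can be sketched as below. Two points are assumptions of this sketch rather than part of the flow: the exact-zero stopping test is replaced by a relative tolerance with an iteration cap, and the four characterized pairs are passed in as plain lists instead of being produced by simulation. Lists are 0-based, so element `k` of the sorted setup times is paired with element `3 - k` of the sorted hold times.

```python
import math
from typing import Sequence, Tuple


def probe_skew(d_limit: float, tau_own: float, tau_other: float, i: int) -> float:
    """Probe skew from (13) or (14), built from the lower-bound time constants."""
    return d_limit + i * tau_own * math.log(1.0 + tau_other / tau_own)


def update_tau_ss(ds_st: Sequence[float], ds_ht: Sequence[float],
                  tau_ss: float, k: int) -> float:
    """One update of tau_ss from middle pair k (1 or 2), following (15).

    ds_st and ds_ht hold four distinct values sorted from low to high.
    """
    r = (ds_ht[0] - ds_ht[3]) / (ds_st[0] - ds_st[3])
    tau_hs = r * tau_ss
    c = 1.0 + math.exp((ds_ht[0] - ds_ht[3]) / tau_hs)
    arg = c - math.exp((ds_ht[0] - ds_ht[3 - k]) / tau_hs)
    return (ds_st[0] - ds_st[k]) / math.log(arg)


def refine_time_constants(d_st: Sequence[float], d_ht: Sequence[float],
                          tau_ss_init: float, rel_tol: float = 1e-9,
                          max_iter: int = 100) -> Tuple[float, float]:
    """Iterate the two middle-pair updates until tau_ss stops changing.

    rel_tol and max_iter are assumptions of this sketch; convergence is
    not guaranteed for arbitrary input data.
    """
    ds_st, ds_ht = sorted(d_st), sorted(d_ht)
    r = (ds_ht[0] - ds_ht[3]) / (ds_st[0] - ds_st[3])
    tau_ss = tau_ss_init
    for _ in range(max_iter):
        prev = tau_ss
        tau_ss = update_tau_ss(ds_st, ds_ht, tau_ss, 1)
        tau_ss = update_tau_ss(ds_st, ds_ht, tau_ss, 2)
        if abs(tau_ss - prev) <= rel_tol * abs(prev):
            break
    return tau_ss, r * tau_ss
```

If the four pairs lie exactly on a curve of the form (15), the generating value of $\tau_{ss}$ is reproduced by each update step, which gives a simple sanity check for an implementation.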

## V. EXPERIMENTAL RESULTS

In this section, the accuracy and computation efficiency of the proposed model are assessed in comparison to the state-of-the-art methods. The model is then employed to identify the critical flip-flops of an ARM Cortex M0 processor considering the setup/hold-time interdependency. The results in this section are obtained by performing SPICE simulations using Cadence Spectre with standard threshold voltage transistors of an industrial 40nm CMOS technology. The ARM Cortex M0 is synthesized, placed, and routed in the same 40nm CMOS technology using Cadence tools.

### A. The Accuracy and Computation Efficiency of The Proposed Model

In conventional standard cell libraries, the setup/hold-time of flip-flops are characterized for different combinations of input slopes, *i.e.* the slopes of the clock input (CK) and the data input (D). The setup/hold-time interdependency curves for the slow and fast corners according to the timing library and with the average input slope (50ps) according to the STA report of the Cortex M0 are illustrated in Fig. 5.

Fig. 5. The setup/hold-time interdependency curve for data and clock input slopes of 50ps at (a) slow corner (Process:SS, $V_{DD}$=0.99V, T=125°C) and (b) fast corner (Process:FF, $V_{DD}$=1.21V, T=0°C)

TABLE I. The maximum error of the model at different corners relative to input slope, for input slopes ranging from 10ps to 100ps.

| Process | $V_{DD}$ | T              | Setup Time (%) | Hold Time (%) |
|---------|----------|----------------|----------------|---------------|
| FF      | 1.21 V   | $-40^{\circ}C$ | 6.68           | 3.68          |
| FF      | 1.21 V   | $0^{\circ}C$   | 6.44           | 3.45          |
| FF      | 1.21 V   | $125^{\circ}C$ | 5.12           | 2.45          |
| SS      | 0.99 V   | $-40^{\circ}C$ | 7.18           | 6.06          |
| SS      | 0.99 V   | $0^{\circ}C$   | 7.48           | 5.88          |
| SS      | 0.99 V   | $125^{\circ}C$ | 7.44           | 5.28          |

To assess the accuracy of the proposed model, the maximum error (*i.e.* pessimism/optimism) of the model in the setup time and hold time calculation, relative to the input slope, is listed in Table I for different corners of the timing library and for input slope combinations in the range of 10ps to 100ps. As shown in Table I, the maximum pessimism/optimism of the model is less than 10% of the input slope over the evaluated range of slopes.

The model parameters are fitted by following the flow that is discussed in Section IV. The main computational cost of the model is the four independent characterizations at specific setup/hold-skews. To compare the effectiveness of the proposed model with other methods, the number of required setup/hold-time pairs and the absolute pessimism/optimism are compared to the state-of-the-art methods [2], [4], [8]. The accuracy of the piecewise linear approximation used in these methods increases as the number of points on the actual curve increases. The computation efficiency of the proposed model compared to the piecewise linear approximation is illustrated in Fig. 6 for input slopes equal to 50ps (*i.e.* in the middle of the slope range). As discussed in Section IV, in the model characterization flow, 4 pairs of setup/hold-time on the interdependency curve are identified in addition to the traditional independent setup time and hold time. Other methods also require the traditional setup time and hold time to find other points on the setup/hold-time interdependency curve. Therefore, the additional setup/hold-time pairs are considered as the computation overhead of the methods to characterize setup/hold-time interdependency. According to the results illustrated in Fig. 6, to achieve the same accuracy as the proposed model, ten interdependent setup/hold-time pairs are required by the piecewise linear approximation.



Fig. 6. The computation efficiency of the model compared to the linear approximation.

Therefore, the proposed model is computationally 2.5 (= 10/4) times more efficient than the piecewise linear approximation based methods. Furthermore, with the same number of setup-hold pairs (*i.e.* four pairs) the proposed model is 10X more accurate compared to the piecewise linear approximation based methods.
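The dependence of piecewise linear accuracy on the number of characterized points can be illustrated with a small sketch. It samples a curve of the form (11) at equally spaced setup skews, interpolates linearly between the samples, and reports the worst-case deviation. The curve parameters and the skew range are arbitrary illustrative values, equal spacing is an assumption of the sketch, and the output does not reproduce Fig. 6.

```python
import math
from typing import Callable


def hold_time(d_ss, d_st_l, d_ht_l, tau_ss, tau_hs):
    """Interdependent hold time from the first line of (11)."""
    return d_ht_l - tau_hs * math.log(1.0 - math.exp((d_st_l - d_ss) / tau_ss))


def max_pwl_error(curve: Callable[[float], float], lo: float, hi: float,
                  n_points: int, n_eval: int = 2001) -> float:
    """Worst-case error of linear interpolation through n_points samples."""
    xs = [lo + (hi - lo) * j / (n_points - 1) for j in range(n_points)]
    ys = [curve(x) for x in xs]
    worst = 0.0
    for e in range(n_eval):
        x = lo + (hi - lo) * e / (n_eval - 1)
        # Index of the segment that contains x (clamped to the last segment).
        j = min(int((x - lo) / (hi - lo) * (n_points - 1)), n_points - 2)
        t = (x - xs[j]) / (xs[j + 1] - xs[j])
        approx = ys[j] + t * (ys[j + 1] - ys[j])
        worst = max(worst, abs(approx - curve(x)))
    return worst


def example_curve(d_ss: float) -> float:
    """Synthetic interdependency curve (ps); parameters are placeholders."""
    return hold_time(d_ss, d_st_l=30.0, d_ht_l=20.0, tau_ss=10.0, tau_hs=20.0)


error_4_points = max_pwl_error(example_curve, 32.0, 80.0, n_points=4)
error_10_points = max_pwl_error(example_curve, 32.0, 80.0, n_points=10)
```

Because the curve is convex, adding sample points can only pull the interpolant closer to it, so `error_10_points` is smaller than `error_4_points`. An analytical model avoids this trade-off once its four parameters are known.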

### B. Optimism Reduction in STA

The static timing report contains the setup/hold-slacks of the corresponding flip-flops. The design tool performs optimization based on the slack information in STA, trying to make all the slacks positive to reach timing closure. The timing slacks at different flip-flops of an ARM Cortex M0 after place and route are illustrated in Fig. 7. As shown in Fig. 7, for most of the flip-flops, both setup slack and hold slack are close to zero, *i.e.* close to the $d_{ss}$ and $d_{hs}$ pair corresponding to the minimum capturing input pulse width. Considering the interdependency of setup/hold-time is therefore important in the timing analysis. To identify the critical flip-flops, the proposed model for setup/hold-time interdependency is characterized for each flip-flop, considering its input slopes, to obtain the actual setup/hold-time. Then, the realistic slacks are calculated. Note that the setup slacks are from the slow corner while the hold slacks are from the fast corner. Therefore, $d_{st,L}$ and $\tau_{ss}$ are extracted at the slow corner considering the slopes in the slow corner timing report, while $d_{ht,L}$ and $\tau_{hs}$ are extracted at the fast corner considering the slopes in the fast corner timing report. 18 flip-flops are optimistically reported to have no violation by the commercial design tools, while negative slacks should be reported for them. With the aid of the proposed model and the characterization flow, the accuracy of the STA results is enhanced.
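A minimal sketch of such a re-check is given below. It contrasts the independent constraint check with a hold slack computed against the hold time that (11) requires at the reported setup skew. Applying the first line of (11) for every $d_{ss} > d_{st,L}$ is an assumption of the sketch, as are the function names and the numbers (in ps).

```python
import math


def independent_ok(d_ss: float, d_hs: float, d_st_l: float, d_ht_l: float) -> bool:
    """Conventional check against independently characterized constraints."""
    return d_ss >= d_st_l and d_hs >= d_ht_l


def interdependent_hold_slack(d_ss: float, d_hs: float, d_st_l: float,
                              d_ht_l: float, tau_ss: float, tau_hs: float) -> float:
    """Hold slack against the hold time required by (11) at this setup skew.

    Returns -inf when the setup skew is at or below d_st_l, where no
    finite hold time satisfies the model.
    """
    if d_ss <= d_st_l:
        return -math.inf
    required = d_ht_l - tau_hs * math.log(1.0 - math.exp((d_st_l - d_ss) / tau_ss))
    return d_hs - required


# Illustrative flip-flop whose skews sit just above the independent limits.
params = dict(d_st_l=30.0, d_ht_l=20.0, tau_ss=10.0, tau_hs=20.0)
passes_independent = independent_ok(32.0, 22.0, 30.0, 20.0)
realistic_slack = interdependent_hold_slack(32.0, 22.0, **params)
```

In this example the independent check passes while `realistic_slack` is negative, which is the kind of optimistic report that the interdependent model is meant to expose.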

## VI. CONCLUSION

An analytical model for setup/hold-time interdependency is proposed in this paper. The proposed model is based on circuit level parameters for conventional master-slave flip-flops. The accuracy of the model is ~10X higher than the existing piecewise linear approximation based methods. A characterization flow is introduced to find the parameters of the model efficiently. The proposed model characterizes setup/hold-time interdependency with ~2.5X higher computation efficiency compared to the existing piecewise linear approximation based methods. Furthermore, by employing the proposed model to include setup/hold-time interdependency in STA, flip-flops which are reported to meet the timing requirements in conventional design flow are identified to have timing violations in an ARM microprocessor.



Fig. 7. Optimism reduction with the proposed model in the STA report. 18 over-optimistic flip-flops (labeled with asterisk) are identified with the proposed model in an ARM Cortex M0 design.

## ACKNOWLEDGMENT

This work was partially funded by projects CATRENE CT217 RESIST; ARTEMIS 621429 EMC2, 621353 DEWI, 621439 ALMARVI and ECSEL 692455 ENABLE-S3.

## REFERENCES

- [1] E. Salman, A. Dasdan, F. Taraporevala, K. Kucukcakar, and E. G. Friedman, "Pessimism reduction in static timing analysis using interdependent setup and hold times," in *Proceedings of the 7th IEEE/ACM International Symposium on Quality Electronic Design*, pp. 159–164, March 2006.
- [2] S. Srivastava and J. Roychowdhury, "Independent and interdependent latch setup/hold time characterization via Newton-Raphson solution and Euler curve tracking of state-transition equations," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, vol. 27, no. 5, pp. 817–830, May 2008.
- [3] A. B. Kahng and H. Lee, "Timing margin recovery with flexible flipflop timing model," in *Proceedings of the 15th IEEE/ACM International Symposium on Quality Electronic Design*, pp. 496–503, March 2014.
- [4] E. Salman and E. Friedman, "Utilizing interdependent timing constraints to enhance robustness in synchronous circuits," *Microelectronics Journal*, vol. 43, no. 2, pp. 119–127, February 2012.
- [5] N. Chen, B. Li, and U. Schlichtmann, "Iterative timing analysis based on nonlinear and interdependent flipflop modelling," *IET Circuits, Devices & Systems*, vol. 6, no. 5, pp. 330–337, September 2012.
- [6] H. Seo, J. Heo, and T. Kim, "Clock skew optimization for maximizing time margin by utilizing flexible flip-flop timing," in *Proceedings of the 16th IEEE/ACM International Symposium on Quality Electronic Design*, pp. 35–39, March 2015.
- [7] E. Salman, A. Dasdan, F. Taraporevala, K. Kucukcakar, and E. G. Friedman, "Exploiting setup/hold-time interdependence in static timing analysis," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, vol. 26, no. 6, pp. 1114–1125, June 2007.
- [8] S. Hatami, H. Abrishami, and M. Pedram, "Statistical timing analysis of flip-flops considering codependent setup and hold times," in *Proceedings of the 18th ACM Great Lakes Symposium on VLSI (GLSVLSI)*, pp. 101–106, May 2008.
- [9] T. Sakurai and A. R. Newton, "Alpha-power law MOSFET model and its applications to CMOS inverter delay and other formulas," *IEEE Journal of Solid-State Circuits*, vol. 25, no. 2, pp. 584–594, April 1990.