Hints, Tips and Solutions – February 2007

Q. How can I significantly reduce circuit parasitic netlist extraction time?

A. SILVACO has recently released a new suite of parasitic extraction tools to meet the demands of state-of-the-art designers at the cell, circuit and chip levels. Having demonstrated the accuracy of these tools [1] [2] [3], SILVACO is now focusing on reducing simulation time by taking advantage of multi-CPU computing architectures.

The results presented here were obtained with STELLAR, a 3D field solver with a full-chip capacitance extractor. The software uses an advanced numerical method, the so-called fictitious domain method, which is based on the decomposition of the simulation domain into sub-domains. The parallel version of STELLAR accepts a command line option -P n, which runs the m sub-domain simulations in parallel on n CPUs (n being the number of requested CPUs). The simulations were performed by STMicroelectronics, Crolles, France, on a 16-CPU Sun-Fire-V890 running SunOS 5.8. The layout used for this study had the following characteristics: 106 × 230 µm² area, 7 metal layers and 6 via layers. In the following text and figures, the CPU time is the total on-CPU time as measured by a UNIX ps or top command, while the Wallclock time is the real-world elapsed time, as measured by a watch.
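The distinction between CPU time and Wallclock time can be illustrated with a small model. The sketch below is not STELLAR's scheduler: it assumes, purely for illustration, that each sub-domain is handed to whichever CPU becomes free first, and the function name schedule and the per-sub-domain costs are hypothetical.

```python
import heapq

def schedule(costs, n):
    """Model m sub-domain runs on n CPUs (illustrative assumption:
    each sub-domain goes to the CPU that becomes free first).

    Returns (cpu_time, wallclock):
      cpu_time  = sum of all on-CPU time, independent of n
      wallclock = finish time of the busiest CPU
    """
    finish = [0.0] * n              # per-CPU finish times, kept as a min-heap
    heapq.heapify(finish)
    for cost in costs:
        earliest = heapq.heappop(finish)        # CPU that frees up first
        heapq.heappush(finish, earliest + cost)
    return sum(costs), max(finish)

if __name__ == "__main__":
    # Hypothetical workload: 90 sub-domains of equal cost (arbitrary units).
    costs = [1.0] * 90
    for n in (1, 4, 8, 12):
        cpu_time, wallclock = schedule(costs, n)
        print(f"n={n:2d}  CPU time={cpu_time:5.1f}  Wallclock={wallclock:5.1f}")
```

In this model the CPU time stays constant as n grows, while the Wallclock time shrinks. A real run would also include serial setup and communication overhead, which the model ignores.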

The decomposition algorithm is sufficiently robust that the capacitance varies very little (3%) with the number of sub-domains, which is set by the decomposition step d (Figure 1). High values of d correspond to a low number of sub-domains. It is also clear from Figure 1 that the CPU time increases with d. Consequently, d was set to 1.91, corresponding to 90 sub-domains. It has been verified that, for a given decomposition, the capacitance does not vary with the number of requested CPUs.

As can be seen in Figure 2, the parallelization is very good (near the theoretical limit) for a number of requested CPUs around 8. Running on 12 CPUs gives a speedup of a factor of 9. The layout used for this benchmark was relatively small, so the benefit of parallelizing over a larger number of CPUs may be even greater for larger structures. This result is achieved easily (no preliminary optimization runs) if the decomposition step d is chosen such that a large number of sub-domains is obtained (90 in this case). In other words, the parallelization is fully exploited only if n is significantly lower than m.
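The requirement that n be significantly lower than m can be made concrete with a simple bound. The sketch below assumes, as a simplification not stated in the benchmark, that all m sub-domains cost the same and that there is no serial overhead, so the busiest CPU processes ceil(m / n) sub-domains. The function names are hypothetical.

```python
import math

def ideal_speedup(m, n):
    """Upper bound on speedup for m equal-cost sub-domains on n CPUs.

    The busiest CPU handles ceil(m / n) sub-domains, so the Wallclock
    time cannot drop below that many sub-domain runs.
    """
    return m / math.ceil(m / n)

def efficiency(speedup, n):
    """Fraction of the ideal n-fold speedup actually obtained."""
    return speedup / n

if __name__ == "__main__":
    m = 90  # number of sub-domains used in the benchmark
    for n in (8, 12, 16, 60):
        bound = ideal_speedup(m, n)
        print(f"n={n:2d}  bound={bound:6.2f}  bound/n={efficiency(bound, n):.2f}")

    # Reported result: a factor of 9 on 12 CPUs.
    print(f"measured efficiency on 12 CPUs: {efficiency(9.0, 12):.2f}")
```

Under these assumptions the bound stays close to n while n is much smaller than m, but with n = 60 it falls to 45, because most CPUs sit idle during the second round of sub-domains.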