Performance Evaluation of a New Hybrid MPI-thread Parallelized Direct Solver

Background

The solution of linear systems lies at the core of any TCAD simulation: at every nonlinear step of the computation, a linear system must be solved. The size and condition number of the matrices in these systems vary significantly with the type of TCAD simulation. To achieve fast convergence, the linear solver must therefore be fast and accurate and must handle ill-conditioned matrices; ideally, it should also perform well on linear systems of any size.
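To illustrate why every nonlinear step involves a linear solve, here is a minimal sketch that assumes a Newton-type iteration. This is a common choice for nonlinear solves and is used here only as an example. Each step solves J(x) Δx = -F(x) for the update Δx. The 2×2 test system and the helper names `solve_2x2` and `newton` are hypothetical. In a real TCAD run, the 2×2 solve would be replaced by a large sparse linear system.

```python
def solve_2x2(a, b):
    """Solve the 2x2 linear system a x = b by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    if det == 0.0:
        raise ZeroDivisionError("singular Jacobian")
    x0 = (b[0] * a[1][1] - a[0][1] * b[1]) / det
    x1 = (a[0][0] * b[1] - b[0] * a[1][0]) / det
    return [x0, x1]


def newton(f, jac, x, tol=1e-12, max_iter=50):
    """Newton's method (assumed nonlinear solver): each step solves J(x) dx = -F(x)."""
    for it in range(1, max_iter + 1):
        fx = f(x)
        dx = solve_2x2(jac(x), [-fx[0], -fx[1]])  # the linear solve in every step
        x = [x[0] + dx[0], x[1] + dx[1]]
        if max(abs(dx[0]), abs(dx[1])) < tol:
            return x, it
    raise RuntimeError("Newton iteration did not converge")


# Hypothetical test problem: x^2 + y^2 = 4 and x*y = 1.
def f(v):
    x, y = v
    return [x * x + y * y - 4.0, x * y - 1.0]


def jac(v):
    x, y = v
    return [[2.0 * x, 2.0 * y], [y, x]]


root, steps = newton(f, jac, [2.0, 0.5])
print(root, steps)
```

The number of linear solves equals the number of Newton steps. The cost and robustness of the linear solver therefore directly determine the cost and robustness of the whole nonlinear solve.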

The two main classes of linear solvers are direct and iterative solvers, and each has its own strengths and weaknesses. Direct solvers are very accurate. For large problems, such as 3D simulations, however, they can require large amounts of memory, and their performance on such problems is usually poor. Iterative solvers, on the other hand, are less accurate than direct solvers because they stop once the residual falls below a chosen tolerance, and they can diverge for linear systems with ill-conditioned matrices. In exchange, they perform very well and are designed to handle large problems.
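The trade-off above can be sketched on toy systems. Here, dense Gaussian elimination with partial pivoting stands in for a direct solver, and Jacobi iteration stands in for an iterative solver. Both are illustrative choices and are not the solvers discussed in this document. The direct solve returns the solution to rounding accuracy on both systems. The iterative solve stops at a residual tolerance on the diagonally dominant matrix and diverges on the second matrix. Note that Jacobi diverges there because the matrix lacks diagonal dominance (the iteration matrix has spectral radius √6 > 1). This is related to, but not the same as, ill-conditioning.

```python
def gauss_solve(a, b):
    """Direct solve: Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented copy of [a | b]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(m[i][k]))  # pivot row
        m[k], m[p] = m[p], m[k]
        for i in range(k + 1, n):
            factor = m[i][k] / m[k][k]
            for j in range(k, n + 1):
                m[i][j] -= factor * m[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def jacobi_solve(a, b, tol=1e-8, max_iter=200):
    """Iterative solve: Jacobi iteration, stopped once the residual < tol."""
    n = len(b)
    x = [0.0] * n
    for it in range(1, max_iter + 1):
        # The comprehension reads the previous x only, as Jacobi requires.
        x = [(b[i] - sum(a[i][j] * x[j] for j in range(n) if j != i)) / a[i][i]
             for i in range(n)]
        res = max(abs(sum(a[i][j] * x[j] for j in range(n)) - b[i]) for i in range(n))
        if res < tol:
            return x, it, True
        if res > 1e12:  # residual exploding: the iteration diverges
            return x, it, False
    return x, max_iter, False


a_good = [[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]]
b_good = [5.0, 6.0, 5.0]          # exact solution [1, 1, 1]
a_bad = [[1.0, 2.0], [3.0, 1.0]]  # not diagonally dominant
b_bad = [3.0, 4.0]                # exact solution [1, 1]

print(gauss_solve(a_good, b_good), jacobi_solve(a_good, b_good))
print(gauss_solve(a_bad, b_bad), jacobi_solve(a_bad, b_bad))
```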

The question, then, is whether a direct solver can handle large 3D problems without excessive memory requirements while delivering performance similar to that of an iterative solver. The answer is yes, provided that the large problem is split into smaller ones and the direct solver is designed with several levels of parallelism.
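This section does not say how the new solver splits the problem. One common way to break a large sparse system into smaller pieces for a parallel direct solver is substructuring, that is, domain decomposition with a Schur complement. The sketch below assumes that approach and is not necessarily the method used here. The unknowns are split into subdomains that are coupled only through a small interface (separator). Each subdomain system A_ii is handled independently, which is where parallelism, e.g. one MPI process per subdomain, could be applied. Only the small interface system S x_Γ = g couples them. The helper names (`solve`, `block`, `schur_solve`) are hypothetical. For brevity, `solve` refactors a dense matrix on every call, whereas a real solver would factor each A_ii once with a sparse LU.

```python
def solve(a, b):
    """Dense Gaussian elimination with partial pivoting (stand-in for sparse LU)."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(m[i][k]))
        m[k], m[p] = m[p], m[k]
        for i in range(k + 1, n):
            f = m[i][k] / m[k][k]
            for j in range(k, n + 1):
                m[i][j] -= f * m[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def block(a, rows, cols):
    """Extract the submatrix a[rows, cols]."""
    return [[a[r][c] for c in cols] for r in rows]


def schur_solve(a, b, subdomains, interface):
    """Solve a x = b by substructuring (assumes no direct subdomain-subdomain coupling)."""
    ni = len(interface)
    s = block(a, interface, interface)  # S starts as A_gg
    g = [b[r] for r in interface]       # g starts as b_g
    local = []
    # Independent per subdomain, apart from the reduction into s and g.
    for dom in subdomains:
        a_ii = block(a, dom, dom)
        a_ig = block(a, dom, interface)
        a_gi = block(a, interface, dom)
        y = solve(a_ii, [b[r] for r in dom])  # A_ii^{-1} b_i
        # A_ii^{-1} A_ig, computed one interface column at a time
        z_cols = [solve(a_ii, [row[c] for row in a_ig]) for c in range(ni)]
        for p in range(ni):
            g[p] -= sum(a_gi[p][q] * y[q] for q in range(len(dom)))
            for c in range(ni):
                s[p][c] -= sum(a_gi[p][q] * z_cols[c][q] for q in range(len(dom)))
        local.append((dom, a_ii, a_ig))
    x_g = solve(s, g)  # small interface (Schur complement) system
    x = [0.0] * len(b)
    for r, v in zip(interface, x_g):
        x[r] = v
    # Recover the interior unknowns, again independently per subdomain.
    for dom, a_ii, a_ig in local:
        rhs = [b[r] - sum(a_ig[q][c] * x_g[c] for c in range(ni))
               for q, r in enumerate(dom)]
        for r, v in zip(dom, solve(a_ii, rhs)):
            x[r] = v
    return x


# 1D Laplacian-like tridiagonal matrix with 7 unknowns; node 3 separates
# subdomain {0, 1, 2} from subdomain {4, 5, 6}.
n = 7
a = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
b = [1.0] * n
x = schur_solve(a, b, subdomains=[[0, 1, 2], [4, 5, 6]], interface=[3])
print(x)
```

Each subdomain system is much smaller than the full matrix, so its factors need far less memory. In a 3D problem, the interface system is usually much smaller than the full system, though it is typically dense. This is one reason such solvers add further levels of parallelism, for example threads inside each subdomain factorization.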