Page 480 - Maxwell House

460                                                                Chapter 9



        The more efficient approach is the Multiple Instruction stream, Multiple Data stream (MIMD)
        architecture shown schematically in Figure 9.2.2b¹⁷. Such execution corresponds to program
        parallelization by code decomposition among parallel processors, ideally chosen so that the
        decomposed parts of the program code take approximately equal execution time. A modern HPC
        system is almost always a cluster of MIMD machines, each of which implements SIMD instructions.

             Figure 9.2.2b Block-diagram of MIMD system (main memory connected to control unit 1 /
             data processor 1 through control unit n / data processor n)
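
        The idea of decomposing work into parts of approximately equal execution time can be sketched
        as follows. This is only an illustrative sketch, not a method taken from the text: it assumes
        that every work item (for example, one grid cell) costs the same time, so equal counts imply
        equal time. The function name `split_evenly` and the numbers used are hypothetical.

```python
def split_evenly(n_items, n_workers):
    """Return (start, stop) index ranges, one per worker.

    Assumption (illustrative): every item costs the same time, so ranges
    whose sizes differ by at most one item are considered balanced.
    """
    base, extra = divmod(n_items, n_workers)
    ranges = []
    start = 0
    for worker in range(n_workers):
        # The first `extra` workers take one additional item each.
        size = base + (1 if worker < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


# Hypothetical example: 1000 grid cells shared among 3 processors.
parts = split_evenly(1000, 3)
print(parts)  # [(0, 334), (334, 667), (667, 1000)]
```

        When the cost per item is not uniform (for instance, cells inside absorbing boundaries), a
        real solver would need a weighted partition instead of this equal-count one.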
        While the concept of parallel computation is simple, in practice it requires specialized and
        highly sophisticated software and algorithms engineered for this purpose. Fortunately, many
        electrodynamics algorithms, including FDTD, can be carried out as a parallel computation.
        Figure 9.2.3¹⁸ gives some guidance, illustrating FDTD performance vs. memory requirement on
        different platforms [11, 13].
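
        Why FDTD lends itself to parallel computation can be seen in a minimal one-dimensional sketch.
        Within one time step, each new field value depends only on old values at neighbouring cells,
        so the grid can be cut into sub-ranges that are updated independently. Everything below is an
        assumption made for illustration: normalized units, the Courant number 0.5, fixed (zero) end
        points, and all function names. Python threads are used only to show that the chunks are
        independent; because of the interpreter lock they are not expected to give a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor

COURANT = 0.5  # assumed normalized Courant number (illustrative value)


def update_h(ez, hy, start, stop):
    # New Hy at cell i needs only the old Ez at cells i and i + 1.
    return [hy[i] + COURANT * (ez[i + 1] - ez[i]) for i in range(start, stop)]


def update_e(ez, hy, start, stop):
    # New Ez at cell i needs only the freshly updated Hy at cells i - 1 and i.
    return [ez[i] + COURANT * (hy[i] - hy[i - 1]) for i in range(start, stop)]


def chunks(start, stop, n_parts):
    # Split [start, stop) into n_parts ranges of nearly equal size.
    base, extra = divmod(stop - start, n_parts)
    out = []
    for k in range(n_parts):
        size = base + (1 if k < extra else 0)
        out.append((start, start + size))
        start += size
    return out


def step_serial(ez, hy):
    n = len(ez)
    hy = update_h(ez, hy, 0, n - 1)
    # End points of Ez are kept fixed (simple reflecting boundary).
    ez = [ez[0]] + update_e(ez, hy, 1, n - 1) + [ez[-1]]
    return ez, hy


def step_parallel(ez, hy, pool, n_parts):
    n = len(ez)
    # Each chunk only reads the shared old arrays and returns its own slice.
    h_parts = pool.map(lambda r: update_h(ez, hy, *r), chunks(0, n - 1, n_parts))
    hy = [v for part in h_parts for v in part]
    e_parts = pool.map(lambda r: update_e(ez, hy, *r), chunks(1, n - 1, n_parts))
    ez = [ez[0]] + [v for part in e_parts for v in part] + [ez[-1]]
    return ez, hy


def run(n_cells=200, n_steps=100, n_parts=4):
    # Start from a single-cell pulse in the middle of the grid.
    ez_s = [0.0] * n_cells
    ez_s[n_cells // 2] = 1.0
    hy_s = [0.0] * (n_cells - 1)
    ez_p, hy_p = list(ez_s), list(hy_s)
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        for _ in range(n_steps):
            ez_s, hy_s = step_serial(ez_s, hy_s)
            ez_p, hy_p = step_parallel(ez_p, hy_p, pool, n_parts)
    # True when the decomposed update reproduces the serial one exactly.
    return ez_s == ez_p and hy_s == hy_p


print(run())
```

        The two-phase structure (all H updates, then all E updates) is what makes the chunks safe to
        run concurrently: no chunk writes data that another chunk reads during the same phase.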

            Figure 9.2.3 FDTD performance vs. memory requirement

        9.2.4 GPU and Cache Acceleration

        What should you do if you do not have access to an HPC system? Do not panic. Modern
        workstations and even laptops include multiple processing cores and Graphics Processing
        Units (GPUs). If there are no high-performance GPUs in your computer, replace the old ones
        with more modern ones or simply add extra ones in the free expansion slots. The numerous
        cores of a GPU can handle thousands of threads, a thread being the smallest sequence of
        programmed instructions that can be managed independently. Threads are a way for a program to
        divide itself into two or more simultaneously (or pseudo-simultaneously) running tasks. A
        typical GPU architecture consists of thousands of small cores designed for handling multiple
        tasks in parallel. That is exactly what we need to realize fast parallel processing.
        Therefore, an EM solver that takes advantage of GPUs should be your first choice. Currently,
        all the leading 3D EM modeling and simulation packages such as CST STUDIO SUITE®, CST
        MICROWAVE STUDIO®, IMST - Empire XCcel, Altair - FEKO, ANSYS - HFSS, Acceleware - AxFDTD™,


        18  Public Domain Image, source:
        www.lumerical.com/support/whitepaper/parallel_processing_fdtd_solutions.html