Page 481 - Maxwell House

APPROACH TO NUMERICAL SOLUTION OF EM PROBLEMS                           461



            SPEAG - SEMCAD X, Remcom - XFdtd support GPU-enabled acceleration. A GPU acceleration
            factor of 10^3 is not the limit; the achievable factor depends on the project complexity.
            Figure 9.2.4 (footnote 18) [14] demonstrates how capable FDTD can be. The image in
            Figure 9.2.4a shows the E-field distribution inside the car and near the driver’s head
            while he speaks on a cell phone. Figure 9.2.4b shows the same field around the phone itself
            and in the driver’s hand holding the phone. The computational model includes digital images
            of the car, the cell phone, and the driver’s head and hand.

                Figure 9.2.4 FDTD E-field simulation results: a) Car, driver and cell phone, b) Cell phone in hand

            The performance results depend on the kind of GPU used and are shown in Figure 9.2.5
            (footnote 18). The violet curve corresponds to 2xM2050 (Fermi), orange to C2050 (Fermi),
            dark pink to C1060 (Tesla), and green to Nehalem SW. We refer the reader to the original
            publication [14] for details. Note that the numerical models contained up to 90 million
            cells and were executed at speeds of up to 10^9 cells per second.
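
            As a rough sanity check on these figures, the sketch below converts a cell count and a
            throughput into wall-clock time. The 90 million cells and 10^9 cells per second come from
            the text; the number of time steps (10,000) is a purely illustrative assumption, as are the
            function and variable names. It also assumes that throughput counts cell updates, with
            every cell updated once per time step.

```python
def seconds_per_step(num_cells: float, cells_per_second: float) -> float:
    """Wall-clock time of one FDTD time step, assuming one update per cell per step."""
    return num_cells / cells_per_second


def total_runtime(num_cells: float, cells_per_second: float, num_steps: int) -> float:
    """Wall-clock time of a whole run with num_steps time steps."""
    return seconds_per_step(num_cells, cells_per_second) * num_steps


NUM_CELLS = 90e6      # from the text: models of up to 90 million cells
THROUGHPUT = 1e9      # from the text: up to 10^9 cells per second
NUM_STEPS = 10_000    # assumption for illustration only, not from the source

step_time = seconds_per_step(NUM_CELLS, THROUGHPUT)
run_time = total_runtime(NUM_CELLS, THROUGHPUT, NUM_STEPS)
print(f"{step_time:.3f} s per step, {run_time / 60:.1f} min for {NUM_STEPS} steps")
```

            Under these assumptions one time step over the largest model takes about 0.09 s, so a run
            of 10,000 steps would finish in roughly 15 minutes.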

                Figure 9.2.5 FDTD performance vs. model sizes

            Now it is time to look more closely at the memory hierarchy and how 3D solvers use it.
            Modern computer processors have multiple levels of cache memory, each level commonly having
            a different size and speed. The first level of the cache has the smallest capacity but the
            highest data access speed, and it is often located on the same chip as the processor. Each
            subsequent level is slower, being farther from the processor, and (generally) larger. The
            fundamental purpose of cache memory is to keep frequently used program instructions and
            data close to the processor. Because cache is built from Static Random Access Memory (SRAM)
            (footnote 19), this allows a many-fold speed-up over fetching from the slower main memory.
            We did not find information about cache usage by any of the simulation packages mentioned
            above except IMST - Empire XCcel. According to an IMST GmbH report (November 30, 2016)
            “… a new speed record has been obtained with EMPIRE XPU. On a quad Intel Xeon server, a
            simulation performance of up to 21000 Mcells/s

            18  Public Domain Image, source [14]. Permission to reproduce from Prof. M. Okoniewski.

            19  SRAM is used for computer cache memory and provides faster access to data.
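
            To connect the memory hierarchy on this page to 3D solvers: a 3D grid is stored as one long
            1D array, so only neighbors along the fastest-varying axis sit next to each other in memory.
            The sketch below assumes a row-major (C-order) layout, 4-byte single-precision field values,
            and a 64-byte cache line. These are illustrative assumptions, not details of any package
            named in the text, and the function names and the 450 x 450 x 450 grid are hypothetical.

```python
CACHE_LINE_BYTES = 64   # assumption: a common cache line size
BYTES_PER_VALUE = 4     # assumption: one single-precision field component per cell


def linear_index(i: int, j: int, k: int, ny: int, nz: int) -> int:
    """Row-major index of cell (i, j, k) in an nx*ny*nz grid; k varies fastest."""
    return (i * ny + j) * nz + k


def neighbor_distance_bytes(ny: int, nz: int) -> dict:
    """Byte distance between a cell and its +1 neighbor along each axis."""
    base = linear_index(1, 1, 1, ny, nz)
    return {
        "x": (linear_index(2, 1, 1, ny, nz) - base) * BYTES_PER_VALUE,
        "y": (linear_index(1, 2, 1, ny, nz) - base) * BYTES_PER_VALUE,
        "z": (linear_index(1, 1, 2, ny, nz) - base) * BYTES_PER_VALUE,
    }


# Hypothetical grid of 450 x 450 x 450 cells (about 91 million cells).
distances = neighbor_distance_bytes(ny=450, nz=450)
for axis, dist in distances.items():
    can_share = dist < CACHE_LINE_BYTES
    print(f"{axis}-neighbor: {dist} bytes away, can share a cache line: {can_share}")
```

            With this layout only the z-neighbor is close enough to share a cache line with the cell
            being updated. The y- and x-neighbors lie 1,800 and 810,000 bytes away, so the order in
            which a stencil code such as FDTD walks the grid strongly affects how often it is served
            from cache rather than from main memory.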