Page 481 - Maxwell House

APPROACH TO NUMERICAL SOLUTION OF EM PROBLEMS 461

SPEAG - SEMCAD X, Remcom - XFdtd support GPU-enabled acceleration. The GPU acceleration factor of 10³ is not the limit; the achievable speed-up depends on the complexity of the project. Figure 9.2.4¹⁸ [14] demonstrates what FDTD is capable of. Figure 9.2.4a shows the E-field distribution inside a car and near the driver's head while he talks on a cell phone. Figure 9.2.4b shows the same field around the phone itself and in the driver's hand holding it. The computational model includes digital representations of the car, the cell phone, and the driver's head and hand.
Figure 9.2.4 FDTD E-field simulation results: a) Car, driver and cell phone; b) Cell phone in hand

The performance results depend on the type of GPU used and are shown in Figure 9.2.5¹⁸. The violet curve corresponds to 2×M2050 (Fermi), orange to C2050 (Fermi), dark pink to C1060 (Tesla), and green to Nehalem SW. We refer the reader to the original publication [14] for details. Note that the numerical models had up to 90 million cells and were executed at speeds of up to 10⁹ cells per second.
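These throughput figures translate directly into run time: one time step takes the number of cells divided by the throughput, and 10⁹ cells/s equals 1000 Mcells/s. A minimal sketch of that arithmetic follows; the function names are hypothetical, and the 10,000-step run length is an assumption for illustration, not a value from [14].

```python
def seconds_per_step(n_cells: float, cells_per_second: float) -> float:
    """Wall-clock time of one FDTD time step at a given throughput."""
    return n_cells / cells_per_second


def run_time_seconds(n_cells: float, n_steps: int, cells_per_second: float) -> float:
    """Total time for n_steps time steps, ignoring setup and I/O overhead."""
    return n_cells * n_steps / cells_per_second


def to_mcells_per_second(cells_per_second: float) -> float:
    """Convert cells/s to the Mcells/s unit used in solver benchmarks."""
    return cells_per_second / 1e6


if __name__ == "__main__":
    cells = 90e6    # largest model size quoted in the text
    speed = 1e9     # peak throughput quoted in the text, cells/s
    steps = 10_000  # hypothetical number of time steps (assumption)
    print(seconds_per_step(cells, speed))         # 0.09
    print(run_time_seconds(cells, steps, speed))  # 900.0
    print(to_mcells_per_second(speed))            # 1000.0
```

At 10⁹ cells/s, one step of a 90-million-cell model takes about 0.09 s, so the assumed 10,000-step run would finish in roughly 15 minutes of pure update time.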
Figure 9.2.5 FDTD performance vs. model size

Now let us look more closely at the memory hierarchy and how 3D solvers use it. Modern computer processors have multiple levels of cache memory, each level commonly differing in size and speed. The first-level cache has the smallest capacity but the highest data access speed, and it is typically located on the same chip as the processor. Each subsequent level is slower, being farther from the processor, and (generally) larger. The fundamental purpose of cache memory is to keep frequently used program instructions and data close to the processor. This allows a many-fold speed-up over fetching them from main memory, because cache is built from Static Random Access Memory (SRAM)¹⁹, which is much faster than the dynamic RAM (DRAM) used for main memory.

We did not find information about cache usage in any of the simulation packages mentioned above except IMST - Empire XCcel. According to an IMST GmbH report (November 30, 2016), "… a new speed record has been obtained with EMPIRE XPU. On a quad Intel Xeon server, a simulation performance of up to 21000 Mcells/s

18 Public domain image, source [14]. Reproduced with permission from Prof. M. Okoniewski.

19 SRAM is used for computer cache memory and provides faster access to data.
