
APPROACH TO NUMERICAL SOLUTION OF EM PROBLEMS                           459



            If so, we can apply the LEGO decomposition, cutting the computation domain into
            subdomains as Figure 9.2.2 illustrates and sending each subdomain to an independent
            processor for a numerical update. As soon as all processors finish their tasks, they
            transfer their field data to the core processor, which reconciles and updates the
            received E- and H-fields through the boundary conditions on the adjacent edges or nodes.

            Figure 9.2.2 Cube subdivided into smaller subdomains
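
The decomposition described above can be sketched in a few lines. This is only a toy illustration, not the book's algorithm: a 1D grid stands in for the cube, a three-point average stands in for the field update, and Python threads stand in for the independent processors (in CPython they mainly illustrate the structure rather than give a real speedup). The names `smooth` and `decomposed_smooth` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def smooth(field):
    """Toy 'numerical update': 3-point average of the interior nodes.

    The two end nodes are held fixed (a stand-in for boundary conditions).
    """
    out = list(field)
    for i in range(1, len(field) - 1):
        out[i] = (field[i - 1] + field[i] + field[i + 1]) / 3.0
    return out


def decomposed_smooth(field, n_sub=4):
    """Split the grid into n_sub subdomains, update each independently,
    then let a 'core' step stitch the pieces back together."""
    n = len(field)
    bounds = [k * n // n_sub for k in range(n_sub + 1)]  # subdomain edges

    def work(k):
        lo, hi = bounds[k], bounds[k + 1]
        # Add one ghost node on each interior side so the stencil sees
        # the neighbouring subdomain's boundary value.
        glo, ghi = max(lo - 1, 0), min(hi + 1, n)
        local = smooth(field[glo:ghi])
        # Drop the ghost nodes and return only the nodes this worker owns.
        return lo, hi, local[lo - glo: len(local) - (ghi - hi)]

    with ThreadPoolExecutor(max_workers=n_sub) as pool:
        pieces = list(pool.map(work, range(n_sub)))

    result = [0.0] * n
    for lo, hi, owned in pieces:  # the core step reassembles the field
        result[lo:hi] = owned
    return result


field = [float(i * i % 7) for i in range(32)]
same = decomposed_smooth(field) == smooth(field)  # decomposed == global update
```

The ghost nodes play the role of the shared boundary data: each worker needs its neighbours' edge values before it can update its own nodes, which is why the pieces must be reconciled after every step.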

            9.2.3   Parallel Processing

            Parallel processing is the equivalent of LEGO decomposition when a numerical algorithm
            can be fragmented. Each of the independent parts is then executed simultaneously on
            different processors attached to the same computer or to multiple computers connected by
            a network. Such a network of computers working in parallel forms one HPC with hundreds
            or thousands of processors and shared memory. The computers can be independent, forming
            a distributed system that can run multiple tasks simultaneously, or they can form a
            single HPC with multiple processors solving a single problem. It makes no real
            difference for the user because the system appears as a single computer and interface
            in both cases. The following material is just a very brief synopsis. More details can
            be found in the specialized literature [3, 10]. Keep in mind that this area is extremely
            dynamic and changes very quickly.

            Figure 9.2.2a Block-diagram of SIMD system (blocks: Main memory, Control unit,
            Data processors 1 to n, Local memories 1 to n)

            The block diagram of a computer system with a Single Instruction stream and Multiple
            Data streams (SIMD), typical for a single HPC, is illustrated schematically in
            Figure 9.2.2a¹⁷. In this system,
            the initial data and the instructions on how to execute the program come from the common
            control unit to each data processor. The results of each data processing step go back
            and forth between local and main memory until the simulation is finished. This exchange
            is managed by the control unit. Like any centralized structure, SIMD has several
            drawbacks: not all algorithms can be parallelized easily, it is difficult to synchronize
            and optimize the data transfer between processors and main memory, etc. The best results
            are reached if the execution time of the programs in all processors is the same or very
            close. Only then do the processors not delay one another [10].
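
The effect of unequal execution times can be shown with a small calculation. The sketch below assumes a simple model that is not taken from the source: every processor must wait for the slowest one before the next step starts, so the wall-clock time of a step is the maximum of the individual times. The function name `parallel_efficiency` and the sample timings are hypothetical.

```python
def parallel_efficiency(task_times):
    """Fraction of the total processor time spent doing useful work when
    all processors wait for the slowest one before the next step."""
    wall = max(task_times)   # the step ends when the slowest task finishes
    busy = sum(task_times)   # useful work summed over all processors
    return busy / (len(task_times) * wall)


balanced = parallel_efficiency([1.0, 1.0, 1.0, 1.0])  # 1.0
skewed = parallel_efficiency([1.0, 1.0, 1.0, 4.0])    # 0.4375
```

With equal task times no processor idles. When one task takes four times longer, the other three processors sit idle for most of the step and the efficiency under this model drops below one half.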





            17  Public Domain Image, source: https://edux.pjwstk.edu.pl/mat/264/lec/index121.html. We used some
            text information from this website too.