FAQ
Is there any build step parallelization?
No, nengo_mpi only parallelizes the simulation step. The build step is where all the really difficult work happens (for instance, it is what makes an Ensemble act like an Ensemble), so nengo_mpi simply reuses vanilla nengo’s builder, which runs serially in Python.
During an invocation such as:
mpirun -np 8 python -m nengo_mpi nengo_script.py
the build step is performed entirely by the process with index 0.
It is definitely possible to create a parallelized version of the builder. However, such a builder should probably use a more Python-friendly, platform-agnostic technology than MPI (something like ZeroMQ). In other words, that’s another project.
What is the difference between a cluster, a component, a partition, a chunk, a process, a processor, and a node? I’ve seen all these words used in the code with apparently similar meanings.
All these terms do in fact have precise meanings in the context of nengo_mpi. They divide neatly into terms that apply at build time and terms that apply at simulation time.
Build Time
- A cluster (distinct from a cluster of machines in high-performance computing) is a group of nengo objects that must be simulated together, for any of a number of reasons (see the class NengoObjectCluster in partition/base.py). The most prominent reason is that there is a path of Connections between two objects in which no Connection has a synapse (synapses are the main source of “update” operators; see How It Works). Another common reason is that two objects are connected by a Connection that has a learning rule. The partitioning step applies a partitioning function to a graph whose nodes are clusters.
- A component (as in a component of a partition) is a group of clusters that will be simulated together. Components are computed by the partitioning step. When creating an instance of nengo_mpi.Simulator, we typically specify the number of components that we want the network to be divided into. When nengo_mpi saves a network to file for communication with the C++ code, each component is stored separately.
- A partition is a collection of components. The goal of the partitioning step is to create a partition of the set of clusters, in the sense used here. High-quality partitions are those that do not assign drastically different amounts of work to different components and that minimize the amount of communication between components.
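To make the build-time terms concrete, here is a minimal Python sketch of forming clusters with a union-find structure and then scoring a candidate partition of those clusters into components. It is not nengo_mpi's implementation (that lives around NengoObjectCluster in partition/base.py). Connections are modeled as hypothetical (pre, post, synapse, has_learning_rule) tuples, and the per-cluster work estimates and per-edge traffic counts are made-up inputs; the helper names find_clusters and partition_quality are illustrative only.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set forest used to merge objects into clusters."""

    def __init__(self, items):
        self.parent = {item: item for item in items}

    def find(self, item):
        # Path halving: point each visited node at its grandparent.
        while self.parent[item] != item:
            self.parent[item] = self.parent[self.parent[item]]
            item = self.parent[item]
        return item

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def find_clusters(objects, connections):
    """Group objects that must be simulated together.

    connections: hypothetical (pre, post, synapse, has_learning_rule) tuples.
    """
    uf = UnionFind(objects)
    for pre, post, synapse, has_learning_rule in connections:
        # A synapse-less connection (or a learning rule) ties pre and post
        # together; merging is transitive, so whole synapse-less paths merge.
        if synapse is None or has_learning_rule:
            uf.union(pre, post)
    groups = defaultdict(list)
    for obj in objects:
        groups[uf.find(obj)].append(obj)
    return sorted(sorted(group) for group in groups.values())


def partition_quality(assignment, work, edges):
    """Score a partition of clusters into components.

    assignment: cluster -> component index
    work: cluster -> estimated cost (hypothetical units)
    edges: (pre_cluster, post_cluster, values_sent_per_step) tuples
    Returns (imbalance, traffic): imbalance is max load / mean load
    (1.0 means perfectly balanced); traffic sums values crossing components.
    """
    loads = defaultdict(float)
    for cluster, component in assignment.items():
        loads[component] += work[cluster]
    imbalance = max(loads.values()) / (sum(loads.values()) / len(loads))
    traffic = sum(n for pre, post, n in edges
                  if assignment[pre] != assignment[post])
    return imbalance, traffic
```

For example, with objects a to e where a→b has no synapse and c→d has a learning rule, find_clusters yields the clusters [a, b], [c, d], and [e]. A good partitioner would then prefer assignments of those clusters whose imbalance is close to 1.0 and whose traffic is small.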
Simulation Time
- A process is, of course, an OS abstraction for a line of computation, while a processor is a physical computation device. Processes run on processors. It is generally possible to run a nengo_mpi simulation using more processes than there are processors available on the machine; however, the amount of parallelization we can obtain is determined by the number of physical processors (though hyperthreading can increase the effective number of processors). The number of processes used to run a simulation is specified by the -np <NP> command-line argument when calling mpirun.
- A chunk (see chunk.hpp) is the C++ code’s abstraction for a collection of nengo objects (actually, the signals and operators corresponding to those objects) that are being simulated by a single process. There is a one-to-one relationship between chunks and processes. One of the first things that each process does is create a chunk.
- The relationship between chunks/processes and components is as follows. At build time, the network is divided into some specified number of components by partitioning. At simulation time, some specified number of chunks/processes will be active. Components are assigned to chunks/processes in a round-robin fashion. For example, if there are 4 chunks/processes active and the network to simulate has 7 components, then process 0 simulates components 0 and 4, process 1 simulates components 1 and 5, and so on. If the network instead had only 3 components, then process 3 would be left without anything to simulate, which is perfectly OK.
- In the world of high-performance computing, a node (distinct from a nengo Node) is a physical computer consisting of some number of processors. On the General Purpose Cluster (GPC) there are 8 processors per node, and on Blue Gene/Q (BGQ) there are 16 (these become 16 for GPC and 64 for BGQ once hyperthreading is taken into account). When running on one of these high-performance clusters, jobs are assigned computational resources in units of nodes rather than processors.
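The round-robin rule above, together with the arithmetic of requesting whole nodes, fits in a few lines of Python. This is a sketch, not code from nengo_mpi: assign_components and nodes_needed are hypothetical helpers, and the node calculation assumes you want one process per (possibly hyperthreaded) processor.

```python
def assign_components(n_components, n_processes):
    """Round-robin: component i is simulated by process i % n_processes."""
    assignment = {rank: [] for rank in range(n_processes)}
    for component in range(n_components):
        assignment[component % n_processes].append(component)
    return assignment


def nodes_needed(n_processes, processors_per_node):
    """Whole nodes to request so every process has its own processor."""
    # Ceiling division using integers only.
    return -(-n_processes // processors_per_node)


if __name__ == "__main__":
    print(assign_components(7, 4))  # {0: [0, 4], 1: [1, 5], 2: [2, 6], 3: [3]}
    print(assign_components(3, 4))  # {0: [0], 1: [1], 2: [2], 3: []}
    print(nodes_needed(20, 8))      # 3
```

With the GPC figure of 8 processors per node, a 20-process job needs 3 nodes; if hyperthreads are counted (16 per GPC node), nodes_needed(20, 16) gives 2.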