Getting Started


As of this writing, nengo_mpi is only known to be usable on Linux. Installation is straightforward. On an Ubuntu workstation (as opposed to a cluster), an OpenMPI-based installation can be obtained by first installing dependencies:

sudo apt-get install openmpi-bin libopenmpi-dev libhdf5-openmpi-dev
sudo apt-get install libboost-dev libatlas-base-dev

Then download the code:

git clone

And install nengo_mpi and all its Python dependencies (including nengo):

cd nengo_mpi
pip install --user .

or if inside a virtualenv:

cd nengo_mpi
pip install .

And run the test to make sure it works:


See Installing nengo_mpi for more detailed installation instructions, especially for installing on high-performance clusters.

Adapting Existing Nengo Scripts

Existing nengo scripts can be adapted to make use of nengo_mpi with just a few small modifications. The most basic change is to import nengo_mpi in addition to nengo, and then use the nengo_mpi.Simulator class in place of the Simulator class provided by nengo:

import nengo_mpi
import nengo

network = nengo.Network()
with network:
    pass  # ... code to build network ...

sim = nengo_mpi.Simulator(network)


This will run a simulation using the nengo_mpi backend, but does not yet take advantage of parallelization. However, even without parallelization, the nengo_mpi backend can often be considerably faster than the reference implementation (see our Benchmarks), since it is a C++ library wrapped by a thin Python layer, whereas the reference implementation is pure Python.


In order to have simulations run in parallel, we need a way of specifying which nengo objects are going to be simulated on which processors. A Partitioner is the abstraction we use to do this specification. The most basic information that a partitioner requires is the number of components to split the network into. We can supply this information when creating the partitioner, and then pass the partitioner to the Simulator object:

partitioner = nengo_mpi.Partitioner(n_components=8)
sim = nengo_mpi.Simulator(network, partitioner=partitioner)

The number of components we specify here acts as an upper bound on the effective number of processors that can be used to run the simulation.

We can also specify a partitioning function, which accepts a graph (corresponding to a nengo network) and a number of components, and returns a Python dictionary that gives, for each nengo object, the component it has been assigned to. If no partitioning function is supplied, a default is used that simply assigns each component a roughly equal number of neurons. A more sophisticated partitioning function (which has additional dependencies) uses the metis package to assign objects to components in a way that minimizes the number of nengo Connections that straddle component boundaries. For example:

partitioner = nengo_mpi.Partitioner(n_components=8, func=nengo_mpi.metis_partitioner)
sim = nengo_mpi.Simulator(network, partitioner=partitioner)
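To make the idea of a neuron-balancing partitioning function more concrete, here is a minimal sketch of one possible heuristic. It is not nengo_mpi's default partitioner, and its input is a simplification: instead of a graph it takes a dict mapping objects (plain strings standing in for nengo objects) to neuron counts. The function name `balanced_partition` is hypothetical. It places the largest objects first, each onto whichever component currently holds the fewest neurons.

```python
import heapq


def balanced_partition(neuron_counts, n_components):
    """Hypothetical greedy partitioner (not part of nengo_mpi).

    neuron_counts: dict mapping each object to its number of neurons.
    Returns a dict mapping each object to a component index in range(n_components).
    """
    if n_components < 1:
        raise ValueError("n_components must be at least 1")
    # Min-heap of (neurons assigned so far, component index).
    heap = [(0, c) for c in range(n_components)]
    heapq.heapify(heap)
    assignments = {}
    # Largest objects first; each goes to the currently lightest component.
    for obj, n in sorted(neuron_counts.items(), key=lambda kv: kv[1], reverse=True):
        load, component = heapq.heappop(heap)
        assignments[obj] = component
        heapq.heappush(heap, (load + n, component))
    return assignments


counts = {"A": 100, "B": 80, "C": 60, "D": 40}
print(balanced_partition(counts, 2))  # {'A': 0, 'B': 1, 'C': 1, 'D': 0}
```

Both components end up with 140 neurons here. A greedy heuristic like this ignores connectivity entirely, which is exactly what a connection-aware partitioner such as the metis-based one tries to improve on.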

For small networks, we can also supply a dict mapping from nengo objects to component indices:

import nengo
import nengo_mpi

model = nengo.Network()
with model:
    A = nengo.Ensemble(n_neurons=50, dimensions=1)
    B = nengo.Ensemble(n_neurons=50, dimensions=1)
    nengo.Connection(A, B)

assignments = {A: 0, B: 1}
sim = nengo_mpi.Simulator(model, assignments=assignments)

Note, though, that this does not scale well and should be reserved for toy networks/demos.
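Whichever way components are assigned, a useful quantity to check is how many connections straddle component boundaries, since this is what the metis-based partitioner minimizes (we assume such connections are costly because their data must be passed between components). The following is a minimal sketch with a hypothetical helper `count_cut_connections`; strings stand in for nengo objects and (pre, post) pairs stand in for nengo Connections.

```python
def count_cut_connections(connections, assignments):
    """Hypothetical helper (not part of nengo_mpi).

    connections: iterable of (pre, post) pairs.
    assignments: dict mapping each object to a component index.
    Returns how many connections join objects on different components.
    """
    return sum(1 for pre, post in connections if assignments[pre] != assignments[post])


# Mirrors the toy network above: A -> B.
connections = [("A", "B")]
print(count_cut_connections(connections, {"A": 0, "B": 1}))  # 1
print(count_cut_connections(connections, {"A": 0, "B": 0}))  # 0
```

Putting A and B on separate components, as in the example above, makes their single connection cross a boundary; placing both on one component removes the crossing but also removes any opportunity for parallelism.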

Running scripts

To use the nengo_mpi backend without parallelization, scripts modified as above can be run in the usual way:


This will run serially, even if we have used a partitioner to specify that the network be split up into multiple components. When a script is run, nengo_mpi automatically detects how many MPI processes are active, and assigns components to each process. In this case only one process (the master process) is active, and all components will be assigned to it.

In order to get parallelization we need a slightly more complex invocation:

mpirun -np NP python -m nengo_mpi

where NP is the number of MPI processes to launch. It's fine if NP is not equal to the number of components that the network is split into: if NP is larger, some MPI processes will not be assigned any component to simulate, and if NP is smaller, some MPI processes will be assigned multiple components to simulate.
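The three cases above (a single process, more processes than components, fewer processes than components) can be illustrated with a small sketch. It assumes a simple round-robin distribution of components over MPI ranks; the actual strategy nengo_mpi uses is not described here, and `assign_components` is a hypothetical name. It also shows why the number of components is an upper bound on the number of processes that do useful work.

```python
def assign_components(n_components, n_procs):
    """Hypothetical round-robin mapping from MPI rank to the components it simulates."""
    if n_procs < 1:
        raise ValueError("need at least one MPI process")
    ranks = {rank: [] for rank in range(n_procs)}
    for component in range(n_components):
        ranks[component % n_procs].append(component)
    return ranks


print(assign_components(8, 1))  # {0: [0, 1, 2, 3, 4, 5, 6, 7]}
print(assign_components(4, 6))  # {0: [0], 1: [1], 2: [2], 3: [3], 4: [], 5: []}
print(assign_components(8, 3))  # {0: [0, 3, 6], 1: [1, 4, 7], 2: [2, 5]}
```

With one process, every component lands on the master process; with 6 processes and 4 components, two ranks sit idle, so at most min(NP, n_components) processes are busy.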