NOTICE: The Processors Wiki will End-of-Life on January 15, 2021. It is recommended to download any files or other content you may need that are hosted on processors.wiki.ti.com. The site is now set to read only.

MCSDK HPC 3.x MPI over SRIO




Open MPI over SRIO

Version 1.0.0.21

User Guide

Last updated: 09/18/2015


Introduction[edit]

The RapidIO architecture is a high-performance, packet-switched interconnect technology. RapidIO supports both messaging and read/write semantics. RapidIO fabrics guarantee in-order packet delivery, enabling power- and area-efficient protocol implementation in hardware. Please refer to: [1]

Terminology[edit]

The following terminology is used on this page:

  • Link Partner: One end of a RapidIO link.
  • Endpoint: A device that can originate and/or terminate RapidIO packets.
  • Processing Element: A device that has at least one RapidIO port.
  • Switch: A device that can route RapidIO packets.
  • Node: A K2H device.
  • Cartridge: A group of interconnected nodes.
  • Topology: A group of nodes/cartridges connected via SRIO in a particular fashion.
  • Cluster: A subset of nodes within the topology between which SRIO communication takes place.

Topology[edit]


A Topology is a set of nodes connected via SRIO in a particular manner. In some cases, all the nodes in a topology could be connected to a switch. In other cases there may be no switch at all. In the latter, the nodes may be connected to each other through their SRIO ports in some fashion, which could take the form of a ring, 1-D torus, 2-D torus, etc.

Below are a few examples of some topologies.


All Topologies.png


In non-switch topologies, where there may not be a direct SRIO link between two nodes, SRIO communication is still possible if the intermediate nodes on the path between the source and destination nodes forward packets between them in both directions. This feature is called 'packet forwarding' and is detailed in another section [2]

A topology can be represented in a readable JSON file format which details the cartridges, nodes and the SRIO connections between them. Representing a topology in this manner helps generate routing algorithms to facilitate SRIO communication between nodes. More details with an example can be found in section [3]

Open MPI over SRIO[edit]

Texas Instruments ti-Open MPI (based on Open MPI 1.7.1) includes an SRIO BTL based on the SRIO DIO transport, using the Linux rio_mport device driver.

Here are the steps to install and test it.

Setup[edit]


The setup can be divided into two parts.

Boot time[edit]

Boot all nodes in the cluster simultaneously or in quick succession (a few seconds apart). After the kernel has booted, please make sure all the relevant mports have come up, i.e. check that /dev/rio_mport<n> exists, where n=0,1,2,3, depending on which SRIO ports are actually connected to the adjacent node. To ensure successful registration of mports, boot all the cluster nodes in succession (a few seconds apart); large delays between node boot-ups could leave ports unregistered at boot and cause SRIO initialization to fail. If a rio_mport has not appeared for an SRIO port physically connected to the adjacent node, some error happened during boot, and clues may lie in the boot log (use the dmesg command to analyze). MPI cannot run if all mports have not appeared correctly.

Before running MPI (one time)[edit]

Open MPI specific to TI K2H chips can be obtained from the PPA using the following command (please note that the same package is obtained and installed as part of the keystone-hpc package installation):

 sudo apt-get install ti-openmpi

After this step, one additional manual step is required (to be performed ONE time by the HW vendor). The hardware-defined connectivity needs to be described in a special JSON file (as defined in the sections below) and compiled (ONE time) to srio_topology.bin. A sample JSON file is presented here: File:SRIO evm.json.zip. This is a netlist of SRIO links, used by the routing algorithm to program the K2H fabric. It needs to be compiled into binary form using the topologyJson2bin utility (installed as a part of keystone-hpc into /usr/bin/). More details on this step can be found in section [4].

 topologyJson2bin <topology-json-file>  srio_topology.bin

The MPI run-time uses the K2H hostname (the standard Linux hostname) to identify a processing node, so a translation between hostname and physical location in the SRIO network is required. At the moment, this is done using a static translation table: srio_hosts is a text file which maps the node IDs to host names in the system. Every node in the topology is represented as

<cartridge-id> <node-number> <hostname>

The known hosts file needs to be named srio_hosts. It is common across all nodes and needs to be created only once by the user. The first number in a row indicates the cartridge number, the second number indicates the node on that cartridge, and the third string indicates the hostname. The cartridge number and node number correspond to the nomenclature used in the topology file (JSON file). For example, for 8 nodes, the entries in the srio_hosts file will look like below.

 1 1 c1n1
 1 2 c1n2
 1 3 c1n3
 1 4 c1n4
 4 1 c4n1
 4 2 c4n2
 4 3 c4n3
 4 4 c4n4

A sample srio_hosts file can be found at File:Srio hosts.zip. After creating the above two files, move them to the /etc/cluster folder of the K2H Linux file system, for use by MPI.

 cp srio_hosts /etc/cluster
 cp srio_topology.bin /etc/cluster
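As a quick illustration of the hostname-to-location translation that srio_hosts provides, here is a short sketch (Python, for illustration only; the MPI run-time itself is C, and the function name is hypothetical):

```python
# Hypothetical sketch: build the hostname -> (cartridge, node) mapping that
# the MPI run-time derives from /etc/cluster/srio_hosts.

def parse_srio_hosts(text):
    """Parse lines of the form '<cartridge-id> <node-number> <hostname>'."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        cartridge, node, hostname = int(fields[0]), int(fields[1]), fields[2]
        table[hostname] = (cartridge, node)
    return table

hosts = parse_srio_hosts("1 1 c1n1\n1 2 c1n2\n4 1 c4n1\n")
```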

In MPI run-time[edit]

MPI uses a common (transport-independent) framework to fork processes on all nodes. It is based on SSH (over the GigE network). Information with all run-time parameters is exchanged (from the master node) with participating nodes; this includes the number of processes and the list of hosts involved. This is actually a list of host names matching entries in /etc/hosts (or their IP addresses). In order to translate this to an SRIO ID, we need information about physical location. Entries in the file /etc/cluster/srio_hosts are used for this purpose (as mentioned earlier). Packet forwarding tables are initialized at this point; please note that only the packet forwarding tables of nodes in the MPI communication world are programmed. This means that it is not possible to use disjoint nodes in the MPI communication world (the graph has to be connected), since routing between such nodes is not guaranteed (it may depend on previous state).

Testing MPI over SRIO[edit]

To test the MPI, run the nbody example which comes along with the HPC release.

Checklist before Test[edit]

Please ensure the following before running MPI tests

  • Well Connected Nodes: Nodes participating in the test need to be connected, i.e. disconnected islands of nodes are not allowed (otherwise programming of packet forwarding tables is not possible).
  • Mports: Please make sure all the relevant mports have come up, i.e. check that /dev/rio_mport<n> exists, where n=0,1,2,3, depending on which SRIO ports are actually connected to the adjacent node. To ensure successful registration of mports, boot all the cluster nodes in succession (a few seconds apart); large delays between node boot-ups could leave ports unregistered at boot and cause SRIO initialization to fail. If a rio_mport has not appeared for an SRIO port physically connected to the adjacent node, some error happened during boot, and clues may lie in the boot log (use the dmesg command to analyze).
  • srio_hosts & srio_topology.bin: As explained in the setup section[5], please ensure these files are present in /etc/cluster directory of ALL the participating nodes.

A few steps such as the extraction of list of participating cartridges (from hostnames), SRIO ID assignment and programming of K2H packet forwarding tables are incorporated into MPI run-time so that no manual steps are required by the user.

Build the Nbody example[edit]

The nbody example's source code is available as a part of the MPI installation, in /usr/share/ti/examples/openmpi/nbody. On one of the nodes, copy this source code into the home directory and compile it:

cd ~
cp -r /usr/share/ti/examples/openmpi/nbody .
cd ~/nbody
make

Copy this nbody executable to all the nodes participating in the test:

 scp -r ~/nbody <username>@<remote_node_ip>:~/

Change directory to the location of the nbody executable

 cd ~/nbody/

NOTE: For convenience, the above steps have been put into a script File:Prep mpi nbody.zip which can be run from one of the nodes. The usage is

./prep_mpi_nbody.sh <remote_node_ip_addr> <user name>

Run this script on one of the nodes, for all remote nodes.

for example, from node 1,

 ./prep_mpi_nbody.sh 10.218.109.131 mpiuser
 ./prep_mpi_nbody.sh 10.218.109.132 mpiuser
 ./prep_mpi_nbody.sh 10.218.109.133 mpiuser

where 10.218.109.131, .132 and .133 are the IP addresses of the nodes participating in the test. This script builds the nbody example on a node and copies it to the remote node, along with the srio_hosts and srio_topology.bin files created earlier.

Run the test[edit]


Testing on 4 nodes (c1n1,c1n2,c1n3,c1n4)

On one of the nodes, issue the following command:


 /opt/ti-openmpi/bin/mpirun --mca btl self,srio -np 4 -host c1n1,c1n2,c1n3,c1n4 ./nbody 1000


If successful, mpirun exits with an output similar to the one below

 Simulation of 2.000000 seconds done in 0.745069 seconds


Make sure that c1n1, c1n2, etc. are host names present in the srio_hosts file.
Testing on 12 nodes (c1n1,c1n2,..,c4n1,...,c4n4,c7n1,...,c7n4)

 /opt/ti-openmpi/bin/mpirun --mca btl self,srio -np 12 -host c1n1,c1n2,c1n3,c1n4,c4n1,c4n2,c4n3,c4n4,c7n1,c7n2,c7n3,c7n4 ./nbody 1000

Make sure that all the hostnames mentioned here (c1n1, c1n2...c7n4) are the host names present in the srio_hosts file.

SRIO BTL, performance tests (between two nodes)

To do SRIO BTL performance tests, run the following

 /opt/ti-openmpi/bin/mpirun --mca btl self,srio -np 2 -host c1n1,c1n2 ./mpptest -sync logscale


Please note that the optional MCA parameters "--mca orte_base_help_aggregate N1 --mca btl_base_verbose N2" can be appended to the mpirun command above to tune the verbosity.


This effectively tests basic MPI operation.

Some MPI SRIO BTL internal details (ti_Open MPI_1.0.0.21)[edit]


Here is a glimpse of how MPI works over SRIO internally

  • MPI BTL relies on Keystone2 Navigator HW capabilities. It is built on top of custom PDSP firmware, custom SRIO user-space driver (part of MPI distribution) and MCSDK standard QMSS LLD. SERDES initialization is done by rio_mportX kernel driver at boot time.
  • All SRIO traffic is performed via Type 11 (SRIO terminology) messages that are up to 2 KB in size.
  • Neighboring K2H nodes communicate directly over SRIO. Nodes that are not directly connected exchange messages with the help of PDSP RISC processors in transfer nodes (no A15 action needed).
  • Major role in SRIO packet routing (w/o A15 intervention) is performed by single dedicated PDSP RISC processor (in each K2H node).
  • In the BTL function mca_btl_srio_add_procs, a list of cartridges is created based on hostnames received from the pml/bml layer. Information from the file /etc/cluster/srio_hosts is used to identify the physical location of participating nodes. This list is used to create the cluster routing table, which in turn is used by each node to set its own SRIO ID and packet forwarding table.
  • MPI BTL keeps track, for each endpoint in the communication world, of its SRIO destination ID and local outgoing port (please note that packet forwarding tables on each node are set to guarantee a path between all participating nodes). Offline verification of path existence (with routing details) can be done using the tool routingTableGenTest (deployed in /usr/bin).
  • MPI BTL SRIO has an optional MCA parameter "--mca btl_srio_pdsp_credit_period <1|2|3|4>" (default is 1) that can be used to tune multi-hop bandwidth, with some constraints. Higher performance is achieved with a credit period of 4, which means that one credit packet is sent per four message packets. In this case the size of non-blocking messages should be limited to 1-2 MB (due to the limited size of internal SRIO dedicated buffers).

For more details, please refer to the Design section [6]


Additional utilities[edit]

During the configuration and installation phase, three additional utilities are compiled and deployed in /usr/bin:

  • pktfwdConfig is used to manually configure the packet forwarding table on the same K2H.
  • topologyJson2bin is used to convert the (human-readable) JSON topology file to the binary format used by the routing algorithm.
  • routingTableGenTest is a tool to check path existence for a group of cartridges and a routing table. This tool also performs an SRIO walk from every node to every other node, listing the nodal path taken (including the max number of hops).

These utilities are explained in more detail here [7]

Establishing any to any connectivity with SRIO fabric using packet forwarding[edit]


Communicating from any node to any node via SRIO involves the PDSP performing routing on all the intermediate SRIO nodes.

Type 11 Message Routing[edit]

There are several types of SRIO data traffic supported by the K2H device: DIO, Type 9 and Type 11.
In this version of MPI, Type 11 traffic is used. MPI fragments (64KB for BTL SRIO) are chunked into Type 11 messages up to 2 KB in size. Traffic between neighboring nodes is directed through QMSS/SRIO HW blocks, on both the TX and RX sides. But for complex topologies, like a 2D torus, multiple hops need to be made by a message before reaching the destination.
Routing of the messages is performed by a grid of PDSPs (dedicated RISC processors in each K2H node), which look into the first 32-bit word of each message to find the destination and source SRIO IDs (8-bit each).
Mpi-type11-msg.png
The PDSP uses static routing look-up tables, pre-computed by the A15 and downloaded to PDSP memory in the configuration stage, to find the outgoing SRIO port (4 per K2H node). The PDSP also checks whether a neighboring node is the destination node; in that case, a different mailbox is used to send the Type 11 message directly to the destination/neighboring node.
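For illustration, extracting the two 8-bit IDs from the first 32-bit word could look like the sketch below (Python, for illustration only; the byte positions chosen here are an assumption, not the actual Type 11 header layout):

```python
# Hypothetical sketch of the PDSP's header inspection: pull the 8-bit
# destination and source SRIO IDs out of the first 32-bit word of a
# Type 11 message. ASSUMPTION: dest ID in the top byte, src ID in the
# next byte; the real field positions follow the RapidIO Type 11 format.

def extract_ids(first_word):
    dest_id = (first_word >> 24) & 0xFF  # assumed position of the dest ID
    src_id = (first_word >> 16) & 0xFF   # assumed position of the src ID
    return dest_id, src_id
```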

Topology Creation[edit]


This step analyzes the SRIO topology (SRIO hardware layout), assigns unique SRIO IDs to all the nodes and stores the information in a binary file format. This topology binary file will be used by the routing table generation algorithm in the next step. This involves three steps:
1) JSON file representation: The SRIO hardware layout, detailing the nodes and their SRIO connections, needs to be represented in a readable .json file format. This file lists all cartridges, and all nodes within those cartridges, along with their connection details. A sample .json file entry would look like this:

"cartridges" : [
{ "name": "c1", "nodes":
 [{"name": "c1n1", "connections": [{ "port0": "c1n2"},{ "port1": "c1n4"},{ "port2": "none"},{ "port3": "none"}]},
  {"name": "c1n2", "connections": [{ "port0": "c1n1"},{ "port1": "c1n3"},{ "port2": "c2n2"},{ "port3": "none"}]},
  {"name": "c1n3", "connections": [{ "port0": "c1n2"},{ "port1": "c1n4"},{ "port2": "none"},{ "port3": "none"}]},
  {"name": "c1n4", "connections": [{ "port0": "c1n1"},{ "port1": "c1n3"},{ "port2": "none"},{ "port3": "none"}]}
 ]
},
{ "name": "c2", "nodes":[
{"name": "c2n1", "connections": [{ "port0": "c2n2"},{ "port1": "c2n4"},{ "port2": "none"},{"port3":"none"}]},
{"name": "c2n2", "connections": [{ "port0": "c2n1"},{ "port1": "c2n3"},{ "port2": "c1n2"},{ "port3": "none"}]},
{"name": "c2n3", "connections": [{ "port0": "c2n2"},{ "port1": "c2n4"},{ "port2": "none"},{ "port3": "c1n2"}]},
{"name": "c2n4", "connections": [{ "port0": "c2n1"},{ "port1": "c2n3"},{ "port2": "none"},{ "port3": "none"}]}
]},

2) Run the topology definition utility, which will:
  • Parse the .json file for topology representation errors, if any
  • Assign X-Y coordinates for all the cartridges in the topology
  • Assign unique SRIO IDs for all nodes as fn(X, Y, node-id)
  • Store the above information in a .bin file (e.g. srio_topology.bin)

The above topology .bin file will be used by the routing table generation algorithm at run-time.

TopologyCreationOverview.png
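The ID assignment fn(X, Y, node-id) is not spelled out above; one hypothetical packing that yields unique IDs, sketched in Python under assumed grid dimensions, might look like this:

```python
# Hypothetical sketch of unique SRIO ID assignment as fn(X, Y, node-id).
# GRID_W and the packing scheme are assumptions for illustration; the
# actual function used by the topology utility may differ.

GRID_W = 8           # assumed width of the cartridge X-Y grid
NODES_PER_CART = 4   # K2H nodes per cartridge, as in the examples above

def srio_id(x, y, node):
    """Pack (X, Y, node) into a unique 8-bit SRIO ID (small transport size)."""
    cartridge_index = y * GRID_W + x
    return cartridge_index * NODES_PER_CART + node
```

With an 8x8 grid and 4 nodes per cartridge, this fills exactly the 256-ID space of the small (8-bit) transport size.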

Routing Table Generation[edit]


Once the topology is created, a routing table algorithm is run on a cluster to generate routing table entries for all the nodes in the cluster. The following steps need to be followed:

Define Cluster[edit]


A cluster comprises a set of nodes/cartridges between which SRIO communication will take place. For example, let's assume a topology contains about 50 cartridges, each with, say, 4 nodes. Of those 50 cartridges, we pick a subset, say {c1,c3,c5,c8}, and choose to communicate over SRIO amongst them; i.e. SRIO communication is possible between any two nodes in this cluster. Once we define a cluster like the one above, a routing table algorithm needs to be run on it to generate routing tables for all the nodes in the cluster.

Routing Table Algorithm[edit]


The routing algorithm tries to find a way between any two nodes in the system, and tries to come up with a routing table for each node in the cluster, such that every node is reachable from every other node in the cluster.
RoutingAlgorithmOverview.png

The routing algorithm happens in two steps.

Finding Distances between any two cartridges[edit]

A recursive distance finder is run between every pair of cartridges to find the distance to the other cartridge in all directions. Please note that, after the topology parsing step, X-Y values were assigned to all cartridges in the topology. The distance finder tries to find the distance between two cartridges, in all directions (Left/Right/Up/Down) in the X-Y coordinate system. To do this, a recursive process is initiated:
1) Start with one cartridge.
2) Find the lowest-distance path to every destination cartridge in the cluster, in all directions (Left, Right, Up, Down), and note down the outgoing port. I.e. distance(src,dest) = MIN( distance(up(src), dest), distance(down(src), dest), distance(left(src), dest), distance(right(src), dest) ), where up/down/left/right(cartridge) is the adjacent cartridge located up/down/left/right in the cluster.
3) Continue this process until all the distances from any cartridge to any other cartridge are populated.
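The recursion above can be sketched as follows (a hypothetical Python illustration; the actual implementation is the C utility described later). Active cartridges are cells of an X-Y grid, and inactive cartridges stop the search:

```python
# Minimal sketch of the recursive minimum-distance search: explore all
# loop-free paths over active cartridges and keep the minimum hop count.

def min_distance(grid, src, dest, visited=None):
    """Minimum hops from src to dest over active (x, y) cells in grid,
    or None if dest is unreachable from src."""
    if src == dest:
        return 0
    visited = (visited or set()) | {src}  # no repeated cartridges in a path
    x, y = src
    best = None
    # Branch Left/Right/Down/Up, as the algorithm above does.
    for nxt in ((x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)):
        if nxt in grid and nxt not in visited:  # stop at inactive cartridges
            d = min_distance(grid, nxt, dest, visited)
            if d is not None and (best is None or d + 1 < best):
                best = d + 1
    return best

cluster = {(0, 0), (1, 0), (2, 0), (2, 1)}
```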

Here is a pictorial illustration of how the recursive depth finder works

RecursiveDepthFinder.png

In the above picture, the goal was to find the distance from C6 (src) to C16 (dest). The distance finder search expands first to its neighboring cartridges C2, C5, C7, C10, which in turn expand the search recursively to their neighboring cartridges, and so on, until the destination is reached. As far as the source (C6) is concerned, the distance to C16 when branching out in each of the four directions is calculated and stored. The minimum depth amongst them is found and will be used to reach the destination. This process is done from every starting cartridge to every other cartridge in the cluster.

Please note that the above recursive search stops at an inactive cartridge, i.e. a cartridge which is not a part of the cluster. Also, special care is taken so that there are no loops (repetitions of cartridges) in a path from one cartridge to another.

Generate Routing Table entries[edit]

Once the minimum distance from every cartridge to every other cartridge in the cluster is found, the routing table for every node can be populated the following way:

1) Start with node ID #1 (C1).
2) For every destination node ID in the cluster,
    2.1) Find the min-distance path (calculated earlier using the recursive distance finder) and the outgoing SRIO port.
    2.2) Append the {nodeID, port} to the existing routing table entries. If it cannot be added to the current range, add a new range.
    2.3) Node IDs are incremented sequentially so that they can be added to the routing table more easily.
    2.4) Repeat the above steps until the routing tables for all the nodes are populated.
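Steps 2.2 and 2.3 above can be sketched as follows (a hypothetical Python illustration; names are assumptions). Sequential node IDs let a {lo, hi, port} entry grow as long as the outgoing port stays the same:

```python
# Sketch of routing table range construction: extend the current range
# while consecutive node IDs share an outgoing port, else start a new one.
# Exceeding MAX_ENTRIES models the algorithm-failure case.

MAX_ENTRIES = 8  # per-node routing table limit mentioned in the text

def build_ranges(ports_by_node_id):
    """ports_by_node_id: outgoing port for each sequential node ID (0, 1, ...).
    Returns a list of (lo, hi, port) entries, or None if the limit is exceeded."""
    entries = []
    for node_id, port in enumerate(ports_by_node_id):
        if entries and entries[-1][2] == port:
            lo, _, p = entries[-1]
            entries[-1] = (lo, node_id, p)  # extend the current range
        else:
            entries.append((node_id, node_id, port))
    return entries if len(entries) <= MAX_ENTRIES else None

table = build_ranges([0, 0, 1, 1, 1, 2])
```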
Return value[edit]


The algorithm is successful if it can generate routing tables (<= 8 entries per node) which would enable every-node-to-every-other-node communication within the cluster. If so, it returns 0, signalling success. The routing algorithm fails if

  1. Some nodes are not reachable by others (if there are no connections to other nodes, typical of a 'cartridge island' in the cluster)
  2. The routing table for some node would need more than the maximum of 8 entries


In both of the cases above, error codes are returned, upon receiving which the user can either change the cluster, or fall back to non-SRIO modes of communication in that cluster.


Current Routing Algorithm Characteristics[edit]


Optimized for Distance: Of the four paths from one node to another (via all four ports), the one with the least distance is chosen and added to the routing table. This results in minimum node-hops between every pair of nodes.
Tradeoffs

Distance-optimized routing increases the number of routing table entries (note: the max entries allowed is 8). The more discontinuous the cartridges in the cluster are, the higher the number of entries, which could lead to algorithm failure in some clusters.

Alternate Routing Algorithms[edit]

Optimized for Entries[edit]

Instead of choosing the minimum-distance path from one node to another, of the four ports from a node to another node, the one that most easily adds to the current range in the table is chosen, avoiding a new entry.
Tradeoffs
Could add extra node-hops, which may affect performance.

X-Y Algorithm[edit]

Another approach is to use the X-Y positions of nodes/cartridges in the X-Y grid and come up with a fixed procedure to route from one node to another. For example, from a cartridge (x1,y1) to another cartridge (x2,y2), move right/left to match X (i.e. x2), then move up/down to match Y (i.e. y2). The advantage of this algorithm is that it is less complex.
Tradeoffs
This algorithm assumes that all the cartridges in the X-Y matrix are enabled in the cluster. Hence, this algorithm cannot scale to discontinuous cartridges.
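The X-Y procedure above can be sketched as follows (a hypothetical Python illustration):

```python
# Sketch of X-Y (dimension-ordered) routing: move right/left until the X
# coordinate matches, then up/down until Y matches. Assumes every
# cartridge in the traversed rectangle is active, as noted above.

def xy_route(src, dest):
    """Return the list of (x, y) cartridges visited from src to dest."""
    x, y = src
    path = [src]
    while x != dest[0]:              # first match the X coordinate
        x += 1 if dest[0] > x else -1
        path.append((x, y))
    while y != dest[1]:              # then match the Y coordinate
        y += 1 if dest[1] > y else -1
        path.append((x, y))
    return path
```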


Configuring nodes[edit]

Once the routing tables are generated by running the routing algorithm, each node within the cluster is programmed with its routing table entries. The routing table entries are written to PDSP memory (a routing look-up table, with the number of entries equal to the number of destination nodes).


Routing Tables in Action[edit]

Here is an example which shows routing tables in action once the nodes are programmed with their routing tables

RoutingTablesInAction.png

Useful SRIO utilities[edit]

A few useful utilities for topology creation and routing table generation testing are detailed here. The executables for these utilities come with the installation of ti-Open MPI and are kept in the /usr/bin directory.


Building the utilities[edit]

The utilities are already built during the ti-Open MPI installation and kept in the /usr/bin directory.

However, these utilities can be built from source as well. The source files for these utilities can be obtained with the source installation of ti-Open MPI using the below command:

   apt-get source ti-openmpi

The source files for these utilities are kept at

 ~/ti-openmpi-1.0.0.21/ompi/mca/btl/srio/pktfwdK2H/

The utilities can be built by issuing the make command inside this directory.

cd ~/ti-openmpi-1.0.0.21/ompi/mca/btl/srio/pktfwdK2H/
make all

The executables topologyJson2bin, routingTableGenTest, pktfwdConfig can be found in this directory.

Topology Json to Bin file creation[edit]


This utility is used to parse the .json file and create a .bin file to be used by MPI. The usage is as below

./topologyJson2bin  <topology json file> <topology output bin file>

For example,

./topologyJson2bin  SRIO_Evm.json srio_topology.bin


A sample output would look like this

  .....
  .....
  c1n2:  Port[3] --> NONE
Find node connections by coordinates ..
  c1n2:  Port No 0 connected to c1n1
  c1n2:  Port No 0 connected to NONE
  c1n2:  Port No 0 connected to NONE
  c1n2:  Port No 0 connected to NONE
Cartridge Connections
 c1 Connected on Left  to c1 ( 0,0,0), through port 0, via node c1n1
 c1 Connected on Right  to c1 ( 0,0,0), through port 0, via node c1n1
 c1 Connected on Up  to c1 ( 0,0,0), through port 0, via node c1n1
 c1 Connected on Down  to c1 ( 0,0,0), through port 0, via node c1n1
 Writing topology to the bin file srio_evm.bin ...
This utility is used to create the topology bin file, as described in section [8]

The source files for this utility are kept at:

~/ti-openmpi-1.0.0.21/ompi/mca/btl/srio/pktfwdK2H/utils/topologyJson2bin.c
~/ti-openmpi-1.0.0.21/ompi/mca/btl/srio/pktfwdK2H/utils/topologyJson2binMain.c
~/ti-openmpi-1.0.0.21/ompi/mca/btl/srio/pktfwdK2H/CJSON/cJSON.c
~/ti-openmpi-1.0.0.21/ompi/mca/btl/srio/pktfwdK2H/CJSON/cJSON.h

Routing Table Generation Testing[edit]


This utility is used to run the routing table generation algorithm over a cluster. The cluster is specified as a list of cartridges (mentioned in the .json file) passed as arguments to this utility. The utility generates the routing table, performs an exhaustive walk from all nodes to all nodes in the cluster, and displays whether the routing table algorithm is successful, and if so, the paths taken in the exhaustive walk and the max hop count.

The usage is as below

 ./routingTableGenTest  <srio_topology.bin> cluster <list of cartridge ids>  [OPTIONAL] nodes <node1> <node2> ...

For example,

 ./routingTableGenTest  /etc/cluster/srio_topology.bin cluster c1 c4 c7 c8

The above will test the routing table generation for all the nodes in the cluster defined by cartridges 1, 4, 7 and 8, mentioned in the topology file. A sample output of the above would look like this:

 Walker Simulation: c8n4 to c1n4 SUCCESS: 7 hops --> c8n4 --> c8n3 --> c8n2 --> c7n2 --> c7n1 --> c4n1 --> c1n1 --> c1n4
 Walker Simulation: c8n4 to c4n1 SUCCESS: 5 hops --> c8n4 --> c8n3 --> c8n2 --> c7n2 --> c7n1 --> c4n1
 Walker Simulation: c8n4 to c4n2 SUCCESS: 6 hops --> c8n4 --> c8n3 --> c8n2 --> c7n2 --> c7n1 --> c4n1 --> c4n2
 Walker Simulation: c8n4 to c4n3 SUCCESS: 7 hops --> c8n4 --> c8n3 --> c8n2 --> c7n2 --> c7n1 --> c4n1 --> c4n2 --> c4n3
 Walker Simulation: c8n4 to c4n4 SUCCESS: 6 hops --> c8n4 --> c8n3 --> c8n2 --> c7n2 --> c7n1 --> c4n1 --> c4n4
 Walker Simulation: c8n4 to c7n1 SUCCESS: 4 hops --> c8n4 --> c8n3 --> c8n2 --> c7n2 --> c7n1
 Walker Simulation: c8n4 to c7n2 SUCCESS: 3 hops --> c8n4 --> c8n3 --> c8n2 --> c7n2
 Walker Simulation: c8n4 to c7n3 SUCCESS: 4 hops --> c8n4 --> c8n3 --> c8n2 --> c7n2 --> c7n3
 Walker Simulation: c8n4 to c7n4 SUCCESS: 5 hops --> c8n4 --> c8n3 --> c8n2 --> c7n2 --> c7n3 --> c7n4
 Walker Simulation: c8n4 to c8n1 SUCCESS: 1 hops --> c8n4 --> c8n1
 Walker Simulation: c8n4 to c8n2 SUCCESS: 2 hops --> c8n4 --> c8n3 --> c8n2
 Walker Simulation: c8n4 to c8n3 SUCCESS: 1 hops --> c8n4 --> c8n3
 Walker Simulation: c8n4 to c8n4 SUCCESS: 0 hops --> c8n4
 ************************* WALKER SIMULATION RESULTS *********************************
 Max Hop count=8, from node c1n3 to node c8n4
 Performed walks = 256
 Failed Walks = 0
 ***************************************************************************************

Alternatively, if the test needs to be performed only on a set of nodes within the cluster, the optional parameter 'nodes' needs to be specified, followed by the list of nodes, i.e.

 ./routingTableGenTest  <srio_topology.bin> cluster <list of cartridge ids>  nodes <node1> <node2> ...

For example,

./routingTableGenTest  /etc/cluster/srio_topology.bin cluster c1 c4 c7 c8 nodes c1n1 c8n1 c4n2

This will list the simulation between only those three nodes mentioned above.


The source files for this utility are kept at:

~/ti-openmpi-1.0.0.21/ompi/mca/btl/srio/pktfwdK2H/routingTableGen.c
~/ti-openmpi-1.0.0.21/ompi/mca/btl/srio/pktfwdK2H/utils/routingTableGenTest.c


Packet Forwarding Configuration[edit]


This utility is run on a K2H node to configure the packet forwarding registers, view the packet forwarding registers or reset them.

For viewing the current packet forwarding registers of a node, run the below command

 ./pktfwdConfig display

For resetting the current packet forwarding registers of a node, run the below command

 ./pktfwdConfig reset

For setting custom values in the packet forwarding registers of a node, run the below command

 ./pktfwdConfig custom   <entry1-lo> <entry1-hi> <entry1-routing port>   <entry2-lo> <entry2-hi> <entry2-routing port> ..

where entry1, entry2, etc. are the packet forwarding entries, and entry#-lo, entry#-hi and entry#-routing-port are the SRIO ID lower bound, the upper bound, and the SRIO port used to route a packet whose destination ID lies between entry#-lo and entry#-hi. For example,

   ./pktfwdConfig custom   10 20 1   21 30 3   31 40 2
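The matching behaviour of these entries can be illustrated with a short sketch (hypothetical Python; the real lookup is done by the K2H hardware/PDSP):

```python
# Sketch of packet forwarding entry matching: a packet whose destination
# SRIO ID falls within [lo, hi] of an entry is routed out of that entry's
# port. The entries below mirror the example command above.

def forward_port(entries, dest_id):
    """entries: list of (lo, hi, port) tuples. Returns the port, or None."""
    for lo, hi, port in entries:
        if lo <= dest_id <= hi:
            return port
    return None

entries = [(10, 20, 1), (21, 30, 3), (31, 40, 2)]
```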

The source files for this utility are kept at:

~/ti-openmpi-1.0.0.21/ompi/mca/btl/srio/pktfwdK2H/utils/pktFwdConfig.c

Linux Kernel, U-boot and DTS changes[edit]

  • This section details how to build the Linux kernel, device tree file, U-Boot, etc. These steps are not mandatory and are provided here for informational purposes.


The following steps are provided to show component details only. If the ti-Open MPI package is obtained from the PPA, they are not necessary.

Building the kernel for SRIO[edit]

Clone Linux Kernel[edit]

The SRIO enabled Kernel could be obtained from http://git.ti.com/keystone-linux/linux/commits/v3.8/rio-dev-dio
<syntaxhighlight lang="bash">

$ git clone git://git.ti.com/keystone-linux/linux.git
$ cd linux
$ git checkout v3.8/rio-dev-dio

</syntaxhighlight>

Build Linux Kernel[edit]

<syntaxhighlight lang="bash">

$ export CROSS_COMPILE=arm-linux-gnueabihf-
$ export ARCH=arm
$ export PATH=<path to installed toolchain>/bin:$PATH
$ make keystone2_defconfig

</syntaxhighlight>
Make the below modifications in the .config file.

CONFIG_HAS_RAPIDIO=y
CONFIG_RAPIDIO=y
# CONFIG_RAPIDIO_TSI721 is not set
CONFIG_TI_KEYSTONE_RAPIDIO=y
CONFIG_RAPIDIO_DISC_TIMEOUT=200
CONFIG_RAPIDIO_ENABLE_RX_TX_PORTS=y
CONFIG_RAPIDIO_DMA_ENGINE=y
CONFIG_RAPIDIO_DEV=y
CONFIG_RAPIDIO_DEBUG=y
CONFIG_RAPIDIO_ENUM_BASIC=y
CONFIG_RAPIDIO_CHMAN=y
CONFIG_RAPIDIO_DEV_MPORT=y


Also, make the RIONET loadable as a module, by making the below modifications

CONFIG_RIONET=m
CONFIG_RIONET_TX_SIZE=128
CONFIG_RIONET_RX_SIZE=128


Now build the kernel with the below commands

<syntaxhighlight lang="bash">

$ make oldconfig
$ make uImage

</syntaxhighlight>

Copy the kernel (uImage) to a tftp repository, which the node can download from during boot.

Device Tree Modifications for SRIO[edit]

Device tree changes are required to

  • Enable SRIO at bootup
  • Enable UIO access for SRIO (for use with user space LLD in future)

Enable SRIO at bootup[edit]

A few definitions (below) need to be added to the arch/arm/boot/dts/k2hk-evm.dts file to enable SRIO.

 rapidio: rapidio@2900000 {
                      #address-cells = <1>;
                      #size-cells = <1>;
                      reg = <0x2900000 0x40000  /* rio regs */
                             0x2620000 0x1000   /* boot config regs */
                             0x232c000 0x2000>;	/* serdes config regs */
                      clocks = <&clksrio>;
                      clock-names = "clk_srio";
                      compatible = "ti,keystone-rapidio";
                      dma-coherent;

                      keystone2-serdes;
                      baudrate  = <3>; /* 5 Gbps */
                      path_mode = <0xf>; /* 4 port in 1x */
                      lsu       = <0 0>; /* DIO and maintenance LSUs */

                      tx_channel = "riotx";
                      tx_queue_depth = <256>;

                      ports = <0x1>;      /* bitfield of port(s) to probe.  Port 0 is enabled here */
                      dev-id-size = <0>;  /* RapidIO common transport system
                                             * size.
                                             * 0 - Small size. 8-bit deviceID
                                             *     fields. 256 devices.
                                             * 1 - Large size, 16-bit deviceID
                                             *     fields. 65536 devices.
                                             */
                      interrupts = <0 152 0xf01 0 153 0xf01>; /* RIO and LSU IRQs */
                      port-register-timeout = <90>;

                     pkt-forward = <0xffff 0xffff 0
                                   0xffff 0xffff 0
                                   0xffff 0xffff 0
                                   0xffff 0xffff 0
                                   0xffff 0xffff 0
                                   0xffff 0xffff 0
                                   0xffff 0xffff 0
                                   0xffff 0xffff 0>;
                      num-mboxes = <2>;

                     mbox-0 {
                                rx_channel = "riorx0";
                                rx_queue_depth	= <256 0 0 0>;
                                rx_buffer_size	= <4096 0 0 0>;
                                /*stream_id = <0>;*/
                     };

                     mbox-1 {
                                rx_channel = "riorx1";
                                rx_queue_depth	= <256 0 0 0>;
                                rx_buffer_size	= <4096 0 0 0>;
                                /*stream_id = <1>;*/
                      };

           };


Configuring SRIO ports[edit]


In the dts entries above, the 'ports' property is a bitfield selecting which SRIO port(s) the node probes; each bit corresponds to one port. The listing above enables only port 0 (0x1). To probe all four ports instead, use:

ports = <0xf>;      /* bitfield of port(s) to probe.  Ports 0,1,2,3 are enabled here */
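The bitfield for any set of ports can be computed by OR-ing 1 << port for each enabled port. A minimal sketch (the helper name is illustrative, not part of the SDK):

```shell
# port_bitfield: compute the 'ports' bitfield for a list of SRIO port numbers
# (on K2H the valid ports are 0-3)
port_bitfield() {
  mask=0
  for p in "$@"; do
    # set the bit corresponding to this port
    mask=$((mask | (1 << p)))
  done
  printf '0x%x\n' "$mask"
}

# port_bitfield 0 1 2 3 prints 0xf; port_bitfield 0 prints 0x1
```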

For enabling UIO[edit]


To enable UIO for SRIO, insert the section below in the k2hk-evm.dts file, alongside the other uio- sections.

    uio_srio: srio {
               compatible = "ti,uio-module-drv";
               mem = <0x232C000 0x00002000
                      0x2900000 0x30000
                      0x0231a000 0x00002000>;
               clocks = <&clksrio>;
               interrupts = <0 154 0xf01>;
               label = "srio";
          };

Note: To use UIO, load the UIO module after boot with the command below. <syntaxhighlight lang="bash">

> insmod uio_module_drv.ko

</syntaxhighlight>
With the above two additions made to the arch/arm/boot/dts/k2hk-evm.dts file, build the device tree blob:
<syntaxhighlight lang="bash">

> make k2hk-evm.dtb

</syntaxhighlight>
In the U-Boot parameters, the newly built k2hk-evm.dtb is used on each node (e.g., N1 and N2) to enable SRIO.

Uboot Configuration required for SRIO[edit]


Booting the SRIO-enabled kernel requires setting RapidIO-specific U-Boot variables and pointing U-Boot to the new kernel and device tree file. Follow the steps below.

Set SRIO specific uboot args for every node[edit]

The 'args_all' variable should contain the following two additions to enable SRIO:

rio-scan.static_enum=1 rio-scan.scan=0

For example, at the U-Boot prompt:

setenv args_all 'setenv bootargs console=ttyS0,115200n8 rootwait=1 rio-scan.static_enum=1 rio-scan.scan=0'
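The addition amounts to a simple string transformation on the existing bootargs. The sketch below is illustrative only (the function is not a U-Boot command):

```shell
# add_srio_args: append the rio-scan parameters required for SRIO to an
# existing bootargs string (illustrative helper, not a U-Boot command)
add_srio_args() {
  printf '%s rio-scan.static_enum=1 rio-scan.scan=0\n' "$1"
}

# add_srio_args 'console=ttyS0,115200n8 rootwait=1'
# prints: console=ttyS0,115200n8 rootwait=1 rio-scan.static_enum=1 rio-scan.scan=0
```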

Point to the newly built SRIO kernel and device tree file[edit]

At the U-Boot prompt, set the variables below to point to the SRIO-enabled kernel and device tree file:

setenv name_kern uImage
setenv name_fdt 'k2hk-evm.dtb'

Here is an example which uses NFS boot:

setenv serverip 10.218.109.20
setenv gatewayip 10.218.109.1
setenv tftp_root 'mcsdk3_0_2_14'
setenv name_initfs
setenv name_mon 'skern-keystone-evm.bin'
setenv addr_fdt 0x87000000
setenv addr_fs 0x90000000
setenv addr_kern 0x88000000
setenv addr_mon 0x0c5f0000
setenv addr_uinitrd '-'
setenv args_net 'setenv bootargs ${bootargs} rootfstype=nfs root=/dev/nfs rw nfsroot=${serverip}:${nfs_root},${nfs_options}'
setenv boot net
setenv bootargs 'console=ttyS0,115200n8 rootwait=1 rootfstype=nfs root=/dev/nfs rw'
setenv get_mon_net 'dhcp ${addr_mon} ${tftp_root}/${name_mon}'
setenv get_kern_net 'dhcp ${addr_kern} ${tftp_root}/${name_kern}'
setenv init_net 'run args_all args_net'
setenv netmask '255.255.255.0'
setenv nfs_options 'v3,tcp rw ip=dhcp'
setenv nfs_root '/evmk2h_nfs_3_0_0_15'
setenv run_kern 'bootm ${addr_kern} ${addr_uinitrd} ${addr_fdt}'
setenv bootcmd 'run init_${boot} get_fdt_${boot} get_mon_${boot} get_kern_${boot} run_mon run_kern'
setenv get_fdt_net 'dhcp ${addr_fdt} ${tftp_root}/${name_fdt}'
setenv name_kern uImage
setenv args_all 'setenv bootargs console=ttyS0,115200n8 rootwait=1 rio-scan.static_enum=1 rio-scan.scan=0'
setenv name_fdt 'k2hk-evm.dtb'


Note that the kernel (uImage) above is the SRIO-specific kernel built earlier. If booting over NFS as shown, be sure to adjust the NFS and TFTP directories accordingly.
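To see how U-Boot composes commands such as get_kern_net from the variables above, the sketch below mimics U-Boot's ${var} expansion with ordinary shell variables (illustrative only; the values are taken from the example environment):

```shell
# Stand-ins for 'setenv' values from the example environment above
tftp_root='mcsdk3_0_2_14'
name_kern='uImage'
addr_kern='0x88000000'
get_kern_net='dhcp ${addr_kern} ${tftp_root}/${name_kern}'

# expand: resolve ${...} references the way U-Boot's 'run' would (sketch only)
expand() { eval "printf '%s\n' \"$1\""; }

expand "$get_kern_net"   # prints: dhcp 0x88000000 mcsdk3_0_2_14/uImage
```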


Known Issues & Limitations[edit]

These are the known (temporary) limitations in the current release (ti_Open MPI_1.0.0.21):

  • Maximum of 180 nodes: The maximum number of participating (K2H) nodes per communication world is 180.
  • Multiple ranks per SoC are not allowed: Currently, only one MPI rank is allowed per SoC.
  • SRIO interfaces currently run at 3.125 Gbps: No MPI changes are expected for higher speeds (e.g., 5 Gbps).
  • RDMA operations: MPI RDMA APIs are not natively supported in this release. Please use the "--mca osc pt2pt" option instead.
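As a concrete illustration of the last point, the sketch below assembles an mpirun command line that selects the pt2pt one-sided component. The helper function, application name, and hostnames are placeholders, not part of the SDK:

```shell
# build_mpirun_cmd: assemble an mpirun invocation that avoids native RDMA
# one-sided operations via '--mca osc pt2pt' (app and hosts are placeholders)
build_mpirun_cmd() {
  app="$1"; shift
  # join the remaining arguments (hostnames) with commas
  hosts=$(IFS=,; printf '%s' "$*")
  printf 'mpirun --mca osc pt2pt -np %d -host %s %s\n' "$#" "$hosts" "$app"
}

# build_mpirun_cmd ./my_mpi_app k2h-node1 k2h-node2
# prints: mpirun --mca osc pt2pt -np 2 -host k2h-node1,k2h-node2 ./my_mpi_app
```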

