MCSDK HPC 3.x Troubleshooting
HPC (High Performance Computing) Development Tools for MCSDK
Version 3.0
Troubleshooting
Last updated: 07/28/2014
Frequently Asked Questions
Listed below are some Frequently Asked Questions.
- Can I run OpenMPI demos on a single K2H node?
- Remove k2hnode2 from the host list, i.e. change "-host k2hnode1,k2hnode2" to "-host k2hnode1"
- If the demo uses OpenCL (such as the fftdemos), specify the number of processes as 1 (i.e. change "-np 2" to "-np 1")
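The two changes above can be sketched on a sample launch line. This is only an illustration: the `mpirun` command string and the `./testmpi` path are assumed placeholders, not the exact command line used by any particular demo.

```shell
# Hypothetical two-node launch line; ./testmpi stands in for the actual demo binary.
two_node='mpirun -np 2 -host k2hnode1,k2hnode2 ./testmpi'

# Drop the second host so that every rank runs on k2hnode1.
single_node=$(printf '%s\n' "$two_node" | sed 's/-host k2hnode1,k2hnode2/-host k2hnode1/')

# For OpenCL-based demos, additionally reduce the process count to 1.
single_node_opencl=$(printf '%s\n' "$single_node" | sed 's/-np 2/-np 1/')

echo "$single_node"
echo "$single_node_opencl"
```

The first line printed keeps "-np 2" and only shortens the host list; the second line is the variant for demos that use OpenCL.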
- I see the error "CMEM Error: init: Failed to open /dev/cmem: 'No such file or directory'" when I try to run one of the demos
- How can I check if CMEM module is properly inserted?
Also, check if the file "/etc/modules-load.d/cmem.conf" exists on the EVM filesystem. If not, you either missed "make install" or the "make install" step didn't succeed.
dmesg | grep -A 10 CMEMK
==> This shows whether the kernel attempted to insert CMEMK
cat /dev/mtd1 | grep mem_reserve
==> This should read 1536M (if not, please change the U-Boot variable)
cat /dev/mtd1 | grep ddr3a_size
==> Needs to be set if the user has >2 GB of memory
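The checks above could be collected into two small helper functions. This is a minimal sketch, not part of the MCSDK: the function names `check_cmem_files` and `uboot_var` are hypothetical, and `uboot_var` assumes that the U-Boot environment in /dev/mtd1 is stored as NUL-separated "name=value" strings.

```shell
# check_cmem_files [DEV] [CONF]: report whether the CMEM device node and the
# module-load config file exist. Returns 0 only if both are present.
check_cmem_files() {
    dev=${1:-/dev/cmem}
    conf=${2:-/etc/modules-load.d/cmem.conf}
    rc=0
    if [ -e "$dev" ]; then
        echo "OK: $dev exists"
    else
        echo "MISSING: $dev (CMEM module not inserted?)"
        rc=1
    fi
    if [ -f "$conf" ]; then
        echo "OK: $conf exists"
    else
        echo "MISSING: $conf (was \"make install\" run successfully?)"
        rc=1
    fi
    return "$rc"
}

# uboot_var ENVFILE NAME: print the value of variable NAME from an environment
# image, assuming NUL-separated "name=value" entries. The first entry directly
# follows the binary header, so it may not be matched by this simple pattern.
uboot_var() {
    tr '\000' '\n' < "$1" | sed -n "s/^$2=//p"
}
```

On the EVM, `check_cmem_files` with no arguments uses the default paths from the text, and `uboot_var /dev/mtd1 mem_reserve` should print 1536M if the variable is set as recommended.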
- Do I need sudo access?
- Can two users simultaneously use one K2H node? What are the restrictions?
- I have the environment variables set properly. However, when I run make, I get errors as if environment variables are not set
- Hyperlink shows “failed” for some steps. Is this as expected?
Yes, this is expected. A "failed" message means that probing of a particular Hyperlink interface failed: initialization of that interface times out because the remote peer has not opened its end of the link. For example, running testmpi on node1 and node4 can produce the following console output:
<syntaxhighlight lang="bash">
[fpc1n1:02505] fpc1n1, hyplnk0 attempt opening
[fpc1n4:02529] fpc1n4, hyplnk1 attempt opening
Timeout exceeded for connection, exiting...
Peripheral setup failed, retVal = 3
Peripheral init failed for: arm-remote-hyplnk-0
hyplnk open failed
[fpc1n1:02505] fpc1n1, hyplnk0 open failed
[fpc1n1:02505] fpc1n1, hyplnk1 attempt opening
Timeout exceeded for connection, exiting...
Peripheral setup failed, retVal = 3
Peripheral init failed for: arm-remote-hyplnk-1
hyplnk open failed
[fpc1n4:02529] fpc1n4, hyplnk1 open failed
[fpc1n4:02529] fpc1n4, hyplnk0 attempt opening
[fpc1n1:02505] fpc1n1, hyplnk0=(nil) hyplnk1=0x69570
[fpc1n1:02505] fpc1n1, HLINK turned ON !!!
[fpc1n4:02529] fpc1n4, hyplnk0=0x69548 hyplnk1=(nil)
[fpc1n4:02529] fpc1n4, HLINK turned ON !!!
Hello world from processor fpc1n4, rank 1 out of 2 processors
locally obtained hostname fpc1n4
Hello world from processor fpc1n1, rank 0 out of 2 processors
locally obtained hostname fpc1n1
</syntaxhighlight>
In this example, opening hyplnk0 failed on node1 because node2 (whose port 1 is connected to port 0 of node1) was not started. Likewise, hyplnk1 on node4 failed because it is connected to hyplnk1 of node3, which was also not started. hyplnk1 on node1 and hyplnk0 on node4 succeeded because they are connected to each other and both ports were activated.
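To read such a log quickly, the per-node summary lines can be reduced to an up/down report. The helper below is only a sketch: `summarize_hyplnk` is a hypothetical name, and it assumes that a "(nil)" handle in a "hyplnk0=... hyplnk1=..." line means the port did not open, as in the example above.

```shell
# summarize_hyplnk: read an MPI console log on stdin and print, for each node,
# which Hyperlink ports came up. Assumes "(nil)" marks a port that did not open.
summarize_hyplnk() {
    # Keep only lines like "[host:pid] host, hyplnk0=X hyplnk1=Y" -> "host X Y".
    sed -n 's/^\[[^]]*\] \([^,]*\), hyplnk0=\([^ ]*\) hyplnk1=\([^ ]*\)$/\1 \2 \3/p' |
    while read -r node p0 p1; do
        s0=up
        s1=up
        if [ "$p0" = "(nil)" ]; then s0=down; fi
        if [ "$p1" = "(nil)" ]; then s1=down; fi
        echo "$node: hyplnk0 $s0, hyplnk1 $s1"
    done
}

# Usage with the two summary lines from the example log.
summarize_hyplnk <<'EOF'
[fpc1n1:02505] fpc1n1, hyplnk0=(nil) hyplnk1=0x69570
[fpc1n4:02529] fpc1n4, hyplnk0=0x69548 hyplnk1=(nil)
EOF
```

For the example log this reports hyplnk0 down and hyplnk1 up on fpc1n1, and the reverse on fpc1n4, which matches the explanation above: one working link between the two started nodes is sufficient.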