
Desktop-linux-sdk 01.00.00.02 Development Guide



Desktop Linux SDK


Version 1.0.0.2 Alpha Release

Development Guide

Last updated: 09/20/2012


Introduction

The Desktop Linux SDK provides a software development environment that helps offload highly compute-intensive processing from a desktop Linux PC to TI C66x multicore DSPs. The Desktop Linux SDK works with single-, quad- and octal-DSP PCIe cards.

The intended audience for this document is developers who plan to build applications using the Desktop Linux SDK. This document provides an overview of the SDK software architecture, the SDK release package and the software components/modules in the SDK, and walks through the out-of-box demo supplied with the package. It also discusses the data I/O schemes for exchanging data and control messages between the host processor and the DSPs. The SDK abstracts DSP download, memory management, the PCIe interface, and data and control path complexity behind intuitive user-mode APIs, so that developers can focus on application development. The out-of-box demo can be used as a starting point for new applications.

Desktop Linux SDK Software Architecture

As shown in the figure below, the Desktop Linux SDK software has two components: one runs on the Linux host and the other runs on the DSP. The DSP and the host processor share common header files that define how messages are interpreted and which DSP addresses are reserved for host memory management. The sample out-of-box application provided with the release is a simple single-threaded application that demonstrates the initialization procedure, the API usage of the SDK components, and how functionality can be offloaded to the DSP.

Figure: Desktop Linux SDK software architecture

Most real-world host applications will be multi-threaded, so that the DSPs run in parallel while the host processor prepares subsequent data chunks for DSP processing and/or consumes processed data from the DSPs. The SDK APIs are thread-safe and can be called from multiple threads concurrently. A typical user application has two threads per device (one TX and one RX) in order to use all PCIe lanes concurrently. As explained in the Mailbox section later in this document, each mailbox holds multiple messages, which allows tasks to be queued to the DSPs.

Desktop Linux SDK Release Package

The host processor package includes the following modules:

  • Buffer Manager
  • Mailbox
  • Download Manager
  • Contiguous Memory Driver
  • PCIe User Space Driver
  • Out of Box Demo application

The DSP package includes the following modules:

  • Mailbox
  • CCS project to build the .out file
  • Demo application source code used to build the .out file

Developers can use these components as-is in their final application/solution. The following sections provide more insight into each of these components and into the out-of-box demo application.

Data I/O between Host and DSP

As functionality is offloaded to the DSP, the host processor has to send data to the DSPs for processing and receive the results. This data exchange between the host processor and the DSP can be achieved in two ways:

  1. The host processor's data buffer is copied into the DSP's DDR before the DSP starts processing.
  2. The DSP reads the data directly from the host memory buffer as it consumes the data (the host memory is made visible to the DSP for the lifetime of the processing of that data).

The PCIe driver provided in the SDK supports both mechanisms. In addition, a region of host memory can be reserved and mapped permanently so that it is visible to all DSP chips/cores. Such mapped memory can be used as "global shared" memory. Any access to the shared memory must be protected by acquiring the lock before the access and releasing it afterwards.

Using the DSP's EDMA to copy data between the host and the DSP is efficient. When the host processor wants to send/receive a data buffer to/from the DSP, the x86 host programs the DSP's EDMA engine to initiate the data I/O (the demo application reserves EDMA channel controller 0 and the first few PaRAM sets for this). For EDMA to work, the source and destination buffers must be contiguous in physical memory. The DSP's memory (DDR) is contiguous anyway; on the host, it is recommended to allocate physically contiguous buffers using the CMEM module supplied with the SDK instead of using malloc.

Desktop Linux SDK components

This chapter provides a high-level overview of each SDK component.

Buffer Manager (bufMgr)

As developers are aware, repeated generic malloc/free calls with varying sizes cause memory fragmentation. To circumvent this problem, the buffer manager (bufMgr) was created. bufMgr divides memory into equal-sized chunks, so memory can be allocated and freed without any fragmentation. Multiple pools (all chunks within a pool have the same size) are needed to emulate malloc/free. As an example, say we have 3 pools:

  • Pool 0: 400 chunks, 32 KB each
  • Pool 1: 520 chunks, 64 KB each
  • Pool 2: 250 chunks, 100 KB each

Requests of up to 32 KB are served from Pool 0, and any request larger than 100 KB is an error, because no pool was created that can hold chunks larger than 100 KB. Each pool has a pool handle (poolHandle), which must be supplied to all bufMgr APIs (create, alloc, free) so that the operation is performed on the appropriate buffer pool. The number of pools, the chunk size of each pool and the number of chunks in each pool are configurable by the application. bufMgr can manage any type of memory (x86, DSP, PCIe, etc.), and a typical application has multiple pools. bufMgr also supports single-alloc, multiple-free semantics for buffers with multiple consumers: the number of consumers is supplied at allocation time, and the buffer is actually freed only when all consumers have called free().
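The pool selection and the single-alloc, multiple-free behaviour described above can be sketched in C as follows. This is a minimal illustration under stated assumptions, not bufMgr code: `pool_cfg`, `select_pool`, `buf_hdr`, `buf_alloc` and `buf_free` are hypothetical names, the pool table is the three-pool example from this section, and a real application would instead pass the appropriate pool handle to the bufMgr APIs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pool description: chunk size in bytes and number of chunks.
 * These are the three example pools from the text, ordered by chunk size. */
typedef struct {
    size_t chunk_size;
    unsigned num_chunks;
} pool_cfg;

static const pool_cfg example_pools[] = {
    { 32u * 1024u, 400u },  /* Pool 0 */
    { 64u * 1024u, 520u },  /* Pool 1 */
    { 100u * 1024u, 250u }, /* Pool 2 */
};

#define NUM_EXAMPLE_POOLS (sizeof(example_pools) / sizeof(example_pools[0]))

/* Returns the index of the smallest pool whose chunks can hold `size` bytes,
 * or -1 if no pool is large enough (any request > 100 KB here). */
int select_pool(size_t size)
{
    for (size_t i = 0; i < NUM_EXAMPLE_POOLS; i++) {
        if (size <= example_pools[i].chunk_size)
            return (int)i;
    }
    return -1;
}

/* Hypothetical per-buffer header modelling "single alloc, multiple free":
 * the consumer count is set at allocation time and the chunk is returned
 * to its pool only when the last consumer frees it. */
typedef struct {
    unsigned refs; /* consumers that have not yet called free */
    int in_use;    /* 1 while the chunk is allocated */
} buf_hdr;

void buf_alloc(buf_hdr *b, unsigned num_consumers)
{
    assert(num_consumers >= 1); /* at least one consumer must free it */
    b->refs = num_consumers;
    b->in_use = 1;
}

/* Returns 1 if this call actually released the chunk, 0 otherwise. */
int buf_free(buf_hdr *b)
{
    if (!b->in_use)
        return 0; /* already released: ignore the extra free */
    if (--b->refs == 0) {
        b->in_use = 0;
        return 1;
    }
    return 0;
}
```

For example, a 40 KB request maps to Pool 1, and a buffer allocated with two consumers is released only by the second `buf_free` call.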

There are two ways to create a buffer pool:

  1. Create a buffer pool from a discrete array of buffers.
  2. Create a buffer pool from contiguous memory (the buffer manager internally chunks up that large memory based on the supplied chunk size).

Once a pool is created, it behaves the same regardless of which method was used to create it. Memory for the chunks is allocated outside of bufMgr.
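The second creation method, carving one contiguous region into equal-sized chunks, is commonly implemented with an intrusive free list. The sketch below assumes that technique; `chunk_pool`, `pool_create_from_region`, `pool_alloc` and `pool_free` are hypothetical names rather than bufMgr APIs, and, as with bufMgr, the memory itself is allocated by the caller (for example a CMEM buffer).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-size chunk pool built on caller-supplied memory. */
typedef struct {
    void *free_list;   /* first free chunk, or NULL when exhausted */
    size_t chunk_size; /* bytes per chunk */
    size_t num_free;   /* chunks currently available */
} chunk_pool;

/* Read/write the free-list link stored in the first bytes of a free chunk.
 * memcpy avoids alignment and aliasing problems on arbitrary byte buffers. */
static void *get_next(void *chunk)
{
    void *next;
    memcpy(&next, chunk, sizeof next);
    return next;
}

static void set_next(void *chunk, void *next)
{
    memcpy(chunk, &next, sizeof next);
}

/* Splits [base, base + region_size) into chunks of chunk_size bytes and
 * returns the number of chunks created. Leftover bytes are unused. */
size_t pool_create_from_region(chunk_pool *p, void *base,
                               size_t region_size, size_t chunk_size)
{
    assert(chunk_size >= sizeof(void *)); /* room for the free-list link */
    p->free_list = NULL;
    p->chunk_size = chunk_size;
    p->num_free = 0;

    size_t n = region_size / chunk_size;
    /* Push chunks in reverse order so the first allocation returns base. */
    for (size_t i = n; i > 0; i--) {
        void *chunk = (uint8_t *)base + (i - 1) * chunk_size;
        set_next(chunk, p->free_list);
        p->free_list = chunk;
        p->num_free++;
    }
    return n;
}

/* O(1) allocation: pop the head of the free list (NULL if exhausted). */
void *pool_alloc(chunk_pool *p)
{
    void *chunk = p->free_list;
    if (chunk != NULL) {
        p->free_list = get_next(chunk);
        p->num_free--;
    }
    return chunk;
}

/* O(1) free: push the chunk back onto the free list. */
void pool_free(chunk_pool *p, void *chunk)
{
    set_next(chunk, p->free_list);
    p->free_list = chunk;
    p->num_free++;
}
```

Because allocation and free only pop or push a list head, both are constant-time, and since every chunk in the pool has the same size, a freed chunk can always satisfy the next request from that pool.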

A typical use case is a bufMgr pool created on DSP DDR memory or on PCIe memory. The host allocates a buffer from this pool, fills it with data for DSP processing and queues it to the DSP for processing. Once the DSP finishes processing, the buffer can be recycled.

Mailbox (mailBox)

Mailboxes are used for exchanging control messages between the host and individual DSP cores. As shown in the figure below, a mailbox is unidirectional: either host->DSP or DSP->host. Each mailbox is identified by a unique integer value, which is returned after the mailbox is created on the host and opened on the DSP core. There is a maximum of two mailboxes per DSP core (one for host->DSP messages and one for DSP->host messages). Each mailbox has a configurable number of nodes, each of which can store one pending message.

Figure: Mailbox block diagram

An empty mailbox node must be allocated before a message can be sent. Receiving a message frees its node and marks it as empty. A mailbox can be queried for the number of unread messages it holds.

  • mailBox_init initializes a specific mailbox; it is called by both the DSP and the host.
  • mailBox_create is called only by the host, to create a mailbox.
  • mailBox_open is called by a DSP core to open the mailbox.
  • mailBox_write blocks until an empty node is found, then writes the message to that node.
  • mailBox_read blocks if there are no messages to be processed; when a message is available, it is returned to the application.
  • mailBox_query returns the number of unread messages in the mailbox, as well as the number of messages read from and written to the mailbox.
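The blocking write/read semantics and the query counters can be modelled in a single process with POSIX threads. This is a hypothetical sketch, not the SDK's `mailBox_*` implementation: the names `mbox_t`, `mbox_init`, `mbox_write`, `mbox_read`, `mbox_query` and `mbox_demo_sum`, the depth `MBOX_NODES` and the 32-bit message type are all assumptions, and a real host/DSP mailbox lives in memory shared over PCIe rather than being synchronized with a pthread mutex. The demo function also shows the TX/RX threading pattern recommended for host applications.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define MBOX_NODES 4 /* hypothetical mailbox depth (number of nodes) */

/* Hypothetical in-process model of one unidirectional mailbox. */
typedef struct {
    uint32_t node[MBOX_NODES]; /* one pending message per node */
    unsigned head, tail;       /* next node to read / next node to write */
    unsigned unread;           /* nodes currently holding a message */
    unsigned long reads, writes;
    pthread_mutex_t lock;
    pthread_cond_t not_full, not_empty;
} mbox_t;

void mbox_init(mbox_t *m)
{
    m->head = m->tail = m->unread = 0;
    m->reads = m->writes = 0;
    int rc = pthread_mutex_init(&m->lock, NULL);
    rc |= pthread_cond_init(&m->not_full, NULL);
    rc |= pthread_cond_init(&m->not_empty, NULL);
    assert(rc == 0); /* default attributes: initialization should not fail */
}

/* Blocks until an empty node is available, then stores the message in it. */
void mbox_write(mbox_t *m, uint32_t msg)
{
    pthread_mutex_lock(&m->lock);
    while (m->unread == MBOX_NODES)
        pthread_cond_wait(&m->not_full, &m->lock);
    m->node[m->tail] = msg;
    m->tail = (m->tail + 1) % MBOX_NODES;
    m->unread++;
    m->writes++;
    pthread_cond_signal(&m->not_empty);
    pthread_mutex_unlock(&m->lock);
}

/* Blocks until a message is pending, then returns it and frees its node. */
uint32_t mbox_read(mbox_t *m)
{
    pthread_mutex_lock(&m->lock);
    while (m->unread == 0)
        pthread_cond_wait(&m->not_empty, &m->lock);
    uint32_t msg = m->node[m->head];
    m->head = (m->head + 1) % MBOX_NODES;
    m->unread--;
    m->reads++;
    pthread_cond_signal(&m->not_full);
    pthread_mutex_unlock(&m->lock);
    return msg;
}

/* Reports unread messages plus the total number of reads and writes. */
void mbox_query(mbox_t *m, unsigned *unread, unsigned long *reads,
                unsigned long *writes)
{
    pthread_mutex_lock(&m->lock);
    *unread = m->unread;
    *reads = m->reads;
    *writes = m->writes;
    pthread_mutex_unlock(&m->lock);
}

/* Demo: a TX thread writes 1..n while the caller (RX side) reads n messages.
 * With n > MBOX_NODES the writer blocks until the reader frees nodes. */
typedef struct { mbox_t *m; unsigned n; } tx_args;

static void *tx_thread(void *arg)
{
    tx_args *a = arg;
    for (unsigned i = 1; i <= a->n; i++)
        mbox_write(a->m, i);
    return NULL;
}

unsigned long mbox_demo_sum(unsigned n)
{
    mbox_t m;
    mbox_init(&m);
    tx_args a = { &m, n };
    pthread_t tx;
    if (pthread_create(&tx, NULL, tx_thread, &a) != 0)
        return 0; /* thread creation failed */
    unsigned long sum = 0;
    for (unsigned i = 0; i < n; i++)
        sum += mbox_read(&m);
    pthread_join(tx, NULL);
    pthread_cond_destroy(&m.not_empty);
    pthread_cond_destroy(&m.not_full);
    pthread_mutex_destroy(&m.lock);
    return sum;
}
```

With a depth of 4 and 10 messages, the TX thread is forced to wait whenever all nodes are full, which is the same back-pressure a multi-node mailbox gives a host that queues work ahead of the DSP.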


Download Manager (dnldmgr)

The Download Manager provides APIs to download code to the DSPs and to reset them. On boot-up, the DSP is out of reset and in an idle loop, waiting for an entry point to be set. The download manager API downloads code and writes the entry point. First, init.hex is downloaded; it initializes DDR and the Ethernet switch, and it must be loaded before the application code. Once the init code finishes DDR and Ethernet switch initialization, the DSP waits for the next entry point. At this point, the application program must be loaded through the download manager API (only the hex file format is currently supported).

The reset API resets the DSP, downloads the boot code and jumps to it; the boot code then waits for an entry point so that new DSP code can be downloaded. Developers should refer to the download manager APIs and their usage in the demo application.

The dnldmgr_reset_dsp API downloads the boot code and brings the DSP out of reset. The boot code loops until the entry point of the application is set.

The dnldmgr_load_image API downloads an image to memory, writes the entry point and interrupts the DSP to bring it out of the idle loop.
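The required ordering (reset, then init.hex, then the application) can be captured as a small state machine. This sketch only models the sequence described above; `dsp_state`, `image_kind`, `model_reset` and `model_load` are hypothetical and do not call the actual download manager APIs.

```c
#include <assert.h>

/* Hypothetical DSP boot states modelling the ordering described above.
 * After power-up or a reset, the boot code waits for an entry point. */
typedef enum {
    DSP_WAIT_INIT,  /* boot code idling, waiting for an entry point */
    DSP_WAIT_APP,   /* init.hex has initialized DDR and the Ethernet switch */
    DSP_RUNNING_APP /* application image loaded and its entry point taken */
} dsp_state;

/* Hypothetical image kinds: the init image and an application image. */
typedef enum { IMG_INIT, IMG_APP } image_kind;

/* Models a reset (the step dnldmgr_reset_dsp performs in the SDK):
 * boot code is downloaded and the DSP waits for an entry point. */
dsp_state model_reset(void)
{
    return DSP_WAIT_INIT;
}

/* Models loading an image (the step dnldmgr_load_image performs).
 * init.hex must come first, then the application; any other order, or a
 * load while an application is running, is rejected and the state is
 * returned unchanged. */
dsp_state model_load(dsp_state s, image_kind img)
{
    assert(img == IMG_INIT || img == IMG_APP);
    if (s == DSP_WAIT_INIT && img == IMG_INIT)
        return DSP_WAIT_APP;
    if (s == DSP_WAIT_APP && img == IMG_APP)
        return DSP_RUNNING_APP;
    return s; /* rejected: reset the DSP before downloading new code */
}
```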

Contiguous Memory Driver

memcpy-based PCIe transfers are very inefficient in terms of throughput, so it is advisable to use DMA for data transfers between the DSP and the host processor. A prerequisite for DMA is that the memory be contiguous in physical memory, and memory allocated using malloc is not guaranteed to be physically contiguous. Ubuntu Linux 12.04 does not provide any user-mode APIs to allocate physically contiguous memory, so a kernel-mode driver is necessary; the CMEM module provides this. It consists of a kernel module, which is inserted to allocate/free physically contiguous memory, and a user-mode part that provides the APIs the application invokes.

The CMEM driver supports four user-mode APIs (open, close, alloc and free). In kernel space, dma_alloc_coherent is currently invoked to allocate contiguous memory. Alloc uses mmap to map the physical memory to a user-space address. The application can fill the buffers with data (for example from a network or disk read) through the user-mode address, and DMA can be initiated on the same buffer using the corresponding physical address.

Two types of physical memory can be allocated using the CMEM driver:

  1. Persistent: the same physical memory is always returned. This is useful during development: when the host process exits and restarts, it is not necessary to reset or re-download the DSPs.
  2. Dynamic: the application has to make clean-up calls to free the memory when exiting.

Persistent memory allocation may be deprecated in the future.

As the driver performs no garbage collection or compaction, calling alloc/free with different sizes will eventually fragment memory. It is therefore recommended to allocate all contiguous buffers up front at process start and use the buffer manager to manage them (bufMgr is a fragmentation-free memory management system; see the Buffer Manager section for details). At the end of the process, free the buffers back to the kernel module and unload the module.

Linux restricts a single physically contiguous allocation to at most 4 MB. If an application needs a buffer larger than 4 MB, multiple CMEM buffers must be allocated and grouped together in a descriptor. These discontiguous 4 MB buffers can then be mapped so that they appear as one contiguous buffer in the DSP address space.
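The descriptor idea can be illustrated with a minimal C sketch that computes how many 4 MB buffers a large allocation needs and translates an offset in the logical (DSP-contiguous) buffer into the physical address to use for DMA. `cmem_chunk`, `cmem_chunks_needed` and `desc_offset_to_phys` are hypothetical; the real CMEM descriptor format is not described in this guide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CMEM_MAX_CHUNK ((size_t)4 * 1024 * 1024) /* 4 MB per contiguous buffer */

/* Hypothetical descriptor entry: one physically contiguous CMEM buffer. */
typedef struct {
    void *user_addr;    /* user-space address (from the mmap done by alloc) */
    uint64_t phys_addr; /* physical address used to program DMA */
    size_t size;        /* bytes, at most CMEM_MAX_CHUNK */
} cmem_chunk;

/* Number of 4 MB buffers needed to back a logical buffer of `size` bytes. */
size_t cmem_chunks_needed(size_t size)
{
    return (size + CMEM_MAX_CHUNK - 1) / CMEM_MAX_CHUNK;
}

/* Translates an offset into the logical buffer (the view that is contiguous
 * in DSP address space) into the physical address of that byte.
 * Returns 1 and sets *phys on success, 0 if the offset is out of range. */
int desc_offset_to_phys(const cmem_chunk *desc, size_t n, size_t offset,
                        uint64_t *phys)
{
    for (size_t i = 0; i < n; i++) {
        assert(desc[i].size <= CMEM_MAX_CHUNK);
        if (offset < desc[i].size) {
            *phys = desc[i].phys_addr + offset;
            return 1;
        }
        offset -= desc[i].size;
    }
    return 0;
}
```

For example, a 10 MB buffer needs three CMEM buffers (4 MB + 4 MB + 2 MB), and logical offset 4 MB + 4 falls 4 bytes into the second buffer.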

PCIe Userspace Driver

The Ubuntu Linux kernel already has a PCIe driver, so no separate kernel module is needed to access PCIe devices. However, a user-mode PCIe driver is necessary. The functionality provided by the user-mode driver includes:

  • Search for and find TI C667x PCIe devices in the system
  • Probe the devices to activate them
  • Enable communication with the DSP through the PCIe BAR region windows by mapping them to user space
  • Enable inbound and outbound access on the DSP through PCIe
  • Initialize the allocation of DMA resources
  • Support reads and writes to DSP memory
  • Support data transfers to DSP memory through the DSP DMA
  • Support mapping of host contiguous memory into the DSP memory map
  • Set the entry point during DSP startup
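As an illustration of the first item, a user-space program on Linux can find PCI functions by vendor through sysfs, where each device directory under /sys/bus/pci/devices has a `vendor` file containing a hex ID such as `0x104c` (the Texas Instruments PCI vendor ID). This is a hypothetical sketch of the idea, not the SDK driver's code: `parse_vendor_id` and `count_pci_vendor` are illustrative names, and the sketch does not filter on device ID because the C667x device IDs are not listed in this guide.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>

#define TI_PCI_VENDOR_ID 0x104cu /* Texas Instruments PCI vendor ID */

/* Parses the contents of a sysfs "vendor" file such as "0x104c\n".
 * Returns 1 on success, 0 if no hex digits were found. */
int parse_vendor_id(const char *text, unsigned *out)
{
    char *end;
    unsigned long v = strtoul(text, &end, 16); /* base 16 accepts "0x" */
    if (end == text)
        return 0;
    *out = (unsigned)v;
    return 1;
}

/* Counts PCI functions whose vendor ID matches `vendor` by reading
 * <root>/<device>/vendor, e.g. root = "/sys/bus/pci/devices".
 * Returns -1 if the directory cannot be opened. */
int count_pci_vendor(const char *root, unsigned vendor)
{
    assert(root != NULL);
    DIR *dir = opendir(root);
    if (dir == NULL)
        return -1;
    int count = 0;
    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {
        if (ent->d_name[0] == '.')
            continue; /* skip "." and ".." */
        char path[512];
        snprintf(path, sizeof path, "%s/%s/vendor", root, ent->d_name);
        FILE *f = fopen(path, "r");
        if (f == NULL)
            continue;
        char buf[32];
        unsigned id;
        if (fgets(buf, sizeof buf, f) != NULL && parse_vendor_id(buf, &id) &&
            id == vendor)
            count++;
        fclose(f);
    }
    closedir(dir);
    return count;
}
```

A call such as `count_pci_vendor("/sys/bus/pci/devices", TI_PCI_VENDOR_ID)` reports how many TI PCI functions the kernel has enumerated; a real driver would then open and map the BAR regions of the matching devices.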

Filetest Demo

A simple out-of-box demo is provided with the package. The demo has two goals:

  • Demonstrate the API usage of the SDK modules
  • Provide a starter kit to jump-start offloading tasks from the host processor to the DSPs

The filetest demo test sequence is as follows:

  • Host reads data chunks from the input file
  • Host allocates input and output buffers
  • Host writes the data to the input buffer
  • Host creates a message with the input and output pointers
  • Host sends the message through the mailbox to a DSP core
  • DSP receives the mailbox message
  • DSP takes the input and output pointers from the message and reads the data from the input buffer
  • DSP processes the data (currently a simple copy to the output buffer)
  • DSP sends a message with the input and output pointers back to the host
  • Host receives the message through the mailbox
  • Host reads the data from the output pointer and writes it to the output file
  • Host frees the input and output buffers

The demo uses a simple scheduler that sends data to all configured cores in round-robin fashion.
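A round-robin scheduler of this kind only needs a cursor over all (device, core) pairs. In the following sketch, `rr_sched` and `rr_next` are hypothetical names, and visiting all cores of one device before moving to the next device is an assumption; the demo may use a different order.

```c
#include <assert.h>

/* Hypothetical round-robin scheduler over all configured DSP cores. */
typedef struct {
    unsigned num_devices;
    unsigned cores_per_device;
    unsigned next; /* flat index of the next core to use */
} rr_sched;

/* Returns the next core as (device, core) and advances the cursor,
 * wrapping to core 0 of device 0 after the last core of the last device. */
void rr_next(rr_sched *s, unsigned *device, unsigned *core)
{
    unsigned total = s->num_devices * s->cores_per_device;
    assert(total > 0);
    *device = s->next / s->cores_per_device;
    *core = s->next % s->cores_per_device;
    s->next = (s->next + 1) % total;
}
```

Interleaving devices instead (device 0 core 0, device 1 core 0, and so on) would spread consecutive chunks across the PCIe links of different cards; that variant only changes how the flat index is split into device and core.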

The demo has a command-line flag to select the type of transfer used to and from the DSP:

  1. Mem copy test: input and output buffers are allocated from a DSP DDR memory pool. Data is loaded into DDR (using the PCIe DMA APIs) before the DSP starts processing.
  2. DSP map test: input and output buffers are allocated from a host contiguous memory pool. Data is passed to the DSP by mapping the contiguous memory into the DSP memory map.

The mailbox message contains pointers to the input and output buffers. Each pointer points either to DDR space or to the PCIe space visible from the DSP.
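Because the message definition lives in a header shared by the host and the DSP, it helps to use only fixed-width fields so that both compilers lay it out identically. The sketch below is hypothetical: `demo_msg`, `make_demo_msg`, the field names, the `xfer_mode` values for the two demo tests and the use of 32-bit DSP-side addresses are assumptions, and it also assumes the host and the DSP use the same byte order.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical transfer modes matching the two demo tests. */
typedef enum {
    XFER_MEMCPY = 0, /* buffers in DSP DDR, data moved by DMA beforehand */
    XFER_DSP_MAP = 1 /* buffers in host CMEM, mapped into DSP address space */
} xfer_mode;

/* Hypothetical message layout shared by host and DSP through a common
 * header. Only fixed-width fields are used so that the 64-bit x86 host and
 * the 32-bit C66x DSP agree on sizes and offsets. */
typedef struct {
    uint32_t mode;        /* an xfer_mode value */
    uint32_t input_addr;  /* DSP address of the input buffer (DDR or PCIe window) */
    uint32_t input_size;  /* bytes */
    uint32_t output_addr; /* DSP address of the output buffer */
    uint32_t output_size; /* bytes */
} demo_msg;

_Static_assert(sizeof(demo_msg) == 20, "host and DSP must agree on layout");

demo_msg make_demo_msg(xfer_mode mode, uint32_t in, uint32_t in_size,
                       uint32_t out, uint32_t out_size)
{
    assert(out_size >= in_size); /* the demo's copy needs room for the input */
    demo_msg m = { (uint32_t)mode, in, in_size, out, out_size };
    return m;
}
```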

Related Documents

Technical Support and Product Updates

For technical discussions and issues, please visit the TI E2E Community at http://e2e.ti.com.

Note: When asking for help in the forum, tag your posts in the Subject with “DESKTOP-LINUX-SDK” and the part number (e.g. “C6678”).

  • For technical support on MultiCore devices, please post your questions in the C6000 MultiCore Forum.
  • For questions related to the BIOS MultiCore SDK (MCSDK), please use the BIOS Forum.
