NOTICE: The Processors Wiki will End-of-Life on January 15, 2021. It is recommended to download any files or other content you may need that are hosted on processors.wiki.ti.com. The site is now set to read only.

User:Renuka.parlapalli

From Texas Instruments Wiki

Overview

Texas Instruments' Medical Imaging Demo Application Starter (MIDAS) illustrates the integration of key medical imaging algorithm modules on Texas Instruments (TI) DSPs and Systems-on-Chip. The MIDAS Optical Coherence Tomography (OCT) v1.0 demo illustrates a system-level implementation of the mid-end and back-end portions of OCT signal processing on a homogeneous six-core C6472 multicore TI DSP and a PC. We use the low-cost TI C6472 EVM interfaced to the PC via a direct Ethernet connection. The TMS320C6472 EVM consists of six high-performance C64x+ DSP cores, each running at up to 700 MHz, held together by a high-speed switch fabric; the device consumes approximately 6 watts. The C6472 provides 768 KB of shared L2 memory and an internal mechanism for sharing data among cores. The on-chip DMA engine allows automated movement of data between peripherals and memory, and the device also contains high-speed I/O ports such as SRIO and Gigabit Ethernet. Integrated with the PC, it provides a power-efficient solution that handles both the system-controller and back-end processing functions in diagnostic OCT systems for a fraction of the power needed by Graphics Processing Units (GPUs).


Figure 1 shows the function of the cores on the C6472 and the PC. This version incorporates the Optical Coherence Tomography (OCT) v1.0 demo, which integrates components from the OCT signal chain, including Background Subtraction, Re-sampling, Real-to-Complex FFT, Magnitude Computation and Log Compression, from the Embedded Processor Software Toolkit for Medical Imaging Applications v2.0, on the TMS320C6472 EVM.

OCTDemo FunctionalOverview.png

This document covers various aspects of the demo, including a discussion of the software design, step-by-step instructions to obtain the source code, and instructions to set up the development environment to build and run the demo.

Software Implementation
[edit]

This section discusses the software implementation for OCT v1.0 and showcases how TI's software components, including the Multicore Software Development Kit (MCSDK), Codec Engine (CE), DaVinci Multimedia Application Interface (DMAI) and iUniversal APIs, can be leveraged by developers to create applications for such systems.

The block diagram showcases the TI production software components that the Multicore DSP application relies on.

OCTDemoMCDSPComponents.png

TI's SYS/BIOS 6.x is a highly configurable real-time operating system that caters to a variety of embedded processors and is included as part of TI's Code Composer Studio integrated development environment. SYS/BIOS provides key features that enable easy memory management, preemptive multitasking and real-time analysis. Based on the application's requirements, developers can optimize their final runtime image by including or excluding specific SYS/BIOS modules.

The Multicore Software Development Kit (MCSDK) includes key components that ease multicore development, including the chip support library, low-level drivers, the platform development kit (PDK), the Network Developer's Kit (NDK), etc. The Codec Engine, which we describe in more detail later, provides a framework and APIs to easily plug-and-play algorithms, and handles Inter-Processor Communication (IPC) under the hood.

Multicore DSP (Mid End)
[edit]

The software application that runs on the C6472 is based on a master/slave model, where Core 0 acts as the centralized controlling core (the master core), and Cores 1 through 5 act as the slave cores. In this demo we utilize all six cores. Core 0, the master, takes care of synchronization and sets up buffer pointers, while the signal-processing modules run across Cores 1 through 5. Note, however, that even though the distribution is done statically, the assignment of algorithms to cores is done outside of the main application. This allows easy reconfiguration, and at the application software level the developer can be agnostic to which core is running which algorithm.

Functional Overview[edit]

The software application on C6472's Core 0 is designed to integrate the following functional blocks: Front End Interface, Mid End Controller, Mid End Processing and Back End Interface.

OCT Demo Application Design.png

Mid End Controller
[edit]

As the name suggests, the Mid End Controller is responsible for initializing and initiating the other blocks.

Front End Interface[edit]

The Front End Interface serves two primary functions: it provides periodic events that mark the availability of incoming input data, and it provides functions to access the data that has arrived. Since this version of the demo, OCT v1.0, showcases processing blocks after data acquisition and has no front-end implementation, it is necessary to mimic the function of the OCT front end in a real system, where input frames would be continuously received at a set acquisition frame rate. The Front End Interface serves this role: it fires an INPUT_RDY event every (1/acquisition rate) seconds. The clock ticks are derived from the SYS/BIOS Timer module. Note that though this design uses a frame-based processing model, where the frame boundary defines the input block size, it is also possible to have partial frames as input boundaries.

Mid End Processing[edit]

The Mid End Processing function block pends on the INPUT_RDY event from the Front End Interface and, as soon as a new frame is “received,” initiates processing on that input block. Mid End Processing acts as the Client in the Codec Engine (CE) framework and uses the iUniversal interface to call upon Algorithm Servers that correspond to various functions within the OCT mid-end processing signal chain. In this implementation, there are five algorithm servers, one for each core (Core 1 through Core 5).

Let us now look at the primary execution threads that define the data flow through the Mid End Processing block. The figure below shows three primary tasks that use the MessageQ IPC module for message passing. The messages carry pointers to the data and trigger the execution of tasks in the receiving functions. The actual message buffers are set up in shared memory that both the message sender and the message receiver can access. In this case, process_scatter() pends on a new input frame. When data becomes available, process_scatter() allocates memory for the message from the heap, assigns the message pointer to the input data, and uses the MessageQ_put call to pass the input data pointers to process_gather(). process_gather() uses the corresponding oct_wait calls to pend on the signal processing in each core. Once a new frame is received, these tasks call the UNIVERSAL_process API provided by the iUniversal interface to invoke the oct_c1 through oct_c5 processing algorithms on Core 1 through Core 5, respectively. Once the data is processed, octCluster() uses the MessageQ_put API to pass the output data pointers to Send_data_to_host(). Send_data_to_host() ensures that data atomicity is maintained, so that data that corresponds to a particular frame always stays together.

Note that the control tasks that involve data scattering and gathering, viz. process_scatter() and Send_data_to_host(), run on Core 0, the master core, while the processing itself runs on Cores 1 through 5. It is important to note here that the IPC between cores is handled under the hood via CE and MessageQ; the developer is agnostic to this fact and can simply call the MessageQ APIs for passing data pointers between cores.


OCTMCDistributedProcessing.jpg


Back End Interface[edit]

Once the output data is ready for display, it is passed to the PC, which handles the back-end processing and display. The Mid End application's Back End Interface block provides functions to communicate with the PC back end. For this example implementation, we use the Ethernet ports on the EVM and the PC to interface the two together. To implement the communication protocol in software, we use RDSP, an application written on top of the MCSDK's Network Development Kit (NDK) that allows easy passing of data and parameters between the C6472 and the PC.

CE Implementation Details[edit]

To better understand this, we delve into a brief discussion of CE. The CE framework is essentially a set of APIs used to instantiate and run XDAIS-compliant algorithms. XDAIS is an algorithm standard that DSP programmers should follow to ensure that their algorithms easily plug and play with other algorithms and can be called using CE APIs. CE requires two essential components to operate in tandem: a CE Client and a CE Algorithm Server. In this demo, the master core, Core 0, serves as the CE Client and uses CE APIs to make “remote procedure calls” to CE Algorithm Server executables that reside on DSP Cores 1 through 5. Essentially, the CE Algorithm Server combines the core codec (each core runs the same code on a different set of contiguous lines of the input data) with the other infrastructure pieces (SYS/BIOS, IPC, etc.) to produce an executable (.x64P) that is callable by Core 0, the CE Client. The application on Core 0 invokes the remote algorithms on Cores 1 through 5 using the iUniversal interface, a set of APIs that provides an easy way for XDAIS-compliant, non-VISA (Video, Image, Speech, Audio) algorithms to run under CE.

CE provides some unique features that significantly ease the multicore DSP software development process. One of the primary advantages is that CE eliminates the need for the developer to manually code any Inter-Processor Communication (IPC) details: once the developer configures memory for IPC, CE takes care of the rest under the hood. This is illustrated later in this section with code snippets. CE also captures some key TI hardware features, with resource management for memory and EDMA done via CE. Finally, CE enables code reuse and faster time to market, since applications can be easily ported from TI's current-generation C64x+ based multicore DSPs like the C6472 to next-generation devices based on the C66x architecture, like the C6678. For more information on CE and C66x devices, please follow the relevant links in the References section.

Back End[edit]

This section showcases the software design on the PC for backend processing and display.

GForge Project Page
[edit]

We use the GForge portal to provide access to source files and other related information. The portal also allows us to collaborate with the community.

Our MIDAS project page is located at https://gforge.ti.com/gf/project/med_opticalcoherencetomography/

Hardware
[edit]

Requirements
[edit]

If building the demo from source:

  • PC workstation
  • TMS320C6472 EVM
  • Power supply cable for the EVM (included with the EVM)
  • Ethernet cable (included with the EVM)
  • USB560 JTAG Emulator
  • Power supply cable for the Emulator (included with the Emulator)

Procedure
[edit]

1. Connect an Ethernet cable from the C6472 EVM's (Tomahawk) lower Ethernet port to the PC's Ethernet port
2. Connect the USB560 JTAG Emulator to the JTAG port on the C6472 EVM
3. Ensure that the wireless connection is disabled on the PC

Software Setup
[edit]

Build Process Overview[edit]

Here's an overview of the steps you will take to set up your development environment and build the OCT Demo from source.

1. Download and install OpenCV (version 2.2)

2. Download and install the Tftpd32 application

3. Set the PC's IP address statically: go to Network Connections -> right-click the wired network connection and select Properties -> scroll to Internet Protocol (TCP/IP) and set the IP address to 192.168.1.99 and the Subnet Mask to 255.255.255.0

4. Set the environment variables

Variable name: OpenCVDir Variable value: point to the OpenCV installation

Variable name: DSPLIB_INSTALL_DIR Variable value: point to the DSPLib installation

5. Make the C6472 and emulator connections and launch Code Composer Studio

6. Launch the TI Debugger and connect to the target

7. Import all module libraries (fft, iir, interp, utilCEPkg under miAlgos)

-> Build all libraries

8. Import the server projects - server 1, server 2, server 3, server 4, server 5

-> Unzip server#/plat6472.zip in server# -> Build the corresponding server #

9. Import the midendappOct project

-> Build octlib. Perform gmake from the octlib directory to create the octcore package -> Build all module libraries (fft, iir, interp, utilCEPkg under miAlgos) -> Unzip platformRepo/plat6472.zip in the platformRepo/ directory -> Import the serverlib project and build serverlib (this creates a library based on RDSP for linking the DSP code; the midend application running on Core 0 needs this library) -> Build midendappOct

10. Load midendappOct.out on Core 0 and server1.out through server5.out on Cores 1 through 5

11. Build and run the back end

-> Compile and run the octDemo.sln file through MSVS. Compilation is currently tested on MSVS Express Edition 2008 -> Point the tftpd server to the OCT data location -> Run octDemo.exe from the command prompt. The user chooses from 7 data sets and hits Enter

12. The output image will be displayed on the screen. The command prompt indicates the loading on each core.

Run OCT Demo[edit]

Using Custom Input Data Sets[edit]

Software Design[edit]

Overview[edit]

Benchmarks[edit]

All figures assume the 700 MHz C64x+ core clock. Note that the OCT Demo is implemented such that, at a given instance, only 200 lines x 2048 samples are processed by each core.

  Operation                                                  Cycles       Approx. time
  Configure and obtain parameters for the OCT Demo           28,390,333   40.6 ms
  Process 8 lines of data, 2048 samples per line             380,159      0.54 ms
  Process 200 lines x 2048 samples on one core               8,479,293    12.1 ms

Useful References[edit]