PDK/PDK TDA Datasheet

PDK Drivers

This section provides a brief overview of the device drivers supported in the PDK release.

PDK Driver Features

For details on features, refer to PDK_Requirement_to_Test_Traceability_Report.xlsx in the <PDK_INSTALL>/docs/traceability folder.

VPS Driver VPDMA List Usage

In TDA2xx/TDA2Ex/TDA3xx, each VIP and VPE has a separate VPDMA instance, and each VPDMA in turn has 8 lists. The drivers use these lists as follows:

VPDMA usage
Driver | DMA usage
VIP Capture | One list per port, hence a maximum of 4 lists per VIP (Slice0/1 x PortA/B)
M2M VPE (only for TDA2xx/TDA2Ex/TDA2Px) | Only one list for VPE1



Setup Details

Details | TDA2xx/TDA2Ex/TDA2Px | TDA3xx
SoC Details
Core | IPU1 (M4) core 0 | IPU1 (M4) core 0
Operating speed of Core | 212.5 MHz | 212.5 MHz
Operating speed of VPE | 266 Mpixels/sec | NA
EVM Configuration | TDA2xx: 2 EMIFs Non-Interleaved, DDR3 @ 532 MHz; TDA2Ex/TDA2Px: 1 EMIF Non-Interleaved, DDR3 @ 666 MHz | 1 EMIF Non-Interleaved, DDR3 @ 532 MHz
Optimization Details
Is the Ducati cache enabled? | Yes | Yes
Profile | release | release
M4 compile options (release build) | -g -ms -c -qq -pdsw225 --endian=little -mv7M4 --float_support=vfplib --abi=eabi -eo.oem4 -ea.sem4 --symdebug:dwarf --embed_inline_assembly --emit_warnings_as_errors | Same as TDA2xx
M4 Linker options (release build) | --emit_warnings_as_errors -w -q -u _c_int00 --silicon_version=7M4 -c -x --zero_init=on | Same as TDA2xx
DSP Compile options (release build) | -mv6600 --abi=eabi -q -mi10 -mo -pden -pds=238 -pds=880 -pds1110 --program_level_compile -g --endian=little -eo.oe66 -ea.se66 --emit_warnings_as_errors | Same as TDA2xx
DSP Linker options (release build) | --emit_warnings_as_errors --warn_sections -q -e=_c_int00 --silicon_version=6600 -c | Same as TDA2xx
Is the code and data placed in L2/L3 memory? | No | No
Is the L3 interconnect optimized? | No | No


Resources Details

Resource usage
Details TDA2xx/TDA2Ex/TDA2Px TDA3xx
Timers M4 Internal timer M4 Internal timer
HWI (TDA2xx/TDA2Ex/TDA2Px):
IPU1_23 (DSS DISPC), IPU1_26 (HDMI_IRQ)
IPU1_27 (VIP1), IPU1_28 (VIP2), IPU1_29 (VIP3)
IPU1_30 (VPE1)
IPU1_41 (I2C1), IPU1_42 (I2C2 on TDA2xx, I2C5 on TDA2Ex), IPU1_43 (I2C3), IPU1_48 (I2C4 on TDA2xx/TDA2Ex, I2C5 - only on TDA2xx-MC)
IPU1_57 (MCSPI1), IPU1_58 (MCSPI2), IPU1_59 (MCSPI3), IPU1_60 (MCSPI4)
IPU1_44 (UART1), IPU1_60 (UART2), IPU1_45 (UART3), IPU1_61 (UART4), IPU1_62 (UART5), IPU1_63 (UART6), IPU1_64 (UART7), IPU1_65 (UART8), IPU1_69 (UART9), IPU1_70 (UART10)
HWI (TDA3xx):
IPU1_23 (DSS DISPC), IPU1_27 (VIP1)
IPU1_41 (I2C1), IPU1_48 (I2C4), IPU1_42 (I2C5)
IPU1_64 (MCSPI1), IPU1_65 (MCSPI2)
IPU1_48 (MCSPI3), IPU1_49 (MCSPI4)
IPU1_44 (UART1), IPU1_43 (UART2), IPU1_45 (UART3)

Low Latency HWI (This cannot be preempted or disabled using the Hwi_disable() BIOS API) NA NA
I2C Instances (Starting from 1) I2C1, I2C2, I2C5(for TDA2Ex) (Usage can be controlled from App) I2C1, I2C2 (Usage can be controlled from App)
EDMA Channels (TDA2xx/TDA2Ex/TDA2Px):
UART1 (TX-48, RX-49), UART2 (TX-50, RX-51), UART3 (TX-52, RX-53), UART4 (TX-54, RX-55), UART5 (TX-62, RX-63), UART6 (TX-50, RX-51), UART7 (TX-50, RX-51), UART8 (TX-50, RX-51), UART9 (TX-50, RX-51), UART10 (TX-50, RX-51)
MCSPI1TX - 34, MCSPI1RX - 35, MCSPI2TX - 42, MCSPI2RX - 43, MCSPI3TX - 14, MCSPI3RX - 15, MCSPI4TX - 22, MCSPI4RX - 23 (TDA2XX Instance starting from 1)
EDMA Channels (TDA3xx):
UART1 (TX-48, RX-49), UART2 (TX-50, RX-51), UART3 (TX-52, RX-53)
MCSPI1TX - 34, MCSPI1RX - 35, MCSPI2TX - 42, MCSPI2RX - 43, MCSPI3TX - 14, MCSPI3RX - 15, MCSPI4TX - 22, MCSPI4RX - 23 (TDA3XX Instance starting from 1)
PLLs Used Video1_PLL and HDMI_PLL (All video PLLs configured according to display resolution selected) DSP_EVE_VID_PLL (configured according to display resolution selected)
PRCM Done PRCM Done None (all through GEL file/SBL)
GPIO (TDA2xx/TDA2Ex/TDA2Px):
GPIO4_13, GPIO4_14, GPIO4_15, GPIO4_16 and GPIO6_17 control the video mux select and sensor power on the vision application card
GPIO2_29, GPIO1_4 and GPIO6_7 act as the Demux_FPD_A/B/C control signals on the LVDS multi-deserializer board
GPIO (TDA3xx): None
PinMuxing Details (Usage can be controlled from App) See TDA2xx pdk/packages/ti/drv/vps/src/boards file for details See TDA3xx pdk/packages/ti/drv/vps/src/boards file for details
Memory Requirements (Cache able) See pdk/docs/memstat/tda2xx file for details See pdk/docs/memstat/tda3xx file for details
Memory Requirements (Non Cache able) VIP/VPE Descriptor memory, see Memory Footprint table below VIP Descriptor memory, see Memory Footprint table below
SWI 1 per UART instance in case of DMA or Interrupt mode to handle UART RX/TX ISR 1 per UART instance in case of DMA or Interrupt mode to handle UART RX/TX ISR
Tasks 1 (highest priority) 1 (highest priority)




Memory Footprint

For details on the library code and data sections, refer to the PDK memstat files in the <PDK_INSTALL>/docs/memstat folder. The tables below list the dynamic memory requirements.

TDA2xx Memory Footprint in bytes (Dynamic Heap memories)
Use Case or Example | System Stack (Cached section) | Task Stack (Cached section) | OSAL Objects (Cached section) | VPDMA Descriptor Heap (Non-cached section)
Loopback Example (VIP-DSS) | 1316 | 1764 | 61 Semaphore, 9 HWI | 722880 (Static)
M2M VPE Example | 404 | 1344 | 33 Semaphore, 5 HWI | 722880 (Static)


TDA2Px Memory Footprint in bytes (Dynamic Heap memories)
Use Case or Example | System Stack (Cached section) | Task Stack (Cached section) | OSAL Objects (Cached section) | VPDMA Descriptor Heap (Non-cached section)
Loopback Example (VIP-DSS) | 1236 | 1988 | 69 Semaphore, 11 HWI | 722880 (Static)
M2M VPE Example | 756 | 2574 | 68 Semaphore, 11 HWI | 722880 (Static)


TDA2Ex Memory Footprint in bytes (Dynamic Heap memories)
Use Case or Example | System Stack (Cached section) | Task Stack (Cached section) | OSAL Objects (Cached section) | VPDMA Descriptor Heap (Non-cached section)
Loopback Example (VIP-DSS) | 1220 | 2012 | 59 Semaphore, 7 HWI | 182208 (Static)
M2M VPE Example | 404 | 2104 | 59 Semaphore, 7 HWI | 182208 (Static)


TDA3xx Memory Footprint in bytes (Dynamic Heap memories)
Use Case or Example | System Stack (Cached section) | Task Stack (Cached section) | OSAL Objects (Cached section) | VPDMA Descriptor Heap (Non-cached section)
Loopback Example (VIP-DSS) | 1328 | 1764 | 52 Semaphore, 5 HWI | 108544 (Static)


Software Performance Numbers

SETUP
Profile Clock (MHz) - CTM 425
Platform TDA2XX ES1.0/ES1.1
M4 Clock (MHz) 212.5
Cache Enabled
Build Release
DDR3 (MHz) 532


Summary
Use Case | FPS | Load | MHz
VIP Capture Driver Load (1 Channel 720p60 capture) | 60 | 0.25% | 0.53
VPE M2M Driver (1 Channel 720x240 YUV420SP to 360x240 YUV422I, DEI ON) | 30 | 0.32% | 0.68
DSS Display Driver (1 Video Pipe @720p60 display) | 60 | 0.11% | 0.23


VIP Capture Driver Performance
VIP Capture Driver (1 Channel 720p60 capture)
Operation | Average Ticks | Average Duration (in us) | Max Ticks | Max Duration (in us)
M3 Load per frame (Including App Q/DQ) | 16664 | 41.66 | 32020 | 80.05
Queue | 2637 | 6.59 | 6038 | 15.10
DeQueue | 2441 | 6.10 | 5646 | 14.12


VPE M2M Driver Performance
VPE M2M Driver (1 Channel 720x240 YUV420SP to 360x240 YUV422I, DEI ON)
Operation | Average Ticks | Average Duration (in us) | Max Ticks | Max Duration (in us)
M3 Load per frame (Including App Q/DQ) | 42831 | 107.08 | 73072 | 182.68
Queue | 32046 | 80.12 | 48642 | 121.61
DeQueue | 2416 | 5.37 | 12708 | 31.77


DSS Display Driver Performance
DSS Display Driver (1 Video Pipe @720p60 display)
Operation | Average Ticks | Average Duration (in us) | Max Ticks | Max Duration (in us)
M3 Load per frame (Including App Q/DQ) | 7339 | 18.35 | 14942 | 37.36
Queue | 1528 | 3.82 | 2800 | 7.00
DeQueue | 1341 | 3.35 | 3692 | 9.23
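
The Load and MHz columns of the Summary table can be cross-checked against the average per-frame durations in the three driver tables above. The following Python sketch assumes that the load is simply the time spent in the driver per frame multiplied by the frame rate, and that the MHz figure is that fraction of the 212.5 MHz M4 clock. The function name cpu_load is illustrative and is not part of the PDK.

```python
M4_CLOCK_MHZ = 212.5  # M4 clock from the SETUP table


def cpu_load(duration_us, fps, core_mhz=M4_CLOCK_MHZ):
    """Return (load in percent, load in MHz) for duration_us spent per frame."""
    busy_fraction = duration_us * fps / 1_000_000.0  # busy seconds per second
    return busy_fraction * 100.0, busy_fraction * core_mhz


# (average duration per frame in us, frames per second) from the tables above
cases = {
    "VIP capture": (41.66, 60),
    "VPE M2M": (107.08, 30),
    "DSS display": (18.35, 60),
}

for name, (duration_us, fps) in cases.items():
    percent, mhz = cpu_load(duration_us, fps)
    print(f"{name}: {percent:.2f}% load, {mhz:.2f} MHz")
```

Under these assumptions the sketch reproduces the three Summary rows (0.25% / 0.53 MHz, 0.32% / 0.68 MHz and 0.11% / 0.23 MHz).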


VIP Capture to DSS Display Glass-to-Glass Latency Numbers

Setup Details

  • TDA2xx EVM running the default video loopback application from OV Sensor->VIP->DSS->LCD
  • OV Sensor is pointing to another monitor displaying millisecond counter running at 60 Hz
  • Both the LCD image and original monitor are captured at the same time side by side using another digital still camera
  • Glass to glass latency is then calculated by taking the difference in time in the LCD and monitor

With this method, the glass-to-glass VIP-to-DSS latency was measured to vary from 44 ms to 66 ms.

The breakdown of this latency is as follows:

  • Capture runs at 30 FPS. This contributes 33.33 ms of latency because the end-of-frame callback is used to trigger the display
  • Display runs at 60 FPS. Since the capture and display VSYNCs are not synchronized, the latency can vary from 0 to 16.66 ms. Also, since the display frame rate is higher than the capture frame rate, the display repeats frames, resulting in another possible 0 to 16.66 ms latency difference
  • Since the measurement is done by capturing a PC monitor that also runs at 60 FPS, quantization error can add another 0 to 16.66 ms (i.e. the counter cannot display time at a granularity finer than 16.66 ms)
  • Sensor and LCD latency should also be considered, but they appear to be negligible given the measured values and the theoretical calculation above
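
The contributions listed above can be added up to bound the expected latency. The sketch below is only a summation of those stated ranges, assuming they are independent and additive; the variable names are illustrative.

```python
CAPTURE_FRAME_MS = 1000.0 / 30  # capture period at 30 FPS
DISPLAY_FRAME_MS = 1000.0 / 60  # display and reference monitor period at 60 FPS

# (description, minimum ms, maximum ms) for each contribution listed above
components = [
    ("capture, end-of-frame callback", CAPTURE_FRAME_MS, CAPTURE_FRAME_MS),
    ("unsynchronized capture/display VSYNC", 0.0, DISPLAY_FRAME_MS),
    ("display frame repeat", 0.0, DISPLAY_FRAME_MS),
    ("reference monitor quantization", 0.0, DISPLAY_FRAME_MS),
]

best_case_ms = sum(low for _, low, _ in components)
worst_case_ms = sum(high for _, _, high in components)
print(f"expected range: {best_case_ms:.2f} ms to {worst_case_ms:.2f} ms")
```

The measured 44 ms to 66 ms window falls inside the range computed this way.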

Video Display Driver

This section describes the display driver performance numbers: throughput and CPU load. The display drivers take video buffers from the application and display them on HDMI/LCD at the specified frame rate and resolution. The display drivers follow the FVID2 interface.

Video 1,2,3 and Graphics 1 Display Driver

Setup Details

  • TDA2xx/TDA2Ex/TDA2Px EVM & TFC-S9700RTWV35TR-01 800x480 LCD from ThreeFive Corp
  • TDA3xx EVM & LG LP101WX2 1280x800 LCD
Video Display performance values
Output Display (Resolution) | TDA2xx/TDA2Ex/TDA2Px (IPU1 Core0): Frame Rate (in Frames/sec) | TDA2xx/TDA2Ex/TDA2Px (IPU1 Core0): CPU Load (in %) | TDA3xx (IPU1 Core0): Frame Rate (in Frames/sec) | TDA3xx (IPU1 Core0): CPU Load (in %)
On/Off-Chip HDMI | 60 FPS (on-Chip HDMI) | 1% | 60 FPS (Off-Chip HDMI) | 1%
LCD | 60 FPS | 1% | 60 FPS | 1%



Buffer Queue Latency

Driver latency to program the buffer to DSS = code execution time from the application queue call to the register programming (T1) + the time taken to display 5 lines at the display rate (T2). On the TDA2xx EVM, T1 is measured to be around 20 microseconds.

Value of T2 for different resolutions
Display Resolution | T2 in microseconds
800x480@60fps | 158.25
1280x720@60fps | 107.74
1920x1080@60fps | 74.07

The total latency comes to around 180 us for an 800x480 @ 60 FPS display. So if a buffer is queued at least 180 us before the VSYNC, it will be displayed in the next frame period.

Note: This measurement was done with the standalone display application. In a fully loaded system, interrupt latency adds to it.
Reason for the 5-line check: this check ensures that the driver does not program the buffer address around the display VSYNC period. Doing so would result in the DSS hardware not accepting the programmed buffer, causing a frame drop.
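
As a rough illustration, the total queue latency is T1 plus T2 for the selected display mode. The sketch below uses the T2 values from the table and also shows how T2 could be derived from the display timing, assuming T2 is five line periods and that the line period is 1 / (frame rate x total lines per frame, including blanking). The figure of 1125 total lines for 1080p60 is the standard video timing and is an assumption about the panel setup used here; the helper name t2_from_timing is illustrative.

```python
T1_US = 20.0      # measured code execution time on the TDA2xx EVM
GUARD_LINES = 5   # lines checked before VSYNC

# T2 values in microseconds, taken from the table above
T2_US = {
    "800x480@60fps": 158.25,
    "1280x720@60fps": 107.74,
    "1920x1080@60fps": 74.07,
}


def t2_from_timing(fps, total_lines_per_frame, guard_lines=GUARD_LINES):
    """Time in us to scan guard_lines lines, given total lines per frame (with blanking)."""
    line_time_us = 1_000_000.0 / (fps * total_lines_per_frame)
    return guard_lines * line_time_us


total_latency_us = {mode: T1_US + t2 for mode, t2 in T2_US.items()}
for mode, latency in total_latency_us.items():
    print(f"{mode}: queue at least {latency:.2f} us before VSYNC")

# Assumed standard 1080p60 timing: 1125 total lines per frame
print(f"derived T2 for 1080p60: {t2_from_timing(60, 1125):.2f} us")
```

For 800x480 this gives 178.25 us, which is the "around 180 us" figure quoted above, and the derived 1080p60 value matches the 74.07 us table entry.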



DSS M2M Writeback Driver

This section describes the DSS M2M writeback driver performance numbers: throughput and CPU load. The DSS M2M writeback driver takes input video buffers from the application and writes the scaled/color-converted output to memory via the writeback path. The table below shows the DSS M2M driver performance at a DSS functional clock of 192 MHz.

DSS M2M Writeback performance values
Columns: Resolution, followed by four values for each of TDA2xx, TDA2Px, TDA2Ex and TDA3xx (all IPU1 Core0), in that order: Max Frames per Sec, Mega Pixels per Sec, Hardware Utilization, CPU Load (in %)
VID1 1280x720 RGB888 to YUV422I 480P 205 188 MP/s 98% 3% 205 188 MP/s 98% 3% 205 188 MP/s 98% 2% 206 190 MP/s 99% 2%
VID2 1280x720 RGB565 to 1280x720 YUYV422I 205 188 MP/s 98% 7% 205 188 MP/s 98% 9% 205 188 MP/s 98% 2% 206 189 MP/s 98% 2%
VID1 1280x720 YUV422I to 1920X1080 RGB888 91 188 MP/s 98% 1% 91 188 MP/s 98% 5% 91 188 MP/s 98% 1% 91 188 MP/s 98% 5%
VID1 1920X1080 YUV422I to 1920X1080 RGB565 91 188 MP/s 98% 1% 91 188 MP/s 98% 5% 91 188 MP/s 98% 5% 91 188 MP/s 98% 4%
VID1 1920X1080 YUV420SP to 1280x720 YUV422I 91 188 MP/s 98% 2% 91 188 MP/s 98% 4% 91 188 MP/s 98% 4% 91 188 MP/s 98% 1%
VID3 1920X1080 RGB888 to 1920X1080 YUV422I 91 188 MP/s 98% 1% 91 188 MP/s 98% 4% 91 188 MP/s 98% 1% NA NA NA NA

Calculating Performance for DSS M2M Writeback Driver

This section explains how to calculate the theoretical performance when more than one pipeline (with scaling) is overlaid and then written back, as shown in the picture below. The main rules are:

  • Count the clock cycles required based on the overlay output rather than on any input pipeline fetch. To do so, split the overlay into sections and count the pixels of each section separately. Note that if any upscaling is done inside the WB, then the WB output must be used for the calculation (as it will be bigger than the overlay output)
  • In TDA3xx, the use-case shown is then just a 720p frame without any additional overhead (there are minor overheads related to VID DMA pre-fetch and WB DMA flush, which are captured in the performance table)
  • In TDA2xx, TDA2Px and TDA2Ex, there is a horizontal downscaling limitation: when downscaling by N, the VID pipe outputs (i.e. the overlay input receives) only 1 pixel every N clock cycles. This causes the performance difference between TDA2xx and TDA3xx shown in the table. If no downscaling is done in the VID pipelines, the results are the same for TDA2xx and TDA3xx



[Figure: PDK DSS M2M Performance With Overlay.jpg]

DSS M2M Performance Setup/Input Information
Inputs Value
Overlay Width 1280
Overlay Height 720
VID1 Input Width 1280
VID1 Input Height 720
VID2 Input Width 1280
VID2 Input Height 720
VID1 Output Width 640
VID1 Output Height 480
VID2 Output Width 640
VID2 Output Height 480



DSS M2M Performance Calculation - TDA2xx
Performance Section Split Width in Pixels (W) Height in Lines (H) Downscaling Factor (S) Required DSS Cycles (W x H x S)
VID DMA Prefetch (worst-case) 2048 8 1 16,384
OVR: Section 1 - Top Blank 1280 120 1 153,600
OVR: Section 2 - Bottom Blank 1280 120 1 153,600
OVR: Section 3 - VID1 640 480 2 614,400
OVR: Section 4 - VID2 640 480 2 614,400
WB DMA Flush (worst-case) 2048 8 1 16,384
Total Cycles per Frame - - - 1,568,768
Theoretical FPS (DSS Functional Clock 192MHz/Total Cycles) - - - 122 FPS



DSS M2M Performance Calculation - TDA3xx
Performance Section Split Width in Pixels (W) Height in Lines (H) Downscaling Factor (S) Required DSS Cycles (W x H x S)
VID DMA Prefetch (worst-case) 2048 8 1 16,384
OVR: Section 1 - Top Blank 1280 120 1 153,600
OVR: Section 2 - Bottom Blank 1280 120 1 153,600
OVR: Section 3 - VID1 640 480 1 307,200
OVR: Section 4 - VID2 640 480 1 307,200
WB DMA Flush (worst-case) 2048 8 1 16,384
Total Cycles per Frame - - - 954,368
Theoretical FPS (DSS Functional Clock 192MHz/Total Cycles) - - - 201 FPS



DSS M2M Performance
Platform Theoretical FPS (Worst case) Measured FPS
TDA2xx 122 FPS 123 FPS
TDA3xx 201 FPS 203 FPS
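
The two calculation tables above can be reproduced with a few lines of Python. This is a minimal sketch of the W x H x S rule under the setup shown (1280x720 overlay, two 640x480 video windows each downscaled from 1280x720); the function names are illustrative. The only difference between the platforms is the per-pixel cycle factor S for the downscaled video sections: 2 on TDA2xx (horizontal downscale by 2) and 1 on TDA3xx.

```python
DSS_CLOCK_HZ = 192_000_000  # DSS functional clock


def overlay_sections(vid_cycles_per_pixel):
    """Return (name, width, height, cycles per pixel) for each section of the frame."""
    return [
        ("VID DMA prefetch (worst-case)", 2048, 8, 1),
        ("OVR top blank", 1280, 120, 1),
        ("OVR bottom blank", 1280, 120, 1),
        ("OVR VID1", 640, 480, vid_cycles_per_pixel),
        ("OVR VID2", 640, 480, vid_cycles_per_pixel),
        ("WB DMA flush (worst-case)", 2048, 8, 1),
    ]


def theoretical_fps(sections, clock_hz=DSS_CLOCK_HZ):
    """Return (total DSS cycles per frame, theoretical frames per second)."""
    total_cycles = sum(w * h * s for _, w, h, s in sections)
    return total_cycles, clock_hz / total_cycles


for platform, factor in (("TDA2xx", 2), ("TDA3xx", 1)):
    cycles, fps = theoretical_fps(overlay_sections(factor))
    print(f"{platform}: {cycles:,} cycles per frame, {int(fps)} FPS")
```

This yields 1,568,768 cycles (122 FPS) for TDA2xx and 954,368 cycles (201 FPS) for TDA3xx, matching the totals in the tables.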



Video Capture Driver

This section describes the video capture driver performance numbers: throughput and CPU load. The VIP capture driver uses the VIP hardware block to capture data from external video sources such as sensors and video decoders. The video data is captured from the external video source by the VIP Parser sub-block of the VIP block. The VIP Parser then sends the captured data for further processing in the VIP block, which can include color space conversion, scaling and chroma downsampling, and finally the video data is written to external DDR memory.

Setup Details

  • TDA2xx/TDA2Ex/TDA2Px Base EVM + Vision App board or TDA3xx Base EVM
  • Sensor - Omnivision OV10635
Video Capture (OV10635 Video Sensor) performance values
Video (Resolution) | TDA2xx/TDA2Ex/TDA2Px (IPU1 Core0): Field Rate per Channel (in Frames/sec) | TDA2xx/TDA2Ex/TDA2Px (IPU1 Core0): CPU Load (in %) | TDA3xx (IPU1 Core0): Field Rate per Channel (in Frames/sec) | TDA3xx (IPU1 Core0): CPU Load (in %)
1 CH 720P resolution | 30 | 1% | 30 | 1%




VPE Memory to Memory Drivers

This section describes the memory-to-memory driver performance numbers: throughput and CPU load. The VPE M2M driver takes a video buffer from memory, optionally processes it (the processing depends on the specific M2M driver) and writes it back to memory. The M2M driver follows the FVID2 interface for applications. It takes YUYV422/YUV420 interlaced/progressive input via the DEI path and provides a scaled version of the deinterlaced/bypassed input, with optional conversion to YUV422/YUV420/RGB output.
The performance is calculated based on the following:

  • Width to consider = MAX(In Width, Out Width)
  • Height to consider = MAX(In Height, Out Height)
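
A minimal Python sketch of this rule is shown below. It converts a measured frame rate into megapixels per second using the larger of the input and output dimensions, and expresses the result as a fraction of the 266 Mpixels/sec VPE operating speed. Treating the Hardware Utilization column as this ratio is an assumption, and the function name is illustrative.

```python
VPE_CLOCK_MHZ = 266.0  # VPE operating speed, 1 pixel per clock


def vpe_throughput(in_w, in_h, out_w, out_h, fps, clock_mhz=VPE_CLOCK_MHZ):
    """Return (megapixels per second, utilization in percent) for one test case."""
    width = max(in_w, out_w)    # Width to consider
    height = max(in_h, out_h)   # Height to consider
    mpps = width * height * fps / 1_000_000.0
    return mpps, 100.0 * mpps / clock_mhz


# D1 (720x480) upscaled to 1080P at a measured 126 frames per second
mpps, utilization = vpe_throughput(720, 480, 1920, 1080, 126)
print(f"{mpps:.1f} MP/s, {utilization:.1f}% of the VPE clock")
```

For the D1-to-1080P case at 126 frames per second this gives roughly 261 MP/s and 98%.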


Setup Details

  • CPU Idle - Disabled
  • The time required for a single scaler operation is measured. For CPU load, scaler operations are issued in a continuous loop, queuing a buffer for each scaling operation.
VPE Driver Performance values
Columns: Scaling Factor (Resolution), followed by four values for each of TDA2xx, TDA2Ex and TDA2Px (all IPU1 Core0), in that order: Max Frames per Sec, Mega Pixels per Sec, Hardware Utilization, CPU Load (in %)
1 CH D1 (720x480) YUYV422I to CIF (360x240) YUYV422I with DEI OFF (TC0001) 707 243 MP/s 91% 9% 714 244 MP/s 91% 8% 706 244 MP/s 91% 10%
1 CH D1 (720x480) YUYV422I to 1080P YUYV422I with DEI OFF (TC0004) 126 261 MP/s 98% 4% 126 261 MP/s 98% 4% 126 261 MP/s 98% 4%
1 CH D1 (720x480) YUYV422I to CIF (360x240) YUYV422I with DEI ON (TC0021) 692 238 MP/s 89% 11% 700 239 MP/s 89% 11% 691 239 MP/s 89% 12%
4 CH D1 (720x480) YUYV422I to CIF (360x240) YUYV422I with DEI OFF (TC2001) 730 252 MP/s 94% 5% 733 252 MP/s 94% 5% 733 252 MP/s 93% 4%
8 CH D1 (720x480) YUYV422I to D1 (720x480) YUYV422I with DEI OFF (TC2002) 736 254 MP/s 95% 3% 738 254 MP/s 95% 5% 738 255 MP/s 95% 3%
4 CH WXGA (1280x800) YUV420SP_UV to 640x400 YUYV422I with DEI OFF (TC2007) 252 258 MP/s 96% 2% 253 258 MP/s 96% 4% 252 258 MP/s 96% 3%
6 CH WXGA (1280x800) YUYV422I to 640x400 YUYV422I with DEI OFF (TC2008) 254 260 MP/s 97% 2% 254 260 MP/s 97% 2% 254 260 MP/s 97% 2%


VPE Driver Performance values with 304MHz from Video PLL1
Columns: Scaling Factor (Resolution), followed by four values for TDA2xx (IPU1 Core0): Max Frames per Sec, Mega Pixels per Sec, Hardware Utilization, CPU Load (in %)
1 CH D1 (720x480) YUYV422I to CIF (360x240) YUYV422I with DEI OFF (TC0001) 802 277 MP/s 91% 8%
1 CH D1 (720x480) YUYV422I to 1080P YUYV422I with DEI OFF (TC0004) 142 295 MP/s 97% 4%
1 CH D1 (720x480) YUYV422I to CIF (360x240) YUYV422I with DEI ON (TC0021) 782 270 MP/s 88% 8%
4 CH D1 (720x480) YUYV422I to CIF (360x240) YUYV422I with DEI OFF (TC2001) 825 285 MP/s 93% 6%
8 CH D1 (720x480) YUYV422I to D1 (720x480) YUYV422I with DEI OFF (TC2002) 830 287 MP/s 94% 8%
4 CH WXGA (1280x800) YUV420SP_UV to 640x400 YUYV422I with DEI OFF (TC2007) 285 292 MP/s 96% 6%
6 CH WXGA (1280x800) YUYV422I to 640x400 YUYV422I with DEI OFF (TC2008) 286 293 MP/s 96% 2%


Calculating Performance for VPE drivers

The description below is based on the actual performance seen with the software drivers on actual silicon.

Performance of Scaler (SC) with DEI OFF

This is applicable to the TDA2xx VPE and TI814x (DEI-WB path).
Here DEI, wherever applicable, is assumed to be in bypass mode.
The performance when DEI is not in bypass mode is described in the subsequent section.


Each SC operates at a 266 MHz clock (in TDA2xx) and 200 MHz (in TI814x).
In theory it can process 1 pixel per clock, i.e.
- about 266 megapixels per second (MP/s) in TDA2xx.
- about 200 megapixels per second (MP/s) in TI814x.

But due to inherent overheads from the overlap needed for various filtering operations, the practical standalone (i.e. only SC running in the system) speed would be
- about 240-250 MP/s in TDA2xx
- about 180-190 MP/s in TI814x

When SC runs alongside other modules such as other drivers or codecs, the performance may drop further due to DDR bandwidth.

Software overheads also reduce SC performance, but with the TI BSP driver we see very little impact from software overheads.

Taking a typical use-case, each SC can safely do
- about 186 MP/s processing (in TDA2xx).
- about 130 MP/s processing (in TI814x).

The number of pixels processed when scaling one D1 channel of 720x480 @ 30 frames per second is 720x480x30 (frames per second) = 10.3 MP/s

Here the output from SC is <= 720x480

Thus SC can safely do about 16 CHs of D1 (in TDA2xx) and about 12 CHs of D1 (in TI814x) when its output size is <= 720x480, i.e. when only downscaling is done in the scaler.

In practice, with BSP-only applications, we found that the measured SC performance is
- about 22 D1 CHs (about 236 MP/s) in TDA2xx
- about 13 D1 CHs (about 140 MP/s) in TI814x

With other activity such as codecs, performance should drop, but we know each SC will safely give
- 20 CH D1 performance (200 MP/s) in TDA2xx
- 12 CH D1 performance (130 MP/s) in TI814x

When scaler upsampling is used, the results are a bit different.
For the use-case of scaling 720x480 to a 960x540 output size, the performance for 1 CH would be
960x540 (since 960x540 > 720x480) x 30 (frames per second) = 15.5 MP/s

In TDA2xx, assuming SC performance is 200 MP/s, that's about 12 CHs
In TI814x, assuming SC performance is 130 MP/s, that's about 8 CHs
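
The channel counts in this section follow from dividing a throughput budget by the per-channel pixel rate. The Python sketch below illustrates that arithmetic under the assumption that the larger of the input and output sizes determines the per-channel cost; the helper names and the 360x240 output used for the downscaling case are illustrative only.

```python
import math


def channel_pixel_rate_mpps(in_size, out_size, fps):
    """Per-channel pixel rate in MP/s, using the larger of input and output size."""
    width = max(in_size[0], out_size[0])
    height = max(in_size[1], out_size[1])
    return width * height * fps / 1_000_000.0


def max_channels(budget_mpps, per_channel_mpps):
    """Whole number of channels that fit into a throughput budget."""
    return math.floor(budget_mpps / per_channel_mpps)


d1_downscale = channel_pixel_rate_mpps((720, 480), (360, 240), 30)  # output <= D1
d1_upscale = channel_pixel_rate_mpps((720, 480), (960, 540), 30)    # output > D1

print(f"D1 downscale: {d1_downscale:.1f} MP/s per channel")
print(f"D1 to 960x540: {d1_upscale:.1f} MP/s per channel")
print("measured TDA2xx budget of 236 MP/s:", max_channels(236, d1_downscale), "D1 CHs")
print("TDA2xx upscaling at 200 MP/s:", max_channels(200, d1_upscale), "CHs")
print("TI814x upscaling at 130 MP/s:", max_channels(130, d1_upscale), "CHs")
```

With these inputs the sketch gives 22 D1 channels for the measured TDA2xx figure, and 12 and 8 channels for the 960x540 upscaling case.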


Performance of Scaler (SC) with DEI ON

This is applicable to the TDA2xx VPE and TI814x (DEI-WB path).

Each DEI operates at a 266 MHz clock (in TDA2xx) and 200 MHz (in TI814x).

In theory it can process 1 pixel per clock, i.e.
- about 266 megapixels per second (MP/s) in TDA2xx
- about 200 megapixels per second (MP/s) in TI814x

But due to inherent overheads from the overlap needed for various filtering operations, the practical standalone (only DEI running in the system) speed would be
- about 200-210 MP/s in TDA2xx
- about 150-160 MP/s in TI814x

When DEI runs alongside other modules such as other drivers or codecs, the performance may drop further due to DDR bandwidth.

Software overheads also reduce DEI performance, but with the TI BSP drivers we see very little impact from software overheads.

Taking a DVR-type use-case, each DEI can safely do
- about 170 MP/s processing in TDA2xx
- about 130 MP/s processing in TI814x


The number of pixels processed when deinterlacing one D1 channel of 720x240 @ 60 fields per second is

720x240x2 (since DEI results in 1 line becoming two lines) x 60 (fields per second) = 20.7 MP/s

Here the output from DEI is <= 720x480

Thus DEI can safely do
- about 8 CHs of D1 in TDA2xx
- about 6 CHs of D1 in TI814x

when its output size is <= 720x480, i.e. when only downscaling is done in the scaler after DEI.

In practice, with BSP-only applications, we found that the measured DEI performance is
- about 9-10 D1 CHs (about 200 MP/s) in TDA2xx
- about 6-7 D1 CHs (about 140 MP/s) in TI814x

With other activity such as codecs, performance should drop, but we know each DEI will safely give
- 8 CH D1 performance in TDA2xx.
- 6 CH D1 performance in TI814x.

The above applies when scaler downsampling is used after DEI.

When scaler upsampling is used, the results are a bit different.
For the use-case of a 960x540 output size, the performance for 1 CH would be

960x540 (since 960x540 > 720x480) x 60 (fields per second) = 31.1 MP/s

In TDA2xx, assuming DEI performance is 170 MP/s, that's about 5-6 CHs
In TI814x, assuming DEI performance is 130 MP/s, that's about 4 CHs
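
The DEI figures above differ from the SC-only case because each input field is expanded to a full frame and one output frame is produced per field. A short Python sketch of that arithmetic follows, assuming the per-channel cost is the larger of the deinterlaced frame size and the output size, multiplied by the field rate; the helper name is illustrative.

```python
import math


def dei_channel_rate_mpps(field_w, field_h, fields_per_sec, out_w, out_h):
    """Per-channel DEI pixel rate in MP/s; each field becomes a frame with twice the lines."""
    frame_h = field_h * 2  # DEI turns 1 line into 2 lines
    width = max(field_w, out_w)
    height = max(frame_h, out_h)
    return width * height * fields_per_sec / 1_000_000.0


d1_rate = dei_channel_rate_mpps(720, 240, 60, 720, 480)   # output <= 720x480
up_rate = dei_channel_rate_mpps(720, 240, 60, 960, 540)   # output 960x540

for soc, budget in (("TDA2xx", 170), ("TI814x", 130)):
    print(f"{soc}: {math.floor(budget / d1_rate)} D1 CHs, "
          f"{math.floor(budget / up_rate)} CHs at 960x540")
```

Rounding down, this gives 8 and 5 channels for TDA2xx and 6 and 4 channels for TI814x, consistent with the figures quoted above.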




ISS Drivers

ISS Capture Driver (CAL)

The ISS captures video streams via the CAL sub-block of the ISS. It provides interfaces to capture via MIPI CSI-2 and parallel ports. It is typically used to capture streams from sensors such as the Omnivision OV10640, Aptina AR0132 and Aptina AR0140. To measure the performance, a RAW 12 video stream @ 30 FPS is captured from the OV10640 and written into memory.

Setup Details

  • TDA3xx/TDA2Px EVM
  • Sensor - Omnivision OV10640, Data Format as RAW 12
Video Capture (OV10640 Video Sensor) performance values
Video (Resolution) | TDA3xx/TDA2Px (IPU1 Core0): Field Rate per Channel (in Frames/sec) | TDA3xx/TDA2Px (IPU1 Core0): CPU Load (in %)
1 CH 720P resolution | 30 | < 1%


ISS M2M ISP WDR Driver

This driver takes a companded RAW 12 video frame and performs 2-pass processing. In pass 1, the low exposure is processed; in pass 2, the high exposure is processed and merged with the low exposure. The driver writes the processed frame to memory in YUV420 SP (NV12) data format.

Setup Details

  • Input frame RAW12
  • Output YUV420 SP (NV12)
WDR Driver Performance values
Operation | Image Width/Height | TDA3xx FPS for 212 MHz | TDA2Px (OPP Norm) FPS for 355 MHz | TDA2Px (OPP OD) FPS for 450 MHz | TDA2Px (OPP High) FPS for 550 MHz
ISP 2 Pass WDR Flow: Pass 1 | 1280X960 | 143 | 249 | 296 | 366
ISP 2 Pass WDR Flow: Pass 2 | 1280X960 | 140 | 243 | 290 | 360
LDC Bi Cubic | 1920X1080 | 52 | 83 | 101 | 125
LDC Bi Linear | 1280X960 | 100 | 164 | 195 | 227

ISS CALB M2M Driver

This driver takes a video frame in 12-bit packed MIPI format and converts it to 12-bit unpacked linear format, which can be used for further processing.

Setup Details

  • TDA2Px EVM
  • Input Mipi format 12 bit packed
  • Output Linear format 12 bit unpacked
ISS CALB M2M Driver Performance values
image resolution OPP (ISS Clk) fps byte rate
1280X720 Nom (355MHz) 642 1.1 GBps
1280X720 High (550MHz) 844 1.44 GBps
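
The byte rate column can be related to the frame rate with a short calculation. The sketch below is one possible reading, not a statement of how the figures were produced: it assumes that the byte rate counts the unpacked output at 2 bytes per pixel and that GBps means 2^30 bytes per second. The function name is illustrative.

```python
def calb_output_rate_gib(width, height, fps, bytes_per_pixel=2):
    """Output byte rate in GiB/s, assuming each unpacked 12-bit pixel occupies 2 bytes."""
    return width * height * bytes_per_pixel * fps / 2**30


for opp, fps in (("Nom (355MHz)", 642), ("High (550MHz)", 844)):
    print(f"{opp}: {calb_output_rate_gib(1280, 720, fps):.2f} GiB/s")
```

Under these assumptions the results are about 1.10 and 1.45, close to the 1.1 GBps and 1.44 GBps entries in the table.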


UART Driver

This section describes the UART driver performance numbers: throughput and CPU load. The UART driver is used to transfer data to and from the UART terminal. The UART driver follows the BIOS GIO/IOM driver model.

Setup Details

  • The time and CPU load required for a UART transfer operation are measured by issuing GIO_submit operations in a continuous loop. The test parameters are listed below
  • Instance : UART1
  • Baudrate : 115200
  • Stop Bits : 1
  • Parity : None
  • Character Length : 8 bits
  • Bytes per GIO Submit : 138
UART Driver Performance values
Columns: Test Case, followed by three values for each of TDA2xx, TDA2Px, TDA2Ex and TDA3xx (all IPU1 Core0), in that order: TX Bytes per Second, Hardware Utilization, CPU Load (in %)
Polled Mode, FIFO Enable (TC_00102) 11416 BP/s 99% 71% 11416 BP/s 99% 71% 11416 BP/s 99% 70% 11416 BP/s 99% 80%
Polled Mode, FIFO Disable (TC_00132) 1000 BP/s 8% 2% 1000 BP/s 8% 2% 1000 BP/s 8% 2% 1000 BP/s 8% 2%
Interrupt Mode, FIFO Enable, TX Trigger Level 56 bytes (TC_00202) 11450 BP/s 99% 4% 11450 BP/s 99% 4% 11450 BP/s 99% 4% 11451 BP/s 99% 4%
Interrupt Mode, FIFO Disable (TC_00232) 11451 BP/s 99% 13% 11451 BP/s 99% 13% 11451 BP/s 96% 11% 11449 BP/s 96% 12%
Interrupt Mode, FIFO Enable, TX Trigger Level 8 bytes (TC_00241) 11450 BP/s 99% 3% 11450 BP/s 99% 3% 11450 BP/s 99% 3% 11450 BP/s 99% 3%
Interrupt Mode, FIFO Enable, TX Trigger Level 16 bytes (TC_00242) 11451 BP/s 99% 2% 11451 BP/s 99% 2% 11451 BP/s 99% 2% 11451 BP/s 99% 2%
Interrupt Mode, FIFO Enable, TX Trigger Level 32 bytes (TC_00243) 11451 BP/s 99% 2% 11451 BP/s 99% 2% 11451 BP/s 99% 2% 11451 BP/s 99% 2%
DMA Mode, FIFO Enable, TX Trigger Level 56 bytes (TC_00302) 11450 BP/s 99% 2% 11450 BP/s 99% 2% 11450 BP/s 99% 1% 11450 BP/s 99% 2%
DMA Mode, FIFO Disable (TC_00332) 11450 BP/s 99% 1% 11450 BP/s 99% 1% 11449 BP/s 99% 1% 11450 BP/s 99% 1%
DMA Mode, FIFO Enable, TX Trigger Level 8 bytes (TC_00341) 11450 BP/s 99% 1% 11450 BP/s 99% 1% 11450 BP/s 99% 1% 11451 BP/s 99% 1%
DMA Mode, FIFO Enable, TX Trigger Level 16 bytes (TC_00342) 11450 BP/s 99% 1% 11450 BP/s 99% 1% 11450 BP/s 99% 1% 11450 BP/s 99% 1%
DMA Mode, FIFO Enable, TX Trigger Level 32 bytes (TC_00343) 11450 BP/s 99% 1% 11450 BP/s 99% 1% 11450 BP/s 99% 1% 11450 BP/s 99% 2%
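
The Hardware Utilization column can be understood relative to the maximum byte rate of the configured line. The Python sketch below assumes utilization is the measured TX rate divided by that maximum, where each character costs one start bit plus the data, parity and stop bits from the test parameters above; the function names are illustrative.

```python
def uart_max_bytes_per_sec(baud, data_bits=8, parity_bits=0, stop_bits=1):
    """Maximum characters per second on the line: 1 start bit plus the configured bits."""
    bits_per_char = 1 + data_bits + parity_bits + stop_bits
    return baud / bits_per_char


def uart_utilization_percent(measured_bytes_per_sec, baud=115200):
    """Measured TX rate as a percentage of the 8N1 line capacity at the given baud rate."""
    return 100.0 * measured_bytes_per_sec / uart_max_bytes_per_sec(baud)


print(f"line capacity: {uart_max_bytes_per_sec(115200):.0f} bytes/s")
print(f"polled, FIFO enabled: {uart_utilization_percent(11416):.1f}%")
print(f"polled, FIFO disabled: {uart_utilization_percent(1000):.1f}%")
```

At 115200 baud with 8 data bits, no parity and 1 stop bit, the line capacity is 11520 bytes per second, so 11416 bytes per second is about 99% and 1000 bytes per second is a little over 8%.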




CRC CSL-FL

This section describes the CRC CSL-FL performance numbers: throughput. CRC CSL-FL is used to generate the CRC signature, which can be used to perform memory checks to verify the integrity of the memory system.

CRC Performance for TDA3xx (IPU)
CONFIGURATION PROCESSOR TRANSFER SIZE THROUGHPUT
EDMA used, pattern/ EDMA ACnt = 8bytes, cache enabled M4 1800 KB 486 MB/s
EDMA used, pattern/ EDMA ACnt = 8bytes, cache enabled DSP 1800 KB 489 MB/s


DCAN CSL-FL

This section describes the DCAN CSL-FL performance numbers - throughput. DCAN driver is used to transfer data between CAN nodes. It also configures ECC for message RAM.

DCAN Performance
PLATFORM CONFIGURATION PROCESSOR BAUDRATE MESSAGES TRANSMITTED PER SEC MESSAGE SIZE HW UTILIZATION
TDA3xx Cache - Enabled M4 1Mbit/sec 7237 128 bits 92%
TDA2xx Cache - Enabled M4 1Mbit/sec 7237 128 bits 92%
TDA2xx Cache - Enabled A15 1Mbit/sec 7237 128 bits 92%
TDA2Ex Cache - Enabled M4 1Mbit/sec 7237 128 bits 92%
TDA2Ex Cache - Enabled A15 1Mbit/sec 7237 128 bits 92%
TDA2Px Cache - Enabled M4 1Mbit/sec 7237 128 bits 92%
TDA2Px Cache - Enabled A15 1Mbit/sec 7237 128 bits 92%
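
The HW UTILIZATION column is consistent with the bits actually transmitted per second divided by the bus baud rate. A minimal Python sketch of that check follows, assuming the 128-bit message size is the full on-wire size of each message; the function name is illustrative.

```python
def can_bus_utilization_percent(messages_per_sec, message_bits, baud_bits_per_sec):
    """Share of the bus bit rate consumed by the transmitted messages."""
    return 100.0 * messages_per_sec * message_bits / baud_bits_per_sec


utilization = can_bus_utilization_percent(7237, 128, 1_000_000)
print(f"DCAN utilization: {utilization:.1f}%")
```

For 7237 messages per second of 128 bits at 1 Mbit/sec this gives about 92.6%, in line with the 92% in the table.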




MCAN CSL-FL

This section describes the MCAN CSL-FL performance numbers - throughput. MCAN driver is used to transfer data between CAN-FD nodes. It also configures ECC for message RAM.

Setup

  • Platform: TDA3xx/TDA2Px EVM
  • Frame Type: Standard (11bit) ID CAN FD Frame
  • Payload: 64 bytes
  • Nominal Baud rate: 1 Mbps
  • Data Phase Baud rate: 5 Mbps
  • Cache: Enabled
  • Test Type: Both Tx and Rx

Performance Details
Number of messages per second: 5658
HW utilization: 76%



MMCSD CSL-FL

This section describes the MMCSD CSL-FL performance numbers: throughput. The MMCSD driver is used to read data from or write data to the MMC/SD card. It is tested with fatlib as part of the file I/O use case of Processor SDK Vision. The file I/O use case reads the AppImage file from the SD card and writes it back to the SD card. Performance is measured using timestamps and the size of the AppImage.

Setup

  • Platform: TDA2xx
  • CPU: Cortex M4
  • D-Cache: Disabled
  • I-cache: Enabled

Performance Details

  • The speed of the transfer depends on the class of the SD card used.
  • With class 10 SD card the read speed is 4.5 MBps and write speed is 1.5 MBps.


PCIe CSL-FL

This section describes the PCIe CSL-FL performance numbers - throughput. PCIe driver is used for board to board communication using single lane. PCIe Gen1 supports 2.5 Gbps and Gen2 supports 5.0 Gbps.

Setup

  • Platform: TDA2xx ES2.0 EVM (both boards)
  • Lane: Single
  • Data Buffer Transferred: 16 MB
  • CPU: Cortex A15
  • D-Cache: Disabled
  • I-cache: Enabled
  • EDMA Params: A-count=0x4000, B-count=0x400, C-count=1
  • Polling Method

Performance Details

  • Gen1 speed is 184 MBps.
  • Gen2 speed is 370 MBps.
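
To put these numbers in context, the sketch below checks that the EDMA parameters describe the 16 MB buffer and compares the measured speeds with the single-lane ceiling after 8b/10b line encoding, which PCIe Gen1 and Gen2 use. The ceiling ignores packet and protocol overhead, so it is an upper bound rather than an expected value; the function names are illustrative.

```python
def edma_transfer_bytes(a_count, b_count, c_count):
    """Total bytes moved by one EDMA transfer: A-count x B-count x C-count."""
    return a_count * b_count * c_count


def payload_ceiling_mbps(line_rate_gbps):
    """Single-lane ceiling in MB/s with 8b/10b encoding (10 line bits per data byte)."""
    return line_rate_gbps * 1e9 / 10 / 1e6


buffer_bytes = edma_transfer_bytes(0x4000, 0x400, 1)
print(f"EDMA transfer size: {buffer_bytes // (1024 * 1024)} MiB")

for gen, line_rate_gbps, measured_mbps in (("Gen1", 2.5, 184), ("Gen2", 5.0, 370)):
    ceiling = payload_ceiling_mbps(line_rate_gbps)
    print(f"{gen}: {measured_mbps} of {ceiling:.0f} MB/s "
          f"({100.0 * measured_mbps / ceiling:.1f}% of the encoded line rate)")
```

Both generations reach roughly 74% of the encoded line rate in this setup.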

