NOTICE: The Processors Wiki will End-of-Life on January 15, 2021. It is recommended to download any files or other content you may need that are hosted on processors.wiki.ti.com. The site is now set to read only.
PDK TDA Datasheet
PDK Drivers
This section provides a brief overview of the device drivers supported in the PDK release.
PDK Driver Features
For details on features, refer to PDK_Requirement_to_Test_Traceability_Report.xlsx under the <PDK_INSTALL>/docs/traceability folder.
VPS Driver VPDMA List Usage
On TDA2xx/TDA2Ex/TDA3xx, each VIP and VPE has a separate VPDMA instance, and each VPDMA instance in turn has 8 lists. The drivers use these lists as follows:
Driver | DMA usage |
---|---|
VIP Capture | One list per port, hence a maximum of 4 lists per VIP (Slice0/1 x PortA/B) |
M2M VPE (only for TDA2xx/TDA2Ex/TDA2Px) | Only one list for VPE1 |
Setup Details
SoC Details
Details | TDA2xx/TDA2Ex/TDA2Px | TDA3xx |
---|---|---|
Core | IPU1 (M4) core 0 | IPU1 (M4) core 0 |
Operating speed of Core | 212.5 MHz | 212.5 MHz |
Operating speed of VPE | 266 Mpixels/sec | NA |
EVM Configuration | TDA2xx: 2 EMIFs Non-Interleaved, DDR3 @ 532MHz; TDA2Ex/TDA2Px: 1 EMIF Non-Interleaved, DDR3 @ 666MHz | 1 EMIF Non-Interleaved, DDR3 @ 532MHz |
Optimization Details
Details | TDA2xx/TDA2Ex/TDA2Px | TDA3xx |
---|---|---|
Is the Ducati cache enabled? | Yes | Yes |
Profile | release | release |
M4 compile options (release build) | -g -ms -c -qq -pdsw225 --endian=little -mv7M4 --float_support=vfplib --abi=eabi -eo.oem4 -ea.sem4 --symdebug:dwarf --embed_inline_assembly --emit_warnings_as_errors | Same as TDA2xx |
M4 Linker options (release build) | --emit_warnings_as_errors -w -q -u _c_int00 --silicon_version=7M4 -c -x --zero_init=on | Same as TDA2xx |
DSP Compile options (release build) | -mv6600 --abi=eabi -q -mi10 -mo -pden -pds=238 -pds=880 -pds1110 --program_level_compile -g --endian=little -eo.oe66 -ea.se66 --emit_warnings_as_errors | Same as TDA2xx |
DSP Linker options (release build) | --emit_warnings_as_errors --warn_sections -q -e=_c_int00 --silicon_version=6600 -c | Same as TDA2xx |
Is the code and data placed in L2/L3 memory? | No | No |
Is the L3 interconnect optimized? | No | No |
Resources Details
Details | TDA2xx/TDA2Ex/TDA2Px | TDA3xx |
---|---|---|
Timers | M4 Internal timer | M4 Internal timer |
HWI | IPU1_23 (DSS DISPC), IPU1_26 (HDMI_IRQ), IPU1_27 (VIP1), IPU1_28 (VIP2), IPU1_29 (VIP3), IPU1_30 (VPE1), IPU1_41 (I2C1), IPU1_42 (I2C2 on TDA2xx, I2C5 on TDA2Ex), IPU1_43 (I2C3), IPU1_48 (I2C4 on TDA2xx/TDA2Ex, I2C5 only on TDA2xx-MC), IPU1_57 (MCSPI1), IPU1_58 (MCSPI2), IPU1_59 (MCSPI3), IPU1_60 (MCSPI4), IPU1_44 (UART1), IPU1_60 (UART2), IPU1_45 (UART3), IPU1_61 (UART4), IPU1_62 (UART5), IPU1_63 (UART6), IPU1_64 (UART7), IPU1_65 (UART8), IPU1_69 (UART9), IPU1_70 (UART10) | IPU1_23 (DSS DISPC), IPU1_27 (VIP1) |
Low Latency HWI (cannot be preempted or disabled using the Hwi_disable() BIOS API) | NA | NA |
I2C Instances (starting from 1) | I2C1, I2C2, I2C5 (for TDA2Ex) (usage can be controlled from the app) | I2C1, I2C2 (usage can be controlled from the app) |
EDMA Channels (instances numbered from 1) | UART1 (TX-48, RX-49), UART2 (TX-50, RX-51), UART3 (TX-52, RX-53), UART4 (TX-54, RX-55), UART5 (TX-62, RX-63), UART6 (TX-50, RX-51), UART7 (TX-50, RX-51), UART8 (TX-50, RX-51), UART9 (TX-50, RX-51), UART10 (TX-50, RX-51), MCSPI1TX - 34, MCSPI1RX - 35, MCSPI2TX - 42, MCSPI2RX - 43, MCSPI3TX - 14, MCSPI3RX - 15, MCSPI4TX - 22, MCSPI4RX - 23 | UART1 (TX-48, RX-49), UART2 (TX-50, RX-51), UART3 (TX-52, RX-53), MCSPI1TX - 34, MCSPI1RX - 35, MCSPI2TX - 42, MCSPI2RX - 43, MCSPI3TX - 14, MCSPI3RX - 15, MCSPI4TX - 22, MCSPI4RX - 23 |
PLLs Used | Video1_PLL and HDMI_PLL (all video PLLs are configured according to the selected display resolution) | DSP_EVE_VID_PLL (configured according to the selected display resolution) |
PRCM Done | PRCM Done | None (all through GEL file/SBL) |
GPIO | GPIO4_13, GPIO4_14, GPIO4_15, GPIO4_16 and GPIO6_17 control the video mux select and sensor power on the vision application card; GPIO2_29, GPIO1_4 and GPIO6_7 act as the Demux_FPD_A/B/C control signals on the LVDS multi-deserializer board | None |
PinMuxing Details (usage can be controlled from the app) | See the TDA2xx files in pdk/packages/ti/drv/vps/src/boards for details | See the TDA3xx files in pdk/packages/ti/drv/vps/src/boards for details |
Memory Requirements (Cacheable) | See pdk/docs/memstat/tda2xx for details | See pdk/docs/memstat/tda3xx for details |
Memory Requirements (Non-cacheable) | VIP/VPE descriptor memory; see the Memory Footprint table below | VIP descriptor memory; see the Memory Footprint table below |
SWI | 1 per UART instance in DMA or interrupt mode, to handle the UART RX/TX ISR | 1 per UART instance in DMA or interrupt mode, to handle the UART RX/TX ISR |
Tasks | 1 (highest priority) | 1 (highest priority) |
Memory Footprint
For details on the library code and data sections, refer to the PDK memstat files under the <PDK_INSTALL>/docs/memstat folder. The tables below list the dynamic memory requirements.
Use Case or Example | System Stack (Cached section) | Task Stack (Cached section) | OSAL Objects (Cached section) | VPDMA Descriptor Heap (Non-cached section) |
---|---|---|---|---|
Loopback Example (VIP-DSS) | 1316 | 1764 | 61 Semaphore, 9 HWI | 722880 (Static) |
M2M VPE Example | 404 | 1344 | 33 Semaphore, 5 HWI | 722880 (Static) |
Use Case or Example | System Stack (Cached section) | Task Stack (Cached section) | OSAL Objects (Cached section) | VPDMA Descriptor Heap (Non-cached section) |
---|---|---|---|---|
Loopback Example (VIP-DSS) | 1236 | 1988 | 69 Semaphore, 11 HWI | 722880 (Static) |
M2M VPE Example | 756 | 2574 | 68 Semaphore, 11 HWI | 722880 (Static) |
Use Case or Example | System Stack (Cached section) | Task Stack (Cached section) | OSAL Objects (Cached section) | VPDMA Descriptor Heap (Non-cached section) |
---|---|---|---|---|
Loopback Example (VIP-DSS) | 1220 | 2012 | 59 Semaphore, 7 HWI | 182208 (Static) |
M2M VPE Example | 404 | 2104 | 59 Semaphore, 7 HWI | 182208 (Static) |
Use Case or Example | System Stack (Cached section) | Task Stack (Cached section) | OSAL Objects (Cached section) | VPDMA Descriptor Heap (Non-cached section) |
---|---|---|---|---|
Loopback Example (VIP-DSS) | 1328 | 1764 | 52 Semaphore, 5 HWI | 108544 (Static) |
Software Performance Numbers
SETUP | |
---|---|
Profile Clock (MHz) - CTM | 425 |
Platform | TDA2XX ES1.0/ES1.1 |
M4 Clock (MHz) | 212.5 |
Cache | Enabled |
Build | Release |
DDR3 (MHz) | 532 |
Summary | FPS | Load | MHz |
---|---|---|---|
VIP Capture Driver Load (1 Channel 720p60 capture) | 60 | 0.25% | 0.53 |
VPE M2M Driver (1 Channel 720x240 YUV420SP to 360x240 YUV422I, DEI ON) | 30 | 0.32% | 0.68 |
DSS Display Driver (1 Video Pipe @720p60 display) | 60 | 0.11% | 0.23 |
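The MHz column of the Summary table is the load percentage applied to the 212.5 MHz M4 clock (for example, 0.25% of 212.5 MHz is about 0.53 MHz). A minimal Python sketch of that conversion, using the table's own load values (the function name is illustrative only):

```python
# Sketch: express a CPU load percentage as MHz of core time.
# Assumes the load is measured against the 212.5 MHz IPU1 (M4) clock
# listed in the setup tables.
M4_CLOCK_MHZ = 212.5


def load_to_mhz(load_percent: float, core_mhz: float = M4_CLOCK_MHZ) -> float:
    """Return the MHz of core time consumed at the given load percentage."""
    return core_mhz * load_percent / 100.0


# Load values taken from the Summary table above.
summary_loads = {
    "VIP capture, 1 ch 720p60": 0.25,
    "VPE M2M, 720x240 -> 360x240, DEI ON": 0.32,
    "DSS display, 1 video pipe 720p60": 0.11,
}

for name, load in summary_loads.items():
    print(f"{name}: {load:.2f}% -> {load_to_mhz(load):.2f} MHz")
```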
VIP Capture Driver (1 Channel 720p60 capture) | Average Ticks | Average Duration (in us) | Max Ticks | Max Duration (in us) |
---|---|---|---|---|
M4 Load per frame (Including App Q/DQ) | 16664 | 41.66 | 32020 | 80.05 |
Queue | 2637 | 6.59 | 6038 | 15.10 |
DeQueue | 2441 | 6.10 | 5646 | 14.12 |
VPE M2M Driver (1 Channel 720x240 YUV420SP to 360x240 YUV422I, DEI ON) | Average Ticks | Average Duration (in us) | Max Ticks | Max Duration (in us) |
---|---|---|---|---|
M4 Load per frame (Including App Q/DQ) | 42831 | 107.08 | 73072 | 182.68 |
Queue | 32046 | 80.12 | 48642 | 121.61 |
DeQueue | 2416 | 5.37 | 12708 | 31.77 |
DSS Display Driver (1 Video Pipe @720p60 display) | Average Ticks | Average Duration (in us) | Max Ticks | Max Duration (in us) |
---|---|---|---|---|
M4 Load per frame (Including App Q/DQ) | 7339 | 18.35 | 14942 | 37.36 |
Queue | 1528 | 3.82 | 2800 | 7.00 |
DeQueue | 1341 | 3.35 | 3692 | 9.23 |
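The Duration columns are the tick counts divided by the counter rate. The durations in these three tables match 400 ticks per microsecond (for example, 16664 / 400 = 41.66 us), not the 425 MHz CTM profile clock listed in the setup table, so this minimal Python sketch keeps the tick rate as an explicit parameter rather than assuming either value:

```python
# Sketch: convert profiled tick counts into microseconds.
# Assumption: the counter rate is passed in explicitly. The durations in the
# tables above correspond to 400 ticks per microsecond.
def ticks_to_us(ticks: int, tick_rate_mhz: float) -> float:
    """Convert a tick count to microseconds for a counter running at tick_rate_mhz."""
    return ticks / tick_rate_mhz


# (average ticks, max ticks) for the VIP capture driver rows above
vip_capture = {
    "Load per frame": (16664, 32020),
    "Queue": (2637, 6038),
    "DeQueue": (2441, 5646),
}

for row, (avg_ticks, max_ticks) in vip_capture.items():
    avg_us = ticks_to_us(avg_ticks, 400.0)
    max_us = ticks_to_us(max_ticks, 400.0)
    print(f"{row}: average {avg_us:.2f} us, max {max_us:.2f} us")
```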
VIP Capture to DSS Display Glass-to-Glass Latency Numbers
Setup Details
- TDA2xx EVM running the default video loopback application (OV sensor -> VIP -> DSS -> LCD)
- The OV sensor points at a separate monitor that displays a millisecond counter and refreshes at 60 Hz
- The LCD image and the original monitor are photographed together, side by side, with a separate digital still camera
- The glass-to-glass latency is the difference between the times shown on the LCD and on the monitor
With this method, the VIP-to-DSS glass-to-glass latency is observed to vary from 44 ms to 66 ms.
The explanation and breakdown of this observation are as follows:
- Capture runs at 30 FPS. This contributes 33.33 ms of latency, because the end-of-frame callback is used to trigger the display.
- Display runs at 60 FPS. Since the capture and display VSYNCs are not synchronized, the latency can vary from 0 to 16.66 ms. In addition, since the display frame rate is higher than the capture frame rate, the display repeats frames, adding another possible 0 to 16.66 ms of variation.
- Since the measurement is made by photographing a PC monitor that also runs at 60 FPS, quantization error can add a further 0 to 16.66 ms (i.e., the counter cannot display time at a granularity finer than 16.66 ms).
- The sensor and LCD latency should also be considered; from the measured and theoretical values above, it appears to be negligible.
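Adding the terms above gives a rough envelope for the expected latency: one fixed 33.33 ms capture period plus up to three 16.66 ms terms, i.e. roughly 33 ms to 83 ms, which contains the measured 44 ms to 66 ms range. A minimal Python sketch of that sum, treating sensor and LCD latency as zero as the last point suggests:

```python
# Sketch: bound the glass-to-glass latency using the terms listed above.
# Sensor and LCD latency are ignored, as the text suggests they are negligible.
CAPTURE_FPS = 30.0
DISPLAY_FPS = 60.0
MONITOR_FPS = 60.0  # PC monitor showing the millisecond counter


def period_ms(fps: float) -> float:
    """Frame period in milliseconds."""
    return 1000.0 / fps


def latency_envelope_ms() -> tuple:
    """Return (minimum, maximum) expected latency in milliseconds."""
    # Fixed term: the end-of-frame callback triggers display, so a full
    # capture frame always elapses first.
    fixed = period_ms(CAPTURE_FPS)
    # Variable terms, each ranging from 0 to one period.
    variable = [
        period_ms(DISPLAY_FPS),  # capture and display VSYNCs not synchronized
        period_ms(DISPLAY_FPS),  # display repeats frames (60 FPS vs 30 FPS capture)
        period_ms(MONITOR_FPS),  # counter quantization on the 60 Hz monitor
    ]
    return fixed, fixed + sum(variable)


low, high = latency_envelope_ms()
print(f"expected: {low:.2f} ms to {high:.2f} ms; measured: 44 ms to 66 ms")
```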
Video Display Driver
This section describes the display driver performance numbers: throughput and CPU load. The display drivers take video buffers from the application and display them on HDMI/LCD at the specified frame rate and resolution. The display drivers follow the FVID2 interface.
Video 1,2,3 and Graphics 1 Display Driver
Setup Details
- TDA2xx/TDA2Ex/TDA2Px EVM & TFC-S9700RTWV35TR-01 800x480 LCD from ThreeFive Corp
- TDA3xx EVM & LG LP101WX2 1280x800 LCD
Output Display (Resolution) | TDA2xx/TDA2Ex/TDA2Px (IPU1 Core0) Frame Rate (in Frames/sec) | TDA2xx/TDA2Ex/TDA2Px (IPU1 Core0) CPU Load (in %) | TDA3xx (IPU1 Core0) Frame Rate (in Frames/sec) | TDA3xx (IPU1 Core0) CPU Load (in %) |
---|---|---|---|---|
On/Off-Chip HDMI | 60 FPS (on-Chip HDMI) | 1% | 60 FPS (Off-Chip HDMI) | 1% |
LCD | 60 FPS | 1% | 60 FPS | 1% |
Buffer Queue Latency
The driver latency to program a buffer to the DSS is the code execution time from the application queue call to programming the buffer (T1), plus the duration of 5 display lines (T2). On the TDA2xx EVM, T1 is measured to be around 20 microseconds.
Display Resolution | T2 (in microseconds) |
---|---|
800x480@60fps | 158.25 |
1280x720@60fps | 107.74 |
1920x1080@60fps | 74.07 |
The total latency is around 180 us for an 800x480 @ 60 FPS display. So if a buffer is queued at least 180 us before VSYNC, it will be displayed in the next frame period.
Note: This measurement is done with the standalone display application. In a fully loaded system, interrupt latency adds to it.
Reason for the 5-line check: this check is required so that the driver does not program the buffer address around the display VSYNC period. Doing so would result in the DSS HW not accepting the programmed buffer, causing a frame drop.
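To make the T1 + T2 arithmetic concrete, here is a minimal Python sketch. It assumes T2 is 5 times the line period, where the line period is 1 / (frame rate x total lines per frame, including vertical blanking); the total line count is an input that depends on the panel timing and is not given in this document. For 1080p60 with the standard CEA-861 total of 1125 lines, this reproduces the 74.07 us in the table; the 800x480 example simply reuses the table's measured T2.

```python
# Sketch: estimate how early a buffer must be queued to appear in the next frame.
# T2 = time of 5 display lines; the line period depends on the total number of
# lines per frame including vertical blanking, which is an input here.
LINES_CHECKED = 5  # driver does not program a buffer within 5 lines of VSYNC


def t2_us(fps: float, total_lines_per_frame: int) -> float:
    """Duration of LINES_CHECKED display lines, in microseconds."""
    line_period_us = 1e6 / (fps * total_lines_per_frame)
    return LINES_CHECKED * line_period_us


def queue_lead_time_us(t1_us: float, t2: float) -> float:
    """Minimum time before VSYNC at which a queued buffer still makes the next frame."""
    return t1_us + t2


# 1080p60 with the standard CEA-861 timing of 1125 total lines per frame
print(f"1080p60 T2: {t2_us(60.0, 1125):.2f} us")
# 800x480@60: T2 taken from the table above, T1 of about 20 us as measured
print(f"800x480@60 lead time: {queue_lead_time_us(20.0, 158.25):.2f} us")
```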
DSS M2M Writeback Driver
This section describes the DSS M2M writeback driver performance numbers: throughput and CPU load. The DSS M2M writeback driver takes input video buffers from the application and writes the scaled/color-converted output to memory via the writeback path. The table below shows the DSS M2M driver performance at a DSS functional clock of 192 MHz.
Resolution | TDA2xx (IPU1 Core0) Max Frames per Sec | TDA2xx Mega Pixels per Sec | TDA2xx Hardware Utilization | TDA2xx CPU Load (in %) | TDA2Px (IPU1 Core0) Max Frames per Sec | TDA2Px Mega Pixels per Sec | TDA2Px Hardware Utilization | TDA2Px CPU Load (in %) | TDA2Ex (IPU1 Core0) Max Frames per Sec | TDA2Ex Mega Pixels per Sec | TDA2Ex Hardware Utilization | TDA2Ex CPU Load (in %) | TDA3xx (IPU1 Core0) Max Frames per Sec | TDA3xx Mega Pixels per Sec | TDA3xx Hardware Utilization | TDA3xx CPU Load (in %) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
VID1 1280x720 RGB888 to YUV422I 480P | 205 | 188 MP/s | 98% | 3% | 205 | 188 MP/s | 98% | 3% | 205 | 188 MP/s | 98% | 2% | 206 | 190 MP/s | 99% | 2% |
VID2 1280x720 RGB565 to 1280x720 YUYV422I | 205 | 188 MP/s | 98% | 7% | 205 | 188 MP/s | 98% | 9% | 205 | 188 MP/s | 98% | 2% | 206 | 189 MP/s | 98% | 2% |
VID1 1280x720 YUV422I to 1920X1080 RGB888 | 91 | 188 MP/s | 98% | 1% | 91 | 188 MP/s | 98% | 5% | 91 | 188 MP/s | 98% | 1% | 91 | 188 MP/s | 98% | 5% |
VID1 1920X1080 YUV422I to 1920X1080 RGB565 | 91 | 188 MP/s | 98% | 1% | 91 | 188 MP/s | 98% | 5% | 91 | 188 MP/s | 98% | 5% | 91 | 188 MP/s | 98% | 4% |
VID1 1920X1080 YUV420SP to 1280x720 YUV422I | 91 | 188 MP/s | 98% | 2% | 91 | 188 MP/s | 98% | 4% | 91 | 188 MP/s | 98% | 4% | 91 | 188 MP/s | 98% | 1% |
VID3 1920X1080 RGB888 to 1920X1080 YUV422I | 91 | 188 MP/s | 98% | 1% | 91 | 188 MP/s | 98% | 4% | 91 | 188 MP/s | 98% | 1% | NA | NA | NA | NA |
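The Mega Pixels per Sec and Hardware Utilization columns appear consistent with FPS x pixels per frame, compared against the 192 MHz DSS functional clock at one pixel per clock (for example, 205 x 1280 x 720 is about 188.9 MP/s, roughly 98% of 192 MHz). A minimal Python sketch, assuming the pixel count is the larger of input and output in each dimension and that 480P means 720x480 (both are assumptions, not stated in the table):

```python
# Sketch: mega pixels per second and hardware utilization for a DSS M2M case.
# Assumptions: the DSS processes 1 pixel per functional clock, the pixel count
# per frame is the larger of input and output in each dimension, and 480P
# means 720x480.
DSS_FCLK_MHZ = 192.0


def dss_m2m_rate(fps, in_w, in_h, out_w, out_h, fclk_mhz=DSS_FCLK_MHZ):
    """Return (MP/s, hardware utilization in %) for one DSS M2M configuration."""
    pixels_per_frame = max(in_w, out_w) * max(in_h, out_h)
    mpps = fps * pixels_per_frame / 1e6
    return mpps, 100.0 * mpps / fclk_mhz


# VID1 1280x720 RGB888 -> YUV422I 480P at 205 FPS
mpps, util = dss_m2m_rate(205, 1280, 720, 720, 480)
print(f"downscale: {mpps:.1f} MP/s, {util:.1f}% of the 192 MHz clock")
# VID1 1280x720 YUV422I -> 1920x1080 RGB888 at 91 FPS
mpps, util = dss_m2m_rate(91, 1280, 720, 1920, 1080)
print(f"upscale: {mpps:.1f} MP/s, {util:.1f}% of the 192 MHz clock")
```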
Calculating Performance for DSS M2M Writeback Driver
This section explains how to calculate the theoretical performance when more than one pipeline (with scaling) is overlaid and then written back, as in the example use case below (two scaled VID pipelines overlaid on a 1280x720 output). The main rules are:
- Count the clock cycles required based on the overlay output rather than any input pipeline fetch. To do this, split the overlay into sections and count the pixels of each section separately. Note that if any upscaling is done inside the WB, the WB output must be used for the calculation instead (as it will be bigger than the overlay output).
- On TDA3xx, the example use case then costs just one 720p frame with no additional overhead (there are minor overheads for the VID DMA prefetch and WB DMA flush, which are captured in the performance table).
- On TDA2xx, TDA2Px and TDA2Ex, there is a horizontal downscaling limitation: when downscaling by N, the VID pipe output (i.e., the overlay input) gets only 1 pixel every N clock cycles. This causes the performance difference between TDA2xx and TDA3xx shown in the table. If no downscaling is done in the VID pipelines, the results would be the same on TDA2xx and TDA3xx.
Inputs | Value |
---|---|
Overlay Width | 1280 |
Overlay Height | 720 |
VID1 Input Width | 1280 |
VID1 Input Height | 720 |
VID2 Input Width | 1280 |
VID2 Input Height | 720 |
VID1 Output Width | 640 |
VID1 Output Height | 480 |
VID2 Output Width | 640 |
VID2 Output Height | 480 |
TDA2xx/TDA2Px/TDA2Ex (VID1/VID2 downscale 1280 to 640 horizontally, so S = 2 for those sections):
Performance Section Split | Width in Pixels (W) | Height in Lines (H) | Downscaling Factor (S) | Required DSS Cycles (W x H x S) |
---|---|---|---|---|
VID DMA Prefetch (worst-case) | 2048 | 8 | 1 | 16,384 |
OVR: Section 1 - Top Blank | 1280 | 120 | 1 | 153,600 |
OVR: Section 2 - Bottom Blank | 1280 | 120 | 1 | 153,600 |
OVR: Section 3 - VID1 | 640 | 480 | 2 | 614,400 |
OVR: Section 4 - VID2 | 640 | 480 | 2 | 614,400 |
WB DMA Flush (worst-case) | 2048 | 8 | 1 | 16,384 |
Total Cycles per Frame | - | - | - | 1,568,768 |
Theoretical FPS (DSS Functional Clock 192MHz/Total Cycles) | - | - | - | 122 FPS |
TDA3xx (no VID downscaling limitation, so S = 1 for all sections):
Performance Section Split | Width in Pixels (W) | Height in Lines (H) | Downscaling Factor (S) | Required DSS Cycles (W x H x S) |
---|---|---|---|---|
VID DMA Prefetch (worst-case) | 2048 | 8 | 1 | 16,384 |
OVR: Section 1 - Top Blank | 1280 | 120 | 1 | 153,600 |
OVR: Section 2 - Bottom Blank | 1280 | 120 | 1 | 153,600 |
OVR: Section 3 - VID1 | 640 | 480 | 1 | 307,200 |
OVR: Section 4 - VID2 | 640 | 480 | 1 | 307,200 |
WB DMA Flush (worst-case) | 2048 | 8 | 1 | 16,384 |
Total Cycles per Frame | - | - | - | 954,368 |
Theoretical FPS (DSS Functional Clock 192MHz/Total Cycles) | - | - | - | 201 FPS |
Platform | Theoretical FPS (Worst case) | Measured FPS |
---|---|---|
TDA2xx | 122 FPS | 123 FPS |
TDA3xx | 201 FPS | 203 FPS |
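The cycle tables above can be reproduced with a short Python sketch: sum W x H x S over all sections and divide the 192 MHz DSS functional clock by the total. The section list mirrors the example (1280x720 overlay with two 640x480 VID windows); the only difference between the platforms is S for the two VID sections. Function names are illustrative only.

```python
# Sketch: theoretical DSS writeback frame rate from a per-section cycle count.
# Each section costs W x H x S DSS cycles, where S is the horizontal
# downscaling factor applied in the VID pipeline feeding that section.
DSS_FCLK_HZ = 192_000_000


def writeback_sections(vid_downscale):
    """Sections of the 1280x720 example overlay: (name, W, H, S)."""
    return [
        ("VID DMA prefetch (worst case)", 2048, 8, 1),
        ("OVR top blank", 1280, 120, 1),
        ("OVR bottom blank", 1280, 120, 1),
        ("OVR VID1", 640, 480, vid_downscale),
        ("OVR VID2", 640, 480, vid_downscale),
        ("WB DMA flush (worst case)", 2048, 8, 1),
    ]


def frame_cycles(sections):
    """Total DSS cycles per frame: sum of W x H x S over all sections."""
    return sum(w * h * s for _name, w, h, s in sections)


# S = 2 on TDA2xx/TDA2Px/TDA2Ex (1280 -> 640 horizontal downscale), 1 on TDA3xx
for platform, s in (("TDA2xx", 2), ("TDA3xx", 1)):
    cycles = frame_cycles(writeback_sections(s))
    print(f"{platform}: {cycles:,} cycles/frame -> {DSS_FCLK_HZ // cycles} FPS")
```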
Video Capture Driver
This section describes the video capture driver performance numbers: throughput and CPU load. The VIP capture driver uses the VIP hardware block to capture data from external video sources such as sensors and video decoders. The video data is captured from the external source by the VIP Parser sub-block of the VIP block. The VIP Parser then sends the captured data on for further processing in the VIP block, which can include color space conversion, scaling and chroma downsampling, before the video data is finally written to external DDR memory.
Setup Details
- TDA2xx/TDA2Ex/TDA2Px Base EVM + Vision App board or TDA3xx Base EVM
- Sensor - Omnivision OV10635
Video (Resolution) | TDA2xx/TDA2Ex/TDA2Px (IPU1 Core0) Field Rate per Channel (in Frames/sec) | TDA2xx/TDA2Ex/TDA2Px (IPU1 Core0) CPU Load (in %) | TDA3xx (IPU1 Core0) Field Rate per Channel (in Frames/sec) | TDA3xx (IPU1 Core0) CPU Load (in %) |
---|---|---|---|---|
1 CH 720P resolution | 30 | 1% | 30 | 1% |
VPE Memory to Memory Drivers
This section describes the memory-to-memory drivers' performance numbers: throughput and CPU load.
VPE M2M drivers take a video buffer from memory, optionally process it (the processing depends on the specific M2M driver), and write it back to memory. The M2M drivers follow the FVID2 interface.
This driver takes YUYV422/YUV420 interlaced/progressive input via the DEI path and provides a scaled version of the deinterlaced (or bypassed) input, with optional conversion to YUV422/YUV420/RGB output.
The performance is calculated as follows:
- Width to consider = MAX(In Width, Out Width)
- Height to consider = MAX(In Height, Out Height)
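Applying these two rules, the Mega Pixels per Sec and Hardware Utilization columns in the first table below can be reproduced as FPS x MAX width x MAX height, compared against the 266 MHz VPE clock (one pixel per clock). The multi-channel rows appear to report aggregate FPS across all channels (for example, 730 x 720 x 480 is about 252 MP/s for TC2001). A minimal Python sketch, assuming 1080P means 1920x1080:

```python
# Sketch: MP/s and hardware utilization for a VPE M2M test case, using
# width = MAX(in, out) and height = MAX(in, out) as stated above.
# Assumption: the FPS figure is aggregate across all channels in the test.
VPE_CLOCK_MHZ = 266.0  # VPE operating speed from the setup table


def vpe_rate(fps_total, in_w, in_h, out_w, out_h, clock_mhz=VPE_CLOCK_MHZ):
    """Return (MP/s, hardware utilization in %)."""
    mpps = fps_total * max(in_w, out_w) * max(in_h, out_h) / 1e6
    return mpps, 100.0 * mpps / clock_mhz


# TC0004 on TDA2xx: D1 720x480 -> 1080P (1920x1080) at 126 FPS
print(vpe_rate(126, 720, 480, 1920, 1080))
# TC2001 on TDA2xx: 4 channels of D1 -> CIF, 730 FPS in total
print(vpe_rate(730, 720, 480, 360, 240))
```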
Setup Details
- CPU idle: disabled
- The time required for a single scaler operation is measured; for CPU load, scaler operations are issued in a continuous loop, with a buffer queued for each scaling operation.
Scaling Factor (Resolution) | TDA2xx (IPU1 Core0) Max Frames per Sec | TDA2xx Mega Pixels per Sec | TDA2xx Hardware Utilization | TDA2xx CPU Load (in %) | TDA2Ex (IPU1 Core0) Max Frames per Sec | TDA2Ex Mega Pixels per Sec | TDA2Ex Hardware Utilization | TDA2Ex CPU Load (in %) | TDA2Px (IPU1 Core0) Max Frames per Sec | TDA2Px Mega Pixels per Sec | TDA2Px Hardware Utilization | TDA2Px CPU Load (in %) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 CH D1 (720x480) YUYV422I to CIF (360x240) YUYV422I with DEI OFF (TC0001) | 707 | 243 MP/s | 91% | 9% | 714 | 244 MP/s | 91% | 8% | 706 | 244 MP/s | 91% | 10% |
1 CH D1 (720x480) YUYV422I to 1080P YUYV422I with DEI OFF (TC0004) | 126 | 261 MP/s | 98% | 4% | 126 | 261 MP/s | 98% | 4% | 126 | 261 MP/s | 98% | 4% |
1 CH D1 (720x480) YUYV422I to CIF (360x240) YUYV422I with DEI ON (TC0021) | 692 | 238 MP/s | 89% | 11% | 700 | 239 MP/s | 89% | 11% | 691 | 239 MP/s | 89% | 12% |
4 CH D1 (720x480) YUYV422I to CIF (360x240) YUYV422I with DEI OFF (TC2001) | 730 | 252 MP/s | 94% | 5% | 733 | 252 MP/s | 94% | 5% | 733 | 252 MP/s | 93% | 4% |
8 CH D1 (720x480) YUYV422I to D1 (720x480) YUYV422I with DEI OFF (TC2002) | 736 | 254 MP/s | 95% | 3% | 738 | 254 MP/s | 95% | 5% | 738 | 255 MP/s | 95% | 3% |
4 CH WXGA (1280x800) YUV420SP_UV to 640x400 YUYV422I with DEI OFF (TC2007) | 252 | 258 MP/s | 96% | 2% | 253 | 258 MP/s | 96% | 4% | 252 | 258 MP/s | 96% | 3% |
6 CH WXGA (1280x800) YUYV422I to 640x400 YUYV422I with DEI OFF (TC2008) | 254 | 260 MP/s | 97% | 2% | 254 | 260 MP/s | 97% | 2% | 254 | 260 MP/s | 97% | 2% |
Scaling Factor (Resolution) | TDA2xx (IPU1 Core0) Max Frames per Sec | Mega Pixels per Sec | Hardware Utilization | CPU Load (in %) |
---|---|---|---|---|
1 CH D1 (720x480) YUYV422I to CIF (360x240) YUYV422I with DEI OFF (TC0001) | 802 | 277 MP/s | 91% | 8% |
1 CH D1 (720x480) YUYV422I to 1080P YUYV422I with DEI OFF (TC0004) | 142 | 295 MP/s | 97% | 4% |
1 CH D1 (720x480) YUYV422I to CIF (360x240) YUYV422I with DEI ON (TC0021) | 782 | 270 MP/s | 88% | 8% |
4 CH D1 (720x480) YUYV422I to CIF (360x240) YUYV422I with DEI OFF (TC2001) | 825 | 285 MP/s | 93% | 6% |
8 CH D1 (720x480) YUYV422I to D1 (720x480) YUYV422I with DEI OFF (TC2002) | 830 | 287 MP/s | 94% | 8% |
4 CH WXGA (1280x800) YUV420SP_UV to 640x400 YUYV422I with DEI OFF (TC2007) | 285 | 292 MP/s | 96% | 6% |
6 CH WXGA (1280x800) YUYV422I to 640x400 YUYV422I with DEI OFF (TC2008) | 286 | 293 MP/s | 96% | 2% |
Calculating Performance for VPE drivers
The description below is based on the actual performance seen with the SW drivers on actual silicon.
Performance of Scaler (SC) with DEI OFF
This applies to the TDA2xx VPE and TI814x (DEI-WB path).
Here the DEI, wherever applicable, is assumed to be in bypass mode.
When the DEI is not in bypass mode, the performance is described in the next section.
Each SC operates at a 266 MHz clock on TDA2xx and a 200 MHz clock on TI814x.
In theory it can process 1 pixel per clock, i.e.:
- about 266 megapixels per second (MP/s) on TDA2xx
- about 200 megapixels per second (MP/s) on TI814x
However, because of the inherent overhead of the overlap needed for the various filtering operations, the practical standalone speed (i.e., with only the SC running in the system) is:
- about 240-250 MP/s on TDA2xx
- about 180-190 MP/s on TI814x
When the SC runs together with other modules, such as other drivers or codecs, performance may drop further due to DDR bandwidth.
SW overheads also reduce SC performance, but with the TI BSP drivers we see very little impact from SW overheads.
Taking a typical use case, each SC can safely do:
- about 186 MP/s of processing on TDA2xx
- about 130 MP/s of processing on TI814x
The number of pixels processed when scaling one D1 channel of 720x480 at 30 frames per second is 720 x 480 x 30 (frames per second) = 10.37 MP/s.
Here the output from the SC is <= 720x480.
Thus the SC can safely do about 16 channels of D1 on TDA2xx and about 12 channels of D1 on TI814x when its output size is <= 720x480, i.e., when only downscaling is done in the scaler.
In practice, with BSP-only applications, we found that the measured SC performance is:
- about 22 D1 channels (about 236 MP/s) on TDA2xx
- about 13 D1 channels (about 140 MP/s) on TI814x
With other activity such as codecs running, performance will drop, but each SC can be expected to safely give:
- 20-channel D1 performance (200 MP/s) on TDA2xx
- 12-channel D1 performance (130 MP/s) on TI814x
When scaler upsampling is used, the results are a bit different.
For the use case of scaling 720x480 to a 960x540 output size, the load for one channel is:
960 x 540 (since 960x540 > 720x480) x 30 (frames per second) = 15.5 MP/s
On TDA2xx, assuming an SC performance of 200 MP/s, that is about 12 channels.
On TI814x, assuming an SC performance of 130 MP/s, that is about 8 channels.
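The channel estimates above follow one pattern: per-channel load = MAX width x MAX height x frame rate, and channel count = assumed SC throughput / per-channel load, rounded down. A minimal Python sketch of that arithmetic (function names are illustrative). The text's D1 channel counts are rounded approximations, so a plain floor division may differ from them by a channel or so; the upscaling figures match exactly.

```python
import math

# Sketch: estimate how many channels one scaler (SC) can sustain.
# The per-channel cost uses the larger of input and output size in each
# dimension, following the rule used in the text above.


def per_channel_mpps(in_w, in_h, out_w, out_h, rate_hz):
    """Mega pixels per second processed for one channel."""
    return max(in_w, out_w) * max(in_h, out_h) * rate_hz / 1e6


def channels_supported(budget_mpps, per_ch_mpps):
    """Whole channels that fit within the assumed SC throughput budget."""
    return math.floor(budget_mpps / per_ch_mpps)


d1_down = per_channel_mpps(720, 480, 360, 240, 30)  # D1 downscale, 30 fps
d1_up = per_channel_mpps(720, 480, 960, 540, 30)    # 720x480 -> 960x540, 30 fps

print(f"D1 downscale: {d1_down:.2f} MP/s per channel")
print(f"960x540 upscale: {d1_up:.2f} MP/s per channel")
print("TDA2xx @ 200 MP/s:", channels_supported(200, d1_up), "channels")
print("TI814x @ 130 MP/s:", channels_supported(130, d1_up), "channels")
```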
Performance of Scaler (SC) with DEI ON
This applies to the TDA2xx VPE and TI814x (DEI-WB path).
Each DEI operates at a 266 MHz clock on TDA2xx and a 200 MHz clock on TI814x.
In theory it can process 1 pixel per clock, i.e.:
- about 266 megapixels per second (MP/s) on TDA2xx
- about 200 megapixels per second (MP/s) on TI814x
However, because of the inherent overhead of the overlap needed for the various filtering operations, the practical standalone speed (with only the DEI running in the system) is:
- about 200-210 MP/s on TDA2xx
- about 150-160 MP/s on TI814x
When the DEI runs together with other modules, such as other drivers or codecs, performance may drop further due to DDR bandwidth.
SW overheads also reduce DEI performance, but with the TI BSP drivers we see very little impact from SW overheads.
Taking a DVR-type use case, each DEI can safely do:
- about 170 MP/s of processing on TDA2xx
- about 130 MP/s of processing on TI814x
The number of pixels processed when deinterlacing one D1 channel of 720x240 at 60 fields per second is:
720 x 240 x 2 (since the DEI turns 1 line into 2 lines) x 60 (fields per second) = 20.7 MP/s
Here the output from the DEI is <= 720x480.
Thus the DEI can safely do:
- about 8 channels of D1 on TDA2xx
- about 6 channels of D1 on TI814x
when its output size is <= 720x480, i.e., when only downscaling is done in the scaler after the DEI.
In practice, with BSP-only applications, we found that the measured DEI performance is:
- about 9-10 D1 channels (about 200 MP/s) on TDA2xx
- about 6-7 D1 channels (about 140 MP/s) on TI814x
With other activity such as codecs running, performance will drop, but each DEI can be expected to safely give:
- 8-channel D1 performance on TDA2xx
- 6-channel D1 performance on TI814x
The above applies when scaler downsampling is used after the DEI.
When scaler upsampling is used, the results are a bit different.
For the use case of a 960x540 output size, the load for one channel is:
960 x 540 (since 960x540 > 720x480) x 60 (fields per second) = 31.1 MP/s
On TDA2xx, assuming a DEI performance of 170 MP/s, that is about 5-6 channels.
On TI814x, assuming a DEI performance of 130 MP/s, that is about 4 channels.
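The DEI estimates use the same pattern as the scaler case, except that the deinterlaced frame has twice the field height and the rate is the field rate. A minimal Python sketch reproducing the D1 (8 and 6 channels) and 960x540 (5 and 4 channels) estimates from the 170 MP/s and 130 MP/s budgets; function names are illustrative only:

```python
import math

# Sketch: estimate how many channels one DEI can sustain.
# Deinterlacing turns each field line into two lines, so the deinterlaced
# frame is field_w x (2 * field_h); the larger of that and the scaler output
# drives the cost, evaluated at the field rate.


def dei_per_channel_mpps(field_w, field_h, fields_per_sec, out_w, out_h):
    """Mega pixels per second processed for one channel with DEI on."""
    width = max(field_w, out_w)
    height = max(2 * field_h, out_h)
    return width * height * fields_per_sec / 1e6


def channels_supported(budget_mpps, per_ch_mpps):
    """Whole channels that fit within the assumed DEI throughput budget."""
    return math.floor(budget_mpps / per_ch_mpps)


d1 = dei_per_channel_mpps(720, 240, 60, 360, 240)  # downscale after DEI
up = dei_per_channel_mpps(720, 240, 60, 960, 540)  # upscale to 960x540

for soc, budget in (("TDA2xx", 170), ("TI814x", 130)):
    print(soc, "D1:", channels_supported(budget, d1),
          "960x540:", channels_supported(budget, up))
```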
ISS Drivers
ISS Capture Driver (CAL)
The ISS captures video streams via its CAL sub-block, which provides interfaces to capture via MIPI CSI-2 and parallel ports. It is typically used to capture streams from sensors such as the OmniVision OV10640, Aptina AR0132 and Aptina AR0140. To measure the performance, a RAW12 video stream at 30 FPS is captured from the OV10640 and written to memory.
Setup Details
- TDA3xx/TDA2Px EVM
- Sensor - Omnivision OV10640, Data Format as RAW 12
Video (Resolution) | TDA3xx/TDA2Px (IPU1 Core0) Field Rate per Channel (in Frames/sec) | TDA3xx/TDA2Px (IPU1 Core0) CPU Load (in %) |
---|---|---|
1 CH 720P resolution | 30 | < 1% |
ISS M2M ISP WDR Driver
This driver takes a companded RAW12 video frame and performs 2-pass processing. In pass 1, the low exposure is processed; in pass 2, the high exposure is processed and merged with the low exposure. The processed frame is written to memory in YUV420SP (NV12) data format.
Setup Details
- Input frame RAW12
- Output YUV420 SP (NV12)
Use Case | Image Width/Height | TDA3xx FPS (212 MHz) | TDA2Px FPS, OPP Norm (355 MHz) | TDA2Px FPS, OPP OD (450 MHz) | TDA2Px FPS, OPP High (550 MHz) |
---|---|---|---|---|---|
ISP 2 Pass WDR Flow: Pass 1 | 1280X960 | 143 | 249 | 296 | 366 |
ISP 2 Pass WDR Flow: Pass 2 | 1280X960 | 140 | 243 | 290 | 360 |
LDC Bi Cubic | 1920X1080 | 52 | 83 | 101 | 125 |
LDC Bi Linear | 1280X960 | 100 | 164 | 195 | 227 |
ISS CALB M2M Driver
This driver takes a video frame in 12-bit packed MIPI format and converts it to 12-bit unpacked linear format, which can be used for further processing.
Setup Details
- TDA2Px EVM
- Input: MIPI format, 12-bit packed
- Output: linear format, 12-bit unpacked
Image Resolution | OPP (ISS Clock) | FPS | Byte Rate |
---|---|---|---|
1280X720 | Nom (355MHz) | 642 | 1.1 GBps |
1280X720 | High (550MHz) | 844 | 1.44 GBps |
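The source does not say how the Byte Rate column is derived. One interpretation that reproduces both values is FPS x width x height x 2 bytes (12-bit samples unpacked into 16-bit containers), expressed in binary gigabytes per second (for example, 642 x 1280 x 720 x 2 bytes is about 1.10 x 2^30 bytes/s). The minimal Python sketch below is built only on that assumption and may not match how the numbers were actually computed:

```python
# Sketch: one possible reading of the byte-rate column of the CALB M2M table.
# Assumptions (not stated in the source): the figure counts the unpacked
# output at 2 bytes per pixel (12-bit samples in 16-bit containers) and is
# expressed in binary gigabytes (2**30 bytes).
def calb_output_gib_per_sec(fps, width, height, bytes_per_pixel=2):
    """Output data rate in GiB/s for the given frame rate and resolution."""
    return fps * width * height * bytes_per_pixel / 2**30


for opp, fps in (("Nom (355MHz)", 642), ("High (550MHz)", 844)):
    print(f"{opp}: {calb_output_gib_per_sec(fps, 1280, 720):.3f} GiB/s")
```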
UART Driver
This section describes the UART driver performance numbers: throughput and CPU load. The UART driver is used to transfer data to and from the UART terminal, and follows the BIOS GIO/IOM driver model.
Setup Details
- The time and CPU load required for UART transfer operations are measured by issuing GIO_submit operations in a continuous loop, with the following test parameters:
- Instance : UART1
- Baudrate : 115200
- Stop Bits : 1
- Parity : None
- Character Length : 8 bits
- Bytes per GIO Submit : 138
Test Case | TDA2xx (IPU1 Core0) TX Bytes per Second | TDA2xx Hardware Utilization | TDA2xx CPU Load (in %) | TDA2Px (IPU1 Core0) TX Bytes per Second | TDA2Px Hardware Utilization | TDA2Px CPU Load (in %) | TDA2Ex (IPU1 Core0) TX Bytes per Second | TDA2Ex Hardware Utilization | TDA2Ex CPU Load (in %) | TDA3xx (IPU1 Core0) TX Bytes per Second | TDA3xx Hardware Utilization | TDA3xx CPU Load (in %) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Polled Mode, FIFO Enable (TC_00102) | 11416 BP/s | 99% | 71% | 11416 BP/s | 99% | 71% | 11416 BP/s | 99% | 70% | 11416 BP/s | 99% | 80% |
Polled Mode, FIFO Disable (TC_00132) | 1000 BP/s | 8% | 2% | 1000 BP/s | 8% | 2% | 1000 BP/s | 8% | 2% | 1000 BP/s | 8% | 2% |
Interrupt Mode, FIFO Enable, TX Trigger Level 56 bytes (TC_00202) | 11450 BP/s | 99% | 4% | 11450 BP/s | 99% | 4% | 11450 BP/s | 99% | 4% | 11451 BP/s | 99% | 4% |
Interrupt Mode, FIFO Disable (TC_00232) | 11451 BP/s | 99% | 13% | 11451 BP/s | 99% | 13% | 11451 BP/s | 96% | 11% | 11449 BP/s | 96% | 12% |
Interrupt Mode, FIFO Enable, TX Trigger Level 8 bytes (TC_00241) | 11450 BP/s | 99% | 3% | 11450 BP/s | 99% | 3% | 11450 BP/s | 99% | 3% | 11450 BP/s | 99% | 3% |
Interrupt Mode, FIFO Enable, TX Trigger Level 16 bytes (TC_00242) | 11451 BP/s | 99% | 2% | 11451 BP/s | 99% | 2% | 11451 BP/s | 99% | 2% | 11451 BP/s | 99% | 2% |
Interrupt Mode, FIFO Enable, TX Trigger Level 32 bytes (TC_00243) | 11451 BP/s | 99% | 2% | 11451 BP/s | 99% | 2% | 11451 BP/s | 99% | 2% | 11451 BP/s | 99% | 2% |
DMA Mode, FIFO Enable, TX Trigger Level 56 bytes (TC_00302) | 11450 BP/s | 99% | 2% | 11450 BP/s | 99% | 2% | 11450 BP/s | 99% | 1% | 11450 BP/s | 99% | 2% |
DMA Mode, FIFO Disable (TC_00332) | 11450 BP/s | 99% | 1% | 11450 BP/s | 99% | 1% | 11449 BP/s | 99% | 1% | 11450 BP/s | 99% | 1% |
DMA Mode, FIFO Enable, TX Trigger Level 8 bytes (TC_00341) | 11450 BP/s | 99% | 1% | 11450 BP/s | 99% | 1% | 11450 BP/s | 99% | 1% | 11451 BP/s | 99% | 1% |
DMA Mode, FIFO Enable, TX Trigger Level 16 bytes (TC_00342) | 11450 BP/s | 99% | 1% | 11450 BP/s | 99% | 1% | 11450 BP/s | 99% | 1% | 11450 BP/s | 99% | 1% |
DMA Mode, FIFO Enable, TX Trigger Level 32 bytes (TC_00343) | 11450 BP/s | 99% | 1% | 11450 BP/s | 99% | 1% | 11450 BP/s | 99% | 1% | 11450 BP/s | 99% | 2% |
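The Hardware Utilization column can be related to the theoretical UART throughput. With 8 data bits, no parity and 1 stop bit, each character takes 10 bits on the line (including the start bit), so 115200 baud allows at most 11520 bytes/s; 11450 bytes/s is therefore about 99% and 1000 bytes/s about 8.7%. A minimal Python sketch of that calculation (function names are illustrative, not part of the PDK):

```python
# Sketch: theoretical UART throughput and the utilization column above.
# An asynchronous UART character is 1 start bit + data bits + optional
# parity bit + stop bits.
def uart_max_bytes_per_sec(baud, data_bits=8, parity_bits=0, stop_bits=1):
    """Upper bound on payload bytes per second for the given line settings."""
    bits_per_char = 1 + data_bits + parity_bits + stop_bits
    return baud / bits_per_char


def uart_utilization_percent(measured_bytes_per_sec, baud=115200):
    """Measured throughput as a percentage of the 8N1 line limit."""
    return 100.0 * measured_bytes_per_sec / uart_max_bytes_per_sec(baud)


print(uart_max_bytes_per_sec(115200))             # 8N1 at 115200 baud
print(f"{uart_utilization_percent(11450):.1f}%")  # interrupt/DMA modes
print(f"{uart_utilization_percent(1000):.1f}%")   # polled, FIFO disabled
```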
CRC CSL-FL
This section describes the CRC CSL-FL performance numbers (throughput). CRC CSL-FL is used to generate a CRC signature, which can be used to perform memory checks that verify the integrity of the memory system.
CONFIGURATION | PROCESSOR | TRANSFER SIZE | THROUGHPUT |
---|---|---|---|
EDMA used, pattern / EDMA ACnt = 8 bytes, cache enabled | M4 | 1800 KB | 486 MB/s |
EDMA used, pattern / EDMA ACnt = 8 bytes, cache enabled | DSP | 1800 KB | 489 MB/s |
DCAN CSL-FL
This section describes the DCAN CSL-FL performance numbers (throughput). The DCAN driver is used to transfer data between CAN nodes. It also configures ECC for the message RAM.
PLATFORM | CONFIGURATION | PROCESSOR | BAUDRATE | MESSAGES TRANSMITTED PER SEC | MESSAGE SIZE | HW UTILIZATION |
---|---|---|---|---|---|---|
TDA3xx | Cache - Enabled | M4 | 1Mbit/sec | 7237 | 128 bits | 92% |
TDA2xx | Cache - Enabled | M4 | 1Mbit/sec | 7237 | 128 bits | 92% |
TDA2xx | Cache - Enabled | A15 | 1Mbit/sec | 7237 | 128 bits | 92% |
TDA2Ex | Cache - Enabled | M4 | 1Mbit/sec | 7237 | 128 bits | 92% |
TDA2Ex | Cache - Enabled | A15 | 1Mbit/sec | 7237 | 128 bits | 92% |
TDA2Px | Cache - Enabled | M4 | 1Mbit/sec | 7237 | 128 bits | 92% |
TDA2Px | Cache - Enabled | A15 | 1Mbit/sec | 7237 | 128 bits | 92% |
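The HW UTILIZATION column appears consistent with the fraction of the 1 Mbit/sec bus occupied by the listed messages: 7237 messages/s x 128 bits is about 0.926 Mbit/s, or roughly 92%. A minimal Python sketch, assuming the 128-bit MESSAGE SIZE is the per-message bit count used in that calculation:

```python
# Sketch: bus utilization as the share of the bit rate carried by messages.
# Assumption: bits_per_msg is the on-bus size used for the table (128 bits).
def can_bus_utilization_percent(msgs_per_sec, bits_per_msg, bit_rate_bps):
    """Percentage of the bus bit rate consumed by the transmitted messages."""
    return 100.0 * msgs_per_sec * bits_per_msg / bit_rate_bps


print(f"{can_bus_utilization_percent(7237, 128, 1_000_000):.1f}%")
```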
MCAN CSL-FL
This section describes the MCAN CSL-FL performance numbers (throughput). The MCAN driver is used to transfer data between CAN FD nodes. It also configures ECC for the message RAM.
Setup
- Platform: TDA3xx/TDA2Px EVM
- Frame Type: Standard (11bit) ID CAN FD Frame
- Payload: 64 bytes
- Nominal Baud rate: 1 Mbps
- Data Phase Baud rate: 5 Mbps
- Cache: Enabled
- Test Type: Both Tx and Rx
Performance Details
- Number of messages per second: 5658
- HW utilization: 76%
MMCSD CSL-FL
This section describes the MMCSD CSL-FL performance numbers (throughput). The MMCSD driver is used to read data from and write data to the MMC/SD card. It is tested with fatlib as part of the file I/O use case of Processor SDK Vision, which reads the AppImage file from the SD card and writes it back to the SD card. Performance is measured using timestamps and the size of the AppImage.
Setup
- Platform: TDA2xx
- CPU: Cortex M4
- D-Cache: Disabled
- I-cache: Enabled
Performance Details
- The transfer speed depends on the class of the SD card used.
- With a class 10 SD card, the read speed is 4.5 MBps and the write speed is 1.5 MBps.
PCIe CSL-FL
This section describes the PCIe CSL-FL performance numbers (throughput). The PCIe driver is used for board-to-board communication over a single lane. PCIe Gen1 supports 2.5 Gbps and Gen2 supports 5.0 Gbps.
Setup
- Platform: TDA2xx ES2.0 EVMs on both ends of the link
- Lane: Single
- Data Buffer Transferred: 16 MB
- CPU: Cortex A15
- D-Cache: Disabled
- I-cache: Enabled
- EDMA Params: A-count=0x4000, B-count=0x400, C-count=1
- Polling Method
Performance Details
- Gen1 speed is 184 MBps.
- Gen2 speed is 370 MBps.
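As a rough sanity check, PCIe Gen1 and Gen2 both use 8b/10b line encoding, so a single lane carries at most 250 MB/s (Gen1) or 500 MB/s (Gen2) of data per direction before packet overheads; the measured 184 MBps and 370 MBps are therefore about 74% of those limits. The EDMA parameters also account for the buffer size: A-count x B-count x C-count = 0x4000 x 0x400 x 1 = 16 MiB. A minimal Python sketch of both calculations, assuming MB means 10^6 bytes for the throughput figures:

```python
# Sketch: compare the measured PCIe throughput with the per-lane limit.
# PCIe Gen1 (2.5 GT/s) and Gen2 (5.0 GT/s) both use 8b/10b line encoding, so
# each data byte costs 10 bits on the wire. Packet (TLP/DLLP) overheads are
# ignored here, so real throughput is always lower than this limit.
LINE_RATE_GBPS = {"Gen1": 2.5, "Gen2": 5.0}
MEASURED_MBPS = {"Gen1": 184.0, "Gen2": 370.0}  # from the results above


def lane_limit_mbps(gen: str) -> float:
    """Per-lane, per-direction data rate in MB/s (10**6 bytes) after 8b/10b."""
    return LINE_RATE_GBPS[gen] * 1e9 / 10 / 1e6


def edma_transfer_bytes(acnt: int, bcnt: int, ccnt: int) -> int:
    """Total bytes moved by an EDMA transfer of ACNT x BCNT x CCNT."""
    return acnt * bcnt * ccnt


print("buffer size:", edma_transfer_bytes(0x4000, 0x400, 1), "bytes")
for gen in ("Gen1", "Gen2"):
    limit = lane_limit_mbps(gen)
    print(f"{gen}: limit {limit:.0f} MB/s, measured {MEASURED_MBPS[gen]:.0f} MB/s "
          f"({100 * MEASURED_MBPS[gen] / limit:.1f}% of limit)")
```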
Archived
- PDK TDA Datasheet 1.07.00
- PDK TDA Datasheet 1.08.00
- PDK TDA Datasheet 1.08.01
- PDK TDA Datasheet 1.09.00
- PDK TDA Datasheet 1.10.00
- PDK TDA Datasheet 1.10.01
- PDK TDA Datasheet 1.10.02