TI811X-HDVPSS-01.00.01.44 Feature Performance Guide
Feature performance guide for HDVPSS release 01.00.01.44
TI811x HD-VPSS Drivers
This section provides a brief overview of the device drivers supported in the HDVPSS release. Drivers are mainly classified into three categories:
- Display Drivers
- Capture Drivers
- Memory-to-Memory (M2M) Drivers
HDVPSS Driver Features
- Most of the drivers run on the VPSS-M3 core under the BIOS operating system and expose the FVID2 interface.
- Display (V4L2) and fbdev drivers are supported on the Cortex-A8 core, with Linux as the operating system, through a proxy server.
- Ships with sample applications and documentation.
VPDMA List Usage
VPDMA has 8 lists, which are shared across all drivers:
Driver | DMA usage |
---|---|
Display | One List for each TV output used |
M2M | Depends on the path used (1-6 lists) |
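
The list budget implied by the table can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function names and the idea of passing one list count per open M2M path are assumptions made for this example and are not part of the HDVPSS driver API.

```python
# Hypothetical helpers for illustration; not part of the HDVPSS driver API.
VPDMA_TOTAL_LISTS = 8  # lists shared across all drivers (from the table above)


def vpdma_lists_needed(tv_outputs, m2m_path_lists):
    """Return the number of VPDMA lists a configuration would consume.

    tv_outputs     -- number of TV outputs in use (one list each)
    m2m_path_lists -- list count for each open M2M path (1 to 6 per path)
    """
    for count in m2m_path_lists:
        if not 1 <= count <= 6:
            raise ValueError("an M2M path uses between 1 and 6 lists")
    return tv_outputs + sum(m2m_path_lists)


def fits_in_vpdma(tv_outputs, m2m_path_lists):
    """True if the configuration stays within the 8 shared lists."""
    return vpdma_lists_needed(tv_outputs, m2m_path_lists) <= VPDMA_TOTAL_LISTS


# Two TV outputs plus two M2M paths needing 1 and 3 lists.
print(vpdma_lists_needed(2, [1, 3]), fits_in_vpdma(2, [1, 3]))
```

The actual number of lists an M2M path needs depends on the path, so the per-path counts have to come from the driver documentation.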
Setup Details
Details | Item | TI811X |
---|---|---|
SoC Details | Core | VPSS-M3 |
 | Operating speed of Core | 200 MHz |
 | Operating speed of HD-VPSS | 200 Mpixels/sec |
 | EVM Configuration | Ducati, HDVPSS, EMIF, DDR2 |
Optimization Details | Is the Ducati cache enabled? | Yes |
 | Profile | whole program debug |
 | Is the code and data placed in L2/L3 memory? | No |
 | Is the L3 interconnect optimized? | No |
Video Display Drivers
This section describes the display drivers' performance numbers - throughput and CPU load.
Introduction
Display drivers take video buffers from the application and display them on the VENCs at the specified frame rate and resolution. Display drivers follow the FVID2 interface.
Bypass Path 0/1 and Secondary 1 Path Display Driver
The bypass path display driver controls the two bypass paths in the hardware. It configures the hardware only up to the muxes. The rest of the hardware below the mux/switch, such as CIG, COMP and VENC, is controlled by the display controller driver.
Setup Details
- TI811x EVM
- TV
- DVD Player
Output Display (Resolution) | TI811x From VPSS-M3: Frame Rate (in Frames/sec) | TI811x From VPSS-M3: CPU Load (in %) |
---|---|---|
Off-Chip HDMI - DVO1 (With Hardware Mosaic) | 60 FPS for 1080I60, 1080P60 and 50 FPS for 1080P50 | 2% |
Off-Chip HDMI - DVO1 (With Hardware Mosaic) | 50 FPS for 1080P50 and 30 FPS for 1080P30 | 1% |
DVO2 (With Hardware Mosaic) | NRY | NRY |
Graphics Path 0/1/2 Driver
The graphics path display driver controls the three graphics paths in the hardware to display graphics planes, including multi-region support. The rest of the hardware below, such as COMP and VENC, is controlled by the display controller driver.
Output Display (Resolution) | TI811x VPSS-M3: Frame Rate (in Frames/sec) | TI811x VPSS-M3: CPU Load (in %) | TI811x Cortex-A8: Frame Rate (in Frames/sec) | TI811x Cortex-A8: CPU Load M3 (in %) | TI811x Cortex-A8: CPU Load A8 (in %) |
---|---|---|---|---|---|
DVO1 | 60 FPS for 1080P60, 1080I60, 720P60 and 50 FPS for 1080P50, 1080I50, 720P50 | 2% | NRY | NRY | NRY |
DVO1 | 30 FPS for 1080P30 | 1% | NRY | NRY | NRY |
DVO2 | NRY | NRY | NRY | NRY | NRY |
Video Capture Driver
This section describes the video capture driver performance numbers - throughput and CPU load.
Introduction
The VIP capture driver uses the VIP hardware block in HDVPSS to capture data from an external video source such as a video decoder (for example, TVP5158 or TVP7002). The video data is captured from the external source by the VIP Parser sub-block of the VIP block. The VIP Parser then sends the captured data for further processing in the VIP block, which can include color space conversion, scaling and chroma downsampling, before the video data is finally written to external DDR memory.
Setup Details
- TI811x EVM
- TV
- DVD Player
Output Display (Resolution) | TI811x M3 Core: Frame Rate (in Frames/sec) | TI811x M3 Core: CPU Load (in %) |
---|---|---|
NTSC single in | 60 | 3% |
Memory to Memory Drivers
This section describes the memory-to-memory drivers' performance numbers - throughput and CPU load.
Introduction
M2M drivers take a video buffer from memory, optionally process it (the processing depends on the specific M2M driver), and put it back in memory. M2M drivers follow the FVID2 interface for applications.
Secondary 0 Or Bypass path 0/1 to SC5 and Sec 0/1 to SC3/SC4 M2M driver
This driver takes video data from one of the three paths (SEC0/BP0/BP1), scales it (SC5), and writes the output video to memory. Other variants take data from secondary path 0/1 (SEC0/SEC1), scale it via the VIP path scalers (SC3/SC4), and write the output video to memory.
Setup Details:
- Throughput is calculated from the time required for a single scaling operation. For CPU load, scaling operations are issued in a continuous loop, with a buffer queued for each resize.
Scaling Factor (Resolution) | TI811x VPSS-M3: Frames per Sec | TI811x VPSS-M3: CPU Load (in %) |
---|---|---|
SEC0-SC5 Single Ch (720x480 YUV420 => 720x480 YUV422 interleaved) | NRY | NRY |
SEC0-SC5 Single Ch 4x (720x480 YUV420 => 1920x1080 YUV422 interleaved) | 94 | 2% |
SEC0-SC5 1/4x (1920x1080 YUV420 => 720x480 YUV422 interleaved) | 94 | 1% |
BP0-SC5 Single Ch 4x (720x480 YUV420 => 1920x1080 YUV422 interleaved) | NRY | NRY |
BP1-SC5 Single Ch 4x (720x480 YUV420 => 1920x1080 YUV422 interleaved) | NRY | NRY |
SEC0-SC3-VIP0 Single Ch 4x (720x480 YUV420 => 1920x1080 YUV420) | NRY | NRY |
SEC1-SC4-VIP1 Single Ch 4x (720x480 YUV420 => 1920x1080 YUV420) | 94 | 2% |
MultiCh - 3Ch (720x480 => 720xXXX) | NRY | NRY |
8D1@60fps | NRY | NRY |
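
To relate the measured frame rates to the pixel throughput figures used later in this guide, the frame rate can be converted to megapixels per second. The sketch below is a rough illustration only: the helper name is hypothetical, and counting pixels on the larger 1920x1080 frame is an assumption borrowed from the scaler calculation later in this guide.

```python
def throughput_mpixels(width, height, frames_per_sec):
    """Pixel throughput in megapixels per second (1 MP = 1e6 pixels)."""
    return width * height * frames_per_sec / 1e6


# 94 frames/sec is the figure measured for the 720x480 => 1920x1080 rows.
# Assumption: the larger (1920x1080) frame determines the pixel count.
sc_4x = throughput_mpixels(1920, 1080, 94)
print(f"{sc_4x:.1f} MP/s")
```

Under that assumption the result is close to the 200 MP/s upper bound of a block that processes one pixel per clock at 200 MHz.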
SubFrame level processing in Secondary 0 Or Bypass path 0/1 to SC5 M2M driver
This driver takes video data from one of the three paths (SEC0/BP0/BP1), scales it (SC5) subframe by subframe, and writes the output video to memory. Each frame is divided into multiple subframes and processed.
Setup Details:
- Throughput is calculated from the time required for a single scaling operation. For CPU load, scaling operations are issued in a continuous loop, with a buffer queued for each resize.
Scaling Factor (Resolution) | TI811x VPSS-M3: Frames per Sec | TI811x VPSS-M3: CPU Load (in %) |
---|---|---|
SEC0-SC5 Single Ch 4x (720x480 YUV420 => 1920x1080 YUV422 interleaved) | 83 | 14% |
BP0-SC5 Single Ch 4x (720x480 YUV420 => 1920x1080 YUV422 interleaved) | NRY | NRY |
BP1-SC5 Single Ch 4x (720x480 YUV420 => 1920x1080 YUV422 interleaved) | NRY | NRY |
DEI M2M Driver
This driver takes YUYV422/YUV420 interlaced/progressive input via the DEI path and provides one or two scaled versions of the deinterlaced/bypassed output: one via writeback path 0/1 and another via VIP 0/1.
Setup Details
- CPU Idle - Disabled
- Tool Used for measurement - LFTB
- Throughput is calculated from the time required for a single resize operation. For CPU load, resize operations are issued in a continuous loop, with a buffer queued for each resize.
Scaling Factor (Resolution) | TI811x VPSS-M3: Frames per Sec | TI811x VPSS-M3: CPU Load (in %) |
---|---|---|
DEI-WB1 - Single Ch 720x240 YUV420 => scaled to 360x240 YUYV422 via WB1 | NRY | NRY |
DEI-WB1-VIP1 - Single Ch 720x240 YUV420 => dual scaled to 360x240 YUYV422 via WB1 and 720x480 YUV420 via VIP1 | NRY | NRY |
Single o/p writeback path 4x (720x480 => 1920x1080) | NRY | NRY |
Single o/p VIP path 4x (720x480 => 1920x1080) | NRY | NRY |
Single o/p writeback path 1/4x (1920x1080 => 720x480) | NRY | NRY |
Single o/p VIP path 1/4x (1920x1080 => 720x480) | NRY | NRY |
Noise Filter (NSF) M2M Driver
The noise filter driver lets the user filter noise from video data by processing it through the noise filter hardware. The driver can also be used purely for YUV422 to YUV420 chroma downsampling.
Mode | TI811x VPSS-M3: Frames per Sec | TI811x VPSS-M3: CPU Load (in %) |
---|---|---|
Single Ch Chroma downsampling (640x480 YUV422 interleaved => 640x480 YUV420 Semiplanar) | 502 | 5% |
NF spatial (1080P input) | NRY | NRY |
NF temporal (1080P input) | NRY | NRY |
MultiCh NF spatial (480P input) | NRY | NRY |
MultiCh NF temporal (480P input) | NRY | NRY |
16 Ch Chroma downsampling (720x240 YUV422 interleaved => 720x240 YUV420 Semiplanar) | NRY | NRY |
Calculating Performance for different Memory to memory paths
The description below is based on the performance measured with the SW drivers on actual silicon.
Performance of Scalar (SC) Path
This applies to all SCs in TI811x.
DEI, wherever applicable, is assumed to be in bypass mode here.
Performance when DEI is not in bypass mode is described in the DEI section below.
Each SC operates at a 200 MHz clock.
In theory it can process 1 pixel per clock, i.e. about 200 megapixels per second (MP/s).
However, because of the overlap needed for the various filtering operations, the practical standalone speed (i.e. with only the SC running in the system) is about 180-190 MP/s.
When the SC runs alongside other modules, such as other drivers or codecs, performance may drop further because of DDR bandwidth.
SW overheads also reduce SC performance, but with the TI HDVPSS driver we see very little impact from them. With SW overheads included, the SC can safely process about 130 MP/s.
The number of pixels processed when scaling one D1 CH of 720x480 @ 30 frames per second is 720x480x30 (frames per second) = 10.4 MP/s.
Here the output from the SC is <= 720x480.
Thus an SC can safely handle about 12 CHs of D1 when its output size is <= 720x480, i.e. when only downscaling is done in the scaler.
In practice, with HDVPSS-only applications, we measured SC performance of about 13 D1 CHs (about 140 MP/s).
With other activity such as codecs, performance should drop, but each SC will safely give 12 CH D1 performance (130 MP/s).
When the scaler is used for upscaling, the results are a bit different.
For the use case of scaling 720x480 to a 1920x1080 output size, the load for 1 CH would be
1920x1080 (since 1920x1080 > 720x480) x 30 (frames per second) = 62.2 MP/s.
In TI811x, assuming SC performance is 130 MP/s, that is about 2 CHs.
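The channel estimates above can be reproduced with a short calculation. This is a minimal sketch, assuming the 130 MP/s safe figure and the rule that the larger of the input and output frames sets the pixel count. The function names are hypothetical and not part of any TI API.

```python
SC_SAFE_MPIXELS = 130.0  # safe SC throughput quoted above, in MP/s


def sc_load_mpixels(in_w, in_h, out_w, out_h, fps):
    """Pixel rate one channel places on a scaler, in MP/s.

    The larger of the input and output frames is counted, following the
    "since 1920x1080 > 720x480" rule used in the text.
    """
    pixels = max(in_w * in_h, out_w * out_h)
    return pixels * fps / 1e6


def sc_channels(in_w, in_h, out_w, out_h, fps, budget=SC_SAFE_MPIXELS):
    """Whole channels that fit in the given throughput budget."""
    return int(budget // sc_load_mpixels(in_w, in_h, out_w, out_h, fps))


d1_downscale = sc_channels(720, 480, 360, 240, 30)    # output <= 720x480
d1_to_1080p = sc_channels(720, 480, 1920, 1080, 30)   # upscaling case
print(d1_downscale, d1_to_1080p)  # prints: 12 2
```

Passing a different `budget`, for example a figure measured in a system with codecs running, gives the corresponding channel count for that system.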
Performance of DEI
Each DEI operates at a 200 MHz clock in TI811x.
In theory it can process 1 pixel per clock, i.e. about 200 megapixels per second (MP/s).
However, because of the overlap needed for the various filtering operations, the practical standalone speed (with only the DEI running in the system) is about 150-160 MP/s.
When the DEI runs alongside other modules, such as other drivers or codecs, performance may drop further because of DDR bandwidth.
SW overheads also reduce DEI performance, but with the TI HDVPSS drivers we see very little impact from them. With SW overheads included, the DEI can safely process about 130 MP/s.
The number of pixels processed when deinterlacing one D1 CH of 720x240 @ 60 fields per second is
720x240x2 (since DEI turns 1 line into two lines) x 60 (fields per second) = 20.7 MP/s.
Here the output from the DEI is <= 720x480.
Thus a DEI can safely handle about 6 CHs of D1 in TI811x
when its output size is <= 720x480, i.e. when only downscaling is done in the scaler after the DEI.
In practice, with HDVPSS-only applications, we measured DEI performance of about 6-7 D1 CHs (about 140 MP/s).
With other activity such as codecs, performance should drop, but each DEI will safely give 6 CH D1 performance.
The above applies when the scaler after the DEI is used for downscaling.
When the scaler is used for upscaling, the results are a bit different.
For the use case of a 704x480 output size, the load for 1 CH would be
704x480 (since 704x480 > 720x240) x 60 (fields per second) = 20.3 MP/s.
Assuming DEI performance is 130 MP/s, that is about 6 CHs.
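The two DEI cases above use slightly different pixel counts, so the sketch below keeps them as separate helpers. It is a rough illustration under the 130 MP/s assumption, and all function names are hypothetical.

```python
DEI_SAFE_MPIXELS = 130.0  # safe DEI throughput quoted above, in MP/s


def dei_load_from_fields(field_w, field_h, fields_per_sec):
    """MP/s when the load is counted on the deinterlaced input.

    Each field is counted twice because deinterlacing turns every
    input line into two output lines.
    """
    return field_w * field_h * 2 * fields_per_sec / 1e6


def dei_load_from_output(out_w, out_h, fields_per_sec):
    """MP/s when the load is counted on the output frame size."""
    return out_w * out_h * fields_per_sec / 1e6


def channels(load_mpixels, budget=DEI_SAFE_MPIXELS):
    """Whole channels that fit in the throughput budget."""
    return int(budget // load_mpixels)


d1_load = dei_load_from_fields(720, 240, 60)    # 20.736 MP/s
out_load = dei_load_from_output(704, 480, 60)   # 20.2752 MP/s
print(channels(d1_load), channels(out_load))    # prints: 6 6
```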
Performance of Noise Filter (NF)
The NF operates at a 200 MHz clock.
In theory it can process 1 pixel per clock, i.e. about 200 megapixels per second (MP/s).
However, because of the overlap needed for the various filtering operations, the practical standalone speed (with only the NF running in the system) is about 130-140 MP/s.
When the NF runs alongside other modules, such as other drivers or codecs, performance may drop further because of DDR bandwidth.
SW overheads also reduce NF performance, but with our driver we see very little impact from them. With SW overheads included, the NF can safely process about 130 MP/s.
The number of pixels processed when noise filtering one D1 CH of 720x240 @ 60 fields per second is
720x240x60 (fields per second) = 10.4 MP/s.
Thus the NF can safely handle about 12 CHs of D1 in TI811x.
In practice, with HDVPSS-only applications, we measured NF performance of about 12 D1 CHs as well (about 130 MP/s).
With other activity such as codecs, performance should drop, but each NF will safely give 12 CH D1 performance (130 MP/s).
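For a side-by-side view, the figures quoted for the SC, DEI and NF blocks can be collected into one small table and the D1 channel counts recomputed from them. This is only a sketch: the data layout and names are made up for this example, and every number is taken from the three sections above.

```python
THEORETICAL_MPIXELS = 200  # 1 pixel per clock at 200 MHz

# name: (standalone low, standalone high, safe), all in MP/s
BLOCKS = {
    "SC": (180, 190, 130),
    "DEI": (150, 160, 130),
    "NF": (130, 140, 130),
}

# Load of one D1 channel on each block, in MP/s, as computed in the text.
D1_LOAD = {
    "SC": 720 * 480 * 30 / 1e6,       # 30 frames/sec
    "DEI": 720 * 240 * 2 * 60 / 1e6,  # 60 fields/sec, lines doubled
    "NF": 720 * 240 * 60 / 1e6,       # 60 fields/sec
}


def summary():
    """Return (name, safe % of theoretical, safe D1 channels) per block."""
    rows = []
    for name, (_low, _high, safe) in BLOCKS.items():
        percent = round(100 * safe / THEORETICAL_MPIXELS)
        d1_channels = int(safe // D1_LOAD[name])
        rows.append((name, percent, d1_channels))
    return rows


for row in summary():
    print(row)
```

With these inputs each block is budgeted at 65% of its theoretical rate, which gives 12, 6 and 12 D1 channels for SC, DEI and NF respectively.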
Overall System Performance
The HDVPSS BIOS package includes a Links and Chains example, which shows typical use cases exercising many different HDVPSS drivers. The table below shows the performance numbers for different combinations of HDVPSS drivers. Details of each combination can be found in the Links and Chains UserGuide.
Mode | TI811x VPSS-M3: CPU Load (in %) |
---|---|
Single CH Capture + Scale + Display [Option 1] | 4% |
Multi CH Capture + Scale + Display [Option 2] | 6% |
Multi CH Capture + NSF + Scale + Display [Option 3] | 7% |
Multi CH Capture + DEI + Scale + Display [Option 4] | 32% |
Single CH Capture + NSF + DEI + Display (Full screen DEI) [Option 6] | 6% |