OMAP-L1x/C674x/AM1x UART Throughput and Optimization Techniques
This article presents throughput measurements conducted on the Universal Asynchronous Receiver/Transmitter (UART) of OMAP-L1x/C674x/AM1x devices. Different variables were explored to assess their impact on UART performance.
The information in this article deals mainly with OMAP-L1x8/C674m/AM18xx devices (where m is an even number). However, it also applies in general to OMAP-L1x7/C674n/AM17xx devices (where n is an odd number), since both families of devices use a similar architecture.
UART Overview
The UART peripheral provides an asynchronous communications interface for data transfers. Data is serialized in the transmitter, asynchronously sent to the receiver, and deserialized by the receiver. Each data byte is framed by a start bit and one or more stop bits, which let the receiver synchronize to the incoming data, so no clock signal needs to be transmitted.
Data on the device peripheral bus enters the UART peripheral and is stored either in a one-byte holding register or in a configurable FIFO. A transmitter shift register serializes the data and sends it out one bit at a time over the dedicated transmit pin (UART_TXD).
Similarly, external data arrives on the dedicated receive pin (UART_RXD) and is shifted into a receive shift register. The data bits are then stored either in the holding register or in the receive FIFO, from which they can be read over the device peripheral bus.
UART Clocking
The UART peripheral in OMAP-L1x/C674x is clocked by SYSCLK2, which runs at half the SYSCLK1 frequency. The programmable baud generator divides this input clock by a divisor in the range 1 to 65,535 (a 16-bit value held in the DLH and DLL registers) to produce a baud clock (BCLK). The frequency of BCLK is either 16x or 13x the UART baud rate, depending on which over-sampling mode is chosen. The following equations show the relationship between the divisor, clock frequency, baud rate, and over-sampling mode:
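Reconstructed from the description above, the relationship can be written as shown below. The symbol $f_{\text{UART}}$ for the UART input clock frequency is introduced here for illustration only.

```latex
\[
\text{Divisor} = \frac{f_{\text{UART}}}{\text{Baud rate} \times 16}
\qquad \text{(16x over-sampling)}
\]
\[
\text{Divisor} = \frac{f_{\text{UART}}}{\text{Baud rate} \times 13}
\qquad \text{(13x over-sampling)}
\]
```

Equivalently, $\text{BCLK} = f_{\text{UART}} / \text{Divisor}$, and the baud rate is BCLK divided by the over-sampling factor.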
Since UART is an asynchronous interface, both transmitter and receiver must be configured to use the same baud rate in order for the data to be sampled correctly at the receiver.
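As a rough illustration of the divisor relationship, the following Python sketch computes the baud rate produced by a given divisor and over-sampling mode, and picks the nearest integer divisor for a target baud rate. The function names are hypothetical and are not part of any TI software; the 150 MHz input clock is the value used in the tests described in this article.

```python
def baud_rate(uart_clock_hz, divisor, oversampling=16):
    """Return the baud rate produced by a divisor and over-sampling mode."""
    if oversampling not in (13, 16):
        raise ValueError("over-sampling mode must be 13 or 16")
    if not 1 <= divisor <= 0xFFFF:
        raise ValueError("divisor must be between 1 and 65,535")
    bclk_hz = uart_clock_hz / divisor  # baud generator output (BCLK)
    return bclk_hz / oversampling


def nearest_divisor(uart_clock_hz, target_baud, oversampling=16):
    """Return the integer divisor closest to the ideal value, clamped to range."""
    ideal = uart_clock_hz / (target_baud * oversampling)
    return min(max(round(ideal), 1), 0xFFFF)


UART_CLOCK_HZ = 150_000_000  # UART input clock used in this article's tests

# Divisors 3 and 4 in 13x mode give the baud rates quoted later in this article.
for divisor in (3, 4):
    print(divisor, int(baud_rate(UART_CLOCK_HZ, divisor, oversampling=13)))

# Illustrative target (an assumption, not taken from the measurements).
target = 115_200
d = nearest_divisor(UART_CLOCK_HZ, target, oversampling=16)
actual = baud_rate(UART_CLOCK_HZ, d, oversampling=16)
print(d, round(actual), f"{100.0 * (actual - target) / target:+.2f}%")
```

Because the divisor must be an integer, the achieved baud rate generally differs slightly from the target, so both ends of the link should be checked against the tolerance they can accept.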
Parameter Definitions
The following parameters were fixed or varied when measuring the UART throughput:
- UART Input Clock: Frequency of input clock of UART peripheral, equal to SYSCLK1/2.
- Divisor: Divisor used by the UART baud generator to produce the desired baud rate, set by DLH and DLL registers.
- Sample Mode: Number of times each bit was sampled, either 13x or 16x, set by the MDR register.
- RX FIFO Trigger Level: Number of bytes in the RX FIFO before EDMA interrupt was generated.
- Transfer Size: Size of buffer being transmitted.
- Memory Source/Destination: Memory location where data was stored and read from. For all tests, the destination memory was the same as the source memory.
- Character Length: Number of data bits sent per UART transfer.
- Buffer Size: Total size of the transmit and receive buffers.
- Loopback Mode: Controls whether the UART_RXD and UART_TXD pins were internally connected ("internal") or connected outside the chip ("external").
- Parity Bit: Whether or not parity bit was transmitted for detecting transmission errors.
- Stop Bit(s): Number of STOP bit(s) transmitted during each transaction.
- Autoflow Control: Whether or not autoflow control was enabled to prevent overrun errors.
- Background EDMA Queue Number: Whether or not the background EDMA task used the same EDMA channel controller queue as the UART transactions.
- EDMA Burst Size: Number of bytes transferred for each EDMA sync event received.
- Background EDMA Read Rate: Number of cycles the EDMA waited before issuing subsequent reads to the target memory for the background task.
- Background DDR2 PBBRP: DDR2 controller peripheral bus burst priority register, which sets the number of transfers before the controller elevates the priority of the oldest command in the command FIFO. A higher number means the background task has more frequent access to the DDR2 memory.
Test Environment
Standalone Mode
In this mode, no other background tasks were running when UART throughput data was taken. This provides data on what the theoretical maximum throughput would be in a standalone UART system and how the various parameters affect throughput. The following test environment applies:
- UART Input Clock: 150 MHz
- DDR2 Clock Frequency: 150 MHz
- RX FIFO Trigger Level: Always equal to Transfer Size
- Loopback Mode: Internal
- Parity Bit: Disabled
- Stop Bit(s): 1
- Autoflow Control: Disabled
Background Task Mode
In this mode, an EDMA background task continually wrote to and read from the target memory using EDMA transfers while the UART throughput data was taken. This provides a more realistic situation where the UART is competing for resources with higher priority tasks. The following test environment applies:
- UART Input Clock: 150 MHz
- DDR2 Clock Frequency: 150 MHz
- RX FIFO Trigger Level: Always equal to Transfer Size
- Loopback Mode: Internal
- Parity Bit: Disabled
- Stop Bit(s): 1
- Autoflow Control: Disabled
- Sample Mode: 13x
- Character Length: 8 bits
Factors Affecting UART Throughput
Standalone Mode
Baud Rate
The primary factor affecting UART throughput is the baud rate. Table 1 shows the throughput achieved at various baud rates, set by the divisor and over-sampling mode. At 150 MHz, the maximum achievable throughput for the peripheral is 8.6 Mb/s. Figure 1 shows that the throughput scales linearly with baud rate.
Additionally, Figure 2 shows that regardless of the baud rate, the throughput approaches 80% of the baud rate. Since there is one start bit, one stop bit, and 8 data bits, the utilization matches what would be expected (8 data bits per 10 total bits transferred).
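The 80% figure follows directly from the frame format. As a worked check, with $\eta$ (a symbol introduced here for illustration) denoting the fraction of transmitted bits that carry data:

```latex
\[
\eta = \frac{N_{\text{data}}}{N_{\text{start}} + N_{\text{data}} + N_{\text{parity}} + N_{\text{stop}}}
     = \frac{8}{1 + 8 + 0 + 1} = 0.8
\]
```

This is an upper bound set by framing alone; it does not account for any setup or EDMA overhead.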
Character Length
The character length parameter sets how many bits make up each UART character, configurable from 5 to 8 bits. As character length decreases, start and stop bits are sent more often, increasing the overhead of each UART transaction. Figure 3 shows this relationship and the effect it has on throughput at different baud rates.
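The framing overhead for each character length can be estimated with a short calculation. The sketch below is illustrative only: it assumes one start bit and the one-stop-bit, no-parity setup used in these tests, and the function name is hypothetical.

```python
def frame_efficiency(data_bits, parity=False, stop_bits=1):
    """Fraction of transmitted bits that carry data in one UART character."""
    if not 5 <= data_bits <= 8:
        raise ValueError("character length must be 5 to 8 bits")
    # One start bit, the data bits, an optional parity bit, then the stop bit(s).
    frame_bits = 1 + data_bits + (1 if parity else 0) + stop_bits
    return data_bits / frame_bits


for data_bits in range(8, 4, -1):
    share = frame_efficiency(data_bits)
    print(f"{data_bits} data bits: {share:.1%} of the baud rate")
```

Under these assumptions the usable share of the baud rate falls from 80% with 8-bit characters to about 71% with 5-bit characters.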
Memory Source/Destination
For this throughput analysis, data was transmitted from one memory location to another via internal UART loopback. The three memory types were L2 RAM, L3 RAM, and DDR2 RAM. As Figure 4 shows, the source/destination has no effect on the UART throughput, since the UART peripheral itself is the bottleneck.
Receive FIFO Trigger Level
When the RX FIFO reaches a predefined level, an EDMA interrupt is generated to read the data. The more often this occurs, the more overhead is added. Figure 5 shows that despite the increased overhead, the FIFO trigger level has no effect on the overall throughput across different baud rates.
Note that there is no data at the highest baud rates (divisor = 1 with 13x or 16x over-sampling). An issue occurs when the baud clock and the module clock run at the same frequency, causing two EDMA sync events to occur initially. The RX FIFO must be at least double the trigger level to hold the extra data. Therefore, the maximum FIFO trigger level for the 16-byte FIFO is 8 bytes when using a divisor of 1.
Transfer Size
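The constraint described above can be expressed as a small configuration check. This is a minimal sketch with hypothetical function names; it models only the "FIFO must be double the trigger level when the divisor is 1" rule from the text, not the specific set of trigger levels the hardware offers.

```python
FIFO_DEPTH_BYTES = 16  # FIFO size stated in the text


def max_rx_trigger_level(divisor, fifo_depth=FIFO_DEPTH_BYTES):
    """Largest RX trigger level that leaves room for the extra sync event."""
    if divisor < 1:
        raise ValueError("divisor must be at least 1")
    # With a divisor of 1, two EDMA sync events occur initially, so the FIFO
    # must be able to hold twice the trigger level.
    return fifo_depth // 2 if divisor == 1 else fifo_depth


def trigger_level_ok(trigger_level, divisor):
    """Return True if the trigger level satisfies the constraint above."""
    return 1 <= trigger_level <= max_rx_trigger_level(divisor)


print(max_rx_trigger_level(1))   # limit when the divisor is 1
print(trigger_level_ok(14, 1))   # a 14-byte trigger level with divisor 1
print(trigger_level_ok(14, 2))   # the same trigger level with divisor 2
```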
The size of the buffer being transmitted has an effect on the throughput as well. The initial overhead time for starting the UART and EDMA transfers is constant regardless of the baud rate. At lower baud rates, even transmitting 32 bytes takes longer than the initialization time, so the transfer size has no effect on throughput. As the baud rate increases, however, the transfer time shrinks, so the initial setup time has a large effect on throughput. Figure 6 shows how for small transfers, the high baud rates do not increase throughput.
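A simple model makes this behavior easier to see. The sketch below assumes a fixed setup time that is independent of baud rate, as described above; the 20 µs value and the two baud rates are hypothetical figures chosen only for illustration and are not measurements from this article.

```python
def effective_throughput_bps(transfer_bytes, baud, setup_time_s,
                             data_bits=8, frame_bits=10):
    """Data bits per second for one transfer, including a fixed setup time."""
    wire_time_s = transfer_bytes * frame_bits / baud  # time spent on the line
    return transfer_bytes * data_bits / (setup_time_s + wire_time_s)


SETUP_TIME_S = 20e-6      # hypothetical UART/EDMA setup time
LOW_BAUD = 115_200        # illustrative low baud rate
HIGH_BAUD = 11_538_461    # 150 MHz input clock, divisor 1, 13x over-sampling

for baud in (LOW_BAUD, HIGH_BAUD):
    for size in (32, 1024):
        rate = effective_throughput_bps(size, baud, SETUP_TIME_S)
        print(f"baud={baud:>10,} size={size:>5} B -> {rate / 1e6:.3f} Mb/s")
```

In this model the setup time is negligible next to the wire time at the low baud rate, so 32-byte and 1024-byte transfers reach nearly the same throughput. At the high baud rate the setup time is comparable to the wire time of a 32-byte transfer, so small transfers fall well short of what large transfers achieve.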
Background Task Mode
Background DDR2 PBBRP
In this case, the UART peripheral and the background task were both reading from and writing to the DDR2 memory using separate EDMA queues. The PBBRP setting affects the UART throughput by preventing the background task from constantly accessing the DDR2 memory, allowing the UART more frequent access.
Figure 7 shows that by setting the PBBRP to a low value, the UART throughput can still achieve 8.5 Mb/s with a background task accessing the same memory. If the PBBRP is increased, the UART throughput decreases significantly, since the background task can now access the memory more often.
Background EDMA Read Rate
The EDMA read rate parameter determines how many cycles the EDMA waits between each background task read. For example, at a baud rate of 3,846,153, Figure 8 shows that when the background task is on a different EDMA queue than the UART task, throughput is not affected. However, when they are on the same EDMA queue, UART throughput decreases as the number of wait cycles decreases, since the background task is executing more often.
EDMA Burst Size
The EDMA burst size sets how many bytes are transferred after each sync event from the UART peripheral or the background task. Figure 9 shows that at a baud rate of 2,884,615, the UART throughput decreases as the burst size increases when both processes are on the same EDMA queue. This is because, with a 128-byte EDMA transfer controller FIFO, the background task takes longer to complete as the burst size increases, which reduces UART throughput.
UART Summary Slides
A summary of the information found on this page, including the graphs and tables, is provided in presentation format in File:Omapl1x c674x uartThroughput-v01.zip.