
Keystone SoC Level Optimizations


KeyStone SoC Level Optimizations


SoC-level optimization of KeyStone devices requires considering the entire SoC. A KeyStone SoC is composed of many components that vie for the same resources. To optimize the performance of a system, one needs a complete picture of the system, and then an understanding of how the various tuning knobs in each of those areas can be used to get the best performance.

To build that understanding, this article goes over the architecture, the constraints, and the available tuning.

It is intended to provide the big picture of the SoC rather than the fine details, which can be found in the user guides for the relevant parts of the SoC.

KeyStone SoC Architectural Overview


System Interconnect Overview


The C66x DSP Cores, A15 ARM Cores, System DMA, QMSS, MSMC and device-specific peripherals are interconnected within the Keystone devices by a low-latency switch fabric called the TeraNet. This switch fabric is described in detail in the sections that follow.

The major components in the Keystone devices are:

  • Masters (C66x DSP CorePac, A15 ARM CorePac, EDMA3, QMSS, MSMC and select peripherals)
  • Slaves (most peripherals, various memory controllers)
  • TeraNets (switch fabric)
  • Bridges and Buses


In each of the diagrams below, the Masters (Transaction Initiators) are shown on the left-hand side and the Slaves (Transaction Responders) on the right-hand side. The switch fabric is shown between the Masters and the Slaves as TeraNets and Bridges (BRs).

The arrows indicate the master/slave relationship. Each arrow originates at a master and terminates at a slave (or group of slaves). The direction of the arrow does not indicate the direction of data flow. Data flow is typically bi-directional for each of the documented bus paths, but there are minor exceptions. One example of an exception is the read/write ports of the EDMA3 Transfer Controller: the read port reads data from the source memory/peripheral, and the write port writes data to the destination memory/peripheral.

Some modules are shown on both sides of TeraNets or have multiple instances in the block diagram. This is because select peripherals have both a Master DMA Port as well as one or more slave port(s). This dual-port functionality allows the peripheral to act as both a Master and a Slave. Examples include PCIe, SRIO, etc.

On-chip/off-chip memory controllers are considered slave end points (targets) and are always shown on the right side of main TeraNets. The memory controllers can be classified as one of the following:

  • Stand alone memory controllers (such as Shared RAM or EMIF controllers)
  • Part of a module/peripheral (for example, system masters can access the C66x CorePac L1/L2 memory via the C66x CorePac Slave DMA Port (SDMA))

Keystone System Interconnect Diagrams

Keystone C6671/72/74/78 System Interconnect Diagrams

[Figures: Keystone C6671/72/74/78 System Interconnect Diagrams]
  • TMS320C667x TeraNet 2 A
  • TMS320C667x TeraNet 3 A
  • TMS320C667x TeraNet 3P A & 3P B
  • TMS320C667x TeraNet 6P B & 3P Tracer

Keystone C6670 System Interconnect Diagrams

[Figures: Keystone C6670 System Interconnect Diagrams]
  • TMS320C6670 TeraNet 2 A
  • TMS320C6670 TeraNet 3 A
  • TMS320C6670 TeraNet 3P 3M 2M
  • TMS320C6670 TeraNet 3P A
  • TMS320C6670 TeraNet 3P B


Keystone C665x System Interconnect Diagrams

[Figures: Keystone C6657 System Interconnect Diagrams]
  • TMS320C6657 TeraNet 3 A
  • TMS320C6657 TeraNet 3P A
  • TMS320C6657 TeraNet 3P B
  • TMS320C6657 TeraNet 3P Tracer
  • TMS320C6657 TeraNet 6P B


Keystone 66AK2H06/12 System Interconnect Diagrams

[Figures: Keystone 66AK2H06/12 System Interconnect Diagrams]
  • 66AK2H06/12 TeraNet 3P A
  • 66AK2H06/12 TeraNet 3P B
  • 66AK2H06/12 TeraNet 3P P Tracer
  • 66AK2H06/12 TeraNet 6P B

Masters

Masters (Transaction Initiators) in the Keystone device are IP modules that are capable of initiating their own read/write transfer requests. This includes the CPUs (A15 CorePacs / C66x CorePacs), the system DMAs (EDMA3), the QMSS, and select peripherals that use their own Master DMA Ports to initiate read/write transfers.


The following table lists the Masters present on the respective Keystone devices:

List of Masters on Keystone Devices [1]

Masters | Description | TMS320C6671/2/4/8 | TMS320C6670 | TMS320C665x | 66AK2H06/12 | 66AK2E02/05 | AM5K2E02/04
A15-I | A15 instruction port: instruction/program accesses and I-cache accesses | - | - | - | X | X | X
A15-D | A15 data port: load/store instructions, D-cache and peripheral register accesses | - | - | - | X | X | X
C66x DSP Megamodule Master DMA Port (XMC/MDMA) | C66x long-distance memory access port (loads/stores to external memory and shared RAM outside the DSP Megamodule memory) and cache controller accesses to external memory and shared RAM | X | X | X | X | X | -
C66x DSP Megamodule Configuration Port (CFG) | Access to the peripheral configuration bus (accesses to UART, PLLC and other memory-mapped peripherals) | X | X | X | X | X | -
NetCP | Network Coprocessor DMA engine accesses to internal/external device memory | X | X | X | X | X | X
EDMA3 | EDMA3 Transfer Controllers: read and write access to peripherals, internal memory and external memories via queue submissions | X | X | X | X | X | X
PCIe | Peripheral Component Interconnect Express: to/from internal and external memories and peripherals | X | X | X | X | X | X
SRIO_M | Serial RapidIO master for DirectIO access | X | X | X | X | - | -
SRIO_PacketDMA | Serial RapidIO Packet DMA access for Type 11 and Type 9 packets | X | X | X | X | - | -
QMSS_PacketDMA | Queue Manager Subsystem Packet DMA | X | X | X | X | X | X
QMSS_Second | Queue Manager Subsystem secondary master port | X | X | X | X | X | X
TSIP | Telecom Serial Interface Port | X | - | - | - | - | -
Debug_SS | Debug Subsystem | X | X | X | X | X | X
FFTC_A | Fast Fourier Transform Coprocessor A | - | X | - | - | - | -
FFTC_B | Fast Fourier Transform Coprocessor B | - | X | - | - | - | -
FFTC_PacketDMA | Fast Fourier Transform Coprocessor Packet DMA | - | X | - | - | - | -
RAC_A_BE0/1 | Receive Accelerator Coprocessor A back end | - | X | - | - | - | -
RAC_B_BE0/1 | Receive Accelerator Coprocessor B back end | - | X | - | - | - | -
TAC_FE | Transmit Accelerator Coprocessor front end | - | X | - | - | - | -
AIF | Antenna Interface | - | X | - | - | - | -
BCP_PacketDMA | Bit Rate Coprocessor Packet DMA | - | X | - | - | - | -
BCP_DIO_0/1 | Bit Rate Coprocessor Direct IO | - | X | - | - | - | -

[1] The device data manual is the primary document and always takes precedence over the content of this wiki.

Slaves

Slaves (Transaction Request Recipients) in the device are IP modules that accept and service the transfer requests from the masters. Modules that fall under this category are peripherals (such as SPI, McBSP, UART, I2C, etc.) that rely on the CPU or EDMA3 to initiate transactions on their behalf. On-chip memory (DSP L1/L2, ARM RAM, Shared RAM) and off-chip memory (NOR/NAND flash, etc.) are also considered to be slave modules.


List of Slaves in Keystone Devices[2]
Slaves | Description | TMS320C6671/2/4/8 | TMS320C6670 | TMS320C665x | 66AK2H06/12 | 66AK2E02/05 | AM5K2E02/04
C66x DSP Megamodule Slave DMA Port (SDMA) | The DSP SDMA port allows access to the DSP memories only; DSP megamodule registers (program counter, IDMA, cache controller, DSP interrupt controller, etc.) are accessible only by the DSP itself and not by other system masters | X | X | X | X | X | -
HyperLink | HyperLink slave port | X | X | X | X | X | X
PCIe | Peripheral Component Interconnect Express slave port | X | X | X | X | X | X
SRIO_S | Serial RapidIO slave port | X | X | X | X | - | -
EMIFA | SDRAM, NOR, NAND flash; off-chip/external memory slave port | X | - | X | X | X | -
RAC_A_FE | Receive Accelerator Coprocessor A front-end slave port | - | X | - | - | - | -
RAC_B_FE | Receive Accelerator Coprocessor B front-end slave port | - | X | - | - | - | -
TAC_BE | Transmit Accelerator Coprocessor back-end slave port | - | X | - | - | - | -
TCP3d_A | Turbo Coprocessor 3 Decoder A slave port | - | X | - | - | - | -
TCP3d_B | Turbo Coprocessor 3 Decoder B slave port | - | X | - | - | - | -
TCP3e_r | Turbo Coprocessor 3 Encoder read slave port | - | X | - | - | - | -
TCP3e_w | Turbo Coprocessor 3 Encoder write slave port | - | X | - | - | - | -
VCP2[3:0] | Viterbi Coprocessor 2 slave ports | - | X | - | - | - | -
SPI | Serial Peripheral Interface slave port | X | X | X | X | X | X
BootROM | Boot ROM | X | X | X | X | X | X
TCP3d_DMA | Turbo Coprocessor 3 Decoder DMA slave port | - | X | - | - | - | -
TCP3d_CFG | Turbo Coprocessor 3 Decoder configuration port | - | X | - | - | - | -
FFTC_CFG | Fast Fourier Transform Coprocessor configuration port | - | X | - | - | - | -
BCP_CFG | Bit Rate Coprocessor configuration port | - | X | - | - | - | -
RAC_A_CFG | Receive Accelerator Coprocessor A configuration port | - | X | - | - | - | -
RAC_B_CFG | Receive Accelerator Coprocessor B configuration port | - | X | - | - | - | -
UPP | Universal Parallel Port | - | - | X | - | - | -
GPIO | General-Purpose IO port | X | X | X | X | X | X
UART | Universal Asynchronous Receiver/Transmitter port | X | X | X | X | X | X
I2C | Inter-Integrated Circuit port | X | X | X | X | X | X

[2] The device data manual is the primary document and always takes precedence over the content of this wiki.


NOTE: Not all masters can access all slaves. For additional details on restrictions, if any, on a particular master's accessibility to a given slave memory/peripheral on the device, please refer to the section on Master/Slave Connectivity.


TeraNet

The TeraNet provides low-latency interconnectivity between the masters and the slaves. TeraNets (also called switch fabrics or crossbars) direct access requests by providing address decoding, arbitration, and routing of the requests to the various slaves.


For mutually exclusive master/slave pairs, concurrent transactions can be sent through the TeraNet in parallel. For example, if a request from the C66x Megamodule to configure the I2C and a request from the ARM to configure the UART arrive at a particular crossbar concurrently, both requests pass through the crossbar at the same time.

The interconnect of a device typically consists of one or two main crossbars and a number of smaller satellite crossbars. The main crossbars handle the majority of the traffic and are typically clocked at a higher frequency than the satellite crossbars.

The figure below illustrates one such data flow example, where the highlighted paths are completely independent of each other and will allow true concurrency and parallelism.


[Figure: TeraNet concurrent data movement (6678 TeraNet composite diagram)]




Arbitration

Arbitration for the Switch Central Resources (SCRs) is a two-layer scheme that occurs on burst-size boundaries. Arbitration takes place when two competing masters attempt concurrent access to the same slave.


Layer 1: Master Priority Level Arbitration

Each master in a device is assigned a system priority level. Following a Power-On Reset (POR), each master's priority level is reset to its default as specified in the System Configuration (SYSCFG) section of the device System Reference Guide.

All SCRs on the device perform arbitration based on the priority level of the master that sends the read/write access request. System programmers are expected to modify the default priority values in order to fine tune the system for their application requirements.

Level 0: Highest Priority Level
Level 1: 2nd Highest Priority Level
Level 2: 3rd Highest Priority Level
Level 3: 4th Highest Priority Level
Level 4: 5th Highest Priority Level
Level 5: 6th Highest Priority Level
Level 6: 7th Highest Priority Level
Level 7: Lowest Priority Level

If two requests to the same slave arrive concurrently from masters with different priority levels, the master with the higher priority level wins the arbitration.


Layer 2: Inter-Priority Level Arbitration

Inter-Priority Level Arbitration is implemented as round robin style arbitration. If two concurrent requests to the same slave arrive from masters with the same priority level, round robin arbitration decides which master is allowed to continue with the request, and which master must wait until the SCR is free before continuing with the request.


NOTE: Arbitration/re-arbitration occurs at burst-size boundaries (or smaller boundaries if a full burst is not used).
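
To make the two-layer behavior concrete, the following C sketch is a purely host-side model (not device code): among the requesting masters it grants each burst to the lowest numeric priority level, and it breaks ties within a level round-robin. The masters, priorities, and request pattern in main() are made-up example values.

    #include <stdio.h>

    #define NUM_MASTERS 4

    /* Host-side model of the two-layer SCR arbitration described above:
     * layer 1 selects the lowest numeric (highest) priority level among the
     * requesting masters, layer 2 breaks ties round-robin within that level. */
    static int arbitrate(const int requesting[NUM_MASTERS],
                         const int priority[NUM_MASTERS],
                         int *rr_pointer)
    {
        int best_prio = 8;               /* below the lowest level (7) */
        int winner    = -1;
        int i;

        /* Layer 1: find the best (numerically lowest) requesting priority. */
        for (i = 0; i < NUM_MASTERS; i++)
            if (requesting[i] && priority[i] < best_prio)
                best_prio = priority[i];

        if (best_prio == 8)
            return -1;                   /* nobody is requesting */

        /* Layer 2: round-robin among masters at that priority level,
         * starting just after the previous winner. */
        for (i = 1; i <= NUM_MASTERS; i++) {
            int m = (*rr_pointer + i) % NUM_MASTERS;
            if (requesting[m] && priority[m] == best_prio) {
                winner = m;
                break;
            }
        }

        *rr_pointer = winner;
        return winner;
    }

    int main(void)
    {
        int requesting[NUM_MASTERS] = { 1, 1, 0, 1 };  /* masters 0, 1 and 3 request */
        int priority[NUM_MASTERS]   = { 3, 3, 7, 5 };  /* 0 and 1 at level 3, 3 at level 5 */
        int rr = NUM_MASTERS - 1;
        int cycle;

        for (cycle = 0; cycle < 4; cycle++)
            printf("burst %d granted to master %d\n",
                   cycle, arbitrate(requesting, priority, &rr));
        return 0;
    }

Running the model shows the two level-3 masters alternating grants while the level-5 master never wins, which is exactly the kind of starvation the MAXWAIT mechanism described below is meant to bound.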



Bandwidth Management Arbitration Registers

A set of registers called arbitration registers implements the bandwidth management architecture. The registers are implemented in the following memory controller blocks: L1D, L2, and the extended memory controller (EMC).

  • CPUARBD, CPUARBU, and CPUARBE - DSP CorePac Arbitration Control Register
  • IDMAARBD, IDMAARBU, and IDMAARBE - IDMA Arbitration Control Register
  • SDMAARBD, SDMAARBU, and SDMAARBE - Slave DMA Arbitration Control Register
  • MDMAARBU - Master DMA Arbitration Control Register
  • UCARBD and UCARBU - User Coherence Arbitration Control Register
  • ECFGARBE - CFG Arbitration Control Register


The last letter indicates which memory controller interface the register applies to: D for L1D, U for L2, and E for EMC.

Bandwidth Management Arbitration Registers Table

Acronym | Register Name | Default PRI | Default MAXWAIT | In L1P | In L1D | In L2 | In EMC
CPUARB | DSP CorePac Arbitration Control Register | 1 | 16 | No | Yes | Yes | Yes
IDMAARB | IDMA Arbitration Control Register | NA | 16 | No | Yes | Yes | Yes
SDMAARB | Slave DMA Arbitration Control Register | NA | 1 | No | Yes | Yes | Yes
MDMAARB | Master DMA Arbitration Control Register | NA | 32 | No | No | Yes | No
UCARB | User Coherence Arbitration Control Register | 7 | NA | No | Yes | Yes | No
ECFGARB | CFG Arbitration Control Register | 7 | NA | No | No | No | Yes


PRI is the priority value used for arbitration at each burst-size boundary.

Often two or more masters try to access the same memory at the same time. Lower-priority transfers can be blocked by higher-priority transfers, even when the lower-priority transfers are small, and in some cases head-of-line blocking can also occur between requests of the same priority. To keep lower-priority or head-of-line-blocked requests from being starved for too long, a MAXWAIT count is provided: the blocking lasts at most MAXWAIT arbitration cycles. If MAXWAIT is set to 16, for example, a blocked request waits at most 16 cycles before it is allowed to transfer data (for one cycle). In this way lower-priority traffic still makes progress without a significant impact on high-priority traffic.
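
A minimal sketch of how PRI and MAXWAIT might be programmed from the DSP is shown below. The field masks are placeholders (left as zero), and the register pointers are passed in rather than hard-coded; the actual CPUARBU/SDMAARBU addresses and bit fields must be taken from the C66x CorePac user guide, so nothing here is a verified register map.

    #include <stdint.h>

    /* Hedged sketch only: the field masks below are placeholders (zero).
     * Fill them in from the C66x CorePac user guide before use on hardware. */
    #define ARB_PRI_MASK      0x0u   /* placeholder: PRI field mask     */
    #define ARB_PRI_SHIFT     0u
    #define ARB_MAXWAIT_MASK  0x0u   /* placeholder: MAXWAIT field mask */
    #define ARB_MAXWAIT_SHIFT 0u

    /* Read-modify-write of a single field in a memory-mapped register. */
    static void set_field(volatile uint32_t *reg, uint32_t mask,
                          uint32_t shift, uint32_t value)
    {
        *reg = (*reg & ~mask) | ((value << shift) & mask);
    }

    /* Set the CPU's priority and MAXWAIT at the L2 banks (CPUARBU), and the
     * MAXWAIT after which blocked SDMA accesses (EDMA, PCIe, etc.) break in
     * (SDMAARBU). */
    void tune_l2_bandwidth(volatile uint32_t *cpuarbu,
                           volatile uint32_t *sdmaarbu,
                           uint32_t cpu_pri, uint32_t cpu_maxwait,
                           uint32_t sdma_maxwait)
    {
        set_field(cpuarbu,  ARB_PRI_MASK,     ARB_PRI_SHIFT,     cpu_pri);
        set_field(cpuarbu,  ARB_MAXWAIT_MASK, ARB_MAXWAIT_SHIFT, cpu_maxwait);
        set_field(sdmaarbu, ARB_MAXWAIT_MASK, ARB_MAXWAIT_SHIFT, sdma_maxwait);
    }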

Default Burst Size

A transaction from a master is typically broken down into smaller bursts at the system interconnect level. This is done to increase the collective efficiency of transfers through the interconnect.

Because large bursts from higher-priority masters will always win arbitration and block lower-priority transactions until they finish, the system programmer is expected to fine-tune the burst size of the masters in the system. Smaller bursts give lower-priority requests a chance to win arbitration between consecutive bursts of higher-priority requests.

The default burst size for a master is shown in the table below.

Default Burst Size for Master Peripherals on Keystone

Master | Default Burst Size | Fixed/Configurable
MSMC | 64 bytes for cache lines; data size for direct non-cached accesses | Case dependent
CorePac_n CFG | Load/store word/double word: 4/8 bytes | Case dependent
EDMA3_x_TCx | 64/128 bytes | Fixed - see the data manual for the TCx DBS values
HyperLink | 256-byte write / 1024-byte read | Bursts may be smaller - the write burst size is generated automatically by the HyperLink master
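
The effect of burst size on arbitration latency can be estimated with simple arithmetic: because re-arbitration happens at burst-size boundaries, a competing request waits at most roughly one burst. The bus width and clock in the sketch below are example values only (a 128-bit bus at CPU/2 for a 1.2 GHz device), not figures taken from a data manual.

    #include <stdio.h>

    /* Illustrative arithmetic only: re-arbitration happens at burst-size
     * boundaries, so a competing request waits at most about one burst. */
    int main(void)
    {
        const double bus_bytes_per_cycle = 16.0;  /* example: 128-bit bus      */
        const double bus_clock_mhz       = 600.0; /* example: CPU/2 at 1.2 GHz */
        const int    burst_sizes[]       = { 64, 128, 256, 1024 };
        int i;

        for (i = 0; i < 4; i++) {
            double cycles = burst_sizes[i] / bus_bytes_per_cycle;
            printf("DBS %4d bytes -> worst-case extra wait ~%5.1f cycles (%.1f ns)\n",
                   burst_sizes[i], cycles, cycles * 1000.0 / bus_clock_mhz);
        }
        return 0;
    }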






Bridges

Within the SoC/DSP, individual modules/peripherals/memories (or group of peripherals) may run at different clock rates and may have different bus width interfaces (typically 256, 128, 64 and 32 bit buses). Logic is needed to synchronize communication between two modules that operate at different clock rates and/or bus widths. Bridges provide the means of resolving these differences by performing bus-width conversion as well as bus operating clock frequency conversion.

Bridges are also responsible for buffering read and write commands and data. Buffering is implemented with First In First Out (FIFO) style buffering. One implication of this is that any high priority request that is passed into a bridge must wait until the bridge FIFO finishes its previous transactions before it is allowed to continue.

[Figure: bridge connecting a bus at clock rate X to a bus at clock rate Y]

There are two types of Bridges:

  • Synchronous Bridge: A bridge for which clock rates X and Y are either equal to, or are an integer multiple (i.e. "synchronous") of one another.
  • Asynchronous Bridge: A bridge for which clock rates X and Y are asynchronous to each other. These bridges are typically used when a peripheral or group of peripherals have module clocks that are not an integer multiple of the primary (typically CPU) clocks (ex: 133MHz for EMIF16), or when modules are not constrained to a "fixed" ratio with respect to (w.r.t) the primary clocks.



There are two main types of buses on the Keystone devices:

  • A 256- or 128-bit bus with separate read and write interfaces, allowing multiple read and write transactions to occur simultaneously. This bus is best suited for high-speed/high-bandwidth exchanges, especially data transfers between on-chip and off-chip memories. On the Keystone family of devices, TeraNet_2_A interfaces with several modules using this 256-bit bus at CPU/2. Most of the high-bandwidth master peripherals (e.g. EDMA3 TC0 and TC1 transfer controllers), MSMC, and HyperLink are directly connected to this 256-bit bus. Peripherals that do not support the 256-bit bus interface are connected to TeraNet_2_A via TeraNet_3_A (a 128-bit, CPU/3 TeraNet) through bridges responsible for protocol conversion from the 256-bit to the 128-bit bus interface.



Head of Line Blocking

Bridges implement a command first-in-first-out (FIFO) scheme to queue read/write commands from masters/initiators. All requests are queued on a first-in-first-out basis -- bridges do not reorder the commands. It is possible that a high priority request at the tail of a queue can be blocked by lower priority commands that could be at the head of the queue. This scenario is called bridge head of line blocking. In the figure below, the command FIFO size is 4. The FIFO is completely filled with low priority (7) requests before a higher priority request (0) comes in.


In this case, the high priority request is blocked until all four lower priority (7) requests are serviced. When there are multiple masters vying for the same end point (or end points shared by the same bridge), the bridge head of line blocking is one factor that can affect system throughput and a master's ability to service read/write requests targeting a slave peripheral/memory.
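
The following host-side C sketch models that scenario (it is illustrative only, not behavior measured on hardware): a FIFO of depth 4 already holds four priority-7 commands when a priority-0 command arrives, so the priority-0 command is issued last.

    #include <stdio.h>

    #define FIFO_DEPTH 4

    /* Host-side illustration of bridge head-of-line blocking: commands leave
     * the bridge strictly in arrival order, regardless of priority. */
    typedef struct { int id; int priority; } command_t;

    int main(void)
    {
        /* Four low-priority commands already queued, then one high-priority one. */
        command_t queued[FIFO_DEPTH + 1] = {
            { 0, 7 }, { 1, 7 }, { 2, 7 }, { 3, 7 }, { 4, 0 }
        };
        int i;

        for (i = 0; i < FIFO_DEPTH + 1; i++)
            printf("issue order %d: command %d (priority %d)\n",
                   i, queued[i].id, queued[i].priority);

        printf("command 4 (priority 0) waited behind %d priority-7 commands\n",
               FIFO_DEPTH);
        return 0;
    }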


Reads vs Writes

Read transactions are usually more costly (in clock cycles) than writes. For writes, the command and data flow together and can be thought of as "fire-and-forget" in nature. Once a write transaction leaves the master/initiator boundary (ex: sitting in a bridge or an end point's buffer or FIFO), the initiator can proceed to the next write (even before the previous write reaches its final destination). For reads, a read command pends until a read response/data returns. So in general, the initiator cannot issue a new read/write command until the previous read command's response reaches the master/initiator. Therefore, polling on registers can prove to be very expensive.

NOTE: The above is more prominent for the 32-bit buses. The 64-bit buses are capable, to some extent, of issuing multiple outstanding read and write commands. For details on the buses, please refer to the section on Interconnect Buses in the C66x SoC Architecture Overview article.
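
The cost difference shows up directly in code like the hedged sketch below. The status register and its ready bit are hypothetical (passed in as a pointer, not a real KeyStone register); the point is the access pattern: every poll iteration is a blocking read that pays the full round-trip latency, whereas a sequence of writes is posted and can stream out without waiting.

    #include <stdint.h>

    #define READY_BIT (1u << 0)   /* hypothetical ready flag in a status register */

    /* Each iteration of this poll is a blocking read: the issuing master
     * stalls until the response travels back across the TeraNet, so the
     * round-trip latency is paid on every pass. */
    void wait_ready_by_polling(volatile uint32_t *status_reg)
    {
        while ((*status_reg & READY_BIT) == 0)
            ;   /* every read waits for the full read-return latency */
    }

    /* A write, by contrast, is posted ("fire-and-forget"): the stores below
     * can stream out back-to-back without waiting for each one to land. */
    void post_writes(volatile uint32_t *dst, const uint32_t *src, int n)
    {
        int i;
        for (i = 0; i < n; i++)
            dst[i] = src[i];   /* issued before the previous write completes */
    }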

On-Chip vs Off-Chip Memory

On-chip memory accesses experience less latency when compared to off-chip memory accesses. Off-chip memory is susceptible to extra latency contributed by refresh cycles, CAS latency, etc. If possible, frequently used code should be kept in on-chip memory.


Maximum Bandwidths

Memory bandwidth affects the overall system throughput through the TeraNet.  Note that DDR3 is accessed from the TeraNet via the MSMC interface. This section illustrates how to calculate the maximum theoretical bandwidth for some critical on-/off-chip memories.


Maximum Theoretical Memory Bandwidths (device operating at 1200 MHz)

Memory | Theoretical Max Bandwidth (MB/s) | Calculation | Notes
C66x L2/L1D | 6400 MB/s | 400 MHz x 128 bit [C66x SDMA port frequency x SDMA port bus width] | The CorePac SDMA port runs at CPU/3; this is the bandwidth for accesses from outside the CorePac (e.g. EDMA, PCIe, etc.)
MSMC Shared RAM | 19200 MB/s | 600 MHz x 256 bit [MSMC Shared RAM port frequency x Shared RAM port bus width] | The MSMC Shared RAM port runs at CPU/2
EMIF16 asynchronous memories | 200 MB/s | 100 MHz x 16 bit [(EMIF clock)/(Setup+Strobe+Hold) x 16 bit] | The EMIF clock is SYSCLK7 (CorePac frequency divided by 6) with 2 cycles per access period; the calculation assumes Setup/Strobe/Hold values of 1 cycle and a 16-bit asynchronous interface
DDR3 | 6400 MB/s | 800 MHz x 64 bit [DDR_CLK x double data rate x DDR3 memory bus width] | The maximum DDR3 operating frequency may depend on the device (check the data manual); assumes a 64-bit DDR3 interface (32- and 16-bit DDR3 can also be implemented)


NOTE: Theoretical maximum bandwidth is the maximum possible bandwidth, calculated purely from the memory clock and bus width. The calculation does not take into account system-level inefficiencies such as additional latency in the interconnect, additional cycles for off-chip access due to memory configuration/characteristics, or additional latency incurred by competing traffic, prioritization, and the use of shared resources (for example, a single bridge buffering read/write commands on a path to multiple slave end points).
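
The calculations in the table can be reproduced with the short C program below for a 1.2 GHz device. These are upper bounds only, subject to the caveats in the note above.

    #include <stdio.h>

    /* Reproduces the theoretical-maximum calculations from the table above
     * for a 1200 MHz device.  Real throughput will be lower. */
    int main(void)
    {
        const double cpu_mhz = 1200.0;

        /* C66x L2/L1D via the SDMA port: (CPU/3) x 128-bit bus */
        double sdma_mb_s = (cpu_mhz / 3.0) * (128.0 / 8.0);

        /* MSMC shared RAM: (CPU/2) x 256-bit bus */
        double msmc_mb_s = (cpu_mhz / 2.0) * (256.0 / 8.0);

        /* EMIF16 async: (CPU/6) clock, 2 cycles per access, 16-bit data */
        double emif_mb_s = (cpu_mhz / 6.0) / 2.0 * (16.0 / 8.0);

        /* DDR3: 800 MHz effective data rate x 64-bit bus */
        double ddr3_mb_s = 800.0 * (64.0 / 8.0);

        printf("C66x L2/L1D (SDMA): %7.0f MB/s\n", sdma_mb_s);
        printf("MSMC shared RAM   : %7.0f MB/s\n", msmc_mb_s);
        printf("EMIF16 async      : %7.0f MB/s\n", emif_mb_s);
        printf("DDR3 (64-bit)     : %7.0f MB/s\n", ddr3_mb_s);
        return 0;
    }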

KeyStone SoC Tuning

Multicore Shared Memory Controller - MSMC

The Multicore Shared Memory Controller (MSMC) manages traffic among multiple TMS320C66x CorePacs, DMA, other mastering peripherals, and the EMIF in a multicore device. MSMC also provides a shared on-chip SRAM that is accessible by all the C66x CorePacs and the mastering peripherals on the device. For full details on the MSMC, please see the Multicore Shared Memory Controller (MSMC) for KeyStone Devices User Guide or the Multicore Shared Memory Controller (MSMC) for KeyStone II Devices User Guide.

[Figure: MSMC block diagram]

The MSMC has slave interfaces to connect to the C66x CorePacs (one slave interface per CorePac), two slave interfaces to connect to the system interconnect (TeraNet), one master port to connect to the EMIF, and one master port to connect to the system interconnect (TeraNet).

The MSMC has slave interfaces to connect to the MDMA port of C66x CorePacs. C66x CorePacs use this interface to access the MSMC on-chip memory, the external memory, and EMIF memory-mapped registers through the MSMC-EMIF master port or system level resources through the MSMC-system master port.

The MSMC has two slave interfaces to handle accesses from mastering peripherals in the system (apart from the C66x CorePacs connected to MSMC through the C66x CorePac slave ports) for the MSMC SRAM and EMIF.

The SES interface handles accesses to external DDR3 memory and memory-mapped registers inside the EMIF module that originate from a system master that is not a C66x CorePac.

Accesses presented on this interface to any addresses outside of the address range mapped to the external memory, or the EMIF memory mapped registers, result in an addressing error returned to the requesting master. Note—When MSMC SRAM is remapped to an external address space using MPAX, such accesses will not result in an addressing error as long as the accesses are within the valid external memory address range. The address width on this interface is 32 bits. Address extension to a 36-bit external memory address is done inside the MSMC via MPAX described below.

The SMS interface handles accesses to MSMC SRAM that originate from a system master that is not a C66x CorePac. Accesses from masters in the system to MSMC configuration registers are also expected to be presented at this interface. Any accesses from the SMS interface that do not address the MSMC SRAM or configuration registers result in an addressing error returned to the requesting master.

The MSMC features one master interface for the C66x CorePac to access system resources other than the MSMC SRAM, MSMC MMRs, DDR3 memory, and the EMIF MMRs. Traffic from the system slave interfaces does not pass through the master interface.

The external memory interface (EMIF) module is connected to the MSMC through the external memory master interface. The address width for this interface is 36 bits because it supports the extended memory addressing space beyond 4 GB. The MSMC implements an address extension to 36 bits. Address extension to a 36-bit external memory address is done inside the MSMC via MPAX described below.

Memory Protection and Address Extension - MPAX

MPAX is part of the MSMC controller. It provides memory protection for the MSMC RAM and DDR3, as well as address extension for the MSMC RAM and DDR3.

Memory Protection

MPAX provides memory protection for the MSMC SRAM and DDR3 address ranges.

The MPAX process is performed for a variable-sized segment of memory and is controlled with a register pair for each segment: MPAXH and MPAXL control registers.

  • The MPAXH specifies the base address and size of the segment to match.
  • The MPAXL specifies the replacement address and permissions for the segment.

Each MPAX unit provides eight control register pairs per Privilege ID (PrivID) of the system masters, which allows eight independent and potentially overlapping variable-size memory segments to be operated upon. See the device-specific data manual for the Privilege ID (PrivID) values assigned for various system masters.

Address Extension

MPAX provides address extension from 32 to 36 bits for the MSMC SRAM and DDR3, allowing external addressing of up to 64 GB of memory space even though native DSP addressing remains 32 bits. Some KeyStone devices (see the device-specific data sheet) support only up to 8 GB of external memory space.
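
A sketch of building one MPAX segment register pair is shown below, for example to map a 1 MB logical window at a 32-bit address onto a 36-bit physical DDR3 address above the 4 GB boundary. The field packing (BADDR and SEGSZ in MPAXH, RADDR and permission bits in MPAXL) follows the scheme described for MPAX, but the exact bit positions, the permission-bit assignments, the SEGSZ encoding, and the per-PrivID MPAXH/MPAXL register addresses are assumptions to be verified against the MSMC user guide.

    #include <stdint.h>

    /* Hedged sketch: constants below are assumptions, not a verified layout. */
    #define MPAX_SEGSZ_1MB  0x13u        /* assumed encoding: size = 2^(SEGSZ+1) */
    #define MPAX_PERM_SR    (1u << 5)    /* assumed: supervisor read             */
    #define MPAX_PERM_SW    (1u << 4)    /* assumed: supervisor write            */
    #define MPAX_PERM_UR    (1u << 2)    /* assumed: user read                   */
    #define MPAX_PERM_UW    (1u << 1)    /* assumed: user write                  */

    /* Describe a segment mapping the 32-bit logical window at 'baddr'
     * (segment-size aligned) onto the 36-bit physical address 'raddr36'. */
    static void mpax_build(uint32_t baddr, uint64_t raddr36, uint32_t segsz,
                           uint32_t perms, uint32_t *mpaxh, uint32_t *mpaxl)
    {
        *mpaxh = (baddr & 0xFFFFF000u) | (segsz & 0x1Fu);
        *mpaxl = (uint32_t)((raddr36 >> 4) & 0xFFFFFF00u) | (perms & 0xFFu);
    }

    /* Example: expose 1 MB of DDR3 that lives above the 4 GB boundary
     * (physical 0x8:4000:0000) through the 32-bit logical address 0xA000:0000. */
    void mpax_example(volatile uint32_t *mpaxh_reg, volatile uint32_t *mpaxl_reg)
    {
        uint32_t h, l;

        mpax_build(0xA0000000u, 0x840000000ULL, MPAX_SEGSZ_1MB,
                   MPAX_PERM_SR | MPAX_PERM_SW | MPAX_PERM_UR | MPAX_PERM_UW,
                   &h, &l);

        *mpaxl_reg = l;   /* program MPAXL first, then MPAXH (ordering assumed) */
        *mpaxh_reg = h;
    }

The register pair is passed in as pointers here because the MPAXH/MPAXL addresses differ per segment and per PrivID; the caller is expected to look them up in the MSMC register map.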

Queue Manager Subsystem - QMSS

(TBP)

DDR3 Arbitration and Class of Service


DDR3 Arbitration

The DDR3 memory controller performs command reordering and scheduling to provide efficient transfers with maximum throughput.

The DDR3 memory controller attempts to maximize the utilization of the data, address, and command buses while hiding the overhead of opening and closing DDR3 SDRAM rows. Command reordering takes place within the command FIFO.

The DDR3 memory controller examines all the commands stored in the command FIFO to schedule commands to the external memory. For each master, the DDR3 memory controller reorders the commands based on the following rules:

  • The DDR3 controller will advance a read command before an older write command from the same master if the read is to a different block address (2048 bytes) and the read priority is equal to or greater than the write priority.
  • The DDR3 controller will block a read command, regardless of the master or priority if that read command is to the same block address (2048 bytes) as an older write command.

As a result, each master may have one pending read or write.

  • Among all pending reads, the DDR3 controller selects all reads that have their corresponding SDRAM banks already open.
  • Among all pending writes, the DDR3 controller selects all writes that have their corresponding SDRAM banks already open.

As a result of the reordering, several pending reads and writes may exist that have their corresponding banks open. The highest-priority read is selected from the pending reads, and the highest-priority write from the pending writes. If two or more commands have the same highest priority, the oldest command is selected. As a result, there might exist a final read and a final write command. Either the read or the write command will be selected depending on the programming of the Read Write Execution Threshold register.

DDR3 Class of Service

The commands in the Command FIFO can be mapped to two classes of service: 1 and 2. The mapping of commands to a particular class of service can be done based on the priority or the master ID. The mapping based on priority can be done by setting the appropriate values in the Priority to Class of Service Mapping register. The mapping based on master ID can be done by setting the appropriate values of master ID and the masks in the Master ID to Class of Service Mapping registers.

Each class of service has an associated latency counter. The value of this counter can be set in the Latency Configuration register. When the latency counter for a command expires, i.e., reaches the value programmed for the class of service that the command belongs to, that command will be the one that is executed next. If there is more than one command with an expired latency counter, the command with the highest priority will be executed first. One exception to this rule is as follows: if any of the commands with expired latency counters is also the oldest command in the queue, that command will be executed first irrespective of priority.
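
As a rough illustration of this configuration flow, the sketch below maps priorities 0-2 to class of service 1 and everything else to class 2, then programs a latency limit for each class. The register offsets and field packings are placeholders rather than the DDR3 controller's real layout; the actual priority-to-class-of-service mapping, master-ID mapping, and latency configuration register definitions are in the DDR3 memory controller user guide.

    #include <stdint.h>

    /* Hedged sketch: offsets and field packings below are placeholders. */
    #define PRI_TO_COS_MAP_OFFSET  0x0u   /* placeholder offset */
    #define LATENCY_CONFIG_OFFSET  0x0u   /* placeholder offset */

    /* Map priorities 0-2 to class of service 1 and priorities 3-7 to class 2,
     * then program a maximum latency count for each class so that expired
     * commands are executed next, as described above. */
    void ddr3_setup_cos(volatile uint8_t *ddr3_ctrl_base,
                        uint32_t cos1_max_latency, uint32_t cos2_max_latency)
    {
        volatile uint32_t *pri_map =
            (volatile uint32_t *)(ddr3_ctrl_base + PRI_TO_COS_MAP_OFFSET);
        volatile uint32_t *lat_cfg =
            (volatile uint32_t *)(ddr3_ctrl_base + LATENCY_CONFIG_OFFSET);
        uint32_t map = 0;
        int pri;

        for (pri = 0; pri < 8; pri++) {
            uint32_t cos = (pri <= 2) ? 1u : 2u;
            map |= cos << (pri * 2);      /* assumed packing: 2 bits/priority */
        }

        *pri_map = map;
        *lat_cfg = (cos1_max_latency & 0xFFu) | ((cos2_max_latency & 0xFFu) << 8);
    }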

Master Slave Connectivity

Not all masters can access all slaves in the device. For a detailed description of which masters can access which slaves, refer to the System Interconnect chapter of the device specific System Reference Guide.



