
Trace Limitations


Standard Trace Limitations

As with any tool or software package, trace has some limitations. It is recommended that you understand these limitations before spending hours trying to use trace to solve a problem that it wasn't meant for. Nothing is more frustrating than wasting time heading down a path that will inevitably dead-end. It's better to spend a little bit of time learning where the dead ends are.

This section spells out the trace limitations. In some cases, examples are given of situations where the limitation is a problem, and, when available, alternate configurations that work around the limitation are mentioned.

Standard Trace Only Has Visibility into Data That Travels over the CPU Memory Bus

Without getting into too much detail about the design of trace, it can be said that trace works like a logic analyzer that monitors activity on the CPU Program and Data buses. This means that there are a number of different types of memory accesses that trace provides no visibility into:

  • DMA
  • HPI
  • CPU Register Accesses
  • Any other peripheral access

Additionally, Trace gives no insight into whether a value was read from cache, or whether it came from memory. In either case, the values travel over the same memory bus and Trace cannot distinguish between the two.


Dead End

Don't try to use trace to watch memory locations that might be getting overwritten by DMA, HPI, etc. Trace can't see those writes because they don't travel over the CPU memory bus. You can use trace to monitor writes to the DMA configuration registers, but not to monitor the writes actually performed by the DMA.



Bandwidth Constraints

Trace does not have the bandwidth to capture everything at once. The trace hardware has a limited FIFO, and if it overflows, the user will see gaps in the captured trace data (gaps are marked in the trace Status column). It's not wise to create a Trace On job and enable every trace data type of every trace stream (PC, Timing, Data Read Address, Data Read Data, Data Write Address, and Data Write Data). As soon as any data begins to be accessed, the trace output will be filled mostly with gaps in the trace data.


Useful Tip

When using Data Trace, make every attempt to limit the amount of data that is being captured to data that is relevant. This will avoid having to deal with gaps in the trace data. Being able to use AET to trigger only when necessary is the key to capturing a minimal amount of data.

Trace Bandwidth Calculations

The XDS560 Trace Pod is capable of acquiring trace data at up to 333 Mbits/sec per pin. The XDS560 Trace Pod's buffer storage has a maximum data rate of 3.33 Gbits per second, which means at 333 Mbits per second a maximum of 10 pins can be employed to transfer trace data from the device to the XDS560 Trace Pod.

Devices that support trace may also include dedicated on-chip memory that is used to collect smaller samples of trace data. This feature is called the Embedded Trace Buffer (ETB). The ETB collects trace over a wider (20-bit) on-chip bus with a data rate of approximately 333 Mbits/sec per bit, resulting in a maximum data rate of 6.67 Gbits/sec.

Both PC and Timing streams are highly compressed. With a processor running at 1 GHz, the PC trace stream requires 1.25 Gbits/sec (assuming relative branches) and the Timing stream requires 0.625 Gbits/sec. When you combine the PC stream (1.25 Gbits/sec) or PC+Timing streams (1.875 Gbits/sec) with the data trace stream, the bandwidth available for data trace is the difference between the maximum export rate (3.33 Gbits/sec for the XDS560 Trace Pod or 6.67 Gbits/sec for ETB) and the PC or PC+Timing streams. In the case of the XDS560 Trace Pod, once it's calibrated the trace data rate per pin is provided in the trace display.

Data trace does not typically compress very well due to the random nature of CPU data accesses. Assuming no compression, a worst-case data access (32-bit PC of the access, 32-bit address, 32-bit data, and access size) takes on the order of ~110 bits to represent in the trace data. Assuming both PC and Timing streams are enabled, the available bandwidth of the XDS560 Trace Pod (3.33 Gbits/sec - 1.875 Gbits/sec) leaves 1.455 Gbits/sec for data trace. Dividing the available bandwidth by the number of bits per access, you can trace 13.2M accesses per second, or 1 CPU memory access every 77 cycles with a 1 GHz processor. In the ETB case (6.67 Gbits/sec - 1.875 Gbits/sec), the available bandwidth for data trace is 4.795 Gbits/sec, so you can trace 43.6M accesses per second, or 1 CPU access every 23 cycles.
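For reference, the short sketch below simply redoes the arithmetic above in C. The constants are the estimates quoted in this section (worst-case ~110 bits per access, 1.875 Gbits/sec for PC+Timing at 1 GHz), not device specifications, so treat the results as rough figures.

#include <stdio.h>

/* Rough data-trace bandwidth figures quoted in this section (estimates). */
#define CPU_HZ              1.0e9    /* 1 GHz processor                       */
#define POD_EXPORT_GBPS     3.33     /* XDS560 Trace Pod maximum export rate  */
#define ETB_EXPORT_GBPS     6.67     /* Embedded Trace Buffer capture rate    */
#define PC_TIMING_GBPS      1.875    /* PC (1.25) + Timing (0.625) streams    */
#define BITS_PER_ACCESS     110.0    /* worst-case uncompressed data sample   */

static void report(const char *sink, double export_gbps)
{
    double data_gbps   = export_gbps - PC_TIMING_GBPS;          /* left for data trace     */
    double accesses_ps = (data_gbps * 1.0e9) / BITS_PER_ACCESS; /* traceable accesses/sec  */
    double cycles_per  = CPU_HZ / accesses_ps;                  /* CPU cycles per access   */

    printf("%-17s %.3f Gbits/sec for data trace -> %.1fM accesses/sec (1 per ~%.0f cycles)\n",
           sink, data_gbps, accesses_ps / 1.0e6, cycles_per);
}

int main(void)
{
    report("XDS560 Trace Pod:", POD_EXPORT_GBPS); /* ~13.2M accesses/sec */
    report("ETB:",              ETB_EXPORT_GBPS); /* ~43.6M accesses/sec */
    return 0;
}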

The general rule of thumb is that the trace FIFOs can absorb a worst-case data access every 80 cycles with PC and Timing streams enabled when exporting to the pins, and every 23 cycles when exporting to the ETB, without causing gaps in the trace data. If you have PC and Timing streams disabled, or your data trace stream compresses better than typical random accesses (e.g., tracing sequential accesses), you may be able to improve on the frequency at which you can trace the data stream.

Overclocking the XDS560T

The XDS560T is designed to collect DDR trace data at a maximum clock rate of 167 MHz (333 Mbits/sec/pin). Trace uses fixed divide-by factors to derive the trace export clock from the functional clock rate of the device. Some devices may operate at a maximum functional clock rate that requires a trace clock rate slightly higher than 167 MHz to maximize the trace bandwidth. The maximum clock rate can be modified by changing the MAX_TRACEPORT_CLOCK_MHZ parameter in the ccsv4\emulation\analysis\bin\Receiver_CFEOpts.xml file. If your version of CCS does not include the MAX_TRACEPORT_CLOCK_MHZ parameter, then that version does not support this feature. It is not recommended to extend this parameter beyond 175.0 MHz. If the quality of the trace data deteriorates, it is recommended that you reduce the MAX_TRACEPORT_CLOCK_MHZ parameter. Trace data quality can be checked by searching the Trace Status column for "Bad" status, although there are other factors that can cause "Bad" status messages, such as executing code that is outside of the load image.


Self-Modifying Code

Decoding of trace data is accomplished by using the executable image (COFF or ELF file) to fill in the sequential PC data between branches and to extract opcodes for disassembly. If the code modifies itself during execution, it will most likely no longer match the executable image from the file, causing "Bad Trace Data" status messages in the Trace Status column. Some versions of BIOS will load different interrupt vectors at run time, causing this problem. The trace system does support code overlays per the guidelines provided in TI documentation.

Super Gap Issue

In some cases, when the Trace FIFO is completely overloaded, there can be huge gaps between the captured data that the trace output does not indicate. The output will appear to have no gaps in it, when in reality there are gaps in the trace data. This is a dangerous scenario in that trace is hiding the fact that the captured data set is incomplete. When this occurs, the trace output will typically show no data trace where the PCs indicate instructions that cause data accesses.


Useful Tip

Trace has a setting in the Trace Control Window to tell the CPU to stall whenever Trace data will be lost. This setting can be found in the Tools->Trace Control->Advanced menu under the heading Options to Minimize Trace Data Loss. If you suspect that you are encountering the Super Gap issue, you can enable these stalls and check the output to see if there is additional trace data that is being missed. Keep in mind that when using this option, the application likely won't meet its normal real-time deadlines.

SPLOOP/Delay Slot of Branch Limitations

Trace cannot be triggered within an SPLOOP or in the delay slot of a branch (or any other time the pipeline is in a non-interruptible state). This doesn't mean that if PC trace is being captured it will miss capturing the instructions within each of these conditions. What it means is that trace cannot be started or stopped while either of these conditions is present. The trigger is generated, but cannot be acted on until the SPLOOP/delay slot condition is removed. This may not seem like a serious issue at first, but when capturing data trace, it can potentially cause trace to capture data that wasn't expected.

Example 1

Take the following scenario: a job is created to capture data write values to a specific address. For simplicity, let's assume that this address is 0x00808000. At some point, while the application is in an SPLOOP, this address is written to. What happens? Well, when address 0x00808000 is written to, AET generates a store data sample trigger. However, because the code is in an SPLOOP, trace cannot act on this trigger; instead, the trigger pends until the SPLOOP condition goes away. Now, on the cycle that the SPLOOP condition goes away, if data is written to memory (regardless of where it is written to), it will get captured by trace. (Remember that the store sample trigger will store any data on the data bus, regardless of address. The address qualification is handled by AET.) If no data happens to be written on the cycle where the SPLOOP condition goes away, trace will not store any data.

So, as you can see, this limitation can cause several types of false data in the trace output. In the first case, data written to a different address than the one specified can be captured. In the second case, data is missed for one of the triggers. In a third case, which wasn't discussed above, multiple triggers can be missed: if the specified address is written to multiple times while the SPLOOP condition exists, only one of those triggers actually gets generated when the SPLOOP condition goes away. It behaves very much like an interrupt that occurs while interrupts are disabled. If multiple interrupts occur while interrupts are disabled, only one interrupt is serviced when interrupts are re-enabled.

Example 2

If you set up a Trace In Range job and set the end point to the address of the last instruction of a branch delay slot, PCs and/or data will continue to be traced until execution advances past the delay slot, causing extra data to be traced that is outside of the range you selected. The extra PCs traced may span one or more instructions, depending on the nesting depth of the branch. You can get the same effect with End Trace jobs.

Timing Stream Limitations

Store Sample Timing Stream Limitation

The Timing Stream doesn't advance between store sample triggers. This means that if the trace jobs are strictly storing data samples, the cycles that trace shows between them won't reflect the actual time between writes. The numbers will appear as consecutive cycles.

To avoid this problem you can set up a separate Trace On job with nothing but timing enabled. This will allow timing data to accumulate between Store Sample triggers.

Example

A simple use case for trace to do some type of thread-level profiling would be to instrument the switch function in the RTOS to write the ID of the next task to a global variable, as sketched below. Then, you might create a trace job that stores the data written to that address along with the timestamp for the write. This gives you a crude thread execution graph, as the captured data lists the threads that have executed. However, due to this limitation, the cycle count will show each thread as starting a single cycle after the prior thread.
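A minimal sketch of that instrumentation is shown below. The hook name and the task-ID argument are hypothetical, not a specific RTOS API; register it with whichever task-switch callback your RTOS provides. The point is simply that a single CPU write to a known global gives the Data Trace job one address to watch.

/* Hypothetical task-switch instrumentation for crude thread-level profiling.
 * The hook name and argument are illustrative only; wire this up to whatever
 * task-switch callback mechanism your RTOS offers. */

volatile unsigned int g_current_task_id;  /* point the Data Trace job at &g_current_task_id */

void trace_task_switch_hook(unsigned int next_task_id)
{
    /* One CPU write per task switch: the Data Trace job captures the value
     * (the task ID), and a separate timing-only Trace On job supplies usable
     * timestamps between these Store Sample triggers. */
    g_current_task_id = next_task_id;
}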

Timing Stream Execute Pipeline Alignment

Each execute stage of the pipeline generates a portion of the trace data that is required to reconstruct the CPU's PC and Data activity. The trace data from each execute phase is aligned by the Pipeline Flattener. The Timing Stream and Stall Events are aligned to the output of the Pipeline Flattener. Any stall events and pipeline advance delays (Timing Stream data) that occur during a pipeline advance are reported with the PC that is exiting the Pipeline Flattener (at pipeline stage E5). See Understanding Trace Timing to determine the alignment of the execute pipeline with trace data during a stall.

Event Trace Limitations

Event Trace Jobs

Every Event Trace job must use the same event category (stall, memory, or system) for triggering, with the exception of system events: you can have one system event job plus 4 other jobs of the same category, or 5 system event jobs.

Only Highest-Priority Stall Events Captured

In the Unified Breakpoint Manager (UBM), the user can select up to 4 different events (or sets of events) for Event Trace. In this article, these will be referred to as UBM Events. Note that a UBM Event can consist of multiple processor events; choosing the individual events within UBM simply enables inputs to an OR gate that generates the UBM Event. The limitation here occurs when the user selects multiple UBM Events from the Stall family. The UBM Events are identified as Event 1, Event 2, Event 3, and Event 4, and are prioritized with Event 1 being the highest priority and Event 4 the lowest. In an Event Trace scenario where two UBM Events are generated on the same cycle, only the higher-priority event will be displayed in the output.

Corner Case

This limitation is only true when the selected event category is Stalls. If either of the other event categories is chosen (Memory, System), multiple events are indicated by reporting a weighted-sum value: UBM Event 1 carries a weight of 1, while UBM Events 2, 3, and 4 carry weights of 2, 4, and 8, respectively. So, for example, a value of 0xB in the trace stream would indicate that UBM Events 4, 2, and 1 triggered on that cycle.
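To make the weighting concrete, here is a small sketch that decodes such a value into the UBM Events that fired; it simply treats the reported value as a bit mask (Event 1 = bit 0 through Event 4 = bit 3), matching the weights described above.

#include <stdio.h>

/* Decode the weighted-sum value reported for Memory/System Event Trace:
 * UBM Event 1 = 1, Event 2 = 2, Event 3 = 4, Event 4 = 8. */
static void decode_ubm_events(unsigned int value)
{
    printf("0x%X ->", value);
    for (int event = 1; event <= 4; event++) {
        if (value & (1u << (event - 1)))
            printf(" Event %d", event);
    }
    printf("\n");
}

int main(void)
{
    decode_ubm_events(0xB);  /* prints: 0xB -> Event 1 Event 2 Event 4 */
    return 0;
}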