NOTICE: The Processors Wiki will End-of-Life on January 15, 2021. It is recommended to download any files or other content you may need that are hosted on processors.wiki.ti.com. The site is now set to read only.

Common DDR Issues

From Texas Instruments Wiki

Intro

Perhaps the most important interface in a microprocessor is the DRAM controller. It allows the CPU to access a huge amount of memory in a cost-effective manner. Over the years, as we've progressed from SDRAM to DDR3 and beyond, bus speeds have gotten faster while voltages have gotten lower. This puts increasing pressure on the engineer to get every detail of the design right; otherwise, issues will appear. Sometimes the issues are obvious and the external memory won't work at all. Perhaps even worse is the case where things seem to work fine, but issues are lurking that only manifest themselves at extremes of temperature or process.

This page is intended to serve two purposes:

  1. If you're experiencing DDR issues, it will give some ideas of things to check in order to identify the root cause.
  2. Better yet, as you're finishing up future designs, it should serve as a good checklist of issues to avoid!

Looking for Issues

Quick and Easy Test

  1. Connect to the board using Code Composer Studio and a JTAG emulator.
  2. Configure the DDR interface. This can be done either through a CCS GEL file or by bootloader software such as U-Boot or X-Loader. This page does not cover the actual register configuration; that topic is covered in other articles that contain timing spreadsheets, etc.
  3. Open a memory window pointing to external memory. Set the displayed element size to match the width of the actual memory bus, e.g. 16-bit values for a 16-bit bus or 32-bit values for a 32-bit bus. This makes it easier to match any corrupted values to specific lanes or bits that are having issues. Since we're not running any code, make the memory window as big as you can, with a width that is a power of 2 (e.g. 4, 8 or 16 words/halfwords). This makes timing errors easier to spot later on, because you can see whether adjacent rows get corrupted when you do a write, or whether your write data shows up someplace else.
  4. Hit "refresh" a number of times (or, alternatively, enable continuous refresh so CCS keeps updating the window). On parts that support enabling/disabling the data cache from the memory window, make sure the data cache checkbox is unchecked; we want to see the values in DDR, not in the cache. If you see memory changing in the window, then clearly something is wrong.
  5. Try poking a value somewhere and make sure it "sticks" without affecting other addresses. This is why a big memory window is a good idea: typical DDRs have 2KB-8KB page sizes (possibly larger depending on bus width), so the more you can see, the better.
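The poke-and-check step can also be expressed in code, which is handy once you can run a small program on the target. The following is a minimal sketch, not a TI-provided routine: the function name `poke_and_check` and its return codes are hypothetical, and it assumes `base` points at an uncached mapping of external memory (mirroring the "data cache unchecked" advice above) and that `index < count`.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper: write 'pattern' to base[index], then verify that it
 * "sticks" and that no other word in the window [0, count) changed.
 * 'snapshot' is a caller-provided buffer of 'count' words.
 * Assumes 'base' is an uncached view of DDR and index < count.
 * Returns 0 on pass, 1 if the write did not stick, 2 if another word
 * in the window was disturbed. */
int poke_and_check(volatile uint32_t *base, size_t count, size_t index,
                   uint32_t pattern, uint32_t *snapshot)
{
    for (size_t i = 0; i < count; i++)
        snapshot[i] = base[i];          /* record the window before the poke */

    base[index] = pattern;              /* the "poke" */

    if (base[index] != pattern)
        return 1;                       /* value did not stick */

    for (size_t i = 0; i < count; i++) {
        if (i != index && base[i] != snapshot[i])
            return 2;                   /* the write showed up somewhere else */
    }
    return 0;
}
```

Choosing `count` to cover at least one full DDR page (2KB-8KB) makes the neighbor check roughly equivalent to watching a large memory window.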

The method above is not the be-all and end-all of memory testing; tools such as memtester are far more comprehensive. It is simply an easy way to check for obvious issues in the DDR interface when bringing up a new board.

Full Test Procedure

  1. Test the board at low temp and high voltage. The low temp should still be within spec for all the parts. "High voltage" refers to whichever DDR-related rails can (within reason) be adjusted. For example, on Sitara devices the VDD_CORE rail powers the EMIF and the VDDS_DDR rail is used as the bus voltage. If those rails are adjustable (such as through a PMIC), raise the voltages slightly for the low-temperature test to stress the interface.
  2. Test the board at high temp and low voltage. The high temp should still be within spec for all the parts. Keep in mind that different parts may specify temperature differently: TI Sitara devices generally specify a junction temperature, while DDR3 devices generally specify a case temperature. Also keep in mind that for many DDR devices the refresh rate needs to be doubled above 85°C. For the voltage, this again pertains to the rails affecting the DDR, and you should lower them only slightly so that they remain within spec.

For the tests above, run something more comprehensive, such as memtester.
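As an illustration of the kind of pattern test such tools run, here is a minimal "address-in-address" sketch. It is not memtester's code; the function name `own_address_test` is hypothetical, and it assumes `base` is an uncached view of the DDR region under test. Filling the whole region before reading anything back is what lets it catch aliasing (a later write landing on an earlier address), and the inverted second pass drives every data bit in every word to both 0 and 1.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical minimal address-in-address test (not memtester itself).
 * Pass 1 writes each word's index, pass 2 writes the inverted index.
 * Assumes 'base' is an uncached mapping of the region under test.
 * Returns the number of mismatching words across both passes (0 = pass). */
size_t own_address_test(volatile uint32_t *base, size_t count)
{
    size_t errors = 0;
    for (int pass = 0; pass < 2; pass++) {
        uint32_t invert = pass ? 0xFFFFFFFFu : 0u;

        for (size_t i = 0; i < count; i++)      /* fill the whole region first */
            base[i] = (uint32_t)i ^ invert;

        for (size_t i = 0; i < count; i++)      /* then read everything back */
            if (base[i] != ((uint32_t)i ^ invert))
                errors++;
    }
    return errors;
}
```

Running a test like this at each of the temperature/voltage corners above is the point; a single pass at room temperature proves little.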

Types of Issues

Bit Flips

If one or more specific bits are consistently having issues, this may indicate a layout issue. Double-check the trace lengths for the failing bit(s) to ensure they meet the skew requirements with respect to DQS and the other bits within the lane.
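To tell whether the same bit is failing every time, it helps to count errors per bit rather than per word. The sketch below is a hypothetical helper (`count_bit_errors` is not an existing API) that assumes a 32-bit bus and that the caller has captured expected and read-back data, for example from a pattern test. Bit b belongs to byte lane b / 8, which maps the result back to a specific trace and its DQS.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper: accumulate how often each data bit was wrong.
 * Assumes a 32-bit bus; the caller zeroes bit_errors[] beforehand.
 * A bit with a high count while its neighbors are clean points toward
 * that one trace (e.g. skew versus its DQS). */
void count_bit_errors(const uint32_t *expected, const uint32_t *actual,
                      size_t count, unsigned bit_errors[32])
{
    for (size_t i = 0; i < count; i++) {
        uint32_t diff = expected[i] ^ actual[i];   /* 1 wherever a bit flipped */
        for (unsigned b = 0; b < 32; b++)
            if (diff & (1u << b))
                bit_errors[b]++;
    }
}
```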

Missing/Zero Byte

If a specific byte is being "lost", i.e. the whole byte reads back as zero, that may indicate an issue with your DQS lines.

  • In particular, you might check the skew report to ensure DQS meets all the requirements. You might also want to look at signal integrity, perhaps through a high quality scope and/or use of IBIS models for simulation.
  • Most newer devices (e.g. devices capable of supporting DDR3) have the ability to tune the DQS timing. If you have not already performed leveling/training to determine the optimal values, that is a good next step, since it adjusts the DQS timing.
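A quick way to confirm the "zero byte" signature from captured test data is to check each byte lane separately. This is a sketch under stated assumptions: the function name `zero_byte_lanes` is hypothetical, and it assumes a 32-bit bus where lane n carries data bits 8n..8n+7. Samples where a zero byte was written are skipped, since they cannot distinguish a dead lane from a working one.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper: return a bitmask of byte lanes (bit n = lane n)
 * whose read-back byte was zero in every sample where a nonzero byte
 * was written. Assumes a 32-bit bus, lane n = data bits 8n..8n+7. */
unsigned zero_byte_lanes(const uint32_t *expected, const uint32_t *actual,
                         size_t count)
{
    unsigned lanes = 0;
    for (unsigned lane = 0; lane < 4; lane++) {
        unsigned shift = lane * 8;
        int checked = 0, always_zero = 1;
        for (size_t i = 0; i < count; i++) {
            uint8_t want = (uint8_t)(expected[i] >> shift);
            uint8_t got = (uint8_t)(actual[i] >> shift);
            if (want == 0)
                continue;               /* a zero write tells us nothing */
            checked = 1;
            if (got != 0)
                always_zero = 0;
        }
        if (checked && always_zero)
            lanes |= 1u << lane;
    }
    return lanes;
}
```

A lane flagged here is the one whose DQS routing, skew report and leveling results deserve a closer look.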

Data Changing Everywhere!

This is likely a configuration issue, i.e. double-check the DDR datasheet against your configuration. You likely have a mismatch with respect to the number of banks, column address bits, etc.

The other signature of this mismatch is repeated data: for example, the data at offset 0 matches the data at offset 256, offset 1 matches 257, offset 2 matches 258, and so on. If the offset between repeated patterns (256 in this example) is LESS than the row length of your DDR interface, that is, 2^(number of column address bits) × (bus width in bytes), then this is most certainly the issue. If the offset is GREATER THAN or EQUAL to the row length, the issue could still be in this configuration, but it may also point to a subtle timing issue on the lower address bits (A0, A1), which are used to address adjacent rows of the DDR.
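The row-length comparison above is easy to get wrong by a factor of the bus width, so here it is as code, together with a simple way to find the alias offset on the target. All three functions are hypothetical helpers, not part of any TI library. `find_alias_offset` is a standard address-line check: it writes a marker at word 0, then a different value at each power-of-two offset, and reports the first offset whose write disturbs word 0. It assumes an uncached view of DDR and returns a word offset (multiply by the word size to get bytes).

```c
#include <stdint.h>
#include <stddef.h>

/* Row (page) length in bytes: 2^(column address bits) * bus width in bytes. */
uint32_t row_length_bytes(unsigned col_bits, unsigned bus_width_bytes)
{
    return (1u << col_bits) * bus_width_bytes;
}

/* Returns 1 when the alias offset is smaller than one row, which points to
 * a column/bank configuration mismatch; 0 when it is a row or more, where
 * either the configuration or low-address-line timing could be at fault. */
int alias_is_config_error(uint32_t alias_offset_bytes, unsigned col_bits,
                          unsigned bus_width_bytes)
{
    return alias_offset_bytes < row_length_bytes(col_bits, bus_width_bytes);
}

/* Smallest power-of-two word offset whose write lands on word 0, or 0 if
 * none is found within 'count' words. Assumes 'base' is uncached. */
size_t find_alias_offset(volatile uint32_t *base, size_t count)
{
    for (size_t off = 1; off < count; off <<= 1) {
        base[0] = 0x5A5A5A5Au;          /* marker at word 0 */
        base[off] = 0xA5A5A5A5u;        /* different value at the candidate */
        if (base[0] != 0x5A5A5A5Au)
            return off;                 /* word 0 was overwritten: alias */
    }
    return 0;
}
```

For example, with 10 column address bits on a 32-bit (4-byte) bus, the row length is 4096 bytes, so a 256-byte alias offset falls in the "configuration mismatch" case.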

Failures at temperature extremes

There could be multiple contributors:

  1. At high temp, make sure you are still within the specified temperature range of the DDR. Above certain temperatures you need to double the refresh rate (refer to your DDR datasheet).
  2. This could relate to marginal timings/layout since timings will vary with temperature.
  3. This could be related to a PLL. Verify that you observe the expected DDR clock frequency. Temperature extremes can cause a PLL to lose lock if for example it was configured out of spec (intermediate frequency not in spec, etc.) or if there's an issue with the PLL power supply.
  4. Power is always a suspect for any kind of DDR issues. Verify all relevant supplies are within spec during a failing scenario (e.g. PLL supply, CORE/EMIF supply, VDDS_DDR, VREF, etc.).
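Doubling the refresh rate means halving the refresh interval (tREFI) programmed into the EMIF. The sketch below converts tREFI into DDR clock cycles using the typical DDR2/DDR3 values of 7.8 µs up to 85°C and 3.9 µs above it; confirm both against your DDR datasheet. It assumes your EMIF's refresh rate field is expressed in DDR clock cycles, which is common but device specific, so check your device's TRM. The function name `refresh_cycles` is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: refresh interval in DDR clock cycles.
 * tREFI = 7.8 us (up to 85 C) or 3.9 us (above 85 C, i.e. refresh rate
 * doubled); typical DDR2/DDR3 values, confirm against your datasheet.
 * Rounds down so refreshes are never issued less often than required.
 * Assumes the EMIF refresh field counts DDR clock cycles. */
uint32_t refresh_cycles(uint32_t ddr_clk_mhz, int above_85c)
{
    uint32_t trefi_ns = above_85c ? 3900u : 7800u;
    return (uint32_t)((uint64_t)trefi_ns * ddr_clk_mhz / 1000u);
}
```

At a 400 MHz DDR clock this gives 3120 cycles for normal operation and 1560 cycles for the extended-temperature range.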

32-bit accesses work, but 8-bit accesses fail

This can happen if your DQS lines are swapped. Make sure they connect to the appropriate data lane.
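A small test that reproduces this symptom writes a word one byte at a time and then reads it back as a full 32-bit word. This is a hypothetical sketch (`byte_write_test` is not an existing API); it assumes a little-endian CPU, so byte k of the word lands in data bits 8k..8k+7, and an uncached pointer into DDR.

```c
#include <stdint.h>

/* Hypothetical helper: write a word with four 8-bit stores, then read it
 * back with one 32-bit load. Assumes a little-endian CPU and an uncached
 * pointer into DDR. Returns 0 on pass, 1 on mismatch. */
int byte_write_test(volatile uint32_t *word, uint32_t pattern)
{
    volatile uint8_t *bytes = (volatile uint8_t *)word;

    *word = 0;                                      /* known starting value */
    for (unsigned k = 0; k < 4; k++)
        bytes[k] = (uint8_t)(pattern >> (8 * k));   /* 8-bit writes */

    return (*word == pattern) ? 0 : 1;              /* 32-bit read */
}
```

If 32-bit write/read tests pass but this one fails, lane routing such as the swapped DQS lines described above is a good suspect.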

Read Failures

Read failures often occur due to signal integrity issues. This is frequently the case on DDR2 and later memories when On-Die Termination (ODT) has not been properly configured on the processor. ODT needs to be configured for both the DDR and the processor. The DDR-side ODT setting is specified in the SDRAM_CONFIG register of the EMIF. The processor-side ODT is configured in a separate register, which may be a DDR PHY control register or, on some devices, another control register.

First access of a burst is garbage

This is often indicative of an issue related to DQS timing. At the hardware level you may want to check the flight times of the DQS and data signals to make sure the skew is within spec.
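Flight times can be estimated from the routed trace lengths and compared against the DQS strobe for the same lane. The sketch below is illustrative only: `check_dq_skew` is a hypothetical helper, lengths are in mils, and the propagation delay per inch is an assumed property of your stack-up (roughly 170 ps/inch is a common ballpark for FR-4 stripline; use the value for your board). It covers board routing only; package delays and the exact skew limit must come from your device's layout guidelines.

```c
#include <stddef.h>

/* Hypothetical helper: flag DQ traces whose board flight time differs from
 * their DQS strobe by more than max_skew_ps. Lengths in mils; ps_per_inch
 * is the assumed propagation delay of the routing layer.
 * Writes offending indices to 'bad' (capacity n) and returns their count. */
size_t check_dq_skew(const double *dq_len_mils, size_t n, double dqs_len_mils,
                     double ps_per_inch, double max_skew_ps, size_t *bad)
{
    size_t nbad = 0;
    double dqs_ps = dqs_len_mils / 1000.0 * ps_per_inch;   /* strobe flight time */

    for (size_t i = 0; i < n; i++) {
        double dq_ps = dq_len_mils[i] / 1000.0 * ps_per_inch;
        double skew = dq_ps - dqs_ps;
        if (skew < 0)
            skew = -skew;                                  /* absolute skew */
        if (skew > max_skew_ps)
            bad[nbad++] = i;
    }
    return nbad;
}
```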

This can also be related to the internal timings of the DDR PHY. The "read latency" field of the DDR PHY Control Register can be increased by 1 to provide more margin. Side note: this "read latency" field is related to the PHY itself and is not related to the JEDEC "read latency (RL)" from the DDR specifications. Here is some additional info on this field:

  • The read latency in DDR_PHY_CTRL_1 applies to the DDR PHY. It should not be confused with the RL in the DDR device datasheet; the two are different parameters.
  • Read latency of DDR PHY is used to transfer the read data from the DQS clock domain to its internal clock domain. The DDR PHY has a pseudo-synchronous FIFO that transfers read data from the DQS clock domain to its internal clock domain. The DDR PHY read latency defines how long the DDR PHY should wait before reading the FIFO using internal clock, after it has been written by the DQS clock during reads. The DDR PHY read latency number is defined by the round-trip latency of the system (IO delay + board delay + CL).
  • Therefore, changing DDR PHY read latency will not impact any waveforms observed on the DDR bus. It is strictly an internal timing for data transfer from DQS domain to DDR PHY clock domain.
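The round-trip relationship above (IO delay + board delay + CL) can be turned into a rough cycle count to sanity-check the PHY read latency setting. This is only an estimate under stated assumptions: `estimate_phy_read_latency` is a hypothetical helper, delays are in picoseconds and are rounded up to whole DDR clock cycles, and `margin` adds extra cycles (e.g. the +1 mentioned above). How this number maps onto the actual read latency field (any fixed offsets or encodings) is device specific, so consult your device's TRM before programming a value.

```c
#include <stdint.h>

/* Hypothetical helper: rough round-trip read latency in DDR clock cycles,
 * following IO delay + board delay + CL. Delays in ps, rounded up to whole
 * cycles of period tck_ps; 'margin' adds extra cycles. The mapping to the
 * PHY register field is device specific (see your TRM). */
uint32_t estimate_phy_read_latency(uint32_t cas_latency, uint32_t io_delay_ps,
                                   uint32_t board_delay_ps, uint32_t tck_ps,
                                   uint32_t margin)
{
    uint32_t delay_ps = io_delay_ps + board_delay_ps;
    uint32_t delay_cycles = (delay_ps + tck_ps - 1) / tck_ps;   /* ceiling */
    return cas_latency + delay_cycles + margin;
}
```

For example, CL = 6 at a 400 MHz DDR clock (2500 ps period) with 2500 ps of combined IO and board delay gives 7 cycles, or 8 with one cycle of margin.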