NOTICE: The Processors Wiki will End-of-Life on January 15, 2021. It is recommended to download any files or other content you may need that are hosted on processors.wiki.ti.com. The site is now set to read only.
StarterWare NeonVFP
Introduction:
The AM335x ARM MPU subsystem includes the SIMD-capable Neon engine and the VFP coprocessor. The VFP coprocessor implements the VFPv3 architecture and is fully compliant with the IEEE 754 standard. The Neon engine, with its SIMD architecture, is used to accelerate media codecs, 2D/3D graphics and image processing.
For more information on Neon, the VFP coprocessor and the SIMD concept, refer to processors.wiki.ti.com/index.php/Cortex-A8
The sections below describe the Neon and VFP coprocessor support provided in StarterWare and the configurable options available for measuring Neon/VFP engine performance.
StarterWare Support for Neon and VFP coprocessors:
The following support is provided in StarterWare for Neon and VFP coprocessors.
- Enabling the Neon/VFP engine during system initialization. The ARM Cortex-A8 powers up with the Neon and VFP engines disabled, so StarterWare enables them during startup: the init.s file in the system initialization code contains assembly code that enables the coprocessors, written in the syntax of each supported toolchain.
The updated init file for each compiler is located at
\system_config\armv7a\<compiler>\init.s
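The enable step in init.s amounts to two register writes on the Cortex-A8: granting access to coprocessors 10 and 11 in CPACR, then setting the EN bit in FPEXC. The sketch below expresses that sequence in C with GCC inline assembly; it is an illustration under that assumption, not the StarterWare init.s code. The function names are hypothetical, and the bit manipulation is split into pure helpers so it can be checked off-target. The register accesses only compile for 32-bit ARM with hardware floating point and must run in a privileged mode.

```c
#include <stdint.h>

/* CPACR bits [23:20] hold the cp10 and cp11 access fields; 0b11 each = full access. */
#define CPACR_CP10_CP11_FULL (0xFu << 20)
/* FPEXC.EN (bit 30) globally enables the VFP/Neon unit. */
#define FPEXC_EN (1u << 30)

/* Hypothetical helper: returns cpacr with full cp10/cp11 access granted. */
uint32_t CpacrWithFpuAccess(uint32_t cpacr)
{
    return cpacr | CPACR_CP10_CP11_FULL;
}

/* Hypothetical helper: returns fpexc with the enable bit set. */
uint32_t FpexcEnabled(uint32_t fpexc)
{
    return fpexc | FPEXC_EN;
}

/* Hypothetical C equivalent of the init.s enable sequence (privileged mode only). */
void NeonVfpEnable(void)
{
#if defined(__arm__) && defined(__ARM_FP)
    uint32_t reg;

    /* Read-modify-write CPACR to grant full access to cp10/cp11 (VFP/Neon). */
    __asm__ volatile("mrc p15, 0, %0, c1, c0, 2" : "=r"(reg));
    reg = CpacrWithFpuAccess(reg);
    __asm__ volatile("mcr p15, 0, %0, c1, c0, 2" : : "r"(reg));
    /* Make the new access rights visible before touching FPEXC. */
    __asm__ volatile("isb" : : : "memory");

    /* Set FPEXC.EN so VFP/Neon instructions no longer trap as undefined. */
    reg = FpexcEnabled(0u);
    __asm__ volatile("vmsr fpexc, %0" : : "r"(reg));
#endif
}
```

Until both writes have happened, any VFP or Neon instruction raises an undefined-instruction exception, which is why this has to run before any floating-point code.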
- IRQ handler context save/restore for Neon/VFP registers
The AM335x IRQ handler is updated to save and restore the Neon/VFP registers on the stack during an interrupt.
The exception handler at the following location is modified:
\system_config\armv7a\<compiler>\exceptionhandler.s
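The actual save/restore is assembly in exceptionhandler.s. To make concrete what "the Neon/VFP registers" covers on a Cortex-A8 with Neon (32 doubleword registers D0-D31 plus the FPEXC and FPSCR control registers), the sketch below models the saved state as a C struct and fills it with GCC inline assembly. The struct layout, field order and function name are hypothetical and may differ from the frame the StarterWare handler pushes; restoring would mirror this with VLDMIA and VMSR.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout of the saved Neon/VFP state. */
typedef struct {
    uint32_t fpexc;   /* FP exception/enable register */
    uint32_t fpscr;   /* FP status and control (flags, rounding mode) */
    uint64_t d[32];   /* D0-D31 (VFPv3 with 32 D registers, as used by Neon) */
} NeonVfpContext;

_Static_assert(offsetof(NeonVfpContext, d) == 8,
               "D registers follow the two 32-bit control words");

/* Saves the live Neon/VFP state into ctx.
   Assumes privileged mode and that VFP/Neon is already enabled. */
void NeonVfpContextSave(NeonVfpContext *ctx)
{
#if defined(__arm__) && defined(__ARM_NEON)
    uint64_t *p = ctx->d;
    uint32_t fpexc;
    uint32_t fpscr;

    __asm__ volatile("vmrs %0, fpexc" : "=r"(fpexc));
    __asm__ volatile("vmrs %0, fpscr" : "=r"(fpscr));
    ctx->fpexc = fpexc;
    ctx->fpscr = fpscr;

    /* VSTM stores at most 16 D registers per instruction, so use two. */
    __asm__ volatile("vstmia %0!, {d0-d15}\n\t"
                     "vstmia %0, {d16-d31}"
                     : "+r"(p) : : "memory");
#else
    (void)ctx;  /* Nothing to save when not built for ARM with Neon. */
#endif
}
```

The resulting 264 bytes per interrupt is the extra stack cost of preserving the full register file, which is worth keeping in mind when sizing the IRQ stack.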
- A basic benchmarking application, neonVFPBenchmark, is provided for measuring the performance of functions on the Neon and VFP coprocessors in terms of the time required to execute them.
- The Neon and VFP coprocessor registers are saved and restored in the demo example as part of the sleep/wakeup sequence.
Benchmarking Application
neonVFPBenchmark example
- The neonVFPBenchmark example application provides basic benchmarking of functions by reporting their execution time in microseconds. It provides performance numbers for the following cases:
1: Neon engine performance
2: VFP engine performance
3: SoftFloat (no coprocessor support; the compiler implements floating-point operations with library helper calls)
- The example framework lets users plug in their own functions and have their performance measured. To do this, update the benchmarkFunction structure with the details of the function to be benchmarked.
- The performance numbers of the functions included in the example are displayed on the selected console in the format below, in addition to the number of ticks counted by the timer:
<Function Name> <TimeTaken in micro seconds> <No of Iterations>
- The example reports performance numbers for functions performing floating-point additions and multiplications with all supported toolchains (GCC, IAR and CCS), for the Neon-enabled, VFP-enabled and SoftFloat cases.
- Additionally, with the GCC compiler, the application measures the performance of the sine and cosine math functions from the third_party library on the Neon engine, implemented both with and without Neon intrinsics.
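The difference between code "with and without Neon intrinsics" can be seen on a simpler kernel than sine/cosine. The sketch below is not the third_party library code; it adds two float arrays once in plain C and once with the GCC Neon intrinsics vld1q_f32, vaddq_f32 and vst1q_f32, which process four single-precision lanes per instruction. The function names are hypothetical. When __ARM_NEON is not defined (for example, a VFP or SoftFloat build), the intrinsics path compiles out and the scalar tail loop handles every element.

```c
#include <stddef.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Plain C version: one float addition per loop iteration. */
void AddArraysScalar(const float *a, const float *b, float *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        out[i] = a[i] + b[i];
    }
}

/* Intrinsics version: four float32 lanes per iteration on Neon,
   then the remaining (n % 4) elements with scalar code. */
void AddArraysNeon(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
#if defined(__ARM_NEON)
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);
        float32x4_t vb = vld1q_f32(b + i);
        vst1q_f32(out + i, vaddq_f32(va, vb));
    }
#endif
    for (; i < n; i++) {
        out[i] = a[i] + b[i];
    }
}
```

One caveat when comparing results: on ARMv7, Neon single-precision arithmetic always flushes denormals to zero, so for denormal inputs the Neon path can differ from the IEEE 754-compliant VFP path.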
Default Example settings
- By default, the example provides performance numbers for the Neon engine with all toolchains.
- To get the performance numbers for the VFP coprocessor and the SoftFloat case, refer to the Building the Example Application section below.
- Every function included in the example is executed 100000 times for benchmarking; users can change this value to get results for different iteration counts.
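The plug-in mechanism described above (a table of functions, each run for a configurable number of iterations and reported as name, time and iteration count) can be sketched as follows. The BenchmarkEntry type, its fields and the helper names are hypothetical stand-ins for the example's benchmarkFunction structure, whose actual fields are not shown on this page. The host C library's clock() stands in for the hardware timer used on the target, and the output uses integer formats only, since StarterWare cannot print floating-point values.

```c
#include <stdio.h>
#include <time.h>

/* Default iteration count used by the example. */
#define BENCH_ITERATIONS 100000u

/* Hypothetical descriptor; the real benchmarkFunction structure may differ. */
typedef struct {
    const char   *name;
    void        (*fn)(void);
    unsigned int  iterations;
} BenchmarkEntry;

/* volatile keeps the compiler from optimizing the benchmark loops away. */
static volatile float g_acc = 0.0f;

static void FloatAdd(void) { g_acc = g_acc + 1.5f; }
static void FloatMul(void) { g_acc = g_acc * 1.0001f; }

/* Add a row here to benchmark another function. */
static BenchmarkEntry benchmarkTable[] = {
    { "FloatAdd", FloatAdd, BENCH_ITERATIONS },
    { "FloatMul", FloatMul, BENCH_ITERATIONS },
};

/* Runs one entry and returns the elapsed time in microseconds. */
static unsigned long RunBenchmark(const BenchmarkEntry *e)
{
    clock_t start = clock();
    for (unsigned int i = 0; i < e->iterations; i++) {
        e->fn();
    }
    clock_t end = clock();
    return (unsigned long)((double)(end - start) * 1e6 / CLOCKS_PER_SEC);
}

/* Prints "<Function Name> <TimeTaken in micro seconds> <No of Iterations>". */
void RunAllBenchmarks(void)
{
    size_t count = sizeof benchmarkTable / sizeof benchmarkTable[0];
    for (size_t i = 0; i < count; i++) {
        unsigned long us = RunBenchmark(&benchmarkTable[i]);
        printf("%s %lu %u\n", benchmarkTable[i].name, us,
               benchmarkTable[i].iterations);
    }
}
```

Keeping the function pointer and iteration count together in one table entry is what lets a new function be benchmarked by adding a single row rather than editing the timing loop.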
Building the Example Application
- This section describes the build configuration settings that must be changed for the neonVFPBenchmark example application to produce performance numbers for each case: Neon enabled, VFP enabled and SoftFloat.
- By default, the example application is compiled with the Neon compiler option for all toolchains.
- To get the performance numbers for the other cases, the example must be compiled with options customized for each toolchain.
- The sections below describe the build configuration settings to change for the GCC, CCS and IAR toolchains.
GCC Compiler Options
This section describes the GCC build configurations needed to build the example application for each benchmarking case.
For each case, execute the commands below on the command line from the example's build directory.
Ex: ~/build/armv7a/gcc/am335x/evmAM335x/neonVFPBenchmark
a: To get performance numbers for the Neon engine
$ make clean
$ make FPU=NEON
b: To get performance numbers for the VFP engine
$ make clean
$ make FPU=VFP
c: To get performance numbers without any coprocessor support (SoftFloat)
$ make clean
$ make FPU=SOFT
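After switching FPU settings it can be useful to confirm which configuration a build actually targeted. A minimal sketch, assuming the makefile's FPU setting is ultimately translated into GCC's -mfpu and -mfloat-abi options (this page does not show the mapping): GCC predefines __ARM_NEON when Neon code generation is enabled, __ARM_FP when hardware floating point is available, and __SOFTFP__ for -mfloat-abi=soft. The function name is hypothetical, and other compilers may define different macros.

```c
/* Reports the floating-point configuration selected at compile time. */
const char *BuildFpuConfig(void)
{
#if defined(__ARM_NEON)
    return "NEON";   /* Neon (Advanced SIMD) code generation enabled */
#elif defined(__ARM_FP) && !defined(__SOFTFP__)
    return "VFP";    /* hardware floating point without Neon */
#else
    return "SOFT";   /* floating point through library helper calls */
#endif
}
```

Printing this string at the start of the benchmark output would make it clear which of the three cases a given set of numbers belongs to.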
CCS Compiler Options
This section explains how to enable the build configurations the neonVFPBenchmark application needs to produce performance numbers for each case.
Steps to measure the performance numbers for Neon
- Import the system_config and neonVFPBenchmark projects into CCS v5.4. For details on importing a CCS project, refer to processors.wiki.ti.com/index.php/AM335X_StarterWare_Environment_Setup
- Build the neonVFPBenchmark example project.
- Run the generated .out file on the target.
- The benchmark numbers for Neon are displayed on the configured console.
Steps to measure the performance numbers for VFP
- Import the system_config and neonVFPBenchmark projects into CCS v5.4.
- Right-click the project name and go to Show Build Settings.
- Select the VFPv3 option from the processor options drop-down for both the system_config and neonVFPBenchmark projects.
- Refer to the screenshots below for changing the compiler options.
- The Neon option can be disabled for the neonVFPBenchmark project to get the VFP performance numbers. Go to Build Settings -> ARM Compiler -> Advanced Options -> Runtime Model Options -> Generate SIMD instructions targeting neon, and uncheck the checkbox.
- First, build the system_config project with the VFPv3 option selected.
- Then build the neonVFPBenchmark project with the VFPv3 option selected.
- Run the .out file on the target.
- The benchmark numbers for VFP are displayed on the configured console.
Note:
- After running the neonVFPBenchmark application with VFP, disable the VFP option for system_config by following the steps listed above.
- Failing to do so results in build errors in other dependent StarterWare projects, as they are not compliant with the VFP calling conventions.
- The Neon option must remain enabled for the system_config project to avoid build errors, as it uses Neon/VFP assembly instructions.
- For better performance numbers, it is recommended to enable both the Neon and VFP options for the neonVFPBenchmark project.
Steps to measure the performance numbers for SoftFloat
- For the neonVFPBenchmark project, disable the Neon and/or VFP options if they are enabled.
- Refer to the screenshots below for details on enabling/disabling the Neon/VFP options for a project.
- Rebuild the project with the new settings.
- Ensure that the system_config project is built with the Neon option enabled to avoid build warnings.
- Run the .out file on the target; the performance numbers are displayed on the configured console.
Screenshot: changes required to enable/disable VFP for the system_config and neonVFPBenchmark projects.
Screenshot: enabling/disabling the Neon option for a project.
IAR Compiler Options
This section describes the build configurations of the benchmarking example for the IAR compiler.
To get the performance numbers for the VFP and SoftFloat options, follow the steps listed below.
- Import the neonVFPBenchmark project into the IAR workspace. For details on importing an IAR project, refer to processors.wiki.ti.com/index.php/AM335X_StarterWare_Environment_Setup
- Right-click the project name and select General Options; then, from the FPU drop-down menu, select the FPU whose performance you want to measure.
- The available options are VFPv3 + Neon, VFPv3 and None, corresponding to the three cases for which the example reports performance numbers.
- Build the example project with the new settings.
- Load the .out file on the target; the performance numbers are displayed on the configured console.
Refer to the screenshot below for selecting the appropriate option.
Performance Benchmarks
The complete list of benchmark performance numbers measured with the example on the EVMSK platform, for the Neon, VFP and SoftFloat cases, is available at the following link: NeonVFP Benchmark
Known Issues and Limitations
- The IAR compiler does not support auto-vectorization.
- The timer used to measure function performance overflows after 170 seconds, so a function that takes longer than this cannot be measured accurately.
- StarterWare does not currently support printing floating-point values on the console.
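A common workaround for the missing floating-point print support is to scale the value and print its whole and fractional parts as integers. The sketch below illustrates this with the standard snprintf using only integer conversions; the function name is hypothetical. On the target, the two integer parts would instead be passed to the console print routine, assuming it supports the integer specifiers used here. The sketch also assumes |value| * 10^decimals fits in an unsigned long and that decimals is at most 9.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: formats value with 'decimals' fractional digits
   using only integer conversions, rounding half away from zero.
   Returns the snprintf result (characters that would be written). */
int FormatFixed(char *buf, size_t len, float value, unsigned int decimals)
{
    unsigned long scale = 1;
    for (unsigned int i = 0; i < decimals; i++) {
        scale *= 10u;
    }

    int negative = value < 0.0f;
    if (negative) {
        value = -value;
    }

    /* Scale to an integer count of the smallest printed unit. */
    unsigned long scaled = (unsigned long)(value * (float)scale + 0.5f);
    unsigned long whole = scaled / scale;
    unsigned long frac = scaled % scale;

    if (decimals == 0u) {
        return snprintf(buf, len, "%s%lu", negative ? "-" : "", whole);
    }
    /* %0*lu zero-pads the fraction to exactly 'decimals' digits. */
    return snprintf(buf, len, "%s%lu.%0*lu", negative ? "-" : "",
                    whole, (int)decimals, frac);
}
```

For example, 3.14159f with three decimals is printed as "3.142", and 0.05f with two decimals keeps its leading zero as "0.05" because of the zero-padded fraction.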
Additional Links
- www.arm.com/products/processors/technologies/neon.php
- gcc.gnu.org/onlinedocs/gcc/ARM-NEON-Intrinsics.html
- Refer to the Cortex-A8 documentation on the ARM Infocenter website for more details on the Neon and VFP coprocessors.