StarterWare NeonVFP Benchmark
Introduction
This page lists the performance benchmark numbers measured for the Neon and VFP coprocessors on the AM335x EVMSK platform, using the neonVFPBenchmark example included in the StarterWare 02.00.01.01 release.
The numbers listed below are the execution times of the functions included in the example application. Numbers are given for three different compilers. For more information on how to measure the performance for the different cases, refer to NeonVFP Support.
Refer to the sections below for the compiler settings and the measured performance numbers.
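As a rough illustration of what "execution time of a function" means here, the sketch below times a scale-and-add loop in C. The kernel, the function names, and the use of the standard clock() call are assumptions made for illustration; the StarterWare example itself may use a different timer and different routines.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical kernel: out[i] = a[i] * scale + b[i]. */
static void scale_and_add(const float *a, const float *b, float scale,
                          float *out, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        out[i] = a[i] * scale + b[i];
    }
}

/* Runs the kernel `iterations` times and returns the elapsed CPU time in
 * milliseconds. clock() is a host-side stand-in for a hardware timer. */
static double time_scale_and_add_ms(const float *a, const float *b,
                                    float scale, float *out, size_t n,
                                    unsigned iterations)
{
    clock_t start = clock();
    for (unsigned it = 0; it < iterations; ++it) {
        scale_and_add(a, b, scale, out, n);
    }
    clock_t end = clock();
    return 1000.0 * (double)(end - start) / (double)CLOCKS_PER_SEC;
}
```

Building the same source once per configuration (Neon, VFP, SoftFloat) and comparing the returned times gives numbers of the same kind as those in the tables on this page.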
GCC
This section lists the compiler options used to measure the performance numbers with the GCC 4.7.3 compiler included in the Linaro bare-metal toolchain 4.7 2012q4.
The compiler options used for the example in the different cases are listed below. The benchmark numbers are listed in the Performance Numbers section at the end of this page.
Optimization level O2 is used for the example in Release configuration.
- Neon: -mfpu=neon -mfloat-abi=softfp -ftree-vectorize
- VFP: -mfpu=vfpv3 -mfloat-abi=softfp
- SoftFloat: No hardware coprocessor options enabled
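One way to confirm which of the three configurations a GCC build actually used is to check the compiler's predefined macros. The sketch below assumes GCC's ARM macros __ARM_NEON__ (or __ARM_NEON), __arm__ and __SOFTFP__; the function name is hypothetical and is not part of the StarterWare example. The TI and IAR compilers may define different macros.

```c
/* Returns a label for the floating-point mode this file was compiled for.
 * Assumes GCC-style predefined macros on ARM targets. */
static const char *fp_build_mode(void)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    return "neon";          /* e.g. -mfpu=neon -mfloat-abi=softfp */
#elif defined(__arm__) && !defined(__SOFTFP__)
    return "vfp";           /* e.g. -mfpu=vfpv3 -mfloat-abi=softfp */
#elif defined(__arm__)
    return "softfloat";     /* no hardware floating-point options */
#else
    return "non-arm-host";  /* compiled for something other than 32-bit ARM */
#endif
}
```

Printing this label alongside the benchmark output makes it harder to attribute a timing to the wrong configuration.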
TI ARM Compiler
This section lists the compiler options used to measure the performance numbers with the TI ARM compiler 5.0.4 integrated with CCSv5.4.
The compiler options used for the example in the three cases are listed below. The benchmark numbers for this compiler are listed in the Performance Numbers section at the end of this page.
The example in Release configuration is built with optimization level O2, and additionally --opt_for_speed=2 is selected.
- Neon: The --neon option is enabled to generate SIMD instructions for measuring Neon engine performance.
- VFP: The --float_support option is set to the VFPv3 architecture for measuring VFP engine performance.
- SoftFloat: The Neon and VFP compiler options are unchecked to get the performance numbers without coprocessor support.
IAR Compiler
This section lists the compiler options used to measure the performance numbers with IAR compiler 6.50.
The compiler options used for the different cases are listed below. The benchmark numbers for this compiler are listed in the Performance Numbers section at the end of this page.
Optimization level Medium is enabled for the example in Release configuration.
- Neon: fpu = NEON + VFP is selected to get Neon engine performance.
- VFP: fpu = VFP is selected to get VFP engine performance.
- SoftFloat: No hardware coprocessor option is selected.
Performance Numbers
This section lists the performance numbers for the different compilers, obtained from the benchmarking application in Release configuration on the AM335x EVMSK platform.
GCC Performance numbers
Functions/Routines | Neon (in ms) | VFP (in ms) | SoftFloat (in ms) |
Floating Point Array Scale & Add | 1462.101 | 1462.101 | 2887.004 |
Floating Point Array Multiply | 921.701 | 921.701 | 581.902* |
Cephes Library Sine Function | 176.556 | NA | NA |
Cephes Library Cosine Function | 186.886 | NA | NA |
Intrinsic Sine Function | 34.802 | NA | NA |
Intrinsic Cosine Function | 35.30 | NA | NA |
* Note:
- The benchmark number for the Floating Point Array Multiply function without coprocessor support is not as expected: the SoftFloat time (581.902 ms) is lower than the Neon and VFP times (921.701 ms). This issue is currently under investigation.
- To get the best performance, it is recommended to use Neon intrinsics or assembly instructions for the Neon and VFP coprocessors.
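To make the Neon intrinsics recommendation above concrete, here is a minimal sketch of an element-wise float multiply written with intrinsics from arm_neon.h. The function name is hypothetical and this is not the code of the benchmark example; a scalar fallback is included so the same source also builds without Neon support.

```c
#include <stddef.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Element-wise product: out[i] = a[i] * b[i]. */
static void array_multiply_f32(const float *a, const float *b,
                               float *out, size_t n)
{
    size_t i = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* Process four floats per iteration in 128-bit Neon registers. */
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);
        float32x4_t vb = vld1q_f32(b + i);
        vst1q_f32(out + i, vmulq_f32(va, vb));
    }
#endif
    /* Scalar loop: handles the tail, or the whole array without Neon. */
    for (; i < n; ++i) {
        out[i] = a[i] * b[i];
    }
}
```

Writing the vector loop explicitly removes the dependence on the compiler's auto-vectorizer, which may be one reason hand-written intrinsics are recommended here.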
CCS Performance numbers
Functions/Routines | Neon (in ms) | VFP (in ms) | SoftFloat (in ms) |
Floating Point Array Scale & Add | 8630.36* | 2767.638 | 8630.367 |
Floating Point Array Multiply | 2074.002 | 1566.728 | 2074.003 |
* Note:
- To get the best performance numbers, it is recommended to enable both the Neon and VFP options for the example project in CCS.
- For details on how to enable or disable the compiler settings for Neon and VFP, refer to the CCS compiler option section of NeonVFP Support.
IAR Performance numbers
Functions/Routines | Neon (in ms) | VFP (in ms) | SoftFloat (in ms) |
Floating Point Array Scale & Add | 2362.728 | 2362.728 | 7555.405 |
Floating Point Array Multiply | 871.637 | 871.637 | 1640.910 |
Note: The intrinsic functions included in the example application are compatible with the GCC compiler only. For the IAR and CCS compilers, the example reports benchmark numbers for the Float Add and Multiply functions only.
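The rows in the tables above can be compared by dividing the SoftFloat time by the corresponding coprocessor time. A minimal helper for that arithmetic, with a hypothetical name, could look like this:

```c
/* Speedup of a coprocessor build relative to the SoftFloat build.
 * Both arguments are execution times in milliseconds; a result above 1.0
 * means the coprocessor build was faster. */
static double speedup_vs_softfloat(double softfloat_ms, double coprocessor_ms)
{
    return softfloat_ms / coprocessor_ms;
}
```

For example, for the GCC Scale & Add row, speedup_vs_softfloat(2887.004, 1462.101) is roughly 1.97, and for the IAR Scale & Add row, speedup_vs_softfloat(7555.405, 2362.728) is roughly 3.20.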