
MCSDK HPC 3.x Linear Algebra Library

From Texas Instruments Wiki

The TI Linear Algebra Library (LINALG) is an optimized library for dense linear algebra computations. Currently, LINALG includes BLAS and LAPACK, optimized and tuned for the K2H platform only. The BLAS implementation is based on BLIS (http://code.google.com/p/blis/); LAPACK is based on CLAPACK 3.2.1 (http://www.netlib.org/clapack/).

Release Notes[edit]

Release notes for LINALG can be found here.

API and User's Guide[edit]

TI LINALG adopts the CBLAS API for BLAS and the CLAPACK API for LAPACK. Refer to the respective project sites for detailed API documentation and user's guides.

Object Libraries[edit]

After installation, the LINALG object libraries are located in /usr/lib. BLAS provides two libraries:

  • libcblas_armplusdsp.a
  • libblis.a

LAPACK provides three libraries:

  • libcblaswr.a
  • liblapack.a
  • libf2c.a

Header Files[edit]

After installation, the LINALG header files are located in /usr/include:

  • BLAS header file: cblas.h
  • LAPACK header files: f2c.h, blaswrap.h, clapack.h

Note that the LAPACK header f2c.h defines a complex type that differs from the C99 complex type in complex.h. If f2c.h is included, the C99 header complex.h must not be used.

Run Time Configuration[edit]

BLAS can be configured to run on either the ARM or the DSP (offloading). LAPACK runs only on the ARM, although the BLAS functions it invokes can run on the DSP. When BLAS runs on the ARM, it can be configured to use 1 to 4 cores; when it runs on the DSP, it always uses all 8 cores.

Run-time configuration is done through the following environment variables:

  • BLIS_IC_NT: the number of ARM cores BLAS runs on. Valid values are 1, 2, 3, or 4; the default is 1 if the variable is not set.
  • TI_CBLAS_OFFLOAD: the BLAS offloading policy. Set to a three-digit string xyz, where x, y, and z control level 1, level 2, and level 3 functions respectively, each taking one of the values below:
 - 0: no offloading to the DSP, i.e. always run on the ARM
 - 1: forced offloading to the DSP, i.e. always run on the DSP
 - 2: optimum offloading to the DSP based on matrix sizes, to achieve the best execution time
  • The default offloading configuration, when TI_CBLAS_OFFLOAD is not set, is 002 (no offloading for levels 1 and 2, optimum offloading for level 3).
  • Example: TI_CBLAS_OFFLOAD=001 means level 1 and 2 functions always run on the ARM, and level 3 functions always run on the DSP.
  • Note: in this release, optimum offloading (value 2) is not available for levels 1 and 2; if it is configured for those levels, their functions are always offloaded to the DSP.

Tuning[edit]

When level 3 BLAS is configured for optimum offloading, the offloading decision is based on matrix sizes. Automatic tuning can be performed to find the matrix sizes for which offloading to the DSP is faster than running on the ARM.

The released BLAS libraries are tuned for 3 ARM cores (BLIS_IC_NT=3). To redo the tuning for a different number of ARM cores, follow these steps:

  1. Set the environment variable BLIS_IC_NT to the number of ARM cores BLAS will run on (1, 2, 3, or 4).
  2. Go to <linalg installation root>/tuning.
  3. Type "make tune".
  4. When the above step finishes, copy <linalg installation root>/tuning/ofld_tbls/ofld_tbl_*.c to <linalg installation root>/blasblisacc/src.
  5. Rebuild LINALG: in <linalg installation root>, type "make build".
  6. Reinstall LINALG: in <linalg installation root>, type "make install".

Examples[edit]

A few examples are provided to show how to use LINALG through the CBLAS and CLAPACK APIs. They are located in <linalg installation root>/examples:

  • Matrix multiplication (dgemm)
  • Symmetric rank k operation (dsyrk)
  • Triangular matrix multiplication (dtrmm)
  • Triangular matrix equation solver (dtrsm)
  • Eigen decomposition and matrix inversion (eig)
  • LU decomposition and matrix inversion (ludinv)
  • xGEMM benchmarking (gemm_bench)

To run all of these examples, go to <linalg installation root>/examples and type "make test".

To run any individual example, go to that example's folder, e.g. dgemm_test, and type "make run".

Testing[edit]

BLAS was tested with the BLIS test suite as well as CLAPACK's BLAS test suite; LAPACK was tested with CLAPACK's LAPACK test suite.

Follow the steps below to run these tests:

  • BLAS test suite:
 > cd <linalg installation root>/clapack/BLAS
 > ./run_blas_tests.sh
  • LAPACK test suite:
 > cd <linalg installation root>/clapack 
 > make lapack_testing
  • BLIS test suite:
 > cd <linalg installation root>/blis 
 > ./configure -p install/arm cortex-a15
 > cd testsuite
 > make lib=OpenCLCBLAS
 > ./test_libblis_cortex-a15.x

Benchmarking[edit]

LINALG benchmarking was performed on a 66AK2H12 SoC clocked at 1 GHz. The table below shows the performance of xGEMM in GFLOPS, measured on the host with computation accelerated on the DSP.

  M    N    K    DGEMM  SGEMM  CGEMM  ZGEMM
  1000 1000 1000 20.954 64.008 78.264 13.985
  2000 2000 2000 23.030 82.405 86.507 14.365
  3000 3000 3000 23.956 84.344 88.538 14.234
  4000 4000 4000 24.118 87.545 89.618 14.314
  5000 5000 5000 22.245 89.075 83.231 14.343

For performance of all level-3 BLAS functions, refer to BLAS Benchmarking.

