NOTICE: The Processors Wiki will End-of-Life on January 15, 2021. It is recommended to download any files or other content you may need that are hosted on processors.wiki.ti.com. The site is now set to read only.
Profiling on the C6000 using Function Hooks
Wiki Note | ||
Profiling on the C6000 using Function Hooks | ||
Author(s):Brighton Feng | China Field Applications Team | |
<wikinotenumber/> | July 2014 | |
Abstract | ||
DSP program Profiling is the base of DSP optimization. The profiling feature of CCS (Code Composer Studio) is good for many projects, but there are many complicated projects that are hard to be run with CCS. For this kind of complicated projects, the function hook feature of cgtools (Code Generation Tools) and the task switch hook of Operating System can be used to collect profiling data. This application note introduces Profiling with Hooks, Example code and tool are provided with this Wiki note. | ||
Introduction[edit]
The profiling feature of CCS (Code Composer Studio) is good for many projects, but there are limitations for it:
- Many complicated projects in customers’ real system are hard to be run with CCS.
- Profiling of CCS requires debug information kept with the program, that is, it can not work with release version of a program.
- From CCS4, profiling feature of CCS can only work with simulator and does not support emulator.
For these cases, the function hook feature of cgtools (Code Generation Tools) and the task switch hook of Operating System can be used to collect profiling data.
From C6000 cgtools version 6.1, function hook feature are supported by the C compiler (this feature does not support functions in assembly code). The compiler can automatically generate code to call user defined functions at the beginning and end of a function. An entry hook is a routine that is called upon entry to each function in the program. An exit hook is a routine that is called upon exit of each function. Applications for hooks include debugging, trace, profiling, and stack overflow checking…
The basic flow of Profiling with Hooks is shown in following figure.
Figure 1 - basic flow of Profiling with Hooks |
This application note introduces the implementation of above flow step by step, Example code and tool are provided with this application note.
Compiler Options for Function Hooks[edit]
The compiler options for function hook in CCS5 with cgtools 7.3 looks like below figure.
Figure 2 - Compiler options for function hook in CCS5 |
The compiler options dialog in different CCS version may looks a little bit different from above figure. The command line options are same. Following table describes the command line options.
Options | Description |
---|---|
--entry_hook [=name] |
Enables entry hooks. The hook function is called name. |
--entry_parm
{=name|address|none} |style="border-collapse:collapse; border:1px solid black;"|Specify the parameters to the hook function. The name parameter specifies that the name of the calling function is passed to the hook function as an argument. In this case the signature for the hook function is: void hook(const char *name); | |
--exit_hook [=name] |
Enables exit hooks. The hook function is called name. |
--exit_parm
{=name|address|none} |style="border-collapse:collapse; border:1px solid black;"|Specify the parameters to the hook function. The name parameter specifies that the name of the calling function is passed to the hook function as an argument. In this case the signature for the hook function is: void hook(const char *name); |
The presence of the hook options creates an implicit declaration of the hook function with the given signature. If a declaration or definition of the hook function appears in the compilation unit compiled with the options, it must agree with the signatures listed above.
In C++, the hooks are declared extern "C". Thus you can define them in C (or assembly) without being concerned with name mangling.
Entry hooks and exit hooks are independent. You can enable one but not the other, or both. The same function can be used as both the entry and exit hook.
The compiler options for function hook are independent of other compiler options; you can use highest optimization level without any debug information for your program under test.
You must take care to avoid recursive calls to hook functions. The hook function should not call any function which itself has hook calls inserted. To help prevent this, hooks are not generated for inline functions, or for the hook functions themselves.
You can use the --remove_hooks_when_inlining option to remove entry/exit hooks for functions that are auto-inlined by the optimizer.
The NO_HOOKS pragma prevents entry and exit hook calls from being generated for a function.
The syntax of the pragma in C is:
<syntaxhighlight lang="c">
- pragma NO_HOOKS ( func );
</syntaxhighlight> Additionally, all library calls will have their cycles included by the calling function. Since the library functions are pre-compiled, there is no option for a hook to be added to indicate an entry or exit. This is typically no a big deal, since a user can’t do anything to optimize the library functions anyway, but they should just be aware that this occurs.
Task switch hook[edit]
If multiple tasks are used, task switch should be recorded with task switch hook feature of OS (Operating System). Without task switch record, the function executing when task switch happens can not be counted correctly.
Below is example configuration script for BIOS in CCS5.
<syntaxhighlight lang="c">
/* Define and add one Task Hook Set */
Task.addHookSet({
registerFxn: null,
createFxn: null,
readyFxn: null,
switchFxn: '&task_switch_hook',
exitFxn: null,
deleteFxn: null,
});
</syntaxhighlight>
Implementing Hook Functions[edit]
Hook functions can be implemented for various purposes, such as, debugging, trace, and stack overflow checking. In this application note, we use it for Profiling.
Below are the example hook functions code for Profiling.
<syntaxhighlight lang="c">
/*if record buffer is used as circular buffer,
MAX record number must be power of 2 for easy wrap*/
- define MAX_PROFILE_RECORD_NUM (1*1024*1024)
typedef enum {
HOOK_TYPE_FUNC_ENTRY = 0, /*function entry hook*/ HOOK_TYPE_FUNC_EXIT, /*function exit hook*/ HOOK_TYPE_TASK_ENTRY, /*task entry hook*/ HOOK_TYPE_TASK_EXIT /*task exit hook*/
}Hook_Record_Type;
typedef struct{
/*Address always has last 2 bits == 0. lower two bits are used as record type*/ unsigned int address_type; unsigned int TSCL; /*Lower 32 bits of Time Stamp Counter*/ unsigned int TSCH; /*Higher 32 bits of Time Stamp Counter*/
} Hook_Record;
/*since the profile data buffer is huge, normally it is allocated in
big external memory. If it is in external memory,
do NOT make this buffer cacheable because this is write only buffer,
cache is not helpful, and it will cause cache conflict if it is cacheable*/
- pragma DATA_SECTION (ProfileData, ".far:External_NonCache_Data")
Hook_Record ProfileData[MAX_PROFILE_RECORD_NUM];
/*this variable should be located in internal memory for performance. Initializing it to be 0xFFFFFFFF actually disable hook recording*/ unsigned int uiProfileDataCnt= 0xFFFFFFFF; //counter of records
/*call this function where you want to begin record, recalling this function actually restarts the record. If profile data buffer is in DDR, this function can only be called after DDR initialization*/
- pragma NO_HOOKS(hook_init);
void hook_init() {
TSCL= 0; //enable TSC
uiProfileDataCnt= 0;
/*0xFFFFFFFF represents a invalid record*/ memset(ProfileData, 0xFFFFFFFF, sizeof(ProfileData));
}
/*-------------------------------------------------------------- Following hook functions stop recording when record buffer full, thus save the records from the beginning of the program
*/
- pragma CODE_SECTION (entry_hook, ".text:hook_funcs")
void entry_hook(void (* func_addr)()) {
unsigned int uiOldGIE;
//disable interrupt to make sure every record is atomic uiOldGIE= _disable_interrupts();
if(uiProfileDataCnt<MAX_PROFILE_RECORD_NUM) { ProfileData[uiProfileDataCnt].address_type= (unsigned int)func_addr; ProfileData[uiProfileDataCnt].TSCL= TSCL; ProfileData[uiProfileDataCnt].TSCH= TSCH;
uiProfileDataCnt++; }
_restore_interrupts(uiOldGIE); //restore interrupt
}
- pragma CODE_SECTION (exit_hook, ".text:hook_funcs")
void exit_hook(void (* func_addr)()) {
unsigned int uiOldGIE;
//disable interrupt to make sure every record is atomic uiOldGIE= _disable_interrupts();
if(uiProfileDataCnt<MAX_PROFILE_RECORD_NUM) { /*function Address always has last 2 bits == 0. "|HOOK_TYPE_FUNC_EXIT" to signal this is a function exit record in the record buffer, make it possible to distinguish record type in the post processing tool*/ ProfileData[uiProfileDataCnt].address_type= ((unsigned int)func_addr)|HOOK_TYPE_FUNC_EXIT; ProfileData[uiProfileDataCnt].TSCL= TSCL; ProfileData[uiProfileDataCnt].TSCH= TSCH;
uiProfileDataCnt++; }
_restore_interrupts(uiOldGIE); //restore interrupt
}
/*if multiple tasks are used, task switch must be recorded with task switch hook feature of OS. Without task switch record, the function executing when task switch happens can not be recognized correctly by post processing tool. function hook should NOT be generated for this function*/
- pragma CODE_SECTION (task_switch_hook, ".text:hook_funcs")
- pragma NO_HOOKS(task_switch_hook);
void task_switch_hook(unsigned int prevTask,
unsigned int nextTask)
{
unsigned int uiOldGIE; unsigned int uiTSCL, uiTSCH;
//disable interrupt to make sure every record is atomic uiOldGIE= _disable_interrupts();
uiTSCL= TSCL; uiTSCH= TSCH;
if(uiProfileDataCnt<MAX_PROFILE_RECORD_NUM) { /*task handle Address always has last 2 bits == 0. "|HOOK_TYPE_TASK_EXIT" to signal this is a task exit record in the record buffer, make it possible to distinguish record type in the post processing tool*/ ProfileData[uiProfileDataCnt].address_type= prevTask|HOOK_TYPE_TASK_EXIT; ProfileData[uiProfileDataCnt].TSCL= uiTSCL; ProfileData[uiProfileDataCnt].TSCH= uiTSCH;
uiProfileDataCnt++; }
if(uiProfileDataCnt<MAX_PROFILE_RECORD_NUM) { /*task handle Address always has last 2 bits == 0. "|HOOK_TYPE_TASK_ENTRY" to signal this is an task entry record in the record buffer, make it possible to distinguish record type in the post processing tool*/ ProfileData[uiProfileDataCnt].address_type= nextTask|HOOK_TYPE_TASK_ENTRY; ProfileData[uiProfileDataCnt].TSCL= uiTSCL; ProfileData[uiProfileDataCnt].TSCH= uiTSCH;
uiProfileDataCnt++; }
_restore_interrupts(uiOldGIE); //restore interrupt
}
/*-------------------------------------------------------------- Following hook functions use record buffer as a circular buffer, thus save the last records when program stops
*/
- pragma CODE_SECTION (entry_hook_circular, ".text:hook_funcs_circular")
void entry_hook_circular(void (* func_addr)()) {
unsigned int uiOldGIE; unsigned int uiProfileDataIndex; //index in the circular buffer
//return before hook is initialized if(0xFFFFFFFF==uiProfileDataCnt) return;
//disable interrupt to make sure every record is atomic uiOldGIE= _disable_interrupts();
/*make data pointer circular*/ uiProfileDataIndex= uiProfileDataCnt & (MAX_PROFILE_RECORD_NUM-1);
ProfileData[uiProfileDataIndex].address_type= (unsigned int)func_addr; ProfileData[uiProfileDataIndex].TSCL= TSCL; ProfileData[uiProfileDataIndex].TSCH= TSCH;
uiProfileDataCnt++;
_restore_interrupts(uiOldGIE); //restore interrupt
}
- pragma CODE_SECTION (exit_hook_circular, ".text:hook_funcs_circular")
void exit_hook_circular(void (* func_addr)()) {
unsigned int uiOldGIE; unsigned int uiProfileDataIndex; //index in the circular buffer
//return before hook is initialized if(0xFFFFFFFF==uiProfileDataCnt) return;
//disable interrupt to make sure every record is atomic uiOldGIE= _disable_interrupts();
/*make data pointer circular*/ uiProfileDataIndex= uiProfileDataCnt & (MAX_PROFILE_RECORD_NUM-1);
/*Address always has last 2 bits == 0. So, "| 1" to signal this is an exit record in the buffer. makes it possible to distinguish enter against exit in the post processing tool*/ ProfileData[uiProfileDataIndex].address_type= ((unsigned int)func_addr)|HOOK_TYPE_FUNC_EXIT; ProfileData[uiProfileDataIndex].TSCL= TSCL; ProfileData[uiProfileDataIndex].TSCH= TSCH;
uiProfileDataCnt++;
_restore_interrupts(uiOldGIE); //restore interrupt
}
/*if multiple tasks are used, task switch must be recorded with task switch hook feature of OS. Without task switch record, the function executing when task switch happens can not be recognized correctly by post processing tool. function hook should NOT be generated for this function*/
- pragma CODE_SECTION (task_switch_hook_circular, ".text:hook_funcs_circular")
- pragma NO_HOOKS(task_switch_hook_circular);
void task_switch_hook_circular(unsigned int prevTask,
unsigned int nextTask)
{
unsigned int uiOldGIE; unsigned int uiProfileDataIndex; //index in the circular buffer unsigned int uiTSCL, uiTSCH;
//return before hook is initialized if(0xFFFFFFFF==uiProfileDataCnt) return;
//disable interrupt to make sure every record is atomic uiOldGIE= _disable_interrupts();
uiTSCL= TSCL; uiTSCH= TSCH;
/*make data pointer circular*/ uiProfileDataIndex= uiProfileDataCnt & (MAX_PROFILE_RECORD_NUM-1);
/*task handle Address always has last 2 bits == 0. "|HOOK_TYPE_TASK_EXIT" to signal this is an task exit record in the record buffer, make it possible to distinguish record type in the post processing tool*/ ProfileData[uiProfileDataIndex].address_type= prevTask|HOOK_TYPE_TASK_EXIT; ProfileData[uiProfileDataIndex].TSCL= uiTSCL; ProfileData[uiProfileDataIndex].TSCH= uiTSCH;
uiProfileDataCnt++;
/*make data pointer circular*/ uiProfileDataIndex= uiProfileDataCnt & (MAX_PROFILE_RECORD_NUM-1);
/*task handle Address always has last 2 bits == 0. "|HOOK_TYPE_TASK_ENTRY" to signal this is an task entry record in the record buffer, make it possible to distinguish record type in the post processing tool*/ ProfileData[uiProfileDataIndex].address_type= nextTask|HOOK_TYPE_TASK_ENTRY; ProfileData[uiProfileDataIndex].TSCL= uiTSCL; ProfileData[uiProfileDataIndex].TSCH= uiTSCH;
uiProfileDataCnt++;
_restore_interrupts(uiOldGIE); //restore interrupt
} </syntaxhighlight>
The buffer for enter/exit records should be big enough to cover all functions and tasks interested. For periodical processing system, the buffer should be big enough to record all function calls in a processing period.
Two sets of hook functions are implemented in above code. One set stops recording when the record buffer full, thus saves the records from the beginning of the program; the other set uses record buffer as a circular buffer, thus saves the last records when program stops. User can choose one of them according to his application.
Since Hook is an intrusive method, it affects the performance of original program. Hook functions are called most frequently, so they should be put into internal memory to alleviate the impact.
The code size of each function may increase about 64 to 128 bytes. The code size of the entry/exit hook function is about 80 bytes. If entry/exit hook functions are inlined, each function size additionally increases about 160 bytes. As a result of code size increasing, you may need to change you memory allocation to move something in internal memory to external memory.
The cycles for each function call may increase about 40 to 80 cycles, which may reduce the processing capability or channel density of original system.
To alleviate the impact, we can use Hooks only for functions we are interested in. You may use file specific options to add function hooks for functions in a specific source code file. You may use NO_HOOKS pragma to disable hooks for specific function.
Though the overall performance of original program is affected, it does not change the relative relationship between functions, while this is the key information we need dig out with this tool for most cases. For example, we can use this tool to find the top 10 functions which consume most of DSP cycles.
Run program and dump the raw data[edit]
The Hook feature is independent of CCS or emulator, the program with Hook feature can be run with or without CCS/emulator.
After running the program, the function and task entry/exit records with time stamp are saved in the raw profile data buffer. You can use any convenient method to dump this buffer and save the contents to a file. These methods may include:
- Load the DSP through emulator/CCS, run the DSP, and then dump the buffer through CCS.
- Load the DSP through system host, run the DSP, and then dump the buffer through host interface.
- Connect emulator to your DSP board before power up, but boot load the DSP through your system host as a normal standalone system without emulator, run the DSP, and then connect the CCS to the DSP, and dump the buffer through CCS.
With the hook functions introduced in previous section, the raw profile data buffer is
<syntaxhighlight lang="c">
Hook_Record ProfileData[MAX_PROFILE_RECORD_NUM];
</syntaxhighlight>
Number of records are stored in:
<syntaxhighlight lang="c">
unsigned int uiProfileDataCnt;
</syntaxhighlight>
If uiProfileDataCnt is less than the buffer size, only number of uiProfileDataCnt records (uiProfileDataCnt*3 words) should be dumped; if uiProfileDataCnt is equal or larger than buffer size (circular buffer), the full buffer should be dumped.
The raw data may be saved in a text file or binary file. The text format should comply with file generated by memory dump feature of CCS like below (32-bit Hex – C Style):
<syntaxhighlight lang="c">
1651 1 90000000 0 60000
0x0C000E8C
0x0064AAA8
0x00000000
0x0C000E24
0x0064AB16
0x00000000
0x0C000E25
0x0064ABEA
0x00000000
0x0C000E8D
0x0064AD93
0x00000000
0x0C00C644
0x0064FBA3
0x00000000
0x0C00C645
0x0064FC8D
0x00000000
0x00000003
0x006500D6
0x00000000
0x00804C12
0x006500D6
0x00000000
0x0C00C608
0x00650298
0x00000000
0x00804C13
0x00650438
0x00000000
0x008043BA
0x00650438
0x00000000
0x0C00C598
0x006505F6
0x00000000
……
</syntaxhighlight>
The first line is a file header generated by CCS, which is not really needed by profiling. The real data begins from the second line, which is generated with the hook functions introduced in above section, they are: address, TSCL, TSCH; address, TSCL, TSCH…
If the data is saved into a binary file, it is just exactly same as the content in the DSP memory buffer (no file header like the text file generated by CCS memory dumping). If you save data to binary file through CCS, 32-bit data type size should be selected.
Process raw entry/exit records with a tool[edit]
Huge raw data dumped from DSP buffer need be processed with a tool to generate profile.
An example tool is provided with this application note. Following figure shows the tool.
Figure 3 - A tool for profile post processing |
The raw data only record the address of function or task, while function or task name is not recorded for memory saving. According to the address of the function, the function name can be found in DSP .out file or .map file generated by cgtools. This tool uses .out file as input to resolve function names.
Please note, some tasks may not have a corresponding symbol in the .out file because they are created at run time. For this kind of task or function can not find name from the .out file, they are given name in the format like “?task #1”.
If there is static function in the program, to get the name of static function, you need uncheck the “Parse global symbol only” checkbox in the tool. However, when all symbols are parsed, we may get multiple symbols with same address, for this case, those symbols’ names are concatenated with “ - ” as the name of the address. For example, if both symbol “SL2MemoryBufInit” and “.text” has same value of “0x0c0137e0”, then in the output file, you will see the function name for “0x0c0137e0” is “.text - SL2MemoryBufInit”.
The tool generates two output files, one is call list file, it is actually a list of function/task call history, this may be useful for software debug; and the other output file is final profile result. The file format is CSV (Comma Separated Values), it is actually a text format, but can be opened with Microsoft Excel like below.
Figure 4 - Example of call list file |
Please note, the function call history is listed in the order of competition time (exit TSC) of a function. You can re-sort the data by enter time (enter TSC) in the Excel.
The inclusive cycles in above table is the cycles for a function including all sub-functions, the exclusive cycles in above table is the cycles for a function excluding its sub-functions.
Figure 5 - Example of profile output file |
In above table, the first line of the table records the TSC (Time Stamp Counter) of first record and last record, the total cycles = (LastTSC - FirstTSC).
The valid cycles of all functions be recorded should be less than the total cycles, because some functions in libraries or in assembly code can not be recorded by hook feature provided by C compiler. The percentage of valid cycles over total cycles is shown in row 2 column F.
In above table, the “Inclu” is abbreviation of “Inclusive”; the “Exclu” is abbreviation of “Exclusive”. For example, “MaxExcluCycle” means the maximum cycles of a function exclude the cycles of its sub-functions.
Since exclusive cycle is more valuable for most applications, so the profile output is sorted by descending order of total exclusive cycle of each function (ExecuteTimes*AverageExcluCycle). Percentage value in the table is the percentage of a function’s exclusive cycles in the total valid cycles of all functions, that is, Percentage= (ExecuteTimes*AverageExcluCycle)/(Functions ValidCycles).
The second part in the profile output file is for tasks. Task cycles are collected by task switch hook, it is supported by Operation System such as BIOS, which can record almost all cycles, so, “Valid/Total” can be close to 100%. Inclusive cycle is less meaningful for task, so the cycles collected for tasks by this tool is something like the Exclusive cycles, it excludes the cycles of other tasks, but the cycles may include the cycles of hardware or software interrupt happens during a task execution.
The tool was developed with Borland Turbo C++ Explorer, the code are also provided along with this application note. The main mechanism behind this implementation is function call stack. Following figure shows the basic flow diagram.
Figure 6 - Flow diagram of function profile post processing |
The tool reports some status information in the status window during processing. The number of functions and number of tasks founded in the raw data file are reported.
Since the data record may begin or end during some functions executing, there may be several enter records without corresponding exit record, or there may be several exit records without corresponding enter record. The tool reports the number of this kind of records.
The task stack data structure are allocated according to call stack size configured in the tool’s GUI (Graphical User Interface). The default values should be enough for most application.
The maximum call stack depth is also reported in the status window, it actually represents maximum function calls pushed into the call stack. The stack size configuration in the GUI must be larger than actual call stack depth.
Example[edit]
An example project is provided along with this application note to demonstrate how to use the tool. Following figure shows the project in CCS5.3.
Figure 7 - Example project |
It is a test created based on the “mutex” example of SYS/BIOS. A timer is setup to switch three tasks circularly. Each task just runs a delay function to consume some cycles.
This project can be run on C6670 EVM, C6678 EVM or TCI6614 EVM. If you want to run this project on other C66x DSP board, you need change configuration for main PLL and DDR3 accordingly.
The steps to profile the program are:
- Add “hook.c” and “hook.h” to the original project.
- Modify the macro “MAX_PROFILE_RECORD_NUM” defined in the “hook.h” to determine how many profiling records to save, that is, the size of the buffer for profiling records.
- Add code to call “hook_init” function in the main function. Please note, we recommend to store the raw profiling data in DDR, so please call “hook_init” function after the DDR initialization.
- Allocate the ".far:External_NonCache_Data" section defined for the profiling data buffer to Non-cacheable DDR through additional link command file “KeyStone_BIOS.cmd”.
- In the compiler options of the project, setup entry hook and exit hook to be the hook functions implemented in the “hook.c”. If relative smaller buffer is used and you are more interested in the last part of records, the “xxxx_hook_circular” functions should be used.
- In the SYS/BIOS configuration script, setup the task switch hook to be the hook function implemented in the “hook.c”. If relative smaller buffer is used and you are more interested in the last part of records, the “task_switch_hook_circular” function should be used.
- Compile the program and run it on core 0 of a C66x DSP.
- Wait for several seconds and then stop the program.
- Check the variable “uiProfileDataCnt” in the expression window to see how many profiling records we got.
- In the memory window, save the buffer “ProfileData” to a data file in “32-bit Hex – C Style” (not TI Style) format. If the uiProfileDataCnt<MAX_PROFILE_RECORD_NUM, only save uiProfileDataCnt*3 words (each record occupies 3 words); otherwise, the buffer was wrapped around, then we should save the full buffer, that is, number of words to be saved should be MAX_PROFILE_RECORD_NUM*3.
- Run the post processing tool. Select the data file and corresponding .out file, and then click the “Process” button (Figure 3).
- Check the information in the output window. When it shows “Profile data processing complete.”, you should get the output in files xxx_profile.csv (Figure 5) and xxx_call_list.csv (Figure 4).
References[edit]
- TMS320C6000 Optimizing Compiler User's Guide (spru187)
- TMS320 DSP/BIOS User Guide (spru423)
- Common Object File Format (SPRAAO8)
- Tool Interface Standard (TIS) Executable and Linking Format (ELF) Specification