author | Leena Miettinen <riitta-leena.miettinen@qt.io> | 2020-01-23 11:45:07 +0100 |
---|---|---|
committer | Leena Miettinen <riitta-leena.miettinen@qt.io> | 2020-01-27 09:05:07 +0000 |
commit | 5fc456dd2283b2d1e6c4e6d34856052658f34cc4 (patch) | |
tree | 7a7cd26a33014e401536a149fa47f6586b68c247 /doc/qtcreator/src/analyze/cpu-usage-analyzer.qdoc | |
parent | c9f90047ac701416e439f492069c1a0bb364fc08 (diff) | |
download | qt-creator-5fc456dd2283b2d1e6c4e6d34856052658f34cc4.tar.gz |
Doc: Rearrange files in the doc folder
Source and configuration files for each manual are now located in a
separate subdirectory, with common configuration files in doc/config.
doc
 |_config
 |_qtcreator
 |_qtcreatordev
 |_qtdesignstudio
Edit the config files accordingly.
Change-Id: Idc747a7c16e84f3e06add91234dc5fc908e64cc5
Reviewed-by: Eike Ziller <eike.ziller@qt.io>
Diffstat (limited to 'doc/qtcreator/src/analyze/cpu-usage-analyzer.qdoc')
-rw-r--r-- | doc/qtcreator/src/analyze/cpu-usage-analyzer.qdoc | 499 |
1 files changed, 499 insertions, 0 deletions
diff --git a/doc/qtcreator/src/analyze/cpu-usage-analyzer.qdoc b/doc/qtcreator/src/analyze/cpu-usage-analyzer.qdoc new file mode 100644 index 0000000000..61e04508cf --- /dev/null +++ b/doc/qtcreator/src/analyze/cpu-usage-analyzer.qdoc @@ -0,0 +1,499 @@ +/**************************************************************************** +** +** Copyright (C) 2019 The Qt Company Ltd. +** Contact: https://www.qt.io/licensing/ +** +** This file is part of the Qt Creator documentation. +** +** Commercial License Usage +** Licensees holding valid commercial Qt licenses may use this file in +** accordance with the commercial license agreement provided with the +** Software or, alternatively, in accordance with the terms contained in +** a written agreement between you and The Qt Company. For licensing terms +** and conditions see https://www.qt.io/terms-conditions. For further +** information use the contact form at https://www.qt.io/contact-us. +** +** GNU Free Documentation License Usage +** Alternatively, this file may be used under the terms of the GNU Free +** Documentation License version 1.3 as published by the Free Software +** Foundation and appearing in the file included in the packaging of +** this file. Please review the following information to ensure +** the GNU Free Documentation License version 1.3 requirements +** will be met: https://www.gnu.org/licenses/fdl-1.3.html. +** +****************************************************************************/ + +// ********************************************************************** +// NOTE: the sections are not ordered by their logical order to avoid +// reshuffling the file each time the index order changes (i.e., often). +// Run the fixnavi.pl script to adjust the links to the index order. +// ********************************************************************** + +/*! 
+ \contentspage index.html + \previouspage creator-heob.html + \page creator-cpu-usage-analyzer.html + \nextpage creator-cppcheck.html + + \title Analyzing CPU Usage + + \QC is integrated with the Linux Perf tool that can be + used to analyze the CPU and memory usage of an application on embedded + devices and, to a limited extent, on Linux desktop platforms. The + Performance Analyzer uses the Perf tool bundled with the Linux kernel to + take periodic snapshots of the call chain of an application and visualizes + them in a timeline view or as a flame graph. + + \section1 Using the Performance Analyzer + + The Performance Analyzer usually needs to be able to locate debug symbols for + the binaries involved. + + Profile builds produce optimized binaries with separate debug symbols and + should generally be used for profiling. + + To manually set up a build configuration to provide separate debug symbols, + edit the project build settings: + + \list 1 + \li To generate debug symbols also for applications compiled in release + mode, select \uicontrol {Projects}, and then select + \uicontrol Details next to \uicontrol {Build Steps} to view the + build steps. + + \li Select the \uicontrol {Generate separate debug info} check box. + + \li Select \uicontrol Yes to recompile the project. + + \endlist + + You can start the Performance Analyzer in the following ways: + + \list + \li Select \uicontrol Analyze > \uicontrol {Performance Analyzer} to + profile the current application. + + \li Select the + \inlineimage qtcreator-analyze-start-button.png + (\uicontrol Start) button to start the application from the + Performance Analyzer. + + \endlist + + \note If data collection does not start automatically, select the + \inlineimage recordfill.png + (\uicontrol {Collect profile data}) button. + + When you start analyzing an application, the application is launched, and + the Performance Analyzer immediately begins to collect data. 
This is indicated + by the time running in the \uicontrol Recorded field. However, as the data + is passed through the Perf tool and an extra helper program bundled with + \QC, and both buffer and process it on the fly, data may arrive in \QC + several seconds after it was generated. An estimate for this delay is given + in the \uicontrol {Processing delay} field. + + Data is collected until you select the + \uicontrol {Stop collecting profile data} button or terminate the + application. + + Select the \uicontrol {Stop collecting profile data} button to disable the + automatic start of the data collection when an application is launched. + Profile data will still be generated, but \QC will discard it until you + select the button again. + + \section1 Profiling Memory Usage on Devices + + To create trace points for profiling memory usage on a target device, select + \uicontrol Analyze > \uicontrol {Performance Analyzer Options} > + \uicontrol {Create Memory Trace Points}. + + To add events for the trace points, see \l{Choosing Event Types}. + + You can record a memory trace to view usage graphs in the samples rows of + the timeline and to view memory allocations, peaks, and releases in the + flame graph. + + \section1 Specifying Performance Analyzer Settings + + To specify global settings for the Performance Analyzer, select + \uicontrol Tools > \uicontrol Options > \uicontrol Analyzer > + \uicontrol {CPU Usage}. For each run configuration, you can also + use specialized settings. Select \uicontrol Projects > \uicontrol Run, and + then select \uicontrol Details next to + \uicontrol {Performance Analyzer Settings}. + + \image qtcreator-performance-analyzer-settings.png + + To edit the settings for the current run configuration, you can also select + the dropdown menu next to the \uicontrol {Collect profile data} button. 
+ + \section2 Choosing Event Types + + In the \uicontrol Events table, you can specify which events should trigger + the Performance Analyzer to take a sample. The most common way of analyzing + CPU usage involves periodic sampling, driven by hardware performance + counters that react to the number of instructions or CPU cycles executed. + Alternatively, a software counter that uses the CPU clock can be chosen. + + Select \uicontrol {Add Event} to add events to the table. + In the \uicontrol {Event Type} column, you can choose the general type of + event to be sampled, most commonly \uicontrol {hardware} or + \uicontrol {software}. In the \uicontrol {Counter} column, you can choose + which specific counter should be used for the sampling. For example, + \uicontrol {instructions} in the \uicontrol {hardware} group or + \uicontrol {cpu-clock} in the \uicontrol {software} group. + + More specialized sampling, for example by cache misses or cache hits, is + possible. However, support for it depends on specific features of the CPU + involved. For those specialized events, you can give more detailed sampling + instructions in the \uicontrol {Operation} and \uicontrol {Result} columns. + For example, you can choose a \uicontrol {cache} event for + \uicontrol {L1-dcache} on the \uicontrol {load} operation with a result + of \uicontrol {misses}. That would sample L1-dcache misses on reading. + + Select \uicontrol {Remove Event} to remove the selected event from the + table. + + Select \uicontrol {Use Trace Points} to replace the current selection of + events with trace points defined on the target device and set the + \uicontrol {Sample mode} to \uicontrol {event count} and the + \uicontrol {Sample period} to \c {1}. If the trace points on the target + were defined using the \uicontrol {Create Memory Trace Points} option, the + Performance Analyzer will automatically use them to profile memory usage. 
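As an illustration of how the Event Type, Counter, Operation, and Result columns described above combine into a single event, the following Python sketch composes event names in the style printed by \c {perf list}. The function name perf_event_name and the joining rule are assumptions made for illustration; the sketch does not show how \QC itself builds the names.

```python
def perf_event_name(event_type, counter, operation=None, result=None):
    """Compose a perf-style event name from one row of the Events table.

    Hypothetical helper: the joining rule mirrors names such as
    "L1-dcache-load-misses" printed by `perf list`; it is not taken
    from the Qt Creator sources.
    """
    if event_type == "cache":
        # Cache events need the extra Operation and Result columns.
        if operation is None or result is None:
            raise ValueError("cache events need an operation and a result")
        return f"{counter}-{operation}-{result}"
    if event_type in ("hardware", "software"):
        # Plain counters are identified by the counter name alone.
        return counter
    raise ValueError(f"unsupported event type: {event_type}")


# The three examples mentioned in the text.
rows = [
    ("hardware", "instructions"),
    ("software", "cpu-clock"),
    ("cache", "L1-dcache", "load", "misses"),
]
events = [perf_event_name(*row) for row in rows]
print(events)
```

Whether a composed name is actually available depends on the CPU and kernel of the target device, so it is worth comparing the result against the output of \c {perf list} there.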
+ + Select \uicontrol {Reset} to revert the selection of events, as well as the + \uicontrol {Sample mode} and \uicontrol {Sample period} to the default + values. + + \section2 Choosing a Sampling Mode and Period + + In the \uicontrol {Sample mode} and \uicontrol {Sample period} fields, you + can specify how samples are triggered: + + \list + + \li Sampling by \uicontrol {event count} instructs the kernel to take + a sample every \c n times one of the chosen events has occurred, + where \c n is specified in the \uicontrol {Sample period} field. + + \li Sampling by \uicontrol {frequency (Hz)} instructs the kernel to try and + take a sample \c n times per second, by automatically adjusting the + sampling period. Specify \c n in the \uicontrol {Sample period} + field. + + \endlist + + High frequencies or low event counts result in more accurate data, at the + expense of a higher overhead and a larger volume of data being + generated. The actual sampling period is determined by the Linux kernel on + the target device, which takes the period set for Perf merely as advice. + There may be a significant difference between the sampling period you + request and the actual result. + + In general, if you configure the Performance Analyzer to collect more data + than it can transmit over the connection between the target and the host + device, the application may get blocked while Perf is trying to send the + data, and the processing delay may grow excessively. You should then change + the \uicontrol {Sample period} or the \uicontrol {Stack snapshot size}. 
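To get a feeling for when the connection becomes the bottleneck, a rough estimate of the data volume can help. The sketch below assumes dwarf mode, where each sample carries one stack snapshot, and adds a made-up fixed overhead per sample; the function names, the overhead value, and the link speed are illustrative assumptions, not measured figures.

```python
def estimated_bytes_per_second(samples_per_second, stack_snapshot_size,
                               active_threads=1, per_sample_overhead=64):
    """Rough volume of profile data generated per second.

    per_sample_overhead is an assumed placeholder for sample headers and
    registers; the real value depends on the Perf version and settings.
    """
    bytes_per_sample = stack_snapshot_size + per_sample_overhead
    return samples_per_second * active_threads * bytes_per_sample


def fits_link(bytes_per_second, link_megabits_per_second):
    """Compare the estimate against the raw capacity of the connection."""
    link_bytes_per_second = link_megabits_per_second * 1_000_000 / 8
    return bytes_per_second <= link_bytes_per_second


# 1000 samples per second with 4096-byte snapshots on one busy thread.
rate = estimated_bytes_per_second(1000, 4096)
print(rate, fits_link(rate, 100))
```

Because the kernel treats the requested period only as advice, the real volume can differ noticeably from such an estimate. Halving either the sampling frequency or the stack snapshot size roughly halves the estimate.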
+ + \section2 Selecting Call Graph Mode + + In the \uicontrol {Call graph mode} field, you can specify how the + Performance Analyzer recovers call chains from your application: + + \list + + \li The \uicontrol {Frame Pointer}, or \c fp, mode relies on frame pointers + being available in the profiled application and will instruct the kernel on + the target device to walk the chain of frame pointers in order to retrieve + a call chain for each sample. + + \li The \uicontrol {Dwarf} mode also works without frame pointers, but + generates significantly more data. It takes a snapshot of the current + application stack each time a sample is triggered and transmits that + snapshot to the host computer for analysis. + + \li The \uicontrol {Last Branch Record} mode does not use a memory buffer. + It automatically decodes the last 16 taken branches every time execution + stops. It is supported only on recent Intel CPUs. + + \endlist + + Qt and most system libraries are compiled without frame pointers by + default, so the frame pointer mode is only useful with customized systems. + + \section2 Setting Stack Snapshot Size + + The Performance Analyzer will analyze and \e unwind the stack snapshots + generated by Perf in dwarf mode. Set the size of the stack snapshots in the + \uicontrol {Stack snapshot size} field. Large stack snapshots result in a + larger volume of data to be transferred and processed. Small stack + snapshots may fail to capture call chains of highly recursive applications + or other intense stack usage. + + \section2 Adding Command Line Options For Perf + + You can specify additional command line options to be passed to Perf when + recording data in the \uicontrol {Additional arguments} field. You may want + to specify \c{--no-delay} or \c{--no-buffering} to reduce the processing + delay. However, those options are not supported by all versions of Perf and + Perf may not start if an unsupported option is given. 
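The settings described above correspond to options of \c {perf record}. The following Python sketch shows one way such settings could be turned into a command line, using the \c {-e}, \c {-F}, \c {-c}, \c {--call-graph}, and \c {-o} options of Perf. The function perf_record_args, its parameter names, and the application path ./myapp are hypothetical; the command that \QC actually runs may look different.

```python
import shlex


def perf_record_args(events, sample_mode, sample_period, call_graph_mode,
                     stack_snapshot_size=4096, extra_args=(),
                     output="perf.data"):
    """Build an argument list for `perf record` (illustrative only)."""
    args = ["perf", "record", "-o", output]
    for event in events:
        args += ["-e", event]

    # Sample mode: either a target frequency or a fixed event count.
    if sample_mode == "frequency":
        args += ["-F", str(sample_period)]
    elif sample_mode == "event count":
        args += ["-c", str(sample_period)]
    else:
        raise ValueError(f"unknown sample mode: {sample_mode}")

    # Call graph mode: only dwarf takes a stack snapshot size.
    if call_graph_mode == "dwarf":
        args += ["--call-graph", f"dwarf,{stack_snapshot_size}"]
    elif call_graph_mode in ("fp", "lbr"):
        args += ["--call-graph", call_graph_mode]
    else:
        raise ValueError(f"unknown call graph mode: {call_graph_mode}")

    # Additional arguments are passed through unchanged.
    args += list(extra_args)
    return args


command = perf_record_args(["cpu-clock"], "frequency", 250, "dwarf",
                           stack_snapshot_size=4096,
                           extra_args=["--no-buffering"])
# "./myapp" stands in for the application to profile.
print(shlex.join(command + ["--", "./myapp"]))
```

Passing the additional arguments through unchanged mirrors the caveat above: if the installed version of Perf does not know an option, the command fails to start.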
+ + \section2 Resolving Names for JIT-compiled JavaScript Functions + + Since version 5.6.0, Qt can generate perf.map files with information about + JavaScript functions. The Performance Analyzer will read them and show the + function names in the \uicontrol Timeline, \uicontrol Statistics, and + \uicontrol {Flame Graph} views. This only works if the process being + profiled is running on the host computer, not on the target device. To + switch on the generation of perf.map files, add the environment variable + \c QV4_PROFILE_WRITE_PERF_MAP to the \uicontrol {Run Environment} and set + its value to \c 1. + + \section1 Analyzing Collected Data + + The \uicontrol Timeline view displays a graphical representation of CPU + usage per thread and a condensed view of all recorded events. + + \image qtcreator-performance-analyzer-timeline.png "Performance Analyzer" + + Each category in the timeline describes a thread in the application. Move + the cursor on an event (5) on a row to see how long it takes and which + function in the source it represents. To display the information only when + an event is selected, disable the + \uicontrol {View Event Information on Mouseover} button (4). + + The outline (9) summarizes the period for which data was collected. Drag + the zoom range (7) or click the outline to move on the outline. You can + also move between events by selecting the + \uicontrol {Jump to Previous Event} and \uicontrol {Jump to Next Event} + buttons (1). + + Select the \uicontrol {Show Zoom Slider} button (2) to open a slider that + you can use to set the zoom level. You can also drag the zoom handles (8). + To reset the default zoom level, right-click the timeline to open the + context menu, and select \uicontrol {Reset Zoom}. + + \section2 Selecting Event Ranges + + You can select an event range (6) to view the time it represents or to zoom + into a specific region of the trace. Select the \uicontrol {Select Range} + button (3) to activate the selection tool. 
Then click in the timeline to + specify the beginning of the event range. Drag the selection handle to + define the end of the range. + + You can also use event ranges to measure delays between two subsequent + events. Place a range between the end of the first event and the beginning + of the second event. The \uicontrol Duration field displays the delay + between the events in milliseconds. + + To zoom into an event range, double-click it. + + To remove an event range, close the \uicontrol Selection dialog. + + \section2 Understanding the Data + + Generally, events in the timeline view indicate how long a function call + took. Move the mouse over them to see details. The details always include + the address of the function, the approximate duration of the call, the ELF + file the function resides in, the number of samples collected with this + function call active, the total number of times this function was + encountered in the thread, and the number of samples this function was + encountered in at least once. + + For functions with debug information available, the details include the + location in source code and the name of the function. You can click on such + events to move the cursor in the code editor to the part of the code the + event is associated with. + + As the Perf tool only provides periodic samples, the Performance Analyzer + cannot determine the exact time when a function was called or when it + returned. You can, however, see exactly when a sample was taken in the + second row of each thread. The Performance Analyzer assumes that if the same + function is present at the same place in the call chain in multiple + consecutive samples, then this represents a single call to the respective + function. This is, of course, a simplification. Also, there may be other + functions being called between the samples taken, which do not show up in + the profile data. 
However, statistically, the data is likely to show the + functions that spend the most CPU time most prominently. + + If a function without debug information is encountered, further unwinding + of the stack may fail. Unwinding will also fail for some symbols + implemented in assembly language. If unwinding fails, only a part of the + call chain is displayed, and the surrounding functions may seem to be + interrupted. This does not necessarily mean they were actually interrupted + during the execution of the application, but only that they could not be + found in the stacks where the unwinding failed. + + JavaScript functions from the QML engine running in the JIT mode can be + unwound. However, their names will only be displayed when + \c QV4_PROFILE_WRITE_PERF_MAP is set. Compiled JavaScript generated by the + \l{http://doc.qt.io/QtQuickCompiler/}{Qt Quick Compiler} can also be + unwound. In this case the C++ names generated by the compiler are shown for + JavaScript functions, rather than their JavaScript names. When running in + interpreted mode, stack frames involving QML can also be unwound, showing + the interpreter itself, rather than the interpreted JavaScript. + + Kernel functions included in call chains are shown on the third row of each + thread. + + The coloring of the events represents the actual sample rate for the + specific thread they belong to, across their duration. The Linux kernel + will only take a sample of a thread if the thread is active. At the same + time, the kernel tries to honor the requested event period. + Thus, differences in the sampling frequency between different threads + indicate that the thread with more samples taken is more likely to be the + overall bottleneck, and the thread with fewer samples taken has likely spent + time waiting for external events such as I/O or a mutex. 
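The assumption described in this section, that a function found at the same place in the call chain in consecutive samples represents a single call, can be sketched in a few lines of Python. This is a simplified illustration under assumed data structures (each sample is a timestamp plus a call chain listed from the outermost to the innermost function); it is not the implementation used by the Performance Analyzer.

```python
def merge_samples(samples):
    """Merge consecutive samples of one thread into call events.

    samples: list of (timestamp, [outermost, ..., innermost]).
    Returns a list of (function, depth, first_seen, last_seen) tuples.
    """
    events = []
    open_calls = []  # index = depth, items = [function, first, last]
    for timestamp, chain in samples:
        # Count the leading frames that match the previous sample.
        common = 0
        while (common < len(open_calls) and common < len(chain)
               and open_calls[common][0] == chain[common]):
            common += 1
        # Frames below the first mismatch are treated as returned.
        for depth in range(len(open_calls) - 1, common - 1, -1):
            function, first, last = open_calls.pop()
            events.append((function, depth, first, last))
        # Matching frames are treated as the same call, still running.
        for call in open_calls:
            call[2] = timestamp
        # Remaining frames of this sample start new calls.
        for function in chain[common:]:
            open_calls.append([function, timestamp, timestamp])
    # Close whatever is still open after the last sample.
    for depth in range(len(open_calls) - 1, -1, -1):
        function, first, last = open_calls.pop()
        events.append((function, depth, first, last))
    return events


samples = [
    (0, ["main", "a"]),
    (1, ["main", "a"]),
    (2, ["main", "b"]),
    (3, ["main", "a"]),
]
for event in merge_samples(samples):
    print(event)
```

In this example the two samples of a at timestamps 0 and 1 are merged into one event, while the sample at timestamp 3 starts a new one because b was seen in between. If a had returned and been called again between timestamps 0 and 1, the sketch would still report a single call, which is exactly the simplification mentioned above.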
+ + \section1 Viewing Statistics + + \image qtcreator-performance-analyzer-statistics.png + + The \uicontrol Statistics view displays the number of samples each function + in the timeline was contained in, in total and when on the top of the + stack (called \c self). This allows you to examine which functions you need + to optimize. A high number of occurrences might indicate that a function is + triggered unnecessarily or takes very long to execute. + + Click on a row to move to the respective function in the source code in the + code editor. + + The \uicontrol Callers and \uicontrol Callees panes show dependencies + between functions. They allow you to examine the internal functions of the + application. The \uicontrol Callers pane summarizes the functions that + called the function selected in the main view. The \uicontrol Callees pane + summarizes the functions called from the function selected in the main + view. + + Click on a row to move to the respective function in the source code in the + code editor and select it in the main view. + + To copy the contents of one view or row to the clipboard, select + \uicontrol {Copy Table} or \uicontrol {Copy Row} in the context menu. + + \section2 Visualizing Statistics as Flame Graphs + + \image qtcreator-performance-analyzer-flamegraph.png + + The \uicontrol {Flame Graph} view shows a more concise statistical overview + of the execution. The horizontal bars show an aspect of the samples + taken for a certain function, relative to the same aspect of all samples + together. The nesting shows which functions were called by which other ones. + + The \uicontrol {Visualize} button lets you choose what aspect to show in the + \uicontrol {Flame Graph}. + + \list + + \li \uicontrol {Samples} is the default visualization. The size of the + horizontal bars represents the number of samples recorded for the given + function. 
+ + \li In \uicontrol {Peak Usage} mode, the size of the horizontal bars + represents the amount of memory allocated by the respective functions, at + the point in time when the allocation's memory usage was at its peak. + + \li In \uicontrol {Allocations} mode, the size of the horizontal bars + represents the number of memory allocations triggered by the respective + functions. + + \li In \uicontrol {Releases} mode, the size of the horizontal bars + represents the number of memory releases triggered by the respective + functions. + + \endlist + + The \uicontrol {Peak Usage}, \uicontrol {Allocations}, and + \uicontrol {Releases} modes will only show any data if samples from memory + trace points have been recorded. + + \section2 Interaction Between the Views + + When you select a stack frame in any of the \uicontrol {Timeline}, + \uicontrol {Flame Graph}, or \uicontrol {Statistics} views, information + about it is displayed in the other two views. To view a time range in the + \uicontrol {Statistics} and \uicontrol {Flame Graph} views, select + \uicontrol Analyze > \uicontrol {Performance Analyzer Options} > + \uicontrol {Limit to the Range Selected in Timeline}. To show the full + stack frame, select \uicontrol {Show Full Range}. + + \section1 Loading Perf Data Files + + You can load any \c perf.data files generated by recent versions of the + Linux Perf tool and view them in \QC. Select \uicontrol Analyze > + \uicontrol {Performance Analyzer Options} > \uicontrol {Load perf.data} to + load a file. + + \image qtcreator-cpu-usage-analyzer-load-perf-trace.png + + The Performance Analyzer needs to know the context in which the + data was recorded to find the debug symbols. Therefore, you have to specify + the kit that the application was built with and the folder where the + application executable is located. + + The Perf data files are generated by calling \c {perf record}. 
Make sure to + generate call graphs when recording data by starting Perf with the + \c {--call-graph} option. Also check that the necessary debug symbols are + available to the Performance Analyzer, either at a standard location + (\c /usr/lib/debug or next to the binaries), or as part of the Qt package + you are using. + + The Performance Analyzer can read Perf data files generated in either frame + pointer or dwarf mode. However, to generate the files correctly, numerous + preconditions have to be met. All system images for the + \l{http://doc.qt.io/QtForDeviceCreation/qtee-supported-platforms.html} + {Qt for Device Creation reference devices}, except for Freescale iMX53 Quick + Start Board and SILICA Architect Tibidabo, are correctly set up for + profiling in the dwarf mode. For other devices, check whether Perf can read + back its own data in a sensible way by checking the output of + \c {perf report} or \c {perf script} for the recorded Perf data files. + + \section1 Loading and Saving Trace Files + + You can save and load trace data in a format specific to the + Performance Analyzer with the respective entries in \uicontrol Analyze > + \uicontrol {Performance Analyzer Options}. This format is self-contained, and + therefore loading it does not require you to specify the recording + environment. You can transfer such trace files to a different computer + without any tool chain or debug symbols and analyze them there. + + \section1 Troubleshooting + + The Performance Analyzer might fail to record data for the following reasons: + + \list 1 + \li Perf events may be globally disabled on your system. The + preconfigured Boot to Qt images come with perf events enabled. For + a custom configuration you need to make sure that the file + \c {/proc/sys/kernel/perf_event_paranoid} contains a value smaller + than \c {2}. For maximum flexibility in recording traces you can + set the value to \c {-1}. 
This allows any user to record any kind + of trace, even using raw kernel trace points. + \li The connection between the target device and the host may not be + fast enough to transfer the data produced by Perf. Try modifying + the values of the \uicontrol {Stack snapshot size} or + \uicontrol {Sample period} settings. + \li Perf may be buffering the data forever, never sending it. Add + \c {--no-delay} or \c {--no-buffering} to the + \uicontrol {Additional arguments} field. + \li Some versions of Perf will not start recording unless given a + certain minimum sampling frequency. Try with a + \uicontrol {Sample period} value of 1000. + \li On some devices, in particular various i.MX6 boards, the hardware + performance counters are dysfunctional and the Linux kernel may + randomly fail to record data after some time. Perf can use different + types of events to trigger samples. You can get a list of available + event types by running \c {perf list} on the device and then choose + the respective event types in the settings. The choice of event type + affects the performance and stability of the sampling. The + \c {cpu-clock} \c {software} event is a safe but relatively slow + option as it does not use the hardware performance counters, but + drives the sampling from software. After the sampling has failed, + reboot the device. The kernel may have disabled important parts of + the performance counters system. + \endlist + + Output from the helper program that processes the data is displayed in the + \uicontrol {General Messages} output pane. +*/ |
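The first troubleshooting item above can be checked with a short script run on the device that records the data. This Python sketch reads /proc/sys/kernel/perf_event_paranoid and applies the "smaller than 2" rule stated in the text; the function name perf_events_allowed is a hypothetical helper, and the script only reports the value without changing it.

```python
from pathlib import Path

# Kernel setting named in the troubleshooting list above.
PARANOID_FILE = Path("/proc/sys/kernel/perf_event_paranoid")


def perf_events_allowed(text):
    """Return True if the file content is a value smaller than 2."""
    return int(text.strip()) < 2


if PARANOID_FILE.exists():
    content = PARANOID_FILE.read_text()
    if perf_events_allowed(content):
        print("perf_event_paranoid =", content.strip(), "(recording should work)")
    else:
        print("perf_event_paranoid =", content.strip(), "(too restrictive)")
else:
    # Not a Linux system, or the kernel was built without perf events.
    print(PARANOID_FILE, "not found")
```

Changing the value requires administrator rights on the device and is deliberately left out of the sketch.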