author: Leena Miettinen <riitta-leena.miettinen@qt.io> 2020-01-23 11:45:07 +0100
committer: Leena Miettinen <riitta-leena.miettinen@qt.io> 2020-01-27 09:05:07 +0000
commit: 5fc456dd2283b2d1e6c4e6d34856052658f34cc4
tree: 7a7cd26a33014e401536a149fa47f6586b68c247 /doc/qtcreator/src/analyze/cpu-usage-analyzer.qdoc
parent: c9f90047ac701416e439f492069c1a0bb364fc08
Doc: Rearrange files in the doc folder
Source and configuration files for each manual are now located in a
separate subdirectory, with common configuration files in doc/config.

doc
|_config
|_qtcreator
|_qtcreatordev
|_qtdesignstudio

Edit the config files accordingly.

Change-Id: Idc747a7c16e84f3e06add91234dc5fc908e64cc5
Reviewed-by: Eike Ziller <eike.ziller@qt.io>
Diffstat (limited to 'doc/qtcreator/src/analyze/cpu-usage-analyzer.qdoc')
-rw-r--r-- doc/qtcreator/src/analyze/cpu-usage-analyzer.qdoc 499
1 files changed, 499 insertions, 0 deletions
diff --git a/doc/qtcreator/src/analyze/cpu-usage-analyzer.qdoc b/doc/qtcreator/src/analyze/cpu-usage-analyzer.qdoc
new file mode 100644
index 0000000000..61e04508cf
--- /dev/null
+++ b/doc/qtcreator/src/analyze/cpu-usage-analyzer.qdoc
@@ -0,0 +1,499 @@
+/****************************************************************************
+**
+** Copyright (C) 2019 The Qt Company Ltd.
+** Contact: https://www.qt.io/licensing/
+**
+** This file is part of the Qt Creator documentation.
+**
+** Commercial License Usage
+** Licensees holding valid commercial Qt licenses may use this file in
+** accordance with the commercial license agreement provided with the
+** Software or, alternatively, in accordance with the terms contained in
+** a written agreement between you and The Qt Company. For licensing terms
+** and conditions see https://www.qt.io/terms-conditions. For further
+** information use the contact form at https://www.qt.io/contact-us.
+**
+** GNU Free Documentation License Usage
+** Alternatively, this file may be used under the terms of the GNU Free
+** Documentation License version 1.3 as published by the Free Software
+** Foundation and appearing in the file included in the packaging of
+** this file. Please review the following information to ensure
+** the GNU Free Documentation License version 1.3 requirements
+** will be met: https://www.gnu.org/licenses/fdl-1.3.html.
+**
+****************************************************************************/
+
+// **********************************************************************
+// NOTE: the sections are not ordered by their logical order to avoid
+// reshuffling the file each time the index order changes (i.e., often).
+// Run the fixnavi.pl script to adjust the links to the index order.
+// **********************************************************************
+
+/*!
+ \contentspage index.html
+ \previouspage creator-heob.html
+ \page creator-cpu-usage-analyzer.html
+ \nextpage creator-cppcheck.html
+
+ \title Analyzing CPU Usage
+
+ \QC is integrated with the Linux Perf tool, which can be
+ used to analyze the CPU and memory usage of an application on embedded
+ devices and, to a limited extent, on Linux desktop platforms. The
+ Performance Analyzer uses the Perf tool bundled with the Linux kernel to
+ take periodic snapshots of the call chain of an application and visualizes
+ them in a timeline view or as a flame graph.
+
+ \section1 Using the Performance Analyzer
+
+ The Performance Analyzer usually needs to be able to locate debug symbols for
+ the binaries involved.
+
+ Profile builds produce optimized binaries with separate debug symbols and
+ should generally be used for profiling.
+
+ To manually set up a build configuration to provide separate debug symbols,
+ edit the project build settings:
+
+ \list 1
+ \li To also generate debug symbols for applications compiled in release
+ mode, select \uicontrol {Projects}, and then select
+ \uicontrol Details next to \uicontrol {Build Steps} to view the
+ build steps.
+
+ \li Select the \uicontrol {Generate separate debug info} check box.
+
+ \li Select \uicontrol Yes to recompile the project.
+
+ \endlist
+
+ You can start the Performance Analyzer in the following ways:
+
+ \list
+ \li Select \uicontrol Analyze > \uicontrol {Performance Analyzer} to
+ profile the current application.
+
+ \li Select the
+ \inlineimage qtcreator-analyze-start-button.png
+ (\uicontrol Start) button to start the application from the
+ Performance Analyzer.
+
+ \endlist
+
+ \note If data collection does not start automatically, select the
+ \inlineimage recordfill.png
+ (\uicontrol {Collect profile data}) button.
+
+ When you start analyzing an application, the application is launched, and
+ the Performance Analyzer immediately begins to collect data. This is indicated
+ by the time running in the \uicontrol Recorded field. However, as the data
+ is passed through the Perf tool and an extra helper program bundled with
+ \QC, and both buffer and process it on the fly, data may arrive in \QC
+ several seconds after it was generated. An estimate for this delay is given
+ in the \uicontrol {Processing delay} field.
+
+ Data is collected until you select the
+ \uicontrol {Stop collecting profile data} button or terminate the
+ application.
+
+ Select the \uicontrol {Stop collecting profile data} button to disable the
+ automatic start of the data collection when an application is launched.
+ Profile data will still be generated, but \QC will discard it until you
+ select the button again.
+
+ \section1 Profiling Memory Usage on Devices
+
+ To create trace points for profiling memory usage on a target device, select
+ \uicontrol Analyze > \uicontrol {Performance Analyzer Options} >
+ \uicontrol {Create Memory Trace Points}.
+
+ To add events for the trace points, see \l{Choosing Event Types}.
+
+ You can record a memory trace to view usage graphs in the samples rows of
+ the timeline and to view memory allocations, peaks, and releases in the
+ flame graph.
+
+ \section1 Specifying Performance Analyzer Settings
+
+ To specify global settings for the Performance Analyzer, select
+ \uicontrol Tools > \uicontrol Options > \uicontrol Analyzer >
+ \uicontrol {CPU Usage}. For each run configuration, you can also
+ use specialized settings. Select \uicontrol Projects > \uicontrol Run, and
+ then select \uicontrol Details next to
+ \uicontrol {Performance Analyzer Settings}.
+
+ \image qtcreator-performance-analyzer-settings.png
+
+ To edit the settings for the current run configuration, you can also select
+ the dropdown menu next to the \uicontrol {Collect profile data} button.
+
+ \section2 Choosing Event Types
+
+ In the \uicontrol Events table, you can specify which events should trigger
+ the Performance Analyzer to take a sample. The most common way of analyzing
+ CPU usage involves periodic sampling, driven by hardware performance
+ counters that react to the number of instructions or CPU cycles executed.
+ Alternatively, a software counter that uses the CPU clock can be chosen.
+
+ Select \uicontrol {Add Event} to add events to the table.
+ In the \uicontrol {Event Type} column, you can choose the general type of
+ event to be sampled, most commonly \uicontrol {hardware} or
+ \uicontrol {software}. In the \uicontrol {Counter} column, you can choose
+ which specific counter should be used for the sampling. For example,
+ \uicontrol {instructions} in the \uicontrol {hardware} group or
+ \uicontrol {cpu-clock} in the \uicontrol {software} group.
+
+ More specialized sampling, for example by cache misses or cache hits, is
+ possible. However, support for it depends on specific features of the CPU
+ involved. For those specialized events, you can give more detailed sampling
+ instructions in the \uicontrol {Operation} and \uicontrol {Result} columns.
+ For example, you can choose a \uicontrol {cache} event for
+ \uicontrol {L1-dcache} on the \uicontrol {load} operation with a result
+ of \uicontrol {misses}. That would sample L1-dcache misses on reading.
+
+ Select \uicontrol {Remove Event} to remove the selected event from the
+ table.
+
+ Select \uicontrol {Use Trace Points} to replace the current selection of
+ events with trace points defined on the target device and set the
+ \uicontrol {Sample mode} to \uicontrol {event count} and the
+ \uicontrol {Sample period} to \c {1}. If the trace points on the target
+ were defined using the \uicontrol {Create Trace Points} option, the
+ Performance Analyzer will automatically use them to profile memory usage.
+
+ Select \uicontrol {Reset} to revert the selection of events, as well as the
+ \uicontrol {Sample mode} and \uicontrol {Sample period} to the default
+ values.
+
+ \section2 Choosing a Sampling Mode and Period
+
+ In the \uicontrol {Sample mode} and \uicontrol {Sample period} fields, you
+ can specify how samples are triggered:
+
+ \list
+
+ \li Sampling by \uicontrol {event count} instructs the kernel to take
+ a sample every \c n times one of the chosen events has occurred,
+ where \c n is specified in the \uicontrol {Sample period} field.
+
+ \li Sampling by \uicontrol {frequency (Hz)} instructs the kernel to try to
+ take a sample \c n times per second, by automatically adjusting the
+ sampling period. Specify \c n in the \uicontrol {Sample period}
+ field.
+
+ \endlist
+
+ High frequencies or low event counts result in more accurate data, at the
+ expense of a higher overhead and a larger volume of data being
+ generated. The actual sampling period is determined by the Linux kernel on
+ the target device, which takes the period set for Perf merely as advice.
+ There may be a significant difference between the sampling period you
+ request and the actual result.
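+
+ For orientation, the two sample modes correspond to the \c {-c} (event
+ count) and \c {-F} (frequency) options of \c {perf record}. The following
+ is a minimal sketch of comparable manual invocations, not necessarily the
+ command line that \QC generates. The application name \c {./myapp} and the
+ numeric values are hypothetical:
+
+ \code
+ # Event count mode: take a sample every 100000 instructions.
+ perf record -e instructions -c 100000 ./myapp
+
+ # Frequency mode: ask the kernel for roughly 250 samples per second.
+ perf record -e cpu-clock -F 250 ./myapp
+ \endcode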
+
+ In general, if you configure the Performance Analyzer to collect more data
+ than it can transmit over the connection between the target and the host
+ device, the application may get blocked while Perf is trying to send the
+ data, and the processing delay may grow excessively. You should then change
+ the \uicontrol {Sample period} or the \uicontrol {Stack snapshot size}.
+
+ \section2 Selecting Call Graph Mode
+
+ In the \uicontrol {Call graph mode} field, you can specify how the
+ Performance Analyzer recovers call chains from your application:
+
+ \list
+
+ \li The \uicontrol {Frame Pointer}, or \c fp, mode relies on frame pointers
+ being available in the profiled application and will instruct the kernel on
+ the target device to walk the chain of frame pointers in order to retrieve
+ a call chain for each sample.
+
+ \li The \uicontrol {Dwarf} mode also works without frame pointers, but
+ generates significantly more data. It takes a snapshot of the current
+ application stack each time a sample is triggered and transmits that
+ snapshot to the host computer for analysis.
+
+ \li The \uicontrol {Last Branch Record} mode does not use a memory buffer.
+ It automatically decodes the last 16 taken branches every time execution
+ stops. It is supported only on recent Intel CPUs.
+
+ \endlist
+
+ Qt and most system libraries are compiled without frame pointers by
+ default, so the frame pointer mode is only useful with customized systems.
+
+ \section2 Setting Stack Snapshot Size
+
+ The Performance Analyzer will analyze and \e unwind the stack snapshots
+ generated by Perf in dwarf mode. Set the size of the stack snapshots in the
+ \uicontrol {Stack snapshot size} field. Large stack snapshots result in a
+ larger volume of data to be transferred and processed. Small stack
+ snapshots may fail to capture call chains of highly recursive applications
+ or other intense stack usage.
+
+ \section2 Adding Command Line Options For Perf
+
+ You can specify additional command line options to be passed to Perf when
+ recording data in the \uicontrol {Additional arguments} field. You may want
+ to specify \c{--no-delay} or \c{--no-buffering} to reduce the processing
+ delay. However, those options are not supported by all versions of Perf and
+ Perf may not start if an unsupported option is given.
+
+ \section2 Resolving Names for JIT-compiled JavaScript Functions
+
+ Since version 5.6.0, Qt can generate perf.map files with information about
+ JavaScript functions. The Performance Analyzer will read them and show the
+ function names in the \uicontrol Timeline, \uicontrol Statistics, and
+ \uicontrol {Flame Graph} views. This only works if the process being
+ profiled is running on the host computer, not on the target device. To
+ switch on the generation of perf.map files, add the environment variable
+ \c QV4_PROFILE_WRITE_PERF_MAP to the \uicontrol {Run Environment} and set
+ its value to \c 1.
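+
+ As a minimal sketch, the same variable can also be set when launching the
+ application from a shell outside \QC. The application name
+ \c {./myqmlapp} is hypothetical:
+
+ \code
+ # Enable perf.map generation for this run only.
+ QV4_PROFILE_WRITE_PERF_MAP=1 ./myqmlapp
+ \endcode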
+
+ \section1 Analyzing Collected Data
+
+ The \uicontrol Timeline view displays a graphical representation of CPU
+ usage per thread and a condensed view of all recorded events.
+
+ \image qtcreator-performance-analyzer-timeline.png "Performance Analyzer"
+
+ Each category in the timeline describes a thread in the application. Move
+ the cursor over an event (5) on a row to see how long it takes and which
+ function in the source it represents. To display the information only when
+ an event is selected, disable the
+ \uicontrol {View Event Information on Mouseover} button (4).
+
+ The outline (9) summarizes the period for which data was collected. Drag
+ the zoom range (7) or click the outline to move to another position on it. You can
+ also move between events by selecting the
+ \uicontrol {Jump to Previous Event} and \uicontrol {Jump to Next Event}
+ buttons (1).
+
+ Select the \uicontrol {Show Zoom Slider} button (2) to open a slider that
+ you can use to set the zoom level. You can also drag the zoom handles (8).
+ To reset the default zoom level, right-click the timeline to open the
+ context menu, and select \uicontrol {Reset Zoom}.
+
+ \section2 Selecting Event Ranges
+
+ You can select an event range (6) to view the time it represents or to zoom
+ into a specific region of the trace. Select the \uicontrol {Select Range}
+ button (3) to activate the selection tool. Then click in the timeline to
+ specify the beginning of the event range. Drag the selection handle to
+ define the end of the range.
+
+ You can also use event ranges to measure delays between two subsequent
+ events. Place a range between the end of the first event and the beginning
+ of the second event. The \uicontrol Duration field displays the delay
+ between the events in milliseconds.
+
+ To zoom into an event range, double-click it.
+
+ To remove an event range, close the \uicontrol Selection dialog.
+
+ \section2 Understanding the Data
+
+ Generally, events in the timeline view indicate how long a function call
+ took. Move the mouse over them to see details. The details always include
+ the address of the function, the approximate duration of the call, the ELF
+ file the function resides in, the number of samples collected with this
+ function call active, the total number of times this function was
+ encountered in the thread, and the number of samples this function was
+ encountered in at least once.
+
+ For functions with debug information available, the details include the
+ location in source code and the name of the function. You can click on such
+ events to move the cursor in the code editor to the part of the code the
+ event is associated with.
+
+ As the Perf tool only provides periodic samples, the Performance Analyzer
+ cannot determine the exact time when a function was called or when it
+ returned. You can, however, see exactly when a sample was taken in the
+ second row of each thread. The Performance Analyzer assumes that if the same
+ function is present at the same place in the call chain in multiple
+ consecutive samples, then this represents a single call to the respective
+ function. This is, of course, a simplification. Also, there may be other
+ functions being called between the samples taken, which do not show up in
+ the profile data. However, statistically, the data is likely to show the
+ functions that spend the most CPU time most prominently.
+
+ If a function without debug information is encountered, further unwinding
+ of the stack may fail. Unwinding will also fail for some symbols
+ implemented in assembly language. If unwinding fails, only a part of the
+ call chain is displayed, and the surrounding functions may seem to be
+ interrupted. This does not necessarily mean they were actually interrupted
+ during the execution of the application, but only that they could not be
+ found in the stacks where the unwinding failed.
+
+ JavaScript functions from the QML engine running in the JIT mode can be
+ unwound. However, their names will only be displayed when
+ \c QV4_PROFILE_WRITE_PERF_MAP is set. Compiled JavaScript generated by the
+ \l{http://doc.qt.io/QtQuickCompiler/}{Qt Quick Compiler} can also be
+ unwound. In this case the C++ names generated by the compiler are shown for
+ JavaScript functions, rather than their JavaScript names. When running in
+ interpreted mode, stack frames involving QML can also be unwound, showing
+ the interpreter itself, rather than the interpreted JavaScript.
+
+ Kernel functions included in call chains are shown on the third row of each
+ thread.
+
+ The coloring of the events represents the actual sample rate for the
+ specific thread they belong to, across their duration. The Linux kernel
+ will only take a sample of a thread if the thread is active. At the same
+ time, the kernel tries to honor the requested event period.
+ Thus, differences in the sampling frequency between different threads
+ indicate that the thread with more samples taken is more likely to be the
+ overall bottleneck, and the thread with fewer samples taken has likely spent
+ time waiting for external events such as I/O or a mutex.
+
+ \section1 Viewing Statistics
+
+ \image qtcreator-performance-analyzer-statistics.png
+
+ The \uicontrol Statistics view displays the number of samples each function
+ in the timeline was contained in, in total and when on the top of the
+ stack (called \c self). This allows you to examine which functions you need
+ to optimize. A high number of occurrences might indicate that a function is
+ triggered unnecessarily or takes very long to execute.
+
+ Click on a row to move to the respective function in the source code in the
+ code editor.
+
+ The \uicontrol Callers and \uicontrol Callees panes show dependencies
+ between functions. They allow you to examine the internal functions of the
+ application. The \uicontrol Callers pane summarizes the functions that
+ called the function selected in the main view. The \uicontrol Callees pane
+ summarizes the functions called from the function selected in the main
+ view.
+
+ Click on a row to move to the respective function in the source code in the
+ code editor and select it in the main view.
+
+ To copy the contents of one view or row to the clipboard, select
+ \uicontrol {Copy Table} or \uicontrol {Copy Row} in the context menu.
+
+ \section2 Visualizing Statistics as Flame Graphs
+
+ \image qtcreator-performance-analyzer-flamegraph.png
+
+ The \uicontrol {Flame Graph} view shows a more concise statistical overview
+ of the execution. The horizontal bars show an aspect of the samples
+ taken for a certain function, relative to the same aspect of all samples
+ together. The nesting shows which functions were called by which other ones.
+
+ The \uicontrol {Visualize} button lets you choose what aspect to show in the
+ \uicontrol {Flame Graph}.
+
+ \list
+
+ \li \uicontrol {Samples} is the default visualization. The size of the
+ horizontal bars represents the number of samples recorded for the given
+ function.
+
+ \li In \uicontrol {Peak Usage} mode, the size of the horizontal bars
+ represents the amount of memory allocated by the respective functions, at
+ the point in time when the allocation's memory usage was at its peak.
+
+ \li In \uicontrol {Allocations} mode, the size of the horizontal bars
+ represents the number of memory allocations triggered by the respective
+ functions.
+
+ \li In \uicontrol {Releases} mode, the size of the horizontal bars
+ represents the number of memory releases triggered by the respective
+ functions.
+
+ \endlist
+
+ The \uicontrol {Peak Usage}, \uicontrol {Allocations}, and
+ \uicontrol {Releases} modes will only show any data if samples from memory
+ trace points have been recorded.
+
+ \section2 Interaction between the views
+
+ When you select a stack frame in any of the \uicontrol {Timeline},
+ \uicontrol {Flame Graph}, or \uicontrol {Statistics} views, information
+ about it is displayed in the other two views. To view a time range in the
+ \uicontrol {Statistics} and \uicontrol {Flame Graph} views, select
+ \uicontrol Analyze > \uicontrol {Performance Analyzer Options} >
+ \uicontrol {Limit to the Range Selected in Timeline}. To show the full
+ stack frame, select \uicontrol {Show Full Range}.
+
+ \section1 Loading Perf Data Files
+
+ You can load any \c perf.data files generated by recent versions of the
+ Linux Perf tool and view them in \QC. Select \uicontrol Analyze >
+ \uicontrol {Performance Analyzer Options} > \uicontrol {Load perf.data} to
+ load a file.
+
+ \image qtcreator-cpu-usage-analyzer-load-perf-trace.png
+
+ The Performance Analyzer needs to know the context in which the
+ data was recorded to find the debug symbols. Therefore, you have to specify
+ the kit that the application was built with and the folder where the
+ application executable is located.
+
+ The Perf data files are generated by calling \c {perf record}. Make sure to
+ generate call graphs when recording data by starting Perf with the
+ \c {--call-graph} option. Also check that the necessary debug symbols are
+ available to the Performance Analyzer, either at a standard location
+ (\c /usr/lib/debug or next to the binaries), or as part of the Qt package
+ you are using.
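+
+ A minimal sketch of such a recording in dwarf mode, assuming a
+ hypothetical application \c {./myapp}. The \c {-o} option only names the
+ output file:
+
+ \code
+ # Record samples together with stack snapshots for later unwinding.
+ perf record --call-graph dwarf -o perf.data ./myapp
+ \endcode
+
+ Use \c {--call-graph fp} instead if the application and its libraries
+ were built with frame pointers.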
+
+ The Performance Analyzer can read Perf data files generated in either frame
+ pointer or dwarf mode. However, to generate the files correctly, numerous
+ preconditions have to be met. All system images for the
+ \l{http://doc.qt.io/QtForDeviceCreation/qtee-supported-platforms.html}
+ {Qt for Device Creation reference devices}, except for Freescale iMX53 Quick
+ Start Board and SILICA Architect Tibidabo, are correctly set up for
+ profiling in the dwarf mode. For other devices, check whether Perf can read
+ back its own data in a sensible way by checking the output of
+ \c {perf report} or \c {perf script} for the recorded Perf data files.
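+
+ Such a check could look as follows, assuming the data was written to a
+ file named \c {perf.data} in the current directory:
+
+ \code
+ # Summarize the recorded samples.
+ perf report -i perf.data
+
+ # Dump the individual samples one by one.
+ perf script -i perf.data
+ \endcode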
+
+ \section1 Loading and Saving Trace Files
+
+ You can save and load trace data in a format specific to the
+ Performance Analyzer with the respective entries in \uicontrol Analyze >
+ \uicontrol {Performance Analyzer Options}. This format is self-contained, and
+ therefore loading it does not require you to specify the recording
+ environment. You can transfer such trace files to a different computer
+ without any tool chain or debug symbols and analyze them there.
+
+ \section1 Troubleshooting
+
+ The Performance Analyzer might fail to record data for the following reasons:
+
+ \list 1
+ \li Perf events may be globally disabled on your system. The
+ preconfigured Boot to Qt images come with Perf events enabled. For
+ a custom configuration you need to make sure that the file
+ \c {/proc/sys/kernel/perf_event_paranoid} contains a value smaller
+ than \c {2}. For maximum flexibility in recording traces you can
+ set the value to \c {-1}. This allows any user to record any kind
+ of trace, even using raw kernel trace points.
+ \li The connection between the target device and the host may not be
+ fast enough to transfer the data produced by Perf. Try modifying
+ the values of the \uicontrol {Stack snapshot size} or
+ \uicontrol {Sample period} settings.
+ \li Perf may be buffering the data forever, never sending it. Add
+ \c {--no-delay} or \c {--no-buffering} to the
+ \uicontrol {Additional arguments} field.
+ \li Some versions of Perf will not start recording unless given a
+ certain minimum sampling frequency. Try with a
+ \uicontrol {Sample period} value of 1000.
+ \li On some devices, in particular various i.MX6 boards, the hardware
+ performance counters are dysfunctional and the Linux kernel may
+ randomly fail to record data after some time. Perf can use different
+ types of events to trigger samples. You can get a list of available
+ event types by running \c {perf list} on the device and then choose
+ the respective event types in the settings. The choice of event type
+ affects the performance and stability of the sampling. The
+ \c {cpu-clock} \c {software} event is a safe but relatively slow
+ option as it does not use the hardware performance counters, but
+ drives the sampling from software. After the sampling has failed,
+ reboot the device. The kernel may have disabled important parts of
+ the performance counters system.
+ \endlist
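+
+ The setting described in the first item of the list above can be checked
+ from a shell on the target device. This is a minimal sketch that assumes
+ root privileges for the write. The change lasts only until the next
+ reboot:
+
+ \code
+ # Show the current setting.
+ cat /proc/sys/kernel/perf_event_paranoid
+
+ # Allow any user to record any kind of trace.
+ echo -1 > /proc/sys/kernel/perf_event_paranoid
+ \endcode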
+
+ Output from the helper program that processes the data is displayed in the
+ \uicontrol {General Messages} output pane.
+*/