llvm-mca - LLVM Machine Code Analyzer
=====================================

.. program:: llvm-mca

SYNOPSIS
--------

:program:`llvm-mca` [*options*] [input]

DESCRIPTION
-----------

:program:`llvm-mca` is a performance analysis tool that uses information
available in LLVM (e.g. scheduling models) to statically measure the performance
of machine code on a specific CPU.

Performance is measured in terms of throughput as well as processor resource
consumption. The tool currently works for processors with a backend for which
there is a scheduling model available in LLVM.

The main goal of this tool is not just to predict the performance of the code
when run on the target, but also to help with diagnosing potential performance
issues.

Given an assembly code sequence, :program:`llvm-mca` estimates the Instructions
Per Cycle (IPC), as well as hardware resource pressure. The analysis and
reporting style were inspired by the IACA tool from Intel.

For example, you can compile code with clang, output assembly, and pipe it
directly into :program:`llvm-mca` for analysis:

.. code-block:: bash

  $ clang foo.c -O2 --target=x86_64 -S -o - | llvm-mca -mcpu=btver2

Or for Intel syntax:

.. code-block:: bash

  $ clang foo.c -O2 --target=x86_64 -masm=intel -S -o - | llvm-mca -mcpu=btver2

(:program:`llvm-mca` detects Intel syntax by the presence of an
``.intel_syntax`` directive at the beginning of the input. By default its
output syntax matches that of its input.)

Scheduling models are not just used to compute instruction latencies and
throughput, but also to understand what processor resources are available
and how to simulate them.

By design, the quality of the analysis conducted by :program:`llvm-mca` is
inevitably affected by the quality of the scheduling models in LLVM.

If you see that the performance report is not accurate for a processor,
please file a bug against the appropriate backend.

OPTIONS
-------

If ``input`` is "``-``" or omitted, :program:`llvm-mca` reads from standard
input.
Otherwise, it will read from the specified filename.

If the :option:`-o` option is omitted, then :program:`llvm-mca` will send its
output to standard output if the input is from standard input. If the
:option:`-o` option specifies "``-``", then the output will also be sent to
standard output.

.. option:: -help

 Print a summary of command line options.

.. option:: -o <filename>

 Use ``<filename>`` as the output filename. See the summary above for more
 details.

.. option:: -mtriple=<target triple>

 Specify a target triple string.

.. option:: -march=<arch>

 Specify the architecture for which to analyze the code. It defaults to the
 host default target.

.. option:: -mcpu=<cpuname>

 Specify the processor for which to analyze the code. By default, the CPU name
 is autodetected from the host.

.. option:: -output-asm-variant=<variant id>

 Specify the output assembly variant for the report generated by the tool.
 On x86, possible values are [0, 1]. A value of 0 (vs. 1) for this flag enables
 the AT&T (vs. Intel) assembly format for the code printed out by the tool in
 the analysis report.

.. option:: -print-imm-hex

 Prefer hex format for numeric literals in the output assembly printed as part
 of the report.

.. option:: -dispatch=<width>

 Specify a different dispatch width for the processor. The dispatch width
 defaults to field 'IssueWidth' in the processor scheduling model. If width is
 zero, then the default dispatch width is used.

.. option:: -register-file-size=<size>

 Specify the size of the register file. When specified, this flag limits how
 many physical registers are available for register renaming purposes. A value
 of zero for this flag means "unlimited number of physical registers".

.. option:: -iterations=<number of iterations>

 Specify the number of iterations to run. If this flag is set to 0, then the
 tool sets the number of iterations to a default value (i.e. 100).

.. option:: -noalias=<bool>

 If set, the tool assumes that loads and stores don't alias. This is the
 default behavior.

.. option:: -lqueue=<load queue size>

 Specify the size of the load queue in the load/store unit emulated by the tool.
 By default, the tool assumes an unbounded number of entries in the load queue.
 A value of zero for this flag is ignored, and the default load queue size is
 used instead.

.. option:: -squeue=<store queue size>

 Specify the size of the store queue in the load/store unit emulated by the
 tool. By default, the tool assumes an unbounded number of entries in the store
 queue. A value of zero for this flag is ignored, and the default store queue
 size is used instead.

.. option:: -timeline

 Enable the timeline view.

.. option:: -timeline-max-iterations=<iterations>

 Limit the number of iterations to print in the timeline view. By default, the
 timeline view prints information for up to 10 iterations.

.. option:: -timeline-max-cycles=<cycles>

 Limit the number of cycles in the timeline view, or use 0 for no limit. By
 default, the number of cycles is set to 80.

.. option:: -resource-pressure

 Enable the resource pressure view. This is enabled by default.

.. option:: -register-file-stats

 Enable register file usage statistics.

.. option:: -dispatch-stats

 Enable extra dispatch statistics. This view collects and analyzes instruction
 dispatch events, as well as static/dynamic dispatch stall events. This view
 is disabled by default.

.. option:: -scheduler-stats

 Enable extra scheduler statistics. This view collects and analyzes instruction
 issue events. This view is disabled by default.

.. option:: -retire-stats

 Enable extra retire control unit statistics. This view is disabled by default.

.. option:: -instruction-info

 Enable the instruction info view. This is enabled by default.

.. option:: -show-encoding

 Enable the printing of instruction encodings within the instruction info view.

.. option:: -show-barriers

 Enable the printing of LoadBarrier and StoreBarrier flags within the
 instruction info view.

.. option:: -all-stats

 Print all hardware statistics.
 This enables extra statistics related to the dispatch logic, the hardware
 schedulers, the register file(s), and the retire control unit. This option is
 disabled by default.

.. option:: -all-views

 Enable all the views.

.. option:: -instruction-tables=<level>

 Prints resource pressure information based on the static information
 available from the processor model. This differs from the resource pressure
 view because it doesn't require that the code is simulated. It instead prints
 the theoretical uniform distribution of resource pressure for every
 instruction in sequence.

 The choice of `<level>` controls the amount of printed information.
 `<level>` may be `none` (default), `normal`, or `full`.
 Note: If the option is used without `