Man Page bit.1
NAME
bit - Binary Improvement Tool
SYNOPSIS
bit instrument [ general_option| instrument_option]... target
bit analyze [ general_option| analyze_option]... target
bit optimize [ general_option| optimize_option]... target
bit coverage [ general_option| coverage_option|
analyze_option]... target
bit collect [ general_option| analyze_option ]... target
[ target_arguments ]
bit -V
Where:
general_option:= {-i {on| static| off}| -d directory| -s
suffix | -n| -V| -v | -R| -c| -Ye path_to_sun_studio}
instrument_option:= {-m {on| off}| -b suffix}
optimize_option:= {-O {0| 1| 2}| -f| -Q{y|n}|
-xinline[= v[,v]...]}
analyze_option:= {-o filename| -o experiment-name.er| -e|
-E filterspec| -C comment| -A {on| off| copy}| -a report}
coverage_option:= { -F ignore_function}
report:= {ifreq[={N| function}]| cc[={N| function}]|
bbc[={N| function}]| branch[={N| function}]|
function[={N| function}]| dis[={N| function}]}
DESCRIPTION
bit is a suite of tools for improving binaries. These tools
are used via five subcommands:
The instrument subcommand instruments a binary (the target)
so that when the instrumented target is run, it creates an
instrumentation data file with information about the execu-
tion of target.
The analyze subcommand uses the instrumentation data to
produce reports on instruction execution. Analysis reports can
be generated as ASCII text or as an experiment. An experiment
may be examined with a GUI (analyzer) or a command-line
program (er_print).
The optimize subcommand uses the instrumentation data to
optimize target.
The coverage subcommand uses the instrumentation data to
produce a code coverage report. A summary textual report is
written to the current -o filename, and an analyzer experi-
ment is created with detailed counters for more precise
coverage information.
The collect subcommand combines an instrument subcommand, a
target run, and an analyze subcommand. It first instruments
target, then runs it, passing target_arguments at run time.
It then analyzes the resulting data if any of the analyze
options -e, -E, or -a is present on the command line.
target is the path name of the executable for which you want
to collect performance data. (bit will not search PATH to
find target.) target must be prepared by compiling with
-xbinopt=prepare. (If the Sun(TM) Studio 11 compiler is used
to compile the original target, optimization must also be
turned on using the -O or -xO[1-5] options.) In order to
see annotated source when viewing the experiment, target
should be compiled with the -g flag, and should not be
stripped.
NOTE: Experiments depend on Sun Studio
All features that mention experiments, analyzer, and/or
er_print are only available if Sun Studio 11 is
installed on the system. See the description of the -Ye
general option for more information. If Sun Studio 11
is not available, the output of bit analyze and bit
coverage is limited to textual reports.
Example Use
Typically one would use the commands in a sequence to
instrument, run, and analyze or optimize. To create a simple
experiment, use this sequence of commands:
cc -O -xbinopt=prepare *.c
bit instrument a.out
a.out.instr < input1
bit analyze -e a.out
Or to optimize a binary:
cc -O -xbinopt=prepare *.c
bit instrument a.out
a.out.instr < input1
bit optimize a.out
The first example above can be rewritten more simply using
the collect subcommand:
gcc -xbinopt=prepare *.c
bit collect -e a.out < input1
The above example illustrates that GCC for SPARC(R) Systems
(GCCfss) can be used to prepare binaries, as well as Sun
Studio 11. With the GCCfss compiler, however, no particular
optimization level is required to prepare the binary.
OPTIONS
If invoked with no arguments, print a usage message.
General options
-i {on| static| off}
Instrumentation data source. This option is not allowed
in the instrument subcommand.
on Use dynamic instrumentation data. This is the
default mode. The data is stored in a file called
target.instrdata by default (see -s).
off Do not use instrumentation data. This is only
applicable in the optimize subcommand.
static
Analyze an executable statically. Every instruc-
tion in the program is assumed to execute once.
This is only applicable in the analyze or collect
subcommands. In the collect subcommand, the
instrument and target-run phases are disabled.
-d directory
Place the experiment, the instrumented binary, and the
instrumentation data file in directory. By default,
use the current working directory.
-s suffix
Add suffix to target to form the instrumentation data file
name. The default is ".instrdata".
-n Print the commands that would be run without executing
them.
-V Print the current version. Do not examine further
arguments, and do no further processing.
-v Print the current version and verbose information about
the commands being executed.
-R Replace target. When using the instrument subcommand
(or during the instrumentation phase of the collect
subcommand), cache target and replace it with the
instrumented target.
When using other subcommands (or after the instrumented
target is run in the collect subcommand), replace target
with the cached original version before performing
the requested operation.
If -R is used when instrumenting, it must also be used
when analyzing, optimizing, or analyzing coverage.
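As an illustration of how -R pairs up across subcommands, the following shell sketch instruments a target in place, runs it under its usual name, and then analyzes it. The function name profile_in_place and the BIT override variable are hypothetical helpers and not part of bit; setting BIT=echo previews the bit command lines instead of running them.

```shell
#!/bin/sh
# Hypothetical helper, not part of bit. BIT=echo previews the bit commands.
profile_in_place() {
    target=$1   # path containing a slash, e.g. ./a.out (bit does not search PATH)
    input=$2    # file fed to the target on standard input

    # Cache the original and put the instrumented binary in its place.
    ${BIT:-bit} instrument -R "$target" || return 1

    # Run the target under its usual name; this writes the instrumentation data.
    "$target" < "$input" || return 1

    # -R is required again: the cached original is restored before analysis.
    ${BIT:-bit} analyze -R -a function=10 "$target"
}
```

A typical call would be: profile_in_place ./a.out input1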
-c Collect. Before doing the specified subcommand,
instrument and run the target. Ignored in the collect
and instrument subcommands.
-Ye,path_to_sun_studio
When using the version of bit distributed with GCCfss,
certain components of Sun Studio must be available in
order for bit to produce experiments. If Sun Studio is
installed in the standard place, i.e. /opt/SUNWspro,
bit will be able to find it. Otherwise, use this flag
to indicate where Sun Studio is installed. This option
is ignored in the bit instrument and bit optimize sub-
commands.
Instrument options
-m {on| off}
-m on means instrument for multithreading. Any mul-
tithreaded application should be instrumented with this
option. The default is on. Turning it off for single-
threaded applications may result in faster instrumenta-
tion runs.
-b suffix
Add suffix to target to form the instrumented binary name.
The default is ".instr".
Optimize options
These options are valid only for the optimize subcommand.
-O{0| 1| 2}
Optimize target. The optimized target overwrites the
original. At level 0, no optimizations are performed.
At level 1, code reordering optimizations are performed. At
level 2, data-flow information is constructed and more
aggressive optimizations, such as inlining and
address-related optimizations, are performed.
The default optimization level is 1 when the optimize
subcommand is used. There is no default when the col-
lect subcommand is used; a -On must be given to turn on
optimization.
-f Finalize the output binary so that no more binary
optimizations may be performed.
-Q{y| n}
If -Qy is used, identification information is added to
the output binary. If -Qn is used, this information is
not added. -Qy is the default.
-xinline[=v[,v]...]
where v is one of %auto, func_name, or no%func_name.
Inline only those functions specified in the list. The
list consists of either a comma-separated list of
function names, or a comma-separated list of
no%func_name values, or the value %auto. If
no%func_name is specified, do not inline func_name. If
%auto is specified, attempt to automatically inline
functions.
Coverage options
-F ignore_function
For code coverage purposes, ignore blocks that contain
a call to ignore_function.
Analyze options
These options are valid in the analyze, coverage, and col-
lect subcommands. In the collect subcommand, -e, -E, and -a
turn on analysis. If none of these options are present, any
other analyze options are ignored.
At least one -e or -E option must be provided in order to
create an experiment. As noted above, Sun Studio and the
SPOT add-on package are necessary in order to produce an
experiment. All experiment-related options are ignored if
bit cannot find the necessary components of Sun Studio.
-o filename
If filename does not end in ".er", write textual
reports (see -a report below) to filename. Multiple -o
filename options may be given; each one affects the
destination of subsequent -a report options on the com-
mand line. Default is standard out.
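Because each -o filename applies to the -a report options that follow it, one invocation can route different reports to different files. The sketch below shows one such command line; the function name split_reports, the output file names, and the BIT override variable are illustrative assumptions and not part of bit.

```shell
#!/bin/sh
# Hypothetical helper, not part of bit. BIT=echo previews the command line.
split_reports() {
    target=$1
    # Each -o sets the destination for the -a reports that follow it:
    #   functions.txt receives the function and bbc reports,
    #   branches.txt receives the branch report.
    ${BIT:-bit} analyze \
        -o functions.txt -a function=10 -a bbc=20 \
        -o branches.txt -a branch=20 \
        "$target"
}
```

A typical call would be: split_reports a.out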
-o experiment-name.er
Use experiment-name as the name of the experiment to be
recorded. Only one -o experiment-name.er option is
allowed on the command line.
If -o experiment-name.er is not specified, and experiment
generation is requested with the -e and/or -E
options, record an experiment with a name in the form
stem.n.er, where stem is a string, and n is a number.
If a -g argument is given, use the string appearing
before the .erg suffix in the group name as the stem
prefix; if no -g argument is given, set the stem prefix
to "test".
If the name is not specified in the form stem.n.er,
and the given name is in use, print an error message
and do not generate an experiment. If the name is of
the form stem.n.er, and the name is in use, record the
experiment under a name corresponding to the first
available value of n that is not in use. Issue a warn-
ing if the name is changed.
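A script that needs to predict which stem.n.er name will be used can mimic the rule described above. The sketch below is only an approximation: the helper name next_experiment_name is hypothetical, and starting the search at n=1 is an assumption, since this page does not state the first value of n.

```shell
#!/bin/sh
# Hypothetical helper: print the first stem.n.er name not already in use.
# Assumption: numbering starts at 1 (not stated in this man page).
next_experiment_name() {
    stem=${1:-test}   # default stem when no -g group is given
    dir=${2:-.}       # directory that holds the experiments
    n=1
    while [ -e "$dir/$stem.$n.er" ]; do
        n=$((n + 1))
    done
    printf '%s.%d.er\n' "$stem" "$n"
}
```

A typical call would be: next_experiment_name test .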
-e Create an experiment with simulated hardware counters
representing function count, instruction executed
count, and instruction annulled count.
A bit experiment also contains summary data describing
the execution frequency of various instructions in the
run. The data is shown in response to the ifreq com-
mand in er_print, and on the Inst.Freq. tab in the
Analyzer.
-E filterspec
Generate a custom counter in the experiment, which will
be viewed in its own column in analyzer or er_print.
See the FILTERSPECS section for more information. Any
number of -E options can be given. Each unique -E
option will produce one counter in the experiment.
-C comment
Put the comment, either a single token, or a quoted
string, into the experiment. Up to ten comments may be
provided.
-g group_name
Consider the experiment to be part of experiment group
group_name. The group_name string must end in ".erg";
if not, report an error and do not create the experi-
ment.
-A option
Control whether or not load-objects used by the target
process should be copied into the recorded experiment.
The allowed values of option are:
Value   Meaning
on      Archive load objects into the experiment.
off     Do not archive load objects into the experiment.
copy    Copy and archive load objects into the experiment.
If experiments will be copied to, or read on, a different
machine, specify -A copy. Note that doing so does not
copy any sources or object files. It is the
responsibility of the user to ensure that those files
are accessible on the machine to which the experiment is
copied.
-a report
Write a textual (ASCII) report to the current output
filename (see -o filename above). In any of these
reports, if the optional argument (=N or =function) is
not given, or if a limit of 0 is given, the report covers
the whole program. Available reports are:
ifreq[=N]
Instruction frequency. Print a profile of instruc-
tion execution counts for the sum of the top N hot
functions, in descending order of frequency.
Example:
bit analyze -a ifreq a.out | head -13
Instruction frequencies for whole program
Instruction          Executed     (%)
TOTAL            169067648498 (100.0)
float ops              170346 (  0.0)
float ld st            170346 (  0.0)
load store        36788000338 ( 21.7)
load              25144202260 ( 14.8)
store             11643798078 (  6.8)
-------------------------------------------
Instruction          Executed     (%)  Annulled  In Delay Slot
add               16935512560 ( 10.0)      2992     3112858420
br                16762242816 (  9.9)         0              0
sll               14368909396 (  8.4)        16      916733870
subcc             13842547720 (  8.1)         0     1938670930
ifreq=<function>
Prints an instruction frequency breakdown for the
named function.
cc[=N]
Caller-callee report. Prints the top N hottest
caller-callee edges. JMPL's (dynamic calls) are
indicated by the function name "**INDIRECT
CALL**".
Example:
bit analyze -a cc=9 a.out
Top 9 caller-callee edges
    Count  Caller           ---> Callee
563227968  compress_block   ---> send_bits
397429280  deflate          ---> ct_tally
263338416  deflate_fast     ---> ct_tally
165937792  deflate_fast     ---> longest_match
  5842268  build_tree       ---> bi_reverse
  2805034  send_tree        ---> send_bits
   313216  send_all_trees   ---> send_bits
    82828  huft_build       ---> malloc
    82828  inflate_dynamic  ---> free
cc=<function>
Prints a list of callers in frequency order, then
the given function, followed by a frequency-
ordered list of callees. All calls through a jmpl
are summed and attributed to "unknown".
Example:
bit analyze -a cc=deflate a.out
Callers and callees of deflate
Callers:    Count
               15  zip
               55  **unknown**
------>        70  deflate
Callees:    Count
        596143936  ct_tally
            18423  fill_window
            18213  flush_block
                6  deflate_fast
bbc[=N]
Basic Block Count. Prints a list of the top N hot
basic blocks. If the block happens to be a func-
tion entry point, the function name is printed.
The listing includes the PC of the first instruc-
tion of the block and the number of instructions
in the block.
Example:
bit analyze -a bbc=6 a.out
Basic Block Counts for top 6 blocks
    Count  PC           #Instrs  Function name
991151488  0x10000e940       24  ct_tally
991151488  0x10000e9cc        2
991151488  0x10000e9dc       16
985851840  0x10000e9a0       11
966297600  0x10000e9d8        1
867284096  0x10000e9d4        1
bbc=<function>
Basic Block Count for all blocks in a function.
branch[=N]
Branch taken/not-taken report. For the top N hot
branches, print branch statistics, including
branch direction (Forward or Backward), total execution
count, taken and not-taken counts and percentages,
and an indication of whether the
compiler correctly set the prediction bit.
Example:
bit analyze -a branch=6 a.out
Branch taken/not taken report for top 6 branches
PC         Dir  %Taken  %Not   Compiler    Trip Cnt   Taken      etc...
                        Taken  Prediction
                               Correct?
10000e998  F      0.5%  99.5%  Y           991151488    5299648  ..
10000e9cc  F     12.5%  87.5%  Y           991151488  123867416  ..
100003668  F     49.4%  50.6%  Y           849549760  420060832  ..
100009484  F      1.4%  98.6%  Y           834797568   11554240  ..
1000094d4  F      0.2%  99.8%  Y           834797568    1372160  ..
1000094e8  F      0.6%  99.4%  Y           834797568    5033466  ..
branch=<function>
Print branch taken/not-taken information for all
branches in <function>.
function[=N]
Function Count. Prints a list of the top N hot
functions, along with their execution counts.
Example:
bit analyze -a function=3 a.out
Function Counts for top 3 functions
    Count  PC           Function name
847659008  0x1000fc7a0  __1cKggSpectrumDSet6Mf_v_
297246944  0x10013a100  __1cHmy_rand6F_l_
216990656  0x100134880  __1cKmrMaterial_splat__
function=<function>
Print the execution counts of the given func-
tion.
dis[=N]
Print a disassembly of the top N functions, with
the raw execution count for each instruction.
Example:
bit analyze -a dis=2 a.out
Disassembly for top 2 routines
Disassembly for routine __1cKggSpectrumDSet6Mf_v_
ROUTINE: __1cKggSpectrumDSet6Mf_v_ FREQUENCY: 847659008.0
BLOCK: __1cKggSpectrumDSet6Mf_v_: FREQUENCY: 847659008.0 PC: 0x1000fc7a0
[ 847659008.0] 0x1000fc7a0: st B[B%o0, B#sint=0], B%f3
[ 847659008.0] 0x1000fc7a4: st B[B%o0, B#sint=4], B%f3
[ 847659008.0] 0x1000fc7a8: st B[B%o0, B#sint=8], B%f3
[ 847659008.0] 0x1000fc7ac: st B[B%o0, B#sint=12], B%f3
[ 847659008.0] 0x1000fc7b0: st B[B%o0, B#sint=16], B%f3
[ 847659008.0] 0x1000fc7b4: st B[B%o0, B#sint=20], B%f3
[ 847659008.0] 0x1000fc7b8: st B[B%o0, B#sint=24], B%f3
[ 847659008.0] 0x1000fc7bc: jmpl B[B%o7, B#sint=8], B%g0
[ 847659008.0] 0x1000fc7c0: st B[B%o0, B#sint=28], B%f3
Disassembly for routine __1cHmy_rand6F_l_
ROUTINE: __1cHmy_rand6F_l_ FREQUENCY: 297246944.0
BLOCK: __1cHmy_rand6F_l_: FREQUENCY: 297246944.0 PC: 0x10013a100
[ 297246944.0] 0x10013a100: sethi B#sint=1048576, B%g5
[ 297246944.0] 0x10013a104: sethi B#sint=126976, B%o4
[ 297246944.0] 0x10013a108: or B%g5, B#sint=537, B%o0
[ 3300639.0] 0x10013a1d4: br@(a),B%icc B%disp22($LABEL___1cHmy_rand6F_l__7_6280_10013a198)
.
. (long function shortened for example)
.
[ 3300639.0] 0x10013a1d8: sra,x B%o5, B#sint=8, B%g2
dis=<function>
Print the disassembly listing of the named func-
tion, with execution counts.
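The textual reports are line-oriented and can be post-processed with standard tools. As a sketch, the filter below sums the counts of a cc[=N] report per callee, assuming the "Count Caller ---> Callee" line layout shown in the cc example above. The function name callee_totals is hypothetical and not part of bit.

```shell
#!/bin/sh
# Hypothetical helper: read a cc[=N] report on stdin and print the total
# call count per callee, largest first.
callee_totals() {
    awk '
        # Data lines look like: COUNT CALLER ---> CALLEE
        $1 ~ /^[0-9]+$/ && (i = index($0, " ---> ")) > 0 {
            callee = substr($0, i + 6)
            sub(/^[ \t]+/, "", callee)
            sub(/[ \t]+$/, "", callee)
            total[callee] += $1
        }
        END {
            for (callee in total)
                printf "%.0f\t%s\n", total[callee], callee
        }
    ' | sort -rn
}
```

A typical call would be: bit analyze -a cc a.out | callee_totals
For the nine edges in the cc=9 example, this attributes 660767696 calls to ct_tally and 566346218 to send_bits.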
FILTERSPECS
This syntax is used to produce a custom analysis column for
er_print or analyzer. You can use many of these flags in a
single invocation of bit to produce multiple custom columns
in the experiment.
----------------------------------------
Spec for filter parameters to bit flags.
Format:
-E filterspec
filterspec := element [ ':' element ... ]
element := instrselector | limiter | metricspecifier
instrselector := instr | instrgroup
instr := <lowercase mnemonic of a SPARC instruction>
instrgroup := 'BR' | 'LD' | 'ST' | 'CALL' | 'JUMP' | 'SAVE'
| 'RESTORE' | 'CMP' | 'BA' | 'BN' | 'CBR' | 'ICALL' |
'SWITCH'
limiter := positive_limiter | 'n' positive_limiter
positive_limiter := 'ds' | 'float'
metricspecifier := metric ['%'] | 'targetmark'
metric := pmetric | 'n' pmetric
pmetric := 'executed' | 'annul' | 'taken' | 'correct' |
'target'
NOTES:
1. All instrselector elements are logically OR'd to
produce a pool of instructions. Each of the limiters is
logically AND'd against the result.
2. If no instrselector is given, all instructions are
selected.
3. The default metricspecifier is 'executed'
4. Only one metricspecifier is allowed.
5. Only one limiter is allowed.
6. The "%" metricspecifier calls out the frequency of
the specified metric vs. the frequency of the contain-
ing block.
7. 'target' is the sum of all counts on incoming branch
edges to the instruction. 'ntarget' is
the fallthrough count.
8. targetmark prints the number of branch edges coming
in to the instruction. The 'n' or '%' modifiers are not
allowed.
Examples:
LD:ST:ds will display the issue count for every load or
store in the program that happens to be placed in a delay
slot.
BR:ds will produce a count of zero for every instruction
because a branch cannot be in a delay slot.
BR:ntaken will produce a count showing how many times each
branch was not taken.
BR:correct% will show how often the compiler branch predic-
tion was correct as a percentage.
nop:nds produces counts for all nops that are not in delay
slots.
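A filterspec can be checked against the grammar above before it is passed to -E. The following is a minimal sketch rather than bit's own parser: the function name check_filterspec is hypothetical, and any lowercase alphanumeric element that is not a limiter or metric is simply assumed to be an instruction mnemonic without being looked up in the mnemonic list.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the argument follows the filterspec grammar.
# The body runs in a subshell, so the IFS and noglob changes stay local.
check_filterspec() (
    limiters=0
    metrics=0
    IFS=:
    set -f
    set -- $1                      # split the spec into elements on ':'
    [ $# -gt 0 ] || exit 1
    for element in "$@"; do
        case $element in
            ds|float|nds|nfloat)
                limiters=$((limiters + 1)) ;;
            targetmark)
                metrics=$((metrics + 1)) ;;
            executed|annul|taken|correct|target|\
            nexecuted|nannul|ntaken|ncorrect|ntarget|\
            executed%|annul%|taken%|correct%|target%|\
            nexecuted%|nannul%|ntaken%|ncorrect%|ntarget%)
                metrics=$((metrics + 1)) ;;
            BR|LD|ST|CALL|JUMP|SAVE|RESTORE|CMP|BA|BN|CBR|ICALL|SWITCH)
                ;;                 # instruction group
            ''|*[!a-z0-9]*)
                exit 1 ;;          # empty element or unexpected characters
            *)
                ;;                 # assumed mnemonic; not checked against the list
        esac
    done
    # Notes 4 and 5: at most one metricspecifier and at most one limiter.
    [ "$limiters" -le 1 ] && [ "$metrics" -le 1 ]
)
```

A typical use is to validate specs before a run, for example:
check_filterspec 'BR:ntaken' && bit collect -E 'BR:ntaken' -E 'BR:correct%' a.out < input1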
Mnemonics for instr specifier
add addc addcc addccc alignaddr
alignaddrl and andcc andn andncc
array16 array32 array8 bitextract bmask
bpr br bshuffle call casa
casxa done edge16 edge16l edge16ln
edge16n edge32 edge32l edge32ln edge32n
edge8 edge8l edge8ln edge8n fabsd
fabsq fabss faddd faddq fadds
faligndata fand fandnot1 fandnot1s fandnot2
fandnot2s fands fbr fchksm16 fcmpd
fcmped fcmpeq fcmpeq16 fcmpeq32 fcmpes
fcmpgt16 fcmpgt32 fcmple16 fcmple32 fcmpne16
fcmpne32 fcmpq fcmps fdivd fdivq
fdivs fdmulq fdtoi fdtoq fdtos
fdtox fexpand fitod fitoq fitos
flcmpd flcmps flush flushw fmean16
fmovd fmovq fmovrd fmovrq fmovrs
fmovs fmul8sux16 fmul8ulx16 fmul8x16 fmul8x16al
fmul8x16au fmuld fmuld16x16 fmuld8sux16 fmuld8ulx16
fmulq fmuls fnand fnands fnegd
fnegq fnegs fnor fnors fnot1
fnot1s fnot2 fnot2s fone fones
for fornot1 fornot1s fornot2 fornot2s
fors fpack16 fpack32 fpackfix fpadd16
fpadd16s fpadd32 fpadd32s fpadds16 fpadds16s
fpadds32 fpadds32s fpmerge fpmovc16 fpmovc32
fpsub16 fpsub16s fpsub32 fpsub32s fpsubs16
fpsubs16s fpsubs32 fpsubs32s fqtod fqtoi
fqtos fqtox fshl16 fshl32 fshlas16
fshlas32 fshra16 fshra32 fshrl16 fshrl32
fsmuld fsqrtd fsqrtq fsqrts fsrc1
fsrc1s fsrc2 fsrc2s fstod fstoi
fstoq fstox fsubd fsubq fsubs
fxnor fxnors fxor fxors fxtod
fxtoq fxtos fzero fzeros illtrap
jmpl ld lda ldd ldda
ldq ldqa ldsb ldsba ldsh
ldsha ldstub ldstuba ldsw ldswa
ldub lduba lduh lduha lduw
lduwa ldx ldxa lzd membar
mov movr mulscc mulx nop
or orcc orn orncc pdist
popc prefetch prefetcha rd rdpr
restore restored retry return save
saved sbshuffle sdiv sdivcc sdivx
sethi sfabss sfadds sfcmpseq sfcmpsgt
sfcmpsle sfcmpsne sfitos sfmuls sfnegs
sfstoi sfsubs shutdown siam sir
sll smul smulcc sra srl
st sta stb stba stbar
std stda sth stha stq
stqa stw stwa stx stxa
sub subc subcc subccc swap
swapa taddcc taddcctv trap tsubcc
tsubcctv udiv udivcc udivx umul
umulcc wr wrpr xnor xnorcc
xor xorcc
CAVEATS
GCC-compiled Code With Exceptions
If GCC for SPARC(R) Systems (GCCfss) is used to compile
the prepared binary, and exception handling is enabled,
bit will refuse to instrument or optimize the prepared
binary.
Only Prepared Code is Counted
bit can only instrument, analyze, or optimize code
which has been prepared by the compiler by compiling it
with -xbinopt=prepare, and an appropriate optimization
level.
The Sun Studio compilers require that -O or -xO[1-5]
options be used. GCCfss has no specific optimization
requirement.
Specifically excluded from instrumentation are assembly
language modules, functions which contain "asm" state-
ments or .il templates, C++ template functions, and
modules compiled with -xF.
Only the code which is linked into the executable is
instrumented. Shared libraries and dynamic libraries
are excluded. Extremely small functions in prepared
code modules are not counted when called from
nonprepared code.
Some instructions may be overcounted in the face of
asynchronous events, for example if a signal handler
calls longjmp().
EXIT STATUS
The following exit values are returned:
0 Successful completion.
1 An error occurred.
SEE ALSO
analyzer(1), binopt(1), collect(1), er_archive(1),
er_bit(1), er_cp(1), er_export(1), er_mv(1), er_print(1),
er_rm(1), er_src(1), and the Performance Analyzer manual.