- -dalign
- Equivalent to -xmemalign=8s.
- -fast
- Is a macro that can be used as a starting point for tuning an executable for maximum runtime performance.
Modules that are compiled with
-fast must also be linked with -fast.
The -fast option is unsuitable for
programs intended to run on a
different target than the compilation machine. In such cases, follow
-fast with the appropriate -xtarget option. For example:
% gcc -fast -xtarget=ultra ...
Because -fast includes -xlibmil, math library exceptions cannot be reported by setting errno or by calling matherr(3M).
The -fast option is unsuitable for
programs that require strict
conformance to the IEEE 754 Standard.
-fast expands to: -O3 -ffast-math
-fns -fsimple=2 -ftrap=%none
-xalias_level=basic -xbuiltin=%all -xdepend -xlibmil -xlibmopt
-xprefetch=auto,explicit -xprefetch_level=1 -xtarget=native
Note - Some optimizations make
certain assumptions about program
behavior. If the program does not conform to these assumptions, the
application may crash or produce incorrect results. Please refer to the
description of the individual options to determine if your program is
suitable for compilation with -fast.
The optimizations performed by
these options may alter the behavior of
programs from that defined by the ISO C and IEEE standards. See the
description of the specific option for details.
-fast acts like a macro expansion
on the command line. Therefore, you
can override the optimization level and code generation option aspects
by following -fast with the desired optimization level or code
generation option. Compiling with the -fast -O2 pair is like compiling
with the -O3 -O2 pair. The latter specification takes precedence.
Do not use this option for
programs that depend on IEEE standard
exception handling; you can get different numerical results, premature
program termination, or unexpected SIGFPE signals.
- -fnonstd
- This option is a macro for -fns and
-ftrap=common.
- -fns[={no|yes}]
- Turns on the SPARC nonstandard
floating-point mode.
The default is -fns=no, the SPARC
standard floating-point mode. -fns is
the same as -fns=yes.
Optional use of =yes or =no
provides a way of toggling the -fns flag
following some other macro flag that includes -fns, such as -fast.
On some SPARC systems, the
nonstandard floating point mode disables
"gradual underflow," causing tiny results to be flushed to zero rather
than producing subnormal numbers. It also causes subnormal operands to
be replaced silently by zero. On those SPARC systems that do not support gradual underflow and subnormal numbers in hardware, use of this option can significantly improve the performance of some programs.
When nonstandard mode is enabled,
floating point arithmetic may produce
results that do not conform to the requirements of the IEEE 754
standard.
This option is effective only if used when compiling the main program.
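The difference between gradual underflow and flushing to zero can be observed from C. The sketch below is portable C99, not anything specific to -fns, and the function names are illustrative. Built in the default standard mode, half_of_min_normal returns a subnormal number; on the SPARC systems described above, a program built with -fns may get 0.0 instead.

```c
#include <float.h>
#include <math.h>
#include <stdbool.h>

/* True if x is subnormal (nonzero but smaller in magnitude than DBL_MIN). */
bool is_subnormal(double x)
{
    return fpclassify(x) == FP_SUBNORMAL;
}

/* Halves the smallest normal double at run time. With gradual underflow
 * (standard IEEE 754 mode) the result is the subnormal DBL_MIN / 2; a
 * mode that flushes tiny results to zero would return 0.0 instead. */
double half_of_min_normal(void)
{
    volatile double min_normal = DBL_MIN; /* volatile keeps the division at run time */
    return min_normal / 2.0;
}
```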
- -fsimple[=n]
- Allows the optimizer to make
simplifying assumptions concerning
floating-point arithmetic.
The compiler defaults to -fsimple=0. Specifying -fsimple without a value is equivalent to -fsimple=1.
If n is present, it must
be 0, 1, or 2.
- -fsimple=0
Permits no simplifying assumptions. Preserves strict IEEE 754 conformance.
- -fsimple=1
Allows conservative simplifications. The resulting code does not
strictly conform to IEEE 754, but numeric results of most programs are
unchanged.
With -fsimple=1, the optimizer
can assume the following:
- IEEE 754 default rounding/trapping modes do
not change after process initialization.
- Computations producing no visible result
other than potential floating point exceptions may be deleted.
- Computations with Infinity or NaNs as
operands need not propagate NaNs to their results; for example, x*0 may
be replaced by 0.
- Computations do not depend on sign of zero.
With -fsimple=1, the optimizer is not allowed to optimize completely without regard to roundoff or exceptions. In particular, a floating-point computation cannot be replaced by one that produces different results with rounding modes held constant at runtime.
- -fsimple=2
The compiler attempts aggressive floating point optimizations that may
cause many programs to produce different numeric results due to changes
in rounding. For example, -fsimple=2 permits the optimizer to replace
all computations of x/y in a given loop with x*z, where x/y is
guaranteed to be evaluated at least once in the loop, z=1/y, and the
values of y and z are known to have constant values during execution of
the loop.
Even with -fsimple=2, the
optimizer is not permitted to introduce a
floating point exception in a program that otherwise produces none.
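The following C99 functions are hypothetical illustrations of what the two nonzero levels permit. times_zero shows why replacing x*0 by 0 (allowed at -fsimple=1) is not an IEEE 754 identity, and multiply_by_reciprocal writes out by hand the rewrite of x/y in a loop that -fsimple=2 allows the optimizer to make.

```c
#include <math.h>
#include <stddef.h>

/* -fsimple=1 allows x*0 to be replaced by 0. Under strict IEEE 754
 * (-fsimple=0) that is not an identity: Inf*0 is NaN and -1*0 is -0. */
double times_zero(double x)
{
    return x * 0.0;
}

/* The loop as written in the source: one division per iteration. */
void divide_all(const double *x, double *out, size_t n, double y)
{
    size_t i;
    for (i = 0; i < n; i++)
        out[i] = x[i] / y;
}

/* The form -fsimple=2 permits: z = 1/y is computed once and each
 * division becomes a multiplication. Results can differ in the last
 * bit unless 1/y is exact, for example when y is a power of two. */
void multiply_by_reciprocal(const double *x, double *out, size_t n, double y)
{
    const double z = 1.0 / y;
    size_t i;
    for (i = 0; i < n; i++)
        out[i] = x[i] * z;
}
```

When 1/y is exactly representable, x*z and x/y denote the same real value and therefore round identically; for other values of y the two loops may disagree in the last bit, which is the kind of change in numeric results described above.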
- -ftrap=t[,t...]
- Sets the IEEE trapping mode in effect at startup but does not install a SIGFPE handler. You can use ieee_handler(3M) or fex_set_handling(3M) to simultaneously enable traps and install a SIGFPE handler. If you specify more than one value, the list is processed sequentially from left to right.
t can be one of the following values:
- [no%]division
[Do not] Trap on division by zero.
- [no%]inexact
[Do not] Trap on inexact result.
- [no%]invalid
[Do not] Trap on invalid operation.
- [no%]overflow
[Do not] Trap on overflow.
- %all
Trap on all of the above.
- %none
Trap on none of the above.
- common
Trap on invalid, division by zero, and overflow.
- -KPIC
- Same as -xcode=pic32.
- -Kpic
- Same as -xcode=pic13.
- -mt
- Macro option that expands to -D_REENTRANT -lthread.
- -native
- Same as -xtarget=native.
- -Xc -xc99
- -Xc -xc99=all
- Same as the GCC option -std=c99.
- -Xc -xc99=none
- -Xc
- Same as the GCC option -std=iso9899:199409.
- -xalias_level[=l]
- The compiler uses the -xalias_level option to
determine what assumptions it can make in order to perform optimizations using
type-based alias-analysis. This option places the indicated alias level into
effect for the translation units being compiled.
If you do not specify the -xalias_level option, the compiler assumes -xalias_level=any. If you specify -xalias_level without a value, the default is -xalias_level=layout.
If you specify -xalias_level but your program does not adhere to all of the assumptions and restrictions about aliasing described for the chosen alias level, the behavior of your program is undefined.
The value of l is one of the following:
- any
The compiler assumes that all memory references can alias at this level.
There is no type-based alias analysis at the level of -xalias_level=any.
- basic
If you use the -xalias_level=basic option, the compiler assumes that
memory references that involve different C basic types do not alias each
other. The compiler also assumes that references to all other types can
alias each other as well as any C basic type. The compiler assumes that
references using char * can alias any other type.
For example, at the -xalias_level=basic
level, the compiler assumes that a pointer variable of type int * is not going
to access a float object. Therefore it is safe for the compiler to perform
optimizations that assume a pointer of type float * will not alias the same
memory that is referenced with a pointer of type int *.
- weak
If you use the -xalias_level=weak option, the compiler assumes that any
structure pointer can point to any structure type.
Any structure or union type that contains a reference to any type that is either referenced in an expression in the source being compiled or referenced from outside the source being compiled must be declared prior to the expression in the source being compiled.
You can satisfy this restriction
by including all the header files of a program that contain types that
reference any of the types of the objects referenced in any expression
of the source being compiled.
At the level of -xalias_level=weak,
the compiler assumes that memory references that involve different C basic
types do not alias each other. The compiler assumes that references using
char * alias memory references that involve any other type.
- layout
If you use the -xalias_level=layout option, the compiler assumes that
memory references that involve types with the same sequence of types in
memory can alias each other.
The compiler assumes that two
references with types that do not look the
same in memory do not alias each other. The compiler assumes that any
two memory accesses through different struct types alias if the initial
members of the structures look the same in memory. However, at this
level, you should not use a pointer to a struct to access some field of
a dissimilar struct object that is beyond any of the common initial
sequence of members that look the same in memory between the two
structs. This is because the compiler assumes that such references do
not alias each other.
At the level of
-xalias_level=layout the compiler assumes that memory
references that involve different C basic types do not alias each
other.
The compiler assumes that references using char * can alias memory
references involving any other type.
- strict
If you use the -xalias_level=strict option, the compiler assumes that memory references involving types, such as structs or unions, that are the same when tags are removed can alias each other. Conversely, the compiler assumes that memory references involving types that are not the same even after tags are removed do not alias each other.
However, any structure or
union type that contains a reference to any
type that is part of any object referenced in an expression in the
source being compiled, or is referenced from outside the source being
compiled, must be declared prior to the expression in the source being
compiled.
You can satisfy this
restriction by including all the header files of a
program that contain types that reference any of the types of the
objects referenced in any expression of the source being compiled. At
the level of -xalias_level=strict the compiler assumes that memory
references that involve different C basic types do not alias each
other.
The compiler assumes that references using char * can alias any other
type.
- std
If you use the -xalias_level=std option, the compiler assumes that types and tags must be the same to alias; however, references using char * can alias any other type. This rule is the same as the restrictions on the dereferencing of pointers found in the 1999 ISO C standard.
Programs that properly use this rule are very portable and should see good performance gains under optimization.
- strong
If you use the -xalias_level=strong option, the same restrictions apply
as at the std level, but additionally, the compiler assumes that
pointers of type char * are used only to access an object of type char.
Also, the compiler assumes that there are no interior pointers. An
interior pointer is defined as a pointer that points to a member of a
struct.
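The following C fragments are hypothetical illustrations, not taken from the compiler documentation, of code written to respect three of the levels above. The function and type names are invented for the example, and float_bits assumes a 32-bit IEEE 754 float.

```c
#include <stdint.h>
#include <string.h>

/* basic: int and float are different C basic types, so the store
 * through fp is assumed not to change *ip and the compiler may return
 * 1 without reloading *ip. Calling this with fp == (float *)ip would
 * break that assumption, and the behavior would be undefined. */
int store_and_reload(int *ip, float *fp)
{
    *ip = 1;
    *fp = 2.0f;
    return *ip;
}

/* std: a cast such as *(uint32_t *)&f reads a float object through a
 * uint32_t lvalue, which the std level assumes never happens. memcpy
 * copies the object representation instead, which stays within the
 * rule. Assumes sizeof(float) == sizeof(uint32_t). */
uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* strong: the compiler assumes there are no interior pointers, so avoid
 * patterns such as  int *p = &c->hits; (*p)++;  and access members
 * through the struct pointer itself. */
struct counter {
    int hits;
    int misses;
};

void record(struct counter *c, int hit)
{
    if (hit)
        c->hits++;
    else
        c->misses++;
}
```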
- -xarch=isa
- Specify instruction set architecture (ISA).
The default architecture for which the GCC for SPARC Systems compilers produce code is v8plus (for the UltraSPARC® processor).
If you compile and link in
separate steps, make sure you specify the same value for -xarch in both steps.
The values accepted for isa are generic, generic64, native, native64, v8a, v8plus, v8plusa, v8plusb, v9, v9a, and v9b.
Note that although -xarch can be
used alone, it is part of the expansion of the -xtarget option and may
be used to override the -xarch value that is set by a specific -xtarget option.
The possible values of the keyword
isa are
- generic
Compile for good performance on most 32-bit systems.
This option uses the best
instruction set for good performance on most processors without major
performance degradation on any of them. With each new release, the
definition of "best" instruction set may be adjusted, if appropriate.
- generic64
Compile 64-bit object binaries for good performance on most 64-bit
platform architectures.
This option uses the best instruction set for good performance on Solaris™ operating systems with 64-bit kernels, without major performance degradation on any of them. With each new release, the definition of "best" instruction set may be adjusted, if appropriate. Currently, this is equivalent to -xarch=v9.
- native
Compile for good performance on this system.
The compiler chooses the appropriate setting for producing 32-bit binaries for the system on which the compiler is running.
- native64
Compile 64-bit object binaries for good performance on this system.
The compiler chooses the appropriate setting for producing 64-bit binaries for the system on which the compiler is running.
- v8a
Compile for the V8a version of the SPARC-V8 ISA.
By definition, V8a means the
V8 ISA, but without the fsmuld instruction.
This option enables the
compiler to generate code for good performance
on the V8a ISA.
- v8plus
Compile for the V8plus version of the SPARC-V9 ISA.
This is the default. By
definition, V8plus means the V9 ISA, but limited
to the 32-bit subset defined by the V8plus ISA specification, without
the Visual Instruction Set (VIS), and without other
implementation-specific ISA extensions.
This option enables the
compiler to generate code for good
performance on the V8plus ISA.
The resulting object code is in SPARC-V8+ ELF32 format and only executes in a Solaris UltraSPARC processor environment; it does not run on a V7 or V8 processor.
- v8plusa
Compile for the V8plusa version of the SPARC-V9 ISA.
By definition, V8plusa means
the V8plus architecture, plus the Visual
Instruction Set (VIS) version 1.0, and with UltraSPARC extensions.
This option enables the
compiler to generate code for good
performance on the UltraSPARC architecture, but limited to the
32-bit subset defined by the V8plus specification.
The resulting object code is in SPARC-V8+ ELF32 format and only executes in a Solaris UltraSPARC processor environment; it does not run on a V8 processor.
- v8plusb
Compile for the V8plusb version of the SPARC-V8plus ISA with UltraSPARC
III extensions.
Enables the compiler to
generate object code for the UltraSPARC
architecture, plus the Visual Instruction Set (VIS) version 2.0, and
with UltraSPARC III extensions.
The resulting object code is
in SPARC-V8+ ELF32 format and
executes only in a Solaris UltraSPARC III processor environment.
Compiling with this option uses the best instruction set for good
performance on the UltraSPARC III architecture.
- v9
Compile for the SPARC-V9 ISA.
Enables the compiler to
generate code for good performance on the V9
SPARC architecture.
The resulting .o object files
are in ELF64 format and can only be
linked with other SPARC-V9 object files in the same format.
The resulting executable can only be run on an UltraSPARC
processor running a 64-bit enabled Solaris operating system with
the 64-bit kernel.
-xarch=v9 is only available when compiling in a 64-bit enabled
Solaris environment.
- v9a
Compile for the SPARC-V9 ISA with UltraSPARC extensions.
Adds to the SPARC-V9 ISA the
Visual Instruction Set (VIS) and extensions
specific to UltraSPARC processors, and enables the compiler to generate
code for good performance on the V9 SPARC architecture.
The resulting .o object files
are in ELF64 format and can only be
linked with other SPARC-V9 object files in the same format.
The resulting executable can only be run on an UltraSPARC
processor running a 64-bit enabled Solaris operating system with
the 64-bit kernel.
-xarch=v9a is only available when compiling in a 64-bit enabled
Solaris operating system.
- v9b
Compile for the SPARC-V9 ISA with UltraSPARC III extensions.
Adds UltraSPARC III extensions
and VIS version 2.0 to the V9a version of
the SPARC-V9 ISA. Compiling with this option uses the best instruction
set for good performance in a Solaris UltraSPARC III processor environment.
The resulting object code is
in SPARC-V9 ELF64 format and can only
be linked with other SPARC-V9 object files in the same format.
The resulting executable can only be run on an UltraSPARC III
processor running a 64-bit enabled Solaris operating system with
the 64-bit kernel.
-xarch=v9b is only available when compiling in a 64-bit enabled
Solaris operating system.
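The v9, v9a, and v9b descriptions note that ELF64 objects can only be linked with other SPARC-V9 objects in the same format, so it can help for a source file to know which data model it is being compiled for. The hypothetical helper below assumes the usual Solaris conventions: ILP32 for 32-bit targets such as -xarch=v8plus and LP64 for 64-bit targets such as -xarch=v9.

```c
/* Hypothetical helper: reports the data model of the current translation
 * unit, based only on the sizes of int, long, and pointers. */
const char *data_model(void)
{
    if (sizeof(int) == 4 && sizeof(long) == 4 && sizeof(void *) == 4)
        return "ILP32"; /* 32-bit targets, e.g. -xarch=v8plus */
    if (sizeof(int) == 4 && sizeof(long) == 8 && sizeof(void *) == 8)
        return "LP64";  /* 64-bit targets, e.g. -xarch=v9 */
    return "other";
}
```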
- -xautopar
- Turns on automatic parallelization
for multiple processors.
Does dependence analysis (analyze loops for inter-iteration data
dependence) and loop restructuring. If optimization is not at -O2 or
higher, optimization is raised to -O2 and a warning is emitted.
Avoid -xautopar if you do your own
thread management.
To achieve faster execution, this
option requires a multiple processor
system. On a single-processor system, the resulting binary usually runs
slower.
To request a number of processors,
set the PARALLEL environment
variable. The default is 1.
- Do not request more processors than are
available.
- If N is the number of processors on the machine, then for a one-user, multiprocessor system, try PARALLEL=N-1.
If you use -xautopar and compile and link in one step, then linking automatically includes the microtasking library and the thread-safe C runtime library. If you use -xautopar and compile and link in separate steps, then you must also link with -xautopar.
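The dependence analysis that -xautopar performs distinguishes loops like the first function below, whose iterations are independent, from loops like the second. This is a hypothetical illustration; whether a particular loop is actually parallelized depends on the compiler's own analysis (for example, it must also establish that a and b do not overlap).

```c
#include <stddef.h>

/* No inter-iteration dependence: a[i] depends only on b[i], so the
 * iterations could run in any order or on different processors,
 * provided a and b do not overlap. */
void scale(double *a, const double *b, size_t n, double k)
{
    size_t i;
    for (i = 0; i < n; i++)
        a[i] = k * b[i];
}

/* Loop-carried dependence: iteration i reads the value that iteration
 * i - 1 just wrote, so the iterations cannot simply be divided among
 * processors. */
void running_sum(double *a, size_t n)
{
    size_t i;
    for (i = 1; i < n; i++)
        a[i] += a[i - 1];
}
```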
- -xbuiltin[=(%all|%none)]
- Use the -xbuiltin[=(%all|%none)] option when you want to improve the optimization of code that calls standard library functions. Many standard library functions, such as the ones defined in math.h and stdio.h, are commonly used by various programs. This option lets the compiler substitute intrinsic functions or inline system functions where profitable for performance. See the er_src(1) man page for an explanation of how to read compiler commentary in object files to determine for which functions the compiler actually makes a substitution.
However, these substitutions can
cause the setting of errno to become
unreliable. If your program depends on the value of errno, avoid this
option.
If you do not specify -xbuiltin, the default is -xbuiltin=%none, which means no functions from the standard libraries are substituted or inlined. If you specify -xbuiltin but do not provide any argument, the default is -xbuiltin=%all, which means the compiler substitutes intrinsics or inlines standard library functions as it determines the optimization benefit.
If you compile with -fast, then
-xbuiltin is set to %all.
Note - -xbuiltin only inlines
global functions defined in system
header files, never static functions defined by the user.
- -xcache[=c]
- Defines the cache properties for
use by the optimizer. c must be one
of the following:
- generic
- s1/l1/a1
- s1/l1/a1:s2/l2/a2
- s1/l1/a1:s2/l2/a2:s3/l3/a3
The si, li, ai are defined as follows:
- si
The size of the data cache at level i, in kilobytes
- li
The line size of the data cache at level i, in bytes
- ai
The associativity of the data cache at level i
Although this option can be used alone, it is part of the expansion of the -xtarget option; its primary use is to override a value supplied by the -xtarget option.
This option specifies the cache properties that the optimizer can use. It does not guarantee that any particular cache property is used.
Example: -xcache=16/32/4:1024/32/1 specifies the following:
Level 1 cache has:
- 16K bytes
- 32 bytes line size
- 4-way associativity
Level 2 cache has:
- 1024K bytes
- 32 bytes line size
- Direct mapping associativity
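The three numbers in each s/l/a triple also determine how many sets that cache level has, through the usual cache-geometry relation sets = (s × 1024) / (l × a). This is general cache arithmetic, not something -xcache computes or reports; the hypothetical helper below applies it to the example: the level 1 cache has 16384 / (32 × 4) = 128 sets, and the direct-mapped level 2 cache has 1048576 / 32 = 32768 sets.

```c
/* Number of sets in one cache level described by an s/l/a triple:
 * size in kilobytes, line size in bytes, and associativity. */
unsigned long cache_sets(unsigned long size_kb, unsigned long line_bytes,
                         unsigned long assoc)
{
    return (size_kb * 1024UL) / (line_bytes * assoc);
}
```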
- -xchip[=c]
- Specifies the target processor for
use by the optimizer.
c must be one of the
following: generic, ultra, ultra2, ultra2e, ultra2i, ultra3,
ultra3cu, ultra4.
Although this option can be used alone, it is part of the expansion of the -xtarget option; its primary use is to override a value supplied by the -xtarget option.
This option specifies timing
properties by specifying the target processor.
Some effects are:
- The ordering of instructions, that is, scheduling
- The way the compiler uses branches
- The instructions to use in cases where semantically equivalent alternatives are available
The accepted -xchip values for SPARC platforms are those listed above.
- -xcode[=v]
- Specify code address space.
v must be one of:
- abs32
Generate 32-bit absolute addresses. Code + data + bss size is limited to 2**32 bytes. This is the default on 32-bit architectures:
-xarch=(generic|v8a|v8plus|v8plusa|v8plusb)
- abs44
Generate 44-bit absolute addresses. Code + data + bss size is limited to 2**44 bytes. This is the default on 64-bit architectures:
-xarch=(v9|v9a|v9b)
- abs64
Generate 64-bit absolute addresses. Available only on 64-bit
architectures: -xarch=(v9|v9a|v9b)
- pic13
Generate position-independent code for use in shared libraries (small
model). Equivalent to -Kpic. Permits references to at most 2**11 unique
external symbols on 32-bit architectures, 2**10 on 64-bit
architectures.
The -xcode=pic13 command is
similar to -xcode=pic32, except that the
size of the global offset table is limited to 8 Kbytes.
- pic32
Generate position-independent code for use in shared libraries (large
model). Equivalent to -KPIC. Permits references to at most 2**30 unique
external symbols on 32-bit architectures, 2**29 on 64-bit
architectures.
Each reference to a global datum is generated as a dereference of a pointer in the global offset table. Each function call is generated in pc-relative addressing mode through a procedure linkage table. With this option, the global offset table spans the range of 32-bit addresses in those rare cases where there are too many global data objects for -xcode=pic13.
The default for SPARC and
UltraSPARC V9 (with -xarch=v9|v9a|v9b) is -xcode=abs44.
When building shared dynamic libraries, the default -xcode values of abs44 and abs32 will not work with -xarch=v9, v9a, or v9b, so an -xcode value must be given. Specify -xcode=pic13 or -xcode=pic32. There are two nominal performance costs with -xcode=pic13 and -xcode=pic32 on SPARC:
- A routine compiled with either -xcode=pic13 or -xcode=pic32 executes a few extra instructions upon entry to set a register to point at a table (_GLOBAL_OFFSET_TABLE_) used for accessing a shared library's global or static variables.
- Each access to a global or static variable involves an extra indirect memory reference through _GLOBAL_OFFSET_TABLE_. If the compile is done with -xcode=pic32, there are two additional instructions per global and static memory reference.
When considering the above costs, remember that the use of -xcode=pic13
and -xcode=pic32 can significantly reduce system memory requirements,
due to the effect of library code sharing. Every page of code in a
shared library compiled -xcode=pic13 or -xcode=pic32 can be shared by
every process that uses the library. If a page of code in a shared
library contains even a single non-pic (that is, absolute) memory
reference, the page becomes nonsharable, and a copy of the page must be
created each time a program using the library is executed.
The easiest way to tell whether or
not a .o file has been compiled with -xcode=pic13 or -xcode=pic32 is
with the nm command:
% nm file.o | grep _GLOBAL_OFFSET_TABLE_
U _GLOBAL_OFFSET_TABLE_
A .o file containing
position-independent code contains an unresolved external reference to
_GLOBAL_OFFSET_TABLE_, as indicated by the letter U.
To determine whether to use -xcode=pic13 or -xcode=pic32, check the size of the Global Offset Table (GOT) by using elfdump -c (see the elfdump(1) man page for more information) and look for the section header sh_name: .got. The sh_size value is the size of the GOT. If the GOT is less than 8,192 bytes, specify -xcode=pic13; otherwise, specify -xcode=pic32.
In general, use the following
guidelines to determine how you should use
-xcode:
- If you are building an executable, you should not use -xcode=pic13 or -xcode=pic32.
- If you are building an archive library only for linking into executables, you should not use -xcode=pic13 or -xcode=pic32.
- If you are building a shared library, start with -xcode=pic13 and, once the GOT size exceeds 8,192 bytes, use -xcode=pic32.
- If you are building an archive library for linking into shared libraries, use -xcode=pic32.
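The symbol limits quoted for pic13 and pic32 are consistent with dividing the GOT size by the size of one entry, assuming one pointer-sized GOT entry per external symbol (4 bytes on 32-bit architectures, 8 bytes on 64-bit ones): 8,192 / 4 = 2**11 and 8,192 / 8 = 2**10 for pic13, and 2**32 / 4 = 2**30 and 2**32 / 8 = 2**29 for pic32. The hypothetical helpers below check that arithmetic and encode the GOT-size guideline given above.

```c
/* Entries that fit in a GOT of got_bytes bytes, assuming one entry of
 * ptr_bytes bytes per external symbol. */
unsigned long long got_entries(unsigned long long got_bytes,
                               unsigned long long ptr_bytes)
{
    return got_bytes / ptr_bytes;
}

/* Guideline from the text: use pic13 while the .got section (the
 * sh_size reported by elfdump -c) is below 8,192 bytes, else pic32.
 * Returns 13 or 32. */
int pic_model_for_got(unsigned long long got_size)
{
    return got_size < 8192ULL ? 13 : 32;
}
```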
- -xdebugformat=dwarf
- Same as the GCC option -gdwarf-2.
- -xdepend[=yes|no]
- Analyzes loops for inter-iteration
data dependencies and does
loop restructuring.
Loop restructuring includes loop
interchange, loop fusion, scalar
replacement, and elimination of "dead" array assignments.
If you do not specify -xdepend,
the default is -xdepend=no which means
the compiler does not analyze loops for data dependencies. If you
specify -xdepend, but do not specify an argument, the compiler sets the
option to -xdepend=yes which means the compiler analyzes loops for data
dependencies.
Dependency analysis is also
included with -xautopar or -xparallel. The
dependency analysis is done at compile time.
Dependency analysis may help on
single-processor systems. However, if
you try -xdepend on single-processor systems, you should not use either
-xautopar or -xexplicitpar. If either of them is on, then the -xdepend
optimization is done for multiple-processor systems.
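Loop interchange, one of the restructurings listed above, can be pictured with a hypothetical pair of loops over a row-major C array. The compiler performs such a transformation itself when its dependence analysis shows it is safe; the second function below only shows the shape of the result. Because each element is updated independently, swapping the loops cannot change the values computed.

```c
#define ROWS 4
#define COLS 3

/* Before interchange: the inner loop runs down a column, so successive
 * accesses are COLS elements apart in memory. */
void scale_column_order(double m[ROWS][COLS], double k)
{
    int i, j;
    for (j = 0; j < COLS; j++)
        for (i = 0; i < ROWS; i++)
            m[i][j] *= k;
}

/* After interchange: the inner loop walks consecutive elements, which
 * makes better use of each cache line. */
void scale_row_order(double m[ROWS][COLS], double k)
{
    int i, j;
    for (i = 0; i < ROWS; i++)
        for (j = 0; j < COLS; j++)
            m[i][j] *= k;
}
```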
- -xhwcprof
- Enables compiler support for
hardware counter-based profiling.
When -xhwcprof is enabled, the compiler generates information that helps tools match hardware-counter data reference and miss events with associated instructions. Corresponding data types and structure members may also be identified in conjunction with symbolic information (produced with -g). This information can be useful in performance analysis; it is not easily obtained from profiles based on code addresses, source statements, or routines.
You can compile a specified set of
object files with -xhwcprof. However,
-xhwcprof is most useful when applied to all object files in the
application. This will provide coverage to identify and correlate all
memory references distributed in the application's object files.
If you are compiling and linking in separate steps, use -xhwcprof at link time as well. Future extensions to -xhwcprof may require its use at link time.
An instance of -xhwcprof=enable or
-xhwcprof=disable overrides all
previous instances of -xhwcprof in the same command line.
-xhwcprof is disabled by default.
Specifying -xhwcprof without any arguments is equivalent to -xhwcprof=enable.
-xhwcprof requires that
optimization be turned on and that the debug
data format be set to DWARF (-xdebugformat=dwarf).
The combination of -xhwcprof and
-g increases compiler temporary file
storage requirements by more than the sum of the increases due to
-xhwcprof and -g specified alone.
The following command compiles
example.c and specifies support for
hardware counter profiling and symbolic analysis of data types and
structure members using DWARF symbols:
example% gcc -c -O -xhwcprof -g -xdebugformat=dwarf example.c
- -xinline=list
- The format of the list for
-xinline is as follows:
[{%auto,func_name,no%func_name}[,{%auto,func_name,no%func_name}]...]
-xinline tries to inline only those functions specified in the optional list. The list is either empty or consists of a comma-separated list of func_name, no%func_name, or %auto, where func_name is a function name. -xinline only has an effect at -O2 or higher.
- %auto
Specifies that the compiler is to attempt to automatically inline all
functions in the source file. %auto only takes effect at -xO4 or higher
optimization levels. %auto is silently ignored at -xO3 or lower
optimization levels.
- func_name
Specifies that the compiler is to attempt to inline the named function.
- no%func_name
Specifies that the compiler is not to inline the named function.
- no value
Specifies that the compiler is not to attempt to inline any functions in the source files.
The list of values accumulates from left to right. So for a specification of -xinline=%auto,no%foo, the compiler attempts to inline all functions except foo. For a specification of -xinline=bar,myfunc,no%bar, the compiler tries to inline only myfunc.
A function is not inlined if any
of the following conditions apply. No
warning is issued.
- Optimization is less than -O2.
- The routine cannot be found.
- Inlining the routine does not look practicable to
the optimizer.
- The source for the routine is not in the file
being compiled.
If you specify multiple -xinline options on the command line, they do
not accumulate. The last -xinline on the command line specifies what
functions the compiler attempts to inline.
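The left-to-right accumulation rule can be modeled with a small function that decides whether one function name ends up on the candidate list. This is a simplified, hypothetical model, not the compiler's implementation: it ignores the optimization level (so %auto is treated as making every function a candidate) and the conditions listed above under which a candidate is still not inlined.

```c
#include <stdbool.h>
#include <string.h>

/* Returns true if fn is an inlining candidate after processing the
 * comma-separated list from left to right. */
bool xinline_candidate(const char *list, const char *fn)
{
    bool candidate = false; /* empty list: inline nothing */
    size_t fn_len = strlen(fn);
    const char *p = list;

    while (*p != '\0') {
        const char *end = strchr(p, ',');
        size_t len = end ? (size_t)(end - p) : strlen(p);

        if (len == 5 && strncmp(p, "%auto", 5) == 0) {
            candidate = true;  /* every function becomes a candidate */
        } else if (len > 3 && strncmp(p, "no%", 3) == 0) {
            if (len - 3 == fn_len && strncmp(p + 3, fn, fn_len) == 0)
                candidate = false; /* no%fn removes fn */
        } else if (len == fn_len && strncmp(p, fn, fn_len) == 0) {
            candidate = true;  /* fn adds fn */
        }

        if (end == NULL)
            break;
        p = end + 1;
    }
    return candidate;
}
```

With this model, "%auto,no%foo" keeps every name except foo, and "bar,myfunc,no%bar" leaves only myfunc, matching the two examples in the text.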
- -xipo[=a]
- Replace a with 0, 1, or 2.
-xipo without any arguments is equivalent to -xipo=1. -xipo=0 is the default setting and turns off -xipo.
With -xipo=1, the compiler performs inlining across all source files.
With -xipo=2, the compiler
performs interprocedural aliasing analysis as
well as optimizations of memory allocation and layout to improve cache
performance.
The compiler performs
whole-program optimizations by invoking an
interprocedural analysis component. Unlike -xcrossfile, -xipo performs
optimizations across all object files in the link step, and is not
limited to just the source files of the compile command. However, whole-program optimizations performed with -xipo do not include assembly (.s) source files.
You must specify -xipo both at
compile time and at link time.
The -xipo option generates
significantly larger object files due to the
additional information needed to perform optimizations across files.
However, this additional information does not become part of the final
executable binary file. Any increase in the size of the executable
program is due to the additional optimizations performed. The object
files created in the compilation steps have additional analysis
information compiled within them to permit crossfile optimizations to
take place at the link step.
-xipo is particularly useful when
compiling and linking large multi-file
applications. Object files compiled with this flag have analysis
information compiled within them that enables interprocedural analysis
across source and pre-compiled program files.
However, analysis and optimization
is limited to the object files
compiled with -xipo, and does not extend to object files in the
libraries.
-xipo is multiphased, so you need
to specify -xipo for each step if you
compile and link in separate steps.
Other important information about
-xipo:
- It requires an optimization level of at least
-O3.
- Objects that are compiled without -xipo can be
linked freely with
objects that are compiled with -xipo.
- -xipo_archive=[a]
- The -xipo_archive option enables
the compiler to optimize object files
that are passed to the linker with object files that were compiled with
-xipo and that reside in the archive library (.a) before producing an
executable. Any object files contained in the library that were
optimized during the compilation are replaced with their optimized
version.
a is one of the following:
- writeback
The compiler optimizes object files passed to the linker with object files compiled with -xipo that reside in the archive library (.a) before producing an executable. Any object files contained in the library that were optimized during the compilation are replaced with an optimized version.
- readonly
The compiler optimizes object files passed to the linker with object files compiled with -xipo that reside in the archive library (.a) before producing an executable.
- none
There is no processing of archive files.
If you do not specify a setting for -xipo_archive, the compiler sets it
to -xipo_archive=none.
- -xlibmil
- Inlines some library routines for
faster execution. This option selects
the appropriate assembly language inline templates for the
floating-point option and platform for your system.
-xlibmil inlines a function
regardless of any specification of the
function as part of the -xinline flag.
However, these substitutions can
cause the setting of errno to become
unreliable. If your program depends on the value of errno, avoid this
option.
- -xlibmopt
- Enables the compiler to use a
library of optimized math routines.
The math routine library is
optimized for performance and usually
generates faster code. The results may be slightly different from those
produced by the normal math library. If so, they usually differ in the
last bit.
However, these substitutions can
cause the setting of errno to become
unreliable. If your program depends on the value of errno, avoid this
option.
The order on the command line for
this library option is not significant.
This option is set by the -fast
option.
- -xlinkopt[=level]
- Instructs the compiler to perform
link-time optimizations on
relocatable object files. These optimizations are performed at link time
by analyzing the object binary code. The object files are not rewritten,
but the resulting executable code may differ from the original object
code.
You must use -xlinkopt on at least
some of the compilation commands for
-xlinkopt to be useful at link time. The optimizer can still perform
some limited optimizations on object binaries that are not compiled with
-xlinkopt.
-xlinkopt optimizes code coming
from static libraries that appear on the
compiler command line, but it skips and does not optimize code coming
from shared (dynamic) libraries that appear on the command line. You can
also use -xlinkopt when you build shared libraries (compiling with -G).
level sets the level of
optimizations performed, and must be 0, 1, or
2. The optimization levels are:
- 0
The post-optimizer is disabled. (This is the default.)
- 1
Perform optimizations based on control flow analysis, including
instruction cache coloring and branch optimizations, at link time.
- 2
Perform additional data flow analysis, including dead-code elimination
and address computation simplification, at link time.
If you compile in separate steps, -xlinkopt must appear on both compile
and link steps:
example% gcc -c -xlinkopt a.c b.c
example% gcc -o myprog -xlinkopt=2 a.o b.o
Note that the level parameter is
only used when the compiler is linking.
In the example above, the post-optimization level used is 2 even though
the object binaries were compiled with an implied level of 1.
Specifying -xlinkopt without a
level parameter implies -xlinkopt=1.
This option is most effective when
you use it to compile the whole
program, and with profile feedback. Profiling reveals the most and least
used parts of the code, and the optimizer uses that information to focus
its effort accordingly. This is particularly important with large
applications, where optimal placement of code performed at link time can
reduce instruction cache misses. Typically, this compiles as follows:
example% gcc -o progt -O3 -xprofile=collect:prog file.c
example% progt
example% gcc -o prog -O3 -xprofile=use:prog -xlinkopt file.c
For details on using profile
feedback, see -xprofile=p.
You cannot use the link-time
post-optimizer with the incremental linker,
ild. -xlinkopt sets the default linker to be ld. If you enable the
incremental linker explicitly with -xildon and also specify -xlinkopt,
-xlinkopt is disabled.
Note that compiling with this
option increases link time slightly.
Object file sizes also increase, but the size of the executable remains
the same. Compiling with -xlinkopt and -g increases the size of the
executable by including debugging information.
- -xloopinfo
- Shows which loops are parallelized
and which are not. Gives a
short reason for not parallelizing a loop. The -xloopinfo option is
valid only if -xautopar or -xparallel is specified;
otherwise, the compiler issues a warning.
To achieve faster execution, this
option requires a multiprocessor
system. On a single-processor system, the generated code usually runs
slower.
- -xmemalign=ab
- Specify maximum assumed memory
alignment and behavior of
misaligned data accesses. There must be a value for both a (alignment)
and b (behavior). a specifies the maximum assumed memory alignment
and b specifies the behavior for misaligned memory accesses.
The following lists the alignment and behavior values for -xmemalign.
Values for a (alignment):
- 1
Assume at most 1 byte alignment.
- 2
Assume at most 2 byte alignment.
- 4
Assume at most 4 byte alignment.
- 8
Assume at most 8 byte alignment.
- 16
Assume at most 16 byte alignment.
Values for b (behavior):
- i
Interpret access and continue execution.
- s
Raise signal SIGBUS.
- f
Raise signal SIGBUS for alignments less than or equal to 4; otherwise,
interpret access and continue execution.
You must specify -xmemalign whenever you want to link to an object file
that was compiled with the value of b set to either i or f.
For memory accesses where the
alignment is determinable at compile time,
the compiler generates the appropriate load/store instruction sequence
for that alignment of data.
For memory accesses where the
alignment cannot be determined at compile
time, the compiler must assume an alignment to generate the needed
load/store sequence.
The -xmemalign option allows you
to specify the maximum memory alignment
of data to be assumed by the compiler in these indeterminable
situations. It also specifies the error behavior to be followed at run
time when a misaligned memory access does take place.
The following default values only
apply when no -xmemalign option is
present:
- -xmemalign=8i for all v8 architectures.
- -xmemalign=8s for all v9 architectures.
The default when -xmemalign option is present but no value is
given is -xmemalign=1i for all -xarch values.
The following list shows how you
can use -xmemalign to handle different
alignment situations.
- -xmemalign=1s
There are many misaligned accesses so trap handling is too slow.
- -xmemalign=8i
There are occasional, intentional, misaligned accesses in code that is
otherwise correct.
- -xmemalign=8s
There should be no misaligned accesses in the program.
- -xmemalign=2s
You want to check for possible odd-byte accesses.
- -xmemalign=2i
You want to check for possible odd-byte access and you want the program
to work.
- -xpagesize=n
- Sets the preferred page size for
the stack and the heap.
The n value must be one of
the following: 8K, 64K, 512K, 4M, 32M,
256M, 2G, 16G, or default.
You must specify a valid page size
for the Solaris operating system on
the target platform, as returned by getpagesize(3C). If you do not
specify a valid page size, the request is silently ignored at run time.
The Solaris operating system offers no guarantee that the page size
request will be honored.
You can use pmap(1) or meminfo(2)
to determine page size of the target
platform.
The -xpagesize option has no
effect unless you use it at compile time
and at link time.
If you specify -xpagesize=default,
the Solaris operating system sets the
page size.
Compiling with this option has the
same effect as setting the LD_PRELOAD
environment variable to mpss.so.1 with the equivalent options, or
running the Solaris 9 command ppgsz(1) with the equivalent options
before running the program. See the Solaris 9 man pages for details.
This option is a macro for
-xpagesize_heap and -xpagesize_stack. These
two options accept the same arguments as -xpagesize: 8K, 64K, 512K, 4M,
32M, 256M, 2G, 16G, or default. You can set them both with the same
value by specifying -xpagesize or you can specify them individually with
different values.
- -xpagesize_heap=n
- Set the page size in memory for the
heap.
The value for n must be
one of the following: 8K, 64K, 512K, 4M, 32M,
256M, 2G, 16G, or default. You must specify a valid page size for the
Solaris operating system on the target platform, as returned by
getpagesize(3C). If you do not specify a valid page size, the request is
silently ignored at run time.
You can use pmap(1) or meminfo(2)
to determine page size at the target
platform.
If you specify
-xpagesize_heap=default, the Solaris operating system
sets the page size.
Compiling with this option has the
same effect as setting the LD_PRELOAD
environment variable to mpss.so.1 with the equivalent options, or
running the Solaris 9 command ppgsz(1) with the equivalent options
before running the program. See the Solaris 9 man pages for details.
The -xpagesize_heap option has no
effect unless you use it at compile
time and at link time.
- -xpagesize_stack=n
- Set the page size in memory for the
stack.
The value for n must be
one of the following: 8K, 64K, 512K, 4M, 32M,
256M, 2G, 16G, or default. You must specify a valid page size for the
Solaris operating system on the target platform, as returned by
getpagesize(3C). If you do not specify a valid page size, the request is
silently ignored at run time. You can use pmap(1) or meminfo(2) to
determine page size at the target platform.
If you specify
-xpagesize_stack=default, the Solaris operating system
sets the page size.
Compiling with this option has the
same effect as setting the LD_PRELOAD
environment variable to mpss.so.1 with the equivalent options, or
running the Solaris 9 command ppgsz(1) with the equivalent options
before running the program. See the Solaris 9 man pages for details.
The -xpagesize_stack option has no
effect unless you use it at compile
time and at link time.
- -xparallel
- Parallelizes loops both
automatically by the compiler and
explicitly specified by the programmer. The -xparallel option is a
macro, and is equivalent to specifying all three of -xautopar, -xdepend,
and -xexplicitpar. With explicit parallelization of loops, there is a
risk of producing incorrect results. If optimization is not at -O3 or
higher, optimization is raised to -O3 and a warning is issued.
Avoid -xparallel if you do your
own thread management.
To get faster code, this option
requires a multiprocessor system. On a
single-processor system, the generated code usually runs slower.
If you compile and link in one
step, -xparallel links with the
microtasking library and the thread-safe C runtime library. If you
compile and link in separate steps, and you compile with -xparallel,
then link with -xparallel.
- -xprefetch[=val[,val]]
- Enable prefetch instructions on
those architectures that
support prefetch, such as the UltraSPARC II processor environment
(-xarch=v8plus, v8plusa, v9, or v9a).
Explicit prefetching should only
be used under special circumstances
that are supported by measurements.
val must be one of the
following:
- latx:factor
Adjust the compiler's assumed prefetch-to-load and prefetch-to-store
latencies by the specified factor. You can only combine this flag with
-xprefetch=auto.
- [no%]auto
[Disable] Enable automatic generation of prefetch instructions.
- [no%]explicit
[Disable] Enable explicit prefetch macros.
- yes
Obsolete - do not use. Use -xprefetch=auto,explicit instead.
The default for -xprefetch is
-xprefetch=auto,explicit.
The sun_prefetch.h header file
provides the macros that you can use to
specify explicit prefetch instructions. The prefetches are placed
approximately at the location in the executable that corresponds to
where the macros appear.
Prefetch Latency Ratio
The prefetch latency is the
hardware delay between the execution of a
prefetch instruction and the time the data being prefetched is available
in the cache.
The factor must be a positive
number of the form n.n.
The compiler assumes a prefetch
latency value when determining how far
apart to place a prefetch instruction and the load or store instruction
that uses the prefetched data. The assumed latency between a prefetch
and a load may not be the same as the assumed latency between a prefetch
and a store.
The compiler tunes the prefetch
mechanism for optimal performance across
a wide range of machines and applications. This tuning may not always be
optimal. For memory-intensive applications, especially applications
intended to run on large multiprocessors, you may be able to obtain
better performance by increasing the prefetch latency values. To
increase the values, use a factor that is greater than 1 (one). A value
between 0.5 and 2.0 will most likely provide the maximum performance.
For applications with datasets
that reside entirely within the external
cache, you may be able to obtain better performance by decreasing the
prefetch latency values. To decrease the values, use a factor that is
less than one.
To use the latx:factor suboption,
start with a factor value near 1.0 and
run performance tests against the application. Then increase or decrease
the factor, as appropriate, and run the performance tests again.
Continue adjusting the factor and running the performance tests until
you achieve optimum performance. When you increase or decrease the
factor in small steps, you will see no performance difference for a few
steps, then a sudden difference, then it will level off again.
- -xprefetch_level=l
- Use the -xprefetch_level option to
control the aggressiveness
of automatic insertion of prefetch instructions as determined with
-xprefetch=auto. l must be 1, 2, or 3. The compiler becomes more
aggressive, that is, it introduces more prefetches, with each higher
level of -xprefetch_level.
The appropriate value for the
-xprefetch_level depends on the number of
cache misses the application may have. Higher -xprefetch_level values
have the potential to improve the performance of applications.
This option is effective only when
the code is compiled with -xprefetch=auto
at optimization level 3 or greater, for a platform
that supports prefetch (v8plus, v8plusa, v9, v9a, v9b, generic64,
native64).
-xprefetch_level=1 enables
automatic generation of prefetch
instructions. -xprefetch_level=2 enables additional generation beyond
level 1 and -xprefetch_level=3 enables additional generation beyond
level 2.
The default is -xprefetch_level=1
when you specify -xprefetch=auto.
- -xprofile=p
- Use this option to collect and save
execution-frequency data so you can
then use the data in subsequent runs to improve performance.
You must specify -xprofile at
compile time as well as link time.
Compiling with high optimization
levels (for example -O3) is enhanced
by providing the compiler with runtime-performance feedback. In order to
produce runtime-performance feedback, you must compile with
-xprofile=collect, run the executable against a typical data set, and
then recompile at the highest optimization level and with
-xprofile=use.
Profile collection is safe for
multithreaded applications. That is,
profiling a program that does its own multitasking (-mt) produces
accurate results.
p must be collect[:name]
or use[:name].
- collect[:name]
Collects and saves execution-frequency data for later use by the
optimizer with -xprofile=use. The compiler generates code to measure
statement execution-frequency.
The name is the name
of the program that is being analyzed. This name
is optional. If name is not specified, a.out is assumed to be the name
of the executable.
You can set the environment
variables SUN_PROFDATA and SUN_PROFDATA_DIR
to control where a program compiled with -xprofile=collect stores the
profile data. If set, the -xprofile=collect data is written to
SUN_PROFDATA_DIR/SUN_PROFDATA.
If these environment variables
are not set, the profile data is written
to name.profile/feedback in the current directory, where name is the
name of the executable or the name specified in the
-xprofile=collect:name flag. -xprofile does not append .profile to
name if name already ends in .profile. If you run the program
several times, the execution-frequency data accumulates in the feedback
file; that is, output from prior executions is not lost.
If you are compiling and
linking in separate steps, make sure that any
object files compiled with -xprofile=collect are also linked with
-xprofile=collect.
- use[:name]
The program is optimized by using the execution-frequency data generated
and saved in the feedback files from a previous execution of the program
that was compiled with -xprofile=collect.
The name is the name
of the program that is being analyzed. This name
is optional. If name is not specified, a.out is assumed to be the name
of the executable.
Except for the -xprofile
option which changes from -xprofile=collect to
-xprofile=use, the source files and other compiler options must be
exactly the same as those used for the compilation that created the
compiled program which in turn generated the feedback file. The same
version of the compiler must be used for both the collect build and the
use build as well. If compiled with -xprofile=collect:name, the same
program name name must appear in the optimizing compilation:
-xprofile=use:name.
The association between an
object file and its profile data is based on
the UNIX pathname of the object file when it is compiled with
-xprofile=collect. In some circumstances, the compiler will not
associate an object file with its profile data: the object file has no
profile data because it was not previously compiled with
-xprofile=collect, the object file is not linked in a program with
-xprofile=collect, or the program has never been executed.
The compiler can also become
confused if an object file was previously
compiled in a different directory with -xprofile=collect and this object
file shares a common basename with other object files compiled with
-xprofile=collect, but they cannot be uniquely identified by the names of
their containing directories. In this case, even if the object file has
profile data, the compiler will not be able to find it in the feedback
directory when the object file is recompiled with -xprofile=use.
All of these situations cause
the compiler to lose the association
between an object file and its profile data.
- -xreduction
- Turns on reduction recognition
during automatic
parallelization. -xreduction must be specified with -xautopar or
-xparallel; otherwise, the compiler issues a warning.
When reduction recognition is
enabled, the compiler parallelizes
reductions such as dot products and maximum and minimum finding.
These reductions may yield roundoff that differs from the roundoff of
unparallelized code.
- -xregs=r[,r...]
- Specifies the usage of registers
for the generated code.
r is a comma-separated list
that consists of one or more of the
following: [no%]appl, [no%]float.
Example: -xregs=appl,no%float
The values have the following meanings:
- [no%]appl
[Does not] Allow the compiler to generate code using the application
registers as scratch registers. The application registers are:
g2, g3, g4 (v8a, v8, v8plus,
v8plusa, v8plusb)
g2, g3 (v9, v9a, v9b)
It is strongly recommended
that all system software and libraries be
compiled using -xregs=no%appl. System software (including shared
libraries) must preserve these registers' values for the application.
Their use is intended to be controlled by the compilation system and
must be consistent throughout the application.
In the SPARC ABI, these
registers are described as application
registers. Using these registers can increase performance because fewer
load and store instructions are needed. However, such use can conflict
with some old library programs written in assembly code.
- [no%]float
[Does not] Allow the compiler to generate code by using the
floating-point registers as scratch registers for integer values.
Floating-point values may use these registers regardless of this option.
If you want your code to be free of all references to floating-point
registers, you need to use -xregs=no%float and also make sure your code
does not use floating-point types in any way.
The default is
-xregs=appl,float.
It is strongly recommended
that you compile code intended for shared
libraries that will link with applications, with -xregs=no%appl,float.
At the very least, the shared library should explicitly document how it
uses the application registers so that applications linking with those
libraries know how to cope with the issue.
For example, an application
using the registers in some global sense
(such as using a register to point to some critical data structure)
would need to know exactly how a library with code compiled without
-xregs=no%appl is using the application registers in order to safely
link with that library.
- -xrestrict
- Treats pointer-valued function
parameters as restricted
pointers.
If -xrestrict is specified, all pointer parameters
in the entire C file are treated as restricted.
This command-line option can be
used on its own, but it is best used
with optimization. For example, the command:
% gcc -O3 -xrestrict prog.c
treats all pointer parameters in
the file prog.c as restricted pointers.
This option is off by default.
- -xsafe=mem
- Allows the compiler to assume no
memory-based traps occur.
This option grants permission to
use the speculative load instruction on
V9 machines. It is only effective when you specify -O3 optimization and
-xarch=v8plus|v8plusa|v9|v9a.
Note - Because non-faulting loads
do not cause a trap when a fault
such as address misalignment or segmentation violation occurs, you
should use this option only for programs in which such faults cannot
occur. Because few programs incur memory-based traps, you can safely use
this option for most programs. Do not use this option for programs that
explicitly depend on memory-based traps to handle exceptional
conditions.
- -xspace
- Does no optimizations or
parallelization of loops that increase code size.
Example: The compiler will not
unroll loops or parallelize loops if it
increases code size.
- -xtarget=t
- Specifies the target system for
instruction set and optimization.
The value of t must be one
of the following: native, generic,
system-name.
The -xtarget option is a macro
that permits a quick and easy
specification of the -xarch, -xchip, and -xcache combinations that occur
on real systems. The only meaning of -xtarget is in its expansion.
- native
Gets the best performance on the host system.
The compiler generates code
for the best performance on the host system.
It determines the available architecture, chip, and cache properties of
the machine on which the compiler is running.
- generic
Gets the best performance for generic architecture, chip, and cache.
The compiler expands -xtarget=generic to:
-xarch=generic -xchip=generic -xcache=generic
This is the default value.
- system-name
Gets the best performance for the specified system.
The performance of some programs may benefit by providing the compiler
with an accurate description of the target computer hardware. When
program performance is critical, the proper specification of the target
hardware could be very important. This is especially true when running
on the newer SPARC processors. However, for most programs and older
SPARC processors, the performance gain is negligible and a generic
specification is sufficient.
Each specific value for -xtarget
expands into a specific set of values
for the -xarch, -xchip, and -xcache options.
- -xunroll=n
- Suggests to the optimizer to unroll
loops n times. n is a positive integer. When n is 1, it is a command,
and the compiler unrolls no loops. When n is greater than 1, -xunroll=n
merely suggests to the compiler that it unroll loops n times.
- -xvector[={yes|no}]
- Enable automatic generation of
calls to the vector library
functions. You must use default rounding mode by specifying
-fround=nearest when you use this option.
-xvector=yes permits the compiler
to transform math library calls within
loops into single calls to the equivalent vector math routines when such
transformations are possible. Such transformations could result in a
performance improvement for loops with large loop counts.
If you do not specify -xvector,
the default is -xvector=no. -xvector=no
undoes a previously specified -xvector=yes. If you specify -xvector but
do not supply a value, the default is -xvector=yes.
If you use -xvector on the command
line without previously specifying
-xdepend, -xvector triggers -xdepend.
The compiler includes the libmvec
libraries in the load step.
If you compile and link with
separate commands, be sure to use -xvector
in the linking gcc command.