Cool Tools - GCC for SPARC® Systems

Additional command line option flags

-dalign
Equivalent to -xmemalign=8s.
-fast
A macro that can be used as a starting point for tuning an executable for maximum runtime performance.

Modules that are compiled with -fast must also be linked with -fast.

The -fast option is unsuitable for programs intended to run on a different target than the compilation machine. In such cases, follow -fast with the appropriate -xtarget option. For example:

% gcc -fast -xtarget=ultra ...

With -xlibmil, exceptions cannot be noted by setting errno or calling matherr(3m).

The -fast option is unsuitable for programs that require strict conformance to the IEEE 754 Standard.

-fast expands to: -O3 -ffast-math -fns -fsimple=2 -ftrap=%none -xalias_level=basic -xbuiltin=%all -xdepend -xlibmil -xlibmopt -xprefetch=auto,explicit -xprefetch_level=1 -xtarget=native

Note - Some optimizations make certain assumptions about program behavior. If the program does not conform to these assumptions, the application may crash or produce incorrect results. Please refer to the description of the individual options to determine if your program is suitable for compilation with -fast.

The optimizations performed by these options may alter the behavior of programs from that defined by the ISO C and IEEE standards. See the description of the specific option for details.

-fast acts like a macro expansion on the command line. Therefore, you can override the optimization level and code generation option aspects by following -fast with the desired optimization level or code generation option. Compiling with the -fast -O2 pair is like compiling with the -O3 -O2 pair. The latter specification takes precedence.

Do not use this option for programs that depend on IEEE standard exception handling; you can get different numerical results, premature program termination, or unexpected SIGFPE signals.

-fnonstd
This option is a macro for -fns and -ftrap=common.
-fns[={no|yes}]
Turns on the SPARC nonstandard floating-point mode.

The default is -fns=no, the SPARC standard floating-point mode. -fns is the same as -fns=yes.

Optional use of =yes or =no provides a way of toggling the -fns flag following some other macro flag that includes -fns, such as -fast.

On some SPARC systems, the nonstandard floating point mode disables "gradual underflow," causing tiny results to be flushed to zero rather than producing subnormal numbers. It also causes subnormal operands to be replaced silently by zero. On those SPARC systems that do not support gradual underflow and subnormal numbers in hardware, use of this option can significantly improve the performance of some programs.

When nonstandard mode is enabled, floating point arithmetic may produce results that do not conform to the requirements of the IEEE 754 standard.

This option is effective only if used when compiling the main program.
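The difference between gradual underflow and flush-to-zero can be observed from C. The following is a minimal sketch, assuming IEEE 754 double precision; the function names are hypothetical and are not part of the compiler or its libraries.

```c
#include <float.h>

/* Returns 1 if x is a nonzero value smaller in magnitude than the smallest
 * normal double (that is, a subnormal number), and 0 otherwise. */
int is_subnormal(double x)
{
    double mag = x < 0.0 ? -x : x;
    return mag != 0.0 && mag < DBL_MIN;
}

/* Halves the smallest normal double at run time. The volatile qualifier keeps
 * the division from being folded at compile time, so the result reflects the
 * floating-point mode in effect while the program runs. */
double half_of_smallest_normal(void)
{
    volatile double tiny = DBL_MIN;
    return tiny / 2.0;
}
```

In the standard mode, is_subnormal(half_of_smallest_normal()) is expected to return 1. On a system where nonstandard mode flushes tiny results to zero, the same call would be expected to return 0 because the quotient becomes zero.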

-fsimple[=n]
Allows the optimizer to make simplifying assumptions concerning floating-point arithmetic.

The compiler defaults to -fsimple=0. Specifying -fsimple without a value is equivalent to -fsimple=1.

If n is present, it must be 0, 1, or 2.

  • -fsimple=0 Permits no simplifying assumptions. Preserves strict IEEE 754 conformance.
  • -fsimple=1 Allows conservative simplifications. The resulting code does not strictly conform to IEEE 754, but numeric results of most programs are unchanged.

    With -fsimple=1, the optimizer can assume the following:

    • IEEE 754 default rounding/trapping modes do not change after process initialization.
    • Computations producing no visible result other than potential floating point exceptions may be deleted.
    • Computations with Infinity or NaNs as operands need not propagate NaNs to their results; for example, x*0 may be replaced by 0.
    • Computations do not depend on sign of zero.

    With -fsimple=1, the optimizer is not allowed to optimize completely without regard to roundoff or exceptions. In particular, a floating-point computation cannot be replaced by one that produces different results with rounding modes held constant at runtime.

  • -fsimple=2 The compiler attempts aggressive floating point optimizations that may cause many programs to produce different numeric results due to changes in rounding. For example, -fsimple=2 permits the optimizer to replace all computations of x/y in a given loop with x*z, where x/y is guaranteed to be evaluated at least once in the loop, z=1/y, and the values of y and z are known to have constant values during execution of the loop.

    Even with -fsimple=2, the optimizer is not permitted to introduce a floating point exception in a program that otherwise produces none.
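The two rewrites named above (x*0 at -fsimple=1, and x/y replaced by x*z at -fsimple=2) can be written out by hand to see what changes. This is only a sketch with hypothetical function names; it shows the source-level equivalent of the transformation, not what the optimizer actually emits.

```c
/* Under strict IEEE 754 rules (-fsimple=0), x * 0.0 is NaN when x is
 * Infinity or NaN. -fsimple=1 permits the optimizer to return 0 instead. */
double times_zero(double x)
{
    return x * 0.0;
}

/* Form written in the source: one division per element. */
void scale_by_division(double *out, const double *x, int n, double y)
{
    for (int i = 0; i < n; i++)
        out[i] = x[i] / y;
}

/* Form that -fsimple=2 may substitute: the reciprocal is computed once and
 * each division becomes a multiplication. The result is rounded twice
 * (once for z, once for x[i] * z), so its last bit can differ. */
void scale_by_reciprocal(double *out, const double *x, int n, double y)
{
    double z = 1.0 / y;
    for (int i = 0; i < n; i++)
        out[i] = x[i] * z;
}
```

When y is a power of two, the reciprocal is exact and both loops give identical results. For other divisors the two forms may disagree in the final bit, which is the kind of numeric change the -fsimple=2 description warns about.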

-ftrap=t[,t...]
Sets the IEEE trapping mode in effect at startup but does not install a SIGFPE handler. You can use ieee_handler(3M) or fex_set_handling(3M) to simultaneously enable traps and install a SIGFPE handler. If you specify more than one value, the list is processed sequentially from left to right.

t can be one of the following values:

  • [no%]division [Do not] Trap on division by zero.
  • [no%]inexact [Do not] Trap on inexact result.
  • [no%]invalid [Do not] Trap on invalid operation.
  • [no%]overflow [Do not] Trap on overflow.
  • %all Trap on all of the above.
  • %none Trap on none of the above.
  • common Trap on invalid, division by zero, and overflow.
-KPIC
Same as -xcode=pic32.
-Kpic
Same as -xcode=pic13.
-mt
Macro option that expands to -D_REENTRANT -lthread.
-native
Same as -xtarget=native.
-Xc -xc99
-Xc -xc99=all
Same as the GCC option -std=c99.
-Xc -xc99=none
-Xc
Same as the GCC option -std=iso9899:199409
-xalias_level[=l]
The compiler uses the -xalias_level option to determine what assumptions it can make in order to perform optimizations using type-based alias-analysis. This option places the indicated alias level into effect for the translation units being compiled.

If you do not specify the -xalias_level command, the compiler assumes -xalias_level=any. If you specify -xalias_level without a value, the default is -xalias_level=layout.

Remember that if you issue the -xalias_level option but you fail to adhere to all of the assumptions and restrictions about aliasing described for any of the alias levels, the behavior of your program is undefined.

The value of l is one of the following:

  • any The compiler assumes that all memory references can alias at this level. There is no type-based alias analysis at the level of -xalias_level=any.
  • basic If you use the -xalias_level=basic option, the compiler assumes that memory references that involve different C basic types do not alias each other. The compiler also assumes that references to all other types can alias each other as well as any C basic type. The compiler assumes that references using char * can alias any other type.

    For example, at the -xalias_level=basic level, the compiler assumes that a pointer variable of type int * is not going to access a float object. Therefore it is safe for the compiler to perform optimizations that assume a pointer of type float * will not alias the same memory that is referenced with a pointer of type int *.

  • weak If you use the -xalias_level=weak option, the compiler assumes that any structure pointer can point to any structure type.

    Any structure or union type that contains a reference to any type that is either referenced in an expression in the source being compiled or is referenced from outside the source being compiled, must be declared prior to the expression in the source being compiled.

    You can satisfy this restriction by including all the header files of a program that contain types that reference any of the types of the objects referenced in any expression of the source being compiled.

    At the level of -xalias_level=weak, the compiler assumes that memory references that involve different C basic types do not alias each other. The compiler assumes that references using char * alias memory references that involve any other type.

  • layout If you use the -xalias_level=layout option, the compiler assumes that memory references that involve types with the same sequence of types in memory can alias each other.

    The compiler assumes that two references with types that do not look the same in memory do not alias each other. The compiler assumes that any two memory accesses through different struct types alias if the initial members of the structures look the same in memory. However, at this level, you should not use a pointer to a struct to access some field of a dissimilar struct object that is beyond any of the common initial sequence of members that look the same in memory between the two structs. This is because the compiler assumes that such references do not alias each other.

    At the level of -xalias_level=layout the compiler assumes that memory references that involve different C basic types do not alias each other. The compiler assumes that references using char * can alias memory references involving any other type.

  • strict If you use the -xalias_level=strict option, the compiler assumes that memory references involving types, such as structs or unions, that are the same when tags are removed can alias each other. Conversely, the compiler assumes that memory references involving types that are not the same even after tags are removed do not alias each other.

    However, any structure or union type that contains a reference to any type that is part of any object referenced in an expression in the source being compiled, or is referenced from outside the source being compiled, must be declared prior to the expression in the source being compiled.

    You can satisfy this restriction by including all the header files of a program that contain types that reference any of the types of the objects referenced in any expression of the source being compiled. At the level of -xalias_level=strict the compiler assumes that memory references that involve different C basic types do not alias each other. The compiler assumes that references using char * can alias any other type.

  • std If you use the -xalias_level=std option, the compiler assumes that types and tags need to be the same to alias; however, references using char * can alias any other type. This rule is the same as the restrictions on the dereferencing of pointers that are found in the 1999 ISO C standard. Programs that properly use this rule will be very portable and should see good performance gains under optimization.
  • strong If you use the -xalias_level=strong option, the same restrictions apply as at the std level, but additionally, the compiler assumes that pointers of type char * are used only to access an object of type char. Also, the compiler assumes that there are no interior pointers. An interior pointer is defined as a pointer that points to a member of a struct.
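The int * versus float * case described for the basic level can be illustrated with a short C sketch. The function names are hypothetical, and float_bits assumes that unsigned int and float have the same size.

```c
#include <string.h>

/* Writes through both pointers, then reads back through the first.
 * At -xalias_level=basic or a stricter level, the compiler may assume that
 * ip and fp never refer to the same memory, so it can return 1 without
 * reloading *ip after the store through fp. */
int store_both(int *ip, float *fp)
{
    *ip = 1;
    *fp = 2.0f;
    return *ip;
}

/* Reads the bit pattern of a float by copying its bytes, instead of
 * dereferencing an unsigned int * that points at a float object.
 * Assumes sizeof(unsigned int) == sizeof(float). */
unsigned int float_bits(float f)
{
    unsigned int bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

Calling store_both with pointers to two distinct objects satisfies the assumption. Passing the same address through a cast, for example store_both(&i, (float *)&i), is the kind of program whose behavior becomes undefined once type-based alias analysis is in effect.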
-xarch=isa
Specify instruction set architecture (ISA).

The default architecture for which the GCC for SPARC Systems compilers produce code is v8plus (for the UltraSPARC® processor).

If you compile and link in separate steps, make sure you specify the same value for -xarch in both steps.

Architectures that are accepted by -xarch keyword isa are generic, generic64, native, native64, v8plus, v8plusa, v8plusb, v9, v9a, v9b

Note that although -xarch can be used alone, it is part of the expansion of the -xtarget option and may be used to override the -xarch value that is set by a specific -xtarget option.

The possible values of the keyword isa are:

  • generic Compile for good performance on most 32-bit systems. This option uses the best instruction set for good performance on most processors without major performance degradation on any of them. With each new release, the definition of "best" instruction set may be adjusted, if appropriate.
  • generic64 Compile 64-bit object binaries for good performance on most 64-bit platform architectures. This option uses the best instruction set for good performance on Solaris™ operating systems with 64-bit kernels, without major performance degradation on any of them. With each new release, the definition of best instruction set may be adjusted, if appropriate. Currently, this is equivalent to -xarch=v9.
  • native Compile for good performance on this system. The compiler chooses the appropriate setting for producing 32-bit binaries for the system on which the compiler is running.
  • native64 Compile 64-bit object binaries for good performance on this system. The compiler chooses the appropriate setting for producing 64-bit binaries for the system on which the compiler is running.
  • v8a Compile for the V8a version of the SPARC-V8 ISA.

    By definition, V8a means the V8 ISA, but without the fsmuld instruction.

    This option enables the compiler to generate code for good performance on the V8a ISA.

  • v8plus Compile for the V8plus version of the SPARC-V9 ISA.

    This is the default. By definition, V8plus means the V9 ISA, but limited to the 32-bit subset defined by the V8plus ISA specification, without the Visual Instruction Set (VIS), and without other implementation-specific ISA extensions.

    This option enables the compiler to generate code for good performance on the V8plus ISA. The resulting object code is in SPARC-V8+ ELF32 format and only executes in a Solaris UltraSPARC processor environment--it does not run on a V7 or V8 processor.

  • v8plusa Compile for the V8plusa version of the SPARC-V9 ISA.

    By definition, V8plusa means the V8plus architecture, plus the Visual Instruction Set (VIS) version 1.0, and with UltraSPARC extensions.

    This option enables the compiler to generate code for good performance on the UltraSPARC architecture, but limited to the 32-bit subset defined by the V8plus specification. The resulting object code is in SPARC-V8+ ELF32 format and only executes in a Solaris UltraSPARC processor environment--it does not run on a V8 processor.

  • v8plusb Compile for the V8plusb version of the SPARC-V8plus ISA with UltraSPARC III extensions.

    Enables the compiler to generate object code for the UltraSPARC architecture, plus the Visual Instruction Set (VIS) version 2.0, and with UltraSPARC III extensions.

    The resulting object code is in SPARC-V8+ ELF32 format and executes only in a Solaris UltraSPARC III processor environment. Compiling with this option uses the best instruction set for good performance on the UltraSPARC III architecture.

  • v9 Compile for the SPARC-V9 ISA.

    Enables the compiler to generate code for good performance on the V9 SPARC architecture.

    The resulting .o object files are in ELF64 format and can only be linked with other SPARC-V9 object files in the same format. The resulting executable can only be run on an UltraSPARC processor running a 64-bit enabled Solaris operating system with the 64-bit kernel. -xarch=v9 is only available when compiling in a 64-bit enabled Solaris environment.

  • v9a Compile for the SPARC-V9 ISA with UltraSPARC extensions.

    Adds to the SPARC-V9 ISA the Visual Instruction Set (VIS) and extensions specific to UltraSPARC processors, and enables the compiler to generate code for good performance on the V9 SPARC architecture.

    The resulting .o object files are in ELF64 format and can only be linked with other SPARC-V9 object files in the same format. The resulting executable can only be run on an UltraSPARC processor running a 64-bit enabled Solaris operating system with the 64-bit kernel. -xarch=v9a is only available when compiling in a 64-bit enabled Solaris operating system.

  • v9b Compile for the SPARC-V9 ISA with UltraSPARC III extensions.

    Adds UltraSPARC III extensions and VIS version 2.0 to the V9a version of the SPARC-V9 ISA. Compiling with this option uses the best instruction set for good performance in a Solaris UltraSPARC III processor environment.

    The resulting object code is in SPARC-V9 ELF64 format and can only be linked with other SPARC-V9 object files in the same format. The resulting executable can only be run on an UltraSPARC III processor running a 64-bit enabled Solaris operating system with the 64-bit kernel. -xarch=v9b is only available when compiling in a 64-bit enabled Solaris operating system.

-xautopar
Turns on automatic parallelization for multiple processors. Does dependence analysis (analyze loops for inter-iteration data dependence) and loop restructuring. If optimization is not at -O2 or higher, optimization is raised to -O2 and a warning is emitted.

Avoid -xautopar if you do your own thread management.

To achieve faster execution, this option requires a multiple processor system. On a single-processor system, the resulting binary usually runs slower.

To request a number of processors, set the PARALLEL environment variable. The default is 1.

  • Do not request more processors than are available.
  • If N is the number of processors on the machine, then for a one-user, multiprocessor system, try PARALLEL=N-1.

If you use -xautopar and compile and link in one step, then linking automatically includes the microtasking library and the thread-safe C runtime library. If you use -xautopar and compile and link in separate steps, then you must also link with -xautopar.
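Inter-iteration data dependence, which the dependence analysis looks for, is easiest to see in a pair of loops. The sketch below uses hypothetical function names; the restrict qualifiers are an assumption of the example, stating that the two arrays do not overlap.

```c
/* Each iteration reads b[i] and writes a[i] only. No iteration depends on
 * another, so this loop is a candidate for automatic parallelization. */
void scale(double *restrict a, const double *restrict b, int n, double k)
{
    for (int i = 0; i < n; i++)
        a[i] = k * b[i];
}

/* Iteration i reads a[i - 1], which iteration i - 1 wrote. This
 * inter-iteration dependence forces the iterations to run in order. */
void running_sum(double *a, int n)
{
    for (int i = 1; i < n; i++)
        a[i] = a[i] + a[i - 1];
}
```

Whether a given loop is actually parallelized depends on the compiler's analysis and profitability estimates, so loops like scale are only candidates.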

-xbuiltin[=(%all|%none)]
Use the -xbuiltin[=(%all|%none)] command when you want to improve the optimization of code that calls standard library functions. Many standard library functions, such as the ones defined in math.h and stdio.h, are commonly used by various programs. This command lets the compiler substitute intrinsic functions or inline system functions where profitable for performance. See the er_src(1) man page for an explanation of how to read compiler commentary in object files to determine for which functions the compiler actually makes a substitution.

However, these substitutions can cause the setting of errno to become unreliable. If your program depends on the value of errno, avoid this option.

If you do not specify -xbuiltin, the default is -xbuiltin=%none, which means no functions from the standard libraries are substituted or inlined. If you specify -xbuiltin, but do not provide any argument, the default is -xbuiltin=%all, which means the compiler substitutes intrinsics or inlines standard library functions as it determines the optimization benefit.

If you compile with -fast, then -xbuiltin is set to %all.

Note - -xbuiltin only inlines global functions defined in system header files, never static functions defined by the user.

-xcache[=c]
Defines the cache properties for use by the optimizer. c must be one of the following:
  • generic
  • s1/l1/a1
  • s1/l1/a1:s2/l2/a2
  • s1/l1/a1:s2/l2/a2:s3/l3/a3

The si, li, ai are defined as follows:

  • si The size of the data cache at level i, in kilobytes
  • li The line size of the data cache at level i, in bytes
  • ai The associativity of the data cache at level i

Although this option can be used alone, it is part of the expansion of the -xtarget option; its primary use is to override a value supplied by the -xtarget option.

This option specifies the cache properties that the optimizer can use. It does not guarantee that any particular cache property is used. The following lists the -xcache values.

  • generic This is the default value which directs the compiler to use cache properties for good performance on most x86 and SPARC processors, without major performance degradation on any of them.

    With each new release, these best timing properties will be adjusted, if appropriate.

  • native Set the parameters for the best performance on the host environment.
  • s1/l1/a1 Define level 1 cache properties.
  • s1/l1/a1:s2/l2/a2 Define levels 1 and 2 cache properties.
  • s1/l1/a1:s2/l2/a2:s3/l3/a3 Define levels 1, 2, and 3 cache properties.

Example: -xcache=16/32/4:1024/32/1 specifies the following:

Level 1 cache has:

  • 16K bytes
  • 32 bytes line size
  • 4-way associativity
Level 2 cache has:
  • 1024K bytes
  • 32 bytes line size
  • Direct mapping associativity
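The s/l/a notation can be read mechanically. As an illustration only, the following C sketch splits a value such as 16/32/4:1024/32/1 into its levels; the struct and function names are hypothetical, and keywords such as generic or native are deliberately not handled.

```c
#include <stdio.h>

/* One cache level as written in -xcache:
 * size in kilobytes / line size in bytes / associativity. */
struct cache_level {
    int size_kb;
    int line_bytes;
    int assoc;
};

/* Parses up to max_levels colon-separated s/l/a triples from spec.
 * Returns the number of levels parsed, or -1 if the text is malformed
 * or holds more levels than max_levels. */
int parse_xcache(const char *spec, struct cache_level *out, int max_levels)
{
    int count = 0;
    while (count < max_levels) {
        int used = 0;   /* characters consumed by this triple */
        if (sscanf(spec, "%d/%d/%d%n", &out[count].size_kb,
                   &out[count].line_bytes, &out[count].assoc, &used) != 3)
            return -1;
        count++;
        spec += used;
        if (*spec == '\0')
            return count;
        if (*spec != ':')
            return -1;
        spec++;         /* skip the colon before the next level */
    }
    return -1;
}
```

For the example value, the sketch reports two levels: 16/32/4 for level 1 and 1024/32/1 for level 2, where an associativity of 1 corresponds to a direct-mapped cache.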
-xchip[=c]
Specifies the target processor for use by the optimizer.

c must be one of the following: generic, ultra, ultra2, ultra2e, ultra2i, ultra3, ultra3cu, ultra4.

Although this option can be used alone, it is part of the expansion of the -xtarget option; its primary use is to override a value supplied by the -xtarget option.

This option specifies timing properties by specifying the target processor.

Some effects are:

  • The ordering of instructions, that is, scheduling
  • The way the compiler uses branches
  • The instructions to use in cases where semantically equivalent alternatives are available

The following lists the -xchip values for SPARC platforms:

  • generic Use timing properties for good performance on most SPARC architectures.

    This is the default value that directs the compiler to use the best timing properties for good performance on most processors, without major performance degradation on any of them.

  • native Set the parameters for the best performance on the host environment.
  • ultra Uses timing properties of the UltraSPARC processors.
  • ultra2 Uses timing properties of the UltraSPARC II processors.
  • ultra2e Uses timing properties of the UltraSPARC IIe processors.
  • ultra2i Uses timing properties of the UltraSPARC IIi processors.
  • ultra3 Uses timing properties of the UltraSPARC III processors.
  • ultra3cu Uses timing properties of the UltraSPARC III Cu processors.
  • ultra3i Uses the timing properties of the UltraSPARC IIIi processors.
  • ultra4 Uses timing properties of the UltraSPARC IV processors.
-xcode[=v]
Specify code address space. v must be one of:
  • abs32 Generate 32-bit absolute addresses. Code + data + bss size is limited to 2**32 bytes. This is the default on 32-bit architectures: -xarch=(generic|v8a|v8plus|v8plusa|v8plusb)
  • abs44 Generate 44-bit absolute addresses. Code + data + bss size is limited to 2**44 bytes. This is the default on 64-bit architectures: -xarch=(v9|v9a|v9b)
  • abs64 Generate 64-bit absolute addresses. Available only on 64-bit architectures: -xarch=(v9|v9a|v9b)
  • pic13 Generate position-independent code for use in shared libraries (small model). Equivalent to -Kpic. Permits references to at most 2**11 unique external symbols on 32-bit architectures, 2**10 on 64-bit architectures.

    The -xcode=pic13 command is similar to -xcode=pic32, except that the size of the global offset table is limited to 8 Kbytes.

  • pic32 Generate position-independent code for use in shared libraries (large model). Equivalent to -KPIC. Permits references to at most 2**30 unique external symbols on 32-bit architectures, 2**29 on 64-bit architectures.

    Each reference to a global datum is generated as a dereference of a pointer in the global offset table. Each function call is generated in pc-relative addressing mode through a procedure linkage table. With this option, the global offset table spans the range of 32-bit addresses in those rare cases where there are too many global data objects for -xcode=pic13.

The default for SPARC and UltraSPARC V9 (with -xarch=v9|v9a|v9b) is -xcode=abs44.

When building shared dynamic libraries, the default -xcode values of abs44 and abs32 will not work with -xarch=v9 or v9a or v9b, so a -xcode value must be given. Specify -xcode=pic13 or -xcode=pic32. There are two nominal performance costs with -xcode=pic13 and -xcode=pic32 on SPARC:

  • A routine compiled with either -xcode=pic13 or -xcode=pic32 executes a few extra instructions upon entry to set a register to point at a table (_GLOBAL_OFFSET_TABLE_) used for accessing a shared library's global or static variables.
  • Each access to a global or static variable involves an extra indirect memory reference through _GLOBAL_OFFSET_TABLE_. If the compile is done with -xcode=pic32, there are two additional instructions per global and static memory reference.

When considering the above costs, remember that the use of -xcode=pic13 and -xcode=pic32 can significantly reduce system memory requirements, due to the effect of library code sharing. Every page of code in a shared library compiled -xcode=pic13 or -xcode=pic32 can be shared by every process that uses the library. If a page of code in a shared library contains even a single non-pic (that is, absolute) memory reference, the page becomes nonsharable, and a copy of the page must be created each time a program using the library is executed.

The easiest way to tell whether or not a .o file has been compiled with -xcode=pic13 or -xcode=pic32 is with the nm command:

% nm file.o | grep _GLOBAL_OFFSET_TABLE_
U _GLOBAL_OFFSET_TABLE_

A .o file containing position-independent code contains an unresolved external reference to _GLOBAL_OFFSET_TABLE_, as indicated by the letter U.

To determine whether to use -xcode=pic13 or -xcode=pic32, check the size of the Global Offset Table (GOT) by using elfdump -c (see the elfdump(1) man page for more information) and look for the section header sh_name: .got. The sh_size value is the size of the GOT. If the GOT is less than 8,192 bytes, specify -xcode=pic13; otherwise, specify -xcode=pic32.

In general, use the following guidelines to determine how you should use -xcode:

  • If you are building an executable you should not use -xcode=pic13 or -xcode=pic32.
  • If you are building an archive library only for linking into executables you should not use -xcode=pic13 or -xcode=pic32.
  • If you are building a shared library, start with -xcode=pic13 and once the GOT size exceeds 8,192 bytes, use -xcode=pic32.
  • If you are building an archive library for linking into shared libraries you should just use -xcode=pic32.
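The GOT-size rule for shared libraries is simple enough to encode in a build helper. This is a hypothetical sketch, not part of any tool: the function name and the macro are invented for illustration, and the caller is assumed to have obtained the .got sh_size value already.

```c
#include <stddef.h>

/* 8 Kbytes: the global offset table limit stated for -xcode=pic13. */
#define PIC13_GOT_LIMIT 8192u

/* Returns the -xcode value suggested for a shared library whose GOT is
 * got_bytes long (the sh_size of the .got section). */
const char *suggest_pic_model(size_t got_bytes)
{
    return got_bytes < PIC13_GOT_LIMIT ? "pic13" : "pic32";
}
```

A GOT of 4,096 bytes yields pic13, and a GOT of 8,192 bytes or more yields pic32.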
-xdebugformat=dwarf
Same as the GCC option -gdwarf-2
-xdepend[=yes|no]
Analyzes loops for inter-iteration data dependencies and does loop restructuring.

Loop restructuring includes loop interchange, loop fusion, scalar replacement, and elimination of "dead" array assignments.

If you do not specify -xdepend, the default is -xdepend=no which means the compiler does not analyze loops for data dependencies. If you specify -xdepend, but do not specify an argument, the compiler sets the option to -xdepend=yes which means the compiler analyzes loops for data dependencies.

Dependency analysis is also included with -xautopar or -xparallel. The dependency analysis is done at compile time.

Dependency analysis may help on single-processor systems. However, if you try -xdepend on single-processor systems, you should not use either -xautopar or -xexplicitpar. If either of them is on, then the -xdepend optimization is done for multiple-processor systems.
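Loop interchange, one of the restructurings listed above, can be shown by writing both forms of a loop nest by hand. The sketch below is only an illustration with hypothetical names and fixed array sizes. The interchange is legal here because no iteration depends on any other.

```c
#define ROWS 4
#define COLS 3

/* Column-by-column traversal as it might be written in the source.
 * Consecutive inner-loop accesses are COLS elements apart in memory. */
void double_by_columns(double dst[ROWS][COLS], double src[ROWS][COLS])
{
    for (int j = 0; j < COLS; j++)
        for (int i = 0; i < ROWS; i++)
            dst[i][j] = 2.0 * src[i][j];
}

/* The same nest after interchange: the inner loop now walks along one row,
 * so consecutive accesses are adjacent in memory. */
void double_by_rows(double dst[ROWS][COLS], double src[ROWS][COLS])
{
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
            dst[i][j] = 2.0 * src[i][j];
}
```

Both functions store the same values. Only the order of memory accesses differs, which is why the restructuring can help even on a single-processor system.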

-xhwcprof[={enable|disable}]
Enables compiler support for hardware counter-based profiling.

When -xhwcprof is enabled, the compiler generates information that helps tools match hardware-counter data reference and miss events with associated instructions. Corresponding data-types and structure-members may also be identified in conjunction with symbolic information (produced with -g). This information can be useful in performance analysis and it is not easily identified from profiles based on code addresses, source statements, or routines.

You can compile a specified set of object files with -xhwcprof. However, -xhwcprof is most useful when applied to all object files in the application. This will provide coverage to identify and correlate all memory references distributed in the application's object files.

If you are compiling and linking in separate steps, use -xhwcprof at link time as well. Future extensions to -xhwcprof may require its use at link time.

An instance of -xhwcprof=enable or -xhwcprof=disable overrides all previous instances of -xhwcprof in the same command line.

-xhwcprof is disabled by default. Specifying -xhwcprof without any arguments is equivalent to -xhwcprof=enable.

-xhwcprof requires that optimization be turned on and that the debug data format be set to DWARF (-xdebugformat=dwarf).

The combination of -xhwcprof and -g increases compiler temporary file storage requirements by more than the sum of the increases due to -xhwcprof and -g specified alone.

The following command compiles example.c and specifies support for hardware counter profiling and symbolic analysis of data types and structure members using DWARF symbols:

example% gcc -c -O -xhwcprof -g -xdebugformat=dwarf example.c

-xinline=list
The format of the list for -xinline is as follows: [{%auto,func_name,no%func_name}[,{%auto,func_name,no%func_name}]...]

-xinline tries to inline only those functions specified in the optional list. The list is either empty, or comprised of a comma-separated list of func_name, no%func_name, or %auto, where func_name is a function name. -xinline only has an effect at -O2 or higher.

  • %auto Specifies that the compiler is to attempt to automatically inline all functions in the source file. %auto only takes effect at -xO4 or higher optimization levels. %auto is silently ignored at -xO3 or lower optimization levels.
  • func_name Specifies that the compiler is to attempt to inline the named function.
  • no%func_name Specifies that the compiler is not to inline the named function.
  • no value Specifies that the compiler is not to attempt to inline any functions in the source files.

The list of values accumulates from left to right. So for a specification of -xinline=%auto,no%foo the compiler attempts to inline all functions except foo. For a specification of -xinline=%bar,%myfunc,no%bar the compiler only tries to inline myfunc.

A function is not inlined if any of the following conditions apply. No warning is issued.

  • Optimization is less than -O2.
  • The routine cannot be found.
  • Inlining the routine does not look practicable to the optimizer.
  • The source for the routine is not in the file being compiled.

If you specify multiple -xinline options on the command line, they do not accumulate. The last -xinline on the command line specifies what functions the compiler attempts to inline.
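The left-to-right rule for a single -xinline list can be modeled in a few lines of C. This is a sketch of the documented behavior, not compiler code. The function name is hypothetical, function names in the list are written without a leading %, as in the format line, and the model assumes that an explicit func_name or no%func_name entry outranks %auto wherever it appears in the list.

```c
#include <string.h>

/* Models one -xinline list. Returns 1 if the list asks the compiler to
 * attempt inlining the function called name, and 0 otherwise. */
int inline_requested(const char *list, const char *name)
{
    int auto_all = 0;   /* set once %auto has been seen */
    int named = -1;     /* -1: name not listed, 0: no%name, 1: name */
    size_t name_len = strlen(name);

    while (*list != '\0') {
        size_t len = strcspn(list, ",");   /* length of the current item */

        if (len == 5 && strncmp(list, "%auto", 5) == 0)
            auto_all = 1;
        else if (len == name_len + 3 && strncmp(list, "no%", 3) == 0 &&
                 strncmp(list + 3, name, name_len) == 0)
            named = 0;                     /* later items override earlier ones */
        else if (len == name_len && strncmp(list, name, name_len) == 0)
            named = 1;

        list += len;
        if (*list == ',')
            list++;
    }
    return named >= 0 ? named : auto_all;
}
```

With the list %auto,no%foo, the model returns 0 for foo and 1 for any other function. With bar,myfunc,no%bar, it returns 1 only for myfunc. An empty list returns 0 for every function.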

-xipo[=a]
Replace a with 0, 1, or 2. -xipo without any arguments is equivalent to -xipo=1. -xipo=0 is the default setting and turns off -xipo. With -xipo=1, the compiler performs inlining across all source files.

With -xipo=2, the compiler performs interprocedural aliasing analysis as well as optimizations of memory allocation and layout to improve cache performance.

The compiler performs whole-program optimizations by invoking an interprocedural analysis component. Unlike -xcrossfile, -xipo performs optimizations across all object files in the link step, and is not limited to just the source files of the compile command. However, whole-program optimizations performed with -xipo do not include assembly (.s) source files.

You must specify -xipo both at compile time and at link time.

The -xipo option generates significantly larger object files due to the additional information needed to perform optimizations across files. However, this additional information does not become part of the final executable binary file. Any increase in the size of the executable program is due to the additional optimizations performed. The object files created in the compilation steps have additional analysis information compiled within them to permit crossfile optimizations to take place at the link step.

-xipo is particularly useful when compiling and linking large multi-file applications. Object files compiled with this flag have analysis information compiled within them that enables interprocedural analysis across source and pre-compiled program files.

However, analysis and optimization is limited to the object files compiled with -xipo, and does not extend to object files in the libraries.

-xipo is multiphased, so you need to specify -xipo for each step if you compile and link in separate steps.

Other important information about -xipo:

  • It requires an optimization level of at least -O3.
  • Objects that are compiled without -xipo can be linked freely with objects that are compiled with -xipo.
-xipo_archive=[a]
The -xipo_archive option enables the compiler to optimize object files that are passed to the linker with object files that were compiled with -xipo and that reside in the archive library (.a) before producing an executable. Any object files contained in the library that were optimized during the compilation are replaced with their optimized version.

a is one of the following:

  • writeback The compiler optimizes object files passed to the linker with object files compiled with -xipo that reside in the archive library (.a) before producing an executable. Any object files contained in the library that were optimized during the compilation are replaced with an optimized version.
  • readonly The compiler optimizes object files passed to the linker with object files compiled with -xipo that reside in the archive library (.a) before producing an executable.
  • none There is no processing of archive files.

If you do not specify a setting for -xipo_archive, the compiler sets it to -xipo_archive=none.

-xlibmil
Inlines some library routines for faster execution. This option selects the appropriate assembly language inline templates for the floating-point option and platform for your system.

-xlibmil inlines a function regardless of any specification of the function as part of the -xinline flag.

However, these substitutions can cause the setting of errno to become unreliable. If your program depends on the value of errno, avoid this option.

-xlibmopt
Enables the compiler to use a library of optimized math routines.

The math routine library is optimized for performance and usually generates faster code. The results may be slightly different from those produced by the normal math library. If so, they usually differ in the last bit.

However, these substitutions can cause the setting of errno to become unreliable. If your program depends on the value of errno, avoid this option.

The order on the command line for this library option is not significant.

This option is set by the -fast option.

-xlinkopt[=level]
Instructs the compiler to perform link-time optimizations on relocatable object files. These optimizations are performed at link time by analyzing the object binary code. The object files are not rewritten but the resulting executable code may differ from the original object codes.

You must use -xlinkopt on at least some of the compilation commands for -xlinkopt to be useful at link time. The optimizer can still perform some limited optimizations on object binaries that are not compiled with -xlinkopt.

-xlinkopt optimizes code coming from static libraries that appear on the compiler command line, but it skips and does not optimize code coming from shared (dynamic) libraries that appear on the command line. You can also use -xlinkopt when you build shared libraries (compiling with -G).

level sets the level of optimizations performed, and must be 0, 1, or 2. The optimization levels are:

  • 0 The post-optimizer is disabled. (This is the default.)
  • 1 Perform optimizations based on control flow analysis, including instruction cache coloring and branch optimizations, at link time.
  • 2 Perform additional data flow analysis, including dead-code elimination and address computation simplification, at link time.

If you compile in separate steps, -xlinkopt must appear on both compile and link steps:

example% gcc -c -xlinkopt a.c b.c

example% gcc -o myprog -xlinkopt=2 a.o b.o

Note that the level parameter is only used when the compiler is linking. In the example above, the post-optimization level used is 2 even though the object binaries were compiled with an implied level of 1.

Specifying -xlinkopt without a level parameter implies -xlinkopt=1.

This option is most effective when you use it to compile the whole program, and with profile feedback. Profiling reveals the most and least used parts of the code, and the build then directs the optimizer to focus its effort accordingly. This is particularly important with large applications where optimal placement of code performed at link time can reduce instruction cache misses. A typical sequence of commands is:

example% gcc -o progt -O3 -xprofile=collect:prog file.c

example% progt

example% gcc -o prog -O3 -xprofile=use:prog -xlinkopt file.c

For details on using profile feedback, see -xprofile=p.

You cannot use the link-time post-optimizer with the incremental linker, ild. -xlinkopt sets the default linker to be ld. If you enable the incremental linker explicitly with -xildon and also specify -xlinkopt, -xlinkopt is disabled.

Note that compiling with this option increases link time slightly. Object file sizes also increase, but the size of the executable remains the same. Compiling with -xlinkopt and -g increases the size of the executable by including debugging information.

-xloopinfo
Shows which loops are parallelized and which are not. Gives a short reason for not parallelizing a loop. The -xloopinfo option is valid only if -xautopar or -xparallel is specified; otherwise, the compiler issues a warning.

To achieve faster execution, this option requires a multiprocessor system. On a single-processor system, the generated code usually runs slower.
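
As a rough illustration of the distinction such a report draws, the sketch below contrasts a loop whose iterations are independent with one that carries a dependence from each iteration to the next. The function names are hypothetical, and whether the compiler actually parallelizes either loop is an assumption; the real decision is what -xloopinfo reports.

```c
#include <assert.h>
#include <stddef.h>

/* Each iteration writes only dst[i] and reads only src[i], so no
 * iteration depends on another (assuming dst and src do not overlap).
 * A loop of this shape is a typical candidate for parallelization. */
void scale(double *dst, const double *src, double k, size_t n)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] = k * src[i];
}

/* Iteration i reads a[i - 1], which iteration i - 1 has just written.
 * This loop-carried dependence forces the iterations to run in order. */
void prefix_sum(double *a, size_t n)
{
    assert(a != NULL);
    for (size_t i = 1; i < n; i++)
        a[i] += a[i - 1];
}
```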

-xmemalign=ab
Specify maximum assumed memory alignment and behavior of misaligned data accesses. There must be a value for both a (alignment) and b (behavior). a specifies the maximum assumed memory alignment and b specifies the behavior for misaligned memory accesses. The following lists show the alignment and behavior values for -xmemalign.

Values for a (alignment):

  • 1 Assume at most 1 byte alignment.
  • 2 Assume at most 2 byte alignment.
  • 4 Assume at most 4 byte alignment.
  • 8 Assume at most 8 byte alignment.
  • 16 Assume at most 16 byte alignment.

Values for b (behavior):

  • i Interpret access and continue execution.
  • s Raise signal SIGBUS.
  • f Raise signal SIGBUS for alignments less than or equal to 4; otherwise interpret access and continue execution.

You must specify -xmemalign whenever you want to link to an object file that was compiled with the value of b set to either i or f.

For memory accesses where the alignment is determinable at compile time, the compiler generates the appropriate load/store instruction sequence for that alignment of data.

For memory accesses where the alignment cannot be determined at compile time, the compiler must assume an alignment to generate the needed load/store sequence.

The -xmemalign option allows you to specify the maximum memory alignment of data to be assumed by the compiler in these indeterminable situations. It also specifies the error behavior to be followed at run time when a misaligned memory access does take place.

The following default values only apply when no -xmemalign option is present:

  • -xmemalign=8i for all v8 architectures.
  • -xmemalign=8s for all v9 architectures.

The default when -xmemalign option is present but no value is given is -xmemalign=1i for all -xarch values.

The following list shows how you can use -xmemalign to handle different alignment situations.

  • -xmemalign=1s There are many misaligned accesses so trap handling is too slow.
  • -xmemalign=8i There are occasional, intentional, misaligned accesses in code that is otherwise correct.
  • -xmemalign=8s There should be no misaligned accesses in the program.
  • -xmemalign=2s You want to check for possible odd-byte accesses.
  • -xmemalign=2i You want to check for possible odd-byte access and you want the program to work.
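
To make the notion of an access with unknown alignment concrete, here is a minimal C sketch. The helper names is_aligned and load_u32 are hypothetical and not part of any library. The first checks a pointer's alignment at run time; the second reads a 32-bit value through memcpy, which makes no alignment assumption about its source pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns nonzero if p is aligned to `alignment` bytes.
 * `alignment` must be a power of two (1, 2, 4, 8, 16, ...). */
int is_aligned(const void *p, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)p & (alignment - 1)) == 0;
}

/* Reads a 32-bit value from an address of unknown alignment.
 * memcpy copies byte by byte in the abstract machine, so the read is
 * well defined even when p is not 4-byte aligned. */
uint32_t load_u32(const void *p)
{
    uint32_t v;

    assert(p != NULL);
    memcpy(&v, p, sizeof v);
    return v;
}
```

Code that dereferences a cast pointer such as *(uint32_t *)p instead relies on the assumed alignment and on the misaligned-access behavior selected with -xmemalign.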
-xpagesize=n
Sets the preferred page size for the stack and the heap.

The n value must be one of the following: 8K, 64K, 512K, 4M, 32M, 256M, 2G, 16G, or default.

You must specify a valid page size for the Solaris operating system on the target platform, as returned by getpagesize(3C). If you do not specify a valid page size, the request is silently ignored at run-time. The Solaris operating system offers no guarantee that the page size request will be honored.

You can use pmap(1) or meminfo(2) to determine page size of the target platform.

The -xpagesize option has no effect unless you use it at compile time and at link time.

If you specify -xpagesize=default, the Solaris operating system sets the page size.

Compiling with this option has the same effect as setting the LD_PRELOAD environment variable to mpss.so.1 with the equivalent options, or running the Solaris 9 command ppgsz(1) with the equivalent options before running the program. See the Solaris 9 man pages for details.

This option is a macro for -xpagesize_heap and -xpagesize_stack. These two options accept the same arguments as -xpagesize: 8K, 64K, 512K, 4M, 32M, 256M, 2G, 16G, or default. You can set them both with the same value by specifying -xpagesize or you can specify them individually with different values.
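
The following sketch shows one way to relate the size arguments above to byte counts and to query the base page size of the running system. The function names are hypothetical, and sysconf(_SC_PAGESIZE) is used here as a portable POSIX call; it reports only the base page size, not the full set of sizes that the platform supports.

```c
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* Converts a size string such as "8K", "4M", or "2G" into bytes.
 * Returns 0 for "default" or for any string that cannot be parsed. */
unsigned long long page_size_bytes(const char *s)
{
    char *end;
    unsigned long long n;

    assert(s != NULL);
    n = strtoull(s, &end, 10);
    if (end == s)
        return 0;               /* no leading digits, e.g. "default" */
    switch (*end) {
    case 'K': n <<= 10; break;
    case 'M': n <<= 20; break;
    case 'G': n <<= 30; break;
    default:  return 0;         /* missing or unknown suffix */
    }
    return end[1] == '\0' ? n : 0;
}

/* Returns the base page size of the running system in bytes. */
long base_page_size(void)
{
    return sysconf(_SC_PAGESIZE);
}
```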

-xpagesize_heap=n
Set the page size in memory for the heap.

The value for n must be one of the following: 8K, 64K, 512K, 4M, 32M, 256M, 2G, 16G, or default. You must specify a valid page size for the Solaris operating system on the target platform, as returned by getpagesize(3C). If you do not specify a valid page size, the request is silently ignored at run-time.

You can use pmap(1) or meminfo(2) to determine page size at the target platform.

If you specify -xpagesize_heap=default, the Solaris operating system sets the page size.

Compiling with this option has the same effect as setting the LD_PRELOAD environment variable to mpss.so.1 with the equivalent options, or running the Solaris 9 command ppgsz(1) with the equivalent options before running the program. See the Solaris 9 man pages for details.

The -xpagesize_heap option has no effect unless you use it at compile time and at link time.

-xpagesize_stack=n
Set the page size in memory for the stack.

The value for n must be one of the following: 8K, 64K, 512K, 4M, 32M, 256M, 2G, 16G, or default. You must specify a valid page size for the Solaris operating system on the target platform, as returned by getpagesize(3C). If you do not specify a valid page size, the request is silently ignored at run-time. You can use pmap(1) or meminfo(2) to determine page size at the target platform.

If you specify -xpagesize_stack=default, the Solaris operating system sets the page size.

Compiling with this option has the same effect as setting the LD_PRELOAD environment variable to mpss.so.1 with the equivalent options, or running the Solaris 9 command ppgsz(1) with the equivalent options before running the program. See the Solaris 9 man pages for details.

The -xpagesize_stack option has no effect unless you use it at compile time and at link time.

-xparallel
Parallelizes both the loops that the compiler selects automatically and the loops that the programmer marks explicitly. The -xparallel option is a macro, and is equivalent to specifying all three of -xautopar, -xdepend, and -xexplicitpar. With explicit parallelization of loops, there is a risk of producing incorrect results. If optimization is not at -O3 or higher, optimization is raised to -O3 and a warning is issued.

Avoid -xparallel if you do your own thread management.

To get faster code, this option requires a multiprocessor system. On a single-processor system, the generated code usually runs slower.

If you compile and link in one step, -xparallel links with the microtasking library and the thread-safe C runtime library. If you compile and link in separate steps, and you compile with -xparallel, then link with -xparallel.

-xprefetch[=val[,val]]
Enable prefetch instructions on those architectures that support prefetch, such as the UltraSPARC II processor environment (-xarch=v8plus, v8plusa, v9, or v9a).

Explicit prefetching should only be used under special circumstances that are supported by measurements.

val must be one of the following:

  • latx:factor Adjust the compiler's assumed prefetch-to-load and prefetch-to-store latencies by the specified factor. You can only combine this flag with -xprefetch=auto.
  • [no%]auto [Disable] Enable automatic generation of prefetch instructions
  • [no%]explicit [Disable] Enable explicit prefetch macros
  • yes Obsolete - do not use. Use -xprefetch=auto,explicit instead.

The default for -xprefetch is -xprefetch=auto,explicit.

The sun_prefetch.h header file provides the macros that you can use to specify explicit prefetch instructions. The prefetches are approximately at the place in the executable that corresponds to where the macros appear.
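
As a sketch of what explicit prefetching looks like in source code, the loop below requests data a fixed distance ahead of the element it is summing. It uses the GCC builtin __builtin_prefetch as a stand-in for the sun_prefetch.h macros, which are not shown here; the function name and the distance PREFETCH_AHEAD are hypothetical, and a real distance should come from measurement.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical prefetch distance, in elements. */
#define PREFETCH_AHEAD 16

/* Sums an array while requesting data a fixed distance ahead.
 * __builtin_prefetch is only a hint: it never changes the result,
 * and the hardware is free to ignore it. */
long sum_with_prefetch(const long *a, size_t n)
{
    long total = 0;

    assert(a != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_AHEAD < n)
            __builtin_prefetch(&a[i + PREFETCH_AHEAD], 0, 0); /* read, low reuse */
        total += a[i];
    }
    return total;
}
```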

Prefetch Latency Ratio

The prefetch latency is the hardware delay between the execution of a prefetch instruction and the time the data being prefetched is available in the cache.

The factor must be a positive number of the form n.n.

The compiler assumes a prefetch latency value when determining how far apart to place a prefetch instruction and the load or store instruction that uses the prefetched data. The assumed latency between a prefetch and a load may not be the same as the assumed latency between a prefetch and a store.

The compiler tunes the prefetch mechanism for optimal performance across a wide range of machines and applications. This tuning may not always be optimal. For memory-intensive applications, especially applications intended to run on large multiprocessors, you may be able to obtain better performance by increasing the prefetch latency values. To increase the values, use a factor that is greater than 1 (one). A value between 0.5 and 2.0 will most likely provide the maximum performance.

For applications with datasets that reside entirely within the external cache, you may be able to obtain better performance by decreasing the prefetch latency values. To decrease the values, use a factor that is less than one.

To use the latx:factor suboption, start with a factor value near 1.0 and run performance tests against the application. Then increase or decrease the factor, as appropriate, and run the performance tests again. Continue adjusting the factor and running the performance tests until you achieve optimum performance. When you increase or decrease the factor in small steps, you will see no performance difference for a few steps, then a sudden difference, then it will level off again.

-xprefetch_level=l
Use the -xprefetch_level option to control the aggressiveness of automatic insertion of prefetch instructions as determined with -xprefetch=auto. l must be 1, 2, or 3. The compiler becomes more aggressive, or in other words, introduces more prefetches, with each higher level of -xprefetch_level.

The appropriate value for the -xprefetch_level depends on the number of cache misses the application may have. Higher -xprefetch_level values have the potential to improve the performance of applications.

This option is effective only when you compile with -xprefetch=auto, with optimization level 3 or greater, and generate code for a platform that supports prefetch (v8plus, v8plusa, v9, v9a, v9b, generic64, native64).

-xprefetch_level=1 enables automatic generation of prefetch instructions. -xprefetch_level=2 enables additional generation beyond level 1 and -xprefetch_level=3 enables additional generation beyond level 2.

The default is -xprefetch_level=1 when you specify -xprefetch=auto.

-xprofile=p
Use this option to collect and save execution-frequency data so you can then use the data in subsequent runs to improve performance.

You must specify -xprofile at compile time as well as link time.

Compiling with high optimization levels (for example -O3) is enhanced by providing the compiler with runtime-performance feedback. In order to produce runtime-performance feedback, you must compile with -xprofile=collect, run the executable against a typical data set, and then recompile at the highest optimization level and with -xprofile=use.

Profile collection is safe for multithreaded applications. That is, profiling a program that does its own multitasking (-mt) produces accurate results.

p must be collect[:name] or use[:name].

  • collect[:name] Collects and saves execution-frequency data for later use by the optimizer with -xprofile=use. The compiler generates code to measure statement execution-frequency.

    The name is the name of the program that is being analyzed. This name is optional. If name is not specified, a.out is assumed to be the name of the executable.

    You can set the environment variables SUN_PROFDATA and SUN_PROFDATA_DIR to control where a program compiled with -xprofile=collect stores the profile data. If set, the -xprofile=collect data is written to SUN_PROFDATA_DIR/SUN_PROFDATA.

    If these environment variables are not set, the profile data is written to name.profile/feedback in the current directory, where name is the name of the executable or the name specified in the -xprofile=collect:name flag. -xprofile does not append .profile to name if name already ends in .profile. If you run the program several times, the execution-frequency data accumulates in the feedback file; that is, output from prior executions is not lost.

    If you are compiling and linking in separate steps, make sure that any object files compiled with -xprofile=collect are also linked with -xprofile=collect.

  • use[:name] The program is optimized by using the execution-frequency data generated and saved in the feedback files from a previous execution of the program that was compiled with -xprofile=collect.

    The name is the name of the program that is being analyzed. This name is optional. If name is not specified, a.out is assumed to be the name of the executable.

    Except for the -xprofile option which changes from -xprofile=collect to -xprofile=use, the source files and other compiler options must be exactly the same as those used for the compilation that created the compiled program which in turn generated the feedback file. The same version of the compiler must be used for both the collect build and the use build as well. If compiled with -xprofile=collect:name, the same program name name must appear in the optimizing compilation: -xprofile=use:name.

    The association between an object file and its profile data is based on the UNIX pathname of the object file when it is compiled with -xprofile=collect. In some circumstances, the compiler will not associate an object file with its profile data: the object file has no profile data because it was not previously compiled with -xprofile=collect, the object file is not linked in a program with -xprofile=collect, or the program has never been executed.

    The compiler can also become confused if an object file was previously compiled in a different directory with -xprofile=collect and this object file shares a common basename with other object files compiled with -xprofile=collect but they cannot be uniquely identified by the names of their containing directories. In this case, even if the object file has profile data, the compiler will not be able to find it in the feedback directory when the object file is recompiled with -xprofile=use.

    All of these situations cause the compiler to lose the association between an object file and its profile data.

-xreduction
Turns on reduction recognition during automatic parallelization. -xreduction must be specified with -xautopar or -xparallel; otherwise, the compiler issues a warning.

When reduction recognition is enabled, the compiler parallelizes reductions such as dot products and maximum and minimum finding. These reductions yield roundoffs that differ from those obtained by unparallelized code.
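
The roundoff difference comes from the order of the additions. The sketch below, with hypothetical function names, sums the same array once strictly left to right and once as two partial sums that are combined at the end, which mimics two threads that each reduce half of the data. It is an illustration of the arithmetic only and does not show the code that the compiler generates.

```c
#include <assert.h>
#include <stddef.h>

/* Sums x[0..n-1] strictly left to right, as unparallelized code would.
 * The volatile accumulator forces rounding to float after every step. */
float sum_serial(const float *x, size_t n)
{
    volatile float acc = 0.0f;

    assert(x != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        acc = acc + x[i];
    return acc;
}

/* Sums the two halves separately and then combines the partial sums. */
float sum_two_halves(const float *x, size_t n)
{
    volatile float lo = 0.0f, hi = 0.0f;
    size_t mid = n / 2;

    assert(x != NULL || n == 0);
    for (size_t i = 0; i < mid; i++)
        lo = lo + x[i];
    for (size_t i = mid; i < n; i++)
        hi = hi + x[i];
    return lo + hi;
}
```

For the input {1e8f, 1.0f, -1e8f, 1.0f} in IEEE 754 single precision, the serial sum is 1 and the two-half sum is 0, because adding 1 to a value of magnitude 1e8 is lost to rounding.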

-xregs=r[,r...]
Specifies the usage of registers for the generated code.

r is a comma-separated list that consists of one or more of the following: [no%]appl, [no%]float.

Example: -xregs=appl,no%float

The meanings of the values are:

  • [no%]appl [Does not] Allow the compiler to generate code using the application registers as scratch registers. The application registers are:

    g2, g3, g4 (v8a, v8, v8plus, v8plusa, v8plusb)

    g2, g3 (v9, v9a, v9b)

    It is strongly recommended that all system software and libraries be compiled using -xregs=no%appl. System software (including shared libraries) must preserve these registers' values for the application. Their use is intended to be controlled by the compilation system and must be consistent throughout the application.

    In the SPARC ABI, these registers are described as application registers. Using these registers can increase performance because fewer load and store instructions are needed. However, such use can conflict with some old library programs written in assembly code.

  • [no%]float [Does not] Allow the compiler to generate code by using the floating-point registers as scratch registers for integer values. Use of floating-point values may use these registers regardless of this option. If you want your code to be free of all references to floating point registers, you need to use -xregs=no%float and also make sure your code does not use floating point types in any way.

    The default is -xregs=appl,float.

    It is strongly recommended that you compile code intended for shared libraries that will link with applications, with -xregs=no%appl,float. At the very least, the shared library should explicitly document how it uses the application registers so that applications linking with those libraries know how to cope with the issue.

    For example, an application using the registers in some global sense (such as using a register to point to some critical data structure) would need to know exactly how a library with code compiled without -xregs=no%appl is using the application registers in order to safely link with that library.

-xrestrict
Treats pointer-valued function parameters as restricted pointers.

If -xrestrict is specified, all pointer parameters in the entire C file are treated as restricted.

This command-line option can be used on its own, but it is best used with optimization. For example, the command:

% gcc -O3 -xrestrict prog.c

treats all pointer parameters in the file prog.c as restricted pointers.

This option is off by default.
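
For comparison, the sketch below marks pointer parameters as restricted in the source itself by using the C99 restrict qualifier. Treating -xrestrict as equivalent to writing restrict on every pointer parameter is an assumption made for illustration, and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* restrict is a promise from the programmer that dst and src never
 * refer to overlapping memory during the call. The compiler may then
 * reorder or combine the loads and stores freely. Calling this
 * function with overlapping arrays is undefined behavior. */
void add_scaled(float *restrict dst, const float *restrict src,
                float k, size_t n)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] += k * src[i];
}
```

The same caution applies to the option: it is suitable only when no function in the file is ever called with pointer arguments that alias one another.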

-xsafe=mem
Allows the compiler to assume no memory-based traps occur.

This option grants permission to use the speculative load instruction on V9 machines. It is only effective when you specify -O3 optimization and -xarch=v8plus|v8plusa|v9|v9a.

Note - Because non-faulting loads do not cause a trap when a fault such as address misalignment or segmentation violation occurs, you should use this option only for programs in which such faults cannot occur. Because few programs incur memory-based traps, you can safely use this option for most programs. Do not use this option for programs that explicitly depend on memory-based traps to handle exceptional conditions.

-xspace
Performs no optimizations or loop parallelization that would increase code size.

Example: The compiler will not unroll or parallelize loops if doing so increases code size.

-xtarget=t
Specifies the target system for instruction set and optimization.

The value of t must be one of the following: native, generic, system-name.

The -xtarget option is a macro that permits a quick and easy specification of the -xarch, -xchip, and -xcache combinations that occur on real systems. The only meaning of -xtarget is in its expansion.

  • native Gets the best performance on the host system.

    The compiler generates code for the best performance on the host system. It determines the available architecture, chip, and cache properties of the machine on which the compiler is running.

  • generic Gets the best performance for generic architecture, chip, and cache.

    The compiler expands -xtarget=generic to:

    -xarch=generic -xchip=generic -xcache=generic

    This is the default value.

  • system-name Gets the best performance for the specified system.

The performance of some programs may benefit by providing the compiler with an accurate description of the target computer hardware. When program performance is critical, the proper specification of the target hardware could be very important. This is especially true when running on the newer SPARC processors. However, for most programs and older SPARC processors, the performance gain is negligible and a generic specification is sufficient.

Each specific value for -xtarget expands into a specific set of values for the -xarch, -xchip, and -xcache options.

-xunroll=n
Suggests to the optimizer to unroll loops n times. n is a positive integer. When n is 1, it is a command, and the compiler unrolls no loops. When n is greater than 1, the -xunroll=n merely suggests to the compiler that it unroll loops n times.
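
To show what unrolling means, the sketch below writes out by hand a loop unrolled four times next to the original loop. The function names are hypothetical, and the code that the compiler actually generates for -xunroll=4 may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Reference loop: one element per iteration. */
long sum_rolled(const int *a, size_t n)
{
    long total = 0;

    assert(a != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* The same loop unrolled four times: each pass handles four elements,
 * so the loop test and branch run about a quarter as often. */
long sum_unrolled4(const int *a, size_t n)
{
    long total = 0;
    size_t i = 0;

    assert(a != NULL || n == 0);
    for (; i + 4 <= n; i += 4)
        total += (long)a[i] + a[i + 1] + a[i + 2] + a[i + 3];
    for (; i < n; i++)          /* leftover elements when n % 4 != 0 */
        total += a[i];
    return total;
}
```

The larger loop body is also why unrolling increases code size.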
-xvector[={yes|no}]
Enable automatic generation of calls to the vector library functions. You must use default rounding mode by specifying -fround=nearest when you use this option.

-xvector=yes permits the compiler to transform math library calls within loops into single calls to the equivalent vector math routines when such transformations are possible. Such transformations could result in a performance improvement for loops with large loop counts.

If you do not specify -xvector, the default is -xvector=no. -xvector=no undoes a previously specified -xvector=yes. If you specify -xvector but do not supply a value, the default is -xvector=yes.

If you use -xvector on the command line without previously specifying -xdepend, -xvector triggers -xdepend.

The compiler includes the libmvec libraries in the load step.

If you compile and link with separate commands, be sure to use -xvector in the linking gcc command.