Methods to Optimize Code Size

This section provides guidance on how to produce smaller object files and smaller executables when using the optimizing features of Intel compilers.

There are two compiler options that are designed to prioritize code size over performance:

Option: Os

Result: Favors size over speed

Notes: This option enables optimizations that do not increase code size; it produces smaller code size than option O2. Option Os disables some optimizations that may increase code size for a small speed benefit.

Option: O1

Result: Minimizes code size

Notes: Compared to option Os, option O1 disables even more optimizations that are generally known to increase code size. Specifying option O1 implies option Os. As an intermediate step in reducing code size, you can replace option O3 with option O2 before specifying option O1. Option O1 may improve performance for applications with very large code size, many branches, and execution time not dominated by code within loops.

For more information about compiler options mentioned in this topic, see their full descriptions in the Compiler Reference.

The rest of this topic briefly discusses other methods that may help you further improve code size even when compared to the default behaviors of options Os and O1.

Disable or Decrease the Amount of Inlining

Inlining replaces a call to a function with the body of the function. This lets the compiler optimize the code for the inlined function in the context of its caller, usually yielding more specialized and better performing code. This also removes the overhead of calling the function at runtime.

However, replacing a call to a function with the code for that function usually increases code size, sometimes substantially. To eliminate this code size increase, at the cost of the potential performance improvement, you can disable inlining.

As an alternative to completely disabling inlining, you can decrease the default amount of inlining by specifying an inline factor n that is less than the default value of 100. An inline factor of n scales the default values of the main inlining parameters by n%.
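The following is a minimal sketch, not taken from the compiler documentation, of code whose size is sensitive to inlining. The names clamp and clamp_rgb_sum are hypothetical. Whether the compiler actually inlines clamp depends on the optimization level and the inlining parameters in effect.

```c
/* Hypothetical helper: small enough that a compiler may choose to inline it. */
static int clamp(int value, int lo, int hi)
{
    if (value < lo) return lo;
    if (value > hi) return hi;
    return value;
}

/* Three call sites. If clamp() is inlined, its body may be copied into
   this function three times; with inlining disabled, each call site is
   emitted as a call to a single copy of clamp(). */
int clamp_rgb_sum(int r, int g, int b)
{
    return clamp(r, 0, 255) + clamp(g, 0, 255) + clamp(b, 0, 255);
}
```

One way to observe the effect is to compile the file to an object file with and without the inlining options listed in this section and compare the resulting code sizes.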

Use options to disable inlining:

Linux and macOS

fno-inline

Windows

Ob0

Use options to reduce inlining and factor the main inlining parameters:

Linux and macOS

inline-factor=n

Windows

Qinline-factor:n

Use options to fine tune the main inlining parameters:

Linux and macOS

Windows

Strip Symbols from Your Binaries

You can specify a compiler option to omit debugging and symbol information from the executable without sacrificing its operability.

Use options:

Linux

-Wl,--strip-all

Windows

None

Dynamically Link Intel-provided Libraries

By default, some of the Intel support and performance libraries are linked statically into an executable. As a result, the library code is copied into every executable that is built, so the same code is duplicated across executables. Linking these libraries dynamically may reduce executable size.

Use Options:

Linux and macOS

shared-intel

Windows

MD

Note

Option MD affects all libraries, not only the Intel-provided ones.

Exclude Unused Code and Data from the Executable

Programs often contain dead code or data that is not used during their execution. Even if no expensive whole-program inter-procedural analysis is made at compile time to identify dead code, there are compiler options you can specify to eliminate unused functions and data at link time.

This method is often referred to as function-level or data-level linking.
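As an illustration, the sketch below (hypothetical names, assumed for this example only) contains one function and one data object that nothing references. When each function and data item is placed in its own section, the linker may be able to discard the unreferenced ones from the final executable.

```c
/* Referenced by room_area() below, so it must be kept. */
int area(int width, int height)
{
    return width * height;
}

/* Hypothetical function that no other code calls. With function-level
   linking, the linker may drop it from the executable. */
int perimeter_unused(int width, int height)
{
    return 2 * (width + height);
}

/* Hypothetical data that no other code reads. With data-level linking,
   the linker may drop it as well. */
const int lookup_unused[256] = { 1 };

/* Assume the rest of the program calls this function. */
int room_area(void)
{
    return area(3, 4);
}
```

Because these symbols have external linkage, the compiler alone cannot remove them from the object file; the removal, if it happens, takes place at link time.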

Use Options:

Linux and macOS

-fdata-sections -ffunction-sections -Wl,--gc-sections

Windows

/Gw /Gy /link /OPT:REF

Note

Of the options listed above, the following are passed to the linker:

Linux and macOS

-Wl,--gc-sections

Windows

/link /OPT:REF

Disable Recognition and Expansion of Intrinsic Functions

When intrinsic functions are recognized, the compiler can expand them inline, or it can assume that a faster implementation exists in a library and link that implementation in. By default, inline expansion of intrinsic functions is enabled.

In some cases, disabling this behavior may noticeably improve the size of the produced object or binary.
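The sketch below shows two standard C library calls that compilers commonly treat as intrinsic (built-in) functions. The names copy_point and hello_length are hypothetical. Whether inline expansion makes the code larger or smaller than a library call depends on the call and its arguments, so treat this only as a starting point for experiments.

```c
#include <stddef.h>
#include <string.h>

struct point { int x; int y; };

/* Fixed-size copy. When memcpy is recognized, the compiler may expand it
   into a few move instructions; when recognition is disabled, an ordinary
   call to the library memcpy is emitted instead. */
void copy_point(struct point *dst, const struct point *src)
{
    memcpy(dst, src, sizeof *dst);
}

/* strlen on a string literal. When strlen is recognized, the compiler may
   fold this call to a constant; otherwise a library call is emitted. */
size_t hello_length(void)
{
    return strlen("hello");
}
```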

Use Options:

Linux and macOS

fno-builtin

Windows

Oi-

Optimize Exception Handling Data

If a program requires support for exception handling, the compiler creates a special section containing DWARF directives that are used by the Linux and macOS runtime to unwind and catch an exception.

This information is found in the .eh_frame section and may be shrunk using the compiler options listed below.

Use Options:

Linux and macOS

fno-exceptions or fno-asynchronous-unwind-tables

Windows

None

Read the compiler option descriptions, which explain what the defaults and behavior are for each target platform.

Disable Passing Arguments in Registers Instead of on the Stack

You can specify an option that causes the compiler to pass arguments in registers rather than on the stack. This can yield faster code.

However, doing this may require the compiler to create an additional entry point for any function that can be called outside the code being compiled.

In many cases, this will lead to an increase in code size. To prevent this increase in code size, you can disable this optimization.

Use Options:

Linux and macOS

qopt-args-in-regs=none

Windows

Qopt-args-in-regs:none

Disable Loop Unrolling

Unrolling a loop increases the size of the loop proportionally to the unroll factor.

Disabling (or limiting) this optimization may help reduce code size at the expense of performance.
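To make the size effect concrete, the sketch below writes out by hand roughly what unrolling by a factor of 4 does to a simple loop. The function names are hypothetical, and the code a compiler actually generates will differ in detail; the point is only that the loop body appears four times and a remainder loop is added.

```c
#include <stddef.h>

/* Rolled form: one copy of the loop body. */
long sum_rolled(const int *data, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; ++i)
        total += data[i];
    return total;
}

/* Hand-written approximation of the same loop unrolled by 4:
   four copies of the body plus a remainder loop for leftover elements. */
long sum_unrolled_by_4(const int *data, size_t n)
{
    long total = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        total += data[i];
        total += data[i + 1];
        total += data[i + 2];
        total += data[i + 3];
    }
    for (; i < n; ++i)  /* remainder iterations */
        total += data[i];
    return total;
}
```

Both functions return the same result; the unrolled form trades extra instructions for fewer loop-control branches.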

Use Options:

Linux and macOS

unroll=0

Windows

Qunroll:0

Additional information:

This option is already the default if you specify option Os or option O1.

Disable Automatic Vectorization

The compiler looks for opportunities to use SIMD instructions, such as Intel® Streaming SIMD Extensions (Intel® SSE) and Intel® Advanced Vector Extensions (Intel® AVX), to improve application performance. This optimization is called automatic vectorization.

In most cases, this optimization involves transformation of loops and increases code size, in some cases significantly.

Disabling this optimization may help reduce code size at the expense of performance.
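The loop below is a typical candidate for automatic vectorization. It is a sketch with a hypothetical function name; whether the compiler vectorizes it depends on the target and the options in effect. When it does, the generated code commonly contains a vector loop in addition to scalar code for leftover iterations, which is one reason the code can grow.

```c
#include <stddef.h>

/* y[i] = a * x[i] + y[i] for each element. The restrict qualifiers tell
   the compiler that x and y do not overlap, which makes vectorization
   easier to prove safe. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

Comparing the object code size of this function with and without the options listed in this section is a simple way to measure the effect on your own target.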

Use Options:

Linux and macOS

no-vec

Windows

Qvec-

Additional information:

Depending on code characteristics, this option can sometimes increase binary size.

Avoid References to Compiler-specific Libraries

While compiler-specific libraries are intended to improve the performance of your application, they increase the size of your binaries.

The compiler options listed below may reduce code size by avoiding references to these libraries.

Use Options:

Linux and macOS

ffreestanding

Windows

Qfreestanding-

Avoid Unnecessary 16-Byte Alignment

This method applies only to Linux systems on IA-32 architecture.

This method should only be used in certain situations that are well understood. It can potentially cause correctness issues when linking with other objects or libraries that aren't built with this option.

The 32-bit Linux ABI states that stacks need only maintain 4-byte alignment. However, for performance reasons on modern architectures, GCC and ICC maintain an alignment of 16 bytes on the stack. Maintaining 16-byte alignment may require additional instructions to adjust the stack on function entries where no stack adjustment would otherwise be needed. This can impact code size, especially in code that consists of many small routines.

You can specify a compiler option that reverts ICC to maintaining 4-byte alignment, which can eliminate the need for extra stack-adjustment instructions in some cases.

Use this option only if one of the following is true:

Use Options:

Linux

falign-stack=assume-4-byte

macOS

None

Windows

None

Additional information:

Depending on code characteristics, this option can sometimes increase binary size.

Use Interprocedural Optimization

Using interprocedural optimization (IPO) may reduce code size. It enables dead code elimination and suppresses generation of code for functions that are always inlined or that are proven never to be called during execution.

Use Options:

Linux and macOS

ipo

Windows

Qipo

Note

This method is not recommended if you plan to ship object files as part of a final product.