Humboldt-Universität zu Berlin - Mathematisch-Naturwissenschaftliche Fakultät - Institut für Physik

Intel compilers

Version:

15.0.1

Architectures:

x86_64

Invocation:

icc   options prog.c      C compiler
icpc  options prog.C      C++ compiler
ifort options prog.f90    Fortran compiler
ifort options prog.f      same for fixed source format
idb                       Debugger

(if /usr/global/intel/bin is in PATH).

Documentation:

  • man pages for icc, icpc, ifort
  • /usr/global/intel/Documentation/en_US/documentation_c.htm  (C, C++, ..)
  • /usr/global/intel/Documentation/en_US/documentation_f.htm  (Fortran, ..)

Optimisation options:

-O0     optimisation disabled
-O2     default; recommended
-O3     more aggressive
-fast   optimise for the local processor
  • The Intel compilers used to be picky about generating highly optimised code for (compatible) processors not made by Intel, as discussed here.
  • -fast is shorthand for a combination of useful options, including one that targets the processor on which the compiler is running. The details can be displayed e.g. with ifort -help | grep fast.

The compiler can generate optimised code for various processors, based mainly on their support for SSE (Streaming SIMD Extensions). Depending on the options chosen, a given hardware feature is either required (the program cannot run without it) or optional (the program decides at runtime between different code paths).
This is controlled, for example, by the following compiler switches:

 

feature   support will be
          required          optional     note
SSE       -msse, -xSSE      -axSSE       32bit only
SSE2      -msse2, -xSSE2    -axSSE2      32bit only
SSE3      -xSSE3            -axSSE3
SSSE3     -xSSSE3           -axSSSE3
SSE4.1    -xSSE4.1          -axSSE4.1
SSE4.2    -xSSE4.2          -axSSE4.2
  • 32bit: no specific processor is assumed by default.
  • 64bit: SSE2 is always assumed (but cannot be specified by a switch).
  • options -x... and -ax... can be combined.
  • example (64bit):
      ifort -fast -xSSE2 -axSSE4.1 ...
    generates code which requires SSE2 and uses SSE4.1 if available.
  • You can find out about the SSE capabilities of a processor by looking at the flags field of /proc/cpuinfo.
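
As an illustration of the /proc/cpuinfo lookup mentioned above, the following is a minimal sketch in C, assuming a Linux system. The function names has_cpu_flag and cpu_supports are hypothetical helpers, not part of the Intel toolchain. Note that the kernel uses its own flag names: sse, sse2, pni (for SSE3), ssse3, sse4_1 and sse4_2.

```c
#include <stdio.h>
#include <string.h>

/* Return 1 if `flag` occurs as a whole whitespace-separated word in `flags`,
   0 otherwise. (Hypothetical helper for illustration.) */
int has_cpu_flag(const char *flags, const char *flag)
{
    size_t len = strlen(flag);
    const char *p = flags;

    if (len == 0)
        return 0;
    while ((p = strstr(p, flag)) != NULL) {
        int starts_word = (p == flags) || p[-1] == ' ' || p[-1] == '\t';
        char end = p[len];
        int ends_word = (end == '\0' || end == ' ' || end == '\t' || end == '\n');
        if (starts_word && ends_word)
            return 1;
        p += len;   /* not a whole word: continue searching after this hit */
    }
    return 0;
}

/* Look up `flag` in the first "flags" line of /proc/cpuinfo (Linux only).
   Returns 1 if present, 0 if absent, -1 if the file cannot be opened. */
int cpu_supports(const char *flag)
{
    char line[8192];
    FILE *f = fopen("/proc/cpuinfo", "r");
    int result = 0;

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        if (strncmp(line, "flags", 5) == 0) {
            const char *colon = strchr(line, ':');
            if (colon != NULL)
                result = has_cpu_flag(colon + 1, flag);
            break;
        }
    }
    fclose(f);
    return result;
}
```

For example, cpu_supports("sse4_1") would indicate whether the optional SSE4.1 code path from the example above can be taken on the current machine.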

Auto-vectorisation

-O2 and -fast imply auto-vectorisation. This may substantially speed up sequential, non-recursive loops by using the SSE features of the processor.

Successful vectorisation is reported at compile time with

      -vec-report1
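
To make the distinction between vectorisable and non-vectorisable loops concrete, here is a small sketch in C. The function names and the file name loops.c are chosen for illustration only. The first loop has independent iterations and is a typical candidate for vectorisation; the second carries a dependence from one iteration to the next (a recurrence) and cannot be vectorised in this form. Whether the compiler actually vectorises a given loop depends on its own analysis, for instance of possible pointer aliasing.

```c
#include <stddef.h>

/* y[i] = a * x[i] + y[i]: every iteration is independent of the others,
   so several elements can be processed by one SIMD instruction. */
void saxpy(size_t n, float a, const float *x, float *y)
{
    size_t i;
    for (i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* Running sum: iteration i needs the result of iteration i-1,
   so the iterations cannot simply be executed side by side. */
void prefix_sum(size_t n, float *y)
{
    size_t i;
    for (i = 1; i < n; i++)
        y[i] = y[i] + y[i - 1];
}
```

Compiling this file with something like icc -c -O2 -vec-report1 loops.c should show which of the two loops was vectorised.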
    

Auto-parallelisation

Parallelisation for multiprocessor systems (SMP) is enabled with
      -parallel -par-threshold0

Successful parallelisation is reported at compile time with
      -par-report1

At runtime, a process monitor such as top shows a single process with a CPU usage close to (ncpus x 100)%, together with the total CPU time.
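
The kind of loop that auto-parallelisation targets can be sketched as follows. This is an illustrative example with hypothetical names (fill_table, table.c), not code from the compiler documentation. Each iteration writes only its own element and reads no data produced by another iteration, so the iteration range can in principle be split across threads. With -par-threshold0 the compiler is asked to parallelise such loops regardless of how much work it estimates they contain.

```c
#include <stddef.h>

/* out[i] depends only on the loop index i and on loop-invariant input,
   so different threads can each fill a separate part of the array. */
void fill_table(size_t n, double factor, double *out)
{
    size_t i;
    for (i = 0; i < n; i++)
        out[i] = factor * (double)i * (double)i;
}
```

A compile line such as icc -c -O2 -parallel -par-threshold0 -par-report1 table.c should report whether the loop was parallelised.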

License:

floating
server: liz.cms.hu-berlin.de

last modified: B Bunk, 30.06.2015