|Stable release||11.1 / 2009-06-23|
|Operating system||Linux, Microsoft Windows and Mac OS X|
Intel supports compilation for its IA-32, Intel 64, and Itanium 2 processors, and for certain compatible non-Intel processors, such as certain AMD processors. Developers should check the system requirements. The Intel C++ Compiler for IA-32 and Intel 64 features an automatic vectorizer that can generate SSE, SSE2, SSE3 and SSE4 SIMD instructions; the embedded variant can generate Intel Wireless MMX and MMX 2 instructions. Since its introduction, the Intel C++ Compiler for IA-32 has greatly increased adoption of SSE2 in Windows application development.
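As an illustration of the kind of loop an automatic vectorizer targets, the following minimal sketch shows a unit-stride loop with no dependence between iterations. The function name `saxpy` is chosen here for illustration and does not come from Intel's documentation; whether a given compiler actually emits SIMD instructions for it depends on the compiler version and options.

```cpp
#include <cstddef>
#include <vector>

// Computes y[i] = a * x[i] + y[i] for each element.
// Each iteration touches consecutive memory and is independent of the
// others, which is the pattern a vectorizer can map onto SIMD instructions.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    const std::size_t n = x.size() < y.size() ? x.size() : y.size();
    const float* xp = x.data();
    float* yp = y.data();
    for (std::size_t i = 0; i < n; ++i) {
        yp[i] = a * xp[i] + yp[i];
    }
}
```

The loop is written with plain pointers and a simple trip count so that nothing in the source obscures the independence of the iterations.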
Intel C++ Compiler further supports both OpenMP 3.0 and automatic parallelization for symmetric multiprocessing. With the add-on capability Cluster OpenMP, the compiler can also automatically generate Message Passing Interface calls for distributed memory multiprocessing from OpenMP directives.
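A minimal sketch of an OpenMP directive of the kind described above is shown below. The function name `sum_of_squares` is illustrative only. The `parallel for` construct and the `reduction` clause are part of the OpenMP standard; if OpenMP is not enabled at compile time, the pragma is ignored and the loop runs serially with the same result.

```cpp
#include <cstddef>
#include <vector>

// Sums the squares of all elements. The directive asks the compiler to
// split the iterations across threads and to combine the per-thread
// partial sums into `total` when the loop finishes.
double sum_of_squares(const std::vector<double>& v) {
    double total = 0.0;
    const long n = static_cast<long>(v.size());
    #pragma omp parallel for reduction(+:total)
    for (long i = 0; i < n; ++i) {
        const double e = v[static_cast<std::size_t>(i)];
        total += e * e;
    }
    return total;
}
```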
Intel C++ Compiler belongs to the family of compilers with the Edison Design Group frontend (like the SGI MIPSpro, Comeau C++, Portland Group, and others). The compiler is also notable for being widely used for SPEC CPU Benchmarks of IA-32, x86-64, and Itanium 2 architectures.
The Intel C++ Compiler is available in four forms. It is part of Intel Parallel Studio, the Intel C++ Compiler Professional Edition package, the Intel Compiler Suite package and the Intel Cluster Toolkit, Compiler Edition. The Intel Software Products site provides more information.
Intel tunes its compilers to optimize for its hardware platforms, to minimize stalls and to produce code that executes in the fewest cycles. The Intel C++ Compiler supports three separate high-level techniques for optimizing the compiled program: interprocedural optimization (IPO), profile-guided optimization (PGO), and high-level optimizations (HLO). It also supports tools and techniques for adding parallelism to applications and maintaining it.
Profile-guided optimization refers to a mode of optimization in which the compiler can access data from a sample run of the program across a representative input set. The data indicate which areas of the program are executed more frequently and which are executed less frequently. All optimizations benefit from profile-guided feedback because they rely less on heuristics when making compilation decisions.
High-level optimizations are optimizations performed on a version of the program that more closely represents the source code. This includes loop interchange, loop fusion, loop unrolling, loop distribution, data prefetch, and more. These optimizations are usually very aggressive and may take considerable compilation time.
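Loop interchange, one of the transformations listed above, can be sketched by hand as follows. Both functions are illustrative and written for this example; they show the before and after shapes of the loop nest, not the compiler's internal representation. The matrix is assumed to be stored row by row in a flat vector.

```cpp
#include <cstddef>
#include <vector>

// Before interchange: the inner loop walks down a column, so consecutive
// accesses are `cols` elements apart in memory.
double sum_column_first(const std::vector<double>& m,
                        std::size_t rows, std::size_t cols) {
    double s = 0.0;
    for (std::size_t j = 0; j < cols; ++j) {
        for (std::size_t i = 0; i < rows; ++i) {
            s += m[i * cols + j];
        }
    }
    return s;
}

// After interchange: the inner loop walks along a row, so consecutive
// accesses are adjacent in memory, which is friendlier to the cache.
double sum_row_first(const std::vector<double>& m,
                     std::size_t rows, std::size_t cols) {
    double s = 0.0;
    for (std::size_t i = 0; i < rows; ++i) {
        for (std::size_t j = 0; j < cols; ++j) {
            s += m[i * cols + j];
        }
    }
    return s;
}
```

The two versions visit the same elements and differ only in traversal order, which is why a compiler may swap the loops when it can prove that doing so is safe.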
Interprocedural optimization applies typical compiler optimizations (such as constant propagation) but using a broader scope that may include multiple procedures, multiple files, or the entire program.
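The effect of constant propagation across a procedure boundary can be illustrated with a small hand-written example. All function names here are hypothetical, and the "specialized" version is only a sketch of what a compiler with whole-program visibility could in principle derive; it is not output from the Intel compiler.

```cpp
#include <vector>

// General-purpose helper: both the test and the multiply depend on `factor`.
int scale(int value, int factor) {
    if (factor == 0) {
        return 0;
    }
    return value * factor;
}

// The only caller always passes the constant 2. Seen in isolation, `scale`
// must handle any factor; seen together with this caller, the argument is
// known to be 2.
int sum_scaled(const std::vector<int>& values) {
    int total = 0;
    for (int v : values) {
        total += scale(v, 2);
    }
    return total;
}

// Hand-written equivalent after propagating factor == 2 into `scale` and
// inlining it: the zero test disappears and the multiplier is a constant.
int sum_scaled_specialized(const std::vector<int>& values) {
    int total = 0;
    for (int v : values) {
        total += v * 2;
    }
    return total;
}
```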
The compilers include a parallel debugger extension, Intel Threading Building Blocks, lambda function support, and a source checker tool for use with threaded code.
Early versions of ICC for Linux, which predate GCC 3.x, used the Dinkumware name mangling scheme in order to provide a more standard implementation of C++ than GCC 2.x. This made their ABI incompatible with both GCC 2.x and GCC 3.x. Intel removed the Dinkumware libraries in the 10.0 release (June 2007). Since then, the compiler has remained compatible with GCC 3.2 and later.
The following versions of Intel C++ Compiler have been released:
|Compiler version||Release date||Major New Features|
|Intel C++ Compiler 11.1||June 23, 2009||Support for the latest Intel SSE4.2, AVX and AES instructions. Parallel Debugger Extension. Improved integration into Microsoft Visual Studio, Eclipse CDT 5.0 and Mac Xcode IDE.|
|Intel C++ Compiler 11.0||November 2008||Initial C++0x support. VS2008 IDE integration on Windows. OpenMP 3.0. Source Checker for static memory/parallel diagnostics.|
|Intel C++ Compiler 10.1||November 7, 2007||New OpenMP compatibility runtime library: code built with the new OpenMP RTL can be mixed with libraries and objects built by Visual C++. To use the new libraries, specify the option "-Qopenmp /Qopenmp-lib:compat" on Windows, and "-openmp -openmp-lib:compat" on Linux. This version of the Intel compiler supports more intrinsics from Visual Studio 2005. VS2008 support is command line only in this release; IDE integration is not yet supported.|
|Intel C++ Compiler 10.0||June 5, 2007||Improved parallelizer and vectorizer, Streaming SIMD Extensions 4 (SSE4), new and enhanced optimization reports for advanced loop transformations, new optimized exception handling implementation.|
|Intel C++ Compiler 9.0||June 14, 2005||AMD64 architecture (for Windows), software-based speculative pre-computation (SSP) optimization, improved loop optimization reports.|
|Intel C++ Compiler 8.1||September 2004||AMD64 architecture (for Linux).|
|Intel C++ Compiler 8.0||December 15, 2003||Precompiled headers, code-coverage tools.|
|Intel C++ Compiler 7.1||March 2003||Partial support for the Intel Pentium 4 with Streaming SIMD Extensions 3 (SSE3).|
|Intel C++ Compiler 7.0||November 25, 2002|||
|Intel C++ Compiler 6.0||April 24, 2002|||
In addition, the following "prototype" editions have been made available:
|Compiler version||Release date||Major New Features|
|Intel STM Compiler Prototype Edition||September 17, 2007||Prototype version of the Intel compiler that implements support for Software Transactional Memory (STM). The Intel STM Compiler supports Linux and Windows, producing 32-bit code for x86 (Intel and AMD) processors. Intel stated the belief that "The availability of such a prototype compiler allows unprecedented exploration by C / C++ software developers of a promising technique to make programming for multi-core easier." The STM compiler requires that the Intel compiler already be installed.|
|Intel Concurrent Collections for C/C++ 0.3||September, 2008||Intel Concurrent Collections for C/C++ provides a mechanism for constructing C++ programs that execute in parallel. It allows developers to ignore issues of parallelism such as low-level threading constructs or scheduling/distribution of computations. The model allows developers to specify high-level computational steps including inputs and outputs without imposing unnecessary ordering on their execution. Code within the computational steps is written using standard serial constructs of the C++ language. Data is either local to a computational step or it is explicitly produced and consumed by them. It supports multiple styles of parallelism (e.g., data, task, pipeline parallel).|
Documentation can be found at the Intel Software Technical Documentation site.
|Windows||Linux and Mac OS X||Comment|
|/O1||-O1||Optimize for size|
|/O2||-O2||Optimize for speed and enable some optimizations|
|/O3||-O3||Enable all O2 optimizations plus intensive loop optimizations|
|/QxO||-xO||Enables SSE3, SSE2 and SSE instruction set optimizations for non-Intel CPUs|
|/fast||-fast||Shorthand. On Windows this equates to "/O3 /Qipo /xT /no-prec-div"; on Linux, "-O3 -ipo -static -xHOST -no-prec-div". Note that the processor-specific optimization flag (-xHOST) optimizes for the processor on which the code is compiled; it is the only flag in -fast that may be overridden.|
|/Qprof-gen||-prof_gen||Compile the program and instrument it for a profile generating run.|
|/Qprof-use||-prof_use||May only be used after running a program that was previously compiled using prof_gen. Uses profile information during each step of the compilation process.|
The Intel compiler provides debugging information in the formats that are standard for the common debuggers (DWARF 2 on Linux, as used by gdb, and COFF on Windows). The flags to compile with debugging information are /Zi on Windows and -g on Linux.
While the Intel compiler can generate gprof-compatible profiling output, Intel also provides a kernel-level, system-wide statistical profiler as a separate product called VTune. VTune features an easy-to-use GUI (integrated into Visual Studio for Windows, Eclipse for Linux) as well as a command line interface.
The 11.x releases of the compiler introduced the Parallel Debugger Extension, which provides techniques for debugging threaded applications. It can be used with other, compatible compilers, such as Microsoft Visual C++ on Windows (as available in Visual Studio 2005 and 2008) and gcc on Linux.
Code produced by the Intel compiler or using the Intel function libraries has deliberately inferior performance on AMD and VIA processors. The reason is that the compiler or library can make multiple versions of a piece of code, each optimized for a certain processor and instruction set, for example SSE2, SSE3, etc. The system includes a function that detects which type of CPU it is running on and chooses the optimal code path for that CPU. This is called a CPU dispatcher. However, the Intel CPU dispatcher does not only check which instruction set is supported by the CPU; it also checks the vendor ID string. If the vendor string is "GenuineIntel", it uses the optimal code path. If the CPU is not from Intel then, in most cases, it will run the slowest possible version of the code, even if the CPU is fully compatible with a better version.
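The difference between dispatching on supported instruction sets and dispatching on the vendor string can be sketched as follows. This is a simplified illustration of the behavior described above, not Intel's actual dispatcher code: the `CpuInfo` structure, the `CodePath` enumeration and both selection functions are hypothetical, and a real dispatcher would fill in the CPU data by querying the processor rather than receiving it as an argument.

```cpp
#include <string>

// Hypothetical description of the CPU the program is running on.
struct CpuInfo {
    std::string vendor;  // vendor ID string, e.g. "GenuineIntel"
    bool has_sse2;
    bool has_sse3;
};

// Hypothetical set of code versions the dispatcher can choose between.
enum class CodePath { Generic, SSE2, SSE3 };

// Feature-based dispatch: choose the best path the CPU reports support for,
// regardless of who manufactured it.
CodePath select_by_features(const CpuInfo& cpu) {
    if (cpu.has_sse3) return CodePath::SSE3;
    if (cpu.has_sse2) return CodePath::SSE2;
    return CodePath::Generic;
}

// Vendor-gated dispatch: the feature flags are consulted only when the
// vendor string matches; any other vendor gets the generic path.
CodePath select_by_vendor_then_features(const CpuInfo& cpu) {
    if (cpu.vendor != "GenuineIntel") {
        return CodePath::Generic;
    }
    return select_by_features(cpu);
}
```

With a hypothetical CPU that reports vendor "AuthenticAMD" and support for both SSE2 and SSE3, the first function selects the SSE3 path while the second selects the generic path, which is the discrepancy the paragraph above describes.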