www/performance-2008-10-31.html - clang - Git at Google

 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
           "http://www.w3.org/TR/html4/strict.dtd">
 <html>
 <head>
   <META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />
   <title>Clang - Performance</title>
   <link type="text/css" rel="stylesheet" href="menu.css" />
   <link type="text/css" rel="stylesheet" href="content.css" />
   <style type="text/css">
 </style>
 </head>
 <body>

 <!--#include virtual="menu.html.incl"-->

 <div id="content">

 <!--*************************************************************************-->
 <h1>Clang - Performance</h1>
 <!--*************************************************************************-->

 <p>This page tracks the compile time performance of Clang on two
 interesting benchmarks:
 <ul>
   <li><i>Sketch</i>: The Objective-C example application shipped on
     Mac OS X as part of Xcode. <i>Sketch</i> is indicative of a
     "typical" Objective-C app. The source itself has a relatively
     small amount of code (~7,500 lines of source code), but it relies
     on the extensive Cocoa APIs to build its functionality. Like many
     Objective-C applications, it includes
     <tt>Cocoa/Cocoa.h</tt> in all of its source files, which represents a
     significant stress test of the front-end's performance on lexing,
     preprocessing, parsing, and syntax analysis.</li>
   <li><i>176.gcc</i>: This is the gcc-2.7.2.2 code base as present in
   SPECINT 2000. In contrast to Sketch, <i>176.gcc</i> consists of a
   large amount of C source code (~220,000 lines) with few system
   dependencies. This stresses the back-end's performance on generating
   assembly code and debug information.</li>
 </ul>
 </p>

 <!--*************************************************************************-->
 <h2><a name="enduser">Experiments</a></h2>
 <!--*************************************************************************-->

 <p>Measurements are done by serially processing each file in the
 respective benchmark, using Clang, gcc, and llvm-gcc as compilers. In
 order to track the performance of various subsystems the timings have
 been broken down into separate stages where possible:

 <ul>
   <li><tt>-Eonly</tt>: This option runs the preprocessor but does not
     perform any output. For gcc and llvm-gcc, the -MM option is used
     as a rough equivalent to this step.</li>
   <li><tt>-parse-noop</tt>: This option runs the parser on the input,
     but without semantic analysis or any output. gcc and llvm-gcc have
     no equivalent for this option.</li>
   <li><tt>-fsyntax-only</tt>: This option runs the parser with semantic
     analysis.</li>
   <li><tt>-emit-llvm -O0</tt>: For Clang and llvm-gcc, this option
     converts to the LLVM intermediate representation but doesn't
     generate native code.</li>
   <li><tt>-S -O0</tt>: Perform actual code generation to produce a
     native assembler file.</li>
   <li><tt>-S -O0 -g</tt>: This adds emission of debug information to
     the assembly output.</li>
 </ul>
 </p>

 <p>This set of stages is chosen to be approximately additive, that is
 each subsequent stage simply adds some additional processing. The
 timings measure the delta of the given stage from the previous
 one. For example, the timings for <tt>-fsyntax-only</tt> below show
 the difference of running with <tt>-fsyntax-only</tt> versus running
 with <tt>-parse-noop</tt> (for clang) or <tt>-MM</tt> with gcc and
 llvm-gcc. This amounts to a fairly accurate measure of only the time
 to perform semantic analysis (and parsing, in the case of gcc and llvm-gcc).</p>

 <p>These timings are chosen to break down the compilation process for
 clang as much as possible. The graphs below show these numbers
 combined so that it is easy to see how the time for a particular task
 is divided among various components. For example, <tt>-S -O0</tt>
 includes the time of <tt>-fsyntax-only</tt> and <tt>-emit-llvm -O0</tt>.</p>

 <p>Note that we already know that the LLVM optimizers are substantially (30-40%)
 faster than the GCC optimizers at a given -O level, so we only focus on -O0
 compile time here.</p>

 <!--*************************************************************************-->
 <h2><a name="enduser">Timing Results</a></h2>
 <!--*************************************************************************-->

 <!--=======================================================================-->
 <h3><a name="2008-10-31">2008-10-31</a></h3>
 <!--=======================================================================-->

 <center><h4>Sketch</h4></center>
 <img class="img_slide"
      src="timing-data/2008-10-31/sketch.png" alt="Sketch Timings"/>

 <p>This shows Clang's substantial performance improvements in
 preprocessing and semantic analysis; over 90% faster on
 -fsyntax-only. As expected, time spent in code generation for this
 benchmark is relatively small. One caveat, Clang's debug information
 generation for Objective-C is very incomplete; this means the <tt>-S
 -O0 -g</tt> numbers are unfair since Clang is generating substantially
 less output.</p>

 <p>This chart also shows the effect of using precompiled headers (PCH)
 on compiler time. gcc and llvm-gcc see a large performance improvement
 with PCH; about 4x in wall time. Unfortunately, Clang does not yet
 have an implementation of PCH-style optimizations, but we are actively
 working to address this.</p>

 <center><h4>176.gcc</h4></center>
 <img class="img_slide"
      src="timing-data/2008-10-31/176.gcc.png" alt="176.gcc Timings"/>

 <p>Unlike the <i>Sketch</i> timings, compilation of <i>176.gcc</i>
 involves a large amount of code generation. The time spent in Clang's
 LLVM IR generation and code generation is on par with gcc's code
 generation time but the improved parsing & semantic analysis
 performance means Clang still comes in at ~29% faster versus gcc
 on <tt>-S -O0 -g</tt> and ~20% faster versus llvm-gcc.</p>

 <p>These numbers indicate that Clang still has room for improvement in
 several areas, notably our LLVM IR generation is significantly slower
 than that of llvm-gcc, and both Clang and llvm-gcc incur a
 significantly higher cost for adding debugging information compared to
 gcc.</p>

 </div>
 </body>
 </html>
	<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
	"http://www.w3.org/TR/html4/strict.dtd">
	<html>
	<head>
	<META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />
	<title>Clang - Performance</title>
	<link type="text/css" rel="stylesheet" href="menu.css" />
	<link type="text/css" rel="stylesheet" href="content.css" />
	<style type="text/css">
	</style>
	</head>
	<body>

	<!--#include virtual="menu.html.incl"-->

	<div id="content">

	<!--*************************************************************************-->
	<h1>Clang - Performance</h1>
	<!--*************************************************************************-->

	<p>This page tracks the compile time performance of Clang on two
	interesting benchmarks:
	<ul>
	<li><i>Sketch</i>: The Objective-C example application shipped on
	Mac OS X as part of Xcode. <i>Sketch</i> is indicative of a
	"typical" Objective-C app. The source itself has a relatively
	small amount of code (~7,500 lines of source code), but it relies
	on the extensive Cocoa APIs to build its functionality. Like many
	Objective-C applications, it includes
	<tt>Cocoa/Cocoa.h</tt> in all of its source files, which represents a
	significant stress test of the front-end's performance on lexing,
	preprocessing, parsing, and syntax analysis.</li>
	<li><i>176.gcc</i>: This is the gcc-2.7.2.2 code base as present in
	SPECINT 2000. In contrast to Sketch, <i>176.gcc</i> consists of a
	large amount of C source code (~220,000 lines) with few system
	dependencies. This stresses the back-end's performance on generating
	assembly code and debug information.</li>
	</ul>
	</p>

	<!--*************************************************************************-->
	<h2><a name="enduser">Experiments</a></h2>
	<!--*************************************************************************-->

	<p>Measurements are done by serially processing each file in the
	respective benchmark, using Clang, gcc, and llvm-gcc as compilers. In
	order to track the performance of various subsystems the timings have
	been broken down into separate stages where possible:

	<ul>
	<li><tt>-Eonly</tt>: This option runs the preprocessor but does not
	perform any output. For gcc and llvm-gcc, the -MM option is used
	as a rough equivalent to this step.</li>
	<li><tt>-parse-noop</tt>: This option runs the parser on the input,
	but without semantic analysis or any output. gcc and llvm-gcc have
	no equivalent for this option.</li>
	<li><tt>-fsyntax-only</tt>: This option runs the parser with semantic
	analysis.</li>
	<li><tt>-emit-llvm -O0</tt>: For Clang and llvm-gcc, this option
	converts to the LLVM intermediate representation but doesn't
	generate native code.</li>
	<li><tt>-S -O0</tt>: Perform actual code generation to produce a
	native assembler file.</li>
	<li><tt>-S -O0 -g</tt>: This adds emission of debug information to
	the assembly output.</li>
	</ul>
	</p>

	<p>This set of stages is chosen to be approximately additive, that is
	each subsequent stage simply adds some additional processing. The
	timings measure the delta of the given stage from the previous
	one. For example, the timings for <tt>-fsyntax-only</tt> below show
	the difference of running with <tt>-fsyntax-only</tt> versus running
	with <tt>-parse-noop</tt> (for clang) or <tt>-MM</tt> with gcc and
	llvm-gcc. This amounts to a fairly accurate measure of only the time
	to perform semantic analysis (and parsing, in the case of gcc and llvm-gcc).</p>

	<p>These timings are chosen to break down the compilation process for
	clang as much as possible. The graphs below show these numbers
	combined so that it is easy to see how the time for a particular task
	is divided among various components. For example, <tt>-S -O0</tt>
	includes the time of <tt>-fsyntax-only</tt> and <tt>-emit-llvm -O0</tt>.</p>

	<p>Note that we already know that the LLVM optimizers are substantially (30-40%)
	faster than the GCC optimizers at a given -O level, so we only focus on -O0
	compile time here.</p>

	<!--*************************************************************************-->
	<h2><a name="enduser">Timing Results</a></h2>
	<!--*************************************************************************-->

	<!--=======================================================================-->
	<h3><a name="2008-10-31">2008-10-31</a></h3>
	<!--=======================================================================-->

	<center><h4>Sketch</h4></center>
	<img class="img_slide"
	src="timing-data/2008-10-31/sketch.png" alt="Sketch Timings"/>

	<p>This shows Clang's substantial performance improvements in
	preprocessing and semantic analysis; over 90% faster on
	-fsyntax-only. As expected, time spent in code generation for this
	benchmark is relatively small. One caveat, Clang's debug information
	generation for Objective-C is very incomplete; this means the <tt>-S
	-O0 -g</tt> numbers are unfair since Clang is generating substantially
	less output.</p>

	<p>This chart also shows the effect of using precompiled headers (PCH)
	on compiler time. gcc and llvm-gcc see a large performance improvement
	with PCH; about 4x in wall time. Unfortunately, Clang does not yet
	have an implementation of PCH-style optimizations, but we are actively
	working to address this.</p>

	<center><h4>176.gcc</h4></center>
	<img class="img_slide"
	src="timing-data/2008-10-31/176.gcc.png" alt="176.gcc Timings"/>

	<p>Unlike the <i>Sketch</i> timings, compilation of <i>176.gcc</i>
	involves a large amount of code generation. The time spent in Clang's
	LLVM IR generation and code generation is on par with gcc's code
	generation time but the improved parsing & semantic analysis
	performance means Clang still comes in at ~29% faster versus gcc
	on <tt>-S -O0 -g</tt> and ~20% faster versus llvm-gcc.</p>

	<p>These numbers indicate that Clang still has room for improvement in
	several areas, notably our LLVM IR generation is significantly slower
	than that of llvm-gcc, and both Clang and llvm-gcc incur a
	significantly higher cost for adding debugging information compared to
	gcc.</p>

	</div>
	</body>
	</html>