lld/docs/design.rst - llvm-project - Git at Google

 .. _design:

 Linker Design
 =============

 Introduction
 ------------

 lld is a new generation of linker.  It is not "section" based like traditional
 linkers which mostly just interlace sections from multiple object files into the
 output file.  Instead, lld is based on "Atoms".  Traditional section based
 linking work well for simple linking, but their model makes advanced linking
 features difficult to implement.  Features like dead code stripping, reordering
 functions for locality, and C++ coalescing require the linker to work at a finer
 grain.

 An atom is an indivisible chunk of code or data.  An atom has a set of
 attributes, such as: name, scope, content-type, alignment, etc.  An atom also
 has a list of References.  A Reference contains: a kind, an optional offset, an
 optional addend, and an optional target atom.

 The Atom model allows the linker to use standard graph theory models for linking
 data structures.  Each atom is a node, and each Reference is an edge.  The
 feature of dead code stripping is implemented by following edges to mark all
 live atoms, and then delete the non-live atoms.


 Atom Model
 ----------

 An atom is an indivisible chuck of code or data.  Typically each user written
 function or global variable is an atom.  In addition, the compiler may emit
 other atoms, such as for literal c-strings or floating point constants, or for
 runtime data structures like dwarf unwind info or pointers to initializers.

 A simple "hello world" object file would be modeled like this:

 .. image:: hello.png

 There are three atoms: main, a proxy for printf, and an anonymous atom
 containing the c-string literal "hello world".  The Atom "main" has two
 references. One is the call site for the call to printf, and the other is a
 refernce for the instruction that loads the address of the c-string literal.

 File Model
 ----------

 The linker views the input files as basically containers of Atoms and
 References, and just a few attributes of their own.  The linker works with three
 kinds of files: object files, static libraries, and dynamic shared libraries.
 Each kind of file has reader object which presents the file in the model
 expected by the linker.

 Object File
 ~~~~~~~~~~~

 An object file is just a container of atoms.  When linking an object file, a
 reader is instantiated which parses the object file and instantiates a set of
 atoms representing all content in the .o file.  The linker adds all those atoms
 to a master graph.

 Static Library (Archive)
 ~~~~~~~~~~~~~~~~~~~~~~~~

 This is the traditional unix static archive which is just a collection of object
 files with a "table of contents". When linking with a static library, by default
 nothing is added to the master graph of atoms. Instead, if after merging all
 atoms from object files into a master graph, if any "undefined" atoms are left
 remaining in the master graph, the linker reads the table of contents for each
 static library to see if any have the needed definitions. If so, the set of
 atoms from the specified object file in the static library is added to the
 master graph of atoms.

 Dynamic Library (Shared Object)
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 Dynamic libraries are different than object files and static libraries in that
 they don't directly add any content.  Their purpose is to check at build time
 that the remaining undefined references can be resolved at runtime, and provide
 a list of dynamic libraries (SO_NEEDED) that will be needed at runtime.  The way
 this is modeled in the linker is that a dynamic library contributes no atoms to
 the initial graph of atoms.  Instead, (like static libraries) if there are
 "undefined" atoms in the master graph of all atoms, then each dynamic library is
 checked to see if exports the required symbol. If so, a "shared library" atom is
 instantiated by the by the reader which the linker uses to replace the
 "undefined" atom.

 Linking Steps
 -------------

 Through the use of abstract Atoms, the core of linking is architecture
 independent and file format independent.  All command line parsing is factored
 out into a separate "options" abstraction which enables the linker to be driven
 with different command line sets.

 The overall steps in linking are:

   #. Command line processing

   #. Parsing input files

   #. Resolving

   #. Passes/Optimizations

   #. Generate output file

 The Resolving and Passes steps are done purely on the master graph of atoms, so
 they have no notion of file formats such as mach-o or ELF.

 Resolving
 ~~~~~~~~~

 The resolving step takes all the atoms graphs from each object file and combines
 them into one master object graph.  Unfortunately, it is not as simple as
 appending the atom list from each file into one big list.  There are many cases
 where atoms need to be coalesced.  That is, two or more atoms need to be
 coalesced into one atom.  This is necessary to support: C language "tentative
 definitions", C++ weak symbols for templates and inlines defined in headers,
 replacing undefined atoms with actual definition atoms, and for merging copies
 of constants like c-strings and floating point constants.

 The linker support coalescing by-name and by-content. By-name is used for
 tentative definitions and weak symbols.  By-content is used for constant data
 that can be merged.

 The resolving process maintains some global linking "state", including a "symbol
 table" which is a map from llvm::StringRef to lld::Atom*.  With these data
 structures, the linker iterates all atoms in all input files. F or each atom, it
 checks if the atom is named and has a global or hidden scope.  If so, the atom
 is added to the symbol table map.  If there already is a matching atom in that
 table, that means the current atom needs to be coalesced with the found atom, or
 it is a multiple definition error.

 When all initial input file atoms have been processed by the resolver, a scan is
 made to see if there are any undefined atoms in the graph.  If there are, the
 linker scans all libraries (both static and dynamic) looking for definitions to
 replace the undefined atoms.  It is an error if any undefined atoms are left
 remaining.

 Dead code stripping (if requested) is done at the end of resolving.  The linker
 does a simple mark-and-sweep. It starts with "root" atoms (like "main" in a main
 executable) and follows each references and marks each Atom that it visits as
 "live".  When done, all atoms not marked "live" are removed.

 The result of the Resolving phase is the creation of an lld::File object.  The
 goal is that the lld::File model is <b>the</b> internal representation
 throughout the linker. The file readers parse (mach-o, ELF, COFF) into an
 lld::File.  The file writers (mach-o, ELF, COFF) taken an lld::File and produce
 their file kind, and every Pass only operates on an lld::File.  This is not only
 a simpler, consistent model, but it enables the state of the linker to be dumped
 at any point in the link for testing purposes.


 Passes
 ~~~~~~

 The Passes step is an open ended set of routines that each get a change to
 modify or enhance the current lld::File object. Some example Passes are:

   * stub (PLT) generation

   * GOT instantiation

   * order_file optimization

   * branch island generation

   * branch shim generation

   * Objective-C optimizations (Darwin specific)

   * TLV instantiation (Darwin specific)

   * dtrace probe processing (Darwin specific)

   * compact unwind encoding (Darwin specific)


 Some of these passes are specific to Darwin's runtime environments.  But many of
 the passes are applicable to any OS (such as generating branch island for out of
 range branch instructions).

 The general structure of a pass is to iterate through the atoms in the current
 lld::File object, inspecting each atom and doing something.  For instance, the
 stub pass, looks for call sites to shared library atoms (e.g. call to printf).
 It then instantiates a "stub" atom (PLT entry) and a "lazy pointer" atom for
 each proxy atom needed, and these new atoms are added to the current lld::File
 object.  Next, all the noted call sites to shared library atoms have their
 References altered to point to the stub atom instead of the shared library atom.

 Generate Output File
 ~~~~~~~~~~~~~~~~~~~~

 Once the passes are done, the output file writer is given current lld::File
 object.  The writer's job is to create the executable content file wrapper and
 place the content of the atoms into it.

 lld::File representations
 -------------------------

 Just as LLVM has three representations of its IR model, lld has three
 representations of its File/Atom/Reference model:

  * In memory, abstract C++ classes (lld::Atom, lld::Reference, and lld::File).

  * textual (in YAML)

  * binary format ("native")

 Binary File Format
 ~~~~~~~~~~~~~~~~~~

 In theory, lld::File objects could be written to disk in an existing Object File
 format standard (e.g. ELF).  Instead we choose to define a new binary file
 format. There are two main reasons for this: fidelity and performance.  In order
 for lld to work as a linker on all platforms, its internal model must be rich
 enough to model all CPU and OS linking features.  But if we choose an existing
 Object File format as the lld binary format, that means an on going need to
 retrofit each platform specific feature needed from alternate platforms into the
 existing Object File format.  Having our own "native" binary format side steps
 that issue.  We still need to be able to binary encode all the features, but
 once the in-memory model can represent the feature, it is straight forward to
 binary encode it.

 The reason to use a binary file format at all, instead of a textual file format,
 is speed.  You want the binary format to be as fast as possible to read into the
 in-memory model. Given that we control the in-memory model and the binary
 format, the obvious way to make reading super fast it to make the file format be
 basically just an array of atoms.  The reader just mmaps in the file and looks
 at the header to see how many atoms there are and instantiate that many atom
 objects with the atom attribute information coming from that array.  The trick
 is designing this in a way that can be extended as the Atom mode evolves and new
 attributes are added.

 The native object file format starts with a header that lists how many "chunks"
 are in the file.  A chunk is an array of "ivar data".  The native file reader
 instantiates an array of Atom objects (with one large malloc call).  Each atom
 contains just a pointer to its vtable and a pointer to its ivar data.  All
 methods on lld::Atom are virtual, so all the method implementations return
 values based on the ivar data to which it has a pointer.  If a new linking
 features is added which requires a change to the lld::Atom model, a new native
 reader class (e.g. version 2) is defined which knows how to read the new feature
 information from the new ivar data.  The old reader class (e.g. version 1) is
 updated to do its best to model (the lack of the new feature) given the old ivar
 data in existing native object files.

 With this model for the native file format, files can be read and turned
 into the in-memory graph of lld::Atoms with just a few memory allocations.
 And the format can easily adapt over time to new features


 Textual representations in YAML
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 In designing a textual format we want something easy for humans to read and easy
 for the linker to parse.  Since an atom has lots of attributes most of which are
 usually just the default, we should define default values for every attribute so
 that those can be omitted from the text representation.  Here is the atoms for a
 simple hello world program expressed in YAML::

   target-triple:   x86_64-apple-darwin11

   atoms:
       - name:    _main
         scope:   global
         type:    code
         content: [ 55, 48, 89, e5, 48, 8d, 3d, 00, 00, 00, 00, 30, c0, e8, 00, 00,
                    00, 00, 31, c0, 5d, c3 ]
         fixups:
         - offset: 07
           kind:   pcrel32
           target: 2
         - offset: 0E
           kind:   call32
           target: _fprintf

       - type:    c-string
         content: [ 73, 5A, 00 ]

   ...

 The biggest use for the textual format will be writing test cases.  Writing test
 cases in C is problematic because the compiler may vary its output over time for
 its own optimization reasons which my inadvertently disable or break the linker
 feature trying to be tested. By writing test cases in the linkers own textual
 format, we can exactly specify every attribute of every atom and thus target
 specific linker logic.

 Testing
 ~~~~~~~

 The lld project contains a test suite which is being built up as new code is
 added to lld.  All new lld functionality should have a tests added to the test
 suite.  The test suite is `lit <http://llvm.org/cmds/lit.html/>`_ driven.  Each
 test is a text file with comments telling lit how to run the test and check the
 result To facilitate testing, the lld project builds a tool called lld-core.
 This tool reads a YAML file (default from stdin), parses it into one or more
 lld::File objects in memory and then feeds those lld::File objects to the
 resolver phase.  The output of the resolver is written as a native object file.
 It is then read back in using the native object file reader and then pass to the
 YAML writer.  This round-about path means that all three representations
 (in-memory, binary, and text) are exercised, and any new feature has to work in
 all the representations to pass the test.


 Resolver testing
 ~~~~~~~~~~~~~~~~

 Basic testing is the "core linking" or resolving phase.  That is where the
 linker merges object files.  All test cases are written in YAML.  One feature of
 YAML is that it allows multiple "documents" to be encoding in one YAML stream.
 That means one text file can appear to the linker as multiple .o files - the
 normal case for the linker.

 Here is a simple example of a core linking test case. It checks that an
 undefined atom from one file will be replaced by a definition from another
 file::

   # RUN: lld-core %s | FileCheck %s

   #
   # Test that undefined atoms are replaced with defined atoms.
   #

   ---
   atoms:
       - name:              foo
         definition:        undefined
   ---
   atoms:
       - name:              foo
         scope:             global
         type:              code
   ...

   # CHECK:       name:       foo
   # CHECK:       scope:      global
   # CHECK:       type:       code
   # CHECK-NOT:   name:       foo
   # CHECK:       ...


 Passes testing
 ~~~~~~~~~~~~~~

 Since Passes just operate on an lld::File object, the lld-core tool has the
 option to run a particular pass (after resolving).  Thus, you can write a YAML
 test case with carefully crafted input to exercise areas of a Pass and the check
 the resulting lld::File object as represented in YAML.


 Design Issues
 -------------

 There are a number of open issues in the design of lld.  The plan is to wait and
 make these design decisions when we need to.


 Debug Info
 ~~~~~~~~~~

 Currently, the lld model says nothing about debug info.  But the most popular
 debug format is DWARF and there is some impedance mismatch with the lld model
 and DWARF.  In lld there are just Atoms and only Atoms that need to be in a
 special section at runtime have an associated section.  Also, Atoms do not have
 addresses.  The way DWARF is spec'ed different parts of DWARF are supposed to go
 into specially named sections and the DWARF references function code by address.

 CPU and OS specific functionality
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 Currently, lld has an abstract "Platform" that deals with any CPU or OS specific
 differences in linking.  We just keep adding virtual methods to the base
 Platform class as we find linking areas that might need customization.  At some
 point we'll need to structure this better.


 File Attributes
 ~~~~~~~~~~~~~~~

 Currently, lld::File just has a path and a way to iterate its atoms. We will
 need to add mores attributes on a File.  For example, some equivalent to the
 target triple.  There is also a number of cached or computed attributes that
 could make various Passes more efficient.  For instance, on Darwin there are a
 number of Objective-C optimizations that can be done by a Pass.  But it would
 improve the plain C case if the Objective-C optimization Pass did not have to
 scan all atoms looking for any Objective-C data structures.  This could be done
 if the lld::File object had an attribute that said if the file had any
 Objective-C data in it. The Resolving phase would then be required to "merge"
 that attribute as object files are added.


 Command Line Processing
 ~~~~~~~~~~~~~~~~~~~~~~~

 Eventually, we may want this linker to be able to be a drop in replacement
 linker for existing linker tools.  That means being able to handle command line
 arguments for different platforms (e.g. darwin or linux).  Currently, there is
 no command line processing code in lld. If clang winds up incorporating the lld
 libraries into the clang binary, lld may be able to punt this work because clang
 will be responsible for setting up the state for lld.
	.. _design:

	Linker Design
	=============

	Introduction
	------------

	lld is a new generation of linker. It is not "section" based like traditional
	linkers which mostly just interlace sections from multiple object files into the
	output file. Instead, lld is based on "Atoms". Traditional section based
	linking work well for simple linking, but their model makes advanced linking
	features difficult to implement. Features like dead code stripping, reordering
	functions for locality, and C++ coalescing require the linker to work at a finer
	grain.

	An atom is an indivisible chunk of code or data. An atom has a set of
	attributes, such as: name, scope, content-type, alignment, etc. An atom also
	has a list of References. A Reference contains: a kind, an optional offset, an
	optional addend, and an optional target atom.

	The Atom model allows the linker to use standard graph theory models for linking
	data structures. Each atom is a node, and each Reference is an edge. The
	feature of dead code stripping is implemented by following edges to mark all
	live atoms, and then delete the non-live atoms.


	Atom Model
	----------

	An atom is an indivisible chuck of code or data. Typically each user written
	function or global variable is an atom. In addition, the compiler may emit
	other atoms, such as for literal c-strings or floating point constants, or for
	runtime data structures like dwarf unwind info or pointers to initializers.

	A simple "hello world" object file would be modeled like this:

	.. image:: hello.png

	There are three atoms: main, a proxy for printf, and an anonymous atom
	containing the c-string literal "hello world". The Atom "main" has two
	references. One is the call site for the call to printf, and the other is a
	refernce for the instruction that loads the address of the c-string literal.

	File Model
	----------

	The linker views the input files as basically containers of Atoms and
	References, and just a few attributes of their own. The linker works with three
	kinds of files: object files, static libraries, and dynamic shared libraries.
	Each kind of file has reader object which presents the file in the model
	expected by the linker.

	Object File
	~~~~~~~~~~~

	An object file is just a container of atoms. When linking an object file, a
	reader is instantiated which parses the object file and instantiates a set of
	atoms representing all content in the .o file. The linker adds all those atoms
	to a master graph.

	Static Library (Archive)
	~~~~~~~~~~~~~~~~~~~~~~~~

	This is the traditional unix static archive which is just a collection of object
	files with a "table of contents". When linking with a static library, by default
	nothing is added to the master graph of atoms. Instead, if after merging all
	atoms from object files into a master graph, if any "undefined" atoms are left
	remaining in the master graph, the linker reads the table of contents for each
	static library to see if any have the needed definitions. If so, the set of
	atoms from the specified object file in the static library is added to the
	master graph of atoms.

	Dynamic Library (Shared Object)
	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

	Dynamic libraries are different than object files and static libraries in that
	they don't directly add any content. Their purpose is to check at build time
	that the remaining undefined references can be resolved at runtime, and provide
	a list of dynamic libraries (SO_NEEDED) that will be needed at runtime. The way
	this is modeled in the linker is that a dynamic library contributes no atoms to
	the initial graph of atoms. Instead, (like static libraries) if there are
	"undefined" atoms in the master graph of all atoms, then each dynamic library is
	checked to see if exports the required symbol. If so, a "shared library" atom is
	instantiated by the by the reader which the linker uses to replace the
	"undefined" atom.

	Linking Steps
	-------------

	Through the use of abstract Atoms, the core of linking is architecture
	independent and file format independent. All command line parsing is factored
	out into a separate "options" abstraction which enables the linker to be driven
	with different command line sets.

	The overall steps in linking are:

	#. Command line processing

	#. Parsing input files

	#. Resolving

	#. Passes/Optimizations

	#. Generate output file

	The Resolving and Passes steps are done purely on the master graph of atoms, so
	they have no notion of file formats such as mach-o or ELF.

	Resolving
	~~~~~~~~~

	The resolving step takes all the atoms graphs from each object file and combines
	them into one master object graph. Unfortunately, it is not as simple as
	appending the atom list from each file into one big list. There are many cases
	where atoms need to be coalesced. That is, two or more atoms need to be
	coalesced into one atom. This is necessary to support: C language "tentative
	definitions", C++ weak symbols for templates and inlines defined in headers,
	replacing undefined atoms with actual definition atoms, and for merging copies
	of constants like c-strings and floating point constants.

	The linker support coalescing by-name and by-content. By-name is used for
	tentative definitions and weak symbols. By-content is used for constant data
	that can be merged.

	The resolving process maintains some global linking "state", including a "symbol
	table" which is a map from llvm::StringRef to lld::Atom*. With these data
	structures, the linker iterates all atoms in all input files. F or each atom, it
	checks if the atom is named and has a global or hidden scope. If so, the atom
	is added to the symbol table map. If there already is a matching atom in that
	table, that means the current atom needs to be coalesced with the found atom, or
	it is a multiple definition error.

	When all initial input file atoms have been processed by the resolver, a scan is
	made to see if there are any undefined atoms in the graph. If there are, the
	linker scans all libraries (both static and dynamic) looking for definitions to
	replace the undefined atoms. It is an error if any undefined atoms are left
	remaining.

	Dead code stripping (if requested) is done at the end of resolving. The linker
	does a simple mark-and-sweep. It starts with "root" atoms (like "main" in a main
	executable) and follows each references and marks each Atom that it visits as
	"live". When done, all atoms not marked "live" are removed.

	The result of the Resolving phase is the creation of an lld::File object. The
	goal is that the lld::File model is <b>the</b> internal representation
	throughout the linker. The file readers parse (mach-o, ELF, COFF) into an
	lld::File. The file writers (mach-o, ELF, COFF) taken an lld::File and produce
	their file kind, and every Pass only operates on an lld::File. This is not only
	a simpler, consistent model, but it enables the state of the linker to be dumped
	at any point in the link for testing purposes.


	Passes
	~~~~~~

	The Passes step is an open ended set of routines that each get a change to
	modify or enhance the current lld::File object. Some example Passes are:

	* stub (PLT) generation

	* GOT instantiation

	* order_file optimization

	* branch island generation

	* branch shim generation

	* Objective-C optimizations (Darwin specific)

	* TLV instantiation (Darwin specific)

	* dtrace probe processing (Darwin specific)

	* compact unwind encoding (Darwin specific)


	Some of these passes are specific to Darwin's runtime environments. But many of
	the passes are applicable to any OS (such as generating branch island for out of
	range branch instructions).

	The general structure of a pass is to iterate through the atoms in the current
	lld::File object, inspecting each atom and doing something. For instance, the
	stub pass, looks for call sites to shared library atoms (e.g. call to printf).
	It then instantiates a "stub" atom (PLT entry) and a "lazy pointer" atom for
	each proxy atom needed, and these new atoms are added to the current lld::File
	object. Next, all the noted call sites to shared library atoms have their
	References altered to point to the stub atom instead of the shared library atom.

	Generate Output File
	~~~~~~~~~~~~~~~~~~~~

	Once the passes are done, the output file writer is given current lld::File
	object. The writer's job is to create the executable content file wrapper and
	place the content of the atoms into it.

	lld::File representations
	-------------------------

	Just as LLVM has three representations of its IR model, lld has three
	representations of its File/Atom/Reference model:

	* In memory, abstract C++ classes (lld::Atom, lld::Reference, and lld::File).

	* textual (in YAML)

	* binary format ("native")

	Binary File Format
	~~~~~~~~~~~~~~~~~~

	In theory, lld::File objects could be written to disk in an existing Object File
	format standard (e.g. ELF). Instead we choose to define a new binary file
	format. There are two main reasons for this: fidelity and performance. In order
	for lld to work as a linker on all platforms, its internal model must be rich
	enough to model all CPU and OS linking features. But if we choose an existing
	Object File format as the lld binary format, that means an on going need to
	retrofit each platform specific feature needed from alternate platforms into the
	existing Object File format. Having our own "native" binary format side steps
	that issue. We still need to be able to binary encode all the features, but
	once the in-memory model can represent the feature, it is straight forward to
	binary encode it.

	The reason to use a binary file format at all, instead of a textual file format,
	is speed. You want the binary format to be as fast as possible to read into the
	in-memory model. Given that we control the in-memory model and the binary
	format, the obvious way to make reading super fast it to make the file format be
	basically just an array of atoms. The reader just mmaps in the file and looks
	at the header to see how many atoms there are and instantiate that many atom
	objects with the atom attribute information coming from that array. The trick
	is designing this in a way that can be extended as the Atom mode evolves and new
	attributes are added.

	The native object file format starts with a header that lists how many "chunks"
	are in the file. A chunk is an array of "ivar data". The native file reader
	instantiates an array of Atom objects (with one large malloc call). Each atom
	contains just a pointer to its vtable and a pointer to its ivar data. All
	methods on lld::Atom are virtual, so all the method implementations return
	values based on the ivar data to which it has a pointer. If a new linking
	features is added which requires a change to the lld::Atom model, a new native
	reader class (e.g. version 2) is defined which knows how to read the new feature
	information from the new ivar data. The old reader class (e.g. version 1) is
	updated to do its best to model (the lack of the new feature) given the old ivar
	data in existing native object files.

	With this model for the native file format, files can be read and turned
	into the in-memory graph of lld::Atoms with just a few memory allocations.
	And the format can easily adapt over time to new features


	Textual representations in YAML
	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

	In designing a textual format we want something easy for humans to read and easy
	for the linker to parse. Since an atom has lots of attributes most of which are
	usually just the default, we should define default values for every attribute so
	that those can be omitted from the text representation. Here is the atoms for a
	simple hello world program expressed in YAML::

	target-triple: x86_64-apple-darwin11

	atoms:
	- name: _main
	scope: global
	type: code
	content: [ 55, 48, 89, e5, 48, 8d, 3d, 00, 00, 00, 00, 30, c0, e8, 00, 00,
	00, 00, 31, c0, 5d, c3 ]
	fixups:
	- offset: 07
	kind: pcrel32
	target: 2
	- offset: 0E
	kind: call32
	target: _fprintf

	- type: c-string
	content: [ 73, 5A, 00 ]

	...

	The biggest use for the textual format will be writing test cases. Writing test
	cases in C is problematic because the compiler may vary its output over time for
	its own optimization reasons which my inadvertently disable or break the linker
	feature trying to be tested. By writing test cases in the linkers own textual
	format, we can exactly specify every attribute of every atom and thus target
	specific linker logic.

	Testing
	~~~~~~~

	The lld project contains a test suite which is being built up as new code is
	added to lld. All new lld functionality should have a tests added to the test
	suite. The test suite is `lit <http://llvm.org/cmds/lit.html/>`_ driven. Each
	test is a text file with comments telling lit how to run the test and check the
	result To facilitate testing, the lld project builds a tool called lld-core.
	This tool reads a YAML file (default from stdin), parses it into one or more
	lld::File objects in memory and then feeds those lld::File objects to the
	resolver phase. The output of the resolver is written as a native object file.
	It is then read back in using the native object file reader and then pass to the
	YAML writer. This round-about path means that all three representations
	(in-memory, binary, and text) are exercised, and any new feature has to work in
	all the representations to pass the test.


	Resolver testing
	~~~~~~~~~~~~~~~~

	Basic testing is the "core linking" or resolving phase. That is where the
	linker merges object files. All test cases are written in YAML. One feature of
	YAML is that it allows multiple "documents" to be encoding in one YAML stream.
	That means one text file can appear to the linker as multiple .o files - the
	normal case for the linker.

	Here is a simple example of a core linking test case. It checks that an
	undefined atom from one file will be replaced by a definition from another
	file::

	# RUN: lld-core %s \| FileCheck %s

	#
	# Test that undefined atoms are replaced with defined atoms.
	#

	---
	atoms:
	- name: foo
	definition: undefined
	---
	atoms:
	- name: foo
	scope: global
	type: code
	...

	# CHECK: name: foo
	# CHECK: scope: global
	# CHECK: type: code
	# CHECK-NOT: name: foo
	# CHECK: ...


	Passes testing
	~~~~~~~~~~~~~~

	Since Passes just operate on an lld::File object, the lld-core tool has the
	option to run a particular pass (after resolving). Thus, you can write a YAML
	test case with carefully crafted input to exercise areas of a Pass and the check
	the resulting lld::File object as represented in YAML.


	Design Issues
	-------------

	There are a number of open issues in the design of lld. The plan is to wait and
	make these design decisions when we need to.


	Debug Info
	~~~~~~~~~~

	Currently, the lld model says nothing about debug info. But the most popular
	debug format is DWARF and there is some impedance mismatch with the lld model
	and DWARF. In lld there are just Atoms and only Atoms that need to be in a
	special section at runtime have an associated section. Also, Atoms do not have
	addresses. The way DWARF is spec'ed different parts of DWARF are supposed to go
	into specially named sections and the DWARF references function code by address.

	CPU and OS specific functionality
	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

	Currently, lld has an abstract "Platform" that deals with any CPU or OS specific
	differences in linking. We just keep adding virtual methods to the base
	Platform class as we find linking areas that might need customization. At some
	point we'll need to structure this better.


	File Attributes
	~~~~~~~~~~~~~~~

	Currently, lld::File just has a path and a way to iterate its atoms. We will
	need to add mores attributes on a File. For example, some equivalent to the
	target triple. There is also a number of cached or computed attributes that
	could make various Passes more efficient. For instance, on Darwin there are a
	number of Objective-C optimizations that can be done by a Pass. But it would
	improve the plain C case if the Objective-C optimization Pass did not have to
	scan all atoms looking for any Objective-C data structures. This could be done
	if the lld::File object had an attribute that said if the file had any
	Objective-C data in it. The Resolving phase would then be required to "merge"
	that attribute as object files are added.


	Command Line Processing
	~~~~~~~~~~~~~~~~~~~~~~~

	Eventually, we may want this linker to be able to be a drop in replacement
	linker for existing linker tools. That means being able to handle command line
	arguments for different platforms (e.g. darwin or linux). Currently, there is
	no command line processing code in lld. If clang winds up incorporating the lld
	libraries into the clang binary, lld may be able to punt this work because clang
	will be responsible for setting up the state for lld.