 ================================
 Fuzzing LLVM libraries and tools
 ================================

 .. contents::
    :local:
    :depth: 2

 Introduction
 ============

 The LLVM tree includes a number of fuzzers for various components. These are
 built on top of :doc:`LibFuzzer <LibFuzzer>`. In order to build and run these
 fuzzers, see :ref:`building-fuzzers`.


 Available Fuzzers
 =================

 clang-fuzzer
 ------------

 A |generic fuzzer| that tries to compile textual input as C++ code. Some of the
 bugs this fuzzer has reported are `on bugzilla`__ and `on OSS Fuzz's
 tracker`__.

 __ https://llvm.org/pr23057
 __ https://bugs.chromium.org/p/oss-fuzz/issues/list?q=proj-llvm+clang-fuzzer

 clang-proto-fuzzer
 ------------------

 A |protobuf fuzzer| that compiles valid C++ programs generated from a protobuf
 class that describes a subset of the C++ language.

 This fuzzer accepts clang command line options after ``-ignore_remaining_args=1``.
 For example, the following command will fuzz clang with a higher optimization
 level:

 .. code-block:: shell

    % bin/clang-proto-fuzzer <corpus-dir> -ignore_remaining_args=1 -O3

 clang-format-fuzzer
 -------------------

 A |generic fuzzer| that runs clang-format_ on C++ text fragments. Some of the
 bugs this fuzzer has reported are `on bugzilla`__
 and `on OSS Fuzz's tracker`__.

 .. _clang-format: https://clang.llvm.org/docs/ClangFormat.html
 __ https://llvm.org/pr23052
 __ https://bugs.chromium.org/p/oss-fuzz/issues/list?q=proj-llvm+clang-format-fuzzer

 llvm-as-fuzzer
 --------------

 A |generic fuzzer| that tries to parse text as :doc:`LLVM assembly <LangRef>`.
 Some of the bugs this fuzzer has reported are `on bugzilla`__.

 __ https://llvm.org/pr24639

 llvm-dwarfdump-fuzzer
 ---------------------

 A |generic fuzzer| that interprets inputs as object files and runs
 :doc:`llvm-dwarfdump <CommandGuide/llvm-dwarfdump>` on them. Some of the bugs
 this fuzzer has reported are `on OSS Fuzz's tracker`__.

 __ https://bugs.chromium.org/p/oss-fuzz/issues/list?q=proj-llvm+llvm-dwarfdump-fuzzer

 llvm-demangle-fuzzer
 --------------------

 A |generic fuzzer| for the Itanium demangler used in various LLVM tools. Since
 ``__cxa_demangle`` has already been fuzzed extensively, this fuzzer targets
 LLVM's implementation of the same function.

 llvm-isel-fuzzer
 ----------------

 A |LLVM IR fuzzer| aimed at finding bugs in instruction selection.

 This fuzzer accepts flags after ``-ignore_remaining_args=1``. The flags match
 those of :doc:`llc <CommandGuide/llc>` and the triple is required. For example,
 the following command would fuzz AArch64 with :doc:`GlobalISel/index`:

 .. code-block:: shell

    % bin/llvm-isel-fuzzer <corpus-dir> -ignore_remaining_args=1 -mtriple aarch64 -global-isel -O0

 Some flags can also be specified in the binary name itself in order to support
 OSS Fuzz, which has trouble with required arguments. To do this, you can copy
 or move ``llvm-isel-fuzzer`` to ``llvm-isel-fuzzer--x-y-z``, separating options
 from the binary name using "--". The valid options are architecture names
 (``aarch64``, ``x86_64``), optimization levels (``O0``, ``O2``), or specific
 keywords, like ``gisel`` for enabling global instruction selection. In this
 mode, the same example could be run like so:

 .. code-block:: shell

    % bin/llvm-isel-fuzzer--aarch64-O0-gisel <corpus-dir>
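
 The naming scheme can be illustrated with a small shell sketch. It only shows
 how such a name breaks down into the base tool name and its options; it is not
 the parsing code the fuzzer itself uses, and the variable names are made up for
 the example.

 .. code-block:: shell

    # Illustration only: split "<tool>--<opt>-<opt>..." into its parts.
    name="llvm-isel-fuzzer--aarch64-O0-gisel"
    tool="${name%%--*}"   # everything before the first "--"
    opts="${name#*--}"    # everything after the first "--"
    echo "tool: $tool"
    # The individual options are separated by single dashes.
    old_ifs=$IFS
    IFS=-
    for opt in $opts; do
        echo "option: $opt"
    done
    IFS=$old_ifs

 To create such a binary, a plain copy is enough, for example
 ``cp bin/llvm-isel-fuzzer bin/llvm-isel-fuzzer--aarch64-O0-gisel``.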

 llvm-opt-fuzzer
 ---------------

 A |LLVM IR fuzzer| aimed at finding bugs in optimization passes.

 It receives an optimization pipeline and runs it on each fuzzer input.

 The interface of this fuzzer almost directly mirrors ``llvm-isel-fuzzer``. Both
 the ``mtriple`` and ``passes`` arguments are required. Passes are specified in a
 format suitable for the new pass manager. You can find some documentation about
 this format in the doxygen for ``PassBuilder::parsePassPipeline``.

 .. code-block:: shell

    % bin/llvm-opt-fuzzer <corpus-dir> -ignore_remaining_args=1 -mtriple x86_64 -passes instcombine

 As with ``llvm-isel-fuzzer``, the arguments for some predefined configurations
 can be embedded directly into the binary file name:

 .. code-block:: shell

    % bin/llvm-opt-fuzzer--x86_64-instcombine <corpus-dir>

 llvm-mc-assemble-fuzzer
 -----------------------

 A |generic fuzzer| that fuzzes the MC layer's assemblers by treating inputs as
 target specific assembly.

 Note that this fuzzer has an unusual command line interface which is not fully
 compatible with all of libFuzzer's features. Fuzzer arguments must be passed
 after ``--fuzzer-args``, and any ``llc`` flags must use two dashes. For
 example, to fuzz the AArch64 assembler you might use the following command:

 .. code-block:: console

   llvm-mc-assemble-fuzzer --triple=aarch64-linux-gnu --fuzzer-args -max_len=4

 This scheme will likely change in the future.

 llvm-mc-disassemble-fuzzer
 --------------------------

 A |generic fuzzer| that fuzzes the MC layer's disassemblers by treating inputs
 as assembled binary data.

 Note that this fuzzer has an unusual command line interface which is not fully
 compatible with all of libFuzzer's features. See the notes above about
 ``llvm-mc-assemble-fuzzer`` for details.


 .. |generic fuzzer| replace:: :ref:`generic fuzzer <fuzzing-llvm-generic>`
 .. |protobuf fuzzer|
    replace:: :ref:`libprotobuf-mutator based fuzzer <fuzzing-llvm-protobuf>`
 .. |LLVM IR fuzzer|
    replace:: :ref:`structured LLVM IR fuzzer <fuzzing-llvm-ir>`

 lldb-target-fuzzer
 ------------------

 A |generic fuzzer| that interprets inputs as object files and uses them to
 create a target in lldb.

 Mutators and Input Generators
 =============================

 The inputs for a fuzz target are generated via random mutations of a
 :ref:`corpus <libfuzzer-corpus>`. There are a few options for the kinds of
 mutations that a fuzzer in LLVM might want.

 .. _fuzzing-llvm-generic:

 Generic Random Fuzzing
 ----------------------

 The most basic form of input mutation is to use the built-in mutators of
 LibFuzzer. These simply treat the input corpus as a bag of bits and make random
 mutations. This type of fuzzer is good for stressing the surface layers of a
 program, and works well for testing things like lexers, parsers, or binary
 protocols.

 Some of the in-tree fuzzers that use this type of mutator are `clang-fuzzer`_,
 `clang-format-fuzzer`_, `llvm-as-fuzzer`_, `llvm-dwarfdump-fuzzer`_,
 `llvm-mc-assemble-fuzzer`_, and `llvm-mc-disassemble-fuzzer`_.
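
 A session with one of these fuzzers might look like the following sketch. The
 ``corpus`` directory name, the seed file, and the flag values are assumptions
 chosen for the example; ``-max_len`` and ``-runs`` are standard libFuzzer
 options. The fuzzer is only started if the binary has actually been built.

 .. code-block:: shell

    # Create a corpus directory with one small, valid LLVM assembly seed.
    mkdir -p corpus
    printf 'define void @f() {\n  ret void\n}\n' > corpus/seed.ll

    # Mutate the corpus and feed the results to the LLVM assembly parser.
    if [ -x bin/llvm-as-fuzzer ]; then
        bin/llvm-as-fuzzer corpus -max_len=4096 -runs=100000
    fi

 Starting from a valid seed usually helps the random mutations get past the
 earliest parser checks sooner than starting from an empty corpus.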

 .. _fuzzing-llvm-protobuf:

 Structured Fuzzing using ``libprotobuf-mutator``
 ------------------------------------------------

 We can use libprotobuf-mutator_ in order to perform structured fuzzing and
 stress deeper layers of programs. This works by defining a protobuf class that
 translates arbitrary data into structurally interesting input. Specifically, we
 use this to work with a subset of the C++ language and perform mutations that
 produce valid C++ programs in order to exercise parts of clang that are more
 interesting than parser error handling.

 To build this kind of fuzzer you need `protobuf`_ and its dependencies
 installed, and you need to specify some extra flags when configuring the build
 with :doc:`CMake <CMake>`. For example, `clang-proto-fuzzer`_ can be enabled by
 adding ``-DCLANG_ENABLE_PROTO_FUZZER=ON`` to the flags described in
 :ref:`building-fuzzers`.

 The only in-tree fuzzer that uses ``libprotobuf-mutator`` today is
 `clang-proto-fuzzer`_.
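
 Put together, a configuration for `clang-proto-fuzzer`_ might look like the
 sketch below. Only the three ``-D`` options for the sanitizer, coverage, and
 proto fuzzer come from this document; the monorepo layout with
 ``LLVM_ENABLE_PROJECTS``, the Ninja generator, the use of clang as the host
 compiler, and the ``build-proto-fuzz`` directory name are assumptions.

 .. code-block:: shell

    # Paths, generator, and compiler choice are assumptions for this sketch.
    cmake -G Ninja -S llvm -B build-proto-fuzz \
        -DLLVM_ENABLE_PROJECTS=clang \
        -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ \
        -DLLVM_USE_SANITIZER=Address \
        -DLLVM_USE_SANITIZE_COVERAGE=On \
        -DCLANG_ENABLE_PROTO_FUZZER=ON
    ninja -C build-proto-fuzz clang-proto-fuzzer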

 .. _libprotobuf-mutator: https://github.com/google/libprotobuf-mutator
 .. _protobuf: https://github.com/google/protobuf

 .. _fuzzing-llvm-ir:

 Structured Fuzzing of LLVM IR
 -----------------------------

 We also use a more direct form of structured fuzzing for fuzzers that take
 :doc:`LLVM IR <LangRef>` as input. This is achieved through the ``FuzzMutate``
 library, which was `discussed at EuroLLVM 2017`_.

 The ``FuzzMutate`` library is used to structurally fuzz backends in
 `llvm-isel-fuzzer`_ and optimization passes in `llvm-opt-fuzzer`_.

 .. _discussed at EuroLLVM 2017: https://www.youtube.com/watch?v=UBbQ_s6hNgg


 Building and Running
 ====================

 .. _building-fuzzers:

 Configuring LLVM to Build Fuzzers
 ---------------------------------

 Fuzzers will be built and linked to libFuzzer by default as long as you build
 LLVM with sanitizer coverage enabled. You would typically also enable at least
 one sanitizer to find bugs faster. The most common way to build the fuzzers is
 by adding the following two flags to your CMake invocation:
 ``-DLLVM_USE_SANITIZER=Address -DLLVM_USE_SANITIZE_COVERAGE=On``.

 .. note:: If you have ``compiler-rt`` checked out in an LLVM tree when building
           with sanitizers, you'll want to specify ``-DLLVM_BUILD_RUNTIME=Off``
           to avoid building the sanitizers themselves with sanitizers enabled.

 .. note:: You may run into issues if you build with BFD ld, which is the
           default linker on many Unix systems. These issues are being tracked
           in https://llvm.org/PR34636.
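
 As a rough sketch, a complete configure-and-build sequence using the two flags
 above could look like this. The source and build directory names, the Ninja
 generator, the build type, and the choice of clang as the host compiler are
 assumptions for the example, not requirements stated here.

 .. code-block:: shell

    # The two sanitizer-related flags are the ones named in the text above;
    # everything else is an assumption about the local setup.
    cmake -G Ninja -S llvm -B build-fuzz \
        -DCMAKE_BUILD_TYPE=Release \
        -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ \
        -DLLVM_USE_SANITIZER=Address \
        -DLLVM_USE_SANITIZE_COVERAGE=On

    # Build a single fuzzer rather than the whole tree.
    ninja -C build-fuzz llvm-as-fuzzer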

 Continuously Running and Finding Bugs
 -------------------------------------

 There used to be a public buildbot running LLVM fuzzers continuously. While it
 did find issues, it had no good way to report them in an actionable form.
 Because of this, we're moving towards using `OSS Fuzz`_ instead.

 You can browse the `LLVM project issue list`_ for the bugs found by
 `LLVM on OSS Fuzz`_. These are also mailed to the `llvm-bugs mailing
 list`_.

 .. _OSS Fuzz: https://github.com/google/oss-fuzz
 .. _LLVM project issue list:
    https://bugs.chromium.org/p/oss-fuzz/issues/list?q=Proj-llvm
 .. _LLVM on OSS Fuzz:
    https://github.com/google/oss-fuzz/blob/master/projects/llvm
 .. _llvm-bugs mailing list:
    http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs


 Utilities for Writing Fuzzers
 =============================

 There are some utilities available for writing fuzzers in LLVM.

 Some helpers for handling the command line interface are available in
 ``include/llvm/FuzzMutate/FuzzerCLI.h``, including functions to parse command
 line options in a consistent way and to implement standalone main functions so
 your fuzzer can be built and tested when not built against libFuzzer.

 There is also some handling of the CMake config for fuzzers: use the
 ``add_llvm_fuzzer`` function to set up fuzzer targets. It works similarly to
 functions such as ``add_llvm_tool``, but it takes care of linking to LibFuzzer
 when appropriate and can be passed the ``DUMMY_MAIN`` argument to enable
 standalone testing.