| .. role:: raw-html(raw) |
| :format: html |
| |
| Libclang tutorial |
| ================= |
| The C Interface to Clang provides a relatively small API that exposes facilities for parsing source code into an abstract syntax tree (AST), loading already-parsed ASTs, traversing the AST, associating physical source locations with elements within the AST, and other facilities that support Clang-based development tools. |
| This C interface to Clang will never provide all of the information stored in Clang's C++ AST, nor should it: the intent is to maintain an API that is :ref:`relatively stable <Stability>` from one release to the next, providing only the basic functionality needed to support development tools. |
| The entire C interface of libclang is available in the file `Index.h`_. |
| |
| Essential types overview |
| ------------------------- |
| |
| All types of libclang are prefixed with ``CX``. |
| |
| CXIndex |
| ~~~~~~~ |
| An Index that consists of a set of translation units that would typically be linked together into an executable or library. |
| |
| CXTranslationUnit |
| ~~~~~~~~~~~~~~~~~ |
| A single translation unit, which resides in an index. |
| |
| CXCursor |
| ~~~~~~~~ |
| A cursor representing a pointer to some element in the abstract syntax tree of a translation unit. |
| |
| |
| Code example |
| """""""""""" |
| |
| .. code-block:: cpp |
| |
| // file.cpp |
| struct foo{ |
| int bar; |
| int* bar_pointer; |
| }; |
| |
| .. code-block:: cpp |
| |
| #include <clang-c/Index.h> |
| #include <iostream> |
| |
| int main(){ |
| CXIndex index = clang_createIndex(0, 0); //Create index |
| CXTranslationUnit unit = clang_parseTranslationUnit( |
| index, |
| "file.cpp", nullptr, 0, |
| nullptr, 0, |
| CXTranslationUnit_None); //Parse "file.cpp" |
| |
| |
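| if (unit == nullptr){ |
| std::cerr << "Unable to parse translation unit. Quitting.\n"; |
| clang_disposeIndex(index); |
| return 1; |
| } |
| CXCursor cursor = clang_getTranslationUnitCursor(unit); //Obtain a cursor at the root of the translation unit |
| |
| // ... work with cursor here (see the following sections) ... |
| |
| clang_disposeTranslationUnit(unit); //Release the translation unit and the index when done |
| clang_disposeIndex(index); |
| } |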
| |
| Visiting elements of an AST |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| The elements of an AST can be visited recursively, in pre-order, with ``clang_visitChildren``. |
| |
| .. code-block:: cpp |
| |
| clang_visitChildren( |
| cursor, //Root cursor |
| [](CXCursor current_cursor, CXCursor parent, CXClientData client_data){ |
| |
| CXString current_display_name = clang_getCursorDisplayName(current_cursor); |
| //Allocate a CXString representing the name of the current cursor |
| |
| std::cout << "Visiting element " << clang_getCString(current_display_name) << "\n"; |
| //Print the char* value of current_display_name |
| |
| clang_disposeString(current_display_name); |
| //Since clang_getCursorDisplayName allocates a new CXString, it must be freed. This applies |
| //to all functions returning a CXString |
| |
| return CXChildVisit_Recurse; |
| |
| |
| }, //CXCursorVisitor: a function pointer |
| nullptr //client_data |
| ); |
| |
| The ``CXCursorVisitor`` callback passed to ``clang_visitChildren`` must return one of three values: |
| |
| #. ``CXChildVisit_Break``: Terminates the cursor traversal. |
| |
| #. ``CXChildVisit_Continue``: Continues the cursor traversal with the next sibling of the cursor just visited, without visiting its children. |
| |
| #. ``CXChildVisit_Recurse``: Recursively traverses the children of this cursor, using the same visitor and client data. |
| |
| The expected output of that program is: |
| |
| .. code-block:: |
| |
| Visiting element foo |
| Visiting element bar |
| Visiting element bar_pointer |
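
Because ``CXCursorVisitor`` is a plain function pointer, a capturing lambda cannot be passed to ``clang_visitChildren``; per-traversal state has to travel through the ``CXClientData`` argument instead. The sketch below uses that argument to count the cursors reached under each of the three return values listed earlier. It assumes linking against libclang (for example with ``-lclang``) and parses a source string from memory through ``CXUnsavedFile`` under the made-up name ``demo.cpp``; ``parse_string``, ``CountState`` and ``count_cursors`` are hypothetical helpers, not part of libclang.

.. code-block:: cpp

  #include <clang-c/Index.h>
  #include <cstring>

  // Hypothetical helper: parse `source` as an in-memory C++ file named
  // "demo.cpp" via CXUnsavedFile, so nothing has to exist on disk.
  static CXTranslationUnit parse_string(CXIndex index, const char* source) {
    CXUnsavedFile unsaved{"demo.cpp", source,
                          static_cast<unsigned long>(std::strlen(source))};
    return clang_parseTranslationUnit(index, "demo.cpp", nullptr, 0, &unsaved, 1,
                                      CXTranslationUnit_None);
  }

  // State handed to the visitor through CXClientData.
  struct CountState {
    unsigned count;           // cursors visited so far
    CXChildVisitResult mode;  // value the visitor returns for every cursor
  };

  // Hypothetical helper: count the cursors visited below `root` when the
  // visitor always answers `mode`.
  static unsigned count_cursors(CXCursor root, CXChildVisitResult mode) {
    CountState state{0, mode};
    clang_visitChildren(
        root,
        [](CXCursor, CXCursor, CXClientData client_data) {
          auto* s = static_cast<CountState*>(client_data);
          ++s->count;
          return s->mode;  // Break: stop; Continue: skip children; Recurse: descend
        },
        &state);
    return state.count;
  }

Called on the translation-unit cursor of the ``file.cpp`` source, ``CXChildVisit_Break`` stops after ``foo``, ``CXChildVisit_Continue`` also visits only ``foo`` (it has no top-level siblings), and ``CXChildVisit_Recurse`` reaches ``foo``, ``bar`` and ``bar_pointer``, matching the output above.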
| |
| |
| Extracting information from a Cursor |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| The following functions take a ``CXCursor`` as an argument and return associated information. |
| |
| |
| |
| Extracting the Cursor kind |
| """""""""""""""""""""""""" |
| |
| ``CXCursorKind clang_getCursorKind(CXCursor)``: Describes the kind of entity that a cursor refers to. Example values: |
| |
| - ``CXCursor_StructDecl``: A C or C++ struct. |
| - ``CXCursor_FieldDecl``: A field in a struct, union, or C++ class. |
| - ``CXCursor_CallExpr``: An expression that calls a function. |
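
A visitor typically compares the result of ``clang_getCursorKind`` against enumerators such as ``CXCursor_FieldDecl``, and ``clang_getCursorKindSpelling`` turns a kind into readable text. The following sketch lists every cursor below a root as ``<kind>: <name>``. It assumes linking against libclang and parses a source string from memory through ``CXUnsavedFile`` under the made-up name ``demo.cpp``; ``take_string``, ``parse_string`` and ``describe_kinds`` are hypothetical helpers.

.. code-block:: cpp

  #include <clang-c/Index.h>
  #include <cstring>
  #include <string>
  #include <vector>

  // Hypothetical helper: copy a CXString into a std::string and dispose it.
  static std::string take_string(CXString s) {
    const char* text = clang_getCString(s);  // guard against a NULL string
    std::string result = text ? text : "";
    clang_disposeString(s);
    return result;
  }

  // Hypothetical helper: parse `source` as an in-memory C++ file "demo.cpp".
  static CXTranslationUnit parse_string(CXIndex index, const char* source) {
    CXUnsavedFile unsaved{"demo.cpp", source,
                          static_cast<unsigned long>(std::strlen(source))};
    return clang_parseTranslationUnit(index, "demo.cpp", nullptr, 0, &unsaved, 1,
                                      CXTranslationUnit_None);
  }

  // Hypothetical helper: describe every cursor below `root` as "<kind>: <name>".
  static std::vector<std::string> describe_kinds(CXCursor root) {
    std::vector<std::string> lines;
    clang_visitChildren(
        root,
        [](CXCursor cursor, CXCursor, CXClientData client_data) {
          auto* out = static_cast<std::vector<std::string>*>(client_data);
          CXCursorKind kind = clang_getCursorKind(cursor);
          out->push_back(take_string(clang_getCursorKindSpelling(kind)) + ": " +
                         take_string(clang_getCursorSpelling(cursor)));
          return CXChildVisit_Recurse;
        },
        &lines);
    return lines;
  }

For the ``file.cpp`` source used earlier, this yields ``StructDecl: foo``, ``FieldDecl: bar`` and ``FieldDecl: bar_pointer``. Filtering on a specific kind, e.g. ``clang_getCursorKind(cursor) == CXCursor_FieldDecl``, is how a visitor would pick out only the fields.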
| |
| |
| Extracting the Cursor type |
| """""""""""""""""""""""""" |
| ``CXType clang_getCursorType(CXCursor)``: Retrieves the type of a ``CXCursor`` (if any). |
| |
| A ``CXType`` represents a complete C++ type, including qualifiers and pointers. It has a member field ``CXTypeKind kind`` and additional opaque data. |
| |
| Example values for ``CXTypeKind kind``: |
| |
| - ``CXType_Invalid``: Represents an invalid type (e.g., where no type is available) |
| - ``CXType_Pointer``: A pointer to another type |
| - ``CXType_Int``: Regular ``int`` |
| - ``CXType_Elaborated``: Represents a type that was referred to using an elaborated type keyword, e.g., ``struct S``, or via a qualified name, e.g., ``N::M::type``, or both. |
| |
| Any ``CXTypeKind`` can be converted to a ``CXString`` using ``clang_getTypeKindSpelling(CXTypeKind)``. |
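
Depending on the Clang version, a field written with a bare class name, such as ``A struct_value;``, may report ``CXType_Elaborated`` rather than ``CXType_Record``. The sketch below, an illustration rather than part of the tutorial's code, spells a type kind with ``clang_getTypeKindSpelling`` after looking through elaborated sugar with ``clang_Type_getNamedType``. It assumes linking against libclang and an in-memory source named ``demo.cpp``; ``take_string``, ``parse_string``, ``Search``, ``find_cursor`` and ``kind_name`` are hypothetical helpers.

.. code-block:: cpp

  #include <clang-c/Index.h>
  #include <cstring>
  #include <string>

  // Hypothetical helper: copy a CXString into a std::string and dispose it.
  static std::string take_string(CXString s) {
    const char* text = clang_getCString(s);
    std::string result = text ? text : "";
    clang_disposeString(s);
    return result;
  }

  // Hypothetical helper: parse `source` as an in-memory C++ file "demo.cpp".
  static CXTranslationUnit parse_string(CXIndex index, const char* source) {
    CXUnsavedFile unsaved{"demo.cpp", source,
                          static_cast<unsigned long>(std::strlen(source))};
    return clang_parseTranslationUnit(index, "demo.cpp", nullptr, 0, &unsaved, 1,
                                      CXTranslationUnit_None);
  }

  struct Search {
    const char* name;  // spelling to look for
    CXCursor found;    // stays the null cursor until a match is seen
  };

  // Hypothetical helper: first cursor below `root` whose spelling equals `name`.
  static CXCursor find_cursor(CXCursor root, const char* name) {
    Search search{name, clang_getNullCursor()};
    clang_visitChildren(
        root,
        [](CXCursor cursor, CXCursor, CXClientData client_data) {
          auto* s = static_cast<Search*>(client_data);
          if (take_string(clang_getCursorSpelling(cursor)) == s->name) {
            s->found = cursor;
            return CXChildVisit_Break;  // stop the whole traversal
          }
          return CXChildVisit_Recurse;
        },
        &search);
    return search.found;
  }

  // Hypothetical helper: spell the kind of `type`, looking through elaborated sugar.
  static std::string kind_name(CXType type) {
    if (type.kind == CXType_Elaborated)
      type = clang_Type_getNamedType(type);  // e.g. `A` -> the record type A
    return take_string(clang_getTypeKindSpelling(type.kind));
  }

``clang_getCanonicalType`` is a heavier alternative that strips all sugar, typedefs included; which one fits depends on whether the tool should see through type aliases.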
| |
| A ``CXType`` also holds opaque information about the type, such as: |
| |
| - Which struct was referred to? |
| - What type is the pointer pointing to? |
| - Qualifiers (e.g. ``const``, ``volatile``)? |
| |
| Qualifiers of a ``CXType`` can be queried with: |
| |
| - ``clang_isConstQualifiedType(CXType)`` to check for ``const`` |
| - ``clang_isRestrictQualifiedType(CXType)`` to check for ``restrict`` |
| - ``clang_isVolatileQualifiedType(CXType)`` to check for ``volatile`` |
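
Qualifiers belong to the ``CXType`` they are queried on, not to the type it points to. The following sketch, built around the hypothetical helper ``describe_qualifiers``, turns the three predicates above into text; it assumes only ``clang-c/Index.h`` and a libclang to link against.

.. code-block:: cpp

  #include <clang-c/Index.h>
  #include <string>

  // Hypothetical helper: render the qualifiers of `type`, e.g. "const volatile".
  static std::string describe_qualifiers(CXType type) {
    std::string out;
    auto add = [&out](const char* word) {
      if (!out.empty()) out += ' ';
      out += word;
    };
    if (clang_isConstQualifiedType(type)) add("const");
    if (clang_isVolatileQualifiedType(type)) add("volatile");
    if (clang_isRestrictQualifiedType(type)) add("restrict");
    return out.empty() ? "(none)" : out;
  }

For ``const volatile int cv = 0;`` it produces ``const volatile``, and for ``int* __restrict rp`` it produces ``restrict``. For ``const char* p``, the pointer type itself reports ``(none)``; the ``const`` is found on the pointee returned by ``clang_getPointeeType``.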
| |
| Code example |
| """""""""""" |
| .. code-block:: cpp |
| |
| //structs.cpp |
| struct A{ |
| int value; |
| }; |
| struct B{ |
| int value; |
| A struct_value; |
| }; |
| |
| .. code-block:: cpp |
| |
| #include <clang-c/Index.h> |
| #include <iostream> |
| |
| int main(){ |
| CXIndex index = clang_createIndex(0, 0); //Create index |
| CXTranslationUnit unit = clang_parseTranslationUnit( |
| index, |
| "structs.cpp", nullptr, 0, |
| nullptr, 0, |
| CXTranslationUnit_None); //Parse "structs.cpp" |
| |
| if (unit == nullptr){ |
| std::cerr << "Unable to parse translation unit. Quitting.\n"; |
| clang_disposeIndex(index); |
| return 1; |
| } |
| CXCursor cursor = clang_getTranslationUnitCursor(unit); //Obtain a cursor at the root of the translation unit |
| |
| clang_visitChildren( |
| cursor, |
| [](CXCursor current_cursor, CXCursor parent, CXClientData client_data){ |
| CXType cursor_type = clang_getCursorType(current_cursor); |
| |
| CXString type_kind_spelling = clang_getTypeKindSpelling(cursor_type.kind); |
| std::cout << "Type Kind: " << clang_getCString(type_kind_spelling); |
| clang_disposeString(type_kind_spelling); |
| |
| if(cursor_type.kind == CXType_Pointer || // If cursor_type is a pointer |
| cursor_type.kind == CXType_LValueReference || // or an LValue Reference (&) |
| cursor_type.kind == CXType_RValueReference){ // or an RValue Reference (&&), |
| CXType pointed_to_type = clang_getPointeeType(cursor_type);// retrieve the pointed-to type |
| |
| CXString pointed_to_type_spelling = clang_getTypeSpelling(pointed_to_type); // Spell out the entire |
| std::cout << ", pointing to type: " << clang_getCString(pointed_to_type_spelling);// pointed-to type |
| clang_disposeString(pointed_to_type_spelling); |
| } |
| else if(cursor_type.kind == CXType_Record){ |
| CXString type_spelling = clang_getTypeSpelling(cursor_type); |
| std::cout << ", namely " << clang_getCString(type_spelling); |
| clang_disposeString(type_spelling); |
| } |
| std::cout << "\n"; |
| return CXChildVisit_Recurse; |
| }, |
| nullptr |
| ); |
| |
| clang_disposeTranslationUnit(unit); |
| clang_disposeIndex(index); |
| } |
| |
| The expected output of this program is: |
| |
| .. code-block:: |
| |
| Type Kind: Record, namely A |
| Type Kind: Int |
| Type Kind: Record, namely B |
| Type Kind: Int |
| Type Kind: Record, namely A |
| Type Kind: Record, namely A |
| |
| |
| To reiterate the difference between ``CXType`` and ``CXTypeKind``, consider the following declaration: |
| |
| .. code-block:: cpp |
| |
| const char* __restrict__ variable; |
| |
| - Type Kind will be: ``CXType_Pointer`` spelled ``"Pointer"`` |
| - Type will be a complete ``CXType`` structure, spelled ``"const char* __restrict__"`` |
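
The distinction can be checked directly: the kind is a single enumerator, while the full ``CXType`` still knows its pointee and qualifiers. The sketch below is a minimal illustration assuming linking against libclang; ``take_string``, ``TypeSummary`` and ``summarize`` are hypothetical names.

.. code-block:: cpp

  #include <clang-c/Index.h>
  #include <string>

  // Hypothetical helper: copy a CXString into a std::string and dispose it.
  static std::string take_string(CXString s) {
    const char* text = clang_getCString(s);
    std::string result = text ? text : "";
    clang_disposeString(s);
    return result;
  }

  struct TypeSummary {
    std::string kind;       // from clang_getTypeKindSpelling, e.g. "Pointer"
    std::string spelling;   // from clang_getTypeSpelling, the whole type
    bool pointee_is_const;  // qualifier of the pointed-to type, if any
  };

  // Hypothetical helper: contrast the kind with the full type, following one
  // level of indirection with clang_getPointeeType.
  static TypeSummary summarize(CXType type) {
    TypeSummary s;
    s.kind = take_string(clang_getTypeKindSpelling(type.kind));
    s.spelling = take_string(clang_getTypeSpelling(type));
    CXType pointee = clang_getPointeeType(type);  // CXType_Invalid if not a pointer
    s.pointee_is_const =
        pointee.kind != CXType_Invalid && clang_isConstQualifiedType(pointee);
    return s;
  }

Applied to ``variable`` above, ``kind`` is ``"Pointer"``, ``pointee_is_const`` is true, and ``clang_isRestrictQualifiedType`` on the pointer type itself returns non-zero; the exact text of ``spelling`` depends on Clang's type printer.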
| |
| Retrieving source locations |
| """"""""""""""""""""""""""" |
| |
| ``CXSourceRange clang_getCursorExtent(CXCursor)`` returns a ``CXSourceRange``, representing a half-open range in the source code. |
| |
| Use ``clang_getRangeStart(CXSourceRange)`` and ``clang_getRangeEnd(CXSourceRange)`` to retrieve the starting and end ``CXSourceLocation`` from a source range, respectively. |
| |
| Given a ``CXSourceLocation``, use ``clang_getExpansionLocation`` to retrieve the file, line, column, and offset of a source location. |
| |
| Code example |
| """""""""""" |
| .. code-block:: cpp |
| |
| // Again, file.cpp |
| struct foo{ |
| int bar; |
| int* bar_pointer; |
| }; |
| |
| .. code-block:: cpp |
| |
| clang_visitChildren( |
| cursor, |
| [](CXCursor current_cursor, CXCursor parent, CXClientData client_data){ |
| |
| CXType cursor_type = clang_getCursorType(current_cursor); |
| CXString cursor_spelling = clang_getCursorSpelling(current_cursor); |
| CXSourceRange cursor_range = clang_getCursorExtent(current_cursor); |
| std::cout << "Cursor " << clang_getCString(cursor_spelling); |
| |
| CXFile file; |
| unsigned start_line, start_column, start_offset; |
| unsigned end_line, end_column, end_offset; |
| |
| clang_getExpansionLocation(clang_getRangeStart(cursor_range), &file, &start_line, &start_column, &start_offset); |
| clang_getExpansionLocation(clang_getRangeEnd (cursor_range), &file, &end_line , &end_column , &end_offset); |
| std::cout << " spanning lines " << start_line << " to " << end_line; |
| clang_disposeString(cursor_spelling); |
| |
| std::cout << "\n"; |
| return CXChildVisit_Recurse; |
| }, |
| nullptr |
| ); |
| |
| The expected output of this program is: |
| |
| .. code-block:: |
| |
| Cursor foo spanning lines 2 to 5 |
| Cursor bar spanning lines 3 to 3 |
| Cursor bar_pointer spanning lines 4 to 4 |
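
The visitor above receives a ``CXFile`` from ``clang_getExpansionLocation`` but never uses it. The sketch below, assuming linking against libclang, formats the start of a cursor's extent as ``file:line:column`` with ``clang_getFileName``; ``take_string`` and ``start_location`` are hypothetical helpers.

.. code-block:: cpp

  #include <clang-c/Index.h>
  #include <string>

  // Hypothetical helper: copy a CXString into a std::string and dispose it.
  static std::string take_string(CXString s) {
    const char* text = clang_getCString(s);
    std::string result = text ? text : "";
    clang_disposeString(s);
    return result;
  }

  // Hypothetical helper: "file:line:column" for the start of `cursor`'s extent.
  static std::string start_location(CXCursor cursor) {
    CXSourceLocation start = clang_getRangeStart(clang_getCursorExtent(cursor));
    CXFile file = nullptr;
    unsigned line = 0, column = 0, offset = 0;
    clang_getExpansionLocation(start, &file, &line, &column, &offset);
    std::string name = file ? take_string(clang_getFileName(file)) : "<unknown>";
    return name + ":" + std::to_string(line) + ":" + std::to_string(column);
  }

For ``bar`` in ``file.cpp``, the result should name ``file.cpp`` and line 3, consistent with the output above.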
| |
| Complete example code |
| ~~~~~~~~~~~~~~~~~~~~~ |
| |
| .. code-block:: cpp |
| |
| #include <clang-c/Index.h> |
| #include <iostream> |
| |
| int main(){ |
| CXIndex index = clang_createIndex(0, 0); //Create index |
| CXTranslationUnit unit = clang_parseTranslationUnit( |
| index, |
| "file.cpp", nullptr, 0, |
| nullptr, 0, |
| CXTranslationUnit_None); //Parse "file.cpp" |
| |
| if (unit == nullptr){ |
| std::cerr << "Unable to parse translation unit. Quitting.\n"; |
| clang_disposeIndex(index); |
| return 1; |
| } |
| CXCursor cursor = clang_getTranslationUnitCursor(unit); //Obtain a cursor at the root of the translation unit |
| |
| |
| clang_visitChildren( |
| cursor, |
| [](CXCursor current_cursor, CXCursor parent, CXClientData client_data){ |
| CXType cursor_type = clang_getCursorType(current_cursor); |
| |
| CXString type_kind_spelling = clang_getTypeKindSpelling(cursor_type.kind); |
| std::cout << "Type Kind: " << clang_getCString(type_kind_spelling); |
| clang_disposeString(type_kind_spelling); |
| |
| if(cursor_type.kind == CXType_Pointer || // If cursor_type is a pointer |
| cursor_type.kind == CXType_LValueReference || // or an LValue Reference (&) |
| cursor_type.kind == CXType_RValueReference){ // or an RValue Reference (&&), |
| CXType pointed_to_type = clang_getPointeeType(cursor_type);// retrieve the pointed-to type |
| |
| CXString pointed_to_type_spelling = clang_getTypeSpelling(pointed_to_type); // Spell out the entire |
| std::cout << ", pointing to type: " << clang_getCString(pointed_to_type_spelling);// pointed-to type |
| clang_disposeString(pointed_to_type_spelling); |
| } |
| else if(cursor_type.kind == CXType_Record){ |
| CXString type_spelling = clang_getTypeSpelling(cursor_type); |
| std::cout << ", namely " << clang_getCString(type_spelling); |
| clang_disposeString(type_spelling); |
| } |
| std::cout << "\n"; |
| return CXChildVisit_Recurse; |
| }, |
| nullptr |
| ); |
| |
| |
| clang_visitChildren( |
| cursor, |
| [](CXCursor current_cursor, CXCursor parent, CXClientData client_data){ |
| |
| CXType cursor_type = clang_getCursorType(current_cursor); |
| CXString cursor_spelling = clang_getCursorSpelling(current_cursor); |
| CXSourceRange cursor_range = clang_getCursorExtent(current_cursor); |
| std::cout << "Cursor " << clang_getCString(cursor_spelling); |
| |
| CXFile file; |
| unsigned start_line, start_column, start_offset; |
| unsigned end_line, end_column, end_offset; |
| |
| clang_getExpansionLocation(clang_getRangeStart(cursor_range), &file, &start_line, &start_column, &start_offset); |
| clang_getExpansionLocation(clang_getRangeEnd (cursor_range), &file, &end_line , &end_column , &end_offset); |
| std::cout << " spanning lines " << start_line << " to " << end_line; |
| clang_disposeString(cursor_spelling); |
| |
| std::cout << "\n"; |
| return CXChildVisit_Recurse; |
| }, |
| nullptr |
| ); |
| |
| clang_disposeTranslationUnit(unit); |
| clang_disposeIndex(index); |
| } |
| |
| |
| .. _Index.h: https://github.com/llvm/llvm-project/blob/main/clang/include/clang-c/Index.h |
| |
| .. _Stability: |
| |
| ABI and API Stability |
| --------------------- |
| |
| The C interfaces in libclang are intended to be relatively stable. This allows |
| a programmer to use libclang without having to worry as much about Clang |
| upgrades breaking existing code. However, the library is not unchanging. For |
| example, the library will gain new interfaces over time as needs arise, |
| existing APIs may be deprecated for eventual removal, etc. Also, the underlying |
| implementation of the facilities by Clang may change behavior as bugs are |
| fixed, features get implemented, etc. |
| |
| The library should be ABI and API stable over time, but ABI- and API-breaking |
| changes can happen in the following (non-exhaustive) situations: |
| |
| * Adding a new enumerator to an enumeration (can be ABI-breaking in C++). |
| * Removing an explicitly deprecated API after a suitably long deprecation |
| period. |
| * Using implementation details, such as names or comments that say something |
| is "private", "reserved", "internal", etc. |
| * Bug fixes and changes to Clang's internal implementation happen routinely and |
| will change the behavior of callers. |
| * Rarely, bug fixes to libclang itself. |
| |
| The library has version macros (``CINDEX_VERSION_MAJOR``, |
| ``CINDEX_VERSION_MINOR``, and ``CINDEX_VERSION``) which can be used to test for |
| specific library versions at compile time. The ``CINDEX_VERSION_MAJOR`` macro |
| is only incremented if there are major source- or ABI-breaking changes. Except |
| for removing an explicitly deprecated API, the changes listed above are not |
| considered major source- or ABI-breaking changes. Historically, the value this |
| macro expands to has not changed, but it may be incremented in the future should |
| the need arise. The ``CINDEX_VERSION_MINOR`` macro is incremented as new APIs |
| are added. The ``CINDEX_VERSION`` macro expands to a value based on the major |
| and minor version macros. |
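
A minimal sketch of a compile-time version check follows, assuming only the macros named above plus ``CINDEX_VERSION_ENCODE``, which ``Index.h`` uses to build ``CINDEX_VERSION``. ``cindex_at_least`` and ``runtime_clang_version`` are hypothetical helpers, and the minimum in the ``static_assert`` is a placeholder rather than a recommended value.

.. code-block:: cpp

  #include <clang-c/Index.h>
  #include <string>

  // Hypothetical helper: true when the Index.h used at compile time is at
  // least want_major.want_minor.
  constexpr bool cindex_at_least(int want_major, int want_minor) {
    return CINDEX_VERSION >= CINDEX_VERSION_ENCODE(want_major, want_minor);
  }

  // Placeholder minimum: refuse to build against older headers.
  static_assert(cindex_at_least(0, 0), "libclang headers are too old");

  // Hypothetical helper: the version string reported by the libclang that is
  // actually loaded at run time.
  static std::string runtime_clang_version() {
    CXString version = clang_getClangVersion();
    const char* text = clang_getCString(version);
    std::string result = text ? text : "";
    clang_disposeString(version);
    return result;
  }

The same comparison also works inside ``#if`` directives, which lets a program compile a code path only against headers that declare a newer function; the minor version that introduced any given function has to be looked up in ``Index.h``. Because the macros describe the headers while ``clang_getClangVersion`` describes the library actually loaded, logging both can help diagnose a mismatch between build and runtime environments.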
| |
| In an effort to allow the library to be modified as new needs arise, the |
| following situations are explicitly unsupported: |
| |
| * Loading different library versions into the same executable and passing |
| objects between the libraries; despite general ABI stability, different |
| versions of the library may use different implementation details that are not |
| compatible across library versions. |
| * For the same reason as above, serializing objects from one version of the |
| library and deserializing with a different version is also not supported. |
| |
| Note: because libclang is a wrapper around the compiler frontend, it is not a |
| `security-sensitive component`_ of the LLVM Project. Consider using a sandbox |
| or some other mitigation approach if processing untrusted input. |
| |
| .. _security-sensitive component: https://llvm.org/docs/Security.html#what-is-considered-a-security-issue |