commit	53e92e48d0c03a2475e8517dd4c28968d84fc217	[log] [tgz]
author	Julian Schmidt <git.julian.schmidt@gmail.com>	Fri Nov 15 10:51:15 2024 +0100
committer	GitHub <noreply@github.com>	Fri Nov 15 10:51:15 2024 +0100
tree	2fea498ae66cbd48b43d3c5bae17f3cf113c9991
parent	469f9d5fb8fcfe7dc42baa2daa7e230147f234de [diff]

Reland: [clang][test] add testing for the AST matcher reference (#112168)

## Problem Statement
Previously, the examples in the AST matcher reference, which gets
generated by the Doxygen comments in `ASTMatchers.h`, were untested and
best effort.
Some of the matchers had no or wrong examples of how to use the matcher.

## Solution
This patch introduces a simple DSL around Doxygen commands to enable
testing the AST matcher documentation in a way that should be relatively
easy to use.
In `ASTMatchers.h`, most matchers are documented with a Doxygen comment.
Most of these also have a code example that aims to show what the
matcher will match, given a matcher somewhere in the documentation text.
The way that the documentation is tested, is by using Doxygen's alias
feature to declare custom aliases. These aliases forward to
`<tt>text</tt>` (which is what Doxygen's `\c` does, but for multiple
words). Using the Doxygen aliases is the obvious choice, because there
are (now) four consumers:
 - people reading the header/using signature help
 - the Doxygen generated documentation
 - the generated HTML AST matcher reference
 - (new) the generated matcher tests

This patch rewrites/extends the documentation such that all matchers
have a documented example.
The new `generate_ast_matcher_doc_tests.py` script will warn on any
undocumented matchers (but not on matchers without a Doxygen comment)
and provides diagnostics and statistics about the matchers.

The current statistics emitted by the parser are:

```text
Statistics:
        doxygen_blocks                :   519
        missing_tests                 :    10
        skipped_objc                  :    42
        code_snippets                 :   503
        matches                       :   820
        matchers                      :   580
        tested_matchers               :   574
        none_type_matchers            :     6
```

The tests are generated during building, and the script will only print
something if it found an issue with the specified tests (e.g., missing
tests).

## Description

DSL for generating the tests from documentation.

TLDR:
```
  \header{a.h}
  \endheader     <- zero or more header

  \code
    int a = 42;
  \endcode
  \compile_args{-std=c++,c23-or-later} <- optional, the std flag supports std ranges and
                                          whole languages

  \matcher{expr()} <- one or more matchers in succession
  \match{42}   <- one or more matches in succession

  \matcher{varDecl()} <- new matcher resets the context, the above
                         \match will not count for this new
                         matcher(-group)
  \match{int a  = 42} <- only applies to the previous matcher (not to the
                         previous case)
```

The above block can be repeated inside a Doxygen command for multiple
code examples for a single matcher.
The test generation script will only look for these annotations and
ignore anything else like `\c` or the sentences where these annotations
are embedded into: `The matcher \matcher{expr()} matches the number
\match{42}.`.

### Language Grammar
  [] denotes an optional, and <> denotes user-input

```
  compile_args j:= \compile_args{[<compile_arg>;]<compile_arg>}
  matcher_tag_key ::= type
  match_tag_key ::= type || std || count || sub
  matcher_tags ::= [matcher_tag_key=<value>;]matcher_tag_key=<value>
  match_tags ::= [match_tag_key=<value>;]match_tag_key=<value>
  matcher ::= \matcher{[matcher_tags$]<matcher>}
  matchers ::= [matcher] matcher
  match ::= \match{[match_tags$]<match>}
  matches ::= [match] match
  case ::= matchers matches
  cases ::= [case] case
  header-block ::= \header{<name>} <code> \endheader
  code-block ::= \code <code> \endcode
  testcase ::= code-block [compile_args] cases
```

### Language Standard Versions

The 'std' tag and '\compile_args' support specifying a specific language
version, a whole language and all of its versions, and thresholds
(implies ranges). Multiple arguments are passed with a ',' separator.
For a language and version to execute a tested matcher, it has to match
the specified '\compile_args' for the code, and the 'std' tag for the
matcher. Predicates for the 'std' compiler flag are used with
disjunction between languages (e.g. 'c || c++') and conjunction for all
predicates specific to each language (e.g. 'c++11-or-later &&
c++23-or-earlier').

Examples:
 - `c`                                    all available versions of C
 - `c++11`                                only C++11
 - `c++11-or-later`                       C++11 or later
 - `c++11-or-earlier`                     C++11 or earlier
- `c++11-or-later,c++23-or-earlier,c` all of C and C++ between 11 and
                                          23 (inclusive)
 - `c++11-23,c`                             same as above

### Tags

#### `type`:
**Match types** are used to select where the string that is used to
check if a node matches comes from.
Available: `code`, `name`, `typestr`, `typeofstr`. The default is
`code`.

- `code`: Forwards to `tooling::fixit::getText(...)` and should be the
preferred way to show what matches.
- `name`: Casts the match to a `NamedDecl` and returns the result of
`getNameAsString`. Useful when the matched AST node is not easy to spell
out (`code` type), e.g., namespaces or classes with many members.
- `typestr`: Returns the result of `QualType::getAsString` for the type
derived from `Type` (otherwise, if it is derived from `Decl`, recurses
with `Node->getTypeForDecl()`)

**Matcher types** are used to mark matchers as sub-matcher with 'sub' or
as deactivated using 'none'. Testing sub-matcher is not implemented.

#### `count`:
Specifying a 'count=n' on a match will result in a test that requires
that the specified match will be matched n times. Default is 1.

#### `std`:
A match allows specifying if it matches only in specific language
versions. This may be needed when the AST differs between language
versions.

#### `sub`:
The `sub` tag on a `\match` will indicate that the match is for a node
of a bound sub-matcher.
E.g., `\matcher{expr(expr().bind("inner"))}` has a sub-matcher that
binds to `inner`, which is the value for the `sub` tag of the expected
match for the sub-matcher `\match{sub=inner$...}`. Currently,
sub-matchers are not tested in any way.

### What if ...?

#### ... I want to add a matcher?

Add a Doxygen comment to the matcher with a code example, corresponding
matchers and matches, that shows what the matcher is supposed to do.
Specify the compile arguments/supported languages if required, and run
`ninja check-clang-unit` to test the documentation.

#### ... the example I wrote is wrong?

The test-failure output of the generated test file will provide
information about
 - where the generated test file is located
 - which line in `ASTMatcher.h` the example is from
 - which matches were: found, not-(yet)-found, expected
- in case of an unexpected match: what the node looks like using the
different `type`s
- the language version and if the test ran with a windows `-target` flag
(also in failure summary)

#### ... I don't adhere to the required order of the syntax?

The script will diagnose any found issues, such as `matcher is missing
an example` with a `file:line:` prefix,
which should provide enough information about the issue.

#### ... the script diagnoses a false-positive issue with a Doxygen
comment?

It hopefully shouldn't, but if you, e.g., added some non-matcher code
and documented it with Doxygen, then the script will consider that as a
matcher documentation. As a result, the script will print that it
detected a mismatch between the actual and the expected number of
failures. If the diagnostic truly is a false-positive, change the
`expected_failure_statistics` at the top of the
`generate_ast_matcher_doc_tests.py` file.

Fixes #57607
Fixes #63748

8 files changed

tree: 2fea498ae66cbd48b43d3c5bae17f3cf113c9991

README.md

The LLVM Compiler Infrastructure

Welcome to the LLVM project!

This repository contains the source code for LLVM, a toolkit for the construction of highly optimized compilers, optimizers, and run-time environments.

The LLVM project has multiple components. The core of the project is itself called “LLVM”. This contains all of the tools, libraries, and header files needed to process intermediate representations and convert them into object files. Tools include an assembler, disassembler, bitcode analyzer, and bitcode optimizer.

C-like languages use the Clang frontend. This component compiles C, C++, Objective-C, and Objective-C++ code into LLVM bitcode -- and from there into object files, using LLVM.

Other components include: the libc++ C++ standard library, the LLD linker, and more.

Getting the Source Code and Building LLVM

Consult the Getting Started with LLVM page for information on building and running LLVM.

For information on how to contribute to the LLVM project, please take a look at the Contributing to LLVM guide.

Getting in touch

Join the LLVM Discourse forums, Discord chat, LLVM Office Hours or Regular sync-ups.

The LLVM project has adopted a code of conduct for participants to all modes of communication within the project.