| ======================== |
| Debugging C++ Coroutines |
| ======================== |
| |
| .. contents:: |
|    :local: |
| |
| Introduction |
| ============ |
| |
| For performance and other architectural reasons, the C++ Coroutines feature in |
| the Clang compiler is implemented in two parts of the compiler. Semantic |
| analysis is performed in Clang, and coroutine construction and optimization |
| take place in the LLVM middle-end. |
| |
| However, this design limits the debug information the compiler can generate. |
| Typically, the compiler generates debug information in the Clang frontend, as |
| debug information is highly language specific. This is not possible for |
| coroutine frames, because the frames are constructed in the LLVM middle-end. |
| |
| To mitigate this problem, the LLVM middle-end attempts to generate some debug |
| information itself. That information is unfortunately incomplete, since much |
| of the language-specific information is missing in the middle-end. |
| |
| This document describes how to use this debug information to better debug |
| coroutines. |
| |
| Terminology |
| =========== |
| |
| Because C++20 coroutines are relatively new, the terminology used to describe |
| them is not yet settled. This section defines the terms that are used |
| consistently throughout this document. |
| |
| coroutine type |
| -------------- |
| |
| A `coroutine function` is any function that contains any of the coroutine |
| keywords `co_await`, `co_yield`, or `co_return`. A `coroutine type` is a |
| possible return type of one of these `coroutine functions`. `Task` and |
| `Generator` are commonly used names for coroutine types. |
| |
| coroutine |
| --------- |
| |
| By technical definition, a `coroutine` is a suspendable function. However, |
| programmers typically use `coroutine` to refer to an individual instance. |
| For example: |
| |
| .. code-block:: c++ |
| |
|   std::vector<Task> Coros; // Task is a coroutine type. |
|   for (int i = 0; i < 3; i++) |
|     Coros.push_back(CoroTask()); // CoroTask is a coroutine function that |
|                                  // returns the coroutine type 'Task'. |
| |
| In practice, we typically say "`Coros` contains 3 coroutines" in the above |
| example, though this is not strictly correct. More technically, this should |
| say "`Coros` contains 3 coroutine instances" or "`Coros` contains 3 coroutine |
| objects." |
| |
| In this document, we follow the common practice of using `coroutine` to refer |
| to an individual `coroutine instance`, since the terms `coroutine instance` and |
| `coroutine object` are not themselves well defined. |
| |
| coroutine frame |
| --------------- |
| |
| The C++ Standard uses `coroutine state` to describe the allocated storage. In |
| the compiler, we use `coroutine frame` to describe the generated data structure |
| that contains the necessary information. |
| |
| The structure of coroutine frames |
| ================================= |
| |
| The structure of coroutine frames is defined as: |
| |
| .. code-block:: c++ |
| |
|   struct { |
|     void (*__r)(); // function pointer to the `resume` function |
|     void (*__d)(); // function pointer to the `destroy` function |
|     promise_type;  // the corresponding `promise_type` |
|     ...            // Any other needed information |
|   } |
| |
| In the debugger, a function's name can be obtained from its address, and the |
| `resume` function carries the same name as the coroutine function. The name of |
| the coroutine can therefore be recovered once the address of the coroutine |
| frame is known, by looking up the first pointer stored in the frame. |
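| |
| The layout above can be modelled with an ordinary struct to see where the |
| promise lands relative to the start of the frame. The sketch below is a |
| hand-written stand-in, not the compiler-generated frame: `model_frame`, |
| `example_promise`, and `promise_from_frame` are hypothetical names, and it |
| assumes the promise needs no more alignment than a pointer. |
| |

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a user-defined promise type.
struct example_promise {
  int count = 0;
};

// Hand-written model of a coroutine frame. The real frame is generated by
// the compiler; only the order of the leading members is mirrored here.
struct model_frame {
  void (*resume_fn)(model_frame *);  // models `__r`
  void (*destroy_fn)(model_frame *); // models `__d`
  example_promise promise;           // models the promise_type member
  int local_a;                       // models "any other needed information"
  unsigned char suspend_index;
};

static_assert(offsetof(model_frame, resume_fn) == 0,
              "the resume pointer is the first member of the frame");
static_assert(offsetof(model_frame, promise) ==
                  2 * sizeof(void (*)(model_frame *)),
              "the promise follows the two function pointers");

// Recover the promise from a raw frame address, as a debugger would do by
// adding the size of the two function pointers to the frame address.
inline example_promise *promise_from_frame(void *frame_address) {
  assert(frame_address != nullptr);
  char *bytes = static_cast<char *>(frame_address);
  return reinterpret_cast<example_promise *>(
      bytes + offsetof(model_frame, promise));
}
```

| |
| On a typical 64-bit target the second `static_assert` corresponds to an offset |
| of 16 bytes (`0x10`) from the start of the frame. |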
| |
| Print promise_type |
| ================== |
| |
| Every coroutine has a `promise_type`, which defines the behavior |
| for the corresponding coroutine. In other words, if two coroutines have the |
| same `promise_type`, they should behave in the same way. |
| |
| When stopped at a breakpoint inside a coroutine, the `promise_type` can be |
| printed in the debugger by: |
| |
| .. parsed-literal:: |
| |
|   print __promise |
| |
| It is also possible to print the `promise_type` of a coroutine from the address |
| of the coroutine frame. For example, if the address of a coroutine frame is |
| 0x416eb0, and the type of the `promise_type` is `task::promise_type`, printing |
| the `promise_type` can be done by: |
| |
| .. parsed-literal:: |
| |
|   print (task::promise_type)*(0x416eb0+0x10) |
| |
| This is possible because the ABI places the `promise_type` directly after the |
| two function pointers at the start of the coroutine frame, which is a 16-byte |
| offset (`0x10`) on 64-bit targets. |
| |
| Note that there is also an ABI independent method: |
| |
| .. parsed-literal:: |
| |
|   print std::coroutine_handle<task::promise_type>::from_address((void*)0x416eb0).promise() |
| |
| The functions `from_address(void*)` and `promise()` are often small enough to |
| be inlined and removed during optimization, so this method may not be available |
| in optimized builds. |
| |
| Print coroutine frames |
| ====================== |
| |
| LLVM generates the debug information for the coroutine frame in the LLVM |
| middle-end, which permits printing of the coroutine frame in the debugger. Much |
| like the `promise_type`, when stopped at a breakpoint inside a coroutine we can |
| print the coroutine frame by: |
| |
| .. parsed-literal:: |
| |
|   print __coro_frame |
| |
| |
| Just as printing the `promise_type` is possible from the coroutine address, |
| printing the details of the coroutine frame from an address is also possible: |
| |
| :: |
| |
|   (gdb) # Read the resume function address stored at the start of the frame |
|   (gdb) print/x *0x418eb0 |
|   $1 = 0x4019e0 |
|   (gdb) # Get the linkage name for the coroutine |
|   (gdb) x 0x4019e0 |
|   0x4019e0 <_ZL9coro_taski>: 0xe5894855 |
|   (gdb) # The coroutine frame type is 'linkage_name.coro_frame_ty' |
|   (gdb) print (_ZL9coro_taski.coro_frame_ty)*(0x418eb0) |
|   $2 = {__resume_fn = 0x4019e0 <coro_task(int)>, __destroy_fn = 0x402000 <coro_task(int)>, __promise = {...}, ...} |
| |
| The above is possible because: |
| |
| (1) The name of the debug type of the coroutine frame is the `linkage_name` |
| plus the `.coro_frame_ty` suffix. The linkage name of the function is used |
| because several coroutine functions can share the same coroutine type, so the |
| coroutine type alone cannot identify a frame layout. |
| |
| (2) The coroutine function name is accessible from the address of the coroutine |
| frame. |
| |
| The above commands can be simplified by placing them in debug scripts. |
| |
| Examples to print coroutine frames |
| ---------------------------------- |
| |
| The print examples below use the following definition: |
| |
| .. code-block:: c++ |
| |
|   #include <coroutine> |
|   #include <iostream> |
| |
|   struct task { |
|     struct promise_type { |
|       task get_return_object() { return std::coroutine_handle<promise_type>::from_promise(*this); } |
|       std::suspend_always initial_suspend() { return {}; } |
|       std::suspend_always final_suspend() noexcept { return {}; } |
|       void return_void() noexcept {} |
|       void unhandled_exception() noexcept {} |
| |
|       int count = 0; |
|     }; |
| |
|     void resume() noexcept { |
|       handle.resume(); |
|     } |
| |
|     task(std::coroutine_handle<promise_type> hdl) : handle(hdl) {} |
|     ~task() { |
|       if (handle) |
|         handle.destroy(); |
|     } |
| |
|     std::coroutine_handle<> handle; |
|   }; |
| |
|   class await_counter : public std::suspend_always { |
|   public: |
|     template<class PromiseType> |
|     void await_suspend(std::coroutine_handle<PromiseType> handle) noexcept { |
|       handle.promise().count++; |
|     } |
|   }; |
| |
|   static task coro_task(int v) { |
|     int a = v; |
|     co_await await_counter{}; |
|     a++; |
|     std::cout << a << "\n"; |
|     a++; |
|     std::cout << a << "\n"; |
|     a++; |
|     std::cout << a << "\n"; |
|     co_await await_counter{}; |
|     a++; |
|     std::cout << a << "\n"; |
|     a++; |
|     std::cout << a << "\n"; |
|   } |
| |
|   int main() { |
|     task t = coro_task(43); |
|     t.resume(); |
|     t.resume(); |
|     t.resume(); |
|     return 0; |
|   } |
| |
| In debug mode (`-O0 -g`), the printed result is: |
| |
| .. parsed-literal:: |
| |
|   {__resume_fn = 0x4019e0 <coro_task(int)>, __destroy_fn = 0x402000 <coro_task(int)>, __promise = {count = 1}, v = 43, a = 45, __coro_index = 1 '\001', struct_std__suspend_always_0 = {__int_8 = 0 '\000'}, |
|   class_await_counter_1 = {__int_8 = 0 '\000'}, class_await_counter_2 = {__int_8 = 0 '\000'}, struct_std__suspend_always_3 = {__int_8 = 0 '\000'}} |
| |
| In the above, the values of `v` and `a` are clearly expressed, as are the |
| temporary values for `await_counter` (`class_await_counter_1` and |
| `class_await_counter_2`) and `std::suspend_always` |
| (`struct_std__suspend_always_0` and `struct_std__suspend_always_3`). The index |
| of the current suspension point of the coroutine is emitted as `__coro_index`. |
| Because `__coro_index` is zero-indexed, the value `1` in the above example |
| means the coroutine is suspended at its second suspend point, which is the |
| first `co_await await_counter{};` in `coro_task`. The first suspend point |
| (index `0`) is the compiler-generated |
| `co_await promise_type::initial_suspend()`. |
| |
| However, when optimizations are enabled, the printed result changes drastically: |
| |
| .. parsed-literal:: |
| |
|   {__resume_fn = 0x401280 <coro_task(int)>, __destroy_fn = 0x401390 <coro_task(int)>, __promise = {count = 1}, __int_32_0 = 43, __coro_index = 1 '\001'} |
| |
| Unused values are optimized out, as is the name of the local variable `a`. |
| The only information that remains is the value of a 32-bit integer. In this |
| simple case, it seems clear that `__int_32_0` represents `a`. However, that is |
| not the case. |
| |
| An important note with optimization is that the value of a variable may not |
| properly express the intended value in the source code. For example: |
| |
| .. code-block:: c++ |
| |
|   static task coro_task(int v) { |
|     int a = v; |
|     co_await await_counter{}; |
|     a++; // __int_32_0 is 43 here |
|     std::cout << a << "\n"; |
|     a++; // __int_32_0 is still 43 here |
|     std::cout << a << "\n"; |
|     a++; // __int_32_0 is still 43 here! |
|     std::cout << a << "\n"; |
|     co_await await_counter{}; |
|     a++; // __int_32_0 is still 43 here!! |
|     std::cout << a << "\n"; |
|     a++; // Why is __int_32_0 still 43 here? |
|     std::cout << a << "\n"; |
|   } |
| |
| When debugging step-by-step, the value of `__int_32_0` seemingly does not |
| change, despite being frequently incremented, and instead is always `43`. |
| While this might be surprising, this is a result of the optimizer recognizing |
| that it can eliminate most of the load/store operations. The above code gets |
| optimized to the equivalent of: |
| |
| .. code-block:: c++ |
| |
|   // Pseudocode: `__int_32_0` names the slot in the coroutine frame. |
|   static task coro_task(int v) { |
|     __int_32_0 = v;     // store v to __int_32_0 in the frame |
|     co_await await_counter{}; |
|     int a = __int_32_0; // load __int_32_0 from the frame |
|     std::cout << a+1 << "\n"; |
|     std::cout << a+2 << "\n"; |
|     std::cout << a+3 << "\n"; |
|     co_await await_counter{}; |
|     a = __int_32_0;     // load __int_32_0 again after resuming |
|     std::cout << a+4 << "\n"; |
|     std::cout << a+5 << "\n"; |
|   } |
| |
| It should now be obvious why the value of `__int_32_0` remains unchanged |
| throughout the function. It is important to recognize that `__int_32_0` |
| does not directly correspond to `a`, but is instead a variable generated |
| to assist the compiler in code generation. The variables in an optimized |
| coroutine frame should not be thought of as directly representing the |
| variables in the C++ source. |
| |
| Get the suspended points |
| ======================== |
| |
| An important requirement for debugging coroutines is to understand suspended |
| points, that is, where a coroutine is currently suspended and awaiting. |
| |
| For simple cases like the above, inspecting the value of the `__coro_index` |
| variable in the coroutine frame works well. |
| |
| However, it is not so simple in more complex situations. In these cases, it is |
| necessary to have the coroutine library record the line number of each |
| suspension. |
| |
| For example: |
| |
| .. code-block:: c++ |
| |
|   #include <coroutine> |
|   #include <source_location> |
| |
|   // For all the promise_type we want: |
|   class promise_type { |
|     ... |
|     unsigned line_number = 0xffffffff; // added: line of the last suspension |
|   }; |
| |
|   // For all the awaiter types we need: |
|   class awaiter { |
|     ... |
|     template <typename Promise> |
|     void await_suspend(std::coroutine_handle<Promise> handle, |
|                        std::source_location sl = std::source_location::current()) { |
|       ... |
|       handle.promise().line_number = sl.line(); |
|     } |
|   }; |
| |
| In this case, we use `std::source_location` to store the line number of the |
| await inside the `promise_type`. Since we can locate the coroutine function |
| from the address of the coroutine, we can identify suspended points this way |
| as well. |
| |
| The downside is additional runtime cost, which is in keeping with the C++ |
| philosophy of "Pay for what you use". |
| |
| Get the asynchronous stack |
| ========================== |
| |
| Another important requirement when debugging a coroutine is printing the |
| asynchronous stack to identify the asynchronous caller of the coroutine. Many |
| implementations of coroutine types store a |
| `std::coroutine_handle<> continuation` in the promise type, which makes |
| identifying the caller straightforward. The `continuation` is typically the |
| coroutine awaiting the current coroutine, that is, its asynchronous parent. |
| |
| Since the `promise_type` is obtainable from the address of a coroutine and |
| contains the corresponding continuation (which itself is a coroutine with a |
| `promise_type`), the entire asynchronous stack can be printed by following the |
| continuations until none remains. |
| |
| This logic can be captured in a debugger script. |
| |
| Get the living coroutines |
| ========================= |
| |
| Another useful task when debugging coroutines is to enumerate the list of |
| living coroutines, much as is commonly done for threads. While technically |
| possible, this is not recommended in production code because it is costly at |
| runtime. One such solution is to store the addresses of the coroutines that |
| are currently alive in a collection: |
| |
| .. code-block:: c++ |
| |
|   #include <coroutine> |
|   #include <unordered_set> |
| |
|   inline std::unordered_set<void*> lived_coroutines; |
|   // For all promise_type we want to record |
|   class promise_type { |
|   public: |
|     promise_type() { |
|       // Note: take care to avoid data races here |
|       lived_coroutines.insert(std::coroutine_handle<promise_type>::from_promise(*this).address()); |
|     } |
|     ~promise_type() { |
|       // Note: take care to avoid data races here |
|       lived_coroutines.erase(std::coroutine_handle<promise_type>::from_promise(*this).address()); |
|     } |
|   }; |
| |
| In the above code snippet, we save the address of every living coroutine in the |
| `lived_coroutines` `unordered_set`. As before, once we know the address of the |
| coroutine we can derive the function, `promise_type`, and other members of the |
| frame. Thus, we can print the list of living coroutines from that collection. |
| |
| Please note that the above is expensive from a storage perspective, and requires |
| some level of locking (not pictured) on the collection to prevent data races. |
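| |
| One way to add the locking that the snippet above omits is to put the set |
| behind a small mutex-guarded class. This is only a sketch: `coroutine_registry` |
| and its member names are hypothetical, and a single global mutex is assumed to |
| be acceptable for a debugging aid. |
| |

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <unordered_set>
#include <vector>

// Process-wide set of live coroutine frame addresses, guarded by a mutex.
class coroutine_registry {
public:
  static coroutine_registry &instance() {
    static coroutine_registry registry;
    return registry;
  }

  // Intended to be called from a promise_type constructor with
  // std::coroutine_handle<promise_type>::from_promise(*this).address().
  void add(void *frame) {
    assert(frame != nullptr);
    std::lock_guard<std::mutex> lock(mutex_);
    frames_.insert(frame);
  }

  // Intended to be called from the matching promise_type destructor.
  void remove(void *frame) {
    std::lock_guard<std::mutex> lock(mutex_);
    frames_.erase(frame);
  }

  // Copy the current addresses so callers can iterate without the lock.
  std::vector<void *> snapshot() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return std::vector<void *>(frames_.begin(), frames_.end());
  }

  std::size_t size() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return frames_.size();
  }

private:
  coroutine_registry() = default;

  mutable std::mutex mutex_;
  std::unordered_set<void *> frames_;
};
```

| |
| Returning a copy from `snapshot()` keeps the lock from being held while the |
| caller walks the list, at the cost of one allocation per call. |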