[Libunwind-devel] Mixed stack unwinding bug on ppc64le

Stephen Roberts

7 years ago

Hi libunwind developers,

I'd like some advice about fixing a bug on ppc64 which prevents libunwind from unwinding through mixed C/C++ stacks. Unwinding fails on this platform whenever libunwind tries to unwind from a callee stack frame without Call Frame Information (CFI) into caller frames with CFI information. This is a common scenario on ppc64 because gcc does not produce CFI by default on this platform while g++ still produces CFI to support exception handling. Any C++ application which calls in to C library code compiled with the default flags can trigger this issue.

This bug comes from the unw_step function in ppc64/Gstep.c. This function first tries to unwind using CFI via dwarf_step and if this fails with -UNW_ENOINFO it falls back to following the backchain and clobbers all registers in the process. Any dwarf_step failure other than -UNW_ENOINFO causes unw_step to bomb out immediately.

The mixed frame bug happens because when libunwind unwinds through the non-CFI frame it will fall back to the backchain and clobber all registers. Then, when it encounters a frame with CFI it will try to use this information to unwind further, which promptly fails because all of the registers have been clobbered. The return value of dwarf_step in this case is not -UNW_ENOINFO because information was found, so unwinding is abandoned. This bug doesn't impact other platforms because they make some attempt to guess the correct values of certain registers after they are clobbered, which in most (but not all) cases will be sufficient to allow CFI dwarf unwinding to resume.

I have two candidate solutions, both of which solve the issue but make different compromises. My first solution is to include an "ignore_cfi" flag inside the struct cursor from tdep-ppc64/libunwind_i.h.
This flag is initially false but set to true whenever dwarf_step fails with -UNW_ENOINFO. It has the effect of skipping the dwarf_step logic entirely, causing libunwind to stick to using back chain unwinding after it encounters its first frame without CFI. This is a robust and simple solution, but I'm not sure if this is the libunwind way to approach the issue.

My second solution is a little more invasive, but has the potential to provide more information. The dwarf_step function uses dwarf_apply_registers internally to 1) compute the new Canonical Frame Address (CFA) and 2) update the register values. My fix effectively breaks the logic in dwarf_apply_registers out into two helper functions, something like "dwarf_advance_cfa" and "dwarf_update_registers".

The dwarf_step function still works exactly as before, but we can now use "dwarf_update_registers" to update registers for a user-specified CFA. Given we have a valid CFA from backchain unwinding, we call "dwarf_update_registers" with our backchain-derived CFA to leave us with some register values even when we've unwound from a non-CFA frame. The next time we unwind then we will *probably* have the registers we need to continue to unwind successfully using CFI. The caveat with this approach is it is possible for compilers to save volatile registers in non-volatile registers rather than on the stack.
In this case, dwarf_update_registers would fail to retrieve the register-saved value and would fail. This is rare in practice, and is still much better than never being able to unwind. If we wanted we could be extra-safe and use both solutions, with the ignore_cfi flag being set only when dwarf_update_registers failed.

My first question is this: which approach is more in-keeping with the libunwind philosophy? Would either (or both) of them be accepted upstream? I've tested both approaches and found they both solve my issues on ppc64le, so either way I'm happy.

A second question involves writing test cases. I have a test case that I run outside the libunwind infrastructure, but it's proving difficult to make it run within libunwind's own testing infrastructure. This bug only manifests when a frame has no CFI, however the libunwind make system turns on -fexceptions everywhere so this can't be tested easily. I know there are test-specific CFLAGS, however these come first in the compile command so they are clobbered by the global options. I think it makes sense to be able to vary the exception/unwind compiler flags as part of testing as these control which libunwind code paths are executed. Did I miss something? Is there some other mechanism to set these flags for test cases?

Many thanks in advance,
Stephen Roberts