Discussion:
[Libunwind-devel] data races in libunwind
Milian Wolff
2018-04-24 11:33:28 UTC
Permalink
Hey all,

trying to figure out a strange bug in heaptrack, I decided to use TSAN to
validate the thread safety. Turns out that there are some issues in libunwind
itself:

a) last_good_addr / lga_victim in x86_64/Ginit.c validate_mem

This cache isn't thread safe but accessed from multiple threads in parallel.
The reads and writes should probably be made atomic?

WARNING: ThreadSanitizer: data race (pid=17824)
Read of size 8 at 0x7f4dac484140 by thread T3:
#0 validate_mem ../../src/x86_64/Ginit.c:201 (libunwind.so.
8+0x00000001f836)
#1 access_mem ../../src/x86_64/Ginit.c:242 (libunwind.so.8+0x00000001fe06)
#2 dwarf_get ../../include/tdep-x86_64/libunwind_i.h:168 (libunwind.so.
8+0x0000000221cf)
#3 _ULx86_64_access_reg ../../src/x86_64/Gregs.c:130 (libunwind.so.
8+0x000000022e99)
#4 _ULx86_64_get_reg ../../src/mi/Gget_reg.c:40 (libunwind.so.
8+0x00000001e5bc)
#5 apply_reg_state ../../src/dwarf/Gparser.c:796 (libunwind.so.
8+0x000000038209)
#6 _ULx86_64_dwarf_step ../../src/dwarf/Gparser.c:961 (libunwind.so.
8+0x000000039dbc)
#7 _ULx86_64_step ../../src/x86_64/Gstep.c:71 (libunwind.so.
8+0x00000002481f)
#8 trace_init_addr ../../src/x86_64/Gtrace.c:248 (libunwind.so.
8+0x000000026f0e)
#9 trace_lookup ../../src/x86_64/Gtrace.c:330 (libunwind.so.
8+0x000000027429)
#10 _ULx86_64_tdep_trace ../../src/x86_64/Gtrace.c:447 (libunwind.so.
8+0x00000002789a)
#11 unw_backtrace ../../src/mi/backtrace.c:69 (libunwind.so.
8+0x00000001cb07)

Previous write of size 8 at 0x7f4dac484140 by thread T2:
#0 validate_mem ../../src/x86_64/Ginit.c:220 (libunwind.so.
8+0x00000001fc54)
#1 access_mem ../../src/x86_64/Ginit.c:242 (libunwind.so.8+0x00000001fe06)
#2 dwarf_get ../../include/tdep-x86_64/libunwind_i.h:168 (libunwind.so.
8+0x0000000221cf)
#3 _ULx86_64_access_reg ../../src/x86_64/Gregs.c:130 (libunwind.so.
8+0x000000022e99)
#4 _ULx86_64_get_reg ../../src/mi/Gget_reg.c:40 (libunwind.so.
8+0x00000001e5bc)
#5 apply_reg_state ../../src/dwarf/Gparser.c:796 (libunwind.so.
8+0x000000038209)
#6 _ULx86_64_dwarf_step ../../src/dwarf/Gparser.c:961 (libunwind.so.
8+0x000000039dbc)
#7 _ULx86_64_step ../../src/x86_64/Gstep.c:71 (libunwind.so.
8+0x00000002481f)
#8 trace_init_addr ../../src/x86_64/Gtrace.c:248 (libunwind.so.
8+0x000000026f0e)
#9 trace_lookup ../../src/x86_64/Gtrace.c:330 (libunwind.so.
8+0x000000027429)
#10 _ULx86_64_tdep_trace ../../src/x86_64/Gtrace.c:447 (libunwind.so.
8+0x00000002789a)
#11 unw_backtrace ../../src/mi/backtrace.c:69 (libunwind.so.
8+0x00000001cb07)

Location is global 'last_good_addr' of size 32 at 0x7f4dac484140
(libunwind.so.8+0x000000273140)

b) cache hints in dwarf/Gparser.c find_reg_state:

This can be reproduced when trying to use libunwind with UNW_CACHE_GLOBAL from
multiple threads, e.g. when libunwind wasn't build with --enable-per-thread-
cache. Here, I believe the issue comes from the put_rs_cache in
find_reg_state, which needs to be moved after the last usage of cache, i.e.
below the `if (rs)` branch? Can someone confirm this? It seems to fix the
error for me...

WARNING: ThreadSanitizer: data race (pid=16551)
Read of size 2 at 0x7f7dda69ed7a by thread T2:
#0 find_reg_state ../../src/dwarf/Gparser.c:939 (libunwind.so.
8+0x000000037254)
#1 _ULx86_64_dwarf_step ../../src/dwarf/Gparser.c:958 (libunwind.so.
8+0x000000037543)
#2 _ULx86_64_step ../../src/x86_64/Gstep.c:71 (libunwind.so.
8+0x000000021fdd)
#3 trace_init_addr ../../src/x86_64/Gtrace.c:248 (libunwind.so.
8+0x0000000246cc)
#4 trace_lookup ../../src/x86_64/Gtrace.c:330 (libunwind.so.
8+0x000000024be7)
#5 _ULx86_64_tdep_trace ../../src/x86_64/Gtrace.c:447 (libunwind.so.
8+0x000000025058)
#6 unw_backtrace ../../src/mi/backtrace.c:69 (libunwind.so.
8+0x00000001a6e7)

Previous write of size 2 at 0x7f7dda69ed7a by thread T3:
#0 find_reg_state ../../src/dwarf/Gparser.c:940 (libunwind.so.
8+0x000000037383)
#1 _ULx86_64_dwarf_step ../../src/dwarf/Gparser.c:958 (libunwind.so.
8+0x000000037543)
#2 _ULx86_64_step ../../src/x86_64/Gstep.c:71 (libunwind.so.
8+0x000000021fdd)
#3 trace_init_addr ../../src/x86_64/Gtrace.c:248 (libunwind.so.
8+0x0000000246cc)
#4 trace_lookup ../../src/x86_64/Gtrace.c:330 (libunwind.so.
8+0x000000024be7)
#5 _ULx86_64_tdep_trace ../../src/x86_64/Gtrace.c:447 (libunwind.so.
8+0x000000025058)
#6 unw_backtrace ../../src/mi/backtrace.c:69 (libunwind.so.
8+0x00000001a6e7)

Location is global 'local_addr_space' of size 26296 at 0x7f7dda698c60
(libunwind.so.8+0x000000268d7a)

c) Not a race, but with valgrind's memcheck I also stumbled upon x86_64's
Ginit.c open_pipe:

/* ignore errors for closing invalid fd's */
close (mem_validate_pipe[0]);
close (mem_validate_pipe[1]);

On the first call, both fds will be -1 and valgrind will complain. Adding the
check is easy, can we add that?
--
Milian Wolff
***@milianw.de
http://milianw.de
Milian Wolff
2018-04-24 14:38:37 UTC
Permalink
Post by Milian Wolff
Hey all,
trying to figure out a strange bug in heaptrack, I decided to use TSAN to
validate the thread safety. Turns out that there are some issues in
I've now created https://github.com/libunwind/libunwind/pull/76 with my
attempts to fix these issues. Please give me feedback there.

Also, it may be that other platforms are equally affected by these races. I
didn't check that yet.

Thanks
--
Milian Wolff
***@milianw.de
http://milianw.de
Dave Watson
2018-04-24 17:10:20 UTC
Permalink
Post by Milian Wolff
Hey all,
trying to figure out a strange bug in heaptrack, I decided to use TSAN to
validate the thread safety. Turns out that there are some issues in libunwind
a) last_good_addr / lga_victim in x86_64/Ginit.c validate_mem
This cache isn't thread safe but accessed from multiple threads in parallel.
The reads and writes should probably be made atomic?
Yea I guess these should technically be relaxed atomic ops, although
word writes are atomic in hardware on x86_64.
Post by Milian Wolff
This can be reproduced when trying to use libunwind with UNW_CACHE_GLOBAL from
multiple threads, e.g. when libunwind wasn't build with --enable-per-thread-
cache. Here, I believe the issue comes from the put_rs_cache in
find_reg_state, which needs to be moved after the last usage of cache, i.e.
below the `if (rs)` branch? Can someone confirm this? It seems to fix the
error for me...
Indeed, looks like a real issue to me also.
Post by Milian Wolff
c) Not a race, but with valgrind's memcheck I also stumbled upon x86_64's
/* ignore errors for closing invalid fd's */
close (mem_validate_pipe[0]);
close (mem_validate_pipe[1]);
On the first call, both fds will be -1 and valgrind will complain. Adding the
check is easy, can we add that?
Sure.

Thanks for the pull request, will add comments there.

Loading...