[Libunwind-devel] tls_model("initial-exec") attribute prevents dynamic loading of libunwind via dlopen()

Discussion:

Milian Wolff

2018-04-29 19:55:58 UTC

Hey all.

I just noticed that one cannot dlopen() libunwind when it is build with --
enable-per-thread-cache. In such scenarios, dlopen will always fail with
"cannot allocate memory in static TLS block".

The issue was also encountered by the jemalloc people, see [1] and [2].

[1]: https://github.com/jemalloc/jemalloc/issues/937
[2]: https://github.com/jemalloc/jemalloc/pull/1180

The reason for this in libunwind comes from src/dwarf/Gparser.c:

static __thread struct dwarf_rs_cache tls_cache
__attribute__((tls_model("initial-exec")));

The initially specified tls_model("initial-exec") triggers the issue above.
jemalloc has work-arounded this by adding a configure option to remove this
explicit tsl_model setting (cf. [2] above).

Indeed, just removing the attribute "fixes" libunwind for my use-case in
heaptrack. So the question is now:

- why was the tls_model explicitly set to initial-exec?
- can we just remove it?
- if not, can we add another configure option similar to what jemalloc did?

Note that the other thread local caches, such as the following in the platform
specific Gtrace.c files, also just uses the default tls model:

static __thread unw_trace_cache_t *tls_cache;

So from my side, I propose we just remove the tls_model attribute.

Thanks

--
Milian Wolff
***@milianw.de
http://milianw.de

Paul Pluzhnikov

2018-05-01 20:22:33 UTC

Permalink

Post by Milian Wolff
- why was the tls_model explicitly set to initial-exec?
- can we just remove it?

Note that when using dlopen-ed library with __thread variables, access to
them may result in malloc being called (at least under GLIBC):
https://sourceware.org/bugzilla/show_bug.cgi?id=16133

That is highly problematic when you want to unwind from within malloc
itself.

--
Paul Pluzhnikov

Milian Wolff

2018-05-01 20:37:15 UTC

Permalink

Post by Paul Pluzhnikov

Post by Milian Wolff
- why was the tls_model explicitly set to initial-exec?
- can we just remove it?

Note that when using dlopen-ed library with __thread variables, access to
https://sourceware.org/bugzilla/show_bug.cgi?id=16133
That is highly problematic when you want to unwind from within malloc
itself.

I do that in heaptrack all the time. And I guard against that via a simple
recursion guard, I've never had an issue with that so far.

Bye

--
Milian Wolff
***@milianw.de
http://milianw.de

Bert Wesarg

2018-05-02 11:23:03 UTC

Permalink

Dear Milian,

I tried to remember why we choose the initial-exec model, but could
not found any reasons. By reading
https://www.akkadia.org/drepper/tls.pdf again, I would say
"local-exec" is the right way to go here. Which should be the default
for 'static' variables anyway. Can you please confirm, that this works
for you. Its just important, that there are no calls to
__tls_get_addr() in the asm output.

Thanks.

Bert

Post by Milian Wolff
Hey all.
I just noticed that one cannot dlopen() libunwind when it is build with --
enable-per-thread-cache. In such scenarios, dlopen will always fail with
"cannot allocate memory in static TLS block".
The issue was also encountered by the jemalloc people, see [1] and [2].
[1]: https://github.com/jemalloc/jemalloc/issues/937
[2]: https://github.com/jemalloc/jemalloc/pull/1180
static __thread struct dwarf_rs_cache tls_cache
__attribute__((tls_model("initial-exec")));
The initially specified tls_model("initial-exec") triggers the issue above.
jemalloc has work-arounded this by adding a configure option to remove this
explicit tsl_model setting (cf. [2] above).
Indeed, just removing the attribute "fixes" libunwind for my use-case in
- why was the tls_model explicitly set to initial-exec?
- can we just remove it?
- if not, can we add another configure option similar to what jemalloc did?
Note that the other thread local caches, such as the following in the platform
static __thread unw_trace_cache_t *tls_cache;
So from my side, I propose we just remove the tls_model attribute.
Thanks
--
Milian Wolff
http://milianw.de
_______________________________________________
Libunwind-devel mailing list
https://lists.nongnu.org/mailman/listinfo/libunwind-devel

Bert Wesarg

2018-05-07 20:08:26 UTC

Permalink

Post by Bert Wesarg
Dear Milian,
I tried to remember why we choose the initial-exec model, but could
not found any reasons. By reading
https://www.akkadia.org/drepper/tls.pdf again, I would say
"local-exec" is the right way to go here. Which should be the default
for 'static' variables anyway. Can you please confirm, that this works
for you. Its just important, that there are no calls to
__tls_get_addr() in the asm output.

No, with the default I do see __tls_get_addr in the asm output (i.e. matches
in Gparser.o). So probably, the default isn't OK to be used everywhere.

Sorry, that were two questions:

1) Does "local-exec" works with dlopen for you?

2) Does the default model produces __tls_get_addr in the asm output?

I assume you answered only the second one?

Bert

Thanks

Dave Watson

2018-05-02 14:37:22 UTC

Permalink

As Paul mentioned, it's because __thread and friends aren't
async-signal-safe on glibc and may allocate. I'm fine adding a
configure-time option to change the tls model if someone wants to
provide a patch for it.

Facebook has codepaths that try to avoid the trace cache __thread if
we actually need async-signal-safety:

https://github.com/facebook/folly/blob/cd1bdc912603c0358ba733d379a74ae90ab3a437/folly/experimental/symbolizer/StackTrace.cpp#L31

Milian Wolff

2018-05-03 08:55:05 UTC

Permalink

Post by Dave Watson

But there's already the --enable-per-thread-cache option which disables
__thread. Isn't that enough for your use case?

Post by Dave Watson
Facebook has codepaths that try to avoid the trace cache __thread if
https://github.com/facebook/folly/blob/cd1bdc912603c0358ba733d379a74ae90ab3a
437/folly/experimental/symbolizer/StackTrace.cpp#L31

Cheers

--
Milian Wolff
***@milianw.de
http://milianw.de