Stream: helpdesk (published)

Topic: Are field loads/stores atomic?


Kenta Sato (Aug 05 2021 at 11:19):

I'm curious to know whether field accesses (i.e., Core.getfield and Core.setfield!) are atomic by default. For example, can we call the following two accessor functions (getx, setzero!) concurrently from multiple threads without a data race? If not, does Threads.Atomic solve the problem?

mutable struct Foo
    x::Int
end
getx(foo) = foo.x
setzero!(foo) = foo.x = 0

Chris Foster (Aug 05 2021 at 12:51):

IIUC there's no memory ordering or atomicity implied for setfield! or getfield, so you'll need Threads.Atomic to avoid a data race.

In Julia 1.7 there's been a lot of work done on atomic operations, and you'll be able to use the new @atomic macro to annotate the field as @atomic x::Int, as well as the use sites of x. See https://github.com/JuliaLang/julia/pull/37847 and https://hackmd.io/s/SyFljvtdO
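
For reference, here is a minimal sketch of what that annotation might look like for the struct from the question, assuming Julia 1.7 or later. The name AtomicFoo is made up for illustration; the @atomic macro with no explicit ordering defaults to sequential consistency.

```julia
# Requires Julia 1.7+. AtomicFoo is a hypothetical name for this sketch.
mutable struct AtomicFoo
    @atomic x::Int            # the field itself is declared atomic
end

# Atomic load (sequentially consistent by default).
function getx(foo::AtomicFoo)
    return @atomic foo.x
end

# Atomic store; writes to an @atomic field must be annotated at the use site.
function setzero!(foo::AtomicFoo)
    @atomic foo.x = 0
    return foo
end

foo = AtomicFoo(42)
t = Threads.@spawn setzero!(foo)
v = getx(foo)                 # sees either the old or the new value
wait(t)
final = getx(foo)             # the store has completed by now
```

With this declaration the two accessors can be called concurrently without a data race; which value a concurrent reader sees still depends on timing.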

Kenta Sato (Aug 05 2021 at 13:12):

Thank you. I'm interested only in atomicity, not in memory ordering. I'd like to add a bit of context: when I read the code of lock(::ReentrantLock), I found that it accesses the locked_by field before acquiring any lock. So, I thought Julia would implicitly assume atomicity of field access. I haven't read through that Manifesto yet (I found it yesterday!). Perhaps adding the @atomic macro to each field and each access is the safer way.

Kenta Sato (Aug 05 2021 at 14:24):

Hmm, on all platforms supported by Julia, relaxed loads and stores seem to be mapped to vanilla loads and stores, respectively. https://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html

chriselrod (Aug 05 2021 at 14:39):

Loads and stores of small single objects like Int are atomic, in that they're a single indivisible operation.

Special "atomic" instructions are needed when what we'd like to do atomically would not normally be atomic, like atomic_add!, which loads, adds, and stores as a single indivisible operation, or when memory ordering is required.
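
As a small illustration of the read-modify-write case, here is a sketch of a shared counter using Threads.Atomic and atomic_add!. The function name count_atomic is invented for this example.

```julia
using Base.Threads

# count_atomic is a hypothetical helper: n increments spread across threads.
function count_atomic(n)
    counter = Atomic{Int}(0)
    @threads for _ in 1:n
        # Load, add, and store happen as one indivisible operation,
        # so no increment is lost even when threads collide.
        atomic_add!(counter, 1)
    end
    return counter[]          # read the current value
end

total = count_atomic(10_000)
```

Replacing the Atomic{Int} with a plain Ref{Int} and `counter[] += 1` would turn each increment into a separate load and store, which is a data race when more than one thread runs the loop.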

Kenta Sato (Aug 05 2021 at 15:02):

Right, but the C++ spec defines concurrent loads and stores as a data race, which leads to undefined behavior. I think this is the case even if the manipulated object is an integer.

Takafumi Arakaki (tkf) (Aug 05 2021 at 15:52):

That's correct. For some specific architectures and objects of some specific sizes, loads and stores may be indivisible, but that's not what the C++ memory model (on which Julia's memory model is based, IIUC) formally defines. We are programming against the abstract machine, not a concrete architecture. Otherwise, it would be impossible to have both an optimizing compiler and portable programs.

(Of course, performance tuning benefits hugely from understanding concrete machine mechanisms. However, for writing correct code, I think we should avoid relying on the behavior of a particular architecture as much as possible.)

If you need indivisibility and are sure that no memory ordering is required, use the :monotonic (aka relaxed) ordering. On x86, it's likely compiled down to a normal load/store for small objects.
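
A rough sketch of what requesting the :monotonic ordering looks like at the use sites, assuming Julia 1.7 or later. The Progress struct and its done field are hypothetical names chosen for this example.

```julia
# Requires Julia 1.7+. Progress is a hypothetical type for this sketch.
mutable struct Progress
    @atomic done::Int
end

p = Progress(0)

# Four tasks each bump the counter 1000 times. Each increment is an
# indivisible read-modify-write, but imposes no ordering on other memory.
@sync for _ in 1:4
    Threads.@spawn for _ in 1:1000
        @atomic :monotonic p.done += 1
    end
end

# Relaxed load of the final value.
total_done = @atomic :monotonic p.done
```

The ordering symbol goes right after the macro name; leaving it out gives the stronger sequentially consistent default.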

Kenta Sato (Aug 05 2021 at 16:13):

Thank you, all. So, my takeaway here is that field loads and stores are practically atomic for some objects on all supported platforms. However, this is not a solid specification, and therefore the Julia devs are now working on consolidating the memory model in the Julia Atomics Manifesto, which will be introduced in Julia 1.7 (or 1.8?).

Takafumi Arakaki (tkf) (Aug 05 2021 at 16:26):

Yes, I think that's a nice summary!

Takafumi Arakaki (tkf) (Aug 05 2021 at 16:27):

1.7 already has the atomics, BTW

Chris Foster (Aug 06 2021 at 01:04):

> field loads and stores are practically atomic for some objects on all supported platforms

It depends on what you mean by "atomic". The term is rather overloaded, I'm afraid.

For one thing, you may worry about whether you can see a "torn read", where some bits of a wide word come from before an update and some from after. IIUC this isn't something to worry about in practice right now, as you've stated, at least for reasonably sized word widths with correct alignment.

But considerations like torn reads are really only one consequence of data races, potentially not even the major consequence.

Much trickier is how data races interact with compiler optimizations. The compiler is allowed to do all sorts of program transformations under the assumption that your code is data race free. On the other hand, if your program has data races, many of these compiler transformations simply become invalid. I like the following article about this:
https://software.intel.com/content/www/us/en/develop/blogs/benign-data-races-what-could-possibly-go-wrong.html
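
To make the optimization concern a bit more concrete, here is a hedged sketch of a stop flag that one task polls while another sets it. With a plain (non-atomic) Bool field, the compiler would be allowed to assume nobody else writes the flag and could, for instance, hoist the load out of the loop. Declaring the field @atomic rules that out. StopFlag and spin_until_stopped are hypothetical names, and the code assumes Julia 1.7 or later.

```julia
# Requires Julia 1.7+. StopFlag is a hypothetical type for this sketch.
mutable struct StopFlag
    @atomic stop::Bool
end

# Poll the flag until another task sets it; returns the number of polls
# that saw `false`.
function spin_until_stopped(flag::StopFlag)
    n = 0
    # The atomic load must be performed on every iteration.
    while !(@atomic :monotonic flag.stop)
        n += 1
        yield()               # let other tasks run, even with one thread
    end
    return n
end

flag = StopFlag(false)
worker = Threads.@spawn spin_until_stopped(flag)
@atomic :monotonic flag.stop = true
iterations = fetch(worker)    # the worker terminates once it sees the store
```

How many iterations the worker completes depends on scheduling; the point is only that the loop is guaranteed to observe the store eventually.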

Takafumi Arakaki (tkf) (Aug 06 2021 at 01:50):

Yeah, I agree torn reads are not the hardest part. Caches interact with memory ordering, too. FYI, for people who prefer watching over reading, I dumped a bunch of useful talks on atomics in https://julialang.zulipchat.com/#narrow/stream/236830-concurrency/topic/Useful.20talks.20on.20atomics For anyone new to this, I think Herb Sutter's atomic<> Weapons is a great introduction.

Chris Foster (Aug 06 2021 at 04:08):

Right, I mentioned compiler optimizations... but hardware can also dynamically optimize some aspects of execution under a race-free assumption (e.g., which cache to read from). In a high-level language there are so many layers of this kind of thing. It seems best to assume there's no such thing as a safe data race.

Takafumi Arakaki (tkf) (Aug 06 2021 at 04:54):

Well, a data race in the C/C++ memory model is undefined behavior, which is obviously unsafe :troll: . OK, there's a C++ proposal called Tearable Atomics http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0690r1.html which is, in a way, trying to define an almost-but-not-really-a-race. So, it's possible that we can even talk about a "conforming torn read" at some point.

(Although I do want this now, for implementing a work-stealing deque and a concurrent dictionary...)


Last updated: Oct 02 2023 at 04:34 UTC