Are field loads/stores atomic? · helpdesk (published)

I'm curious to know whether field accesses (i.e., Core.getfield and Core.setfield!) are atomic by default. For example, can we call the following two accessor functions (getx, setzero!) concurrently from multiple threads without a data race? If not, does Threads.Atomic solve the problem?

mutable struct Foo
    x::Int
end
getx(foo) = foo.x
setzero!(foo) = foo.x = 0

Chris Foster (Aug 05 2021 at 12:51):

IIUC there's no memory ordering or atomicity implied for setfield! or getfield so you'll need Threads.Atomic to avoid a data race.
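A minimal sketch of the Threads.Atomic approach, assuming a hypothetical AtomicFoo type (not from the original question) that stores its value in a Threads.Atomic{Int} box instead of a plain Int:

```julia
# Hypothetical variant of Foo: the field holds an atomic box, so the struct
# itself does not need to be mutable.
struct AtomicFoo
    x::Threads.Atomic{Int}
end

getx(foo::AtomicFoo) = foo.x[]              # atomic load
setzero!(foo::AtomicFoo) = (foo.x[] = 0)    # atomic store

foo = AtomicFoo(Threads.Atomic{Int}(42))

# Concurrent readers and writers go through the atomic box, so this loop
# contains no data race on x.
Threads.@threads for i in 1:1000
    isodd(i) ? getx(foo) : setzero!(foo)
end
```

Both accessors use the default (sequentially consistent) ordering of Threads.Atomic.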

Kenta Sato (Aug 05 2021 at 13:12):

Thank you. I'm interested only in atomicity, not in memory ordering. I'd like to add a bit of my context: When I read the code of lock(::ReentrantLock), I found it accesses the locked_by field before acquiring any lock. So, I thought Julia would implicitly assume atomicity of field access. I haven't read through that Manifesto yet (I found it yesterday!). Perhaps adding the @atomic macro to each field declaration and each access is a safer way.
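For reference, a sketch of what the @atomic approach could look like, assuming Julia 1.7 or later and a hypothetical AtomicFieldFoo type (the name is illustrative only):

```julia
# Requires Julia 1.7+. The field itself is declared atomic.
mutable struct AtomicFieldFoo
    @atomic x::Int
end

# Every access is marked with @atomic as well; the default ordering is
# sequentially consistent.
getx(foo::AtomicFieldFoo) = @atomic foo.x

function setzero!(foo::AtomicFieldFoo)
    @atomic foo.x = 0
end

foo = AtomicFieldFoo(42)
before = getx(foo)
setzero!(foo)
after = getx(foo)
```

Once a field is declared with @atomic, a plain foo.x access is rejected at run time, so a forgotten annotation shows up as an error rather than as a silent race.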

Kenta Sato (Aug 05 2021 at 14:24):

chriselrod (Aug 05 2021 at 14:39):

Loads and stores of small single objects like Int are atomic, in that they're a single indivisible operation.

Special "atomic" instructions are needed when what we'd like to do atomically normally would not be atomic, like atomic_add! which loads, adds, and stores as a single indivisible operation.
Or for memory ordering.
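As an illustration of the read-modify-write case, here is a small sketch using Threads.atomic_add! on a shared counter (the counter variable is just an example):

```julia
counter = Threads.Atomic{Int}(0)

Threads.@threads for _ in 1:10_000
    # Load, add, and store happen as one indivisible operation, so no
    # increment is lost even when several threads run this concurrently.
    Threads.atomic_add!(counter, 1)
end

total = counter[]
```

Writing the same loop as a plain load, add, and store on a shared non-atomic variable would be a data race and could lose updates.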

Kenta Sato (Aug 05 2021 at 15:02):

Right, but the C++ specification defines concurrent non-atomic loads and stores of the same object as a data race, which leads to undefined behavior. I think this is the case even if the manipulated object is an integer.

Takafumi Arakaki (tkf) (Aug 05 2021 at 15:52):

That's correct. For some specific architectures and objects of some specific sizes, loads and stores may be indivisible, but that's not what the C++ memory model (on which Julia's memory model is defined, IIUC) formally defines. We are programming against the abstract machine, not a concrete architecture. Otherwise, it's impossible to have an optimizing compiler and portable programs.

(Of course, performance tuning would benefit hugely from understanding concrete machine mechanisms. However, for writing correct code, I think we should avoid relying on the behavior of a particular architecture as much as possible.)

If you need indivisibility and are sure that no memory ordering is required, use the :monotonic (aka relaxed) ordering. On x86, it's likely compiled down to the normal load/store for small objects.
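A sketch of the :monotonic variant, assuming Julia 1.7+ and a hypothetical RelaxedFoo type (illustrative name, not from the thread):

```julia
# Requires Julia 1.7+.
mutable struct RelaxedFoo
    @atomic x::Int
end

# :monotonic guarantees an indivisible load/store but imposes no ordering
# on surrounding memory operations.
getx(foo::RelaxedFoo) = @atomic :monotonic foo.x

function setzero!(foo::RelaxedFoo)
    @atomic :monotonic foo.x = 0
end

foo = RelaxedFoo(7)
first_read = getx(foo)
setzero!(foo)
second_read = getx(foo)
```

The same accesses can also be spelled getfield(foo, :x, :monotonic) and setfield!(foo, :x, 0, :monotonic).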

Kenta Sato (Aug 05 2021 at 16:13):

Thank you, all. So, my takeaway here is that field loads and stores are practically atomic for some objects on all supported platforms. However, this is not a solid specification, and therefore Julia devs are now working on consolidating its memory model in the Julia Atomics Manifesto, which will be introduced in Julia 1.7 (or 1.8?).

Takafumi Arakaki (tkf) (Aug 05 2021 at 16:26):

Takafumi Arakaki (tkf) (Aug 05 2021 at 16:27):

Chris Foster (Aug 06 2021 at 01:04):

It depends on what you mean by "atomic". The term is rather overloaded, I'm afraid.

For one thing, you may worry about whether you can see a "torn read" where some bits of a wide word come from before an update and some from after. IIUC this isn't something to worry about in practice right now, as you've stated, at least for reasonably sized word widths with correct alignment.

But considerations like torn reads are really only one consequence of data races, potentially not even the major consequence.

Much more tricky is how data races interact with compiler optimizations. The compiler is allowed to do all sorts of program transformations under the assumption that your code is data race free. On the other hand, if your program has data races, many of these compiler transformations can just become invalid. I like the following article about this:
https://software.intel.com/content/www/us/en/develop/blogs/benign-data-races-what-could-possibly-go-wrong.html
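One commonly cited example of such a transformation is a spin loop on a flag: with a plain field, the compiler may assume nobody else writes the flag and hoist the load out of the loop. The sketch below shows a race-free version, assuming Julia 1.7+ and a hypothetical Flag type and wait_ready function (both names are illustrative):

```julia
# Requires Julia 1.7+.
mutable struct Flag
    @atomic ready::Bool
end

# The atomic load must be performed on every iteration; it cannot be
# treated as loop-invariant the way a plain field load could be.
function wait_ready(flag::Flag)
    while !(@atomic flag.ready)
        yield()  # let other tasks run while waiting
    end
    return true
end

flag = Flag(false)
waiter = Threads.@spawn wait_ready(flag)
@atomic flag.ready = true   # publish the flag from the main task
result = fetch(waiter)
```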

Takafumi Arakaki (tkf) (Aug 06 2021 at 01:50):

Chris Foster (Aug 06 2021 at 04:08):

Right, I mentioned compiler optimizations... but hardware can also dynamically optimize some aspects of execution under a race-free assumption (e.g., which cache to read from). In a high-level language there are so many layers of this kind of thing. It seems best to assume there's no such thing as a safe data race.

Takafumi Arakaki (tkf) (Aug 06 2021 at 04:54):

Well, a data race in the C/C++ memory model is undefined behavior, which is obviously unsafe. OK, there's a C++ proposal called Tearable Atomics http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0690r1.html which is in a way trying to define almost-but-not-really-a-race. So, it's possible that we can even talk about a "conforming torn read" at some point.

(Although I do want this now, for implementing work-stealing deque and concurrent dictionary...)

Kenta Sato (Aug 05 2021 at 11:19):