I'm curious to know whether field accesses (i.e., Core.getfield and Core.setfield!) are atomic by default. For example, can we call the following two accessor functions (getx, setzero!) concurrently from multiple threads without a data race? If not, does Threads.Atomic solve the problem?
mutable struct Foo
x::Int
end
getx(foo) = foo.x
setzero!(foo) = foo.x = 0
IIUC there's no memory ordering or atomicity implied for setfield! or getfield, so you'll need Threads.Atomic to avoid a data race.
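For the struct in the question, one way to do that is to wrap the field's value in Threads.Atomic (a sketch; the AtomicFoo name is just for illustration):

```julia
using Base.Threads: Atomic

# Same shape as Foo, but the Int is held in an Atomic{Int} cell.
mutable struct AtomicFoo
    x::Atomic{Int}
end
AtomicFoo(x::Int) = AtomicFoo(Atomic{Int}(x))

getx(foo) = foo.x[]          # atomic load via getindex
setzero!(foo) = foo.x[] = 0  # atomic store via setindex!
```

Note the extra indirection: the field itself now holds an Atomic{Int} box, and all loads and stores go through `[]`.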
In Julia 1.7 there's been a lot of work done on atomic operations, and you'll be able to use the new @atomic macro to annotate the field declaration (@atomic x::Int) and the use sites of x. See https://github.com/JuliaLang/julia/pull/37847 and https://hackmd.io/s/SyFljvtdO
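On 1.7+, the question's example would look roughly like this (a sketch of the new field syntax, not the only way to write it):

```julia
# Julia 1.7+: the field is declared @atomic, and every access must also
# be annotated -- a plain foo.x on an atomic field throws a
# ConcurrencyViolationError.
mutable struct Foo
    @atomic x::Int
end

getx(foo) = @atomic foo.x         # sequentially consistent load
setzero!(foo) = @atomic foo.x = 0 # sequentially consistent store
</imports>
```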
Thank you. I'm interested only in atomicity, not in memory ordering. I'd like to add a bit of my context: When I read the code of lock(::ReentrantLock)
, I found it accesses the locked_by
field before acquiring any lock. So, I thought Julia would implicitly assume atomicity of field access. I haven't read though that Manifesto (I found it yesterday!). Perhaps adding the @atomic
macro before each field and access is a safer way.
Hmm, on all platforms supported by Julia, relaxed loads and stores seem to be mapped to vanilla loads and stores, respectively. https://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html
Loads and stores of small single objects like Int are atomic, in that they're a single indivisible operation.
Special "atomic" instructions are needed when what we'd like to do atomically would not normally be atomic, like atomic_add!, which loads, adds, and stores as a single indivisible operation.
Or for memory ordering.
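A shared counter is the standard example of why the read-modify-write must be indivisible; a small sketch:

```julia
using Base.Threads

# With a plain counter[] += 1, the load and store are two separate
# atomic operations, so concurrent increments can be lost.
# atomic_add! performs load + add + store as one indivisible operation.
counter = Threads.Atomic{Int}(0)
Threads.@threads for i in 1:1000
    Threads.atomic_add!(counter, 1)
end
```

After the loop, counter[] is exactly 1000 regardless of how many threads ran it.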
Right, but the C++ spec defines concurrent non-atomic loads and stores as a data race, which leads to undefined behavior. I think this is the case even if the manipulated object is an integer.
That's correct. For some specific architectures and objects of some specific sizes, loads and stores may be indivisible, but that's not what the C++ memory model (on which Julia's memory model is based, IIUC) formally guarantees. We are programming against the abstract machine, not a concrete architecture. Otherwise, it'd be impossible to have an optimizing compiler and portable programs.
(Of course, performance tuning benefits hugely from understanding concrete machine mechanisms. However, for writing correct code, I don't think we should rely on the behavior of a particular architecture any more than necessary.)
If you need indivisibility and are sure that no memory ordering is required, use the :monotonic (aka relaxed) ordering. On x86, it's likely compiled down to a normal load/store for small objects.
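With the 1.7 field syntax, that would look something like this (a sketch; the Counter name and accessor names are illustrative):

```julia
# Julia 1.7+: a :monotonic (relaxed) load/store is indivisible but
# imposes no ordering on surrounding memory operations.
mutable struct Counter
    @atomic n::Int
end

relaxed_get(c) = @atomic :monotonic c.n
relaxed_set!(c, v) = @atomic :monotonic c.n = v
```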
Thank you, all. So, my takeaway here is that field loads and stores are practically atomic for some objects on all supported platforms. However, this is not a solid specification, and therefore the Julia devs are now working on consolidating its memory model in the Julia Atomics Manifesto, which will be introduced in Julia 1.7 (or 1.8?).
Yes, I think that's a nice summary!
1.7 already has the atomics, BTW
field loads and stores are practically atomic for some objects on all supported platforms
It depends on what you mean by "atomic". The term is rather overloaded, I'm afraid.
For one thing, you may worry about whether you can see a "torn read", where some bits of a wide word come from before an update and some from after. IIUC this isn't something to worry about in practice right now, as you've stated, at least for reasonably sized word widths with correct alignment.
But considerations like torn reads are really only one consequence of data races, potentially not even the major consequence.
Much more tricky is how data races interact with compiler optimizations. The compiler is allowed to do all sorts of program transformations under the assumption that your code is data race free. On the other hand, if your program has data races many of these compiler transformations can just become invalid. I like the following article about this:
https://software.intel.com/content/www/us/en/develop/blogs/benign-data-races-what-could-possibly-go-wrong.html
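A classic instance of this is spinning on a plain field: because the compiler may assume data-race freedom, it can legally hoist the load out of the loop, and the loop never observes the other thread's write. A sketch of the race-free version (the Stop name and the :acquire/:release orderings are illustrative choices):

```julia
# If `flag` were a plain Bool field, the compiler could legally read it
# once before the loop and then spin forever; @atomic forbids that
# transformation and forces a fresh load on every iteration.
mutable struct Stop
    @atomic flag::Bool
end

function spin_until(s::Stop)
    while !(@atomic :acquire s.flag)
        yield()  # let other tasks run while we wait
    end
end
```

Another task then releases the spinner with `@atomic :release s.flag = true`.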
Yeah, I agree torn reads are not the hardest part. Caches interact with memory ordering, too. FYI, for people who prefer watching over reading, I dumped a bunch of useful talks on atomics in https://julialang.zulipchat.com/#narrow/stream/236830-concurrency/topic/Useful.20talks.20on.20atomics For anyone new to this, I think Herb Sutter's "atomic<> Weapons" is a great introduction.
Right, I mentioned compiler optimizations... but hardware can also dynamically optimize some aspects of execution under a race-free assumption (e.g., which cache to read from). In a high-level language there are so many layers of this kind of thing. It seems best to assume there's no such thing as a safe data race.
Well, a data race in the C/C++ memory model is undefined behavior, which is obviously unsafe. OK, there's a C++ proposal called Tearable Atomics http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0690r1.html which is, in a way, trying to define almost-but-not-really-a-race. So, it's possible that we can even talk about a "conforming torn read" at some point.
(Although I do want this now, for implementing work-stealing deque and concurrent dictionary...)
Last updated: Nov 22 2024 at 04:41 UTC