Stream: helpdesk (published)

Topic: memory layout of mutable struct containing Symbol


Leandro Martínez (Aug 25 2022 at 12:19):

Can a mutable struct containing a Symbol have a concrete memory layout? I noticed that different symbols have different sizes:

julia> sizeof(:t)
1

julia> sizeof(:output)
6

thus, a struct like

mutable struct A
    s::Symbol
end

cannot have a (not sure about the terminology here) constant memory layout, right?
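A small sketch of the distinction being asked about, assuming any Julia 1.x release: sizeof on a Symbol value reports the byte length of its name, while a struct field of type Symbol takes a fixed, pointer-sized slot. The type name SymHolder and the helper name_bytes are hypothetical, chosen only for illustration.

```julia
# Hypothetical type name, chosen to avoid clashing with A from the thread.
mutable struct SymHolder
    s::Symbol
end

# Hypothetical helper: byte length of a symbol's name.
name_bytes(s::Symbol) = sizeof(String(s))

println(sizeof(:t) == name_bytes(:t))            # true: the name "t" is 1 byte
println(sizeof(:output) == name_bytes(:output))  # true: the name "output" is 6 bytes

# The field holds a reference, so the struct is one pointer wide
# (8 bytes on a 64-bit machine) no matter which symbol it stores.
println(sizeof(SymHolder) == sizeof(Ptr{Cvoid}))  # true
```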

Sukera (Aug 25 2022 at 12:22):

sure it can

Sukera (Aug 25 2022 at 12:22):

in fact, it has to - otherwise it wouldn't be possible to have different A where you can swap out the symbol

Sukera (Aug 25 2022 at 12:22):

what ends up happening is that for your field a pointer is used internally

Sukera (Aug 25 2022 at 12:23):

that way, the symbol can be swapped out while keeping the size of A fixed
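A sketch of the swapping Sukera describes, assuming Julia 1.x: storing a symbol with a much longer name in the same field leaves the object's size unchanged, because only the stored reference changes. The type name Swappable is hypothetical.

```julia
# Hypothetical type name for this sketch.
mutable struct Swappable
    s::Symbol
end

x = Swappable(:t)
size_before = sizeof(x)

# Store a symbol with a much longer name in the same field.
x.s = :a_considerably_longer_symbol_name
size_after = sizeof(x)

println(size_before == size_after)  # true: only the stored pointer changed
println(isbitstype(Symbol))         # false: Symbol values are not stored inline
```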

Sukera (Aug 25 2022 at 12:26):

Symbol in particular is a bit special though

julia> mutable struct A
           s::Symbol
       end

julia> sizeof(A)
8

julia> sizeof(Symbol)
ERROR: Type Symbol does not have a definite size.
Stacktrace:
 [1] sizeof(x::Type)
   @ Base ./essentials.jl:551
 [2] top-level scope
   @ REPL[4]:1

Sukera (Aug 25 2022 at 12:26):

suffice it to say, since it does not have a definite size, using it in a mutable struct most likely ends up as a pointer

Leandro Martínez (Aug 25 2022 at 12:27):

Ah, ok, so it is boxed (but type-stable). Anyway, if one needs an array of those and accesses those fields, that won't be great for performance, as there will be lots of memory accesses, right?

What if one uses the const s::Symbol introduced in 1.8?

Sukera (Aug 25 2022 at 12:33):

it's not boxed per se, the pointer is just abstracted away

Sukera (Aug 25 2022 at 12:34):

that's a good question :thinking:

Leandro Martínez (Aug 25 2022 at 12:34):

Answering my own question: I don't see any difference:

julia> using BenchmarkTools

julia> mutable struct B
           const s::Symbol
       end
       v = [B(rand((:a,:b))) for _ in 1:1000]
       @btime count(a -> a.s == :a, $v)
  451.457 ns (0 allocations: 0 bytes)
485

julia> mutable struct A
           s::Symbol
       end
       v = [A(rand((:a,:b))) for _ in 1:1000]
       @btime count(a -> a.s == :a, $v)
  451.325 ns (0 allocations: 0 bytes)
492

Counting isbits stuff is much faster though:

julia> v = [ rand(1:2) for _ in 1:10^3 ]
       @btime count(x -> x == 1, $v)
  55.400 ns (0 allocations: 0 bytes)
504

julia> v = [ rand('A':'B') for _ in 1:10^3 ]
       @btime count(x -> x == 'A', $v)
  67.148 ns (0 allocations: 0 bytes)
509
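One way to see why the isbits vectors are cheaper to scan: a Vector{Char} stores the 4-byte characters themselves, while a vector of mutable structs stores one pointer per element, and each pointer must be followed to reach the field. A sketch, assuming a 64-bit build for the byte counts in the comments; the type name Wrapped is hypothetical.

```julia
# Hypothetical type name for this sketch.
mutable struct Wrapped
    s::Symbol
end

chars = ['A' for _ in 1:1000]
objs  = [Wrapped(:a) for _ in 1:1000]

# Char is an isbits type, so its values sit inline in the array buffer.
println(isbitstype(Char), " ", sizeof(chars))     # true 4000

# Wrapped is mutable, so the buffer only holds references to heap objects.
println(isbitstype(Wrapped), " ", sizeof(objs))   # false 8000 (on a 64-bit machine)
```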

Sukera (Aug 25 2022 at 12:34):

it'll probably still be a pointer, since keeping the size of A constant is more important

Sukera (Aug 25 2022 at 12:34):

the symbol can have different sizes after all
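A sketch consistent with that guess, assuming Julia 1.8 or later (required for const fields in mutable structs): marking the field const leaves the struct's size unchanged and only forbids reassignment. The type names PlainField and ConstField are hypothetical.

```julia
# Hypothetical type names for this sketch.
mutable struct PlainField
    s::Symbol
end

mutable struct ConstField
    const s::Symbol   # const fields in mutable structs require Julia 1.8+
end

println(sizeof(PlainField) == sizeof(ConstField))  # true: both hold one pointer

c = ConstField(:a)
reassigned = try
    c.s = :b      # attempt to overwrite the const field
    true
catch
    false         # setfield! on a const field throws an error
end
println(reassigned)   # false
```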

Sukera (Aug 25 2022 at 13:04):

yes, because for mutable structs wrapping a symbol there's an additional indirection, since the array is an array of pointers as well
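If only the symbol is needed in the hot loop, one option (a general struct-of-arrays layout, not something proposed in the thread) is to keep the symbols in their own Vector{Symbol}, which removes the hop through each struct. A sketch with a hypothetical Item type and deterministic made-up data; no timings are claimed.

```julia
# Hypothetical type name for this sketch.
mutable struct Item
    s::Symbol
end

# Deterministic data: odd indices get :a, even indices get :b.
items = [Item(isodd(i) ? :a : :b) for i in 1:1000]

# Array of structs: load the element pointer, then load the field from the struct.
n_aos = count(it -> it.s === :a, items)

# Struct of arrays: the symbol pointers sit directly in the vector's buffer.
syms = [it.s for it in items]
n_soa = count(==(:a), syms)

println(n_aos == n_soa)   # true: same answer with one less indirection per element
println(n_soa)            # 500
```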

Jakob Nybo Nissen (Aug 25 2022 at 13:14):

I'm surprised it's that slow. Since symbols are interned, isn't it just an integer comparison of the pointers?
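The interning premise can be checked directly: building the same name twice yields the same object, so comparing symbols is an identity check. A small sketch, assuming Julia 1.x; it says nothing about speed.

```julia
# Build the symbol at run time from two strings.
built = Symbol("out", "put")

println(built === :output)                     # true: same interned object
println(objectid(built) == objectid(:output))  # true: identical object identity
println(:output == :outpu)                     # false: different names, different objects
```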

chriselrod (Aug 25 2022 at 13:15):

Jakob Nybo Nissen said:

I'm surprised it's that slow. Since symbols are interned, isn't it just an integer comparison of the pointers?

I always assumed that was the case (that it'd just be a pointer comparison, and hence equivalent to comparing Ints or @enums).
@Christopher Rackauckas, I guess this is another reason to change DiffEq retcodes, aside from :success being a fail.

Jakob Nybo Nissen (Aug 25 2022 at 13:19):

I wonder if all the time is being spent fetching out-of-cache stuff from the heap, and the symbol comparison is insignificant relative to the cache misses

Christopher Rackauckas (Aug 25 2022 at 14:01):

chriselrod said:

I guess this is another reason to change DiffEq retcodes, aside from :success being a fail.

It's mostly a correctness thing. Misspelled :succcess is too common :sad:

Jakob Nybo Nissen (Aug 25 2022 at 14:04):

A bunch of BioJulia packages, including TranscodingStreams, also use symbol literals in performance-sensitive code. Perhaps they should be replaced with an enum.
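A minimal sketch of the enum alternative, using a hypothetical RetStatus enum (not the actual API of DiffEq or TranscodingStreams): enum values are isbits, so a vector of them is stored inline and compared as integers, and a misspelled name fails loudly instead of silently comparing unequal.

```julia
@enum RetStatus Success Failure MaxIters   # hypothetical status codes

println(isbitstype(RetStatus))   # true: stored as a plain integer

statuses = [isodd(i) ? Success : Failure for i in 1:1000]
println(count(==(Success), statuses))   # 500

# A misspelled symbol literal just compares unequal, silently.
println(:succcess == :success)   # false

# A misspelled enum name is an UndefVarError at the point of use.
typo_caught = try
    eval(:Succcess)
    false
catch err
    err isa UndefVarError
end
println(typo_caught)   # true
```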

Sukera (Aug 25 2022 at 14:06):

Jakob Nybo Nissen said:

I'm surprised it's that slow. Since symbols are interned, isn't it just an integer comparison of the pointers?

it'll have trouble since the array of mutable structs containing a symbol ends up as an array of pointers to pointers. my guess is that the two dereferences hurt cache locality, and that probably also prevents SIMD


Last updated: Nov 22 2024 at 04:41 UTC