memory layout of mutable struct containing Symbol · helpdesk (published)

Stream: helpdesk (published)

Topic: memory layout of mutable struct containing Symbol

Leandro Martínez (Aug 25 2022 at 12:19):

Can a mutable struct containing a Symbol have a concrete memory layout? I noticed that different symbols have different sizes:

julia> sizeof(:t)
1

julia> sizeof(:output)
6

thus, a struct like

mutable struct A
    s::Symbol
end

cannot have a (not sure about the terminology here) constant memory layout, right?

Sukera (Aug 25 2022 at 12:22):

sure it can

Sukera (Aug 25 2022 at 12:22):

in fact, it has to - otherwise it wouldn't be possible to have different instances of A, or to swap out the symbol in one of them

Sukera (Aug 25 2022 at 12:22):

what ends up happening is that for your field a pointer is used internally

Sukera (Aug 25 2022 at 12:23):

that way, the symbol can be swapped out while keeping the size of A fixed

Sukera (Aug 25 2022 at 12:26):

Symbol in particular is a bit special though

julia> mutable struct A
           s::Symbol
       end

julia> sizeof(A)
8

julia> sizeof(Symbol)
ERROR: Type Symbol does not have a definite size.
Stacktrace:
 [1] sizeof(x::Type)
   @ Base ./essentials.jl:551
 [2] top-level scope
   @ REPL[4]:1

Sukera (Aug 25 2022 at 12:26):

suffice it to say, since it does not have a definite size, using it in a mutable struct most likely ends up as a pointer
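A minimal sketch of the point above, using the same A as in the thread (the symbol names are arbitrary): the field holds a pointer-sized reference, so the size of A should not depend on how long the symbol's name is.

```julia
mutable struct A
    s::Symbol
end

# sizeof on a Symbol value reports the byte length of its name.
@show sizeof(:t) sizeof(:output)

# The struct itself holds one pointer-sized reference to the symbol.
@show sizeof(A) == sizeof(Ptr{Cvoid})

# Swapping in a longer symbol does not change the size of the object.
a = A(:t)
size_before = sizeof(a)
a.s = :a_much_longer_symbol_name
size_after = sizeof(a)
@show size_before == size_after

# Symbol is not a bits type, so it cannot be stored by value in the field.
@show isbitstype(Symbol)
```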

Leandro Martínez (Aug 25 2022 at 12:27):

Ah, ok, so it is boxed (but type-stable). Anyway, if one needs an array of those and accesses those fields, that won't be great for performance, as there will be lots of memory accesses, right?

What if one uses the const s::Symbol introduced in 1.8?

Sukera (Aug 25 2022 at 12:33):

it's not Boxed per se, the pointer is just abstracted away

Sukera (Aug 25 2022 at 12:34):

that's a good question :thinking:

Leandro Martínez (Aug 25 2022 at 12:34):

Answering my own question: I don't see any difference:

julia> using BenchmarkTools  # provides @btime

julia> mutable struct B
           const s::Symbol
       end
       v = [B(rand((:a,:b))) for _ in 1:1000]
       @btime count(a -> a.s == :a, $v)
  451.457 ns (0 allocations: 0 bytes)
485

julia> mutable struct A
           s::Symbol
       end
       v = [A(rand((:a,:b))) for _ in 1:1000]
       @btime count(a -> a.s == :a, $v)
  451.325 ns (0 allocations: 0 bytes)
492

Counting isbits stuff is much faster though:

julia> v = [ rand(1:2) for _ in 1:10^3 ]
       @btime count(x -> x == 1, $v)
  55.400 ns (0 allocations: 0 bytes)
504

julia> v = [ rand('A':'B') for _ in 1:10^3 ]
       @btime count(x -> x == 'A', $v)
  67.148 ns (0 allocations: 0 bytes)
509

Sukera (Aug 25 2022 at 12:34):

it'll probably still be a pointer, since keeping the size of A constant is more important

Sukera (Aug 25 2022 at 12:34):

the symbol can have different sizes after all
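A small sketch of what the const annotation does and does not change, assuming Julia 1.8 or later (const fields are not available before that). It uses the same B as in the thread: the field should still be one pointer-sized reference, and the only difference is that reassignment is rejected.

```julia
mutable struct B
    const s::Symbol
end

# The const field is still stored as a pointer-sized reference.
@show sizeof(B) == sizeof(Ptr{Cvoid})

b = B(:a)

# Reassigning a const field throws, so the layout question is unchanged:
# const only forbids mutation of that field after construction.
reassign_failed = try
    b.s = :b
    false
catch
    true
end
@show reassign_failed
@show b.s
```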

Sukera (Aug 25 2022 at 13:04):

yes, because for mutable structs wrapping a symbol there's an additional indirection, since the array is an array of pointers as well
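To make the layout difference concrete, here is a rough sketch that only checks `isbitstype`, without any timing. `Code` is a hypothetical wrapper introduced for illustration and is not from the thread. Bits types can be stored by value in an array, while mutable structs are stored as references to separately allocated objects.

```julia
mutable struct A
    s::Symbol
end

# Hypothetical immutable wrapper around a plain integer.
struct Code
    id::Int
end

# Bits types: elements of a Vector of these are stored inline.
@show isbitstype(Int) isbitstype(Char) isbitstype(Code)

# Not bits types: a Vector{A} stores references to heap objects,
# and each A in turn refers to its Symbol.
@show isbitstype(A) isbitstype(Symbol)

# Same counting logic as the benchmarks in the thread, on inline data.
codes = [Code(rand(1:2)) for _ in 1:1000]
n = count(c -> c.id == 1, codes)
@show n
```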

Jakob Nybo Nissen (Aug 25 2022 at 13:14):

I'm surprised it's that slow. Since symbols are interned, isn't it just an integer comparison of the pointers?
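A short sketch of what interning means here: two symbols with the same name are the same object, however they were created, so `===` (identity) holds between them. This only illustrates the interning itself and says nothing about where the benchmark time goes.

```julia
s1 = :output
s2 = Symbol("out", "put")   # built at run time from two strings

# Interned: both names refer to the same object.
same_object = s1 === s2
@show same_object

# == on symbols falls back to ===, so equality is an identity check.
@show s1 == s2

# A different name is a different object.
@show :output === :outputs
```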

chriselrod (Aug 25 2022 at 13:15):

Jakob Nybo Nissen said:

I'm surprised it's that slow. Since symbols are interned, isn't it just an integer comparison of the pointers?

I always assumed that was the case (that it'd just be a pointer comparison, and hence equivalent to comparing Ints or @enums).
@Christopher Rackauckas, I guess this is another reason to change DiffEq retcodes, aside from :success being a fail.

Jakob Nybo Nissen (Aug 25 2022 at 13:19):

I wonder if all the time is being spent fetching out-of-cache stuff from the heap, and the symbol comparison is insignificant relative to the cache misses

Christopher Rackauckas (Aug 25 2022 at 14:01):

chriselrod said:

I guess this is another reason to change DiffEq retcodes, aside from :success being a fail.

It's mostly a correctness thing. Misspelled :succcess is too common :sad:

Jakob Nybo Nissen (Aug 25 2022 at 14:04):

A bunch of BioJulia packages, including TranscodingStreams, also use symbol literals in performance-sensitive code. Perhaps they should be replaced with an enum
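A rough sketch of what replacing symbol codes with an enum could look like. `RetCode`, its members, and `Result` are hypothetical names chosen for illustration. They are not the actual DiffEq or BioJulia API. The sketch shows the two points raised in the thread: an enum is a bits type, and a misspelled member fails loudly where a misspelled symbol is silently accepted.

```julia
# Hypothetical return codes; the default underlying type of @enum is Int32.
@enum RetCode Success Failure MaxIters

# Hypothetical immutable result type holding a code by value.
struct Result
    retcode::RetCode
end

@show isbitstype(RetCode) isbitstype(Result)

results = [Result(rand((Success, Failure))) for _ in 1:1000]
n_success = count(r -> r.retcode == Success, results)
@show n_success

# A misspelled symbol is still a valid symbol, so this is silently false.
symbol_typo_matches = (:success == :succcess)
@show symbol_typo_matches

# A misspelled enum member is an undefined name and raises an error.
typo_caught = try
    Succcess
    false
catch err
    err isa UndefVarError
end
@show typo_caught
```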

Sukera (Aug 25 2022 at 14:06):

Jakob Nybo Nissen said:

I'm surprised it's that slow. Since symbols are interned, isn't it just an integer comparison of the pointers?

it'll have trouble since the array of mutable structs containing a symbol ends up as an array of pointers to a pointer - two derefs kill cache locality, would be my guess - and that probably also kills SIMD

Last updated: Aug 14 2025 at 04:51 UTC