Should I refrain from calling @gensym in a hot loop?
Benchmarking suggests the overhead is non-negligible, which makes sense:
julia> function f1(x)
           total = 0
           for i in 1:x
               total += i
           end
           return total
       end
f1 (generic function with 1 method)
julia> @code_native f1(1000)
        .text
; ┌ @ REPL[37]:3 within `f1`
; │┌ @ range.jl:5 within `Colon`
; ││┌ @ range.jl:354 within `UnitRange`
; │││┌ @ range.jl:359 within `unitrange_last`
        testq %rdi, %rdi
; │└└└
        jle L43
; │ @ REPL[37] within `f1`
        movq %rdi, %rax
        sarq $63, %rax
        andnq %rdi, %rax, %rax
; │ @ REPL[37]:4 within `f1`
        leaq -1(%rax), %rdx
        leaq -2(%rax), %rcx
        mulxq %rcx, %rcx, %rdx
        shldq $63, %rcx, %rdx
        leaq (%rdx,%rax,2), %rax
        decq %rax
; │ @ REPL[37]:6 within `f1`
        retq
; │ @ REPL[37] within `f1`
L43:
        xorl %eax, %eax
; │ @ REPL[37]:6 within `f1`
        retq
        nop
; └
julia> function f2(x)
           total = 0
           for i in 1:x
               @gensym a b c d e f g
               total += i
           end
           return total
       end
f2 (generic function with 1 method)
julia> @code_native f2(1000)
        .text
; ┌ @ REPL[40]:3 within `f2`
; │┌ @ range.jl:5 within `Colon`
; ││┌ @ range.jl:354 within `UnitRange`
; │││┌ @ REPL[40]:1 within `unitrange_last`
        pushq %rbp
        pushq %r15
        pushq %r14
        pushq %r13
        pushq %r12
        pushq %rbx
        subq $40, %rsp
; │││└
; │││┌ @ range.jl:359 within `unitrange_last`
        testq %rdi, %rdi
; │└└└
        jle L253
; │ @ REPL[40] within `f2`
        movq %rdi, %rax
        sarq $63, %rax
        andnq %rdi, %rax, %rax
        movabsq $139821397918949, %r12  # imm = 0x7F2AB4C144E5
; │ @ REPL[40]:5 within `f2`
        negq %rax
        movq %rax, 32(%rsp)
        movl $1, %r13d
        xorl %ebx, %ebx
        leaq -178798797(%r12), %rax
        movq %rax, 24(%rsp)
        leaq -178798637(%r12), %rax
        movq %rax, 16(%rsp)
        leaq -178798477(%r12), %rax
        movq %rax, 8(%rsp)
        leaq -178798317(%r12), %rax
        movq %rax, (%rsp)
        leaq -178798157(%r12), %rbp
        leaq -178797997(%r12), %r14
        leaq -178797837(%r12), %r15
        nopl (%rax,%rax)
; │ @ REPL[40]:4 within `f2`
; │┌ @ expr.jl:12 within `gensym`
L144:
        movl $1, %esi
        movq 24(%rsp), %rdi
        callq *%r12
        movl $1, %esi
        movq 16(%rsp), %rdi
        callq *%r12
        movl $1, %esi
        movq 8(%rsp), %rdi
        callq *%r12
        movl $1, %esi
        movq (%rsp), %rdi
        callq *%r12
        movl $1, %esi
        movq %rbp, %rdi
        callq *%r12
        movl $1, %esi
        movq %r14, %rdi
        callq *%r12
        movl $1, %esi
        movq %r15, %rdi
        callq *%r12
; │└
; │ @ REPL[40]:5 within `f2`
; │┌ @ int.jl:87 within `+`
        addq %r13, %rbx
        movq 32(%rsp), %rax
; │└
; │┌ @ range.jl:837 within `iterate`
; ││┌ @ promotion.jl:468 within `==`
        addq %r13, %rax
        incq %rax
; ││└
        incq %r13
; ││┌ @ promotion.jl:468 within `==`
        cmpq $1, %rax
; │└└
        jne L144
        jmp L255
; │ @ REPL[40] within `f2`
L253:
        xorl %ebx, %ebx
; │ @ REPL[40]:7 within `f2`
L255:
        movq %rbx, %rax
        addq $40, %rsp
        popq %rbx
        popq %r12
        popq %r13
        popq %r14
        popq %r15
        popq %rbp
        retq
        nopw %cs:(%rax,%rax)
; └
julia> @benchmark f1(1000)
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  1.152 ns … 69.760 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.160 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.171 ns ±  0.712 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                                █
  ▂▁▁▆▁▁▁▄▁▁▁▄▁▁▁▃▁▁▁▂▁▁▁▂▁▁▁▃▁▁█▁▁▁▅▁▁▁▂▁▁▁▄▁▁▁█▁▁▁▆▁▁▁▇▁▁▂ ▂
  1.15 ns        Histogram: frequency by time        1.17 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.
julia> @benchmark f2(1000)
BenchmarkTools.Trial: 641 samples with 1 evaluation.
 Range (min … max):  7.365 ms … 10.314 ms   ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     7.726 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   7.795 ms ± 346.021 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

       ▁ ▁▂█▆▆▆▆▁▃▁▃▂▂
  ▅▆█▆▅█▅█████████████▆▅▃▃▃▃▂▃▁▃▃▃▃▁▄▄▃▃▃▃▃▃▂▁▁▂▁▂▂▁▁▁▂▂▁▁▂▁▃ ▄
  7.36 ms        Histogram: frequency by time         9.05 ms <

 Memory estimate: 0 bytes, allocs estimate: 0.
You've answered the question by benchmarking it both ways: the relative slowdown exceeds 5000x.
You will be happier reorganizing the code: lift the gensym out of the loop and pregenerate batches of these symbols.
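For illustration, that reorganization could look something like this (a minimal sketch with made-up names, pregen_symbols and hot_loop, not code from the thread):

# Hypothetical sketch: intern a batch of fresh symbols once, up front,
# then hand them out inside the hot loop without any further gensym calls.
function pregen_symbols(tag::AbstractString, n::Integer)
    return [gensym(tag) for _ in 1:n]
end

function hot_loop(x)
    syms = pregen_symbols("a", x)   # one interning pass, outside the loop
    total = 0
    for i in 1:x
        s = syms[i]                 # cheap array indexing instead of a ccall
        total += i
    end
    return total
end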
gensym comes from Common Lisp, where it is only supposed to be used at macro-expansion time. String-interning symbols, which is what Julia does on every gensym call, is not cheap.
I was trying to get a better sense of the absolute per-call overhead. Some more local benchmarking seemed to suggest 1-2 µs, but that's subject to noise and possibly other factors.
If it's anything like CL, there is more than string-interning involved too, since it has to keep a generation counter to ensure uniqueness.
Yup, this one specifically: https://github.com/JuliaLang/julia/blob/87ded5a9aa502cfc4e03cbf230cb9bba86c85cc1/src/symbol.c#L126
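For intuition, here is a rough Julia rendering of that mechanism: a generation counter for uniqueness plus string interning for the cost. This is an assumption based on the discussion above, not a literal port of symbol.c, and COUNTER/sketch_gensym are made-up names:

# Sketch: uniqueness comes from an atomic counter; the expensive part is
# the Symbol constructor interning the freshly built string.
const COUNTER = Threads.Atomic{Int}(0)

function sketch_gensym(tag::AbstractString = "")
    n = Threads.atomic_add!(COUNTER, 1)   # generation counter for uniqueness
    return Symbol("##", tag, "#", n)      # string interning happens here
end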
I think I can fulfill my use case using a custom macro (basically @static_gensym x, y, ...), but will have to try. Edit: it worked!
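The thread doesn't show the macro's definition, but a hypothetical reconstruction of @static_gensym could look like this:

# Do the gensym calls once, at macro-expansion time, and splice the
# resulting symbols in as literals, so nothing is left to run in the loop.
macro static_gensym(names...)
    assignments = [:($(esc(n)) = $(QuoteNode(gensym(string(n))))) for n in names]
    return Expr(:block, assignments..., nothing)
end

The trade-off is that each expansion site gets one fixed set of symbols, so every loop iteration sees the same names -- which is exactly what makes it cheap.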
Brian Chen has marked this topic as resolved.
oh boy, did not know that @gensym does a ccall
I would expect this particular benchmark to be fast, since the gensymmed variables are never used, but the ccall may have side effects the compiler can't see, so it can't be eliminated
Isn't side-effect inference for ccall a nightly-only feature?
I don't think there's any side-effect inference on ccalls
are you talking about the @assume_effects macro?
I was more thinking about a being a dead variable, since it's never used
I'm not sure the benchmarks were realistic: I think f1 was constant-propagated, no? Summing 1000 elements took 1 ns
oh yeah, 100% - it got folded to euler's formula, there isn't even a jump there anymore
to me at least the @gensym should not have made a difference for that case
did you mean to say Gauss, or am I missing a new formula? :sweat_smile:
ah, yes of course
always hard to remember which of the gazillion famous formulas is from which of the two
@Mosè Giordano the goal was just to have the function body not optimized away entirely. I wanted to use donotdelete but didn't want to switch versions :P
Ended up just testing a single iteration and explicit calls because those are kept around still
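My guess at what that measurement looks like (not the thread's exact code):

# Measure the per-call cost of gensym directly, so there is nothing
# for the compiler to elide.
using BenchmarkTools
@benchmark gensym("a")

# On Julia 1.8+, Base.donotdelete can pin otherwise-dead results instead:
# @benchmark Base.donotdelete(gensym("a"))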
Brian Chen said:
Mosè Giordano the goal was just to have the function body not optimized away entirely. I wanted to use donotdelete but didn't want to switch versions :P
on nightly I get
julia> @benchmark f1(1000)
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  2.460 ns … 20.945 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     2.472 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.511 ns ±  0.566 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▄ ▃  ▃                                                    ▁
  ██▃▃▁█▁▃▁▁█▇▄▁▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▇ █
  2.46 ns      Histogram: log(frequency) by time      3.26 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.
which would also suggest it isn't doing much in practice. these damned compilers eliminating everything...
Mosè Giordano said:
I'm not sure the benchmarks were realistic, I think f1 was constant propagated, no? Summing 1000 elements took 1 ns
LLVM will turn that sum into binomial(x+1,2).
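For the record, that's the triangular-number identity behind the fold: the sum of 1:n is n(n+1)/2, i.e. binomial(n+1, 2), which is why f1 runs in constant time regardless of x:

julia> n = 1000;

julia> sum(1:n) == n * (n + 1) ÷ 2 == binomial(n + 1, 2)
true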