Stream: helpdesk (published)

Topic: PkgBenchmark CI


Sukera (Nov 15 2021 at 17:26):

Anyone familiar with running benchmarks for a package in CI? Possibly using PkgBenchmark, open to custom solutions

Sukera (Nov 15 2021 at 17:28):

alternatively, in which environment am I supposed to install PkgBenchmark? I don't really want to make it a dependency for my package and giving it its own (sub-)environment seems to break things.

Sukera (Nov 15 2021 at 17:28):

also, does it play well with unregistered dependencies?

Mosè Giordano (Nov 16 2021 at 02:34):

You should have it in a separate environment: https://github.com/JuliaPhysics/Measurements.jl/tree/ba56443c23bed16fc6e4e8e87c3b12c835f00edb/benchmark. It also works for me with unregistered packages.

My only problem is that the results are often totally unreliable; I tend to blame the GitHub Actions machines, but the performance reports I get are usually garbage.
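
(For reference, the linked layout follows the usual PkgBenchmark convention: a benchmark/ folder with its own Project.toml listing BenchmarkTools and any benchmark-only dependencies, plus a benchmarks.jl defining a top-level SUITE. A minimal sketch with a hypothetical package name, not the actual Measurements.jl file:)

# benchmark/benchmarks.jl -- PkgBenchmark expects this file to define `SUITE`
using BenchmarkTools
using MyPackage   # hypothetical package under test

const SUITE = BenchmarkGroup()
SUITE["solve"] = @benchmarkable MyPackage.solve(x) setup=(x = rand(100))

# Run from any environment that has PkgBenchmark installed, e.g.:
#   using PkgBenchmark
#   results = benchmarkpkg("MyPackage")
#   export_markdown("results.md", results)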

Andrey Oskin (Nov 16 2021 at 04:44):

That's correct, you need a separate environment, and maybe tweak your yml configs for it to run properly. The results are mostly garbage, that's also true :-)

Here is our setup, which uses BenchmarkCI.jl

https://github.com/PyDataBlog/ParallelKMeans.jl/tree/master/benchmark
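
(For context, the BenchmarkCI.jl side of a setup like this usually boils down to a couple of one-liners in the workflow yml; a rough sketch of the typical calls, not the exact ParallelKMeans.jl config:)

# Usually invoked from the GitHub Actions workflow as one-liners, e.g.
#   julia -e 'using BenchmarkCI; BenchmarkCI.judge()'
using BenchmarkCI
BenchmarkCI.judge()             # run benchmark/benchmarks.jl for the PR branch and the baseline
BenchmarkCI.displayjudgement()  # print the comparison into the CI log
# BenchmarkCI.postjudge()       # optionally post the judgement as a PR comment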

Sukera (Nov 16 2021 at 07:59):

If CI is not reliable due to machine differences, that's fine by me - keeping PkgBenchmark/BenchmarkCI out of the main dependencies was my main concern. Thank you both!

Sukera (Nov 16 2021 at 08:05):

so presumably, if I run the benchmarks on a fixed machine I control, they should be more reliable

Takafumi Arakaki (tkf) (Nov 16 2021 at 22:23):

FWIW, I've now shifted to only running smoke tests for benchmarks in newer projects.

I reduce the number of evaluations as much as possible for the "smoke run":
https://github.com/JuliaConcurrent/ConcurrentCollections.jl/blob/7c3d94b24a36506e2d84161648b95ecf116312bb/benchmark/ConcurrentCollectionsBenchmarks/src/ConcurrentCollectionsBenchmarks.jl#L23-L43

...and then run it via the test suite:
https://github.com/JuliaConcurrent/ConcurrentCollections.jl/blob/master/test/ConcurrentCollectionsTests/src/test_bench_smoke.jl#L6-L14
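
(The gist of that approach, as a sketch using BenchmarkTools directly rather than the helper code linked above; file and package names are hypothetical:)

# test/benchmark_smoke.jl -- shrink every benchmark so the whole suite
# finishes in seconds, then only check that everything runs.
using Test, BenchmarkTools
include(joinpath(@__DIR__, "..", "benchmark", "benchmarks.jl"))  # defines SUITE

foreach(BenchmarkTools.leaves(SUITE)) do (_, b)
    b.params.samples = 1
    b.params.evals = 1
    b.params.seconds = 0.001
end

@testset "benchmark smoke test" begin
    @test run(SUITE; verbose = false) isa BenchmarkGroup
end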

Sukera (Nov 16 2021 at 22:25):

yeah I expect my benchmarks to be fairly long-running (think ms to s range per evaluation)

Sukera (Nov 16 2021 at 22:26):

if it turns out to be super unstable, I don't mind running them on a machine I control

Sukera (Nov 16 2021 at 22:26):

thanks for that idea though, it may be helpful!

Takafumi Arakaki (tkf) (Nov 16 2021 at 22:26):

I want to make benchmarks more reliable on CI, though. I've been wanting to look into Cachegrind for a reliable benchmark CI: https://pythonspeed.com/articles/consistent-benchmarking-in-ci/ I wonder if anyone has looked into it?

Sukera (Nov 16 2021 at 22:31):

I didn't even think of cachegrind, that's a good idea

Sukera (Nov 16 2021 at 22:32):

probably not suited to julia as-is though, as valgrind requires some setup iirc

Sukera (Nov 16 2021 at 22:32):

https://docs.julialang.org/en/v1/devdocs/valgrind/

Sukera (Nov 16 2021 at 22:32):

however, we may be able to use llvm-mca for a similar effect

Sukera (Nov 16 2021 at 22:33):

it doesn't know about caches though :/

Sukera (Nov 16 2021 at 22:34):

another possible problem may be variance in what the CI machine actually looks like, CPU- and cache-wise

Sukera (Nov 16 2021 at 22:39):

[sukera@tempman ~]$ valgrind --smc-check=all-non-file --tool=cachegrind julia -e 'for _ in 1:10 println(stdout, "hello") end'
==337656== Cachegrind, a cache and branch-prediction profiler
==337656== Copyright (C) 2002-2017, and GNU GPL'd, by Nicholas Nethercote et al.
==337656== Using Valgrind-3.17.0 and LibVEX; rerun with -h for copyright info
==337656== Command: julia -e for\ _\ in\ 1:10\ println(stdout,\ "hello")\ end
==337656==
--337656-- warning: L3 cache found, using its data for the LL simulation.
ERROR: Unable to find compatible target in system image.
==337656==
==337656== I   refs:      102,539,492
==337656== I1  misses:         18,030
==337656== LLi misses:         14,960
==337656== I1  miss rate:        0.02%
==337656== LLi miss rate:        0.01%
==337656==
==337656== D   refs:       32,186,907  (23,929,774 rd   + 8,257,133 wr)
==337656== D1  misses:        780,973  (   709,078 rd   +    71,895 wr)
==337656== LLd misses:        190,592  (   131,846 rd   +    58,746 wr)
==337656== D1  miss rate:         2.4% (       3.0%     +       0.9%  )
==337656== LLd miss rate:         0.6% (       0.6%     +       0.7%  )
==337656==
==337656== LL refs:           799,003  (   727,108 rd   +    71,895 wr)
==337656== LL misses:         205,552  (   146,806 rd   +    58,746 wr)
==337656== LL miss rate:          0.2% (       0.1%     +       0.7%  )

haha nope, I don't think julia likes the virtual CPU cachegrind presents :laughter_tears:

Takafumi Arakaki (tkf) (Nov 16 2021 at 23:40):

Yeah, CPU detection in Julia feels like a big maze to me.

Takafumi Arakaki (tkf) (Nov 16 2021 at 23:40):

I wonder if you can use julia --cpu-target=... to get around this though?

Sukera (Nov 16 2021 at 23:41):

what to use as --cpu-target though?

Sukera (Nov 16 2021 at 23:42):

hm, simply setting it to skylake (my CPU) didn't work, same result

Takafumi Arakaki (tkf) (Nov 16 2021 at 23:47):

Maybe haswell? I guess we'd want to specify x86-64-v3, but it looks like that's not in LLVM's list.

Sukera (Nov 16 2021 at 23:47):

well I checked with --cpu-target=help first to see if skylake was valid

