Is anyone familiar with running benchmarks for a package in CI? Possibly using PkgBenchmark, but I'm open to custom solutions.
alternatively, in which environment am I supposed to install PkgBenchmark? I don't really want to make it a dependency for my package and giving it its own (sub-)environment seems to break things.
also, does it play well with unregistered dependencies?
You should have it in a separate environment: https://github.com/JuliaPhysics/Measurements.jl/tree/ba56443c23bed16fc6e4e8e87c3b12c835f00edb/benchmark. It works for me with unregistered packages too.
My only problem is that the results are often totally unreliable. I tend to blame the GitHub Actions machines, but the performance reports I get are usually garbage.
That's correct, you need a separate environment, and maybe need to tweak your yml configs, for it to run properly. Results are mostly garbage, that's also true :-)
Here is our setup, which uses BenchmarkCI.jl
https://github.com/PyDataBlog/ParallelKMeans.jl/tree/master/benchmark
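A minimal sketch of the separate-environment approach discussed above, using only the Pkg standard library. The helper name `setup_benchmark_env` and the assumption that `benchmark/` has its own `Project.toml` are illustrative, not taken from either linked repository.

```julia
using Pkg

# Hypothetical helper: prepare the benchmark/ environment of the package
# located at `pkgroot`, so benchmarking tools never enter the package's
# own Project.toml.
function setup_benchmark_env(pkgroot::AbstractString)
    benchdir = joinpath(pkgroot, "benchmark")
    isfile(joinpath(benchdir, "Project.toml")) ||
        error("expected a Project.toml in $benchdir")
    Pkg.activate(benchdir)         # switch to the benchmark environment
    Pkg.develop(path = pkgroot)    # use the local checkout of the package
    Pkg.instantiate()              # install what the environment lists
    return benchdir
end
```

In CI this would presumably be called as `setup_benchmark_env(pwd())` from the repository root before loading the benchmark script. Unregistered dependencies would likely have to be recorded in that environment as well, for example through a checked-in `Manifest.toml`.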
If CI is not reliable due to machine differences, that's fine for me - having a way to not have PkgBenchmark/BenchmarkCI in the main dependencies is the main concern, thank you both!
so presumably, if I run the benchmarks on my/a custom fixed machine, the benchmarks should be more reliable
FWIW, I've now shifted to only running smoke tests for benchmarks in newer projects.
I reduce the number of evaluations as much as possible for the "smoke run"
https://github.com/JuliaConcurrent/ConcurrentCollections.jl/blob/7c3d94b24a36506e2d84161648b95ecf116312bb/benchmark/ConcurrentCollectionsBenchmarks/src/ConcurrentCollectionsBenchmarks.jl#L23-L43
...and then run it via test suite
https://github.com/JuliaConcurrent/ConcurrentCollections.jl/blob/master/test/ConcurrentCollectionsTests/src/test_bench_smoke.jl#L6-L14
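A dependency-free sketch of the smoke-run idea: each benchmark body is executed exactly once from the test suite, only to check that it still runs, not to measure anything. This is not the linked code; the names `BENCHMARKS` and `smoke_run` and the two example workloads are hypothetical.

```julia
using Test

# Hypothetical benchmark table: name => zero-argument function to exercise.
const BENCHMARKS = Dict(
    "sum"  => () -> sum(rand(1_000)),
    "sort" => () -> sort(rand(1_000)),
)

# Run every benchmark once and record the wall time in seconds.
# The timings are noisy by design; only successful completion matters.
function smoke_run(benchmarks)
    return Dict(name => @elapsed(f()) for (name, f) in benchmarks)
end

@testset "benchmark smoke test" begin
    timings = smoke_run(BENCHMARKS)
    @test keys(timings) == keys(BENCHMARKS)
    @test all(t -> t >= 0, values(timings))
end
```

With BenchmarkTools, the analogous step would be to lower parameters such as `samples` and `evals` before running the suite, which is presumably what "reduce the number of evaluations" refers to.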
yeah I expect my benchmarks to be fairly long running (think ms-s range per evaluation)
if it turns out to be super unstable, I don't mind running them on a machine I control
thanks for that idea though, may be helpful!
I want to make benchmarks more reliable on CI, though. I've been wanting to look into Cachegrind for a reliable benchmark CI: https://pythonspeed.com/articles/consistent-benchmarking-in-ci/ I wonder if anyone has looked into it?
I didn't even think of cachegrind, that's a good idea
probably not suited to julia as-is though, as valgrind requires some setup iirc
https://docs.julialang.org/en/v1/devdocs/valgrind/
however, we may be able to use llvm-mca for a similar effect
it doesn't know about caches though :/
another possible problem may be variance in what the CI machine actually looks like CPU- and cache-wise
[sukera@tempman ~]$ valgrind --smc-check=all-non-file --tool=cachegrind julia -e 'for _ in 1:10 println(stdout, "hello") end'
==337656== Cachegrind, a cache and branch-prediction profiler
==337656== Copyright (C) 2002-2017, and GNU GPL'd, by Nicholas Nethercote et al.
==337656== Using Valgrind-3.17.0 and LibVEX; rerun with -h for copyright info
==337656== Command: julia -e for\ _\ in\ 1:10\ println(stdout,\ "hello")\ end
==337656==
--337656-- warning: L3 cache found, using its data for the LL simulation.
ERROR: Unable to find compatible target in system image.
==337656==
==337656== I refs: 102,539,492
==337656== I1 misses: 18,030
==337656== LLi misses: 14,960
==337656== I1 miss rate: 0.02%
==337656== LLi miss rate: 0.01%
==337656==
==337656== D refs: 32,186,907 (23,929,774 rd + 8,257,133 wr)
==337656== D1 misses: 780,973 ( 709,078 rd + 71,895 wr)
==337656== LLd misses: 190,592 ( 131,846 rd + 58,746 wr)
==337656== D1 miss rate: 2.4% ( 3.0% + 0.9% )
==337656== LLd miss rate: 0.6% ( 0.6% + 0.7% )
==337656==
==337656== LL refs: 799,003 ( 727,108 rd + 71,895 wr)
==337656== LL misses: 205,552 ( 146,806 rd + 58,746 wr)
==337656== LL miss rate: 0.2% ( 0.1% + 0.7% )
haha nope, I don't think julia likes the virtual CPU Cachegrind presents :laughter_tears:
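Even though Julia failed to start in that run, the counters Cachegrind printed show how its output could be folded into a single number for CI comparisons. The sketch below parses four summary lines copied from the log above and weights cache levels by 1, 5, and 35. Those weights are an assumption, in the spirit of the weighting the linked article describes, and the function names are hypothetical.

```julia
# Summary lines copied from the Cachegrind output above (PID prefix dropped).
const SUMMARY = """
I refs: 102,539,492
D refs: 32,186,907 (23,929,774 rd + 8,257,133 wr)
LL refs: 799,003 ( 727,108 rd + 71,895 wr)
LL misses: 205,552 ( 146,806 rd + 58,746 wr)
"""

# Extract the first number after `label:` and strip the thousands separators.
function counter(text, label)
    m = match(Regex("^" * label * ":\\s*([\\d,]+)", "m"), text)
    m === nothing && error("no counter named $label")
    return parse(Int, replace(m.captures[1], "," => ""))
end

# Combine the counters into one cost figure.
# Assumed weights: L1 hit = 1, last-level hit = 5, RAM access = 35.
function combined_cost(text)
    accesses  = counter(text, "I refs") + counter(text, "D refs")
    l1_misses = counter(text, "LL refs")    # accesses that missed L1
    ram_hits  = counter(text, "LL misses")  # accesses that missed the last-level cache
    l1_hits   = accesses - l1_misses
    ll_hits   = l1_misses - ram_hits
    return l1_hits + 5 * ll_hits + 35 * ram_hits
end

println(combined_cost(SUMMARY))
```

The figure here only describes the failed startup, so it is useful as a demonstration of the arithmetic rather than as a benchmark result.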
Yeah, CPU detection in Julia feels like a big maze to me.
I wonder if you can use julia --cpu-target=... to get around this though?
what to use as --cpu-target though?
hm, simply setting it to skylake (my CPU) didn't work, same result
Maybe haswell? I guess we'd want to specify x86-64-v3, but it looks like it's not in LLVM's list.
well I checked with --cpu-target=help first to see if skylake was valid
Last updated: Dec 28 2024 at 04:38 UTC