I have been using Julia for the past 5 years, and I keep running into problems where my codebase suddenly takes an extremely long time to start up. I have started multiple projects where at one point or another this issue cropped up, and even had to abandon one - rewriting it in another language.
How is one supposed to debug this? What am I doing wrong? Which heuristics are getting derailed? If it is type instability, could the default behavior be to just give up after failing to infer for too long?
For a silly example, this is a work-in-progress package: https://github.com/quantumghent/PEPSKit.jl/tree/djeezus (you really need that branch)
using PEPSKit, TensorKit
peps = InfinitePEPS(2, 3);
leading_boundary(peps, CTMRG(truncdim(5), 1e-12))
takes an absolutely unusable amount of time to run
Do you know about https://github.com/timholy/SnoopCompile.jl ?
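(For reference, a minimal sketch of how such an inference profile could be collected with SnoopCompile on the example above, assuming the package and the djeezus branch are installed; this is not the exact invocation used later in the thread.)

using SnoopCompileCore
using PEPSKit, TensorKit            # load the packages first, so only the workload's inference is timed
tinf = @snoopi_deep begin           # records the full inference-time tree for this block
    peps = InfinitePEPS(2, 3)
    leading_boundary(peps, CTMRG(truncdim(5), 1e-12))
end
using SnoopCompile                  # load the analysis half only after measuring
flatten(tinf)                       # per-frame inference timings, most expensive last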
Also seems like that branch can't even be added by itself:
(jl_vbpKOh) pkg> add https://github.com/quantumghent/PEPSKit.jl#djeezus
Updating git-repo `https://github.com/quantumghent/PEPSKit.jl`
Resolving package versions...
ERROR: Unsatisfiable requirements detected for package MPSKit [bb1c41ca]:
MPSKit [bb1c41ca] log:
├─possible versions are: 0.1.0-0.7.0 or uninstalled
├─restricted to versions 0.7 by PEPSKit [52969e89], leaving only versions 0.7.0
│ └─PEPSKit [52969e89] log:
│ ├─possible versions are: 0.1.0 or uninstalled
│ └─PEPSKit [52969e89] is fixed to version 0.1.0
└─restricted by julia compatibility requirements to versions: 0.1.0-0.4.0 or uninstalled — no versions left
Is there a circular dependency between your packages?
ah, my julia version is too new
I have tried SnoopCompile in the past, but nothing obvious showed up. For example, I did not see particularly many method invalidations. That said, the package has apparently changed a lot, so I will try it again.
I can change the package (or remove the MPSKit dependency) if you want to get it to run
I was just trying to see where the time of your example was actually spent
(I'm now running snoopi_deep on it)
Is it package loading or runtime? If it's package loading, SnoopCompile can catch that. If it's runtime, you'll have to optimize your algorithms. If it's compile time, you can try to use precompile statements to reduce latency after package loading (though SnoopCompile can help find the right invocations for that)
so the first thing I usually do when trying to optimize performance is figuring out _where_ the time is spent in the first place - I usually start with runtime, since that's where people most often write suboptimal code
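(A rough way to separate those three buckets for the example above - a sketch, not the measurement actually used in this thread:)

@time using PEPSKit, TensorKit       # package loading time

peps = InfinitePEPS(2, 3)
alg = CTMRG(truncdim(5), 1e-12)
@time leading_boundary(peps, alg)    # first call: compile time + runtime (Julia >= 1.8 also reports "% compilation time")
@time leading_boundary(peps, alg)    # second call: essentially pure runtime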
Loading is near instantaneous, runtime is always good, compile time takes very long. In the past it has often been a single type instability (for example due to captured variables in closures) which takes the compile time from 30 seconds to 10 minutes.
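(As a generic toy illustration of the captured-variables problem mentioned above - not code from PEPSKit:)

# `x` is reassigned after the closure is created, so Julia boxes it and the
# captured value is inferred as `Any`, which can blow up inference time downstream
function captured_unstable()
    x = 1
    f = () -> x + 1      # `x` ends up in a Core.Box
    x = 2.0
    return f()
end

# typical fix: rebind with `let` so the captured variable is never reassigned
function captured_stable()
    x = 1
    f = let x = x
        () -> x + 1      # captures a fixed binding, inference stays concrete
    end
    x = 2.0
    return f()
end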
Where does leading_boundary come from? I can't seem to find it in either PEPSKit or TensorKit
have you run JET.jl on your code, to find those type instabilities more easily?
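(For the record, a minimal sketch of what such a JET check could look like on the example above, assuming JET.jl is installed:)

using JET, PEPSKit, TensorKit

peps = InfinitePEPS(2, 3)
# reports runtime-dispatch sites and failed inference along this call path
@report_opt leading_boundary(peps, CTMRG(truncdim(5), 1e-12))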
Sukera said:
Where does leading_boundary come from? I can't seem to find it in either PEPSKit or TensorKit
It's defined in MPSKit, extended in PEPSKit for the particular case that is being called (in src/algorithms)
Sukera said:
have you run JET.jl on your code, to find those type instabilities more easily?
I haven't tried JET on this code yet. I tried it in the past, and then Julia segfaulted.
SnoopCompile results are in: InferenceTimingNode: 0.304494/237.488271 on Core.Compiler.Timings.ROOT() with 4 direct children
yep, sounds like A LOT of specialization is happening for what you're doing
you may want to look into what exactly is eating up all that time
but precompile statements will certainly help. it'll mostly shift the time to package precompilation time, but people usually feel better about it being spent there :shrug:
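(As a generic sketch of what such a precompile statement looks like inside a package - illustrative types, not PEPSKit's actual signatures:)

module MyToyPkg

struct State
    data::Matrix{Float64}
end

expensive_step(s::State) = sum(abs2, s.data)

# compile this specialization at package-precompilation time instead of on first call
precompile(expensive_step, (State,))

end # module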
in your specific case, introducing some function barriers may just help enough already
not everything in your big left_move function, for example, relies on PType, right?
like the inner hot loops for example
the example in the docs isn't a perfect fit for your code, but I think https://docs.julialang.org/en/v1/manual/performance-tips/#kernel-functions may apply
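(A generic sketch of that kernel-function / function-barrier pattern - the names here are made up, not PEPSKit's actual left_move code:)

# the outer function can stay loosely typed and clunky; keep the hot work out of it
function outer_move(env, tensors)
    for t in tensors
        env = absorb(env, t)        # function barrier: one dynamic dispatch per iteration
    end
    return env
end

# the kernel sees `t` with a concrete type, so its body is cheap to infer and compile
absorb(env::AbstractMatrix, t::AbstractMatrix) = env .+ t * transpose(t)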
I still think there's an issue here. The code I showed does not run - it's work in progress - and takes forever to start running. Yet if I now simply complete the algorithm (implement a few extra methods, without changing the existing code) to make it run successfully, then the actual startup cost goes from 200+ seconds to like 30 seconds.
I will try to further reduce startup time with function barriers (because left_move is indeed rather clunky), but I will also try to construct a minimal example to illustrate this long-startup-time behavior.
very interesting
when you say you had to implement extra methods to make it work, what were those methods for?
is it possible you hit some fallback from one of your dependencies, which could have resulted in undue specialization?
For finding compilation time, you can also use the normal profiler with StatProfilerHTML. That works great too and is easier to use, IMO.
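(A minimal sketch of that approach, assuming StatProfilerHTML is installed; the resulting flame graph then shows how much of the first call is spent inside the compiler:)

using StatProfilerHTML
using PEPSKit, TensorKit

peps = InfinitePEPS(2, 3)
# profile the first (compiling) call and write an HTML flame-graph report
@profilehtml leading_boundary(peps, CTMRG(truncdim(5), 1e-12))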
Adding precompile directives can help, but not for overspecialization like seems to be the case here? Then you would need to run the precompile directives in a loop.
Maarten, make sure you're running the newest Julia version possible. The compiler has gotten a lot better, especially in Julia 1.8.