Stream: helpdesk (published)

Topic: Zygote.jl gradient returning `nothing`


view this post on Zulip Júlio Hoffimann (Nov 08 2024 at 12:11):

Is this a bug?

julia> using Zygote

julia> f(x, y) = y
f (generic function with 1 method)

julia> gradient(f, 0, 0)
(nothing, 1.0)

Shouldn't the result be (0.0, 1.0) instead?

view this post on Zulip Júlio Hoffimann (Nov 08 2024 at 12:17):

Reported here: https://github.com/FluxML/Zygote.jl/issues/1538

view this post on Zulip Mason Protter (Nov 08 2024 at 12:32):

Zygote uses nothing as a "hard" zero

view this post on Zulip Mason Protter (Nov 08 2024 at 12:33):

i.e. a differential that's known at compile time to be zero is represented as nothing.

view this post on Zulip Júlio Hoffimann (Nov 08 2024 at 12:35):

That is somewhat unexpected, mathematically speaking, and makes it difficult to write generic code. Is there a good practice for handling nothing in this context?

view this post on Zulip Mason Protter (Nov 08 2024 at 12:54):

I guess you could do something like

denothing(x) = x
denothing(::Nothing) = false

my_gradient(args...; kwargs...) = denothing.(gradient(args...; kwargs...))

view this post on Zulip Mason Protter (Nov 08 2024 at 12:56):

julia> sum(my_gradient(f, 0, 0))
1.0

view this post on Zulip Júlio Hoffimann (Nov 08 2024 at 13:01):

I wonder why this is not done automatically for end-users

view this post on Zulip Júlio Hoffimann (Nov 08 2024 at 13:02):

Shouldn't it be fixed in Zygote.jl?

view this post on Zulip Michael Abbott (Nov 08 2024 at 15:25):

One reason for a special flag is that Zygote can avoid some work in the backward pass, as the gradient of any operations done before f is certain to be zero. Whereas with a runtime 0.0 it can't tell & must do the work.

view this post on Zulip Michael Abbott (Nov 08 2024 at 15:26):

The other is that for larger things like x::Array, allocating zero(x) is expensive.

view this post on Zulip Michael Abbott (Nov 08 2024 at 15:28):

FWIW, Enzyme's design is:

julia> Enzyme.gradient(Reverse, (x,y) -> sum(abs2, x .* y), [1, 2.], [3 4 5.])
([100.0, 200.0], [30.0 40.0 50.0])

julia> Enzyme.gradient(Reverse, (x,y) -> sum(abs2, x .* x), [1, 2.], [3 4 5.])
([4.0, 32.0], [0.0 0.0 0.0])

julia> Enzyme.gradient(Reverse, (x,y) -> sum(abs2, x .* y), [1, 2.], Const([3 4 5.]))
([100.0, 200.0], nothing)

view this post on Zulip Mason Protter (Nov 08 2024 at 15:29):

It could have had a design like ChainRulesCore.ZeroTangent(), since that at least supports math ops

view this post on Zulip Mason Protter (Nov 08 2024 at 15:29):

but yeah, it's mostly just historical reasons and a ton of work to overhaul it

view this post on Zulip Mason Protter (Nov 08 2024 at 15:40):

FWIW, Enzyme's design is:

Enzyme's gradient is pretty unlike Zygote's gradient.

view this post on Zulip Mason Protter (Nov 08 2024 at 15:41):

I'd say the equivalent in Enzyme is instead

julia> let x = Ref(0.0), y = Ref(0.0)
           dx, dy = make_zero(x), make_zero(y)
           autodiff(Reverse, Duplicated(x, dx), Duplicated(y, dy)) do x, y
               f(x[], y[])
           end
           dx[], dy[]
       end
(0.0, 1.0)

view this post on Zulip Júlio Hoffimann (Nov 08 2024 at 15:45):

I'm considering moving to Enzyme.jl because of this design of Zygote.jl. It is pretty counterintuitive for a mathematical gradient to have those entries.

view this post on Zulip Júlio Hoffimann (Nov 08 2024 at 15:46):

Does Enzyme.jl support all platforms that Julia supports?

view this post on Zulip Júlio Hoffimann (Nov 08 2024 at 15:46):

I understand it is a wrapper package

view this post on Zulip Júlio Hoffimann (Nov 08 2024 at 15:47):

And another question: is Zygote.jl the recommended package for autodiff in native Julia, or is there something new?

view this post on Zulip Nils (Nov 08 2024 at 16:13):

https://discourse.julialang.org/t/state-of-ad-in-2024/112601

view this post on Zulip Michael Abbott (Nov 08 2024 at 16:19):

Can you say what problem nothing causes, more narrowly than just being surprising?

view this post on Zulip Mason Protter (Nov 08 2024 at 16:21):

I think he wants to be able to do math with the result of gradient.

view this post on Zulip Michael Abbott (Nov 08 2024 at 16:22):

For instance, I agree that the fact that x + dx won't always work is a bit sad. (I think + needs to be replaced with Zygote.accum which knows about nothing.) ChainRules.jl took making this work as an axiom, and the result was massive complexity of Tangent which has all kinds of sharp edges. (Not to mention several kinds of zeros which nobody knows how to use correctly, and resulting type-instabilities.) So there are trade-offs, and nothing (plus NamedTuple for any struct) has the advantage of being very simple.
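
For reference, Zygote.accum is that nothing-aware addition. A quick sketch of its behavior:

```julia
using Zygote

Zygote.accum(1.5, nothing)      # 1.5  (nothing acts as a zero)
Zygote.accum(nothing, nothing)  # nothing
Zygote.accum(1.0, 2.0)          # 3.0  (falls back to +)
```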

view this post on Zulip Júlio Hoffimann (Nov 08 2024 at 16:45):

Michael Abbott said:

Can you say what problem nothing causes, more narrowly than just being surprising?

We are simply doing Newton-Raphson iteration with automatic gradients. The problem with this nothing design is that it relies on all third-party packages handling it. Even if we work around the situation in our own package, the solution doesn't compose well.

view this post on Zulip Mason Protter (Nov 08 2024 at 16:47):

Wouldn't Enzyme be a much better fit for stuff like Newton-Raphson because it supports mutation?

view this post on Zulip Júlio Hoffimann (Nov 08 2024 at 16:47):

We are doing Newton-Raphson with 2 scalar values. There are no allocations.

view this post on Zulip Mason Protter (Nov 08 2024 at 16:49):

Ah. In that case, maybe just use ForwardDiff?

view this post on Zulip Júlio Hoffimann (Nov 08 2024 at 16:52):

Will take a look. I am assuming that ForwardDiff.jl provides autodiff like Zygote.jl but without the nothing.
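
For reference, ForwardDiff differentiates a function of a single vector argument, so wrapping f and splatting works, and the constant slot comes back as a plain 0.0 rather than nothing. A minimal sketch, using the f defined at the top of the thread:

```julia
using ForwardDiff

f(x, y) = y

# The slot for x, on which f does not depend, is an ordinary 0.0:
ForwardDiff.gradient(v -> f(v...), [0.0, 0.0])   # [0.0, 1.0]
```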

view this post on Zulip Mason Protter (Nov 08 2024 at 16:57):

You really only want to reach for reverse mode AD like Zygote if you need the derivatives of functions from N dimensions to M dimensions where N >> M

view this post on Zulip Júlio Hoffimann (Nov 08 2024 at 16:59):

And in terms of maturity, ForwardDiff.jl is mature, actively maintained, etc?

view this post on Zulip Mason Protter (Nov 08 2024 at 17:00):

The classic use-case for reverse-mode is deep learning where N might be in the many thousands and M = 1

view this post on Zulip Mason Protter (Nov 08 2024 at 17:00):

ForwardDiff is very mature.

view this post on Zulip Mason Protter (Nov 08 2024 at 17:01):

I'd say it's actively maintained, but I wouldn't say it's actively developed (on account of said maturity)

view this post on Zulip Júlio Hoffimann (Nov 08 2024 at 17:01):

I already like that it has a much smaller list of dependencies compared to Zygote.jl

view this post on Zulip Mason Protter (Nov 08 2024 at 17:02):

forward mode AD is just fundamentally much much much simpler than reverse mode
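
To illustrate why forward mode is so much simpler: it only has to push one extra number (a tangent) through the program alongside each value. A toy hand-rolled dual number, for illustration only (ForwardDiff's Dual is far more general):

```julia
struct Dual
    val::Float64   # primal value
    der::Float64   # tangent carried alongside
end

# Each primitive op propagates the tangent by its local derivative rule.
Base.:+(a::Dual, b::Dual) = Dual(a.val + b.val, a.der + b.der)
Base.:*(a::Dual, b::Dual) = Dual(a.val * b.val, a.der * b.val + a.val * b.der)

g(x) = x * x + x
g(Dual(3.0, 1.0))   # Dual(12.0, 7.0): g(3) = 12, g'(3) = 7
```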

view this post on Zulip Mason Protter (Nov 08 2024 at 17:03):

If you feel like trying out something bleeding-edge instead, Diffractor.jl actually has a pretty well-working forward mode nowadays (probably don't actually do this)

view this post on Zulip Expanding Man (Nov 08 2024 at 17:06):

As a general rule, you should avoid reverse mode like the plague unless you are absolutely sure you need it.

view this post on Zulip Júlio Hoffimann (Nov 08 2024 at 17:13):

Thank you. That is very helpful.

view this post on Zulip Expanding Man (Nov 08 2024 at 17:19):

Also, since it hasn't been mentioned yet, I highly recommend using DifferentiationInterface.jl which makes it trivial to swap out AD back-ends and has no performance penalty in simple cases.

view this post on Zulip Mason Protter (Nov 08 2024 at 17:23):

It'd be nice if DI turned the nothings into some sort of zero <:Number.

view this post on Zulip Júlio Hoffimann (Nov 08 2024 at 17:24):

I think in this case we will go ahead with the ForwardDiff.jl package directly. There are no plans to swap the backend given that it is ideal for the application at hand.

view this post on Zulip Júlio Hoffimann (Nov 08 2024 at 18:15):

For some reason ForwardDiff.jl is generating slower code compared to Zygote.jl.

Can you try to reproduce this benchmark on the main branch (Zygote) and on the forwarddiff branch?

https://github.com/JuliaEarth/CoordRefSystems.jl/tree/main/benchmark

Do you also see a massive slowdown in the last line of the output.csv? The last column has the speedup metric.

view this post on Zulip Júlio Hoffimann (Nov 08 2024 at 18:15):

For me the Zygote.jl result is 0.28 and the ForwardDiff.jl result is 0.06 (larger is better).

view this post on Zulip Christopher Rackauckas (Nov 08 2024 at 18:18):

We are simply doing Newton-Raphson iteration with automatic gradients.

If it's scalar, then you don't want to be diffing through it anyways. BracketingNonlinearSolve or SimpleNonlinearSolve with Zygote/ForwardDiff overloads would just skip the implicit part.

But I would almost guarantee for scalar that ForwardDiff will be faster here.

view this post on Zulip Christopher Rackauckas (Nov 08 2024 at 18:22):

With forward mode you want to essentially always do this trick: https://github.com/SciML/NonlinearSolve.jl/blob/master/lib/NonlinearSolveBase/ext/NonlinearSolveBaseForwardDiffExt.jl

view this post on Zulip Júlio Hoffimann (Nov 08 2024 at 18:33):

Thank you @Christopher Rackauckas . Can you please elaborate on that?

The PR that replaces Zygote.jl by ForwardDiff.jl has a small diff that you can read here: https://github.com/JuliaEarth/CoordRefSystems.jl/pull/199/files

view this post on Zulip Júlio Hoffimann (Nov 08 2024 at 18:33):

What do we need to do differently to get the expected superior performance of forward diff?

view this post on Zulip Christopher Rackauckas (Nov 08 2024 at 18:34):

How are you solving the nonlinear system?

view this post on Zulip Júlio Hoffimann (Nov 08 2024 at 18:36):

The diff has the formulas. Basically, given two functions x = f_x(λ, ϕ) and y = f_y(λ, ϕ) and values x⋆ and y⋆, we perform Newton iteration to find λ⋆ and ϕ⋆.

view this post on Zulip Júlio Hoffimann (Nov 08 2024 at 18:37):

These two formulas are decoupled in the diff above, as you can see.

view this post on Zulip Christopher Rackauckas (Nov 08 2024 at 18:37):

Yeah so if it's using SimpleNonlinearSolve it should automatically apply the implicit rule

view this post on Zulip Christopher Rackauckas (Nov 08 2024 at 18:37):

If you did it by hand then you'll need to copy that code / do a similar implicit function push through on the duals

view this post on Zulip Christopher Rackauckas (Nov 08 2024 at 18:38):

For scalar problems it's almost equivalent to not differentiating the first n steps of the Newton method, re-applying the duals, and then applying it on the (n+1)th step.

view this post on Zulip Júlio Hoffimann (Nov 08 2024 at 18:38):

I am not sure I am following. As an end-user of ForwardDiff.jl it is not clear what I am doing wrong.

view this post on Zulip Christopher Rackauckas (Nov 08 2024 at 18:39):

Optimally handling implicit equations is not something automatic differentiation as a tool can do on its own. It requires that the solver library that you're using for the implicit system overloads the AD to avoid differentiation through the method

view this post on Zulip Júlio Hoffimann (Nov 08 2024 at 18:40):

So Zygote.jl is doing something more that guarantees better performance?

view this post on Zulip Christopher Rackauckas (Nov 08 2024 at 18:41):

The derivative of Newton-Raphson w.r.t. u0 is 0, since the solution is independent of the initial condition (or undefined if it moves to a different solution). So you need to not differentiate the solve and then only differentiate effectively the last step. If the implicit solve is the expensive part of the code, then doing this trick turns O(n) expensive calls differentiating each step into exactly 1. That's hard to beat.

view this post on Zulip Christopher Rackauckas (Nov 08 2024 at 18:41):

It's not really up to the AD libraries. It's up to the solver libraries, i.e. whomever writes the Newton method (NonlinearSolve) to supply rules for ForwardDiff/Zygote/etc. to do this
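
The rule being described here is the implicit function theorem: if u(p) solves f(u, p) = 0, then du/dp = -(∂f/∂u)⁻¹ ∂f/∂p at the converged root, so one extra derivative evaluation replaces differentiating every Newton step. A scalar sketch (the newton helper here is illustrative, not a library function):

```julia
using ForwardDiff

f(u, p) = u^2 - p

# Plain Newton iteration; we do NOT differentiate through this loop.
function newton(f, u, p; iters = 20)
    for _ in 1:iters
        u -= f(u, p) / ForwardDiff.derivative(v -> f(v, p), u)
    end
    u
end

p = 4.0
u = newton(f, 1.0, p)   # converges to sqrt(p) = 2.0

# One application of the implicit rule at the root:
du_dp = -ForwardDiff.derivative(q -> f(u, q), p) /
         ForwardDiff.derivative(v -> f(v, p), u)   # 0.25 = d(sqrt(p))/dp at p = 4
```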

view this post on Zulip Júlio Hoffimann (Nov 08 2024 at 18:42):

You mean that there is a small package that we could take as a dependency that already defines Newton-Raphson inversion with AD rules?

view this post on Zulip Christopher Rackauckas (Nov 08 2024 at 18:42):

Since an AD library cannot really know by looking at the code that it has this convergence property, i.e. that the solution is independent of the previous steps: not in the code (since each step of Newton depends on the previous step), but in the solution (since it converges to the same value regardless of where you start)

view this post on Zulip Christopher Rackauckas (Nov 08 2024 at 18:43):

Yes, SimpleNonlinearSolve.jl

view this post on Zulip Christopher Rackauckas (Nov 08 2024 at 18:43):

It's a split out of NonlinearSolve that is focused only on very simple Newton-Raphson + the required AD rules.

view this post on Zulip Júlio Hoffimann (Nov 08 2024 at 18:44):

It looks like the list of dependencies is very large?

view this post on Zulip Júlio Hoffimann (Nov 08 2024 at 18:45):

Ideally, we would just retain the performance of Zygote.jl, but with ForwardDiff.jl

view this post on Zulip Christopher Rackauckas (Nov 08 2024 at 18:45):

Well with a scalar nonlinear solve you probably want to be using ITP instead of Newton for stability if you have bounds. In that case, BracketingNonlinearSolve would then be an even smaller dep.

view this post on Zulip Christopher Rackauckas (Nov 08 2024 at 18:48):

What exactly do you need to be different about the size? The import time is ~200ms and most of that is the precompilation load of the Newton method itself.

view this post on Zulip Expanding Man (Nov 08 2024 at 18:48):

It beggars belief that the code in the diff you pasted is much slower in forwarddiff than in zygote, though of course I don't know what functions you are running through it. I think there is something else wrong.

view this post on Zulip Christopher Rackauckas (Nov 08 2024 at 18:48):

Most likely yes, Zygote shouldn't ever be faster in this kind of case.

view this post on Zulip Christopher Rackauckas (Nov 08 2024 at 18:49):

But even then, the next thing you'd want to do is do the implicit rule for either ForwardDiff or Zygote :shrug:

view this post on Zulip Júlio Hoffimann (Nov 08 2024 at 18:49):

Perhaps I didn't run the benchmark properly. Let me try to isolate the issue.

view this post on Zulip Christopher Rackauckas (Nov 08 2024 at 18:55):

using SimpleNonlinearSolve
f(u, p::Number) = u * u - p
f(u, p::Vector) = u * u - p[1]
u0 = 1.0
p = 1.0
const cprob = NonlinearProblem(f, u0, p)
sol = solve(cprob, SimpleNewtonRaphson())

function loss(p)
    solve(remake(cprob, p=p),SimpleNewtonRaphson()).u - 4.0
end

using ForwardDiff, BenchmarkTools
@btime ForwardDiff.derivative(loss, p)
16.741 ns (1 allocation: 16 bytes)

view this post on Zulip Christopher Rackauckas (Nov 08 2024 at 18:56):

For a scalar problem you should be able to optimize most stuff out of it.

view this post on Zulip Christopher Rackauckas (Nov 08 2024 at 18:58):

Though a bracketing method is almost certainly going to be more robust

view this post on Zulip Christopher Rackauckas (Nov 08 2024 at 18:58):

using BracketingNonlinearSolve
f(u, p::Number) = u * u - p
u0 = 1.0
p = 1.0
uspan = (1.0, 2.0) # brackets
const cprob_int = IntervalNonlinearProblem(f, uspan, p)
sol = solve(cprob_int)

function loss(p)
    solve(remake(cprob_int, p=p)).u - 4.0
end

using ForwardDiff, BenchmarkTools
@btime ForwardDiff.derivative(loss, p);
18.495 ns (1 allocation: 16 bytes)

view this post on Zulip Christopher Rackauckas (Nov 08 2024 at 18:59):

You can probably specialize on a lot of other properties too though. What kind of system is it? Is it polynomial? Rational polynomial?

view this post on Zulip Júlio Hoffimann (Nov 08 2024 at 19:00):

I am creating a MWE with the exact code that is slower. Will share here in a few minutes...

view this post on Zulip Júlio Hoffimann (Nov 08 2024 at 19:12):

using CoordRefSystems
using BenchmarkTools

latlon = LatLon(45, 90)
winkel = convert(WinkelTripel, latlon)

@btime convert($LatLon, $winkel)
1.491 μs (10 allocations: 192 bytes) # main
6.356 μs (144 allocations: 2.88 KiB) # PR

view this post on Zulip Júlio Hoffimann (Nov 08 2024 at 19:13):

You can see that the ForwardDiff.jl in the PR is 6x slower. The underlying functions fx and fy are here:

https://github.com/JuliaEarth/CoordRefSystems.jl/blob/d9193f6d692816fae9982dfcfb284e26613add6a/src/crs/projected/winkeltripel.jl#L77-L79

view this post on Zulip Júlio Hoffimann (Nov 08 2024 at 19:13):

Trigonometric functions.

view this post on Zulip Christopher Rackauckas (Nov 08 2024 at 19:14):

what is sincα?

view this post on Zulip Christopher Rackauckas (Nov 08 2024 at 19:14):

oh I see

view this post on Zulip Júlio Hoffimann (Nov 08 2024 at 19:14):

defined right above

view this post on Zulip Christopher Rackauckas (Nov 08 2024 at 19:20):

Wait, you're talking about AD in the nonlinear solve, not of the nonlinear solve?

view this post on Zulip Júlio Hoffimann (Nov 08 2024 at 19:21):

Yes, the AD is in the functions fx and fy inside the nonlinear solve.

view this post on Zulip Júlio Hoffimann (Nov 08 2024 at 19:22):

I was assuming that this should be instantaneous given the "simplicity" of these trigonometric functions.

view this post on Zulip Christopher Rackauckas (Nov 08 2024 at 19:22):

so where is your forwarddiff code?

view this post on Zulip Júlio Hoffimann (Nov 08 2024 at 19:22):

These are functions f: ℝ² → ℝ

view this post on Zulip Christopher Rackauckas (Nov 08 2024 at 19:22):

My guess is you did something odd to handle the multiple returns.

view this post on Zulip Brian Chen (Nov 08 2024 at 19:23):

I think this kind of scalar, branch-free straight line code is the best-case performance scenario for Zygote. So it's not crazy that it'd be faster than ForwardDiff.

view this post on Zulip Júlio Hoffimann (Nov 08 2024 at 19:23):

In this PR I shared a few messages ago: https://github.com/JuliaEarth/CoordRefSystems.jl/pull/199

view this post on Zulip Júlio Hoffimann (Nov 08 2024 at 19:23):

The PR literally replaces Zygote by ForwardDiff, nothing else.

view this post on Zulip Christopher Rackauckas (Nov 08 2024 at 19:24):

yeah this kind of case is not so bad for Zygote, though either should do fine

view this post on Zulip Christopher Rackauckas (Nov 08 2024 at 19:25):

you shouldn't be getting so many allocs with forwarddiff though

view this post on Zulip Christopher Rackauckas (Nov 08 2024 at 19:25):

but for this kind of case (AD inside the solve for a scalar output), Zygote should just optimize out all the allocs, which is usually what would kill it

view this post on Zulip Christopher Rackauckas (Nov 08 2024 at 19:26):

So Zygote should be fine, and should almost even match Enzyme here without some Reactant tricks.

view this post on Zulip Christopher Rackauckas (Nov 08 2024 at 19:26):

The only other thing to try really is just avoiding the AD with something like an ITP and seeing how that does.

view this post on Zulip Júlio Hoffimann (Nov 08 2024 at 19:26):

So the moral of the story is that Zygote.jl is still recommended, even in this scalar case with N=2 and M=1

view this post on Zulip Christopher Rackauckas (Nov 08 2024 at 19:27):

in this case, yes, because it can compile away a bunch of stuff so its normal issues don't come up here.

view this post on Zulip Christopher Rackauckas (Nov 08 2024 at 19:27):

there are cases for which that is not true

view this post on Zulip Christopher Rackauckas (Nov 08 2024 at 19:27):

it's somewhat code dependent

view this post on Zulip Brian Chen (Nov 08 2024 at 19:27):

Zygote sucks at optimizing code with arrays and falls off a cliff any time there's a branch, but yes this is one of the few niches it's perf-competitive in.

view this post on Zulip Brian Chen (Nov 08 2024 at 19:29):

Hence the demos Mike and others used to do where they showed it constant-folding all the way to the correct gradient

view this post on Zulip Júlio Hoffimann (Nov 08 2024 at 19:29):

These heuristics for picking an AD backend are super hard. Every time we dive into it, we unlearn something we were told.

view this post on Zulip Júlio Hoffimann (Nov 08 2024 at 19:29):

The original issue of this thread, where Zygote.jl returns nothing, still bothers me. That is really annoying.

view this post on Zulip Christopher Rackauckas (Nov 08 2024 at 19:30):

I think it's better to have a hard zero? It's annoying when AD just treats structural zeros as 0.0, because then it's harder to debug.

view this post on Zulip Christopher Rackauckas (Nov 08 2024 at 19:30):

For your case you could just x === nothing ? 0.0 : x

view this post on Zulip Brian Chen (Nov 08 2024 at 19:31):

A lot of inputs Zygote accepts are not conducive to having natural zeros. Structs with arbitrary type constraints, for example.

view this post on Zulip Christopher Rackauckas (Nov 08 2024 at 19:31):

though almost certainly if you get that nothing in your code, it's likely a bug and you should throw an error saying "you likely have a bug in your f"

view this post on Zulip Brian Chen (Nov 08 2024 at 19:32):

One challenge ChainRules and later Mooncake ran into is that some types can't even be reliably represented by structural zeroes! Self-referential structs being a big culprit

view this post on Zulip Júlio Hoffimann (Nov 08 2024 at 19:32):

It is not a bug in f. It is common to have formulas that only depend on a subset of the arguments in this context.

view this post on Zulip Christopher Rackauckas (Nov 08 2024 at 19:32):

yeah but that's a general case

view this post on Zulip Christopher Rackauckas (Nov 08 2024 at 19:32):

That is not grounded in this specific case

view this post on Zulip Christopher Rackauckas (Nov 08 2024 at 19:32):

in this specific case, if you get nothing, that means f is not a function of the parameter

view this post on Zulip Christopher Rackauckas (Nov 08 2024 at 19:32):

that means you can just remove it from the rootfind

view this post on Zulip Christopher Rackauckas (Nov 08 2024 at 19:33):

that tells you that you can optimize it more!

view this post on Zulip Brian Chen (Nov 08 2024 at 19:33):

I think in an alternate world where ChainRules matured a little earlier, Zygote could've used ZeroTangent and NoTangent instead of nothing

view this post on Zulip Júlio Hoffimann (Nov 08 2024 at 19:34):

That is a good point. Maybe refactoring the algorithm with a branch that handles nothing is not that bad. In any case, I wish we had Enzyme.jl's behavior here: it always returns 0.0 for a zero gradient.

view this post on Zulip Christopher Rackauckas (Nov 08 2024 at 19:34):

Regardless though, this code should want the nothing or whatever structural zero because then it should just branch down to doing a scalar rootfind and double its speed

view this post on Zulip Christopher Rackauckas (Nov 08 2024 at 19:34):

This code should also be compatible with Enzyme?

view this post on Zulip Júlio Hoffimann (Nov 08 2024 at 19:35):

It is. It is just that we are trying to keep it native Julia as much as possible, at least for now. Maybe we will consider Enzyme.jl as the only exception.

view this post on Zulip Christopher Rackauckas (Nov 08 2024 at 19:36):

Exploiting the structural zero with Zygote would still beat Enzyme here though

view this post on Zulip Júlio Hoffimann (Nov 08 2024 at 19:36):

The full stack is native Julia, which facilitates deployment in exotic platforms.

view this post on Zulip Christopher Rackauckas (Nov 08 2024 at 19:37):

Screenshot 2024-11-08 at 6.36.36 PM.png

view this post on Zulip Christopher Rackauckas (Nov 08 2024 at 19:37):

Chopping out the fy gradient could be like half of the compute, so I'd just exploit the nothing and call it a day.
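
A sketch of that idea (the function and variable names here are hypothetical): use the structural nothing as a signal to drop a variable from the 2-D rootfind entirely:

```julia
using Zygote

h(λ, ϕ) = sin(ϕ)   # illustrative function, structurally independent of λ

dλ, dϕ = gradient(h, 0.1, 0.2)
if dλ === nothing
    # h does not depend on λ: branch down to a cheaper 1-D solve over ϕ
end
```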

view this post on Zulip Christopher Rackauckas (Nov 08 2024 at 19:37):

The full stack is native Julia, which facilitates deployment in exotic platforms.

Like what?

view this post on Zulip Júlio Hoffimann (Nov 08 2024 at 19:38):

Christopher Rackauckas said:

Chopping out the fy gradient could be like half of the compute, so I'd just exploit the nothing and call it a day.

Yes, it sounds reasonable.

view this post on Zulip Christopher Rackauckas (Nov 08 2024 at 19:38):

The other thing you could potentially do is use fastmath approximations to the trig functions in the gradient context.

view this post on Zulip Christopher Rackauckas (Nov 08 2024 at 19:38):

Or run this as a mixed precision and just do the gradient in 32-bit

view this post on Zulip Júlio Hoffimann (Nov 08 2024 at 19:39):

Christopher Rackauckas said:

The full stack is native Julia, which facilitates deployment in exotic platforms.

Like what?

We are investigating some heterogeneous cluster setups. I understand that external binary dependencies may support only a subset of the platforms that Julia supports.

view this post on Zulip Júlio Hoffimann (Nov 08 2024 at 19:40):

So we avoid external binary deps as much as possible. What is the situation with Enzyme.jl? Does it support all platforms that Julia does because it is LLVM-based?

view this post on Zulip Christopher Rackauckas (Nov 08 2024 at 19:40):

like how exotic though, ARMv7/8? Or like, embedded type chips?

view this post on Zulip Christopher Rackauckas (Nov 08 2024 at 19:40):

Julia doesn't even support all LLVM supported platforms because of runtime things

view this post on Zulip Júlio Hoffimann (Nov 08 2024 at 19:41):

Christopher Rackauckas said:

like how exotic though, ARMv7/8? Or like, embedded type chips?

Nothing specific at the moment. We are just trying to save ourselves from build issues that we can't address easily.

view this post on Zulip Júlio Hoffimann (Nov 08 2024 at 19:42):

Christopher Rackauckas said:

Julia doesn't even support all LLVM supported platforms because of runtime things

So adding Enzyme.jl as a dependency shouldn't reduce the list of supported platforms, right?

view this post on Zulip Christopher Rackauckas (Nov 08 2024 at 19:42):

Premature optimization can be the root of all evil.

view this post on Zulip Christopher Rackauckas (Nov 08 2024 at 19:42):

I mean, it might be easier to get Julia to kick something out for like a TI C600 without Enzyme, but the chances that will ever be in a cluster is zero.

view this post on Zulip Júlio Hoffimann (Nov 08 2024 at 19:43):

In this case, I see it as precaution. If we can stick to a native Julia app, why not? :smile:

view this post on Zulip Júlio Hoffimann (Nov 08 2024 at 19:44):

If Enzyme.jl is indeed the best thing to adopt, and the benefits outweigh the downsides, we will go for it.

view this post on Zulip Christopher Rackauckas (Nov 08 2024 at 19:45):

I mean, I see eVTOLs and satellites deploying to ARMv8 these days. I would be surprised if your case is actually all that exotic unless it's for a microsat


Last updated: Nov 22 2024 at 04:41 UTC