Stream: helpdesk (published)

Topic: ✔ `Distributed` remote calls fail inside modules


Mason Protter (Feb 28 2024 at 14:25):

Anyone know how to deal with this? I'm trying to use remotecall (or any other form of distributed computing) from within a package, and I just can't get it to work because it complains about not being able to find the package.

Here's an MWE:

julia> module Foos

       struct Foo end
       using Distributed
       f() = let p = only(addprocs(1))
           x = remotecall_fetch(p) do
               Foo()
           end
           rmprocs([p])
           x
       end

       end
Main.Foos

julia> Foos.f()
ERROR: On worker 2:
UndefVarError: `Foos` not defined
Stacktrace:
  [1] deserialize_module
    @ ~/julia-1.10/usr/share/julia/stdlib/v1.10/Serialization/src/Serialization.jl:997
  [2] handle_deserialize
    @ ~/julia-1.10/usr/share/julia/stdlib/v1.10/Serialization/src/Serialization.jl:896

Sergio Vargas (Feb 28 2024 at 14:46):

You defined Foo in your main process (1), but not in the newly added process. Since Julia is nominally typed and the type's full name is Foos.Foo, the worker can't resolve it and you get an error.

Sergio Vargas (Feb 28 2024 at 14:46):

You'd have to import shared types with @everywhere
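
For the REPL version of the MWE, a minimal sketch of what this suggestion might look like. It assumes the code runs as a script or at the REPL (so everything lives in Main) and that the worker is added before the module is defined, so that @everywhere reaches it. The helper name `make` is hypothetical and not part of the original MWE.

```julia
using Distributed

# Assumption: add the worker first, so that @everywhere below also runs on it.
p = only(addprocs(1))

# Evaluate the whole module in Main on every process (master and worker),
# so the name Main.Foos resolves on both sides.
@everywhere module Foos
    struct Foo end
    make() = Foo()   # hypothetical helper, used instead of a closure
end

# The named function is sent by reference and looked up as Main.Foos.make
# on the worker; the returned Foo is resolved against Main.Foos on process 1.
x = remotecall_fetch(Foos.make, p)

rmprocs([p])
```

This only covers a module defined in Main; it does not by itself address the package case discussed further down the thread.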

Mason Protter (Feb 28 2024 at 14:51):

I don't think that's right. Notice that the error isn't that Foo isn't defined; it's that Foos is not defined.

Mason Protter (Feb 28 2024 at 14:58):

Let's step away from the REPL version with a local module and do a package:

> cat Foos/src/Foos.jl
module Foos

using Distributed

f() = let p = only(addprocs(1))
    x = remotecall_fetch(p) do
        1
    end
    rmprocs([p])
    x
end

end # module Foos
> cat Foos/Project.toml
name = "Foos"
uuid = "378b42d5-b922-4ee0-b574-33ae726347f8"
version = "0.1.0"

[deps]
Distributed = "8ba89e20-285c-5b6f-9357-94700520ee1b"

i.e. a very simple little package that just creates a process, runs a function that does nothing but return 1, and then deletes that process.

Now here's what I see if I try to use it:

> julia -q --project=./Foos
julia> using Foos

julia> Foos.f()
ERROR: On worker 2:
KeyError: key Foos [378b42d5-b922-4ee0-b574-33ae726347f8] not found
Stacktrace:
  [1] getindex
    @ ./dict.jl:498 [inlined]
  [2] macro expansion
    @ ./lock.jl:267 [inlined]
  [3] root_module
    @ ./loading.jl:1878
  [4] deserialize_module
    @ ~/julia-1.10/usr/share/julia/stdlib/v1.10/Serialization/src/Serialization.jl:994
  [5] handle_deserialize
    @ ~/julia-1.10/usr/share/julia/stdlib/v1.10/Serialization/src/Serialization.jl:896
[...]
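
For contrast, a sketch of the same pattern defined directly in Main rather than inside a package. It assumes the code is run as a script or at the REPL; in that setting the closure is an anonymous function belonging to Main, which Julia's serializer ships to the worker in full, so the worker does not need to look up any user module. The name `f_in_main` is hypothetical.

```julia
using Distributed

# Same logic as Foos.f, but defined in Main instead of in a package module.
f_in_main() = let p = only(addprocs(1))
    try
        # The do-block closure lives in Main, so its definition is
        # serialized along with the call rather than referenced by module.
        remotecall_fetch(p) do
            1
        end
    finally
        # Remove the worker even if the remote call throws.
        rmprocs([p])
    end
end

result = f_in_main()
```

If this behaves as assumed, it suggests that the failure above comes from the worker having to resolve the module that owns the closure, not from the body of the closure itself.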

Sukera (Feb 28 2024 at 15:10):

Maybe you need to do using Foos on the worker first?

Sukera (Feb 28 2024 at 15:11):

Both your REPL and package MWEs have in common that the name Foos already exists on the main process: in the REPL example because it's defined in Main, in the package example because you ran using Foos first.

Mason Protter (Feb 28 2024 at 15:15):

Changing the function to

f() = let p = only(addprocs(1))
    x = remotecall_fetch(p) do
        eval(:(using Foos))
        1
    end
    rmprocs([p])
    x
end

doesn't seem to eliminate the error

Mason Protter (Feb 28 2024 at 15:27):

Hm, okay, so I managed to get my actual use case working by using what feels like an excessive amount of indirection. I did something like this:

module Foos

using Distributed

struct Foo
    x
end

function _f(foo)
    Foo(foo.x + 1)
end

f(foo) = let p = only(addprocs(1))
    Distributed.remotecall_eval(
        Main, p, :(using Foos))
    x = Distributed.remotecall_eval(Main, p, :(Foos._f($foo)))
    rmprocs([p])
    x
end

end # module Foos

Mason Protter (Feb 28 2024 at 15:27):

Without all that indirection, I guess it was unhappy about me passing around structs that were defined in Foos. In particular, I needed totally separate calls to remotecall_eval to make it work.

Mason Protter (Feb 28 2024 at 15:27):

very weird

Notification Bot (Feb 28 2024 at 15:35):

Mason Protter has marked this topic as resolved.

Sukera (Feb 28 2024 at 15:40):

yeah, that's what I was thinking of; you need the additional remotecall_eval for world age purposes, I think
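
As background on the world age remark, here is a single-process sketch of the general mechanism. It is not a reproduction of the Distributed case, and whether world age is really what forces the separate remotecall_eval calls is left open in the thread. The names `define_and_call` and `world_age_demo` are hypothetical.

```julia
# A running function keeps the world age it was entered with, so a method
# defined via eval during the call is "too new" to be called directly.
function define_and_call()
    # Evaluating a function definition returns the new generic function.
    g = Core.eval(Main, :(world_age_demo() = 42))

    # Direct call: dispatched in the caller's (older) world.
    direct = try
        g()
        :ok
    catch err
        err isa MethodError ? :method_too_new : rethrow()
    end

    # invokelatest dispatches in the newest world instead.
    latest = Base.invokelatest(g)
    return direct, latest
end

# On a first call, this is expected to give (:method_too_new, 42).
direct, latest = define_and_call()
```

A separate top-level evaluation, such as a second remotecall_eval, always starts in the latest world, which is one way to read the suggestion that the extra call is needed.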


Last updated: Nov 22 2024 at 04:41 UTC