Stream: helpdesk (published)

Topic: using PyCall object in threads


Rein Zustand (Jun 04 2021 at 05:10):

The following code fails with segmentation faults:

import PyCall

so = PyCall.pyimport("scipy.optimize")

function f(x)
    (x - 2) * x * (x + 2)^2
end

function do_optimize(fn)
    result = so.minimize_scalar(fn)
end

let
    out = Vector{Any}(undef, 20)
    Threads.@threads for i in 1:20
        println(i)
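        # each iteration, possibly on a different thread, calls into the same embedded Python interpreter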
        out[i] = do_optimize(f)
    end
end

What should be the proper way to do this? If it is not possible, do I have to use multi-processing with Distributed instead?

I have also found an unresolved Discourse question about this: https://discourse.julialang.org/t/using-pycall-from-threads/32742

Mason Protter (Jun 04 2021 at 05:13):

Python doesn't have multithreading, so none of its data structures are thread-safe

Mason Protter (Jun 04 2021 at 05:14):

Spinning up multiple Julia instances won't help, I think. Instead, you need to have multiple Python instances that you're sending data to.

Rein Zustand (Jun 04 2021 at 05:16):

I see. Looks like the only way to do it is to use Python's own multiprocessing module, which technically spawns independent processes (with identical setups)?

Mason Protter (Jun 04 2021 at 05:16):

yep

Mason Protter (Jun 04 2021 at 05:16):

PyCall might have its own way to spawn multiple Pythons, but I don't know

Rein Zustand (Jun 04 2021 at 05:18):

Thank you! I will investigate whether PyCall can spawn multiple Pythons.

Rein Zustand (Jun 04 2021 at 05:43):

I tried Python's multiprocessing and still got a segmentation fault, which also caused my laptop to slow down. I managed to recover by running pkill julia in a non-X terminal (Ctrl-Alt-F6).

import PyCall

so = PyCall.pyimport("scipy.optimize")
mp = PyCall.pyimport("multiprocessing")

function f(x)
    (x - 2) * x * (x + 2)^2
end

function do_optimize(fn)
    result = so.minimize_scalar(fn)
end

let
    procs = []
    for x in 1:20
        println(x)
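        # note: args=() would start f with no arguments, although f(x) expects one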
        proc = mp.Process(target=f, args=())
        proc.start()
        push!(procs, proc)
    end
    for p in procs
        p.join()
    end
end

My hypothesis is that the proc object gets duplicated indefinitely when passed over to Julia?

Rein Zustand (Jun 04 2021 at 05:44):

^ I didn't even use the do_optimize function.

Rein Zustand (Jun 04 2021 at 06:49):

EDIT: I had forgotten to add -t2 to the julia invocation; with -t2 the code below fails with a segmentation fault as well!
This solution works:

import PyCall

so = PyCall.pyimport("scipy.optimize")

function f(x)
    (x - 2) * x * (x + 2)^2
end

let
    out = Vector{Any}(undef, 20)
    Threads.@threads for i in 1:20
        function do_optimize(fn)
            result = so.minimize_scalar(fn)
        end

        out[i] = do_optimize(f)
    end
    println(out)
end

Not sure why.

Rein Zustand (Jun 04 2021 at 06:53):

EDIT: I had forgotten to add -t2 to the julia invocation; with -t2 the code below fails with a segmentation fault as well!
This also works:

import PyCall

so = PyCall.pyimport("scipy.optimize")

function f(x)
    (x - 2) * x * (x + 2)^2
end

function do_optimize(fn)
    result = so.minimize_scalar(fn)
end

let
    out = Vector{Any}(undef, 20)
    Threads.@threads for i in 1:20
        _do_optimize = do_optimize

        out[i] = _do_optimize(f)
    end
    println(out)
end

Rein Zustand (Jun 04 2021 at 10:58):

I have also tested the other direction: using Python multiprocessing to run a Python script that calls Julia functions results in a segfault as well. Looks like my only viable path is to either go full Julia or stay full Python. Too bad that the only Python function I need is the LBFGSB code from scipy.optimize.

Maarten (Jun 04 2021 at 12:02):

why don't you use a native lbfgs implementation?

Rein Zustand (Jun 04 2021 at 12:11):

LBFGS isn't bounded. It has to be wrapped in the generic Optim.Fminbox, and so is less efficient than LBFGSB, I think. See https://discourse.julialang.org/t/optim-jl-vs-scipy-optimize-once-again/61661/35
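
Roughly, the Fminbox route looks like this (just a sketch; the box bounds and starting point below are placeholders I made up):

using Optim

f(x) = (x[1] - 2) * x[1] * (x[1] + 2)^2   # Optim expects a function of a vector

lower = [-4.0]   # placeholder bounds
upper = [4.0]
x0 = [0.0]       # placeholder start, must lie inside the box

res = optimize(f, lower, upper, x0, Fminbox(LBFGS()))
println(Optim.minimizer(res))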

Maarten (Jun 04 2021 at 12:19):

cool, I didn't know about lbfgs-B :)

Paulito Palmes (Jun 17 2021 at 02:12):

@distributed with PyCall should work

Paulito Palmes (Jun 17 2021 at 02:15):

just add @everywhere in front of the package loading and the function definitions
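
something like this, I think (untested sketch; I'm assuming two workers added with addprocs, and I return only the minimizer .x so that no PyObject has to be sent back to the master):

using Distributed
addprocs(2)   # or start julia with -p 2

@everywhere begin
    import PyCall
    so = PyCall.pyimport("scipy.optimize")   # each worker process gets its own Python

    f(x) = (x - 2) * x * (x + 2)^2
    do_optimize(fn) = so.minimize_scalar(fn)
end

# reduce with vcat; each iteration returns a plain Float64
out = @distributed (vcat) for i in 1:20
    do_optimize(f).x
end
println(out)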

Rein Zustand (Jun 17 2021 at 10:01):

@Paulito Palmes yes I confirm that @distributed with PyCall works. Though by now I have already used LBFGSB.jl.

Rein Zustand (Jun 17 2021 at 10:36):

Using LBFGSB.jl results in fewer allocations than using SciPy's LBFGSB.

