Stream: helpdesk (published)

Topic: using PyCall object in threads


view this post on Zulip Rein Zustand (Jun 04 2021 at 05:10):

The following code fails with a segmentation fault:

import PyCall

so = PyCall.pyimport("scipy.optimize")

function f(x)
    (x - 2) * x * (x + 2)^2
end

function do_optimize(fn)
    result = so.minimize_scalar(fn)
end

let
    out = Vector{Any}(undef, 20)
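    # every task below calls into the same embedded Python interpreter via the shared handle so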
    Threads.@threads for i in 1:20
        println(i)
        out[i] = do_optimize(f)
    end
end

What is the proper way to do this? If it is not possible, do I have to use multiprocessing with Distributed instead?

I have also found an unresolved Discourse question about this: https://discourse.julialang.org/t/using-pycall-from-threads/32742

view this post on Zulip Mason Protter (Jun 04 2021 at 05:13):

Python doesn't have multithreading, so none of its data structures are thread-safe.

view this post on Zulip Mason Protter (Jun 04 2021 at 05:14):

Spinning up multiple Julia instances won't help, I think; instead, you need multiple Python instances that you're sending data to.

view this post on Zulip Rein Zustand (Jun 04 2021 at 05:16):

I see. Looks like the only way to do it is to use Python's own multiprocessing module, which technically spawns independent processes (with identical setups)?

view this post on Zulip Mason Protter (Jun 04 2021 at 05:16):

yep

view this post on Zulip Mason Protter (Jun 04 2021 at 05:16):

PyCall might have its own way to spawn multiple Pythons, but I don't know.

view this post on Zulip Rein Zustand (Jun 04 2021 at 05:18):

Thank you! I will investigate whether PyCall can spawn multiple Pythons.

view this post on Zulip Rein Zustand (Jun 04 2021 at 05:43):

I tried Python's multiprocessing, still got a segmentation fault, and it caused my laptop to slow down. I managed to recover with pkill julia from a non-X terminal (Ctrl-Alt-F6).

import PyCall

so = PyCall.pyimport("scipy.optimize")
mp = PyCall.pyimport("multiprocessing")

function f(x)
    (x - 2) * x * (x + 2)^2
end

function do_optimize(fn)
    result = so.minimize_scalar(fn)
end

let
    procs = []
    for x in 1:20
        println(x)
        proc = mp.Process(target=f, args=())
        proc.start()
        push!(procs, proc)
    end
    for p in procs
        p.join()
    end
end

My hypothesis is that the proc object gets duplicated indefinitely when passed over to Julia?

view this post on Zulip Rein Zustand (Jun 04 2021 at 05:44):

^ I didn't even use the do_optimize function.

view this post on Zulip Rein Zustand (Jun 04 2021 at 06:49):

This solution works:

import PyCall

so = PyCall.pyimport("scipy.optimize")

function f(x)
    (x - 2) * x * (x + 2)^2
end

let
    out = Vector{Any}(undef, 20)
    Threads.@threads for i in 1:20
        function do_optimize(fn)
            result = so.minimize_scalar(fn)
        end

        out[i] = do_optimize(f)
    end
    println(out)
end

Not sure why.

view this post on Zulip Rein Zustand (Jun 04 2021 at 06:53):

This also works:

import PyCall

so = PyCall.pyimport("scipy.optimize")

function f(x)
    (x - 2) * x * (x + 2)^2
end

function do_optimize(fn)
    result = so.minimize_scalar(fn)
end

let
    out = Vector{Any}(undef, 20)
    Threads.@threads for i in 1:20
        _do_optimize = do_optimize

        out[i] = _do_optimize(f)
    end
    println(out)
end

view this post on Zulip Rein Zustand (Jun 04 2021 at 10:58):

I have tested that using Python multiprocessing to run a Python script that calls Julia functions also results in a segfault. Looks like my only viable path is to either go full Julia or stay full Python. Too bad that the only Python function I need is the LBFGSB code from scipy.optimize.

view this post on Zulip Maarten (Jun 04 2021 at 12:02):

Why don't you use a native LBFGS implementation?

view this post on Zulip Rein Zustand (Jun 04 2021 at 12:11):

LBFGS isn't bounded; to get bounds it has to be wrapped in the generic Optim.Fminbox, which I think makes it less efficient than LBFGSB. See https://discourse.julialang.org/t/optim-jl-vs-scipy-optimize-once-again/61661/35
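
For reference, a minimal sketch of that Fminbox route, with the objective rewritten in vector form; the bounds and starting point below are illustrative assumptions, not from the thread:

using Optim

# Same objective as above, but taking a vector, since Fminbox optimizes over vectors.
g(x) = (x[1] - 2) * x[1] * (x[1] + 2)^2

lower = [-4.0]   # illustrative bounds
upper = [4.0]
x0    = [0.0]    # starting point must lie strictly inside the bounds

# LBFGS wrapped in the generic box-constraint wrapper; gradients come from
# finite differences since none are supplied.
res = optimize(g, lower, upper, x0, Fminbox(LBFGS()))
println(Optim.minimizer(res))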

view this post on Zulip Maarten (Jun 04 2021 at 12:19):

Cool, I didn't know about LBFGS-B :)

view this post on Zulip Paulito Palmes (Jun 17 2021 at 02:12):

@distributed with PyCall should work.

view this post on Zulip Paulito Palmes (Jun 17 2021 at 02:15):

Just add @everywhere to the package imports and the function definitions.
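
A rough sketch of that pattern, based on the threaded example earlier in the thread (the worker count and the .x extraction are illustrative assumptions, not from the thread):

using Distributed
addprocs(4)   # each worker process gets its own embedded Python via PyCall

@everywhere begin
    import PyCall
    const so = PyCall.pyimport("scipy.optimize")

    f(x) = (x - 2) * x * (x + 2)^2

    do_optimize(fn) = so.minimize_scalar(fn)
end

# Extract the minimizer as a plain Float64 on the worker so that only ordinary
# Julia values are sent back to the master process.
out = @distributed (vcat) for i in 1:20
    do_optimize(f).x
end
println(out)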

view this post on Zulip Rein Zustand (Jun 17 2021 at 10:01):

@Paulito Palmes Yes, I confirm that @distributed with PyCall works. Though by now I have already switched to LBFGSB.jl.

view this post on Zulip Rein Zustand (Jun 17 2021 at 10:36):

Using LBFGSB.jl results in fewer allocations than using SciPy's LBFGSB.


Last updated: Oct 02 2023 at 04:34 UTC