I'm trying to run Julia on a TPU VM v3-8 using the tpu-vm-pt-1.10 image. It crashes on various operations with "free(): invalid pointer." This happens with both the latest release and the LTS. (cross-posting from Slack)
(@v1.6) pkg> generate Demo
Generating project Demo:
free(): invalid pointer
signal (6): Aborted
in expression starting at none:0
gsignal at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
abort at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x7f5bc79ef3ed)
unknown function (ip: 0x7f5bc79f747b)
unknown function (ip: 0x7f5bc79f8cab)
git_mbedtls_stream_global_init at /home/rs/tools/julia-1.6.5/bin/../lib/julia/libgit2.so (unknown line)
init_once at /home/rs/tools/julia-1.6.5/bin/../lib/julia/libgit2.so (unknown line)
__pthread_once_slow at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
git_libgit2_init at /home/rs/tools/julia-1.6.5/bin/../lib/julia/libgit2.so (unknown line)
macro expansion at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/LibGit2/src/error.jl:108 [inlined]
initialize at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/LibGit2/src/LibGit2.jl:986
#164 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/LibGit2/src/LibGit2.jl:971
lock at ./lock.jl:187
ensure_initialized at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/LibGit2/src/LibGit2.jl:967 [inlined]
GitConfig at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/LibGit2/src/config.jl:50
GitConfig at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/LibGit2/src/config.jl:50 [inlined]
with at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/LibGit2/src/types.jl:1156 [inlined]
getconfig at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/LibGit2/src/config.jl:160 [inlined]
project at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Pkg/src/generate.jl:30
#generate#3 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Pkg/src/generate.jl:15
generate at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Pkg/src/generate.jl:10 [inlined]
#generate#2 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Pkg/src/generate.jl:8 [inlined]
generate at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Pkg/src/generate.jl:8 [inlined]
#generate_deprecated#1 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Pkg/src/generate.jl:5 [inlined]
generate_deprecated at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Pkg/src/generate.jl:4
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2237 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2419
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1703 [inlined]
do_apply at /buildworker/worker/package_linux64/build/src/builtins.c:670
do_cmd! at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Pkg/src/REPLMode/REPLMode.jl:405
#do_cmd#21 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Pkg/src/REPLMode/REPLMode.jl:386
do_cmd at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Pkg/src/REPLMode/REPLMode.jl:377 [inlined]
#24 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Pkg/src/REPLMode/REPLMode.jl:550
jfptr_YY.24_45436.clone_1 at /home/rs/tools/julia-1.6.5/lib/julia/sys.so (unknown line)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2237 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2419
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1703 [inlined]
jl_f__call_latest at /buildworker/worker/package_linux64/build/src/builtins.c:714
#invokelatest#2 at ./essentials.jl:708 [inlined]
invokelatest at ./essentials.jl:706 [inlined]
run_interface at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/REPL/src/LineEdit.jl:2441
jfptr_run_interface_54737.clone_1 at /home/rs/tools/julia-1.6.5/lib/julia/sys.so (unknown line)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2237 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2419
run_frontend at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:1126
#44 at ./task.jl:411
jfptr_YY.44_53285.clone_1 at /home/rs/tools/julia-1.6.5/lib/julia/sys.so (unknown line)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2237 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2419
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1703 [inlined]
start_task at /buildworker/worker/package_linux64/build/src/task.c:834
Allocations: 2654 (Pool: 2639; Big: 15); GC: 0
Aborted (core dumped)
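For what it's worth, the backtrace points at libgit2's mbedtls initialization rather than anything Pkg-specific. A minimal way to check that (just a sketch; LibGit2.ensure_initialized is the internal function visible in the trace above) would be:
julia> import LibGit2
julia> LibGit2.ensure_initialized()  # should abort with the same free(): invalid pointer if libgit2 init is the culprit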
I found a very similar issue on Discourse, but it had no replies. GitHub issues did not seem to have anything relevant.
https://discourse.julialang.org/t/issues-with-julia-installation-on-google-tpu-vm/65783
$ uname -a
Linux *********** 5.11.0-1021-gcp #23~20.04.1-Ubuntu SMP Fri Oct 1 19:04:32 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
I did notice that trying to check the libc.so.6 info (by executing it directly) segfaults, which makes me suspect Google is doing something strange with their glibc.
$ /lib/x86_64-linux-gnu/libc.so.6
Segmentation fault (core dumped)
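(The same glibc version info can also be queried without executing the shared object directly, e.g. the following on a stock Ubuntu 20.04 userland:)
$ ldd --version
$ getconf GNU_LIBC_VERSION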
Has anyone had success running Julia on TPU VMs? Thanks!
Hm, yeah, I'm not sure how much attention Julia's TPU stuff has gotten in a while. It seemed to be the exciting new hotness for a little while, and then I suddenly stopped hearing about it.
Yeah... I wonder how hard it would be to target TPUs with StaticCompiler?
To be fair, I have seen these often enough when using jax via PyCall that it's likely some interaction between XLA's LLVM and Julia's LLVM.
AFAIK @Reid is using the official binaries, so there should only be one LLVM (Julia's own) in play? No other libraries.
Brian Chen said:
AFAIK Reid is using the official binaries, so there should only be one LLVM (Julia's own) in play? No other libraries.
I am using the official binaries. I also tried installing with conda-forge and building from source.
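(For reference, the conda-forge install amounts to something like:)
$ conda create -n julia -c conda-forge julia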
Also here's the GitHub issue I opened.
https://github.com/JuliaLang/julia/issues/44242
Jayesh K. Gupta said:
To be fair, I have seen these often enough when using jax via PyCall that it's likely some interaction between XLA's LLVM and Julia's LLVM.
Yes, I understand XLA.jl is languishing, but I thought calling out to jax, maybe with a simple wrapper, would be usable. Sadly, I can't even get to the point of trying to use the TPU cores. Do you have a working setup on a TPU VM (i.e., can you use Julia even just on the CPU)?
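For concreteness, the kind of simple wrapper I have in mind is just going through PyCall, roughly the sketch below (assuming jax is installed in the Python environment PyCall is built against; this is not an existing package API):
using PyCall
jax = pyimport("jax")
jnp = pyimport("jax.numpy")
f = jax.jit(jnp.tanh)      # jit-compile on whatever backend jax picks up (ideally the TPU)
y = f(jnp.ones((8, 128)))  # run the compiled function on a small array
println(jax.devices())     # check which devices jax actually sees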
What confuses me is that Julia does run on Colab VMs, which shouldn't be that different from the TPU VM you're using (they're all on GCP, after all). Perhaps there's some (mis)configuration with the TPU VM image that is wreaking havoc?
Brian Chen said:
What confuses me is that Julia does run on Colab VMs, which shouldn't be that different from the TPU VM you're using (they're all on GCP, after all). Perhaps there's some (mis)configuration with the TPU VM image that is wreaking havoc?
Yeah, I tested it on a regular VM and it worked. I have tried the v2-alpha, tpu-vm-pt-1.10, and tpu-vm-tf-2.8.0 images, and it failed on all of them.
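(Those image names are just the --version argument when creating the TPU VM; the name and zone below are placeholders:)
$ gcloud compute tpus tpu-vm create my-tpu --zone=europe-west4-a --accelerator-type=v3-8 --version=tpu-vm-pt-1.10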