If I see:
double free or corruption (out)
double free or corruption (!prev)
signal (6): Aborted
in expression starting at REPL[2]:1
[1] 104817 IOT instruction (core dumped) julia --project=@. --threads=32
is this a me-problem or a julialang-problem?
For context, this seems to be the code triggering it:
rows = fill(DataFrameRow[], nthreads())
@threads for r in axes(gnomad1, 1)
    # stuff
    push!(rows[threadid()], rowd[1, :])
end
Tried running again and I see
corrupted size vs. prev_size while consolidating
signal (6): Aborted
in expression starting at REPL[2]:1
__pthread_kill_implementation at /lib64/libc.so.6 (unknown line)
raise at /lib64/libc.so.6 (unknown line)
abort at /lib64/libc.so.6 (unknown line)
__libc_message at /lib64/libc.so.6 (unknown line)
malloc_printerr at /lib64/libc.so.6 (unknown line)
_int_free at /lib64/libc.so.6 (unknown line)
free at /lib64/libc.so.6 (unknown line)
jl_realloc_aligned at /buildworker/worker/package_linux64/build/src/gc.c:249 [inlined]
gc_managed_realloc_ at /buildworker/worker/package_linux64/build/src/gc.c:3540 [inlined]
jl_gc_managed_realloc at /buildworker/worker/package_linux64/build/src/gc.c:3557
array_resize_buffer at /buildworker/worker/package_linux64/build/src/array.c:681
jl_array_grow_at_end at /buildworker/worker/package_linux64/build/src/array.c:904 [inlined]
jl_array_grow_end at /buildworker/worker/package_linux64/build/src/array.c:968
_growend! at ./array.jl:948 [inlined]
push! at ./array.jl:995
...
threadid() is not a stable way to query this: a task can migrate between threads, so the id can change partway through an iteration
also, your rows holds the exact same array multiple times
julia> mutable struct Example
    x
end
julia> rows = fill(Example(rand(UInt)), 3)
3-element Vector{Example}:
Example(0x32bdaaec78c24bc5)
Example(0x32bdaaec78c24bc5)
Example(0x32bdaaec78c24bc5)
julia> rows[1] === rows[2]
true
either fill in a loop explicitly or use a comprehension, otherwise you're possibly trying to grow the exact same array from multiple threads
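To make the point above concrete for plain vectors, here is a minimal sketch of the difference between fill and a comprehension. The names shared and separate are made up for illustration, and Int[] stands in for DataFrameRow[] so the snippet runs without DataFrames.

```julia
# fill evaluates its first argument once, so every slot holds the same Vector.
shared = fill(Int[], 3)
push!(shared[1], 42)
println(shared[1] === shared[3])   # true: one array, three references
println(length(shared[2]))         # 1: the push is visible through every slot

# A comprehension evaluates Int[] once per iteration, so each slot is distinct.
separate = [Int[] for _ in 1:3]
push!(separate[1], 42)
println(separate[1] === separate[2])  # false
println(isempty(separate[2]))         # true: the other slots are untouched
```

With the fill version, every thread in the original loop ends up calling push! on one and the same array, which fits the crash inside jl_array_grow_end in the stacktrace.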
Ah, thanks for that hint. I'll try [DataFrameRow[] for _ in 1:nthreads()]. Do you know if this could also cause an UndefRefError? I ask because it will take me ~1.5h to see if the fix worked.
don't think so? I'd have to take a look at the stacktrace
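Since the comprehension fix still indexes the buffers by threadid(), here is a rough sketch of one common alternative that does not depend on thread ids at all: split the index range into chunks, give each spawned task its own local buffer, and concatenate the results at the end. This is not the code from the thread. process, collect_rows and ntasks are hypothetical names, and process is only a stand-in for the real per-row work.

```julia
using Base.Threads: @spawn, nthreads

# Hypothetical stand-in for the per-row work ("# stuff") in the loop above.
process(i) = i^2

function collect_rows(n; ntasks = nthreads())
    # Split 1:n into contiguous chunks, roughly one per task.
    chunksize = max(1, cld(n, ntasks))
    chunks = collect(Iterators.partition(1:n, chunksize))

    # Each task owns its buffer, so no array is ever grown by two tasks.
    tasks = map(chunks) do chunk
        @spawn begin
            buf = Int[]
            for i in chunk
                push!(buf, process(i))
            end
            buf
        end
    end

    # Chunks are contiguous and fetched in order, so input order is preserved.
    return reduce(vcat, fetch.(tasks); init = Int[])
end

println(collect_rows(5))
```

Because every buffer is local to the task that fills it, it does not matter which thread a task runs on or whether it migrates partway through.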
Well, I'll find out in a few minutes :sweat_smile:
1.5h seems like a very long time, do you not have a small-ish check program to run for testing purposes?
ah, yeah :joy:
or is that already your small-ish check?
I've stumbled across a bunch of issues that don't happen with small data sets. I'm guessing small runs are friendlier on race conditions etc.
Wooo! I think it worked
haha yeah, I can see that
resiliency against data races is hard (unless you go for Rust, then it's a little easier)
The moments of anticipation when the ETA counted down to 0s and it was processing were killing me :laughing: worse than a horror movie
nice
Timothy has marked this topic as resolved.
Last updated: Dec 28 2024 at 04:38 UTC