I'm starting to learn asynchronous programming in Julia. The topic is brand new to me, and Julia's documentation is rather terse on the subject. In the code below, I download a batch of files using the @async macro and then do some processing. The problem is that I want the processing to start only once the files have been downloaded.
function parsedocuments()
    for b in [1, 11, 21]
        downloadbatch(documents[b:b+9, :])
        Threads.@threads for i in b:(b+9)
            try
                @suppress parsedocument(documents[i, :])
            catch
                continue
            end
        end
    end
end

function downloadbatch(batch)
    output = datadir() * "/temp/"
    for row in eachrow(batch)
        @async run(`aws s3 cp $(row[:pasta]) $(output)`)
    end
end
I haven't been able to figure out how to do this. The code for parsedocument starts running before the download has finished.
Two separate loops? You may also want a @sync in there somewhere.
Brenhin Keller said:
Two separate loops? You may also want a @sync in there somewhere.
Tried adding a @sync, but it did not work.
@sync for row in eachrow(batch)
    @async run(`aws s3 cp $(row[:pasta]) $(output)`)
end
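For reference, here is a minimal sketch of the @sync/@async pattern discussed above. It is not the original code: fake_download and download_all are hypothetical names, and sleep stands in for the aws call so the example runs on its own. Under those assumptions, the @sync block returns only after every @async task started inside it has finished, so anything placed after it should see all downloads completed.

```julia
# Hypothetical stand-in for run(`aws s3 cp ...`): waits briefly, then records completion.
function fake_download(name, finished)
    sleep(0.1)               # simulates I/O and yields to other tasks
    push!(finished, name)    # @async tasks run cooperatively on one thread, so this is safe here
end

# Starts one task per name and blocks until all of them are done.
function download_all(names)
    finished = String[]
    # @sync waits for every @async task created lexically inside this loop
    @sync for name in names
        @async fake_download(name, finished)
    end
    return finished
end

names = ["a.pdf", "b.pdf", "c.pdf"]
finished = download_all(names)

# Processing would start here, after all downloads have completed.
println(length(finished) == length(names))
```

If processing still seems to start early with this structure, it may be worth checking that the @sync sits around the loop that actually creates the tasks, since @sync only waits for tasks started lexically inside its own block.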
You should read tkf's post about async patterns; it is extremely useful.
Or just use his packages, like FLoops.jl and ThreadsX.jl.
https://discourse.julialang.org/t/tutorial-concurrency-patterns-for-controlled-parallelisms/62651
Last updated: Nov 06 2024 at 04:40 UTC