Reading from io = open(fname) is much slower than reading from IOBuffer(read(io)). I understand that this is because all the bytes are copied to RAM.
If we can guarantee that all the bytes in io fit in RAM, is it then always better to use IOBuffer for speed?
How are you measuring? If you exclude the time it takes to read the data into RAM for the IOBuffer from your measurement, of course it's faster. Only the processing time is left, after all. However, your overall application is likely not magically faster, because the data still has to be read into RAM.
I am including the time it takes to create the IOBuffer and it is still faster.
Interesting :thinking: do you have an example?
Can try to produce one with more time later. Currently trying to debug another issue.
The example could be as simple as reading a matrix of size 2500×25000 stored in the IO:
for j in 1:25000
    for i in 1:2500
        read(io, Float64)
    end
end
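A self-contained way to compare the two could look like the sketch below. It writes a smaller matrix (250×2500, an arbitrary size picked so the file stays around 5 MB) to a temporary file and runs the same loop on the IOStream and on an IOBuffer, with the cost of creating the IOBuffer included in the timing. The helper name sum_stream is made up for illustration, and the timings will vary by machine.

```julia
# Sum n Float64 values by reading them one at a time from any IO.
function sum_stream(io::IO, n::Integer)
    s = 0.0
    for _ in 1:n
        s += read(io, Float64)
    end
    return s
end

# Smaller stand-in for the 2500×25000 matrix, so the sketch runs quickly.
A = rand(250, 2500)
fname = tempname()
write(fname, A)   # raw Float64 bytes in column-major order

# First run compiles the code and gives us the sums to compare.
s_file = open(io -> sum_stream(io, length(A)), fname)
s_buf = sum_stream(IOBuffer(read(fname)), length(A))

# Second run is timed. The IOBuffer timing includes read(fname),
# i.e. the time to pull the whole file into memory.
t_file = @elapsed open(io -> sum_stream(io, length(A)), fname)
t_buf = @elapsed sum_stream(IOBuffer(read(fname)), length(A))

println("IOStream: ", t_file, " s, IOBuffer: ", t_buf, " s")
rm(fname)
```

For more careful numbers, a benchmarking package would be a better fit than a single @elapsed call.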
That is almost certainly because the file object (IOStream) has some overhead, mostly from taking a lock associated with the file. You can match the performance of the IOBuffer by buffering the file object in Julia. That is, you make a small buffer, e.g. a 16 KiB Vector{UInt8}, then read into that.
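A minimal sketch of that manual buffering, assuming the file holds nothing but raw Float64 values in native byte order (so its length is a multiple of 8 bytes). The function name sum_chunked and the bufsize keyword are made up for this example. It uses readbytes! from Base to fill the buffer, so the IOStream is touched once per 16 KiB chunk instead of once per value.

```julia
# Sum all Float64 values in a file, going through a small reusable byte buffer.
function sum_chunked(fname::AbstractString; bufsize::Integer = 16 * 1024)
    bufsize % sizeof(Float64) == 0 ||
        throw(ArgumentError("bufsize must be a multiple of 8"))
    buf = Vector{UInt8}(undef, bufsize)
    s = 0.0
    io = open(fname)
    try
        while !eof(io)
            # One call into the IOStream per chunk; returns the bytes actually read.
            nbytes = readbytes!(io, buf, bufsize)
            # View the filled part of the buffer as Float64 without copying.
            # This throws if nbytes is not a multiple of 8 (malformed file).
            for x in reinterpret(Float64, view(buf, 1:nbytes))
                s += x
            end
        end
    finally
        close(io)
    end
    return s
end

# Usage on a temporary file of 5000 values (40000 bytes, so the last chunk is partial).
fname = tempname()
data = rand(5000)
write(fname, data)
total = sum_chunked(fname)
rm(fname)
```

The explicit try/finally is used instead of an open(...) do block so that the accumulator s is not captured by a closure.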
You mean there is a method of read that accepts a small buffer to mutate?
help?> read!
search: read! read real rpad readdir Threads isready prepend! readeach readline readlink replace!
read!(stream::IO, array::AbstractArray)
read!(filename::AbstractString, array::AbstractArray)
Read binary data from an I/O stream or file, filling in array.
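For the matrix example, read! can fill the whole array in one call rather than reading element by element. A small sketch, using a 250×2500 stand-in for the real matrix and assuming the file stores raw Float64 values in column-major order:

```julia
nrows, ncols = 250, 2500   # smaller stand-in for 2500×25000
fname = tempname()
A = rand(nrows, ncols)
write(fname, A)            # raw bytes, column-major

# Preallocate the destination and fill it from the stream in one call.
B = Matrix{Float64}(undef, nrows, ncols)
open(fname) do io
    read!(io, B)
end

# The filename method from the help text does the open/close for you.
C = read!(fname, Matrix{Float64}(undef, nrows, ncols))
rm(fname)
```

If the file holds fewer bytes than the array needs, read! throws an EOFError rather than leaving the array partly filled without notice.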
For what it's worth, I'm quite unhappy with the API that Base provides, which is why I made the package BufferIO.jl to improve this area of Julia.
For a more mature, though less efficient, alternative, look at BufferedStreams.jl.
Nice! One more thing to learn when I find some time :slight_smile:
Júlio Hoffimann has marked this topic as resolved.
There is also InputBuffers.jl, which I made as an even faster, less buggy version of IOBuffer. I also use this with a FileArray type to avoid using IOStream or mmap, both of which I have had problems with. https://github.com/JuliaIO/ZipArchives.jl/blob/c48c367b6208c9733ff0e2f3431a6b8d13939126/test/test_file-array.jl#L22
Still fighting with IO here, but seeing some progress.
If you have the memory, it will almost always be nicer to just read everything into memory first and do your processing on a big Vector{UInt8}. Beyond performance, doing this greatly simplifies the logic for error handling, because any IO errors can be handled up front. Also, unlike IO, the Vector interface is well documented.
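As a rough sketch of that approach, assuming the file contains only raw Float64 values in native byte order (the temporary file and the variable names are just for illustration):

```julia
fname = tempname()
data = rand(1000)
write(fname, data)

# All file IO happens in this one call, so IO errors
# (missing file, permissions, ...) surface here and nowhere else.
bytes = read(fname)   # Vector{UInt8}
rm(fname)

# From here on it is ordinary array code with no IO involved.
length(bytes) % sizeof(Float64) == 0 ||
    error("file length is not a multiple of 8 bytes")
vals = reinterpret(Float64, bytes)   # no copy, just a typed view of the bytes
total = sum(vals)
```

The length check is the only validation this layout needs. A format with headers or mixed types would need its own parsing on top of bytes.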
Last updated: Nov 27 2025 at 04:44 UTC