Stream: helpdesk (published)

Topic: number of bytes remaining in IO


view this post on Zulip Colin Caine (Feb 14 2021 at 20:05):

Is there a way of finding out how many bytes remain in an IO stream? I want to be able to work out if e.g. peek(io, Int) will error without calling it.

I think this can probably only be done by trying to read the file and seeing what happens, then resetting the seek point (which is how peek works), but I thought I'd ask anyway.

view this post on Zulip Fredrik Ekre (Feb 14 2021 at 20:07):

You can use bytesavailable maybe?

view this post on Zulip Jakob Nybo Nissen (Feb 14 2021 at 20:07):

Returns zero for me for a nonzero file

view this post on Zulip Expanding Man (Feb 14 2021 at 20:17):

ohhhh, so interesting that you should mention this... I think this may be a bug

view this post on Zulip Expanding Man (Feb 14 2021 at 20:18):

are you on 1.6?

view this post on Zulip Expanding Man (Feb 14 2021 at 20:18):

I saw this also, maybe we should open an issue

view this post on Zulip Fredrik Ekre (Feb 14 2021 at 20:18):

Ah, right! I ran into this the other day.

view this post on Zulip Expanding Man (Feb 14 2021 at 20:18):

ok, we definitely need an issue

view this post on Zulip Expanding Man (Feb 14 2021 at 20:18):

let me ask on slack first

view this post on Zulip Fredrik Ekre (Feb 14 2021 at 20:20):

It might be correct though, since the io stream needs to block before you can read?

view this post on Zulip Expanding Man (Feb 14 2021 at 20:20):

no, I've seen it give 0 and then immediately readavailable returns something of length >0

view this post on Zulip Fredrik Ekre (Feb 14 2021 at 20:21):

Yea but that is because readavailable does block and fill the buffer I think: https://github.com/JuliaLang/julia/blob/2eeef2e231a55cac770543b6dd673e349adfd797/base/iostream.jl#L381

view this post on Zulip Expanding Man (Feb 14 2021 at 20:21):

I'm confused, shouldn't readavailable always return the number of bytes indicated before calling it by bytesavailable?

view this post on Zulip Expanding Man (Feb 14 2021 at 20:22):

that was my understanding of what they're supposed to do

view this post on Zulip Expanding Man (Feb 14 2021 at 20:23):

bytesavailable seems to work on IOBuffer btw, though it may be worth noting that IOBuffer never blocks
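A minimal sketch of the IOBuffer behaviour mentioned above. The variable names are only for illustration. Because an IOBuffer holds all of its data in memory, bytesavailable simply counts the unread bytes:

```julia
io = IOBuffer("what up")        # 7 bytes, all already in memory

n_before = bytesavailable(io)   # unread bytes before any read
read(io, UInt8)                 # consume one byte
n_after = bytesavailable(io)    # one fewer unread byte

println((n_before, n_after))    # prints (7, 6)
```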

view this post on Zulip Fredrik Ekre (Feb 14 2021 at 20:23):

Yea I don't know actually.

view this post on Zulip Expanding Man (Feb 14 2021 at 20:26):

I believe this is something of a MWE

julia> open(io -> write(io, "what up"), "testfile.txt", "w+")
7

julia> f = open("testfile.txt")
IOStream(<file testfile.txt>)

julia> bytesavailable(f)
0

julia> readavailable(f)
7-element Vector{UInt8}:
 0x77
 0x68
 0x61
 0x74
 0x20
 0x75
 0x70

view this post on Zulip Expanding Man (Feb 14 2021 at 20:26):

I believe bytesavailable should return 7

view this post on Zulip Kwaku Oskin (Feb 14 2021 at 20:28):

# num bytes available without blocking

view this post on Zulip Kwaku Oskin (Feb 14 2021 at 20:28):

It is written in the source code.

view this post on Zulip Expanding Man (Feb 14 2021 at 20:28):

Yup, that's what the docs say, and the above does not block, so this looks like a bug

view this post on Zulip Kwaku Oskin (Feb 14 2021 at 20:29):

So, it's not the number of bytes from here till the end of the stream.

view this post on Zulip Kwaku Oskin (Feb 14 2021 at 20:29):

Mmmm....
You need to block to readavailable

view this post on Zulip Fredrik Ekre (Feb 14 2021 at 20:30):

I mean, you can not assume that readavailable will give you bytesavailable bytes I think

view this post on Zulip Expanding Man (Feb 14 2021 at 20:30):

why not?

view this post on Zulip Fredrik Ekre (Feb 14 2021 at 20:30):

Because that could have changed between calling bytesavailable and readavailable.

view this post on Zulip Expanding Man (Feb 14 2021 at 20:30):

well yeah, ok fair enough, but we are seeing it in examples where that is not the case

view this post on Zulip Kwaku Oskin (Feb 14 2021 at 20:31):

Hmmm.... I do not quite understand what the intended behaviour is, but the code itself is rather straightforward.

bytesavailable(s::IOStream) = @_lock_ios s ccall(:jl_nb_available, Int32, (Ptr{Cvoid},), s.ios)

function readavailable(s::IOStream)
    lock(s.lock)
    nb = ccall(:jl_nb_available, Int32, (Ptr{Cvoid},), s.ios)
    if nb == 0
        ccall(:ios_fillbuf, Cssize_t, (Ptr{Cvoid},), s.ios)
        nb = ccall(:jl_nb_available, Int32, (Ptr{Cvoid},), s.ios)
    end
    a = Vector{UInt8}(undef, nb)
    nr = ccall(:ios_readall, Csize_t, (Ptr{Cvoid}, Ptr{Cvoid}, Csize_t), s, a, nb)
    if nr != nb
        unlock(s.lock)
        throw(EOFError())
    end
    unlock(s.lock)
    return a
end

view this post on Zulip Fredrik Ekre (Feb 14 2021 at 20:31):

What example? In your example you get more bytes than bytesavailable says so I think that is fine.

view this post on Zulip Kwaku Oskin (Feb 14 2021 at 20:33):

So readavailable checks the number of available bytes, and if it is zero it fills the buffer and updates this number.

view this post on Zulip Kwaku Oskin (Feb 14 2021 at 20:33):

So, I think, the correct meaning of bytesavailable is not the number of bytes available in the iostream, but the number of bytes available in the intermediate buffer.

view this post on Zulip Kwaku Oskin (Feb 14 2021 at 20:34):

Purely technical thing.
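A rough sketch of that reading, assuming the buffer interpretation is right. buffer_demo is a hypothetical helper, not part of Base. It records bytesavailable on a freshly opened file, reads one byte, and records it again, so the two values can be compared. The second value depends on how the underlying buffer gets filled, so no particular number is claimed for it here.

```julia
# Hypothetical helper for illustration only.
function buffer_demo()
    path, io = mktemp()          # temporary file so nothing is left behind
    write(io, "what up")         # 7 bytes, as in the example earlier in the thread
    close(io)

    f = open(path)
    try
        before = bytesavailable(f)   # nothing has been read into the buffer yet
        first_byte = read(f, UInt8)  # a real read, which may pull data into the buffer
        after = bytesavailable(f)    # whatever is now buffered and unread
        rest = readavailable(f)      # the remaining bytes of this small file
        return (before = before, first_byte = first_byte, after = after, rest = rest)
    finally
        close(f)
        rm(path; force = true)
    end
end

r = buffer_demo()
println(r)
```

If bytesavailable really counts buffered bytes, r.before should be 0, matching the REPL session earlier in the thread, even though the file is not empty.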

view this post on Zulip Expanding Man (Feb 14 2021 at 20:34):

Yeah, I noticed that it's a direct ccall, kind of makes me suspect it's an intended thing, but I don't know why

view this post on Zulip Expanding Man (Feb 14 2021 at 20:35):

well, regardless, I find it all very confusing

view this post on Zulip Expanding Man (Feb 14 2021 at 20:35):

"number of bytes available before blocking" sounds very straightforward, and it's clearly not giving me that

view this post on Zulip Kwaku Oskin (Feb 14 2021 at 20:36):

It depends on how you define blocking

view this post on Zulip Kwaku Oskin (Feb 14 2021 at 20:36):

For me, blocking means that you temporarily "own" the file handle to read the next chunk of data from the file.

view this post on Zulip Expanding Man (Feb 14 2021 at 20:37):

well I did not do an @async or @spawn and it's not stopping the execution, so that seems to me like it's not blocking

view this post on Zulip Fredrik Ekre (Feb 14 2021 at 20:37):

Yea I am pretty sure that is what is meant.

view this post on Zulip Fredrik Ekre (Feb 14 2021 at 20:45):

What are you using it for? I thought I would have use for it the other day but in the end I only use read and write with async tasks.

view this post on Zulip Fredrik Ekre (Feb 14 2021 at 20:45):

https://github.com/JuliaLang/julia/issues/24526 has some review of this btw

view this post on Zulip Expanding Man (Feb 14 2021 at 20:46):

Ok, I just had a conversation on slack, I think our problem is just that we are conflating different types of blocking

view this post on Zulip Expanding Man (Feb 14 2021 at 20:46):

readavailable seems to normally only block on relatively fast stuff

view this post on Zulip Colin Caine (Feb 14 2021 at 21:47):

I feel like bytesavailable isn't what I want. I want to know if there are at least N bytes remaining in the file or stream, but I don't care how many are currently buffered. When N = 1, this is equivalent to eof(io), but I want to be able to pick other Ns.
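One possible way to get this for seekable streams, sketched under assumptions: bytes_remaining and has_at_least are hypothetical helpers, not Base functions. They only make sense for streams that support position, seek and seekend (regular files, IOBuffer), not for pipes or sockets. The answer can also go stale if something else writes to the file afterwards.

```julia
# Hypothetical helpers for illustration only; not part of Base.

# Number of bytes between the current position and the end of a seekable stream.
function bytes_remaining(io::IO)
    pos = position(io)
    try
        seekend(io)
        return position(io) - pos
    finally
        seek(io, pos)            # always restore the original position
    end
end

# True if at least n more bytes can be read from the current position.
has_at_least(io::IO, n::Integer) = bytes_remaining(io) >= n

io = IOBuffer("what up")                 # 7 bytes
println(has_at_least(io, 7))             # true
println(has_at_least(io, 8))             # false, so an 8-byte peek would hit EOF
```

With N = 1 this agrees with !eof(io) on such streams. Before calling peek(io, Int) the check would be has_at_least(io, sizeof(Int)).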


Last updated: Nov 22 2024 at 04:41 UTC