Truncate string to X characters · helpdesk (published)

Fredrik Ekre (Sep 15 2021 at 11:56):

str = "some long string possibly with Unicode characters"
trunc_str = f(str)
@assert textwidth(trunc_str) == X

Sebastian Pfitzner (Sep 15 2021 at 13:18):

Kwaku Oskin (Sep 15 2021 at 13:31):

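function widthtrunc(str, N)
    i = firstindex(str)
    stop = 0   # index of the last character that still fits in N columns
    w1 = 0
    while i <= ncodeunits(str)
        w1 += textwidth(str[i])
        w1 > N && break        # this character would exceed the budget
        stop = i
        i = nextind(str, i)    # step to the next valid character index
    end

    return str[1:stop]
end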

julia> widthtrunc("h⛵αβ", 4)
"h⛵α"
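
The loop above steps with nextind rather than i += 1 because String indices in Julia are byte offsets into the UTF-8 data, not character counts. A small sketch on the same example string shows which indices are valid:

```julia
s = "h⛵αβ"

# Valid character indices are byte offsets: 'h' is 1 byte, '⛵' 3 bytes, 'α' and 'β' 2 bytes each.
idx = collect(eachindex(s))   # [1, 2, 5, 7]

println(idx)
println(ncodeunits(s))        # 8 bytes in total
println(nextind(s, 2))        # 5, the index of 'α'
println(isvalid(s, 3))        # false, index 3 is in the middle of '⛵'
println(s[1:5])               # slicing up to the start index of 'α' gives "h⛵α"
```

Indexing at 3 or 4 would throw a StringIndexError, which is why the truncation point has to be tracked as a valid index.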

Keith Rutkowski (Sep 15 2021 at 14:18):

I recently did something like this using chop(str, tail = length(str) - N) (maybe a -1 is needed too, I can't recall).

Fredrik Ekre (Sep 15 2021 at 14:53):

Thanks. A loop is of course one option; I was just wondering whether there was something more convenient. I will suggest this:

function shorten_str(str, max_len)
    io = IOBuffer(; sizehint=max_len)
    len = 0
    for c in str
        len += textwidth(c)
        if len > max_len
            break
        end
        print(io, c)
    end
    return String(take!(io))
end

Sebastian Pfitzner (Sep 15 2021 at 14:55):

Fredrik Ekre (Sep 15 2021 at 14:59):

True, I had that version first but then I thought I needed i for something, but obviously not in this final version. Edited.

Kwaku Oskin (Sep 15 2021 at 15:07):

julia> @btime widthtrunc("h⛵αβ", 4)
  63.938 ns (1 allocation: 32 bytes)
"h⛵α"

julia> @btime shorten_str("h⛵αβ", 4)
  161.731 ns (5 allocations: 256 bytes)
"h⛵α"

Of course, it's probably not a bottleneck function, so it doesn't really matter which one you choose.

Fredrik Ekre (Sep 15 2021 at 15:25):

function shorten_str(str, max_len)
    len = 0
    ind = 0  # 0, so an empty string or a too-wide first character yields ""
    for i in eachindex(str)
        c = @inbounds str[i]
        len += textwidth(c)
        len > max_len && break
        ind = i
    end
    return str[1:ind]
end

Keith Rutkowski (Sep 15 2021 at 17:40):

Jakob Nybo Nissen (Sep 15 2021 at 17:45):

Sebastian Pfitzner (Sep 15 2021 at 18:07):

Also, doesn't chop allocate a new string as well? If you actually want a non-allocating solution then there's always

function shorten_str(str, max_len)
    len = 0
    ind = 0  # 0, so an empty string or a too-wide first character yields ""
    for i in eachindex(str)
        c = @inbounds str[i]
        len += textwidth(c)
        len > max_len && break
        ind = i
    end
    return SubString(str, 1:ind)
end
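
To illustrate what returning a SubString means for the caller, here is a minimal sketch. The name shorten_view is hypothetical, and starting the index at 0 is an assumption made here so that the empty string is handled.

```julia
# Hypothetical helper: same loop as above, returning a view into the parent string.
function shorten_view(str, max_len)
    len = 0
    ind = 0                        # assumption: 0 means "no character fits"
    for i in eachindex(str)
        len += textwidth(str[i])
        len > max_len && break
        ind = i
    end
    return SubString(str, 1, ind)  # empty SubString when ind == 0
end

s = "h⛵αβ"
v = shorten_view(s, 4)

println(v)                 # h⛵α
println(typeof(v))         # SubString{String}, shares memory with s
println(typeof(String(v))) # String, an independent copy
```

The view keeps the parent string alive, so converting with String(v) may be preferable if only the short prefix needs to be stored.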

Sebastian Pfitzner (Sep 15 2021 at 18:10):

And if you don't care about textwidth then you should probably be using first(str, max_len) anyways
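
A short sketch of what first(str, n) does on the example string from this thread: it counts characters, not display columns, and returns the whole string when it has fewer than n characters.

```julia
s = "h⛵αβ"

p = first(s, 3)        # the first three characters

println(p)             # h⛵α
println(length(p))     # 3 characters
println(textwidth(p))  # display width, which can be larger than 3
println(first(s, 10))  # the whole string, since it has only 4 characters
```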

Keith Rutkowski (Sep 15 2021 at 18:38):

chop does respect textwidth, at least the results are the same as shorten_str, and it seems to be faster than the SubString approach too.

julia> shorten(str, max_len) = chop(str, tail = length(str)-(max_len-1))
shorten (generic function with 1 method)

julia> @btime shorten_str("h⛵αβ", 4)
  70.533 ns (0 allocations: 0 bytes)
"h⛵α"

julia> @btime shorten("h⛵αβ", 4)
  26.924 ns (0 allocations: 0 bytes)
"h⛵α"

Sebastian Pfitzner (Sep 15 2021 at 18:41):

sure, but that's because that example string doesn't have any chars that are wider than 1 column

Kwaku Oskin (Sep 15 2021 at 18:45):

Sebastian Pfitzner (Sep 15 2021 at 18:46):

Kwaku Oskin (Sep 15 2021 at 18:46):

julia> textwidth('⛵')
2

Sebastian Pfitzner (Sep 15 2021 at 18:51):

julia> shorten("aaaaa", 3)
"aa"

Sebastian Pfitzner (Sep 15 2021 at 18:51):

Kwaku Oskin (Sep 15 2021 at 18:53):

Keith Rutkowski (Sep 15 2021 at 18:53):

julia> chop("h⛵αβ", head=1, tail = 0)
"⛵αβ"

julia> chop("h⛵αβ", head=2, tail = 0)
"αβ"

julia> chop("h⛵αβ", head=3, tail = 0)
"β"

julia> chop("h⛵αβ", head=4, tail = 0)
""

Keith Rutkowski (Sep 15 2021 at 18:57):

Oh, I see the problem... the example used 4 as the max length, when it should have been 3?

julia> shorten(str, max_len) = chop(str, tail = length(str)-max_len)
shorten (generic function with 1 method)

julia> shorten("aaaaa", 3)
"aaa"

julia> shorten("h⛵αβ", 3)
"h⛵α"

Sebastian Pfitzner (Sep 15 2021 at 18:58):

julia> textwidth("h⛵α")
4
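
The mismatch can be made explicit by comparing a width-based cut with the character-based chop cut. This is only a sketch; the variable names are made up for illustration, and the widths are whatever textwidth reports on the running Julia version.

```julia
s = "h⛵αβ"

# Cumulative display width after each character.
widths = cumsum(textwidth.(collect(s)))

# Number of leading characters that fit into 3 columns.
nfit = count(<=(3), widths)

by_width = first(s, nfit)                  # width-based cut
by_chars = chop(s, tail = length(s) - 3)   # character-based cut, as in shorten

println(by_width, " => ", textwidth(by_width))
println(by_chars, " => ", textwidth(by_chars))
```

With '⛵' taking two columns, the chop version keeps three characters and ends up four columns wide, while the width-based cut stops after "h⛵".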
