Stream: helpdesk (published)

Topic: reading a file with header with CSV.jl


Moorits Muru (May 28 2021 at 11:55):

I have a file like this:

# x y z  vmap wmap d_str  dx dy dz
    201.37400    193.90800    237.37600 0.34808 0.05243 0.95427  0.86271141 -0.41003201  0.29597765
    201.01700    193.58800    237.45800 0.22788 0.01797 0.97622  0.86271141 -0.41003201  0.29597765
    218.37300    191.85400    170.69900 0.00000 0.00000 0.00000  0.00000000  0.00000000  0.00000000
    208.55000    230.82100    194.24000 0.02981 0.00932 0.96908 -0.08724227 -0.91048766  0.40422890

What I would like to get is a DataFrame with column names taken from the header (x, y, z, vmap, ...). But CSV.jl is having a hard time figuring out the format of the file, and I can't find how to configure it manually.

For reference, DelimitedFiles works better for this file. The only problems are that the header has an extra token at the beginning ("#") and the data ends up with an empty last column, but both can be removed manually. Should I just go with DelimitedFiles, or is there a nice way to make it work with CSV.jl?
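The DelimitedFiles route described above can be sketched as follows. This is a minimal illustration, not code from the thread: `read_commented_table` is a hypothetical helper name, and it assumes the first line is the only header line and that every data row holds plain floating-point values. Reading the header separately and skipping it with `skipstart = 1` should avoid both the stray "#" token and the extra column that it causes.

```julia
using DelimitedFiles

# Hypothetical helper (not part of DelimitedFiles): returns the column names
# and a Float64 matrix for a file whose first line looks like "# x y z ...".
function read_commented_table(path::AbstractString)
    # Drop the leading '#' and spaces, then split on runs of whitespace.
    header_line = readline(path)
    colnames = String.(split(lstrip(header_line, ['#', ' '])))
    # Skip the header line; whitespace-separated values are readdlm's default.
    values = readdlm(path, Float64; skipstart = 1)
    return colnames, values
end

# Small demonstration on a temporary file with two rows copied from the post.
path, io = mktemp()
write(io, "# x y z  vmap wmap d_str  dx dy dz\n")
write(io, "    201.37400    193.90800    237.37600 0.34808 0.05243 0.95427  0.86271141 -0.41003201  0.29597765\n")
write(io, "    218.37300    191.85400    170.69900 0.00000 0.00000 0.00000  0.00000000  0.00000000  0.00000000\n")
close(io)

colnames, values = read_commented_table(path)
println(colnames)
println(size(values))
```

If DataFrames.jl is loaded, `DataFrame(values, colnames)` should turn the result into a data frame with the header's column names.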

Andrey Oskin (May 28 2021 at 13:07):

You can skip the first line (there is a special argument for that in CSV.jl), and it should figure out the rest.

Moorits Muru (May 28 2021 at 13:13):

Unfortunately, it then thinks that the delimiter is ' ' and fills the first 4 columns with missings, then has a value and again 3 missings, because that's how many spaces there are between the values. :confused:
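The pattern of missings can be reproduced with plain string splitting, without CSV.jl. The snippet below is only an illustration that uses the start of the first data line from the post: splitting on a single space keeps an empty field for every extra space, while splitting on runs of whitespace keeps only the values.

```julia
# Start of the first data line from the file, shortened to two fields.
line = "    201.37400    193.90800"

# Splitting on exactly one space: each extra space yields an empty field.
naive = split(line, ' ')

# Splitting on runs of whitespace (the default): only the real values remain.
fields = split(line)

println(naive)
println(fields)
```

Here `naive` starts with four empty fields, then the first value, then three more empty fields before the second value, which matches the 4 and 3 missings described above.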

Andrey Oskin (May 28 2021 at 13:16):

Ah, it's not CSV, in the sense that it is not comma-separated.

Ian Weaver (May 28 2021 at 15:47):

I would love to hear about this too! I have data formatted in a very similar way that I have to deal with, and the closest I have been able to get is by manually dropping the # character and doing something like:

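using CSV

# Assumes the leading "#" has already been removed from the header line.
CSV.File(
    "./test_data.txt",
    delim = ' ',            # columns are separated by spaces
    ignorerepeated = true,  # treat runs of spaces as a single delimiter
)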

but it would be fantastic if there were a way to parse the header with the comment character still included in the original file.
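One possible workaround that leaves the file untouched is to read the header line separately and hand the names to CSV.jl. This is a sketch under assumptions, not a feature of CSV.jl itself: `read_commented_csv` is a hypothetical helper name, the first line is assumed to be the only header line, and the `header` and `skipto` keywords are assumed to behave as documented for the installed CSV.jl version.

```julia
using CSV

# Hypothetical helper (not part of CSV.jl): parse "# x y z ..." style files
# without editing the file on disk.
function read_commented_csv(path::AbstractString)
    # Take the column names from the first line, minus the leading '#'.
    header_line = readline(path)
    colnames = Symbol.(split(lstrip(header_line, ['#', ' '])))
    return CSV.File(
        path;
        header = colnames,      # names supplied manually instead of parsed
        skipto = 2,             # data starts on the second line
        delim = ' ',            # columns are separated by spaces
        ignorerepeated = true,  # treat runs of spaces as a single delimiter
    )
end
```

With DataFrames.jl loaded, `DataFrame(read_commented_csv("./test_data.txt"))` should then give a data frame whose columns are named after the header tokens.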

Fredrik Ekre (May 28 2021 at 16:03):

Perhaps a feature request could be to allow header::Regex, which would capture the names, or something like that?

Ian Weaver (May 28 2021 at 17:31):

Sounds good, just opened! https://github.com/JuliaData/CSV.jl/issues/840


Last updated: Oct 02 2023 at 04:34 UTC