I have a file like this:
# x y z vmap wmap d_str dx dy dz
201.37400 193.90800 237.37600 0.34808 0.05243 0.95427 0.86271141 -0.41003201 0.29597765
201.01700 193.58800 237.45800 0.22788 0.01797 0.97622 0.86271141 -0.41003201 0.29597765
218.37300 191.85400 170.69900 0.00000 0.00000 0.00000 0.00000000 0.00000000 0.00000000
208.55000 230.82100 194.24000 0.02981 0.00932 0.96908 -0.08724227 -0.91048766 0.40422890
What I would like to get is a dataframe with column names taken from the header (x, y, z, vmap...). But CSV is having a hard time figuring out the format of the file, and I can't seem to find how to configure it manually.
For reference, DelimitedFiles works better for this file. The only problems are that the header has an extra token at the beginning ("#") and the data has an empty last column, but both can be removed manually. Should I just go with DelimitedFiles, or is there a nice way to make it work with CSV?
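For illustration, the DelimitedFiles route described above could look roughly like the sketch below. It assumes the header is the first line and starts with "#"; the temporary file, the variable names, and the column slicing used to drop a possible trailing empty column are illustrative choices, not something taken from the original file.

```julia
using DelimitedFiles, DataFrames

# Sample rows copied from the question, written to a temporary file
# so that the sketch is self-contained.
path = tempname()
write(path, """
# x y z vmap wmap d_str dx dy dz
201.37400 193.90800 237.37600 0.34808 0.05243 0.95427 0.86271141 -0.41003201 0.29597765
201.01700 193.58800 237.45800 0.22788 0.01797 0.97622 0.86271141 -0.41003201 0.29597765
""")

# Read the header line by hand: strip the leading '#', then split on whitespace.
colnames = String.(split(lstrip(readline(path), '#')))

# readdlm splits on whitespace by default; skipstart = 1 skips the header line.
raw = readdlm(path; skipstart = 1)

# Keep only as many columns as there are names (drops a trailing empty column, if any).
data = Float64.(raw[:, 1:length(colnames)])

df = DataFrame(data, colnames)
```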
You can skip the first line (there is a special argument for that in CSV.jl), and it should figure out the rest.
Unfortunately, it then thinks that the delimiter is ' ' and fills the first 4 columns with missings, then has a value, and then 3 more missings, because that's how many whitespace characters there are between the values. :confused:
Ah, it's not CSV, in the sense that it is not comma-separated.
I would love to hear about this too! I have data formatted in a very similar way that I have to deal with, and the closest I have been able to get is by manually dropping the # character and doing something like:
CSV.File(
"./test_data.txt",
delim = ' ',
ignorerepeated = true,
)
but it would be fantastic if there were a way to parse the header with the comment character still included in the original file.
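Until something like that exists, one possible workaround is to read the header line separately and hand the names to CSV.jl, so the file itself does not need editing. This is only a sketch: it relies on my understanding that `header` accepts a vector of column names and that `skipto` sets the first data row, and the temporary file and variable names are illustrative.

```julia
using CSV, DataFrames

# Sample rows copied from the question, written to a temporary file
# so that the sketch is self-contained.
path = tempname()
write(path, """
# x y z vmap wmap d_str dx dy dz
201.37400 193.90800 237.37600 0.34808 0.05243 0.95427 0.86271141 -0.41003201 0.29597765
201.01700 193.58800 237.45800 0.22788 0.01797 0.97622 0.86271141 -0.41003201 0.29597765
""")

# Read the header line by hand: strip the leading '#', then split on whitespace.
colnames = String.(split(lstrip(readline(path), '#')))

df = CSV.read(path, DataFrame;
    header = colnames,      # supply the names instead of parsing a header row
    skipto = 2,             # data starts on the second line of the file
    delim = ' ',
    ignorerepeated = true,  # treat runs of spaces as a single delimiter
)
```

The same keyword arguments should work with `CSV.File` if a DataFrame is not needed.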
Perhaps a feature request could be to enable header::Regex that captures the names or something?
Sounds good, just opened! https://github.com/JuliaData/CSV.jl/issues/840
Last updated: Dec 28 2024 at 04:38 UTC