Stream: helpdesk (published)

Topic: Reading tsv file with multiple comment chars


Moorits Muru (Mar 24 2021 at 13:14):

I have some files that I want to read in. Most of them are well-formatted and either have no header/comments or have header/comment lines that all start with a leading '#' symbol. But a few of the files have comments marked with the '#' symbol and a header marked with '|' symbols. I don't need any of the headers, only the data. Example file:

# comment line 1
# comment line 2
# comment line 3
|first column|  second column|  third column|
0.12415         125125.222      123132512
1.8950          555124.1        910520

Is there a nice way to handle all of these files? Usually I use readdlm, but it only allows a single comment character. I am not familiar with CSV and could not find a way to handle these files well. There are, of course, skipstart and similar options, but I would prefer not to set them manually for each file.

Nils (Mar 24 2021 at 15:46):

Not sure about those headers (if you only need the data, I guess they don't matter either way). As for the comments, CSV.jl lets you pass the kwarg comment = "#", so those lines should get skipped automatically.
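
One possible way to cover both the '#' comments and the '|' header without setting skipstart per file is to filter the lines before handing them to readdlm. The following is only a minimal sketch, not something proposed in the thread: the helper name read_data and its markers keyword are hypothetical, and it assumes the data rows are whitespace-delimited and purely numeric, as in the example file above.

```julia
using DelimitedFiles

# Hypothetical helper (not from the thread): keep only the lines whose first
# non-blank character is not one of `markers`, then parse them with readdlm.
function read_data(path::AbstractString; markers = ('#', '|'))
    kept = String[]
    for line in eachline(path)
        s = lstrip(line)
        # Skip blank lines and any line starting with a marker character.
        if !isempty(s) && !(first(s) in markers)
            push!(kept, line)
        end
    end
    # Assumption: whitespace-delimited, all-numeric columns.
    return readdlm(IOBuffer(join(kept, '\n')), Float64)
end
```

Calling read_data("file.tsv") on a file shaped like the example should give a matrix with one row per data line. Files without any header or comments pass through the same function unchanged, since nothing is filtered out.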


Last updated: Oct 02 2023 at 04:34 UTC