@Kevin Bonham interesting paper, that. I never heard about BURST before. Do you know of any papers that measure how much better BURST is than other (k-mer based) aligners in e.g. a metagenomic setting or at various levels of nucleotide identity? The BURST paper itself skips _very_ lightly over the methodology.
I mean, if you have a 250 bp read with 80% nt identity, there would be only a 6.7e-5 probability of no length-12 k-mer matching exactly if the mismatches were randomly distributed. And in bacteria, my experience is that there are usually some more conserved areas around the interesting parts of the genome anyway.
(That being said, for some applications, if I could get even 10% of the reads to align more accurately, I would happily pay 10x the compute for that)
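A back-of-the-envelope figure like the k-mer estimate above can be checked with a small dynamic program. The sketch below assumes each base matches independently with probability equal to the nucleotide identity. That is only one way to model "randomly distributed" mismatches, so it will not necessarily reproduce the number quoted above. The function name `prob_no_exact_kmer` is made up for illustration.

```python
def prob_no_exact_kmer(read_len: int, k: int, identity: float) -> float:
    """Probability that a read has no run of k consecutive matching bases.

    Assumption (not from the thread): every base matches independently
    with probability `identity`.
    """
    if k <= 0 or read_len < 0:
        raise ValueError("k must be positive and read_len non-negative")
    if not 0.0 <= identity <= 1.0:
        raise ValueError("identity must be between 0 and 1")

    # state[j] = probability that the current run of matches has length j
    # (j < k) and that no run of length k has been seen so far.
    state = [0.0] * k
    state[0] = 1.0
    for _ in range(read_len):
        new_state = [0.0] * k
        # A mismatch resets the current run to length 0.
        new_state[0] = sum(state) * (1.0 - identity)
        # A match extends the run by one base. Mass leaving state[k - 1]
        # would complete an exact k-mer, so it is dropped.
        for j in range(k - 1):
            new_state[j + 1] = state[j] * identity
        state = new_state
    return sum(state)


if __name__ == "__main__":
    # The scenario from the message: 250 bp read, 12-mers, 80% identity.
    print(prob_no_exact_kmer(250, 12, 0.8))
```

Changing `identity` or `k` shows how quickly exact-seed matching degrades as reads diverge from the reference, which is the regime where a non-k-mer aligner would have to prove its worth.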
@Jakob Nybo Nissen I have not seen anyone use it in the wild. At a conference they presented some really impressive benchmarks - the context was shallow shotgun gut metagenomes, and the claim was that BURST allowed them to get remarkably good taxonomic assignment with like 20% of the number of reads.
Which would be huge, if true. But the last time I looked at the paper and the documentation, it was not obvious to me what steps to take to reproduce that.
The reason I'm somewhat skeptical is that, if it was just a matter of more compute, people would have run Needleman-Wunsch on reads already. I don't know how much effort and money it takes to obtain, say, 200 gut metagenome NGS runs, but I'm betting it's a lot. Spending tens of thousands of dollars on that, and then not wanting to pay for a server to sit and run N-W for a few days would be weird.
I know Dan Knights (the PI) has a company (or consults with one maybe) that does analysis of microbiome metagenomes in a commercial setting, and I think they're using BURST for that, but I'm not sure
Well yeah, I mean that's what they were hyping - a revolution that gets you the precision of NW at drastically higher speeds. But it's tough to evaluate that claim if I can't run it :shrug:
Last updated: Nov 22 2024 at 04:41 UTC