Partially parallel problems · helpdesk (published)

I'm trying to find some resources on how to work on improving the parallel processing of an algorithm, but I don't know what terms I should be searching with. The specific algorithm I'm looking at is Needleman-Wunch, a classic in bioinformatics and an example of "dynamic programming" (though I don't actually know what that term means).

Basically, given an n x m matrix, one can calculate the value of position (i, j), as long as [(i-1, j-1), (i-1, j), (i, j-1)] are known (the first row and first column can all be calculated from the start).

I found this paper, but it seems to be doing some fancy tricks to optimize, I'm looking to just sort of queue up tasks as they become available, if that makes any sense. Or maybe this is just an incredibly naive way to think about it...

Patrick Toche (Jun 15 2021 at 06:43):

Dynamic programming is the main pillar of modern mathematical economics. There are at least 4 or 5 econ Nobel prize winners that I can think of whose work contributed to the foundations of dynamic programming (sadly Richard Bellman, the "father" of dynamic programming, did not get the prize). Thomas Sargent, of Julia and Nobel fame, has good intro chapters in most of his books, and there will be Julia-based lectures over at Quant Econ: https://julia.quantecon.org/dynamic_programming/index_grad.html

Being an economist I was surprised that I'd never heard of Needleman-Wunch, not even by name. I got curious and found these introductory pages: It seems that naive approaches run into memory issues pretty quickly. Sorry I can't help at all, but I'd love to learn more about this.

Kwaku Oskin (Jun 15 2021 at 07:52):

Kwaku Oskin (Jun 15 2021 at 07:54):

You probable should lock channel on read/write operations, which can be bottleneck, but other then that it should work smoothly.

Jakob Nybo Nissen (Jun 15 2021 at 11:38):

Takafumi Arakaki (tkf) (Jun 16 2021 at 00:18):

I agree with others that it might be better to look into existing implementation strategies. Maybe there are some domain specific optimizations you can do.

But, from just the description of the OP, it looks like what you want has this access pattern?

# "left top" means (1, 1) and "right bottom" means (n, m)
for i in left_top_to_right_bottom      # serialized
    for j in left_bottom_to_right_top  # parallelizable
        xs[i, j] = compute(xs[i-1, j-1], xs[i-1, j], xs[i, j-1])
    end
end

If so, it looks like you can "just" parallelize the inner loop? You can use @threads or @floop to do it.

Also, for this kind of "parallel loop(s) inside sequential loop" pattern, https://github.com/tkf/Barriers.jl may be an interesting tool to use. Though it's a bit lower-level than simple parallel for loops.

Syx Pek (Jun 16 2021 at 11:15):

Rasmus Henningsson (Jun 16 2021 at 14:55):

Takafumi Arakaki (tkf) (Jun 16 2021 at 20:38):

ah, yeah, iterating diagonally was what I meant by left top to right bottom etc. but using i and j there was incorrect

Kevin Bonham (Jun 16 2021 at 23:44):

Cool, thanks for the responses everyone, sorry it took me a while to get back to it!

@Patrick Toche Very interesting, thanks. I still don't know what dynamic programming means, but sounds like it has a rich history! :laughter_tears: . I will get to your reading list, thanks! Happy to tell you more about the bioinformatics part if you like.

@Jakob Nybo Nissen I am, because it's simpler than Smith-Waterman and I'm doing it as a learning exercise (for myself and my students). Ultimately I'm interested in figuring out this paper, which is supposedly based on NW, and possibly trying to write it in julia - the tool is supposed to be amazing, but basically only the grad student who wrote it knows how to use it, and he jumped to industry. The impression he gave (though to be frank it was a bit tough to understand him) was that those optimizations and shortcuts detract from the "provably optimal" nature of NW... whether that's actually significant, I don't really know.

@Takafumi Arakaki (tkf) Cool - thanks! If I understand what you mean by iterating diagonally, I think that's right, but I'll need to spend more time wrapping my head around it.

Patrick Toche (Jun 17 2021 at 03:05):

Always interested to read about this! Had a quick look at the BURST github page: looks like quite a bit of work went into it!

Kevin Bonham (Jun 17 2021 at 03:26):

Yeah, I met the guy that wrote it - he's pretty clearly a genius. But one of those whose genius does not extend to explaining themselves well, nor apparently to documenting a tool that's meant to be used by others. :shrug:

Brenhin Keller (Jun 17 2021 at 04:21):

https://github.com/knights-lab/BURST looks reasonably short at least.. I wonder how direct of a C-to-Julia port you could get away with? Julia written in C-like style is usually not too bad performance wise.. but I guess the pointer stuff including all the -> s would be a bit tricky to naively translate and still get the same performance as C given that there apparently isn't quite a proper "dereference" in Julia since according to https://giordano.github.io/blog/2019-05-03-julia-get-pointer-value/ even unsafe_load makes a copy

Brenhin Keller (Jun 17 2021 at 04:41):

…but the pragma omp stuff could be pretty trivially replaced with the built-in multithreading, and the preprocessor macro stuff seems like it ought to be no trouble by Julia metaprogramming standards?

Takafumi Arakaki (tkf) (Jun 17 2021 at 04:56):

atomic increment of an element in an array like this is a bit tricky ATM (it's doable but you'd need to write LLVM IR by hand)

Keith Rutkowski (Jun 17 2021 at 10:00):

CBinding.jl provides a decent solution for C's -> style dereferencing in Julia without making copies.

Stream: helpdesk (published)

Topic: Partially parallel problems

Kevin Bonham (Jun 15 2021 at 00:07):