Now this is really really clever. Heap-merging a heavyweight genomics format, using RocksDB to speed it up.
There’s a problem with the single-pass merge described above when the number of intermediate files, N/R, is large. Merging the sorted intermediate files in limited memory requires constantly reading little bits from all those files, incurring a lot of disk seeks on rotating drives. In fact, at some point, samtools sort performance becomes effectively bound to disk seeking. […] In this scenario, samtools rocksort can sort the same data in much less time, using no more memory, by invoking RocksDB’s background compaction capabilities. With a few extra lines of code we configure RocksDB so that, while we’re still in the process of loading the BAM data, it runs additional background threads to merge batches of existing sorted temporary files into fewer, larger, sorted files. Just like the final merge, each background compaction requires only a modest amount of working memory.(via the RocksDB facebook group)
great presentation on Android mobile battery life, and what to avoid
‘This form is to document the testing that has been done on each app version before submitting to the App Store. For each item, indicate Yes if the testing has been done, Not Applicable if the testing does not apply (eg testing audio for an app that doesn’t play any), or No if the testing has not been done for another reason.’
paper by Peter M Fenwick, 1993. ‘A new method (the ‘binary indexed tree’) is presented for maintaining the cumulative frequencies which are needed to support dynamic arithmetic data compression. It is based on a decomposition of the cumulative frequencies into portions which parallel the binary representation of the index of the table element (or symbol). The operations to traverse the data structure are based on the binary coding of the index. In comparison with previous methods, the binary indexed tree is faster, using more compact data and simpler code. The access time for all operations is either constant or proportional to the logarithm of the table size. In conjunction with the compact data structure, this makes the new method particularly suitable for large symbol alphabets.’ via Jakob Buchgraber, who’s implementing it right now in Netty ;)
Links for 2014-05-02
permalink. Both comments and trackbacks are currently closed.. Bookmark the