seminaires:algocomp:db:62

Partitioning sequences and trees are classical load balancing problems that received considerable attention in the 80ies and 90ies. For both problems exact algorithms with different characteristics exist. However, in the context of massive data sets, these algorithms fail since they assume random access to the input, an assumption that can hardly be granted. The key motivation of this work is the partitioning of current XML databases, some of which reach many terrabytes in size. In an XML database, data is organized in a tree structure.

In this talk, I will present streaming algorithms for both problems. The presented algorithms require a random access memory whose size is only logarithmic in the size of the input, which makes them good candidates for performing well in practice. This work will be presented next week at ICDT 2016, the 19th International Conference on Database Theory.