This project was created as an assignment for the Implementation of Distributed Systems and Scientific Writing courses during my semester at KTH.
The aim of this project was to determine how feasible it is to distribute the alignment of genome sequences across several machines, in a field that is still largely dominated by single-machine, multi-core sequence aligners. To that end, we evaluated the performance of 5 different aligners on the same input data, focusing on alignment duration and accuracy. The aligners considered were:
- Centralized (non-distributed)
Based on our results, we concluded that distributing the alignment process is feasible. We obtained more than a 3x speedup on a relatively small cluster of 5 nodes, with a negligible impact on accuracy. We also noticed that different aligners optimize for different goals, with some favouring accuracy and others favouring speed. Finally, we noticed that most currently implemented aligners rely on older algorithms, and that newer, more sophisticated ones offer good opportunities for further speedup.
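
To make the speedup figure concrete, the following is a minimal sketch of how speedup and parallel efficiency can be computed from two alignment durations. The function names and the sample timings are illustrative assumptions, not code or measurements from this project; only the 5-node cluster size and the roughly 3x speedup come from the summary above.

```python
def speedup(baseline_seconds: float, distributed_seconds: float) -> float:
    """Ratio of the single-machine duration to the distributed duration."""
    if baseline_seconds <= 0 or distributed_seconds <= 0:
        raise ValueError("durations must be positive")
    return baseline_seconds / distributed_seconds


def parallel_efficiency(speedup_factor: float, nodes: int) -> float:
    """Fraction of the ideal (linear) speedup achieved on `nodes` machines."""
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    return speedup_factor / nodes


if __name__ == "__main__":
    # Hypothetical durations chosen only to illustrate a 3x speedup;
    # they are NOT results from the evaluation.
    s = speedup(baseline_seconds=600.0, distributed_seconds=200.0)
    e = parallel_efficiency(s, nodes=5)
    print(f"speedup: {s:.2f}x, efficiency: {e:.0%}")
```

Under these assumptions, a 3x speedup on 5 nodes corresponds to a parallel efficiency of about 60%, which is one way to compare a distributed aligner against ideal linear scaling.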