TY - JOUR
T1 - chopBAI
T2 - BAM index reduction solves I/O bottlenecks in the joint analysis of large sequencing cohorts
AU - Kehr, Birte
AU - Melsted, Páll
N1 - Publisher Copyright: © 2016 The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: [email protected].
PY - 2016/7/15
Y1 - 2016/7/15
N2 - Advances in sequencing capacity have led to the generation of unprecedented amounts of genomic data. The processing of this data frequently leads to I/O bottlenecks, e.g., when analyzing a small genomic region across a large number of samples. The largest I/O burden is, however, often not imposed by the amount of data needed for the analysis but rather by the index files that help retrieve this data. We have developed chopBAI, a program that can chop a BAM index (BAI) file into small pieces. The program outputs a list of BAI files, each indexing a specified genomic interval. The output files are much smaller in size but maintain compatibility with existing software tools. We show how preprocessing BAI files with chopBAI can lead to a reduction of I/O by more than 95% during the analysis of 10 kb genomic regions, eventually enabling the joint analysis of more than 10 000 individuals. Availability and Implementation: The software is implemented in C++, GPL licensed and available at http://github.com/DecodeGenetics/chopBAI Contact:
AB - Advances in sequencing capacity have led to the generation of unprecedented amounts of genomic data. The processing of this data frequently leads to I/O bottlenecks, e.g., when analyzing a small genomic region across a large number of samples. The largest I/O burden is, however, often not imposed by the amount of data needed for the analysis but rather by the index files that help retrieve this data. We have developed chopBAI, a program that can chop a BAM index (BAI) file into small pieces. The program outputs a list of BAI files, each indexing a specified genomic interval. The output files are much smaller in size but maintain compatibility with existing software tools. We show how preprocessing BAI files with chopBAI can lead to a reduction of I/O by more than 95% during the analysis of 10 kb genomic regions, eventually enabling the joint analysis of more than 10 000 individuals. Availability and Implementation: The software is implemented in C++, GPL licensed and available at http://github.com/DecodeGenetics/chopBAI Contact:
UR - https://www.scopus.com/pages/publications/84992391290
U2 - 10.1093/bioinformatics/btw149
DO - 10.1093/bioinformatics/btw149
M3 - Article
C2 - 27153590
SN - 1367-4803
VL - 32
SP - 2202
EP - 2204
JO - Bioinformatics
JF - Bioinformatics
IS - 14
ER -