TY - JOUR
T1 - Independent component analysis recovers consistent regulatory signals from disparate datasets
AU - Sastry, Anand V.
AU - Hu, Alyssa
AU - Heckmann, David
AU - Poudel, Saugat
AU - Kavvas, Erol
AU - Palsson, Bernhard O.
N1 - Funding Information: AVS, AH, DH, SP, EK, and BOP were funded by the Novo Nordisk Foundation Center for Biosustainability and the Technical University of Denmark (grant number NNF10CC1016517). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Publisher Copyright: © 2021 Sastry et al.
PY - 2021/2/2
Y1 - 2021/2/2
N2 - The availability of bacterial transcriptomes has dramatically increased in recent years. This data deluge could result in detailed inference of underlying regulatory networks, but the diversity of experimental platforms and protocols introduces critical biases that could hinder scalable analysis of existing data. Here, we show that the underlying structure of the E. coli transcriptome, as determined by Independent Component Analysis (ICA), is conserved across multiple independent datasets, including both RNA-seq and microarray datasets. We subsequently combined five transcriptomics datasets into a large compendium containing over 800 expression profiles and discovered that its underlying ICA-based structure was still comparable to that of the individual datasets. With this understanding, we expanded our analysis to over 3,000 E. coli expression profiles and predicted three high-impact regulons that respond to oxidative stress, anaerobiosis, and antibiotic treatment. ICA thus enables deep analysis of disparate data to uncover new insights that were not visible in the individual datasets.
AB - The availability of bacterial transcriptomes has dramatically increased in recent years. This data deluge could result in detailed inference of underlying regulatory networks, but the diversity of experimental platforms and protocols introduces critical biases that could hinder scalable analysis of existing data. Here, we show that the underlying structure of the E. coli transcriptome, as determined by Independent Component Analysis (ICA), is conserved across multiple independent datasets, including both RNA-seq and microarray datasets. We subsequently combined five transcriptomics datasets into a large compendium containing over 800 expression profiles and discovered that its underlying ICA-based structure was still comparable to that of the individual datasets. With this understanding, we expanded our analysis to over 3,000 E. coli expression profiles and predicted three high-impact regulons that respond to oxidative stress, anaerobiosis, and antibiotic treatment. ICA thus enables deep analysis of disparate data to uncover new insights that were not visible in the individual datasets.
UR - https://www.scopus.com/pages/publications/85101332814
U2 - 10.1371/JOURNAL.PCBI.1008647
DO - 10.1371/JOURNAL.PCBI.1008647
M3 - Article
C2 - 33529205
SN - 1553-734X
VL - 17
SP - e1008647
JO - PLoS Computational Biology
JF - PLoS Computational Biology
IS - 2
M1 - e1008647
ER -