Vasilis Ntranos

  • Postdoctoral Scholar in Biology and Biological Engineering @ Caltech
    Visiting Postdoc, Electrical Engineering @ Stanford

    Hello! I’m a postdoctoral researcher time-sharing between Caltech, Stanford and UC Berkeley working with Prof. Lior Pachter and Prof. David Tse on computational and statistical problems that arise in biology. My background is in information theory and I'm particularly interested in challenges that emerge from single-cell RNA-seq technologies. Before moving to Berkeley, I received my PhD in Electrical Engineering from USC, under the supervision of Prof. Giuseppe Caire. My PhD research focused on information theory problems with applications in distributed systems, cooperative communication and large-scale wireless networks.

    download cv

    Recent publications:

    M. Zhang, V. Ntranos, D. Tse, "One read per cell per gene is optimal for single-cell RNA-seq," under review in Nature Communications, preprint available on biorxiv

    V. Ntranos, L. Yi, P. Melsted, L. Pachter, "A discriminative learning approach to differential expression analysis for single-cell RNA-seq," Nature Methods, 2019

    A. Ntranos, V. Ntranos, V. Bonnefil, J. Liu, S. Kim-Schulze, Y. He, Y. Zhu, R. Brandstadter, C. T. Watson, A. J. Sharp, I. Katz, P. Casaccia, “Fumarates target the metabolic-epigenetic interplay in brain-homing T cells of multiple sclerosis patients,” Brain (editor's choice, cover), 2019.

    News.

    • Oct 3, 2018

      Invited talk, Allerton Conference on Communication, Control, and Computing.

      Presenting our recent results on "Optimal sequencing-budget allocation for single-cell RNA-seq." Joint work with Martin Zhang and David Tse. [link]

    • July 23, 2018

      2018 Computational Genomics Summer Institute, UCLA

      Organizing a week-long journal club on "Deep learning applications in genomics" as a part of CGSI's long course program. See here for more info.

    • April 9, 2018

      Invited participant, Human Cell Atlas jamboree meeting, Broad Institute, Boston

      I'm attending the second annual HCA data analysis/coding jamboree held at the Broad Institute in Boston, representing the Pachter Lab.

    • Mar 22, 2018

      BRAIN Initiative Cell Census Network (BICCN) - Brain Cell Data Center workshop at Allen Institute, Seattle

    Research.

    • I'm interested in exploring connections between information theory, genomics and machine learning — with the ultimate goal of developing methods to facilitate biomedical discovery and understand biology from a data science perspective. My postdoctoral research revolves around algorithmic and statistical challenges that arise in computational biology, with a particular focus on problems related to single-cell RNA sequencing. One line of work I'm currently interested in is the design of provably efficient and statistically robust methods for analyzing large-scale genomic datasets. During my postdoc I have worked on developing methods for single-cell RNA-seq experimental design and estimation, scalable clustering and computation, and isoform-level gene differential expression.

    single-cell RNA-seq experimental design

    One read per cell per gene is optimal for single-cell RNA-seq.

    under review in Nature Communications, 2018 (joint work with Martin Zhang* and David Tse)

    An underlying question for virtually all single-cell RNA sequencing experiments is how to allocate the limited sequencing budget: deep sequencing of a few cells or shallow sequencing of many cells? A mathematical framework reveals that, for estimating many important gene properties, the optimal allocation is to sequence at the depth of one read per cell per gene. Interestingly, our analysis shows that the corresponding optimal estimator is not the widely-used plug-in estimator but one developed via empirical Bayes.

    Read the article
    isoform level differential expression analysis

    A discriminative learning approach to differential expression analysis for scRNA-seq.

    Nature Methods, 2019. (joint work with Lynn Yi*, Páll Melsted and Lior Pachter)

    Single-cell RNA-seq makes it possible to characterize the transcriptomes of cell types across different conditions and to identify their transcriptional signatures via differential analysis. We present a fast and accurate method for differential expression that takes advantage of the large numbers of cells that are assayed in order to detect changes in transcript dynamics as well as changes in overall gene abundance. When applied to transcript compatibility counts obtained via pseudoalignment, our approach provides a quantification-free analysis of 3’ single-cell RNA-seq that can identify previously undetectable marker genes.

    Read the article
    Meddit

    Medoids in almost linear time via multi-armed bandits.

    AISTATS, 2018. (joint work with Vivek Bagaria*, Govinda Kamath*, Martin Zhang* and David Tse)

    Computing the medoid of a large number of points in high-dimensional space is an increasingly common operation in many data science problems. We present an algorithm, Med-dit, that computes the medoid with high probability using O(n log n) distance evaluations. Med-dit is based on a connection with the Multi-Armed Bandit problem. We evaluate the performance of Med-dit empirically on the Netflix-prize and single-cell RNA-seq datasets, containing hundreds of thousands of points living in tens of thousands of dimensions, and observe a 5-10x improvement in performance over the current state of the art.

    Read the article
    transcript compatibility counts

    Fast and accurate single-cell RNA-seq analysis by clustering of transcript-compatibility counts.

    Genome Biology, vol.17, no.1, 2016. (joint work with Govinda Kamath*, Jesse Zhang*, Lior Pachter and David Tse)

    Current approaches to single-cell transcriptomic analysis are computationally intensive and require assay-specific modeling, which limits their scope and generality. We propose a novel method that compares and clusters cells based on their transcript-compatibility read counts rather than on the transcript or gene quantifications used in standard analysis pipelines. In the reanalysis of two landmark yet disparate single-cell RNA-seq datasets, we show that our method is up to two orders of magnitude faster than previous approaches, provides accurate and in some cases improved results, and is directly applicable to data from a wide variety of assays.
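
    As a rough illustration of comparing cells directly on their transcript-compatibility counts, the sketch below normalizes each cell's counts into a distribution and measures the Jensen-Shannon divergence between two cells. The choice of divergence, the function names and the dictionary layout are assumptions made for this example; it is not the pipeline from the paper. A pairwise matrix of such values could then be passed to any standard clustering routine.

```python
import math


def normalize(counts):
    """Turn a {equivalence_class: count} dict into a probability distribution."""
    total = sum(counts.values())
    return {key: value / total for key, value in counts.items()}


def jensen_shannon(counts_a, counts_b):
    """Jensen-Shannon divergence (base 2, so in [0, 1]) between two TCC profiles."""
    p, q = normalize(counts_a), normalize(counts_b)
    divergence = 0.0
    for key in set(p) | set(q):
        pk, qk = p.get(key, 0.0), q.get(key, 0.0)
        mk = 0.5 * (pk + qk)  # mixture of the two distributions
        if pk > 0:
            divergence += 0.5 * pk * math.log2(pk / mk)
        if qk > 0:
            divergence += 0.5 * qk * math.log2(qk / mk)
    return divergence


# Hypothetical cells: keys stand for equivalence classes (sets of transcripts).
cell_1 = {"ec_a": 3, "ec_b": 1}
cell_2 = {"ec_a": 6, "ec_b": 2}   # same proportions as cell_1, deeper sequencing
cell_3 = {"ec_c": 5}              # no equivalence class shared with cell_1
print(jensen_shannon(cell_1, cell_2), jensen_shannon(cell_1, cell_3))
```

    Because the counts are normalized first, two cells with the same proportions but different depths are treated as identical, while cells with no shared equivalence classes sit at the maximum distance.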

    Read the article

    Publications.

    • V. Ntranos*, L. Yi*, P. Melsted, L. Pachter, “A discriminative learning approach to differential expression analysis for single-cell RNA-seq,” Nature Methods, 2019.

    • P. Melsted, V. Ntranos, L. Pachter, “The Barcode, UMI, Set format and BUStools,” under review in Bioinformatics, available on biorxiv (doi.org/10.1101/472571) Dec 2018.

    • M. Zhang*, V. Ntranos*, D. N. Tse, “One read per cell per gene is optimal for single-cell RNA-seq,” under review in Nature Communications, available on biorxiv (doi.org/10.1101/389296) Aug 2018.

    • A. Ntranos, V. Ntranos, V. Bonnefil, J. Liu, S. Kim-Schulze, Y. He, Y. Zhu, R. Brandstadter, C. T. Watson, A. J. Sharp, I. Katz, P. Casaccia, “Fumarates target the metabolic-epigenetic interplay in brain-homing T cells of multiple sclerosis patients,” Brain (editor's choice, cover), 2019.

    • P. J. Thompson, A. Shah, V. Ntranos, F. Van Gool, M. Atkinson, A. Bhushan, “Targeted elimination of senescent beta cells prevents Type 1 Diabetes,” Cell Metabolism, 2019.

    • V. Bagaria*, G. Kamath*, V. Ntranos*, M. Zhang*, D. N. Tse, “Medoids in almost linear time via multi-armed bandits,” International Conference on Artificial Intelligence and Statistics (AISTATS), Canary Islands, 2018.

    • S. McCurdy, V. Ntranos, L. Pachter, “Deterministic column subset selection for single-cell RNA-seq,” PLoS ONE, 2019.

    • V. Ntranos*, G. M. Kamath*, J. Zhang*, L. Pachter, D. N. Tse, “Fast and accurate single-cell RNA-seq analysis by clustering of transcript-compatibility counts,” Genome Biology, vol.17, no.1, pp.1-14, doi:10.1186/s13059-016-0970-8, May 2016.

    • F. Rosas, V. Ntranos, C. J. Ellison, S. Pollin, M. Verhelst, “Understanding Interdependency Through Complex Information Sharing,” Entropy, vol.18, no.2, 38, doi:10.3390/e18020038, January 2016.

    • B. Nazer, V. R. Cadambe, V. Ntranos, G. Caire, “Expanding the Compute-and-Forward Framework: Unequal Powers, Signal Levels, and Multiple Linear Combinations,” IEEE Trans. on Information Theory, vol.62, no.9, Sept 2016.

    • N. Armenatzoglou, H. Pham, V. Ntranos, D. Papadias, C. Shahabi, “Real-Time Multi-Criteria Social Graph Partitioning: A Game Theoretic Approach,” ACM International Conference on Management of Data (SIGMOD ’15), Melbourne, 2015.

    • V. Ntranos, M. A. Maddah-Ali, G. Caire, “Cooperation Alignment for Distributed Interference Management,” IEEE International Symposium on Information Theory (ISIT), Hong Kong, 2015.

    • V. Ntranos, M. A. Maddah-Ali, G. Caire, “Cellular Interference Alignment: Omni-directional antennas and Asymmetric Configurations,” IEEE Transactions on Information Theory, vol. 61, no. 12, pp. 6663-6679, Dec. 2015.

    • F. Rosas, V. Ntranos, C. J. Ellison, M. Verhelst, and S. Pollin, “Understanding high-order correlations using a synergy-based decomposition of the total entropy,” IEEE Symposium on Information Theory in the Benelux, Brussels, 2015.

    • V. Ntranos, M. A. Maddah-Ali, G. Caire, “Cellular Interference Alignment,” IEEE Transactions on Information Theory, vol.61, no.3, pp.1194-1217, March 2015.

    • . . .

    * equal contribution, co-first authors


    download cv / full list of publications

    Software.

    scRNA-Seq TCC-prep

    single-cell RNA-Seq workflow to generate transcript compatibility count (TCC) matrices from 10X Chromium 3' data.

    An end-to-end workflow to generate transcript compatibility count (TCC) matrices from 10X Chromium 3' single-cell RNA-Seq data. Included is error-correction of barcodes, collapsing of UMIs and pseudoalignment of reads to a transcriptome to obtain transcript compatibility counts. The scripts utilize kallisto for pseudoalignment. The output is a matrix that specifies, for each cell, a list of transcript sets with associated counts. Those counts, called transcript compatibility counts, were introduced in Ntranos et al. 2016 as the starting point for downstream analysis of the data.
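
    The shape of the output can be pictured with a small sketch. It assumes each read has already been reduced to a (corrected barcode, UMI, set of compatible transcripts) record; that record layout and the function name are hypothetical and are not the file formats used by the repository or by kallisto. Duplicate (cell, UMI, transcript set) records are collapsed before counting.

```python
from collections import Counter, defaultdict


def tcc_counts(records):
    """Count UMI-collapsed reads per (cell, transcript set).

    records: iterable of (cell_barcode, umi, transcripts), where transcripts
    is any iterable of transcript ids the read is compatible with.
    Returns {cell_barcode: Counter({frozenset(transcripts): count})}.
    """
    seen = set()
    counts = defaultdict(Counter)
    for cell, umi, transcripts in records:
        eq_class = frozenset(transcripts)
        key = (cell, umi, eq_class)
        if key in seen:  # same molecule observed again: collapse it
            continue
        seen.add(key)
        counts[cell][eq_class] += 1
    return counts


# Hypothetical records for two cells.
records = [
    ("AAAC", "u1", {"t1", "t2"}),
    ("AAAC", "u1", {"t1", "t2"}),  # duplicate of the record above
    ("AAAC", "u2", {"t1", "t2"}),
    ("AAAC", "u3", {"t3"}),
    ("CCCG", "u1", {"t3"}),
]
matrix = tcc_counts(records)
for cell, row in matrix.items():
    print(cell, {tuple(sorted(ec)): n for ec, n in row.items()})
```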

    Github repository: scRNA-Seq-TCC-prep Getting started tutorial

    Meddit

    scalable medoid computation for high dimensional clustering using multi-armed bandits.

    Joint work with V. Bagaria, G. Kamath, M. Zhang and D. Tse.

    This is an implementation of our new algorithm for the fundamental problem of computing the medoid of a set of points. While in the worst case computing the medoid requires O(n^2) distance evaluations, Meddit leverages a connection with the multi-armed bandit problem and uses an upper-confidence-bound type of algorithm to compute the medoid in O(n log n) distance evaluations under mild statistical assumptions on the points. A Java implementation of Meddit is also available in JSAT, a Java library for machine learning implemented and maintained by Edward Raff.
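
    The bandit view can be sketched in a few lines: each point is an arm, a pull samples one random reference point and records the distance to it, and the arm with the lowest lower confidence bound is pulled next. This is a simplified illustration, not the code in the repository. The name medoid_ucb, the noise scale sigma and the confidence level 1/n^2 are assumptions made for the example.

```python
import math
import random


def medoid_ucb(points, dist, sigma=1.0, seed=0):
    """Return the index of the (likely) medoid using a UCB-style search."""
    n = len(points)
    if n == 1:
        return 0
    rng = random.Random(seed)
    log_term = 2.0 * math.log(n * n)  # assumed confidence level delta = 1/n^2
    sums = [0.0] * n      # running sum of sampled distances per point
    pulls = [0] * n       # number of sampled distances per point
    exact = [False] * n   # True once a point's mean distance is computed exactly

    def pull(i):
        j = rng.randrange(n)
        sums[i] += dist(points[i], points[j])
        pulls[i] += 1

    def radius(i):
        return 0.0 if exact[i] else sigma * math.sqrt(log_term / pulls[i])

    for i in range(n):
        pull(i)

    while True:
        lcb = [sums[i] / pulls[i] - radius(i) for i in range(n)]
        best = min(range(n), key=lcb.__getitem__)
        ucb_best = sums[best] / pulls[best] + radius(best)
        # Stop once the best arm is confidently below every other arm.
        if ucb_best <= min(lcb[j] for j in range(n) if j != best):
            return best
        if pulls[best] >= n:
            # Sampling is no cheaper than an exact pass: compute the true mean.
            sums[best] = sum(dist(points[best], q) for q in points)
            pulls[best] = n
            exact[best] = True
        else:
            pull(best)


points = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 100.0]
print(medoid_ucb(points, lambda a, b: abs(a - b), sigma=25.0))
```

    When sigma matches the spread of the distances, points far from the center are ruled out after a handful of pulls, which is where the savings over the all-pairs computation come from. A very large sigma makes the search fall back to exact computation for every point.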

    Github repository: Meddit Java Statistical Analysis Toolbox implementation

    logR

    gene differential expression analysis at isoform-level resolution for single-cell RNA-seq via logistic regression.

    Joint work with L. Yi, P. Melsted, and L. Pachter.

    A tool for single-cell RNA-seq differential expression analysis accompanying our recent paper (in press, to appear in Nature Methods). logR fully exploits all the transcript information that can be extracted from the sequenced reads to detect gene differential expression with isoform-level resolution. Unlike traditional methods that test either for changes in overall gene abundance or for changes in transcript allocation, our method has the power to detect a change in any linear combination of transcript quantifications and provides a unified testing framework that eliminates the need for a dichotomy between differential gene expression (DGE) and differential transcript usage (DTU) methods. For 3' end data (e.g., 10x Genomics) where transcript quantifications are infeasible to obtain, logR makes use of the transcript compatibility counts (TCC) to provide a quantification-free analysis of 3’ single-cell RNA-seq that is able to identify previously undetectable marker genes, such as CD45 between Memory and Naive T-cells.
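
    To make the idea concrete, here is a minimal sketch of a per-gene test in this spirit: the cell's group label is regressed on the counts of that gene's transcripts (or equivalence classes) with logistic regression, and the fit is compared against an intercept-only model through a likelihood-ratio statistic. It is not the logR implementation. The plain gradient ascent fit, the function names and the toy counts are assumptions, and turning the statistic into a p-value (for example with a chi-squared reference distribution) is left out.

```python
import math


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _log_likelihood(weights, rows, labels):
    total = 0.0
    for row, y in zip(rows, labels):
        z = sum(w * x for w, x in zip(weights, row))
        # y*z - log(1 + exp(z)), written in a stable form
        total += y * z - (max(z, 0.0) + math.log1p(math.exp(-abs(z))))
    return total


def lr_statistic(features, labels, steps=2000, lr=0.05):
    """2 * (log-lik of fitted model - log-lik of intercept-only model).

    features: one list of counts per cell; labels: 0/1 group per cell
    (both groups must be present).
    """
    rows = [[1.0] + [float(x) for x in f] for f in features]  # add intercept
    n = len(rows)
    weights = [0.0] * len(rows[0])
    for _ in range(steps):  # gradient ascent on the mean log-likelihood
        grad = [0.0] * len(weights)
        for row, y in zip(rows, labels):
            err = y - _sigmoid(sum(w * x for w, x in zip(weights, row)))
            for k, x in enumerate(row):
                grad[k] += err * x
        weights = [w + lr * g / n for w, g in zip(weights, grad)]
    ones = sum(labels)
    p = ones / n
    null_ll = ones * math.log(p) + (n - ones) * math.log(1.0 - p)
    return 2.0 * (_log_likelihood(weights, rows, labels) - null_ll)


# Toy gene with two transcripts: total counts match across groups,
# but the allocation between the two transcripts shifts.
group_a = [[5, 1], [4, 2], [5, 2], [3, 3]]
group_b = [[1, 5], [2, 4], [2, 5], [3, 3]]
stat = lr_statistic(group_a + group_b, [0] * 4 + [1] * 4)
print(round(stat, 2))
```

    In the toy data a test on overall gene abundance would see nothing, since the per-cell totals are the same in both groups, while the regression picks up the shift between the two transcripts.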

    A simple variant of our method has already been included in scanpy::rank_genes_groups (by Wolf et al. 2018). A complete implementation, including TCC functionality for 3' end RNA-seq will be available soon in Seurat, thanks to Andrew Butler and Rahul Satija.

    Github repository Seurat R implementation (available soon)

    sceb

    sequencing-budget aware estimators for single-cell RNA-seq analysis via empirical Bayes.

    Joint work with M. Zhang and D. Tse.

    A class of empirical Bayes estimators for single-cell RNA-seq analysis, accompanying the paper "One read per cell per gene is optimal for single-cell RNA-seq" (currently under review in Nature Communications). An underlying question for virtually all single-cell RNA sequencing experiments is how to allocate the limited sequencing budget: deep sequencing of a few cells or shallow sequencing of many cells? Our work shows that, for estimating many important gene distributional quantities, the optimal allocation is to sequence at the depth of one read per cell per gene. An important result arising from our experimental design framework is the fundamental role of the estimator in the optimal trade-off. A very common, almost routine, practice in the literature is to use the so-called plug-in estimator, which, as a general recipe, blindly uses the read counts as a proxy for the true gene expression, effectively estimating the corresponding distributional quantities by “plugging in” the observed values. Although this can be very accurate for deeply sequenced datasets, it becomes increasingly problematic in the limit of shallow sequencing. Our analysis suggests that the optimal trade-off cannot be achieved by the conventional plug-in approach, but can be achieved by another class of estimators developed via empirical Bayes. sceb provides estimators that are inherently aware of the Poisson sampling noise introduced by sequencing, and can therefore adapt to varying sequencing depths.
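
    A small example shows why the plug-in recipe breaks down at shallow depth. Suppose the observed count Y of a gene in a cell is Poisson with mean equal to the true expression X, with the same depth for every cell (a simplifying assumption for this sketch). Then Var(Y) = E[X] + Var(X), so the variance of the raw counts overstates Var(X) by the Poisson noise term, and subtracting the sample mean removes that bias. The functions below are a moment-based illustration of this correction under those assumptions; they are not the sceb API.

```python
def plugin_variance(counts):
    """Plug-in estimate: treat observed counts as if they were true expression."""
    n = len(counts)
    mean = sum(counts) / n
    return sum((c - mean) ** 2 for c in counts) / n


def poisson_corrected_variance(counts):
    """Subtract the Poisson sampling noise (equal to the mean) from the
    plug-in variance, clipping at zero."""
    mean = sum(counts) / len(counts)
    return max(plugin_variance(counts) - mean, 0.0)


# Hypothetical shallow counts for one gene across eight cells.
counts = [0, 0, 4, 0, 0, 4, 0, 0]
print(plugin_variance(counts))             # 3.0
print(poisson_corrected_variance(counts))  # 2.0
```

    At high depth the mean is small relative to the variance and the two estimates nearly agree; at shallow depth the sampling noise dominates and the gap becomes large.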

    Github repository: sceb Python Package Index (PyPI) link

    sc-DCSS

    informative gene set selection for single-cell RNA-seq preprocessing with PCA-subspace approximation guarantees.

    Joint work with S. McCurdy* and L. Pachter.

    Analysis of single-cell RNA sequencing (scRNA-seq) data often involves filtering out uninteresting or poorly measured genes and dimensionality reduction to reduce noise and simplify data visualization. However, techniques such as principal components analysis (PCA) fail to preserve non-negativity and sparsity structures present in the original matrices, and the coordinates of projected cells are not easily interpretable. Commonly used thresholding methods to filter genes avoid those pitfalls, but ignore collinearity and covariance in the original matrix. In our paper we showed that column subset selection methods possess many of the favorable properties of common thresholding and PCA, while avoiding pitfalls from both. In sc-DCSS we apply the deterministic column subset selection algorithm (Papailiopoulos, Kyrillidis, and Boutsidis, 2014) to scRNA-seq data in order to provide an informative gene subset selection for subsequent analysis with provable PCA-subspace approximation guarantees.

    * first author and main contributor

    Github repository

    Contact.

       ntranos (at) caltech.edu