Calvin Chi

Publications

Identification of Sjögren’s syndrome patient subgroups by clustering of labial salivary gland DNA methylation profiles

Calvin Chi, Olivia Solomon, Caroline Shiboski, Kimberly E. Taylor, Hong Quach, Diana Quach, Lisa F. Barcellos, Lindsey A. Criswell

PLoS ONE 18(3) 2023

Abstract Heterogeneity in Sjögren’s syndrome (SS), increasingly called Sjögren’s disease, suggests the presence of disease subtypes, which poses a major challenge for the diagnosis, management, and treatment of this autoimmune disorder. Previous work distinguished patient subgroups based on clinical symptoms, but it is not clear to what extent symptoms reflect underlying pathobiology. The purpose of this study was to discover clinically meaningful subtypes of SS based on genome-wide DNA methylation data. We performed a cluster analysis of genome-wide DNA methylation data from labial salivary gland (LSG) tissue collected from 64 SS cases and 67 non-cases. Specifically, hierarchical clustering was performed on low dimensional embeddings of DNA methylation data extracted from a variational autoencoder to uncover unknown heterogeneity. Clustering revealed clinically severe and mild subgroups of SS. Differential methylation analysis revealed that hypomethylation at the MHC and hypermethylation at other genome regions characterize the epigenetic differences between these SS subgroups. Epigenetic profiling of LSGs in SS yields new insights into mechanisms underlying disease heterogeneity. The methylation patterns at differentially methylated CpGs are different in SS subgroups and support the role of epigenetic contributions to the heterogeneity in SS. Biomarker data derived from epigenetic profiling could be explored in future iterations of the classification criteria for defining SS subgroups.

Paper

Hypomethylation mediates genetic association with the major histocompatibility complex genes in Sjögren's syndrome

Calvin Chi, Kimberly E. Taylor, Hong Quach, Diana Quach, Lindsey A. Criswell, Lisa F. Barcellos

PLoS ONE 16(4) 2021

Abstract Differential methylation of immune genes has been a consistent theme observed in Sjögren’s syndrome (SS) in CD4+ T cells, CD19+ B cells, whole blood, and labial salivary glands (LSGs). Multiple studies have found associations supporting genetic control of DNA methylation in SS, which, in the absence of reverse causation, has positive implications for the potential of epigenetic therapy. However, a formal study of the causal relationship between genetic variation, DNA methylation, and disease status is lacking. We performed a causal mediation analysis of DNA methylation as a mediator of nearby genetic association with SS using LSGs and genotype data collected from 131 female members of the Sjögren’s International Collaborative Clinical Alliance registry, comprising 64 SS cases and 67 non-cases. Bumphunter was used to first identify differentially methylated regions (DMRs), then the causal inference test (CIT) was applied to identify DMRs mediating the association of nearby methylation quantitative trait loci (MeQTLs) with SS. Bumphunter discovered 215 DMRs, with the majority located in the major histocompatibility complex (MHC) on chromosome 6p21.3. Consistent with previous findings, regions hypomethylated in SS cases were enriched for gene sets associated with immune processes. Using the CIT, we observed a total of 19 DMR-MeQTL pairs that exhibited strong evidence for a causal mediation relationship. Close to half of these DMRs reside in the MHC and their corresponding MeQTLs are in the region spanning the HLA-DQA1, HLA-DQB1, and HLA-DQA2 loci. The risk of SS conferred by these corresponding MeQTLs in the MHC was further substantiated by previous genome-wide association study results, with modest evidence for independent effects. By validating the presence of causal mediation, our findings suggest both genetic and epigenetic factors contribute to disease susceptibility, and inform the development of targeted epigenetic modification as a therapeutic approach for SS.

Paper

Bipartite graph-based approach for clustering of cell lines by gene expression-drug response associations

Calvin Chi, Yuting Ye, Bin Chen, Haiyan Huang

Bioinformatics 2021

Abstract In pharmacogenomic studies, the biological context of cell lines influences the predictive ability of drug-response models and the discovery of biomarkers. Thus, similar cell lines are often studied together based on prior knowledge of biological annotations. However, this selection approach is not scalable with the number of annotations, and the relationship between gene-drug association patterns and biological context may not be obvious. We present a procedure to compare cell lines based on their gene-drug association patterns. Starting with a grouping of cell lines from biological annotation, we model gene-drug association patterns for each group as a bipartite graph between genes and drugs. This is accomplished by applying sparse canonical correlation analysis (SCCA) to extract the gene-drug associations, and using the canonical vectors to construct the edge weights. Then, we introduce a nuclear norm-based dissimilarity measure to compare the bipartite graphs. Accompanying our procedure is a permutation test to evaluate the significance of similarity of cell line groups in terms of gene-drug associations. In the pharmacogenomics datasets CTRP2, GDSC2, and CCLE, hierarchical clustering of carcinoma groups based on this dissimilarity measure uniquely reveals clustering patterns driven by carcinoma subtype rather than primary site. Next, we show that the top associated drugs or genes from SCCA can be used to characterize the clustering patterns of haematopoietic and lymphoid malignancies. Finally, we confirm by simulation that when drug responses are linearly dependent on expression, our approach is the only one that can effectively infer the true hierarchy compared to existing approaches.
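
The idea of comparing bipartite graphs with a nuclear norm can be illustrated with a minimal sketch. This is not the paper's definition: the way edge weights are built from canonical vectors (`edge_weights`) and the choice to take the nuclear norm of the difference of two weight matrices (`nuclear_norm_dissimilarity`) are assumptions made only for illustration, and both function names are hypothetical. It also assumes both groups share the same genes and drugs, so the matrices have matching shapes.

```python
import numpy as np


def edge_weights(U, V):
    """Build a genes x drugs edge-weight matrix from canonical vectors.

    U: (n_genes, k) gene canonical vectors; V: (n_drugs, k) drug canonical
    vectors. Assumption for illustration: the weight of edge (gene, drug)
    is the sum over components of |u| * |v|.
    """
    return np.abs(U) @ np.abs(V).T


def nuclear_norm_dissimilarity(W_a, W_b):
    """Nuclear norm (sum of singular values) of the difference W_a - W_b.

    Assumed form of the comparison; both matrices must have equal shape.
    """
    singular_values = np.linalg.svd(W_a - W_b, compute_uv=False)
    return float(singular_values.sum())


# Toy example: two groups with 4 genes, 3 drugs, and 2 canonical components.
rng = np.random.default_rng(0)
W_group_a = edge_weights(rng.normal(size=(4, 2)), rng.normal(size=(3, 2)))
W_group_b = edge_weights(rng.normal(size=(4, 2)), rng.normal(size=(3, 2)))
d_ab = nuclear_norm_dissimilarity(W_group_a, W_group_b)
```

A matrix of such pairwise values between groups could then be passed to any hierarchical clustering routine that accepts precomputed dissimilarities.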

Paper Software Code

Admixture mapping reveals evidence of differential multiple sclerosis risk by genetic ancestry

Calvin Chi, Xiaorong Shao, Brooke Rhead, Evangelina Gonzalez, Jessica B. Smith, Anny H. Xiang, Jennifer Graves, Amy Waldman, Timothy Lotze, Teri Schreiner, Bianca Weinstock-Guttman, Gregory Aaen, Jan-Mendelt Tillema, Jayne Ness, et al.

PLoS Genetics 15(1) 2019

Abstract Multiple sclerosis (MS) is an autoimmune disease with high prevalence among populations of northern European ancestry. Past studies have shown that exposure to ultraviolet radiation could explain the difference in MS prevalence across the globe. In this study, we investigate whether the difference in MS prevalence could be explained by European genetic risk factors. We characterized the ancestry of MS-associated alleles using RFMix, a conditional random field parameterized by random forests, to estimate their local ancestry in the largest assembled admixed population to date, with 3,692 African Americans, 4,915 Asian Americans, and 3,777 Hispanics. The majority of MS-associated human leukocyte antigen (HLA) alleles, including the prominent HLA-DRB1*15:01 risk allele, exhibited cosmopolitan ancestry. Ancestry-specific MS-associated HLA alleles were also identified. Analysis of the HLA-DRB1*15:01 risk allele in African Americans revealed that alleles on the European haplotype conferred three times the disease risk compared to those on the African haplotype. Furthermore, we found evidence that the European and African HLA-DRB1*15:01 alleles exhibit single nucleotide polymorphism (SNP) differences in regions encoding the HLA-DRB1 antigen-binding heterodimer. Additional evidence for increased risk of MS conferred by the European haplotype was found for HLA-B*07:02 and HLA-A*03:01 in African Americans. Most of the 200 non-HLA MS SNPs previously established in European populations were not significantly associated with MS in admixed populations, nor were they ancestrally more European in cases compared to controls. Lastly, a genome-wide search for association between European ancestry and MS revealed a region of interest close to the ZNF596 gene on chromosome 8 in Hispanics; cases had a significantly higher proportion of European ancestry compared to controls. In conclusion, our study established that the genetic ancestry of MS-associated alleles is complex and suggested that the difference in MS prevalence could be explained by the ancestry of MS-associated alleles.

Poster Paper

Projects

HLA Allele Imputation with Multitask Deep Convolutional Neural Network

Research

Computational Biology

2020

Developed a multitask convolutional neural network for HLA imputation from phased genotype data.

On the T1DGC test dataset, achieved 97.6% imputation accuracy, which is comparable to state-of-the-art performance from programs such as HIBAG, HLA*IMP:02, and SNP2HLA.

Paper Poster Code

Embedding-Augmented Deep CNN for PubMed Journal Recommendation

Class Project

NLP

2018

Journal detection from PubMed abstracts, using 415,381 programmatically collected abstracts

Compared multitask and embedding-augmented CNNs with an output space of 1,548 journals

Best performance when the CNN input was augmented with topic and impact factor embeddings: 23.7% accuracy, with 90% of true journals in the top 60 recommendations
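
The "true journal in the top 60 recommendations" figure is a top-k accuracy. A minimal sketch of how such a metric can be computed is shown here; the function name `top_k_accuracy` and the toy scores are hypothetical and are not taken from the project code.

```python
def top_k_accuracy(scores, true_labels, k):
    """Fraction of examples whose true label is among the k highest scores.

    scores: list of per-example score lists, one score per class (journal).
    true_labels: list of integer class indices, one per example.
    """
    hits = 0
    for row, label in zip(scores, true_labels):
        # Indices of the k classes with the highest scores for this example.
        top_k = sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k]
        hits += label in top_k
    return hits / len(true_labels)


# Toy example with 3 abstracts and 3 candidate journals.
toy_scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
toy_labels = [2, 0, 1]
top1 = top_k_accuracy(toy_scores, toy_labels, k=1)
top2 = top_k_accuracy(toy_scores, toy_labels, k=2)
```

With k=1 this reduces to ordinary accuracy; larger k measures how often the correct journal appears in a shortlist of that size.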

Report Poster Code

Data Augmentation using GAN for Breast Cancer Classification

Class Project

Computer Vision

2018

Investigated whether augmenting data with GANs could improve histology breast cancer classification with ResNet-18 re-trained on 5,547 breast histology images. Dataset provided by Kaggle.

DCGAN most effective at generating realistic histology images when kernel size is divisible by stride length in the generator

Augmentation with ~400 DCGAN images improved accuracy and precision by 5% and 12% respectively, but recall decreased by nearly 15%

Report Code

Analyzing the Effect of Salary on Employee Attrition

Class Project

Causal Inference

2017

Causal inference of the effect of salary on employee attrition for a simulated Kaggle dataset

Variable importance ranking of features for employee attrition

Causal inference with DAGs, ensemble learning, and TMLE

Report Presentation

Bearmaps

Class Project

Data Structures

2016

Mapping application for Berkeley, CA; implemented in Java

Rastering with quad tree

Routing via A* algorithm; location search autocompletion with trie
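
Prefix autocompletion with a trie can be sketched as follows. The project itself is in Java; this illustration is in Python for brevity, and the class names, method names, and sample locations are hypothetical rather than taken from the project.

```python
class TrieNode:
    def __init__(self):
        self.children = {}  # maps a character to the next TrieNode
        self.names = []     # original names that end exactly at this node


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, name):
        """Insert a location name, matching case-insensitively."""
        node = self.root
        for ch in name.lower():
            node = node.children.setdefault(ch, TrieNode())
        node.names.append(name)

    def complete(self, prefix):
        """Return all stored names starting with prefix, sorted."""
        node = self.root
        for ch in prefix.lower():
            node = node.children.get(ch)
            if node is None:
                return []
        # Collect every name in the subtree below the prefix node.
        results = []
        stack = [node]
        while stack:
            current = stack.pop()
            results.extend(current.names)
            stack.extend(current.children.values())
        return sorted(results)


locations = Trie()
for place in ["Sather Tower", "Sather Gate", "Soda Hall", "Doe Library"]:
    locations.insert(place)
suggestions = locations.complete("sat")
```

Lookup cost depends on the prefix length and the size of the matching subtree, not on the total number of stored locations, which is why a trie suits search-as-you-type.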

Specs

Calvin Chi

calvin.chi at berkeley dot edu

Case Western Reserve University

University of California, Berkeley

Amazon

Classical Machine Learning

Statistics

Deep Learning

Natural Language Processing

Algorithms