Page 68 - Annual report 2021-22
Kumardeep Chaudhary
Kumardeep Chaudhary is a computational biologist who works at the interface of artificial intelligence
and human genomics to provide translational solutions in the healthcare domain. His lab works on
multi-modal big data integration, including electronic health records (EHRs), in the fields of cancer and
cardiovascular diseases.
Discovery of lineage-specific k-mers in the SARS-CoV-2 variants
The COVID-19 pandemic, caused by the prolific spread of the SARS-CoV-2 virus, has affected most parts of the
world. The rapid spread of this single-stranded RNA virus within the human population has led to
genomic adaptations favoring virus survival and immune escape within the host. These adaptations
resulted in numerous variants and subvariants, which were reported and tracked by agencies
such as the WHO. Concerted efforts across the globe helped researchers catalog genome
sequences of this virus in common repositories, viz. GISAID and NCBI. This huge volume of genomic
data provides the opportunity to identify specific patterns in the nucleotide sequences to segregate
different clades and lineages. In total there are 5 variants of concern (VOCs: Alpha, Beta, Gamma, Delta and
Omicron) and 8 variants of interest (VOIs: Epsilon, Eta, Zeta, Theta, Iota, Kappa, Lambda and Mu), comprising
numerous lineages and sublineages that pose a potentially higher public health risk than other variants.
It is therefore imperative to identify these variants from genomic sequences in real time for effective
surveillance. Nucleotide-based k-mer sequences exclusive to these and novel VOCs and VOIs can help
in rapid identification of these variants from their genomic sequences. They have developed a k-mer
(short stretch of nucleotide sequence) based approach for SARS-CoV-2 genomic surveillance, in which
SARS-CoV-2 and its VOCs are identified from the huge pool
of genomic sequences submitted to publicly available resources. They first identified a set of k-mers
for SARS-CoV-2 and each of its VOCs using a small subset of high-quality genomic sequences
downloaded from GISAID. These k-mers were then evaluated for their sensitivity in identifying
SARS-CoV-2 and its VOCs from genomic sequences. Furthermore, supervised and unsupervised machine
learning approaches were implemented using the presence and absence of VOC-specific k-mers as input
features. The analysis shows statistically sound results from the k-mer-based approach for the
identification and classification of SARS-CoV-2 and its VOCs from millions of genomic sequences.
The current version of the pipeline focuses on classifying existing variants and has the potential to be
extended to newly identified variants. Beyond this
application, they would like to demonstrate the generalizability of this approach to other pathogens
to delineate different variants/genotypes.
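
The core idea described above can be illustrated with a minimal sketch. It is not the lab's pipeline: the function names (`kmers`, `lineage_specific_kmers`, `presence_features`), the toy sequences and the choice of k=3 are all assumptions made for illustration. It shows one plausible way to pick k-mers exclusive to a lineage (present in every genome of that lineage and in no genome of any other) and to encode a query genome as a presence/absence vector of the kind that could serve as input features for a machine learning model.

```python
from typing import Dict, List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return all distinct k-mers of seq made up only of A, C, G, T."""
    seq = seq.upper()
    windows = (seq[i:i + k] for i in range(len(seq) - k + 1))
    # Skip windows containing ambiguous bases such as N.
    return {w for w in windows if set(w) <= set("ACGT")}


def lineage_specific_kmers(genomes: Dict[str, List[str]], k: int) -> Dict[str, List[str]]:
    """Keep k-mers found in every genome of a lineage and in no genome of any other lineage."""
    # core: k-mers shared by all genomes of a lineage; pan: k-mers seen in any of them.
    core = {n: set.intersection(*(kmers(s, k) for s in seqs)) for n, seqs in genomes.items()}
    pan = {n: set().union(*(kmers(s, k) for s in seqs)) for n, seqs in genomes.items()}
    specific = {}
    for name in genomes:
        others = set().union(*(pan[o] for o in genomes if o != name))
        specific[name] = sorted(core[name] - others)
    return specific


def presence_features(seq: str, markers: List[str], k: int) -> List[int]:
    """Encode a genome as a 0/1 vector: 1 where the marker k-mer occurs in it."""
    found = kmers(seq, k)
    return [int(m in found) for m in markers]


# Toy, made-up sequences (not real SARS-CoV-2 data); k=3 is used only for readability.
toy = {"X": ["AAACCC", "AAACCG"], "Y": ["AAAGGG", "AAAGGT"]}
specific = lineage_specific_kmers(toy, k=3)
markers = [m for name in sorted(specific) for m in specific[name]]
features = presence_features("TAAACCT", markers, k=3)
print(specific)
print(features)
```

In practice k would be much larger and the genome sets far bigger, so the strict "in all genomes of the lineage" rule used here would likely need to be relaxed to a frequency threshold; that threshold is a design choice this sketch does not take from the report.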
AI-based identification of novel potential biomarkers in breast cancer
Integration of public-domain multi-omics datasets (viz. transcriptomics, methylation and proteomics,
along with clinical metadata) pertaining to breast cancer (BC), with special emphasis on
triple-negative breast cancer (TNBC), can provide an unprecedented opportunity to compare and
meta-analyze individual or patient-specific studies for precise, rapid, and economical novel biomarker