Page 68 - Annual report 2021-22






               Kumardeep Chaudhary




               Kumardeep Chaudhary is a computational biologist who works at the interface of artificial intelligence
               and human genomics to provide translational solutions in the healthcare domain. His lab works on
               multi-modal big data integration including electronic health records (EHRs) in the field of cancer and
               cardiovascular diseases.

               Discovery of lineage-specific k-mers in the SARS-CoV-2 variants

               The COVID-19 pandemic, caused by the prolific spread of the SARS-CoV-2 virus, has affected most parts of the
               world. The rapid spread of this single-stranded RNA virus within the human population has led to
               genomic adaptations favoring virus survival and immune escape within the host. These adaptations
               resulted in various variants and subvariants, which were reported and tracked by agencies such as
               the WHO. Concerted efforts across the globe helped researchers to catalog genome
               sequences of this virus into common repositories viz. GISAID and NCBI. This huge volume of genomic
               data provides the opportunity to identify specific patterns in the nucleotide sequences to segregate
               different clades and lineages. In total there are 5 variants of concern (VOCs: Alpha, Beta, Gamma,
               Delta and Omicron) and 8 variants of interest (VOIs: Epsilon, Eta, Zeta, Theta, Iota, Kappa, Lambda
               and Mu), consisting of numerous lineages and sublineages with the potential for higher public health
               risk than other variants. Thus, it becomes imperative to identify these variants from genomic
               sequences in real time for effective surveillance. Nucleotide-based k-mer sequences exclusive to
               these and novel VOCs and VOIs can help
               in rapid identification of these variants from their genomic sequences. The lab has developed a
               k-mer-based approach (a k-mer being a short stretch of nucleotide sequence) for SARS-CoV-2 genomic
               surveillance, in which SARS-CoV-2 and its variants of concern (VOCs) were identified from the huge pool
               of genomic sequences submitted to publicly available resources. They first identified a set of k-mers
               for SARS-CoV-2 and each of its VOCs using a small subset of high-quality genomic sequences
               downloaded from GISAID. These k-mers were then evaluated for their sensitivity in identifying
               SARS-CoV-2 and its VOCs from genomic sequences. Furthermore, supervised and unsupervised machine
               learning approaches were implemented using the presence and absence of VOC-specific k-mers as input
               features. The analysis shows statistically sound results from the k-mer-based approach for the
               identification and classification of SARS-CoV-2 and its VOCs from millions of genomic sequences.
               The current version of the pipeline focuses on classifying existing variants, with the potential to be
               extended to newly identified variants. Beyond this application, the lab would like to demonstrate
               that the approach generalizes to other pathogens for delineating different variants/genotypes.
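
The idea of lineage-specific k-mers and presence/absence features can be illustrated with a minimal sketch. It is not the lab's pipeline: the function names, the toy sequences and the strict selection rule (a k-mer must occur in every genome of one lineage and in no genome of any other) are assumptions made purely for illustration. A real pipeline would likely tolerate sequencing errors and use much longer k-mers.

```python
def kmers(seq, k):
    """Return the set of k-length substrings of seq made only of A, C, G, T."""
    seq = seq.upper()
    valid = set("ACGT")
    return {
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= valid  # skip windows with ambiguous bases such as N
    }


def lineage_specific_kmers(genomes_by_lineage, k):
    """Hypothetical selection rule: keep k-mers found in every genome of a
    lineage and in no genome of any other lineage.
    Assumes each lineage has at least one genome."""
    core = {}  # k-mers shared by all genomes of a lineage
    pan = {}   # k-mers seen in at least one genome of a lineage
    for lineage, genomes in genomes_by_lineage.items():
        sets = [kmers(g, k) for g in genomes]
        core[lineage] = set.intersection(*sets)
        pan[lineage] = set.union(*sets)

    specific = {}
    for lineage in genomes_by_lineage:
        others = set().union(*(pan[o] for o in pan if o != lineage))
        specific[lineage] = core[lineage] - others
    return specific


def presence_absence(seq, marker_kmers, k):
    """Encode a genome as a 0/1 vector over an ordered list of marker k-mers."""
    found = kmers(seq, k)
    return [1 if m in found else 0 for m in marker_kmers]


# Toy data (made up, not real SARS-CoV-2 sequences).
toy_genomes = {
    "lineage_A": ["ACGTACGTTT", "ACGTACGTTA"],
    "lineage_B": ["ACGTTCGTTT", "ACGTTCGTTA"],
}
K = 4
markers = lineage_specific_kmers(toy_genomes, K)
marker_list = sorted(markers["lineage_A"]) + sorted(markers["lineage_B"])
features = presence_absence("ACGTACGTTT", marker_list, K)
```

The resulting 0/1 vectors are the kind of input that a supervised classifier or a clustering method could consume, one row per genome and one column per marker k-mer.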

               AI-based identification of novel potential biomarkers in breast cancer

               Integration of public domain multi-omics datasets (viz. transcriptomics, methylation and proteomics,
               along with the clinical metadata) pertaining to breast cancer (BC), with special emphasis on
               triple-negative breast cancer (TNBC), can provide an unprecedented opportunity to compare and
               meta-analyze the individual or patient-specific studies for precise, rapid, and economical novel biomarker