Page 59 - Biennial Report 2018-20 Jun 2021

against this subset in the second stage. This strategy also helped in controlling false positives.
                  The protocol was repeated using the same parameters as in stage 1. Finally, the different
                  proteoforms associated with the proteins were identified. These proteoforms were then
                  filtered for brain specificity based on the gene enrichment information from the Human Protein
                  Atlas. A new template was designed to accommodate various new features of proteoforms,
                  which will be incorporated in the HuBSProt database.



                  A GENOMICS AND PROTEOMICS APPROACH TO UNDERSTANDING THE PITCHER
                  PLANT


                  Nepenthes khasiana, the only pitcher plant found in India, is endemic to the West and South Garo
                  Hills, the West and East Khasi Hills and the Jaintia Hills of Meghalaya. This plant has a unique
                  combination of biochemistry, morphology and physiology that enables prey capture and nutrient
                  assimilation from digested prey. Under the special twinning programme of DBT that facilitates
                  interaction with scientists of the Northeast region of India, it was decided to sequence the genome
                  and to compare the proteome and inquiline diversity of open and unopened pitchers, owing to the
                  plant's many medicinal values and its importance in the evolution of the genus Nepenthes in Asia.
                  The chloroplast (cp) and mitochondrial (mt) genomes were extracted from the whole genome data
                  of N. khasiana.
                  Reads from a shotgun library with an insert size of 450 bp were used for assembling the organelle
                  genomes. The cp and mt genomes were assembled using NOVOplasty. The cp genome was annotated
                  using DOGMA, CpGAVAS and BLAST (blastn, blastp and tblastx). The mt genome was annotated
                  using MITOFY and BLAST. Both annotated genomes were submitted to GenBank, with
                  accession numbers MK330891 and MH923233 for the mitochondrial and chloroplast genomes,
                  respectively. The assembled cp genome is 156,914 bp long and has a quadripartite
                  structure with a pair of inverted repeats of 25,193 bp each, a large single copy region of
                  87,237 bp and a small single copy region of 19,291 bp. A total of 87 protein coding genes,
                  37 tRNAs and 8 rRNAs were annotated in the assembled cp genome. The assembled mt genome is
                  900,031 bp long. A total of 50 protein coding genes, 27 tRNAs and 7 rRNAs were annotated in
                  the assembled mt genome.
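
The region lengths reported above can be checked against the total cp genome length with simple arithmetic. The following is a minimal sketch, not part of the original analysis; the function name `quadripartite_length` is hypothetical, and the only inputs are the lengths quoted in the text. It assumes the usual quadripartite layout, in which the inverted repeat is present in two copies.

```python
# Region lengths (bp) as reported in the text for the N. khasiana cp genome.
LSC = 87_237             # large single copy region
SSC = 19_291             # small single copy region
IR = 25_193              # one inverted repeat; a quadripartite genome has two copies
REPORTED_TOTAL = 156_914  # assembled cp genome length stated in the text


def quadripartite_length(lsc: int, ssc: int, ir: int) -> int:
    """Total length of a genome laid out as LSC + IR + SSC + IR (hypothetical helper)."""
    return lsc + ssc + 2 * ir


total = quadripartite_length(LSC, SSC, IR)
print(total, total == REPORTED_TOTAL)  # prints: 156914 True
```

The sum matches the reported length, which is consistent with the 25,193 bp figure referring to each copy of the inverted repeat.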
                  The chloroplast (cp) genome was assembled using adapter-trimmed shotgun reads in
                  NOVOplasty. For de novo whole genome assembly, adapter- and quality-trimmed shotgun and
                  mate-pair reads were assembled using AllPathsLG. GapCloser and RepeatMasker were used for
                  closing gaps and masking repeats, respectively. The repeat-masked draft genome was used for all
                  further analysis. Gene prediction was done with AUGUSTUS using Arabidopsis as the training
                  dataset. SSRs were identified with MISA. Based on core orthologs (plant ortholog dataset), 88.6%
                  of the coding genome was found to be complete after final assembly. A total of 7,214 scaffolds
                  were assembled, with a scaffold N50 of 1,163,181 bp (~1.2 Mb) and an average scaffold size of
                  120 kb. The genome size was computed as 749,857,876 bp (~750 Mb) based on k-mer distribution.
                  The draft genome was found to be richer in trinucleotide repeats than in mono- or dinucleotide
                  repeats.
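
For readers unfamiliar with the scaffold N50 statistic quoted above, the sketch below shows how it is conventionally computed from a list of scaffold lengths. It is an illustration only: the function name `n50` and the toy lengths are assumptions, not data from the N. khasiana assembly, and the tool actually used to compute the reported value is not stated in the text.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths.

    N50 is the length L such that scaffolds of length >= L together
    cover at least half of the total assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("n50() requires at least one scaffold length")


# Toy example with made-up lengths (not from the assembly).
toy_scaffolds = [100, 80, 60, 40, 20]
print(n50(toy_scaffolds))                       # 80
print(sum(toy_scaffolds) / len(toy_scaffolds))  # 60.0 (mean scaffold size)
```

As the toy example shows, N50 weights long scaffolds more heavily than the mean does, which is why an assembly can have an N50 above 1 Mb while its average scaffold size is far smaller.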
                  The assembled cp genome was 156,914 bp long with a quadripartite structure (a pair of inverted
                  repeats, a large single copy region and a small single copy region) and included 87 PCGs,
                  37 tRNAs and 8 rRNAs. The N. khasiana whole genome data accession in SRA is SRP149035; the cp
                  genome accession in GenBank is MH923233. The high-quality reads from six paired-end libraries and three mate-pair



