Coronaviruses (CoVs) are a family of RNA viruses that infect a wide variety of species, including humans. They are associated with diseases that affect the respiratory, hepatic, enteric, and central nervous systems. CoVs originating from bats have been responsible for 3 pandemics in the last 20 years: the severe acute respiratory syndrome (SARS), the Middle East respiratory syndrome (MERS), and the 2019 novel coronavirus disease (COVID-19) pandemics. Because these viruses can transmit from bats to humans, researchers need a method for monitoring their potential to trigger an outbreak in humans. Next generation sequencing (NGS) provides the information needed to evaluate CoVs for pathogenicity, virulence, spread, and evolution.
Because the amount of virus in a given sample varies, the preferred method is unbiased, total, next generation RNA sequencing (RNA-seq), but total RNA-seq is too expensive for practical use. Additionally, total RNA-seq can miss low frequency variants in viral genomes, because these variants can be drowned out by the abundance of non-viral sequences. In this study, Li et al. compare total RNA-seq to hybridization capture sequencing and evaluate the value of these sequencing methods for CoV surveillance in terms of cost, efficiency, and sensitivity.
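The "drowned out" effect can be illustrated with simple arithmetic. The sketch below is not taken from the study: the function names, read counts, viral fractions, and variant frequency are all hypothetical values chosen for illustration, and the genome length is only an approximate figure for a CoV genome (roughly 30 kb). It assumes reads are spread evenly along the viral genome.

```python
def expected_viral_depth(total_reads, viral_fraction, read_length, genome_length):
    """Mean depth over the viral genome, assuming reads are spread evenly along it."""
    viral_reads = total_reads * viral_fraction
    return viral_reads * read_length / genome_length


def expected_variant_reads(depth, variant_frequency):
    """Expected number of reads carrying a variant at a single position."""
    return depth * variant_frequency


# All numbers below are illustrative assumptions, not data from Li et al.
TOTAL_READS = 10_000_000   # reads sequenced per library
READ_LENGTH = 150          # bases per read
GENOME_LENGTH = 30_000     # approximate CoV genome size in bases

# Hypothetical fraction of reads that are viral, without and with enrichment.
unenriched_depth = expected_viral_depth(TOTAL_READS, 1e-5, READ_LENGTH, GENOME_LENGTH)
enriched_depth = expected_viral_depth(TOTAL_READS, 1e-2, READ_LENGTH, GENOME_LENGTH)

for label, depth in [("unenriched", unenriched_depth), ("enriched", enriched_depth)]:
    variant_reads = expected_variant_reads(depth, 0.05)  # hypothetical 5% variant
    print(f"{label}: mean depth {depth:.1f}x, expected variant reads {variant_reads:.2f}")
```

Under these assumed numbers, the same sequencing effort gives about 0.5x mean viral depth when 1 read in 100,000 is viral and about 500x when 1 read in 100 is viral, so a 5% variant goes from effectively invisible to being supported by roughly 25 reads.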
Viruses were cultured in cell lines to prepare them before RNA extraction. RNA was extracted and assayed for representative CoVs using qPCR.
For total RNA-seq, NGS libraries were constructed from total RNA, including a cDNA synthesis step. Each total RNA-seq library was sequenced on a single Illumina HiSeq® lane.
For targeted sequencing using hybridization capture, xGen™ Universal Blocker-TS Mix (IDT) was added to a subset of the library DNA. xGen Blocking Oligos bind to the library adapter sequences to reduce off-target capture during library enrichment. Targeted enrichment of the CoV genomes was performed using xGen Lockdown Probes. Post-capture, samples were PCR-amplified, purified with magnetic beads, visualized on an agarose gel, and quantified using a Bioanalyzer instrument (Agilent). The enriched libraries were multiplexed for sequencing into 2 pools of 8 or 9 samples.
After sequencing, bioinformatics analysis began by assembling genomes using known International Committee on Taxonomy of Viruses (ICTV) CoV reference genomes on the Galaxy platform. Three enriched sample genomes did not reach full sequence coverage and were gap-filled using PCR and Sanger sequencing. Viral genome data is available on GenBank.
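To make "full sequence coverage" and gap-filling concrete, here is a minimal sketch of how coverage breadth and uncovered regions could be computed from per-base read depth. It is not the pipeline used in the study: the function names, the depth threshold, and the toy depth values are assumptions for illustration, and in practice the depths would come from reads aligned to a reference genome.

```python
def coverage_breadth(depths, min_depth=1):
    """Fraction of positions covered by at least min_depth reads."""
    if not depths:
        return 0.0
    covered = sum(1 for depth in depths if depth >= min_depth)
    return covered / len(depths)


def find_gaps(depths, min_depth=1):
    """Return uncovered regions as 0-based, half-open (start, end) intervals."""
    gaps = []
    start = None
    for position, depth in enumerate(depths):
        if depth < min_depth:
            if start is None:
                start = position          # a gap opens here
        elif start is not None:
            gaps.append((start, position))  # the gap closed just before this position
            start = None
    if start is not None:
        gaps.append((start, len(depths)))   # gap runs to the end of the genome
    return gaps


# Toy per-base depths for a 10-base "genome" (hypothetical values).
toy_depths = [12, 9, 0, 0, 3, 7, 0, 5, 0, 0]

breadth = coverage_breadth(toy_depths)
gaps = find_gaps(toy_depths)
print(f"breadth: {breadth:.0%}")
print(f"gaps to fill: {gaps}")
```

Each interval returned by `find_gaps` marks a region that reads did not cover, which is the kind of region that would be targeted with PCR and Sanger sequencing.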
Results and discussion
Targeted NGS detects and characterizes CoVs
The challenge when detecting and characterizing CoVs is two-pronged. First, not all samples will contain CoVs, so the technique must be sensitive enough to distinguish samples with CoVs from those without CoVs. Second, once a sample is identified as CoV-positive, the entire CoV genome must be sequenced so that it can be characterized. A total of 4303 unique probes were designed to identify 90 unique CoV genomes. Samples were drawn from cell lines containing 5 CoVs and 3 patient samples. Although total RNA-seq reached greater coverage depth than targeted sequencing, viral reads were detected in all samples, and targeted sequencing coverage reached sufficient depth to cover full-length viral genomes.
New bat CoV genomes are revealed by targeted NGS
In addition to identifying known CoVs, targeted NGS was evaluated for its ability to discover new CoVs. Nine samples derived from bat anal swabs were sequenced using both total RNA-seq and hybridization capture. Again, although total RNA-seq reached greater depth of coverage, targeted sequencing was sufficient to achieve full-length sequencing of viral genomes for most of the samples. Six samples were fully sequenced, and the remaining 3 reached >75% coverage after gap-filling using PCR and Sanger sequencing.
Bat CoV genomes are very diverse
The targeted sequencing protocol used by Li et al. identified all the bat CoV genomes in the study, but because bat CoV sequences are so diverse, it did not achieve full coverage of all novel bat CoVs. As more CoVs are discovered, the custom capture-based panel can be updated with new probes for greater coverage of the new CoV genomes. The researchers also suggested improvements to their protocol and stressed the importance of gap-filling once CoVs are identified: when sequences are divergent and the entire genome is not captured, gap-filling is needed to achieve full sequence coverage. Full characterization is necessary to evaluate outbreak risk.
The study showed that targeted NGS using hybridization capture is sufficient to identify, characterize, and discover diverse viruses, using bat CoVs as a proof of concept. Targeted NGS is efficient and cost-effective for large-scale, routine surveillance of known and unknown mammalian CoVs and has potential for researching new and emerging viruses.