CRISPR genome editing modifies genomes by introducing single- or double-stranded breaks (DSBs), which are repaired by native molecular pathways. Edits introduced through these DNA repair pathways can be characterized using targeted next generation sequencing (NGS). Accurate quantification of editing at both on- and off-target sites is paramount to developing applications of CRISPR. To enable easily accessible, accurate analysis of NGS data derived from CRISPR experiments, Integrated DNA Technologies, Inc. (IDT) has created and launched a validated, cloud-hosted software tool, CRISPAltRations, which analyzes these data more accurately than existing software tools.
CRISPAltRations is a software tool that is accessed through a web interface, the rhAmpSeq CRISPR Analysis Tool. The tool utilizes cloud-hosted computational resources for data processing. Briefly, the workflow is as follows:
- The tool identifies and merges read pairs from paired-end sequencing.
- The reads are binned to the expected amplicons resulting from targeted amplification library preparation (e.g., rhAmpSeq CRISPR Library Kit).
- The alignment of the read to the expected amplicon is refined using a Cas-enzyme specific aligner.
- Variants are called and summarized.
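The binning step in the workflow above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each merged read is assigned to an amplicon by an exact match to that amplicon's forward primer; the function name `bin_reads` and the primer sequences are hypothetical, and the source does not describe how CRISPAltRations performs this step. Production tools typically tolerate mismatches and check both primers.

```python
def bin_reads(reads, primers):
    """Assign each merged read to the amplicon whose forward primer it starts with.

    reads:   list of read sequences (strings)
    primers: dict mapping a hypothetical amplicon name -> forward primer sequence
    Returns (bins, unassigned), where bins maps amplicon name -> list of reads.
    """
    bins = {name: [] for name in primers}
    unassigned = []
    for read in reads:
        for name, primer in primers.items():
            # Exact prefix match is a simplifying assumption.
            if read.startswith(primer):
                bins[name].append(read)
                break
        else:
            # No primer matched: keep the read aside instead of dropping it silently.
            unassigned.append(read)
    return bins, unassigned


# Made-up primers and reads, for illustration only.
primers = {"site_A": "ACGTAC", "site_B": "TTGGCA"}
reads = ["ACGTACGGGT", "TTGGCAAACC", "GGGGGGGGGG", "ACGTACTTTT"]
bins, unassigned = bin_reads(reads, primers)
```

Keeping unassigned reads separate makes it easy to spot panels in which a large share of reads fails to map to any expected amplicon.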
Although these steps are relatively common to most software tools that analyze NGS data derived from CRISPR screens, CRISPAltRations has a number of improvements that enable higher accuracy of variant detection, including:
- a Cas-specific aligner
- an optimized default variant detection window
- systematically validated parameters for the open-source tools it uses, to provide high-quality results.
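To make the idea of a variant detection window concrete, the sketch below counts a read as edited only if at least one of its indels overlaps a window centered on the predicted cut site. The names (`Indel`, `overlaps_window`, `percent_edited`) and the half-width of 10 bases are assumptions chosen for illustration; the source does not state the default window size or how CRISPAltRations implements the filter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Indel:
    start: int   # 0-based position on the amplicon reference
    length: int  # reference bases deleted (0 for an insertion)
    kind: str    # "ins" or "del"


def overlaps_window(indel, cut_site, half_width):
    """True if the indel touches the window [cut_site - half_width, cut_site + half_width)."""
    win_start = cut_site - half_width
    win_end = cut_site + half_width
    if indel.kind == "ins":
        # An insertion sits between reference bases start - 1 and start.
        return win_start <= indel.start <= win_end
    indel_end = indel.start + indel.length  # exclusive end of the deleted span
    return indel.start < win_end and indel_end > win_start


def percent_edited(reads, cut_site, half_width):
    """Percent of reads carrying at least one indel inside the window.

    reads: list of per-read indel lists (an empty list means no indel called).
    """
    edited = sum(
        1 for indels in reads
        if any(overlaps_window(i, cut_site, half_width) for i in indels)
    )
    return 100.0 * edited / len(reads)


# Made-up example: cut site at position 50, hypothetical half-width of 10.
reads = [
    [Indel(48, 3, "del")],  # spans the cut site -> counted
    [Indel(5, 1, "del")],   # far from the cut site -> treated as noise
    [],                     # unedited read
    [Indel(50, 0, "ins")],  # insertion at the cut site -> counted
]
result = percent_edited(reads, cut_site=50, half_width=10)
```

The window width is a trade-off: a window that is too wide counts sequencing or PCR errors far from the cut site as editing, while one that is too narrow misses genuine indels.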
To better understand the impact of these improvements, we developed a set of synthetic datasets to validate the accuracy of annotating on- and off-target editing (11 on-target sites; 592 off-target sites) and the accuracy of annotating mutations introduced through the homology directed repair (HDR) pathway at on-target loci (91 on-target sites). The performance of CRISPAltRations was compared to other published software tools, such as Amplican and CRISPResso2. For on/off-target characterization, CRISPAltRations quantified the percentage of indels to within <0.1% deviation from expectation for 99.5% of target sites. Alternative workflows such as Amplican/CRISPResso2 could not reach this level of accuracy even with a higher threshold for error (<2% deviation) (Figure 1).
For on-target HDR characterization, we compared the performance of CRISPAltRations and CRISPResso2. CRISPResso2 was unable to complete analysis at 4.3% of targets and overestimated perfect HDR events by >3% at 38% of targets (Figure 2). CRISPAltRations was highly accurate on this dataset, with <2% deviation from the expected percentage of perfect HDR events, and distinguished editing events derived from the HDR (imperfect) pathway from those derived from the non-homologous end joining (NHEJ) pathway better than CRISPResso2 did (Figure 2).
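The accuracy figures above are of the form "fraction of sites whose measured indel percentage falls within a threshold of the expected value". A minimal sketch of that metric follows; the function name `fraction_within` and the numbers are invented for illustration and are not taken from the benchmark.

```python
def fraction_within(observed, expected, threshold):
    """Fraction of sites where |observed - expected| is below the threshold.

    observed, expected: per-site indel percentages, in the same site order.
    threshold: allowed absolute deviation, in percentage points.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have one value per site")
    hits = sum(1 for o, e in zip(observed, expected) if abs(o - e) < threshold)
    return hits / len(observed)


# Made-up values for four sites (percent indels).
expected = [10.0, 25.0, 0.0, 50.0]
observed = [10.0, 25.05, 0.0, 51.5]

strict = fraction_within(observed, expected, threshold=0.1)  # tight tolerance
loose = fraction_within(observed, expected, threshold=2.0)   # relaxed tolerance
```

With these toy numbers, three of the four sites pass the 0.1 threshold and all four pass the 2.0 threshold, which mirrors how the same tool can look very different under a strict versus a relaxed error tolerance.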
Experimental recommendations and limits
To further guide experimental design, we generated a series of recommendations for using CRISPAltRations. First, we investigated the read depth required to accurately annotate editing at different levels of sensitivity. To do this, we subsampled a series of rhAmpSeq panels with various amounts of on- and off-target editing and compared the annotated indels of each subsample to those of the original sample. Generally, there is an inverse correlation between editing efficiency and the number of reads needed for quantification. With our tool, editing annotation can reach ~0.5% editing with only 1000 reads per target (Figure 3). With increased read depth and subtraction of background signal in an unedited control, editing annotation can reach ~0.1% in ideal scenarios. However, background indel noise depends on several factors, such as sequence context, sequencer run, and library preparation. Thus, claims of low-level editing should be supported by an appropriate statistical test or other advanced methods to ensure confidence in the result. Our recommendations are based on using the rhAmpSeq Library Kit followed by 2 x 150 sequencing on an Illumina MiSeq™ with v3 chemistry.
To make CRISPAltRations more broadly accessible, we developed a web user interface (UI) that utilizes cloud resources for data processing and storage (Figure 4). This ensures that researchers are not restricted by two commonly encountered burdens: a lack of programming knowledge or bioinformatics personnel, and a lack of suitable computational resources. This interface enables users to upload data by streaming from local hardware, streaming from cloud resources (AWS/Google/BaseSpace), or simply “dragging-and-dropping” into the web interface. The interface can run thousands of samples simultaneously and includes interactive visualization of the generated results. Additionally, researchers can easily export results to commonly used programs (e.g., Excel) to enable integration into other existing graphing software to meet custom needs. By providing version-controlled, tested, and easily accessible analysis software, we hope to empower the scientific community to use NGS to evaluate the effects of on- and off-target editing resulting from CRISPR genome editing.
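As one example of a statistical test for low-level editing, the sketch below implements a one-sided Fisher's exact test that compares edited-read counts in a treated sample against an unedited control. This is an illustrative choice, not necessarily the test the authors have in mind, and the function name `fisher_one_sided` is hypothetical. It uses only the Python standard library.

```python
from math import comb


def fisher_one_sided(edited_treated, total_treated, edited_control, total_control):
    """One-sided Fisher's exact test p-value.

    Null hypothesis: treated and control samples have the same indel rate.
    Returns the probability of seeing at least `edited_treated` edited reads in
    the treated sample, given the pooled number of edited reads.
    """
    n_total = total_treated + total_control
    n_edited = edited_treated + edited_control
    # Number of ways to choose which reads belong to the treated sample.
    denom = comb(n_total, total_treated)
    upper = min(n_edited, total_treated)
    # Sum hypergeometric probabilities for outcomes at least as extreme.
    numer = sum(
        comb(n_edited, x) * comb(n_total - n_edited, total_treated - x)
        for x in range(edited_treated, upper + 1)
    )
    return numer / denom


# 0.5% editing at 1000 reads is only 5 edited reads; control shows none.
p_low = fisher_one_sided(5, 1000, 0, 1000)
# The same 5 edited reads against a control with 5 noisy reads.
p_noisy = fisher_one_sided(5, 1000, 5, 1000)
```

In this toy case, 5 edited reads out of 1000 against a clean control gives a p-value of roughly 0.03, while the same count against a control with equal background noise is not significant at all. This is why an unedited control and adequate read depth matter near the detection limit.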
Here we develop, validate, and deploy a software analysis tool, CRISPAltRations, for accurate quantification of genome editing from CRISPR experiments. We show that a combination of novel features, optimized parameters, and systematically tested code enables us to improve our ability to annotate editing from DSB events. By comparing to other CRISPR analysis tools, we show that CRISPAltRations outperforms other tools for characterization of NHEJ and HDR activity at on/off-target locations. By providing experimental recommendations for optimal performance and developing a “point-and-click” interface we hope to set researchers up for success, regardless of scientific background. We are excited to see what you can do with high-quality genome editing specificity data. Check out the rhAmpSeq CRISPR Analysis System, and get started on your CRISPR analysis.