Overview of detection methods

The repository containing the main code for quantifying HHV-6 in single-cell data is available here.

The repository containing the main code for quantifying EBV in whole genomes is available here.

Use Serratus to perform a human virus detection screen

For a digest on how to screen all RNA-seq libraries in Serratus for all human viruses, see the repository here for reproducing the pan-virus screen.

EBV reference genome

The 171,823-nucleotide EBV genome (NC_007605.1) was first included in the human reference genome in December 2013 (hg38, GCA_000001405.15) as a sink for off-target reads. Its purpose was to absorb the pervasive EBV reads found in sequencing libraries built from EBV-immortalized lymphoblastoid cell lines (LCLs), as in the 1000 Genomes Project and related consortia. Importantly, whole genome sequencing (WGS) in the UK Biobank (UKB) and All of Us (AoU) consortia was performed on whole blood, so EBV reads detected in these cohorts should derive from viral DNA left by prior infections rather than from cell-line immortalization.

Although reads can, for the most part, be extracted directly from the EBV contig, we found highly repetitive regions that made it difficult to distinguish true viral reads from background contamination. We recommend masking regions whose coverage exceeds the median coverage in the data by more than an order of magnitude. For our study, we masked the regions below, which can serve as a quick start for doing this yourself (R code):

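# Positions on the EBV contig to mask; unique() collapses the overlapping and repeated ranges
wipe_here <- sort(unique(c(36389:36516, 52012:52034, 95997:96037, 163596:163617, 36390:36514, 95997:96037)))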
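The masking rule (drop regions covered more than about ten times the median) can be sketched in base R as follows. This is a minimal illustration, not the code used in our study. The helper names are hypothetical, `depth` is assumed to be a per-base depth vector for the EBV contig pooled across many libraries (so that the median is non-zero), and reads are reduced to their start positions for simplicity.

```r
# Hypothetical helpers (not part of the original pipeline).
# `depth` is assumed to be a per-base depth vector for the EBV contig,
# pooled across many libraries, where depth[i] is the coverage at position i.
flag_high_coverage <- function(depth, fold = 10) {
  cutoff <- fold * median(depth)
  if (cutoff == 0) {
    stop("median depth is zero; pool coverage across libraries first")
  }
  # Positions covered more than `fold` times the median
  which(depth > cutoff)
}

# Collapse integer positions into contiguous start/end ranges
positions_to_ranges <- function(pos) {
  pos <- sort(unique(pos))
  if (length(pos) == 0) {
    return(data.frame(start = integer(0), end = integer(0)))
  }
  # A new run begins wherever consecutive positions differ by more than 1
  run_id <- cumsum(c(1, diff(pos) != 1))
  data.frame(
    start = as.integer(tapply(pos, run_id, min)),
    end   = as.integer(tapply(pos, run_id, max))
  )
}

# Simplification: a read is represented by its start position only, and is
# dropped when that position falls inside the mask
drop_masked_reads <- function(read_pos, mask) {
  read_pos[!(read_pos %in% mask)]
}
```

For example, `positions_to_ranges(wipe_here)` collapses the quick-start mask into four ranges starting at 36389, 52012, 95997 and 163596. A production version would more likely test read overlap with each masked range (for example with interval tools) rather than start positions alone.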

HHV-6B reference genome

To detect HHV-6 transcripts in existing single-cell data, we developed a kallisto|bustools workflow that rapidly quantifies reads from either single-cell or bulk transcriptomes. We downloaded the GenBank AF157706 reference transcriptome and built a kallisto index with the default k-mer length (-k 31). The best place to reproduce our HHV-6 pseudoalignment pipeline is here. For single-cell libraries, raw sequencing reads were processed with the kallisto bus command, using the settings appropriate to each version of the single-cell chemistry (a 14- or 16-bp cell barcode and a 10- or 12-bp UMI).
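One way to pick the chemistry-specific settings is a small lookup table, sketched below. The barcode/UMI lengths are those of the 10x Chromium v1, v2 and v3 chemistries, and `10xv1`, `10xv2` and `10xv3` are technology names accepted by `kallisto bus -x`. The helper functions, index name and FASTQ names are hypothetical and are not part of our pipeline.

```r
# Barcode/UMI geometry of the 10x Chromium chemistries, keyed by the
# technology name passed to `kallisto bus -x`
chemistries <- data.frame(
  technology  = c("10xv1", "10xv2", "10xv3"),
  barcode_len = c(14L, 16L, 16L),
  umi_len     = c(10L, 10L, 12L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the technology matching a barcode/UMI geometry
pick_technology <- function(barcode_len, umi_len) {
  hit <- chemistries$technology[chemistries$barcode_len == barcode_len &
                                  chemistries$umi_len == umi_len]
  if (length(hit) != 1) {
    stop("no unique chemistry for this barcode/UMI geometry")
  }
  hit
}

# Hypothetical helper: assemble the argument vector for `kallisto bus`
kallisto_bus_args <- function(index, out_dir, technology, fastqs) {
  c("bus", "-i", index, "-o", out_dir, "-x", technology, fastqs)
}
```

The arguments can then be handed to the shell, for example `system2("kallisto", kallisto_bus_args("HHV6B.idx", "bus_out", pick_technology(16, 12), c("sample_R1.fastq.gz", "sample_R2.fastq.gz")))`. Keeping the geometry in a table makes it easy to check which chemistry a library was run with before launching many jobs.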

After barcode and UMI correction, a plain-text sparse matrix was written, counting the unique HHV-6B reads that map to each cell in the single-cell sequencing library. For bulk libraries, the same index can be used with a standard kallisto quant run.
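The outputs can then be summarized into per-cell or per-library HHV-6B counts. The sketch below assumes the single-cell matrix is a Matrix Market coordinate file with cells as rows, accompanied by a file with one cell barcode per line. For bulk libraries it sums the `est_counts` column of the `abundance.tsv` file written by `kallisto quant`. Function and file names are hypothetical, and a dedicated reader such as `Matrix::readMM` would work equally well for the first step.

```r
# Minimal base-R reader for a Matrix Market coordinate file; returns one row
# per non-zero entry (row index, column index, value)
read_mtx_triplets <- function(path) {
  lines <- readLines(path)
  lines <- lines[!startsWith(lines, "%")]  # drop the header and comment lines
  # The first remaining line holds the dimensions; entries follow it
  if (length(lines) < 2) {
    return(data.frame(row = integer(0), col = integer(0), value = numeric(0)))
  }
  read.table(text = lines[-1], col.names = c("row", "col", "value"))
}

# Hypothetical helper: total HHV-6B counts per cell, assuming rows are cells
hhv6_counts_per_cell <- function(mtx_path, barcodes_path) {
  trip <- read_mtx_triplets(mtx_path)
  barcodes <- readLines(barcodes_path)
  # Sum across all HHV-6B columns for each cell; cells with no entries get NA
  per_cell <- tapply(trip$value, factor(trip$row, levels = seq_along(barcodes)), sum)
  per_cell <- as.numeric(per_cell)
  per_cell[is.na(per_cell)] <- 0
  names(per_cell) <- barcodes
  per_cell
}

# Hypothetical helper: total estimated HHV-6B counts in one bulk library,
# read from the est_counts column of kallisto quant's abundance.tsv
bulk_hhv6_counts <- function(abundance_path) {
  ab <- read.delim(abundance_path)
  sum(ab$est_counts)
}
```

Cells with a non-zero total from `hhv6_counts_per_cell` are the candidate HHV-6B-positive cells. Note that `est_counts` from `kallisto quant` can be fractional, because reads compatible with several targets are apportioned among them.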