Preprint in English | medRxiv | ID: ppmedrxiv-21264695


The SARS-CoV-2 ARTIC amplicon protocol is the most widely used genome sequencing method for SARS-CoV-2, accounting for over 43% of publicly available genome sequences. The protocol uses 98 primers to amplify ~400 bp fragments of the SARS-CoV-2 genome, covering all 30,000 bases. Understanding the analytical performance metrics of this protocol will improve how the data are used and interpreted. Different concentrations of SARS-CoV-2 control material were used to establish the limit of detection (LoD) of the ARTIC protocol. Results demonstrated the LoD was a minimum of 25-50 virus particles per mL. The sensitivity of ARTIC was comparable to the published sensitivities of commercial diagnostic assays, and the protocol could therefore be used to confirm diagnostic testing results. A set of over 3,600 clinical samples from three UK regions was then evaluated to compare the protocol's performance to clinical diagnostic assays (Roche Lightcycler 480 II, AusDiagnostics, Roche Cobas, Hologic Panther, Corman RdRp, Roche Flow, ABI QuantStudio 5, Seegene Nimbus, Qiagen Rotorgene, Abbott M2000, Thermo TaqPath, Xpert). We developed a Python tool, RonaLDO, to perform this validation (available under the GNU GPL3 open-source licence). Positives detected by diagnostic platforms were generally supported by sequencing data; platforms that used RT-qPCR were the best predictors of whether the sample would subsequently sequence successfully. To maximise success of sample sequencing for phylogenetic analysis, samples with Ct <31 should be chosen. For diagnostic tests that do not provide a quantifiable Ct value, adding a quantification step is recommended. The ARTIC SARS-CoV-2 sequencing protocol is highly sensitive, capable of detecting SARS-CoV-2 in samples with Ct values in the high 30s. However, to routinely obtain whole genome coverage, samples with Ct <31 are recommended. Comparing different virus detection methods close to their LoD was challenging and significant discordance was observed.
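The Ct-based selection rule described above can be illustrated with a minimal Python sketch. This is not RonaLDO or any part of the authors' pipeline: the function name `triage_for_sequencing`, the sample identifiers, the Ct values and the category labels are all hypothetical, and only the Ct <31 cut-off and the advice to quantify samples lacking a Ct value come from the abstract.

```python
from typing import Optional

# Cut-off recommended in the abstract for routinely obtaining whole genome coverage.
CT_CUTOFF = 31.0


def triage_for_sequencing(ct: Optional[float], cutoff: float = CT_CUTOFF) -> str:
    """Classify one positive sample by its diagnostic Ct value (hypothetical helper)."""
    if ct is None:
        # The diagnostic platform reported no quantifiable Ct value:
        # the abstract recommends adding a quantification step first.
        return "quantify_first"
    if ct < cutoff:
        # Below the cut-off: good candidate for phylogenetic-quality sequencing.
        return "sequence"
    # At or above the cut-off: virus may still be detectable,
    # but whole genome coverage is less likely.
    return "low_priority"


# Made-up samples: (sample identifier, Ct value or None if not reported).
samples = [("S1", 18.4), ("S2", 31.0), ("S3", None), ("S4", 36.7)]
decisions = {sample_id: triage_for_sequencing(ct) for sample_id, ct in samples}
print(decisions)
```

Note that the comparison is strict, so a sample with Ct exactly 31 falls outside the recommended set in this sketch; a laboratory applying such a rule would choose its own boundary handling.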

Preprint in English | medRxiv | ID: ppmedrxiv-20218446


The UK's COVID-19 epidemic during early 2020 was one of the world's largest and was unusually well represented by virus genomic sampling. Here we reveal the fine-scale genetic lineage structure of this epidemic through analysis of 50,887 SARS-CoV-2 genomes, including 26,181 from the UK sampled throughout the country's first wave of infection. Using large-scale phylogenetic analyses, combined with epidemiological and travel data, we quantify the size, spatio-temporal origins and persistence of genetically distinct UK transmission lineages. Rapid fluctuations in virus importation rates resulted in >1000 lineages; those introduced prior to national lockdown were larger and more dispersed. Lineage importation and regional lineage diversity declined after lockdown, whilst lineage elimination was size-dependent. We discuss the implications of our genetic perspective on transmission dynamics for COVID-19 epidemiology and control.

Preprint in English | bioRxiv | ID: ppbiorxiv-328328


Genomic epidemiology has become an increasingly common tool for epidemic response. Recent technological advances have made it possible to sequence genomes rapidly enough to inform outbreak response, and cheaply enough to justify dense sampling of even large epidemics. With increased availability of sequencing it is possible for agile networks of sequencing facilities to collaborate on the sequencing and analysis of epidemic genomic data. In response to the ongoing SARS-CoV-2 pandemic in the United Kingdom, the COVID-19 Genomics UK (COG-UK) consortium was formed with the aim of rapidly sequencing SARS-CoV-2 genomes as part of a national-scale genomic surveillance strategy. The network consists of universities, academic institutes, regional sequencing centres and the four UK Public Health Agencies. We describe the development and deployment of Majora, an encompassing digital infrastructure to address the challenge of collecting and integrating both genomic sequencing data and sample-associated metadata produced across the COG-UK network. The system was designed and implemented pragmatically to stand up capacity rapidly in a pandemic caused by a novel virus. This approach has underpinned the success of COG-UK, which has rapidly become the leading contributor of SARS-CoV-2 genomes to international databases and has generated over 60,000 sequences to date.

Preprint in English | medRxiv | ID: ppmedrxiv-20166082


Global dispersal and increasing frequency of the SARS-CoV-2 Spike protein variant D614G are suggestive of a selective advantage but may also be due to a random founder effect. We investigate the hypothesis for positive selection of Spike D614G in the United Kingdom using more than 25,000 whole genome SARS-CoV-2 sequences. Despite the availability of a large data set, well represented by both Spike 614 variants, not all approaches showed a conclusive signal of positive selection. Population genetic analysis indicates that 614G increases in frequency relative to 614D in a manner consistent with a selective advantage. We do not find any indication that patients infected with the Spike 614G variant have higher COVID-19 mortality or clinical severity, but 614G is associated with higher viral load and younger age of patients. Significant differences in growth and size of 614G phylogenetic clusters indicate a need for continued study of this variant.
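One common way to express a frequency increase "consistent with a selective advantage" is through the log-odds of the variant frequency: under a simple two-variant logistic growth model with a constant relative growth advantage, the log-odds of the variant frequency rise linearly in time, and the slope estimates that advantage. The Python sketch below fits this slope by ordinary least squares. The choice of model, the function names and the weekly counts are illustrative assumptions, not the authors' analysis or data.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a frequency; requires 0 < p < 1."""
    return math.log(p / (1.0 - p))


def fit_log_odds_slope(times, freqs):
    """Ordinary least-squares slope of logit(frequency) against time.

    Under the assumed logistic model, this slope is the relative growth
    advantage of the variant per unit of time.
    """
    ys = [logit(p) for p in freqs]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(ys) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, ys))
    sxx = sum((t - t_mean) ** 2 for t in times)
    return sxy / sxx


# Made-up weekly genome counts: (number of 614G, number of 614D).
weeks = [0, 1, 2, 3, 4]
counts = [(20, 80), (35, 65), (50, 50), (65, 35), (80, 20)]
freqs = [g / (g + d) for g, d in counts]

slope = fit_log_odds_slope(weeks, freqs)
print(f"estimated log-odds slope per week: {slope:.3f}")
```

A positive slope alone does not distinguish selection from a founder effect, which is the ambiguity the abstract highlights; the sketch only shows how the frequency trend itself can be summarised.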

Preprint in English | medRxiv | ID: ppmedrxiv-20128249


Brazil currently has one of the fastest growing SARS-CoV-2 epidemics in the world. Due to limited available data, assessments of the impact of non-pharmaceutical interventions (NPIs) on virus transmission and epidemic spread remain challenging. We investigate the impact of NPIs in Brazil using epidemiological, mobility and genomic data. Mobility-driven transmission models for Sao Paulo and Rio de Janeiro cities show that the reproduction number (Rt) fell below 1 following NPIs but slowly increased to values between 1 and 1.3 (1.0-1.6). Genome sequencing of 427 new genomes and analysis of a geographically representative genomic dataset from 21 of the 27 Brazilian states identified >100 international introductions of SARS-CoV-2 in Brazil. We estimate that three clades introduced from Europe emerged between 22 and 27 February 2020, and were already well established before the implementation of NPIs and travel bans. During this first phase of the epidemic establishment of SARS-CoV-2 in Brazil, we find that the virus spread mostly locally and within state borders. Despite sharp decreases in national air travel during this period, we detected a 25% increase in the average distance travelled by air passengers. This coincided with the spread of SARS-CoV-2 from large urban centres to the rest of the country. In conclusion, our results shed light on the role of large and highly connected populated centres in the rapid ignition and establishment of SARS-CoV-2, and provide evidence that current interventions remain insufficient to keep virus transmission under control in Brazil. One Sentence Summary: Joint analysis of novel genomic, mobility and epidemiological data provides unique insight into the spread and transmission of the rapidly evolving epidemic of SARS-CoV-2 in Brazil.

Preprint in English | bioRxiv | ID: ppbiorxiv-118992


Extensive global sampling and whole genome sequencing of the pandemic virus SARS-CoV-2 have enabled researchers to characterise its spread, and to identify mutations that may increase transmission or enable the virus to escape therapies or vaccines. Two important components of viral spread are how frequently variants arise within individuals, and how likely they are to be transmitted. Here, we characterise the within-host diversity of SARS-CoV-2, and the extent to which genetic diversity is transmitted, by quantifying variant frequencies in 1390 clinical samples from the UK, many from individuals in known epidemiological clusters. We show that SARS-CoV-2 infections are characterised by low levels of within-host diversity across the entire viral genome, with evidence of strong evolutionary constraint in Spike, a key target of vaccines and antibody-based therapies. Although within-host variants can be observed in multiple individuals in the same phylogenetic or epidemiological cluster, highly infectious individuals with high viral load carry only a limited repertoire of viral diversity. Most viral variants are either lost, or occasionally fixed, at the point of transmission, consistent with a narrow transmission bottleneck. These results suggest potential vaccine-escape mutations are likely to be rare in infectious individuals. Nonetheless, we identified Spike variants present in multiple individuals that may affect receptor binding or neutralisation by antibodies. Since the fitness advantage of escape mutations in highly-vaccinated populations is likely to be substantial, resulting in rapid spread if and when they do emerge, these findings underline the need for continued vigilance and monitoring.
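Quantifying within-host variant frequencies generally comes down to comparing read counts at each genome position. The Python sketch below is a minimal illustration of that idea, not the authors' method: the function names, the minimum depth of 100 reads, the 3% minor-variant threshold, the 50% majority boundary and the read counts are all hypothetical assumptions.

```python
from typing import Optional


def variant_frequency(alt_reads: int, depth: int, min_depth: int = 100) -> Optional[float]:
    """Fraction of reads carrying the alternate base at one position.

    Returns None when the position is too shallowly covered for the
    frequency to be trusted (assumed threshold).
    """
    if depth < min_depth:
        return None
    return alt_reads / depth


def classify(freq: Optional[float], min_freq: float = 0.03) -> str:
    """Label a position by its variant frequency (assumed thresholds)."""
    if freq is None:
        return "uncallable"
    if freq < min_freq:
        return "absent"      # indistinguishable from sequencing error here
    if freq < 0.5:
        return "minor"       # within-host minority variant
    return "majority"        # variant would appear in the consensus sequence


# Made-up read counts at one position in a putative donor and recipient:
# (reads with alternate base, total depth).
donor = (40, 1000)
recipient = (0, 800)

donor_call = classify(variant_frequency(*donor))
recipient_call = classify(variant_frequency(*recipient))
print(donor_call, recipient_call)
```

In this made-up pair the variant is a minority variant in the donor and absent in the recipient, the pattern the abstract describes as most variants being lost at the point of transmission.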