Consortium seeks to provide centralized access to global TB data for research, diagnostic development
NEW YORK, Oct 08, 2015 -- An international team of investigators from various academic institutions, public health agencies, and nongovernmental organizations is developing a cloud-based repository of global tuberculosis data to support improved diagnostic development and clinical decision making.
The database is expected to include genotype, phenotype, and clinical information along with associated metadata such as geographic location, testing methodology, and phenotypic drug susceptibility testing results collected from TB patients around the world.
The so-called Rapid Drug Susceptibility Testing Consortium
(RDST) comprises investigators from several organizations
including the Foundation for Innovative New Diagnostics (FIND),
the Critical Path Institute (CPATH), the US Centers for Disease
Control and Prevention (CDC), and the World Health
Organization.
Among other activities, the
investigators are developing the tuberculosis relational
sequencing platform (ReSeqTB), a system that will offer access
to data and tools for identifying molecular mutations in TB
samples and exploring correlations between these variations and
drug susceptibility testing results.
The researchers explained in a paper
published recently in Clinical Infectious Diseases that the
planned repository addresses a communal need for a resource
capable of handling continuous collation, management, and
validation of both retrospective and prospective data on
Mycobacterium tuberculosis drug resistance. Moving
information that currently sits in siloed repositories
into a shared pool accessible to researchers would help expand
current knowledge on the genetic basis for resistance mutations,
they wrote in CID. It could help expose important information on
geographic variations associated with major mutations,
lineage-specific polymorphisms, and new mutations that arise as
a result of practices such as using standardized treatment
regimens, they said.
Access to this kind of
information would bolster efforts to develop more effective
diagnostic tools for rapidly detecting drug resistance, the
researchers wrote. Earlier this year, researchers associated
with Médecins Sans Frontières/Doctors Without Borders
reported the results of a study
in which they discovered that approximately 30 percent of MDR TB
strains collected during a 2009 outbreak in Swaziland contained
a mutation that could not be detected by most molecular tests of
drug resistance, including Cepheid's widely adopted GeneXpert
MTB/RIF test. Access to data on the geographical occurrence of
resistance mutations could help developers design more tailored
tests moving forward.
The repository could also boost
efforts to develop more potent treatments for recalcitrant TB
cases. According to statistics reported in the paper, of the 9
million new TB cases and 1.5 million TB-related deaths that
occurred in 2013, MDR TB — iterations of the disease that
resist two of the most effective first-line TB drugs —
accounted for an estimated 480,000 cases and 210,000 deaths that
year. Besides better diagnostics and therapies, these datasets
could also improve clinical decision making and even inform
national policy decisions for diagnosing and treating TB.
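To put the figures above in proportion, the short Python calculation below derives MDR TB's share of 2013 cases and deaths. Only the four counts come from the article; the variable and function names are illustrative.

```python
# 2013 figures as quoted in this article from the CID paper.
total_cases = 9_000_000
total_deaths = 1_500_000
mdr_cases = 480_000
mdr_deaths = 210_000


def share_percent(part, whole):
    """Return `part` as a percentage of `whole`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


mdr_case_share = share_percent(mdr_cases, total_cases)
mdr_death_share = share_percent(mdr_deaths, total_deaths)

print(f"MDR TB share of new cases: {mdr_case_share}%")
print(f"MDR TB share of deaths: {mdr_death_share}%")
```

On these numbers, MDR TB made up roughly 5 percent of new cases but about 14 percent of TB-related deaths.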
Marco
Schito, associate scientific director for CPATH's Critical Path
to TB Drug Regimens initiative and one of the authors of the CID
paper, told GenomeWeb that the consortium plans to make the
first iteration of ReSeqTB available on Amazon Web Services in
the US with the possibility of creating mirror sites at other
locations around the world later on.
Initially, the
database will be open to consortium members only, starting at
the end of the month, for early-access testing and to gather
feedback on ways to improve the system. Current non-members who
are interested in early access to the database are encouraged to
contact CPATH for details on how to join the consortium. The
consortium's current plan is to make ReSeqTB more broadly
available in October 2016.
ReSeqTB will build on the efforts of
existing repositories such as the
Tuberculosis Drug Resistance Mutation database
and others like it that already exist in the TB community. In
fact, RDST is actively partnering with developers of some of
these existing databases to incorporate their data into ReSeqTB,
Schito said. They plan to obtain the raw sequences that these
groups have collected and run them through an internally
developed computational pipeline that annotates the relevant
genes and re-calls variants.
This process will be
repeated for all samples that the consortium collects for
ReSeqTB. This way, the consortium controls the quality of the
data that feeds into the platform, which will help ensure
consistent, reproducible results across studies, Schito said.
For contributors who aren't comfortable with all of their
research data being made widely available, the consortium will
have mechanisms in place to provide access to their datasets in
aggregate form,
he added.
Part of ReSeqTB's development process
involved assembling two expert panels to provide guidance on how
to build the actual database architecture and to come up with
criteria for defining drug resistance variants, according to
Timothy Rodwell, FIND's senior scientific officer. Although he
is a member of the consortium, Rodwell is not one of the authors
on the CID paper. The first of these panels, the so-called input
group, comprised researchers with expertise in building
whole-genome sequencing analysis pipelines specifically for TB
data. Their task, Rodwell told GenomeWeb, was to design a
standardized computational analysis pipeline, which would be
used to analyze raw sequence data from patient isolates. The pipeline
is currently housed on a CDC server but will eventually be
co-located alongside the data stored on the cloud.
Proposed
guidelines needed to include specifics on analysis parameters
for tasks such as variant filtering as well as particulars on
input file specifications, SNP definitions, and ways of reaching
consensus in unclear cases such as when multiple variant callers
report different calls for a given position, he said. These
guidelines were then turned over to a CDC team that was tasked
with developing and validating the pipeline under the
supervision of James Posey, leader of the CDC's Applied Research
team.
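The article does not describe the consensus rules the consortium settled on. Purely as an illustration of the kind of decision such guidelines have to spell out, the Python sketch below resolves disagreement between variant callers at a single position by simple majority vote. The function name, the caller names, and the `min_agreement` threshold are all hypothetical and are not taken from the ReSeqTB pipeline.

```python
from collections import Counter


def consensus_call(calls, min_agreement=2):
    """Pick a consensus allele for one genomic position.

    `calls` maps a (hypothetical) caller name to the allele it reported,
    or None if that caller made no call. Returns the allele reported by
    the most callers, provided at least `min_agreement` callers agree and
    there is no tie for first place; otherwise returns None (unresolved).
    """
    reported = [allele for allele in calls.values() if allele is not None]
    if not reported:
        return None

    ranked = Counter(reported).most_common()
    top_allele, top_count = ranked[0]

    # A tie between the two most common alleles is treated as unresolved.
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return None

    return top_allele if top_count >= min_agreement else None


# Made-up example: three callers, two of which agree.
example = {"caller_a": "T", "caller_b": "T", "caller_c": "C"}
print(consensus_call(example))
```

A real pipeline would likely also weigh read depth and call quality; majority voting is only the simplest rule that makes the disagreement case explicit.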
A second
panel has been tasked with defining appropriate criteria for
determining the relationship between variants and drug
resistance. Members of the output group, as it's called, are
expected to come up with a validated set of important drug
resistance-related mutations that will serve as a standard for
testing the efficacy of diagnostic assays, Rodwell said. Another
task that the output group will tackle is establishing
criteria for determining the clinical relevance of TB mutations,
he said.
When it's completed, ReSeqTB will offer
tiered access to data depending on who is trying to use it. In
addition to assay developers, the list of potential users
includes researchers, clinicians, ministries of health, and
national tuberculosis programs, all of whom will be able to
tailor the system to return the kind of information that's most
useful to them. So clinicians, for example, would be able to
search for information on potential treatment options for
patients based on the specific mutations found in test samples,
while diagnostic developers who might be more interested in
which mutations are associated with geography-specific drug
resistance could search for those specific bits of
information.
For now, the consortium's primary focus
will be on providing data to researchers and diagnostics
developers for this first phase of ReSeqTB's development, Schito
said. Their efforts here include designing a user-friendly way
of reporting results to diagnostic developers, Rodwell said.
They are also exploring mechanisms for making the raw sequence
data easily accessible to researchers who may want to apply
their own algorithms and software to the ReSeqTB data rather
than use the consortium's pipeline. One possible option, Schito
said, is to make FASTQ files from ReSeqTB available in one of
the National Center for Biotechnology Information databases,
where high-volume users can easily download them.
If
the initial deployment to test developers and researchers goes
as planned, the consortium will then look into expanding access
to other user groups such as national healthcare systems,
clinicians' practices, and even patients and advocacy groups,
Schito said. With an eye towards expanding access, the
consortium has begun reaching out to some of these parties to
figure out what sorts of questions they might want to address
and gain a better sense of how the database could be of benefit,
he said.
The researchers are also continuing to
gather data to populate ReSeqTB. So far, they have gathered
information on about 5,000 isolates, and these will be the first
datasets hosted in the repository. Moving forward,
they will accept data from academic, governmental, and nonprofit
researchers as well as from clinical laboratories, clinical
trial sponsors, and countries performing drug resistance
surveys, according to the CID paper. When the repository goes
live, researchers will have access to the data under specific
use agreements, and contributors will always be able to access
and retain ownership of the datasets that they submit.
In terms of
specific contributions, the consortium is primarily interested,
at least for now, in datasets that include good phenotype data
in addition to genotype information. The reason for this, as
Schito explained, is to help clear up discrepancies between drug
resistance phenotypes and associated genotype data. Currently,
"we have all this phenotypic data so we know what's resistant
and what's susceptible but when we compare it with genotypic
data, we have all these discordances," Schito explained. Access
to good phenotype information could help researchers figure out
why these discordances occur, he said.
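As a rough sketch of what counting such discordances involves, the Python example below compares a phenotypic drug susceptibility result with a sequencing-based prediction for each isolate and tallies the four possible outcomes. The record layout, field names, and sample isolates are invented for illustration and do not reflect ReSeqTB's actual schema or data.

```python
from collections import Counter
from typing import NamedTuple


class Isolate(NamedTuple):
    isolate_id: str
    phenotype_resistant: bool  # result of phenotypic drug susceptibility testing
    mutation_detected: bool    # known resistance mutation found by sequencing


def classify(isolate):
    """Label one isolate by how its phenotype and genotype compare."""
    if isolate.phenotype_resistant and isolate.mutation_detected:
        return "concordant_resistant"
    if not isolate.phenotype_resistant and not isolate.mutation_detected:
        return "concordant_susceptible"
    if isolate.phenotype_resistant:
        return "resistant_without_mutation"
    return "susceptible_with_mutation"


def discordance_summary(isolates):
    """Return per-category counts and the overall discordance rate."""
    counts = Counter(classify(isolate) for isolate in isolates)
    total = sum(counts.values())
    discordant = (counts["resistant_without_mutation"]
                  + counts["susceptible_with_mutation"])
    rate = discordant / total if total else 0.0
    return counts, rate


# Made-up isolates, one per category.
sample = [
    Isolate("A", True, True),
    Isolate("B", True, False),
    Isolate("C", False, False),
    Isolate("D", False, True),
]
counts, rate = discordance_summary(sample)
print(dict(counts), rate)
```

Isolates in the two discordant categories are the ones for which good phenotype data matters most, since they point either to resistance mechanisms not yet catalogued or to mutations that do not confer resistance.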
The consortium
also hopes to capture information on patient outcomes in
ReSeqTB, a task which is difficult to do outside of the context
of clinical trials. Because of the disease's lengthy course and
equally lengthy treatment regimens, patients sometimes fail to
complete their therapy or drop one treatment protocol in favor
of another, making it difficult to
track treatment response. Schito told GenomeWeb that the
consortium is reaching out to some groups that are attempting to
track TB patient outcomes and will work with them to include
this information in future releases.
The Gates
Foundation provided the initial funding for the ReSeqTB project
— the exact amount is not being disclosed — with
CPATH and FIND as the main grantees. Part of the consortium's
mandate will be to figure out how best to sustain the database
in the long term, Schito told GenomeWeb. He said that consortium
members are mulling options such as charging commercial testing
labs in the US and other high-income countries a small fee for
access to the data. They also hope that global non-profit
organizations like the WHO and Gates Foundation will help
subsidize the cost of analyzing test results in lower-income,
high-disease-burden countries, he said.
Moving
forward, the developers will also publish additional details of
ReSeqTB and their activities. Rodwell told GenomeWeb that in
addition to the current CID paper, the consortium plans to
publish a white paper by the end of this year that will describe
its computational pipeline including details of its development
and construction as well as running parameters. "The whole point
of this entire process is to be completely transparent, make it
available publicly, and also get it peer-reviewed," he said.
Source:
GenomeWeb