We have developed the Open Targets Platform to provide summaries of the evidence for the involvement of a specific gene in a selected disease.
Open Targets Genetics is a portal for investigating genome-wide association study (GWAS) data to help identify the causal genes underlying each association, and hence to prioritise drug targets. The portal aggregates genetic associations curated from the literature and newly derived loci from UK Biobank, and merges them with open-source functional genomics data, including epigenetic data (e.g., chromatin conformation and chromatin interactions) and quantitative trait loci (e.g., eQTLs from GTEx, and pQTLs). It applies statistical fine-mapping across thousands of trait-associated loci to resolve association signals, and links each variant to its proximal and distal target gene(s) using a single evidence score.
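To illustrate what fine-mapping a single locus involves, the sketch below applies one standard technique, Wakefield's approximate Bayes factor under the assumption of exactly one causal variant per locus, to GWAS summary statistics (an effect size and standard error per variant). It is a minimal sketch, not the portal's actual pipeline: the prior standard deviation of 0.15, the 95% credible-set coverage, and the variant IDs and values in the example are illustrative assumptions.

```python
import math

def log_abf(beta: float, se: float, prior_sd: float = 0.15) -> float:
    """Natural log of Wakefield's approximate Bayes factor (association vs. no association)."""
    v = se ** 2            # sampling variance of the effect estimate
    w = prior_sd ** 2      # assumed prior variance of the true effect
    r = w / (v + w)
    z = beta / se
    return 0.5 * (math.log(1.0 - r) + r * z ** 2)

def posterior_probs(variants):
    """Posterior probability that each variant is causal, assuming one causal
    variant per locus and an equal prior probability for every variant."""
    labf = {vid: log_abf(beta, se) for vid, beta, se in variants}
    top = max(labf.values())   # subtract the maximum before exp() to avoid overflow
    weights = {vid: math.exp(x - top) for vid, x in labf.items()}
    total = sum(weights.values())
    return {vid: wt / total for vid, wt in weights.items()}

def credible_set(pps, coverage=0.95):
    """Smallest set of top-ranked variants whose posterior probabilities reach `coverage`."""
    chosen, cumulative = [], 0.0
    for vid, pp in sorted(pps.items(), key=lambda kv: kv[1], reverse=True):
        chosen.append(vid)
        cumulative += pp
        if cumulative >= coverage:
            break
    return chosen

# Hypothetical summary statistics: (variant ID, beta, standard error)
locus = [("var1", 0.30, 0.03), ("var2", 0.29, 0.03), ("var3", 0.05, 0.03)]
pps = posterior_probs(locus)
print({vid: round(pp, 3) for vid, pp in pps.items()})
print(credible_set(pps))
```

Working on the log scale and subtracting the largest log Bayes factor keeps the normalisation numerically stable even when z-scores are large.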
epiChoose is an application for quantifying the relatedness between cell lines and primary cells. We have extensively profiled a number of commonly used cell line models across several tissues. This profiling consists of whole-genome epigenetic (histone modification, CTCF, ATAC-seq) and transcriptional (RNA-seq) measurements. From these data, we have established a platform that reports the “distance” between candidate cell models and target primary cells. By ranking and selecting cell line models in this data-driven manner, the best model for an experiment can be chosen on the basis of epigenetic and transcriptional evidence rather than historical usage.
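The “distance”-based ranking can be sketched as follows, assuming each sample is summarised as a numeric vector (for example, signal over a common set of genomic regions) and using 1 minus the Pearson correlation as the distance. epiChoose's actual features and metric are not described here, so both choices, as well as the sample names and values, are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length numeric vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def rank_models(target, candidates):
    """Rank candidate cell lines by correlation distance (1 - r) to the target
    primary-cell profile; a smaller distance means a more similar model."""
    distances = [(name, 1.0 - pearson(profile, target)) for name, profile in candidates.items()]
    return sorted(distances, key=lambda item: item[1])

# Hypothetical signal at four genomic regions
primary_cell = [5.0, 1.0, 3.0, 0.5]
cell_lines = {
    "lineA": [4.8, 1.2, 2.9, 0.7],
    "lineB": [0.5, 4.0, 1.0, 5.0],
}
for name, dist in rank_models(primary_cell, cell_lines):
    print(f"{name}\t{dist:.3f}")
```

Correlation distance is insensitive to global shifts and scaling between samples, which is one reason it is a common choice for comparing genome-wide profiles.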
Genome editing by CRISPR-Cas9 technology allows large-scale screening of gene essentiality in cancer. A confounding factor when interpreting CRISPR-Cas9 screens is the high false-positive rate in detecting essential genes within copy-number-amplified regions of the genome. We have developed the computational tool CRISPRcleanR, which identifies and corrects gene-independent responses to CRISPR-Cas9 targeting. CRISPRcleanR uses an unsupervised approach based on segmenting single-guide RNA (sgRNA) fold-change values across the genome, without making any assumption about the copy-number status of the targeted genes. CRISPRcleanR is implemented as an R package and as an interactive Python package, with full documentation, tutorials and built-in datasets to reproduce the results of the original publication, and both are publicly available (R package: https://github.com/francescojm/CRISPRcleanR; Python package: https://github.com/cancerit/pyCRISPRcleanR). The Python implementation is Dockerized, making it platform-independent and usable in cloud environments (https://dockstore.org/containers/quay.io/wtsicgp/dockstore-pycrisprcleanr).
Please see Iorio et al. 2018 for more information.
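The idea behind this kind of correction can be illustrated with a simplified sketch: order sgRNAs by genomic position, segment their log fold changes (LFCs), and re-centre segments that span several distinct genes and share a strong shift. A run of depleted guides across many neighbouring genes points to a gene-independent (for example, copy-number-driven) effect rather than essentiality. This is not CRISPRcleanR's implementation: a naive recursive binary segmentation is swapped in, and the parameters `min_size`, `min_gain`, `min_genes` and `bias_threshold`, as well as the example data, are our own assumptions.

```python
from statistics import mean

def sse(values):
    """Sum of squared deviations from the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def binary_segment(values, min_size=3, min_gain=1.0):
    """Recursively split `values` where the drop in squared error is largest.
    Returns half-open (start, end) index ranges. A naive stand-in for proper segmentation."""
    def split(lo, hi):
        if hi - lo < 2 * min_size:
            return [(lo, hi)]
        base = sse(values[lo:hi])
        best_gain, best_k = 0.0, None
        for k in range(lo + min_size, hi - min_size + 1):
            gain = base - sse(values[lo:k]) - sse(values[k:hi])
            if gain > best_gain:
                best_gain, best_k = gain, k
        if best_k is None or best_gain < min_gain:
            return [(lo, hi)]
        return split(lo, best_k) + split(best_k, hi)
    return split(0, len(values))

def correct_fold_changes(guides, min_genes=3, bias_threshold=0.5):
    """guides: (gene, position, LFC) tuples for one chromosome.
    Re-centres segments spanning >= min_genes distinct genes whose absolute mean LFC
    exceeds bias_threshold (hypothetical parameters)."""
    if not guides:
        return []
    guides = sorted(guides, key=lambda g: g[1])      # order sgRNAs along the chromosome
    lfc = [g[2] for g in guides]
    corrected = list(lfc)
    for lo, hi in binary_segment(lfc):
        genes = {g[0] for g in guides[lo:hi]}
        seg_mean = mean(lfc[lo:hi])
        # A shared shift across many neighbouring genes suggests a gene-independent effect;
        # a strongly depleted single gene (e.g. an essential gene) is left untouched.
        if len(genes) >= min_genes and abs(seg_mean) > bias_threshold:
            for i in range(lo, hi):
                corrected[i] = lfc[i] - seg_mean
    return [(gene, pos, value) for (gene, pos, _), value in zip(guides, corrected)]

# Hypothetical chromosome: three neutral genes, three genes in an amplified region, three neutral genes
neutral_left = [("G1", 1, 0.1), ("G1", 2, -0.1), ("G2", 3, 0.0), ("G2", 4, 0.2), ("G3", 5, -0.2), ("G3", 6, 0.1)]
amplified = [("A1", 7, -2.0), ("A1", 8, -2.1), ("A2", 9, -1.9), ("A2", 10, -2.0), ("A3", 11, -2.2), ("A3", 12, -1.8)]
neutral_right = [("G4", 13, 0.0), ("G4", 14, 0.1), ("G5", 15, -0.1), ("G5", 16, 0.0), ("G6", 17, 0.2), ("G6", 18, -0.1)]
for gene, pos, value in correct_fold_changes(neutral_left + amplified + neutral_right):
    print(gene, pos, round(value, 2))
```

Requiring several distinct genes per corrected segment is what separates a regional artefact from a genuinely essential gene whose guides all deplete together.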
CELLector is a computational tool, implemented as an open-source R Shiny application and R package, that allows researchers to select the most relevant cancer cell lines in a genomics-guided fashion. CELLector combines methods from graph theory and market basket analysis: it leverages tumour genomics data to explore, rank and select optimal cell line models in a user-friendly way, enabling scientists to make informed choices about including or excluding models in retrospective analyses and future studies. It also allows models to be selected within user-defined contexts, for example by focusing on genomic alterations occurring in biological pathways of interest or by considering only predetermined sub-cohorts of cancer patients. Finally, CELLector identifies combinations of molecular alterations underlying disease subtypes that currently lack representative cell lines, providing guidance for the development of new cancer models.
Please see Najgebauer et al. 2018 for an explanation of the approach.
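To give a flavour of the market basket side of this approach, the sketch below counts how often combinations of alterations co-occur in a patient cohort, repeatedly peels off the patients carrying the most frequent combination to define subtypes, and then checks which cell lines carry each subtype's signature; a subtype with no matching line is the kind of gap described above. It is a simplified, hypothetical stand-in for CELLector's search (which also draws on graph theory), and the tie-breaking rule, thresholds, cohort and cell line names are invented for illustration.

```python
from collections import Counter
from itertools import combinations

def most_frequent_itemset(patients, max_size=2):
    """Most common alteration combination (up to `max_size` genes); ties favour larger combinations."""
    counts = Counter()
    for alterations in patients:
        for size in range(1, max_size + 1):
            for combo in combinations(sorted(alterations), size):
                counts[combo] += 1
    return max(counts.items(), key=lambda kv: (kv[1], len(kv[0])))[0]

def greedy_subtypes(patients, min_support=0.1, max_size=2):
    """Repeatedly peel off the patients carrying the most frequent combination.
    Returns (signature, support) pairs, with support measured against the whole cohort."""
    total = len(patients)
    remaining = [set(p) for p in patients]
    subtypes = []
    while any(remaining):
        signature = most_frequent_itemset(remaining, max_size)
        carriers = [p for p in remaining if set(signature) <= p]
        support = len(carriers) / total
        if support < min_support:
            break
        subtypes.append((signature, support))
        remaining = [p for p in remaining if not set(signature) <= p]
    return subtypes

def map_to_cell_lines(subtypes, cell_lines):
    """Cell lines carrying every alteration in each subtype signature (an empty list marks a gap)."""
    return {
        signature: sorted(name for name, alts in cell_lines.items() if set(signature) <= alts)
        for signature, _ in subtypes
    }

# Hypothetical cohort: each patient is the set of altered genes in their tumour
cohort = [{"KRAS", "TP53"}] * 4 + [{"BRAF"}] * 3 + [{"PIK3CA"}] * 2 + [{"APC"}]
models = {"LINE1": {"KRAS", "TP53", "APC"}, "LINE2": {"BRAF"}, "LINE3": {"KRAS"}}

subtypes = greedy_subtypes(cohort, min_support=0.15)
for signature, lines in map_to_cell_lines(subtypes, models).items():
    print(signature, lines or "no representative cell line")
```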
We have developed LINK (LIterature coNcept Knowledgebase), a new tool for exploring half a billion relations between genes, diseases, drugs and key concepts, extracted from PubMed abstracts using natural language processing (NLP).
After MEDLINE relaxed its licence for obtaining and analysing publication data last year, we started looking for novel ways to mine its data. We wanted to exploit the biomedical knowledge often buried in the literature to help scientists generate new hypotheses for identifying new drug targets. For this purpose, we have built Library, an open-source ecosystem comprising:
Our pipeline annotates genes, diseases and drugs present in PubMed abstracts, and extracts key concepts.
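A toy version of annotating abstracts and relating the entities found might look like the sketch below: a dictionary lookup tags genes, diseases and drugs, and pairs of entities mentioned in the same abstract are counted as candidate relations. The dictionary, the whole-word matching rule and co-occurrence counting are illustrative assumptions, far simpler than the NLP behind LINK.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical toy dictionary: lower-case surface form -> (entity type, canonical name)
DICTIONARY = {
    "brca1": ("gene", "BRCA1"),
    "parp1": ("gene", "PARP1"),
    "breast cancer": ("disease", "breast cancer"),
    "olaparib": ("drug", "olaparib"),
}

def annotate(text, dictionary=DICTIONARY):
    """Return the (type, name) entities whose surface forms occur as whole words in `text`."""
    lowered = text.lower()
    return {
        entity
        for term, entity in dictionary.items()
        if re.search(r"\b" + re.escape(term) + r"\b", lowered)
    }

def cooccurrences(abstracts, dictionary=DICTIONARY):
    """Count, for each pair of distinct entities, the number of abstracts mentioning both."""
    pairs = Counter()
    for text in abstracts:
        for a, b in combinations(sorted(annotate(text, dictionary)), 2):
            pairs[(a, b)] += 1
    return pairs

abstracts = [
    "Olaparib inhibits PARP1 in BRCA1-mutant breast cancer.",
    "BRCA1 loss predisposes to breast cancer.",
    "PARP1 trapping by olaparib.",
]
for (a, b), n in cooccurrences(abstracts).most_common(3):
    print(a[1], "<->", b[1], n)
```

Sorting each abstract's entities before pairing ensures that a relation is always stored under the same key, whichever order the entities appear in the text.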
DoRothEA (Discriminant Regulon Expression Analysis) is a research resource that can be used to search for candidate transcription factor (TF)-drug interactions in cancer.
Because TFs are downstream effectors of signalling pathways, aberrant activity in upstream driver genes (even when those genes are not mutated) alters TF activity, which suggests that TFs can serve as sensors of pathway dysregulation and as alternative markers. Here we study the role of 127 TFs in drug sensitivity across ~1,000 cancer cell lines screened with 265 anti-cancer compounds from the Genomics of Drug Sensitivity in Cancer (GDSC) project. In our first approach, we studied how TF activity patterns affect drug response and mined for statistical interactions between single TFs and drugs. In our second approach, we screened for TFs whose activity patterns complement or improve on well-established genomic markers in predicting drug response.
Please see Garcia-Alonso et al. for an explanation of the approach.
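The first approach, relating TF activity to drug response across cell lines, can be sketched in a few lines: z-score each gene across cell lines, estimate a TF's activity as the signed mean z-score of its targets (+1 for activated, -1 for repressed targets), and correlate that activity with drug response. This estimator and the use of Pearson correlation are simplifying assumptions, not necessarily the methods of the published analysis, and the regulon, expression and IC50 values are hypothetical.

```python
from statistics import mean, pstdev

def zscore_rows(expression):
    """Z-score each gene's expression across cell lines (genes with no variation get zeros)."""
    scaled = {}
    for gene, values in expression.items():
        mu, sd = mean(values), pstdev(values)
        scaled[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    return scaled

def tf_activity(regulon, z):
    """Signed mean z-score of a TF's targets in each cell line.
    regulon maps target gene -> +1 (activated by the TF) or -1 (repressed by the TF)."""
    targets = [gene for gene in regulon if gene in z]
    n_lines = len(z[targets[0]])
    return [mean(regulon[g] * z[g][i] for g in targets) for i in range(n_lines)]

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical data for four cell lines
expression = {"T1": [1.0, 2.0, 3.0, 4.0], "T2": [4.0, 3.0, 2.0, 1.0], "OTHER": [2.0, 2.0, 5.0, 1.0]}
regulon = {"T1": +1, "T2": -1}            # T2 is a repressed target of the TF
log_ic50 = [3.0, 2.5, 1.5, 1.0]           # lower values mean more sensitive

activity = tf_activity(regulon, zscore_rows(expression))
print([round(a, 2) for a in activity], round(pearson(activity, log_ic50), 3))
```

A strongly negative correlation here would flag a candidate TF-drug interaction in which higher TF activity accompanies greater sensitivity; a real screen would also need a significance test and multiple-testing correction across all TF-drug pairs.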
Open Targets is guided by the following principles:
We place all our new informatics tools, experimental methods, platforms and the data generated by our projects in the public domain as soon as practical. We do not plan to seek patent protection for intellectual property (IP) arising from Open Targets. However, we recognise that instances may arise where patent protection is appropriate to support our mission. Therefore, Open Targets has a Joint Patent Committee, with scientific and legal experts from all of our partners, that reviews all potential publications (but not raw data) prior to submission.
In practice, this means: