Research Publications

Review Articles

Open Targets Platform

The Open Targets Platform (previously called the Target Validation Platform) is a freely available resource that integrates genetics, omics and chemical data to aid systematic drug target identification and prioritisation. The Platform provides summaries of the evidence (e.g. germline variants, somatic mutations, pathways, drugs) for the involvement of a specific gene in a selected disease. It supports disease-centric and target-centric workflows, and displays both the evidence for target-disease associations and profiles of relevant target and disease annotations.

Please see Carvalho-Silva et al. 2019 for more information.
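
The Platform condenses many pieces of evidence into a single view per target-disease pair. As a minimal sketch of how such evidence could be combined, assuming each evidence item already carries a score between 0 and 1, the example below groups hypothetical `(target, disease, score)` records and aggregates them with a normalised harmonic sum, in which the strongest evidence counts most and each additional item adds progressively less. The function names, the harmonic-sum weighting and the `max_terms` cap are illustrative assumptions, not a description of the Platform's own scoring.

```python
from collections import defaultdict


def harmonic_sum(scores, max_terms=20):
    """Aggregate scores in [0, 1]: the i-th largest score is weighted by 1/i**2.

    The result is divided by the value obtained when max_terms scores all
    equal 1, so it also lies in [0, 1].
    """
    ranked = sorted(scores, reverse=True)[:max_terms]
    total = sum(score / i ** 2 for i, score in enumerate(ranked, start=1))
    ceiling = sum(1 / i ** 2 for i in range(1, max_terms + 1))
    return total / ceiling


def associate(evidence, max_terms=20):
    """Group (target, disease, score) records and score each target-disease pair."""
    grouped = defaultdict(list)
    for target, disease, score in evidence:
        grouped[(target, disease)].append(score)
    return {pair: harmonic_sum(s, max_terms) for pair, s in grouped.items()}


# Hypothetical evidence records (identifiers are placeholders)
evidence = [
    ("GENE_A", "DISEASE_X", 0.9),
    ("GENE_A", "DISEASE_X", 0.7),
    ("GENE_B", "DISEASE_X", 0.9),
]
scores = associate(evidence)
```

With this weighting, GENE_A (two supporting items) scores higher than GENE_B (one item of the same top strength), while no pair can exceed 1.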

Open Targets Genetics

Open Targets Genetics is a portal for investigating genome-wide association study (GWAS) data to help identify the causal genes underlying each association, and hence to prioritise drug targets. The portal aggregates and merges genetic associations curated from the literature and newly derived loci from UK Biobank with open source functional genomics data, including epigenetics (e.g. chromatin conformation, chromatin interactions) and quantitative trait loci (e.g. eQTLs from GTEx, pQTLs). It then applies statistical fine-mapping across thousands of trait-associated loci to resolve association signals, and links each variant to its proximal and distal target gene(s) using a single evidence score.
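
The fine-mapping step can be illustrated with a widely used approach that assumes a single causal variant per locus: Wakefield approximate Bayes factors are computed from each variant's effect size and standard error, normalised into posterior probabilities, and accumulated into a 95% credible set. This is a minimal sketch under those assumptions (including an illustrative prior standard deviation of 0.15); it is not a reproduction of the Open Targets Genetics pipeline, and the variant identifiers are placeholders.

```python
import math


def log_abf(beta, se, prior_sd=0.15):
    """Log Wakefield approximate Bayes factor (association vs. no association)."""
    v = se ** 2          # sampling variance of the effect estimate
    w = prior_sd ** 2    # prior variance of the true effect
    r = w / (v + w)
    z = beta / se
    return 0.5 * math.log(1 - r) + 0.5 * z ** 2 * r


def fine_map(variants, coverage=0.95, prior_sd=0.15):
    """variants: {variant_id: (beta, se)} at one locus.

    Returns posterior probabilities (one causal variant, uniform prior) and
    the smallest set of top variants whose probabilities sum to >= coverage.
    """
    lbf = {vid: log_abf(b, s, prior_sd) for vid, (b, s) in variants.items()}
    top = max(lbf.values())
    # Subtract the maximum before exponentiating to avoid overflow
    weights = {vid: math.exp(x - top) for vid, x in lbf.items()}
    total = sum(weights.values())
    posterior = {vid: wt / total for vid, wt in weights.items()}

    credible_set, cumulative = [], 0.0
    for vid, p in sorted(posterior.items(), key=lambda kv: kv[1], reverse=True):
        credible_set.append(vid)
        cumulative += p
        if cumulative >= coverage:
            break
    return posterior, credible_set


posterior, credible_set = fine_map({
    "variant_1": (0.30, 0.03),
    "variant_2": (0.10, 0.03),
    "variant_3": (0.01, 0.03),
})
```

Here the strongly associated `variant_1` absorbs almost all of the posterior probability, so the credible set contains that variant alone; linking the resulting variants to genes is a separate step not shown.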

Project Score

Project Score is a web portal that allows researchers to explore the results of genome-wide CRISPR-Cas9 drop-out screens across a diverse collection of human cancer cell models, and to identify cancer cell dependencies that can help guide the development of precision cancer medicines.

Please see Behan et al. 2019 for more information.
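
As a minimal sketch of how a dependency might be called from a drop-out screen, assuming gene-level log fold changes are already available, the example below scales fold changes against hypothetical reference sets of known essential and non-essential genes (so that their medians sit at -1 and 0) and flags genes below an illustrative threshold of -0.5. Project Score's own analysis pipeline is not reproduced here; the gene names, reference sets and threshold are all assumptions.

```python
from statistics import median


def scale_fold_changes(lfc, essential, nonessential):
    """Shift and scale so reference non-essential genes have median 0 and essential genes median -1."""
    ne_median = median(lfc[g] for g in nonessential if g in lfc)
    ess_median = median(lfc[g] for g in essential if g in lfc)
    span = ne_median - ess_median
    if span <= 0:
        raise ValueError("essential genes should deplete more than non-essential genes")
    return {gene: (value - ne_median) / span for gene, value in lfc.items()}


def call_dependencies(lfc, essential, nonessential, threshold=-0.5):
    """Return genes whose scaled fold change falls below the threshold, sorted by name."""
    scaled = scale_fold_changes(lfc, essential, nonessential)
    return sorted(gene for gene, value in scaled.items() if value < threshold)


# Hypothetical gene-level log fold changes from one cell line
lfc = {"ESS1": -3.0, "ESS2": -2.0, "NON1": 0.1, "NON2": -0.1, "GENE_X": -1.5, "GENE_Y": 0.0}
hits = call_dependencies(lfc, essential=["ESS1", "ESS2"], nonessential=["NON1", "NON2"])
```

Scaling against reference genes makes a single threshold comparable across cell lines whose screens differ in overall depletion strength.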


epiChoose is an application for quantifying the relatedness between cell lines and primary cells. We have extensively profiled a number of commonly used cell line models across a number of tissues. This profiling consists of epigenetic (histone modification, CTCF, ATAC-seq) and transcriptional (RNA-seq) whole-genome measurements. From this data, we have established a platform that provides information on the “distance” between candidate cell models and target primary cells. By ranking and selecting cell line models in a data-driven manner, the best model for the experiment can be chosen, based on epigenetic and transcriptional evidence and not historical usage.
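
The “distance” idea can be sketched as follows, assuming each cell type is summarised as a numeric feature vector over the same features (for example, signal over a shared set of genomic regions, or gene expression values) and using one minus the Pearson correlation as the distance. The metric, profile values and sample names are illustrative assumptions; epiChoose's own scoring may differ.

```python
import math
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length numeric vectors."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rank_models(target_profile, candidate_profiles):
    """Rank candidate cell models from closest to furthest (distance = 1 - r)."""
    distances = {
        name: 1 - pearson(profile, target_profile)
        for name, profile in candidate_profiles.items()
    }
    return sorted(distances.items(), key=lambda item: item[1])


# Hypothetical profiles over the same four features
primary_cells = [1.0, 2.0, 3.0, 4.0]
candidates = {
    "cell_line_A": [1.0, 2.0, 3.0, 5.0],
    "cell_line_B": [4.0, 3.0, 2.0, 1.0],
}
ranking = rank_models(primary_cells, candidates)
```

Correlation distance ignores overall signal scale, which is one reason it is a common choice when comparing profiles generated in different batches; the distance ranges from 0 (identical shape) to 2 (perfectly anti-correlated).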


Genome editing by CRISPR-Cas9 technology allows large-scale screening of gene essentiality in cancer. A confounding factor when interpreting CRISPR-Cas9 screens is the high false-positive rate in detecting essential genes within copy number amplified regions of the genome. We have developed the computational tool CRISPRcleanR which is capable of identifying and correcting gene-independent responses to CRISPR-Cas9 targeting. CRISPRcleanR uses an unsupervised approach based on the segmentation of single-guide RNA fold change values across the genome, without making any assumption about the copy number status of the targeted genes. CRISPRcleanR is implemented as an R package and as an interactive Python package with full documentation, tutorials, built in datasets to reproduce the results in this manuscript, and is publically available (R package: and Python package: The Python implementation is dockerized making it platform independent and usable in cloud environments (

Please see Iorio et al. 2018 for more information.
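
The core idea of CRISPRcleanR, correcting effects that span several neighbouring genes regardless of which gene each guide targets, can be sketched in a few functions. This minimal version assumes the guides on one chromosome are already sorted by genomic position, segments their log fold changes with a plain recursive binary segmentation (a simpler stand-in for the segmentation the package performs), and mean-centres any segment that spans at least `min_genes` distinct genes. The `min_size`, `penalty` and `min_genes` values and the toy data are illustrative assumptions, not the package's defaults.

```python
from statistics import mean


def best_split(values, min_size):
    """Find the split point that maximises the between-segment sum of squares."""
    overall = mean(values)
    best_index, best_gain = None, 0.0
    for k in range(min_size, len(values) - min_size + 1):
        left, right = values[:k], values[k:]
        gain = (len(left) * (mean(left) - overall) ** 2
                + len(right) * (mean(right) - overall) ** 2)
        if gain > best_gain:
            best_index, best_gain = k, gain
    return best_index, best_gain


def segment(values, min_size=3, penalty=2.0, offset=0):
    """Recursively split values; return half-open (start, end) index ranges."""
    if len(values) >= 2 * min_size:
        index, gain = best_split(values, min_size)
        if index is not None and gain >= penalty:
            return (segment(values[:index], min_size, penalty, offset)
                    + segment(values[index:], min_size, penalty, offset + index))
    return [(offset, offset + len(values))]


def correct(guides, min_genes=3, min_size=3, penalty=2.0):
    """guides: (gene, log_fold_change) pairs sorted by position on one chromosome."""
    lfcs = [lfc for _, lfc in guides]
    corrected = list(lfcs)
    for start, end in segment(lfcs, min_size, penalty):
        genes = {gene for gene, _ in guides[start:end]}
        # A shift shared by several different genes is treated as gene-independent
        # (e.g. a copy-number effect) and removed by mean-centring the segment.
        if len(genes) >= min_genes:
            segment_mean = mean(lfcs[start:end])
            for i in range(start, end):
                corrected[i] = lfcs[i] - segment_mean
    return corrected


def toy_region(genes, lfc):
    """Two guides per gene, all with the same log fold change."""
    return [(gene, lfc) for gene in genes for _ in range(2)]


# An amplified region (three genes depleted together) between two neutral regions
amplified = (toy_region(["G1", "G2", "G3"], 0.0)
             + toy_region(["A1", "A2", "A3"], -2.0)
             + toy_region(["H1", "H2", "H3"], 0.0))
# A single strongly depleted gene (three guides) between neutral regions
essential = (toy_region(["G1", "G2", "G3"], 0.0)
             + [("E1", -3.0)] * 3
             + toy_region(["H1", "H2", "H3"], 0.0))
```

In the toy data, the depletion shared by three adjacent genes is removed, while the depletion confined to the single gene `E1` is left untouched, which is the distinction between a gene-independent and a gene-specific effect.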

CELLector (Genomics Guided Selection of Cancer in vitro Models)

CELLector is a computational tool, implemented as an open source R Shiny application and R package, that allows researchers to select the most relevant cancer cell lines in a genomics-guided fashion. CELLector combines methods from graph theory and market basket analysis; it leverages tumour genomics data to explore, rank and select optimal cell line models in a user-friendly way, enabling scientists to make appropriate and informed choices about model inclusion or exclusion in retrospective analyses and future studies. Additionally, it allows the selection of models within user-defined contexts, for example by focusing on genomic alterations occurring in biological pathways of interest or by considering only predetermined sub-cohorts of cancer patients. Finally, CELLector identifies combinations of molecular alterations underlying disease subtypes that currently lack representative cell lines, providing guidance for the future development of new cancer models.

Please see Najgebauer et al. 2018 for an explanation of the approach.
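
The market basket analysis component can be sketched as a search for the alteration combination carried by the largest fraction of patients, followed by a look-up of cell lines that carry the whole combination. This minimal example assumes patients and cell lines are represented as sets of alteration labels, limits combinations to pairs, and omits the graph-theory component; the function names, cohort and cell line panel are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(patients, max_size=2, min_support=0.1):
    """Return {alteration combination: fraction of patients carrying it} above min_support."""
    counts = Counter()
    for alterations in patients:
        for size in range(1, max_size + 1):
            for combo in combinations(sorted(alterations), size):
                counts[frozenset(combo)] += 1
    n = len(patients)
    return {items: c / n for items, c in counts.items() if c / n >= min_support}


def top_signature(patients, max_size=2, min_support=0.1):
    """Most frequent combination; ties go to the larger (more specific) combination."""
    itemsets = frequent_itemsets(patients, max_size, min_support)
    if not itemsets:
        return frozenset()
    return max(itemsets.items(), key=lambda item: (item[1], len(item[0])))[0]


def matching_models(signature, cell_lines):
    """Cell lines carrying every alteration in the signature."""
    return sorted(name for name, alts in cell_lines.items() if signature <= alts)


# Hypothetical patient cohort and cell line panel
patients = [{"KRAS", "TP53"}, {"KRAS", "TP53"}, {"KRAS", "TP53"}, {"KRAS"}, {"SMAD4"}]
cell_lines = {"line_1": {"KRAS", "TP53"}, "line_2": {"TP53"}, "line_3": {"KRAS"}}
signature = top_signature(patients)
models = matching_models(signature, cell_lines)
```

Repeating the search on the patients who do not carry the selected signature (or who carry it plus further alterations) would give a progressively finer stratification; a signature with no matching cell line points to a missing model.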

LINK (LIterature coNcept Knowledgebase)

LINK is a new tool that allows the exploration of half a billion relations between genes, diseases, drugs and key concepts extracted from PubMed abstracts using natural language processing (NLP).

After MEDLINE relaxed its licence for obtaining and analysing publication data last year, we started looking for new ways to mine the data. We wanted to exploit the biomedical knowledge often buried in the literature to help scientists generate new hypotheses for the identification of drug targets. For this purpose, we have built Library, an open source ecosystem comprising:

  • a pipeline that allows us to quickly run large-scale NLP analyses.
  • an API that serves the resulting data.
  • a user interface to explore this data.

Our pipeline annotates genes, diseases and drugs present in PubMed abstracts, and extracts key concepts.

Please see Andrea Pierleoni’s blog post for more information.
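
The annotation step can be illustrated with simple dictionary matching plus within-abstract co-occurrence counting, a stand-in for the NLP annotation and relation extraction that LINK performs. This minimal sketch assumes small, hypothetical vocabularies and case-insensitive whole-word matching; the real pipeline is more sophisticated and its vocabularies are not shown here.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical toy vocabularies; a real pipeline would rely on curated ontologies.
DICTIONARY = {
    "gene": {"BRCA1", "TP53"},
    "disease": {"breast cancer"},
    "drug": {"olaparib"},
}


def annotate(text, dictionary=DICTIONARY):
    """Return (start, end, term, entity_type) for each case-insensitive whole-word match."""
    matches = []
    for entity_type, terms in dictionary.items():
        for term in terms:
            pattern = r"\b" + re.escape(term) + r"\b"
            for m in re.finditer(pattern, text, flags=re.IGNORECASE):
                matches.append((m.start(), m.end(), term, entity_type))
    return sorted(matches)


def cooccurrences(abstracts, dictionary=DICTIONARY):
    """Count pairs of distinct entities mentioned in the same abstract."""
    counts = Counter()
    for text in abstracts:
        entities = sorted({(term, etype) for _, _, term, etype in annotate(text, dictionary)})
        for pair in combinations(entities, 2):
            counts[pair] += 1
    return counts


abstract = "Olaparib is active in BRCA1-mutant breast cancer."
annotations = annotate(abstract)
pairs = cooccurrences([abstract])
```

Counting co-occurrences across many abstracts turns individual annotations into candidate gene-disease-drug relations that an API and user interface can then serve.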

DoRothEA (Discriminant Regulon Expression Analysis)

DoRothEA (Discriminant Regulon Expression Analysis) is a research resource that can be used to search for candidate transcription factor (TF)-drug interactions in cancer.

Because TFs act as downstream effectors of signalling pathways, aberrant activity in upstream driver genes (even when those genes are not mutated) leads to altered TF activity, which makes TFs candidate sensors of pathway dysregulation and alternative markers. Here we study the role of 127 TFs in drug sensitivity across ~1,000 cancer cell lines screened with 265 anti-cancer compounds from the Genomics of Drug Sensitivity in Cancer (GDSC) project. In our first approach, we studied how the activity pattern of individual TFs affects drug response and mined for single TF-drug statistical interactions. In our second approach, we screened for TFs whose activity patterns complement or improve well-established genomic markers in the prediction of drug response.

Please see Garcia-Alonso et al. 2017 for an explanation of the approach.
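
The first approach, relating the activity of an individual TF to drug response, can be sketched in simplified form. Assuming a regulon is given as target genes with a mode of regulation (+1 for activation, -1 for repression), the example estimates TF activity per cell line as the mean of mode-signed target z-scores and correlates it with a drug response measure. This is a simplified stand-in rather than the method of Garcia-Alonso et al.; the scoring, the use of Pearson correlation and all data are illustrative assumptions.

```python
import math
from statistics import mean, pstdev


def zscore_rows(expression):
    """Z-score each gene's expression across cell lines."""
    scaled = {}
    for gene, values in expression.items():
        mu, sd = mean(values), pstdev(values)
        scaled[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    return scaled


def tf_activity(regulon, scaled):
    """Mean of mode-signed target z-scores per cell line.

    regulon maps target gene -> +1 (activated by the TF) or -1 (repressed).
    """
    targets = [(gene, mode) for gene, mode in regulon.items() if gene in scaled]
    if not targets:
        raise ValueError("no regulon targets found in the expression data")
    n_lines = len(scaled[targets[0][0]])
    return [mean(mode * scaled[gene][i] for gene, mode in targets) for i in range(n_lines)]


def pearson(x, y):
    """Pearson correlation between two equal-length numeric vectors."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


# Hypothetical data: four cell lines, one TF with two targets
expression = {"TARGET_1": [1.0, 2.0, 3.0, 4.0], "TARGET_2": [4.0, 3.0, 2.0, 1.0]}
regulon = {"TARGET_1": +1, "TARGET_2": -1}
drug_response = [4.0, 3.0, 2.0, 1.0]  # e.g. log IC50; lower means more sensitive
activity = tf_activity(regulon, zscore_rows(expression))
r = pearson(activity, drug_response)
```

A strongly negative correlation with log IC50, as in this toy example, would suggest that higher TF activity accompanies greater drug sensitivity; across many TFs and drugs, such tests would need multiple-testing correction.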

Open Targets Principles

Open Targets is guided by the following principles:

  • We are focused on pre‐competitive research that will enable the systematic identification and prioritisation of targets
  • We are committed to rapid publication and making data, methods and results publicly available as soon as possible
  • We believe in non‐exclusive partnerships that foster the free exchange of ideas and expertise

We place in the public domain all our new informatics tools, experimental methods, platforms and the data generated by our projects as soon as is practical. We do not plan to seek patent protection for IP arising from Open Targets. However, we recognise that instances may arise where this is appropriate to support our mission. Therefore, Open Targets has a Joint Patent Committee with scientific and legal experts from all of our partners that reviews all potential publications (but not raw data) prior to submission.

In practice, this means:

  • For the development and execution of Open Targets Research Projects, each Member has agreed to license its Background IP to the other Members.
  • All Members have agreed to license Open Targets Arising IP to each other for use in Open Targets Research Projects and for the Members’ research and development activities.
  • Any IP that relates directly and solely to an Industry Partner Compound will belong to the Industry Partner.
  • Any other IP arising from an Open Targets Research Project will belong to the Member(s) that invented it.