Open Targets employs large-scale human genetics and genomics data to change the way drug targets are identified and validated. We have established a set of interlinked projects that develop both the data and the analytical processes needed to implicate targets as valid, together with a core platform that provides this information to a diverse audience of users. Our work falls into two major areas, each subdivided into individual workstreams. The first is the core bioinformatics platform, comprising pipelines and a database that integrate existing target validation data, and a public web portal that serves the integrated views. The second is a set of experimental projects that will generate new data to feed into the database.

We have established a portfolio of experimental projects that we believe will provide target validation information relevant to key therapeutic areas. Our approach is to use high-throughput methodologies that address as nearly as possible the full range of relevant targets (ideally the whole genome), in systems relevant to the physiology of diseases with high unmet need. We identified oncology, inflammatory bowel disease (IBD), respiratory disease, inflammation and immunity as suitable therapeutic areas because, in addition to the pressing need created by their disease burden, the partners’ research interests in these areas are complementary.

Our methods address themes that cut across multiple therapeutic areas. We have chosen to focus on genetics as a tool for target identification and validation, followed by the exploitation of cellular models of disease through new gene-editing technologies or single-cell analysis. We are investigating the use of large populations and approaches such as Mendelian Randomisation, including for metabolomic and proteomic data. In addition to clinical samples, iPS cells and cellular organoids can provide cellular phenotyping at scale with greater physiological relevance than transformed cell lines.
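Mendelian Randomisation treats genetic variants as instruments for an exposure, such as a circulating protein or metabolite level, to estimate its causal effect on disease. As a minimal sketch, assuming independent instruments whose summary statistics are already harmonised to the same effect allele, the standard inverse-variance weighted (IVW) estimator combines the per-variant Wald ratios. The `Instrument` class and all numbers are hypothetical illustrations, not Open Targets results or necessarily the method used in these projects.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Instrument:
    """Summary statistics for one genetic instrument (hypothetical values)."""
    beta_exposure: float  # variant effect on the exposure, e.g. a protein level
    beta_outcome: float   # variant effect on disease risk, e.g. a log odds ratio
    se_outcome: float     # standard error of beta_outcome


def wald_ratio(inst: Instrument) -> float:
    """Causal estimate from a single instrument: beta_outcome / beta_exposure."""
    return inst.beta_outcome / inst.beta_exposure


def ivw_estimate(instruments: list[Instrument]) -> tuple[float, float]:
    """Fixed-effect IVW causal estimate and its standard error.

    Uses first-order weights beta_exposure^2 / se_outcome^2, which is
    equivalent to a weighted regression of beta_outcome on beta_exposure
    through the origin.
    """
    num = sum(i.beta_exposure * i.beta_outcome / i.se_outcome ** 2 for i in instruments)
    den = sum(i.beta_exposure ** 2 / i.se_outcome ** 2 for i in instruments)
    return num / den, math.sqrt(1.0 / den)
```

For example, `ivw_estimate([Instrument(0.5, 0.10, 0.02), Instrument(0.3, 0.06, 0.02)])` returns an estimate of 0.2, because both instruments agree on that ratio. Real analyses would add sensitivity methods (for example for pleiotropy) on top of this basic estimator.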

Core Bioinformatics and Data Pipelines

We leverage our partners’ data analyses and annotation to integrate information about common and rare diseases. Our work on core components and pipelines forms the backbone of the data and displays in the Open Targets Platform, and encompasses any type of data relevant to human disease biology.

The key challenge for our core bioinformatics work is to integrate the data sources that support the validity of a target into a single infrastructure, allowing seamless interrogation of all these data. We aim to cover many data types relevant to human disease biology, and our approach is to leverage existing data analysis and curation efforts within the Open Targets partners and beyond. To do this, we have developed a data model that describes the evidence for an association between a target (gene or protein) and a disease. The model applies to diverse data sources, including rare and common genetic associations, pathway and network information, gene expression datasets, known drug targets and literature mining via Europe PubMed Central. We have also developed statistical approaches that summarise the association statistics from these diverse sources into a single joint score, which is used for prioritisation and for inferring putative unobserved associations.
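The evidence model can be pictured as a collection of target-disease records, each carrying a source-specific score. As a minimal sketch, assuming every evidence item already has a score between 0 and 1, the code below groups records by target-disease pair and combines them with a harmonic sum, in which the i-th strongest item is divided by i². A harmonic sum is the kind of scheme the Platform's scoring documentation describes, but the field names, placeholder identifiers, the 100-item cap and the normalisation constant here are illustrative assumptions, not the Platform's actual schema or code.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical cap on how many evidence items contribute to one association.
MAX_TERMS = 100
# Largest possible harmonic sum (all scores equal to 1), used to rescale to [0, 1].
MAX_HARMONIC_SUM = sum(1.0 / i ** 2 for i in range(1, MAX_TERMS + 1))


@dataclass(frozen=True)
class Evidence:
    """One piece of evidence linking a target to a disease (illustrative fields)."""
    target_id: str   # gene or protein identifier
    disease_id: str  # disease identifier
    datatype: str    # e.g. "genetic_association", "literature", "known_drug"
    score: float     # evidence strength, assumed to lie in [0, 1]


def harmonic_sum(scores: list[float], max_terms: int = MAX_TERMS) -> float:
    """Sum scores in descending order, dividing the i-th score by i squared."""
    ordered = sorted(scores, reverse=True)[:max_terms]
    return sum(s / i ** 2 for i, s in enumerate(ordered, start=1))


def association_scores(evidence: list[Evidence]) -> dict[tuple[str, str], float]:
    """Combine all evidence for each (target, disease) pair into one score in [0, 1]."""
    grouped: dict[tuple[str, str], list[float]] = defaultdict(list)
    for ev in evidence:
        grouped[(ev.target_id, ev.disease_id)].append(ev.score)
    return {pair: harmonic_sum(s) / MAX_HARMONIC_SUM for pair, s in grouped.items()}
```

Sorting before weighting lets the strongest evidence dominate, while each additional item adds a diminishing increment, so an association cannot be inflated without bound by piling up weak evidence.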

We have developed the Open Targets Platform to summarise the evidence for the involvement of a specific gene in a selected disease. Over the last 18 months, the platform has been developed through a user experience (UX) design process with extensive input from a selected community of scientists within our partners. The first public version of the platform (version 1.0) was released in December 2015, and we expect to release version 1.3 in November 2016. The platform currently supports workflows that start from either a target or a disease. It displays the evidence for associations, profiles of relevant information for the target or disease, and state-of-the-art JavaScript visualisations developed and tested through the UX process. We also provide direct data access through a supported public API, with examples on our blog at blog.opentargets.org.
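The public API can also be queried programmatically. As a minimal sketch, assuming the current Open Targets Platform GraphQL endpoint (`https://api.platform.opentargets.org/api/v4/graphql`), which postdates the version 1.x API described above and may change again, the functions below build and send a query for a target's approved symbol. The query shape and field names should be checked against the Platform's API documentation before use; the Ensembl gene identifier in the usage note is only an example.

```python
import json
import urllib.request

# Assumed endpoint: the current GraphQL API, not the 2016 API described in the text.
API_URL = "https://api.platform.opentargets.org/api/v4/graphql"

TARGET_QUERY = """
query targetInfo($ensemblId: String!) {
  target(ensemblId: $ensemblId) {
    id
    approvedSymbol
  }
}
"""


def build_payload(ensembl_id: str) -> bytes:
    """Encode a GraphQL request body asking for one target."""
    body = {"query": TARGET_QUERY, "variables": {"ensemblId": ensembl_id}}
    return json.dumps(body).encode("utf-8")


def fetch_target(ensembl_id: str, timeout: float = 30.0) -> dict:
    """POST the query and return the decoded JSON response (needs network access)."""
    request = urllib.request.Request(
        API_URL,
        data=build_payload(ensembl_id),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call such as `fetch_target("ENSG00000169083")` would return a dictionary whose `data` key holds the requested fields. Passing the identifier as a GraphQL variable, rather than formatting it into the query string, keeps the query reusable and avoids escaping problems.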

Our next major bioinformatics focus is the development of a genetics pipeline that identifies the variants most likely to underlie disease associations in well-powered GWAS, and links SNPs to genes using functional genomics data. We have tested fine-mapping approaches on GWAS data for IBD and are developing a pipeline to apply them to additional key datasets. We are building a generalisable system that maps disease associations to SNPs and SNPs to genes. GWAS signals are resolved by fine-mapping from summary statistics, and genes are assigned using key regulatory datasets, including GTEx consortium expression QTL (eQTL) data, FANTOM5 tag sequences, and physical interaction data such as promoter capture Hi-C.
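One widely used way to fine-map a GWAS locus from summary statistics alone is Wakefield's approximate Bayes factor (ABF), under the simplifying assumption of exactly one causal variant per locus and an equal prior for every SNP. The sketch below converts per-SNP effect sizes and standard errors into posterior probabilities and a 95% credible set. The prior standard deviation of 0.2 on the true effect and the SNP names and values are illustrative assumptions, and this is not necessarily the fine-mapping method used in the Open Targets pipeline.

```python
import math


def log_abf(beta: float, se: float, prior_sd: float = 0.2) -> float:
    """Natural-log Wakefield approximate Bayes factor (association vs. no association)."""
    v = se ** 2        # sampling variance of the effect estimate
    w = prior_sd ** 2  # prior variance of the true effect (assumed)
    r = w / (v + w)    # shrinkage factor
    z = beta / se
    return 0.5 * math.log(1.0 - r) + 0.5 * z * z * r


def fine_map(
    variants: list[tuple[str, float, float]],
    prior_sd: float = 0.2,
    coverage: float = 0.95,
) -> tuple[dict[str, float], list[str]]:
    """Posterior causal probabilities and a credible set for one locus.

    `variants` holds (snp_id, beta, se) tuples. With one causal variant and
    equal priors, each posterior is that SNP's Bayes factor divided by the sum.
    """
    log_bfs = [log_abf(beta, se, prior_sd) for _, beta, se in variants]
    top = max(log_bfs)
    weights = [math.exp(lb - top) for lb in log_bfs]  # subtract max for numerical stability
    total = sum(weights)
    posteriors = {snp: wt / total for (snp, _, _), wt in zip(variants, weights)}

    # Add SNPs in order of decreasing posterior until the target coverage is reached.
    credible_set: list[str] = []
    cumulative = 0.0
    for snp, pp in sorted(posteriors.items(), key=lambda item: item[1], reverse=True):
        credible_set.append(snp)
        cumulative += pp
        if cumulative >= coverage:
            break
    return posteriors, credible_set
```

A small credible set points to a specific variant that can then be linked to genes with regulatory data such as eQTLs or promoter capture Hi-C contacts. A large set signals that the locus cannot be resolved from the association statistics alone.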


Collaborative Experimental Projects

We have established a portfolio of experimental projects that will provide target identification and prioritisation information relevant to key therapeutic areas. Our approach is to use high-throughput methodologies that address as nearly as possible the full range of relevant targets (ideally the whole genome), in systems relevant to the physiology of diseases with high unmet need.

Across our therapeutic focus areas, we use genetics as a tool for target identification and prioritisation, followed by validation in cellular models of disease using new gene-editing technologies or single-cell analysis. We are investigating the use of large populations and approaches such as Mendelian Randomisation, including for metabolomic and proteomic data. In addition to clinical samples, iPS cells and cellular organoids can provide cellular phenotyping at scale with greater physiological relevance than cell lines. For example, we are comparing iPSC-derived macrophages (a cell type relevant to a number of disease areas) with macrophages derived from primary monocytes, and developing gene editing in macrophages derived from an iPS cell line.

We have identified Oncology, Immunology and Neurodegeneration as therapeutic areas for Open Targets, both because they represent substantial unmet therapeutic need and because the partners have complementary expertise in them.

Oncology

We use genomic data from clinical cancer research to identify ‘driver’ genes in many types of cancer. This guides the way we interpret experimental results, and helps us identify potential drug targets and clinically relevant associations. To maximise the usefulness of our results, we prioritise tumour types and mutations most in need of clinical investigation.

In Oncology, we can leverage the resources and expertise of the Sanger Institute’s cancer program, which has played an important role in understanding the genetic basis of cancer. A shared theme across the oncology workstreams is the use of genomic data from clinical samples to guide target development. We use a variety of accessible cancer resources to curate and analyse clinical genomic datasets and identify driver genes (affected by mutations, amplifications, deletions and gene fusions) across multiple cancer subtypes. A key resource is the Sanger Institute’s unique collection of more than 1,000 human cancer cell lines, together with their drug sensitivities. Genomic information, including RNA-seq data and synthetic lethality screens using genome editing, will enable the identification of putative targets and the selection of model systems that best reflect tumour biology, and will guide the analysis of experimental results to identify clinically relevant associations.
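Pairing cell-line genomics with drug sensitivities allows a genomic feature to be tested as a marker of drug response. As a minimal sketch, assuming sensitivities are summarised as natural-log IC50 values and cell lines are split by the mutation status of a single gene, Welch's t statistic compares the mutant and wild-type groups. The cell-line names and values are hypothetical, and a production analysis would usually fit a fuller model with covariates and correct for the many gene-drug pairs tested.

```python
import math
from statistics import mean, variance


def welch_t(group_a: list[float], group_b: list[float]) -> float:
    """Welch's t statistic for a difference in means with unequal variances."""
    se = math.sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se


def sensitivity_association(
    ln_ic50: dict[str, float], mutant_lines: set[str]
) -> tuple[float, float]:
    """Compare drug sensitivity between mutant and wild-type cell lines.

    Returns (difference in mean ln IC50, Welch t). A negative difference means
    the mutant lines need less drug, i.e. they are more sensitive.
    """
    mutant = [v for line, v in ln_ic50.items() if line in mutant_lines]
    wild_type = [v for line, v in ln_ic50.items() if line not in mutant_lines]
    if len(mutant) < 2 or len(wild_type) < 2:
        raise ValueError("need at least two cell lines in each group")
    return mean(mutant) - mean(wild_type), welch_t(mutant, wild_type)
```

Welch's form is used because mutant and wild-type groups often differ greatly in size and spread, which violates the equal-variance assumption of Student's t test.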

Immunology

Open Targets has developed state-of-the-art analysis methods for its immunology focus, which includes inflammatory bowel disease (IBD). These projects will validate candidate targets experimentally in gut tissue. Our immunology work benefits from access to the UK’s IBD BioResource, which collects DNA, biopsies and stool for genomic studies. We will expand this work into the area of asthma research, using single-cell genomics.

In Immunology, we initially focused on inflammatory bowel disease (IBD), an area of strong interest to GSK as an application within the broad immuno-inflammation field and one in which there is substantial expertise on the Genome Campus. We have developed a state-of-the-art meta-analysis of the existing IBD cohorts, and will take candidate targets from it forward for validation using genome-wide knockouts in gut epithelium organoids. We are also partly supporting an inception cohort in the UK IBD BioResource, in which DNA, biopsies and stool will be collected for future genomic studies. As Open Targets has progressed, we have expanded our focus to additional projects that probe the role of targets either in well-defined immune cells, through gene editing and epigenetic profiling (for instance in macrophages, dendritic cells and T cells in response to various stimulations), or in diseases such as asthma, using single-cell genomics. A project to identify receptor-ligand pairs in NK cells, with applications in immuno-oncology, crosses these two therapeutic areas.
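A GWAS meta-analysis combines per-cohort association statistics for the same variant. As a minimal sketch, assuming effect estimates are already aligned to the same effect allele, a fixed-effect inverse-variance meta-analysis weights each cohort by 1/SE². This illustrates the general technique only; the IBD meta-analysis described above may use a different or more elaborate model, and the `CohortResult` values are hypothetical.

```python
import math
from dataclasses import dataclass
from statistics import NormalDist


@dataclass(frozen=True)
class CohortResult:
    """Association result for one variant in one cohort (hypothetical values)."""
    beta: float  # per-allele effect, e.g. a log odds ratio
    se: float    # standard error of beta


def fixed_effect_meta(results: list[CohortResult]) -> tuple[float, float, float]:
    """Return the pooled beta, its standard error and a two-sided p-value."""
    weights = [1.0 / r.se ** 2 for r in results]
    pooled_beta = sum(w * r.beta for w, r in zip(weights, results)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    z = pooled_beta / pooled_se
    # Use the lower tail directly to avoid precision loss for large |z|.
    p_value = 2.0 * NormalDist().cdf(-abs(z))
    return pooled_beta, pooled_se, p_value
```

Pooling two cohorts that each report a beta of 0.2 with a standard error of 0.1 leaves the estimate at 0.2 but shrinks the standard error by a factor of √2, which is how meta-analysis gains power to detect modest effects.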

Neurodegeneration

Using gene-editing techniques on neurons derived from induced pluripotent stem cells, we are identifying factors that influence the oxidative stress response and Tau uptake, and studying the effects of mutations specific to Alzheimer’s disease. In the same neuron systems, we are also using genome-wide association studies of Alzheimer’s and Parkinson’s disease to identify and test potential targets.

Since Biogen joined Open Targets, we have initiated a series of projects in Neurodegeneration. These projects will use similar approaches, such as gene editing in neurons derived from iPS cells, to identify modifiers of the response to oxidative stress, the mechanism of Tau uptake and the effects of Alzheimer’s disease-specific mutations. We are using fine-mapping of GWAS in Alzheimer’s and Parkinson’s disease to identify potential targets and test them in the same neuron systems. We are also characterising these systems at the single-cell genomics level.
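Gene-editing modifier screens are commonly read out by counting guide RNAs before and after a selection, such as exposure to an oxidative stressor. As a minimal sketch of a simple scoring scheme (not necessarily what these projects use; dedicated tools such as MAGeCK apply proper statistical models), guide counts are normalised to counts per million, converted to log2 fold changes with a pseudocount, and summarised per gene by the median. The guide names, gene names and counts are hypothetical, and every guide in the mapping is assumed to appear in both count tables.

```python
import math
from collections import defaultdict
from statistics import median


def counts_per_million(counts: dict[str, int]) -> dict[str, float]:
    """Scale raw guide counts so that each sample sums to one million."""
    total = sum(counts.values())
    return {guide: 1e6 * c / total for guide, c in counts.items()}


def gene_scores(
    before: dict[str, int],
    after: dict[str, int],
    guide_to_gene: dict[str, str],
    pseudocount: float = 0.5,
) -> dict[str, float]:
    """Median log2 fold change (after vs. before selection) across each gene's guides.

    Negative scores mark genes whose knockout guides are depleted under
    selection; positive scores mark genes whose knockout guides are enriched.
    """
    cpm_before = counts_per_million(before)
    cpm_after = counts_per_million(after)
    per_gene: dict[str, list[float]] = defaultdict(list)
    for guide, gene in guide_to_gene.items():
        ratio = (cpm_after[guide] + pseudocount) / (cpm_before[guide] + pseudocount)
        per_gene[gene].append(math.log2(ratio))
    return {gene: median(lfcs) for gene, lfcs in per_gene.items()}
```

Taking the median across several guides per gene makes the score robust to a single guide that behaves unexpectedly, for example through off-target cutting or poor editing efficiency.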