Big Data Gets Personal: Bringing Biological Meaning to Genomic Data Sets

July 1, 2014 in Blog by tahera

By Tahera Zabuawala, PhD

In my previous post I discussed the explosive growth of DNA sequencing technology; in this article I will shed light on the downstream bottlenecks. To return to the car analogy, even an engine with immense horsepower cannot move the car forward unless an efficient transmission delivers the engine’s power to the wheels. Similarly, although we can now sequence DNA deeper, faster and cheaper than ever before, unless we interpret the sequenced DNA intelligently and turn it into accurate, easily digestible results, we cannot use it to develop a roadmap towards enhancing and extending human life.

Several steps are needed to bring clinical value to the millions of ‘ATGC’s sequenced. The first step is to align the sequence to a reference genome and annotate variants. Advanced alignment tools accurately align a sample genome to a reference genome, and sophisticated algorithms differentiate and annotate variant types so that the variant calls represent the true nature of the sample. Accuracy in alignment matters because it allows real biology to be discerned from noise in the data. The current public reference (hg19) contains a minor allele at over a million positions, a significant fraction of which are associated with disease (e.g. Factor V Leiden, rs6025). This can cause false positives (the individual has the major allele, which is absent from the reference genome, so it is reported as a variant) and false negatives (the individual has the minor allele, which is present in the reference genome, so no variant is reported). Ethnic biases in the reference genome should also be accounted for during variant calling.
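The effect of a reference that carries minor alleles can be illustrated with a small Python sketch. Everything here is hypothetical and chosen only for illustration: the positions, the alleles, the `major_allele` table and the `call_variants` helper are not taken from hg19 or from any real variant caller, which would work on aligned reads rather than on a simple position-to-allele lookup.

```python
# Toy illustration only: positions and alleles are made up, not real hg19 data.
reference = {101: "A", 202: "T", 303: "G"}     # allele recorded in the reference
major_allele = {101: "A", 202: "C", 303: "G"}  # most common allele in the population
# At position 202 the reference happens to carry the rare (minor) allele "T".


def call_variants(sample, baseline):
    """Return the positions where the sample differs from the baseline allele."""
    return {pos for pos, allele in sample.items() if allele != baseline[pos]}


# Person carrying the common allele at every position.
person_common = {101: "A", 202: "C", 303: "G"}
# Person carrying the rare allele at position 202.
person_rare = {101: "A", 202: "T", 303: "G"}

# Compared with the reference, the common allele is flagged (false positive)
# and the rare allele goes unreported (false negative).
false_positive_calls = call_variants(person_common, reference)
false_negative_calls = call_variants(person_rare, reference)

# Compared with a major-allele baseline, only the rare allele is flagged.
corrected_common = call_variants(person_common, major_allele)
corrected_rare = call_variants(person_rare, major_allele)

print(false_positive_calls, false_negative_calls, corrected_common, corrected_rare)
```

In this toy setup, swapping the baseline from the reference to the population's major alleles is what removes both kinds of error; it is one possible way to picture the problem, not a description of how any particular pipeline handles it.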

The next step is to analyze and interpret the variants by incorporating prior scientific and clinical knowledge, and to deliver the results in a concise, intuitive and actionable report. This step is complicated, nuanced and multifaceted. Primary literature, manual curation and expert opinion are used to compare an individual’s variants against a repository of biomedically important variants, filtering out noise from the data and deriving clinically meaningful interpretations.
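At its simplest, the comparison against a curated repository is a lookup. The Python sketch below assumes a tiny in-memory knowledge base; apart from rs6025, which is mentioned earlier in this post, the identifiers, the annotations and the `interpret` function are hypothetical placeholders, and a real curation workflow involves far more than a dictionary match.

```python
# Hypothetical, hand-made knowledge base for illustration only.
curated_variants = {
    "rs6025": "Factor V Leiden",
    "rs_placeholder_1": "placeholder annotation (not a real variant)",
}


def interpret(sample_variant_ids, knowledge_base):
    """Split a sample's variants into curated findings and unannotated ones."""
    findings = []
    unannotated = []
    for variant_id in sample_variant_ids:
        if variant_id in knowledge_base:
            findings.append((variant_id, knowledge_base[variant_id]))
        else:
            unannotated.append(variant_id)
    return findings, unannotated


# Variant IDs called for one hypothetical sample.
sample = ["rs_unknown_7", "rs6025", "rs_unknown_9"]
findings, unannotated = interpret(sample, curated_variants)

for variant_id, annotation in findings:
    print(f"{variant_id}: {annotation}")
print(f"{len(unannotated)} variant(s) with no curated annotation")
```

Keeping the unannotated variants in a separate list, rather than discarding them, leaves room for later review as the repository grows.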

As one can sense, turning the genomics vision into reality in the clinic involves an explosion of data: large volumes must be extracted, processed, analyzed and stored. An economically viable way to carry out such an operation arrived with the advent of Big Data technology. The Big Data phenomenon came into existence with the explosion of social and digital media in our lives. Companies like Google, Facebook and Twitter revolutionized the way we interact and communicate. The Internet became the Information Highway, and before we realized it, everyone and everything was connected to it. Companies started to realize that the data overload was in fact a treasure trove of valuable information that could be monetized, and Hadoop came into existence.

The idea was quite simple. Instead of building one monster machine (like the mainframes of the past), why not tie a bunch of computers together and make them work like one big machine, with each one sharing the workload, performing its tasks and finally contributing to one unified output? Developed in large part at Yahoo, Hadoop has now become the go-to platform for Big Data use cases. Big Data does not just refer to large volumes of data, usually in petabytes; it is also defined by the type of data and the frequency at which it is generated, commonly referred to as the 3 Vs (Volume, Variety, Velocity). The same 3 Vs are also very relevant in personalized medicine, and hence Hadoop offers the much-awaited, economically viable solution for bringing biological meaning to genomic data sets.
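The "many machines acting as one" idea can be sketched in a few lines of Python. This is not Hadoop code and does not use any Hadoop API: threads on a single computer stand in for the machines of a cluster, and the function names, chunk size and example sequence are assumptions made for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(sequence, chunk_size):
    """Divide the workload into pieces that can be handled independently."""
    return [sequence[i:i + chunk_size] for i in range(0, len(sequence), chunk_size)]


def count_bases(chunk):
    """Work done by one 'machine': count the bases in its own chunk."""
    return Counter(chunk)


def merge_counts(partial_counts):
    """Combine every worker's partial result into one unified output."""
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total


def distributed_base_count(sequence, chunk_size=4, workers=3):
    chunks = split_into_chunks(sequence, chunk_size)
    # Threads play the role of separate computers in this sketch.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_bases, chunks))
    return merge_counts(partials)


sequence = "ATGCATGGCCTA"  # made-up example sequence
total = distributed_base_count(sequence)
print(dict(total))
```

Because each chunk is counted without reference to any other chunk, adding more workers does not change the result, only how the work is shared out.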


Figure: Clinical Workflow for Tumor Genome Analysis and Interpretation



Mining DNA to Develop Targeted Cancer Drugs

June 2, 2014 in Blog by tahera

By Tahera Zabuawala, PhD

Just over a decade ago, sequencing the first human genome cost about $1 billion and took 13 years to complete. Today the same task typically costs $1,000 to $4,000 and takes as little as a few hours. Sub-$1,000 genome sequencing, which will soon become a reality, will enable large-scale sequencing studies that could lead to revolutionary advances in personalized medicine.

Figure: Next Generation Sequencing-based strategy to develop targeted therapeutics in oncology

Next Generation DNA sequencing technology will be the engine that powers targeted cancer therapeutics along the clinical trials road map. Not only has the ‘engine’ achieved an exponential increase in horsepower, but we have also gained insight into the fuel (tumor DNA). There is strong evidence that tumors from a given primary site or histology type are genomically heterogeneous, which has a dramatic influence on responsiveness to a drug. Several kinase inhibitors developed along with companion diagnostic tests have achieved success by selecting patients who are most likely to benefit, or by excluding patients who are least likely to benefit, from the regimen. Companion tests evaluate whether the patient harbors a mutation in the ‘druggable’ target; patients are selected based on the test results. For example, a companion test for erlotinib analyzes mutations in the EGFR gene.

A downside of relying only on companion tests is that they do not help in tailoring alternate therapies for patients who develop resistance to a drug aimed at the ‘druggable’ target. For example, even if the patient harbors an EGFR mutation, the patient may be resistant to erlotinib treatment because of an alteration in another gene in the molecular pathway. Whole genome/exome sequencing provides an ‘actionable genome’ that highlights functional aspects of the druggable target and its associated molecular pathways.

Another important application of whole genome sequencing lies in pharmacogenomics, or pharmacogenetics. Pharmacogenetics is the study of how an individual’s genotype affects the individual’s ability to metabolize and respond to a drug. Gene sequencing technology helps identify the network of genes that determines responsiveness to a drug. Consequently, in the preclinical setting, one may start by screening for compounds whose effect varies least across individuals, and then choose the compound that works best overall across all patient subtypes.
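The screening logic in the paragraph above can be sketched as a small Python calculation. The compound names, the response values (one per hypothetical genotype subgroup, on an arbitrary 0 to 1 scale), the spread threshold and both helper functions are invented for illustration and are not drawn from any real study.

```python
from statistics import mean, pstdev

# Hypothetical response of each compound in four genotype subgroups (0 to 1 scale).
responses = {
    "compound_A": [0.80, 0.78, 0.82, 0.79],
    "compound_B": [0.95, 0.40, 0.90, 0.35],
    "compound_C": [0.60, 0.61, 0.60, 0.61],
}


def most_consistent(responses):
    """Compound whose response varies least across subgroups."""
    return min(responses, key=lambda name: pstdev(responses[name]))


def best_overall(responses, max_spread):
    """Among compounds with acceptable spread, pick the highest mean response."""
    consistent = [
        name for name, values in responses.items() if pstdev(values) <= max_spread
    ]
    if not consistent:
        return None
    return max(consistent, key=lambda name: mean(responses[name]))


for name, values in responses.items():
    print(f"{name}: mean={mean(values):.3f}, spread={pstdev(values):.3f}")

print("Least variable:", most_consistent(responses))
print("Best overall within spread limit:", best_overall(responses, max_spread=0.05))
```

With these made-up numbers, the compound with the highest peak response (compound_B) is excluded because it works in only some subgroups, which is the trade-off the paragraph describes.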

Continuing with the car analogy, engineering a car around its type of fuel (gas, diesel, ethanol or electricity) will increase its performance and reduce its carbon emissions. Similarly, whole genome or exome sequencing enables stratification of patients by genotype (the fuel), which will increase the success rates of drug development. This information can also be used to rescue failed or failing drugs by redefining the patient population. An informed, rational strategy of genetically subtyping patient response, whether the differences are due to distinct molecular subtypes or to differential drug effects, will expedite drug development and bring more targeted cancer drugs to market, with potential increases in efficacy and an improved side effect profile.

Life Science Strategy Group, LLC is experienced in helping companies define a commercialization pathway for targeted oncology therapeutics. Wade Sunada, Ph.D., LSSG’s oncology practice leader, explains: “Developing a commercialization pathway in oncology for targeted therapeutics requires more than knowledge of the oncology market, but also an understanding of the science. LSSG can help identify potential hurdles and streamline the commercialization pathway to help companies bring their targeted therapeutics to market.”