GSoC 2018 Clonal evolution visualization web-tools

Background

Mutations drive the initiation and development of tumors, and they accumulate as the tumor grows. The tumor environment exerts selective pressure on tumor cells, and the selected cells expand into clones. Within a tumor, the clone composition differs between regions, which results in intratumor genetic heterogeneity. Studying this heterogeneity helps us understand how tumors initiate and develop. Researchers can also evaluate a therapy better by examining how treatment changes the genetic heterogeneity of the cancer.

Because a clone-specific mutation is present in the same population of cells, the mutations of a clone have similar variant allele frequencies (VAF, more about sequencing and VAF) within one sample, and they follow a consistent trend across samples. Therefore, by clustering mutations by VAF, we can identify the clones. The order in which the mutations occurred can also be inferred by comparing the VAFs of these mutation clusters.
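To make the idea concrete, here is a minimal Python sketch of computing VAF from read counts and grouping mutations whose VAFs are close in every sample. It is only an illustration: the read counts, the mutation names, the `tol` parameter, and the greedy grouping rule are all assumptions made for this example. The actual pipeline uses PyClone, which applies a statistical model rather than a fixed tolerance.

```python
from typing import Dict, List, Tuple


def vaf(alt_reads: int, ref_reads: int) -> float:
    """Variant allele frequency: fraction of reads that support the variant."""
    depth = alt_reads + ref_reads
    if depth == 0:
        raise ValueError("no reads cover this position")
    return alt_reads / depth


def group_by_vaf(profiles: Dict[str, Tuple[float, ...]],
                 tol: float = 0.1) -> List[List[str]]:
    """Greedy grouping (illustrative only, not PyClone's method).

    A mutation joins the first group whose first member has a VAF
    within `tol` of its own VAF in every sample.
    """
    groups: List[List[str]] = []
    for name, profile in profiles.items():
        for group in groups:
            representative = profiles[group[0]]
            if all(abs(a - b) <= tol for a, b in zip(profile, representative)):
                group.append(name)
                break
        else:
            # No existing group is close enough, so start a new one.
            groups.append([name])
    return groups


# Hypothetical (alt, ref) read counts for four mutations in two samples.
counts = {
    "mutA": [(50, 50), (48, 52)],
    "mutB": [(47, 53), (51, 49)],
    "mutC": [(20, 80), (2, 98)],
    "mutD": [(22, 78), (1, 99)],
}
profiles = {m: tuple(vaf(a, r) for a, r in samples)
            for m, samples in counts.items()}
clusters = group_by_vaf(profiles)
print(clusters)
```

In this toy input, mutA and mutB stay near 50% VAF in both samples, while mutC and mutD drop from about 20% to almost 0%, so the two pairs end up in separate groups.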


Brief description of my work

  • Create a pipeline to generate mutation cluster information from cBioPortal datasets.
  • Implement the clonal evolution visualization in the front end.

What is done

  • Created a PyClone pipeline to generate mutation cluster information from cBioPortal datasets.
  • Packaged the PyClone pipeline with Docker to ease deployment.
  • Evaluated the sensitivity of the ClonEvol parameters.
  • Established a workflow to visualize the ClonEvol results.
  • Implemented the basic (threshold-based) clone ordering inference in the front end.
  • Implemented the cancer cell fraction (CCF) plot and the tree plot in the front end.
  • Implemented the interaction between the tree plot and the mutation table.

Demo

We used the lung cancer multi-region sequencing data from the TRACERx project as an example. Patient CRUK0034 has three sequenced tumor regions: CRUK0034-R1, CRUK0034-R2, and CRUK0034-R23.



The result provides the cancer cell fraction (CCF) line plot and the tree plot for five mutation clusters. Among the five clusters, Cluster1 has 100% CCF in all three regions, so the Cluster1 mutations occurred first. In addition, Cluster4 has a CCF approaching 100% in regions 1 and 3, while Cluster5 has a CCF approaching 100% in region 2. Hence, we can infer that regions 1 and 3 are dominated by a clone descended from a cell carrying the Cluster4 mutations, while region 2 is dominated by a clone descended from a cell carrying the Cluster5 mutations.
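The reasoning above can be sketched as a simple threshold-based ordering rule in Python. This is an assumption-laden illustration, not the project's front-end code and not ClonEvol's algorithm: the CCF values are made-up numbers chosen to resemble the description (only three of the five clusters are shown), and `eps`, `can_be_ancestor`, and `infer_parents` are hypothetical names. The rule is that a cluster can be an ancestor of another only if its CCF is at least as high (within `eps`) in every region.

```python
from typing import Dict, List, Optional


def can_be_ancestor(parent: List[float], child: List[float],
                    eps: float = 0.05) -> bool:
    """A parent must have CCF >= the child's CCF (within eps) in every region."""
    return all(p + eps >= c for p, c in zip(parent, child))


def infer_parents(ccf: Dict[str, List[float]],
                  eps: float = 0.05) -> Dict[str, Optional[str]]:
    """Assign each cluster its nearest possible ancestor (None for the trunk)."""
    parents: Dict[str, Optional[str]] = {}
    for child, child_ccf in ccf.items():
        candidates = [
            name for name, values in ccf.items()
            if name != child
            and can_be_ancestor(values, child_ccf, eps)
            and sum(values) > sum(child_ccf)
        ]
        # The nearest ancestor is the candidate with the smallest total CCF.
        parents[child] = min(candidates, key=lambda n: sum(ccf[n]), default=None)
    return parents


# Illustrative CCF values per region (region 1, region 2, region 3).
ccf = {
    "Cluster1": [1.00, 1.00, 1.00],
    "Cluster4": [0.98, 0.02, 0.97],
    "Cluster5": [0.01, 0.99, 0.03],
}
parents = infer_parents(ccf)
for cluster, parent in parents.items():
    print(cluster, "<-", parent)
```

With these values, Cluster1 has no possible ancestor and becomes the trunk, while Cluster4 and Cluster5 both branch from Cluster1. The sketch does not check that the CCFs of sibling clusters sum to no more than their parent's CCF, which a fuller implementation would need.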

By clicking on an edge of the tree plot, we can filter the mutation table. We can see that a mutation in the well-known oncogene KRAS, which is very likely to mediate tumorigenesis, is present in the trunk of the tree.

Clicking on Cluster5 in the tree shows a mutation in the JAK2 gene, suggesting that JAK2 may mediate the positive selection that gave rise to this clone.

TODO

  • Fix the bugs, errors, and warnings, then merge the code.
  • Customize the react-line plot to fix the legend position.
  • Implement a statistics-based clone ordering inference function.
  • Optimize the mutation cluster filtering options.
  • Add tooltips and a graph download function to the tree plot and line plot.

My GSoC Journey

I participated in GSoC because I had used cBioPortal in my cancer research, and I hoped to contribute to it.

I learned a lot from it. In addition to the React front-end framework and other technologies, I learned a lot about project design and team communication.

The cBioPortal development team is active and creative. I am very lucky to have been involved in this collaboration. Thanks to my mentors Chris Fong and Ino!

Useful links

