Uncovering genetic candidates for colorectal cancer

Mapping levels of gene expression with Eddie and DataStore

A little bit of background...

Colorectal cancer, the fourth most common type of cancer in the UK, is associated with a genetic risk of approximately 40%. To date, 205 different genetic variations have been identified that can lead to an increased risk of colorectal cancer. Interestingly, within these genetic variations, one-off highly disruptive mutations are uncommon. Rather, mutations which change the level at which a gene is expressed in the cells of the colon are a more common risk factor for colonic cancer.

However, it can be difficult to pinpoint which exact variation in the region contributes to differences in gene expression, what exact mechanism is triggered by it, and whether this mechanism contributes to increased risk of colorectal cancer.

In his research, PhD student Bradley Harris tries to answer some of these questions. Specifically, he wants to find which genes change expression level, where they are expressed, and what the relationship between these genes is.



Eddie and DataStore work hand-in-hand

Bradley used very large, publicly available single cell RNA sequencing (scRNAseq) datasets, which he downloaded onto Eddie. “Eddie is where I do all my work,” he says.

He then used Eddie to perform so-called weighted gene co-expression network analysis. This is a robust method to identify correlations between genes, detect non-overlapping gene networks (“modules”) and correlate these to the sample phenotype. It overcomes some of the limitations posed by more traditional methods of measuring gene expression.
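The core steps of weighted gene co-expression network analysis can be illustrated with a minimal sketch. This is not Bradley's pipeline: it uses plain NumPy on synthetic data, an assumed soft-threshold power of 6, and hypothetical function names, and it leaves out the hierarchical clustering step that normally assigns genes to modules (here the two modules are known in advance).

```python
import numpy as np


def soft_threshold_adjacency(expr, beta=6):
    """Unsigned adjacency |cor|^beta from a samples x genes matrix.

    beta=6 is an assumed soft-threshold power, chosen only for illustration.
    """
    cor = np.corrcoef(expr, rowvar=False)  # gene-gene Pearson correlation
    adj = np.abs(cor) ** beta              # raising to a power suppresses weak links
    np.fill_diagonal(adj, 0.0)             # no self-connections
    return adj


def topological_overlap(adj):
    """Topological overlap: how strongly two genes share network neighbours."""
    k = adj.sum(axis=1)                    # connectivity of each gene
    shared = adj @ adj                     # sum over u of a_iu * a_uj
    min_k = np.minimum.outer(k, k)
    tom = (shared + adj) / (min_k + 1.0 - adj)
    np.fill_diagonal(tom, 1.0)
    return tom


def module_eigengene(expr, gene_idx):
    """First principal component of a module's standardised expression."""
    sub = expr[:, gene_idx]
    sub = (sub - sub.mean(axis=0)) / sub.std(axis=0)
    u, _, _ = np.linalg.svd(sub, full_matrices=False)
    eig = u[:, 0]
    # Orient the eigengene so it rises with the module's average expression.
    if np.corrcoef(eig, sub.mean(axis=1))[0, 1] < 0:
        eig = -eig
    return eig


# Synthetic data (an assumption): 60 samples, two modules of 5 genes each,
# every module driven by its own hidden factor plus noise.
rng = np.random.default_rng(0)
factor_a = rng.normal(size=60)
factor_b = rng.normal(size=60)
expr = np.column_stack(
    [factor_a + 0.3 * rng.normal(size=60) for _ in range(5)]
    + [factor_b + 0.3 * rng.normal(size=60) for _ in range(5)]
)
trait = factor_a + 0.5 * rng.normal(size=60)  # a phenotype linked to module A

adj = soft_threshold_adjacency(expr)
tom = topological_overlap(adj)

within = adj[:5, :5].sum() / 20    # mean off-diagonal adjacency inside module A
between = adj[:5, 5:].mean()       # mean adjacency between the two modules

eigengene_a = module_eigengene(expr, np.arange(5))
r = np.corrcoef(eigengene_a, trait)[0, 1]
print(f"within={within:.3f} between={between:.3f} eigengene-trait r={r:.2f}")
```

Genes in the same module end up strongly connected while links between modules are pushed towards zero, and the module eigengene gives a single summary profile that can be correlated with a sample trait. Real analyses typically rely on a dedicated package, such as the WGCNA package for R, rather than hand-written code like this.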

Eddie was used for its high computational power. Single cell RNA datasets are very large: Bradley examined 14,840 genes across 32,361 cells. “This was much too big for my laptop to handle”. Apart from being secure and reliable, Eddie also already had several useful resources installed, such as reference sequences (for genomes or transcriptomes) and bioinformatics resources (such as GTEx). According to Bradley, this gave him a head start.
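To give a rough sense of scale for the figures quoted above, the following back-of-the-envelope sketch estimates the memory needed to hold the data as dense matrices. It assumes 64-bit floating point values and dense storage, which is a simplification: single cell data are often stored in sparse formats, and a real analysis needs additional working memory on top of these figures.

```python
N_GENES = 14_840          # genes examined (from the case study)
N_CELLS = 32_361          # cells examined (from the case study)
BYTES_PER_FLOAT64 = 8     # assumption: dense 64-bit floats


def to_gib(n_bytes):
    """Convert a byte count to gibibytes."""
    return n_bytes / 2**30


# Dense genes x cells expression matrix.
expr_bytes = N_GENES * N_CELLS * BYTES_PER_FLOAT64

# Dense gene x gene matrix, as used for correlations or adjacencies.
cor_bytes = N_GENES * N_GENES * BYTES_PER_FLOAT64

print(f"expression matrix:  {to_gib(expr_bytes):.2f} GiB")
print(f"gene-gene matrix:   {to_gib(cor_bytes):.2f} GiB")
```

Network analyses usually keep several matrices of this size in memory at once, which is one reason a cluster such as Eddie is a better fit than a laptop.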

To store original datasets, and everything that had gone through Eddie and was “done and dusted”, Bradley used DataStore. He found the filing system convenient and valued being able to access it from anywhere. “Eddie and DataStore is what everyone uses, never looked elsewhere because it totally fits the purpose. They work together very nicely; you can send things across from one to the other very easily.”




From novice to tutor

Bradley had no experience in coding before – he learned everything at the University of Edinburgh. He felt very well supported, even during the height of the pandemic. He made use of introductory courses on Python and R, as well as the Research Services drop-in sessions. As his confidence grew, Bradley became a helper on courses and taught others. He helped support an EdDash course on high-dimensional statistics, which gave him an insight into what it means to “stand on the other side”. Bradley explains that the University of Edinburgh truly lived up to its promise.


“I trained here, worked here and completed my work here - all using Edinburgh’s systems”





Picture of Bradley presenting the work at the ‘Probing human disease using single cell technologies’ conference in Cancun, May 2022


Results and future projects with Eddie

In his research, Bradley demonstrated that where there is genetic variation in one of the regions, there is a change of gene expression that is very specific to one cell type, the colonic tuft cells. Colonic tuft cells are known to be involved in immune regulation. These results give researchers a good idea of what might be happening, and prompt avenues for future research. For instance, it is necessary to confirm the causality between the expression levels of specific genes (e.g. POU2AF2), tuft cell abundance and colorectal cancer risk.

The availability and size of single cell RNA sequencing datasets are growing considerably. This makes it easier to map heritable disease on a more cell-specific level. Eddie is vital for this project, but as Bradley mentions, the ceiling of work which is possible with Eddie is yet to be reached. Aside from RNA sequencing, Bradley’s research group are also running projects focusing on whole genome sequencing. This technique generates notoriously large amounts of data, and is a great example of Eddie’s power, potential and dependability.

Figure 3a

Figure 3(a) from the publication in Scientific Reports. Bradley explains: "This is a non-linear reduction of the gene expression profile of all 32,361 healthy colon epithelial cells we studied. Each colour represents a distinct cluster of cells."

About the author

Bradley Harris is a PhD student at the Colon Cancer Genetics Group within the Institute of Genetics and Cancer. Twitter: Bradley Harris (@Bradleyomics)

This case study was written by Dr Sarah Janac, Research Facilitator for the College of Medicine and Veterinary Medicine.

Publication

Harris, B.T., Rajasekaran, V., Blackmur, J.P. et al. Transcriptional dynamics of colorectal cancer risk associated variation at 11q23.1 correlate with tuft cell abundance and marker expression in silico. Sci Rep 12, 13609 (2022). https://doi.org/10.1038/s41598-022-17887-5
