Anthony Wood is a postdoctoral researcher in the Rowland Kao Group at the Roslin Institute. He started in May 2021, working on Covid data modelling for Public Health Scotland. The Covid data he worked with was very specific and of very high resolution – it included data on testing, hospital admissions, vaccinations and intensive care units. This work, along with the associated data sharing agreements, was fast-tracked on an emergency basis, with ethics approval granted retrospectively.
Fast forward two years, and Anthony’s work has moved from modelling the spread of Covid to modelling the spread of bovine tuberculosis (TB). Specifically, the overarching question is: how are badgers contributing to the spread of bovine TB amongst cattle?
Although the diseases and hosts are very different in these two projects, the underlying principles of coding and modelling of data are similar.
CMVM Research Facilitator Sarah Janac met with Anthony Wood to discuss his work, especially his experience of version control. Version control systems help you track and manage changes made to code (or other files) over time.
Can you tell me a bit more about your current project on bovine TB?
I am bringing together data on badgers and cattle, including DNA sequencing data from positive cases and data on cattle movements. I want to find out how these data can be used to understand the direction of transmission. For instance, imagine there are three cattle and one badger in a field. The three cattle have very different sequences, but the badger’s sequence is similar to that of one of the cows. This suggests that the cow and the badger may be closely linked in a chain of transmission. However, it is not obvious whether the badger infected the cow or vice versa.
What role does coding play in your research?
The purpose of the code is two-fold. First, it simply helps me import the sequence data and the cattle data and visualise it all. Second, I use code for mathematical modelling. I start off by building simulations with artificial data for both the badger and the cattle systems. This helps me test the underlying pen-and-paper theory. In the next step, I will apply the models to the real data.
Tell me about version control.
For my current project on bovine TB I use Gitlab, which the University provides free of charge. In my research group we try to work with the mentality: if I disappear tomorrow, could someone pick up my code? Gitlab is great for this. My code lives on Gitlab, and I keep the git repository up to date. In my research group, everyone has access to each other’s code.
How long have you been using Gitlab?
This is the first project I have used it for. During the Covid project, I used a much more manual process. This did the job – for instance, I was able to amend the code after peer review – but it was time-consuming. Now with Gitlab, all my code is in one place, and I don’t even really have to think about it. I have about 10 different scripts which talk to one another.
Why did you not use Gitlab earlier?
The learning curve for Gitlab is steep. As you can imagine, things were a little crazy during the Covid project. I did not feel I had the time to prepare properly or to invest in learning Git. However, I knew this bovine TB project was going to be a long-term project, so I put the effort in. It took me a couple of days to get everything set up and get the hang of the commands. I still consider myself a novice with Git, but I would use it again without hesitation. It has saved me so much time and effort.
You say the learning curve is steep… any advice for future users?
I did a lot of googling to understand how things worked. People in my research group also helped me. I would say it is worth the effort to get over the learning curve!
Why do you think version control is so important?
It all goes back to reproducibility. With version control, you have a record over time that shows when decisions were made. This is important because it illustrates the research process, rather than me presenting a finished article without context. Moreover, you never know when you might need to resurrect code to make changes. If you don’t have version control, you don’t stand a chance.
Will you publish your code on Gitlab?
Yes, I will make the final repository public on Gitlab and add a link to it in my publication. Often when I read papers, I either cannot find the code or it does not run, which is frustrating. Some of my early conceptual modelling for bovine TB in particular requires no actual data, so in principle anyone can run it and adapt it to other diseases.
Anthony is still in the early stages of his bovine TB project, and he has not yet published code. However, he has shared links to the public Gitlab repositories underlying his Covid work.
Finally, we have talked a lot about Gitlab and version control. Do you use any other tools provided by the University?
I use DataStore to store sequencing data and data on cattle movements and testing. We use the shared group space function in our lab. DataStore is an absolute requirement for our work – the data we use are subject to quite strict sharing agreements and need secure storage.
I also use Pure when I get a paper published, to help make it open access.
This case study was written by Dr Sarah Janac, Research Facilitator for the College of Medicine and Veterinary Medicine.
Dr Anthony Wood is a Post-Doctoral Research Fellow at the Roslin Institute.
"Meet the Scientist" profile