Genomic Inference Workshop Benefits University of Idaho Chipmunk Research
November 30, 2022
Photo by Matthieu Petiard on Unsplash
Have you ever tried to catch a chipmunk? Neither has David Sneddon, a doctoral student in the Bioinformatics and Computational Biology (BCB) program. For his thesis, he is investigating the relationships between two subspecies of a local critter – the red-tailed chipmunk. From the hundreds of frozen chipmunk tissue samples at his lab, he has sequenced 30 full genomes that can be analyzed to identify genetic variations, population structures, and new genetic lineages as well as model the role the chipmunks play in the environment.
Sneddon is also interested in how interbreeding between the chipmunks is affecting their genomes. The genomes he sequenced are low-coverage, meaning each region was sequenced fewer times than it would be in a higher-coverage genome. He studies processes such as recombination (the reshuffling of genetic material that can give offspring combinations of traits not found in either parent) and introgression (the exchange of genetic material between two lineages) in his chipmunks; to study these processes, it is important to sample as much of the genome as possible.
As low-coverage whole-genome sequencing is still relatively novel, especially for non-model organisms where genomic resources are scarce or in a draft state, resources for learning the necessary analysis tools can be difficult to find. Recently, Sneddon had the opportunity to attend a Physalia-courses workshop on genomic inference from low-coverage whole-genome sequencing data. During this workshop, he learned the basic usage of the ANGSD (Analysis of Next Generation Sequence Data) toolkit along with individual genotype likelihood models. These likelihoods can be used to estimate allele frequencies and their distribution and to identify linkage groups within the genomes, which in turn can be used to estimate genetic variation and population structure, identify new genetic lineages, and study chromosomes.
“This workshop is great for anyone looking to broaden their genomic sampling to a whole genome without the necessary funds to sequence high coverage genomes for each individual,” Sneddon said.
As sequencing technology improves and genomic resources for non-model organisms become increasingly available, it is becoming more feasible for small labs to utilize whole-genome sequencing in their empirical investigations. However, these advances do not come without a trade-off. Researchers are now tasked with balancing the proportion of the genome sequenced (breadth of coverage) with the number of copies of each sequenced region (depth of coverage) to maximize the information found in sequence data.
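The breadth-versus-depth trade-off can be illustrated with some back-of-the-envelope arithmetic. The sketch below assumes reads land uniformly on the targeted portion of the genome, and every number in it (genome size, read length, read count) is hypothetical rather than taken from Sneddon's project.

```python
def expected_depth(num_reads: int, read_length: int,
                   genome_size: int, breadth_fraction: float = 1.0) -> float:
    """Average number of reads covering each targeted base.

    Assumes reads are spread uniformly over the targeted fraction of the
    genome; real sequencing runs are less even than this.
    """
    targeted_bases = genome_size * breadth_fraction
    return num_reads * read_length / targeted_bases


# Hypothetical sequencing budget for one individual.
reads = 50_000_000          # number of reads
read_len = 150              # bases per read
genome = 2_500_000_000      # bases in the genome

# Same budget, two strategies: the whole genome vs. 10% of it.
whole_genome_depth = expected_depth(reads, read_len, genome, 1.0)
reduced_depth = expected_depth(reads, read_len, genome, 0.1)

print(f"whole genome: about {whole_genome_depth:.0f}x depth")
print(f"10% of genome: about {reduced_depth:.0f}x depth")
```

With the same number of reads, covering ten times more of the genome leaves roughly one tenth of the depth at each position, which is the trade-off described above.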
When opting for breadth rather than depth, such as in low-coverage whole-genome sequencing, standard tools that assume confidently called genotypes are often inappropriate and inaccurate. The ANGSD toolkit tackles this issue by working with genotype likelihoods, which capture the uncertainty in each genotype rather than committing to a single call, and then carrying those likelihoods into downstream analyses for more accurate results.
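To give a feel for what a genotype likelihood is, here is a minimal sketch for a single site with two possible alleles. It uses a simplified textbook-style model with one fixed sequencing error rate; it is an illustration only and is not ANGSD's implementation. The function name, the error rate, and the example reads are all assumptions made for this example.

```python
from math import prod


def genotype_likelihoods(bases, ref, alt, error_rate=0.01):
    """Normalized likelihoods of the three diploid genotypes at one site.

    Simplified model (an assumption for illustration): each read samples
    one of the two chromosomes with equal probability, and a base is
    misread with probability `error_rate`, split evenly among the three
    other bases.
    """
    def p_base(base, allele):
        return 1 - error_rate if base == allele else error_rate / 3

    genotypes = {
        "ref/ref": (ref, ref),
        "ref/alt": (ref, alt),
        "alt/alt": (alt, alt),
    }
    raw = {
        name: prod(0.5 * p_base(b, a1) + 0.5 * p_base(b, a2) for b in bases)
        for name, (a1, a2) in genotypes.items()
    }
    total = sum(raw.values())
    return {name: value / total for name, value in raw.items()}


# One read (depth 1): the data cannot rule out a heterozygote.
print(genotype_likelihoods(["A"], ref="A", alt="G"))

# Ten reads, all "A": the homozygous reference genotype is now well supported.
print(genotype_likelihoods(["A"] * 10, ref="A", alt="G"))
```

With a single read, a hard genotype call would discard the substantial chance that the individual is heterozygous. Keeping the full set of likelihoods lets downstream estimates, such as allele frequencies, account for that uncertainty.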
The need for this comprehensive genomic sampling, compounded with limited lab funds, has made low-coverage whole-genome sequencing a great option for Sneddon. “Working with low-coverage whole-genomes at first filled me with quite a bit of angst,” he said. “With my new skillset, I am excited to turn my former angst into ANGSD and learn about my chipmunks’ evolutionary history using robust and proper data processing and analysis.”
Article by Rachel Wiedenmann
IIDS Scientific Writing/Design Intern