Neurotech Recruitments 25-26 Bio Track Task-1
A Genetic Game of Telephone
A single nucleotide polymorphism (SNP) is a variation at a single nucleotide position in the genome that is observed across a large population. It is the most prevalent type of sequence variation found in the human genome. Point mutations that occur in more than $1\%$ of the population qualify as SNPs. On average, SNPs occur about once every 1000 nucleotides in the human genome.
For this task, you will be given a reference chromosome and segments of two other versions of it, in the form of FASTA (.fasta) files. Your aim is to compare the two versions of the chromosome with the reference and find the following:
- Their SNP distribution plot
- SNP frequency matrix
- Transition / transversion ratio.
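As a rough illustration of the comparison described above, here is a minimal Python sketch. It rests on assumptions that are not part of the task statement: each segment is taken to be already aligned to the reference without gaps, starting at a known `offset`, and the helper names (`read_fasta`, `find_snps`, `ts_tv_ratio`, `snps_per_window`) are hypothetical. Real segments will usually need an alignment step first, and true SNP calling would also need population frequencies, so this only reports single-base mismatches.

```python
from io import StringIO

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def read_fasta(handle):
    """Return {header: sequence} from a FASTA-formatted text handle."""
    records, name, chunks = {}, None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks).upper()
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks).upper()
    return records


def find_snps(reference, sample, offset=0):
    """List (1-based position, ref_base, alt_base) for single-base mismatches.

    Assumption: `sample` is aligned, gap-free, to reference[offset:].
    Positions where either base is not A/C/G/T (for example N) are skipped.
    """
    snps = []
    for pos, (ref, alt) in enumerate(zip(reference[offset:], sample), start=offset + 1):
        if ref != alt and ref in "ACGT" and alt in "ACGT":
            snps.append((pos, ref, alt))
    return snps


def ts_tv_ratio(snps):
    """Transitions (A<->G, C<->T) divided by transversions; None if no transversions."""
    ts = sum(1 for _, r, a in snps if {r, a} <= PURINES or {r, a} <= PYRIMIDINES)
    tv = len(snps) - ts
    return ts / tv if tv else None


def snps_per_window(snps, window=1000):
    """Count SNPs per fixed-size window, keyed by the window's 1-based start."""
    counts = {}
    for pos, _, _ in snps:
        start = (pos - 1) // window * window + 1
        counts[start] = counts.get(start, 0) + 1
    return counts


# Toy data, not taken from the attached files.
toy = StringIO(">ref\nACGTA\nCGTAC\n>alt\nACATACGCAG\n")
records = read_fasta(toy)
snps = find_snps(records["ref"], records["alt"])
print(snps)
print(ts_tv_ratio(snps))
print(snps_per_window(snps, window=5))
```

The per-window counts can be passed to any plotting tool to draw the distribution plot; the window size here is an arbitrary choice.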
The required SNP frequency matrix should be structured roughly like the following sample.
Note: The data in this sample is not representative of the data in the attached files.
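For orientation only, the sketch below tallies substitutions into one common layout: a 4×4 table with the reference base as the row and the variant base as the column. This layout is an assumption and may differ from the sample referred to above. The function names and the toy SNP list are hypothetical and do not come from the attached files.

```python
BASES = "ACGT"


def snp_frequency_matrix(snps):
    """Count substitutions as matrix[ref_base][alt_base].

    `snps` is an iterable of (position, ref_base, alt_base) tuples.
    The diagonal stays at zero because a SNP always changes the base.
    """
    matrix = {ref: {alt: 0 for alt in BASES} for ref in BASES}
    for _, ref, alt in snps:
        matrix[ref][alt] += 1
    return matrix


def format_matrix(matrix):
    """Render the matrix as tab-separated text, rows = reference base."""
    lines = ["ref/alt\t" + "\t".join(BASES)]
    for ref in BASES:
        row = "\t".join(str(matrix[ref][alt]) for alt in BASES)
        lines.append(ref + "\t" + row)
    return "\n".join(lines)


# Toy SNP list for illustration only.
toy_snps = [(3, "G", "A"), (8, "T", "C"), (10, "C", "G"), (15, "G", "A")]
matrix = snp_frequency_matrix(toy_snps)
print(format_matrix(matrix))
```

Dividing each cell by the total number of SNPs would turn the counts into relative frequencies, if that is the form you prefer to report.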
Submission
Write a report (Google Docs or LaTeX) detailing the process you used to identify SNPs and your analysis of the SNPs identified in the sequences. If your submission involved any programming, attach links to the relevant scripts. Please also list any external software tools you used for the task.
Deadline: 12th September EOD
Submission link: Google form
The relevant files are linked below: