2.1.5 Control: Loading and processing data
In this control, you will repeat by yourselves the analyses we did in the last classes. After most steps, you will have to execute some code, either by pressing Enter in the console or by clicking the Run button in the text area. If you see any errors (text in red), call me.
You have 20 minutes. All the information you need for the exam is in the text, so read it carefully. You can look at the pages of the previous lessons if you want. Call me after you complete each step.
1 Loading the library
To use the seqinr library to load and process gene sequence data, run the following command in the console:
library(seqinr)
How to know it worked: RStudio will accept your command without producing any errors. The line after the command will just be empty.
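The absence of errors is the main sign that this step worked. If you also want an explicit check, here is a minimal sketch (it assumes seqinr has already been installed on your computer) that confirms the package is attached:

```r
# Attach seqinr (assumes it was installed earlier, e.g. with install.packages("seqinr"))
library(seqinr)

# TRUE once the package is attached to the search path
"package:seqinr" %in% search()

# The installed version, useful to mention when asking for help
packageVersion("seqinr")
```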
2 Reading the data
Gene data can be read using read.fasta("FILE_NAME"), where FILE_NAME is the name of the file you are loading, that is, the file you saved in the first step. Save the result to a variable named dengue. Remember that, to save the result of an action (written action below) to a variable x, you need to write:
x <- action
Replace x and action with suitable names and actions.
How to know it worked: In the Environment tab of the top-right panel, you will see the variable dengue with the description List of 1.
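To see what read.fasta produces without needing the dengue file, here is a minimal sketch that writes a tiny FASTA file to a temporary location and reads it back. The sequence ATGCGTAA, the header toy_sequence and the variable name toy are all hypothetical; in the exercise you read your own file and save the result to dengue. It assumes seqinr is installed.

```r
library(seqinr)

# Hypothetical toy FASTA file standing in for the dengue file:
# a header line starting with ">" followed by the sequence
toy_file <- tempfile(fileext = ".fasta")
writeLines(c(">toy_sequence", "ATGCGTAA"), toy_file)

# Same pattern as in the step: x <- action
toy <- read.fasta(toy_file)

length(toy)   # one sequence in the file, which is why RStudio shows "List of 1"
names(toy)    # the name is taken from the header line
```

Using a different variable name (toy) keeps this example from overwriting your real dengue variable if you run it in the same session.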
3 Extracting the gene sequence
The data we loaded contains many details that are not important for us right now; we just want the gene sequence. To extract it, save the value dengue[[1]] to the variable dengueseq.
How to know it worked: In the Environment tab of the top-right panel, you will see the variable dengueseq with a description starting with 'SeqFastadna'. If you type dengueseq in the console, you should see lots of nucleotides.
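The double brackets [[1]] take the first element out of the list, which is the sequence itself. A minimal sketch on the same hypothetical toy file (ATGCGTAA, with toy and toyseq as illustrative names standing in for dengue and dengueseq), assuming seqinr is installed:

```r
library(seqinr)

# Hypothetical toy FASTA file (not the dengue data)
toy_file <- tempfile(fileext = ".fasta")
writeLines(c(">toy_sequence", "ATGCGTAA"), toy_file)
toy <- read.fasta(toy_file)

# [[1]] extracts the first (and here only) element of the list: the sequence
toyseq <- toy[[1]]

class(toyseq)    # "SeqFastadna", as shown in the Environment tab
length(toyseq)   # one element per nucleotide
toyseq           # the nucleotides, stored in lowercase by default
```

By contrast, toy[1] (single brackets) would return a list that still contains the sequence, not the sequence itself.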
4 Calculating nucleotide frequencies
The function count calculates nucleotide frequencies, producing a table that relates each nucleotide to the number of times it occurs in the data. To use it, call count(data, size), where data is the sequence we are analysing and size is the size of the blocks of nucleotides to count. We want to analyse dengueseq with size equal to 1, so that single nucleotides are counted. Save the result to a variable freq.
How to know it worked: In the Environment tab of the top-right panel, you will see the variable freq with a description starting with 'table' int. If you type freq in the console, you will see a table whose header is a, c, g, t.
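To make the role of the block size concrete, here is a minimal sketch on the hypothetical toy sequence ATGCGTAA (toyseq, toyfreq and dinuc are illustrative names; in the exercise you use dengueseq and freq). It assumes seqinr is installed. With size 1, count tallies single nucleotides; with size 2, it tallies overlapping pairs of nucleotides.

```r
library(seqinr)

# Hypothetical toy sequence, read the same way as in the previous steps
toy_file <- tempfile(fileext = ".fasta")
writeLines(c(">toy_sequence", "ATGCGTAA"), toy_file)
toyseq <- read.fasta(toy_file)[[1]]

# Blocks of size 1: single nucleotides (a, c, g, t)
toyfreq <- count(toyseq, 1)
toyfreq          # a = 3, c = 1, g = 2, t = 2

# Blocks of size 2: overlapping dinucleotides (aa, ac, ..., tt)
dinuc <- count(toyseq, 2)
sum(dinuc)       # 7 overlapping pairs in an 8-nucleotide sequence

# freq = TRUE returns proportions instead of raw counts
count(toyseq, 1, freq = TRUE)
```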
5 Showing the data in a plot
You can display the data with the function barplot. Its argument should be the table we want to display, that is, freq.
How to know it worked: You should see a bar plot on the bottom-right panel.
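barplot also accepts optional arguments for axis labels and a title, which make the plot easier to read. A minimal sketch on the hypothetical toy sequence (toyfreq stands in for freq; the labels and title are illustrative choices), assuming seqinr is installed:

```r
library(seqinr)

# Hypothetical toy sequence and its nucleotide counts
toy_file <- tempfile(fileext = ".fasta")
writeLines(c(">toy_sequence", "ATGCGTAA"), toy_file)
toyfreq <- count(read.fasta(toy_file)[[1]], 1)

# Same call as in the step, with optional labels and a title;
# barplot invisibly returns the x positions of the bar centres
mids <- barplot(toyfreq,
                xlab = "Nucleotide",
                ylab = "Count",
                main = "Nucleotide counts (toy sequence)")
```

The bar names (a, c, g, t) come directly from the table produced by count, so no extra labelling of the bars is needed.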