2.1.5 Control: Loading and processing data

In this control, you will repeat, by yourselves, the analyses we did in the last classes. After most steps, you will have to execute some code, either by pressing Enter in the console or by clicking the Run button in the text area. If you see any errors (text in red), call me.

You have 20 minutes. The information you need for the exam is in the text, so read it attentively. You can look at the pages of the previous lessons if you want. Call me after you complete each step.

1 Loading the library

In order to use the seqinr library to load and process gene sequence data, run the following command in the console:

library(seqinr)

How to know it worked: RStudio will accept your command without producing any errors. The line after the command will be empty.

2 Reading the data

Gene data can be read using the function read.fasta("FILE_NAME"), where FILE_NAME is the name of the file you are loading, which is the file you saved in the first step.

Save the result to a variable named dengue. Remember that to save the value of some action to a variable x, you need to write:

x <- action

Replace x and action with a suitable name and a suitable action.
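
As a small illustration of the assignment pattern, the sketch below uses a made-up variable name, total, and a simple arithmetic action. Neither is part of the exercise; they only show how a value ends up stored in a variable.

```r
# Hypothetical example: 'total' is an illustrative name, not part of the exercise.
total <- 2 + 3   # the action 2 + 3 is evaluated and its value is stored in total
print(total)     # printing the variable shows the stored value
```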

How to know it worked: On the top right panel, in the Environment tab, you will see the variable dengue with the description List of 1.
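
To see what read.fasta returns without touching the exercise file, here is a minimal sketch that first writes a tiny made-up FASTA file to a temporary location. The file contents, the sequence name toy, and the variable names toy_file and toy_data are assumptions for illustration only.

```r
library(seqinr)

# Write a tiny, made-up FASTA file to a temporary location (not the exercise file).
toy_file <- tempfile(fileext = ".fasta")
writeLines(c(">toy", "ACGTAC"), toy_file)

# read.fasta returns a list with one element per sequence in the file.
toy_data <- read.fasta(toy_file)
length(toy_data)   # number of sequences read; this file holds one
names(toy_data)    # sequence names taken from the ">" header lines
```

A file with a single sequence gives a list with a single element, which is why the Environment tab describes such a variable as a list of 1.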

3 Extracting the gene sequence

The data we loaded has many details that are not important for us right now; we want just the gene sequence. To extract it, save the value dengue[[1]] to the variable dengueseq.

How to know it worked: On the top right panel, in the Environment tab, you will see the variable dengueseq with the description 'SeqFastadna' .... If you type dengueseq in the console, you should see lots of nucleotides.
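
The double brackets matter here: with a list, [[1]] returns the first element itself, while [1] returns a shorter list that still wraps it. A minimal sketch with a made-up list (the names box, item, inner and wrapped are illustrative, not part of the exercise):

```r
# Hypothetical list with a single element, similar in shape to a one-sequence file.
box <- list(item = c("a", "c", "g", "t"))

inner   <- box[[1]]  # the element itself: a character vector
wrapped <- box[1]    # still a list, containing that element

class(inner)    # "character"
class(wrapped)  # "list"
```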

4 Calculating nucleotide frequencies

The function count counts how often each nucleotide occurs, producing a table that relates nucleotides to their quantities in the data. To use it, call count(data, size), where data is the data we are using and size is the size of the blocks. We want to analyse dengueseq with a block size of 1. Save the result to a variable named freq.

How to know it worked: On the top right panel, in the Environment tab, you will see the variable freq with the description 'table' int .... If you type freq in the console, you will see a table whose header is a, c, g, t.
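
To get a feel for what the block size does, the sketch below runs count on a short made-up sequence rather than on the exercise data. The names toy_seq, toy_freq and toy_pairs are illustrative; s2c is the seqinr helper that splits a string into single characters.

```r
library(seqinr)

# Made-up sequence of 10 nucleotides, split into single characters.
toy_seq <- s2c("acgtacgtaa")

# Block size 1: one entry per nucleotide (a, c, g, t).
toy_freq <- count(toy_seq, 1)
toy_freq   # a = 4, c = 2, g = 2, t = 2

# Block size 2: one entry per pair of nucleotides (aa, ac, ..., tt).
toy_pairs <- count(toy_seq, 2)
length(toy_pairs)   # 16 possible pairs
```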

5 Showing the data in a plot

You can show the data by using the function barplot. Its argument should be the table we want to display, that is, freq.

How to know it worked: You should see a bar plot on the bottom-right panel.
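
As a sketch of what barplot does with a table, the example below builds a small made-up table with base R and plots it. The data, the variable names toy_table and mids, and the axis labels are assumptions for illustration; the optional arguments xlab, ylab and main only add labels to the plot.

```r
# Made-up table of counts, built from a short vector of letters.
toy_table <- table(c("a", "a", "c", "g", "t", "t", "t"))

# barplot draws one bar per table entry and returns the bar midpoints.
mids <- barplot(toy_table,
                xlab = "Nucleotide",
                ylab = "Count",
                main = "Toy example")
```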