Classwork for BIMM143
Laura (PID: A17844089)
All functions in R have at least 3 things: a name, input arguments, and a body.
Our first wee function:
add <- function(x, y=1) {
  x + y
}
Let’s test our function:
add(10, 10)
[1] 20
add(10)
[1] 11
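A small sketch of how the default argument behaves, reusing the add() function from above (redefined here so the block runs on its own). The named-argument call, the vector input, and the tryCatch() wrapper are illustrative additions, not part of the original classwork.

```r
# Same function as above, repeated so this block is self-contained
add <- function(x, y = 1) {
  x + y
}

# y falls back to its default of 1 when it is omitted
add(10)               # 11

# Arguments can be supplied by name, in any order
add(y = 5, x = 10)    # 15

# The function works on whole vectors because `+` is vectorized
add(c(1, 2, 3), 10)   # 11 12 13

# x has no default, so calling add() with no arguments is an error.
# tryCatch() is used here only to capture the message without halting.
msg <- tryCatch(add(), error = function(e) conditionMessage(e))
msg
```

Giving y a default keeps the common case short (add(10)) while still letting the caller override it.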
Let’s try something more interesting. Make a sequence generation tool.
The sample() function could be useful here.
sample(1:10, size = 3)
[1] 6 10 3
Change this to work with the nucleotides A, C, G, and T. Sampling with replacement (replace = TRUE) lets us draw more than four of them; here we draw 15.
n <- c("A", "C", "G", "T")
sample(n, size=15, replace = TRUE)
[1] "A" "T" "C" "C" "G" "C" "T" "A" "C" "G" "G" "G" "A" "G" "G"
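Because sample() is random, the output differs on every run. The sketch below shows one way to make a draw repeatable with set.seed(), and why replace = TRUE is needed when asking for more items than the vector holds. The seed value 42 and the variable names first and second are arbitrary choices for illustration.

```r
n <- c("A", "C", "G", "T")

# Setting the same seed before each draw gives the same "random" result
set.seed(42)
first <- sample(n, size = 15, replace = TRUE)

set.seed(42)
second <- sample(n, size = 15, replace = TRUE)

identical(first, second)   # TRUE

# Without replace = TRUE we cannot draw 15 items from only 4,
# so sample() stops with an error; tryCatch() captures the message.
err <- tryCatch(sample(n, size = 15), error = function(e) conditionMessage(e))
err
```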
Turn this snippet into a function that returns a DNA sequence of a
user-specified length. Let’s call it generate_dna().
generate_dna <- function(len=10, fasta=FALSE) {
  n <- c("A", "C", "G", "T")
  v <- sample(n, size=len, replace = TRUE)
  # Make a single element vector
  s <- paste(v, collapse="")
  cat("Well done you!\n")
  if(fasta) {
    return( s )
  } else {
    return( v )
  }
}
generate_dna(5)
Well done you!
[1] "G" "A" "C" "G" "A"
s <- generate_dna(15)
Well done you!
s
[1] "C" "C" "A" "T" "C" "T" "T" "C" "C" "A" "G" "T" "G" "C" "A"
I want the option to return a single element character vector with my sequence all together like this: “GGAGTAC”
s
[1] "C" "C" "A" "T" "C" "T" "T" "C" "C" "A" "G" "T" "G" "C" "A"
paste(s, collapse="")
[1] "CCATCTTCCAGTGCA"
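The collapse argument is what turns many elements into one. A minimal sketch contrasting it with sep, using a short made-up vector v rather than the sequence above:

```r
v <- c("C", "C", "A", "T")

# collapse joins the elements of ONE vector into a single string
joined <- paste(v, collapse = "")
joined                              # "CCAT"

# sep joins matching elements of SEVERAL vectors and keeps the length
labelled <- paste(v, 1:4, sep = "_")
labelled                            # "C_1" "C_2" "A_3" "T_4"

# length() counts vector elements; nchar() counts characters in a string
length(v)                           # 4
length(joined)                      # 1
nchar(joined)                       # 4
```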
The fasta argument of generate_dna() provides this option:
generate_dna(10, fasta=FALSE)
Well done you!
[1] "T" "A" "C" "G" "C" "G" "G" "G" "G" "C"
generate_dna(10, fasta=TRUE)
Well done you!
[1] "ACACTATGAA"
Make a third function that generates a protein sequence of a user-specified length and format.
generate_protein <- function(size=15, fasta=TRUE) {
  aa <- c("A", "R", "N", "D", "C", "Q", "E",
          "G", "H", "I", "L", "K", "M", "F",
          "P", "S", "T", "W", "Y", "V")
  seq <- sample(aa, size = size, replace = TRUE)
  if(fasta) {
    return(paste(seq, collapse = ""))
  } else {
    return(seq)
  }
}
Try this out…
generate_protein(10)
[1] "WNNHLFCLSW"
Q. Generate random protein sequences with lengths from 5 to 12 amino acids.
generate_protein(5)
[1] "EWTQG"
generate_protein(6)
[1] "QLQNLK"
One approach is brute force: call our function once for each length from 5 to 12.
Another approach is to write a for() loop to iterate over the input
values 5 to 12.
A very useful third, R-specific approach is to use the sapply()
function.
# Note: this range starts at 6; use 5:12 to include length 5 as well
seq_lengths <- 6:12
for (i in seq_lengths) {
  cat(">", i, "\n")
  cat( generate_protein(i) )
  cat("\n")
}
> 6
QCWAGM
> 7
RNNDRYL
> 8
NNKMHMKV
> 9
HDAKPHCDM
> 10
FQKLLGRIWV
> 11
WRNNYVYVQFP
> 12
DWTIPPMAAAWM
sapply(5:12, generate_protein)
[1] "GPPDC" "KNGRSG" "WNGWRKN" "CFGSCAII" "KYWLCNWQH"
[6] "LIRYKFNCMW" "YTICPDCHHHI" "RNHGDYCTCRGL"
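The sapply() result can be turned into FASTA-style output without a loop. This is a sketch under a few assumptions: generate_protein() is redefined so the block runs on its own (with the internal variable renamed to s), and the "ID." header scheme is a hypothetical naming choice, not something fixed by the classwork.

```r
# Same behavior as generate_protein() above, repeated for self-containment
generate_protein <- function(size = 15, fasta = TRUE) {
  aa <- c("A", "R", "N", "D", "C", "Q", "E",
          "G", "H", "I", "L", "K", "M", "F",
          "P", "S", "T", "W", "Y", "V")
  s <- sample(aa, size = size, replace = TRUE)
  if (fasta) paste(s, collapse = "") else s
}

lens <- 5:12

# One string per length, returned as a character vector
seqs <- sapply(lens, generate_protein)

# Hypothetical header scheme: ">ID.<length>"
headers <- paste0(">ID.", lens)

# Pair each header with its sequence and print one per line
cat(paste(headers, seqs, sep = "\n"), sep = "\n")

# Extra arguments are passed through to the function.
# With fasta = FALSE the results differ in length, so sapply()
# cannot simplify them and returns a list instead of a vector.
vecs <- sapply(lens, generate_protein, fasta = FALSE)
is.list(vecs)   # TRUE
```

Compared with the for() loop, sapply() collects the results into an object (seqs) that can be reused, rather than only printing them.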
Key-Point: Writing functions in R is doable but not the easiest thing. Starting with a working snippet of code and then using LLM tools to improve and generalize your function code is a productive approach.