Classwork for BIMM143
Laura Lu (PID: A17844089)
Downloaded a CSV file from Ensemble < https://useast.ensembl.org/Homo_sapiens/Variation/Sample?db=core;r=17:39894595-39895595;v=rs8067378;vdb=variation;vf=959672880#373531_tablePanel >
Here we read this CSV file
mxl <- read.csv("373531-SampleGenotypes-Homo_sapiens_Variation_Sample_rs8067378.csv")
head(mxl)
Sample..Male.Female.Unknown. Genotype..forward.strand. Population.s. Father
1 NA19648 (F) A|A ALL, AMR, MXL -
2 NA19649 (M) G|G ALL, AMR, MXL -
3 NA19651 (F) A|A ALL, AMR, MXL -
4 NA19652 (M) G|G ALL, AMR, MXL -
5 NA19654 (F) G|G ALL, AMR, MXL -
6 NA19655 (M) A|G ALL, AMR, MXL -
Mother
1 -
2 -
3 -
4 -
5 -
6 -
table(mxl$Genotype..forward.strand.)
A|A A|G G|A G|G
22 21 12 9
table(mxl$Genotype..forward.strand.) / nrow(mxl) * 100
A|A A|G G|A G|G
34.3750 32.8125 18.7500 14.0625
Now let’s look at a different population. I picked the GBR.
gbr <- read.csv("373522-SampleGenotypes-Homo_sapiens_Variation_Sample_rs8067378.csv")
Find the proportion of G|Gv
round(table(gbr$Genotype..forward.strand.) / nrow(gbr) * 100, 2)
A|A A|G G|A G|G
25.27 18.68 26.37 29.67
This variant that is associated with childhood asthma is more frequent in GBR population than the MKL population.
Let’s now dig into this further.
One sample is obivously not enough to know what is happening in a population. You are interested in assessin genetic differences on a population scale.
How many samples do we have?
expr <- read.table("rs8067378_ENSG00000172057.6.txt")
head(expr)
sample geno exp
1 HG00367 A/G 28.96038
2 NA20768 A/G 20.24449
3 HG00361 A/A 31.32628
4 HG00135 A/A 34.11169
5 NA18870 G/G 18.25141
6 NA11993 A/A 32.89721
nrow(expr)
[1] 462
table(expr$geno)
A/A A/G G/G
108 233 121
library(ggplot2)
Let’s make a boxplot
ggplot(expr) + aes(geno, exp, fill=geno) +
geom_boxplot(notch=TRUE)

data <- read.table("rs8067378_ENSG00000172057.6.txt")
header = TRUE
head(data)
sample geno exp
1 HG00367 A/G 28.96038
2 NA20768 A/G 20.24449
3 HG00361 A/A 31.32628
4 HG00135 A/A 34.11169
5 NA18870 G/G 18.25141
6 NA11993 A/A 32.89721
summary(data)
sample geno exp
Length:462 Length:462 Min. : 6.675
Class :character Class :character 1st Qu.:20.004
Mode :character Mode :character Median :25.116
Mean :25.640
3rd Qu.:30.779
Max. :51.518
table(data$geno)
A/A A/G G/G
108 233 121
tapply(data$exp, data$geno, median)
A/A A/G G/G
31.24847 25.06486 20.07363
aggregate(exp ~ geno, data, median)
geno exp
1 A/A 31.24847
2 A/G 25.06486
3 G/G 20.07363
boxplot(exp ~ geno,
data = data,
xlab = "Genotype",
ylab = "ORMDL3 Expression",
main = "ORMDL3 Expression by rs8067378 Genotype")
