Last updated: 2025-04-28
Checks: 6 1
Knit directory:
HairCort-Evaluation-Nist2020/
This reproducible R Markdown analysis was created with workflowr (version 1.7.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
The R Markdown file has unstaged changes. To know which version of
the R Markdown file created these results, you’ll want to first commit
it to the Git repo. If you’re still working on the analysis, you can
ignore this warning. When you’re finished, you can run
wflow_publish
to commit the R Markdown file and build the
HTML.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.
The command set.seed(20241016)
was run prior to running
the code in the R Markdown file. Setting a seed ensures that any results
that rely on randomness, e.g. subsampling or permutations, are
reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version cb56a96. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for
the analysis have been committed to Git prior to generating the results
(you can use wflow_publish
or
wflow_git_commit
). workflowr only checks the R Markdown
file, but you know if there are other scripts or data files that it
depends on. Below is the status of the Git repository when the results
were generated:
Ignored files:
Ignored: .DS_Store
Ignored: .RData
Ignored: .Rhistory
Ignored: analysis/.DS_Store
Ignored: analysis/.Rhistory
Ignored: data/.DS_Store
Ignored: data/Test3/.DS_Store
Ignored: data/Test4/.DS_Store
Untracked files:
Untracked: data/Test3/Data_Cortisol_Processed.csv
Unstaged changes:
Modified: analysis/ELISA_Analysis_FinalVals_comparisons_test3_test4.Rmd
Modified: analysis/ELISA_Analysis_RawVals_test3.Rmd
Modified: analysis/ELISA_Calc_FinalVals_test3.Rmd
Modified: analysis/ELISA_Calc_FinalVals_test4.Rmd
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were
made to the R Markdown
(analysis/ELISA_Analysis_RawVals_test3.Rmd
) and HTML
(docs/ELISA_Analysis_RawVals_test3.html
) files. If you’ve
configured a remote Git repository (see ?wflow_git_remote
),
click on the hyperlinks in the table below to view the files as they
were in that past version.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | 8108c43 | Paloma | 2025-04-25 | improved figures |
html | 8108c43 | Paloma | 2025-04-25 | improved figures |
html | 0ed450f | Paloma | 2025-04-25 | updated input (myassays.com) including dilution |
html | dd200fc | Paloma | 2025-04-23 | corrected figures |
Rmd | 16ce91c | Paloma | 2025-04-10 | recalc_evaluations |
html | 16ce91c | Paloma | 2025-04-10 | recalc_evaluations |
html | 961ce6a | Paloma | 2025-04-05 | upd calc |
Rmd | e97ccaf | Paloma | 2025-04-01 | test4-clean |
html | e97ccaf | Paloma | 2025-04-01 | test4-clean |
Rmd | 87248a1 | Paloma | 2025-04-01 | upd |
Here I use QCed results from an ELISA plate. All hair samples were obtained from the same person. I tested 3 variables:
dilution (60 uL vs 250 uL, coded as 0 and 1, respectively)
weight (11 to 37.1 mg)
spike (25 uL stock solution (1:10) added to some wells, coded as 0 and 1, meaning not-spiked and spiked)
I removed the samples that have a Coef of variation higher than 15%.
The figures below were used to decide that the optimal parameters are:
# Loading data
data <- read.csv("./data/Test3/Data_QC_filtered.csv")
std <- read.csv("./data/Test3/Standard_data_test3.csv")
#Inclusion of standard readings in plot
data1 <- data[,c("Wells","Binding.Perc", "Ave_Conc_pg.ml", "Weight_mg", "Buffer_nl", "Spike", "Sample")]
std1 <- std[,c("Wells", "Binding.Perc", "Backfit")]
#std1$Binding.Perc <- std1$Binding.Perc*100
std1$Spike <- 2
std1$Buffer_nl <- 250
std1$Weight_mg <- 20
colnames(std1)[3]<- c("Ave_Conc_pg.ml")
data_std <- plyr::join(data1, std1, type = "full")
Joining by: Wells, Binding.Perc, Ave_Conc_pg.ml, Weight_mg, Buffer_nl, Spike
# Scatter plot of Binding Percentage vs Weight
ggplot(data_std, aes(x = Ave_Conc_pg.ml,
y = Binding.Perc, color = factor((Spike)))) +
geom_jitter(size = 3, width = 0.4) +
geom_smooth(linewidth = 0.5, color = "orange") + # Add a trend line
geom_hline(yintercept = 80, linetype = "dashed",
color = "gray", linewidth = 1) +
geom_hline(yintercept = 20, linetype = "dashed",
color = "gray", linewidth = 1) + # Add horizontal line
geom_text(aes(label = Sample), size = 3, vjust = -1, hjust = -0.1) +
labs(title = "Results",
x = "Ave. Concentration (pg.ml)", y = "Binding Percentage") +
scale_y_continuous(n.breaks = 10) +
scale_color_discrete(name = "", labels = c("Not-spiked","Spiked", "Standard")) +
theme_minimal()
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'
# Scatter plot of Binding Percentage vs Weight
ggplot(data, aes(x = Weight_mg,
y = Binding.Perc,
shape = factor(Buffer_nl))) +
geom_point(size = 3, color = "turquoise3") +
geom_smooth(method = "lm", se = FALSE, linewidth = 0.4, color = "orange3") + # Add a trend line
geom_hline(yintercept = 50, linetype = "dashed",
color = "gray", linewidth = 1) + # Add horizontal line
labs(title = "Results separated by spiked/non-spiked",
x = "Weight (mg)", y = "Binding Percentage",
shape = "Buffer Amount (ul)") +
geom_text(aes(label = Sample), size = 3, vjust = -1, hjust = -0.1) +
scale_y_continuous(n.breaks = 10) +
scale_shape_discrete(labels = c("60 uL", "250 uL")) +
facet_grid( ~ Spike, labeller = as_labeller(c("0" = "Not spiked", "1" = "Spiked"))) +
theme_minimal()
Here we see the effect of the spike more clearly: adding a spike may not be necessary unless we have very small samples.
The following plots were made considering that having a binding of 50% is ideal. Data points that are over 80% or under 20% are not within the curve, and predictions are less accurate.
# Scatter plot of Binding Percentage vs Weight, 4 groups
ggplot(data, aes(x = Weight_mg,
y = Binding.Perc,
color = factor(Spike),
shape = factor(Buffer_nl))) +
geom_point(size = 3) +
geom_smooth(method = "lm", se = FALSE, linewidth = 0.5) + # Add a trend line
geom_hline(yintercept = 50, linetype = "dashed",
color = "gray", linewidth = 1) + # Add horizontal line
labs(title = "Binding Percentage vs Weight, 4 groups",
x = "Weight (mg)",
y = "Binding Percentage",
color = "Spike Status",
shape = "Buffer Amount (ul)") +
scale_y_continuous(n.breaks = 10) +
geom_text(aes(label = Sample), size = 3, vjust = -1, hjust = -0.1) +
theme_minimal() +
# Change the labels in the legend using labs
scale_color_discrete(labels = c("No Spike", "Spike Added")) + # Change color legend labels
scale_shape_discrete(labels = c("60 uL", "250 uL"))
Spiked samples (turquoise) have lower binding, because they have higher levels of cortisol than non spiked (pink) samples.
Dilution: effect is less clear. We see samples with both 60 uL and 250 uL binding at very high and very low levels.
Trends: within non-spiked samples with a similar weight and diluted at 60uL (pink circles), we do not obtain consistent binding percentages. However, non-spiked samples with similar weights do obtain similar bindings, and the lines are in the expected direction (higher weight, lower binding), except by a few outliers that would be removed from the analysis anyway (for having binding over 80%)
Conclusion: samples across different weights, non-spiked, and diluted in 250 uL buffer seem to provide the best results, particularly if samples weigh more than 15 mg. Using less than that may be risky, and in those cases, it may be better to use less buffer to concentrate the samples a bit more.
## Boxplot of Binding Percentage by dilution
ggplot(data, aes(x = factor(Spike),
y = Binding.Perc,
fill = factor(Spike))) +
geom_boxplot() +
geom_hline(yintercept = 50, linetype = "dashed",
color = "gray", linewidth = 1) +
labs(title = "Binding Percentage by Dilution",
x = "",
y = "Bindings Percentage") +
scale_y_continuous(n.breaks = 10) +
theme(legend.position = "none") +
facet_wrap(~ Buffer_nl,
labeller = as_labeller(c("60" = "60 uL",
"250" = "250 uL"))) +
scale_x_discrete(labels = c("Not spiked", "Spiked"))
Version | Author | Date |
---|---|---|
e97ccaf | Paloma | 2025-04-01 |
Here we also see that the impact of the spike on the values is larger than the impact of using a different dilution
The coefficient of variation or CV is a standardized measure of the difference between duplicates (same sample, same weight, same dilution, same everything). Some variables may make duplicates more variable, so this is what will be tested below.
Version | Author | Date |
---|---|---|
e97ccaf | Paloma | 2025-04-01 |
Conclusion diluting the sample less seems to lead to higher differences between duplicates, which is something we want to avoid. We also see less variation for the group of spiked samples, with the lowest average of the four groups. Yet, we also must note that the spiked, 250 uL group has only 6 samples, as we see on the table below.
Dilution: | No spike | Spiked |
---|---|---|
60 uL | 7 | 7 |
250 uL | 12 | 6 |
Total: 32 samples | ||
Lower CV is seen in spiked + 250 uL group, particularly for samples with low weight. Yet, non spiked, diluted in 250uL samples have very low CV if weight is over 30.
Here I calculate a “binding” deviation score, to have a better idea of the “distance” between the values obtained and what I should aim for: 50% binding. Here an example of how this score works:
Sample | Binding.Perc | Binding_deviation | |
---|---|---|---|
20 | 32 | 50.0 | 0.0 |
19 | 31 | 51.2 | 1.2 |
22 | 34 | 51.7 | 1.7 |
21 | 33 | 52.3 | 2.3 |
23 | 36 | 53.2 | 3.2 |
14 | 27 | 46.0 | 4.0 |
# Scatter plot of Binding Deviation vs Weight, 4 groups
ggplot(data, aes(x = Weight_mg,
y = Binding_deviation,
color = factor(Spike),
shape = factor(Buffer_nl))) +
geom_point(size = 3) +
geom_smooth(method = "lm", se = FALSE, linewidth = 0.5) +
labs(title = "Binding Deviation vs Weight, 4 groups",
x = "Weight (mg)",
y = "Binding Deviation",
color = "Spike Status",
shape = "Buffer Amount (ul)") +
geom_text(aes(label = Sample), size = 3, vjust = -1, hjust = -0.1) +
scale_y_continuous(n.breaks = 10) +
theme_minimal() +
# Change the labels in the legend using labs
scale_color_discrete(labels = c("No Spike", "Spike Added")) + # Change color legend labels
scale_shape_discrete(labels = c("60 uL", "250 uL"))
This plot suggests that for samples of weight lower than 20 mg, adding a spike lowers the binding deviation. This effect is lost if samples are heaver than 20 mg.
We observe that spiked samples have a higher deviation from the ideal binding. We also observe that having larger samples leads to values closer to 50%. It is interesting to see that error does not go below 15% if we look at samples with weight under 20mg. Yet, we know that a deviation of up to 30% is acceptable.
ggplot(data, aes(x = factor(Spike),
y = Binding_deviation,
fill = factor(Spike))) +
geom_boxplot() +
labs(title = "Binding deviation, by dilution and spike",
x = "",
y = "Binding deviation") +
scale_y_continuous(n.breaks = 10) +
theme(legend.position = "none") +
geom_text(aes(label = Sample), size = 3, vjust = -1, hjust = -0.1) +
facet_wrap(~ Buffer_nl,
labeller = as_labeller(c("60" = "60 uL",
"250" = "250 uL"))) +
scale_x_discrete(labels = c("Not spiked", "Spiked"))
Here we see how the lowest (best) scores are obtained by the non-spiked groups. Even better results are obtained if the dilution is 250 uL.
To explore the effects of each variable more systematically, I run multiple models and compared them using AIC Akakikes’ coefficient.
weight <- data$Weight_mg
binding <- data$Binding.Perc
x <- mean(binding)
# Q-Q Plot
qqnorm(binding, pch = 16, col = "gray40")
qqline(binding, col = "orange3", lwd = 1.5)
Version | Author | Date |
---|---|---|
e97ccaf | Paloma | 2025-04-01 |
First, I looked at the distribution of the data (binding percentage). I am not sure how to describe it, but it does not look very linear. I will test different distributions at another time, but for now, I will run and compare simple models that should allow me to understand which variables have a greater impact on binding percentages.
mod <- lm(binding ~ weight)
summary(mod)
Call:
lm(formula = binding ~ weight)
Residuals:
Min 1Q Median 3Q Max
-19.39 -14.46 -5.65 12.69 30.62
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 51.6739 9.2460 5.589 5.56e-06 ***
weight -0.2934 0.3732 -0.786 0.438
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 15.77 on 28 degrees of freedom
Multiple R-squared: 0.0216, Adjusted R-squared: -0.01335
F-statistic: 0.618 on 1 and 28 DF, p-value: 0.4384
plot(weight, binding,
main = "Linear regression ~ weight",
xlab = "Weight",
ylab = "binding %",
pch = 16,
cex = 1.3,
col = "gray40")
abline(mod, col = "orange3", lwd = 1.5)
Version | Author | Date |
---|---|---|
e97ccaf | Paloma | 2025-04-01 |
# filter by weight (20-29.9 mg)
d<- data[c(data$Weight_mg >= 20 & data$Weight_mg <= 29.9), ]
weight <- d$Weight_mg
binding <- d$Binding.Perc
# Q-Q Plot
qqnorm(binding, pch = 16, col = "cyan3")
qqline(binding, col = "red", lwd = 1.5)
Version | Author | Date |
---|---|---|
e97ccaf | Paloma | 2025-04-01 |
## Linear regression
mod <- lm(binding ~ weight)
summary(mod)
Call:
lm(formula = binding ~ weight)
Residuals:
Min 1Q Median 3Q Max
-15.3218 -10.9401 -0.0469 6.8640 21.5852
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 96.860 30.175 3.210 0.00933 **
weight -2.339 1.257 -1.861 0.09242 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 13.23 on 10 degrees of freedom
Multiple R-squared: 0.2572, Adjusted R-squared: 0.1829
F-statistic: 3.462 on 1 and 10 DF, p-value: 0.09242
plot(weight, binding,
main = "Linear regression, weight 20 - 29.9 mg",
xlab = "weight",
ylab = "binding %",
pch = 16,
cex = 1,
col = "darkgreen")
abline(mod, col = "red", lty = 'dashed')
Version | Author | Date |
---|---|---|
e97ccaf | Paloma | 2025-04-01 |
# more than 30 mg
## Linear regression, by weight group
d<- data[c(data$Weight_mg > 29.9), ]
weight <- d$Weight_mg
binding <- d$Binding.Perc
# Q-Q Plot
qqnorm(binding, pch = 16, col = "cyan3")
qqline(binding, col = "red", lwd = 1.5)
Version | Author | Date |
---|---|---|
e97ccaf | Paloma | 2025-04-01 |
## Linear regression
mod <- lm(binding ~ weight)
summary(mod)
Call:
lm(formula = binding ~ weight)
Residuals:
Min 1Q Median 3Q Max
-12.830 -8.875 5.782 6.585 7.765
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 71.1761 49.2217 1.446 0.198
weight -0.7449 1.4526 -0.513 0.626
Residual standard error: 9.92 on 6 degrees of freedom
Multiple R-squared: 0.04199, Adjusted R-squared: -0.1177
F-statistic: 0.263 on 1 and 6 DF, p-value: 0.6264
plot(weight, binding,
main = "Linear regression, weight over 30 mg",
xlab = "weight",
ylab = "binding %",
pch = 16,
cex = 1,
col = "darkgreen")
abline(mod, col = "red", lty = 'dashed')
Version | Author | Date |
---|---|---|
e97ccaf | Paloma | 2025-04-01 |
## SPIKED
d<- data[c(data$Spike == 1), ]
weight <- d$Weight_mg
binding <- d$Binding.Perc
# Q-Q Plot
qqnorm(binding, pch = 16, col = "cyan3", main = "qqplot, spiked samples")
qqline(binding, col = "red", lwd = 1.5)
Version | Author | Date |
---|---|---|
e97ccaf | Paloma | 2025-04-01 |
## Linear regression
mod <- lm(binding ~ weight)
summary(mod)
Call:
lm(formula = binding ~ weight)
Residuals:
Min 1Q Median 3Q Max
-4.1104 -1.1097 0.2364 1.3247 3.4221
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 37.2842 2.3680 15.745 2.19e-08 ***
weight -0.3252 0.1111 -2.927 0.0151 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.304 on 10 degrees of freedom
Multiple R-squared: 0.4613, Adjusted R-squared: 0.4075
F-statistic: 8.564 on 1 and 10 DF, p-value: 0.01513
plot(weight, binding,
main = "Linear regression, spiked",
xlab = "weight",
ylab = "binding %",
pch = 16,
cex = 1,
col = "darkgreen")
abline(mod, col = "red", lty = 'dashed')
Version | Author | Date |
---|---|---|
e97ccaf | Paloma | 2025-04-01 |
## Linear regression, by spike
## NO
d <- data[c(data$Spike == 0), ]
weight <- d$Weight_mg
binding <- d$Binding.Perc
# Q-Q Plot
qqnorm(binding, pch = 16, col = "cyan3", main = "qqplot, NONspiked samples")
qqline(binding, col = "red", lwd = 1.5)
Version | Author | Date |
---|---|---|
e97ccaf | Paloma | 2025-04-01 |
## Linear regression
mod <- lm(binding ~ weight)
summary(mod)
Call:
lm(formula = binding ~ weight)
Residuals:
Min 1Q Median 3Q Max
-22.718 -7.360 5.027 7.309 12.492
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 81.8251 8.0270 10.194 2.1e-08 ***
weight -1.0795 0.2991 -3.609 0.00235 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 10.21 on 16 degrees of freedom
Multiple R-squared: 0.4487, Adjusted R-squared: 0.4143
F-statistic: 13.02 on 1 and 16 DF, p-value: 0.002355
plot(weight, binding,
main = "Linear regression, not spiked",
xlab = "weight",
ylab = "binding %",
pch = 16,
cex = 1,
col = "darkgreen")
abline(mod, col = "red", lwd = 1.5)
Version | Author | Date |
---|---|---|
e97ccaf | Paloma | 2025-04-01 |
#creating function to extract coeffs
extract_coefs <- function(model, model_name) {
# Extract summary of the model
coef_summary <- summary(model)$coefficients
# Create a data frame with term names, estimates, and standard errors
coef_df <- data.frame(
term = rownames(coef_summary),
estimate = coef_summary[, "Estimate"],
std.error = coef_summary[, "Std. Error"],
model = model_name # Add the model name as a new column
)
# Return the data frame
return(coef_df)
}
binding <- data$Binding.Perc
weight <- data$Weight_mg
spike <- data$Spike
buffer <- data$Buffer_nl
# model 1
m1 <- lm(binding ~ weight)
summary(m1)
Call:
lm(formula = binding ~ weight)
Residuals:
Min 1Q Median 3Q Max
-19.39 -14.46 -5.65 12.69 30.62
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 51.6739 9.2460 5.589 5.56e-06 ***
weight -0.2934 0.3732 -0.786 0.438
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 15.77 on 28 degrees of freedom
Multiple R-squared: 0.0216, Adjusted R-squared: -0.01335
F-statistic: 0.618 on 1 and 28 DF, p-value: 0.4384
confint(m1, level = 0.95)
2.5 % 97.5 %
(Intercept) 32.734311 70.6134504
weight -1.057979 0.4711297
# model 2
m2 <- lm(binding ~ spike)
summary(m2)
Call:
lm(formula = binding ~ spike)
Residuals:
Min 1Q Median 3Q Max
-21.6889 -3.6222 -0.2333 3.8667 23.6111
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 54.189 2.490 21.765 < 2e-16 ***
spike -23.556 3.937 -5.984 1.91e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 10.56 on 28 degrees of freedom
Multiple R-squared: 0.5612, Adjusted R-squared: 0.5455
F-statistic: 35.81 on 1 and 28 DF, p-value: 1.912e-06
confint(m2, level = 0.95)
2.5 % 97.5 %
(Intercept) 49.08898 59.28880
spike -31.61922 -15.49189
# model 3
m3 <- lm(binding ~ buffer)
summary(m3)
Call:
lm(formula = binding ~ buffer)
Residuals:
Min 1Q Median 3Q Max
-19.165 -14.670 -3.835 9.970 31.515
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 38.25406 5.77579 6.623 3.48e-07 ***
buffer 0.03884 0.03004 1.293 0.207
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 15.49 on 28 degrees of freedom
Multiple R-squared: 0.05636, Adjusted R-squared: 0.02266
F-statistic: 1.672 on 1 and 28 DF, p-value: 0.2065
confint(m3, level = 0.95)
2.5 % 97.5 %
(Intercept) 26.42288169 50.0852393
buffer -0.02268419 0.1003694
# model 4
m4 <- lm(binding ~ weight + spike)
summary(m4)
Call:
lm(formula = binding ~ weight + spike)
Residuals:
Min 1Q Median 3Q Max
-21.621 -4.416 1.329 6.013 14.585
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 76.6215 5.7290 13.374 2.00e-13 ***
weight -0.8763 0.2100 -4.172 0.00028 ***
spike -28.0684 3.3078 -8.485 4.25e-09 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 8.388 on 27 degrees of freedom
Multiple R-squared: 0.7332, Adjusted R-squared: 0.7134
F-statistic: 37.09 on 2 and 27 DF, p-value: 1.795e-08
confint(m4, level = 0.95)
2.5 % 97.5 %
(Intercept) 64.866499 88.3764142
weight -1.307243 -0.4453015
spike -34.855427 -21.2812872
# model 5
m5 <- lm(binding ~ weight + buffer)
summary(m5)
Call:
lm(formula = binding ~ weight + buffer)
Residuals:
Min 1Q Median 3Q Max
-20.5867 -13.8305 0.2964 10.4043 28.5410
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 46.61126 9.67968 4.815 5e-05 ***
weight -0.40028 0.37261 -1.074 0.292
buffer 0.04520 0.03053 1.480 0.150
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 15.45 on 27 degrees of freedom
Multiple R-squared: 0.09504, Adjusted R-squared: 0.02801
F-statistic: 1.418 on 2 and 27 DF, p-value: 0.2597
confint(m5, level = 0.95)
2.5 % 97.5 %
(Intercept) 26.75019226 66.4723321
weight -1.16481019 0.3642451
buffer -0.01745052 0.1078448
# model 6
m6 <- lm(binding ~ spike + buffer)
summary(m6)
Call:
lm(formula = binding ~ spike + buffer)
Residuals:
Min 1Q Median 3Q Max
-17.230 -5.022 -1.865 4.197 22.370
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 47.19577 3.94659 11.959 2.69e-12 ***
spike -23.77849 3.69347 -6.438 6.73e-07 ***
buffer 0.04224 0.01922 2.198 0.0367 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 9.907 on 27 degrees of freedom
Multiple R-squared: 0.6278, Adjusted R-squared: 0.6002
F-statistic: 22.77 on 2 and 27 DF, p-value: 1.607e-06
confint(m6, level = 0.95)
2.5 % 97.5 %
(Intercept) 39.098031857 55.29350737
spike -31.356858924 -16.20012221
buffer 0.002807951 0.08167268
# model 7
m7 <- lm(binding ~ weight + buffer + spike)
summary(m7)
Call:
lm(formula = binding ~ weight + buffer + spike)
Residuals:
Min 1Q Median 3Q Max
-16.2258 -2.0491 0.4047 3.4737 12.5351
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 70.98564 4.39479 16.152 4.50e-15 ***
weight -1.04123 0.15906 -6.546 6.11e-07 ***
buffer 0.05955 0.01232 4.833 5.22e-05 ***
spike -29.23218 2.45833 -11.891 5.13e-12 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 6.204 on 26 degrees of freedom
Multiple R-squared: 0.8594, Adjusted R-squared: 0.8432
F-statistic: 52.99 on 3 and 26 DF, p-value: 3.261e-11
confint(m7, level = 0.95)
2.5 % 97.5 %
(Intercept) 61.95201200 80.01927215
weight -1.36817295 -0.71428618
buffer 0.03422208 0.08487659
spike -34.28534532 -24.17900665
# model 8
sp1 <- data[data$Spike == 1,]
sp0 <- data[data$Spike == 0,]
binding1 <- sp1$Binding.Perc
weight1 <- sp1$Weight_mg
spike1 <- sp1$Spike
buffer1 <- sp1$Buffer_nl
m8 <- lm(binding1 ~ buffer1 + weight1)
summary(m8)
Call:
lm(formula = binding1 ~ buffer1 + weight1)
Residuals:
Min 1Q Median 3Q Max
-2.343 -1.448 -0.262 1.241 2.886
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 36.265393 2.179575 16.639 4.57e-08 ***
buffer1 0.012459 0.006592 1.890 0.09133 .
weight1 -0.379489 0.103187 -3.678 0.00509 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.055 on 9 degrees of freedom
Multiple R-squared: 0.6144, Adjusted R-squared: 0.5287
F-statistic: 7.17 on 2 and 9 DF, p-value: 0.01373
# model 9
binding0 <- sp0$Binding.Perc
weight0 <- sp0$Weight_mg
spike0 <- sp0$Spike
buffer0 <- sp0$Buffer_nl
m9 <- lm(binding0 ~ buffer0 + weight0)
summary(m9)
Call:
lm(formula = binding0 ~ buffer0 + weight0)
Residuals:
Min 1Q Median 3Q Max
-14.6241 -2.2477 0.1961 3.0228 12.8202
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 72.36202 4.87984 14.829 2.28e-10 ***
buffer0 0.08634 0.01487 5.807 3.45e-05 ***
weight0 -1.26822 0.17446 -7.269 2.74e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 5.851 on 15 degrees of freedom
Multiple R-squared: 0.8303, Adjusted R-squared: 0.8077
F-statistic: 36.69 on 2 and 15 DF, p-value: 1.67e-06
coef_df1 <- extract_coefs(m1, "Model 1")
coef_df2 <- extract_coefs(m2, "Model 2")
coef_df3 <- extract_coefs(m3, "Model 3")
coef_df4 <- extract_coefs(m4, "Model 4")
coef_df5 <- extract_coefs(m5, "Model 5")
coef_df6 <- extract_coefs(m6, "Model 6")
coef_df7 <- extract_coefs(m7, "Model 7")
coef_df8 <- extract_coefs(m8, "Model 8")
coef_df9 <- extract_coefs(m9, "Model 9")
#Combine the data frames for plotting
coef_df <- rbind(coef_df1, coef_df2, coef_df3, coef_df4)
ggplot(coef_df, aes(x = term, y = estimate, color = model)) +
geom_point(position = position_dodge(width = 0.5)) + # Points for the estimates
geom_errorbar(aes(ymin = estimate - 1.96 * std.error, ymax = estimate + 1.96 * std.error),
position = position_dodge(width = 0.5), width = 0.2) + # Error bars for confidence intervals
theme_minimal() +
coord_flip() + # Flip the coordinates for better readability
labs(title = "Coefficient Plot for Multiple Models",
x = "Terms",
y = "Estimates") +
geom_hline(yintercept = 0, color = "gray", linetype = "dashed") + # Gray line at zero
theme(legend.position = "bottom")
Version | Author | Date |
---|---|---|
e97ccaf | Paloma | 2025-04-01 |
ggplot(coef_df, aes(x = term, y = estimate, color = model)) +
geom_point(position = position_dodge(width = 4)) + # Points for the estimates
geom_errorbar(aes(ymin = estimate - 1.96 * std.error, ymax = estimate + 1.96 * std.error),
position = position_dodge(width = 0.85), width = 1) + # Error bars for confidence intervals
theme_minimal() +
coord_flip() + # Flip the coordinates for better readability
facet_wrap(~ model, ncol = 1) + # One model per line
labs(title = "Coefficient Plot for Models 1-4",
x = "Terms",
y = "Estimates") +
theme(legend.position = "none") +
geom_hline(yintercept = 0, color = "gray", linetype = "dashed") + # Gray line at zero
expand_limits(y = c(-58, 58)) +
theme(
axis.text.x = element_text(size = 12), # X-axis text size
axis.text.y = element_text(size = 12), # Y-axis text size
axis.title.x = element_text(size = 14), # X-axis title size
axis.title.y = element_text(size = 14), # Y-axis title size
plot.title = element_text(size = 16, hjust = 0.5), # Plot title size and centering
strip.text = element_text(size = 14) # Facet label text size
)
Warning: `position_dodge()` requires non-overlapping x intervals.
`position_dodge()` requires non-overlapping x intervals.
`position_dodge()` requires non-overlapping x intervals.
`position_dodge()` requires non-overlapping x intervals.
Version | Author | Date |
---|---|---|
e97ccaf | Paloma | 2025-04-01 |
# Combine the data frames for plotting
coef_df <- rbind(coef_df5, coef_df6, coef_df7, coef_df8,coef_df9)
ggplot(coef_df, aes(x = term, y = estimate, color = model)) +
geom_point(position = position_dodge(width = 3)) + # Points for the estimates
geom_errorbar(aes(ymin = estimate - 1.96 * std.error, ymax = estimate + 1.96 * std.error),
position = position_dodge(width = 0.9), width = 0.85) + # Error bars for confidence intervals
theme_minimal() +
coord_flip() + # Flip the coordinates for better readability
facet_wrap(~ model, ncol = 1) + # One model per line
labs(title = "Coefficient Plot for Model 5 to 9",
x = "Terms",
y = "Estimates") +
theme(legend.position = "none") +
geom_hline(yintercept = 0, color = "gray", linetype = "dashed") + # Gray line at zero
expand_limits(y = c(-60, 60)) +
theme(
axis.text.x = element_text(size = 12), # X-axis text size
axis.text.y = element_text(size = 12), # Y-axis text size
axis.title.x = element_text(size = 14), # X-axis title size
axis.title.y = element_text(size = 14), # Y-axis title size
plot.title = element_text(size = 16, hjust = 0.5), # Plot title size and centering
strip.text = element_text(size = 14) # Facet label text size
)
Warning: `position_dodge()` requires non-overlapping x intervals.
`position_dodge()` requires non-overlapping x intervals.
`position_dodge()` requires non-overlapping x intervals.
`position_dodge()` requires non-overlapping x intervals.
Version | Author | Date |
---|---|---|
e97ccaf | Paloma | 2025-04-01 |
model_names <- paste("m", 1:9, sep="")
r_values <- 1:9
all_models <- list(m1, m2, m3, m4, m5, m6, m7, m8, m9)
model_info <- c("weight", "spike", "buffer", "weight + spike", "weight + buffer", "spike + buffer", "spike + buffer + weight", "buffer + weight, spiked only","buffer + weight, NOT spiked only")
sum_models <- as.data.frame(r_values, row.names=model_names)
sum_models$res_std_error <- 1:length(model_names)
sum_models$info <- model_info
for (i in 1:length(model_names)) {
sum_models$r_values[i] <- summary(all_models[[i]])$adj.r.squared
sum_models$res_std_error[i] <- summary(all_models[[i]])$sigma
}
kable(sum_models[order(sum_models$r_values, decreasing = TRUE), ])
r_values | res_std_error | info | |
---|---|---|---|
m7 | 0.8432246 | 6.203728 | spike + buffer + weight |
m9 | 0.8076695 | 5.850626 | buffer + weight, NOT spiked only |
m4 | 0.7134063 | 8.387781 | weight + spike |
m6 | 0.6001972 | 9.906874 | spike + buffer |
m2 | 0.5454965 | 10.562881 | spike |
m8 | 0.5287000 | 2.054609 | buffer + weight, spiked only |
m5 | 0.0280065 | 15.447046 | weight + buffer |
m3 | 0.0226583 | 15.489485 | buffer |
m1 | -0.0133472 | 15.772222 | weight |
# computing bias-adjusted version of AIC (AICc) Akakaike's information criteria table
AICc_compare <-AICtab(m1, m2, m3, m4, m5, m6, m7, m8, m9,
base = TRUE,
weights = TRUE,
logLik = TRUE,
#indicate number of observations
nobs = 30)
kable(AICc_compare)
logLik | AIC | dLogLik | dAIC | df | weight | |
---|---|---|---|---|---|---|
m8 | -23.94220 | 55.8844 | 100.3385734 | 0.00000 | 4 | 1 |
m9 | -55.69788 | 119.3958 | 68.5828959 | 63.51136 | 4 | 0 |
m7 | -95.17615 | 200.3523 | 29.1046184 | 144.46791 | 5 | 0 |
m4 | -104.79103 | 217.5821 | 19.4897423 | 161.69766 | 4 | 0 |
m6 | -109.78461 | 227.5692 | 14.4961574 | 171.68483 | 4 | 0 |
m2 | -112.25364 | 230.5073 | 12.0271289 | 174.62289 | 3 | 0 |
m3 | -123.73810 | 253.4762 | 0.5426679 | 197.59181 | 3 | 0 |
m5 | -123.11028 | 254.2206 | 1.1704906 | 198.33617 | 4 | 0 |
m1 | -124.28077 | 254.5615 | 0.0000000 | 198.67715 | 3 | 0 |
# Coef table
coeftab(m1, m2, m3, m4, m5, m6, m7, m8, m9) -> coeftabs
kable(coeftabs)
(Intercept) | weight | spike | buffer | buffer1 | weight1 | buffer0 | weight0 | |
---|---|---|---|---|---|---|---|---|
m1 | 51.67388 | -0.2934245 | NA | NA | NA | NA | NA | NA |
m2 | 54.18889 | NA | -23.55556 | NA | NA | NA | NA | NA |
m3 | 38.25406 | NA | NA | 0.0388426 | NA | NA | NA | NA |
m4 | 76.62146 | -0.8762722 | -28.06836 | NA | NA | NA | NA | NA |
m5 | 46.61126 | -0.4002825 | NA | 0.0451972 | NA | NA | NA | NA |
m6 | 47.19577 | NA | -23.77849 | 0.0422403 | NA | NA | NA | NA |
m7 | 70.98564 | -1.0412296 | -29.23218 | 0.0595493 | NA | NA | NA | NA |
m8 | 36.26539 | NA | NA | NA | 0.0124595 | -0.3794893 | NA | NA |
m9 | 72.36202 | NA | NA | NA | NA | NA | 0.0863352 | -1.268219 |
par(mfrow = c(3, 3))
plot(m1, which = 1)
plot(m2, which = 1, main = "m2")
plot(m3, which = 1, main = "m3")
plot(m4, which = 1, main = "m4")
plot(m5, which = 1, main = "m5")
plot(m6, which = 1, main = "m6")
plot(m7, which = 1, main = "m7")
plot(m8, which = 1, main = "m8")
plot(m9, which = 1, main = "m9")
Version | Author | Date |
---|---|---|
e97ccaf | Paloma | 2025-04-01 |
model <- list(m1, m2, m3, m4, m5, m6, m7, m8, m9)
par(mfrow = c(3, 3))
for (i in 1:length(model)) {
# Create a Q-Q plot for the residuals of the i-th model
qqnorm(residuals(model[[i]]), main = paste("Q-Q Plot, m", i, sep = ""))
qqline(residuals(model[[i]]), col = "red")
}
Version | Author | Date |
---|---|---|
e97ccaf | Paloma | 2025-04-01 |
Model 7 has the highest weight, a measure of certainty in the model. However, we need to consider that the distribution of the data is not normal. Perhaps I should try using other distributions (binom, posson, )
#scale variable
d2 <- data
d2$y <- data$Binding.Perc/100
nll_beta <- function(mu, phi) {
a <- mu * phi
b <- (1 - mu) * phi
-sum(dbeta(d2$y, a, b, log = TRUE))
}
# Fit models using mle2
fit <- mle2(nll_beta, start = list(mu = 0.5, phi = 1), data = d2)
Warning in dbeta(d2$y, a, b, log = TRUE): NaNs produced
Warning in dbeta(d2$y, a, b, log = TRUE): NaNs produced
Warning in dbeta(d2$y, a, b, log = TRUE): NaNs produced
Warning in dbeta(d2$y, a, b, log = TRUE): NaNs produced
Warning in dbeta(d2$y, a, b, log = TRUE): NaNs produced
summary(fit)
Maximum likelihood estimation
Call:
mle2(minuslogl = nll_beta, start = list(mu = 0.5, phi = 1), data = d2)
Coefficients:
Estimate Std. Error z value Pr(z)
mu 0.450562 0.027216 16.555 < 2.2e-16 ***
phi 10.074474 2.483221 4.057 4.97e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
-2 log L: -29.41959
m0n <- mle2(d2$y ~ dnorm(mean = a, sd = sd(d2$y)), start = list(a = mean(d2$y)), data = d2)
# percent cover as predictor, use normal distribution
mcn <- mle2(d2$y ~ dnorm(mean = a + b * d2$Weight_mg, sd = sd(d2$y)), start = list(a = mean(d2$y), b = 0, s = sd(d2$y)), data = d2)
# scatter plot
plot(d2$y ~ d2$Weight_mg,
xlab = "Buffer",
ylab = "% binding",
col = "salmon",
pch = 16,
las = 1)
#m0n
k <-coef(m0n)
curve(k[1] + 0 * x,
from = 0, to = 100,
add=T, lwd = 3,
col = "black")
#mcn
k <-coef(mcn)
curve(k[1] + k[2] * x,
from = 0, to = 100,
add=T, lwd = 2,
col = "lightgreen",
lty = "dashed")
Version | Author | Date |
---|---|---|
e97ccaf | Paloma | 2025-04-01 |
The goal is to run essays that result in a 50% binding.
# choose one model (m7: buffer + weight + spike)
coef <- coef(m7)
# Set target binding
target_binding <- 50
# FUNCTION to Solve for weight, assuming spike = 0
# 50% - intercept - (buffer1 * 1) - (spike * 0) / weight
solve_for_weight <- function(dilution_value, spike_value = 0) {
(target_binding - coef[1] - coef[3] * dilution_value - coef[4] * spike_value) / coef[2]
}
# Find the weight that gives 50% binding whenspike is 0
# dilution = 250
optimal_weight <- solve_for_weight(dilution_value = 1)
optimal_weight
(Intercept)
20.21187
# dilution = 60
optimal_weight <- solve_for_weight(dilution_value = 0)
optimal_weight
(Intercept)
20.15467
# Find the weight that gives 50% binding when spike is 1
# dilution = 250
optimal_weight <- solve_for_weight(dilution_value = 1, spike_value = 1)
optimal_weight
(Intercept)
-7.862805
# dilution = 60
optimal_weight <- solve_for_weight(dilution_value = 0, spike_value = 1)
optimal_weight
(Intercept)
-7.919996
#Work in progress after this line
sessionInfo()
R version 4.5.0 (2025-04-11)
Platform: aarch64-apple-darwin20
Running under: macOS Sequoia 15.4.1
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.1
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: America/Detroit
tzcode source: internal
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] dplyr_1.1.4 bbmle_1.0.25.1 arm_1.14-4 lme4_1.1-37
[5] Matrix_1.7-3 MASS_7.3-65 coefplot_1.2.8 RColorBrewer_1.1-3
[9] ggplot2_3.5.2 knitr_1.50
loaded via a namespace (and not attached):
[1] gtable_0.3.6 xfun_0.52 bslib_0.9.0
[4] lattice_0.22-6 numDeriv_2016.8-1.1 vctrs_0.6.5
[7] tools_4.5.0 Rdpack_2.6.4 generics_0.1.3
[10] tibble_3.2.1 pkgconfig_2.0.3 lifecycle_1.0.4
[13] farver_2.1.2 compiler_4.5.0 stringr_1.5.1
[16] git2r_0.36.2 munsell_0.5.1 httpuv_1.6.16
[19] htmltools_0.5.8.1 sass_0.4.10 yaml_2.3.10
[22] later_1.4.2 pillar_1.10.2 nloptr_2.2.1
[25] jquerylib_0.1.4 whisker_0.4.1 cachem_1.1.0
[28] reformulas_0.4.0 boot_1.3-31 abind_1.4-8
[31] useful_1.2.6.1 nlme_3.1-168 tidyselect_1.2.1
[34] bdsmatrix_1.3-7 digest_0.6.37 mvtnorm_1.3-3
[37] stringi_1.8.7 reshape2_1.4.4 labeling_0.4.3
[40] splines_4.5.0 rprojroot_2.0.4 fastmap_1.2.0
[43] grid_4.5.0 colorspace_2.1-1 cli_3.6.4
[46] magrittr_2.0.3 withr_3.0.2 scales_1.3.0
[49] promises_1.3.2 rmarkdown_2.29 workflowr_1.7.1
[52] coda_0.19-4.1 evaluate_1.0.3 rbibutils_2.3
[55] mgcv_1.9-1 rlang_1.1.6 Rcpp_1.0.14
[58] glue_1.8.0 rstudioapi_0.17.1 minqa_1.2.8
[61] jsonlite_2.0.0 R6_2.6.1 plyr_1.8.9
[64] fs_1.6.6