Here, scale.2 uses the following formula to rescale the data so that it lies between 0 and 1:
\[\mathrm{scale.2}(x) = \frac{x - \min(x)}{\max(x) - \min(x)}\]
# scale between 0 and 1
scale.2 <- function(x, na.rm = TRUE){
  (x - min(x, na.rm = na.rm)) / (max(x, na.rm = na.rm) - min(x, na.rm = na.rm))
}
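As a quick sanity check, scale.2 can be tried on a small vector. The sketch below repeats the definition so that it runs on its own; the vector toy is made up for illustration and is not part of data.full.

```r
# Same definition as above, repeated so the snippet is self-contained
scale.2 <- function(x, na.rm = TRUE) {
  (x - min(x, na.rm = na.rm)) / (max(x, na.rm = na.rm) - min(x, na.rm = na.rm))
}

# Hypothetical toy data (not from data.full), including one missing value
toy <- c(2, 4, NA, 10)
scaled <- scale.2(toy)

# The minimum maps to 0, the maximum to 1, and the NA stays NA
scaled
range(scaled, na.rm = TRUE)
```

Note that a vector with no spread (all values equal) would give 0/0, so scale.2 returns NaN in that case.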
# plot the 6 distros
layout(matrix(1:6, nrow = 2, byrow = TRUE))
hist(data.full$PG)
hist(scale(data.full$PG))
hist(scale.2(data.full$PG))
hist(data.full$CT)
hist(scale(data.full$CT))
hist(scale.2(data.full$CT))
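The shape of each distribution is the same under both scaling methods. Only the values on the x-axis change, because scale() and scale.2() are both linear transformations of the original data.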
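One way to check that two scalings are linear transformations of each other is to confirm that the scaled values are perfectly correlated. This is a minimal sketch on simulated data; the vector x is hypothetical and stands in for a skewed variable such as PG.

```r
set.seed(1)
x <- rexp(200)  # hypothetical skewed data, not from data.full

z1 <- as.numeric(scale(x))               # centre on the mean, divide by the sd
z2 <- (x - min(x)) / (max(x) - min(x))   # the scale.2 formula, written out inline

# Linearly related variables have a correlation of 1 (up to floating point error)
cor(z1, z2)
```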
# fit the models (glm() defaults to family = gaussian, so these are linear probability models)
fm1 <- glm(data = data.full, Harmandia > 0 ~ scale(PG))
fm2 <- glm(data = data.full, Harmandia > 0 ~ scale.2(PG))
# extract coefficients
summary(fm1)$coefficients
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.2156831 0.005206046 41.429355 0.000000000
## scale(PG) 0.0142960 0.005206464 2.745819 0.006053324
summary(fm2)$coefficients
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.1892565 0.01094213 17.296132 1.668405e-65
## scale.2(PG) 0.1067943 0.03889342 2.745819 6.053324e-03
The t-values and p-values for the PG term are identical in the two models, but the coefficients and standard errors differ because the predictor is on a different scale in each.
For fm1, we say that “an increase in PGs of 1 standard deviation (2.97% dw) increases the probability of \(Harmandia > 0\) by 1.4 percentage points.”
For fm2, we say that “increasing PGs over their entire measured range (from 0% dw to 22.17% dw) increases the probability of \(Harmandia > 0\) by 10.7 percentage points.”
These, however, are equivalent:
# the coefficient for scale.2(PG) is equivalent to:
min.prob <- min(fitted.values(fm2))
max.prob <- max(fitted.values(fm2))
max.prob - min.prob
## [1] 0.1067943
# and as a function of the coefficient of fm1:
max.pg <- max(data.full$PG, na.rm = TRUE)
min.pg <- min(data.full$PG, na.rm = TRUE)
sd.pg <- sd(data.full$PG, na.rm = TRUE)
PG.coef <- unname(coef(fm1)[2])  # base R equivalent of coef(fm1)[2] %>% unname()
((max.pg - min.pg)/sd.pg) * PG.coef
## [1] 0.1067943
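The agreement between the two numbers above follows from the fact that both scalings are linear in the raw data. The sketch below uses notation that is introduced here for illustration and does not appear in the original: \(x\) is PG, \(\bar{x}\) and \(s\) are its mean and standard deviation, \(z_1\) and \(z_2\) are the predictors of fm1 and fm2, and \(\alpha_i\), \(\beta_i\) are the corresponding intercepts and slopes.

\[z_1 = \frac{x - \bar{x}}{s}, \qquad z_2 = \frac{x - \min(x)}{\max(x) - \min(x)}\]

Solving the second expression for \(x\) and substituting into the first gives

\[z_1 = \frac{\max(x) - \min(x)}{s}\, z_2 + \frac{\min(x) - \bar{x}}{s}\]

so the fitted line of fm1 can be rewritten in terms of \(z_2\):

\[\alpha_1 + \beta_1 z_1 = \underbrace{\alpha_1 + \beta_1 \frac{\min(x) - \bar{x}}{s}}_{\alpha_2} + \underbrace{\beta_1 \frac{\max(x) - \min(x)}{s}}_{\beta_2}\, z_2\]

The slope and its standard error are multiplied by the same factor, \((\max(x) - \min(x))/s\), which is why the t-value and p-value of the PG term are unchanged. In the output above, both ratios (0.1067943 / 0.0142960 and 0.03889342 / 0.005206464) come out at about 7.47.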