Scale Comparisons

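Here, scale.2 rescales the data to lie between 0 and 1 (min-max scaling) using the formula below, whereas base R's scale() centers the data to mean 0 and scales it to standard deviation 1.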

\[\text{scale.2}(x) = \frac{x - \min(x)}{\max(x) - \min(x)}\]

# scale between 0 and 1
scale.2 <- function(x, na.rm = TRUE){
  (x - min(x, na.rm = na.rm)) / (max(x, na.rm = na.rm) - min(x, na.rm = na.rm))
}

# plot the 6 distributions: raw, scale(), and scale.2() for PG (top row) and CT (bottom row)
layout(matrix(1:6, nrow = 2, byrow = TRUE))
hist(data.full$PG)
hist(scale(data.full$PG))
hist(scale.2(data.full$PG))
hist(data.full$CT)
hist(scale(data.full$CT))
hist(scale.2(data.full$CT))

The shape of each distribution is the same under both scaling methods; only the units on the x-axis change.
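The shapes match because both methods are linear transformations of the raw values. A minimal sketch of this, using a small made-up vector (the values in `x` are hypothetical and not taken from data.full):

```r
# Toy vector with one missing value (hypothetical, for illustration only)
x <- c(2, 4, 6, 8, NA, 10)

# Min-max scaling, same definition as scale.2 in this document
scale.2 <- function(x, na.rm = TRUE) {
  (x - min(x, na.rm = na.rm)) / (max(x, na.rm = na.rm) - min(x, na.rm = na.rm))
}

x.minmax <- scale.2(x)
x.z <- as.numeric(scale(x))  # scale() returns a one-column matrix; flatten it

# Min-max values span exactly 0 to 1
print(range(x.minmax, na.rm = TRUE))

# The two scaled versions are perfectly correlated, because each is
# a linear function of x (and therefore of the other)
print(cor(x.minmax, x.z, use = "complete.obs"))
```

Because the two versions differ only by a shift and a stretch, their histograms have the same shape as long as the bins line up.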

# fit the models
# (glm() defaults to family = gaussian, so these are linear probability models)
fm1 <- glm((Harmandia > 0) ~ scale(PG), data = data.full)
fm2 <- glm((Harmandia > 0) ~ scale.2(PG), data = data.full)

# extract coefficients
summary(fm1)$coefficients

##              Estimate  Std. Error   t value    Pr(>|t|)
## (Intercept) 0.2156831 0.005206046 41.429355 0.000000000
## scale(PG)   0.0142960 0.005206464  2.745819 0.006053324

summary(fm2)$coefficients

##              Estimate Std. Error   t value     Pr(>|t|)
## (Intercept) 0.1892565 0.01094213 17.296132 1.668405e-65
## scale.2(PG) 0.1067943 0.03889342  2.745819 6.053324e-03

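The t-values and p-values for the PG slope are identical in the two models, but the coefficient estimates and their standard errors differ, because each is expressed in the units of its own scaling.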
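The same pattern can be reproduced on simulated data. The sketch below is an illustration under assumptions: the predictor `x`, the response `y`, and the probabilities used to generate them are all made up and unrelated to data.full.

```r
set.seed(1)

# Simulated predictor and 0/1 response (hypothetical data)
n <- 200
x <- runif(n, 0, 20)
y <- rbinom(n, 1, 0.15 + 0.01 * x)

# Min-max scaling, same definition as scale.2 in this document
scale.2 <- function(x, na.rm = TRUE) {
  (x - min(x, na.rm = na.rm)) / (max(x, na.rm = na.rm) - min(x, na.rm = na.rm))
}

# Gaussian-family glm (the default), fitted with each scaling
m.z  <- glm(y ~ scale(x))
m.mm <- glm(y ~ scale.2(x))

coef.z  <- summary(m.z)$coefficients
coef.mm <- summary(m.mm)$coefficients

# Slope estimates differ between the two scalings...
print(c(z = coef.z[2, "Estimate"], minmax = coef.mm[2, "Estimate"]))

# ...but the slope's t-value and p-value do not
print(all.equal(coef.z[2, "t value"], coef.mm[2, "t value"]))
print(all.equal(coef.z[2, "Pr(>|t|)"], coef.mm[2, "Pr(>|t|)"]))
```

Rescaling a predictor linearly multiplies its slope and its standard error by the same factor, so their ratio (the t-value) is unchanged.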

For fm1, we say that “an increase in PGs of 1 standard deviation (2.97% dw) increases the probability of \(Harmandia > 0\) by 1.4 percentage points.”

For fm2, we say that “increasing PGs over their entire measured range (from 0% dw to 22.17% dw) increases the probability of \(Harmandia > 0\) by 10.7 percentage points.”

These, however, are equivalent:

# the coefficient for scale.2(PG) is equivalent to:
min.prob <- min(fitted.values(fm2))
max.prob <- max(fitted.values(fm2))
max.prob - min.prob

## [1] 0.1067943

# and as a function of the coefficient of fm1:
max.pg <- max(data.full$PG, na.rm = TRUE)
min.pg <- min(data.full$PG, na.rm = TRUE)
sd.pg <- sd(data.full$PG, na.rm = TRUE)
PG.coef <- unname(coef(fm1)[2])  # base R; avoids needing magrittr for %>%

((max.pg - min.pg)/sd.pg) * PG.coef

## [1] 0.1067943
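One way to see why the two calculations agree is to write one scaled variable in terms of the other. The notation here is introduced for illustration and is not in the original: \(z\) is scale(PG), \(u\) is scale.2(PG), \(\bar{x}\) and \(s\) are the mean and standard deviation of PG, and \(\beta_0, \beta_1\) are the intercept and slope of fm1.

\[z = \frac{x - \bar{x}}{s}, \qquad u = \frac{x - \min(x)}{\max(x) - \min(x)} \quad\Rightarrow\quad z = \frac{\max(x) - \min(x)}{s}\,u + \frac{\min(x) - \bar{x}}{s}\]

\[\beta_0 + \beta_1 z = \left(\beta_0 + \beta_1\,\frac{\min(x) - \bar{x}}{s}\right) + \left(\beta_1\,\frac{\max(x) - \min(x)}{s}\right)u\]

The second bracket is the slope on \(u\), so the fm2 slope equals the fm1 slope multiplied by \((\max(x) - \min(x))/s\), which is the quantity computed in the code above. The first bracket is the fm2 intercept.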

Clay J. Morrow