This file shows the results of model selection to determine the terms (no interactions) that influence insect incidence. The resulting traits and covariates will be used in insect and trait GWA.
Exhaustive model selection was done on UW-Madison’s CHTC (Center for High Throughput Computing), which uses a job-handling program called HTCondor.
The job submission (.sub) file for HTCondor is here, and its purpose is to run as many tasks (models, in our case) as possible in the shortest amount of time. For our models, we are considering 18 terms + Genet as predictor variables. Therefore, there are \(2^{18} = 262{,}144\) possible models for each of our 18 insect response variables, for a total of \(2^{18}\times 18 = 4{,}718{,}592\) models. With this many models to fit, parallelizing the process is essential.
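The counts above follow from treating each of the 18 terms as either in or out of a model. As a minimal sketch (not the project's code), the arithmetic and one possible way of mapping a model index to a set of terms can be written in shell. The function name decode_model, the term names, and the one-bit-per-term numbering are assumptions for illustration; the numbering used by model-selection-script.R is not shown here and may differ.

```shell
#!/bin/sh
# Figures from the text: 18 candidate terms and 18 insect response variables.
n_terms=18
n_insects=18

models_per_insect=$(( 1 << n_terms ))              # 2^18 subsets of the 18 terms
total_models=$(( models_per_insect * n_insects ))  # across all insect responses

echo "models per insect: $models_per_insect"
echo "total models: $total_models"

# Hypothetical helper: treat bit i of the index as "term i is included".
# Variables are global; this is a sketch, not a library function.
decode_model() {
  index=$1
  shift
  included=""
  bit=0
  for term in "$@"; do
    if [ $(( (index >> bit) & 1 )) -eq 1 ]; then
      included="${included:+$included }$term"
    fi
    bit=$(( bit + 1 ))
  done
  printf '%s\n' "$included"
}

# Example with three made-up terms: index 5 is binary 101,
# so the first and third terms are included.
decode_model 5 term_a term_b term_c
```

Because every index from 0 to \(2^{18}-1\) corresponds to exactly one subset under this scheme, any contiguous range of indices can be handed to a separate job without overlap.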
The model-selection-submission.sub file (in HTCondor_files/) executes run-model-selection.sh, which then calls model-selection-script.R (both in scripts/), passing the arguments from the “arguments” fields to each script. Currently, the submission file is set to run 9438 jobs, each of which will fit 500 models. Each job will create a .csv file called <insect>_proc<job>_mods<first.model>-<last.model>.csv. These csv files will each contain 500 rows.
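The job count is consistent with the totals given earlier: 4,718,592 models at 500 per job rounds up to 9438 jobs. The sketch below checks that arithmetic and shows one way a job number could be turned into a model range and an output file name in the pattern described above. It assumes jobs are numbered from 0 (as HTCondor's $(Process) macro is) and cover contiguous, 1-based model ranges. The helper names and the insect name are made up, and the actual scripts may split the work differently.

```shell
#!/bin/sh
total_models=4718592   # 2^18 models x 18 insects (from the text)
models_per_job=500     # from the text

# Ceiling division: jobs needed so that every model is covered.
n_jobs=$(( (total_models + models_per_job - 1) / models_per_job ))
echo "jobs needed: $n_jobs"

# Hypothetical helper: first and last model fitted by a zero-based job number.
job_range() {
  job=$1
  first=$(( job * models_per_job + 1 ))
  last=$(( (job + 1) * models_per_job ))
  if [ "$last" -gt "$total_models" ]; then
    last=$total_models   # under this assumption the final job is shorter
  fi
  printf '%s %s\n' "$first" "$last"
}

# Hypothetical helper: build a name in the
# <insect>_proc<job>_mods<first.model>-<last.model>.csv pattern.
job_csv_name() {
  insect=$1
  job=$2
  set -- $(job_range "$job")   # $1 = first model, $2 = last model
  printf '%s_proc%s_mods%s-%s.csv\n' "$insect" "$job" "$1" "$2"
}

job_csv_name some_insect 0
job_csv_name some_insect 3
```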
To combine all csv files for a given insect, use:
bash scripts/stack-CHTC-csv-files.sh <insect-files>*.csv > Master-<insect>.csv
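The contents of scripts/stack-CHTC-csv-files.sh are not reproduced here. As a rough sketch of what a stacking step like this might do, the function below (stack_csv, a hypothetical name) assumes every per-job file begins with the same header row and keeps that header only once.

```shell
#!/bin/sh
# Hypothetical helper: concatenate csv files, keeping a single header row.
# Assumes each input file starts with an identical header line.
stack_csv() {
  # FNR == 1 is the first line of each file; NR == 1 is the first line overall.
  # Skip every per-file header except the very first one.
  awk 'FNR == 1 && NR != 1 { next } { print }' "$@"
}

# Usage (placeholder file names):
#   stack_csv <insect-files>*.csv > Master-<insect>.csv
```

Writing to standard output mirrors the redirect used in the command above, so the caller decides where the combined file goes.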
Then the top N models, based on AIC, can be gathered using:
bash scripts/top-n-aic.sh <Master-file>.csv <N> > <insect>-top-models.csv
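Likewise, scripts/top-n-aic.sh is not shown here. A minimal sketch of selecting the N lowest-AIC rows from a stacked file might look like the following. It assumes a comma-separated file with one header row, AIC in the second column by default, and no quoted fields containing commas; the function name top_n_aic and the column position are assumptions.

```shell
#!/bin/sh
# Hypothetical helper: print the header plus the n rows with the smallest AIC.
# Arguments: csv file, n, and optionally the AIC column number (default 2).
top_n_aic() {
  file=$1
  n=$2
  col=${3:-2}
  head -n 1 "$file"                      # keep the header row
  tail -n +2 "$file" |                   # data rows only
    LC_ALL=C sort -t , -k "${col},${col}n" |   # ascending numeric sort on AIC
    head -n "$n"
}

# Usage (placeholder file names):
#   top_n_aic <Master-file>.csv <N> > <insect>-top-models.csv
```

Lower AIC indicates the preferred model, so an ascending sort puts the best candidates first.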
This step uses the results of the model selection step (REFERENCE NEEDED) to determine which covariates should be included in each insect GWA model.
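The following are the best models, with the model name (left) and the selection criterion (right). The first 18 blocks report models selected by AIC and the last 18 report models selected by BIC.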
## selected model:
## model.names AIC dif.frm.top dif.frm.full
## 1 model.492209 6792.783 FALSE TRUE
## selected model:
## model.names AIC dif.frm.top dif.frm.full
## 1 model.111217 2755.366 FALSE TRUE
## selected model:
## model.names AIC dif.frm.top dif.frm.full
## 1 model.467891 3793.99 FALSE TRUE
## selected model:
## model.names AIC dif.frm.top dif.frm.full
## 1 model.233441 3490.541 FALSE TRUE
## selected model:
## model.names AIC dif.frm.top dif.frm.full
## 1 model.375288 5018.141 FALSE TRUE
## selected model:
## model.names AIC dif.frm.top dif.frm.full
## 1 model.277746 2685.095 FALSE TRUE
## selected model:
## model.names AIC dif.frm.top dif.frm.full
## 1 model.373748 3168.395 FALSE TRUE
## selected model:
## model.names AIC dif.frm.top dif.frm.full
## 1 model.231093 3633.979 FALSE TRUE
## selected model:
## model.names AIC dif.frm.top dif.frm.full
## 1 model.212721 2868.664 FALSE TRUE
## selected model:
## model.names AIC dif.frm.top dif.frm.full
## 1 model.99051 5935.149 FALSE TRUE
## selected model:
## model.names AIC dif.frm.top dif.frm.full
## 1 model.372977 4277.243 FALSE TRUE
## selected model:
## model.names AIC dif.frm.top dif.frm.full
## 1 model.273553 6971.09 FALSE TRUE
## selected model:
## model.names AIC dif.frm.top dif.frm.full
## 1 model.303329 6458.203 FALSE TRUE
## selected model:
## model.names AIC dif.frm.top dif.frm.full
## 1 model.211761 3783.142 FALSE TRUE
## selected model:
## model.names AIC dif.frm.top dif.frm.full
## 1 model.427157 3885.784 FALSE TRUE
## selected model:
## model.names AIC dif.frm.top dif.frm.full
## 1 model.330425 6298.889 FALSE TRUE
## selected model:
## model.names AIC dif.frm.top dif.frm.full
## 1 model.375955 3000.058 FALSE TRUE
## selected model:
## model.names AIC dif.frm.top dif.frm.full
## 1 model.343537 2702.079 FALSE TRUE
## selected model:
## model.names BIC dif.frm.top dif.frm.full
## 1 model.507889 6859.483 FALSE TRUE
## selected model:
## model.names BIC dif.frm.top dif.frm.full
## 1 model.507633 2843.873 FALSE TRUE
## selected model:
## model.names BIC dif.frm.top dif.frm.full
## 1 model.521140 3852.811 FALSE TRUE
## selected model:
## model.names BIC dif.frm.top dif.frm.full
## 1 model.495601 3566.418 FALSE TRUE
## selected model:
## model.names BIC dif.frm.top dif.frm.full
## 1 model.507896 5052.835 FALSE TRUE
## selected model:
## model.names BIC dif.frm.top dif.frm.full
## 1 model.376564 2737.279 FALSE TRUE
## selected model:
## model.names BIC dif.frm.top dif.frm.full
## 1 model.393204 3206.701 FALSE TRUE
## selected model:
## model.names BIC dif.frm.top dif.frm.full
## 1 model.499445 3711.235 FALSE TRUE
## selected model:
## model.names BIC dif.frm.top dif.frm.full
## 1 model.475122 2934.513 FALSE TRUE
## selected model:
## model.names BIC dif.frm.top dif.frm.full
## 1 model.376820 6005.405 FALSE TRUE
## selected model:
## model.names BIC dif.frm.top dif.frm.full
## 1 model.376819 4347.529 FALSE TRUE
## selected model:
## model.names BIC dif.frm.top dif.frm.full
## 1 model.372722 7043.209 FALSE TRUE
## selected model:
## model.names BIC dif.frm.top dif.frm.full
## 1 model.370417 6551.08 FALSE TRUE
## selected model:
## model.names BIC dif.frm.top dif.frm.full
## 1 model.524210 3851.341 FALSE TRUE
## selected model:
## model.names BIC dif.frm.top dif.frm.full
## 1 model.507383 3973.471 FALSE TRUE
## selected model:
## model.names BIC dif.frm.top dif.frm.full
## 1 model.375546 6381.367 FALSE TRUE
## selected model:
## model.names BIC dif.frm.top dif.frm.full
## 1 model.507636 3059.768 FALSE TRUE
## selected model:
## model.names BIC dif.frm.top dif.frm.full
## 1 model.524274 2752.836 FALSE TRUE
Here we’ll show an average, of sorts, for all models run for a given insect.
Eventually, I would like to color-code the model matrix by the effect of each term (like a correlation matrix).