I got weird messages and nonsense estimates in AI REML. What is it?
Symptom
You would see the message "G not positive definite" or "corrected Covariance Matrix" in a REML iteration.
The (co)variance estimates look odd (zero or huge values).
These are signs of divergence: the estimation is close to failing.
Once this happens, the estimates are most likely nonsense, and you should not use them as estimated values.
Mechanism
AI REML is efficient and reliable when the model is simple and there is enough data.
However, AI REML is a purely numerical method, and the estimates are not guaranteed to stay within the parameter space; sometimes they become nonsense, e.g., zero or negative variance components (this is what "not positive definite" means).
The airemlf90 program tries to adjust the covariance estimates (the "corrected Covariance Matrix" message), but the adjustment is not perfect.
Eventually, the estimation may fail.
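To make "not positive definite" concrete, here is a minimal sketch in Python (standard library only) that tests a small symmetric covariance matrix by attempting a Cholesky factorization. It is only an illustration and not how airemlf90 performs its check; the function name, the tolerance, and the example matrices are assumptions made for this example.

```python
import math

def is_positive_definite(matrix, tol=1e-12):
    """Return True if a symmetric matrix has a Cholesky factorization.

    Hypothetical helper for illustration; the tolerance is an assumption.
    """
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                pivot = matrix[i][i] - s
                if pivot <= tol:
                    # Zero or negative pivot: not positive definite
                    return False
                lower[i][i] = math.sqrt(pivot)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return True

# Made-up 2x2 genetic covariance matrices for two traits
g_ok = [[1.0, 0.5], [0.5, 2.0]]    # valid: correlation about 0.35
g_bad = [[1.0, 1.5], [1.5, 2.0]]   # invalid: implied correlation above 1
g_zero = [[0.0, 0.0], [0.0, 1.0]]  # invalid: zero variance for trait 1

for name, g in [("g_ok", g_ok), ("g_bad", g_bad), ("g_zero", g_zero)]:
    print(name, is_positive_definite(g))
```

A matrix can fail this test even when every variance is positive, as in g_bad, where the covariance is too large for the two variances.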
Source of issues
There are several possible reasons for the divergence.
- Model too complicated (too many variance components) for the amount of data (phenotypes)
  - Many traits: 5 traits with 1000 phenotypes would not work.
  - Complicated model: Many random-regression coefficients would not work.
- Questionable model: Some variance components may not be estimable because of the data structure.
  - Unbalanced model: The computation can be unstable if some effects apply only to a particular trait.
  - Inadequate model: The estimation can fail if some effects are confounded or nonsensical.
  - Unbalanced data: Many missing observations can make the estimation fail.
- Mistakes in the files (data, pedigree, and parameter files)
  - Incorrect model description
  - Duplicated animals in the pedigree
  - Wrong file format
- Just an accident
  - Dependence on the initial values
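The "many traits" item can be illustrated with simple arithmetic. The sketch below counts the distinct (co)variance parameters, assuming each random effect (including the residual) has an unstructured covariance matrix across traits, so each matrix contributes t(t+1)/2 parameters for t traits. The function name and the choice of two matrices (additive genetic and residual) are assumptions for this example, and the ratio printed is only a rough indicator, not a formal threshold.

```python
def n_covariance_parameters(n_traits, n_random_effects):
    """Count distinct (co)variances, assuming one unstructured
    n_traits x n_traits matrix per random effect (hypothetical helper)."""
    per_matrix = n_traits * (n_traits + 1) // 2
    return per_matrix * n_random_effects

n_phenotypes = 1000  # figure taken from the list above
for n_traits in (2, 5):
    # Assumed model: additive genetic effect + residual = 2 matrices
    n_par = n_covariance_parameters(n_traits, n_random_effects=2)
    print(f"{n_traits} traits: {n_par} parameters, "
          f"{n_phenotypes / n_par:.1f} phenotypes per parameter")
```

Under these assumptions, a 5-trait model has 30 parameters to estimate while a 2-trait model has 6, which is one way to see why splitting a large analysis into two-trait analyses eases the estimation.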
Remedy
The following recommendations help to avoid divergence and to obtain stable estimates.
- Simplify the model
  - Start from the simplest model.
  - Split a big multiple-trait analysis into small two-trait analyses.
  - Reduce the number of random regressions.
- Look into the data structure
  - Remove nonsensical or highly confounded random effects.
  - Remove traits with too many missing observations.
- Check the files
  - Particularly, your parameter file
- Use other starting values
  - Try bigger initial values.
  - Use OPTION EM-REML 10, which uses the EM algorithm for the first 10 rounds to get initial values much closer to the estimates.
- Use other computer programs
  - Gibbs samplers (gibbs2f90 or thrgibbs1f90; generally more stable than REML)
  - EM algorithm (remlf90; the most stable but slow to converge)
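As one way to act on "Check the files", the sketch below looks for animal IDs that appear more than once in a pedigree. It assumes a whitespace-delimited pedigree with the animal ID in the first column; the function name and the example records are hypothetical, so adjust the column index to your own file layout.

```python
from collections import Counter

def duplicated_animals(lines, animal_column=0):
    """Return the sorted animal IDs that occur on more than one line.

    Assumes whitespace-delimited records (an assumption of this sketch).
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        counts[fields[animal_column]] += 1
    return sorted(animal for animal, n in counts.items() if n > 1)

# Made-up pedigree records: animal, sire, dam
example = ["1 0 0", "2 0 0", "3 1 2", "3 1 2", "4 1 2"]
print(duplicated_animals(example))  # ['3']
```

For a real file, the same function can be applied to an open file handle, e.g. `with open("pedigree.txt") as f: print(duplicated_animals(f))`, where the file name is a placeholder.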
A good practice is to compare the estimates from different algorithms (different software). All the estimates should be the same or very close. If they are not, there is probably an issue in your model or data.
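The comparison can be done by eye, but a small script makes it repeatable. The sketch below compares two lists of variance-component estimates element by element using a relative tolerance. The function name, the 5% tolerance, and all the numbers are assumptions for illustration; what counts as "very close" depends on your data, and Gibbs sampling estimates also carry Monte Carlo error.

```python
import math

def estimates_agree(est_a, est_b, rel_tol=0.05, abs_tol=1e-8):
    """Return True if every pair of estimates is close.

    Hypothetical helper; the default tolerances are arbitrary choices.
    """
    if len(est_a) != len(est_b):
        raise ValueError("estimate lists must have the same length")
    return all(
        math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
        for a, b in zip(est_a, est_b)
    )

# Made-up estimates (genetic variance, covariance, residual variance)
from_ai_reml = [0.40, 0.12, 0.60]
from_em_reml = [0.41, 0.12, 0.59]
diverged_run = [0.40, 0.00, 5.30]

print(estimates_agree(from_ai_reml, from_em_reml))  # True
print(estimates_agree(diverged_run, from_em_reml))  # False
```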