how_to_run_pure_gblup

We consider the simplest GBLUP model.

- All animals are genotyped.
- All animals have observations. Each animal has only 1 record.
- Pedigree information is not used in any way.

The mathematical model is

$$ \mathbf{y} = \mathbf{Xb} + \mathbf{u} + \mathbf{e} $$

where $\mathbf{y}$ is a vector of observations, $\mathbf{X}$ is a design matrix relating the fixed effects to the observations (typically $\mathbf{1}$), $\mathbf{b}$ is a vector of fixed effects (typically a single $\mu$), $\mathbf{u}$ is a vector of additive genetic effects, $\mathbf{e}$ is a vector of residual effects. We assume $\mathrm{var}(\mathbf{y})=\mathrm{var}(\mathbf{u})+\mathrm{var}(\mathbf{e})=\mathbf{G}\sigma_{u}^{2}+\mathbf{R}\sigma_{e}^{2}$ where $\mathbf{G}$ is the genomic relationship matrix, $\mathbf{R}$ is a diagonal matrix (typically $\mathbf{I}$), $\sigma_{u}^{2}$ is the additive genetic variance, and $\sigma_{e}^{2}$ is the residual variance. The system of mixed model equations (MME) is shown as follows.

$$ \left[ \begin{array}{cc} \mathbf{X}'\mathbf{R}^{-1}\mathbf{X} & \mathbf{X}'\mathbf{R}^{-1} \\ \mathbf{R}^{-1}\mathbf{X} & \mathbf{R}^{-1} + \mathbf{G}^{-1}/\sigma_{u}^{2} \\ \end{array} \right] \left[ \begin{array}{c} \mathbf{\hat{b}} \\ \mathbf{\hat{u}} \end{array} \right] = \left[ \begin{array}{c} \mathbf{X}'\mathbf{R}^{-1}\mathbf{y} \\ \mathbf{R}^{-1}\mathbf{y} \\ \end{array} \right] $$ Note that $\mathbf{R}^{-1}=\mathbf{I}/\sigma_{e}^{2}$ in the typical case.

The blupf90 program is designed to solve the animal model that has the inverse of the numerator relationship matrix ($\mathbf{A}^{-1}$). Also, this program is extended to perform the single-step GBLUP analysis including the inverse of a subset pedigree matrix for genotyped animals ($\mathbf{A}_{22}^{-1}$) as well as $\mathbf{G}^{-1}$.

Regardless of whether you need the pedigree relationships, the program literally creates the following MME.

$$ \left[ \begin{array}{cc} \mathbf{X}'\mathbf{R}^{-1}\mathbf{X} & \mathbf{X}'\mathbf{R}^{-1} \\ \mathbf{R}^{-1}\mathbf{X} & \mathbf{R}^{-1} + (\mathbf{A}^{-1}+\mathbf{G}^{-1}-\mathbf{A}_{22}^{-1})/\sigma_{u}^{2} \\ \end{array} \right] \left[ \begin{array}{c} \mathbf{\hat{b}} \\ \mathbf{\hat{u}} \end{array} \right] = \left[ \begin{array}{c} \mathbf{X}'\mathbf{R}^{-1}\mathbf{y} \\ \mathbf{R}^{-1}\mathbf{y} \\ \end{array} \right] $$ When all animals are genotyped, you will see $\mathbf{A}^{-1}=\mathbf{A}_{22}^{-1}$, and therefore, the two pedigree matrices will be canceled out in MME; only $\mathbf{G}^{-1}$ remains there in theory.

One restriction to run GBLUP in blupf90 is that this program always requires the pedigree file to create $\mathbf{G}^{-1}$ even though the pedigree information is not used. Also, the program literally calculates $\mathbf{A}^{-1}$ and $\mathbf{A}_{22}^{-1}$, and adds them to MME even though these matrices will be numerically canceled out. It means you have to prepare a pedigree file to run GBLUP, and the pedigree can be a dummy.

The renumf90 program prepares all the files including the dummy pedigree. We are going to use the following data file and SNP file to demonstrate the GBLUP analysis.

- data.txt
2 1 -1.775682 5 1 +0.467041 8 1 -0.888975 10 1 +0.893532 12 1 +0.596651 14 1 -1.949929 15 1 -1.672478

- snp.txt
2 1212020111001211212000102001202022201112000110020102011001012212110122100001210101112200210010022220 5 1201121111012200221100012210201010110111000211011112120112101221110122100012211000022200110010012220 8 1211121200011211210111111110012121201011000110020211021011002222000222000011110001111100210010111220 10 2111121101111210121211112210202021211211000110021211111001101211101122100111221000022200120020101120 12 2222020200110122210121001100022221102111000010011202020002002222000222000020120000020200200000200220 14 2122020101100221112111202101202022202222000000021111102000111201211022200101210000022200210010111120 15 2122020101210121121222101200112122102212000010012211111001101211101122100110220000021200210010200120

The parameter file for renumf90 is as follows. It has three OPTIONs that change the behavior of blupf90 to have the correct $\mathbf{G}^{-1}$ for GBLUP.

- renum.txt
DATAFILE data.txt TRAITS 3 FIELDS_PASSED TO OUTPUT WEIGHT(S) RESIDUAL_VARIANCE 1.00 EFFECT 2 cross alpha # mu EFFECT 1 cross alpha # animal RANDOM animal SNP_FILE snp.txt (CO)VARIANCES 1.00 OPTION AlphaBeta 1 0 OPTION tunedG 0

First of all, you do not provide any pedigree file in this parameter file. The renumf90 program creates a dummy pedigree file that has only animal ID but no parent ID. This dummy pedigree actually creates the pedigree relationships $\mathbf{A}^{-1}=\mathbf{I}$ and $\mathbf{A}_{22}^{-1}=\mathbf{I}$, which will be canceled out in the end.

The first two options define the blending of matrices. The raw $\mathbf{G}$ is updated with the other matrix to make sure it has the inverse.

$$ \mathbf{G}_{\mathrm{updated}} \leftarrow \alpha\mathbf{G}_{\mathrm{raw}} + \beta\mathbf{A}_{22} + \gamma\mathbf{I} + \delta\mathbf{J} $$

The default is $\alpha=0.95$, $\beta=0.05$, $\gamma=0$, and $\delta=0$. Note that we have $\mathbf{A}_{22}=\mathbf{I}$ in our case. The parameter file has $\alpha=0$ and $\beta=0$ but it may fail to get $\mathbf{G}^{-1}$. Perhaps you can remove the first two options.

You should keep the last option to skip the *tuning* step.
The program adjust $\mathbf{G}_{\mathrm{updated}}$ to be compatible with $\mathbf{A}_{22}$.
We call this adjustment *tuning* to obtain the final matrix.
$$
\mathbf{G}_{\mathrm{final}} \leftarrow a\mathbf{G}_{\mathrm{updated}} + b\mathbf{J}
$$
The constants $a$ and $b$ are tuning parameters. It is critical for genomic prediction when the genotyped animals are from only the last generation but you have the historical pedigree (i.e. incompatible $\mathbf{G}$ and $\mathbf{A}_{22}$).
Using the dummy pedigree, this tuning is nonsense, and therefore, you should turn it off by the option.

You don't have to use any other options to run GBLUP.

If you have more complicated situations, it might be reasonable to use ''OPTION omit_ainv''. But again, if you are satisfied with the above method, you don't need the extra option.

how_to_run_pure_gblup.txt · Last modified: 2019/07/30 18:41 by dani