### Table of Contents

# BLUPF90+

## Summary

This program combines blupf90, remlf90, and airemlf90.

For a general understanding of the parameter file, check the manual available at documentation.

It has some new features such as the computation of accuracies of (G)EBV based on PEV, and the ability to save solutions with original ID.

See preGSf90 for options using genomic (SNP) information.

**Hint**: type `blupf90+ --help`

to see all the blupf90+ options or `blupf90+ --help-genomic`

to see all the genomic options blupf90+ can take.

## Options for BLUPF90+ with "OPTION method BLUP" (BLUP: default)

OPTION conv_crit 1e-12

Set convergence criteria (default 1e-12).

OPTION maxrounds 10000

Set maximum number of rounds (default 5000).

OPTION solv_method FSPAK

Selection solutions by FSPAK, SOR or PCG (default PCG).

OPTION r_factor 1.6

Set relaxation factor for SOR (default 1.4).

OPTION sol se

Store solutions and standard errors.

OPTION store_pev_pec 6

Store triangular matrices of standard errors and its covariances for correlated random effects such as direct-maternal effects and random-regression effects in “pev_pec_bf90”.

OPTION missing -999

Specify missing observations in the data file. Must be an integer value (default is 0).

OPTION residual

y-hat and residual will be included in “yhat_residual”.

OPTION blksize 3

Set a block size for preconditioner (default 1) to accelerate convergence (usually 2 to 5 times faster). For a multi-trait model, use the number of traits. This option is extremely important to ensure convergence in multi-trait models.

OPTION prior_solutions

The previous solution file will be used to start the iteration. Additional software is required to set up files correctly before using this option.

OPTION stdresidual

y-hat and student residuals will be included in “yhat_student_residual”.

OPTION check_levels 0

Check levels (default 1 = true).

OPTION use_yams

Run the program with YAMS (modified FSPAK).

OPTION SNP_file snp

Specify the SNP file name to use genotype data.

OPTION snp_p_value

Computes the elements of the inverse of the Mixed Model Equations that are needed for exact GWAS with p-values using postGSf90. This requires quite a lot of memory and time. For details see https://doi.org/10.1101/555243.

### Storing solutions with original ID

blupf90+ allows storing solutions with original ID. To do it, simply add the following option:

OPTION origID

The output is *trait effect level original_id solution*, and is stored in `solutions.original`

.

### Storing reliability based on PEV

Reliabilities are 1-PEV/($\sigma\mathbf{}_{u}^{2}$(1 + F))

where PEV is the prediction error variance of a (G)EBV. Reliabilities are the squared correlation between (True) BV and (G)EBV. Accuracies are the square root of reliabilities and thus are the correlation(TBV,(G)EBV).

OPTION store_accuracy eff

Stores reliabilities (not accuracies in spite of the name) based on PEV, where *eff* is the number of the animal effect.

By default, it uses inbreeding (F) in the denominator of the reliability formula: reliability = 1-PEV/($\sigma\mathbf{}_{u}^{2}$(1 + F)) Aguilar et al. (2020).

It uses inbreeding based on $\mathbf{A}$ or $\mathbf{H}$ from the direct inversion of $\mathbf{A}^{-1}$ or $\mathbf{H}^{-1}$, whichever is being used.

The resulting file, acc_bf90, contains: trait, effect, level, solutions, reliabilities.

OPTION store_accuracy eff orig

Stores reliabilities based on PEV, where *eff* is the number of the animal effect, and *orig* should be used to output reliabilities with original ID. Combine with OPTION origID if solutions should also be output with original ID.

The resulting file, acc_bf90, contains: trait, effect, level, original_ID, solutions, reliabilities.

OPTION acctype 1.0

Select 1.0 for dairy cattle (Reliability) or 0.5 for beef cattle (BIF accuracy) (default 1.0).

OPTION correct_accuracy_by_inbreeding filaname

*filename* is the name of the inbreeding file if other than renf90.inb

OPTION correct_accuracy_by_inbreeding_direct 0

This option turns off the inbreeding correction in the reliability formula.

## Options for BLUPF90+ with "OPTION method VCE"

OPTION method VCE

Runs airemlf90 for variance component estimation (default running blupf90)

OPTION conv_crit 1d-12

Convergence criterion (default 1d-10).

OPTION maxrounds 1000

Maximum rounds (default 5000). When the number = 0, the program calculates BLUP without iterating REML and some statistics (-2logL, AIC, SE for (co)variances, …).

OPTION EM-REML 100

Runs EM-REML (REMLF90) for first 100 rounds to get initial variances within the parameter space. After 100 rounds, it will switch to AI-REML (AIREMLF90) and continue until convergence. If the program converges within 100 rounds, it will show only the result from AI-REML

Examples:

OPTION EM-REML n

….Runs AI rounds until convergence or x after n EM rounds, showing AI output

OPTION EM-REML or OPTION EM-REML pure

….Runs EM until convergence or x, showing EM output, which is equivalent to REMLF90

OPTION EM-REML AI or ai

….Runs EM until convergence and then switches to AI rounds until AI convergence or x, showing AI output

OPTION use_yams

Runs the program with YAMS (modified FSPAK). The computing time can be dramatically improved.

OPTION set_eig 1d-12

Tolerance (or precision) (default 1d-18) for positive definite matrix and g-inverse subroutines.

Convergence may be much faster by changing this value.

OPTION sol se

Stores solutions and standard errors.

OPTION store_pev_pec 6

Stores triangular matrices of standard errors and its covariances for correlated random effects such as direct-maternal effects and random-regression effects in “pev_pec_bf90”.

OPTION residual

y-hat and residuals will be included in “yhat_residual”.

OPTION missing -999

Specifies the missing value (default 0) in integer.

OPTION constant_var 5 1 2 ...

5: effect number

1: first trait number

2: second trait number

implying the covariance between traits 1 and 2 for effect 5.

**Heterogeneous residual variances for a single trait**

OPTION hetres_pos 10 11

Specifies the position of covariables.

OPTION hetres_pol 4.0 0.1 0.1

Initial values of coefficients for heterogeneous residual variances using *ln*(a0, a1, a2, …) to make these values.

**Heterogeneous residual variances for multiple traits**

Convergence will be very slow with multiple trait heterogeneous residual variances

OPTION hetres_pos 10 10 11 11

or

OPTION hetres_pos 10 11 12 13

Specify the position of covariables (trait first).
“10 10” or “10 11” could be linear for first and second traits.

“11 11” or “12 13” could be quadratic.

OPTION hetres_pol 4.0 4.0 0.1 0.1 0.01 0.01

Initial values of coefficients for heterogeneous residual variances using *ln*(a0, a1, a2, …) to make these values (trait first).

“4.0 4.0” are intercept for first and second traits.

“0.1 0.1” could be linear and “0.01 0.01” could be quadratic.

To transform back to the original scale, use exp(a0+a1*X1+a2*X2).

OPTION SNP_file snp

Specify the SNP file name to use genotype data.

OPTION se_covar_function <label> <function>

As an alternative of SE, calculate SD for function of (co)variances by repeated sampling of parameters estimates from their asymptotic multivariate normal distribution, following ideas presented by Meyer and Houle 2013 (Meyer, Karin, and David Houle. “Sampling based approximation of confidence intervals for functions of genetic covariance matrices.” Proc. Assoc. Advmt. Anim. Breed. Genet. Vol. 20. 2013.) .

`<label>`

A name for a particular function (e.g., `P1`

for phenotypic variance of trait 1, `H2_1`

for heritability for trait 1, `rg12`

for genetic correlation between traits 1 and 2, …).

`<function>`

A formula to calculate a function of (co)variances to estimate SD. All terms of the function should be written with NO spaces.

Each term of the function corresponds to (co)variance elements and could include any random effects (G) and residual (R) (co)variances.

Notation is with reference to the effect number and the trait number (`G_eff1_eff2_trt1_trt2`

) that indicate the element of the (co)variance matrix for random effect `eff1`

and `eff2`

and `trt1`

and `trt2`

,

where `eff1`

and `eff2`

are effect numbers 1 and 2, and `trt1`

and `trt2`

are trait numbers 1 and 2.

`R_trt1_trt1`

indicates the element of the residual (co)variance matrix for traits 1 and 2.

Several functions could be added, with one OPTION line per function.

Examples:

`OPTION se_covar_function P G_2_2_1_1+G_2_3_1_1+G_3_3_1_1+G_4_4_1_1+R_1_1`

`OPTION se_covar_function H2d G_2_2_1_1/(G_2_2_1_1+G_2_3_1_1+G_3_3_1_1+G_4_4_1_1+R_1_1)`

`OPTION se_covar_function H2t (G_2_2_1_1+1.5*G_2_3_1_1+0.5*G_3_3_1_1)/(G_2_2_1_1+G_2_3_1_1+G_3_3_1_1+G_4_4_1_1+R_1_1)`

`OPTION se_covar_function rg12 G_2_2_1_2/(G_2_2_1_1*G_2_2_2_2)**0.5`

The first function calculates the SD for the total variance for a maternal model with permanent maternal effect, where 2 and 3 are the effect number for the direct and maternal additive genetic effects respectively, and 4 is the effect number for the maternal permanent random effect.

The second function calculates the heritability for the direct component.

The third function the total heritability.

The fourth function calculates the SD of the genetic correlation between traits 1 and 2 for the direct genetic effect (effect number 2)

OPTION samples_se_covar_function <n>

Set the number of samples to calculate SE for function of (co)variances.

default value 10000

OPTION out_se_covar_function

Indicate to store in file samples of (co)variances function for postprocessing (histogram, etc.)

## Tips for running BLUPF90+ with "OPTION method VCE"

When the covariance matrix is close to non-positive definite, BLUPF90+ with “OPTION method VCE” (AIREMLF90) may not converge.

There are two options you might want to try the following options.

1. Change the tolerance value (xx) in the option:

OPTION set_eig xx

to a very strict value (e.g., 1d-24) or a lenient value (e.g., 1d-04)

2. Use an option to use EM-REML inside AI-REML:

OPTION EM-REML xx

where xx is the number of iterations for EM-REML (REMLF90) you expect to get a good starting value for AI-REML. After running xx rounds with EM-REML, the AIREMLF90 program will automatically switch from EM-REML to AI-REML using the last estimate from EM-REML as a starting value for AI-REML.

If you want to get the convergence in EM-REML, use a large number more than you expect in EM-REML (e.g., 100000), the program can converge as EM-REML (REMLF90).

3. When traits and/or correlated random effects are highly correlated (e.g., > 0.95), removing one correlated trait or effect is one option. Since two traits (effects) are highly correlated (especially genetically), there is no reason to select two traits together. This option may reduce the phenotyping cost.

4. When genetic covariances among multiple traits cannot be accurately estimated for some reason, a separate analysis using a single-trait model would be a better choice. It can happens, especially when one trait has too many missing observations and/or weak relationships among animals. The single-trait model will be much better than the multiple-trait model.

Also, some options (2 and 3) described below for BLUP could be useful for VCE.

## Tips for running BLUPF90+ with "OPTION method BLUP" (BLUP: default)

When the convergence is very slow or never reached, there are several reasons.

1. The covariance matrix G and/or R is close to non-positive definite.

In this case, change the tolerance value (xx) in the option:

OPTION set_eig xx

to a lenient value (e.g., 1d-04),

which is similar to re-parameterization for G = VDV' using larger eigenvalue(s) in D.

2. Remove the old data, which the contribution to the current active population is minimal.

3. Remove non-contribution animals, which have no relationships with other animals that have phenotypes.

4. Remove animals with no phenotypes in the input data file. If those animals have relationships in the pedigree and/or the genotypes, they will be included in the renaddXX.ped (output pedigree file) after running renumf90. So, they do not need to be included in the phenotypic data file when they have no phenotypes.

5. Re-estimate the variance-covariance matrices accurately using the current data.

### Weights

The program accepts now different WEIGHTS for different traits. This works both for BLUP and for REML estimates. It has also been introduced in renumf90 and BLUP90IOD3 (new BLUP90IOD) but not in gibbsf90+.

With this modification `weight_y`

becomes a `ntrait x ntrait`

matrix. The way it works is that, in the MME, the weight applied to trait *i* is `weight_y(i,i)`

and to the pairs trait *i* and trait *j* is `weight_y(i,j)`

where in fact `weight_y(i,j)=sqrt(weight_y(i,i)*weight_i(j,j))`

.

This corresponds to a model where the residual variance of record say *k* is *W*_{k} *R* *W*_{k} where *W*_{k} contains `sqrt(1/weight(traits))`

of record *k* in the diagonal so all is coherent with multivariate normality.

The syntax is:

#### Different weights

WEIGHT(S) 7 8 9

#### same weights

with “old” syntax for a single weight across all traits, programs always work:

WEIGHT(S) 7

or using the new syntax:

WEIGHT(S) 7 7 7

#### no weights

WEIGHT(S)

#### wrong definition of weights

the program stops when there are less weights than traits, e.g.:

NUMBER_OF_TRAITS 3 ... WEIGHT(S) 7 7

with the following message:

Values for WEIGHT can be 'empty line' (no weight), a single number (same weight for all lines) or 'ntrait' numbers (different weights for each trait)

### Running GBLUP in blupf90+: Omit A-inverse

OPTION omit_ainv

This option prohibits the program from creating $\mathbf{A}^{-1}$. It is especially useful for GBLUP. For example, if you would like to perform the exact GBLUP, you can put the following options to your parameter file.

OPTION omit_ainv OPTION TauOmega 1.0 0.0 OPTION AlphaBeta 0.95 0.05

With the above options, the program doesn't create $\mathbf{A}^{-1}$ but calculates $\tau\mathbf{G}^{-1}-\omega\mathbf{A}_{22}^{-1}$. When the omega ($\omega$) is zero, only $\mathbf{G}^{-1}$ will be included in the equations. $\mathbf{G}$ is blended with $\mathbf{A}_{22}$ as $\alpha\mathbf{G}+\beta\mathbf{A}_{22}$ before the inversion ($\alpha=0.95$ and $\beta=0.05$ in this case).

#### Details

Assuming a single-trait ssGBLUP, the mixed model equations are as follows.

\( \left[ \begin{array}{ll} \mathbf{X}'\mathbf{X} & \mathbf{X}'\mathbf{Z}\\ \mathbf{Z}'\mathbf{X} & \mathbf{Z}'\mathbf{Z} + \lambda\mathbf{H}^{-1} \end{array} \right] \left[ \begin{array}{c} \mathbf{\hat{b}}\\ \mathbf{\hat{u}} \end{array} \right] = \left[ \begin{array}{c} \mathbf{X}'\mathbf{y}\\ \mathbf{Z}'\mathbf{y} \end{array} \right] \)

where $\mathbf{H}$ is a matrix combining additive genetic relationship matrices and a genomic relationship matrix.

\( \mathbf{H}^{-1} = \mathbf{A}^{-1} + \left[ \begin{array}{cc} \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \tau\mathbf{G}^{-1}-\omega\mathbf{A}_{22}^{-1} \end{array} \right] \)

If we omit $\mathbf{A}^{-1}$ and $\mathbf{A}_{22}^{-1}$, the equations are equivalent to GBLUP. GBLUP by BLUPF90 was not so easy because the program creates $\mathbf{A}^{-1}$ by default and there was no way to avoid this behavior. The new option removes $\mathbf{A}^{-1}$ from the equations so GBLUP will be easily performed.