MTCAFS (MTC) - multitrait REML estimation of variance components program by 
               canonical transformation, with support for multiple random
               effects


by Ignacy Misztal
University Of Illinois
1207 W Gregory Dr., Urbana, IL 61801, USA
E-mail: ignacy@uiuc.edu, phone: (217)244-3164

last correction: 03/07/94
last major change: MTC accepts matrices of not full rank


Disclaimer
  This program was developed for my own research. It is believed to be
  reasonably free of serious errors, but few attempts have been made to make
  it resistant to errors in parameter and input files. Please use it at
  your own risk.


Distribution
  The program is free for research applications. Please contact the author for
  information on commercial use.  In any case, please acknowledge the use of
  this program in publications.



Introduction
MTCAFS estimates variance components in in single- and multiple-trait models by 
REML.  Supported models can have multiple fixed effects, including 
cross-classified and covariables, and several random effects, where one is the 
animal effect with unknown parent groups.  The program is computationally 
efficient; it uses public-domain sparse matrix package FSPAK developed by 
Miguel Perez-Enciso and Ignacy Misztal.  

MTCAFS uses a canonical transformation but supports models with several random
effects.  The last feature is made possible by simultaneous diagonalization of
variance matrices for all random effects, as described in:

  Flury, N. N. 1988. Common principal components and related multivariate
     models. Wiley, New York, NY.
  Lin, C. Y., and S. P. Smith. 1990. Transformation of multitrait to
     unitrait mixed model analysis of data with multiple random effects. J.
     Dairy Sci. 73:2494.

In general, the perfect diagonalization is not possible, but the studies with
type and yield traits suggest that, in real data, the off-diagonals after the
diagonalization are reduced 50 times or more.

The program is especially useful for analysis of variance components in
repeatability models.  For problems that the program can handle, its computing
requirements (memory, CPU) are up to n_trait^4 smaller than those of 
unrestricted MT DF REML.  Worked problems included 14 type or 3 production 
traits on 50,000 animals with about 80,000 records.  About 100 Mbytes were 
required.  A PC with 16 Mbyte memory computed problems up to 10,000 effects.  
Due to use of EM algorithm, canonical transformation and special measures to 
handle nearly-singular covariance matrices, the algorithm seems stable and 
reaches convergence in 20-100 rounds, dependent on priors and data.

The program has limitations. All traits must be recorded and the same model
must be used for all traits. Currently, the program neither computes
estimates of standard errors nor supports maternal effects.



Installation
The program consists of several modules in FORTRAN:
1. mtc1.for     main program + some subroutines
2. mtc2.for     remaining subroutines + FSPAK
3. second.for   computer dependent timing routine, described in
                FSPAK.MAN

Before compiling, determine program limits coded in the first few lines in
MTCAFS.FOR.  Memory requirements of the program are model and data dependent, 
and increase as the number of effects and the number of relationships increase.  
Parameter optim determines whether speed or memory optimized inversion routine
is used.The speed optimization is up to 3 times faster, especially on
supercomputers, but it takes 2 times more memory.

To try the program, compile mtc1.f, mtc2.f and subroutine second.f
specific to your computer (details in fspak.man). Then, give mmp1
or mt99s as the name of parameter file.




Model
The model can have:
  - crossclassified fixed effects,
  - covariables (integer values only),
  - one animal effect with unknown parent groups.

The unknown parent groups are mandatory.  If no groups are desired, code one
unknown parent group, which is equivalent to having no groups.



Input file
The data and relationship files are in the same format as in programs JAA (see
JAA manual). The ordering should be strict in a sense that all factors should
be numbered from 1 consecutively, and that there should be no "holes" in the
ordering.  The relationship file should contain entries for all animals, and
unknown parents need to have numbers above the highest animal number.  If
unknown parent group are undesired, create common group for one animals.
Program RENUM, available in another directory, recodes data into format
recognized by MTCAFS.



Parameter file
Parameter file is similar to that in JAA, with the exception that the
regressions are specified in the factor definition lines as:
(number_of_field) -1 (any_value)
and that one paramater:  "number of traits" needs to be edited.  Program RENUM 
outputs a parameter file, where only one paramater:  "number of traits" needs 
to be edited.  An example of a parameter file is given below:

data5k                  #name of data file
ped5k                   #name of pedigree file
out5k                   #name of output file
4   5 9                 #4 effects, 9 traits start from field 5
3 100 0                 #effect in field 3 has 100 levels and VR 0
4 -1 0                  #regression in field 4
1 1000 2                #effect in field 1 has 1000 levels and initial VR 2
2 5000 -3               #effect in field 2 has 5000 levels and is an animal
                        #   effect with VR 3
VR means variance ratio.  VR = 0 means fixed effect, VR > 0 means random effect 
with an identity covariance matrix, and VR < 0 means an animal effect.  The 
value of the ratios are used as a starting point, if no previous estimates in 
file gr(fn) are available 



Output files
ord(fn) - ordering information
gr(fn)  - G and R matrices (full stored)
where (fn) denotes the name of a parameter file

The above files enable the program to restart at the round where it stopped.




Convergence
Average squared differences for consecutive R and G matrices are provided.
For good convergence, the difference should be at 10e-6 or lower.  A 100
rounds or more might be needed to achieve such a convergence.  The value of 
relative off-diagonals in G after diagonalization should be smaller than 
.01. Otherwise, the level of diagonalization may not be sufficient. 




Limitations of MTCAFS
MTCAFS does not compute maternal effects, covariables of order higher than 1, 
and it does not consider inbreeding.  



Known problems
The convergence may stall at 10e-7 or 10e-8, due to intercepting slowly moving
variance ratios to the 10e5 level.  Errors in REML estimates in such cases
are likely to be negligible.

The program may crash if the same parameter file is used for different data
sets.  If the program crashes, erase all files beginning with ord,
tra and gr.

MTCAFS works with integer values only; floating point numbers are truncated.

The program is compiled with a limit of 5 effects.  If you have more than 5 
effects, recompile the program with increased limit.  




Other References
The publications below will provide some insight into the procedures
involved in the program. References for the sparse matrix package FSPAK are in
fspak.man in directory pub/fspak.

I. Misztal, M. Perez-Enciso. 1993. Sparse matrix inversion in restricted
maximum likelihood estimation of variance components by expectation -
maximization. J. Dairy Sci. (accepted).

I. Misztal, T.J Lawlor, T.H.  Short and P.M.  VanRaden.  1992.  Multiple-trait
estimation of variance components of yield and type traits using an animal
model.  J. Dairy Sci.  75:544

I. Misztal. 1990. Restricted maximum likelihood estimation of variance
components in animal model using sparse matrix inversion and a supercomputer.
J. Dairy Sci. 73:163



Related programs
JAA20 - single-trait animal model with support for approximate reliabilities
RENUM - data validation and renumbering
JSPFS - single trait EM REML with acceleration parameters
MTDFS - multiple trait EM REML with caonical transformation with speed
        optimizations
MTCAFS is the newest program, and provides additional parameter testing that 
the other programs don't


Example
Executed examples of MTCAFS and other programs are available in the DOC section
on this anonymous FTP.
