MTCAFS (MTC) - multitrait REML estimation of variance components program by canonical transformation, with support for multiple random effects by Ignacy Misztal University Of Illinois 1207 W Gregory Dr., Urbana, IL 61801, USA E-mail: ignacy@uiuc.edu, phone: (217)244-3164 last correction: 03/07/94 last major change: MTC accepts matrices of not full rank Disclaimer This program was developed for my own research. It is believed to be reasonably free of serious errors, but few attempts have been made to make it resistant to errors in parameter and input files. Please use it at your own risk. Distribution The program is free for research applications. Please contact the author for information on commercial use. In any case, please acknowledge the use of this program in publications. Introduction MTCAFS estimates variance components in in single- and multiple-trait models by REML. Supported models can have multiple fixed effects, including cross-classified and covariables, and several random effects, where one is the animal effect with unknown parent groups. The program is computationally efficient; it uses public-domain sparse matrix package FSPAK developed by Miguel Perez-Enciso and Ignacy Misztal. MTCAFS uses a canonical transformation but supports models with several random effects. The last feature is made possible by simultaneous diagonalization of variance matrices for all random effects, as described in: Flury, N. N. 1988. Common principal components and related multivariate models. Wiley, New York, NY. Lin, C. Y., and S. P. Smith. 1990. Transformation of multitrait to unitrait mixed model analysis of data with multiple random effects. J. Dairy Sci. 73:2494. In general, the perfect diagonalization is not possible, but the studies with type and yield traits suggest that, in real data, the off-diagonals after the diagonalization are reduced 50 times or more. The program is especially useful for analysis of variance components in repeatability models. For problems that the program can handle, its computing requirements (memory, CPU) are up to n_trait^4 smaller than those of unrestricted MT DF REML. Worked problems included 14 type or 3 production traits on 50,000 animals with about 80,000 records. About 100 Mbytes were required. A PC with 16 Mbyte memory computed problems up to 10,000 effects. Due to use of EM algorithm, canonical transformation and special measures to handle nearly-singular covariance matrices, the algorithm seems stable and reaches convergence in 20-100 rounds, dependent on priors and data. The program has limitations. All traits must be recorded and the same model must be used for all traits. Currently, the program neither computes estimates of standard errors nor supports maternal effects. Installation The program consists of several modules in FORTRAN: 1. mtc1.for main program + some subroutines 2. mtc2.for remaining subroutines + FSPAK 3. second.for computer dependent timing routine, described in FSPAK.MAN Before compiling, determine program limits coded in the first few lines in MTCAFS.FOR. Memory requirements of the program are model and data dependent, and increase as the number of effects and the number of relationships increase. Parameter optim determines whether speed or memory optimized inversion routine is used.The speed optimization is up to 3 times faster, especially on supercomputers, but it takes 2 times more memory. To try the program, compile mtc1.f, mtc2.f and subroutine second.f specific to your computer (details in fspak.man). Then, give mmp1 or mt99s as the name of parameter file. Model The model can have: - crossclassified fixed effects, - covariables (integer values only), - one animal effect with unknown parent groups. The unknown parent groups are mandatory. If no groups are desired, code one unknown parent group, which is equivalent to having no groups. Input file The data and relationship files are in the same format as in programs JAA (see JAA manual). The ordering should be strict in a sense that all factors should be numbered from 1 consecutively, and that there should be no "holes" in the ordering. The relationship file should contain entries for all animals, and unknown parents need to have numbers above the highest animal number. If unknown parent group are undesired, create common group for one animals. Program RENUM, available in another directory, recodes data into format recognized by MTCAFS. Parameter file Parameter file is similar to that in JAA, with the exception that the regressions are specified in the factor definition lines as: (number_of_field) -1 (any_value) and that one paramater: "number of traits" needs to be edited. Program RENUM outputs a parameter file, where only one paramater: "number of traits" needs to be edited. An example of a parameter file is given below: data5k #name of data file ped5k #name of pedigree file out5k #name of output file 4 5 9 #4 effects, 9 traits start from field 5 3 100 0 #effect in field 3 has 100 levels and VR 0 4 -1 0 #regression in field 4 1 1000 2 #effect in field 1 has 1000 levels and initial VR 2 2 5000 -3 #effect in field 2 has 5000 levels and is an animal # effect with VR 3 VR means variance ratio. VR = 0 means fixed effect, VR > 0 means random effect with an identity covariance matrix, and VR < 0 means an animal effect. The value of the ratios are used as a starting point, if no previous estimates in file gr(fn) are available Output files ord(fn) - ordering information gr(fn) - G and R matrices (full stored) where (fn) denotes the name of a parameter file The above files enable the program to restart at the round where it stopped. Convergence Average squared differences for consecutive R and G matrices are provided. For good convergence, the difference should be at 10e-6 or lower. A 100 rounds or more might be needed to achieve such a convergence. The value of relative off-diagonals in G after diagonalization should be smaller than .01. Otherwise, the level of diagonalization may not be sufficient. Limitations of MTCAFS MTCAFS does not compute maternal effects, covariables of order higher than 1, and it does not consider inbreeding. Known problems The convergence may stall at 10e-7 or 10e-8, due to intercepting slowly moving variance ratios to the 10e5 level. Errors in REML estimates in such cases are likely to be negligible. The program may crash if the same parameter file is used for different data sets. If the program crashes, erase all files beginning with ord, tra and gr. MTCAFS works with integer values only; floating point numbers are truncated. The program is compiled with a limit of 5 effects. If you have more than 5 effects, recompile the program with increased limit. Other References The publications below will provide some insight into the procedures involved in the program. References for the sparse matrix package FSPAK are in fspak.man in directory pub/fspak. I. Misztal, M. Perez-Enciso. 1993. Sparse matrix inversion in restricted maximum likelihood estimation of variance components by expectation - maximization. J. Dairy Sci. (accepted). I. Misztal, T.J Lawlor, T.H. Short and P.M. VanRaden. 1992. Multiple-trait estimation of variance components of yield and type traits using an animal model. J. Dairy Sci. 75:544 I. Misztal. 1990. Restricted maximum likelihood estimation of variance components in animal model using sparse matrix inversion and a supercomputer. J. Dairy Sci. 73:163 Related programs JAA20 - single-trait animal model with support for approximate reliabilities RENUM - data validation and renumbering JSPFS - single trait EM REML with acceleration parameters MTDFS - multiple trait EM REML with caonical transformation with speed optimizations MTCAFS is the newest program, and provides additional parameter testing that the other programs don't Example Executed examples of MTCAFS and other programs are available in the DOC section on this anonymous FTP.