MTDFS - multitrait REML estimation of variance components program by Ignacy Misztal University Of Illinois 1207 W Gregory Dr., Urbana, IL 61801, USA E-mail: ignacy@uiuc.edu, phone: (217)244-3164 last correction: 12/13/1993 Disclaimer This program was developed for my own research. It is believed to be reasonably free of serious errors, but few attempts have been made to make it resistant to errors in parameter and input files. Please use it at your own risk. Distribution The program is free for research applications. Please contact the author for information on commercial use. In any case, please acknowledge the use of this program in publications. Introduction MTDFS estimates multivariate variance components by REML using the canonical transformation. Supported models can have multiple fixed effects, including cross-classified and covariables, and one animal effect with unknown parent groups. The program is fast, due to the use of sparse matrix inversion, and now, it works with public-domain sparse matrix package FSPAK developed by Miguel Perez-Enciso and Ignacy Misztal. Computing procedures MTDFS employs several techniques for computer efficiency: - uses a sparse matrix package with a sparse matrix inversion, - Rather than calculate the trace for every round and trait, it calculates only a sample of trace elements for a given range of variance ratios,and in the canonical transformation,it obtains the traces by interpolation or extrapolation from the sample. - It accelerates the REML procedures by a SOR-type parameter. The description of that parameter is in the Appendix and in Misztal et. al. (JDS Feb 92) paper listed in the references section. Installation The program consists of several modules in FORTRAN: 1. mtdfs.for main program 2. mtdfs1.f auxillary subroutines + FSPAK 3. second.for computer dependent timing routine, described in FSPAK.MAN Before compiling, determine program limits coded in the first few lines in mtdfs.for. Memory requirements of the program are model and data dependent, and increase as the number of effects and the number of relationships increase. Parameter optim determines whether the inversion will be for minimum memory (optim=1) or maximum speed (optim=0). The latter option takes 2-3 times more memory but executes 20-300% faster, with the difference being particularily large on the Cray supercomputer. Model The model can have: - crossclassified fixed effects, - covariables (integer values only), - one animal effect with unknown parent groups. The unknown parent groups are mandatory. If no groups are desired, code one unknown parent group, which is equivalent to having no groups. Due to the sparse matrix package limitations, the model should be full rank. This can be accomplished by reducing the number of levels in second and later fixed effects by 1, equivalent to the "zero last solution" constrain. Input file The data and relationship files are in the same format as in programs JAA (see JAA manual). The ordering should be strict in a sense that all factors should be numbered from 1 consecutively, and that there should be no "holes" in the ordering. The relationship file should contain entries for all animals, and unknown parents need to have numbers above the highest animal number. If unknown parent group are undesired, create common group for one animals. Program RENUM, available in another directory, recodes data into format recognized by MTDINV. Parameter file Parameter file is similar to that in JAA, with the exception that the regressions are specified in the factor definition lines as: (number_of_field) -1 (any_value) and that one paramater: "number of traits" needs to be edited. An example of the parameter file with descriptions of its fields is provided in the Example below. Program RENUM outputs a parameter file, where only one paramater: "number of traits" needs to be edited. Output files fn denotes the name of a parameter file ordfn - ordering information trafn - trace samples grfn - G and R matrices (full stored) The above files enable the program to restart at the round where it stopped, so as not to repeat the operations of sparse matrix ordering and inversion. Traces The program requires giving a range and spacing of variance ratios for trace samples. At best, the range should bracket the variance ratios actually occuring in the computations. Lower bound is much more critical than the higher bound, which need not be larger than 50-200. A good choice could be step 1.2 from .3 to 50. Convergence Average squared differences for consecutive G and R matrices are provided. For good convergence, the difference should be at 10e-7 or lower. A 100 rounds or more might be needed to achieve such a convergence. The right value of the acceleration parameter is crucial in achieving good convergence rate. Limitations of MTDFS MTDFS does not compute more than 1 random effect, maternal effects, covariables of order higher than 1, and it does not consider inbreeding. Known problems The convergence may stall at 10e-7 or 10e-8, due to intercepting slowly moving variance ratios to the 10e5 level. Errors in REML estimates in such cases are likely to be negligible. The program may crash if the same parameter file is used for different data sets. If the program crashes, erase all files beginning with ord, tra and gr. MTDFS works with integer values only; floating point numbers are truncated. On some computers the program loops infinitely if the reoreding file is read. In such a case, erase the reordering file (ord...). References The publications below will provide some insight into the procedures involved in the program. References for the sparse matrix package FSPAK are in fspak.man in directory pub/fspak. I. Misztal, M. Perez-Enciso. 1993. Sparse matrix inversion in restricted maximum likelihood estimation of variance components by expectation - maximization. J. Dairy Sci. (accepted). I. Misztal, T.J Lawlor, T.H. Short and P.M. VanRaden. 1992. Multiple-trait estimation of variance components of yield and type traits using an animal model. J. Dairy Sci. 75:544 I. Misztal. 1990. Restricted maximum likelihood estimation of variance components in animal model using sparse matrix inversion and a supercomputer. J. Dairy Sci. 73:163 Example Warning: this example contains highly correlated traits. Therefore the results of the first round differ among computers. The results after the 6 round should be almost identical. Data file (mmdat) Animals are in field 2, followed by 6 traits. Field 1 is for a mean. 1 1 -5.31 58.60 -65.16 -3.470 .486 -.352 1 1 1 -.65 31.56 147.92 .394 1.191 .756 2 1 2 -10.91 13.08 -105.54 -3.029 -.612 -.419 1 1 2 15.47 -11.57 120.34 3.731 .843 .427 2 1 2 6.18 76.62 148.69 -.298 2.132 .594 1 1 2 32.77 -27.93 118.57 6.554 1.086 .181 2 1 3 -5.03 -29.80 -47.34 -.115 -.858 -.135 1 1 3 1.30 -29.72 7.02 1.297 -.411 .060 2 1 3 -25.60 -14.43 -108.65 -4.011 -1.513 -.176 1 1 3 -28.90 -47.86 -341.21 -5.769 -3.252 -1.322 2 1 3 -1.27 -26.29 147.47 2.354 .230 .845 1 1 3 -6.75 -85.94 -207.11 -.078 -2.576 -.884 2 1 4 6.82 -44.82 10.35 2.589 -.476 .016 1 1 4 -15.20 125.94 192.79 -4.364 2.502 1.081 2 1 4 -16.24 13.58 -60.09 -3.253 -.549 -.096 1 1 4 -11.93 86.25 58.00 -3.986 1.319 .367 2 1 4 18.13 11.71 160.25 3.687 1.489 .568 1 1 4 -8.14 -22.23 -27.12 -.571 -.733 .009 2 1 4 -14.50 16.60 -69.41 -3.233 -.492 -.177 1 1 4 3.09 32.86 201.49 1.415 1.577 .984 1 Relationship file (mmrel) In this case,this file is for animals without any relationships. 1 5 5 3 2 5 5 3 3 5 5 3 4 5 5 3 Parameter file (mmp) The comments below (# and after) are for illustration only, and should not be present in the actual parameter file. mmdat # data file mmrel # pedigree file mmpp # output file 2 3 3 # number of effects, number of traits, position of 1st trait 1 1 0 # description of first effect 2 5 -2 # description of second (animal) effect Execution d\>mtdfs name of a file with parameters, con - if console: mmp read 4 pedigree records 1 unknown parent groups filled 14 out of 50000 elements in hash table 1 filled 13 out of 35000 elements in hash table 2 1.6315625s smpa used 13 out of 90000 words ra used 5 out of 31500 words before odrv 3.9715624 incorrect size of permutation vector size 140991 found; size 5 required ordering time = 4.1015625 symbolic factorization time= 4.1015625 used 0.02% memory step,from,to? 1.2 1 20 numerical factorization time= 4.2515626 used 0.02% memory sparse inversion time= 4.2515626 used 0.04% memory trace 4.2515626 var,tr 1.000000000000000 1.599472990777339 var,tr 1.200000000000000 1.407011623547057 var,tr 1.440000000000000 1.240201915292793 var,tr 1.728000000000000 1.094620504698485 var,tr 2.073600000000000 0.966699780269887 var,tr 2.488320000000000 0.853591302436915 var,tr 2.985984000000000 0.753043780204874 var,tr 3.583180800000000 0.663293238688916 var,tr 4.299816960000000 0.582964419637863 var,tr 5.159780351999999 0.510983400136725 var,tr 6.191736422399999 0.446501863707623 var,tr 7.430083706879999 0.388833483205813 var,tr 8.916100448255998 0.337402597543311 var,tr 10.699320537907196 0.291704908012358 var,tr 12.839184645488634 0.251279390624788 var,tr 15.407021574586362 0.215690110579332 var,tr 18.488425889503632 0.184516218363179 variance ratios after transformation 4.000 4.000 4.000 tcanon 4.4715624 trhs 4.5015626 ican= 1 trait= 1 var= 2.030 sa=0.4927 se=0.9015 t= 4.53 ican= 1 trait= 2 var= 4.260 sa=0.2348 se= 1.071 t= 4.53 ican= 1 trait= 3 var= 12.56 sa=0.7961E-01 se= 1.179 t= 4.53 convg=0.32E-01 convr=0.57E-02 finish? (rounds-more + adjrel) 5 .5 variance ratios after transformation 1.297 5.918 51.01 ican= 2 trait= 1 var=0.7425 sa= 1.347 se=0.8199 t= 8.53 ican= 2 trait= 2 var= 8.053 sa=0.1242 se= 1.003 t= 8.53 ican= 2 trait= 3 var=0.1000E+06 sa=0.9897E-05 se=0.9897 t= 8.53 convg=0.20 convr=0.82E-03 g not positive definite y not positive definite variance ratios after transformation 0.6075 8.062 0.1000E+11 ican= 3 trait= 1 var=0.7483 sa= 1.336 se=0.8748 t= 8.66 ican= 3 trait= 2 var= 8.302 sa=0.1205 se= 1.017 t= 8.66 ican= 3 trait= 3 var=0.1000E+06 sa=0.1001E-04 se= 1.001 t= 8.66 convg=0.32E-01 convr=0.36E-03 variance ratios after transformation 0.6545 8.447 0.1003E+06 ican= 4 trait= 1 var=0.5881 sa= 1.701 se= 1.011 t= 8.75 ican= 4 trait= 2 var= 8.616 sa=0.1161 se= 1.002 t= 8.75 ican= 4 trait= 3 var=0.1000E+06 sa=0.1000E-04 se=1.0000 t= 8.75 convg=0.87E-02 convr=0.10E-04 variance ratios after transformation 0.5945 8.638 0.1003E+06 ican= 5 trait= 1 var=0.6264 sa= 1.596 se=0.9861 t= 8.85 ican= 5 trait= 2 var= 8.677 sa=0.1153 se= 1.001 t= 8.85 ican= 5 trait= 3 var=0.1000E+06 sa=0.1000E-04 se=1.0000 t= 8.88 convg=0.21E-02 convr=0.47E-05 variance ratios after transformation 0.6178 8.687 0.1003E+06 ican= 6 trait= 1 var=0.6020 sa= 1.661 se= 1.005 t= 8.97 ican= 6 trait= 2 var= 8.701 sa=0.1149 se= 1.000 t= 8.97 ican= 6 trait= 3 var=0.1000E+06 sa=0.1000E-04 se=1.0000 t= 8.97 convg=0.51E-03 convr=0.12E-05 finish? (rounds-more + adjrel) 0 0 estimates of variance components (appr. standard errors) trait residual genetic heritability 1 171.2 ( 55.55 ) 63.16 ( 86.23 ) 0.269( 0.276) 2 1924. ( 624.3 ) 910.2 ( 1135. ) 0.321( 0.281) 3 0.1668E+05( 5412. ) 5132. ( 7547. ) 0.235( 0.271) heritabilities on diagonal, correlations g above,r below 0.27 0.74 0.86 -0.37 0.32 0.98 0.57 0.38 0.24 Appendix Faster EM-alike REML formula Since the EM-REML formula is very slow, a faster formula is used. This formula was first described, in a somewhat different fashion, in Paul VanRaden's thesis. Three REML formulas: u'inv(A)u + sigmae * trace () 1) sigmau = ------------------------------ q u'inv(A)u 2) sigmau = ----------- q - alfa*tr u'inv(A)u - sigmau*beta*denom 3) sigmau = ------------------------------ denom*(1-beta) where denom = q - alfa*tr alfa = sigmae/sigmau beta - speed-up factor 0 <= beta < 1 If 1) is the original EM-REML, 2) is a faster formula due to Harville (?), and 3) is the formula used in this program. For beta=0, the formulas 2) and 3) are identical. To converge to 3 significant digits of accuracy using the example below, 300, 120 and 50 rounds were needed by formulas 1), 2) and 3) with beta=.7, respectively. The speed-up factor beta is somewhat critical. The iteration diverges if it is too high, and the increase in the convergence rate is low if it is too small. For the example below, the iteration was the fastest at beta=.7 but diverged at beta=.8.