CMIT and CMMAT - PROGRAMS FOR ANALYSIS OF MIXED LINEAR AND THRESHOLD MODELS WITH SUPPORT FOR REML-TYPE VARIANCE COMPONENT ESTIMATION AND MATERNAL GRANDSIRE MODEL Written by I. Misztal University of Georgia Athens, GA 30605 USA e-mail: ignacy@uga.cc.uga.edu, tel. (706) 542-0951 4/5/89 Introduction CMIT and CMMAT are Fortran programs for calculating estimates and predictions in threshold and linear mixed models. Models can include fixed and random factors. Maternal grandsire model and a relationship matrix due to sires are supported. CMIT iterates on data and can accomodate large number of levels. CMMAT is a matrix version that can perform absorption of one, fixed or random factor, and REML type variance component estimation for random factors. Design features To simplify testing and development, the design of the programs emphasize similarities between threshold- and linear equation systems. The threshold equation system is made similar to a weighted mixed linear model with regressions substituted for threshold effects. Pseudo-yields, pseudo-weights, and pseudo-covariates, computed by the programs, are functions of the iteration number. To minimize overhead over linear model caused by calculations of pseudo-values, the normal distribution and integral functions are approximated from tabled values. The accuracy of such approximation is dependent on table size only and does not affect the computational time. The reparametrization chosen forces first threshold to be 0, thus decreasing number of coefficients to be estimated. Both data and relationship files are rewritten to special format with large buffers in memory, so that disk I/O overhead is reduced. Additionally, when the files can fit in buffers, no disk operations are performed for reading the data. For more information about the program, see the paper: Misztal, I., D. Gianola and J. L. Foulley. 1989. Computing aspects of a nonlinear method of sire evaluation for categorical data. J. Dairy Science 72:1557. Attention: Please acknowledged the use of these programs in publications. Program limits - number of factors is limited to 20 (extendable), - number of categories is limited to 5 (extendable), - with 600 kbytes of memory (PC DOS) 25,000 levels can be handled by the program iterating on data (CMIT) and 220 levels (plus levels of an absorbed fixed factor) by the program with a matrix (CMMAT); with 8 Mbytes of memory, the maximum number of levels is extended to approximately 230,000 and 950, respectively. Data and relationship files Data and relationship files are read in Fortran free format. The positions of factors are not restricted. All factors should be numbered consecutively, starting from 1. The relationship file should have the following structure: no. of sire no. of sire of sire no. of maternal grandsire of sire If relationship file exists, it should comprise all animals, with 0 substituted for missing sires of sires or maternal grandsires. Parameters Parameters can be passed either interactively or by a parameter file. The parameter file should be structured exactly as if parameters were given interactively. List of parameters as prompted for the programs: - name of the data file, con - if console [ as required by OPEN statement - name of relationship file [ 'scratch' if not used - name of output (printout) file - number of categories, number of nested iterations [ linear model is assumed if number of categories is given as 1. Number of nested iterations (in CMIT only) sets the number of rounds before pseudo-values are recomputed again. Recommended value is from 1 to 3. - no. of factors, trait position, rel. factor, iterations? [ Number of factors does not include MGS sire or MGS group effect or absorbed effect. In threshold model trait position means the position of the first category. Relaxation factor (type of second-order Jacobi) speeds-up the iteration when MGS is used. Recommended value is .2-.4 for MGS model and 0 otherwise. "Iterations" means total number of iterations. User is prompted after the last iteration whether to continue the iteration. - for each factor: position, no. of levels,variance ratio [ Levels should be numbered consectutively. For random factors variance ratio should be given, 0 means a fixed factor. This line should be repeated as many times as there are traits (other than absorbed or MGS type). - position of factor to be absorbed and its variance ratio, 0 0 if none [ This and next parameters exists in CMMAT only. The data file should be sorted for the factor to be absorbed. - how many rounds of REML, 0 if not used [ rounds of REML between successive threshold round; in linear model user is prompted whether to continue REML. - how many rounds of Newton-Raphson before REML starts [ Suggested 3-6 - how many rounds of Newton-Raphson between subsequent REML evaluations? [ Suggested 1-3 - zero first level (1), or first level of second and later fixed effects (2), or first level of all fixed effects (3), 0 if none [ imposes constrains on threshold or fixed effects to aid the generalize invertor. Option one which zeroes the first threshold when running the threshold option should be used everytime when the thresholds appear unstable. Option 3 may be required when absorbing a fixed factor. - do not compute solution when diagonal < ? (.01-.5) [ with this parameter larger than 0 the levels of fixed factors, recognized as "extreme category levels", will be assigned value -10.0 or 10.0. Because the convergence rate is then adversely affected, put 0 unless smoothly decreasing convergence criterion is required. Convergence considerations In linear models the primary criterion for convergence is the squared average difference between consecutive solutions. In threshold model the solution of fixed effects that have observations in one extreme category only tends to + or - infinity without contributing anything to other solutions. The criterion mentioned above will eventually be affected by solutions of such observations only, even though the equation system has converged otherwise. At the end of each round programs print first and last solutions. If they do not belong to extreme observations, their convergence will generally ensure the convergence of other solutions. Attention: - The programs treat each number in the data as an integer. - The minimum number of factors for the iterative version is 3; The first factor should be fixed, and a dummy factor can be substituted for a lacking one. - If MGS model is used, the number of levels for the sire effect is the total number of sires, sires of sires and maternal grandsires which are present in sire or maternal grandsire factors, or in the relationship file. Similarily, the number of levels for the genetic group effect is the total number of groups of sires and maternal grandsires. - Threshold effects are printed as factor 0 on the printouts. First threshold, which is always 0, is not printed. Therefore the factor 0 is not printed when the number of categories is 2 - equivalent to one threshold. - Categories should always begin with one and contain no gaps. A number of problems have been reported with CMMAT(2) that were later traced to incorrect ordering of categories. - Please find out which constrain works best in your model. Constrain (1) usually works only in small problems, (2) works best in linear models and (3) is designed for larger threshold models. Very large solutions for fixed effects are usually symptoms of undetected dependencies in the model. Examples Examples were produced on on an IBM PC/AT compatible computer with Microsoft 4.0 Optimizing Compiler. Example 1 The design: 3 categories, 3 fixed factors, 1 random factor, was given by Schaeffer and Wilton (J. Dairy Sci. 59:544) and reevaluated by Gianola and Foulley (Genet. Sel. Evol. 15(2):201) >type cmd11 1 1 1 1 1 1 1 2 1 1 1 2 1 1 1 1 1 2 2 2 1 2 1 2 1 1 2 1 2 3 1 2 2 2 1 1 2 2 2 1 1 2 2 2 1 1 1 1 3 1 1 1 1 3 2 1 2 2 3 2 1 2 1 3 1 2 1 2 1 1 2 1 2 1 1 2 1 1 1 1 2 2 1 1 3 2 1 2 2 1 2 1 2 2 3 2 2 1 2 1 2 1 2 3 2 2 2 1 3 3 2 1 1 4 2 2 1 2 4 1 2 2 2 4 1 2 2 2 4 1 2 2 1 4 1 2 2 1 4 1 >cmmat name of a file with parameters, con - if console: con group effect followed by sire effect must be two last effects. mgs effects will be prompted later data name: cmd11 name of file with relationships: scratch name of output file: pout number of categories? 3 no of factors, trait pos., rel. factor,iterations?: 4 5 0 5 for each factor give position, no of levels, and variance ratio, 0 of fixed 1 2 0 2 2 0 3 2 0 4 4 19 give position of mgs group and mgs effects, or 0s 0 0 zero first level (1), or first levels of all fixed effects except one (2), 0 if none 1 position of factor to be absorbed and its variance ratio, 0 0 if none 0 0 how many rounds of reml, 0 if not used 0 round= 1 sol(1)= .0000 sol( 12)= -.0961 round= 2 sol(1)= .0000 sol( 12)= -.1016 round= 3 sol(1)= .0000 sol( 12)= -.1062 round= 4 sol(1)= .0000 sol( 12)= -.1067 round= 5 sol(1)= .0000 sol( 12)= -.1067 how many more iterations, 0-none? 0 solutions and inv. diagonals were stored in file sol Stop - Program terminated. >type pout round= 1 sol(1)= .0000 sol( 12)= -.0961 round= 2 sol(1)= .0000 sol( 12)= -.1016 round= 3 sol(1)= .0000 sol( 12)= -.1062 round= 4 sol(1)= .0000 sol( 12)= -.1067 round= 5 sol(1)= .0000 sol( 12)= -.1067 28 records 12 equations rank 9 solutions of factor 0 var. ratio: .00 levels: 2 1: .0000 2: .6360 solutions of factor 1 var. ratio: .00 levels: 2 1: -.8930 2: -.5955 solutions of factor 2 var. ratio: .00 levels: 2 1: .1268 2: .0000 solutions of factor 3 var. ratio: .00 levels: 2 1: .3907 2: .0000 solutions of factor 4 var. ratio:19.00 levels: 4 1: -.0815 2: .0655 3: .1228 4: -.1067 >type sol .00000000 .00000000 .63596170 .06578770 -.89304800 .30440601 -.59550710 .28373935 .12684760 .24871620 .00000000 .00000000 .39068560 .24669210 .00000000 .00000000 -.08152325 .04574174 .06549577 .04548111 .12277250 .04609732 -.10674500 .04728003 Example 2 The data set is the same as in example 1. Now, first factor is absorbed and variance components are estimated for factor 4. First group of parameters is now in the parameter file cmp. >type cmp cmd11 scratch pout 3 1 3 5 0 2 2 2 0 3 2 0 4 4 19 0 0 >cmmat name of a file with parameters, con - if console: cmp zero first level (1), or first levels of all fixed effects except one (2), 0 if none 1 position of factor to be absorbed and its variance ratio, 0 0 if none 1 0 how many rounds of reml, 0 if not used 2 how many rounds of newton-raphson before reml starts? 3 how many rounds of newton-raphson between subsequent reml estimations? 1 round= 1 sol(1)= .0000 sol( 10)= -.0961 round= 2 sol(1)= .0000 sol( 10)= -.1016 how many more iterations, 0-none? 10 round= 3 sol(1)= .0000 sol( 10)= -.1062 vcround= 1 sigmau 3 = .0759 sigmae= 1.0000 alfa 3 = 13.1791 vcround= 2 sigmau 3 = .1034 sigmae= 1.0000 alfa 3 = 9.6754 round= 4 sol(1)= .0000 sol( 10)= -.1458 vcround= 1 sigmau 3 = .1326 sigmae= 1.0000 alfa 3 = 7.5423 vcround= 2 sigmau 3 = .1598 sigmae= 1.0000 alfa 3 = 6.2592 round= 5 sol(1)= .0000 sol( 10)= -.2261 vcround= 1 sigmau 3 = .1849 sigmae= 1.0000 alfa 3 = 5.4079 vcround= 2 sigmau 3 = .2041 sigmae= 1.0000 alfa 3 = 4.9007 round= 6 sol(1)= .0000 sol( 10)= -.2869 vcround= 1 sigmau 3 = .2216 sigmae= 1.0000 alfa 3 = 4.5122 vcround= 2 sigmau 3 = .2334 sigmae= 1.0000 alfa 3 = 4.2838 round= 7 sol(1)= .0000 sol( 10)= -.3247 vcround= 1 sigmau 3 = .2441 sigmae= 1.0000 alfa 3 = 4.0973 vcround= 2 sigmau 3 = .2507 sigmae= 1.0000 alfa 3 = 3.9885 round= 8 sol(1)= .0000 sol( 10)= -.3461 vcround= 1 sigmau 3 = .2566 sigmae= 1.0000 alfa 3 = 3.8978 vcround= 2 sigmau 3 = .2601 sigmae= 1.0000 alfa 3 = 3.8451 round= 9 sol(1)= .0000 sol( 10)= -.3575 vcround= 1 sigmau 3 = .2631 sigmae= 1.0000 alfa 3 = 3.8001 vcround= 2 sigmau 3 = .2650 sigmae= 1.0000 alfa 3 = 3.7741 round= 10 sol(1)= .0000 sol( 10)= -.3634 vcround= 1 sigmau 3 = .2665 sigmae= 1.0000 alfa 3 = 3.7520 vcround= 2 sigmau 3 = .2674 sigmae= 1.0000 alfa 3 = 3.7392 round= 11 sol(1)= .0000 sol( 10)= -.3663 vcround= 1 sigmau 3 = .2682 sigmae= 1.0000 alfa 3 = 3.7282 vcround= 2 sigmau 3 = .2687 sigmae= 1.0000 alfa 3 = 3.7219 round= 12 sol(1)= .0000 sol( 10)= -.3678 how many more iterations, 0-none? 0 solutions and inv. diagonals were stored in file sol Stop - Program terminated. >type pout round= 1 sol(1)= .0000 sol( 10)= -.0961 round= 2 sol(1)= .0000 sol( 10)= -.1016 round= 3 sol(1)= .0000 sol( 10)= -.1062 vcround= 1 sigmau 3 = .0759 sigmae= 1.0000 alfa 3 = 13.1791 vcround= 2 sigmau 3 = .1034 sigmae= 1.0000 alfa 3 = 9.6754 round= 4 sol(1)= .0000 sol( 10)= -.1458 vcround= 1 sigmau 3 = .1326 sigmae= 1.0000 alfa 3 = 7.5423 vcround= 2 sigmau 3 = .1598 sigmae= 1.0000 alfa 3 = 6.2592 round= 5 sol(1)= .0000 sol( 10)= -.2261 vcround= 1 sigmau 3 = .1849 sigmae= 1.0000 alfa 3 = 5.4079 vcround= 2 sigmau 3 = .2041 sigmae= 1.0000 alfa 3 = 4.9007 round= 6 sol(1)= .0000 sol( 10)= -.2869 vcround= 1 sigmau 3 = .2216 sigmae= 1.0000 alfa 3 = 4.5122 vcround= 2 sigmau 3 = .2334 sigmae= 1.0000 alfa 3 = 4.2838 round= 7 sol(1)= .0000 sol( 10)= -.3247 vcround= 1 sigmau 3 = .2441 sigmae= 1.0000 alfa 3 = 4.0973 vcround= 2 sigmau 3 = .2507 sigmae= 1.0000 alfa 3 = 3.9885 round= 8 sol(1)= .0000 sol( 10)= -.3461 vcround= 1 sigmau 3 = .2566 sigmae= 1.0000 alfa 3 = 3.8978 vcround= 2 sigmau 3 = .2601 sigmae= 1.0000 alfa 3 = 3.8451 round= 9 sol(1)= .0000 sol( 10)= -.3575 vcround= 1 sigmau 3 = .2631 sigmae= 1.0000 alfa 3 = 3.8001 vcround= 2 sigmau 3 = .2650 sigmae= 1.0000 alfa 3 = 3.7741 round= 10 sol(1)= .0000 sol( 10)= -.3634 vcround= 1 sigmau 3 = .2665 sigmae= 1.0000 alfa 3 = 3.7520 vcround= 2 sigmau 3 = .2674 sigmae= 1.0000 alfa 3 = 3.7392 round= 11 sol(1)= .0000 sol( 10)= -.3663 vcround= 1 sigmau 3 = .2682 sigmae= 1.0000 alfa 3 = 3.7282 vcround= 2 sigmau 3 = .2687 sigmae= 1.0000 alfa 3 = 3.7219 round= 12 sol(1)= .0000 sol( 10)= -.3678 28 records 10 equations rank 7 absorbed factor 1 2 levels, var. ratio= .00 solutions of factor 0 var. ratio: .00 levels: 2 1: .0000 2: .6708 solutions of factor 2 var. ratio: .00 levels: 2 1: .1269 2: .0000 solutions of factor 3 var. ratio: .00 levels: 2 1: .4503 2: .0000 solutions of factor 4 var. ratio: 3.72 levels: 4 1: -.2453 2: .2267 3: .3864 4: -.3678 >type sol .00000000 .00000000 .67076550 .07199126 .12691430 .26790730 .00000000 .00000000 .45033590 .26444166 .00000000 .00000000 -.24531030 .16754830 .22671440 .16167636 .38639230 .16652679 -.36779640 .18167571 Example 3 The design: sire and maternal grandsire effects only, linear model, comes from L. R. Schaeffer (Course: Advances in estimating breeding values and population parameters. Berlin, March 18-29, 1985.) Dummy variables have been substituted for missed HYS and group effects. >type mlrs 1 1 1 1 2 108 1 1 1 1 2 116 1 1 3 1 2 110 1 1 3 1 2 101 1 1 3 1 2 96 1 1 2 1 4 105 1 1 1 1 4 104 1 1 1 1 4 102 1 1 1 1 4 100 1 1 3 1 4 90 1 1 3 1 4 94 1 1 1 1 0 106 1 1 2 1 0 105 1 1 3 1 0 98 1 1 1 1 5 114 1 1 1 1 5 108 1 1 2 1 5 99 1 1 2 1 5 97 1 1 3 1 5 104 1 1 3 1 5 102 1 1 3 1 5 96 1 1 3 1 5 94 >cmmat name of a file with parameters, con - if console: con group effect followed by sire effect must be two last effects. mgs effects will be prompted later data name: mlrs name of file with relationships: scratch name of output file: pout number of categories? 1 no of factors, trait pos., rel. factor,iterations?: 3 6 0 1 for each factor give position, no of levels, and variance ratio, 0 of fixed 1 1 0 2 1 0 3 5 15 give position of mgs group and mgs effects, or 0s 4 5 zero first level (1), or first levels of all fixed effects except one (2), 0 if none 0 position of factor to be absorbed and its variance ratio, 0 0 if none 0 how many rounds of reml, 0 if not used 0 round= 1 sol(1)= 102.2474 sol( 7)= -.0689 solutions and inv. diagonals were stored in file sol Stop - Program terminated. >type pout round= 1 sol(1)= 102.2474 sol( 7)= -.0689 22 records 7 equations rank 6 solutions of factor 1 var. ratio: .00 levels: 1 1: 102.2474 solutions of factor 2 var. ratio: .00 levels: 1 1: .0000 solutions of factor 3 var. ratio:15.00 levels: 5 1: 1.7868 2: .3858 3: -1.5037 4: -.6530 5: -.0689 >type sol 102.24740000 .07611349 .00000000 .00000000 1.78684500 .05171711 .38582930 .05519512 -1.50368100 .05079028 -.65302870 .06223322 -.06894939 .06149922 >