Examples of genetic evaluation and estimation of variance components I. Misztal University of Georgia 203 LP Bldg Athens, GA 30605 USA tel. (706) 542-0951 fax (706) 542-0399 Electronic mail: ignacy@uga.cc.uga.edu Acknowledgements Most of the programs described here would not have been written without the generous support by the Holstein Association of America, Brattleboro, VT. Assistance of many scientists in identifying problems in these programs is greatfully acknowledged. (9/93) Public programs In the course of research I have written many computer programs. Some of them are made them available to other researchers to: 1. promote ideas published in papers I (co)authored, 2. help discover bugs, that seem inevitable in any software. These programs may still contain bugs. The programs may feel complicated to use due to 2 reasons: a) my resources to make them more user-friendly are limited, and 2) the users of these programs are assumed to be knowledgeable in mixed model theory. The programs include: 1. JAA Computes solutions to a class of mixed models. Iterates on data and has modest computing requirements. Supports cross-classified random and fixed effects, including the animal effect. 2. JAA20 As above, with support for covariables and approximation of prediction error variances. 3. JSPFS Estimates variance components in single trait mixed models using the EM algorithm and sparse matrix inversion. Supports cross-classified random and fixed effects, including covariables. 4. MTDFS As above, for multiple traits. Uses canonical transformation and is restricted to a single random effect. 5. MTCAFS Other name MTC. As above, but with extensions to several random effects. 6. RENUM Renumbers effects consecutively. The original effects must be numeric, except for animals, which may be alphanumeric. For animals, the program removes a large part of redundant animals, verifies that progeny is born after parents, and assigns unknown parent groups. These programs and more detailed documentation are available electronically by anonymous FTP from misz.animal.uiuc.edu (numerically 128.174.78.6). Disclaimer All programs have been developed for our own research. They are believed to be reasonably free of serious errors, but few attempts have been made to make them resistant to errors in parameter and input files. Please use them at your own risk. Notation In program listings, text in bold is typed by the user, changed parameter file entries are underlined, and added comments are in italics. Example Model: observation = herd-year-month + permanent _environment + animal + residual Data sets: File typedata: 1. animal id 2. herd id 3. date of evaluation 4. trait 1 5. trait 2 6. trait 3 File pedigree: 1. animal id 2. sire id 3. dam id 4. year of birth All examples were computed on a Sun 2 workstation. Print first 10 records of the observation file $ head typedata 8009408698 123813 8306 262 329 322 8009430793 123813 8507 407 405 400 8009469056 123813 8409 388 409 445 8009652481 123813 8409 336 312 361 8009652494 123813 8306 294 380 428 8009666718 123813 8306 294 415 376 8009915566 123813 8306 400 500 500 8009915569 123813 8306 384 312 283 8009915569 123813 8409 294 285 380 8009915572 123813 8306 300 343 396 Print first 10 records of the pedigree file $ head pedigree 8009408698 8006584016 41491007 76 8009430793 8008843098 41683485 76 8009469056 8008602536 41683485 76 8009652481 8008724013 41562352 77 8009652494 8008980733 41583197 77 8009666718 8009027083 41590582 78 8009915566 8009256204 41702837 78 8009915569 8009305088 41688995 78 8009915572 8008980738 41590582 78 8009915573 8009110700 41727076 78 Renumber to consecutive numbers $ renum RENUM - renumbering tool has capacity to renumber up to approximately 9374 animals and 8333 other levels input data file? typedata output data file? type.ren printout data name? out.ren give positions of columns to be renumbered into single factor, 0 ends 2 3 0 give minimum number of observations per level for each factor, 0 if all observations retained 1 give positions of columns to pass unchanged 4 5 6 give position of animal in data file, 0 if none 1 name of the pedigree file pedigree name of the output pedigree file? ped.ren positions of animal, sire, dam and year of birth in the pedigree file, year of birth = 0 if missing 1 2 3 4 give years to separate unknown parent groups, or 0 81 84 87 give minimum, average, and maximum generation interval 1 4 20 data files read to 120 column only Example parameter file for program JAA was stored in file renum.jaa Summaries and messages were stored in file renum.msg (Processing took 30 seconds and 300k memory) Print first 10 lines of the renumbered observation file (herd- year-month, animal, traits 1-3) $ head type.ren 1 1 262 329 322 3 2 407 405 400 2 3 388 409 445 2 4 336 312 361 1 5 294 380 428 1 6 294 415 376 1 7 400 500 500 1 8 384 312 283 2 8 294 285 380 1 9 300 343 396 Print first 10 lines of the renumbered pedigree file (animal, sire, dam, 1-number_of_known_parents, year of birth, known or estimated year of birth (0 if not provided), number of known, not necessarily contributing parents, number of records (before elimination due to other effects), number of progeny (incl. noncontributing animals, original id) $ head ped.ren 1 2698 2148 2 75 2 1 3 8009408698 2 2698 2149 2 76 2 1 0 8009430793 3 2698 2149 2 75 2 1 7 8009469056 4 2150 2151 1 76 2 1 8 8009652481 5 2698 2152 2 76 2 1 5 8009652494 6 2698 2153 2 76 2 1 4 8009666718 7 2698 2698 3 78 2 1 2 8009915566 8 2698 2698 3 78 2 2 2 8009915569 9 2698 2153 2 78 2 1 0 8009915572 10 2154 2155 1 76 2 2 2 8009915573 Renumbering details for effects other than animal $ cat out.ren records 4539 factor renumbered level # original levels ------------------------------------------------------------ 1 : 2 3 1 65 123813 8306 2 87 123813 8409 3 85 123813 8507 4 86 123813 8602 5 86 123813 8609 6 80 123813 8704 7 133 123813 8711 ......................................................... 46 126 411978 9108 47 141 411978 9203 48 119 413608 8301 49 80 413608 8404 50 33 413608 8504 51 3 413608 8703 Renumbering details for all other effects $ cat renum.msg input data file: typedata output data file: type.ren input pedigree file: pedigree output pedigree file: ped.ren data records: 4539 original levels: 51 levels after cutting: 51 Renumbered factors factor levels 1 51 Animal factor summary pedigree records: 3000 animals found: 3344 animals eliminated: 647 repeated pedigrees: 853 Consecutive number assignment animals with records 1 - 2147 parents with pedigree record 2148 - 2147 parents without pedigree record 2148 - 2697 unknown parent groups 2698 - 2701 Unknown parent group allocation group no. consecutive no # years 1 2698 949 0- 81 2 2699 184 82- 84 3 2700 85 85- 87 4 2701 37 88- Unknown parent groups should contain many animals (> 100-200) to be estimated with sufficient accuracy. The last 2 or 3 groups should have been merged and perhaps the first one split! Example parameter file for program JAA was stored in file renum.jaa Renum wrote a template of a parameter file used by later programs. The template usually needs editing! $ cat renum.jaa type.ren data file ped.rend observation file jaaout messages file 2 3 0.70 10 2 effects, trait in column 3, relaxation factor .7, 10 round to begin with 1 51 0.00 Effect in column 1 with 51 levels; variance ratio = 0 (fixed effect) 2 2701 -2.00 Effect column 2 with 2701 effects, random (2.00), animal (minus) Single trait estimation of variance components The parameter file is updated to reflect extra permanent environmental effect (with 2147 levels, see renum.msg). Also, variance ratios are adjusted. Changed items below are underlined. $ cat renum.jaa type.ren ped.ren jaaout 3 3 0.70 10 1 51 0.00 2 2147 3.00 2 2701 -2.50 $ jspfs name of a file with parameters, con - if console: renum.jaa read 2697 pedigree records 4 unknown parent groups filled 22408 out of 150000 elements in hash table 1 filled 15269 out of 120000 elements in hash table 2 3.92000s smpa used 39798 out of 270000 words ra used 15151 out of 120000 words before odrv 4.67000 ordering time = 32.3400 symbolic factorization time= 32.5300 used 12.72% memory numerical factorization time= 33.2000 used 16.33% memory sparse inversion time= 34.9500 used 35.15% memory trace 35.0300 vround 1 se= 2191.0365 s( 2)= 1035.4501 alfa= 2.1160 ex= 0.0000 vround 1 se= 2191.0365 s( 3)= 1337.6576 alfa= 1.6380 ex= 0.0000 trace 38.0900 vround 2 se= 1986.7850 s( 2)= 1172.4338 alfa= 1.6946 ex= 1.3106 se=residual variance, s(x) is variance of x effect, alfa=variance ratio, ex=variance ratio if Aitken extrapolation were performed vround 2 se= 1986.7850 s( 3)= 1594.8933 alfa= 1.2457 ex= -0.9182 trace 41.1600 ............................................................... vround 10 se= 1800.9201 s( 2)= 1243.6914 alfa= 1.4480 ex= 0.6792 vround 10 se= 1800.9201 s( 3)= 2072.7703 alfa= 0.8688 ex=-0.8271 how many iterations more?, extrapolate (1)? 10 0 perform Aitken extrapolation only when ex values are close for a couple of rounds. Not a case until round 20! trace 65.6800 vround 11 se= 1800.9614 s( 2)= 1232.0966 alfa= 1.4617 ex= 2.1498 vround 11 se= 1800.9614 s( 3)= 2088.3660 alfa= 0.8624 ex=-0.8125 trace 68.7700 ............................................................... vround 17 se= 1801.8294 s( 2)= 1176.1833 alfa= 1.5319 ex= 1.7200 vround 17 se= 1801.8294 s( 3)= 2161.4039 alfa= 0.8336 ex= -0.7846 trace 87.2600 vround 18 se= 1801.9637 s( 2)= 1168.8541 alfa= 1.5416 ex= 1.7178 vround 18 se= 1801.9637 s( 3)= 2170.9618 alfa= 0.8300 ex=-0.7838 trace 90.3300 vround 19 se= 1802.0905 s( 2)= 1161.9962 alfa= 1.5509 ex= 1.7161 vround 19 se= 1802.0905 s( 3)= 2179.9141 alfa= 0.8267 ex=-0.7831 trace 93.3900 vround 20 se= 1802.2102 s( 2)= 1155.5775 alfa= 1.5596 ex= 1.7148 vround 20 se= 1802.2102 s( 3)= 2188.3014 alfa= 0.8236 ex=-0.7826 how many iterations more?, extrapolate (1)? 3 1 trace 96.4500 vround 21 se= 1805.5841 s( 2)= 1055.8521 alfa= 1.7101 ex= 0.0000 vround 21 se= 1805.5841 s( 3)= 2313.9041 alfa= 0.7803 ex= 0.0000 trace 99.5100 vround 22 se= 1804.7963 s( 2)= 1056.7066 alfa= 1.7079 ex= 1.7062 vround 22 se= 1804.7963 s( 3)= 2315.9436 alfa= 0.7793 ex=-0.7785 trace 102.5600 vround 23 se= 1804.4412 s( 2)= 1057.0954 alfa= 1.7070 ex= 1.7062 vround 23 se= 1804.4412 s( 3)= 2316.8611 alfa= 0.7788 ex=-0.7785 how many iterations more?, extrapolate (1)? 3 1 trace 105.850 vround 24 se= 1804.1497 s( 2)= 1057.4235 alfa= 1.7062 ex= 0.0000 vround 24 se= 1804.1497 s( 3)= 2317.6034 alfa= 0.7785 ex= 0.0000 trace 109.040 vround 25 se= 1804.1497 s( 2)= 1057.4277 alfa= 1.7062 ex= 1.7060 vround 25 se= 1804.1497 s( 3)= 2317.5977 alfa= 0.7785 ex=-0.7785 trace 112.170 vround 26 se= 1804.1496 s( 2)= 1057.4316 alfa= 1.7062 ex= 1.7060 vround 26 se= 1804.1496 s( 3)= 2317.5924 alfa= 0.7785 ex= -0.7785 how many iterations more?, extrapolate (1)? 3 1 trace 115.440 vround 27 se= 1804.1471 s( 2)= 1057.4974 alfa= 1.7061 ex= 0.0000 vround 27 se= 1804.1471 s( 3)= 2317.5099 alfa= 0.7785 ex= 0.0000 trace 118.560 vround 28 se= 1804.1477 s( 2)= 1057.4967 alfa= 1.7061 ex= 1.7061 vround 28 se= 1804.1477 s( 3)= 2317.5082 alfa= 0.7785 ex=-0.7785 trace 121.730 vround 29 se= 1804.1480 s( 2)= 1057.4963 alfa= 1.7061 ex= 1.7061 vround 29 se= 1804.1480 s( 3)= 2317.5075 alfa= 0.7785 ex=-0.7785 how many iterations more?, extrapolate (1)? 0 0 solutions were stored in file solrenum.jaa (total time 2 minutes; memory 3 Mbytes) Estimates: heritability = 2317/(2317+1057+1804)=.44, repeatability = (2317+1057)/(2317+1057+1804)=.65 Multivariate analysis For multivariate analysis change parameter file (changes underlined)to reflect three traits $ cat renum1.jaa type.ren ped.ren jaaout 3 3 3 1 51 0.00 2 2147 3.00 2 2701 -2.50 $ mtc name of a file with parameters, con - if console: renum1.jaa read 2697 pedigree records 4 unknown parent groups filled 22408 out of 90000 elements in hash table 1 filled 15269 out of 63000 elements in hash table 2 4.01000s smpa used 39798 out of 90000 words ra used 15151 out of 31500 words before odrv 4.59000 ordering time = 9.66000 symbolic factorization time= 9.85000 used 22.23% memory relative off-diagonals in G after diagonalization: 0.33E-10 0.33E-10 variance ratios after transformation in round 1 3.000 3.000 3.000 2.500 2.500 2.500 tcanon 9.88000 trhs 11.5500 numerical factorization time= 12.8300 used 28.89% memory sparse inversion time= 14.6300 used 62.10% memory computed variance ratios: 2.282 2.529 2.287 1.767 2.019 1.768 convR= 1.0 convG= 1.0 1.0 finish? (rounds-more?) 10 relative off-diagonals in G after diagonalization: 0.12E-01 0.74E-02 variance ratios after transformation in round 2 1.468 2.595 6.665 1.147 2.098 5.634 computed variance ratios: 1.620 2.676 7.387 1.229 2.124 6.084 convR= 0.20 convG= 0.23E-01 0.30E-01 relative off-diagonals in G after diagonalization: 0.11E-01 0.69E-02 ............................................................ variance ratios after transformation in round 9 3.316 1.265 12.42 2.312 0.7875 9.209 computed variance ratios: 3.349 1.280 12.59 2.294 0.7799 9.143 convR= 0.44E-05 convG= 0.12E-03 0.99E-04 relative off-diagonals in G after diagonalization: 0.94E-02 0.40E-02 .............................................................. variance ratios after transformation in round 11 3.372 1.292 13.01 2.272 0.7725 9.289 computed variance ratios: 3.404 1.306 13.12 2.258 0.7670 9.193 convR= 0.14E-05 convG= 0.10E-03 0.59E-04 finish? (rounds-more?) 10 relative off-diagonals in G after diagonalization: 0.92E-02 0.37E-02 variance ratios after transformation in round 12 3.402 1.305 13.23 2.257 0.7664 9.277 computed variance ratios: 3.432 1.319 13.31 2.244 0.7616 9.177 convR= 0.92E-06 convG= 0.94E-04 0.49E-04 relative off-diagonals in G after diagonalization: 0.91E-02 0.36E-02 ............................................................... variance ratios after transformation in round 29 3.731 1.449 14.69 2.126 0.7150 8.611 computed variance ratios: 3.741 1.454 14.74 2.122 0.7137 8.587 convR= 0.11E-07 convG= 0.10E-04 0.42E-05 relative off-diagonals in G after diagonalization: 0.89E-02 0.25E-02 variance ratios after transformation in round 30 3.741 1.454 14.74 2.122 0.7136 8.590 computed variance ratios: 3.750 1.459 14.79 2.119 0.7123 8.568 convR= 0.86E-08 convG= 0.92E-05 0.37E-05 relative off-diagonals in G after diagonalization: 0.89E-02 0.25E-02 ............................................................. variance ratios after transformation in round 40 3.815 1.488 15.16 2.097 0.7035 8.448 computed variance ratios: 3.820 1.491 15.20 2.095 0.7028 8.438 convR= 0.13E-08 convG= 0.25E-05 0.97E-06 relative off-diagonals in G after diagonalization: 0.89E-02 0.23E-02 variance ratios after transformation in round 41 3.820 1.491 15.20 2.095 0.7028 8.438 computed variance ratios: 3.825 1.493 15.23 2.093 0.7022 8.429 convR= 0.11E-08 convG= 0.22E-05 0.85E-06 For good accuracy, convergence criterions convG and convR should be below 1e-6, while diagonalization level should be below 1e-2. finish? (rounds-more?) 0 Heritability, repeatability and variances trait h^2 r^2 residual effect 1 2 1 0.441 0.650 1806. 1082. 2276. 2 0.349 0.523 2118. 769.4 1551. 3 0.408 0.604 1738. 862.4 1794. heritabilities on diagonal, genetic correlations above,phenotypic below 0.44 0.76 0.78 0.61 0.35 0.95 0.65 0.83 0.41 (Total 7.5 minutes and 4 Mbytes of memory) The above estimates are probably inflated due to small (and very selected) sample For estimation of breeding values, add 1 at the end to request approximation of EFR ("effective number of records"). Reliability = EFR/(EFR + animal_variance_ratio) In this case, the ratio is 2.50. $ cat renum2.jaa type.ren ped.ren jaaout 3 3 0.70 10 1 51 0.00 2 2147 3.00 2 2701 -2.50 1 Estimation of breeding values and accuracies by jaa20 Because the system of equations in not full rank, the solutions are not unique although estimable function are. Especially, solutions will change when the order of parameters changes. $ jaa20 name of a file with parameters, con - if console: renum2.jaa iteration no. convergence criterion last solution 1 1.00000 0. 2 1.25089E-02 -1.55775 3 1.44692E-02 -3.65280 4 9.06921E-03 -4.83105 5 5.25465E-03 -5.77580 6 4.71741E-03 -6.97025 7 2.80692E-03 -7.76238 8 1.67017E-03 -8.03068 9 1.64074E-03 -8.34621 how many more iterations, 0-none? 40 10 9.91685E-04 -8.98133 11 6.16794E-04 -9.65120 12 5.67883E-04 -9.96897 13 3.42211E-04 -9.97552 14 2.28457E-04 -9.96636 15 1.97561E-04 -9.94866 16 1.10729E-04 -9.81554 17 8.23527E-05 -9.70354 18 6.61531E-05 -9.71196 19 3.71847E-05 -9.73641 20 2.83531E-05 -9.74832 21 2.21738E-05 -9.80435 22 1.29083E-05 -9.85967 23 1.02231E-05 -9.85623 24 7.51070E-06 -9.82487 25 4.48016E-06 -9.79651 26 3.68470E-06 -9.76372 27 2.34305E-06 -9.72505 28 1.56222E-06 -9.68882 29 1.25211E-06 -9.66224 30 8.33604E-07 -9.64682 31 5.39722E-07 -9.63211 32 4.44054E-07 -9.61407 33 2.75445E-07 -9.60433 34 1.92064E-07 -9.60471 35 1.50408E-07 -9.60446 36 9.17401E-08 -9.60436 37 6.82633E-08 -9.61213 38 5.04465E-08 -9.62436 39 3.12934E-08 -9.63411 40 2.41969E-08 -9.64148 41 1.73349E-08 -9.64787 42 1.02343E-08 -9.65246 43 8.42514E-09 -9.65552 44 5.89217E-09 -9.65765 45 3.56182E-09 -9.65833 46 2.95657E-09 -9.65796 47 1.94332E-09 -9.65796 48 1.24605E-09 -9.65821 49 1.02896E-09 -9.65759 Convergence below 10e-06 is usually sufficient how many more iterations, 0-none? 0 In approximate Prediction Error Variance for animals it will be assumed that: first effect is fixed like HYS the last effect is animal effect before the last is permanent environment detected 4 unknown parent group(s) Calculation of approximate PEV iteration no. convergence PEV(1) PEV(2) .... 1 916.8806 1.2772 0.7434 1.6358 1.8127 1.3858 more pev rounds? 5 2 970.5330 1.5832 0.7434 1.4593 1.8252 1.8200 3 696.7351 1.6564 0.7710 1.6499 2.1809 1.8793 4 502.9659 1.7419 0.7850 1.7563 2.3345 1.9525 5 357.3745 1.8020 0.7937 1.8219 2.4501 2.0045 6 251.5577 1.8440 0.7996 1.8653 2.5338 2.0400 more pev rounds? 5 7 176.0641 1.8733 0.8037 1.8943 2.5935 2.0639 8 122.8023 1.8938 0.8066 1.9140 2.6358 2.0801 9 85.4800 1.9081 0.8086 1.9272 2.6659 2.0910 10 59.4356 1.9180 0.8100 1.9362 2.6871 2.0984 11 41.3065 1.9250 0.8110 1.9423 2.7021 2.1034 Convergence to 1-5% of original value is sufficient more pev rounds? 0 solutions and reliabilities were stored in file sol Print messages file $cat jaaout iteration no. convergence criterion last solution 1 1.00000 0. 2 1.25089E-02 -1.55775 ................................ 46 2.95657E-09 -9.65796 47 1.94332E-09 -9.65796 48 1.24605E-09 -9.65821 49 1.02896E-09 -9.65759 Calculation of approximate PEV iteration no. convergence PEV(1) PEV(2) .... 1 916.8806 1.2772 0.7434 1.6358 1.8127 1.3858 ............................................................ 11 41.3065 1.9250 0.8110 1.9423 2.7021 2.1034 4539 records 4899 equations solutions of factor 1 var. ratio 0. 1: 305.1121 2: 286.2624 3: 296.2005 4: 346.5326 5: 339.6574 6: 361.3795 7: 329.5870 8: 315.0382 9: 319.4854 10: 344.6087 11: 302.6514 12: 326.6187 13: 314.8734 14: 347.8259 15: 330.5576 16: 337.0867 17: 316.3046 18: 323.2482 19: 360.1327 20: 363.7310 21: 354.0697 22: 338.0883 23: 299.3730 24: 269.2602 25: 254.5380 26: 287.6992 27: 338.6224 28: 316.0580 29: 309.9366 30: 309.1578 31: 290.5121 32: 250.1046 33: 335.9875 34: 328.8289 35: 311.0511 36: 337.2393 37: 364.4048 38: 350.2770 39: 321.8100 40: 349.6934 41: 357.6898 42: 336.3453 43: 358.6937 44: 346.8690 45: 348.0636 46: 361.9219 47: 354.0283 48: 320.5478 49: 319.8862 50: 298.8404 51: 320.1252 solutions of factor 2 var. ratio 3.00000 solutions of factor 2 var. ratio -2.50000 Initial 10 solutions (heard-year-month_of_classification) $ head sol 305.1121 0.000 286.2624 0.000 296.2005 0.000 346.5326 0.000 339.6574 0.000 361.3795 0.000 329.5870 0.000 315.0382 0.000 319.4854 0.000 344.6087 0.000 Last 10 solutions: 6 animal effects and 4 unknown parent groups. Accuracies (EFRs), for animals only, are in the second column. $ tail sol -5.2573 0.096 -13.9487 0.096 -30.6629 0.130 -30.6629 0.130 0.4333 0.097 13.1817 0.100 -3.7906 0.000 -6.6349 0.000 -2.0957 0.000 -9.6576 0.000 The last 4 solutions - unknown parent groups - do not increase, as expected. Most probably this is due to small number of animals in the last 3 groups, as mentioned earlier.