Issues with the threshold model Ignacy Misztal, University of Georgia August 19, 2005 ================================ Computations with the threshold model generate a lot of excitement. However, the results are often short of expectations or are simply unsuccessful. This is due to may factors including: - scale: real with linear models and underlying with threshold models, - problems with estimating thresholds especially with few data points per threshold, - higher chances to instability, - problems of the threshold model with too mnay levels of fixed effect (ECP) problem, - less content of information with categorical than linear data. One way to familiarize with the threshold model is to simulate data for a variety of parameters and then estimate parameters. Program gen_thr.f90 generates data for linear modules with one fixed and one sire effect, and transforms the data to categorical based on given thresholds. Inputs include variances. Example of data generation -------------------------- $ gen_thr number of thresholds (number of categories-1)? 3 number of levels of fixed effect? 10 number of sires? 50 number of observations? 1000 sire variance? 5 residual variance? 50 give values of thresholds? 1 4 7 computed heritability: 0.363636 fixed effects calculated as N(0, 10.0000 ) Wrote 1000 records to file data_thr Format: fixed_effect sire y_linear y_category Distribution of categories 1:632 2:125 3:94 4:149 Parameter file for linear observations and linear model ------------------------------------------------------- $ cat exgen DATAFILE data.thr NUMBER_OF_TRAITS 1 NUMBER_OF_EFFECTS 2 OBSERVATION(S) 3 WEIGHT(S) EFFECTS: POSITIONS_IN_DATAFILE NUMBER_OF_LEVELS TYPE_OF_EFFECT [EFFECT NESTED] 1 10 cross 2 50 cross RANDOM_RESIDUAL VALUES 10 RANDOM_GROUP 2 RANDOM_TYPE diagonal FILE (CO)VARIANCES 2 Run with the threshold-linear program thrgibbs1f90 (continuous data - linear model) ----------------------------------------------- $thrgibbs1f90 name of parameter file?exgen THRGIBBS1F90 ver. 2.2 Parameter file: exgen Data file: data.thr Number of Traits 1 Number of Effects 2 Position of Observations 3 Position of Weight (1) 0 Value of Missing Trait/Observation 0 EFFECTS # type position (2) levels [positions for nested] 1 cross-classified 1 10 2 cross-classified 2 50 Residual (co)variance Matrix 10.000 Random Effect(s) 2 Type of Random Effect: diagonal trait effect (CO)VARIANCES 1 2 2.000 REMARKS (1) Weight position 0 means no weights utilized (2) Effect positions of 0 for some effects and traits means that such effects are missing for specified traits number of categorical traits = 0 Data record length = 3 number of samples and length of burn-in 1000 100 Give n to store every n-th sample? (1 means store all samples) Missing traits found in 0 records read 1000 records in 1.000000E-02 s, 0 nonzeroes finished peds in 1.000000E-02 s, 0 nonzeroes 1 rounds 5.403 51.60 2 rounds 3.456 54.79 ............. 998 rounds 8.060 49.12 999 rounds 4.652 54.46 1000 rounds 4.443 ave G 4.821 SD G 1.623 58.11 ave R 52.69 SD R 2.555 solutions stored in file: "solutions" Samples stored in file "gibbs_samples" Parameter file for the threshold model run (Difference is 4 in OBSERVATION and OPTION) -------------------------------------------- $cat exgen_thr DATAFILE data.thr NUMBER_OF_TRAITS 1 NUMBER_OF_EFFECTS 2 OBSERVATION(S) 4 WEIGHT(S) EFFECTS: POSITIONS_IN_DATAFILE NUMBER_OF_LEVELS TYPE_OF_EFFECT [EFFECT NESTED] 1 10 cross 2 50 cross RANDOM_RESIDUAL VALUES 10 RANDOM_GROUP 2 RANDOM_TYPE diagonal FILE (CO)VARIANCES 2 OPTION cat 4 Run (categorical data, threshold model) -------------------------------------- $thrgibbs1f90 name of parameter file?exgen_thr THRGIBBS1F90 ver. 2.2 Parameter file: exgen_thr Data file: data.thr Number of Traits 1 Number of Effects 2 Position of Observations 4 Position of Weight (1) 0 Value of Missing Trait/Observation 0 EFFECTS # type position (2) levels [positions for nested] 1 cross-classified 1 10 2 cross-classified 2 50 Residual (co)variance Matrix 10.000 Random Effect(s) 2 Type of Random Effect: diagonal trait effect (CO)VARIANCES 1 2 2.000 REMARKS (1) Weight position 0 means no weights utilized (2) Effect positions of 0 for some effects and traits means that such effects are missing for specified traits 1 -th trait : # categories for each trait (linear=0): 4 number of categorical traits = 1 Data record length = 4 1 1 0.000000000000000 1 2 1.00000000000000 1 3 2.00000000000000 number of samples and length of burn-in 1000 100 Give n to store every n-th sample? (1 means store all samples) Missing traits found in 0 records read 1000 records in 1.000000E-02 s, 0 nonzeroes finished peds in 1.000000E-02 s, 0 nonzeroes 1 rounds 1 -th trait 3 -th threshold = 1.99937256601821 0.7790 6.718 ................. 998 rounds 999 rounds 1 -th trait 3 -th threshold = 2.17660902210464 0.8300 10.06 1000 rounds 1 -th trait 3 -th threshold = 2.18870410494151 0.8451 ave G 0.7030 SD G 0.2805 11.24 ave R 6.980 SD R 1.026 ------------------------------------------------- Please note that the first two thresholds in thrgibbs1f90 are fixed to 0.0 and 1.0. ------------------------------------------------- Parameter file for the threshold-linear (two trait) model run (First trait is categorical and second trait is continuous) ------------------------------------------------------------- $cat exgen_thr2 DATAFILE data.thr NUMBER_OF_TRAITS 2 NUMBER_OF_EFFECTS 2 OBSERVATION(S) 4 3 WEIGHT(S) EFFECTS: POSITIONS_IN_DATAFILE NUMBER_OF_LEVELS TYPE_OF_EFFECT [EFFECT NESTED] 1 1 10 cross 2 2 50 cross RANDOM_RESIDUAL VALUES 1 1 1 10 RANDOM_GROUP 2 RANDOM_TYPE diagonal FILE (CO)VARIANCES 0.2 0.1 0.1 2 OPTION cat 4 0 Run with the threshold-linear program thrgibbs1f90 (categorical & continuous data - threshold-linear model) ----------------------------------------------------------- $thrgibbs1f90 name of parameter file? THRGIBBS1F90 ver. 2.2 Parameter file: exgen_thr2 Data file: data.thr Number of Traits 2 Number of Effects 2 Position of Observations 4 3 Position of Weight (1) 0 Value of Missing Trait/Observation 0 EFFECTS # type position (2) levels [positions for nested] 1 cross-classified 1 1 10 2 cross-classified 2 2 50 Residual (co)variance Matrix 1.0000 1.0000 1.0000 10.000 Random Effect(s) 2 Type of Random Effect: diagonal trait effect (CO)VARIANCES 1 2 0.2000 0.1000 2 2 0.1000 2.000 REMARKS (1) Weight position 0 means no weights utilized (2) Effect positions of 0 for some effects and traits means that such effects are missing for specified traits 1 -th trait : # categories for each trait (linear=0): 4 2 -th trait : # categories for each trait (linear=0): 0 number of categorical traits = 1 Data record length = 4 1 1 0.000000000000000 1 2 1.00000000000000 1 3 2.00000000000000 number of samples and length of burn-in Give n to store every n-th sample? (1 means store all samples) Missing traits found in 0 records read 1000 records in 1.000000E-02 s, 0 nonzeroes finished peds in 1.000000E-02 s, 0 nonzeroes 1 rounds 1 -th trait 3 -th threshold = 1.99599129092874 G 0.1833 0.7613 0.7613 4.960 R 1.785 7.447 7.447 48.84 2 rounds 1 -th trait 3 -th threshold = 1.99863608216840 G 0.2517 1.124 1.124 6.124 R 2.113 9.215 9.215 51.63 G 0.4445 1.418 1.418 4.523 R 4.560 15.36 15.36 51.76 9999 rounds 1 -th trait 3 -th threshold = 1.85173123114814 G 0.2974 0.9490 0.9490 3.029 R 4.474 15.07 15.07 50.74 10000 rounds 1 -th trait 3 -th threshold = 1.85070406737048 G 0.3204 1.023 1.023 3.269 ave G 0.4930 1.628 1.628 5.401 SD G 0.1616 0.5368 0.5368 1.793 R 5.260 17.71 17.71 59.64 ave R 4.533 15.47 15.47 52.78 SD R 0.2259 0.7358 0.7358 2.466 solutions stored in file: "solutions" Samples stored in file "gibbs_samples"