
THRGIBBS1F90

Summary

Gibbs sampler for threshold-linear mixed models. The original program (THRGIBBSF90) was written by DeukHwan Lee in 2001, based on GIBBS2F90 and formulas by Rob Tempelman. It was rewritten by Shogo Tsuruta in 2004.
THRGIBBS1F90 implements a Gibbs sampler for mixed threshold-linear models involving multiple categorical and linear variables. Thresholds and variances can be estimated or assumed. Another version, thrgibbs1f90b, is available for binary responses.
See PREGSF90 with genotypes (SNP) for options.

Parameters

The parameter file is the same as for BLUPF90 except for options.

Options

OPTION cat 0 0 2 5

“0” indicates a linear trait; here the first and second traits are linear. “2” and “5” indicate that the third and fourth traits are categorical with 2 (binary) and 5 categories, respectively.
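The mapping described above can be sketched in a few lines of Python. This is only an illustration of how to read the option line, not code from the program; the function name `parse_cat_option` is hypothetical.

```python
def parse_cat_option(line):
    """Split an 'OPTION cat ...' line into one entry per trait.

    Hypothetical helper for illustration; it only mirrors the rule in the
    text: 0 means linear, any other value is the number of categories.
    """
    tokens = line.split()
    if tokens[:2] != ["OPTION", "cat"]:
        raise ValueError("not an 'OPTION cat' line")
    traits = []
    for position, token in enumerate(tokens[2:], start=1):
        n_categories = int(token)
        kind = "linear" if n_categories == 0 else "categorical"
        traits.append((position, kind, n_categories))
    return traits


traits = parse_cat_option("OPTION cat 0 0 2 5")
for position, kind, n_categories in traits:
    print(position, kind, n_categories)
```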

OPTION fixed_var all

Store all samples for solutions in “all_solutions” and posterior means and SD for all effects in “final_solutions”, assuming that (co)variances in the parameter file are known.

OPTION fixed_var all 1 2 3

Store all samples for solutions in “all_solutions” and posterior means and SD for effects 1, 2, and 3 in “final_solutions”, assuming that (co)variances in the parameter file are known.

OPTION fixed_var mean

Only posterior means and SD for solutions are calculated for all effects in “final_solutions”, assuming that (co)variances in the parameter file are known.

OPTION fixed_var mean 1 2 3

Only posterior means and SD for solutions are calculated for effects 1, 2, and 3 in “final_solutions”, assuming that (co)variances in the parameter file are known.

OPTION solution all

Caution: this option creates a very large “all_solutions” file when you run many rounds and/or use a large model. Store all samples for solutions in “all_solutions” and posterior means and SD for all effects. This option uses (co)variances from each round to get solutions.

OPTION solution all 1 2 3

Caution: this option creates a very large “all_solutions” file when you run many rounds and/or use a large model. Store all samples for solutions in “all_solutions” and posterior means and SD for effects 1, 2, and 3. This option uses (co)variances from each round to get solutions.

OPTION solution mean

Only posterior means and SD for solutions are calculated for all effects in “final_solutions” while sampling (co)variances. This option is not recommended unless the burn-in is known. This option uses (co)variances from each round to get solutions.

OPTION solution mean 1 2 3

Only posterior means and SD for solutions are calculated for effects 1, 2, and 3 in “final_solutions” while sampling (co)variances. This option is not recommended unless the burn-in is known. This option uses (co)variances from each round to get solutions.

OPTION save_halfway_samples 5000

The program saves intermediate samples every “5000” rounds so that the job can be restarted or recovered right after the last saved samples. This is useful when the program stops accidentally.

OPTION cont 10000

“10000” is the number of samples run previously. The user can restart the program from the last run. This option requires “last_solutions”, “binary_final_solutions”, “gibbs_samples”, and “fort.99” files. When using “OPTION cont”, all output files will be replaced by new ones. Before running with this option, all files should be backed up.

OPTION prior 5 2 -1 5 

The (co)variance priors are specified in the parameter file.
The degree of belief for all random effects should be specified using the following structure:
OPTION prior eff1 db1 eff2 db2 … effn dbn -1 dbres
Each effx corresponds to an effect number and dbx to the degree of belief for that random effect; -1 marks the degree of belief for the residual variance.
In this example, 2 is the degree of belief for the 5th effect, and 5 is the degree of belief for the residual.
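To make the pairing of effect numbers and degrees of belief concrete, here is a minimal Python sketch that reads the example line. The helper name `parse_prior_option` and the `"residual"` dictionary key are assumptions made for illustration; they are not part of the program.

```python
def parse_prior_option(line):
    """Read 'OPTION prior eff1 db1 ... -1 dbres' into a dictionary.

    Hypothetical helper: keys are effect numbers, and the effect number -1
    is stored under the key "residual", following the rule in the text.
    """
    tokens = line.split()
    if tokens[:2] != ["OPTION", "prior"]:
        raise ValueError("not an 'OPTION prior' line")
    values = tokens[2:]
    if len(values) % 2 != 0:
        raise ValueError("expected pairs of effect number and degree of belief")
    beliefs = {}
    for effect_token, belief_token in zip(values[0::2], values[1::2]):
        effect = int(effect_token)
        key = "residual" if effect == -1 else effect
        beliefs[key] = float(belief_token)
    return beliefs


beliefs = parse_prior_option("OPTION prior 5 2 -1 5")
print(beliefs)
```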

OPTION seed 123 -432

Two seeds for a random number generator can be specified.

OPTION thresholds 0.0 1.0 2.0

Set the fixed thresholds. There is no need to set 0 for binary traits.
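As a reminder of what fixed thresholds mean in a threshold model, the sketch below maps values on the underlying liability scale to observed categories. It assumes categories are numbered from 1 and that a value equal to a threshold falls in the higher category; both are illustrative assumptions, not statements about how THRGIBBS1F90 handles these cases internally.

```python
from bisect import bisect_right


def category_from_liability(liability, thresholds):
    """Return the category (numbered from 1) for a liability value.

    thresholds must be sorted in ascending order; c - 1 thresholds
    separate c categories.
    """
    return bisect_right(thresholds, liability) + 1


thresholds = [0.0, 1.0, 2.0]  # values from the OPTION thresholds example
categories = [category_from_liability(x, thresholds) for x in (-0.5, 0.3, 1.7, 2.4)]
print(categories)
```

With three thresholds there are four categories, so the four liabilities above fall into categories 1 through 4.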

OPTION residual 1

The residual variance can be set to 1, but this is not necessary for categorical traits with more than 2 categories. For binary traits, the residual variance is automatically set to 1, so this option is not needed.

OPTION pos_def x.x

Check positive definiteness for fixed effects, where x.x is the tolerance (default=1d-08).

OPTION censored 1 0

Negative values for the categorical trait in the data set indicate censored records. “1 0” indicates that the first categorical trait is censored and the second is uncensored.

OPTION SNP_file snp

Specify the SNP file name to use genotype data.

Save intermediate results for "cold start"

  OPTION save_halfway_samples n

This option supports the 'cold start' (continuing the sampling when the program accidentally stops before completing the run). An integer value n is needed. Every n rounds, the program saves intermediate samples to 2 files (last_solutions and binary_final_solutions). The program can restart the sampling from the last round where the intermediate files were saved. The program also writes a log file, save_halfway_samples.txt, with useful information for the next run.

To restart, add OPTION cont 1 to your parameter file and run thrgibbs1f90 again. Input 3 numbers (samples, burn-in, and interval) according to save_halfway_samples.txt. Thrgibbs1f90 handles the whole restart process by itself, so no other tools are needed.

Tips

  • A small n will make the program slow because of frequent file writing. The n should be a multiple of the interval (the 3rd number you input at the beginning of the program).
  • If the program stops during burn-in, the restart will fail because gibbs_samples has not been created. The recommendation is burn-in=0 (although this does not provide posterior means and SD for solutions).
  • The cold start may add tiny numerical errors to the samples. Samples from the cold start will not be identical to samples from a non-stop analysis.
  • If the program happens to be killed while it is saving the intermediate samples, the cold start will fail. To avoid this, you can manually back up gibbs_samples, fort.99, last_solutions, and binary_final_solutions at some point and write them back if needed.
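
Two of the tips above can be scripted. The Python sketch below checks that n is a multiple of the interval and copies the four restart files to a backup directory. The function names and the backup directory are hypothetical; only the four file names come from the text. It does not coordinate with a running thrgibbs1f90 process, so choosing a safe moment to copy is left to the user.

```python
import shutil
from pathlib import Path

# File names taken from the tip above.
RESTART_FILES = ["gibbs_samples", "fort.99", "last_solutions", "binary_final_solutions"]


def is_multiple_of_interval(n, interval):
    """True when n (save_halfway_samples) is a positive multiple of interval."""
    return n > 0 and interval > 0 and n % interval == 0


def backup_restart_files(work_dir, backup_dir):
    """Copy the restart files that exist in work_dir into backup_dir.

    Returns the names that were copied, so missing files are easy to spot.
    """
    work_dir = Path(work_dir)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in RESTART_FILES:
        source = work_dir / name
        if source.is_file():
            shutil.copy2(source, backup_dir / name)
            copied.append(name)
    return copied


print(is_multiple_of_interval(100, 10))  # n = 100 with interval = 10
```

A call such as `backup_restart_files(".", "restart_backup")` would copy whatever restart files are present in the current directory; to write them back, copy the files in the opposite direction.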

Example

Put the following option in your parameter file.

  OPTION save_halfway_samples 100

Run thrgibbs1f90. You will see the following message on screen.

  '**** saving halfway samples in every         100  rounds (default=0)

In this case, we assume the number of total samples is 3000, the burn-in is 0, and the interval is 10.

   number of samples and length of burn-in
   3000 0
   Give n to store every n-th sample? (1 means store all samples)
   10

Make sure the intermediate results are saved to files.

          100  rounds
  G
    2758.       1900.       2019.    
    1900.       1690.       1656.    
    2019.       1656.       1737.    
  G
    225.5      -91.35      -9.998    
   -91.35       403.5       474.6    
   -9.998       474.6       702.2    
  R
    1755.       868.7       817.0    
    868.7       2122.       1361.    
    817.0       1361.       1804.    
  * Last seeds =  1877469549   348652151
  * Number of samples kept =         100
  solutions stored in binary file: "last_solutions"
  solutions stored in file: "binary_final_solutions"

Stop the program. In this case, the program stops at round 880.

           880  rounds
  forrtl: error (69): process interrupted (SIGINT)
  Image              PC                Routine            Line        Source             
  thrgibbs1f90       0000000000943031  Unknown               Unknown  Unknown
  thrgibbs1f90       0000000000941787  Unknown               Unknown  Unknown

Make sure the following 5 files exist.

  binary_final_solutions  fort.99  gibbs_samples  last_solutions  save_halfway_samples.txt
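
This check can also be done with a short script. The following Python sketch lists which of the 5 files are missing from a directory; the function name is hypothetical, and the file names are the ones listed above.

```python
from pathlib import Path

# The 5 files listed above.
EXPECTED_FILES = [
    "binary_final_solutions",
    "fort.99",
    "gibbs_samples",
    "last_solutions",
    "save_halfway_samples.txt",
]


def missing_restart_files(work_dir="."):
    """Return the expected files that are not present in work_dir."""
    directory = Path(work_dir)
    return [name for name in EXPECTED_FILES if not (directory / name).is_file()]


missing = missing_restart_files(".")
if missing:
    print("Missing files:", ", ".join(missing))
else:
    print("All 5 files are present.")
```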

Browse the file save_halfway_samples.txt. Only the information in the 'suggestion' block is needed for the next run. The halfway samples were saved at round 800, so the next run will start sampling from round 801. The remaining 2200 rounds (3000 - 800) are needed to reach the initial goal of 3000 rounds.

  Saved on 2017-03-10 10:53:22
  
  State in the current run:
    last round              =        800
    sampled in this run     =        800
    total number of samples =       3000
    number of burn-in       =          0
    interval                =         10
  
  Suggestion for the input in next run:
    total number of samples =       2200
    number of burn-in       =          0
    interval                =         10
  
  When you restart the program, do not forget to put the following option
  in your parameter file.
     OPTION cont 1
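
If you prefer to extract the three numbers automatically, the 'suggestion' block can be read with a short script. The Python sketch below assumes the file layout is exactly as in the listing above, which may differ between program versions; the function name `parse_suggestion` is hypothetical, and the sample text is an excerpt of that listing.

```python
SAMPLE_LOG = """\
Saved on 2017-03-10 10:53:22

State in the current run:
  last round              =        800
  sampled in this run     =        800
  total number of samples =       3000
  number of burn-in       =          0
  interval                =         10

Suggestion for the input in next run:
  total number of samples =       2200
  number of burn-in       =          0
  interval                =         10

When you restart the program, do not forget to put the following option
in your parameter file.
   OPTION cont 1
"""


def parse_suggestion(text):
    """Return (samples, burn_in, interval) from the 'Suggestion' block."""
    in_suggestion = False
    values = {}
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Suggestion"):
            in_suggestion = True
        elif in_suggestion and "=" in stripped:
            key, _, value = stripped.partition("=")
            values[key.strip()] = int(value)
        elif in_suggestion and not stripped:
            break  # a blank line ends the block
    return (
        values["total number of samples"],
        values["number of burn-in"],
        values["interval"],
    )


samples, burn_in, interval = parse_suggestion(SAMPLE_LOG)
print(samples, burn_in, interval)
```

To use it on a real run, read the file first, for example with `open("save_halfway_samples.txt").read()`, and pass the text to `parse_suggestion`.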

Add OPTION cont 1 to your parameter file. It invokes the 'cold start' module in thrgibbs1f90.

  OPTION cont 1

Run thrgibbs1f90 again. You will see the following message on screen.

  '*** continuous sampling selected *** previous # samples =           1

NOTE: Although the message may say the previous number of samples is 1, you can ignore it. The program recognizes the cold start mode and works correctly.

Input the three numbers that are shown in save_halfway_samples.txt.

   number of samples and length of burn-in
   2200 0
   Give n to store every n-th sample? (1 means store all samples)
   10

The program will start from round 801 as expected.

           801  rounds
   G
     828.0       601.3       610.8    
     601.3       996.3       877.4    
     610.8       877.4       822.8    
   G
     2541.       1531.       1756.    
     1531.       1459.       1598.    
     1756.       1598.       1898.    
   R
     1800.       833.0       813.3    
     833.0       2114.       1303.    
     813.3       1303.       1778.    

Just wait for the analysis to finish. You can interrupt the program again. The final results will be basically the same as those from a non-stop analysis.

readme.thrgibbs1.txt · Last modified: 2021/03/02 22:28 by dani