GIBBSF90+

Summary

This is a combined program of gibbs2f90, gibbs3f90, thrgibbs1f90, and thrgibbs3f90.

GIBBSF90+ implements Gibbs sampler for mixed threshold-linear models involving multiple categorical and linear variables. Thresholds and variances can be estimated or assumed. Another version of thrgibbs1f90b for binary responses is available.

When using threshold models for categorical (or binary) traits, phenotypes equal to 0 are considered missing. Therefore, categories should be numbered from 1. OPTION missing will not work.

See PREGSF90 with genotypes (SNP) for options.

Run-time options

Keyboard input

When you run the program, the program will ask you several questions. You have to type-in (input) the answer to each question. The first one is the name of parameter file.

 name of parameter file?

Then, the program asks you the number of total samples (rounds) and the number of burn-in. The program runs through up to the rounds specified here and the first burn-in samples will not be saved.

  number of samples and length of burn-in

Here, you put two integer numbers separated by spaces (or Enter key). For example, if you need 1000 samples with no burn-in, the correct input is 1000 0.

Finally, the program asks you the interval to save samples. It is also known as thinning.

 Give n to store every n-th sample? (1 means store all samples)

For example, if you put 10, the program save the sample every 10 rounds. The other samples will be discarded. To save all samples, please input 1 here.

Command line options

Instead of typing the above parameters, you can put them in the command line in shell. Here is the general form of command line.

gibbsf90+ parfile --samples i --burnin j --interval k

Each item should be replaced with your favorite value.

parfile: the name of parameter file (characters)
i: the number of samples (integer)
j: the number of burn-in (integer)
k: interval to save samples (integer)

For example, when you use `renf90.par` as parameter file, sample 10000 times with no burn-in, and save all samples, the following command works.

gibbsf90+ renf90.par --samples 10000 --burnin 0 --interval 1

Tips: you can use --rounds instead of --samples (even the singular form is okay: --sample and --round). Also, --thinning is acceptable as --interval.

Parameters

The parameter file is the same as for BLUPF90 except for options.

Options in parameter file

OPTION cat 0 0 2 5

“0” indicate that the first and second traits are linear. “2” and “5” indicate that the third and fourth traits are categorical with 2 (binary) and 5 categories.

OPTION fixed_var all

Store all samples for solutions in “all_solutions” and posterior means and SD for all effects in “final_solutions”, assuming that (co)variances in the parameter file are known.

OPTION fixed_var all 1 2 3

Store all samples for solutions in “all_solutions” and posterior means and SD for 1, 2, and 3 effects in “final_solutions”, assuming that (co)variances in the parameter file are known.

OPTION fixed_var mean

Only posterior means and SD for solutions are calculated for all effects in “final_solutions”, assuming that (co)variances in the parameter file are known.

OPTION fixed_var mean 1 2 3

Only posterior means and SD for solutions are calculated for effects 1, 2, and 3 in “final_solutions”, assuming that (co)variances in the parameter file are known.

OPTION solution all

Caution: this option will create a huge output solution file when you run many rounds and/or use a large model. Store all samples for solutions in “all_solutions” and posterior means and SD for all effects. The file “all_solutions” could be very large.

OPTION solution all 1 2 3

Caution: this option will create a huge output solution file when you run many rounds and/or use a large model. Store all samples for solutions in “all_solutions” and posterior means and SD for 1, 2, and 3 effects. The file “all_solutions” could be very large.

OPTION solution mean

Only posterior means and SD for solutions are calculated for all effects in “final_solutions” while sampling (co)variances.

OPTION solution mean 1 2 3

Only posterior means and SD for solutions are calculated for effects 1, 2, and 3 in “final_solutions” while sampling (co)variances.

OPTION save_halfway_samples 5000

The program saves every “5000” samples to restart or recover the job right after the last saved samples. It is useful when the program accidentally stopped.

OPTION cont 10000

“10000” is the number of samples run previously. The user can restart the program from the last run. This option requires “binary_final_solutions”, “gibbs_samples”, and “fort.99” files. When using “OPTION cont”, all output files will be replaced by new ones. Before running with this option, all files should be backed up.

OPTION prior 5 2 -1 5

The (co)variance priors are specified in the parameter file.
Degree of belief for all random effects should be specified using the following structure:
OPTION prior eff1 db1 eff2 db2 … effn dbn -1 dbres
effx correspond to the effect number and dbx to the degree of belief for this random effect, -1 corresponds to the degree of belief of the residual variance.
In this example 2 is the degree of belief for the 5th effect, and 5 is the degree of belief for the residual.

OPTION seed 123 -432

Two seeds for a random number generator can be specified.

OPTION thresholds 0.0 1.0 2.0

Set the fixed thresholds. No need to set 0 for binary traits.

OPTION residual 1

Set the residual variance = 1 for categorical traits. For binary traits, the residual variance is automatically set to 1, so no need to use this option.

OPTION pos_def x.x

Specify checking pos-def for fixed effects where x.x is a tolerance (default=1d-08).

OPTION censored 1 0

Negative values for the categorical trait in the data set indicate censored records. “1 0” determines that the first categorical trait is censored and the second uncensored.

OPTION SNP_file snp

Specify the SNP file name to use genotype data.

OPTION origID

Saves solutions with original ID. The output is trait effect level original_id solution, and is stored in “solutions.original”.

OPTION hetres_int col nlev

where col is column in the data file that selects which residual (co)variance to select, and nlev is the maximum number of levels. Different residual (co)variances need to be numbered consecutively starting from 1.

Initially, all residual (co)variances are assigned identical (co)variances as in the parameter file. During the estimation, (co)variances that are 0 in the parameter file will not be estimated.

The number of observations per each subset must be large enough to allow the estimation, and the missing-trait pattern should be similar. The program has not been tested in multiple-trait situations when one trait is present in some subclasses but always missing in another subclass.

number of samples and length of burn-in?

In the first run, if you have no idea about the number of samples and burn-in, just type your guess (10000 or whatever) for samples and (0) for burn-in. You may need 2 or 3 runs to figure out the convergence.

Give n to store every n-th sample?

Gibbs samples are usually highly correlated, so you do not have to keep all samples. Maybe every 10th,20th, 50th, …

To check the convergence and to calculate posterior means and SD, run postgibbsf90.

OPTION hetres_int 5 10

The position “5” to identify the interval in the data file and the number of intervals “10” for heterogeneous residual variances.

OPTION hetres_var

The file “hetres” should contain:

x11 x12 x13
x21 x22 x23
x31 x32 x33
y11 y12 y13
y21 y22 y23
y31 y32 y33
:
:
:
z11 z12 z13
z21 z22 z23
z31 z32 z33

Give residual covariances for each interval (e.g., x, y, and z for traits 1, 2, and 3).

OPTION external_halfway_script name n

program gibbsf90+ will stop every n iterations, write samples of variances, unknowns (samples of solutions: in files last_solutions (binary) and last_solutions.txt (plain text) and thresholds, wait for external process 'name' to run and then restart. This is useful to compute diagnostics from samples without storing them or modifying the program.

Save intermediate results for "cold start"

  OPTION save_halfway_samples n

This option can help the 'cold start' (to continue the sampling when the program accidentally stops before completing the run). An integer value n is needed. In every n rounds, the program saves intermediate samples to 2 files (last_solutions and binary_final_solutions). The program can restart the sampling from the last round where the intermediate files were saved. The program also writes a log file save_halfway_samples.txt with useful information for the next run.

To restart, add OPTION cont 1 to your parameter file and run gibbsf90+ again. Input 3 numbers (samples, burn-in, and interval) according to save_halfway_samples.txt. Gibbsf90+ can take care of all restarting process by itself, so no other tools are needed.

Tips

Small n will make the program slow because of frequent file writing. The n should be a multiple of the interval (the 3rd number you will input in the beginning of the program).
If the program stops during burn-in, the restart will fail because gibbs_samples is not created. Recommendation is burn-in=0 (but it doesn't provide posterior mean and SD for solutions).
The cold start may add tiny numerical errors to the samples. Samples from the cold start wouldn't be identical to samples from a non-stop analysis.
If, unfortunately, the program is killed during its saving the intermediate samples, the cold start will fail. To avoid this, you can manually make a backup for gibbs_samples, fort.99, last_solutions, and binary_final_solutions at some point and write them back if needed.

Example

Put the following option in your parameter file for the first time when you run the program.

  OPTION save_halfway_samples 100

Run gibbsf90+. You will see the following message on screen.

  '**** saving halfway samples in every         100  rounds (default=0)

In this example, we assume the number of total samples is 3000, the burn-in is 0, and the interval is 10.

   number of samples and length of burn-in
   3000 0
   Give n to store every n-th sample? (1 means store all samples)
   10

Make sure the intermediate results are saved to files.

          100  rounds
  G
    2758.       1900.       2019.    
    1900.       1690.       1656.    
    2019.       1656.       1737.    
  G
    225.5      -91.35      -9.998    
   -91.35       403.5       474.6    
   -9.998       474.6       702.2    
  R
    1755.       868.7       817.0    
    868.7       2122.       1361.    
    817.0       1361.       1804.    
  * Last seeds =  1877469549   348652151
  * Number of samples kept =         100
  solutions stored in binary file: "last_solutions"
  solutions stored in file: "binary_final_solutions"

Stop the program. In this case, program stops in the round 880.

           880  rounds
  forrtl: error (69): process interrupted (SIGINT)
  Image              PC                Routine            Line        Source             
  thrgibbs1f90       0000000000943031  Unknown               Unknown  Unknown
  thrgibbs1f90       0000000000941787  Unknown               Unknown  Unknown

Make sure there are the following 5 files.

  binary_final_solutions  fort.99  gibbs_samples  last_solutions  save_halfway_samples.txt

Browse the file save_halfway_samples.txt. Only information in the 'suggestion' block is needed for the next run. The halfway samples were saved in the round 800 so you will start the sampling from the round 801 in the next run. Remaining 2200 rounds are needed to satisfy the initial goal (3000 rounds).

  Saved on 2017-03-10 10:53:22
  
  State in the current run:
    last round              =        800
    sampled in this run     =        800
    total number of samples =       3000
    number of burn-in       =          0
    interval                =         10
  
  Suggestion for input in the next run:
    total number of samples =       2200
    number of burn-in       =          0
    interval                =         10
  
  When you restart the program, do not forget to put the following option
  in your parameter file.
     OPTION cont 1

Put the option OPTION cont 1 to your parameter file manually (by hand). It can invoke the 'cold start' module in gibbsf90+.

  OPTION cont 1

Run gibbsf90+ again. You will see the following message on screen.

  '*** continuous sampling selected *** previous # samples =           1

NOTE: Although the message may say the previous number of sample is 1, you can safely ignore it. The program runs with the cold start mode and it works correctly.

Input the three numbers that are shown in save_halfway_samples.txt.

   number of samples and length of burn-in
   2200 0
   Give n to store every n-th sample? (1 means store all samples)
   10

The program will start from the round 801 as expected.

           801  rounds
   G
     828.0       601.3       610.8    
     601.3       996.3       877.4    
     610.8       877.4       822.8    
   G
     2541.       1531.       1756.    
     1531.       1459.       1598.    
     1756.       1598.       1898.    
   R
     1800.       833.0       813.3    
     833.0       2114.       1303.    
     813.3       1303.       1778.

You can interrupt the program again. The results will be basically the same to ones from the non-stop analysis.

Tips for GIBBSF90+

1. In the first run, set the burn-in 0 because of no clue for the convergence.

2. Therefore, in the first run, “OPTION solution …” should not be used without the exactly known burn-in xxx.

3. When you want BLUP solutions and their SDs, “OPTION fixed_var mean” can provide posterior means and posterior SD for all solutions without saving all samples.

4. Usually, a large number of iterations (samples) > 100,000, 200,000, or 300,000 is not needed when you have enough data. On the other hand, when you have small data (e.g., animals or phenotypes < 1000-10,000), increasing the number of Gibbs samples (e.g., 500,000 or 1,000,000) is not always useful. Small biased data cannot provide sufficient (reliable) posterior information.

5. It will be useful to compare the result with BLUPF90+.

6. Other tips in blupf90+ for VCE can be also useful.

Table of Contents