PreGSF90 / PostGSF90

PreGSF90 / PostGSF90

PreGSF90 is an interface program to the genomic module to process the genomic information for the BLUPF90 family of programs

This page also describes some options for PostGSF90 which is designed for genome-wide assocication study (GWAS).

Ignacio Aguilar and Ignacy Misztal, University of Georgia
email: iaguilar at inia.org.uy; ignacy at uga.edu
01/29/09 - 07/30/14

Summary

Program PreGSF90 helps to implement the genomic selection following the single-step methodology as presented by Aguilar et al. 2010 JDS.
In this methodology the relationship matrix A based on the pedigree information is replaced by matrix H, which combines the pedigree and genomic information.

The main difference between A^-1 and H^-1 is matrix of structure
GimA22i=inv(G)-inv(A₂₂),
where G is a genomic relationship matrix and
A₂₂ is a relationship matrix for genotyped animals.

Efficient methods for the creation of the genomic relationship matrix, relationship based on pedigree and their inverses are described in Aguilar et al., 2011 JABG.
Program PreGSF90 could be run after RENUMf90.
It is also run automatically by application programs like BLUPF90, REMLF90, GIBBSxF90 or BLUP90IOD when their parameter file contains OPTION SNP_file filename.

Input files

* Parameter file (ie renf90.par) as created by RENUMF90 with genotype file specified with keyword SNP_FILE

* Genotype file

Field 1 - animal ID with format as in pedigree file
Field 2 - genotype with 0,1,2 and 5 (missing) or real values for gene content 0.12 …

Fields need to be separated by at least one space and Field 2 should be fixed format i.e. all rows of genotypes *must* start at the same column number!!!

 80   21101011002012011011010110111111211111210100
 8014 21110101511101120221110111511112101112210100
 516  21100101202252021120210121102111202212111101
 181  21110111112201120550200020101022212211111100

using fractional genotypes is also possible, i.e. from imputation. In this case the genotypes must be “stick” together with two decimal places i.e.

 80   2.001.001.000.001.00
 8014 2.001.001.001.000.00
 516  2.001.001.000.000.00 
1032  0.501.120.251.502.00

where the last individual has “fractional” genotype 0.50 1.12 0.25 1.50 2.00.

An utility program (illumina2pregs) is available to converts Illumina FinalReport and SNP_Map.txt into such format.

Some useful options:

OPTION missingAIPL will read and convert genotype codes 3 and 4 as missing, i.e. internally they are read and converted to 5.
OPTION QMSim will read and convert genotype codes 3 and 4 as 1, i.e. heterozygote. This is useful e.g. if QMSim is used to produce the simulations.\
OPTION fastread will use C's library zlib for faster reading (useful for *very* large number of genotypes). This library is always available in all systems and in that case it defaults to standard reading.

* Renumbered ID for genotypes

File gt.XrefID, where “gt” is genotype file.
This file is created by RENUMF90. Contains IDs renumbered sequentially (by RENUMF90) and the original IDs.
Rows of this file should be in the same order as in the genotype file !!!

Pedigree file from RENUMF90

1732 11010 10584 1 3 12  1 0 0 80
8474  8691  9908 1 3 12  1 0 0 8014
 406  8691  9825 1 3 12  1 0 2 516
9441  8691  8829 1 3 12  1 0 0 181

* Optional files:

Allele Frequencies (OPTION FreqFile)
Map file (OPTION map_file)
Weight file (OPTION weightedG)
G or its inverse, A22 or its inverse, etc,

As specified by respective OPTION lines.

OPTION map_file file

Read SNP map information from file.

Useful for check for Mendelian conflicts and HWE (with also OPTION sex_chr) and for GWAS (see PostGSF90 program)

The file should have a header with the following column names:
SNP_ID - identification of the SNP (alphanumeric)
CHR - chromosome number (numeric), starting from 1
POS - position bp (numeric)

Extra columns are possible

The first SNP in the Map file corresponds to the first SNP in the genotype file, and so on.

Other alphanumeric fields are optional.

If OPTION saveCleanSNPs is present fields are output.

Output files

GimA22i
By default PreGSf90 always create this file in binary format for use by later programs specifying OPTION readGimA22i.
With OPTION saveAscii this file can be stored as ASCII. Format: i,j, inv(G)-inv(A22)

freqdata.count
contains allele frequencies in the original genotype file, previous any QC.
format: snp number (related to the genotype file) and allele frequency

freqdata.count.after.clean
contains allele frequencies as used in calculations.
format: snp number (related to the genotype file), allele frequency and code of exclusion.

Codes:

1: Call Rate
2: MAF
3: Monomorphic
4: Excluded by request
5: Mendelian error
6: HWE
7: High Correlation with other(s) SNP

Gen_call_rate
contains list of animals excluded with call rate below the threshold.

Gen_conflicts
contains report of animals with Mendelian conflicts with their parents. The first part of the file includes:

1: a pair of animals compared
2: the progeny’s pedigree ID
3: the parent’s pedigree ID
4: the number of conflicts
5: proportion of markers conflicted

The second part of the file includes:

1: the genotyped animal’s position (row number) in the SNP file
2: the corresponding pedigree ID
3: the number of conflicts with this animal as a sire
4: the total number of comparisons with this animal as a sire
5: the number of conflicts with this animal as a dam
6: the total number of comparisons with this animal as a dam
7: the number of conflicts with this animal an individual (“the animal”)
8: the total number of comparisons with this animal as an individual.

Optional
The program can store files such as G or its inverse,
A22 or its inverse, or other reports from QC as specified by their respective OPTION lines.

Options for creation of genomic relationship Matrix (G)

The genomic relationship matrix can be created in different ways.

OPTION whichG x

specifies how G is created. The variable x can be:

1: G=ZZ'/k ; as in VanRaden, 2008 (default)
2: G=ZDZ'/n ; as in Amin et al., 2007; Leuttenger et al., 2003; where D=1/2p(1-p)
3: As 2 with modification UAR from Yang et al 2010

 OPTION whichfreq x

specifies what frequencies are used to create G in G=ZDZ'/k. The same frequencies are used to center Z and to scale in D. The variable x can be:

0: read from file freqdata or from other file as specified using OPTION FreqFile
1: 0.5
2: current calculated from genotypes(default)

OPTION whichfreqScale x

Specifies what frequencies are used to scale G in G=ZDZ'/k. Use this option if, for instance, want to use 0.5 for centering (using option above) but observed 2pq for scaling. The variable x can be:

0: read from file as specified in OPTION FreqFile
1: 0.5
2: current calculated from genotypes(default)

If option whichfreq is used, whichfreqScale will be considered only if it is read later in the parameter file

OPTION FreqFile  file

Reads allele frequencies from a file, for example based on allele frequencies calculated by estfreq.f90 (VanRaden, 2009)
format : snp, freq where snp corresponds to the index of SNP based on the same order that are in the genotype file.

If whichfreq is set to 0 the default file name is freqdata

OPTION whichScale x

specifies how G is scaled. The variable x can be:

1: 2Σ(p(1-p)) - VanRaden 2008 (default)
2: trace(ZZ')/n - Legarra 2009, Hayes 2009
3: correction - Gianola et al 2009

OPTION weightedG file

Reads weights from file to create weighted genomic relationship.
Weighting Z*= Z sqrt(D) ⇒ G = Z*Z*' = ZDZ'.
format: one column of weights in the same order as in the genotyped file.
Weights can be extracted from the output of PostGSF90.

OPTION maxsnp x

Set the maximum length of string for reading marker data from file, only necessary if greater than default (400,000)

Quality Control (QC) for G

By default the following QC are run:

MAF
Call rate (SNPs & animals)
Monomorphic
Parent-progeny conflicts (SNPs & animals)

Parameters can be modified with the following options:

OPTION minfreq x

ignores all SNP with MAF < x,
default value 0.05

OPTION callrate x

ignores SNP with call rates < x (number of calls/number of individuals with genotypes)
default value 0.90

OPTION callrateAnim x

ignores Genotypes with call rates < x (number of calls/number of SNPs),
default value 0.90

OPTION monomorphic x

ignores monomorphic SNPs Optional parameter x can be used to enable (1) or disable (0) the check, default value 1

OPTION hwe x

check departure of heterozygous from Hardy-Weinberg Equilibrium.
By default, this QC is not run.
Optional parameter x set the maximum difference between observed and expected frequency
the default value is 0.15 as used in Wiggans et al., 2009 JDS

OPTION high_correlation x y

check for high correlated SNP
By default this QC is not run.
Optional parameter x set the maximum difference in allele frequency to check pair of locus.
If no value 0.025 is used. Decrease this value to speed up calculation
A pair of locus is considered highly correlated if all the genotypes are the same (0-0, 1-1, 2-2) or the opposite (0-2, 1-1, 2-0) (Wiggans et al 2009 JDS)
Optional parameter y can be used to set a threshold to check the number of identical samples out of the number of genotypes.
default values x=0.025 y=0.995

OPTION verify_parentage x

Verify parent-progeny Mendelian conflicts.
Write report to file Gen_conflicts
Option x could be:

0: no action
1: only detect
2: detect and search for an alternate parent (Not implemented). This option is now implemented in the SeekParentF90 program
3: detect and eliminate progenies with conflicts (default)

OPTION exclusion_threshold x

Number of parent-progeny exclusions as percentage all SNP to determine the wrong relationship.
default value 1

OPTION exclusion_threshold_snp x

Number of parent-progeny exclusions for each locus as a percentage, of pair of genotyped animals evaluated, to exclude an SNP from the analysis
default value 10

OPTION number_parent_progeny_evaluations x

Set the number of minimum pair of parent-progeny evaluations to exclude SNPs due to parent-progeny exclusions
default value 100

OPTION outparent_progeny

Create a full log file (Gen_conflicts_all) with all pairs of parent-progeny tested for Mendelian conflicts.

OPTION out_snp_exclusion_error_rate

Create a log file (SNP_Mendelian_error_rate) with statistics for SNP Mendelian error rate.

OPTION excludeCHR n1 n2 n3 ...

Exclude all SNP from chromosomes n1 n2 n3 …
Map file need to be provided. See OPTION map_file

OPTION includeCHR n1 n2 n3 ...

Include all SNP from chromosomes n1 n2 n3 …
Map file need to be provided. See OPTION map_file

OPTION excludeSample n1 n2 n3 ...

Exclude genotype samples n1 n2 n3 …
n1, n2… are position number of individuals in the genotype file.

OPTION sex_chr n

Chromosomes with a number greater or equal to n are not considered as autosomes.
If selected this option, sex chromosomes will not be used for checking parent-progeny Mendelian conflicts, HWE and heritability of gene content
but they will be included for all remaining processes. If you want to remove sex chromosomes, which we do recommend, use OPTION excludeCHR.

Map file need to be provided. See OPTION map_file. However, note that sex chromosomes are not excluded

OPTION threshold_duplicate_samples x

Set the threshold to issue a warning for possible duplicate samples if G(i,j)/sqrt(G(i,i)*G(j,j)) > x
default value 0.9

OPTION high_threshold_diagonal_g x

Check and provide a list of individuals with extreme high diagonals in the genomic relationship matrix.
If optional x is present set the threshold
default value 1.6

OPTION low_threshold_diagonal_g x

Check and provide a list of individuals with extreme low diagonals in the genomic relationship matrix.
If optional x is present set the threshold
default value 0.7

OPTION plotpca <print/noprint>

Plot the first two principal components to look for stratification in the population. With noprint the program will save the first two principal components of G without showing the PCA plot on the screen; otherwise (print) it will print on the screen.

OPTION extra_info_pca file col

Reads from file the column col to plot with different colors for different classes.
The file should contain at least one variable with different classes for each genotyped individual, and the order should match the order of the genotypes file.
Variables could be alphanumeric and separated by one o more spaces.

OPTION calculate_LD

Calculate LD as the squared correlation of allele counts for two SNP

Results are stored in “ld_results”, columns: snp_i, chr_i, pos_i, freq_i, snp_j, chr_j, pos_j,freq_j, dist_ij, Rsq_ij

OPTION LD_by_chr

Calculate LD within chromosome

OPTION LD_by_pos x

Calculate LD within chromosome and windows of SNP based on position. Optional parameter x define with windows size in Bp, default value 200000

OPTION filter_by_LD x

Filter SNP with Rsq > threshold. Optional parameter x define the threshold. default value 0.8

OPTION thr_output_LD x

Threshold to print out Rsq between pair of SNP Optional parameter x define the threshold. default value 0.1

OPTION saveCleanSNPs

Save clean genotype data with excluded SNP and animals; based on the OPTIONS specified.
*_clean files are created:

gt_clean
gt_clean_XrefID

*_removed files are created:

gt_SNPs_removed
gt_Animals_removed

with gt being the genotype file

OPTION no_quality_control

This option turns off all quality control !!!!
Useful to speed up run when previous QC data was performed.

OPTION outcallrate

This option will print all call rate information for SNP and individuals.
The files callrate (SNP) and callrate_a (Individuals) are created.

OPTION h2_gene_content

It checks that the heritability of gene content is equal or close to 1 as described in Forneris et al. Genetics 199.3 (2015): 675-681. Markers with estimated h2<0.98 and significant p-values of the LRT (p<0.01) are discarded. In addition, heritability and status of each marker are written in file h2_gc_test.

The test is useful for homogenous populations (breeds) but theory does not hold for crossbred animals. This test uses explicitly inv(A22) so it is not suitable for very large populations.

Quality Control for Off-diagonal of A22 and G

OPTION thrWarnCorAG x

Set the threshold to issue a warning if cor(A22,G) < x
default value = 0.5

OPTION thrStopCorAG x

Set the threshold to Stop the analysis if cor(A22,G) < x
default value = 0.3

OPTION  thrCorAG x

Set the threshold to calculate corr(A22,G) for only A22 >= x
default value = 0.02

Options for H

Options include different weights to create GimA22i as:

tau inv(alpha G + beta A22 + gamma I + delta) - omega inv(A22)

where the parameters are to scale the genomic info to be compatible with
the pedigree info, to make matrices invertible in the presence of clones,
and to control bias.

The default values are:

tau=1 alpha =0.95 beta = 0.05 gamma=0 delta=0 omega=1

Options to change these defaults are specified with:

OPTION TauOmega tau omega

OPTION AlphaBeta alpha beta

OPTION GammaDelta gamma delta

OPTION tunedG x

Scale G based on A22.
The variable x can be:

0: no scaling
1: mean(diag(G))=1, mean(offdiag(G))=0. This implies that the estimated variance components and mean refer to the genotyped population Legarra, 2016
2: mean(diag(G))=mean(diag(A22)), mean(offdiag(G))=mean(offdiag(A22)) Christensen et al., 2012 This is the default
3: mean(G)=mean(A22) Vitezica et al. (2011)
4: rescale G using Fst adjustment. As in Powell et al. (2010) and Vitezica et al. (2011)
9: arbitrary parameters: specify two additional numbers $a$ and $b$ in $a+b\mathbf{G}$ as OPTION tunedG 9 a b.

Options to extract the diagonal of H (aka genomic improved inbreeding)

The diagonal of H contains an improved estimator of inbreeding: $\mathbf{F}_H = diag(\mathbf{H})-1$ . For genotyped animals, the diagonal of H is identical to the diagonal of G , and thus $\mathbf{F}_H=\mathbf{F}_G$ , sometimes called Genomic Inbreeding. The elements of $\mathbf{F}_H$ for non-genotyped animals include pedigree-based estimates of Genomic Inbreeding. See Colleau et al 2017 and Legarra et al. 2019. The last contains a description of the underlying methods.

To extract the diagonal of H one of these two OPTIONs needs to be used:

OPTION saveDiagH outputs $diag(\mathbf{H})$ with renumbered id's
OPTION saveDiagHOrig outputs $diag(\mathbf{H})$ with original and renumbered id's

User can use one of two equivalent methods :

1: using OPTION methodDiagH 1, does a sparse inversion of $\mathbf{H}^{-1}$ (default) . This option is very fast for small to medium pedigrees.
2: using OPTION methodDiagH 2, this is in fact Method 3 in Legarra et al. 2019, an outer product method that uses $\mathbf{M}=\mathbf{A}_{22}^{-1}(\mathbf{G}-\mathbf{A}_{22})\mathbf{A}_{22}^{-1}$. This method is recommended for large pedigrees as it is (for large pedigrees) less time and memory consuming.

The output depends on the method used. OPTION methodDiagH 1 shows only individual id and $diag(\mathbf{H})$. OPTION methodDiagH 2 shows individual id and the values of $diag(\mathbf{H})$, $diag(\mathbf{A})$ (pedigree-based relationship) and the difference $c$ such that $diag(\mathbf{H})=diag(\mathbf{A})+c$.

An example of the output obtained with OPTION methodDiagH 2 and OPTION saveDiagHOrig has original_id, $diag(\mathbf{H})$, $diag(\mathbf{A})$, and the difference $c=diag(\mathbf{H})-diag(\mathbf{A})$, and the renumbered_id:

testDiagH2_mf andres$ head diagHdirect.txt.2
45036060023                1.179829759235491  1.250322947770358 -0.070493188534867          1
64000169880047             1.126222691080622  1.220500000000000 -0.094277308919378          2
64000246030053             1.168573237141459  1.237528320312500 -0.068955083171041          3
45038980011                1.189937645185136  1.251295679897070 -0.061358034711934          4

GWAS options (PostGSF90)

OPTION Manhattan_plot

Uses GNUPLOT to plot the Manhattan plot (SNP effects) for each trait and correlated effect.

OPTION Manhattan_plot_R

Uses R to plot the Manhattan plot (SNP effects) for each trait and correlated effect.
pdf images are created: manplot_St1e2.pdf, but other formats can be specified.
Note: t1e2 corresponds to trait 1, effect 2.

OPTION Manhattan_plot_R_format <format>

Control the format type to create images in R
format values accepted:

pdf (default)
png
tif

OPTION plotsnp <n>

Control the values of SNP effects to use in Manhattan plots

1: plot regular SNP effects: abs(val)
2: plot standardized SNP effects: abs(val/sd) (default)

OPTION SNP_moving_average n

Solutions for SNP effects will be by moving average of n adjacents SNPs.

OPTION windows_variance n

Calculates the variance explained by n adjacents SNPs.

When this option is used, the sum of variance explained by n adjacent SNPs (column 8 of snp_sol or column 3 of chrsnpvar) is not 100%.
This is because moving variance is used. If windows size is 20, the proportion of variance assigned to SNP 1 is calculated from SNP 1 to 20, for SNP 2 it goes from 2 to 21, for SNP 3 it goes from 3 to 22, and so forth.
A file called windows_variance has variance that sums to 100% in column 9.

OPTION windows_variance_mbp n

Calculates the variance explained by n Mb window of adjacents SNPs.

OPTION windows_variance_type n

Sets windows type for variances calculations:

1: moving windows
2: exclusive windows

OPTION which_weight x

Generates a weight variable w to be used in the creation of a weighted genomic relationship matrix G=ZDZ'

1: w = y^2 * (2(p(1-p)))
2: w = y^2
3: experimental with the degree of belief
4: w = C**(abs(y)/sqrt(var(y's))-2) from VanRaden et al. (2009)
nonlinearA: same as 4

where y is the SNP solution, with scaled weight = w * nSnp/sum(w); and C is 1.125 by default (enable to change it using the second argument of the option line e.g. OPTION which_weight nonlinearA 1.2).

OPTION solutions_postGS x

Sets the file name for the solutions file, default solutions

OPTION postgs_trt_eff x1 x2

Computes postGS calculations (SNP solutions, variance explained, etc.) for only trait: x1 and effect: x2

OPTION snp_p_value

Computes p-values for GWAS from elements of the inverse of the Mixed Model Equations previously obtained from blupf90. This requires quite a lot of memory and time. For details see Aguilar et al. (2019).

OPTION snp_var

Creates a file with prediction error covariance (PEC) for SNP to be used in PREDF90 to compute reliability for indirect predictions. This option works when OPTION snp_p_value is used in BLUPF90+.

Output files for GWAS (postGSf90)

snp_sol

contains solutions of SNP and weights

1: trait
2: effect
3: SNP
4: Chromosome
5: Position
6: SNP solution
7: weight

if OPTION windows_variance is used

8: variance explained by n adjacents SNP.

if OPTION snp_p_value is used

9: variance of the SNP solution (used to compute the p-value)

chrsnp

contains data to create plot by GNUPLOT

1: trait
2: effect
3: values of SNP effects to use in Manhattan plots → [abs(SNP_i)/SD(SNP)]
4: SNP
5: Chromosome
6: Position

chrsnp_pval

contains data to create plot by GNUPLOT

1: trait
2: effect
3: -log10(p-value)
4: SNP
5: Chromosome
6: Position in bp

chrsnpvar

contains data to create plot by GNUPLOT

1: trait
2: effect
3: variance explained by n adjacents SNP
4: SNP
5: Chromosome
6: Position

windows_segment

contains information of windows segments used to get variance explained

1: label
2: window size (number of SNP)
3: Start SNP number for the window
4: End SNP number for the window
5: identification of window: (ChrNumber)'_'(startPositionMBP)
6: Start (ChrNumber)'_'(Position) for the window
7: End (ChrNumber)'_'(Position) for the window

windows_variance

contains variance explained for the biggest non-overlapping windows segments

1: trait
2: effect
3: Start SNP number or SNP name for the window
4: End SNP number or SNP name for the window
5: window size (number of SNP)
6: Start (ChrNumber)'_'(Position) for the window
7: End (ChrNumber)'_'(Position) for the window
8: identification of window: (ChrNumber)'_'(startPositionMBP)
9: variance explained by n adjacents SNP

snp_pred

contains allele frequencies + SNP effects

Graphic control files

Several files are created to generate graphics using either GNUPLOT or R
File names rules:

e.g.: 'Sft1e2.R'

first letter indicate
  'S' for solutions of SNP 
  'V' for variance explained
  'P' for p-values
  
t1e2
  indicates that the file is for the trait 1 and the effect 2
  
filename extension
   .gnuplot
   .R
   .pdf 
   .png
   .tif

Misc options

OPTION num_threads_pregs n

Specify the number of threads to be used with MKL-OpenMP for creating and inverting matrices

OPTION num_threads_iod n

Specify the number of threads to be used with MKL-OpenMP in BLUP90IOD program
for matrix-vector multiplications in the PCG algorithm

OPTION graphics s

Allows to generate plots with GNUPLOT program.
If present optional parameter s, set the time in seconds to show the plot.
Avoid using in batch programs !!!.

OPTION msg x

x set level of verbose; 0 minimal; 1 gives more diagnostic information

Save and Read options

OPTION saveAscii

Saves files intermediate matrices (GimA22i,G,Gi,etc) files as ASCII (default=binary)

OPTION saveHinv

Saves H inverse matrix in Hinv.txt
Format: i,j,val
with i,j, the index level for the additive genetic effect

OPTION saveAinv

Saves A inverse matrix in Ainv.txt
Format: i,j,val
with i,j, the index level for the additive genetic effect

The following options use the information of the Original ID (alphanumeric) stored in the 10th column of the renaddxx.ped file, create by the RENUMF90 program.

OPTION saveHinvOrig

Saves the H inverse matrix with original IDs

OPTION saveAinvOrig

Saves the A inverse matrix with original IDs

OPTION saveDiagGOrig

Saves diagonal of G matrix in DiagGOrig.txt
Format: id, val
with id the original IDs

OPTION saveGOrig

Saves G matrix in G_Orig.txt
Format: id_i, id_j, val
with id_i and id_j the original IDs

OPTION saveA22Orig

Saves A22 matrix in A22_Orig.txt
Format: id_i, id_j, val
with id_i and id_j the original IDs

OPTION saveGimA22iOrig

Saves GimA22i matrix in GimA22i_Orig.txt
Format: id_i, id_j, val
with id_i and id_j the original IDs as stored in the original pedigree file and in column 10 of renaddXX.ped

OPTION readOrigId

Reads information from renaddxx.ped file, Original ID, and possibly year of birth for its use in parent-progeny conflict output.
Only need if none of the previous save*Orig is present.

OPTION saveGimA22iRen

Saves GimA22i matrix in GimA22i_Ren.txt
Format: id_i, id_j, val
with id_i and id_j the renumbered IDs as stored in the first column of renaddXX.ped

OPTION savePLINK

Saves Genotype in PLINK format
files: toPLINK.ped and toPLINK.map

OPTION no_full_binary

Saves the elements of half-matrix instead of the full matrix. It is useful to keep the compatibility with the older version of preGSf90. The newer versions save the matrix in a more efficient way, where reading the information from the binary file is not trivial (i.e., not as i, j, val anymore).

Save and Read intermediate files

OPTION readGimA22i file

This option is used in the application programs (BLUPF90,REMLF90, etc.) to use the information already stored in GimA22i file (default filename).
In general, methods used to create and invert matrices in such programs don't use an optimized version.
For a large number of genotyped animals run first PreGSf90 and the read stored matrices in analyses programs.

Optional file can be used to specify a different filename (other than GimA22i) or a path, for example OPTION readGimA22i ../../pregsrun/GimA22i.

Other intermediate matrices files can be stored for inspection or for it use in BLUPF90 programs as user_file type of random effect. See tricks and REMLF90 for details.

OPTION saveA22 
OPTION saveA22Inverse 
OPTION saveA22InverseOrig
OPTION saveG [all]
       if optional all is present all intermediate matrices for G are saved
OPTION saveGInverse 
OPTION saveGInverseOrig
OPTION saveGmA22 

OPTION readG [file]
OPTION readGInverse [file]
OPTION readA22 [file]
OPTION readA22Inverse [file]
OPTION readGmA22 [file]

DEPRECATED OPTIONS

OPTION chrinfo file

This is deprecated. Use instead OPTION map_file.

Read SNP map information from file.

Useful for check for Mendelian conflicts and HWE (with also OPTION sex_chr) and for GWAS (see PostGSF90 program)

Format, all numeric variables: snp order , chromosome, position (bp).

snp order corresponds to the index number of the snp, in the sorted map by chromosome and position

chromosome should be only numbers and starting from 1

First line in file corresponds to first SNP in genotype file, and so on.

Other alphanumeric field are optional.

If OPTION saveCleanSNPs is present fields are output.

Table of Contents

PreGSF90 / PostGSF90

Summary

Input files

Output files

Options for creation of genomic relationship Matrix (G)

Quality Control (QC) for G

Quality Control for Off-diagonal of A22 and G

Options for H

Options to extract the diagonal of H (aka genomic improved inbreeding)

GWAS options (PostGSF90)

Output files for GWAS (postGSf90)

Misc options

Save and Read options

Save and Read intermediate files

DEPRECATED OPTIONS