Differences

This shows you the differences between two versions of the page.

--- readme.pregsf90 [2020/11/10 19:25] – [Input files] dani
+++ readme.pregsf90 [2024/03/25 18:22] – external edit 127.0.0.1
@@ Line 39: / Line 39: @@
     * Field 2 - genotype with 0,1,2 and 5 (missing) or real values for gene content 0.12 ...
-Fields need to be separated by at least one space and Field 2 should be fixed format i.e. all rows of genotypes should start at the same column number!!!
+Fields need to be separated by at least one space, and Field 2 must be in fixed format, i.e. all rows of genotypes *must* start at the same column number!!!
 <file>
@@ Line 48: / Line 48: @@
 </file>
-An utility program ([[readme.illumina2pregs|illumina2pregs]]) is available to converts Illumina FinalReport and SNP_Map.txt in such format.
+Using fractional genotypes is also possible, e.g. from imputation. In this case the genotypes must be "stuck" together, with no separators and exactly **two** decimal places each, i.e.
+<file>
+   2.001.001.000.001.00
+2.001.001.001.000.00
+  2.001.001.000.000.00
+  0.501.120.251.502.00
+</file>
+where the last individual has "fractional" genotype ''0.50 1.12 0.25 1.50 2.00''.
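The packed layout above can be read back by slicing the genotype field into fixed-width pieces. The following is a minimal sketch, assuming every value occupies exactly four characters (one digit, a dot, two decimals), as in the example; the function names are hypothetical and are not part of preGSf90.

```python
def parse_fractional_genotypes(field: str, width: int = 4) -> list[float]:
    """Split a packed genotype string into values of `width` characters each."""
    field = field.strip()
    if len(field) % width != 0:
        raise ValueError("genotype field length is not a multiple of the value width")
    return [float(field[i:i + width]) for i in range(0, len(field), width)]


def pack_fractional_genotypes(values: list[float]) -> str:
    """Write each value with two decimals and no separator (0.5 becomes '0.50')."""
    return "".join(f"{v:.2f}" for v in values)


# Last individual of the example above
print(parse_fractional_genotypes("0.501.120.251.502.00"))
# prints [0.5, 1.12, 0.25, 1.5, 2.0]
```

Packing with a fixed two-decimal format is what keeps every value the same width, so no separator is needed to recover the individual genotypes.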
+A utility program ([[readme.illumina2pregs|illumina2pregs]]) is available to convert Illumina FinalReport and SNP_Map.txt files into this format.
+Some useful options:
+  * ''OPTION missingAIPL'' will read and convert genotype codes ''3'' and ''4'' as missing, i.e. internally they are read and converted to ''5''.
+  * ''OPTION QMSim'' will read and convert genotype codes ''3'' and ''4'' as ''1'', i.e. heterozygote. This is useful e.g. if [[https://animalbiosciences.uoguelph.ca/~msargol/qmsim/|QMSim]] is used to produce the simulations.
+  * ''OPTION fastread'' will use the C library zlib for faster reading (useful for a *very* large number of genotypes). This library is not always available on all systems; in that case the program defaults to standard reading.
 * Renumbered ID for genotypes
@@ Line 164: / Line 182: @@
 <file> OPTION whichfreq x</file>
-   specifies what frequencies are used to create **G** in **G=ZDZ**'. The same frequencies are used to center
+   specifies what frequencies are used to create **G** in **G=ZDZ**'/k. The same frequencies are used to center
  Z and to scale in D. The variable //x// can be:
              *0: read from file ''freqdata'' or from other file as specified using ''OPTION FreqFile''
@@ Line 172: / Line 190: @@
 <file>OPTION whichfreqScale x</file>
-Specifies what frequencies are used to __scale__ **G** in **G=ZDZ**'.
+Specifies what frequencies are used to __scale__ **G** in **G=ZDZ**'/k.
 Use this option if, for instance, you want to use 0.5 for centering (using the option above) but the observed 2pq for scaling.
 The variable //x// can be:
@@ Line 198: / Line 216: @@
 Weighting Z*= Z sqrt(D) => G = Z*Z*' = ZDZ'.\\
 format: one column of weights in the same order as in the genotyped file.\\
-Weights can be extracted from output of ''PostGSF90'' program.
+Weights can be extracted from the output of ''PostGSF90''.
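The identity Z*Z*' = ZDZ' stated above can be checked numerically. This is a minimal pure-Python sketch, assuming D is a diagonal matrix holding one weight per SNP and leaving out any further scaling of **G**; the function names are hypothetical.

```python
import math


def weighted_g(Z, weights):
    """Return Z* Z*' where Z* = Z sqrt(D) and D = diag(weights)."""
    root = [math.sqrt(w) for w in weights]
    # Scale each SNP column of Z by the square root of its weight
    Zs = [[z * r for z, r in zip(row, root)] for row in Z]
    return [[sum(a * b for a, b in zip(ri, rj)) for rj in Zs] for ri in Zs]


def zdz(Z, weights):
    """Return Z D Z' computed directly, for comparison."""
    return [
        [sum(zi * w * zj for zi, w, zj in zip(ri, weights, rj)) for rj in Z]
        for ri in Z
    ]


# Two individuals, three SNPs (toy centered genotypes) and one weight per SNP
Z = [[1.0, 0.0, -1.0], [0.0, 1.0, 1.0]]
weights = [4.0, 1.0, 0.25]
print(weighted_g(Z, weights))
print(zdz(Z, weights))
```

Both calls return the same matrix, which is why scaling the columns of Z by the square roots of the weights is equivalent to inserting D between Z and Z'.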
 <file>OPTION maxsnp x</file>
@@ Line 269: / Line 287: @@
 <file>OPTION outparent_progeny</file>
 Create a full log file (''Gen_conflicts_all'') with all pairs of parent-progeny tested for Mendelian conflicts.
+<file>OPTION out_snp_exclusion_error_rate</file>
+Create a log file (''SNP_Mendelian_error_rate'') with statistics for SNP Mendelian error rate.
 <file>OPTION excludeCHR n1 n2 n3 ...</file>
@@ Line 283: / Line 304: @@
 <file>OPTION sex_chr n</file>
-Chromosomes number equal or greater than //n// are not consider autosomes.\\
+Chromosomes with a number greater than or equal to //n// are not considered autosomes.\\
 If this option is selected, sex chromosomes will not be used for checking parent-progeny Mendelian conflicts, HWE, and heritability of gene content\\
 but they will be included for all remaining processes. **If you want to remove sex chromosomes**, which we do recommend, use ''OPTION excludeCHR''.
@@ Line 294: / Line 315: @@
 <file>OPTION high_threshold_diagonal_g x</file>
-Check and remove individuals with extreme high diagonals in the genomic relationship matrix.\\
+Check and provide a list of individuals with extremely high diagonals in the genomic relationship matrix.\\
 If optional //x// is present set the threshold\\
 default value 1.6
 <file>OPTION low_threshold_diagonal_g x</file>
-Check and remove individuals with extreme low diagonals in the genomic relationship matrix.\\
+Check and provide a list of individuals with extremely low diagonals in the genomic relationship matrix.\\
 If optional //x// is present set the threshold\\
 default value 0.7
-<file>OPTION plotpca</file>
+<file>OPTION plotpca <print/noprint></file>
 Plot the first two principal components to look for stratification in the population.
 <file>OPTION extra_info_pca file col</file>
@@ Line 322: / Line 344: @@
 <file>OPTION LD_by_pos x</file>
-Calculate LD within chromosome and windows of SNP based on position
+Calculate LD within chromosome and windows of SNP based on position. The optional parameter //x// defines the window size in bp; default value 200000
-optional parameter x define with windows size in Bp, default value 200000
 <file>OPTION filter_by_LD x</file>
@@ Line 365: / Line 386: @@
 <file>OPTION thrWarnCorAG x</file>
 Set the threshold to issue a warning if cor(A22,G) < //x//\\
-default value 0.5
+default value = 0.5
 <file>OPTION thrStopCorAG x</file>
 Set the threshold to Stop the analysis if cor(A22,G) < //x//\\
-default values 0.3
+default value = 0.3
 <file>OPTION  thrCorAG x</file>
 Set the threshold to calculate corr(A22,G) for only A22 >= //x//\\
-default values 0.02
+default value = 0.02
@@ Line 386: / Line 407: @@
 and to control bias.\\
-The defaults values are:
+The default values are:
 tau=1  alpha =0.95  beta = 0.05  gamma=0  delta=0  omega=1
@@ Line 400: / Line 421: @@
 The variable ''x'' can be:
   * 0: no scaling
-  * 1: mean(diag(G))=1, mean(offdiag(G))=0
+  * 1: mean(diag(G))=1, mean(offdiag(G))=0. This implies that the estimated variance components and mean refer to the genotyped population  [[http://dx.doi.org/10.1016/j.tpb.2015.08.005| Legarra, 2016]]
-  * 2: mean(diag(G))=mean(diag(A22)), mean(offdiag(G))=mean(offdiag(A22))  (default)
+  * 2: mean(diag(G))=mean(diag(A22)), mean(offdiag(G))=mean(offdiag(A22))  [[http://journals.cambridge.org/abstract_S1751731112000742|Christensen et al., 2012]] **This is the default**
-  * 3: mean(G)=mean(A22)
+  * 3: mean(G)=mean(A22) [[https://www.cambridge.org/core/journals/genetics-research/article/bias-in-genomic-predictions-for-populations-under-selection/7A0ECD4D63EAFD33B1586FA1DE9DCF44|Vitezica et al. (2011)]]
-  * 4: rescale G using Fst adjustment. As in Powell et al. (2010) or Vitezica et al. (2011)
+  * 4: rescale G using Fst adjustment. As in [[https://www.nature.com/articles/nrg2865|Powell et al. (2010)]] and [[https://www.cambridge.org/core/journals/genetics-research/article/bias-in-genomic-predictions-for-populations-under-selection/7A0ECD4D63EAFD33B1586FA1DE9DCF44|Vitezica et al. (2011)]]
   * 9: arbitrary parameters: specify two additional numbers $a$ and $b$ in $a+b\mathbf{G}$ as ''OPTION tunedG 9 a b''.
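The conditions listed for options 1 and 2 can be illustrated with the linear form $a+b\mathbf{G}$ given for option 9. The sketch below assumes, as a simplification, that the tuning is exactly such a linear transform and solves the two mean conditions for $a$ and $b$; it is not taken from the preGSf90 source, and the function names are hypothetical.

```python
def mean_diag_offdiag(M):
    """Return (mean of diagonal, mean of off-diagonal) of a square matrix."""
    n = len(M)
    diag = sum(M[i][i] for i in range(n)) / n
    off = sum(M[i][j] for i in range(n) for j in range(n) if i != j) / (n * (n - 1))
    return diag, off


def tuning_coefficients(G, target_diag, target_offdiag):
    """Solve a + b*mean(diag(G)) = target_diag and a + b*mean(offdiag(G)) = target_offdiag."""
    d, o = mean_diag_offdiag(G)
    if d == o:
        raise ValueError("diagonal and off-diagonal means are equal; b is undefined")
    b = (target_diag - target_offdiag) / (d - o)
    a = target_diag - b * d
    return a, b


def apply_tuning(G, a, b):
    """Return a + b*G, applied element by element."""
    return [[a + b * g for g in row] for row in G]


# Conditions of option 1: mean(diag(G)) = 1, mean(offdiag(G)) = 0
G = [[1.2, 0.4], [0.4, 0.8]]
a, b = tuning_coefficients(G, 1.0, 0.0)
print(mean_diag_offdiag(apply_tuning(G, a, b)))
```

For the conditions of option 2, the same function would be called with the diagonal and off-diagonal means of A22 as targets, for example `tuning_coefficients(G, *mean_diag_offdiag(A22))`.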
+=====Options to extract the diagonal of H (aka genomic improved inbreeding)=====
+The diagonal of **H** contains an improved estimator of inbreeding: $\mathbf{F}_H = diag(\mathbf{H})-1$. For genotyped animals, the diagonal of **H** is identical to the diagonal of **G**, and thus $\mathbf{F}_H=\mathbf{F}_G$, sometimes called Genomic Inbreeding. The elements of $\mathbf{F}_H$ for non-genotyped animals include pedigree-based estimates of Genomic Inbreeding. See
+[[https://doi.org/10.1186/s12711-017-0363-9|Colleau et al. 2017]] and [[https://doi.org/10.3168/jds.2019-17750|Legarra et al. 2019]]. The latter contains a description of the underlying methods.
+To extract the diagonal of **H** one of these two ''OPTION''s needs to be used:
+  * ''OPTION saveDiagH'' outputs $diag(\mathbf{H})$ with renumbered id's
+  * ''OPTION saveDiagHOrig'' outputs $diag(\mathbf{H})$ with original //and// renumbered id's
+The user can choose one of two equivalent methods:
+  * 1: ''OPTION methodDiagH 1'' does a sparse inversion of $\mathbf{H}^{-1}$ (default). This option is very fast for small to medium pedigrees.
+  * 2: ''OPTION methodDiagH 2'' is Method 3 in [[https://doi.org/10.3168/jds.2019-17750|Legarra et al. 2019]], an outer product method that uses $\mathbf{M}=\mathbf{A}_{22}^{-1}(\mathbf{G}-\mathbf{A}_{22})\mathbf{A}_{22}^{-1}$. This method is **recommended** for large pedigrees, where it consumes less time and memory.
+The output depends on the method used. ''OPTION methodDiagH 1'' shows only individual id and $diag(\mathbf{H})$.  ''OPTION methodDiagH 2'' shows individual id and the values of $diag(\mathbf{H})$, $diag(\mathbf{A})$ (pedigree-based relationship) and the difference $c$ such that $diag(\mathbf{H})=diag(\mathbf{A})+c$.
+An example of the output obtained with ''OPTION methodDiagH 2'' and ''OPTION saveDiagHOrig'' follows. The columns are original_id, $diag(\mathbf{H})$, $diag(\mathbf{A})$, the difference $c=diag(\mathbf{H})-diag(\mathbf{A})$, and renumbered_id:
+<code>
+testDiagH2_mf andres$ head diagHdirect.txt.2
+45036060023                1.179829759235491  1.250322947770358 -0.070493188534867          1
+64000169880047             1.126222691080622  1.220500000000000 -0.094277308919378          2
+64000246030053             1.168573237141459  1.237528320312500 -0.068955083171041          3
+45038980011                1.189937645185136  1.251295679897070 -0.061358034711934          4
+</code>
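The listing above can be turned into inbreeding values with a few lines of code. This is a minimal sketch, assuming the five whitespace-separated columns described above (original_id, $diag(\mathbf{H})$, $diag(\mathbf{A})$, $c$, renumbered_id); the function name and dictionary keys are hypothetical.

```python
def parse_diag_h_line(line: str) -> dict:
    """Parse one output line and add F_H = diag(H) - 1."""
    original_id, diag_h, diag_a, c, renumbered_id = line.split()
    return {
        "original_id": original_id,  # kept as text, IDs may not be numeric
        "diag_h": float(diag_h),
        "diag_a": float(diag_a),
        "c": float(c),
        "renumbered_id": int(renumbered_id),
        "f_h": float(diag_h) - 1.0,
    }


# First line of the example output above
line = "45036060023                1.179829759235491  1.250322947770358 -0.070493188534867          1"
record = parse_diag_h_line(line)
print(record["original_id"], record["f_h"])
# The fourth column should equal diag(H) - diag(A) up to rounding
print(abs(record["diag_h"] - record["diag_a"] - record["c"]) < 1e-12)
```

The last check reproduces the relation $diag(\mathbf{H})=diag(\mathbf{A})+c$ for the first animal in the example.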
 =====GWAS options (PostGSF90)=====
 <file>OPTION Manhattan_plot</file>
-Plot using GNUPLOT the Manhattan plot (SNP effects) for each trait and correlated effect.
+Uses GNUPLOT to plot the Manhattan plot (SNP effects) for each trait and correlated effect.
 <file>OPTION Manhattan_plot_R</file>
-Plot using R the Manhattan plot (SNP effects) for each trait and correlated effect.\\
+Uses R to plot the Manhattan plot (SNP effects) for each trait and correlated effect.\\
 ''pdf'' images are created: //manplot_St1e2.pdf//, but other formats can be specified.\\
 Note: //t1e2// corresponds to trait 1, effect 2.\\
@@ Line 440: / Line 489: @@
 Calculates the variance explained by //n// Mb window of adjacent SNPs.\\
+<file>OPTION windows_variance_type n</file>
+Sets the window type for variance calculations:\\
+  * 1: moving windows
+  * 2: exclusive windows\\
 <file>OPTION which_weight x</file>
@@ Line 445: / Line 498: @@
   * 1:  w = y^2 * (2(p(1-p)))
   * 2:  w = y^2
-  * 3:  experimental with the degree of brief
+  * 3:  experimental with the degree of belief
   * 4:  w = C**(abs(y)/sqrt(var(y's))-2) from VanRaden et al. (2009)
   * nonlinearA: same as 4
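The weight formulas for types 1, 2 and 4 can be written out as follows. This is a minimal sketch under several assumptions that the text above does not state: y is taken to be the estimated SNP effect, p the allele frequency, var(y's) the population variance of the effects around their mean, and C a user-chosen constant (the value 1.125 below is purely illustrative). The function names are hypothetical.

```python
import math


def weight_type1(y: float, p: float) -> float:
    """Type 1: w = y^2 * 2p(1-p)."""
    return y * y * 2.0 * p * (1.0 - p)


def weight_type2(y: float) -> float:
    """Type 2: w = y^2."""
    return y * y


def weights_nonlinear_a(effects: list[float], C: float) -> list[float]:
    """Type 4 / nonlinearA: w_i = C ** (|y_i| / sd(y) - 2)."""
    n = len(effects)
    mean = sum(effects) / n
    # Assumption: population variance (divide by n) around the mean
    sd = math.sqrt(sum((y - mean) ** 2 for y in effects) / n)
    return [C ** (abs(y) / sd - 2.0) for y in effects]


print(weight_type1(2.0, 0.5))                      # prints 2.0
print(weights_nonlinear_a([1.0, -1.0], C=1.125))   # both weights equal 1/1.125
```

Under type 4, an effect two standard deviations from zero gets weight 1, larger effects get weights above 1, and smaller effects get weights below 1 whenever C is greater than 1.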
@@ Line 470: / Line 523: @@
 OPTION snp_p_value
 </file>
-Computes p-values for GWAS from elements of the inverse of the Mixed Model Equations previously obtained from blupf90. This requires quite a lot of memory and time. For details see  https://doi.org/10.1186/s12711-019-0469-3.
+Computes p-values for GWAS from elements of the inverse of the Mixed Model Equations previously obtained from blupf90. This requires quite a lot of memory and time. For details see  [[https://doi.org/10.1186/s12711-019-0469-3|Aguilar et al. (2019)]].
+<file>
+OPTION snp_var
+</file>
+Creates a file with prediction error covariance (PEC) for SNP to be used in [[http://nce.ads.uga.edu/wiki/doku.php?id=readme.predf90|PREDF90]] to compute reliability for indirect predictions. This option works when ''OPTION snp_p_value'' is used in BLUPF90+.
 =====Output files for GWAS (postGSf90)=====
@@ Line 492: / Line 550: @@
   * 1: trait
   * 2: effect
-  * 3: values of SNP effects to use in Manhattan plots
+  * 3: values of SNP effects to use in Manhattan plots  -> [abs(SNP_i)/SD(SNP)]
   * 4: SNP
   * 5: Chromosome
@@ Line 498: / Line 556: @@
 <file>chrsnp_pval</file>
-contains solutions of SNP and weights
+contains data to create the plot with GNUPLOT
   * 1: trait
@@ Line 518: / Line 576: @@
 <file>windows_segment</file>
-contains information of windows segments used to get variance explainded
+contains information of windows segments used to get variance explained
   * 1: label
   * 2: window size (number of SNP)
@@ Line 551: / Line 609: @@
     'S' for solutions of SNP
     'V' for variance explained
+    'P' for p-values
   t1e2
@@ Line 582: / Line 641: @@
 <file>OPTION saveAscii</file>
-Save files intermediate matrices (GimA22i,G,Gi,etc) files as ASCII (default=binary)
+Saves intermediate matrix files (GimA22i, G, Gi, etc.) as ASCII (default=binary)
 <file>OPTION saveHinv</file>
-Save H inverse matrix in Hinv.txt\\
+Saves H inverse matrix in Hinv.txt\\
 Format: i,j,val \\
 with i,j, the index level for the additive genetic effect
 <file>OPTION saveAinv</file>
-Save A inverse matrix in Ainv.txt\\
+Saves A inverse matrix in Ainv.txt\\
 Format: i,j,val \\
 with i,j, the index level for the additive genetic effect
@@ Line 599: / Line 658: @@
 <file>OPTION saveHinvOrig</file>
-Save the H inverse matrix with original IDs
+Saves the H inverse matrix with original IDs
 <file>OPTION saveAinvOrig</file>
-Save the A inverse matrix with original IDs
+Saves the A inverse matrix with original IDs
 <file>OPTION saveDiagGOrig</file>
-Save diagonal of G matrix in DiagGOrig.txt\\
+Saves diagonal of G matrix in DiagGOrig.txt\\
 Format: id, val\\
 with id the original IDs
 <file>OPTION saveGOrig</file>
-Save G matrix in G_Orig.txt\\
+Saves G matrix in G_Orig.txt\\
 Format: id_i, id_j, val\\
 with id_i and id_j the original IDs
 <file>OPTION saveA22Orig</file>
-Save A22 matrix in A22_Orig.txt\\
+Saves A22 matrix in A22_Orig.txt\\
 Format: id_i, id_j, val\\
 with id_i and id_j the original IDs
 <file>OPTION saveGimA22iOrig </file>
-Save GimA22i matrix in GimA22i_Orig.txt\\
+Saves GimA22i matrix in GimA22i_Orig.txt\\
 Format: id_i, id_j, val\\
 with id_i and id_j the original IDs as stored in the renum file
 <file>OPTION readOrigId</file>
-Read information from renaddxx.ped file, Original ID and possibly year of birth for its use in parent-progeny conflict output.\\
+Reads information from the renaddxx.ped file (original ID and possibly year of birth) for use in the parent-progeny conflict output.\\
-Only need if not of the previous ''save*Orig'' are present.
+Only needed if none of the previous ''save*Orig'' options is present.
 <file>OPTION saveGimA22iRen </file>
-Save GimA22i matrix in GimA22i_Ren.txt\\
+Saves GimA22i matrix in GimA22i_Ren.txt\\
 Format: id_i, id_j, val\\
 with id_i and id_j the IDs as read from the data/pedigree file
@@ Line 636: / Line 695: @@
 <file>OPTION savePLINK</file>
-Save Genotype in ''PLINK'' format \\
+Saves Genotype in ''PLINK'' format \\
 files: toPLINK.ped and toPLINK.map
 <file>OPTION no_full_binary</file>
-Save the elements of half-matrix instead of the full matrix. It is useful to keep the compatibility with the older version of preGSf90.\\
+Saves the elements of half-matrix instead of the full matrix. It is useful to keep the compatibility with the older version of preGSf90. The newer versions save the matrix in a more efficient way, where reading the information from the binary file is not trivial (i.e., not as i, j, val anymore).\\
 =====Save and Read intermediate files=====
 <file>OPTION readGimA22i file </file>
-This option is used in analyses programs (BLUPF90,REMLF90, etc.) in order to use matrices stored in ''GimA22i'' file (default filename).\\
+This option is used in the application programs (BLUPF90,REMLF90, etc.) to use the information already stored in ''GimA22i'' file (default filename).\\
-In general methods used to create and invert matrices in such programs dont use optimized version.\\
+In general, methods used to create and invert matrices in such programs don't use an optimized version.\\
-For large number of genotyped animals run first ''PreGSf90'' and the read stored matrices in analyses programs.\\
+For a large number of genotyped animals, run ''PreGSf90'' first and then read the stored matrices in the application programs.\\
-Optional //file// can be used to specify other filename or path, for example ''OPTION readGimA22i ../../pregsrun/GimA22i''.\\
+Optional //file// can be used to specify a different filename (other than ''GimA22i'') or a path, for example ''OPTION readGimA22i ../../pregsrun/GimA22i''.\\
@@ Line 673: / Line 732: @@
 =====DEPRECATED OPTIONS=====
 <file>OPTION chrinfo file </file> \\
+This is deprecated. Use instead ''OPTION map_file''.
 Read SNP map information from //file//.\\