readme.pregsf90
Differences
This shows you the differences between two versions of the page.
readme.pregsf90 [2020/11/10 19:25] – [Input files] dani | readme.pregsf90 [2024/03/25 18:22] – external edit 127.0.0.1 | ||
---|---|---|---|
Line 39: | Line 39: | ||
* Field 2 - genotype with 0,1,2 and 5 (missing) or real values for gene content 0.12 ... | * Field 2 - genotype with 0,1,2 and 5 (missing) or real values for gene content 0.12 ... | ||
- | Fields need to be separated by at least one space and Field 2 should be fixed format i.e. all rows of genotypes | + | Fields need to be separated by at least one space and Field 2 should be fixed format i.e. all rows of genotypes |
< | < | ||
Line 48: | Line 48: | ||
</ | </ | ||
- | An utility program ([[readme.illumina2pregs|illumina2pregs]]) is available to converts Illumina FinalReport and SNP_Map.txt | + | Using fractional genotypes is also possible, e.g. from imputation. In this case the genotypes must be " |
+ | |||
+ | < | ||
+ | | ||
+ | 8014 2.001.001.001.000.00 | ||
+ | | ||
+ | 1032 0.501.120.251.502.00 | ||
+ | </ | ||
+ | |||
+ | where the last individual has " | ||
+ | |||
+ | |||
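The fixed-format layout of the fractional genotypes above can be read as in the following minimal sketch. It assumes each gene content occupies exactly 4 characters (for example `2.00`), which is inferred from the two example rows only; the function name `parse_fractional_row` and the `width` parameter are hypothetical and are not part of preGSf90.

```python
def parse_fractional_row(line, width=4):
    """Split one genotype-file row into (animal_id, list of gene contents).

    `width` is the ASSUMED number of characters per genotype, inferred
    from the example rows (e.g. "2.00"); it is not taken from the program.
    """
    # Field 1 (ID) and Field 2 (genotypes) are separated by whitespace.
    animal_id, genotype_field = line.split(None, 1)
    genotype_field = genotype_field.rstrip()
    if len(genotype_field) % width != 0:
        raise ValueError("genotype field length is not a multiple of the field width")
    # Cut the fixed-format field into consecutive chunks of `width` characters.
    values = [
        float(genotype_field[start:start + width])
        for start in range(0, len(genotype_field), width)
    ]
    return animal_id, values


rows = ["8014 2.001.001.001.000.00", "1032 0.501.120.251.502.00"]
parsed = [parse_fractional_row(row) for row in rows]
```

A quick check of this kind before running the program can catch rows whose genotype field has a different length from the others.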
+ | A utility program ([[readme.illumina2pregs|illumina2pregs]]) is available to convert Illumina FinalReport and SNP_Map.txt | ||
+ | |||
+ | Some useful options: | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
* Renumbered ID for genotypes | * Renumbered ID for genotypes | ||
Line 164: | Line 182: | ||
< | < | ||
- | | + | |
Z and to scale in D. The variable //x// can be: | Z and to scale in D. The variable //x// can be: | ||
*0: read from file '' | *0: read from file '' | ||
Line 172: | Line 190: | ||
< | < | ||
- | Specifies what frequencies are used to __scale__ **G** in **G=ZDZ**' | + | Specifies what frequencies are used to __scale__ **G** in **G=ZDZ**' |
Use this option if, for instance, you want to use 0.5 for centering (using the option above) but the observed 2pq for scaling. | Use this option if, for instance, you want to use 0.5 for centering (using the option above) but the observed 2pq for scaling. | ||
The variable //x// can be: | The variable //x// can be: | ||
Line 198: | Line 216: | ||
Weighting Z*= Z sqrt(D) => G = Z*Z*' = ZDZ' | Weighting Z*= Z sqrt(D) => G = Z*Z*' = ZDZ' | ||
format: one column of weights in the same order as in the genotyped file.\\ | format: one column of weights in the same order as in the genotyped file.\\ | ||
- | Weights can be extracted from output of '' | + | Weights can be extracted from the output of '' |
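The identity Z*= Z sqrt(D) => G = Z*Z*' = ZDZ' stated above can be verified numerically with a small sketch. The matrix `Z` and the weights are made-up illustration values, the helper functions are hypothetical, and any further scaling of **G** that the program applies is deliberately left out.

```python
import math


def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    b_t = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_t] for row in a]


# Made-up example: 2 animals x 3 SNP, already centered.
Z = [[1.0, -1.0, 0.0],
     [0.0, 1.0, 1.0]]
weights = [0.5, 2.0, 1.0]  # one weight per SNP, i.e. the diagonal of D

# Route 1: weight the columns of Z by sqrt(D), then form Z* Z*'.
Z_star = [[z * math.sqrt(w) for z, w in zip(row, weights)] for row in Z]
G_from_zstar = matmul(Z_star, transpose(Z_star))

# Route 2: form Z D Z' directly.
ZD = [[z * w for z, w in zip(row, weights)] for row in Z]
G_from_zdz = matmul(ZD, transpose(Z))
```

Both routes give the same matrix up to rounding, which is why a single column of weights in SNP order is enough to define the weighted **G**.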
< | < | ||
Line 269: | Line 287: | ||
< | < | ||
Create a full log file ('' | Create a full log file ('' | ||
+ | |||
+ | < | ||
+ | Create a log file ('' | ||
< | < | ||
Line 283: | Line 304: | ||
< | < | ||
- | Chromosomes number | + | Chromosomes |
If this option is selected, sex chromosomes will not be used for checking parent-progeny Mendelian conflicts, | If this option is selected, sex chromosomes will not be used for checking parent-progeny Mendelian conflicts, | ||
but they will be included for all remaining processes. **If you want to remove sex chromosomes**, | but they will be included for all remaining processes. **If you want to remove sex chromosomes**, | ||
Line 294: | Line 315: | ||
< | < | ||
- | Check and remove | + | Check and provide a list of individuals with extremely high diagonals in the genomic relationship matrix.\\ |
If the optional //x// is present, it sets the threshold\\ | If the optional //x// is present, it sets the threshold\\ | ||
default value 1.6 | default value 1.6 | ||
< | < | ||
- | Check and remove | + | Check and provide a list of individuals with extremely low diagonals in the genomic relationship matrix.\\ |
If the optional //x// is present, it sets the threshold\\ | If the optional //x// is present, it sets the threshold\\ | ||
default value 0.7 | default value 0.7 | ||
- | < | + | < |
Plot the first two principal components to look for stratification in the population. | Plot the first two principal components to look for stratification in the population. | ||
+ | |||
< | < | ||
Line 322: | Line 344: | ||
< | < | ||
- | Calculate LD within chromosome and windows of SNP based on position | + | Calculate LD within chromosome and windows of SNP based on position. Optional |
- | optional | + | |
< | < | ||
Line 365: | Line 386: | ||
< | < | ||
Set the threshold to issue a warning if cor(A22,G) < //x//\\ | Set the threshold to issue a warning if cor(A22,G) < //x//\\ | ||
- | default value 0.5 | + | default value = 0.5 |
< | < | ||
Set the threshold to stop the analysis if cor(A22,G) < //x//\\ | Set the threshold to stop the analysis if cor(A22,G) < //x//\\ | ||
- | default | + | default |
< | < | ||
Set the threshold to calculate cor(A22,G) only for A22 >= // | Set the threshold to calculate cor(A22,G) only for A22 >= // | ||
- | default | + | default |
| | ||
Line 386: | Line 407: | ||
and to control bias.\\ | and to control bias.\\ | ||
- | The defaults | + | The default |
tau=1 alpha=0.95 beta=0.05 gamma=0 | tau=1 alpha=0.95 beta=0.05 gamma=0 | ||
Line 400: | Line 421: | ||
The variable '' | The variable '' | ||
* 0: no scaling | * 0: no scaling | ||
- | * 1: mean(diag(G))=1, | + | * 1: mean(diag(G))=1, |
- | * 2: mean(diag(G))=mean(diag(A22)), | + | * 2: mean(diag(G))=mean(diag(A22)), |
- | * 3: mean(G)=mean(A22) | + | * 3: mean(G)=mean(A22) |
- | * 4: rescale G using Fst adjustment. As in Powell et al. (2010) | + | * 4: rescale G using Fst adjustment. As in [[https:// |
* 9: arbitrary parameters: specify two additional numbers $a$ and $b$ in $a+b\mathbf{G}$ as '' | * 9: arbitrary parameters: specify two additional numbers $a$ and $b$ in $a+b\mathbf{G}$ as '' | ||
+ | |||
+ | =====Options to extract the diagonal of H (aka genomic improved inbreeding)===== | ||
+ | |||
+ | The diagonal of **H** contains an improved estimator of inbreeding: $\mathbf{F}_H = \mathrm{diag}(\mathbf{H})-1$. For genotyped animals, the diagonal of **H** is identical to the diagonal of **G**, and thus $\mathbf{F}_H=\mathbf{F}_G$, sometimes called Genomic Inbreeding. The elements of $\mathbf{F}_H$ for non-genotyped animals include pedigree-based estimates of Genomic Inbreeding. See | ||
+ | [[https:// | ||
+ | |||
+ | To extract the diagonal of **H** one of these two '' | ||
+ | |||
+ | * '' | ||
+ | * '' | ||
+ | |||
+ | |||
+ | The user can use one of two equivalent methods: | ||
+ | * 1: using '' | ||
+ | * 2: using '' | ||
+ | |||
+ | The output depends on the method used. '' | ||
+ | |||
+ | An example of the output obtained with '' | ||
+ | |||
+ | < | ||
+ | testDiagH2_mf andres$ head diagHdirect.txt.2 | ||
+ | 45036060023 | ||
+ | 64000169880047 | ||
+ | 64000246030053 | ||
+ | 45038980011 | ||
+ | </ | ||
+ | |||
=====GWAS options (PostGSF90)===== | =====GWAS options (PostGSF90)===== | ||
< | < | ||
- | Plot using GNUPLOT the Manhattan plot (SNP effects) for each trait and correlated effect. | + | Uses GNUPLOT |
< | < | ||
- | Plot using R the Manhattan plot (SNP effects) for each trait and correlated effect.\\ | + | Uses R to plot the Manhattan plot (SNP effects) for each trait and correlated effect.\\ |
'' | '' | ||
Note: //t1e2// corresponds to trait 1, effect 2.\\ | Note: //t1e2// corresponds to trait 1, effect 2.\\ | ||
Line 440: | Line 489: | ||
Calculates the variance explained by //n// Mb windows of adjacent SNPs.\\ | Calculates the variance explained by //n// Mb windows of adjacent SNPs.\\ | ||
+ | < | ||
+ | Sets the window type for variance calculations: | ||
+ | * 1: moving windows | ||
+ | * 2: exclusive windows\\ | ||
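The difference between the two window types can be pictured with the sketch below. It rests on an interpretation that is not confirmed by this page: "moving" is taken to mean a window that slides one SNP at a time (overlapping windows), and "exclusive" to mean consecutive non-overlapping windows. Windows are expressed here as a count of adjacent SNPs rather than in Mb, and the function name `window_bounds` is hypothetical.

```python
def window_bounds(n_snp, size, window_type):
    """Return (start, end) index pairs, end exclusive, for SNP windows.

    ASSUMED meaning of the types (an interpretation, not program output):
      1 = moving windows, sliding by one SNP, full windows only
      2 = exclusive windows, non-overlapping, last one may be shorter
    """
    if window_type == 1:
        return [(s, s + size) for s in range(0, n_snp - size + 1)]
    if window_type == 2:
        return [(s, min(s + size, n_snp)) for s in range(0, n_snp, size)]
    raise ValueError("window_type must be 1 or 2")


moving = window_bounds(5, 3, 1)
exclusive = window_bounds(5, 3, 2)
```

Under this reading, every SNP belongs to exactly one exclusive window, while with moving windows the same SNP contributes to several overlapping windows.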
< | < | ||
Line 445: | Line 498: | ||
* 1: w = y^2 * (2(p(1-p))) | * 1: w = y^2 * (2(p(1-p))) | ||
* 2: w = y^2 | * 2: w = y^2 | ||
- | * 3: experimental with the degree of brief | + | * 3: experimental with the degree of belief |
* 4: w = C**(abs(y)/ | * 4: w = C**(abs(y)/ | ||
* nonlinearA: same as 4 | * nonlinearA: same as 4 | ||
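The first two weight formulas in the list above can be written out as a short sketch. It assumes that //y// is the estimated SNP effect and //p// the allele frequency of the SNP, which this page does not state explicitly; the function name `snp_weights` and the input values are hypothetical, and any normalization the program may apply afterwards is not shown.

```python
def snp_weights(effects, freqs, weight_type=1):
    """Compute SNP weights for types 1 and 2 of the list above.

    effects: assumed to be the estimated SNP effects (y)
    freqs:   assumed to be the allele frequencies (p)
    """
    if weight_type == 1:
        # w = y^2 * 2p(1-p)
        return [y * y * 2.0 * p * (1.0 - p) for y, p in zip(effects, freqs)]
    if weight_type == 2:
        # w = y^2
        return [y * y for y in effects]
    raise ValueError("only weight types 1 and 2 are sketched here")


effects = [0.5, -2.0]   # made-up SNP effects
freqs = [0.5, 0.25]     # made-up allele frequencies
w1 = snp_weights(effects, freqs, weight_type=1)
w2 = snp_weights(effects, freqs, weight_type=2)
```

Type 1 shrinks the weight of SNPs with extreme allele frequencies through the 2p(1-p) term, whereas type 2 depends on the effect size alone.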
Line 470: | Line 523: | ||
OPTION snp_p_value | OPTION snp_p_value | ||
</ | </ | ||
- | Computes p-values for GWAS from elements of the inverse of the Mixed Model Equations previously obtained from blupf90. This requires quite a lot of memory and time. For details see https:// | + | Computes p-values for GWAS from elements of the inverse of the Mixed Model Equations previously obtained from blupf90. This requires quite a lot of memory and time. For details see |
+ | |||
+ | < | ||
+ | OPTION snp_var | ||
+ | </ | ||
+ | Creates a file with prediction error covariance (PEC) for SNP to be used in [[http:// | ||
=====Output files for GWAS (postGSf90)===== | =====Output files for GWAS (postGSf90)===== | ||
Line 492: | Line 550: | ||
* 1: trait | * 1: trait | ||
* 2: effect | * 2: effect | ||
- | * 3: values of SNP effects to use in Manhattan plots | + | * 3: values of SNP effects to use in Manhattan plots -> [abs(SNP_i)/ |
* 4: SNP | * 4: SNP | ||
* 5: Chromosome | * 5: Chromosome | ||
Line 498: | Line 556: | ||
< | < | ||
- | contains | + | contains |
* 1: trait | * 1: trait | ||
Line 518: | Line 576: | ||
< | < | ||
- | contains information of windows segments used to get variance | + | contains information of windows segments used to get variance |
* 1: label | * 1: label | ||
* 2: window size (number of SNP) | * 2: window size (number of SNP) | ||
Line 551: | Line 609: | ||
' | ' | ||
' | ' | ||
+ | ' | ||
| | ||
t1e2 | t1e2 | ||
Line 582: | Line 641: | ||
< | < | ||
- | Save files intermediate matrices (GimA22i, | + | Saves files intermediate matrices (GimA22i, |
< | < | ||
- | Save H inverse matrix in Hinv.txt\\ | + | Saves H inverse matrix in Hinv.txt\\ |
Format: i,j,val \\ | Format: i,j,val \\ | ||
with i,j, the index level for the additive genetic effect | with i,j, the index level for the additive genetic effect | ||
< | < | ||
- | Save A inverse matrix in Ainv.txt\\ | + | Saves A inverse matrix in Ainv.txt\\ |
Format: i,j,val \\ | Format: i,j,val \\ | ||
with i,j, the index level for the additive genetic effect | with i,j, the index level for the additive genetic effect | ||
Line 599: | Line 658: | ||
< | < | ||
- | Save the H inverse matrix with original IDs | + | Saves the H inverse matrix with original IDs |
< | < | ||
- | Save the A inverse matrix with original IDs | + | Saves the A inverse matrix with original IDs |
< | < | ||
- | Save diagonal of G matrix in DiagGOrig.txt\\ | + | Saves diagonal of G matrix in DiagGOrig.txt\\ |
Format: id, val\\ | Format: id, val\\ | ||
with id the original IDs | with id the original IDs | ||
< | < | ||
- | Save G matrix in G_Orig.txt\\ | + | Saves G matrix in G_Orig.txt\\ |
Format: id_i, id_j, val\\ | Format: id_i, id_j, val\\ | ||
with id_i and id_j the original IDs | with id_i and id_j the original IDs | ||
< | < | ||
- | Save A22 matrix in A22_Orig.txt\\ | + | Saves A22 matrix in A22_Orig.txt\\ |
Format: id_i, id_j, val\\ | Format: id_i, id_j, val\\ | ||
with id_i and id_j the original IDs | with id_i and id_j the original IDs | ||
< | < | ||
- | Save GimA22i matrix in GimA22i_Orig.txt\\ | + | Saves GimA22i matrix in GimA22i_Orig.txt\\ |
Format: id_i, id_j, val\\ | Format: id_i, id_j, val\\ | ||
with id_i and id_j the original IDs as stored in the renum file | with id_i and id_j the original IDs as stored in the renum file | ||
< | < | ||
- | Read information from renaddxx.ped file, Original ID and possibly year of birth for its use in parent-progeny conflict output.\\ | + | Reads information from renaddxx.ped file, Original ID, and possibly year of birth for its use in parent-progeny conflict output.\\ |
- | Only need if not of the previous '' | + | Only needed if none of the previous '' |
< | < | ||
- | Save GimA22i matrix in GimA22i_Ren.txt\\ | + | Saves GimA22i matrix in GimA22i_Ren.txt\\ |
Format: id_i, id_j, val\\ | Format: id_i, id_j, val\\ | ||
with id_i and id_j the IDs as read from the data/ | with id_i and id_j the IDs as read from the data/ | ||
Line 636: | Line 695: | ||
< | < | ||
- | Save Genotype in '' | + | Saves Genotype in '' |
files: toPLINK.ped and toPLINK.map | files: toPLINK.ped and toPLINK.map | ||
< | < | ||
- | Save the elements of half-matrix instead of the full matrix. It is useful to keep the compatibility with the older version of preGSf90.\\ | + | Saves the elements of the half-matrix instead of the full matrix. This is useful for keeping compatibility with older versions of preGSf90. Newer versions save the matrix in a more efficient way, so reading the information from the binary file is not trivial (i.e., it is no longer stored as i, j, val).\\ |
=====Save and Read intermediate files===== | =====Save and Read intermediate files===== | ||
< | < | ||
- | This option is used in analyses | + | This option is used in the application |
- | In general methods used to create and invert matrices in such programs | + | In general, methods used to create and invert matrices in such programs |
- | For large number of genotyped animals run first '' | + | For a large number of genotyped animals run first '' |
- | Optional //file// can be used to specify | + | Optional //file// can be used to specify |
Line 673: | Line 732: | ||
=====DEPRECATED OPTIONS===== | =====DEPRECATED OPTIONS===== | ||
< | < | ||
+ | |||
+ | This is deprecated. Use instead '' | ||
+ | |||
Read SNP map information from //file//.\\ | Read SNP map information from //file//.\\ | ||
readme.pregsf90.txt · Last modified: 2024/05/16 15:11 by dani