SeekParentF90
SeekParentF90 is a program to check and assign paternity using genomic information.
Ignacio Aguilar, INIA Las Brujas, Uruguay
email: iaguilar at inia.org.uy
03/27/13 - 07/21/17
Summary
Program SeekParentF90 detects parent-offspring incompatibilities based on counts of Mendelian conflicts, as described in Hayes 2011 JDS and Wiggans et al. 2010 JDS.
It was originally implemented as the verify_parentage option of the PreGSf90 program.
Pedigree files do not need to be in any particular order.
Alphanumeric identification of individuals in the pedigree and marker files is supported, with a default length of 20 characters (see below to change it).
Depending on the number of markers, different thresholds are used to check conflicts or to assign a parent.
For marker files with fewer than 130 SNP, the threshold is a number of conflicts (default 1); for marker files with a greater number of SNP, the threshold is a percentage of the total number of SNP (default 1%).
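The threshold rule can be sketched in Python as follows. This is only an illustration and not the program's code: the function names, the parameter names, and the use of the total SNP count as the base for the percentage are assumptions. The strict "greater than" comparison follows the Usage section, where a non-match is declared when the number of conflicts is greater than the threshold.

```python
def conflict_threshold(n_snp, thr_nb=1, thr_prob=0.01, small_panel=130):
    """Return the largest number of conflicts tolerated for a panel of n_snp markers.

    thr_nb and thr_prob mirror the documented defaults (1 conflict, 1%);
    small_panel is the documented cut-off of 130 SNP.
    """
    if n_snp < small_panel:
        # Small panels: threshold expressed as a count of conflicts.
        return float(thr_nb)
    # Larger panels: threshold expressed as a fraction of the SNP count (assumed base).
    return thr_prob * n_snp


def is_non_match(n_conflicts, n_snp):
    """A non-match is declared when conflicts exceed the threshold."""
    return n_conflicts > conflict_threshold(n_snp)


print(conflict_threshold(96))     # count-based threshold for a small panel
print(conflict_threshold(50000))  # percentage-based threshold for a large panel
print(is_non_match(8, 50000))
```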
Usage
seekparentf90 --pedfile <pedigree_file_name> --snpfile <snp_file> [ ... ]
To use the program, two command-line arguments are needed, --pedfile and --snpfile, which provide the file names of the pedigree and marker files.
The pedigree file is assumed to have 3 fields with identification for animal,
sire and dam separated by at least one space.
The snp_file should contain two columns: the first with the individual identification and the second with the marker information.
The second column should start in the same position for all rows.
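As an illustration of the two input layouts, the following Python sketch reads both files into dictionaries. It is not part of SeekParentF90; the function names and the sample IDs (calf1, sire1, dam1) are hypothetical, and the use of 0 for an unknown parent in the input is an assumption.

```python
def read_pedigree(lines):
    """Map animal ID -> (sire ID, dam ID) from whitespace-separated pedigree lines."""
    pedigree = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or incomplete lines
        animal, sire, dam = fields[:3]
        pedigree[animal] = (sire, dam)
    return pedigree


def read_genotypes(lines):
    """Map sample ID -> genotype string for the two-column marker file."""
    genotypes = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue
        genotypes[fields[0]] = fields[1]
    return genotypes


# Hypothetical input; the order of pedigree records does not matter.
ped_lines = ["calf1 sire1 dam1", "sire1 0 0", "dam1 0 0"]
snp_lines = ["calf1   21015", "sire1   21115"]

pedigree = read_pedigree(ped_lines)
genotypes = read_genotypes(snp_lines)
print(pedigree["calf1"])
print(genotypes["sire1"])
```

Splitting on whitespace also works for fixed-width files, provided the identifications themselves contain no spaces.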
For each genotyped animal in the pedigree with a genotyped parent, the relationship will be checked.
A non-match is declared if the number of Mendelian conflicts is greater than the threshold.
The program can also search for a putative parent based on genotypes and year of birth.
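A minimal sketch of counting Mendelian conflicts for one animal-parent pair is shown below. It is not the program's implementation. It assumes genotypes are coded 0, 1, 2 (copies of one allele) with 5 as a no-call, which is suggested by the example genotypes in this page but not stated, and it counts only opposing homozygotes, in line with the description of the --trio option, which implies that the default check relies on homozygous markers. The function name is hypothetical.

```python
MISSING = "5"  # assumed code for a no-call genotype


def count_conflicts(animal, parent):
    """Return (conflicts, compared) for two genotype strings of equal length.

    A conflict is an opposing homozygote: one sample is 0 and the other is 2.
    SNP with a no-call in either sample are skipped.
    """
    if len(animal) != len(parent):
        raise ValueError("genotype strings must have the same length")
    conflicts = 0
    compared = 0
    for a, p in zip(animal, parent):
        if a == MISSING or p == MISSING:
            continue
        compared += 1
        if {a, p} == {"0", "2"}:
            conflicts += 1
    return conflicts, compared


print(count_conflicts("2110", "0112"))  # two opposing homozygotes out of four SNP
print(count_conflicts("2510", "0152"))  # no-calls reduce the number of SNP compared
```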
Optional arguments
--only_in_list <list_file>
This option restricts calculations to the individuals present in <list_file>.
--yob
Indicate that year of birth should be read in the 4th column.
If yob information is present, it will be used to validate a putative parent.
--seeksire <sire_file>
Indicate a list of sires that will be used to search for a parent
--seekdam <dam_file>
Indicate a list of dams that will be used to search for a parent
--seeksire_in_ped
Create a list of genotyped sires from the pedigree and use it as a list to search for a parent.
--seekdam_in_ped
Create a list of genotyped dams from the pedigree and use it as a list to search for a parent
--seektype <n>
Sets which animals will be used to search for a parent.
Codes:
- 1: search only non-match parent (default)
- 2: search all genotyped individuals
--excl_thr_prob <r>
Sets the exclusion threshold, as a percentage of the number of SNP, used to check parent-progeny conflicts.
Default r is 1%
--excl_thr_nb <n>
Sets the exclusion threshold, as a number of SNP, used to check parent-progeny conflicts.
Default n is 1
--assign_thr_prob <r>
Sets the threshold, as a percentage of the number of SNP, used to assign a parent to progeny.
Default r is 0.5%
--assign_thr_nb <n>
Sets the threshold, as a number of SNP, used to assign a parent to progeny.
Default n is 1
--thr_call_rate <cr>
Sets the call rate threshold used to exclude samples.
Default cr is 0.90
--trio
Sire and dam information will be used together to check Mendelian conflicts.
This allows markers other than homozygous ones to be used as well.
--alpha_size <n>
Changes the maximum number of characters for alphanumeric identifications.
--maxsnp <n>
Sets the maximum length of the string used to read marker data from file; only necessary if the number of SNP is greater than 400,000.
--no_print_not_match
Exclude No-Match cases in output files for Seek parent options.
--full_log_checks
A full description of the checks will be provided in the output files.
--duplicate [thr]
Checks for duplicate samples based on a modified Hamming distance.
The optional parameter thr changes the default threshold used to identify duplicate samples (default 0.9).
--find_duplicate <file> [thr]
Checks for duplicate samples only for the individuals in the specified file, comparing them against the full genotype file.
The optional parameter thr changes the default threshold used to identify duplicate samples (default 0.9).
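The exact form of the modified Hamming distance is not described here. The sketch below shows one plausible reading, stated as an assumption: similarity is the proportion of identical genotype calls among SNP called in both samples, and a pair is flagged when that proportion reaches thr. The function names, the no-call code 5, the ">=" comparison, and the sample IDs are all hypothetical.

```python
from itertools import combinations

MISSING = "5"  # assumed code for a no-call genotype


def similarity(g1, g2):
    """Proportion of identical calls among SNP called in both samples."""
    compared = 0
    matches = 0
    for a, b in zip(g1, g2):
        if a == MISSING or b == MISSING:
            continue
        compared += 1
        if a == b:
            matches += 1
    return matches / compared if compared else 0.0


def find_duplicates(genotypes, thr=0.9):
    """Return (id1, id2, similarity) for every pair at or above thr."""
    duplicates = []
    for (id1, g1), (id2, g2) in combinations(genotypes.items(), 2):
        s = similarity(g1, g2)
        if s >= thr:
            duplicates.append((id1, id2, s))
    return duplicates


samples = {
    "s1": "2110120110",
    "s2": "2110120115",  # same as s1 except for one no-call
    "s3": "0012210021",
}
print(find_duplicates(samples))
```

Comparing all pairs grows quadratically with the number of samples, which is one reason restricting the search to a list of individuals, as --find_duplicate does, can be useful.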
Chip and SNP information
Chips with different number of SNP can be used in the analyses.
In that case, the second column of the genotype file must indicate the chip number, and a map file must be provided to map SNP to chips.
Each sample in the genotype file should contain only the SNP present for that chip (see example below)
--chips <file>
This option should be used if more than one chip is used and/or to select SNP to be used in analyses.
<file> is the name of the map file, which should contain the following named columns: SNP_ID, CHR, POS, and CHIP1, CHIP2 .. CHIPn.
Other columns may be present in the file.
The first line must contain the column names.
If more than one chip is used, the SNP in common between two samples (based on both chips) will be used to check conflicts and to discover parents.
Example
Consider a genotype file with 4 samples and 3 chips with the following number of SNP:
Chip 1: 40 SNP
Chip 2: 14 SNP
Chip 3: 20 SNP
Genotype file
1353 1 2110101100201201101101011011111121111121
8014 1 2111010151110112022111011151111210111221
516  2 2110510120
181  3 11101111122011205502
Map file
SNP_ID Chr pos chip1 chip2 chip3
SNP_1 1 135098 1 1 1
SNP_2 1 267940 2 0 2
SNP_3 1 305793 3 2 3
SNP_4 1 353745 4 0 0
SNP_5 1 393248 5 0 4
SNP_6 1 434180 6 0 5
SNP_7 1 471078 7 0 0
SNP_8 1 516404 8 3 6
SNP_9 1 533815 9 4 0
SNP_10 1 571340 10 0 7
SNP_11 1 654413 11 0 8
SNP_12 1 845494 12 5 0
SNP_13 1 883895 13 6 9
SNP_14 1 905632 14 0 10
SNP_15 1 929617 15 0 0
SNP_16 2 353763 16 7 11
SNP_17 2 393266 17 0 0
SNP_18 2 434198 18 8 12
SNP_19 2 471096 19 0 13
SNP_20 2 516422 20 0 14
SNP_21 2 533833 21 0 0
SNP_22 2 571358 22 9 15
SNP_23 2 654431 23 0 0
SNP_24 2 845512 24 0 16
SNP_25 2 883913 25 0 0
SNP_26 2 905650 26 0 0
SNP_27 2 929635 27 0 17
SNP_28 2 353781 28 10 0
SNP_29 2 393284 29 0 0
SNP_30 2 434216 30 0 0
SNP_31 3 393253 31 0 18
SNP_32 3 434185 32 0 0
SNP_33 3 471083 33 11 0
SNP_34 3 516409 34 0 0
SNP_35 3 533820 35 12 19
SNP_36 3 571345 36 0 0
SNP_37 3 654418 37 13 0
SNP_38 3 845499 38 0 20
SNP_39 3 883900 39 0 0
SNP_40 3 905637 40 14 0
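To illustrate how SNP in common between two chips might be located, the Python sketch below uses the first eight rows of the example map. It reads each chip column as the 1-based position of the SNP within that chip's genotype string, with 0 meaning the SNP is not on the chip; this reading is an interpretation of the example, not a documented rule. The function names are hypothetical.

```python
MAP_TEXT = """\
SNP_ID Chr pos chip1 chip2 chip3
SNP_1 1 135098 1 1 1
SNP_2 1 267940 2 0 2
SNP_3 1 305793 3 2 3
SNP_4 1 353745 4 0 0
SNP_5 1 393248 5 0 4
SNP_6 1 434180 6 0 5
SNP_7 1 471078 7 0 0
SNP_8 1 516404 8 3 6
"""


def read_map(text):
    """Parse a map file with a header line into a list of dictionaries."""
    lines = text.strip().splitlines()
    header = [name.lower() for name in lines[0].split()]
    return [dict(zip(header, line.split())) for line in lines[1:]]


def common_positions(rows, chip_a, chip_b):
    """Return (snp_id, pos_a, pos_b) for SNP present on both chips."""
    common = []
    for row in rows:
        pos_a = int(row[f"chip{chip_a}"])
        pos_b = int(row[f"chip{chip_b}"])
        if pos_a > 0 and pos_b > 0:  # 0 is read as "not on this chip"
            common.append((row["snp_id"], pos_a, pos_b))
    return common


def extract(genotype, positions):
    """Pick the genotype calls at the given 1-based positions."""
    return "".join(genotype[p - 1] for p in positions)


rows = read_map(MAP_TEXT)
common = common_positions(rows, 2, 3)
print(common)

# Samples 516 (chip 2) and 181 (chip 3) from the example genotype file.
geno_516 = "2110510120"
geno_181 = "11101111122011205502"
print(extract(geno_516, [pa for _, pa, _ in common]))
print(extract(geno_181, [pb for _, _, pb in common]))
```

Once both samples are reduced to the same set of SNP, they can be compared position by position.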
Other options
--only_in_common
Select the SNP in common between all chips to be used in the analyses
--include_snp <file>
Only the list of SNP names in file will be included in the analyses
--exclude_snp <file>
The list of SNP names in file will be excluded from analyses
--include_chr <n1 n2.. n>
Only SNP on chromosomes n1, n2, etc. will be included in the analyses
--exclude_chr <n1 n2.. n>
SNP on chromosomes n1, n2, etc. will be excluded from the analyses
--chr_x <n>
Indicates that chromosome n is the X chromosome; its SNP will be excluded from the check of parent-progeny conflicts and from parentage discovery
Output files
Check_<pedigree_file_name>
Contains the modified pedigree file (animal, sire, dam, yob), with 0 in place of a genotyped sire or dam that did not match.
The last 2 columns indicate the result of the parentage check, for sire and dam respectively:
- “Match”
- “No-Match”
- “par_nogeno”: the parent does not have a genotype in the file
Check_Parent_Pedigree.txt
contains statistics for every pair of animal-parent checked
Animal-Sire aaaaaaa bbbbbbb 8 0.0003 1259 218 1452 .9508 .9915 .9432 Match
Animal-Dam aaaaaaa ccccccc 4 0.0002 201 28 226 .9921 .9989 .9912 Match
Column format
- 1: type of check (Animal-Sire or Animal-Dam)
- 2: Id of Animal
- 3: Id of parent
- 4: Number of conflicts
- 5: Percentage of conflicts
- 6: Number of NoCall SNP for Animal sample
- 7: Number of NoCall SNP for Parent sample
- 8: Number of NoCall SNP for Animal-Parent
- 9: Call Rate for Animal sample
- 10: Call Rate for Parent sample
- 11: Call Rate for Animal-Parent
- 12: Result of parentage check
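The column layout above can be read into a small record type for downstream filtering. The following Python sketch is a hypothetical helper, not something shipped with SeekParentF90; the class and field names are invented for illustration, and it assumes IDs contain no spaces.

```python
from dataclasses import dataclass


@dataclass
class ParentCheck:
    check_type: str         # column 1: Animal-Sire or Animal-Dam
    animal: str             # column 2
    parent: str             # column 3
    n_conflicts: int        # column 4
    pct_conflicts: float    # column 5
    nocall_animal: int      # column 6
    nocall_parent: int      # column 7
    nocall_pair: int        # column 8
    callrate_animal: float  # column 9
    callrate_parent: float  # column 10
    callrate_pair: float    # column 11
    result: str             # column 12


def parse_check_line(line):
    """Parse one line of Check_Parent_Pedigree.txt into a ParentCheck."""
    f = line.split()
    if len(f) != 12:
        raise ValueError(f"expected 12 columns, found {len(f)}")
    return ParentCheck(
        f[0], f[1], f[2],
        int(f[3]), float(f[4]),
        int(f[5]), int(f[6]), int(f[7]),
        float(f[8]), float(f[9]), float(f[10]),
        f[11],
    )


rec = parse_check_line(
    "Animal-Sire aaaaaaa bbbbbbb 8 0.0003 1259 218 1452 .9508 .9915 .9432 Match"
)
print(rec.parent, rec.n_conflicts, rec.result)
```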
Seek_Sire.txt
contains statistics for parentage check for animal “aaaaaaaaa” with all sires
Seek for parent: aaaaaaaaa Sire ddddddddd
No-Match: aaaaaaaaa xxxxxxx 1774 0.0694 114 0 114 0.9955 1.0000 0.9955
No-Match: aaaaaaaaa yyyyyyy 1961 0.0767 114 21 134 0.9955 0.9992 0.9948
Match: aaaaaaaaa bbbbbbb 6 0.0002 114 98 207 0.9955 0.9962 0.9919
Column format
- 1: Result of parentage check
- 2: Id of Animal
- 3: Id of parent
- 4: Number of conflicts
- 5: Percentage of conflicts
- 6: Number of NoCall SNP for Animal sample
- 7: Number of NoCall SNP for Parent sample
- 8: Number of NoCall SNP for Animal-Parent
- 9: Call Rate for Animal sample
- 10: Call Rate for Parent sample
- 11: Call Rate for Animal-Parent
Seek_Dam.txt
contains statistics for all dams used in the comparison
The file format is the same as for Seek_Sire.txt
Assigned_<pedigree_file_name>
Contains the modified pedigree file after the parentage check and the assignment of parents (animal, sire, dam, yob), with 0 in place of a genotyped sire or dam that did not match or was not found.
The last 2 columns indicate the result of the parentage check or assignment, for sire and dam respectively:
- “Match”
- “Not-Found”
- “Assigned”
- “par_nogeno”: the parent does not have a genotype in the file
- “Not-Tested”: a search for this type of parent was not requested
xxxxxxx sssssss dddddd1 2013 Match Not-Tested
yyyyyyy dddddd2 dddddd2 2013 Assigned Not-Tested
zzzzzzz 0 dddddd3 2013 Not-Found Not-Tested
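A quick way to review an Assigned_ file is to tally the status codes in its last two columns. The Python sketch below does this for the example rows; it is a hypothetical helper, and it assumes the two status codes are always the last two whitespace-separated fields of each line.

```python
from collections import Counter


def summarize_assigned(lines):
    """Count sire and dam status codes taken from the last two columns."""
    sire_status = Counter()
    dam_status = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or incomplete lines
        sire_status[fields[-2]] += 1
        dam_status[fields[-1]] += 1
    return sire_status, dam_status


example = [
    "xxxxxxx sssssss dddddd1 2013 Match Not-Tested",
    "yyyyyyy dddddd2 dddddd2 2013 Assigned Not-Tested",
    "zzzzzzz 0 dddddd3 2013 Not-Found Not-Tested",
]
sires, dams = summarize_assigned(example)
print(dict(sires))
print(dict(dams))
```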