SeekParentF90

SeekParentF90 is a program to check and assign paternity using genomic information.

Ignacio Aguilar, INIA Las Brujas, Uruguay
email: iaguilar at inia.org.uy

03/27/13 - 07/21/17

Summary

SeekParentF90 detects parent-offspring incompatibilities based on counts of Mendelian conflicts, as described in Hayes 2011 JDS and Wiggans et al 2010 JDS.

It was originally implemented as the verify_parentage option of the PreGSf90 program.

Pedigree files do not need to be in any particular order.

Alphanumeric identifications of individuals are supported in the pedigree and marker files, with a default length of 20 characters (see below to change it).

Depending on the number of markers, different thresholds are used to check conflicts or to assign a parent.

For marker files with fewer than 130 SNP, thresholds are expressed as a number of conflicts (e.g. 1); for marker files with a greater number of SNP, thresholds are expressed as a percentage of the total number of SNP (e.g. 1%).
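The threshold rule can be sketched as follows. This is only an illustration, not the program's source: the function names are hypothetical, the defaults are taken from --excl_thr_nb and --excl_thr_prob, and it is assumed that panels of exactly 130 SNP fall under the percentage rule.

```python
def conflict_threshold(n_snp, thr_nb=1, thr_prob=1.0, small_panel_limit=130):
    """Return the tolerated number of conflicts for a panel of n_snp markers.

    thr_nb   : count threshold used for small panels (as in --excl_thr_nb)
    thr_prob : percentage threshold used for larger panels (as in --excl_thr_prob)
    The treatment of exactly `small_panel_limit` SNP is an assumption.
    """
    if n_snp < small_panel_limit:
        return float(thr_nb)
    return n_snp * thr_prob / 100.0


def is_non_match(n_conflicts, n_snp, **thresholds):
    """A non-match is declared when conflicts exceed the threshold."""
    return n_conflicts > conflict_threshold(n_snp, **thresholds)
```

For instance, with a 96-SNP panel two conflicts would exceed the default count threshold, while on a 50,000-SNP panel up to 500 conflicts would be tolerated at 1%.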

Usage

seekparentf90 --pedfile <pedigree_file_name> --snpfile <snp_file> [ ... ]

To run the program, two command-line arguments are required: --pedfile and --snpfile, which give the file names of the pedigree and marker files.
The pedigree file is assumed to have 3 fields with identification for animal, sire and dam separated by at least one space.

The SNP file should contain two columns: the first with the individual identification and the second with the marker information.
The second column should start in the same position for all rows.
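A small helper for writing a marker file in which the second column starts at the same position on every row might look like this. It is a sketch; the function name and the choice of padding the ID to the longest ID plus one space are assumptions made for illustration.

```python
def format_snp_lines(genotypes, id_width=None):
    """Format {id: genotype_string} as fixed-position SNP file lines.

    IDs are left-justified and padded to a common width, so the genotype
    string starts in the same column on every line.
    """
    if id_width is None:
        id_width = max(len(animal_id) for animal_id in genotypes)
    return [f"{animal_id:<{id_width}} {geno}" for animal_id, geno in genotypes.items()]


if __name__ == "__main__":
    for line in format_snp_lines({"1353": "2110", "516": "2115"}):
        print(line)
```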

For each genotyped animal in the pedigree with a genotyped parent the relationship will be checked.

A non-match is declared if the number of Mendelian conflicts is greater than the threshold.
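As an illustration of what a Mendelian conflict between one parent and its offspring means, the sketch below counts opposing homozygotes. It assumes genotypes are coded 0, 1, 2 (copies of one allele) with 5 as the no-call code, and that SNP missing in either sample are skipped; these are assumptions for the example, and the function name is hypothetical.

```python
MISSING = "5"  # assumed no-call code


def count_conflicts(offspring, parent):
    """Count opposing-homozygote conflicts between two genotype strings.

    Returns (n_conflicts, n_compared), where n_compared is the number of
    SNP called in both samples.
    """
    if len(offspring) != len(parent):
        raise ValueError("genotype strings must have the same length")
    conflicts = 0
    compared = 0
    for o, p in zip(offspring, parent):
        if o == MISSING or p == MISSING:
            continue  # skip SNP not called in one of the samples
        compared += 1
        if {o, p} == {"0", "2"}:  # one is 0 and the other is 2
            conflicts += 1
    return conflicts, compared
```

The conflict count (or its ratio to the number of compared SNP) is what would then be compared with the threshold.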

The program can also search for a putative parent based on genotypes and year of birth.

Optional arguments

--only_in_list <list_file>

This option restricts calculations to the individuals present in <list_file>.

--yob

Indicates that the year of birth should be read from the 4th column of the pedigree file.
If year of birth information is present, it will be used to validate a putative parent.

--seeksire <sire_file>

Specifies a file with the list of sires that will be used to search for a parent.

--seekdam <dam_file>

Specifies a file with the list of dams that will be used to search for a parent.

--seeksire_in_ped

Create a list of genotyped sires from the pedigree and use it as a list to search for a parent.

--seekdam_in_ped

Create a list of genotyped dams from the pedigree and use it as a list to search for a parent

--seektype <n>

Set which animals will be used to search for a parent.

Codes:

• 1: search only non-match parent (default)
• 2: search all genotyped individuals

--excl_thr_prob <r>

Set the exclusion threshold, as a percentage of the number of SNP, used to check parent-progeny conflicts.
Default r is 1%.

--excl_thr_nb <n>

Set the exclusion threshold, as a number of SNP, used to check parent-progeny conflicts.
Default n is 1.

--assign_thr_prob <r>

Set the threshold, as a percentage of the number of SNP, used to assign a parent to progeny.
Default r is 0.5%.

--assign_thr_nb <n>

Set the threshold, as a number of SNP, used to assign a parent to progeny.
Default n is 1.

--thr_call_rate <cr>

Set the call rate threshold used to exclude samples.
Default cr is 0.90.
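The call rate of a sample is the proportion of its SNP with a valid call. A minimal sketch is given below; it assumes 5 is the no-call code and that a sample is kept when its call rate is at or above the threshold (the handling of the exact boundary is an assumption, and the function names are hypothetical).

```python
def call_rate(genotype, missing="5"):
    """Proportion of SNP in a genotype string that are not no-calls."""
    if not genotype:
        raise ValueError("empty genotype string")
    called = sum(1 for g in genotype if g != missing)
    return called / len(genotype)


def passes_call_rate(genotype, threshold=0.90, missing="5"):
    """True if the sample would be kept under the assumed rule."""
    return call_rate(genotype, missing) >= threshold
```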

--trio

Sire and dam information will be used jointly to check Mendelian conflicts.
This allows markers other than homozygous ones to be used as well.
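The reason a trio check can use heterozygous markers is that, with both parents known, the set of possible offspring genotypes is fully determined. The sketch below illustrates this with Mendelian rules only; it is not the program's code, it assumes 0/1/2 coding with 5 as no-call, and it skips any SNP missing in one of the three samples.

```python
# Alleles (0 or 1 copies) that a parent of each genotype can transmit.
GAMETES = {"0": {0}, "1": {0, 1}, "2": {1}}


def trio_conflicts(offspring, sire, dam, missing="5"):
    """Count SNP where the offspring genotype is impossible given both parents."""
    conflicts = 0
    for o, s, d in zip(offspring, sire, dam):
        if missing in (o, s, d):
            continue  # assumed: skip SNP with any no-call in the trio
        possible = {a + b for a in GAMETES[s] for b in GAMETES[d]}
        if int(o) not in possible:
            conflicts += 1
    return conflicts
```

For example, a heterozygous offspring (1) of two parents that are both 0 is a conflict in the trio check, although neither parent-offspring pair shows opposing homozygotes.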

--alpha_size <n>

Change the maximum number of characters for alphanumeric identifications.

--maxsnp <n>

Set the maximum string length for reading marker data from the file; only necessary if the number of SNP is greater than 400,000.

--no_print_not_match

Exclude No-Match cases in output files for Seek parent options.

--full_log_checks

A full description of the checks will be provided in the output files.

--duplicate [thr]

Check for duplicate samples based on a modified Hamming distance.
The optional parameter thr changes the threshold used to identify duplicate samples (default 0.9).
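The exact modification of the Hamming distance is not described here. One plausible reading, shown only as an illustration, is a similarity equal to the proportion of identical calls among SNP called in both samples, with pairs at or above thr flagged as duplicates. The function name, the 5 no-call code, and the comparison rule are all assumptions.

```python
def genotype_similarity(a, b, missing="5"):
    """Proportion of identical calls among SNP called in both samples.

    This is a stand-in for the program's modified Hamming distance,
    whose exact definition is not given in this document.
    """
    compared = 0
    same = 0
    for x, y in zip(a, b):
        if x == missing or y == missing:
            continue
        compared += 1
        if x == y:
            same += 1
    return same / compared if compared else 0.0


def looks_duplicate(a, b, thr=0.9):
    """Flag a pair as a possible duplicate under the assumed rule."""
    return genotype_similarity(a, b) >= thr
```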

--find_duplicate <file> [thr]

Check for duplicate samples only for the individuals in the specified file, comparing them against the full genotype file.
The optional parameter thr changes the threshold used to identify duplicate samples (default 0.9).

Chip and SNP information

Chips with different numbers of SNP can be used in the analyses.
In that case the second column of the genotype file must indicate the chip number, and a map file must be provided to map SNP to chips.
Each sample in the genotype file should contain only the SNP present on its chip (see the example below).

--chips <file>

This option should be used if more than one chip is used and/or to select the SNP to be used in the analyses.
<file> is the name of the map file, which should contain the following named columns: SNP_ID, CHR, POS, and CHIP1, CHIP2..CHIPn.
Other columns may be present in the file.
The first line must contain the column names.

If more than one chip is used, the number of SNP in common between two samples (based on both chips) will be used to check conflicts and to discover parents.

Example

Consider a genotype file with 4 samples and 3 chips with the following number of SNP:
Chip 1: 40 SNP
Chip 2: 14 SNP
Chip 3: 20 SNP

Genotype file

1353 1  2110101100201201101101011011111121111121
8014 1  2111010151110112022111011151111210111221
516  2  2110510120
181  3  11101111122011205502

Map file

SNP_ID	Chr	pos	chip1	chip2	chip3
SNP_1	1	135098	1	1	1
SNP_2	1	267940	2	0	2
SNP_3	1	305793	3	2	3
SNP_4	1	353745	4	0	0
SNP_5	1	393248	5	0	4
SNP_6	1	434180	6	0	5
SNP_7	1	471078	7	0	0
SNP_8	1	516404	8	3	6
SNP_9	1	533815	9	4	0
SNP_10	1	571340	10	0	7
SNP_11	1	654413	11	0	8
SNP_12	1	845494	12	5	0
SNP_13	1	883895	13	6	9
SNP_14	1	905632	14	0	10
SNP_15	1	929617	15	0	0
SNP_16	2	353763	16	7	11
SNP_17	2	393266	17	0	0
SNP_18	2	434198	18	8	12
SNP_19	2	471096	19	0	13
SNP_20	2	516422	20	0	14
SNP_21	2	533833	21	0	0
SNP_22	2	571358	22	9	15
SNP_23	2	654431	23	0	0
SNP_24	2	845512	24	0	16
SNP_25	2	883913	25	0	0
SNP_26	2	905650	26	0	0
SNP_27	2	929635	27	0	17
SNP_28	2	353781	28	10	0
SNP_29	2	393284	29	0	0
SNP_30	2	434216	30	0	0
SNP_31	3	393253	31	0	18
SNP_32	3	434185	32	0	0
SNP_33	3	471083	33	11	0
SNP_34	3	516409	34	0	0
SNP_35	3	533820	35	12	19
SNP_36	3	571345	36	0	0
SNP_37	3	654418	37	13	0
SNP_38	3	845499	38	0	20
SNP_39	3	883900	39	0	0
SNP_40	3	905637	40	14	0
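Reading the example above, each chip column appears to give the position of the SNP within the genotype string of that chip, with 0 meaning the SNP is not on the chip. Under that reading, the SNP shared by two chips can be found as sketched below. Only the first 10 map rows are embedded to keep the example short, and the function names are hypothetical.

```python
MAP_TEXT = """\
SNP_ID Chr pos chip1 chip2 chip3
SNP_1 1 135098 1 1 1
SNP_2 1 267940 2 0 2
SNP_3 1 305793 3 2 3
SNP_4 1 353745 4 0 0
SNP_5 1 393248 5 0 4
SNP_6 1 434180 6 0 5
SNP_7 1 471078 7 0 0
SNP_8 1 516404 8 3 6
SNP_9 1 533815 9 4 0
SNP_10 1 571340 10 0 7
"""


def read_map(text):
    """Parse a whitespace-separated map with a header line into dict rows."""
    rows = [line.split() for line in text.strip().splitlines()]
    header = [name.lower() for name in rows[0]]
    return [dict(zip(header, row)) for row in rows[1:]]


def common_snp(map_rows, chip_a, chip_b):
    """Return (snp_id, pos_in_a, pos_in_b) for SNP present on both chips.

    Positions are 1-based indexes into each chip's genotype string.
    """
    shared = []
    for row in map_rows:
        pos_a, pos_b = int(row[chip_a]), int(row[chip_b])
        if pos_a > 0 and pos_b > 0:
            shared.append((row["snp_id"], pos_a, pos_b))
    return shared


if __name__ == "__main__":
    for snp in common_snp(read_map(MAP_TEXT), "chip2", "chip3"):
        print(snp)
```

A comparison between a chip 2 sample and a chip 3 sample would then use only the genotype characters at these paired positions.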

Other options

--only_in_common

Select the SNP in common between all chips to be used in the analyses

--include_snp <file>

Only the list of SNP names in file will be included in the analyses

--exclude_snp <file>

The list of SNP names in file will be excluded from analyses

--include_chr <n1 n2.. n>

Only SNP in Chromosomes n1 n2, etc will be included in the analyses

--exclude_chr <n1 n2.. n>

SNP in Chromosomes n1 n2, etc will be excluded from analyses

--chr_x <n>

Indicates that chromosome n is the X chromosome; its SNP will be excluded from the check of parent-progeny conflicts and from parentage discovery.

Output files

Check_<pedigree_file_name>

Contains the modified pedigree file (animal, sire, dam, yob), with 0 for a non-match genotyped sire or dam.
The last 2 columns indicate the result of the parentage check, for sire and dam respectively:
- “Match”
- “No-Match”
- “par_nogeno” if a parent does not have a genotype in the file

Check_Parent_Pedigree.txt

contains statistics for every pair of animal-parent checked

Animal-Sire              aaaaaaa              bbbbbbb          8    0.0003  1259   218  1452 .9508 .9915 .9432 Match
Animal-Dam               aaaaaaa              ccccccc          4    0.0002   201    28   226 .9921 .9989 .9912 Match

Column format

• 1: type of check (Animal-Sire or Animal-Dam)
• 2: Id of Animal
• 3: Id of parent
• 4: Number of conflicts
• 5: Percentage of conflicts
• 6: Number of NoCall SNP for Animal sample
• 7: Number of NoCall SNP for Parent sample
• 8: Number of NoCall SNP for Animal-Parent
• 9: Call Rate for Animal sample
• 10: Call Rate for Parent sample
• 11: Call Rate for Animal-Parent
• 12: Result of parentage check
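For downstream processing, a line of Check_Parent_Pedigree.txt can be split into the 12 columns listed above. The sketch below assumes whitespace-separated fields and IDs without embedded spaces; the field names are labels chosen for this example, not names used by the program.

```python
FIELDS = (
    "check", "animal", "parent", "n_conflicts", "pct_conflicts",
    "nocall_animal", "nocall_parent", "nocall_pair",
    "call_rate_animal", "call_rate_parent", "call_rate_pair", "result",
)
INT_FIELDS = ("n_conflicts", "nocall_animal", "nocall_parent", "nocall_pair")
FLOAT_FIELDS = ("pct_conflicts", "call_rate_animal", "call_rate_parent", "call_rate_pair")


def parse_check_line(line):
    """Parse one line of Check_Parent_Pedigree.txt into a dict."""
    parts = line.split()
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    record = dict(zip(FIELDS, parts))
    for key in INT_FIELDS:
        record[key] = int(record[key])
    for key in FLOAT_FIELDS:
        record[key] = float(record[key])
    return record
```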

Seek_Sire.txt

contains statistics for parentage check for animal “aaaaaaaaa” with all sires

Seek for parent: aaaaaaaaa              Sire ddddddddd
No-Match: aaaaaaaaa              xxxxxxx              1774    0.0694   114     0   114    0.9955    1.0000    0.9955
No-Match: aaaaaaaaa              yyyyyyy              1961    0.0767   114    21   134    0.9955    0.9992    0.9948
Match:    aaaaaaaaa              bbbbbbb              6    0.0002   114    98   207    0.9955    0.9962    0.9919

Column format

• 1: Result of parentage check
• 2: Id of Animal
• 3: Id of parent
• 4: Number of conflicts
• 5: Percentage of conflicts
• 6: Number of NoCall SNP for Animal sample
• 7: Number of NoCall SNP for Parent sample
• 8: Number of NoCall SNP for Animal-Parent
• 9: Call Rate for Animal sample
• 10: Call Rate for Parent sample
• 11: Call Rate for Animal-Parent

Seek_Dam.txt

Contains statistics for all dams used in the comparison.
The file format is the same as for Seek_Sire.txt.

Assigned_<pedigree_file_name>

Contains the modified pedigree file after the check of parentage and assignment of parents (animal, sire, dam, yob), with 0 for a non-match/not-found genotyped sire or dam.
The last 2 columns indicate the result of the parentage check or assignment, for sire and dam respectively:

1. “Match”
2. “Not-Found”
3. “Assigned”
4. “par_nogeno”: a parent does not have a genotype in the file
5. “Not-Tested”: the search for this type of parent was not requested

xxxxxxx              sssssss              dddddd1                    2013 Match           Not-Tested
yyyyyyy              dddddd2              dddddd2                    2013 Assigned        Not-Tested
zzzzzzz              0                    dddddd3                    2013 Not-Found       Not-Tested
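A quick summary of the sire and dam results in an Assigned_<pedigree_file_name> file can be obtained by tallying the last two columns. This is a sketch that assumes whitespace-separated lines with the two status columns last, as in the rows above; the function name is hypothetical.

```python
from collections import Counter


def tally_status(lines):
    """Count sire and dam status codes from the last two columns."""
    sire_status, dam_status = Counter(), Counter()
    for line in lines:
        parts = line.split()
        if len(parts) < 5:
            continue  # skip blank or incomplete lines
        sire_status[parts[-2]] += 1
        dam_status[parts[-1]] += 1
    return sire_status, dam_status


if __name__ == "__main__":
    rows = [
        "xxxxxxx sssssss dddddd1 2013 Match Not-Tested",
        "yyyyyyy dddddd2 dddddd2 2013 Assigned Not-Tested",
        "zzzzzzz 0 dddddd3 2013 Not-Found Not-Tested",
    ]
    print(tally_status(rows))
```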