SeekParentF90

SeekParentF90 is a program to check and assign paternity using genomic information.

Ignacio Aguilar, INIA Las Brujas, Uruguay
email: iaguilar at inia.org.uy

03/27/13 - 07/21/17

Summary

SeekParentF90 detects parent-offspring incompatibilities based on counts of Mendelian conflicts, as described in Hayes (2011, JDS) and Wiggans et al. (2010, JDS).

It was originally implemented as the option verify_parentage in the PreGSf90 program.

Pedigree files do not need to be in any particular order.

Alphanumeric identifications of individuals in the pedigree and marker files are supported, with a default length of 20 characters (see below to change it).

Depending on the number of markers, different thresholds are used to check conflicts or to assign a parent.

For marker files with fewer than 130 SNP, the thresholds are based on the number of conflicts (e.g. 1); for marker files with more SNP, they are based on a percentage of the total number of SNP (e.g. 1%).
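
The rule above can be sketched in Python as follows. This is only an illustration: the function and parameter names are hypothetical, the 130-SNP cutoff and the values 1 and 1% are taken from the text, and the program's internal logic may differ in detail.

```python
def conflict_threshold(n_snp, thr_nb=1, thr_prob=0.01, cutoff=130):
    """Return the number of conflicts above which a pair is rejected.

    Hypothetical helper, not part of SeekParentF90.
    """
    if n_snp < cutoff:
        # Small panels: the threshold is an absolute number of conflicts.
        return thr_nb
    # Larger panels: the threshold is a fraction of the number of SNP.
    return thr_prob * n_snp


def is_non_match(n_conflicts, n_snp, **kwargs):
    # The pair is rejected when conflicts exceed the threshold.
    return n_conflicts > conflict_threshold(n_snp, **kwargs)
```

With these assumptions, 2 conflicts on a 100-SNP panel would be rejected, while 8 conflicts on a 2000-SNP panel (threshold 20) would not.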

Usage

seekparentf90 --pedfile <pedigree_file_name> --snpfile <snp_file> [ ... ]

To use the program, two command-line arguments are needed: --pedfile and --snpfile, which provide the file names of the pedigree and marker files.
The pedigree file is assumed to have 3 fields with identification for animal, sire and dam separated by at least one space.

The SNP file should contain two columns: the first with the individual identification and the second with the marker information.
The second column should start in the same position for all rows.
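
As a hedged illustration of this layout, the Python sketch below reads two-column rows and verifies that the marker string starts at the same character position in every row. The function name is hypothetical, and it assumes that identifications contain no blanks.

```python
def read_snp_lines(lines):
    """Parse 'ID genotypes' rows and return {id: genotype_string}.

    Raises ValueError if the genotype column does not start at the
    same character position in every row.
    """
    genotypes = {}
    start = None
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        fields = line.split()
        if len(fields) != 2:
            raise ValueError(f"expected 2 columns: {line!r}")
        animal_id, geno = fields
        # Column where the genotype string (the last field) starts.
        pos = line.rindex(geno)
        if start is None:
            start = pos
        elif pos != start:
            raise ValueError(
                f"genotypes of {animal_id} start at column {pos}, expected {start}"
            )
        genotypes[animal_id] = geno
    return genotypes
```

For example, the rows "anim1     2110" and "anim22    0121" are accepted, because both marker strings start at the same column.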

For each genotyped animal in the pedigree with a genotyped parent, the relationship will be checked.

A non-match is declared if the number of Mendelian conflicts is greater than the threshold.

The program can also search for a putative parent based on genotypes and year of birth.
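
A minimal Python sketch of counting Mendelian conflicts for one parent-offspring pair is given below. It assumes genotypes coded as 0, 1, 2 (copies of one allele) with any other character treated as missing, and it counts a conflict only when parent and offspring are opposite homozygotes. The function name is hypothetical and the program's own counting rules may differ.

```python
def count_conflicts(parent, offspring):
    """Count opposing-homozygote conflicts between two genotype strings.

    Returns (n_conflicts, n_compared), where n_compared is the number
    of markers homozygous and called in both individuals.
    """
    if len(parent) != len(offspring):
        raise ValueError("genotype strings must have the same length")
    conflicts = 0
    compared = 0
    for p, o in zip(parent, offspring):
        if p not in "02" or o not in "02":
            continue  # heterozygous or missing: not informative here
        compared += 1
        if p != o:
            conflicts += 1  # 0 versus 2 cannot be parent and offspring
    return conflicts, compared
```

The number of conflicts returned would then be compared against the threshold to declare a match or a non-match.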

Optional arguments

--only_in_list <list_file>

This option will restrict calculations to individuals present in the <list_file>.

--yob

Indicates that the year of birth should be read from the 4th column of the pedigree file.
If yob information is present, it will be used to validate a putative parent.

--seeksire <sire_file>

Specifies a file with the list of sires that will be used to search for a parent.

--seekdam <dam_file>

Specifies a file with the list of dams that will be used to search for a parent.

--seeksire_in_ped

Create a list of genotyped sires from the pedigree and use it as a list to search for a parent.

--seekdam_in_ped

Create a list of genotyped dams from the pedigree and use it as a list to search for a parent

--seektype <n>

Sets which animals will be used to search for a parent.

Codes:

--excl_thr_prob <r>

Sets the exclusion probability, as a percentage of the number of SNP, used to check parent-progeny conflicts.
Default r is 1%.

--excl_thr_nb <n>

Sets the number of SNP used to check parent-progeny conflicts.
Default n is 1.

--assign_thr_prob <r>

Sets the exclusion probability, as a percentage of the number of SNP, used to assign a parent to progeny.
Default r is 0.5%.

--assign_thr_nb <n>

Sets the number of SNP used to assign a parent to progeny.
Default n is 1.

--thr_call_rate <cr>

Sets the call rate threshold used to exclude samples.
Default cr is 0.90.
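
To make the call rate concrete, the Python sketch below computes it as the fraction of markers with a called genotype. It assumes that the character 5 denotes a missing call and that samples strictly below the threshold are excluded; both points, and the function names, are assumptions rather than documented behaviour.

```python
def call_rate(genotypes, missing="5"):
    """Fraction of markers with a called (non-missing) genotype."""
    if not genotypes:
        raise ValueError("empty genotype string")
    called = sum(1 for g in genotypes if g != missing)
    return called / len(genotypes)


def passes_call_rate(genotypes, thr=0.90):
    # Assumption: a sample is kept when its call rate reaches the threshold.
    return call_rate(genotypes) >= thr
```

Under these assumptions, a sample with 1 missing call out of 4 markers has a call rate of 0.75 and would be excluded with the default threshold.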

--trio

Sire and dam information will be used to check Mendelian conflicts.
This allows markers other than homozygous ones to be used.
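
The idea behind a trio check can be sketched in Python as below: for each marker, the offspring genotype must be obtainable from one allele transmitted by the sire and one by the dam. The coding 0, 1, 2 with 5 for a missing call is an assumption, the function names are hypothetical, and this is not necessarily how the program implements the option.

```python
def gametes(genotype):
    """Allele counts (0 or 1) that a parent with this genotype can transmit."""
    return {"0": {0}, "1": {0, 1}, "2": {1}}[genotype]


def trio_conflict(sire, dam, offspring, missing="5"):
    """True if the offspring genotype cannot arise from sire and dam."""
    if missing in (sire, dam, offspring):
        return False  # marker is not informative when a call is missing
    possible = {s + d for s in gametes(sire) for d in gametes(dam)}
    return int(offspring) not in possible


def count_trio_conflicts(sire, dam, offspring):
    """Number of markers in conflict across three genotype strings."""
    return sum(trio_conflict(s, d, o) for s, d, o in zip(sire, dam, offspring))
```

For instance, a heterozygous offspring (1) with sire 0 and dam 0 is a conflict that a check on a single parent using only homozygous markers would not detect.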

--alpha_size <n>

Changes the maximum length (in characters) of alphanumeric identifications.

--maxsnp <n>

Sets the maximum length of the string used to read marker data from the file; only necessary if the number of SNP is greater than 400,000.

--no_print_not_match 

Exclude No-Match cases in output files for Seek parent options.

--full_log_checks

A full description of the checks will be provided in the output files.

--duplicate [thr]

Checks for duplicate samples based on a modified Hamming distance.
The optional parameter thr changes the default threshold used to identify duplicate samples (default 0.9).

--find_duplicate <file> [thr]

Checks for duplicate samples only for the individuals in the specified file, comparing them across the full genotype file.
The optional parameter thr changes the default threshold used to identify duplicate samples (default 0.9).
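
The exact modification of the Hamming distance is not described here. As a loose illustration only, the Python sketch below uses a stand-in measure: the share of identical calls among markers called in both samples, with pairs at or above the threshold reported as duplicates. The coding of 5 as missing, the function names and the measure itself are assumptions.

```python
def genotype_similarity(a, b, missing="5"):
    """Share of identical calls among markers called in both samples."""
    if len(a) != len(b):
        raise ValueError("genotype strings must have the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != missing and y != missing]
    if not pairs:
        return 0.0  # nothing to compare
    return sum(x == y for x, y in pairs) / len(pairs)


def find_duplicates(samples, thr=0.9):
    """Return the ID pairs whose similarity is at least thr."""
    ids = sorted(samples)
    return [
        (i, j)
        for k, i in enumerate(ids)
        for j in ids[k + 1:]
        if genotype_similarity(samples[i], samples[j]) >= thr
    ]
```

Restricting the first member of each pair to a given list of individuals would mimic the behaviour described for --find_duplicate.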

Chip and SNP information

Chips with different numbers of SNP can be used in the analyses.
In that case the genotype file must have a second column indicating the chip number, and a map file must be provided to map SNP to chips.
Each sample in the genotype file should contain only the SNP present on its chip (see the example below).

--chips <file>

This option should be used if more than one chip is used and/or to select the SNP to be used in the analyses.
<file> is the name of the map file, which should contain the following named columns: SNP_ID, CHR, POS, and CHIP1, CHIP2..CHIPn.
Other columns may be present in the file.
The first line must contain the column names.

If more than one chip is used, the SNP in common between two samples (based on both chips) will be used for checking conflicts and for discovering parents.

Example

Consider a genotype file with 4 samples and 3 chips with the following number of SNP:
Chip 1: 40 SNP
Chip 2: 14 SNP
Chip 3: 20 SNP

Genotype file

 1353 1  2110101100201201101101011011111121111121
 8014 1  2111010151110112022111011151111210111221
 516  2  2110510120
 181  3  11101111122011205502

Map file

SNP_ID  Chr     pos     chip1   chip2   chip3
SNP_1   1       135098  1       1       1
SNP_2   1       267940  2       0       2
SNP_3   1       305793  3       2       3
SNP_4   1       353745  4       0       0
SNP_5   1       393248  5       0       4
SNP_6   1       434180  6       0       5
SNP_7   1       471078  7       0       0
SNP_8   1       516404  8       3       6
SNP_9   1       533815  9       4       0
SNP_10  1       571340  10      0       7
SNP_11  1       654413  11      0       8
SNP_12  1       845494  12      5       0
SNP_13  1       883895  13      6       9
SNP_14  1       905632  14      0       10
SNP_15  1       929617  15      0       0
SNP_16  2       353763  16      7       11
SNP_17  2       393266  17      0       0
SNP_18  2       434198  18      8       12
SNP_19  2       471096  19      0       13
SNP_20  2       516422  20      0       14
SNP_21  2       533833  21      0       0
SNP_22  2       571358  22      9       15
SNP_23  2       654431  23      0       0
SNP_24  2       845512  24      0       16
SNP_25  2       883913  25      0       0
SNP_26  2       905650  26      0       0
SNP_27  2       929635  27      0       17
SNP_28  2       353781  28      10      0
SNP_29  2       393284  29      0       0
SNP_30  2       434216  30      0       0
SNP_31  3       393253  31      0       18
SNP_32  3       434185  32      0       0
SNP_33  3       471083  33      11      0
SNP_34  3       516409  34      0       0
SNP_35  3       533820  35      12      19
SNP_36  3       571345  36      0       0
SNP_37  3       654418  37      13      0
SNP_38  3       845499  38      0       20
SNP_39  3       883900  39      0       0
SNP_40  3       905637  40      14      0
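
The map file above can be read as follows: each chip column appears to give the position of the SNP within that chip's genotype string, with 0 meaning the SNP is absent from the chip. Under that reading, a Python sketch for finding the SNP shared by two chips and extracting the matching genotypes could look like this (function names are hypothetical, and only the first five map rows are used to keep the example short).

```python
def parse_map(lines):
    """Parse a map file with a header row into a list of row dicts."""
    header = [name.upper() for name in lines[0].split()]
    return [dict(zip(header, line.split())) for line in lines[1:] if line.strip()]


def common_positions(rows, chip_a, chip_b):
    """1-based positions (pos_a, pos_b) of the SNP present on both chips."""
    col_a, col_b = f"CHIP{chip_a}", f"CHIP{chip_b}"
    return [
        (int(r[col_a]), int(r[col_b]))
        for r in rows
        if int(r[col_a]) > 0 and int(r[col_b]) > 0
    ]


def align(geno_a, geno_b, positions):
    """Extract the genotypes of the shared SNP from each sample's string."""
    sub_a = "".join(geno_a[i - 1] for i, _ in positions)
    sub_b = "".join(geno_b[j - 1] for _, j in positions)
    return sub_a, sub_b


map_lines = [
    "SNP_ID  Chr     pos     chip1   chip2   chip3",
    "SNP_1   1       135098  1       1       1",
    "SNP_2   1       267940  2       0       2",
    "SNP_3   1       305793  3       2       3",
    "SNP_4   1       353745  4       0       0",
    "SNP_5   1       393248  5       0       4",
]
rows = parse_map(map_lines)
shared_1_3 = common_positions(rows, 1, 3)
```

Two samples genotyped on chips 1 and 3 would then be compared only at the positions in shared_1_3, after passing their genotype strings through align.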

Other options

--only_in_common

Select the SNP in common between all chips to be used in the analyses

--include_snp <file>

Only the list of SNP names in file will be included in the analyses

--exclude_snp <file> 

The list of SNP names in file will be excluded from analyses

--include_chr <n1 n2.. n>

Only SNP in Chromosomes n1 n2, etc will be included in the analyses

--exclude_chr <n1 n2.. n>

SNP in Chromosomes n1 n2, etc will be excluded from analyses
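
The effect of the two chromosome options can be pictured with the Python sketch below, which filters map rows by their CHR value. It is an illustration only; the function name is hypothetical and rows are assumed to be dictionaries with SNP_ID and CHR keys.

```python
def select_snp(rows, include_chr=None, exclude_chr=None):
    """Return the SNP_ID of the map rows kept after chromosome filtering."""
    include = set(map(str, include_chr)) if include_chr else None
    exclude = set(map(str, exclude_chr or []))
    kept = []
    for row in rows:
        chrom = str(row["CHR"])
        if include is not None and chrom not in include:
            continue  # not in the list of chromosomes to include
        if chrom in exclude:
            continue  # explicitly excluded chromosome
        kept.append(row["SNP_ID"])
    return kept
```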

--chr_x <n>

Indicates that chromosome n is the X chromosome; its SNP will be excluded from the check of parent-progeny conflicts and from parentage discovery.

Output files

Check_<pedigree_file_name>

contains the modified pedigree file (animal, sire, dam, yob), with 0 for a non-matching genotyped sire or dam. The last 2 columns indicate the result of the parentage check, for sire and dam respectively:
- “Match”
- “No-Match”
- “par_nogeno”: when a parent does not have a genotype in the file

Check_Parent_Pedigree.txt

contains statistics for every animal-parent pair checked

   Animal-Sire              aaaaaaa              bbbbbbb          8    0.0003  1259   218  1452 .9508 .9915 .9432 Match     
   Animal-Dam               aaaaaaa              ccccccc          4    0.0002   201    28   226 .9921 .9989 .9912 Match 

Column format

Seek_Sire.txt

contains statistics for parentage check for animal “aaaaaaaaa” with all sires

Seek for parent: aaaaaaaaa              Sire ddddddddd           
    No-Match: aaaaaaaaa              xxxxxxx              1774    0.0694   114     0   114    0.9955    1.0000    0.9955
    No-Match: aaaaaaaaa              yyyyyyy              1961    0.0767   114    21   134    0.9955    0.9992    0.9948
    Match:    aaaaaaaaa              bbbbbbb              6    0.0002   114    98   207    0.9955    0.9962    0.9919

Column format

Seek_Dam.txt

contains statistics for all dams used in the comparison.
The file format is the same as for Seek_Sire.txt.

Assigned_<pedigree_file_name>

contains the modified pedigree file after the parentage check and the assignment of parents (animal, sire, dam, yob), with 0 for a non-matching or not-found genotyped sire or dam.
The last 2 columns indicate the result of the parentage check or assignment, for sire and dam respectively:

  1. “Match”
  2. “Not-Found”
  3. “Assigned”
  4. “par_nogeno”: when a parent does not have a genotype in the file
  5. “Not-Tested”: a search for this type of parent was not requested

Example

xxxxxxx              sssssss              dddddd1                    2013 Match           Not-Tested     
yyyyyyy              dddddd2              dddddd2                    2013 Assigned        Not-Tested        
zzzzzzz              0                    dddddd3                    2013 Not-Found       Not-Tested
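
As a small, hedged example of post-processing this file, the Python sketch below tallies the status labels found in the last two columns. It assumes whitespace-separated fields as in the rows shown above; the function name is hypothetical.

```python
from collections import Counter


def tally_status(lines):
    """Count the sire and dam status labels in the last two columns."""
    sire, dam = Counter(), Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        sire[fields[-2]] += 1
        dam[fields[-1]] += 1
    return sire, dam


example = [
    "xxxxxxx   sssssss   dddddd1   2013 Match      Not-Tested",
    "yyyyyyy   dddddd2   dddddd2   2013 Assigned   Not-Tested",
    "zzzzzzz   0         dddddd3   2013 Not-Found  Not-Tested",
]
sire_counts, dam_counts = tally_status(example)
```

Here sire_counts holds one occurrence each of Match, Assigned and Not-Found, and dam_counts holds three occurrences of Not-Tested.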