RENUMF90

A renumbering program for the BLUPF90 family; now works with SNP info
Ignacy Misztal and Ignacio Aguilar, University of Georgia
August 27, 2001 - Mar 17, 2011

Summary

RENUMF90 is a renumbering program for the BLUPF90 family of programs. It supports multiple traits, different effects per trait, and both alphanumeric and numeric fields. The program provides data statistics, performs comprehensive pedigree checking, and supports unknown parent groups.

It accepts data and pedigree files whose fields are separated by spaces. The program is still in active development, so errors are possible and some features may not work or may work incorrectly.

Warnings

Input files cannot contain the character #.
Missing animals have code 0; 00 may be treated as a known animal.
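
Both warnings can be checked before running the program. The following Python sketch is not part of RENUMF90; the function names and the default sire and dam positions (2 and 3) are assumptions made for illustration.

```python
def find_hash_lines(lines):
    """Return 1-based numbers of the lines that contain the character '#'."""
    return [number for number, line in enumerate(lines, start=1) if "#" in line]


def suspicious_parent_codes(lines, sire_pos=2, dam_pos=3):
    """Return (line number, code) pairs for parent codes such as '00'.

    A code made only of zeros but different from '0' may be read as a
    known animal, so it is reported here.
    """
    found = []
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        for pos in (sire_pos, dam_pos):
            if pos <= len(fields):
                code = fields[pos - 1]
                if code != "0" and set(code) == {"0"}:
                    found.append((number, code))
    return found
```

Applied to the lines of a pedigree file, an empty result from both functions means neither warning applies.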

Hint: type renumf90 --show-template to obtain a template parameter file.

Structure of parameter file

The parameter file contains keywords in capital letters, each followed by the specifications of a given effect or data item.
The keywords need to be typed exactly.
Specific keywords need to occur sequentially, as shown below.

Bugs

IDs starting with “-” may not work

Fields in the parameter file

# Parameter file for renumf90. It is translated into a parameter file for the BLUPF90 family of programs.

Lines with # are treated as comments

DATAFILE
f1

The data file is f1

SKIP_HEADER
n

This is optional. It skips the first n lines as header in the data file.

TRAITS
t1 t2 .. tn

t1-tn are positions of traits in datafile; n defines the number of traits

FIELDS_PASSED TO OUTPUT
p1 p2 .. pm

fields p1-pm are passed to output without changes; can be empty

WEIGHT(S)
w1 [w2 w3...wn]

w1 [w2 w3…wn] are positions of weight(s) if present; the line can be empty (which means a weight of 1, i.e., no weighting). Either there is a single weight, the same for all traits, or one weight per trait, in which case one position per trait is needed in this line. See blupf90+ section on WEIGHTS for details.

RESIDUAL_VARIANCE
r

r is matrix of residual (co)variances of size n x n

EFFECT
e1.. en type form

this line defines one group of effects; e1 .. en are positions of this effect for all traits;

positions can be different for each trait for fixed effects;

for random effects, only one position plus 0 (missing effect) is possible.

type is 'cross' for crossclassified or 'cov' for covariables
for crossclassified effects: form is 'alpha' for alphanumeric or 'numer' for numeric
for covariables: neither 'alpha' nor 'numer' is needed; don't put anything after 'cov'

NESTED
d1 .. dn form

optional for covariables only, specifies nesting;

form is as above

RANDOM
rtype

the RANDOM keyword occurs only if the current effect is random; rtype is:

'diagonal'
'sire'; not yet implemented
'animal'

OPTIONAL
o1 o2.. oq

causes extra effects appended to the animal effect;

current options include:

'pe' for permanent environment
'mat' for maternal
'mpe' for maternal permanent environment; only if 'mat' is used

FILE
fped

for animal and sire model only

fped specifies the pedigree file

SKIP_HEADER
n

This is optional. It skips the first n lines as header in the pedigree file.

FILE_POS
an s d alt_dam yob

for animal effect only;

specifies positions in the pedigree file of animal (an), sire (s), dam (d), alternate dam (alt_dam), and year of birth (yob)

missing alt_dam or yob can be replaced by 0

if this line is not given, defaults are 1 2 3 0 0.

If a maternal effect is specified, it is assigned to the dam in position d if the alt_dam field is 0, and to alt_dam otherwise;

If the alt_dam field is not zero, it should contain the ID of the real or recipient dam.

SNP_FILE
fsnp

optional;
fsnp specifies files with ID and SNP information;

if present, the relationship matrix will be constructed as in Aguilar et al. (2010) and will include the genomic information;

each line of fsnp should start with the ID, in the same format as in fped; the SNP info needs to start from a fixed column and include the digits 0, 1, 2 and 5;

ID and SNP info need to be separated by at least one space; see info for program PreGSf90
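
The layout rules above can be checked with a short script. This is a minimal Python sketch, not code from RENUMF90 or PreGSf90; the function name is hypothetical, and it assumes that the SNP string is the last item on the line and that only the digits 0, 1, 2 and 5 are allowed.

```python
def check_snp_lines(lines):
    """Return (line number, message) pairs for lines that break the layout rules."""
    problems = []
    start_column = None  # column of the SNP string on the first valid line
    for number, line in enumerate(lines, start=1):
        parts = line.split()
        if len(parts) != 2:
            problems.append((number, "expected an ID and one SNP string"))
            continue
        snp = parts[1]
        # The SNP string is the last item, so its start column follows
        # from the length of the line without trailing whitespace.
        column = len(line.rstrip()) - len(snp)
        if start_column is None:
            start_column = column
        elif column != start_column:
            problems.append((number, "SNP string does not start in the same column"))
        if set(snp) - set("0125"):
            problems.append((number, "SNP string contains codes other than 0, 1, 2, 5"))
    return problems
```

An empty list means that every line has one ID, one SNP string starting in the same column, and no unexpected codes.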

PED_DEPTH
p

optional

for animal effect only;

p specifies the depth of pedigree search;

the default is 3

all pedigrees are loaded if p=0. This is the fastest option because the pedigree file is read only once. However, to extract only the informative animals (genotyped and phenotyped animals plus their ancestors traced back), use a large number such as 100. With p=0, RENUMF90 tries to include all animals found in the raw pedigree file, even those unrelated to the animals with phenotypes or genotypes. Thus, p=0 is not recommended unless the pedigree file has already been prepared and consists only of the informative animals or the animals of interest.
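
The effect of the search depth can be illustrated with a small Python sketch. It is not the algorithm used by RENUMF90; the function name is hypothetical, and it assumes that one unit of depth adds one generation of ancestors and that p=0 keeps every animal in the pedigree file.

```python
def trace_ancestors(pedigree, start_ids, depth):
    """Return the IDs kept when ancestors are traced back 'depth' generations.

    pedigree maps an animal ID to a (sire, dam) pair; '0' is a missing parent.
    """
    if depth == 0:
        return set(pedigree)  # assumption: p=0 keeps the whole pedigree file
    kept = set(start_ids)
    frontier = set(start_ids)
    for _ in range(depth):
        parents = set()
        for animal in frontier:
            for parent in pedigree.get(animal, ("0", "0")):
                if parent != "0" and parent not in kept:
                    parents.add(parent)
        if not parents:
            break  # no new ancestors were found
        kept |= parents
        frontier = parents
    return kept


# Pedigree from the Example section (test.ped)
PEDIGREE = {
    "qq": ("0", "0"),
    "aa": ("0", "0"),
    "bb": ("qq", "aa"),
    "cc": ("qq", "0"),
    "dd": ("0", "aa"),
}
```

With this pedigree, starting from animal cc with a large depth keeps only cc and qq, whereas depth 0 keeps all five animals.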

GEN_INT
min avg max

optional

specifies minimum, average and maximum generation interval

applicable only if year of birth present; minimum and maximum used for pedigree checks

average used to predict year of birth of parent with missing pedigree.
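
The role of the three values can be sketched in Python. The exact rules applied by RENUMF90 are not described here, so both functions are hypothetical illustrations: a parent is assumed to be consistent when the difference in years of birth lies between min and max, and a missing year is assumed to be predicted by subtracting avg.

```python
def interval_is_consistent(child_yob, parent_yob, min_int, max_int):
    """Return True if the parent was born between min_int and max_int years before the child."""
    return min_int <= child_yob - parent_yob <= max_int


def predict_parent_yob(child_yob, avg_int):
    """Predict the year of birth of a parent from that of its progeny."""
    return child_yob - avg_int
```

For example, with GEN_INT values of 2 5 12, a parent born in 2000 is consistent with progeny born in 2005 but not with progeny born in 2001.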

REC_SEX
i

optional
if only one sex has records, specifies which parent it is

used for pedigree checks.

UPG_TYPE
t

optional

if t is 'yob', the assignment is based on year of birth; the subsequent line should contain list of years to separate different UPG;

if t is 'in_pedigrees', the value of a missing parent should be -x, where x is UPG number that this missing parent should be allocated to; in this option, all known parents should have pedigree lines, i.e., each parent field should contain either the ID of a real parent, or a negative UPG number.

if t is 'internal', allocation is by a user-written function custom_upg(year_of_birth,sex,ID, parent_code); not yet implemented.

Other values for t are 'group', 'group_unisex', and 'group_sex'. See below for details.
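
The 'yob' and 'in_pedigrees' assignments can be sketched in Python. This is an illustration only: the function names are hypothetical, and for 'yob' it is assumed that k separating years define k+1 groups and that a year equal to a separating year falls into the later group.

```python
from bisect import bisect_right


def upg_from_yob(year_of_birth, separating_years):
    """Return a UPG number (from 1) for a year of birth, given the separating years."""
    return bisect_right(sorted(separating_years), year_of_birth) + 1


def upg_from_parent_code(parent_code):
    """Return the UPG number for a code such as '-3', or None for any other code."""
    if parent_code.startswith("-") and parent_code[1:].isdigit():
        return int(parent_code[1:])
    return None
```

With separating years 1990 and 2000, the years 1985, 1990 and 2005 fall into groups 1, 2 and 3 under these assumptions.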

INBREEDING
inb_type

optional
uses inbreeding coefficients to compute the inb/upg code in the 4th column of the output pedigree file. Inbreeding calculation is the default in RENUMF90 ≥ v1.157, even if this keyword is not used.

inb_type could be:

'pedigree' - the program computes inbreeding coefficients with the method of Meuwissen and Luo (1992), using the pedigree to be saved in renaddxx.ped; the calculated inbreeding coefficients are saved in a file “renf90.inb” with the original ID
'file' - the program reads inbreeding coefficients from an external file. Put the filename after 'file', e.g. 'file inbreeding.txt'. For instance, a file “renf90.inb” (see above) from a previous run can be used. The file has at least 2 columns: original_ID and inbreeding value (from 0.0 to 1.0). The program skips unnecessary IDs.
'self x' - calculates inbreeding with selfing, where x is the column in the pedigree file with the number of selfing generations
'no-inbreeding' - turns inbreeding calculation off in RENUMF90 ≥ v1.157

FIXED_REGRESSION 
r_type

It is the same as RANDOM_REGRESSION (see the explanation below) but it is effective only for fixed effects.

RANDOM_REGRESSION 
r_type

Specifies that random regressions should be applied to the animal and corresponding random effects (mat, pe and mpe) or the diagonal random effect.

this keyword can also be applied to set covariables for fixed effects;

r_type could be:

'data' if covariables for random regressions are in the data
'legendre' if Legendre polynomials are to be generated from a single data variable; fully implemented now

RR_POSITION
r1 .. rq

for random regressions,

r1-rq specifies positions of covariables if r_type='data'
r1 is the order of the Legendre polynomial and r2 is the position of the covariable if r_type='legendre'
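
As an illustration of what generating Legendre polynomials from a single variable involves, the following Python sketch evaluates the polynomials of order 0 to r1 for one value of the covariable. It is not the RENUMF90 implementation: the function name is hypothetical, the covariable is assumed to be rescaled to the range -1 to 1 from its minimum and maximum, and the standard (non-normalized) polynomials are used, so the scaling may differ from what the program writes.

```python
def legendre_covariables(value, minimum, maximum, order):
    """Return [P_0(x), ..., P_order(x)] with x = value rescaled to [-1, 1]."""
    x = -1.0 + 2.0 * (value - minimum) / (maximum - minimum)
    polys = [1.0]  # P_0(x) = 1
    if order >= 1:
        polys.append(x)  # P_1(x) = x
    for n in range(1, order):
        # Bonnet recurrence: (n+1) P_{n+1}(x) = (2n+1) x P_n(x) - n P_{n-1}(x)
        polys.append(((2 * n + 1) * x * polys[n] - n * polys[n - 1]) / (n + 1))
    return polys
```

Each record then carries order + 1 covariables in place of the single data variable.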

(CO)VARIANCES
g

g are (co)variances for the animal effect

the dimensions of g should account for correlated random effects if present (maternal or random regression)

(CO)VARIANCES_PE
gpe

gpe are (co)variances for the PE effect if present

(CO)VARIANCES_MPE
gmpe

gmpe are (co)variances for the MPE effect if present

User-defined UPG code

See the separate documentation for details.

The program accepts one of the following keywords in UPG_TYPE: group, group_unisex, and group_sex. With one of these options, the program looks at a particular column in the pedigree file as a group code and uses it for assigning the UPG code. If an animal has a missing parent, the program assigns a UPG code based on the group code.

group_unisex: The program assigns a UPG code to the unknown parent regardless of the parent's sex.
group: The program assigns a separate UPG code to the unknown sire and dam.
group_sex: The user can specify a sex-specific UPG in the original pedigree file.

For group_unisex and group, the column in the pedigree file is specified with the 6th item in FILE_POS. The following example tells the program to use the 5th column of the pedigree file as the group code. The group code will be treated as characters.

  FILE_POS
  1 2 3 0 0 5

For group_sex, you need two additional columns in the pedigree file: one for an unknown sire and the other for an unknown dam. For example, if the 5th column is for the unknown sire and the 6th column is for the unknown dam, the FILE_POS entry has 7 items.

  FILE_POS
  1 2 3 0 0 5 6

The program now accepts 3, 5, 6, or 7 items in FILE_POS.
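
The accepted forms of FILE_POS can be summarized with a small Python sketch. It is only an illustration: the function and the field names are hypothetical, and items that are not given are assumed to be 0.

```python
DEFAULT_FILE_POS = [1, 2, 3, 0, 0]
NAMES = ("animal", "sire", "dam", "alt_dam", "yob",
         "group_or_sire_group", "dam_group")


def parse_file_pos(text=None):
    """Return a dict of column positions from a FILE_POS line (defaults if None)."""
    items = [int(item) for item in text.split()] if text else list(DEFAULT_FILE_POS)
    if len(items) not in (3, 5, 6, 7):
        raise ValueError("FILE_POS needs 3, 5, 6, or 7 items, got %d" % len(items))
    items += [0] * (len(NAMES) - len(items))  # items not given are set to 0
    return dict(zip(NAMES, items))
```

For the group_sex example above, parse_file_pos("1 2 3 0 0 5 6") gives position 5 for the unknown-sire group and position 6 for the unknown-dam group.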

Extra comments

Sections starting from EFFECT can be repeated any number of times.

If (Co)variances for any effect are missing, they are substituted with matrices containing 1.0 on diagonals and 0.1 on off-diagonals.
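
The substituted matrix is easy to write down. A minimal Python sketch (the function name is hypothetical):

```python
def default_covariances(size):
    """Return a size x size matrix with 1.0 on the diagonal and 0.1 elsewhere."""
    return [[1.0 if row == col else 0.1 for col in range(size)]
            for row in range(size)]
```

For two traits this gives the same 2 x 2 matrix as the last (CO)VARIANCES block of renf90.par in the Example section, where no (co)variances were given for the diagonal effect.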

Warning: for variance estimation by EM REML, convergence is usually faster if the starting values for (co)variances are too large rather than too small.

The sequence of keywords should be as above although optional fields can be skipped.

Keywords out of order may not be recognized.

Options

The following options can be added at the end of the parameter file to redefine parameters used to read the input file:

- the default size of character fields (default = 20)

OPTION alpha_size nn

where nn is the new size.

- the size of the record length (default = 800)

OPTION max_string_readline nn

where nn is the new size.

- the maximum number of fields (default = 100)

OPTION max_field_readline nn

where nn is the number of fields.

OPTION missing x

allows indicating that the missing value is the number x (e.g., 999), for instance, if 0 is a valid record. This is only to represent the missing value in the data. If there are covariables in the data, 0 is treated as a value, not missing information. Missing pedigree is always 0 and cannot be changed to another value.

OPTION remove_all_missing

removes lines in the data where phenotypes are missing. Keeping those lines may cause unexpected behavior in some programs.

OPTION missing_in_weights

indicates that if the weight for a trait is 0, the value of that trait is converted to “missing” in the output file renf90.dat, i.e., to 0 by default or to the value set with OPTION missing.
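
The rule can be written as a short Python sketch. The function name is hypothetical, and one weight per trait is assumed.

```python
def apply_missing_in_weights(traits, weights, missing=0):
    """Replace a trait value by the missing code when its weight is 0."""
    return [missing if weight == 0 else trait
            for trait, weight in zip(traits, weights)]
```

With OPTION missing 999, the missing argument would be 999 instead of the default 0.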

OPTION no_basic_statistics

skips the computation of basic statistics (min, max, correlations, …), which can take considerable time for very large data files.

OPTION inbreeding_method m

allows choosing a method for inbreeding calculation. The inbreeding coefficients are used later (in the other programs) to set up the coefficients for the A-inverse. Acceptable values for m are:

1: Meuwissen and Luo (1992)
2: Modified Meuwissen & Luo by Sargolzaei & Iwaisaki (2004)
3: Modified Colleau by Sargolzaei et al. (2005) 
4: recursive tabular method 
5: method of Tier (1990)
6: Hybrid parallel computing, which is basically a parallel (OMP) version of Meuwissen and Luo (1992)
7: Recursive tabular with self-breeding generations. For populations with selfing, e.g., wheat

The default is method 1. Large speed-ups are made using method 6, but this requires using several threads (e.g., using OMP_NUM_THREADS=4)

OPTION ped_search complete

By default, renumf90 traces back the ancestors by looping through the pedigree file. This can lead to inconsistencies when animals are added to the pedigree, i.e. some ancestors are skipped. OPTION ped_search complete traces back the ancestors completely and avoids these problems.

The end of the parameter file for RENUMF90 can contain many lines beginning with OPTION.

All of these lines are passed to the parameter file renf90.par to be used by application programs.

Combining fields or interactions

Several fields in the data file can be combined into one using a COMBINE keyword.

COMBINE a b c ....

concatenates b c … into a.
Keyword COMBINE needs to be at the top of the parameter file, but possibly after comments.

There may be many combined fields. For example:

COMBINE 7 2 3 4

combines content of fields 2 3 4 into field 7;

the data file is not changed, only the program treats field 7 as fields 2 3 4 put together (without spaces).

The combined fields can be treated as “numeric”, if they are composed of numbers and if their total length is <9. Otherwise, they need to be used as “alpha”.

Please note that the maximum size of the combined variable is limited by the largest size of the “alpha” field.
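
What COMBINE does to one record can be sketched in Python. The function names are hypothetical; the sketch only mimics the concatenation and the rule for treating a combined field as numeric.

```python
def combine_fields(line, source_positions):
    """Concatenate the fields at the given 1-based positions, without spaces."""
    fields = line.split()
    return "".join(fields[position - 1] for position in source_positions)


def can_be_numeric(combined):
    """Return True if the combined field has digits only and fewer than 9 characters."""
    return combined.isdigit() and len(combined) < 9
```

For the first record of data.test in the Example section, combining fields 2 3 4 gives aa34.511, which has to be used as 'alpha'.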

Additive Pedigree File

The additive pedigree file(s) renadd* has the following structure:

 1) animal number (from 1)                                        
 2) parent 1 number or unknown parent group number for parent 1   
 3) parent 2 number or unknown parent group number for parent 2   
 4) 3 minus number of known parents or inbreeding code if inbreeding is used (inbreeding is default now)                              
 5) known or estimated year of birth (0 if not provided)          
 6) number of known parents (parents might be eliminated if not contributing);
    if the animal has a genotype, 10 + number of known parents
 7) number of records             
 8) number of progeny (before elimination due to other effects) as parent 1
 9) number of progeny (before elimination due to other effects) as parent 2  
10) original animal id
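
A line of a renadd file can be read into named fields as follows. This is a Python sketch for illustration; the record and field names are hypothetical, and the genotype flag is derived from column 6 as described above.

```python
from collections import namedtuple

RenaddRecord = namedtuple(
    "RenaddRecord",
    "animal parent1 parent2 code yob known_parents genotyped "
    "records progeny_as_parent1 progeny_as_parent2 original_id",
)


def parse_renadd_line(line):
    """Split one renadd line into its 10 columns and decode column 6."""
    f = line.split()
    column6 = int(f[5])
    genotyped = column6 >= 10  # genotyped animals have 10 + number of known parents
    known_parents = column6 - 10 if genotyped else column6
    return RenaddRecord(
        int(f[0]), int(f[1]), int(f[2]), int(f[3]), int(f[4]),
        known_parents, genotyped,
        int(f[6]), int(f[7]), int(f[8]), f[9],
    )
```

For the first line of renadd02.ped in the Example section, this gives animal 1 with parents 6 and 3, two known parents, no genotype, two records, and original ID bb.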

Extensions

The program is being modified to support inbreeding, dominance, random regressions with automatic calculations of Legendre polynomials,…

Example

data file - data.test

1 aa 34.5 11 12 zz
3 bb 21.333 22 23 xx
8 cc 23.666 33 34 yy
1 dd 29 44 45 xx 
3 aa  30 55 56 yy
5 bb 1234567.890 66 67 zz

pedigree file - test.ped

qq 0 0
aa 0 0
bb qq aa
cc qq 0
dd 0 aa

parameter file - testpar1

# Parameter file for program renf90; it is translated to parameter
# file for BLUPF90 family programs.
DATAFILE
data.test
TRAITS
3 4
FIELDS_PASSED TO OUTPUT
2 1 # passing alphanumeric
WEIGHT(S)

RESIDUAL_VARIANCE
5 2
2 4
EFFECT
1 1 cross alpha
EFFECT
2 2 cross alpha
RANDOM
animal
OPTIONAL
mat mpe pe
FILE
test.ped
(CO)VARIANCES
10 3 2 1
3 11 4 5
2 4 12 6
1 5 6 13.01
(CO)VARIANCES_PE
5.3 2.1
2.1 4.85
(CO)VARIANCES_MPE
1.03 .27
.27 .85
EFFECT
5 0 cov
NESTED
 1 0 alpha
EFFECT
6 6 cross alpha
RANDOM
diagonal

printout

(temporary; the amount of details may change)

 RENUMF90 version 1.93
 name of parameter file? testpar1                                
 datafile:data.test                                         
 traits:           3           4
 fields passed:           2           1
 R
   5.000       2.000    
   2.000       4.000    

 Processing effect  1 of type cross     
 item_kind=alpha     

 Processing effect  2 of type cross     
 item_kind=alpha     
 Optional maternal effect
 Optional maternal permanent environment
 Optional permanent environment
 pedigree file name  "test.ped"
 positions of animal, sire, dam, alternate dam and yob    1    2    3    0    0
 Reading (CO)VARIANCES:           4 x           4
 Reading (CO)VARIANCES_PE:           2 x           2
 Reading (CO)VARIANCES_MPE:           2 x           2

 Processing effect  3 of type cov       
 item_kind=alpha     

 Processing effect  4 of type cross     
 item_kind=alpha     

 Maximum size of character fields: 20

 Maximum size of record (max_string_readline): 800

 Maximum number of fields innput file (max_field_readline): 100

 hash tables for effects set up
 read            6  records
 table with            4  elements sorted
 added count
 Effect group            1  of column            1  with            4  levels
 table expanded from        10000  to        10000  records
 added count
 Effect group            2  of column            1  with            4  levels
 table with            4  elements sorted
 added count
 Effect group            3  of column            1  with            4  levels
 table expanded from        10000  to        10000  records
 table with            3  elements sorted
 added count
 Effect group            4  of column            1  with            3  levels
 table expanded from        10000  to        10000  records
 wrote statistics in file "renf90.tables"

 Basic statistics for input data  (missing value code is 0)
 Pos  Min         Max         Mean        SD                 N
   3    21.333     0.12346E+07 0.20578E+06 0.50400E+06       6
   4    11.000      66.000      38.500      20.579           6
   5    12.000      67.000      39.500      20.579           6

 Correlation matrix
        3     4     5
  3   1.00  0.65  0.65
  4   0.65  1.00  1.00
  5   0.65  1.00  1.00

 Counts of nonzero values (order as above)
          6         6         6
          6         6         6
          6         6         6

 random effect   2
 type:animal    
 opened output pedigree file "renadd02.ped"
 read            5  pedigree records
 loaded            3  parent(s) in round            1

 Pedigree checks
 
 Number of animals with records:           4
 Number of parents without records:           1
 Number of phantom dams:           2
 Total number of animals:           7

 random effect   4
 type:diag      

 Wrote parameter file "renf90.par"
 Wrote renumbered data "renf90.dat"

new parameter file - renf90.par

# BLUPF90 parameter file created by RENF90
DATAFILE
 renf90.dat
NUMBER_OF_TRAITS
           2
NUMBER_OF_EFFECTS
           7
OBSERVATION(S)
    1    2
WEIGHT(S)
 
EFFECTS: POSITIONS_IN_DATAFILE NUMBER_OF_LEVELS TYPE_OF_EFFECT[EFFECT NESTED]
  3  3         4 cross 
  4  4         7 cross 
  5  5         7 cross
  5  5         7 cross
  4  4         7 cross
  6  0         4 cov   7  0
  8  8         3 cross 
RANDOM_RESIDUAL VALUES
   5.000       2.000    
   2.000       4.000    
 RANDOM_GROUP
     2     3
 RANDOM_TYPE
 add_animal
 FILE
renadd02.ped                                                
(CO)VARIANCES
   10.00       3.000       2.000       1.000    
   3.000       11.00       4.000       5.000    
   2.000       4.000       12.00       6.000    
   1.000       5.000       6.000       13.01    
 RANDOM_GROUP
     4
 RANDOM_TYPE
 diagonal  
 FILE
                                                            
(CO)VARIANCES
   1.030      0.2700    
  0.2700      0.8500    
 RANDOM_GROUP
     5
 RANDOM_TYPE
 diagonal  
 FILE
                                                            
(CO)VARIANCES
   5.300       2.100    
   2.100       4.850    
 RANDOM_GROUP
     7
 RANDOM_TYPE
 diagonal  
 FILE
                                                            
(CO)VARIANCES
   1.000      0.1000    
  0.1000       1.000

data file - renf90.dat

 34.5 11 1 3 5 12 1 3 aa 1
 21.333 22 2 1 3 23 2 1 bb 3
 23.666 33 4 4 7 34 4 2 cc 8
 29 44 1 2 3 45 1 1 dd 1
 30 55 2 3 5 56 2 2 aa 3
 1234567.890 66 3 1 3 67 3 3 bb 5

Pedigree file - renadd02.ped

 1 6 3 1 0 2 2 0 0 bb
 6 0 0 1 0 0 0 2 0 qq
 2 0 3 1 0 1 1 0 0 dd
 7 0 0 1 0 0 0 0 1 D@@0000002
 5 0 0 1 0 0 0 0 1 D@@0000001
 3 0 5 1 0 1 2 0 2 aa
 4 6 7 1 0 2 1 0 0 cc

renumbering tables - renf90.tables

 Effect group 1 of column 1 with 4 levels
 Value    #    consecutive number
1 2 1 
3 2 2 
5 1 3 
8 1 4 
 Effect group 3 of column 1 with 4 levels
 Value    #    consecutive number
1 2 1 
3 2 2 
5 1 3 
8 1 4 
 Effect group 4 of column 1 with 3 levels
 Value    #    consecutive number
xx 2 1 
yy 2 2 
zz 2 3
