GIGI2
Imputation of untyped SNPs has become important in Genome-wide Association Studies (GWAS). Imputation of rare variants, which are enriched in pedigrees, is better suited to family-based designs. The cost of performing relatively large family-based GWAS can be significantly reduced by fully sequencing only a fraction of the pedigree and performing imputation on the remaining subjects. The program GIGI can efficiently perform imputation in large pedigrees, but it can be time-consuming. Here, we implement GIGI’s imputation approach in a new program, GIGI2, which uses multi-threaded imputation to reduce computational time by at least 25x on one thread and 120x on eight threads. The memory usage of GIGI2 is reduced by at least 30x. This reduction is achieved by implementing a better memory layout and a better algorithm for solving the Identity by Descent (IBD) graphs.
We also make GIGI2 available as a webserver based on the same framework as the Michigan Imputation Server.
Availability: GIGI2 is available as:
- GitLab Source Code (described below)
- Web Server
- Docker Container
- As a Snap
GIGI2
Requirements
A C++ compiler is required only if you need to compile the binaries yourself. Static 32-bit and 64-bit binaries are included and should work on many Linux versions.
Getting GIGI2
With Git
Run the following command to clone the repository with Git (a version control system started by Linus Torvalds; downloads are available at https://git-scm.com/downloads):
git clone https://cse-git.qcri.org/eullah/GIGI2.git
With a Browser
Go to this URL: https://cse-git.qcri.org/eullah/GIGI2/tree/master
Click the icon with the download arrow, above the “Last Update” column on the right-hand side.
Several download options with different compression formats are available. If you download GIGI2 this way, you will need to decompress the archive before proceeding further.
Installation
Once you have the files, most users won’t need to do anything else to use GIGI2. Precompiled executables, built on 64-bit Ubuntu Linux, are included for 64-bit and 32-bit (via multilib) x86 systems.
You should be able to use these unless your system has a different architecture (e.g. PowerPC or ARM). If you encounter any strange behavior, compiling GIGI2 natively for your system is a good first troubleshooting step.
Compiling
We compile GIGI2 with g++ using the included compile.sh script:
cd GIGI2/src
./compile.sh
This should create the GIGI2 binary executable. The examples below use this name; if you are not compiling GIGI2, substitute GIGI2-static-32 or GIGI2-static-64 for GIGI2.
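If you use a prebuilt binary instead of compiling, the right one depends on your machine's architecture. The sketch below picks a name from the output of `uname -m`. The function name `pick_gigi2_binary` is hypothetical, and the list of architecture strings is an assumption covering common x86 values; it is not something GIGI2 provides.

```sh
# Hypothetical helper: print the prebuilt GIGI2 binary name matching an
# architecture string as reported by `uname -m`, or "compile" when no
# prebuilt binary applies (the mapping below is an assumption).
pick_gigi2_binary() {
    case "$1" in
        x86_64|amd64)        echo "GIGI2-static-64" ;;
        i386|i486|i586|i686) echo "GIGI2-static-32" ;;
        *)                   echo "compile" ;;  # e.g. ARM or PowerPC: run ./compile.sh
    esac
}

# Check the current machine.
pick_gigi2_binary "$(uname -m)"
```

Anything the function does not recognize is a hint to build natively with ./compile.sh, as described above.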
Extra Integration
As an Unprivileged User
If you like, you can now add GIGI2 to your PATH (the examples assume that you have). To do this, add the following line to your .bashrc (located in your home folder):
export PATH=$PATH:/path/to/folder/where/you/put/GIGI2
Then source your .bashrc to apply the changes right away:
source ~/.bashrc
As a Root/Sudo User
To add GIGI2 to the PATH system-wide for all users, you can create a symlink in /usr/bin pointing to the GIGI2 executable:
ln -s /path/to/GIGI2 /usr/bin/GIGI2
Or you can move (or copy) the executable to that location, e.g.
mv /path/to/GIGI2 /usr/bin/GIGI2
Usage
GIGI2 <parameter file>
The parameter file must have the following mandatory parameters, one parameter per line:
--ped <file> Pedigree file
--meiosis <file> Inheritance vector (meiosis) file
--iter <count> Number of meiosis iterations
--genocall <method> [t1] [t2] Genotype calling method (1 = maximum probability, 2 = confidence-based; the thresholds t1 and t2 are used with method 2)
--smap <file> Sparse (framework) map file
--dmap <file> Dense map file
--geno <file> Genotype file
--afreq <file> Allele frequency file
--out <file> Output file prefix
Optional parameters:
--threads <count> Number of threads (Default = number of hardware threads - 1)
--seed <number> Random number generator seed (Default=1234)
--mbuffer <size> Maximum number of markers to be loaded in memory (Default=10000)
--drange <min> <max> Range of dense markers to be imputed (Default=0~1.79769e+308); min and max refer to the marker positions listed in the dense map file.
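Because every mandatory parameter must be present, it can help to check a parameter file before starting a long run. The following is a minimal sketch of such a check, using a hypothetical helper `check_gigi2_params` that is not part of GIGI2. It assumes one parameter per line, as described above, and only checks that each mandatory flag starts some line; it does not verify that the files exist or that the values are valid.

```sh
# Hypothetical helper (not part of GIGI2): print each mandatory parameter
# missing from a GIGI2 parameter file; return 1 if any are missing.
check_gigi2_params() {
    param_file=$1
    missing=0
    for flag in --ped --meiosis --iter --genocall --smap --dmap --geno --afreq --out; do
        # The flag must start a line (after optional spaces) and be followed by whitespace.
        if ! grep -Eq -- "^[[:space:]]*${flag}[[:space:]]" "$param_file"; then
            echo "missing: $flag"
            missing=1
        fi
    done
    return $missing
}
```

For example, `check_gigi2_params ./Pedigree_1.param && GIGI2 ./Pedigree_1.param` starts GIGI2 only when all mandatory flags are present.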
Example:
GIGI2 ./Pedigree_1.param
The parameter file contains similar information to the parameter file for the original GIGI, which is described in GIGI’s documentation: https://faculty.washington.edu/wijsman/progdists/gigi/software/GIGI/GIGI_v1_06.1_Documentation.pdf
The GIGI documentation also describes the input files for GIGI, which are the same as for GIGI2 except that GIGI2 only accepts the long format for the genotype file.
Example Parameter File:
--ped ./pedigrees/Ped5/Chr22/ped.txt
--meiosis ./pedigrees/Ped5/Chr22/chr22.mi
--iter 1000
--smap ./pedigrees/Ped5/Chr22/chr22.frame_map
--dmap ./pedigrees/Ped5/Chr22/chr22.dense_marker_cm_positions.map
--geno ./pedigrees/Ped5/Chr22/chr22.dense_geno.corrected_long
--afreq ./pedigrees/Ped5/Chr22/dense_chr22_corrected_freq.freq
--out ./Ped5_Chr22_Output_1_thread/output
--genocall 2 0.8 0.9
--threads 4
--seed 1234
--mbuffer 10000
--drange 0 42
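When running many pedigrees or chromosomes, you may prefer to generate parameter files rather than write each one by hand. This sketch reproduces the example above for any pedigree and chromosome, assuming (as an illustration only) that your data follow the same directory and file-naming layout; `write_gigi2_params` is a hypothetical helper. The optional parameters it does not write (--seed, --mbuffer, --drange) take their defaults.

```sh
# Hypothetical helper: print a GIGI2 parameter file for pedigree $1 and
# chromosome $2, with an optional thread count $3 (4 if omitted).
# The directory layout and file names mirror the example above and are
# assumptions; adjust them to your own data.
write_gigi2_params() {
    ped=$1 chr=$2 threads=${3:-4}
    dir=./pedigrees/$ped/Chr$chr
    cat <<EOF
--ped $dir/ped.txt
--meiosis $dir/chr$chr.mi
--iter 1000
--smap $dir/chr$chr.frame_map
--dmap $dir/chr$chr.dense_marker_cm_positions.map
--geno $dir/chr$chr.dense_geno.corrected_long
--afreq $dir/dense_chr${chr}_corrected_freq.freq
--out ./${ped}_Chr${chr}_Output/output
--genocall 2 0.8 0.9
--threads $threads
EOF
}
```

For example, `mkdir -p ./Ped5_Chr22_Output && write_gigi2_params Ped5 22 8 > Ped5_Chr22.param && GIGI2 Ped5_Chr22.param` creates the output directory first (in case it does not exist), writes the file, and runs GIGI2 on it.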
Because GIGI2 only accepts long-format genotype files, a utility, Wide2Long, is included to convert wide-format genotype files to the long format. As with GIGI2, static 32-bit and 64-bit binary executables of this utility are included. If you need to compile it, you can do so with g++:
g++ Wide2Long.cpp -o Wide2Long
The utility is used as:
Wide2Long [Wide format file] [Long format file]
Here, ‘Wide format file’ is the input file and ‘Long format file’ is the path where the new long-format file will be written.
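To convert many genotype files at once, you can loop over them with the documented `Wide2Long [Wide format file] [Long format file]` usage. This sketch assumes, for illustration only, that your wide-format files end in `.wide`; it writes each long-format file next to its input with a `.long` extension. `convert_wide_dir` and the `WIDE2LONG` variable are hypothetical names, not part of GIGI2.

```sh
# Hypothetical helper: run Wide2Long on every *.wide file in directory $1.
# Set WIDE2LONG to the converter executable; it defaults to Wide2Long on
# your PATH. The .wide/.long extensions are assumptions.
convert_wide_dir() {
    dir=$1
    converter=${WIDE2LONG:-Wide2Long}
    for wide in "$dir"/*.wide; do
        [ -e "$wide" ] || continue   # no matches: the glob is left unexpanded
        long=${wide%.wide}.long      # same base name, .long extension
        "$converter" "$wide" "$long" || return 1
        echo "converted $wide -> $long"
    done
}
```

For example, `WIDE2LONG=./Wide2Long convert_wide_dir ./genotypes` uses a Wide2Long binary in the current directory. The function stops at the first conversion that fails.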
Logs
The run logs are output in the file <file>.log, where <file> is the output file prefix given to the --out argument in the parameter file.
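To find the log for a given run without retyping the prefix, you can read the --out value back from the parameter file. A minimal sketch, assuming one parameter per line as described in Usage; `gigi2_log_path` is a hypothetical helper name.

```sh
# Hypothetical helper: print the log file path for the GIGI2 parameter file
# $1, i.e. the value given to --out with ".log" appended.
gigi2_log_path() {
    awk '$1 == "--out" { print $2 ".log"; exit }' "$1"
}
```

For example, `tail -n 20 "$(gigi2_log_path ./Pedigree_1.param)"` shows the end of the log for the run started by `GIGI2 ./Pedigree_1.param`.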