GIGI2


Imputation of untyped SNPs has become important in genome-wide association studies (GWAS). Rare variants are enriched in pedigrees, so their imputation is better suited to family-based designs. The cost of a relatively large family-based GWAS can be reduced significantly by fully sequencing only a fraction of the pedigree and imputing the remaining subjects. The program GIGI performs imputation in large pedigrees but can be time consuming. Here, we implement GIGI’s imputation approach in a new program, GIGI2, which reduces computational time by at least 25x on one thread and, using multi-threaded imputation, by 120x on eight threads. The memory usage of GIGI2 is reduced by at least 30x. These reductions are achieved through a better memory layout and a better algorithm for solving the identity-by-descent graphs. We also make GIGI2 available as a web server based on the same framework as the Michigan Imputation Server.

Availability: GIGI2 is available as:

  • Gitlab Source Code (described below)
  • Web Server
  • Docker Container
  • A Snap, installed with "sudo snap install gigi2". Source code here. Note that the snap is invoked as lowercase 'gigi2' rather than the uppercase 'GIGI2' used below, to follow Unix/Linux package and program naming conventions.


GIGI2

Requirements

A C++ compiler is needed only if you compile the binaries yourself. Static 32-bit and 64-bit binaries are included and will work on many Linux versions.

Getting GIGI2

With Git

Run the following command to clone the repository with git (git is a version control program started by Linus Torvalds, available from https://git-scm.com/downloads):

git clone https://cse-git.qcri.org/eullah/GIGI2.git

With a Browser

Go to this url: https://cse-git.qcri.org/eullah/GIGI2/tree/master

Click on the icon with the download arrow above the column “Last Update” on the right hand side.
There are several download options with different compressions. If you get GIGI2 this way, then you will need to decompress it before proceeding further.

Installation

Once you have the files, most users won’t need to do anything else to use GIGI2. There are executables compiled on Ubuntu 64 bit Linux for 64 bit and 32 bit (via multilib) x86 systems.
You may be able to use these unless your system has a different architecture (e.g. PowerPC, ARM). If you encounter any strange behavior, compiling GIGI2 natively for your system is a good first troubleshooting step.

Compiling

We compile GIGI2 with g++. A compile.sh script is provided to do this:

cd GIGI2/src

./compile.sh

This should create the GIGI2 binary executable file. We use this name for the GIGI2 binary below. If you are not compiling it yourself, substitute GIGI2-static-32 or GIGI2-static-64 for GIGI2.
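As a rough illustration of choosing between the two prebuilt binaries, the sketch below maps the architecture string printed by `uname -m` to one of the binary names mentioned above. The function name pick_gigi2_binary and the list of architecture strings are assumptions made for this example, not part of GIGI2.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with GIGI2): map an architecture string,
# as printed by `uname -m`, to the name of a prebuilt binary from this README.
pick_gigi2_binary() {
    case "$1" in
        x86_64|amd64)        echo "GIGI2-static-64" ;;
        i386|i486|i586|i686) echo "GIGI2-static-32" ;;
        # Anything else (e.g. ARM, PowerPC) needs a native build.
        *)                   echo "compile-from-source" ;;
    esac
}

# Print the suggestion for the current machine.
pick_gigi2_binary "$(uname -m)"
```

On architectures outside the two x86 families, the sketch simply suggests compiling natively, in line with the troubleshooting advice above.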

Extra Integration

As an Unprivileged User

If you like, you can now add GIGI2 to your path (the examples assume that you have). To do this, add the following to your .bashrc (located in your home folder):

export PATH=$PATH:/path/to/folder/where/you/put/GIGI2

Then source your .bashrc to apply the changes right away

source ~/.bashrc
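To confirm that the change took effect, one option is to ask the shell whether the program name resolves. The sketch below uses the standard `command -v` builtin. The function name check_on_path is hypothetical and introduced only for this example.

```shell
#!/bin/sh
# Hypothetical helper: report whether a program name resolves on the PATH.
check_on_path() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 found at $(command -v "$1")"
    else
        echo "$1 not found on PATH"
    fi
}

# Check the name used throughout this README.
check_on_path GIGI2
```

If the program is reported as not found, check that the folder in the export line is the one that actually contains the GIGI2 executable.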

As a Root/Sudo User

To add GIGI2 to the path system-wide for all users, you can create a symlink in /usr/bin pointing to the GIGI2 executable:

ln -s /path/to/GIGI2 /usr/bin/GIGI2

Or you can move or copy the executable to that location, e.g.

mv /path/to/GIGI2 /usr/bin/GIGI2

Usage

GIGI2 <parameter file>

The parameter file must have the following mandatory parameters, one parameter per line:

--ped <file> Pedigree file
--meiosis <file> Inheritance vector (meiosis) file
--iter <count> Number of meiosis iterations
--genocall <method> [t1] [t2] Genotype calling method (1=Max prob, 2=Confidence based)
--smap <file> Sparse (framework) map file
--dmap <file> Dense map file
--geno <file> Genotype file
--afreq <file> Allele frequency file
--out <file> Output file prefix

Optional parameters:
--threads <count> Number of threads (Default=Hardware threads - 1)
--seed <number> Random number generator seed (Default=1234)
--mbuffer <size> Maximum number of markers to be loaded in memory (Default=10000)
--drange <min> <max> Range of dense markers to be imputed (Default=0~1.79769e+308). Note that these values correspond to the values in the dense map file.

Example:

  GIGI2 ./Pedigree_1.param

The parameter file contains information similar to that in the parameter file for the original GIGI, which is described in GIGI’s documentation: https://faculty.washington.edu/wijsman/progdists/gigi/software/GIGI/GIGI_v1_06.1_Documentation.pdf
The GIGI documentation also describes the input files for GIGI, which are the same as for GIGI2 except that GIGI2 only accepts the long format for the genotype file.

Example Parameter File:

--ped ./pedigrees/Ped5/Chr22/ped.txt
--meiosis ./pedigrees/Ped5/Chr22/chr22.mi
--iter 1000
--smap ./pedigrees/Ped5/Chr22/chr22.frame_map
--dmap ./pedigrees/Ped5/Chr22/chr22.dense_marker_cm_positions.map
--geno ./pedigrees/Ped5/Chr22/chr22.dense_geno.corrected_long
--afreq ./pedigrees/Ped5/Chr22/dense_chr22_corrected_freq.freq
--out ./Ped5_Chr22_Output_1_thread/output
--genocall 2 0.8 0.9
--threads 4
--seed 1234
--mbuffer 10000
--drange 0 42
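Since all nine mandatory parameters must be present, a quick pre-flight check can catch an incomplete parameter file before a long run. The following is only a sketch: the function name missing_gigi2_params is hypothetical, and it assumes each parameter starts at the beginning of its line with two hyphens, as in the example above.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of GIGI2): print each mandatory
# parameter that is missing from a parameter file. Prints nothing when the
# file contains all of them.
missing_gigi2_params() {
    param_file=$1
    for key in ped meiosis iter genocall smap dmap geno afreq out; do
        # Assumption: a parameter is present when a line starts with "--<key>"
        # followed by whitespace.
        grep -q -e "^--${key}[[:space:]]" "$param_file" || echo "--${key}"
    done
}

# Demonstration on a throwaway file that deliberately omits --afreq and --out.
demo=$(mktemp)
cat > "$demo" <<'EOF'
--ped ./ped.txt
--meiosis ./chr22.mi
--iter 1000
--genocall 2 0.8 0.9
--smap ./chr22.frame_map
--dmap ./chr22.dense.map
--geno ./chr22.geno_long
EOF
missing_gigi2_params "$demo"
rm -f "$demo"
```

For the demonstration file, the check prints --afreq and --out, one per line. The file names inside the demonstration are placeholders and do not refer to real data.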

Because GIGI2 only accepts long format genotype files, a utility named Wide2Long is included to convert wide format genotype files to the long format. As with GIGI2, we have included static 32-bit and 64-bit binary executables of this utility. If you need to compile it, this can be done simply with g++:

g++ Wide2Long.cpp -o Wide2Long

The utility is used as:

Wide2Long [Wide format file] [Long format file]

Where the ‘Wide format file’ is the input and the ‘Long format file’ is the location for the new file being output.

Logs

The run logs are written to the file <file>.log, where <file> is the output file prefix given to the --out argument in the parameter file.
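To locate the log for a given run, the prefix can be read back from the parameter file. The sketch below is an illustration only: the function name gigi2_log_path is hypothetical, and it assumes the --out line has the form shown in the example parameter file, with a prefix that contains no spaces.

```shell
#!/bin/sh
# Hypothetical helper (not part of GIGI2): print the log path implied by a
# parameter file, i.e. the value of --out followed by ".log".
gigi2_log_path() {
    awk '$1 == "--out" { print $2 ".log"; exit }' "$1"
}

# Demonstration with a throwaway parameter file.
demo=$(mktemp)
printf -- '--iter 1000\n--out ./Ped5_Chr22_Output_1_thread/output\n' > "$demo"
gigi2_log_path "$demo"
rm -f "$demo"
```

With the --out value from the example parameter file, this prints ./Ped5_Chr22_Output_1_thread/output.log.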


GIGI Discussion