Disease Module Identification (DMI)


Disease processes are usually driven by several genes that interact in molecular modules or pathways. Identifying such modules in gene or protein networks is central to many computational methods in biomedical research. With this motivation, the Disease Module Identification (DMI) DREAM Challenge was initiated to systematically assess module identification methods on a panel of six diverse genomic networks. In the paper "An unsupervised disease module identification technique in biological networks using novel quality metric based on connectivity, conductance and modularity", we propose a generic refinement method for constrained DMI in biological networks, based on merging and splitting the hierarchical tree obtained from any community detection technique. The only constraint is that the size of each community must lie in the range [3, 100]. We also propose a novel model evaluation metric, called F-score, which is computed from several unsupervised quality metrics (modularity, conductance and connectivity) and measures the quality of a graph partition at a given level of the hierarchy.
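As an illustration of two of the quality metrics named above, the following minimal Python sketch computes the standard modularity of a partition and the conductance of a single community, and checks the [3, 100] size constraint. It assumes an undirected, unweighted edge list in which every node belongs to exactly one community. The function names are hypothetical, the code is not taken from the Matlab implementation, and it does not reproduce the F-score itself, whose exact formula is not given in this README.

```python
from collections import defaultdict


def modularity(edges, communities):
    """Newman modularity Q of a partition of an undirected, unweighted graph."""
    m = float(len(edges))
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # Map every node to the index of its community.
    member = {node: i for i, com in enumerate(communities) for node in com}
    internal = defaultdict(int)  # edges with both endpoints in community i
    volume = defaultdict(int)    # sum of node degrees in community i
    for node, d in degree.items():
        volume[member[node]] += d
    for u, v in edges:
        if member[u] == member[v]:
            internal[member[u]] += 1
    return sum(internal[i] / m - (volume[i] / (2.0 * m)) ** 2
               for i in range(len(communities)))


def conductance(edges, community):
    """Cut size divided by the smaller of the two volumes (inside, outside)."""
    inside = set(community)
    cut = vol_in = vol_out = 0
    for u, v in edges:
        u_in, v_in = u in inside, v in inside
        vol_in += u_in + v_in
        vol_out += (not u_in) + (not v_in)
        if u_in != v_in:
            cut += 1
    denom = min(vol_in, vol_out)
    return cut / float(denom) if denom else 0.0


def within_size_limits(community, low=3, high=100):
    """Size constraint used in the challenge: low <= |community| <= high."""
    return low <= len(community) <= high


# Toy graph: two triangles joined by a single edge.
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
parts = [{1, 2, 3}, {4, 5, 6}]
q = modularity(edges, parts)         # 5/14, about 0.357
phi = conductance(edges, parts[0])   # 1/7, about 0.143
```

Higher modularity and lower conductance both indicate a better-separated partition, which is why the two are natural inputs to a combined quality score.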

Availability: DMI is also available at:


We describe in detail the prerequisites for running the proposed Adaptive Refinement technique (in conjunction with any community detection method) for Disease Module Identification.

  1. Matlab: https://www.mathworks.com/pricing-licensing/index.html?prodcode=ST&s_iid=main_pl_ST_tb

     There are 4 license types. Your institution or company may already hold licenses, so please check with them first. A free trial may also be an option: see the "Get a Product Trial" link at the bottom of the page. The trial requires a MathWorks account, which you can create if you do not have one.
  2. Statistics and Machine Learning Toolbox in Matlab:

     The code is written in Matlab and requires the 'Statistics and Machine Learning Toolbox'. If you do not have this toolbox, MathWorks offers a free trial at https://www.mathworks.com/programs/trials/trial_request.html?prodcode=ST&s_iid=solmain_trial_mlr_rb (a MathWorks account is required). We use this toolbox for 'linkage' clustering ( https://www.mathworks.com/help/stats/linkage.html ).

    The file you need to run is ‘dream_merge_split.m’.

  3. Louvain method: included; for a newer version, see below.

    To run the scripts automatically, you need the latest version of the Louvain method (C++ implementation).
    - It is available as Community_latest from Blondel et al. ( https://sites.google.com/site/findcommunities/newversion/community.tgz?attredirects=0 ) and is included with our code.
    - To use a newer version, delete the ‘Community_latest’ folder and move the new ‘Community_latest’ folder into the ‘final_scripts’ folder.
    - Before running anything in ‘final_scripts’, run >chmod -R 700 Community_latest, or whatever else gives you appropriate user permissions on your system.
    - Compile the Louvain method by running >make in the ‘Community_latest’ folder inside the ‘final_scripts’ folder.
    - Since the Louvain method is written in C++, a compatible gcc compiler is required. We recommend first trying the latest version available in your OS distribution’s repositories.

  4. Network files: create a folder named ‘subchallenge1’ inside the ‘final_scripts’ folder and place the network files to run on in it.

  5. Python: you need Python >= 2.7.x to run the code. Python 3 may work, but we have not tested it.

  6. Change the filename variable in dream_merge_split.m to the filename of the network you want to run it on without the .txt extension. Run the ‘dream_merge_split.m’ file in Matlab.

    • Examples: open dream_merge_split.m and set one of the following (type the assignment without the surrounding double quotes):
      “filename = ‘1_ppi_anonym_v2’” to run on the first network
      “filename = ‘2_ppi_anonym_v2’” to run on the second
      “filename = ‘3_signal_anonym_directed_v3’” to run on the third
      “filename = ‘4_coexpr_anonym_v2’” to run on the fourth
      “filename = ‘5_cancer_anonym_v2’” to run on the fifth
      “filename = ‘6_homology_anonym_v2’” to run on the sixth
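Before launching Matlab, it can help to confirm that the chosen name resolves to a file in ‘subchallenge1’. The following Python sketch is a hypothetical helper and is not part of the distributed scripts. It assumes the folder layout described above, with ‘subchallenge1’ inside ‘final_scripts’ and network files ending in .txt.

```python
import os


def network_path(filename, scripts_dir="final_scripts"):
    """Return the path to subchallenge1/<filename>.txt, or raise if it is missing.

    `filename` is the value you would assign in dream_merge_split.m,
    that is, the network name without the .txt extension.
    """
    if filename.endswith(".txt"):
        raise ValueError("give the name without the .txt extension")
    path = os.path.join(scripts_dir, "subchallenge1", filename + ".txt")
    if not os.path.isfile(path):
        raise IOError("network file not found: " + path)
    return path
```

For example, network_path("1_ppi_anonym_v2") returns the path of the first network when it is called from the directory that contains ‘final_scripts’, and raises an error if the file is not there.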

INFO:

  1. The internal quality metrics are required; they are in the ‘Evaluation_Metrics’ folder.

    • It contains the ‘graph_metrics’ folder
    • It contains the ‘conductance’ folder
  2. The results of the Louvain method are written to the ‘Louvain_Results’ folder.

  3. The results of the Adaptive Rectified Louvain method are written to the ‘Final_Results’ folder.

    • These are the results we used as Final Submission for the Challenge.
  4. A plot is also generated as output, showing how the confidence threshold affects modularity.

  5. Additional results of Multilevel Hierarchical Kernel Spectral Clustering in combination with our proposed adaptive technique are in the ‘AMHKSC_Results’ folder. We generated these earlier; they are not produced by running the code as described here.

Copyright © [2016] QCRI a member of HBKU. All Rights Reserved;


DMI Discussion