HPIpy - Guide

Documentation

Welcome to the HPIpy User Guide. This guide will help you get started with HPIpy and make the most of its features.

Installation Instructions

Step-by-step Installation

Install Miniconda (if not installed previously):

Download Miniconda from the Miniconda website.
To install, run: bash Miniconda3-latest-Linux-x86_64.sh
Check conda installation using: conda --version

Obtain the package from the Downloads page or GitHub, then run the commands below that apply to your download:

Decompress the file (if using compressed version):

tar -xvzf hpipy.tar.gz

After downloading (and extracting) the package files from one of the above sources, execute:

cd hpipy
conda env create -f environment.yml
conda activate hpipy

Usage

Package Help

To list the options available in HPIpy, display the package help:

python3 -m hpipy --help

Basic Usage

For basic usage, provide the paths to your host and pathogen protein sequence files (FASTA format with a .fasta, .fa, or .faa extension; .gz and .zip compressed versions are also accepted), and choose the model that suits your species. The example below uses the "humanVirus" model with the interolog method.

python3 -m hpipy --host exampleData/hostProteins.fasta --pathogen exampleData/pathogenProteins.fasta --model humanVirus --computation interolog
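
If you launch HPIpy from a script, it can help to check the inputs before starting a long run. The following is a minimal Python sketch, not part of HPIpy itself: the helper name build_hpipy_command is hypothetical, and the accepted extensions, models, and methods are copied from the package help text.

```python
import shlex

# Values copied from the HPIpy help text.
ACCEPTED_SUFFIXES = tuple(
    base + compression
    for base in (".fasta", ".fa", ".faa")
    for compression in ("", ".gz", ".zip")
)
MODELS = {"plantPathogen", "animalPathogen", "humanVirus", "humanBacteria"}
METHODS = {"interolog", "domain", "phyloProfiling", "gosim"}


def build_hpipy_command(host, pathogen, model, methods):
    """Return the argument list for a basic HPIpy run (hypothetical helper)."""
    for path in (host, pathogen):
        if not str(path).endswith(ACCEPTED_SUFFIXES):
            raise ValueError(f"unsupported sequence file format: {path}")
    if model not in MODELS:
        raise ValueError(f"unknown model: {model}")
    bad = [m for m in methods if m not in METHODS]
    if not methods or bad:
        raise ValueError(f"choose one or more of {sorted(METHODS)}; got {list(methods)}")
    # --computation takes a space-separated list, so each method is its own argument.
    return [
        "python3", "-m", "hpipy",
        "--host", str(host),
        "--pathogen", str(pathogen),
        "--model", model,
        "--computation", *methods,
    ]


command = build_hpipy_command(
    "exampleData/hostProteins.fasta",
    "exampleData/pathogenProteins.fasta",
    "humanVirus",
    ["interolog"],
)
print(shlex.join(command))
```

The returned list can be passed to subprocess.run(command, check=True) once the hpipy conda environment is active.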

Advanced Options

For enhanced analysis, additional (optional) arguments can be used as needed; see the package help output below. To predict the interactions again with different parameters (such as BLAST identity, coverage, or the phylogenetic profiling threshold), use the --resume_ppis option. This skips the computationally intensive programs (BLAST, HMMER, etc.) and resumes the pipeline from the prediction step.

      
        usage: python3 -m hpipy [options]

          hpipy: A package to predict host-microbe protein-protein interactions
          ---------------------------------------------------------------------

          To obtain more information about the package, visit: https://kaabil.net/hpipy/
                                        

          options:
            -h, --help            show this help message and exit

          Required arguments:
            --host                Protein sequences of host species (formats accepted: .fasta, .fasta.gz, .fasta.zip, .fa, .fa.gz, .fa.zip, .faa, .faa.gz, .faa.zip)
            --pathogen            Protein sequences of pathogen species (formats accepted: .fasta, .fasta.gz, .fasta.zip, .fa, .fa.gz, .fa.zip, .faa, .faa.gz, .faa.zip)
            --computation         Computational method(s) to be implemented for the analysis. Provide a space-separated list;
                                  Available methods: interolog, domain, phyloProfiling, gosim
            --model               Host-pathogen model to be implemented for the analysis;
                                  Available models: plantPathogen, animalPathogen, humanVirus, humanBacteria

          Optional Arguments:
            --version             Display package version and exit
            --outputdir           Directory where output files will be written; default: "HPIpy_results"
            --use_slurm           To run jobs using SLURM job scheduler
            --slurm_account       To run SLURM jobs on a specific account on the cluster
            --network             To perform network analysis for the predicted interactions. This step will 
                                  take more time based on the number of predicted interactions.
            --num_threads         Number of threads to be used; default: 4
            --seq_homology        Sequence identity for CD-HIT (0.1 to 1.0); default: 1.0 (100 percent)
            --resume_ppis         To predict interactions using different parameters without running the whole pipeline. Provide 
                                  'interproscan' option if it was used before, although it will not be executed again if interproscan's
                                  output files already exist

          Interolog model prediction arguments (optional):
            --interIdentity       Sequence identity to filter BLAST alignments; default: 50
            --interCoverage       Sequence coverage to filter BLAST alignments; default: 50
            --interEvalue         e-value to filter BLAST alignments; default: 1e-05

          Domain model prediction arguments (optional):
            --domHostEvalue       e-value to filter host HMMER output; default is based on the selected model
            --domPathogenEvalue   e-value to filter pathogen HMMER output; default is based on the selected model

          Phylogenetic profiling model prediction arguments (optional):
            --genome_pool         Genome pool to be used for phylogenetic profiling model; Available pools: UP82, 
                                  BC20, protPhylo490; default: BC20
            --phyloEvalue         e-value to filter DIAMOND BLAST alignments; default: 1e-05
            --phyloIdentity       Sequence identity to filter DIAMOND BLAST alignments; default: 50
            --phyloCoverage       Sequence coverage to filter DIAMOND BLAST alignments; default: 50
            --phyloThreshold      Threshold value to filter predicted interactions based on phylogenetic distance (0.1 to 1.0); default: 0.9

          GO semantic similarity model prediction arguments (optional):
            --interproscan        To run InterProScan locally to obtain GO terms for GO similarity model; InterProScan should be installed locally
            --hostGOFile          Comma- or tab-separated file containing GO terms of host proteins
            --pathogenGOFile      Comma- or tab-separated file containing GO terms of pathogen proteins
            --go_combine          Method to combine GO similarity scores. Available methods: max, avg, rcmax and BMA; default: BMA
            --goSimThreshold      Threshold value to filter predicted interactions based on GO semantic similarity (0.1 to 1.0); default: 0.9
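
As an illustration of re-running the prediction step with new thresholds, here is a hedged Python sketch that extends a previous command line. It assumes that --resume_ppis is a plain switch that takes no value and that the original arguments are repeated unchanged; neither is confirmed by the help text. The helper name build_resume_command is hypothetical; the option names are taken from the help output above.

```python
# Threshold options listed in the HPIpy help text.
TUNABLE_OPTIONS = {
    "interIdentity", "interCoverage", "interEvalue",
    "domHostEvalue", "domPathogenEvalue",
    "phyloEvalue", "phyloIdentity", "phyloCoverage", "phyloThreshold",
    "goSimThreshold",
}


def build_resume_command(base_command, **overrides):
    """Return a copy of base_command that resumes from the prediction step.

    Assumption: --resume_ppis is a switch without a value.
    """
    unknown = sorted(set(overrides) - TUNABLE_OPTIONS)
    if unknown:
        raise ValueError(f"not a threshold option from the help text: {unknown}")
    command = list(base_command)  # leave the caller's list untouched
    if "--resume_ppis" not in command:
        command.append("--resume_ppis")
    for name, value in overrides.items():
        command += [f"--{name}", str(value)]
    return command


previous_run = [
    "python3", "-m", "hpipy",
    "--host", "exampleData/hostProteins.fasta",
    "--pathogen", "exampleData/pathogenProteins.fasta",
    "--model", "humanVirus",
    "--computation", "interolog",
]
print(" ".join(build_resume_command(previous_run, interIdentity=70, interCoverage=60)))
```

Only the thresholds change between runs, so the existing alignment files in the output directory are expected to be reused.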

Example Results

The files and directories below were generated by HPIpy with the "humanVirus" model and are shown to help you understand the output:

      
        HPIpy_results
        ├── Alignment
        │   └── Interolog
        │       ├── hostProteins_biogrid_blast.txt
        │       ├── hostProteins_dip_blast.txt
        │       ├── hostProteins_hpidb_blast.txt
        │       ├── hostProteins_intact_blast.txt
        │       ├── hostProteins_mint_blast.txt
        │       ├── hostProteins_virhostnet_blast.txt
        │       ├── pathogenProteins_biogrid_blast.txt
        │       ├── pathogenProteins_dip_blast.txt
        │       ├── pathogenProteins_hpidb_blast.txt
        │       ├── pathogenProteins_intact_blast.txt
        │       ├── pathogenProteins_mint_blast.txt
        │       └── pathogenProteins_virhostnet_blast.txt
        ├── Clustering
        │   ├── hostProteins.fasta
        │   ├── hostProteins.fasta.clstr
        │   ├── pathogenProteins.fasta
        │   └── pathogenProteins.fasta.clstr
        ├── Domains
        │   ├── hostProteins_did3_domains.txt
        │   ├── hostProteins_domains.txt
        │   ├── hostProteins_domine_domains.txt
        │   ├── hostProteins_iddi_domains.txt
        │   ├── pathogenProteins_did3_domains.txt
        │   ├── pathogenProteins_domains.txt
        │   ├── pathogenProteins_domine_domains.txt
        │   └── pathogenProteins_iddi_domains.txt
        ├── HPIpy.log
        ├── logs
        │   ├── hmmpress.log
        │   ├── hostProteins_blastdb_biogrid.log
        │   ├── hostProteins_blastdb_dip.log
        │   ├── hostProteins_blastdb_hpidb.log
        │   ├── hostProteins_blastdb_intact.log
        │   ├── hostProteins_blastdb_mint.log
        │   ├── hostProteins_blastdb_virhostnet.log
        │   ├── hostProteins_cdhit_out.log
        │   ├── hostProteins_hmmscan.log
        │   ├── pathogenProteins_blastdb_biogrid.log
        │   ├── pathogenProteins_blastdb_dip.log
        │   ├── pathogenProteins_blastdb_hpidb.log
        │   ├── pathogenProteins_blastdb_intact.log
        │   ├── pathogenProteins_blastdb_mint.log
        │   ├── pathogenProteins_blastdb_virhostnet.log
        │   ├── pathogenProteins_cdhit_out.log
        │   └── pathogenProteins_hmmscan.log
        └── Predictions
            ├── Combined_PPIs
            │   └── Combined_PPIs.csv
            ├── Consensus_PPIs
            │   └── interolog_domain_consensus_PPIs.csv
            ├── Domain-based
            │   ├── did3_PPI.csv
            │   ├── Domain_Annotations.txt
            │   ├── Domain_PPIs.csv
            │   ├── domine_PPI.csv
            │   ├── extracted_sequences
            │   ├── human_annotations
            │   ├── iddi_PPI.csv
            │   └── network_analysis
            ├── Interolog-based
            │   ├── biogrid_PPI.csv
            │   ├── dip_PPI.csv
            │   ├── extracted_sequences
            │   ├── hpidb_PPI.csv
            │   ├── human_annotations
            │   ├── intact_PPI.csv
            │   ├── Interolog_Annotations.txt
            │   ├── Interolog_PPIs.csv
            │   ├── mint_PPI.csv
            │   ├── network_analysis
            │   └── virhostnet_PPI.csv
            └── Prediction_stats.txt
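
To explore a results directory like the one above from Python, a small sketch such as the following may be useful. It relies only on the directory layout shown in the tree; the function names are hypothetical, and because the column names of the CSV files are not documented here, the sketch reads the first row rather than assuming specific columns.

```python
import csv
from pathlib import Path


def list_prediction_tables(output_dir="HPIpy_results"):
    """Return all CSV files under <output_dir>/Predictions, sorted by path."""
    predictions = Path(output_dir) / "Predictions"
    return sorted(predictions.rglob("*.csv"))


def read_header(csv_path):
    """Return the first row of a CSV file (assumed here to be a header row)."""
    with open(csv_path, newline="") as handle:
        return next(csv.reader(handle), [])


# Print each prediction table with its first row, if a results directory exists.
if Path("HPIpy_results").is_dir():
    for table in list_prediction_tables():
        print(table, read_header(table))
```

Pass a different path to list_prediction_tables if the run used --outputdir.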

Contact Us

For any queries, contact us at bioinfo@kaabil.net.
