Documentation
Welcome to the HPIpy User Guide. This guide will help you get started with HPIpy and make the most of its features.
Installation Instructions
Step-by-step Installation
- Install Miniconda (if it is not already installed):
- Download Miniconda from the official Miniconda website.
- To install (the Linux x86_64 installer is shown), run:
bash Miniconda3-latest-Linux-x86_64.sh
- Verify the conda installation:
conda --version
- Obtain the package from the Downloads page or from GitHub.
- Decompress the file (if using the compressed version):
tar -xvzf hpipy.tar.gz
- After downloading (and extracting) the package files from one of the above sources, create and activate the conda environment:
cd hpipy
conda env create -f environment.yml
conda activate hpipy
Usage
Package Help
To list the options available in HPIpy, display the package help:
python3 -m hpipy --help
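To confirm from a script that HPIpy is installed in the active environment, you can run the help command as a subprocess and check its exit status. This is a minimal sketch. `module_help_runs` is a hypothetical helper. It assumes only that `python3 -m hpipy --help` exits with status 0 on success, which is how standard command-line tools behave.

```python
import subprocess
import sys


def module_help_runs(module: str = "hpipy") -> bool:
    """Return True if `python -m <module> --help` exits successfully.

    Uses sys.executable so the check runs in the currently active
    interpreter (for example, the activated `hpipy` conda environment).
    """
    result = subprocess.run(
        [sys.executable, "-m", module, "--help"],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0


if __name__ == "__main__":
    print("HPIpy available:", module_help_runs())
```

If the function returns `False`, the `hpipy` conda environment is most likely not activated.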
Basic Usage
For basic usage, provide the paths to the protein sequence files of the host and pathogen species (FASTA format with a .fasta, .fa, or .faa extension; .gz and .zip compressed versions are also accepted). Then choose the model that suits your species pair and select at least one computational method. The example below uses the "humanVirus" model with the interolog method:
python3 -m hpipy --host exampleData/hostProteins.fasta --pathogen exampleData/pathogenProteins.fasta --model humanVirus --computation interolog
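You can also assemble the same command from a Python script. The sketch below checks the file extensions, model name, and computation methods against the values listed in the package help, then builds the argument list. `build_hpipy_command` is a hypothetical helper and is not part of HPIpy. It checks names only, not file contents.

```python
import shlex

# Allowed values copied from the HPIpy help text.
ACCEPTED_SUFFIXES = tuple(
    base + comp
    for base in (".fasta", ".fa", ".faa")
    for comp in ("", ".gz", ".zip")
)
MODELS = {"plantPathogen", "animalPathogen", "humanVirus", "humanBacteria"}
METHODS = {"interolog", "domain", "phyloProfiling", "gosim"}


def build_hpipy_command(host, pathogen, model, methods):
    """Validate the basic inputs and return an argument list for HPIpy.

    Hypothetical helper for illustration; it checks names only.
    """
    for path in (host, pathogen):
        if not str(path).endswith(ACCEPTED_SUFFIXES):
            raise ValueError(f"unsupported sequence file extension: {path}")
    if model not in MODELS:
        raise ValueError(f"unknown model: {model}")
    if not methods:
        raise ValueError("at least one computation method is required")
    unknown = sorted(set(methods) - METHODS)
    if unknown:
        raise ValueError(f"unknown computation method(s): {unknown}")
    return [
        "python3", "-m", "hpipy",
        "--host", str(host),
        "--pathogen", str(pathogen),
        "--model", model,
        "--computation", *methods,  # space-separated list on the command line
    ]


if __name__ == "__main__":
    cmd = build_hpipy_command(
        "exampleData/hostProteins.fasta",
        "exampleData/pathogenProteins.fasta",
        "humanVirus",
        ["interolog"],
    )
    print(shlex.join(cmd))  # same command as the basic-usage example
```

To launch the pipeline, pass the returned list to `subprocess.run(cmd, check=True)`.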
Advanced Options
For more detailed analyses, you can add optional arguments as needed; the full package help is reproduced below. To predict interactions again with different parameters (for example, BLAST identity, coverage, or the phylogenetic profiling threshold), use the --resume_ppis option. This skips the computationally intensive programs (BLAST, HMMER, etc.) and resumes the pipeline from the prediction step.
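Resuming in this way follows a common caching pattern: an expensive step is skipped when its output file already exists, so only the prediction step reruns with the new thresholds. The sketch below illustrates that general idea with a hypothetical `run_or_reuse` helper and a toy stand-in for BLAST. It is not HPIpy's actual implementation, and the file name is only an example.

```python
import tempfile
from pathlib import Path
from typing import Callable


def run_or_reuse(output: Path, step: Callable[[Path], None]) -> bool:
    """Run `step` to create `output` unless a non-empty output already exists.

    Returns True if the step ran and False if the existing file was reused.
    Illustrative pattern only; HPIpy's internal resume logic may differ.
    """
    if output.is_file() and output.stat().st_size > 0:
        return False
    output.parent.mkdir(parents=True, exist_ok=True)
    step(output)
    return True


def toy_alignment(out: Path) -> None:
    # Stand-in for an expensive program such as BLAST or HMMER.
    out.write_text("toy alignment output\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        aln = Path(tmp) / "Alignment" / "hostProteins_blast.txt"
        print(run_or_reuse(aln, toy_alignment))  # True: computed on first run
        print(run_or_reuse(aln, toy_alignment))  # False: existing output reused
        # Only the cheap prediction/filtering step would be repeated here,
        # using the new thresholds.
```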
usage: python3 -m hpipy [options]
hpipy: A package to predict host-microbe protein-protein interactions
---------------------------------------------------------------------
To obtain more information about the package, visit: https://kaabil.net/hpipy/
options:
-h, --help show this help message and exit
Required arguments:
--host Protein sequences of host species (formats accepted: .fasta, .fasta.gz, .fasta.zip, .fa, .fa.gz, .fa.zip, .faa, .faa.gz, .faa.zip)
--pathogen Protein sequences of pathogen species (formats accepted: .fasta, .fasta.gz, .fasta.zip, .fa, .fa.gz, .fa.zip, .faa, .faa.gz, .faa.zip)
--computation Computational method(s) to be implemented for the analysis. Provide a space-separated list;
Available methods: interolog, domain, phyloProfiling, gosim
--model Host-pathogen model to be implemented for the analysis;
Available models: plantPathogen, animalPathogen, humanVirus, humanBacteria
Optional Arguments:
--version Display package version and exit
--outputdir Directory where output files will be written; default: "HPIpy_results"
--use_slurm To run jobs using SLURM job scheduler
--slurm_account To run SLURM jobs on a specific account on the cluster
--network To perform network analysis for the predicted interactions. This step will
take more time based on the number of predicted interactions.
--num_threads Number of threads to be used; default: 4
--seq_homology Sequence identity for CD-HIT (0.1 to 1.0); default: 1.0 (100 percent)
--resume_ppis To predict interactions using different parameters without running the whole pipeline. Provide
'interproscan' option if it was used before, although it will not be executed again if interproscan's
output files already exist
Interolog model prediction arguments (optional):
--interIdentity Sequence identity to filter BLAST alignments; default: 50
--interCoverage Sequence coverage to filter BLAST alignments; default: 50
--interEvalue e-value to filter BLAST alignments; default: 1e-05
Domain model prediction arguments (optional):
--domHostEvalue e-value to filter host HMMER output; default is based on the selected model
--domPathogenEvalue e-value to filter pathogen HMMER output; default is based on the selected model
Phylogenetic profiling model prediction arguments (optional):
--genome_pool Genome pool to be used for phylogenetic profiling model; Available pools: UP82,
BC20, protPhylo490; default: BC20
--phyloEvalue e-value to filter DIAMOND BLAST alignments; default: 1e-05
--phyloIdentity Sequence identity to filter DIAMOND BLAST alignments; default: 50
--phyloCoverage Sequence coverage to filter DIAMOND BLAST alignments; default: 50
--phyloThreshold Threshold value to filter predicted interactions based on phylogenetic distance (0.1 to 1.0); default: 0.9
GO semantic similarity model prediction arguments (optional):
--interproscan To run InterProScan locally to obtain GO terms for GO similarity model; InterProScan should be installed locally
--hostGOFile Comma- or tab-separated file containing GO terms of host proteins
--pathogenGOFile Comma- or tab-separated file containing GO terms of pathogen proteins
--go_combine Method to combine GO similarity scores. Available methods: max, avg, rcmax and BMA; default: BMA
--goSimThreshold Threshold value to filter predicted interactions based on GO semantic similarity (0.1 to 1.0); default: 0.9
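The three interolog options act as cut-offs on each BLAST alignment. The sketch below shows that filtering with the documented defaults (identity 50, coverage 50, e-value 1e-05). It assumes an alignment passes when identity and coverage are at least the given percentages and the e-value is at most the threshold; HPIpy's exact comparison operators and BLAST column layout are not documented here. `Alignment` and `passes_interolog_filters` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """One BLAST hit; field names are hypothetical, not HPIpy's columns."""
    query: str
    subject: str
    identity: float  # percent identity
    coverage: float  # percent coverage
    evalue: float


def passes_interolog_filters(aln: Alignment, min_identity: float = 50.0,
                             min_coverage: float = 50.0,
                             max_evalue: float = 1e-05) -> bool:
    """Apply the three cut-offs, assuming inclusive comparisons."""
    return (aln.identity >= min_identity
            and aln.coverage >= min_coverage
            and aln.evalue <= max_evalue)


if __name__ == "__main__":
    hits = [
        Alignment("hostA", "ref1", 72.0, 88.0, 1e-40),  # passes all cut-offs
        Alignment("hostB", "ref2", 45.0, 90.0, 1e-20),  # identity too low
        Alignment("hostC", "ref3", 80.0, 95.0, 0.01),   # e-value too high
    ]
    print([a.query for a in hits if passes_interolog_filters(a)])  # ['hostA']
    # Stricter identity, comparable to --interIdentity 80 on a resumed run:
    print([a.query for a in hits if passes_interolog_filters(a, min_identity=80.0)])  # []
```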
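The --go_combine option reduces all GO-term similarities between a host protein and a pathogen protein to a single score. The sketch below uses the standard definitions of the four strategies as implemented in the GOSemSim R package:

- max: the highest pairwise score.
- avg: the mean over all pairs.
- rcmax: the larger of the row-wise and column-wise best-match means.
- BMA: the best-match average over both directions.

It assumes HPIpy follows these same definitions and that the term-by-term similarity matrix has already been computed. `combine_go_scores` is a hypothetical name.

```python
from statistics import mean
from typing import Sequence


def combine_go_scores(sim: Sequence[Sequence[float]], method: str = "BMA") -> float:
    """Combine a term-by-term GO similarity matrix into one protein-pair score.

    sim[i][j] is the similarity between GO term i of the host protein and
    GO term j of the pathogen protein.
    """
    if not sim or not sim[0]:
        raise ValueError("similarity matrix must be non-empty")
    row_max = [max(row) for row in sim]         # best match for each host term
    col_max = [max(col) for col in zip(*sim)]   # best match for each pathogen term
    if method == "max":
        return max(row_max)
    if method == "avg":
        return mean(v for row in sim for v in row)
    if method == "rcmax":
        return max(mean(row_max), mean(col_max))
    if method == "BMA":
        return (sum(row_max) + sum(col_max)) / (len(row_max) + len(col_max))
    raise ValueError(f"unknown method: {method}")


if __name__ == "__main__":
    # Toy matrix: 3 host GO terms (rows) x 2 pathogen GO terms (columns).
    sim = [[0.9, 0.2], [0.4, 0.6], [0.1, 0.3]]
    print({m: round(combine_go_scores(sim, m), 3) for m in ("max", "avg", "rcmax", "BMA")})
    # {'max': 0.9, 'avg': 0.417, 'rcmax': 0.75, 'BMA': 0.66}
```

BMA, the default, rewards protein pairs in which most terms on both sides have a close counterpart. The max method can be dominated by a single shared term.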
Example Results
Below are the files and directories generated by HPIpy for the "humanVirus" model (run with the interolog and domain methods), to help you understand the output:
HPIpy_results
├── Alignment
│ └── Interolog
│ ├── hostProteins_biogrid_blast.txt
│ ├── hostProteins_dip_blast.txt
│ ├── hostProteins_hpidb_blast.txt
│ ├── hostProteins_intact_blast.txt
│ ├── hostProteins_mint_blast.txt
│ ├── hostProteins_virhostnet_blast.txt
│ ├── pathogenProteins_biogrid_blast.txt
│ ├── pathogenProteins_dip_blast.txt
│ ├── pathogenProteins_hpidb_blast.txt
│ ├── pathogenProteins_intact_blast.txt
│ ├── pathogenProteins_mint_blast.txt
│ └── pathogenProteins_virhostnet_blast.txt
├── Clustering
│ ├── hostProteins.fasta
│ ├── hostProteins.fasta.clstr
│ ├── pathogenProteins.fasta
│ └── pathogenProteins.fasta.clstr
├── Domains
│ ├── hostProteins_did3_domains.txt
│ ├── hostProteins_domains.txt
│ ├── hostProteins_domine_domains.txt
│ ├── hostProteins_iddi_domains.txt
│ ├── pathogenProteins_did3_domains.txt
│ ├── pathogenProteins_domains.txt
│ ├── pathogenProteins_domine_domains.txt
│ └── pathogenProteins_iddi_domains.txt
├── HPIpy.log
├── logs
│ ├── hmmpress.log
│ ├── hostProteins_blastdb_biogrid.log
│ ├── hostProteins_blastdb_dip.log
│ ├── hostProteins_blastdb_hpidb.log
│ ├── hostProteins_blastdb_intact.log
│ ├── hostProteins_blastdb_mint.log
│ ├── hostProteins_blastdb_virhostnet.log
│ ├── hostProteins_cdhit_out.log
│ ├── hostProteins_hmmscan.log
│ ├── pathogenProteins_blastdb_biogrid.log
│ ├── pathogenProteins_blastdb_dip.log
│ ├── pathogenProteins_blastdb_hpidb.log
│ ├── pathogenProteins_blastdb_intact.log
│ ├── pathogenProteins_blastdb_mint.log
│ ├── pathogenProteins_blastdb_virhostnet.log
│ ├── pathogenProteins_cdhit_out.log
│ └── pathogenProteins_hmmscan.log
└── Predictions
    ├── Combined_PPIs
    │ └── Combined_PPIs.csv
    ├── Consensus_PPIs
    │ └── interolog_domain_consensus_PPIs.csv
    ├── Domain-based
    │ ├── did3_PPI.csv
    │ ├── Domain_Annotations.txt
    │ ├── Domain_PPIs.csv
    │ ├── domine_PPI.csv
    │ ├── extracted_sequences
    │ ├── human_annotations
    │ ├── iddi_PPI.csv
    │ └── network_analysis
    ├── Interolog-based
    │ ├── biogrid_PPI.csv
    │ ├── dip_PPI.csv
    │ ├── extracted_sequences
    │ ├── hpidb_PPI.csv
    │ ├── human_annotations
    │ ├── intact_PPI.csv
    │ ├── Interolog_Annotations.txt
    │ ├── Interolog_PPIs.csv
    │ ├── mint_PPI.csv
    │ ├── network_analysis
    │ └── virhostnet_PPI.csv
    └── Prediction_stats.txt
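To inspect a finished run programmatically, you can locate the per-database and combined prediction tables by walking the Predictions folder of the results directory. This is a minimal sketch that assumes only the directory names in this example tree. The CSV column layout is not documented here, so `count_rows` simply assumes one header line. Both helper names are hypothetical.

```python
import csv
import tempfile
from pathlib import Path


def list_prediction_files(results_dir="HPIpy_results"):
    """Map each Predictions subfolder to the CSV files directly inside it."""
    predictions = Path(results_dir) / "Predictions"
    found = {}
    for csv_path in sorted(predictions.glob("*/*.csv")):
        found.setdefault(csv_path.parent.name, []).append(csv_path.name)
    return found


def count_rows(csv_path, has_header=True):
    """Count non-empty CSV records, minus one header line (assumed) if present."""
    with open(csv_path, newline="") as handle:
        n = sum(1 for row in csv.reader(handle) if row)
    return n - 1 if has_header and n else n


if __name__ == "__main__":
    # Build a tiny mock of the layout above (toy columns, not HPIpy's).
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp) / "Predictions" / "Interolog-based"
        folder.mkdir(parents=True)
        (folder / "biogrid_PPI.csv").write_text("host,pathogen\nP1,Q1\nP2,Q2\n")
        print(list_prediction_files(tmp))  # {'Interolog-based': ['biogrid_PPI.csv']}
        print(count_rows(folder / "biogrid_PPI.csv"))  # 2
```

On a real run, `list_prediction_files("HPIpy_results")` would list files such as `biogrid_PPI.csv` under `Interolog-based` and `Combined_PPIs.csv` under `Combined_PPIs`.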
Contact Us
For any queries, contact us at bioinfo@kaabil.net.