The HMMER web service supports querying against a range of regularly updated sequence and HMM target databases.
- Large, comprehensive sequence collection
- UniProtKB - Comprehensive resource for protein sequence and annotation data produced by the Universal Protein Resource consortium.
- Annotated sequences and determined 3D structures
- Representative Sets
- Representative Proteomes - Each Representative Proteome (RP) is selected from a representative proteome group, a set of similar proteomes identified by sequence co-membership in UniRef50 clusters. The RP is the proteome that best represents all the proteomes in its group in terms of the majority of the sequence space and information. RPs at co-membership thresholds of 75%, 55%, 35% and 15% are available as target databases. More information on Representative Proteomes is available. The data set also includes model organisms and viral reference proteomes as defined by UniProt. The complete proteomes database comes from PIR.
- Reference Proteomes - A set of proteomes from UniProt that gives broad coverage of the tree of life and constitutes a representative cross-section of the taxonomic diversity found within UniProtKB. The set is produced by UniProt in collaboration with Ensembl and the NCBI Reference Sequence collection.
- Ensembl Genomes - Ensembl Genomes is a resource for genomic data for several thousand non-vertebrate species. All translations resulting from known and novel gene predictions in Ensembl Genomes, including hypothetical proteins, are included. For lists of all the species in each subdivision within Ensembl Genomes, see Bacteria, Fungi, Metazoa, Plants and Protists.
- Ensembl - Searches may be performed across the entire set or restricted to one of Human, Mouse, or Zebrafish.
- Quest for Orthologs
- MEROPS - A set of domain sequences from the MEROPS database of proteolytic enzymes. For each peptidase in the collection, the sequence of the known or predicted domain that carries the active site residues is included. Homologues that are not proteolytically active because one or more active site residues are missing or replaced are also included. For each inhibitor, the sequence of each inhibitory domain is included. Domains homologous to an inhibitory domain are also included, even if no inhibitory activity is known.
- ChEMBL - A manually curated database of bioactive molecules with drug-like properties. It brings together chemical, bioactivity and genomic data to aid the translation of genomic information into effective new drugs.
The default sequence database is UniProt Reference Proteomes.
Profile HMM databases
- Pfam - A large comprehensive collection of protein families.
- TIGRFAMs - Models designed for automated sequence annotation that aim to match the full length, or nearly the full length, of the sequence.
- Gene3D - A collection of models that are based on CATH structural protein domains.
- SUPERFAMILY - A collection of models that represent structural protein domains at the SCOP superfamily level.
- PIRSF - Models that are designed to provide a comprehensive and non-overlapping clustering of UniProtKB.
- TreeFam - A database of phylogenetic trees of animal gene families.
The default profile HMM database is Pfam.
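When scripting around searches, it can help to keep the database names and the two defaults in one place. The following is a minimal sketch in Python, not part of the HMMER web service: the function `resolve_database`, the `kind` labels and the dictionary layout are all hypothetical, and the names are simply copied from the lists on this page. The identifiers expected by the web service itself may differ.

```python
# Names copied from the lists on this page. The structure and the helper
# below are illustrative assumptions, not an API of the HMMER web service.
SEQUENCE_DATABASES = (
    "UniProtKB",
    "Representative Proteomes",
    "Reference Proteomes",
    "Ensembl Genomes",
    "Ensembl",
    "Quest for Orthologs",
    "MEROPS",
    "ChEMBL",
)
PROFILE_HMM_DATABASES = (
    "Pfam",
    "TIGRFAMs",
    "Gene3D",
    "SUPERFAMILY",
    "PIRSF",
    "TreeFam",
)

AVAILABLE = {
    "sequence": SEQUENCE_DATABASES,
    "profile_hmm": PROFILE_HMM_DATABASES,
}
# Defaults as stated on this page.
DEFAULT_DATABASE = {
    "sequence": "Reference Proteomes",
    "profile_hmm": "Pfam",
}


def resolve_database(kind, requested=None):
    """Return the database to search, falling back to the documented default."""
    if kind not in AVAILABLE:
        raise ValueError(f"unknown database kind: {kind!r}")
    if requested is None:
        return DEFAULT_DATABASE[kind]
    if requested not in AVAILABLE[kind]:
        raise ValueError(f"{requested!r} is not a listed {kind} database")
    return requested
```

Calling `resolve_database("profile_hmm")` returns the Pfam default, while an unlisted name raises a `ValueError` instead of silently falling back to a different target.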
Clicking ‘Search Details’ at the end of the result page reveals a box with details of the search, including the query sequence (if applicable) and the date/release of the target databases. Record this information for future reference: it is needed when trying to recreate the results, discussing them with colleagues or reporting bugs.
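One way to keep this information alongside your results is to save it as a small structured record. The sketch below is only an illustration in Python: the `SearchRecord` class and its field names are hypothetical, and the values are placeholders to be filled in by hand from the ‘Search Details’ box.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class SearchRecord:
    """Details copied manually from the 'Search Details' box (hypothetical layout)."""

    database: str
    database_release: str
    search_date: str
    query_sequence: Optional[str] = None  # not every search has a query sequence

    def to_json(self) -> str:
        # Sorted keys give a stable file that is easy to compare between runs.
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Placeholder values: replace them with what the result page actually shows.
record = SearchRecord(
    database="Pfam",
    database_release="<release shown in Search Details>",
    search_date="<date shown in Search Details>",
    query_sequence="<query sequence, if applicable>",
)

print(record.to_json())
```

Storing the JSON next to the downloaded results makes it straightforward to quote the exact database release later.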