Appendices

Appendix A - Result object format

The search servers return results as a binary data object. This can look complex at first, but the data structure is fairly simple and is represented pictorially below:

The following sections describe the contents of each part of the results data structure. Parts of the data structure are referred to as hashes (key/value pairs) or arrays, but they translate into different entities depending on the type of response requested, for example elements and attributes in an XML response.

“Results” hash

stats	The stats hash
hits	Array of sequence hashes
uuid	The unique job identifier
algo	The HMMER search algorithm
searchDB	The target search database
_internal	Hash containing some internal accounting

“Stats” hash

nhits	The number of hits found above reporting thresholds
Z	The number of sequences or models in the target database
domZ	The number of hits in the target database
nmodels	The number of models in this search
nincluded	The number of sequences or models scoring above the significance threshold
nreported	The number of sequences or models scoring above the reporting threshold

“Sequence” hash

The hits array contains one or more sequences. Only the parts of the response deemed useful are described here. With the non-redundant databases, information about the redundant sequences is also included; because those sequences are identical, the information about the hit is identical too.

name	Name of the target (sequence for phmmer/hmmsearch, HMM for hmmscan)
acc	Accession of the target
acc2	Secondary accession of the target
id	Identifier of the target
desc	Description of the target
score	Bit score of the sequence (all domains, without correction)
pvalue	P-value of the score
evalue	E-value of the score
nregions	Number of regions evaluated
nenvelopes	Number of envelopes handed over for domain definition, null2, alignment, and scoring.
ndom	Total number of domains identified in this sequence
nreported	Number of domains satisfying reporting thresholding
nincluded	Number of domains satisfying inclusion thresholding
taxid	The NCBI taxonomy identifier of the target (if applicable)
species	The species name of the target (if applicable)
kg	The kingdom of life that the target belongs to - based on placing in the NCBI taxonomy tree (if applicable)
seqs	An array containing information about the 100% redundant sequences
pdbs	Array of PDB identifiers (with chain information)
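
As a rough illustration of how the Results and Sequence hashes fit together, the sketch below filters and sorts the hits of a response that has already been decoded into Python dictionaries and lists. The function name `summarise_hits`, the `max_evalue` cut-off and the sample values are invented for this example; only the key names come from the tables above. Numeric fields are passed through `float()` because the sketch assumes they may arrive as either numbers or strings.

```python
def summarise_hits(results, max_evalue=0.01):
    """Return (name, acc, evalue, score) for hits at or below max_evalue.

    `results` is assumed to be the "Results" hash, already decoded into
    a dict (for example from a JSON response).
    """
    rows = []
    for hit in results.get("hits", []):
        # Assumption: values may be numbers or numeric strings.
        evalue = float(hit["evalue"])
        if evalue <= max_evalue:
            rows.append((hit["name"], hit.get("acc"), evalue, float(hit["score"])))
    # Smallest E-value (most significant hit) first.
    rows.sort(key=lambda row: row[2])
    return rows


# Hypothetical, hand-written response fragment; the values are not real results.
example = {
    "stats": {"nhits": 2, "nreported": 2, "nincluded": 1},
    "hits": [
        {"name": "SEQ_B", "acc": "X00002", "evalue": "0.5", "score": "12.1"},
        {"name": "SEQ_A", "acc": "X00001", "evalue": "1e-30", "score": "110.4"},
    ],
}

for name, acc, evalue, score in summarise_hits(example):
    print(f"{name}\t{acc}\t{evalue:.2g}\t{score:.1f}")
```

Raising `max_evalue` (for instance to 1.0) would keep both sample hits, still ordered by E-value.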

“Domain” hash

The domain or hit hash contains the details of the match, in particular the alignment between the query and the target.

ienv	Envelope start position
jenv	Envelope end position
iali	Alignment start position
jali	Alignment end position
bias	null2 score contribution
oasc	Optimal alignment accuracy score
bitscore	Overall score in bits, null corrected, if this were the only domain in seq
cevalue	Conditional E-value based on the domain correction
ievalue	Independent E-value based on the domain correction
is_reported	1 if domain meets reporting thresholds
is_included	1 if domain meets inclusion thresholds
alimodel	Aligned query consensus sequence for phmmer and hmmsearch, target hmm for hmmscan
alimline	Match line indicating identities, conservation +’s, gaps
aliaseq	Aligned target sequence for phmmer and hmmsearch, query for hmmscan
alippline	Posterior probability annotation
alihmmname	Name of HMM (query sequence for phmmer, alignment for hmmsearch and target hmm for hmmscan)
alihmmacc	Accession of HMM
alihmmdesc	Description of HMM
alihmmfrom	Start position on HMM
alihmmto	End position on HMM
aliM	Length of model
alisqname	Name of target sequence (phmmer, hmmsearch) or query sequence (hmmscan)
alisqacc	Accession of sequence
alisqdesc	Description of sequence
alisqfrom	Start position on sequence
alisqto	End position on sequence
aliL	Length of sequence
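
The four alignment strings in the Domain hash (alimodel, alimline, aliaseq and alippline) are easiest to read when stacked one above the other. The following is a minimal sketch of such a renderer. It assumes the four strings have equal length and that the Domain hash has already been decoded into a Python dict; the function name, the row labels and the sample strings are invented for illustration.

```python
def format_alignment(domain, width=60):
    """Render the alignment lines of one "Domain" hash as wrapped text blocks."""
    rows = [
        ("model", domain["alimodel"]),
        ("match", domain["alimline"]),
        ("seq", domain["aliaseq"]),
        ("pp", domain["alippline"]),
    ]
    # Assumption: all four strings are the same length.
    length = len(domain["alimodel"])
    blocks = []
    for start in range(0, length, width):
        # Left-pad each label to six characters so the columns line up.
        lines = [f"{label:<6}{text[start:start + width]}" for label, text in rows]
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks)


# Hypothetical five-column alignment; not taken from a real search.
example_domain = {
    "alimodel": "mkvla",
    "alimline": "mk+la",
    "aliaseq": "MKILA",
    "alippline": "89**9",
}

print(format_alignment(example_domain))
```

A smaller `width` wraps long alignments into several blocks separated by blank lines.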

Appendix B - response codes

One of the principles of a RESTful API is to return the appropriate HTTP status code in response to the query URL. Most of the time you will receive a 200 (success) status code, but there may be times when that is not the case. Complete lists of HTTP codes are available elsewhere; listed here are most of the status codes that may be returned and how they relate to what is actually happening on the server.

200 (OK): The job has either been run or queued up successfully. In the former case, the body should contain the results, whereas the latter will contain your job identifier that can be used to query/fetch the results in the future.
201 (Created): The job has been created successfully. The response will contain content describing the job and/or a redirection to the created resource in the HTTP header.
202 (Accepted): The job has been accepted by the search system and is either pending (waiting to be started) or running. After a short delay, your script should check for results again.
302 (Found/Redirection): The request was found, but the client must take additional action to complete the request. Usually there is a redirection URL found in the response header.
400 (Bad Request): Your job contained either invalid parameters or invalid parameter values. The body of the response should contain information about which parameter or value failed and possibly the reason why it failed. If you continue to receive this in response to a request and cannot understand why it is failing, you should contact the help desk for assistance.
410 (Gone): Your job was deleted from the search system. This may be because the period for which we are able to store the results has expired, or because you explicitly asked for the results to be deleted.
500 (Internal server error): There was a problem with running your job, typically due to a problem with the back-end compute servers, rather than the job itself. The body of the response may contain an error message from the server. Contact the help desk for assistance with the problem.
502 (Bad gateway): There was a problem scheduling or running the job. The job has failed and will not produce results. There is no need to check the status again.
503 (Service unavailable): The job has been put on hold, and the body of the response may contain a message explaining why. This may be due to site maintenance, database updates, queue overload or some other problem. This status is typically set by an administrator; should it persist for longer than a few hours, you should contact the help desk.
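
The status codes above can be turned into a simple client-side decision table. The sketch below is one possible way to do this and is not part of the service itself: the action labels, the function names and the `fetch_status` callable (a function you would supply that performs the HTTP request and returns the numeric status code) are all assumptions made for the example.

```python
import time


def next_action(status_code):
    """Map an HTTP status code from the list above to a client-side action.

    The action labels are invented for this sketch.
    """
    actions = {
        200: "done",         # results or a job identifier are in the body
        201: "follow",       # job created; look for a redirection header
        202: "wait",         # pending or running; check again after a delay
        302: "follow",       # follow the URL in the response header
        400: "fix_request",  # invalid parameter or value; see the body
        410: "gone",         # results expired or were deleted
        500: "failed",       # server-side problem; contact the help desk
        502: "failed",       # job failed; no need to check again
        503: "retry_later",  # service on hold, e.g. for maintenance
    }
    return actions.get(status_code, "unknown")


def poll(fetch_status, delay=5.0, max_attempts=20, sleep=time.sleep):
    """Call fetch_status() until the job leaves the "wait" state.

    fetch_status is a caller-supplied function returning an HTTP status code.
    """
    for _ in range(max_attempts):
        action = next_action(fetch_status())
        if action != "wait":
            return action
        sleep(delay)
    return "timeout"
```

Keeping the mapping separate from the polling loop means the decision logic can be checked without making any network requests.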

Appendix C - data formats

The RESTful interface supports three commonly used machine-readable formats: XML, JSON and YAML. In addition to these, we also provide HTML and text. Which format you use is largely down to personal choice. XML is widely used, with libraries in many different languages. JSON is well suited to websites, where a server may make a call to a HMMER web service and pass the resulting JSON string back to the client/browser, where the HMMER result may be post-processed by JavaScript running on the client. YAML is a more recent format which, despite being readily parsed by software, is more human-readable than XML or JSON. The HTML responses are not really meant for anything other than a browser or command line tools such as curl or wget. The text output is the best choice if you want to cut and paste results into a lab book.

Appendix D - unsupported features

We have tried to provide as many services as possible via REST. However, there are still a few things that we do not provide. For example, there is no way of generating a domain graphic or getting a graph of the distribution of hits. We cannot provide these via REST because both are generated client side using JavaScript libraries and the HTML5 canvas element. The RESTful services are also, naturally, restricted to the set of HMMER programs that are available via the website. If there is something that you think would be useful, please get in touch and we will consider it for inclusion.

Appendix E - Job ID

The job ID, also referred to as the UUID (Universally Unique IDentifier), is a 36-character sequence that looks like 10F15DB0-2E1C-11E0-B944-D59DDB6B6FDE and uniquely identifies a job submitted on the website.
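
Before sending a job ID back to the server, a script may want to check that the string at least has the expected shape. The sketch below tests for the 36-character layout shown above (hexadecimal groups of 8, 4, 4, 4 and 12 characters separated by hyphens). The function name is invented for this example, and accepting lower-case hexadecimal digits is an assumption rather than documented behaviour.

```python
import re

# 8-4-4-4-12 hexadecimal digits separated by hyphens: 36 characters in total.
JOB_ID_PATTERN = re.compile(
    r"[0-9A-Fa-f]{8}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{12}"
)


def is_job_id(text):
    """Return True if text has the 36-character UUID layout shown above."""
    return JOB_ID_PATTERN.fullmatch(text) is not None


print(is_job_id("10F15DB0-2E1C-11E0-B944-D59DDB6B6FDE"))
print(is_job_id("not-a-job-id"))
```

This only checks the format; it cannot tell whether a job with that ID actually exists on the server.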

Appendix F - JSON format

The results visualised in the score, taxonomy and architecture views are all available using the API in JSON format. For the score endpoint, the object returned includes the Stats, Sequence and Domain hashes referred to above (Appendix A).

The taxonomy endpoint provides a recursive hash of the tree with the following keys:

id	NCBI taxonomy identifier
parentid	taxonomy identifier of parent
name	taxonomy name
hitcount	number of hits to this node
hitdist	binned log e-values of hits to this node
children	children of this node (recursive)
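
Because each node carries its own children, the taxonomy hash is naturally processed with a recursive function. The sketch below walks a tree that has already been decoded from JSON into Python dictionaries and yields one row per node. It assumes that `children` is a list that may be empty or missing on leaf nodes; the function name and the hit counts in the sample tree are invented for illustration.

```python
def walk_taxonomy(node, depth=0):
    """Yield (depth, id, name, hitcount) for a node and all its descendants."""
    yield depth, node["id"], node["name"], node["hitcount"]
    # Assumption: leaf nodes have an empty or missing "children" entry.
    for child in node.get("children") or []:
        yield from walk_taxonomy(child, depth + 1)


# Hypothetical tree fragment; the hit counts are made up.
example_tree = {
    "id": 1, "parentid": None, "name": "root", "hitcount": 7,
    "children": [
        {"id": 2, "parentid": 1, "name": "Bacteria", "hitcount": 3, "children": []},
        {"id": 2759, "parentid": 1, "name": "Eukaryota", "hitcount": 4,
         "children": [
             {"id": 33208, "parentid": 2759, "name": "Metazoa", "hitcount": 4},
         ]},
    ],
}

for depth, tax_id, name, hitcount in walk_taxonomy(example_tree):
    print(f"{'  ' * depth}{name} ({tax_id}): {hitcount}")
```

The walk is depth-first, so each node is printed before its children and indentation reflects the depth in the tree.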