Inputs
The inputs to the server are a structure file in PDB format and a structural alphabet. The PDB file contains the atomic coordinates describing the 3D structure of the protein. It can be obtained from the worldwide Protein Data Bank (wwPDB) (http://www.wwpdb.org). The user can provide the input structure in two ways: by entering a PDB identifier or by uploading a PDB file through a file select control on the web form. A drop-down list on the web form offers a choice between two structural alphabets: SAFlex.V1 or SAFlex.beta. Both inputs are mandatory for encoding the protein chains in the PDB entry. If the user tries to submit the form without a PDB file or without selecting a structural alphabet, an error message asks the user to fill in the missing data. The form data are processed by C++ programs that produce several FASTA result files. The job manager, running on a Linux server, processes a job by submitting it to a server machine and starting the programs needed to produce the results requested by the user.
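The mandatory-input check described above can be illustrated with a minimal sketch. It is not the server's actual code: the function name `validate_submission`, the error messages, and the identifier pattern (the classic four-character PDB identifier, a digit followed by three alphanumeric characters) are assumptions made for illustration. Only the two alphabet names come from the text.

```python
import re

# Alphabet names taken from the text; everything else here is hypothetical.
ALLOWED_ALPHABETS = {"SAFlex.V1", "SAFlex.beta"}

# Assumed format: classic 4-character PDB identifier (digit + 3 alphanumerics).
PDB_ID_PATTERN = re.compile(r"^[0-9][A-Za-z0-9]{3}$")


def validate_submission(pdb_id, uploaded_file_name, alphabet):
    """Return a list of error messages; an empty list means the form is complete."""
    errors = []
    has_id = bool(pdb_id and pdb_id.strip())
    has_file = bool(uploaded_file_name and uploaded_file_name.strip())

    if not has_id and not has_file:
        errors.append("Provide a PDB identifier or upload a PDB file.")
    elif has_id and not PDB_ID_PATTERN.match(pdb_id.strip()):
        errors.append("The PDB identifier does not look valid.")

    if alphabet not in ALLOWED_ALPHABETS:
        errors.append("Select a structural alphabet.")
    return errors
```

For example, `validate_submission("3LK4", None, "SAFlex.V1")` returns an empty list, while a submission with neither a structure nor an alphabet returns two messages.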
Outputs
The results page is displayed online and organized in sections: the 3D view results, the protein structure details, the entropy results and the download outputs. These sections are detailed in the following paragraphs.
3D view results
The SAFlex server interprets the outputs of the C++ programs and provides users with graphical representations of their encoded protein structures. The input structure is displayed in a PV and a JSmol window on the left side of the results page, so users can visualize the 3D structure directly on the website. They can also zoom in on different parts of the structure, rotate it in PV or JSmol, and save a high-resolution image of the customized view. Four display modes are available to render structures in different styles and colors: three predefined modes and one custom mode. The default mode (color chain) colors residues by subunit, so that each protein subunit is rendered in a different color. The color SS mode colors the structure by secondary structure type, and the color AA mode colors it by residue. The custom mode, color SL, colors structural fragments by structural letter. This mode is important because it shows users what a structural letter looks like, which links the 3D structure view to the structural letter sequence. All display modes render the structure in Cartoon style in both viewers, PV and JSmol. In addition, the SAFlex web server provides three options for exploring the selected polymer chain. The default option displays all polymer chains, but only the selected chain is colored; all other chains are shown in grey. Users can also select Show only this chain or Center this chain to focus on the selected chain, or preferably both options together. These options are particularly useful when the selected structure contains several entities, including protein, DNA, RNA and ligands. The 3LK4 entry, which includes 36 protein chains, is a concrete example of such a case. Here, combining the two options Show only this chain and Center this chain is a practical and precise solution.
It should be noted that all display modes mentioned above also work with these three options.
Protein structure details
For each polymer chain in the input structure, a section briefly summarizes information about the polymer. The information displayed includes the protein name, the chain reference (UniProt identifier), the chain length, the resolved chain length, the number of missing residues and the amino acid sequence. It also includes information on the encoded protein chain, namely the number of fragments and the structural letter sequence. This chain information is displayed on the right side of the results page, while the structural encoding can be viewed in JSmol and PV on the left side. As mentioned in the section describing the structural alphabet model, the chain structure is divided into a sequence of overlapping fragments. Each letter in the structural encoding of the chain therefore corresponds to a structural fragment of four residues. Each letter has a specific color that depends on its class and its frequency. Moreover, if the user hovers the pointer over a structural letter, without clicking it, a tooltip appears with the fragment number and the residues that make up the fragment. For each of these residues, the tooltip gives the number and the name as they appear in the PDB file. Furthermore, users can click on the link provided above the 3D view windows to access all details about the input structure on the RCSB PDB website. Finally, a link to the encoding entropy results, discussed in the next section, is available in every protein chain information window.
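The correspondence between structural letters and four-residue fragments can be sketched as follows. This is an illustration only: it assumes that consecutive fragments are shifted by one residue (the text states only that fragments overlap), and the function names and the tooltip wording are hypothetical.

```python
def fragment_residues(residues, fragment_length=4):
    """Split a chain into overlapping fragments.

    `residues` is a list of (residue_number, residue_name) pairs as read from
    the PDB file. Returns a list of (fragment_number, residues_in_fragment)
    with 1-based fragment numbers. Assumes a shift of one residue between
    consecutive fragments.
    """
    count = len(residues) - fragment_length + 1
    return [(i + 1, residues[i:i + fragment_length]) for i in range(count)]


def tooltip_text(fragment_number, fragment):
    """Build a hypothetical tooltip string for one structural letter."""
    parts = [f"{name}{number}" for number, name in fragment]
    return f"Fragment {fragment_number}: " + ", ".join(parts)


chain = [(10, "MET"), (11, "LYS"), (12, "ALA"), (13, "GLY"), (14, "SER")]
fragments = fragment_residues(chain)
print(tooltip_text(*fragments[0]))
```

Under this assumption, a chain with n consecutive resolved residues yields n - 3 structural letters, so the five-residue example above produces two fragments.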
Entropy and NEFF results
The SAFlex server also provides a means to check the quality of the structural encoding using the entropy. It presents the entropy results graphically, with three visual presentation modes available for each protein chain in the PDB entry. The first mode displays a chart in the top left corner of the entropy results page, showing the encoding entropy value for each structural fragment. Users can select from three display options: bar plot, line plot, or both combined. They can also save the chart with the chosen option. SAFlex provides a color legend as a key to the significance of the encoding entropy. A green bar in the graph indicates that the fragment encoding is certain. A red bar, by contrast, does not convey information about encoding certainty, but is a very useful indicator of missing data in PDB structures. A bar changes from green to red when the setpoint (equal to 0.75) is exceeded. The second and third modes display, for the protein chain, the structural sequence according to entropy values, followed by the chain in a PV window. Users can thus save a high-resolution image of this customized structure view. Furthermore, users can switch to the results of another protein chain by selecting it in the drop-down list.
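The coloring rule of the entropy chart amounts to a simple threshold test, sketched below. Only the setpoint value of 0.75 and the two colors come from the text; the function name is hypothetical, and the sketch assumes that a value exactly equal to the setpoint stays green, since the text says the color changes when the setpoint is exceeded.

```python
ENTROPY_SETPOINT = 0.75  # value stated in the text


def bar_color(entropy, setpoint=ENTROPY_SETPOINT):
    """Return the bar color for one fragment's encoding entropy."""
    # Red only when the setpoint is strictly exceeded (assumption).
    return "red" if entropy > setpoint else "green"


# Hypothetical per-fragment entropy values, for illustration only.
entropies = [0.02, 0.40, 0.75, 1.30]
colors = [bar_color(value) for value in entropies]
```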
The effective number (NEFF) of structural letters (SL) at each position is also derived from the entropy of the marginal posterior distribution (POST). NEFF is typically close to 1 when the encoding is highly certain, and can reach 27 for totally uncertain positions. NEFF hence provides a convenient measure of the encoding certainty.
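As a minimal sketch of how such a quantity can be computed, the code below assumes that NEFF is the exponential of the Shannon entropy (in nats) of the posterior distribution over structural letters at one position. This definition is an assumption, not stated in the text, but it is consistent with the range given above: it yields 1 for a fully certain position and 27 for a uniform distribution over 27 letters. The function names are hypothetical.

```python
import math


def posterior_entropy(post):
    """Shannon entropy (natural log) of a posterior distribution.

    `post` is a sequence of probabilities summing to 1; zero entries
    contribute nothing to the sum.
    """
    return -sum(p * math.log(p) for p in post if p > 0)


def neff(post):
    """Effective number of structural letters, assumed to be exp(entropy)."""
    return math.exp(posterior_entropy(post))


certain = [1.0] + [0.0] * 26   # one letter has all the posterior mass
uniform = [1.0 / 27] * 27      # all 27 letters equally likely
print(neff(certain), neff(uniform))
```

A position where two letters each have probability 0.5 would give a NEFF of about 2 under this definition, which matches the intuitive reading of "two equally plausible letters".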
Consensus 3D encoding results
The new model can take multi-chain data into account and includes a new consensus encoding for homomers. SAFlex proposes to encode homomers either as independent structures (one structural sequence per chain) or as a single consensus structure, where a single hidden structural sequence is shared by all homomeric chains. The resulting consensus encoding hence represents the variability of the homomer across the chains, whether this variability is due to measurement uncertainty or to intrinsic flexibility.
POST results
The marginal posterior distribution (POST) can be used to quantify the level of uncertainty of the structural letter encoding by computing the marginal posterior entropy (ENT) and the effective number of SL (NEFF). This information is particularly useful for structural regions where the encoding is difficult or variable.