PASSer: Protein Allosteric Sites Server

Tutorial

This tutorial walks through the major functions of PASSer, explains the requirements for the input data, and describes the output results.

Introduction

PASSer is a web server that predicts the probability of protein pockets being allosteric sites. For each protein structure, PASSer applies FPocket[1] to detect and split it into several pockets. Currently, PASSer provides three machine learning models: (1) ensemble learning through extreme gradient boosting[2] and a graph convolutional neural network[3]; (2) automated machine learning through the AutoGluon framework[4]; and (3) learning to rank through LightGBM[5].
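As a rough illustration of this workflow, the sketch below runs a similar pipeline locally: FPocket detects the pockets, per-pocket descriptors are read from its output, and a tree-based classifier scores each pocket. The fpocket command line and XGBoost calls are standard, but the descriptor parsing and the pre-trained model file passer_xgb.json are illustrative assumptions, not PASSer's actual code.

```python
# Illustrative sketch of a PASSer-style pipeline run locally.
# Assumes fpocket is installed and a hypothetical pre-trained model "passer_xgb.json" exists.
import re
import subprocess
from pathlib import Path

import xgboost as xgb


def run_fpocket(pdb_path: str) -> Path:
    """Run FPocket; it writes its results next to the input as <name>_out/."""
    subprocess.run(["fpocket", "-f", pdb_path], check=True)
    pdb = Path(pdb_path)
    return pdb.parent / f"{pdb.stem}_out"


def pocket_features(out_dir: Path) -> dict:
    """Collect the numeric descriptors of each pocket from the FPocket info file."""
    text = next(out_dir.glob("*_info.txt")).read_text()
    parts = re.split(r"Pocket\s+(\d+)", text)[1:]   # [num, block, num, block, ...]
    return {
        int(num): [float(v) for v in re.findall(r":\s*(-?\d+\.?\d*)", block)]
        for num, block in zip(parts[0::2], parts[1::2])
    }


def top_pockets(features: dict, model_path: str = "passer_xgb.json", k: int = 3):
    """Score pockets with a pre-trained XGBoost classifier (hypothetical model file)."""
    model = xgb.XGBClassifier()
    model.load_model(model_path)
    probs = model.predict_proba(list(features.values()))[:, 1]
    return sorted(zip(features.keys(), probs), key=lambda p: p[1], reverse=True)[:k]


if __name__ == "__main__":
    out_dir = run_fpocket("4AKE.pdb")   # example input file
    print(top_pockets(pocket_features(out_dir)))
```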

Input Data

Two options are provided for input. If the structure has an entry in the Protein Data Bank, the user can enter its four-letter PDB ID. Alternatively, the user can upload a custom PDB file whose name ends with ".pdb". Please refer to this link for details of the PDB format.
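For users preparing their own upload file, a structure can also be fetched from the Protein Data Bank by its four-letter ID; 4AKE below is just an example entry.

```python
# Fetch a PDB entry from RCSB by its four-letter ID and save it as a ".pdb" file.
import requests

pdb_id = "4AKE"  # example entry; replace with the ID of interest
url = f"https://files.rcsb.org/download/{pdb_id}.pdb"

response = requests.get(url, timeout=30)
response.raise_for_status()

with open(f"{pdb_id}.pdb", "w") as handle:
    handle.write(response.text)
```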

By default, all chains in the PDB file will be used in the calculation and analysis if the chain ID box is left blank. A chain ID is needed if the user wants to analyze specific chains. The chain ID can be either a single letter, such as "A", or multiple letters separated by commas, such as "A,B". Please note that there is no space between the comma and the chain ID.
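If only part of a multi-chain structure is of interest, the same comma-separated chain list can also be used to trim the PDB file locally before upload. The sketch below uses Biopython for this; any PDB toolkit would work, and the file names are placeholders.

```python
# Keep only the requested chains of a PDB file, mirroring PASSer's "A,B" syntax
# (comma-separated, no spaces). Requires Biopython.
from Bio.PDB import PDBParser, PDBIO, Select


class ChainSelect(Select):
    def __init__(self, chain_ids):
        self.chain_ids = set(chain_ids)

    def accept_chain(self, chain):
        return chain.id in self.chain_ids


chain_input = "A,B"              # as entered in the PASSer chain ID box
chains = chain_input.split(",")  # -> ["A", "B"]

structure = PDBParser(QUIET=True).get_structure("4AKE", "4AKE.pdb")
io = PDBIO()
io.set_structure(structure)
io.save("4AKE_chains_AB.pdb", select=ChainSelect(chains))
```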

The table below summarizes the speed and prediction type of each model. In extensive tests on multiple mid-sized proteins (100 to 300 residues), the ensemble learning model finishes prediction in around 1 to 2 seconds on average, and the learning-to-rank model is slightly faster. The automated ML model takes around 20 seconds due to the loading of 14 base models. Regarding prediction types, both the ensemble learning and automated ML models output the probability of a pocket being an allosteric site, whereas the learning-to-rank model predicts a relevance score with respect to allosteric pockets. We recommend the ensemble learning and learning-to-rank models for time-sensitive tasks, the ensemble learning and automated ML models for good output interpretability, and the learning-to-rank model for benchmark studies and performance comparison.

Selection criteria of machine learning models on PASSer.

Model     Speed        Type
Ensemble  Fast (1 s)   Probability
AutoML    Slow (20 s)  Probability
Rank      Fast (1 s)   Score
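For quick reference in scripts, the guidance above can be condensed into a small lookup. The snippet simply restates the table and recommendations; it is not part of any PASSer API.

```python
# Condensed model-selection guide; restates the table and text above (illustrative only).
MODELS = {
    "ensemble": {"speed_seconds": 1,  "output": "probability"},
    "automl":   {"speed_seconds": 20, "output": "probability"},
    "rank":     {"speed_seconds": 1,  "output": "relevance score"},
}

RECOMMENDED = {
    "time-sensitive":   ["ensemble", "rank"],
    "interpretability": ["ensemble", "automl"],
    "benchmark":        ["rank"],
}

print(RECOMMENDED["time-sensitive"])  # -> ['ensemble', 'rank']
```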

Output Results

The result page shows the top three pockets most likely to be allosteric sites, with the corresponding predicted probabilities/scores. A result table lists the residues in each pocket: click "Show Residues" to see them, and click again to close the popup window. Click the link on the result page to download a ZIP file containing the FPocket results and the PASSer predictions. The PASSer result file, passer.txt, lists the top three pocket numbers (as indicated in the FPocket results) together with their probabilities/scores.

Pocket Number  Probability  Residues
1              89.65%       Show Residues
2              20.23%       Show Residues
3              16.84%       Show Residues
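To reuse the downloaded predictions in a script, passer.txt can be read back in. Its exact layout is not documented in this tutorial, so the snippet below only assumes that each line carries a pocket number followed by its probability/score; adapt the parsing to the actual file.

```python
# Read PASSer predictions back into Python (assumed "pocket number, value" layout).
import re

results = {}
with open("passer.txt") as handle:
    for line in handle:
        # Keep the first integer as the pocket number and the last number as the value.
        numbers = re.findall(r"-?\d+\.?\d*", line)
        if len(numbers) >= 2:
            results[int(float(numbers[0]))] = float(numbers[-1])

top3 = sorted(results.items(), key=lambda item: item[1], reverse=True)[:3]
print(top3)
```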


The user can interact with the protein in the viewer window, which is powered by JSmol. A complete JSmol tutorial can be found here. The top three pockets are colored red, orange, and gold. Each pocket is displayed after clicking the corresponding "Load Pocket x" button. The user can either click "Hide Pocket x" to hide a specific pocket or "Reset" to hide all currently shown pockets.




The table below describes the files included in the downloadable ZIP file.
Description of files in the ZIP file.

File name          Description
PDBID_info.txt     FPocket calculated features of each pocket.
PDBID_out.pdb      FPocket generated PDB file.
PDBID_pockets.pqr  FPocket generated PQR file.
PDBID_PYMOL.sh     Visualization script for PyMOL.
PDBID_VMD.sh       Visualization script for VMD.
PDBID.pml          FPocket generated PML file.
PDBID.tcl          FPocket generated TCL file.
passer.txt         Prediction results of all detected pockets.
pockets            A folder containing all detected pockets.
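The archive can also be inspected programmatically. The sketch below lists its contents and pulls the per-pocket FPocket "Score" values out of the info file; the archive name results.zip is a placeholder, and the field labels follow standard FPocket output, which may differ slightly between versions.

```python
# List the downloaded archive and extract FPocket pocket scores.
# "results.zip" is a placeholder name for the downloaded ZIP file.
import re
import zipfile

with zipfile.ZipFile("results.zip") as archive:
    print(archive.namelist())  # every file described in the table above
    info_name = next(n for n in archive.namelist() if n.endswith("_info.txt"))
    info_text = archive.read(info_name).decode()

# Map pocket number -> FPocket "Score" value.
scores = {
    int(num): float(score)
    for num, score in re.findall(
        r"Pocket\s+(\d+).*?Score\s*:\s*(-?\d+\.?\d*)", info_text, re.S
    )
}
print(scores)
```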

References

[1] Le Guilloux, V., Schmidtke, P. and Tuffery, P., 2009. Fpocket: an open source platform for ligand pocket detection. BMC Bioinformatics, 10(1), pp. 1-11.
[2] Chen, T. and Guestrin, C., 2016. XGBoost: a scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 785-794).
[3] Kipf, T.N. and Welling, M., 2016. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907.
[4] Erickson, N., Mueller, J., Shirkov, A., Zhang, H., Larroy, P., Li, M. and Smola, A., 2020. AutoGluon-Tabular: robust and accurate AutoML for structured data. arXiv preprint arXiv:2003.06505.
[5] Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., Ye, Q. and Liu, T.Y., 2017. LightGBM: a highly efficient gradient boosting decision tree. Advances in Neural Information Processing Systems, 30.