This tutorial walks through the major functions of PASSer, explains the requirements for the input data, and describes how to read the output results.
Two input options are provided. If the structure already exists in the Protein Data Bank, the user can enter its four-character PDB ID. Alternatively, the user can upload a custom PDB file whose name ends with ".pdb". Please refer to this link to understand the PDB format.
By default, if the chain ID box is left blank, all chains in the PDB file are used in the calculation and analysis. To analyze specific chains, enter a chain ID. The chain ID can be either a single letter, such as "A", or multiple letters separated by commas, such as "A,B". Please note that there must be no space between the comma and the chain ID.
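The input rules above can be checked locally before submitting a job. The following is a minimal Python sketch, not PASSer's own validation code: the function names `is_pdb_id` and `parse_chain_ids` are hypothetical, and accepting digits as well as letters in IDs is an assumption made here.

```python
import re


def is_pdb_id(text: str) -> bool:
    """Return True if text looks like a four-character PDB ID (assumed alphanumeric)."""
    return re.fullmatch(r"[A-Za-z0-9]{4}", text) is not None


def parse_chain_ids(text: str):
    """Parse the chain ID box.

    A blank box means "use all chains" and is returned as None.
    Otherwise the value must be single-character chain IDs separated
    by commas, with no spaces, for example "A" or "A,B".
    """
    if text == "":
        return None
    chains = text.split(",")
    for chain in chains:
        # Assumption: a chain ID is one letter or digit; spaces are rejected.
        if re.fullmatch(r"[A-Za-z0-9]", chain) is None:
            raise ValueError(f"invalid chain ID entry: {chain!r}")
    return chains
```

With this sketch, `parse_chain_ids("A,B")` yields two chain IDs, while `parse_chain_ids("A, B")` raises a `ValueError` because of the space after the comma.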
The table below summarizes the speed and prediction type of each model. In extensive testing on multiple mid-sized proteins (100 to 300 residues), the ensemble learning model finished prediction tasks in about 1 to 2 seconds on average, and the learning-to-rank model was slightly faster. The automated ML model takes around 20 seconds because it loads 14 base models. As for prediction type, both the ensemble learning model and the automated ML model output the probability of a pocket being an allosteric site, whereas the learning-to-rank model predicts a relevance score for each pocket. We recommend the ensemble learning and learning-to-rank models for time-sensitive tasks, the ensemble learning and automated ML models for good output interpretability, and the learning-to-rank model for benchmark studies and performance comparison.
Model | Speed | Type |
---|---|---|
Ensemble | Fast (1s) | Probability |
AutoML | Slow (20s) | Probability |
Rank | Fast (1s) | Score |
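For scripted workflows, the table can be encoded as data and filtered by requirement. This is only an illustrative sketch: the dictionary keys, the `models_for` helper, and the use of the approximate timings from the table as hard numbers are all assumptions made here, not part of PASSer.

```python
# Approximate characteristics copied from the table above.
MODELS = {
    "ensemble": {"seconds": 1, "output": "probability"},
    "automl": {"seconds": 20, "output": "probability"},
    "rank": {"seconds": 1, "output": "score"},
}


def models_for(max_seconds=None, output=None):
    """Return the names of models that satisfy the given constraints, sorted."""
    selected = []
    for name, info in MODELS.items():
        if max_seconds is not None and info["seconds"] > max_seconds:
            continue  # too slow for the requested time budget
        if output is not None and info["output"] != output:
            continue  # wrong prediction type
        selected.append(name)
    return sorted(selected)
```

For example, `models_for(max_seconds=5)` keeps the two fast models, and `models_for(output="probability")` keeps the two models whose output is a probability.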
The result page shows the three pockets most likely to be allosteric sites, with their predicted probabilities or scores. A result table lists the residues in each pocket: click "Show Residues" to see them, and click again to close the popup window. Click the link on the result page to download a zip file containing the FPocket results and the PASSer predictions. The PASSer result file is named passer.txt and includes the top three pocket numbers, as indicated in the FPocket results, together with their probabilities or scores.
Pocket Number | Probability | Residues |
---|---|---|
1 | 89.65% | Show Residues |
2 | 20.23% | Show Residues |
3 | 16.84% | Show Residues |
File name | Description |
---|---|
PDBID_info.txt | FPocket calculated features of each pocket. |
PDBID_out.pdb | FPocket generated PDB file. |
PDBID_pockets.pqr | FPocket generated PQR file. |
PDBID_PYMOL.sh | Visualization script in PyMOL. |
PDBID_VMD.sh | Visualization script in VMD. |
PDBID.pml | FPocket generated PML file. |
PDBID.tcl | FPocket generated TCL file. |
passer.txt | Prediction results of all detected pockets. |
pockets | A folder of all detected pockets. |
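After downloading the zip file, it can be useful to confirm that the expected files are present. The sketch below uses Python's standard `zipfile` module and checks a subset of the file names listed in the table. The helper name `summarize_archive` is hypothetical, and because the folder layout inside the archive is not described here, the sketch matches files by base name only.

```python
import zipfile
from pathlib import PurePosixPath

# Suffixes taken from the file table above; "PDBID" is replaced by the actual ID.
EXPECTED_SUFFIXES = ["_info.txt", "_out.pdb", "_pockets.pqr", "_PYMOL.sh", "_VMD.sh", ".pml"]


def summarize_archive(zip_path, pdb_id):
    """Map each expected file name to True or False depending on whether it is in the zip."""
    with zipfile.ZipFile(zip_path) as archive:
        # Assumption: compare base names only, ignoring any folders inside the archive.
        present = {PurePosixPath(name).name for name in archive.namelist()}
    expected = [pdb_id + suffix for suffix in EXPECTED_SUFFIXES] + ["passer.txt"]
    return {name: name in present for name in expected}
```

Calling `summarize_archive("results.zip", "1abc")`, where both the archive name and the ID are placeholders, returns a dictionary that flags any expected file missing from the download.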
[1] Le Guilloux, V., Schmidtke, P. and Tuffery, P., 2009. Fpocket: an open source platform for ligand pocket detection. BMC Bioinformatics, 10(1), pp.1-11.
[2] Chen, T. and Guestrin, C., 2016, August. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 785-794).
[3] Kipf, T.N. and Welling, M., 2016. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907.
[4] Erickson, N., Mueller, J., Shirkov, A., Zhang, H., Larroy, P., Li, M. and Smola, A., 2020. AutoGluon-Tabular: Robust and accurate AutoML for structured data. arXiv preprint arXiv:2003.06505.
[5] Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., Ye, Q. and Liu, T.Y., 2017. LightGBM: A highly efficient gradient boosting decision tree. Advances in Neural Information Processing Systems, 30.