KUPS Benchmark Collection


Citation:

Chen, X. W., J. C. Jeong, et al. (2011). "KUPS: constructing datasets of interacting and non-interacting protein pairs with associated attributions." Nucleic Acids Res 39(Database issue): D750-754.

Jeong, J. C., X. Lin, et al. (2011). "On position-specific scoring matrix for protein function prediction." IEEE/ACM Trans Comput Biol Bioinform 8(2): 308-315.

Descriptions:

PPI dataset #1: Balanced problem - In this set, the numbers of positive and negative samples are equal.

PPI dataset #2: Imbalanced problem - In this set, negative samples outnumber positive samples by roughly 3.5 to 1 (15000 negatives vs. 4249 positives in each split). This distribution more closely mirrors the observed sparsity of protein interactions.

Notes:

Protein-protein interaction (PPI): Positives

Non-interacting protein pairs (NIP): Negatives



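  • TP : the number of positive samples that are predicted as positive
  • TN : the number of negative samples that are predicted as negative
  • FP : the number of negative samples that are predicted as positive
  • FN : the number of positive samples that are predicted as negative

    Overall = (TP + TN) / (TP + TN + FP + FN)
    Specificity = TN / (TN + FP)
    Recall (Sensitivity) = TP / (TP + FN)
    Precision = TP / (TP + FP)
    F-measure = 2 × Precision × Recall / (Precision + Recall)
    Correlation coefficient (CC) = (TP × TN − FP × FN) / sqrt((TP + FP)(TP + FN)(TN + FP)(TN + FN))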
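
The measures above can be computed directly from the four counts. The following Python sketch assumes the standard definitions: overall accuracy for Overall and the Matthews correlation coefficient for CC. The names ppi_metrics and _ratio, and the convention of reporting an undefined ratio as 0.0, are illustrative choices and not part of the benchmark. The example input is the NaiveBayes confusion matrix reported for PPI dataset #1.

```python
import math


def _ratio(num, den):
    # Convention assumed for this sketch: an undefined ratio (0/0) is reported as 0.0.
    return num / den if den else 0.0


def ppi_metrics(tp, tn, fp, fn):
    """Summary measures for one binary confusion matrix (counts, not rates)."""
    recall = _ratio(tp, tp + fn)          # sensitivity
    specificity = _ratio(tn, tn + fp)
    precision = _ratio(tp, tp + fp)
    f_measure = _ratio(2 * precision * recall, precision + recall)
    # Assumption: CC is the Matthews correlation coefficient.
    cc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "overall": _ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": recall,
        "specificity": specificity,
        "precision": precision,
        "f_measure": f_measure,
        "cc": _ratio(tp * tn - fp * fn, cc_den),
    }


# NaiveBayes on the PPI dataset #1 test set (counts taken from its confusion matrix).
nb = ppi_metrics(tp=3876, tn=2180, fp=3078, fn=1382)
for name, value in nb.items():
    print(f"{name:12s} {value:.3f}")
```

With these counts the sketch gives values that round to the NaiveBayes row of the results table (57.6% Overall, CC of 0.16), which supports reading CC as the Matthews coefficient.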


PPI dataset #1

  • Download
  • The number of features: 400
  • TRAIN: 5259 Positives, 5259 Negatives
  • TEST: 5258 Positives, 5258 Negatives
Classifier               Overall  Sensitivity  Specificity  Precision  F-measure  CC
NaiveBayes               57.6%    73.7%        41.5%        55.7%      63.5%      0.16
Decision Tree (C4.5)     58.9%    59.4%        58.3%        58.8%      59.1%      0.18
Support Vector Machine   70.8%    65.8%        75.8%        73.1%      69.3%      0.42
Random Forests           71.5%    69.0%        74.1%        72.7%      70.8%      0.42

NaiveBayes

  Confusion Matrix
               Predicted -1   Predicted 1
  Actual -1    2180           3078
  Actual  1    1382           3876

  • Software : NaiveBayes.fit() in Matlab2009a
  • Parameters : Using default

Decision Tree (C4.5)

  Confusion Matrix
               Predicted -1   Predicted 1
  Actual -1    3066           2192
  Actual  1    2134           3124

  • Software : classregtree() in Matlab2009a
  • Parameters : Method='classification', others=default

Support Vector Machine

  Confusion Matrix
               Predicted -1   Predicted 1
  Actual -1    3988           1270
  Actual  1    1800           3458

  • Software : LIBSVM
  • Parameters : Kernel=RBF, C=1, Gamma=0.0025

Random Forests

  Confusion Matrix
               Predicted -1   Predicted 1
  Actual -1    3894           1364
  Actual  1    1632           3626






PPI dataset #2

  • Download
  • The number of features: 400
  • TRAIN: 4249 Positives, 15000 Negatives
  • TEST: 4249 Positives, 15000 Negatives
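
To see why Overall alone can mislead on the imbalanced set, consider a hypothetical baseline, not one of the benchmarked classifiers, that labels every test pair as non-interacting. The sketch below uses only the TEST counts listed above.

```python
# TEST split sizes for PPI dataset #2, as listed above.
positives, negatives = 4249, 15000

# Hypothetical baseline (an assumption for illustration): predict "negative" for every pair.
tp, fn = 0, positives   # every interacting pair is missed
tn, fp = negatives, 0   # every non-interacting pair is labeled correctly

overall = (tp + tn) / (positives + negatives)
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)

print(f"overall={overall:.1%} sensitivity={sensitivity:.1%} specificity={specificity:.1%}")
```

The baseline reaches about 77.9% Overall while detecting no interactions at all, so measures that account for the positive class, such as Sensitivity and F-measure, are the more informative ones for this set.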





Purpose

These benchmarks and results provide a framework for comparing published PPI prediction algorithms.


Submissions

Results should be submitted to Xue-wen Chen. Please include a citation to the relevant publication.


Why submit?

  • Improve this service.
  • Compare results with other researchers.