KUPS - Help

A : Navigation tabs for defining the parameters used to build a dataset. Click a tab to jump directly to the corresponding step of parameter definition.

B : Each example automatically fills in the parameters for every step. Once an example is loaded (see the image below), navigate through the steps to see how the example defines each parameter.


C : 'Next Page' takes you to the next step of parameter selection.
A : Defines the number of protein-protein interaction (PPI) pairs: Minimum = 100, Maximum = 15,000, Default = 1,500.

B : Use either all species or user-defined species by clicking “Use All” or “Select Species…”, respectively.

C : Selected species appear in this area and can be removed by clicking 'remove'.

D : Specifies the interaction types of the protein pairs. Use all interaction types or specific ones by clicking “Use All” or “Select Interaction Types…”, respectively.

E : Specifies the detection methods for the PPI pairs. Use all detection methods or specific ones by clicking “Use All” or “Select Detection Methods…”, respectively. In this example, all detection methods are selected and shown in green.

F : 'Prev Page' takes you back to the previous step of parameter selection, so you can change parameters at any time before submitting the form.

******** NOTICE ******** : If you get no results for a specific species, please check the filtering options.
Not every ‘Interaction Types’ and ‘Detection Methods’ filter is available for every species in KUPS, because some filtering options do not apply to some species.
To avoid this problem, choose the ‘Use All’ filtering option first and then narrow down the filters.
A : Defines the number of non-interacting protein pairs (negative set): Minimum = 100, Maximum = 15,000, Default = 1,500.
Restrict protein selection pool: if checked, only the interacting protein pairs that pass the filters defined in the positive set selection are treated as positive PPIs, so the selection of negative (non-interacting) protein pairs is restricted by these filtered positive PPIs only. Otherwise, all PPIs in our database are considered when selecting the negative set. Unchecking this option can slow processing significantly because the search space grows.

B : Available methods for generating the negative dataset (non-interacting protein pairs).
  • Uniform random pairs: All protein pairs that are not in the positive set. The positive set can be either all PPIs in our database or the user-defined PPIs obtained through positive set filtering.
  • Functionally dissimilar pairs: Pairs that satisfy the conditions of "Uniform random pairs" and are functionally dissimilar according to their GO terms. The semantic distance between the GO annotations of the two proteins is defined using the lowest common ancestor of their GO terms.
  • Spatially separate pairs: Pairs that satisfy the conditions of "Uniform random pairs" and differ in subcellular localization, compared using the cellular component annotations in UniProt.
  • Non-interacting domains: Proteins are mapped to Pfam, and protein pairs with non-interacting domains as defined in the Negatome Database are selected. These pairs also satisfy the conditions of "Uniform random pairs".
For more information about the methods, please refer to "citation".
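As an illustration of the "Uniform random pairs" method, the sketch below draws random protein pairs that are absent from the positive set. It is not the KUPS implementation: the function name `sample_uniform_negatives`, the choice to draw proteins only from those seen in the positive set, and the use of rejection sampling are all assumptions made for this example. Only the pair-count limits (100 to 15,000, default 1,500) come from this page.

```python
import random

# Limits taken from this help page.
MIN_PAIRS, MAX_PAIRS, DEFAULT_PAIRS = 100, 15_000, 1_500


def sample_uniform_negatives(positive_pairs, n_pairs=DEFAULT_PAIRS, seed=0):
    """Draw n_pairs random protein pairs that are not in the positive set.

    Assumption: the protein pool is every protein that occurs in
    positive_pairs (hypothetical; KUPS may use a different pool).
    """
    if not MIN_PAIRS <= n_pairs <= MAX_PAIRS:
        raise ValueError(f"n_pairs must be between {MIN_PAIRS} and {MAX_PAIRS}")

    # Store pairs as frozensets so that (A, B) and (B, A) are the same pair.
    positives = {frozenset(pair) for pair in positive_pairs}
    pool = sorted(set().union(*positives))

    # Make sure enough candidate pairs exist before sampling.
    total = len(pool) * (len(pool) - 1) // 2
    available = total - sum(1 for pair in positives if len(pair) == 2)
    if n_pairs > available:
        raise ValueError("not enough non-interacting candidates in the pool")

    rng = random.Random(seed)
    negatives = set()
    while len(negatives) < n_pairs:
        candidate = frozenset(rng.sample(pool, 2))  # two distinct proteins
        if candidate not in positives:
            negatives.add(candidate)
    return sorted(tuple(sorted(pair)) for pair in negatives)
```

The other three methods would add a further filter (GO distance, localization, or Pfam domains) on top of the same "not in the positive set" condition.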
A : Selecting features that will appear in the final dataset
  • Interaction type: Interaction type of the given pair which was defined in the positive parameter selection
  • Detection method: Detection method for defining PPI pairs which was defined in the positive parameter selection
  • AA sequence: Amino acid sequence retrieved from UniProt
  • Species: Species of the proteins in the pair
  • Locality: Subcellular localization defined in UniProt
  • GO annotation: Gene Ontology annotation
  • PSSM: Position-Specific Scoring Matrix (PSSM) produced by PSI-BLAST against a non-redundant database using default values, except e-value = 0.001 and iterations = 3
    ADDING this feature WILL INCREASE the PROCESSING TIME because extra processing steps are required to create compressed PSSM files.
B : Generates a data matrix by converting sequences into real values based on AAindex scales. The text box supports auto-completion: it lists all AAindices that match the typed keywords (i.e. at least 4 letters; copy & paste may NOT work properly) or an AAindex ID. In this example, "alpha-helix indices" is typed, and all matching AAindices in our database are listed. For more about AAindex, please refer to "citation".
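To illustrate what converting a sequence into real values means, the sketch below maps each residue to its value in one AAindex scale. The scale shown is the Kyte-Doolittle hydropathy index (AAindex ID KYTJ820101), chosen only as an example. The function name `encode_sequence` and the handling of non-standard residues (mapped to 0.0) are assumptions, not a description of how KUPS does it.

```python
# Kyte-Doolittle hydropathy index (AAindex ID KYTJ820101), used as an example scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode_sequence(sequence, scale, missing=0.0):
    """Return one real value per residue, looked up in the given scale.

    Assumption: residues absent from the scale (e.g. X) get `missing`.
    """
    return [scale.get(residue, missing) for residue in sequence.upper()]


if __name__ == "__main__":
    # One row of the data matrix for a short, made-up peptide.
    print(encode_sequence("MKVLA", KYTE_DOOLITTLE))
```

Selecting several AAindices would produce one such row per index for each protein.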

KUPS AAindex offers advanced AAindex search
A : Shows the summary of parameter selections for the Positive Set.

B : Shows the summary of parameter selections for the Negative Set.

C : Shows the summary of the Features selected to appear in the final output file.

D : Clicking the 'Submit' button confirms the queries and submits them to KUPS, and the page is redirected to the 'Results' tab. An image of the download page is shown below.
Before you reach the final download page, a processing status box is shown, and the page is then automatically redirected to the download page. Large datasets may take some time, but you do not need to wait for the download page: you can bookmark the page by clicking the link in the red circle, or copy the URL shown at the bottom of the box to visit later.

Positive Set

FILE NAME : positive_interactions.txt

A : UniProt ID of the first protein in the PPI pair (P1)
B : UniProt ID of the second protein in the PPI pair (P2)
C : Data source of the PPI
D : UniProt taxonomic identifier of P1
E : UniProt taxonomic identifier of P2

Negative Set

FILE NAME : negative_interactions.txt

A : UniProt ID of the first protein in the non-interacting pair (P1)
B : UniProt ID of the second protein in the non-interacting pair (P2)
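The two pair files can be combined into one labeled dataset, as sketched below. The column delimiter is not stated on this page, so the sketch assumes tab-separated columns with no header row. The function name `read_pairs` and the sample rows are made up for illustration.

```python
import csv
import io


def read_pairs(handle, label):
    """Yield (P1, P2, label) for each row of a KUPS pair file.

    Assumption: columns are tab-separated; only the first two
    columns (the UniProt IDs) are used, so the same reader works
    for the positive and the negative file.
    """
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or incomplete lines
        yield row[0].strip(), row[1].strip(), label


if __name__ == "__main__":
    # Made-up sample rows; real files would be opened with open(...).
    positive = io.StringIO("P11111\tP22222\tSOURCE\t9606\t9606\n")
    negative = io.StringIO("P33333\tP44444\n")
    dataset = list(read_pairs(positive, 1)) + list(read_pairs(negative, 0))
    for p1, p2, label in dataset:
        print(p1, p2, label)
```

With real downloads, the `io.StringIO` objects would be replaced by `open("positive_interactions.txt", newline="")` and `open("negative_interactions.txt", newline="")`.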

Annotations

FILE NAME : annotations.txt

ID : UniProt ID
SQ : Amino acid sequence
SP : UniProt taxonomic identifier of the species
GO : Gene Ontology annotation (2nd column = GO term, 3rd column = evidence code). Evidence codes:
  • EXP: Inferred from Experiment
  • IDA: Inferred from Direct Assay
  • IPI: Inferred from Physical Interaction
  • IMP: Inferred from Mutant Phenotype
  • IGI: Inferred from Genetic Interaction
  • IEP: Inferred from Expression Pattern
  • ISS: Inferred from Sequence or Structural Similarity
  • ISO: Inferred from Sequence Orthology
  • ISA: Inferred from Sequence Alignment
  • ISM: Inferred from Sequence Model
  • IGC: Inferred from Genomic Context
  • RCA: Inferred from Reviewed Computational Analysis
  • TAS: Traceable Author Statement
  • NAS: Non-traceable Author Statement
  • IC: Inferred by Curator
  • ND: No biological Data available
  • IEA: Inferred from Electronic Annotation
  • NR: Not Recorded
AA : Amino acid index: the given amino acid sequence is converted into the AAindex scale (2nd column = AAindex ID, remaining columns = converted values)
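The tagged layout above can be read into one record per protein, as in the sketch below. Several points are assumptions rather than documented facts: that the first column of each line is the tag (ID, SQ, SP, GO, AA), that columns are tab-separated, and that an ID line starts a new record. The function name `parse_annotations` is hypothetical.

```python
def parse_annotations(lines):
    """Group tagged lines of annotations.txt into one dict per protein.

    Assumptions: tab-separated columns, tag in the first column,
    and each record begins with an ID line.
    """
    records = []
    current = None
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        tag, values = fields[0].strip(), fields[1:]
        if tag == "ID" and values:
            current = {"id": values[0], "sequence": None,
                       "species": None, "go": [], "aaindex": {}}
            records.append(current)
        elif current is None or not values:
            continue  # skip blank lines and lines before the first ID
        elif tag == "SQ":
            current["sequence"] = values[0]
        elif tag == "SP":
            current["species"] = values[0]
        elif tag == "GO":
            # 2nd column = GO term, 3rd column = evidence code
            evidence = values[1] if len(values) > 1 else None
            current["go"].append((values[0], evidence))
        elif tag == "AA":
            # 2nd column = AAindex ID, remaining columns = converted values
            current["aaindex"][values[0]] = [float(v) for v in values[1:]]
    return records
```

A file would be parsed with `with open("annotations.txt") as handle: records = parse_annotations(handle)`.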




Contents

Description
  • The test set is a collection of datasets that have been published in journals.
  • Some of the data may be updated at the source without notice, and we may not update our copies promptly, so please refer to the original source for the latest datasets.
A : Clicking this takes you to "Predicting Protein Function", shown in image "A" below.

B : Clicking this takes you to "Predicting Protein Interface Residues", shown in image "B" below.




Contents

Description
  • Benchmark provides competitions and comparisons among participants.
  • We will generate gold-standard protein-protein interaction datasets with various features, so participants can freely use these datasets to evaluate and compare their own algorithms.
A : Clicking this shows the definitions of the evaluation methods, which appear below.

B : Clicking this changes the symbol (+) to (-) and shows the confusion matrix corresponding to the selected classifier.

Definition
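The exact evaluation methods used in the Benchmark are not listed in this text. As a sketch, assuming the standard measures derived from a binary confusion matrix (true/false positives and negatives), they could be computed as follows. The function name `classification_metrics` and the choice of measures are assumptions.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Compute common measures from the four confusion-matrix counts."""

    def ratio(numerator, denominator):
        # Return 0.0 instead of failing when a denominator is zero.
        return numerator / denominator if denominator else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)          # also called sensitivity
    specificity = ratio(tn, tn + fp)
    accuracy = ratio(tp + tn, tp + fp + tn + fn)
    f1 = ratio(2 * precision * recall, precision + recall)
    mcc = ratio(tp * tn - fp * fn,
                math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Made-up counts for a classifier evaluated on 100 protein pairs.
    for name, value in classification_metrics(40, 10, 45, 5).items():
        print(f"{name}: {value:.3f}")
```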



