Protein-Protein Interaction Data Set Builder

This service was created by Dr. Xue-wen Chen's lab and is hosted on KU's ITTC web servers. No registration or email is required to use this service. Results are available via URL for up to 48 hours after generation. The options are arranged in a series of panels; each panel refines the final data set. Additional information is provided on each page.


  1. Select filters for the positive set (known interactions) and a set size.
  2. Select a method for selecting the negative set and a negative set size.
  3. Select the features to include.
  4. Specify output format.
  5. Confirm and download.

Sample Input (help):

  • Example 1: 500 random positive and negative interactions with sequences.
  • Example 2: 1000 human protein interactions and 800 negative interactions with PSSM and GO annotation.
  • Example 3: 600 Antibody-Antigen interactions and negative selection by Resnik GOA similarity, with species, locality and PSSM.
  • Example 4: 5000 random positive and negative interactions with several features.

Species (e.g. 'Homo sapiens', help):

Interaction Types (e.g. 'Direct Interaction', help):

Detection Methods (e.g. 'Use All', help):

These settings affect the generation of the negative interaction training set.

Negative Set Parameters [minimum=100, maximum=75,000] (help):

Use only proteins from interactions in the positive set.

Negative interaction selection strategy (e.g. 'Uniform random pairs', help):

All methods prevent selection of pairs from known interactions.

Protein pairs are selected at random with equal probability.
Protein pairs are selected to minimize functional similarity using Gene Ontology annotations.
Protein pairs are selected to avoid proteins with the same locality using Gene Ontology annotations.
Only protein pairs with Pfam pairs listed in Negatome's combined non-interacting protein domain dataset are used. Note that for this strategy the 'restrict' option is enforced for only one protein of each pair instead of both.
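
The uniform random strategy, combined with the rule that known interactions are never selected, can be sketched as below. This is a minimal illustration and not the service's actual implementation: the function name `sample_negative_pairs`, its parameters, and the treatment of pairs as unordered are all assumptions made for the example.

```python
import random

def sample_negative_pairs(proteins, known_interactions, n, seed=None):
    """Draw n distinct unordered protein pairs uniformly at random,
    rejecting any pair that appears in known_interactions.
    (Hypothetical helper, for illustration only.)"""
    rng = random.Random(seed)
    pool = sorted(set(proteins))
    pool_set = set(pool)
    known = {frozenset(pair) for pair in known_interactions}

    # Count how many candidate pairs remain once known interactions
    # inside the pool are excluded (self-pairs are never candidates).
    total_pairs = len(pool) * (len(pool) - 1) // 2
    known_in_pool = sum(1 for k in known if len(k) == 2 and k <= pool_set)
    if n > total_pairs - known_in_pool:
        raise ValueError("not enough candidate pairs for the requested size")

    negatives = set()
    while len(negatives) < n:
        pair = frozenset(rng.sample(pool, 2))  # two distinct proteins
        if pair not in known:
            negatives.add(pair)
    return sorted(tuple(sorted(p)) for p in negatives)
```

Passing only the proteins that occur in the positive set as `proteins` would roughly correspond to the "Use only proteins from interactions in the positive set" option, under the same assumptions.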

Include the following protein features (help):

Selecting PSSM can significantly increase the processing time because of the additional file compression steps.

Include sequence annotations using the following AAIndices (help):

Enter a keyword or AAindex name and select an AAindex from the list.
For more details on each AAindex, KUPS AAindex offers an advanced AAindex search.

These settings allow specific formatting of the generated files. By default, a general format including all information is used. Custom formats can be specified by providing templates below.

Use custom formatting

Positive Interaction File Format (e.g. 'p1=%p1 p2="%p2"' produces 'p1=Q8WZ42 p2="Q8FX16"'):

%p1 : Uniprot ID 1 (e.g. "Q8WZ42")
%p2 : Uniprot ID 2 (e.g. "Q8FX16")
%it : Interaction Type (e.g. "Physical Association")
%detect : Detection method (e.g. "Y2H")
%db : Database of origin (e.g. "IntAct")
%id : ID in database (e.g. "EBI-375746")
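
The placeholder substitution described above can be illustrated with a short sketch. It is only an approximation of how such a template might be expanded, not the service's own code; the function `expand_template` and the `record` dictionary are hypothetical, and the field values are the examples given in the list above.

```python
import re

# Placeholder names listed above for the positive interaction file.
POSITIVE_FIELDS = ("p1", "p2", "it", "detect", "db", "id")

def expand_template(template, record, fields=POSITIVE_FIELDS):
    """Replace each %name placeholder with the matching value in record.
    (Hypothetical helper, for illustration only.)"""
    # Longer names are tried first so that a short name can never
    # match the beginning of a longer placeholder.
    names = sorted(fields, key=len, reverse=True)
    pattern = re.compile("%(" + "|".join(map(re.escape, names)) + ")")
    return pattern.sub(lambda m: str(record[m.group(1)]), template)

record = {
    "p1": "Q8WZ42",
    "p2": "Q8FX16",
    "it": "Physical Association",
    "detect": "Y2H",
    "db": "IntAct",
    "id": "EBI-375746",
}
line = expand_template('p1=%p1 p2="%p2"', record)
print(line)  # p1=Q8WZ42 p2="Q8FX16"
```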

Negative Interaction File Format:

%p1 : Uniprot ID 1 (e.g. "Q8WZ42")
%p2 : Uniprot ID 2 (e.g. "Q8FX16")
%sim : Calculated similarity (If using 'Functionally dissimilar pairs')
%loc1 : Locality for p1 (If using 'Spatially separate pairs')
%loc2 : Locality for p2
%pfam1 : Pfams for p1 (If using Non-interacting domains)
%pfam2 : Pfams for p2

Annotation File Formatting:

%pid : Uniprot ID (e.g. "Q8WZ42")
%name : Protein full name (e.g. "Titin")
%taxid : NCBI taxonomic ID (e.g. "9606")
%species : Species full name (e.g. "Homo sapiens")
%goterms : Tab delimited GO annotations
%seq : Amino acid sequence
%aaindexa : AAIndex annotated sequences
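
Because %goterms expands to a tab-delimited list, a custom annotation template is easier to read back if its fields are separated by a character other than a tab. The sketch below assumes a hypothetical template `%pid|%taxid|%goterms`; the parser name, the choice of `|` as separator, and the GO IDs in the sample line are illustrative assumptions, not output from the service.

```python
def parse_annotation_line(line, sep="|"):
    """Parse one line written with the assumed template '%pid|%taxid|%goterms'.
    (Hypothetical helper, for illustration only.)"""
    # Split into exactly three fields; the GO field keeps its internal tabs.
    pid, taxid, goterms = line.rstrip("\n").split(sep, 2)
    go_list = [term for term in goterms.split("\t") if term]
    return {"pid": pid, "taxid": int(taxid), "goterms": go_list}

# Illustrative line only; the GO IDs are placeholders for the example.
sample = "Q8WZ42|9606|GO:0005524\tGO:0004674\n"
parsed = parse_annotation_line(sample)
print(parsed["pid"], parsed["taxid"], parsed["goterms"])
```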