Predicted scaled solubility:




The scaled solubility value (QuerySol) is the predicted solubility. The population average for the experimental dataset (PopAvrSol) is 0.45. Any protein with a scaled solubility value greater than 0.45 is therefore predicted to be more soluble than the average soluble E. coli protein in the experimental solubility dataset (Niwa et al. 2009), and any protein with a lower value is predicted to be less soluble.
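
The comparison described above can be sketched in a few lines. This is only an illustration: the function name compare_to_average and the returned labels are hypothetical, not part of Protein-Sol. The 0.45 threshold is the PopAvrSol value quoted in the text.

```python
# PopAvrSol: population average of the experimental dataset, as quoted in the text.
POP_AVR_SOL = 0.45


def compare_to_average(query_sol, pop_avr_sol=POP_AVR_SOL):
    """Label a scaled solubility value relative to the population average.

    Hypothetical helper for illustration; it does not predict anything itself,
    it only interprets a QuerySol value that has already been calculated.
    """
    if query_sol > pop_avr_sol:
        return "more soluble than average"
    if query_sol < pop_avr_sol:
        return "less soluble than average"
    return "equal to average"


if __name__ == "__main__":
    for value in (0.30, 0.45, 0.72):
        print(value, compare_to_average(value))
```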

The Protein-Sol sequence algorithm calculates 35 sequence features. These include the composition of the 20 standard amino acids and the sequence length (len), as well as the following features, which are calculated over a sliding window of 21 amino acids.

There are 7 amino acid composite scores:
KmR = K - R, DmE = D - E, KpR = K + R, DpE = D + E, PmN = K + R - D - E, PpN = K + R + D + E, aro = F + W + Y
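
As a rough sketch of how these seven composite scores could be computed, the example below takes a stretch of sequence, works out the fractional composition of each residue, and combines the fractions as listed above. Two things here are assumptions rather than details from this page: that each letter in the formulas stands for the fractional composition of that residue, and how the per-window values are subsequently combined (that step is not shown). The function names composite_scores and sliding_windows are hypothetical.

```python
from collections import Counter


def composite_scores(segment):
    """Return the seven composite scores for one stretch of sequence.

    Assumption: each letter in the formulas is the fraction of that residue
    in the segment (count divided by segment length).
    """
    if not segment:
        raise ValueError("segment must not be empty")
    counts = Counter(segment.upper())
    n = len(segment)
    K, R, D, E = (counts[aa] / n for aa in "KRDE")
    F, W, Y = (counts[aa] / n for aa in "FWY")
    return {
        "KmR": K - R,
        "DmE": D - E,
        "KpR": K + R,
        "DpE": D + E,
        "PmN": K + R - D - E,
        "PpN": K + R + D + E,
        "aro": F + W + Y,
    }


def sliding_windows(sequence, size=21):
    """Yield every window of `size` residues; yields nothing if the sequence is shorter."""
    for start in range(len(sequence) - size + 1):
        yield sequence[start:start + size]


if __name__ == "__main__":
    example = "MKKRDEFWYAAGSLLKKRDEFWYAAGSLL"  # made-up sequence for illustration
    for window in sliding_windows(example):
        print(window, composite_scores(window))
```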

We then calculate a further 7 sequence features:
fld = folding propensity (Uversky et al. 2000), dis = disorder propensity (Linding et al. 2003), bet = beta strand propensity (Costantini et al. 2006), mem = Kyte-Doolittle hydropathy (Kyte and Doolittle 1982), pI = isoelectric point, ent = sequence entropy, abs = absolute charge at pH 7.
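
Two of these features are simple enough to illustrate directly. The sketch below computes a Shannon entropy over residue frequencies and a window-averaged Kyte-Doolittle hydropathy. It is not the Protein-Sol implementation: using Shannon entropy in bits as the definition of ent, and a plain mean per 21-residue window for mem, are both assumptions, and the function names are hypothetical. The hydropathy values are the published Kyte-Doolittle scale.

```python
import math
from collections import Counter

# Kyte-Doolittle hydropathy scale (Kyte and Doolittle 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def shannon_entropy(sequence):
    """Shannon entropy (bits) of the residue frequencies in a sequence.

    Assumption: this is one common definition of sequence entropy; the exact
    definition used for `ent` is not given in this text.
    """
    if not sequence:
        raise ValueError("sequence must not be empty")
    n = len(sequence)
    counts = Counter(sequence.upper())
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())


def window_hydropathy(sequence, size=21):
    """Mean Kyte-Doolittle hydropathy of each `size`-residue window.

    Raises KeyError for residues outside the 20 standard amino acids.
    """
    values = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    return [
        sum(values[start:start + size]) / size
        for start in range(len(values) - size + 1)
    ]


if __name__ == "__main__":
    example = "MKKRDEFWYAAGSLLKKRDEFWYAAGSLL"  # made-up sequence for illustration
    print(shannon_entropy(example))
    print(window_hydropathy(example))
```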

Further information is available in the paper.


Hebditch M, Carballo-Amador MA, Charonis S, Curtis R, Warwicker J
Protein-Sol: a web tool for predicting protein solubility from sequence.
Bioinformatics (2017)