We provide machine learning predictions for the 12 biophysical platforms assessed in the Jain 2018 dataset. Each experimental platform is ranked from 1 to 100, where 1 is the best score and 100 the worst for that platform. We also provide a meta score that combines and averages multiple biophysical platforms. To calculate the meta score, the original Jain dataset is ordered from best to worst result for each biophysical platform, and the position of the candidate sequence within that ordering gives its rank. The ranks are then averaged within two groups of biophysical platforms, as follows:
Group X: ELISA, BVP ELISA, PSR, CSI, ACC STAB and CIC
Group Y: SMAC and HIC
For each group, a lower rank is better, so the closer a candidate lies to the origin (0,0), the better we predict it will behave on the platforms.
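The ranking and averaging described above can be sketched roughly as follows. This is a minimal illustration, not the server's actual implementation: the function names, the `lower_is_better` flag, the rounding, and the clamping to the 1-100 range are all assumptions made for the example.

```python
from bisect import bisect_left

# Platform groups as listed above.
GROUP_X = ("ELISA", "BVP ELISA", "PSR", "CSI", "ACC STAB", "CIC")
GROUP_Y = ("SMAC", "HIC")


def percentile_rank(candidate, reference, lower_is_better=True):
    """Place a candidate value in the best-to-worst ordering of the
    reference values and scale the position to 1-100 (1 = best).

    Assumption: whether low or high values are "best" differs by
    platform, so the caller states the direction explicitly.
    """
    if not lower_is_better:
        # Flip signs so that "smaller is better" holds in both cases.
        candidate = -candidate
        reference = [-v for v in reference]
    ordered = sorted(reference)  # best result first
    # Number of reference values strictly better than the candidate.
    better = bisect_left(ordered, candidate)
    rank = round(100 * better / len(ordered))
    return min(100, max(1, rank))  # clamp to the 1-100 scale


def meta_score(ranks):
    """Average per-platform ranks within each group.

    `ranks` maps platform name -> rank (1-100). Returns the (x, y)
    coordinates; closer to (0, 0) is better.
    """
    x = sum(ranks[p] for p in GROUP_X) / len(GROUP_X)
    y = sum(ranks[p] for p in GROUP_Y) / len(GROUP_Y)
    return x, y
```

A candidate that ranks 10 on every Group X platform and 20 and 40 on SMAC and HIC would be placed at (10.0, 30.0) under this sketch.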
The scatter plot above shows the experimental results for the Jain clinical-stage therapeutics on the X axis, and our prediction for the same protein, made from the available scFv sequence, on the Y axis. Hovering over a data point from the Jain dataset displays the real experimental value and the value predicted by the algorithm.
The Jain antibodies are color coded in three shades of green according to their current progress through the FDA approval pipeline. The candidate sequence provided by the user is shown in orange. For the candidate protein, only the predicted value is available.
The heatmap is color coded for each scFv according to a threshold value, calculated as the worst 10% cutoff of the predicted values for the Jain dataset. If the predicted value is above the threshold for an experiment, the corresponding square is colored red; otherwise it is colored green. Hovering over the heatmap switches the scatter plot to show the predictions for that category. We also provide a ranking of the candidate antibody in comparison to the Jain dataset.
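The threshold rule can be illustrated with a short sketch. It assumes that higher predicted values are worse (as implied by "above the threshold" being red) and uses a nearest-rank 90th percentile as the cutoff; the exact percentile method used by the server is not stated, and the function names here are hypothetical.

```python
def worst_decile_threshold(predicted_values):
    """Return the value marking the worst 10% of the predicted Jain values.

    Assumption: higher values are worse, and the cutoff is taken as the
    nearest-rank 90th percentile of the predictions.
    """
    ordered = sorted(predicted_values)
    n = len(ordered)
    # ceil(0.9 * n) computed with integers to avoid float rounding issues.
    idx = -(-9 * n // 10) - 1
    return ordered[idx]


def cell_color(predicted_value, threshold):
    """Red if the prediction is above the threshold, green otherwise."""
    return "red" if predicted_value > threshold else "green"
```

For example, with predicted values 1 through 10 the threshold under this sketch is 9, so a candidate prediction of 10 would be colored red and a prediction of 9 green.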
Hebditch M and Warwicker J.
Charge and hydrophobicity are key features in sequence-trained machine learning models for predicting the biophysical properties of clinical-stage antibodies