The following node is available in the Open Source KNIME predictive analytics and data mining platform version 2.7.1. Discover over 1000 other nodes, as well as enterprise functionality at http://knime.com.
This node draws ROC curves for two-class classification problems. The input table
must contain a column with the actual class values (including all class values as possible values)
and a second column with the probability that an item (i.e. a row) is classified as belonging
to the selected class. Consequently, only learners/predictors that output class probabilities can
be used.
To create a ROC curve for a model, the input table is first sorted by the class probabilities
for the positive class, i.e. rows that the model is most certain belong to the positive class
are sorted to the front. Each sorted row is then checked to see whether its actual class value is the positive
class. If so, the ROC curve goes up one step; if not, it goes one step to the right. Ideally, all positive
rows are sorted to the front, so the curve first rises to 100% and then runs straight to the right. As a
rule of thumb, the greater the area under the curve, the better the model.
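The construction described above can be sketched in a few lines of Python. This is only an illustration of the sort-and-step idea, not the node's actual implementation: the function names `roc_points` and `area_under_curve` and the sample data are made up for this example, step sizes are normalized so that both axes run from 0 to 1, and rows with tied probabilities are simply processed in sort order.

```python
def roc_points(actual, prob_positive, positive_class):
    """Return ROC curve points as (false positive rate, true positive rate)."""
    # Sort rows so that the highest positive-class probabilities come first.
    rows = sorted(zip(prob_positive, actual), key=lambda r: r[0], reverse=True)
    n_pos = sum(1 for _, value in rows if value == positive_class)
    n_neg = len(rows) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative row")

    points = [(0.0, 0.0)]
    tp = fp = 0
    for _, value in rows:
        if value == positive_class:
            tp += 1  # actual positive: the curve goes one step up
        else:
            fp += 1  # actual negative: the curve goes one step to the right
        points.append((fp / n_neg, tp / n_pos))
    return points


def area_under_curve(points):
    """Area under a piecewise-linear curve, using the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical sample data: actual classes and P(class = "yes") per row.
actual = ["yes", "yes", "no", "yes", "no", "no"]
prob_yes = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]

curve = roc_points(actual, prob_yes, positive_class="yes")
auc = area_under_curve(curve)
print(curve)
print(auc)
```

If every positive row is ranked ahead of every negative row, the curve climbs to 1.0 before moving right and the area is exactly 1.0.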
You may compare the ROC curves of several trained models by first joining the class probability columns
from the different predictors into one table and then selecting several columns in the column filter
panel.
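As a rough illustration of such a comparison, the sketch below computes one area value per probability column of an already joined table. The column names (`class`, `P_model_A`, `P_model_B`), the data, and the helper `auc` are hypothetical and chosen only for this example; inside KNIME the joining and the column selection are done with nodes and the dialog, not with code.

```python
def auc(actual, prob_positive, positive_class):
    """Area under the step-wise ROC curve (ties are processed in sort order)."""
    order = sorted(range(len(actual)), key=lambda i: prob_positive[i], reverse=True)
    tp = 0
    columns = 0  # sum of the curve heights at every step to the right
    for i in order:
        if actual[i] == positive_class:
            tp += 1        # step up
        else:
            columns += tp  # step right adds a column of the current height
    n_pos = tp
    n_neg = len(actual) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative row")
    return columns / (n_pos * n_neg)


# Hypothetical joined table: one actual class column plus one
# positive-class probability column per model.
table = [
    {"class": "yes", "P_model_A": 0.9, "P_model_B": 0.6},
    {"class": "yes", "P_model_A": 0.8, "P_model_B": 0.25},
    {"class": "no",  "P_model_A": 0.7, "P_model_B": 0.5},
    {"class": "yes", "P_model_A": 0.6, "P_model_B": 0.9},
    {"class": "no",  "P_model_A": 0.3, "P_model_B": 0.2},
    {"class": "no",  "P_model_A": 0.1, "P_model_B": 0.3},
]

actual = [row["class"] for row in table]
areas = {
    column: auc(actual, [row[column] for row in table], positive_class="yes")
    for column in ("P_model_A", "P_model_B")
}
for column, area in areas.items():
    print(column, area)
```

The model with the larger area ranks positive rows ahead of negative rows more consistently on this table.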
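The light gray diagonal line in the diagram is the random line. It corresponds to random guessing, which is
effectively the worst performance a model can achieve, since a model whose curve lies below this line could be
improved simply by inverting its predictions.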
Input ports
0 | Input data with actual values and class probabilities |
Output ports
0 | A one-column table with the area(s) under the ROC curve(s) |