The following node is available in the Open Source KNIME predictive analytics and data mining platform version 2.7.1. Discover over 1000 other nodes, as well as enterprise functionality at http://knime.com.

Entropy Scorer

Scores a clustering result against a reference clustering. Connect the table containing the reference clustering to the first input port and the table containing the clustering results to the second input port; each table must contain a column with cluster IDs. Select the respective columns of both tables in the dialog. After successful execution, the view shows entropy values (the smaller the better) and a quality value in [0, 1], with 1 being the best possible value, as used in Fuzzy Clustering in Parallel Universes, section 6: "Experimental results".

Dialog Options

Reference column
Column containing the reference clustering. This column is provided by the first input table.
Clustering column
Column containing the cluster IDs to evaluate. This column is provided by the second input table.

Ports

Input Ports
0 Table containing reference clustering.
1 Table containing clustering (to score).
Output Ports
0 Table containing entropy values for each cluster. The last row contains statistics on the entire clustering. It corresponds to the table shown in the Statistics View.

Views

Statistics View
Simple statistics on the clustering, such as the number of clusters found, the number of objects in clusters, the number of reference clusters, and the total number of objects. The table at the bottom of the view provides further statistics on cluster size, cluster entropy, normalized cluster entropy, and quality. The entropy of a cluster is based on the reference clustering (provided at the first input port), and the normalized entropy is this value scaled to the interval [0, 1]. More precisely, it is the entropy divided by log2(number of different clusters in the reference set). The quality value is only available in the last row (showing the overall statistics).
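The per-cluster statistics described above can be illustrated with a short Python sketch. It is not the node's actual code: the function name entropy_scores, the dictionary-based input, and the matching of objects by a shared ID are assumptions made for illustration. The overall quality is computed here as the size-weighted average of (1 - normalized entropy), which is one plausible reading of the cited paper and is likewise an assumption.

```python
from collections import Counter, defaultdict
from math import log2


def entropy_scores(reference, clustering):
    """Sketch of per-cluster entropy statistics.

    reference and clustering each map an object ID to a cluster ID
    (hypothetical input format, not the node's API).
    """
    # Assumption: only objects present in both tables are scored.
    common = [obj for obj in clustering if obj in reference]

    # Collect the reference labels of the objects in each found cluster.
    members = defaultdict(list)
    for obj in common:
        members[clustering[obj]].append(reference[obj])

    # Normalization constant: log2(number of different reference clusters).
    n_ref = len(set(reference.values()))
    norm = log2(n_ref) if n_ref > 1 else 1.0  # avoid dividing by log2(1) = 0

    rows = {}
    for cluster_id, ref_labels in members.items():
        size = len(ref_labels)
        counts = Counter(ref_labels)
        # Shannon entropy of the reference labels inside this cluster.
        entropy = sum((c / size) * log2(size / c) for c in counts.values())
        rows[cluster_id] = {
            "size": size,
            "entropy": entropy,
            "normalized_entropy": entropy / norm,
        }

    # Assumed quality definition: size-weighted mean of (1 - normalized entropy).
    total = len(common)
    quality = (
        sum(r["size"] / total * (1.0 - r["normalized_entropy"]) for r in rows.values())
        if total
        else 0.0
    )
    return rows, quality


if __name__ == "__main__":
    reference = {"a": "x", "b": "x", "c": "y", "d": "y"}
    clustering = {"a": 1, "b": 1, "c": 1, "d": 2}
    rows, quality = entropy_scores(reference, clustering)
    for cluster_id, stats in rows.items():
        print(cluster_id, stats)
    print("quality:", quality)
```

A cluster whose objects all belong to one reference cluster gets entropy 0, and a cluster that mixes all reference clusters evenly gets normalized entropy 1, which matches the "smaller is better" reading of the entropy columns.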
This node is contained in KNIME Base Nodes provided by KNIME GmbH, Konstanz, Germany.