The following node is available in the Open Source KNIME predictive analytics and data mining platform version 2.7.1. Discover over 1000 other nodes, as well as enterprise functionality at http://knime.com.
Calculates a correlation coefficient for each pair of selected columns, i.e. a measure of how strongly the two variables are correlated.
Which correlation measure is applied depends on the types of the
underlying variables:
numeric <-> numeric:
Pearson's product-moment coefficient.
Missing values are handled pairwise: the correlation between two
columns is computed only from the rows that have a value in both of
them. For instance, if there are three columns A, B and C and a row
contains a missing value in column A but not in B and C, then the
row is ignored when computing the correlation of (A, B) and (A, C),
but not when computing the correlation of (B, C). This corresponds
to the function
cor(<data.frame>, use="pairwise.complete.obs")
in the R statistics package.
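A minimal Python sketch of Pearson's correlation with pairwise-complete rows, assuming `None` marks a missing value. The function name `pearson_pairwise` is hypothetical and not part of KNIME. Returning `None` when fewer than two complete rows remain, or when a column is constant, is an assumption of this sketch, not documented node behavior.

```python
import math
from typing import Optional, Sequence


def pearson_pairwise(x: Sequence[Optional[float]],
                     y: Sequence[Optional[float]]) -> Optional[float]:
    """Pearson's r over the rows where both x and y are present (None = missing)."""
    # Keep only the rows that are complete for this particular pair of columns.
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return None  # assumption: too few complete rows -> no result
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return None  # assumption: r is undefined for a constant column
    return sxy / math.sqrt(sxx * syy)
```

With `A = [1.0, None, 3.0, 4.0]` and `B = [2.0, 4.0, 6.0, 8.0]`, only rows 0, 2 and 3 enter `pearson_pairwise(A, B)`, while a pair of columns without missing values uses every row, mirroring the (A, B) versus (B, C) case above.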
The value of this measure ranges from -1 (perfect negative linear
correlation) to 1 (perfect positive linear correlation). A value of
0 indicates no linear correlation; the columns might still be highly
dependent on each other, though, for example through a non-linear
relationship.
nominal <-> nominal:
Pearson's chi-square test on the contingency table.
The test statistic is then normalized to the range [0,1] using
Cramér's V, where 0 represents no correlation and 1 a perfect
correlation. Missing values in nominal columns are treated as if
they were a separate possible value.
If either of the two columns contains more distinct values than
specified in the dialog (default 50), the correlation is not
computed.
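A minimal Python sketch of the nominal case, assuming the standard, uncorrected formula V = sqrt(χ² / (n · (k - 1))), where n is the number of rows and k is the smaller of the two numbers of distinct values. Whether the node applies a bias correction is not stated here, so treat this as an approximation. The name `cramers_v`, the `max_values` parameter standing in for the dialog limit, and returning `None` for skipped or undefined pairs are assumptions of this sketch.

```python
import math
from collections import Counter
from typing import Hashable, Optional, Sequence

# Sentinel that represents a missing value as a category of its own.
_MISSING = object()


def cramers_v(x: Sequence[Optional[Hashable]], y: Sequence[Optional[Hashable]],
              max_values: int = 50) -> Optional[float]:
    """Cramér's V from Pearson's chi-square statistic on the contingency table."""
    # Missing values (None) become an ordinary category, as described above.
    xs = [_MISSING if v is None else v for v in x]
    ys = [_MISSING if v is None else v for v in y]
    row_totals = Counter(xs)  # marginal counts of the categories of x
    col_totals = Counter(ys)  # marginal counts of the categories of y
    # Skip the pair if either column has more distinct values than allowed.
    if len(row_totals) > max_values or len(col_totals) > max_values:
        return None
    k = min(len(row_totals), len(col_totals))
    if k < 2:
        return None  # assumption: V is undefined if a column has a single value
    n = len(xs)
    observed = Counter(zip(xs, ys))  # contingency table cell counts
    chi2 = 0.0
    for a, row_total in row_totals.items():
        for b, col_total in col_totals.items():
            expected = row_total * col_total / n  # count expected under independence
            chi2 += (observed[(a, b)] - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * (k - 1)))
```

For example, `cramers_v(["a", "a", "b", "b"], ["u", "u", "v", "v"])` describes a perfect association, whereas `["u", "v", "u", "v"]` as the second column gives an all-ones contingency table and therefore a chi-square statistic of zero.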
Correlation measures for other pairs of columns (for example, a
numeric and a nominal column) are not available; they are
represented by missing values in the output table and by crosses in
the accompanying view.
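How the results might be assembled into the square output matrix can be sketched as follows. This is a hypothetical Python outline, not KNIME code: the function `correlation_matrix`, the `"numeric"`/`"nominal"` type labels, and the pluggable `numeric` and `nominal` measure callables are assumptions, and `None` stands in for a missing value in the output table. Because Pearson's r and Cramér's V are both symmetric, each pair is computed once and mirrored.

```python
from typing import Callable, Dict, List, Optional, Sequence

# A measure takes two columns and returns a coefficient (or None if undefined).
Measure = Callable[[Sequence, Sequence], Optional[float]]


def correlation_matrix(columns: Dict[str, Sequence], kinds: Dict[str, str],
                       numeric: Measure, nominal: Measure) -> List[List[Optional[float]]]:
    """Square matrix of coefficients; None marks pairs with no available measure."""
    names = list(columns)
    size = len(names)
    matrix: List[List[Optional[float]]] = [[None] * size for _ in range(size)]
    for i in range(size):
        for j in range(i, size):
            ki, kj = kinds[names[i]], kinds[names[j]]
            if ki == kj == "numeric":
                value = numeric(columns[names[i]], columns[names[j]])
            elif ki == kj == "nominal":
                value = nominal(columns[names[i]], columns[names[j]])
            else:
                value = None  # mixed pair: no measure available -> missing value
            matrix[i][j] = matrix[j][i] = value  # both measures are symmetric
    return matrix
```

Passing functions such as a pairwise-complete Pearson implementation and a Cramér's V implementation as `numeric` and `nominal` keeps the type dispatch separate from the statistics themselves.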
Input ports:
0 | Numeric input data to evaluate |
Output ports:
0 | Correlation coefficients in a square matrix |
1 | A model containing the correlation measures. This model can be read by the Correlation Filter node. |