The following node is available in the Open Source KNIME predictive analytics and data mining platform version 2.7.1. Discover over 1000 other nodes, as well as enterprise functionality at http://knime.com.

Linear Correlation

Calculates for each pair of selected columns a correlation coefficient, i.e. a measure of the correlation of the two variables.

Which correlation measure is applied depends on the types of the underlying variables:
numeric <-> numeric: Pearson's product-moment coefficient. Missing values in a column are ignored in such a way that for the computation of the correlation between two columns only complete records are taken into account. For instance, if there are three columns A, B and C and a row contains a missing value in column A but not in B and C, then the row will be ignored for computing the correlation between (A, B) and (A, C). It will not be ignored for the correlation between (B, C). This corresponds to the function cor(<data.frame>, use="pairwise.complete.obs") in the R statistics package.
The value of this measure ranges from -1 (strong negative correlation) to 1 (strong positive correlation). A value of 0 represents no linear correlation (the columns might still be highly dependent on each other, though).
nominal <-> nominal: Pearson's chi square test on the contingency table. This value is then normalized to a range [0,1] using Cramer's V, whereby 0 represents no correlation and 1 a strong correlation. Missing values in nominal columns are treated such as they were a self-contained possible value. If one of the two columns contains more possible values than specified in the dialog (default 50), the correlation will not be computed.
Correlation measures for other pairs of columns are not available, they are represented by missing values in the output table and crosses in the accompanying view.

Dialog Options

Column Filter
Include
This list contains the column names for which correlation values shall be computed.
Enforce Inclusion
Select this option to enforce the current inclusion list to stay the same even if the input table specification changes. New columns will automatically be added to the exclusion list.
Select
Use these buttons to move columns between the Include and Exclude list.
Search
Use one of these fields to search either within the Include or Exclude list for certain column names or name substrings. Repeated clicking of the search button marks the next column that matches the search text. The check box 'Mark all search hits' causes all matching columns to be selected making them movable between the two lists.
Exclude
This list contains the column names of the input table that are left out of the calculation.
Enforce Exclusion
Select this option to enforce the current exclusion list to stay the same even if the input table specification changes. New columns will automatically be added to the inclusion list.
Possible Values Count
Select an upper bound for the number of possible values for each of the nominal columns. If more values are encountered in a nominal column, the column will be ignored (no correlation values will be computed).

Ports

Input Ports
0 Numeric input data to evaluate
Output Ports
0 Correlation variables in a square matrix
1 A model containing the correlation measures. This model is appropriate to be read by the Correlation Filter node.

Views

Correlation Matrix
Squared table view showing the pair-wise correlation values of all columns. The color range varies from dark red (strong negative correlation), over white (no correlation) to dark blue (strong positive correlation). If a correlation value for a pair of column is not available, the corresponding cell contains a missing value (shown as cross in the color view).
This node is contained in KNIME Base Nodes provided by KNIME GmbH, Konstanz, Germany.