The following node is available in the Open Source KNIME predictive analytics and data mining platform version 2.7.1. Discover over 1000 other nodes, as well as enterprise functionality at
http://knime.com.
Crosstab
Creates a cross table (also referred to as a
contingency table or cross tab). It can be used to analyze
the relationship between two columns with categorical data and
displays the frequency distribution of the categorical variables
in a table.
This node provides chi-square test statistics and, in case of a
cross tabulation of 2x2 dimension, Fisher's exact test. Both
statistics test the null hypothesis of no association between the
row variable and the column variable. The p-values are provided in
the view and in the second output port.
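As a rough illustration of the chi-square test described above, the following Python sketch computes the Pearson chi-square statistic, its degrees of freedom, and the p-value for a small table of counts. It is not KNIME's implementation. The function names `chi_square_statistic` and `chi_square_p_value` and the example counts are made up for this sketch, and the p-value uses the standard closed-form series for integer degrees of freedom. It assumes every row total and column total is greater than zero.

```python
from math import erfc, exp, pi, sqrt


def chi_square_statistic(table):
    """Return (statistic, degrees of freedom) for a table of counts.

    `table` is a list of rows; all row and column totals must be > 0.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under the null hypothesis of no association.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def chi_square_p_value(stat, df):
    """Upper-tail probability of the chi-square distribution (integer df >= 1)."""
    half = stat / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term, series = 1.0, 1.0
        for r in range(1, df // 2):
            term *= half / r
            series += term
        return exp(-half) * series
    # Odd df: erfc(sqrt(x/2)) plus a finite correction series.
    x = sqrt(stat)
    term, series = x, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        series += term
        term *= stat / (2 * r + 1)
    density = exp(-half) / sqrt(2 * pi)
    return erfc(x / sqrt(2)) + 2 * density * series


# Made-up 2x2 example: rows are the row variable, columns the column variable.
stat, df = chi_square_statistic([[10, 20], [30, 40]])
p_value = chi_square_p_value(stat, df)
print(f"chi-square={stat:.4f}, df={df}, p={p_value:.4f}")
```

A large p-value, as in this example, means the data give no reason to reject the null hypothesis of no association.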
Dialog Options
- Row variable
-
The input column used as the row variable in the
cross-tabulation.
- Column variable
-
The input column used as the column variable in
the cross-tabulation.
- Weight column
-
Applies a numeric weight to each record in the
input, causing the Crosstab node to treat each record as if it were
repeated as many times as its weight.
- Enable hiliting
-
If enabled, hiliting a cell in the crosstab will hilite all
cells with the same categories in attached views. Depending on the number
of rows, enabling this feature might consume a lot of memory.
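The effect of the Weight column option can be sketched in a few lines of Python. This is only an illustration of the idea that each record counts as many times as its weight, not the node's actual code. The function name `weighted_crosstab` and the sample records are invented for the example.

```python
from collections import defaultdict


def weighted_crosstab(records):
    """Sum weights per (row category, column category) pair.

    `records` is an iterable of (row_value, column_value, weight) tuples.
    A weight of 3 has the same effect as listing the record three times.
    """
    counts = defaultdict(float)
    for row_value, column_value, weight in records:
        counts[(row_value, column_value)] += weight
    return dict(counts)


# Invented sample data: (row variable, column variable, weight).
records = [
    ("smoker", "yes", 2),
    ("smoker", "no", 1),
    ("non-smoker", "no", 3),
    ("smoker", "yes", 1),
]
for (row_value, column_value), frequency in sorted(weighted_crosstab(records).items()):
    print(row_value, column_value, frequency)
```

Without a weight column, the same sketch applies with every weight set to 1.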
Ports
Input Ports
0 | Input table containing columns with categorical data.
Output Ports
0 | The cross table in list form.
1 | The table with the statistics.
Views
- Cross tabulation
-
The following properties are displayed in the cross tabulation view:
Frequency: The cell frequency.
Expected: The expected frequency which is computed as
(column total / total) * row total.
Deviation: The deviation is computed as
Frequency - Expected.
Percent: The percent is the relative frequency computed as
Frequency / total.
Row Percent: The row percent is computed as
Frequency / row total.
Column Percent: The column percent is computed as
Frequency / column total.
Cell Chi-Square: The contribution of this cell to the value
of the Chi-Square statistic, computed as
(Frequency - Expected)^2 / Expected. The Cell Chi-Square values of
all cells sum to the value of the Chi-Square statistic.
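The properties listed above can be reproduced for a small table with the following Python sketch. It follows the formulas as stated in this description and uses the usual Pearson form (Frequency - Expected)^2 / Expected for the cell chi-square. The function name `crosstab_cell_properties` and the example counts are assumptions made for illustration, and the sketch assumes all row and column totals are greater than zero.

```python
def crosstab_cell_properties(table):
    """Return a dict mapping (row index, column index) to the view properties."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    cells = {}
    for i, row in enumerate(table):
        for j, frequency in enumerate(row):
            # Expected frequency: (column total / total) * row total.
            expected = (col_totals[j] / total) * row_totals[i]
            cells[(i, j)] = {
                "Frequency": frequency,
                "Expected": expected,
                "Deviation": frequency - expected,
                "Percent": frequency / total,
                "Row Percent": frequency / row_totals[i],
                "Column Percent": frequency / col_totals[j],
                "Cell Chi-Square": (frequency - expected) ** 2 / expected,
            }
    return cells


# Made-up 2x2 table of counts.
cells = crosstab_cell_properties([[10, 20], [30, 40]])
for position, properties in cells.items():
    print(position, properties)

# The cell contributions add up to the overall chi-square statistic.
print(sum(p["Cell Chi-Square"] for p in cells.values()))
```

The fractions are left as values between 0 and 1, as in the formulas above; multiply by 100 if percentages are wanted.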
For some properties the row totals and column totals are displayed
beside the table and underneath the table, respectively.
You can control the size of the displayed table with the
Max rows and the Max columns controls.
The statistics table provides chi-square test statistics and, in
case of a cross tabulation of 2x2 dimension, Fisher's exact test.
Both statistics test the null hypothesis of no association between
the row variable and the column variable. You can reject the null
hypothesis when the p-value (Prop) is less than the chosen significance
level, which is typically 0.01 or 0.05. In this case the result is said to
be statistically significant. Please bear in mind that the
Chi-Square test rests on assumptions, such as independent observations
and sufficiently large expected cell frequencies.
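For a 2x2 table, Fisher's exact test and the decision rule above can be sketched as follows. This is a hedged illustration, not the node's implementation. The two-sided p-value here sums the probabilities of all tables that are no more likely than the observed one, which is one common definition and is assumed rather than confirmed for KNIME. The names `fisher_exact_two_sided` and `is_significant` and the example counts are invented for the sketch.

```python
from math import comb


def fisher_exact_two_sided(table):
    """Two-sided Fisher's exact test p-value for a 2x2 table of counts."""
    (a, b), (c, d) = table
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def probability(k):
        # Hypergeometric probability that the top-left cell equals k
        # when all row and column totals are held fixed.
        return comb(row1, k) * comb(row2, col1 - k) / comb(total, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    observed = probability(a)
    # Small relative tolerance so ties with the observed table are included.
    p_value = sum(
        p
        for p in (probability(k) for k in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


def is_significant(p_value, alpha=0.05):
    """Reject the null hypothesis when the p-value is below the level alpha."""
    return p_value < alpha


# Made-up 2x2 example; the p-value is 34/70, about 0.486.
p_value = fisher_exact_two_sided([[3, 1], [1, 3]])
print(p_value, is_significant(p_value))
```

With a p-value this far above 0.05, the null hypothesis of no association is not rejected.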
This node is contained in KNIME Base Nodes
provided by KNIME GmbH, Konstanz, Germany.