The following node is available in the Open Source KNIME predictive analytics and data mining platform version 2.7.1. Discover over 1000 other nodes, as well as enterprise functionality at http://knime.com.

GroupBy

Groups the rows of a table by the unique values in the selected columns. A row is created for each unique combination of values in the selected group column(s). The columns selected for aggregation are aggregated by the defined method. The output table therefore contains one row for each existing value combination of the selected group column(s).
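The grouping behaviour described above can be sketched in plain Python. This is only an illustration of the idea, not the node's actual implementation; the function name `group_by`, the dictionary-based row format, and the sample data are assumptions made for this example.

```python
from collections import defaultdict

def group_by(rows, group_cols, aggregations):
    """Group dict-shaped rows by group_cols and aggregate the chosen columns.

    aggregations maps a column name to a function that reduces a list of
    values to one value (hypothetical helper, not part of KNIME).
    """
    groups = defaultdict(list)
    for row in rows:
        # One group per unique combination of values in the group columns.
        key = tuple(row[c] for c in group_cols)
        groups[key].append(row)

    result = []
    for key, members in groups.items():
        out = dict(zip(group_cols, key))
        for col, func in aggregations.items():
            out[col] = func([m[col] for m in members])
        result.append(out)
    return result

rows = [
    {"country": "DE", "city": "Konstanz", "sales": 10},
    {"country": "DE", "city": "Berlin", "sales": 30},
    {"country": "CH", "city": "Zurich", "sales": 20},
]
table = group_by(rows, ["country"], {"sales": sum})
```

Here `table` holds one row per country, with `sales` summed within each group. Columns that are neither group columns nor aggregation columns (such as `city`) do not appear in the output.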

To change the aggregation method of more than one column, select all columns to change, open the context menu with a right mouse click, and select the aggregation method to use.

A detailed description of the available aggregation methods can be found on the 'Description' tab in the node dialog.

If the 'Sort in memory' option is checked, the complete table is loaded into memory to speed up the sorting process.
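To illustrate why sorting is involved at all, here is a minimal sketch of sort-based grouping in Python. It is an assumption-laden analogy rather than the node's code: `sorted_group_by` is a hypothetical helper, and Python's `sorted` always holds the whole input in memory, which loosely corresponds to the in-memory case.

```python
from itertools import groupby
from operator import itemgetter

def sorted_group_by(rows, group_cols, agg_col, func):
    """Sort rows by the group columns, then aggregate each run of equal keys."""
    key = itemgetter(*group_cols)
    # After sorting, all rows of one group are adjacent, so a single pass
    # over the sorted rows is enough to aggregate every group.
    ordered = sorted(rows, key=key)
    return [
        (group_key, func([r[agg_col] for r in members]))
        for group_key, members in groupby(ordered, key=key)
    ]

sample = [
    {"country": "DE", "sales": 10},
    {"country": "CH", "sales": 20},
    {"country": "DE", "sales": 30},
]
totals = sorted_group_by(sample, ["country"], "sales", sum)
```

Note that the groups come out in sorted key order, not in the order in which they first appear in the input.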

Dialog Options

Group settings
Select one or more columns by which the groups are created.
Aggregation settings
Select one or more columns for aggregation from the available columns list. Change the aggregation method in the Aggregation column of the table. You can add the same column multiple times. To change the aggregation method of more than one column, select all columns to change, open the context menu with a right mouse click, and select the aggregation method to use. Tick the missing box to include missing values. This option might be disabled if the aggregation method does not support missing values. The parameter column shows an "Edit" button for all aggregation operators that require additional information. Clicking the "Edit" button opens the parameter dialog, which allows changing the operator-specific settings.
Maximum unique values per group
Defines the maximum number of unique values per group, to avoid running out of memory. Any group with more unique values than this limit is skipped during the calculation: a missing value is set in the corresponding column and a warning is displayed.
Value delimiter
The value delimiter used by aggregation methods such as concatenate.
Column naming
The name of the resulting aggregation column(s) depends on the selected naming scheme. The names of all aggregation columns get a * appended if the missing value option is not ticked in the aggregation settings, in order to distinguish columns that considered missing values in the aggregation process from columns that did not.
Enable hiliting
If enabled, the hiliting of a group row will hilite all rows of this group in other views. Depending on the number of rows, enabling this feature might consume a lot of memory.
Process in memory
Processes the table in memory. Requires more memory but is faster, since the table does not need to be sorted prior to aggregation. The memory consumption depends on the number of unique groups and the chosen aggregation method. The row order of the input table is automatically retained.
Retain row order
Retains the original row order of the input table. This could result in a longer execution time. The row order is automatically retained if the 'Process in memory' option is selected.
Missing
Missing values are considered during aggregation if the missing option is ticked for the corresponding row in the column aggregation table. Some aggregation methods, such as mean, do not support changing the missing option.
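The interplay of the value delimiter, the missing option, and the maximum number of unique values can be sketched for a single group as follows. This is a hedged illustration only: `unique_concatenate` is a hypothetical function, `None` stands in for a missing cell, rendering a missing value as "?" is an assumption, and the default limit of 10000 is an arbitrary value chosen for the example.

```python
import warnings

MISSING = None  # stand-in for a missing cell (assumption for this sketch)

def unique_concatenate(values, delimiter=", ", include_missing=False,
                       max_unique=10000):
    """Concatenate the unique values of one group with the given delimiter.

    Returns MISSING and emits a warning if the group has more unique values
    than max_unique, mirroring the "skip the group" behaviour described above.
    """
    unique = []
    seen = set()
    for value in values:
        if value is MISSING and not include_missing:
            continue  # missing option not ticked: ignore missing cells
        if value not in seen:
            seen.add(value)
            unique.append(value)
            if len(unique) > max_unique:
                warnings.warn("Group skipped: too many unique values")
                return MISSING
    # Rendering a missing value as "?" is an assumption of this sketch.
    return delimiter.join("?" if v is MISSING else str(v) for v in unique)

group = ["a", MISSING, "b", "a"]
without_missing = unique_concatenate(group)
with_missing = unique_concatenate(group, include_missing=True)
skipped = unique_concatenate(group, max_unique=1)
```

With the missing option off, the missing cell is ignored; with it on, the missing cell counts as a value of its own. When the limit is exceeded, the whole group yields a missing value instead of a partial result.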

Ports

Input Ports
0 The input table to group.
Output Ports
0 Result table with one row for each existing value combination of the selected columns.
This node is contained in KNIME Base Nodes provided by KNIME GmbH, Konstanz, Germany.