The following node is available in the Open Source KNIME predictive analytics and data mining platform version 2.7.1. Discover over 1000 other nodes, as well as enterprise functionality at
http://knime.com.
Partitioning
The input table is split row-wise into two partitions,
e.g. training and test data. The two partitions are available at the
two output ports. The following options are available in the dialog:
Dialog Options
- Absolute
-
Specify the absolute number of rows in the first partition. If the input
table has fewer rows than specified here, all rows are placed in the first
table and the second table contains no rows.
- Relative
-
The percentage of rows in the input table that are placed
in the first partition. It must be between 0 and 100, inclusive.
- Take from top
- This mode puts the top-most rows
into the first output table and the remainder in the second table.
- Linear sampling
-
This mode always includes the first and the last row and selects the remaining rows at even intervals across the whole
table (e.g. every third row). This is useful for downsampling a table sorted by a column while retaining that column's
minimum and maximum values.
- Draw randomly
-
Rows are sampled at random from the whole table. You may optionally specify a fixed seed (see below).
- Stratified sampling
-
Select this option for stratified sampling, in which the distribution
of values in the selected column is (approximately) retained in
both output tables.
You may optionally specify a fixed seed (see below).
- Use random seed
-
If either random or stratified sampling is selected, you may enter a fixed seed here
in order to get reproducible results upon re-execution. If you do not specify a seed,
a new random seed is taken for each execution.
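
The options above can be illustrated with a small Python sketch. This is not KNIME's implementation: all function names are hypothetical, the rounding rules are assumptions, and the stratified variant only approximates the requested size. Each function returns the row indices of the first partition; the remaining rows form the second.

```python
import random
from collections import defaultdict


def first_partition_size(n_rows, absolute=None, relative=None):
    """Resolve the Absolute or Relative option to a row count."""
    if absolute is not None:
        # Fewer rows than requested: all rows go to the first partition.
        return min(absolute, n_rows)
    if not 0 <= relative <= 100:
        raise ValueError("relative must be between 0 and 100, inclusive")
    # The rounding rule is an assumption made for this sketch.
    return round(n_rows * relative / 100)


def take_from_top(n_rows, k):
    """Take from top: the first k rows."""
    return list(range(k))


def linear_sample(n_rows, k):
    """Linear sampling: evenly spaced rows, first and last included if k >= 2."""
    if k <= 0:
        return []
    if k == 1:
        return [0]
    # Integer arithmetic keeps the indices distinct and ends at n_rows - 1.
    return [i * (n_rows - 1) // (k - 1) for i in range(k)]


def draw_randomly(n_rows, k, seed=None):
    """Draw randomly: k rows without replacement; a fixed seed is reproducible."""
    rng = random.Random(seed)  # seed=None picks a fresh seed on each call
    return sorted(rng.sample(range(n_rows), k))


def stratified_sample(labels, k, seed=None):
    """Stratified sampling: draw about the same fraction from every class."""
    if not labels:
        return []
    rng = random.Random(seed)
    groups = defaultdict(list)
    for index, label in enumerate(labels):
        groups[label].append(index)
    fraction = k / len(labels)
    chosen = []
    for members in groups.values():
        # Per-class rounding means the total may differ slightly from k.
        chosen.extend(rng.sample(members, round(len(members) * fraction)))
    return sorted(chosen)


def partition(rows, first_indices):
    """Split rows into (first partition, second partition), keeping row order."""
    picked = set(first_indices)
    first = [row for i, row in enumerate(rows) if i in picked]
    second = [row for i, row in enumerate(rows) if i not in picked]
    return first, second
```

For example, partition(rows, linear_sample(len(rows), k)) with k = first_partition_size(len(rows), relative=30) places roughly 30 percent of the rows, evenly spaced, in the first partition.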
Ports
Output Ports
0 | First partition (as defined in dialog).
1 | Second partition (remaining rows).
This node is contained in KNIME Base Nodes
provided by KNIME GmbH, Konstanz, Germany.