The following node is available in the Open Source KNIME predictive analytics and data mining platform version 2.7.1. Discover over 1000 other nodes, as well as enterprise functionality at
http://knime.com.
Cell Splitter
This node uses a user-specified delimiter character to
split the content of a selected column into parts. It appends either a
fixed number of columns to the input table, each carrying one part of the
original column, or a single column containing a collection (list or
set) of cells with the split output. The output mode is configurable:
one or more new columns, a single column of list cells, or a single
column of set cells, in which duplicates are removed.
If the column contains more delimiters than needed
(producing more parts than there are appended columns), the
additional delimiters are ignored, so the last column contains
the unsplit rest of the value.
If the selected column contains too
few delimiters (producing fewer parts than expected), the remaining
new columns are left empty in that row.
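The two cases above (too many and too few delimiters) can be sketched in a few lines of Python. This is only an illustration of the described behaviour, not the node's actual implementation: the function name split_fixed is hypothetical, and None is used here as a stand-in for an empty cell.

```python
def split_fixed(value, delimiter, n_columns):
    """Split value into exactly n_columns parts (illustrative sketch)."""
    # Limiting the number of splits to n_columns - 1 leaves any surplus
    # delimiters untouched in the final part (the unsplit rest).
    parts = value.split(delimiter, n_columns - 1)
    # With too few delimiters, pad the row; None stands in for an
    # empty cell (an assumption made for this sketch).
    parts += [None] * (n_columns - len(parts))
    return parts


print(split_fixed("a,b,c,d", ",", 3))  # ['a', 'b', 'c,d']
print(split_fixed("a,b", ",", 3))      # ['a', 'b', None]
```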
Depending on the delimiters and the resulting parts, the collection cells
can have different sizes.
The content of the new columns will be trimmed if specified
(i.e. leading and trailing spaces will be deleted).
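A rough Python sketch of the collection output with optional trimming follows. The function name split_to_collection and its parameters are hypothetical, and trimming before duplicate removal is an assumption of this sketch rather than documented node behaviour.

```python
def split_to_collection(value, delimiter, as_set=False, trim=True):
    """Return the split parts as a list, or as a set without duplicates."""
    parts = value.split(delimiter)
    if trim:
        # Remove leading and trailing whitespace from every part.
        parts = [p.strip() for p in parts]
    # Assumption: duplicates are removed after trimming.
    return set(parts) if as_set else parts


print(split_to_collection("x; y ;x", ";"))                       # ['x', 'y', 'x']
print(sorted(split_to_collection("x; y ;x", ";", as_set=True)))  # ['x', 'y']
print(len(split_to_collection("a;b;c", ";")))                    # 3
```

Because the collection simply holds whatever parts a row produces, rows with different numbers of delimiters yield collections of different sizes.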
Dialog Options
- Column selection
- Select the column whose values are split.
- Delimiter
- Specify the delimiter that separates the
parts in the value.
- Use escape character
- If enabled, the backslash ("\")
can be used to escape characters, such as "\t" for tabs. You can use
the full escape capabilities of Java.
- Quotation character
- Specify the quotation character, if
the different parts in the value are quoted. (The character to escape
quotes is always the backslash.) If no quotation character
is needed, leave the field empty.
- remove leading and trailing white space chars (trim)
- If checked, leading and trailing whitespace of each part (token) will
be deleted.
- Output - as list
- If selected, the output will consist of one column containing list
collection cells in which the split parts are stored. Duplicates
can occur in list cells.
- Output - as set (remove duplicates)
- If selected, the output will consist of one column containing set
collection cells in which the split parts are stored. Duplicates
are removed and cannot occur in set cells.
- Output - as new columns
- If selected, the output will consist of one or more columns, each
containing a split part.
- Set Array Size
- Check this and specify the number of columns
to append. All created columns will be of type String. (See above for
what happens if the split produces a different number of parts.)
- Guess Size and Column Types
- If this is checked, the node
performs an additional scan through the entire data table and computes
the number of columns needed to hold all parts of the split. In addition,
it determines the column type of the new columns.
- Missing Value Handling
- If selected, the node creates
empty string cells instead of missing value cells.
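Two of the options above, the quotation character and "Guess Size and Column Types", can be illustrated with a short Python sketch. It is not the node's implementation. The names split_quoted, narrowest_type, and guess_size_and_types are hypothetical, a single-character delimiter is assumed, the backslash is assumed to make any following character literal, and the Python type names int, float, and str merely stand in for the node's column types.

```python
def split_quoted(value, delimiter, quote='"', escape="\\"):
    """Split on delimiter, ignoring delimiters inside quoted sections."""
    parts, current, in_quotes = [], [], False
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == escape and i + 1 < len(value):
            # Escaped character: keep the next character literally.
            current.append(value[i + 1])
            i += 2
            continue
        if quote and ch == quote:
            in_quotes = not in_quotes  # the quote character itself is dropped
        elif ch == delimiter and not in_quotes:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    parts.append("".join(current))
    return parts


def narrowest_type(cells):
    """Return 'int', 'float', or 'str', whichever first parses every cell."""
    for name, parse in (("int", int), ("float", float)):
        try:
            for cell in cells:
                parse(cell)
            return name
        except ValueError:
            continue
    return "str"


def guess_size_and_types(values, delimiter):
    """Scan all values once; return (column count, type name per column)."""
    rows = [split_quoted(v, delimiter) for v in values]
    n_columns = max((len(r) for r in rows), default=0)
    types = []
    for col in range(n_columns):
        # Rows with fewer parts simply contribute nothing to this column.
        cells = [r[col] for r in rows if col < len(r)]
        types.append(narrowest_type(cells))
    return n_columns, types


print(split_quoted('a,"b,c",d', ","))                    # ['a', 'b,c', 'd']
print(guess_size_and_types(["1,2.5,x", "3,4"], ","))     # (3, ['int', 'float', 'str'])
```

The extra pass over all values is what makes the guessing option more expensive than a fixed array size: the column count and types are only known once every row has been split.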
Ports
Input Ports
0 | Input data table with a column containing the cells to split
Output Ports
0 | Output data table with additional columns.
This node is contained in KNIME Base Nodes
provided by KNIME GmbH, Konstanz, Germany.