The following node is available in the Open Source KNIME predictive analytics and data mining platform version 2.7.1. Discover over 1000 other nodes, as well as enterprise functionality at
http://knime.com.
Bitvector Generator
Generates bitvectors either from a table containing numerical values,
or from a string column containing the bit positions to set,
hexadecimal or binary strings.
Numeric input (many columns)
In the case of a numerical input the columns correspond to the bit positions
in the resulting bitvector, i.e. if only one numerical column is available
all bitvectors will have length 1.
All numeric columns in the table are considered.
There are two options to determine if the bit is set for the value in
the corresponding column or not:
- either a global threshold is defined, then all values which are above or equal to
the threshold are converted into set bits, all other bit positions remain 0, or
- a certain percentage of the mean of each column is used as a threshold,
then all values which are above or equal to the percentage of the mean
are converted into set bits.
Strings (one column)
In the case of a string input only the column containing the string is
considered for the generation of the bitvectors. The string is parsed
and converted into a bitvector. There are three valid input formats
which can be parsed and converted:
- Hexadecimal strings: strings consisting only of the characters 0-9 and A - F
(where lower- or uppercase is not important). The represented hexadecimal number is
converted into a binary number which is represented by the resulting bitvector.
- Binary strings: strings consisting only of 0s and 1s are parsed and
converted into the according bitvectors.
- ID strings: strings consisting of numbers (separated by spaces)
where the numbers refer to those positions in the bitvector which should be set.
(Typical input format for association rule mining).
Missing values
For numeric data the incoming missing values will result in 0s.
For the string input missing values will also result in a missing value
in the output table. If a string could not be parsed it will also result in
a missing cell in the output table.
Dialog Options
- Numeric input
-
Select if several numeric columns should be converted into a bitvector.
- Threshold
-
If the "numeric input" is checked, specify the global threshold.
All values which are above or equal to this threshold will result
in a 1 in the bitvector.
- Use percentage of the mean
-
Check, if a percentage of the mean of each column should serve as
threshold above which the bits are set.
- Percentage
-
Specify which percentage of the mean a value should have in order to be set.
- Parse bitvectors from string column
-
Check, if the input for the bitvectors is a string column that should be converted
into a bitvector (see description above for valid input formats).
Uncheck, if the data is a table with numerical data that should be
converted into bitvectors. All numerical columns will be considered,
all others are irgnored.
- String column to be parsed
-
If the "parse from string column" is checked, select the column
containing the strings.
- Kind of string representation
-
Select one of the three valid input formats: HEX (hexadecimal),
ID (bit positions) or BIT (binary strings). See description above.
- Remove column(s) used for bit vector creation:
-
If it is checked the generating column(s) (included columns if numeric input was used
or the selected string column) are removed.
If it is unchecked the generated bitvectors are appended to the input table.
Ports
Input Ports
0 |
Datatable with numerical data or a string column to be parsed. |
Output Ports
0 |
Datatable with the generated bitvectors. |
Views
- Statistics View
-
Provides information about the generation of the bitvectors from
the data. In particular this is the number of processed rows,
the total number of generated zeros and ones and the resulting
ratio of 1s to 0s.
This node is contained in KNIME Base Nodes
provided by KNIME GmbH, Konstanz, Germany.