The following node is available in the Open Source KNIME predictive analytics and data mining platform version 2.7.1. Discover over 1000 other nodes, as well as enterprise functionality at
http://knime.com.
Similarity Search
This node takes each row in the query table (Port 0) and searches the
reference table (Port 1) for a number of rows matching the specified
similarity/distance criteria. If multiple results are requested,
the query result row is duplicated for each subsequent match.
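The behavior described above can be pictured with a small Python sketch. It is not the node's implementation: the function name `similarity_search`, the dictionary-based tables, and the choice of Euclidean distance are assumptions made for illustration only.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def similarity_search(query, reference, k=1, distance=euclidean):
    """For each query row, find its k nearest reference rows.

    query, reference: dicts mapping a row ID to a list of numbers
    (a stand-in for the two input tables; hypothetical layout).
    Returns (query_id, neighbor_index, reference_id, distance) tuples,
    so a query row appears once per match.
    """
    results = []
    for q_id, q_vals in query.items():
        # Score every reference row against the current query row.
        scored = sorted(
            (distance(q_vals, r_vals), r_id) for r_id, r_vals in reference.items()
        )
        # Keep the k closest; the query row is repeated for each of them.
        for index, (dist, r_id) in enumerate(scored[:k], start=1):
            results.append((q_id, index, r_id, dist))
    return results

query = {"Row0": [0.0, 0.0]}
reference = {"A": [3.0, 4.0], "B": [1.0, 0.0], "C": [0.0, 2.0]}
hits = similarity_search(query, reference, k=2)
```

With two neighbors requested, `hits` holds two entries for `Row0`, one per match, ordered from nearest to farthest.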
Dialog Options
- Distance function
Choose which method is used to calculate the distance (or similarity)
for the query.
Euclidean Distance
Requires 1 or more numeric columns.
Manhattan Distance
Requires 1 or more numeric columns.
Distance Vector
Requires one column of the Distance Vector type, as generated by
the Distance Matrix Calculate node.
Tanimoto Similarity
Requires a bit-vector fingerprint.
Tanimoto Similarity (old)
A deprecated version of Tanimoto similarity. Requires a deprecated
bit vector type.
Cosine Similarity
Requires 1 or more numeric columns.
Cosine Bitvector Similarity
Requires a bit-vector fingerprint.
Dice's Coefficient
Requires a bit-vector fingerprint.
- Column Selection
Choose which columns to use in the calculation. Unusable columns will be
ignored.
- Coefficient Type
Determines how the output is represented; it has no effect on the
calculation. Note that this is only meaningful with the Tanimoto
Similarity metric.
Distance
More different rows have a smaller index.
Similarity
More similar rows have a smaller index.
- Neighbor Selection
Choose whether more similar or more distant results match the query.
- Range Filtering
Specify a similarity/distance range query for query hits. For example, a
search using Tanimoto Similarity with a range filter of 0 to
0.9999999 would return the nearest non-identical matches to the query
row.
- Output column name prefix
This string will be used in the construction of output column names.
- Representative Column
The column used to identify the entries in the reference table (Port 1)
that match the query criteria.
- RowID Suffix Separator
When multiple search results are requested, this delimiter separates the
original Row ID from the index of the result. For example, if the Row
ID is RowN, the delimiter is set to "_" and the node is configured to
find 3 neighbors, then the resulting Row IDs would be "RowN_1",
"RowN_2", and "RowN_3".
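For readers unfamiliar with the measures listed under Distance function, the following Python sketch gives their textbook definitions and applies the Range Filtering example (Tanimoto Similarity, 0 to 0.9999999). It is an illustration only: fingerprints are modeled as sets of "on" bit positions, and the inclusive range bounds and the handling of empty or zero vectors are assumptions, not documented behavior of the node.

```python
import math

def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Cosine of the angle between two numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0  # zero-vector handling is an assumption

def tanimoto(a, b):
    """Shared 'on' bits divided by bits that are on in either fingerprint."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0  # empty case is an assumption

def dice(a, b):
    """Twice the shared 'on' bits divided by the total 'on' bits of both."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0  # empty case is an assumption

def in_range(value, lower, upper):
    """Range filter; inclusive bounds are assumed here."""
    return lower <= value <= upper

query_fp = {1, 2, 3, 4}
reference_fps = {"same": {1, 2, 3, 4}, "close": {1, 2, 3}, "far": {7, 8}}
scores = {name: tanimoto(query_fp, fp) for name, fp in reference_fps.items()}

# An upper bound just below 1 drops the identical fingerprint ("same").
kept = sorted(
    (name for name, s in scores.items() if in_range(s, 0.0, 0.9999999)),
    key=lambda name: scores[name],
    reverse=True,
)
```

Here `kept` lists the remaining hits from most to least similar, so its first entry is the nearest non-identical match.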
Ports
Input Ports
0 | Each row is used as a query for similar (or non-similar) entries in the reference table.
1 | Data set in which to search for nearest/farthest neighbors.
Output Ports
0 | The input data set with three additional columns: (i) the neighbor index, (ii) the neighbor (the Row ID or some other representative column), and (iii) the distance/similarity value. The 2nd, 3rd, and subsequent neighbors are represented by additional rows.
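The shape of the output table can be sketched in Python as follows. This is a hedged illustration of the multi-result case only: the function `build_output_rows`, the prefix value `nn`, and the three column names built from it are hypothetical, since the exact names the node generates are not stated here. The Row ID pattern follows the "RowN_1", "RowN_2", "RowN_3" example given for the RowID Suffix Separator option.

```python
def build_output_rows(row_id, cells, matches, prefix="nn", separator="_"):
    """Expand one query row into one output row per match.

    cells:   the original cells of the query row (column name -> value)
    matches: (neighbor, value) pairs, already ordered by rank
    prefix:  used to build the added column names (names are hypothetical)
    """
    rows = {}
    for index, (neighbor, value) in enumerate(matches, start=1):
        # The original Row ID is extended with the separator and match index.
        new_id = f"{row_id}{separator}{index}"
        rows[new_id] = {
            **cells,                          # original query columns, duplicated
            f"{prefix} index": index,         # (i) neighbor index
            f"{prefix} neighbor": neighbor,   # (ii) Row ID or representative column
            f"{prefix} value": value,         # (iii) distance/similarity value
        }
    return rows

out = build_output_rows("RowN", {"x": 1.5}, [("A", 0.1), ("B", 0.4), ("C", 0.9)])
```

Each entry of `out` repeats the original cells of `RowN` and adds the three match columns, giving one row per neighbor.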
This node is contained in KNIME Distance Matrix Extension
provided by KNIME GmbH, Konstanz, Germany.