The following node is available in the Open Source KNIME predictive analytics and data mining platform version 2.7.1. Discover over 1000 other nodes, as well as enterprise functionality at
http://knime.com.
Frequency Filter
Filters the terms in the given bag of words according to a frequency
value. Two filtering modes are available. In the first, minimum and
maximum values are defined: if the value in the specified frequency
column is less than the minimum or greater than the maximum, the term
is filtered out. In the second, a number k of terms to keep is defined:
only the k terms with the highest frequency values are kept, and the
rest are filtered out.
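The two filtering modes described above can be sketched in a few lines of Python. This is only an illustration, not the node's implementation: the row layout, the function names, and the tie-breaking rule are assumptions, and the range check treats the minimum and maximum as inclusive because the description filters only values below the minimum or above the maximum.

```python
from typing import List, Tuple

# Hypothetical stand-in for one bag-of-words row: (term, frequency value).
Row = Tuple[str, float]


def filter_by_threshold(rows: List[Row], minimum: float, maximum: float) -> List[Row]:
    """Keep rows whose value is neither below minimum nor above maximum."""
    return [(term, value) for term, value in rows if minimum <= value <= maximum]


def filter_top_k(rows: List[Row], k: int) -> List[Row]:
    """Keep the k rows with the highest values.

    Ties are broken by input order (an assumption; sorted() is stable).
    """
    ranked = sorted(rows, key=lambda row: row[1], reverse=True)
    return ranked[:k]


rows = [("data", 0.40), ("mining", 0.25), ("the", 0.90), ("knime", 0.05)]
print(filter_by_threshold(rows, 0.1, 0.5))  # [('data', 0.4), ('mining', 0.25)]
print(filter_top_k(rows, 2))                # [('the', 0.9), ('data', 0.4)]
```

Threshold filtering keeps the input order, while the top-k variant returns rows ranked by value; whether the node itself reorders rows is not stated here.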
Dialog Options
Deep Filtering
- Document column
-
Specifies the column containing the documents to which the
filtering is applied.
- Deep filtering
-
If deep filtering is checked, the terms contained inside
the documents are filtered too. This means that the documents
themselves are changed, which is more time consuming.
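The difference between shallow and deep filtering can be illustrated with a small Python sketch. The data model is an assumption made for the example: documents are represented as plain lists of terms and bag-of-words rows as (term, document id) pairs, which is far simpler than the document cells the node works on.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical row layout: (term, document id).
BagRow = Tuple[str, str]


def apply_term_filter(
    bag: List[BagRow],
    documents: Dict[str, List[str]],
    kept_terms: Set[str],
    deep: bool,
) -> Tuple[List[BagRow], Dict[str, List[str]]]:
    """Filter bag-of-words rows; with deep=True also rewrite the documents."""
    filtered_bag = [(term, doc_id) for term, doc_id in bag if term in kept_terms]
    if not deep:
        # Shallow filtering: documents are passed through unchanged.
        return filtered_bag, documents
    # Deep filtering: every document is rebuilt without the filtered terms,
    # which is the extra work that makes this option more time consuming.
    filtered_docs = {
        doc_id: [term for term in terms if term in kept_terms]
        for doc_id, terms in documents.items()
    }
    return filtered_bag, filtered_docs


documents = {"d1": ["the", "data", "mining", "node"]}
bag = [(term, "d1") for term in documents["d1"]]
kept = {"data", "mining"}

shallow_bag, shallow_docs = apply_term_filter(bag, documents, kept, deep=False)
deep_bag, deep_docs = apply_term_filter(bag, documents, kept, deep=True)
print(shallow_docs["d1"])  # ['the', 'data', 'mining', 'node']
print(deep_docs["d1"])     # ['data', 'mining']
```

In both cases the bag-of-words rows are reduced in the same way; only the deep variant touches the documents.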
Filter options
- Filter unmodifiable terms
-
Usually, terms that have been set unmodifiable are not modified
or filtered. If this setting is checked, these terms are filtered
as well if they do not meet the specified requirements.
- Filter column
-
The column containing the values on which the filtering is based.
For example, the TF measure of each term can be computed beforehand
by the TF node. Once the column is appended, the filtering can be
applied to these values.
- Filtering by
-
Specifies which filtering is applied: threshold filtering or
number-of-terms filtering. Threshold filtering keeps all rows whose
values in the specified filter column lie between the specified
minimum and maximum values. Number-of-terms filtering, on the other
hand, keeps the K rows with the highest values.
- Min Max settings
-
Specifies the minimum and the maximum threshold of the values of
the filter column.
- Number Settings
-
Specifies the number K of rows to keep. Only the K rows with the
highest values in the filter column are kept; the rest are
filtered out.
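How the "Filter unmodifiable terms" option interacts with the threshold settings above can be sketched as follows. The `TermRow` type, the `unmodifiable` flag, and the function name are hypothetical and exist only for this illustration; the inclusive range check is likewise an assumption.

```python
from typing import List, NamedTuple


class TermRow(NamedTuple):
    """Hypothetical row: a term, its filter-column value, and its flag."""
    term: str
    value: float
    unmodifiable: bool


def threshold_filter(
    rows: List[TermRow],
    minimum: float,
    maximum: float,
    filter_unmodifiable: bool = False,
) -> List[TermRow]:
    """Keep rows inside [minimum, maximum]; protect unmodifiable terms by default."""
    kept = []
    for row in rows:
        in_range = minimum <= row.value <= maximum
        # Unmodifiable terms bypass the filter unless the option is checked.
        protected = row.unmodifiable and not filter_unmodifiable
        if in_range or protected:
            kept.append(row)
    return kept


rows = [
    TermRow("data", 0.40, False),
    TermRow("KNIME", 0.90, True),
    TermRow("the", 0.95, False),
]
print([r.term for r in threshold_filter(rows, 0.1, 0.5)])
# ['data', 'KNIME']
print([r.term for r in threshold_filter(rows, 0.1, 0.5, filter_unmodifiable=True)])
# ['data']
```

With the option unchecked, the unmodifiable term survives even though its value is out of range; with the option checked, it is treated like any other term.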
Ports
Input Ports
0 | The input table which contains terms and documents.
Output Ports
0 | The output table which contains terms, documents, and a corresponding frequency value.
This node is contained in the KNIME Textprocessing Plug-in
provided by KNIME GmbH, Konstanz, Germany.