The following node is available in the Open Source KNIME predictive analytics and data mining platform version 2.7.1. Discover over 1000 other nodes, as well as enterprise functionality at http://knime.com.

Frequency Filter

Filters the terms in the given bag of words by a frequency value. Two filtering modes are available. In the first, a minimum and a maximum value are defined: if the value in the specified frequency column is less than the minimum or greater than the maximum, the term is filtered out. In the second, a number k of terms to keep is defined: only the k terms with the highest frequency values are kept, and the rest are filtered out.
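
The two filtering modes can be sketched in plain Python as follows. This is an illustration only, not the node's implementation: the row layout (term, document, frequency), the sample values, and the function names filter_by_threshold and keep_top_k are assumptions. The sketch treats the minimum and maximum as inclusive bounds, which is how the description above reads.

```python
from typing import List, Tuple

# Hypothetical row layout: (term, document id, frequency value).
Row = Tuple[str, str, float]


def filter_by_threshold(rows: List[Row], minimum: float, maximum: float) -> List[Row]:
    """Keep rows whose frequency is neither below minimum nor above maximum."""
    return [row for row in rows if minimum <= row[2] <= maximum]


def keep_top_k(rows: List[Row], k: int) -> List[Row]:
    """Keep the k rows with the highest frequency value."""
    ranked = sorted(rows, key=lambda row: row[2], reverse=True)
    return ranked[:max(k, 0)]


# Made-up sample data for illustration.
rows = [
    ("knime", "doc1", 0.40),
    ("the", "doc1", 0.90),
    ("node", "doc1", 0.25),
    ("rare", "doc1", 0.05),
]

thresholded = filter_by_threshold(rows, 0.1, 0.5)
top_two = keep_top_k(rows, 2)
```

With this sample data, the threshold mode drops both the very frequent and the very rare term, while the number-of-terms mode keeps the two most frequent terms regardless of their absolute values.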

Dialog Options

Deep Filtering
Document column
Specifies the column containing the documents to which the filtering is applied.
Deep filtering
If deep filtering is checked, the terms contained inside the documents are filtered too. This means that the documents themselves are changed, which is more time-consuming.
Filter options
Filter unmodifiable terms
Terms that have been set unmodifiable are normally neither modified nor filtered. If this setting is checked, these terms are filtered as well when they do not meet the specified requirements.
Filter column
The column containing the values to which the filtering is applied. For example, the TF measure of each term can be computed beforehand by the TF node. Once that column is appended, the filtering can be applied to its values.
Filtering by
Specifies which kind of filtering is applied: threshold filtering or number of terms filtering. Threshold filtering keeps all rows whose value in the specified filter column lies between the specified minimum and maximum values. Number of terms filtering, on the other hand, keeps the K rows with the highest values.
Min Max settings
Specifies the minimum and the maximum threshold of the values of the filter column.
Number Settings
Specifies the number K of rows to keep. Only the K rows with the highest values in the filter column are kept; the rest are filtered out.
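
How the "Filter unmodifiable terms" and "Deep filtering" options might interact with threshold filtering can be sketched as below. Everything here is an assumption made for illustration and not the node's actual code: the Term and Row classes, the function apply_frequency_filter, the inclusive bounds, and in particular the choice that deep filtering removes a filtered term only from the document named in the same row.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Term:
    text: str
    unmodifiable: bool = False  # hypothetical flag mirroring "set unmodifiable"


@dataclass
class Row:
    term: Term
    doc_id: str
    frequency: float


def apply_frequency_filter(
    rows: List[Row],
    documents: Dict[str, List[Term]],
    minimum: float,
    maximum: float,
    filter_unmodifiable: bool = False,
    deep_filtering: bool = False,
) -> Tuple[List[Row], Dict[str, List[Term]]]:
    """Threshold-filter rows; optionally remove filtered terms from documents."""
    kept: List[Row] = []
    # Work on copies so the input documents stay untouched.
    new_docs = {doc_id: list(terms) for doc_id, terms in documents.items()}
    for row in rows:
        in_range = minimum <= row.frequency <= maximum
        # Unmodifiable terms are protected unless the option is checked.
        protected = row.term.unmodifiable and not filter_unmodifiable
        if in_range or protected:
            kept.append(row)
        elif deep_filtering and row.doc_id in new_docs:
            # Assumption: deep filtering drops the term from that row's document.
            new_docs[row.doc_id] = [t for t in new_docs[row.doc_id] if t != row.term]
    return kept, new_docs


# Made-up sample data for illustration.
documents = {"doc1": [Term("KNIME", True), Term("the"), Term("node")]}
rows = [
    Row(Term("KNIME", True), "doc1", 0.05),
    Row(Term("the"), "doc1", 0.90),
    Row(Term("node"), "doc1", 0.25),
]

kept, docs = apply_frequency_filter(rows, documents, 0.1, 0.5, deep_filtering=True)
strict_kept, strict_docs = apply_frequency_filter(
    rows, documents, 0.1, 0.5, filter_unmodifiable=True, deep_filtering=True
)
```

In the first call the unmodifiable term survives even though its value is below the minimum. In the second call it is filtered like any other term and, because deep filtering is on, also removed from the document.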

Ports

Input Ports
0 The input table which contains terms and documents.
Output Ports
0 The output table which contains terms, documents, and a corresponding frequency value.
This node is contained in KNIME Textprocessing Plug-in provided by KNIME GmbH, Konstanz, Germany.