The following node is available in the Open Source KNIME predictive analytics and data mining platform version 2.7.1. Discover over 1000 other nodes, as well as enterprise functionality at http://knime.com.

Term co-occurrence counter

The node counts the number of co-occurrences for the given list of terms within the selected parts (e.g. sentence, paragraph, section and title) of the corresponding document. The order in which two terms occur is not considered, so an occurrence of T1 followed by T2 counts the same as an occurrence of T2 followed by T1. In the output table, the two terms of each pair are listed in alphabetical order.
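The order-independent counting described above can be sketched in Python. This is a minimal illustration, not the node's implementation: the function name count_cooccurrences, the toy sentences, and the choice to count each pair at most once per text unit are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations

def count_cooccurrences(units, terms):
    """Count unordered term pairs that appear together in a text unit.

    Assumption: a pair is counted at most once per unit (e.g. per sentence).
    """
    wanted = set(terms)
    counts = Counter()
    for unit in units:
        # Sorting the terms found in the unit makes every pair alphabetical,
        # so (T1, T2) and (T2, T1) end up under the same key.
        present = sorted(wanted.intersection(unit))
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts

# Hypothetical input: each inner list is one tokenized sentence.
sentences = [
    ["data", "mining", "with", "knime"],
    ["knime", "supports", "text", "mining"],
    ["text", "data"],
]
pair_counts = count_cooccurrences(sentences, ["data", "knime", "mining", "text"])
```

Because each pair is stored as an alphabetically sorted tuple, looking up ("knime", "mining") covers both orders in which the two terms can appear.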

Dialog Options

Document column
The column that contains the document to search for the term co-occurrences.
Term column
The column that contains the terms to compute the co-occurrence for.
Co-occurrence level
Select the co-occurrence level to be calculated. The levels are ordered from more general (document co-occurrence) to more specific (neighbors). The more general levels include the more specific levels, e.g. the sentence level includes the neighbor and title co-occurrence calculation. Note: Calculating the more general statistics, especially the document level statistics, might result in a very large data table.
Check term tags
If this option is selected, the tags of a term (e.g. POS tags) are considered when matching terms. If it is not selected, only the textual representation of the terms is compared.
Sort input table
Deselect this option if the input table is already sorted by the document column.
Maximum number of parallel processes
Decrease the number of parallel processes in case of memory problems.
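
To illustrate why the more general co-occurrence levels produce larger tables, the sketch below collects the term pairs found at three levels of a toy document. It is only an illustration under stated assumptions: the helper names pairs_in and neighbor_pairs, the sample document, and the reading of "neighbors" as directly adjacent tokens are hypothetical and not taken from the node.

```python
from itertools import combinations

def pairs_in(tokens, terms):
    """All alphabetical pairs of wanted terms that occur in the token list."""
    present = sorted(set(terms).intersection(tokens))
    return set(combinations(present, 2))

def neighbor_pairs(tokens, terms):
    """Pairs of wanted terms that are directly adjacent (assumed meaning of 'neighbors')."""
    wanted = set(terms)
    found = set()
    for left, right in zip(tokens, tokens[1:]):
        if left in wanted and right in wanted and left != right:
            found.add(tuple(sorted((left, right))))
    return found

# Hypothetical document: a list of tokenized sentences.
document = [["text", "mining", "uses", "knime"], ["knime", "nodes"]]
terms = ["knime", "mining", "nodes", "text"]

neighbor_level = set().union(*(neighbor_pairs(s, terms) for s in document))
sentence_level = set().union(*(pairs_in(s, terms) for s in document))
document_level = pairs_in([t for s in document for t in s], terms)
```

In this example every pair found at the neighbor level also appears at the sentence level, and every sentence-level pair appears at the document level. At the document level, all terms of a document are paired with each other, so the number of pairs grows quadratically with the number of distinct terms.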

Ports

Input Ports
0 Input table with a document and term column
Output Ports
0 Table with the co-occurrence statistics for the input table
This node is contained in the KNIME Textprocessing Plug-in provided by KNIME GmbH, Konstanz, Germany.