The following node is available in the Open Source KNIME predictive analytics and data mining platform version 2.7.1. Discover over 1000 other nodes, as well as enterprise functionality at
http://knime.com.
Association Rule Learner
Searches for frequent itemsets meeting the user-defined minimum
support criterion and, optionally, creates association rules from
them. The column containing the transactions (BitVectors or
Collections) has to be selected. The minimum
support as an absolute number must be provided (therefore check the number
of transactions to obtain a sensible criterion). Whether the frequent itemsets
should be free (unconstrained), closed, or maximal must also be defined.
Closed itemsets are frequent itemsets that have no superset with the
same support, thus providing all the information from free itemsets
in a compressed form. Maximal itemsets are frequent itemsets that have no
frequent superset at all. The maximal itemset length must also be
defined. If association rules are generated, a minimum confidence value has to be
provided. The confidence measures how often the rule is right, that is,
the fraction of transactions containing the antecedent that also contain
the consequent. Association rules generated here have only one
item in the consequent.
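The difference between free, closed, and maximal itemsets can be illustrated with a small brute-force sketch. This is not the node's implementation; the toy transactions, the function names, and the use of an absolute count for min_support are assumptions made for illustration only.

```python
from itertools import combinations

# Hypothetical toy transactions; item names are illustrative only.
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"a"},
]


def frequent_itemsets(transactions, min_support, max_length):
    """Return {itemset: support count} for itemsets occurring in at least
    min_support transactions (min_support is an absolute count here)."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, max_length + 1):
        for candidate in combinations(items, size):
            itemset = frozenset(candidate)
            support = sum(1 for t in transactions if itemset <= t)
            if support >= min_support:
                result[itemset] = support
    return result


def closed_itemsets(frequent):
    """Keep itemsets that have no proper superset with the same support."""
    return {
        s: n for s, n in frequent.items()
        if not any(s < other and n == m for other, m in frequent.items())
    }


def maximal_itemsets(frequent):
    """Keep itemsets that have no frequent proper superset at all."""
    return {
        s: n for s, n in frequent.items()
        if not any(s < other for other in frequent)
    }


free = frequent_itemsets(transactions, min_support=2, max_length=3)
closed = closed_itemsets(free)
maximal = maximal_itemsets(free)
```

In this toy data, {b} is frequent but not closed, because its superset {a, b} has the same support. {a} is closed but not maximal, because it still has frequent supersets.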
The underlying data structure used by the algorithm can be either an
ARRAY or a TIDList. Choose the former when there are many
transactions and few items, and the latter if the structure of the
input data is the reverse.
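As a rough illustration of the two layouts, the sketch below counts support once over a row-oriented array and once over per-item transaction-id lists. It assumes the generic textbook meaning of these structures and does not reproduce the node's internal ARRAY or TIDList classes; all names and data are hypothetical.

```python
# Hypothetical toy data: each row is one transaction over items 0..3.
transactions = [
    [0, 1, 2],
    [0, 1],
    [0, 2],
    [3],
]
n_items = 4

# Array-style (row-oriented) layout: one row of 0/1 flags per transaction.
bit_rows = [[1 if i in t else 0 for i in range(n_items)] for t in transactions]


def support_array(itemset):
    """Count the rows in which every item of the (non-empty) itemset is set."""
    return sum(1 for row in bit_rows if all(row[i] for i in itemset))


# TID-list (column-oriented) layout: for each item, the ids of the
# transactions that contain it.
tid_lists = {i: set() for i in range(n_items)}
for tid, t in enumerate(transactions):
    for i in t:
        tid_lists[i].add(tid)


def support_tidlist(itemset):
    """Intersect the transaction-id sets of all items in the (non-empty) itemset."""
    ids = set.intersection(*(tid_lists[i] for i in itemset))
    return len(ids)
```

Both functions return the same support for any non-empty itemset; only the way the data is stored and scanned differs.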
Dialog Options
- Column containing transactions
-
Select the column containing the transactions (BitVector or Collection)
to mine for frequent itemsets or association rules. There must be at
least one such column, since this is the only valid input for the subgroup miner.
- Minimum support (0-1)
-
An itemset is considered to be frequent if it occurs in at least "minimum
support" transactions. Make sure that this value is meaningful
in proportion to the number of rows of the input.
- Underlying data structure
-
Either ARRAY or TIDList: ARRAY is recommended when the number of
transactions (rows) is larger than the number of items, and TIDList
when the number of rows is small and the number of items is large. In
general, the ARRAY option needs more memory and is faster, whereas the
TIDList option needs less memory but is slower.
- Itemset type
-
Choose either free, closed, or maximal. Free itemsets are mostly redundant, closed
itemsets provide the most information, and maximal itemsets may hide some information.
- Maximal itemset length
-
The maximal length of the resulting itemsets. A lower value may reduce
the runtime if there are very long frequent itemsets.
- Output association rules
-
Check this option if association rules should be generated from the frequent
itemsets. Note: association rules are always generated from free
frequent itemsets and are constrained to have only one item in the
consequent.
- Minimum confidence
-
The confidence is a measure of "how often the rule is right", that is, how
often the consequent also occurs in those transactions in which the items
of the antecedent appear.
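The confidence filter described above can be sketched as follows. The example uses the standard definition, confidence = support(antecedent and consequent) / support(antecedent), and derives rules with exactly one item in the consequent from a single frequent itemset. The function names and toy transactions are hypothetical and not taken from the node.

```python
# Hypothetical toy transactions; item names are illustrative only.
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"a"},
]


def support(itemset, transactions):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)


def single_consequent_rules(itemset, transactions, min_confidence):
    """Derive rules 'antecedent -> one item' from one itemset and keep
    those whose confidence reaches min_confidence."""
    rules = []
    whole = support(itemset, transactions)
    for consequent in sorted(itemset):
        antecedent = itemset - {consequent}
        if not antecedent:
            continue  # a rule needs at least one item on the left-hand side
        antecedent_support = support(antecedent, transactions)
        if antecedent_support == 0:
            continue  # antecedent never occurs, confidence is undefined
        confidence = whole / antecedent_support
        if confidence >= min_confidence:
            rules.append((antecedent, consequent, confidence))
    return rules


rules = single_consequent_rules(frozenset({"a", "b"}), transactions, 0.8)
```

With these transactions, the rule b -> a has confidence 2/2 = 1.0 and is kept, while a -> b has confidence 2/4 = 0.5 and is dropped at a minimum confidence of 0.8.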
Ports
Input Ports
0 | Data table containing transactions.
Output Ports
0 | Data table with discovered frequent itemsets or association rules.
This node is contained in KNIME Base Nodes
provided by KNIME GmbH, Konstanz, Germany.