The following node is available in the Open Source KNIME predictive analytics and data mining platform version 2.7.1. Discover over 1000 other nodes, as well as enterprise functionality at
http://knime.com.
SOTA Learner
Clusters numerical and fuzzy data hierarchically with the self-organizing
tree algorithm (SOTA) and visualizes the cluster tree similarly to a dendrogram.
Additionally, it builds a model for class prediction of new data. The model can be
loaded into the SOTA Predictor node.
The SOTA Learner node has a dialog in which you can choose the winner, ancestor, and
sister learning rates, which adjust the cluster representatives; the
minimal resource and variability values, which stop the growing of the tree; the
minimal error, which ends a cycle; and the distance metric (cosine or Euclidean).
The node clusters the given data hierarchically using the self-organizing
tree algorithm and produces a cluster tree, which the view then
displays in a form similar to a dendrogram.
The data is also displayed and can be hilited, as can each cluster
representative.
Class information can be trained as well by selecting the "Use class column"
option, so that the SOTA Predictor node can use it for class prediction.
For more information about the SOTA clustering see:
Herrero J., Valencia A., Dopazo J.:
A hierarchical unsupervised growing neural network for clustering gene
expression patterns.
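The role of the three learning rates can be sketched as follows. This is a minimal illustration, assuming the common SOTA update in which each affected representative is moved toward the presented data point by a fraction of their difference. The function names `move_toward` and `adapt` and the rate values are hypothetical, chosen for illustration; they are not the node's API or its defaults, and the node may apply further conditions (for example, on when the ancestor and sister are updated).

```python
def move_toward(representative, sample, rate):
    """Shift a representative toward a sample by a fraction `rate` of their difference."""
    return [r + rate * (x - r) for r, x in zip(representative, sample)]


def adapt(winner, ancestor, sister, sample,
          winner_rate, ancestor_rate, sister_rate):
    """Apply one simplified adaptation step to the winner and its neighbourhood."""
    return (
        move_toward(winner, sample, winner_rate),
        move_toward(ancestor, sample, ancestor_rate),
        move_toward(sister, sample, sister_rate),
    )


# Illustrative rates only (assumption); these are not the node's defaults.
winner, ancestor, sister = adapt(
    winner=[0.0, 0.0], ancestor=[0.0, 0.0], sister=[0.0, 0.0],
    sample=[8.0, 4.0],
    winner_rate=0.5, ancestor_rate=0.25, sister_rate=0.125,
)
print(winner, ancestor, sister)
```

With a larger winner rate than ancestor and sister rates, the winning representative follows the data most closely, while its neighbourhood moves more gently.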
Dialog Options
- Winner learningrate
-
Set the winner's learning rate, which is used to adjust the winning cluster
representative to the given data.
- Sister learningrate
-
Set the sister's learning rate, which is used to adjust the sister-cluster
representative of the winner to the given data.
- Ancestor learningrate
-
Set the ancestor's learning rate, which is used to adjust the
ancestor-cluster representative of the winner to the given data.
- Minimal variability
-
Set the minimal variability value, which is the threshold for stopping
the growing of the tree if the "Use variability" checkbox is checked.
If minimal variability is set to 0, the clustering will not stop before
each data point is represented by a cluster representative.
- Minimal resource
-
Set the minimal resource value, which is the threshold for stopping
the growing of the tree if the "Use variability" checkbox is not
checked. By default, the resource value is the threshold used to stop
the growing. If minimal resource is set to a very small number, e.g. 0.01,
the clustering will not stop before each data point is represented by a
cluster representative. The minimal resource value cannot be set to 0,
only to very small numbers.
- Minimal error
-
Set the minimal error value to control the number of epochs per cycle. If the
minimal error value is small, more epochs are needed to end a cycle.
This means that all cluster representatives are pulled closer to
their data.
- Use variability
-
Check this if the minimal variability value should be used to
stop the growing of the tree, or leave it unchecked to use the
minimal resource value. Be aware that computing the
variability is very time-consuming when clustering large amounts
of data: the complexity is quadratic (n*n).
- Distance metric
-
Choose the Euclidean or cosine distance metric to measure the distance
between data points and cluster representatives.
- Use hierarchical fuzzy data
-
Check to cluster hierarchical fuzzy rules hierarchically. The column
with the fuzzy rule level has to be selected in the drop-down menu on
the right. The drop-down menu is enabled when the checkbox is
checked.
- Hierarchical level
-
If the checkbox on the left is checked in order to cluster hierarchical fuzzy
rules, select the column with the fuzzy rule level in this
drop-down menu.
- Use class column
-
Check to use the class data specified by the drop-down box below.
The class data is then trained as well, so that the SOTA Predictor node
can use it for class prediction.
- Class column
-
Specifies the column containing the class data. Only string columns can
be chosen as class columns.
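The distance metrics and stopping thresholds described in the options above can be sketched in a few lines. This is an illustration under stated assumptions, not the node's implementation: cosine distance is taken as one minus the cosine similarity, resource as the mean distance between a cluster representative and its assigned data points, variability as the largest pairwise distance within a cluster (which is where the n*n cost comes from), and a cycle is considered finished when the relative change in error drops below the minimal error. All function names are hypothetical.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equally long vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """One minus the cosine similarity (assumed definition; needs non-zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def resource(representative, points, dist=euclidean):
    """Mean distance between a representative and its assigned data points."""
    return sum(dist(representative, p) for p in points) / len(points)


def variability(points, dist=euclidean):
    """Largest pairwise distance within a cluster; compares all pairs, hence n*n."""
    return max((dist(p, q) for p, q in combinations(points, 2)), default=0.0)


def cycle_converged(previous_error, current_error, minimal_error):
    """True when the relative change in error falls below the minimal error."""
    if previous_error == 0:
        return True
    return abs(current_error - previous_error) / previous_error < minimal_error


cluster = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
print(resource([3.0, 4.0], cluster))      # mean distance to the representative
print(variability(cluster))               # largest distance between two members
print(cycle_converged(10.0, 9.999, 0.01))
```

Under these assumptions, a smaller minimal error makes `cycle_converged` return true later, so more epochs run per cycle, and a smaller resource or variability threshold lets the tree keep growing for longer.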
Ports
Input Ports
0 | Data table with integer columns to cluster
Output Ports
0 | The model of the trained SOTA.
Views
- SOTA Tree View
-
Displays the results of the SOTA clustering.
This node is contained in KNIME Base Nodes
provided by KNIME GmbH, Konstanz, Germany.