The following node is available in the Open Source KNIME predictive analytics and data mining platform version 2.7.1. Discover over 1000 other nodes, as well as enterprise functionality at http://knime.com.

Fuzzy c-Means

The fuzzy c-means algorithm is a well-known unsupervised learning technique that can be used to reveal the underlying structure of the data. Fuzzy clustering allows each data point to belong to several clusters, with a degree of membership to each one.
Make sure that the input data is normalized to obtain better clustering results.
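As a rough illustration of what normalization means here, the following Python sketch rescales every attribute to the range [0, 1] (min-max normalization), so that no attribute dominates the distance computation merely because of its scale. The function name min_max_normalize and the sample values are made up for this example; inside a KNIME workflow this step would typically be handled by an upstream node such as the Normalizer.

```python
def min_max_normalize(rows):
    """Rescale each column of a list of numeric rows to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    normalized = []
    for row in rows:
        normalized.append([
            # A constant column has no spread; map it to 0.0 to avoid dividing by zero.
            (value - lo) / (hi - lo) if hi > lo else 0.0
            for value, lo, hi in zip(row, lows, highs)
        ])
    return normalized


# Hypothetical sample: the second attribute has a much larger range than the first.
data = [[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]]
scaled = min_max_normalize(data)
```

After scaling, both attributes contribute on the same footing to the distances that drive the clustering.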
The list of attributes to use can be set in the second tab of the dialog.
The first output table contains the original table extended by each row's degree of membership in every cluster. The second output table contains the values of the cluster prototypes (cluster centers).
Additionally, it is possible to induce a noise cluster in order to detect noise in the dataset, based on the approach from R. N. Dave: 'Characterization and detection of noise in clustering'.
If the optional PMML input port is connected and contains preprocessing operations in the TransformationDictionary, those operations are added to the learned model.
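The textbook form of the fuzzy c-means iteration can be sketched in a few lines of Python. This is a minimal illustration of the standard algorithm, not KNIME's implementation: the function names, the random initialization from data points, the convergence tolerance tol, and the sample data are all assumptions made for this example.

```python
import math
import random


def compute_memberships(points, centers, exponent):
    """Membership of each point in each cluster; every row sums to 1."""
    memberships = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if any(d == 0.0 for d in dists):
            # The point coincides with one or more centers: share full membership among them.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            memberships.append([h / sum(hits) for h in hits])
        else:
            inv = [d ** -exponent for d in dists]
            memberships.append([v / sum(inv) for v in inv])
    return memberships


def fuzzy_c_means(points, n_clusters, fuzzifier=2.0, max_iter=100, tol=1e-6, seed=0):
    """Alternate membership and prototype updates until the prototypes stop moving."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, n_clusters)]
    exponent = 2.0 / (fuzzifier - 1.0)
    dims = len(points[0])
    for _ in range(max_iter):
        u = compute_memberships(points, centers, exponent)
        new_centers = []
        for i in range(n_clusters):
            # Prototype = mean of all points, weighted by membership ** fuzzifier.
            weights = [row[i] ** fuzzifier for row in u]
            total = sum(weights)
            new_centers.append(
                [sum(w * p[d] for w, p in zip(weights, points)) / total for d in range(dims)]
            )
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return compute_memberships(points, centers, exponent), centers


# Hypothetical, already normalized data with two obvious groups.
points = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [1.0, 1.0], [0.9, 1.0], [1.0, 0.9]]
memberships, centers = fuzzy_c_means(points, n_clusters=2)
```

The two return values mirror the node's two outputs: one membership row per input row, and one prototype per cluster.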

Dialog Options

Number of clusters
Number of clusters to use for the algorithm.
Maximum number of iterations
This is the maximum number of iterations to be performed.
Fuzzifier
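Indicates how much the clusters are allowed to overlap. The value must be greater than 1: values close to 1 yield nearly crisp assignments, while larger values yield softer, more strongly overlapping memberships.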
Induce noise cluster
Whether to induce a noise cluster or not.
Set delta
Delta is the fixed distance from every datapoint to the noise cluster.
Set delta automatically, specify lambda
Delta is updated in each iteration, based on the average interpoint distances. However, a lambda parameter has to be set according to the shape of the clusters.
Perform the clustering in memory
If this option is selected, the clustering is performed in memory, which speeds up the process.
Compute cluster quality measures
Whether to calculate quality measures for the clustering. This can be time and memory consuming with large datasets.
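To make the roles of the fuzzifier, delta, and lambda options more concrete, the following Python sketch shows the noise clustering idea in its usual textbook form: the noise cluster is treated as an extra cluster that lies at the same distance delta from every data point. The function names are hypothetical, and the automatic delta formula (lambda times the mean squared point-to-prototype distance) is an assumption taken from the general noise clustering literature; the node's exact computation may differ.

```python
import math


def memberships_with_noise(dists, delta, fuzzifier=2.0):
    """Memberships of one point, given its (non-zero) distances to the cluster prototypes.

    Returns (cluster_memberships, noise_membership); together they sum to 1.
    """
    exponent = 2.0 / (fuzzifier - 1.0)
    inv = [d ** -exponent for d in dists]
    noise = delta ** -exponent  # the noise cluster is equally far from every point
    total = sum(inv) + noise
    return [v / total for v in inv], noise / total


def automatic_delta(all_dists, lam):
    """Assumed update rule: delta**2 = lam * mean of all squared point-to-prototype distances."""
    squared = [d * d for row in all_dists for d in row]
    return math.sqrt(lam * sum(squared) / len(squared))


# A point close to the first prototype keeps almost all of its membership there ...
near_clusters, near_noise = memberships_with_noise([0.1, 1.0], delta=0.5)
# ... while a point far from every prototype is absorbed by the noise cluster.
far_clusters, far_noise = memberships_with_noise([5.0, 5.0], delta=0.5)
```

A smaller delta makes the noise cluster more attractive, so more points are treated as noise; a larger delta has the opposite effect.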

Ports

Input Ports
0 Datatable with training data. Make sure that the data are normalized!
1 Optional PMML port object containing preprocessing operations.
Output Ports
0 Input table extended by cluster membership
1 Cluster centers

Views

Statistics View
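Shows the WithinClusterVariation and the BetweenClusterVariation, which are indicators of clustering quality: a low within-cluster variation together with a high between-cluster variation points to compact, well-separated clusters.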
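The exact formulas behind these two measures are not given here. Purely as an illustration of the idea, the Python sketch below uses one simple definition: each point is assigned to its highest-membership cluster, the within-cluster variation is the mean squared distance of points to their own prototype, and the between-cluster variation is the mean squared distance between pairs of prototypes. The function name, these definitions, and the sample values are assumptions, so the numbers shown in the view may be computed differently.

```python
import math


def cluster_variations(points, centers, memberships):
    """Return (within, between) variation for a hardened fuzzy clustering.

    Requires at least two cluster centers.
    """
    # Harden the fuzzy result: each point goes to its highest-membership cluster.
    labels = [max(range(len(centers)), key=lambda i: u[i]) for u in memberships]
    within = sum(
        math.dist(p, centers[label]) ** 2 for p, label in zip(points, labels)
    ) / len(points)
    pairs = [(a, b) for a in range(len(centers)) for b in range(a + 1, len(centers))]
    between = sum(math.dist(centers[a], centers[b]) ** 2 for a, b in pairs) / len(pairs)
    return within, between


# Hypothetical result of a clustering run with two clusters.
points = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
centers = [[0.0, 1.0], [10.0, 1.0]]
memberships = [[0.9, 0.1], [0.9, 0.1], [0.1, 0.9], [0.1, 0.9]]
within, between = cluster_variations(points, centers, memberships)
```

In this example the within value is small relative to the between value, which is the pattern expected for a well-separated clustering.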
This node is contained in KNIME Base Nodes provided by KNIME GmbH, Konstanz, Germany.