The following node is available in the Open Source KNIME predictive analytics and data mining platform version 2.7.1. Discover over 1000 other nodes, as well as enterprise functionality at
http://knime.com.
Multiobjective Subset Selection (NSGA-II)
This node finds (near-)optimal fixed-size subsets of rows based on one or more criteria. It uses the
NSGA-II algorithm to find an approximation of the set of non-dominated solutions, i.e. the Pareto front.
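As an illustration of what "non-dominated" means here, the following minimal Python sketch filters a list of objective vectors down to its Pareto front, assuming every objective is maximized. The function names and the sample data are hypothetical and are not part of the node; the node itself uses NSGA-II, which is not reproduced here.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a dominates b, assuming every objective is maximized.

    a dominates b if it is at least as good in every objective
    and strictly better in at least one.
    """
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def non_dominated(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep only the points that no other point dominates (brute force)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical objective values of five candidate subsets (two objectives each).
solutions = [(1.0, 5.0), (2.0, 4.0), (2.0, 2.0), (3.0, 1.0), (0.5, 0.5)]
front = non_dominated(solutions)
```

In this sample, (2.0, 2.0) and (0.5, 0.5) are dominated by (2.0, 4.0), so only the remaining three points form the front.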
In the dialog you can choose which columns from the input table should be optimized and in which way.
Each column represents an objective which, together with a function on that column (such as the sum of
all values in the selected set or the average distance of all values), can be either minimized or
maximized. By default each objective is maximized; to minimize an objective, negate it.
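The idea of combining a column with a function, and negating an objective to minimize it, can be sketched in Python as follows. This is only an illustration: the column names, the helper functions, and the reading of "average distance" as the mean absolute difference over all pairs are assumptions, not the node's actual function definitions.

```python
from itertools import combinations
from typing import Sequence, Tuple


def subset_sum(values: Sequence[float], subset: Sequence[int]) -> float:
    """Sum of the column values of the rows in the subset."""
    return sum(values[i] for i in subset)


def average_distance(values: Sequence[float], subset: Sequence[int]) -> float:
    """Mean absolute difference over all pairs of rows in the subset.

    Assumed interpretation; requires a subset of at least two rows.
    """
    pairs = list(combinations(subset, 2))
    return sum(abs(values[i] - values[j]) for i, j in pairs) / len(pairs)


# Hypothetical input columns, one value per row.
cost = [4.0, 1.0, 3.0, 2.0]
score = [10.0, 20.0, 30.0, 40.0]

subset = (1, 3)  # row indices of one candidate solution

objectives: Tuple[float, float] = (
    -subset_sum(cost, subset),         # negated, so maximizing it minimizes cost
    average_distance(score, subset),   # maximized as is
)
```

For the subset of rows 1 and 3 the tuple is (-3.0, 20.0): a total cost of 3.0, negated, and a score distance of 20.0.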
The node runs until a certain number of individuals have been evaluated. You can also stop the search
manually in the node's view.
Dialog Options
Basic settings
- Number of rows
- Choose the number of rows the solution should contain
- Output nondominated solutions only
- If this option is checked, only non-dominated solutions (the Pareto front approximation)
are output. Otherwise all examined solutions are output, which can be a large number.
- Enable hiliting
- If selected, the solutions in the output table can be hilited, and the rows they contain
are then hilited in the input table as well.
- Compute hypervolume
- Enables the computation of the hypervolume enclosed by the Pareto front approximations.
You need to provide a reference point that is dominated by all solutions. The reference point
must be entered in the text field, with the coordinates separated by spaces. The number and
order of coordinates must match the number and order of the objectives. Hypervolume computation
is very expensive if more than two objectives are used, therefore an approximation algorithm is used.
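For the two-objective case the hypervolume can be computed exactly with a simple sweep. The Python sketch below assumes both objectives are maximized and that the reference point is given as a space-separated string, as in the dialog. The function names are hypothetical, and this is not the approximation algorithm the node uses.

```python
from typing import Iterable, Tuple


def parse_reference_point(text: str, n_objectives: int = 2) -> Tuple[float, ...]:
    """Parse space-separated coordinates, one per objective, in objective order."""
    coords = tuple(float(token) for token in text.split())
    if len(coords) != n_objectives:
        raise ValueError(
            f"expected {n_objectives} coordinates, got {len(coords)}"
        )
    return coords


def hypervolume_2d(front: Iterable[Tuple[float, float]],
                   ref: Tuple[float, float]) -> float:
    """Area dominated by the points and bounded by ref (both objectives maximized).

    Assumes ref is dominated by every point, i.e. it is no larger than
    any point in either coordinate.
    """
    ref_x, ref_y = ref
    area = 0.0
    best_y = ref_y
    # Sweep from the largest first objective to the smallest; each point adds
    # the horizontal strip between the previous best and its own second objective.
    for x, y in sorted(front, key=lambda p: (-p[0], -p[1])):
        if y > best_y:  # dominated points add no area and are skipped
            area += (x - ref_x) * (y - best_y)
            best_y = y
    return area


ref = parse_reference_point("0 0")
volume = hypervolume_2d([(1.0, 5.0), (2.0, 4.0), (3.0, 1.0)], ref)
```

With the reference point "0 0" the three sample points enclose an area of 3 + 6 + 1 = 10.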
Objectives
- Column List
- This list shows all available columns from the input table (and some static functions).
The columns can be used together with a function to create an expression in the "Expression" field.
- Flow Variable List
- This list shows all available flow variables. They can be used like column names
or as constants in an expression.
- Function
- This list shows all available functions. A description of the function appears on
the right once it is selected. Note that certain functions only work with certain column types,
e.g. distance matrices or numeric columns.
- Expression
- In this field you can build an arithmetic expression that is used as one objective function.
GA settings
- Population size
- The number of individuals in each population evolved by the GA.
- Mutation probability
- The probability that a newly generated solution is mutated.
- Maximum individuals created
- The maximum number of individuals that will be evaluated before the node stops
automatically.
- Gene representation
- Select a gene representation for subsets here.
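To illustrate how a mutation probability can interact with fixed-size subsets, here is a minimal Python sketch of one possible mutation operator. It assumes a subset is represented as a sorted tuple of row indices and that a mutation swaps one selected row for an unselected one. Both the representation and the operator are assumptions made for illustration; the node's actual gene representations and operators may differ.

```python
import random
from typing import Tuple


def mutate(subset: Tuple[int, ...], n_rows: int, probability: float,
           rng: random.Random) -> Tuple[int, ...]:
    """With the given probability, swap one selected row for an unselected one.

    The subset size never changes. Assumes a non-empty subset of
    distinct row indices in the range 0 .. n_rows - 1.
    """
    if rng.random() >= probability:
        return tuple(subset)  # no mutation this time
    selected = list(subset)
    chosen = set(selected)
    unselected = [i for i in range(n_rows) if i not in chosen]
    if not unselected:
        return tuple(subset)  # every row is already selected
    selected[rng.randrange(len(selected))] = rng.choice(unselected)
    return tuple(sorted(selected))


rng = random.Random(0)  # fixed seed so the sketch is repeatable
parent = (0, 1, 2)
child = mutate(parent, n_rows=10, probability=1.0, rng=rng)
```

Because a row is only ever exchanged for another row, every mutated individual still satisfies the "Number of rows" setting without any repair step.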
Ports
Input Ports
0 | Data table with all rows to choose from when evaluating subsets |
Output Ports
0 | A list of non-dominated solutions found during the search, together with their objective values and the rows from the input table that are contained in the solution. |
1 | This table contains the evolution of the hypervolume of the Pareto front. |
2 | A list of non-dominated solutions found during the search, together with their objective values and the rows from the input table that are contained in the solution. This model can be used with the Rowset Filter node to filter out certain rows from the input table. |
Views
- Pareto front
- Shows all non-dominated solutions found so far. This view is updated during the search.
The search can be stopped from the view, and the solutions found up to that point are then
available at the output port. If hypervolume computation is enabled, the second tab shows the
evolution of the hypervolume.
This node is contained in the KNIME Optimization extension
provided by KNIME GmbH, Konstanz, Germany.