The following node is available in the Open Source KNIME predictive analytics and data mining platform version 2.7.1. Discover over 1000 other nodes, as well as enterprise functionality at http://knime.com.
A variant of Naive Bayes for fingerprint columns, i.e. bit vectors. The learner implements a Naive Bayes-like algorithm that accounts for sparsely occupied bits and unbalanced class distributions. Details of the algorithm are described in:
Prediction of Biological Targets for Compounds Using Multiple-Category Bayesian Models Trained on Chemogenomics Databases, Nidhi, Meir Glick, John W. Davies, and Jeremy L. Jenkins, J. Chem. Inf. Model., 2006, 46 (3), pp 1124–1133
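To make the idea of a per-bit weight concrete, here is a minimal Python sketch. It assumes the Laplacian-corrected estimate commonly used for this family of models, weight = ln((A + 1) / (B * p + 1)), where B is the number of rows with the bit set, A is the number of those rows that belong to the target class, and p is the fraction of rows in the target class. Whether this matches the node's implementation or the cited article term by term is an assumption. The function names and the toy data are hypothetical.

```python
import math


def bit_weights(fingerprints, labels, target):
    """Return one log weight per bit position for the given target class.

    Assumed formula (not confirmed by this page):
        weight = ln((A + 1) / (B * p + 1))
    B = rows with the bit set, A = target-class rows with the bit set,
    p = fraction of rows that belong to the target class.
    """
    n_rows = len(fingerprints)
    n_bits = len(fingerprints[0])
    p = sum(1 for y in labels if y == target) / n_rows
    weights = []
    for i in range(n_bits):
        b = sum(fp[i] for fp in fingerprints)
        a = sum(fp[i] for fp, y in zip(fingerprints, labels) if y == target)
        weights.append(math.log((a + 1) / (b * p + 1)))
    return weights


def score(fingerprint, weights):
    """Sum the weights of the bits that are set (assumed scoring rule)."""
    return sum(w for bit, w in zip(fingerprint, weights) if bit)


# Toy data: 4-bit fingerprints with hypothetical class labels.
fingerprints = [
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
]
labels = ["active", "active", "inactive", "inactive"]
weights = bit_weights(fingerprints, labels, "active")
```

In this toy data, bit 0 is set only in "active" rows and gets a positive weight, bit 1 is set only in "inactive" rows and gets a negative weight, and bit 3 is never set, so the correction terms cancel and its weight is 0. Under this formula, rarely set bits stay near 0 instead of producing extreme estimates, and the factor p compensates for unbalanced classes.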
0 | The data to learn from. It needs to contain a fingerprint column and a categorical class column. |
0 | A table containing the scores of the training data, where each row is predicted using a model trained on the remaining n-1 rows (leave-one-out). The table is sorted by descending score; it contains the following columns:
Note that these scores could also be determined using a Cross-Validation meta node. They are provided here because they can be computed in a single scan of the training data, which avoids an expensive cross-validation run. The table is well suited to visualization with a ROC Curve node. |
1 | A table describing each bit's importance for the different classes. The table has as many rows as there are bits in the fingerprint. For each bit position, the columns show how often the bit is set (i) in any row and (ii) in rows of the respective target class. The value of the "logP" column is the logarithm of equation (6) in the article cited above. A value smaller than 0 indicates that the bit is uncharacteristic of the target class; a value larger than 0 indicates that the bit is characteristic of that class. A value close to 0 indicates no or only a weak relationship between the bit and the class. |
2 | The model; it's the input to the predictor node. |
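The leave-one-out scores in output table 0 are cheap because a count-based model can be "retrained" without a row by subtracting that row's contribution from the global counts. The Python sketch below illustrates this under stated assumptions: it uses an assumed Laplacian-corrected weight, ln((A + 1) / (B * p + 1)), and sums the weights of the set bits. The node's actual implementation may differ, and all names and data here are hypothetical.

```python
import math


def leave_one_out_scores(fingerprints, labels, target):
    """Score every row with counts that exclude the row itself.

    Requires at least two rows. Weight formula and scoring rule are
    assumptions, not taken from the node's source code.
    """
    n_rows = len(fingerprints)
    n_bits = len(fingerprints[0])

    # Counting pass over the training data.
    n_target = 0
    set_any = [0] * n_bits      # rows with bit i set
    set_target = [0] * n_bits   # target-class rows with bit i set
    for fp, y in zip(fingerprints, labels):
        is_target = int(y == target)
        n_target += is_target
        for i, bit in enumerate(fp):
            if bit:
                set_any[i] += 1
                set_target[i] += is_target

    # Scoring pass: remove the row's own contribution, no retraining needed.
    scores = []
    for fp, y in zip(fingerprints, labels):
        is_target = int(y == target)
        p = (n_target - is_target) / (n_rows - 1)
        s = 0.0
        for i, bit in enumerate(fp):
            if bit:
                b = set_any[i] - 1             # this row had the bit set
                a = set_target[i] - is_target
                s += math.log((a + 1) / (b * p + 1))
        scores.append(s)
    return scores


# Toy data: 4-bit fingerprints with hypothetical class labels.
fingerprints = [
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [1, 1, 0, 1],
]
labels = ["active", "active", "inactive", "inactive", "active"]
loo = leave_one_out_scores(fingerprints, labels, "active")

# Row indices ordered by descending score, mirroring the sorted output table.
ranking = sorted(range(len(loo)), key=lambda k: loo[k], reverse=True)
```

The cost is proportional to the total number of bits in the table, whereas explicitly retraining n models would repeat the counting pass n times.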