The following node is available in the Open Source KNIME predictive analytics and data mining platform version 2.7.1. Discover over 1000 other nodes, as well as enterprise functionality at http://knime.com.
This node reads PDF documents and creates one document for each file. Each document's title and authors are extracted from the PDF's metadata. The full text of the PDF is extracted; the structure of the PDF is not taken into account. Text extraction uses the PDFBox library (see http://pdfbox.apache.org/ for details).
0 | An output table containing the parsed document data. |
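
The data flow described above (one document per PDF file, carrying a title, authors, and the flat full text) can be sketched roughly as follows. This is only an illustration, not the node's implementation: `ParsedDocument`, `RawPdf`, `PdfDocumentSketch`, and the `extractor` function are hypothetical names introduced here, and the actual PDF parsing, which the node delegates to PDFBox, is replaced by a stub so that the sketch runs with the Java standard library alone.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical model of one output row; this is not the KNIME document class.
final class ParsedDocument {
    final String title;
    final String authors;
    final String fullText;

    ParsedDocument(String title, String authors, String fullText) {
        this.title = title;
        this.authors = authors;
        this.fullText = fullText;
    }
}

// Hypothetical holder for the raw values a PDF library would return for one file.
final class RawPdf {
    final String metaTitle;
    final String metaAuthor;
    final String text;

    RawPdf(String metaTitle, String metaAuthor, String text) {
        this.metaTitle = metaTitle;
        this.metaAuthor = metaAuthor;
        this.text = text;
    }
}

final class PdfDocumentSketch {
    // Creates exactly one document per input file. The extractor is an
    // assumption: it stands in for the PDF library call and is pluggable.
    static List<ParsedDocument> parseAll(List<Path> files,
                                         Function<Path, RawPdf> extractor) {
        List<ParsedDocument> docs = new ArrayList<>();
        for (Path file : files) {
            RawPdf raw = extractor.apply(file);
            // Title and authors come from the metadata; the text is kept as
            // one flat string, with no headings, columns, or page structure.
            docs.add(new ParsedDocument(raw.metaTitle, raw.metaAuthor, raw.text));
        }
        return docs;
    }

    public static void main(String[] args) {
        // Stub extractor with made-up values; a real one would read the file.
        Function<Path, RawPdf> stub = p ->
            new RawPdf("Title of " + p.getFileName(), "Unknown author",
                       "Flat text of " + p.getFileName());

        List<Path> files = Arrays.asList(Paths.get("a.pdf"), Paths.get("b.pdf"));
        for (ParsedDocument d : parseAll(files, stub)) {
            System.out.println(d.title + " | " + d.authors + " | " + d.fullText);
        }
    }
}
```

Keeping the extractor as a function argument makes the one-file-to-one-document mapping visible on its own. How the node handles PDFs with missing metadata is not stated in the description, so the sketch simply passes through whatever the extractor returns.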