The following node is available in the Open Source KNIME predictive analytics and data mining platform version 2.7.1. Discover over 1000 other nodes, as well as enterprise functionality at
http://knime.com.
XLS Reader
This node reads a spread sheet and provides it at its output port.
It reads only data from one sheet at the moment. It can read only
numeric data (or strings) but, of course, no diagrams, pictures, or other items.
Currently supported KNIME types are String, Double, and Int. (Time or date in the
spread sheet is represented as String and can be translated into the appropriate
TimeAndDateCells with the corresponding nodes.)
It reads in the data from the sheet and sets a type for all columns that
is compatible with the data in that column (in the worst case "String" covers all).
The performance of the reader node is limited (due to the underlying library
of the Apache POI project). Reading large files takes a very long time and uses a
lot of memory (especially files in xlsx format).
Dialog Options
- File to read
-
Enter a valid file name. You can also choose a previously
selected file from the drop-down list, or select a file from
the "Browse..." dialog. Supported are xls and xlsx file
formats. (Note: reading large xlsx files is very slow and memory consuming.)
- Sheet to Read
-
After selecting a file, you can select the sheet from the
available sheets in the file.
- Column Names
-
If you want to use column names from the spread sheet, check
"Table contains column names" and enter the number
of the row that contains the column names (enter the number
(starting at one), not the index).
If this option is not checked, default names are generated
("Col0", "Col1", etc.).
- Row IDs
-
If you want to use row IDs from the spread sheet, check
"Table contains row IDs" and provide the column that
contains the row ids. Enter the label of the column ("A",
"B", etc.) or the number (starting at one) of the column.
The rowIDs in the sheet must be unique, otherwise the execution fails.
If you check "Make row IDs unique", the node will append
a suffix to duplicates, ensuring row ID uniqueness.
For very large data sets this could cause memory problems.
- Area of Interest
-
Specify the region of the data sheet that should be read in. If you
check "Read entire data sheet" the region provided by the
data sheet file will be read in. This includes areas where diagrams,
borders, coloring, etc. are placed and could create quite some empty
rows or columns (see skip empty rows and columns options below).
If you want to read in a fixed area, remove the check mark and enter
the first and last column, and first and last row to read. (Last
row and column is optional, causing it to read to the last row or
column provided by the sheet). For columns
you need to enter the label ("A", "B", etc.),
for rows enter the number (one-based).
- Skip Empty Rows or Columns
-
If empty rows or columns should be removed from the result Data Table,
check the appropriate option.
- Evaluation Error Handling
-
Specify the data that is inserted in a cell with an error. Not all
formulas supported by all spreadsheet applications are supported by
the XLS Reader Node. If an error occurs during formula evaluation
(or if the spreadsheet contains an error cell), the selected data is
inserted. You can either choose to insert a cell representing a
missing value, or to insert a certain string pattern. A pattern
causes the entire column to become a string column in case an
error occurs. A missing value is type innocent, but also unobtrusive.
- Preview
-
The "Preview" tab shows you the output table with the current
settings in the dialog. If the settings are invalid an
error message will be displayed in this tab. The table updates only
when the "refresh" button is clicked. A warning is shown,
if the table content is out of sync with the current settings.
- File Content
-
The "File Content" table shows you the content of the
currently selected sheet. It shows the entire content (no settings
are applied there). The column names and row numbers here are the ones
that need to be specified in the corresponding setting fields.
Ports
Output Ports
0 |
The data table read in.
|
This node is contained in KNIME POI Plug-in
provided by KNIME GmbH, Konstanz, Germany.