Package org.openrefine.importers.tree
Class XmlImportUtilities
java.lang.Object
  org.openrefine.importers.tree.TreeImportUtilities
    org.openrefine.importers.tree.XmlImportUtilities
public class XmlImportUtilities extends TreeImportUtilities
Nested Class Summary
Nested classes/interfaces inherited from class org.openrefine.importers.tree.TreeImportUtilities
TreeImportUtilities.ColumnIndexAllocator
Constructor Summary
XmlImportUtilities()
Method Summary
protected static void addImportRecordToProject(ImportRecord record, List<Row> project, boolean includeFileSources, String fileSource, boolean includeArchiveName, String archiveName)
protected static String composeName(String prefix, String localName)
static String[] detectPathFromTag(TreeReader parser, String tag)
static String[] detectRecordElement(TreeReader parser)
    Seeks recurring elements in a parsed document that are likely candidates for being data records.
protected static List<String> detectRecordElement(TreeReader parser, String tag)
    Looks for an element with the given tag name in the tree data being parsed, returning the path hierarchy to reach it.
protected static org.openrefine.importers.tree.RecordElementCandidate detectRecordElement(TreeReader parser, String[] path)
protected static void findRecord(TreeImportUtilities.ColumnIndexAllocator allocator, List<Row> rows, TreeReader parser, String[] recordPath, int pathIndex, ImportColumnGroup rootColumnGroup, int limit, ImportParameters parameters)
    Deprecated. Use the overload of this method that takes every parameter explicitly in its signature.
protected static void findRecord(TreeImportUtilities.ColumnIndexAllocator columnIndexAllocator, List<Row> rows, TreeReader parser, String[] recordPath, int pathIndex, ImportColumnGroup rootColumnGroup, long limit, boolean trimStrings, boolean storeEmptyStrings, boolean guessDataTypes, boolean includeFileSource, String fileSource, boolean includeArchiveName, String archiveFileName)
static void importTreeData(TreeReader parser, TreeImportUtilities.ColumnIndexAllocator columnIndexAllocator, List<Row> rows, String[] recordPath, ImportColumnGroup rootColumnGroup, int limit, ImportParameters parameters)
    Deprecated. 2020-07-23: Use the overload of this method that takes every parameter explicitly in its signature.
static void importTreeData(TreeReader parser, TreeImportUtilities.ColumnIndexAllocator columnIndexAllocator, List<Row> rows, String[] recordPath, ImportColumnGroup rootColumnGroup, long limit, boolean trimStrings, boolean storeEmptyStrings, boolean guessDataTypes, boolean includeFileSources, String fileSource, boolean includeArchiveName, String archiveFileName)
protected static void processFieldAsRecord(TreeImportUtilities.ColumnIndexAllocator columnIndexAllocator, List<Row> rows, TreeReader parser, ImportColumnGroup rootColumnGroup, boolean trimStrings, boolean storeEmptyStrings, boolean guessDataType, boolean includeFileSources, String fileSource, boolean includeArchiveName, String archiveFileName)
    processFieldAsRecord parses tree data for a single element and its sub-elements, adding the parsed data as a row to the project.
protected static void processRecord(TreeImportUtilities.ColumnIndexAllocator columnIndexAllocator, List<Row> rows, TreeReader parser, ImportColumnGroup rootColumnGroup, boolean trimStrings, boolean storeEmptyStrings, boolean guessDataTypes, boolean includeFileSources, String fileSource, boolean includeArchiveName, String archiveFileName)
    processRecord parses tree data for a single element and its sub-elements, adding the parsed data as a row to the project.
protected static void processRecord(TreeImportUtilities.ColumnIndexAllocator allocator, List<Row> rows, TreeReader parser, ImportColumnGroup rootColumnGroup, ImportParameters parameter)
    Deprecated. Use the overload of this method that takes every parameter explicitly in its signature.
protected static void processSubRecord(TreeImportUtilities.ColumnIndexAllocator columnIndexAllocator, List<Row> rows, TreeReader parser, ImportColumnGroup columnGroup, ImportRecord record, int level, boolean trimStrings, boolean storeEmptyStrings, boolean guessDataType)
protected static void processSubRecord(TreeImportUtilities.ColumnIndexAllocator columnIndexAllocator, List<Row> rows, TreeReader parser, ImportColumnGroup columnGroup, ImportRecord record, int level, ImportParameters parameter)
    Deprecated.
protected static void skip(TreeReader parser)
Methods inherited from class org.openrefine.importers.tree.TreeImportUtilities
addCell, addCell, addCell, createColumn, createColumnGroup, createColumnsFromImport, getColumn, getColumnGroup, sortRecordElementCandidates
Method Detail
-
detectPathFromTag
public static String[] detectPathFromTag(TreeReader parser, String tag) throws TreeReaderException
Throws:
TreeReaderException
-
detectRecordElement
protected static List<String> detectRecordElement(TreeReader parser, String tag) throws TreeReaderException
Looks for an element with the given tag name in the tree data being parsed, returning the path hierarchy to reach it.
Parameters:
parser
tag - the element name (can be qualified) to search for
Returns:
if the tag is found, a list of strings. If the tag is at the top level, it is the only item in the list; if it is nested beneath the top level, the list holds the hierarchy of element names, with the tag name at the last index. Returns null if the tag is not found.
Throws:
TreeReaderException
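The return contract above (top-level tag gives a one-item path, nested tag gives the full hierarchy ending in the tag, missing tag gives null) can be illustrated with a minimal sketch. It assumes the JDK's StAX API (javax.xml.stream) in place of OpenRefine's TreeReader; TagPathFinder and findPath are hypothetical names, and treating a "qualified" tag as prefix:localName is an assumption, not something this page confirms.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not OpenRefine code: mimics the documented return contract
// of detectRecordElement(parser, tag) using StAX instead of TreeReader.
class TagPathFinder {

    /** Returns the element names from the root down to the first element named tag, or null. */
    static List<String> findPath(String xml, String tag) {
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            List<String> path = new ArrayList<>(); // names of the elements currently open
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    path.add(qualifiedName(reader));
                    if (path.get(path.size() - 1).equals(tag)) {
                        return path; // the tag name sits at the last index
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    path.remove(path.size() - 1);
                }
            }
            return null; // the tag was not found
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
    }

    // Assumption: a "qualified" tag is written as prefix:localName.
    private static String qualifiedName(XMLStreamReader reader) {
        String prefix = reader.getPrefix();
        String local = reader.getLocalName();
        return (prefix == null || prefix.isEmpty()) ? local : prefix + ":" + local;
    }
}
```

For example, searching "<library><shelf><book/></shelf></library>" for "book" yields [library, shelf, book], while searching it for "author" yields null.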
-
detectRecordElement
public static String[] detectRecordElement(TreeReader parser)
Seeks recurring elements in a parsed document that are likely candidates for being data records.
Parameters:
parser - the parser loaded with tree data
Returns:
the path to the most numerous of the possible candidates, or null if no candidates were found (fewer than 6 recurrences)
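The idea of picking the "most numerous" recurring element can be sketched as follows. This is a deliberately simplified heuristic, not OpenRefine's RecordElementCandidate scoring: it assumes StAX in place of TreeReader, counts how often each child name repeats under a single parent, keeps the best count per path, and applies the threshold of 6 from the return description above. RecordElementSketch and detect are hypothetical names; the tie-break toward shallower paths is also an assumption.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Simplified, hypothetical heuristic; not OpenRefine's actual candidate scoring.
class RecordElementSketch {
    static final int MIN_RECURRENCES = 6; // fewer sibling repeats than this: no candidate

    /** Returns the path of the element that repeats most often under one parent, or null. */
    static String[] detect(String xml) {
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            List<String> path = new ArrayList<>();                      // open elements, root first
            List<Map<String, Integer>> childCounts = new ArrayList<>(); // child-name counts per open element
            Map<String, Integer> best = new TreeMap<>();                // "a/b/c" -> highest sibling count seen
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = reader.getLocalName();
                    if (!childCounts.isEmpty()) {
                        childCounts.get(childCounts.size() - 1).merge(name, 1, Integer::sum);
                    }
                    path.add(name);
                    childCounts.add(new HashMap<>());
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    String parentPath = String.join("/", path);
                    Map<String, Integer> counts = childCounts.remove(childCounts.size() - 1);
                    for (Map.Entry<String, Integer> e : counts.entrySet()) {
                        best.merge(parentPath + "/" + e.getKey(), e.getValue(), Math::max);
                    }
                    path.remove(path.size() - 1);
                }
            }
            String winner = null;
            int winnerCount = 0;
            for (Map.Entry<String, Integer> e : best.entrySet()) {
                int count = e.getValue();
                // Prefer higher counts; on a tie, prefer the shallower element (records contain fields).
                if (count > winnerCount
                        || (count == winnerCount && depth(e.getKey()) < depth(winner))) {
                    winner = e.getKey();
                    winnerCount = count;
                }
            }
            return (winner == null || winnerCount < MIN_RECURRENCES) ? null : winner.split("/");
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
    }

    private static int depth(String path) {
        return path.split("/").length;
    }
}
```

Counting repeats under one parent, rather than total occurrences, keeps a field that appears twice inside every record (such as a repeated tag element) from outscoring the record element itself.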
-
detectRecordElement
protected static org.openrefine.importers.tree.RecordElementCandidate detectRecordElement(TreeReader parser, String[] path)
-
importTreeData
@Deprecated public static void importTreeData(TreeReader parser, TreeImportUtilities.ColumnIndexAllocator columnIndexAllocator, List<Row> rows, String[] recordPath, ImportColumnGroup rootColumnGroup, int limit, ImportParameters parameters) throws TreeReaderException
Deprecated. 2020-07-23: Use the overload of this method that takes every parameter explicitly in its signature.
Parameters:
parser
columnIndexAllocator
rows
recordPath
rootColumnGroup
limit
parameters
Throws:
TreeReaderException
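The deprecation note refers to a common migration pattern: an overload that receives a parameter object is replaced by one that takes each option as an explicit argument (here the limit also widens from int to long). The sketch below shows that pattern only; OldStyleParams, ImportApiSketch and importData are hypothetical stand-ins, not OpenRefine's ImportParameters or importTreeData, and the fields chosen are an assumption.

```java
// Hypothetical stand-ins, NOT OpenRefine types: OldStyleParams plays the role of a parameter
// object such as ImportParameters, and importData plays the role of importTreeData.
final class OldStyleParams {
    final boolean trimStrings;
    final boolean storeEmptyStrings;
    final boolean guessDataTypes;

    OldStyleParams(boolean trimStrings, boolean storeEmptyStrings, boolean guessDataTypes) {
        this.trimStrings = trimStrings;
        this.storeEmptyStrings = storeEmptyStrings;
        this.guessDataTypes = guessDataTypes;
    }
}

class ImportApiSketch {
    static String lastCall; // records forwarded arguments so the delegation can be checked

    /** Preferred form: every option is an explicit argument, and limit is a long. */
    static void importData(String recordPath, long limit, boolean trimStrings,
                           boolean storeEmptyStrings, boolean guessDataTypes) {
        lastCall = recordPath + " limit=" + limit + " trim=" + trimStrings
                + " storeEmpty=" + storeEmptyStrings + " guess=" + guessDataTypes;
    }

    /** Deprecated form: unpacks the parameter object and forwards; the int limit widens to long. */
    @Deprecated
    static void importData(String recordPath, int limit, OldStyleParams params) {
        importData(recordPath, (long) limit, params.trimStrings, params.storeEmptyStrings,
                params.guessDataTypes);
    }
}
```

Callers migrate by passing the individual values directly, for example importData("rows/row", 7L, false, true, false), instead of wrapping them in a parameter object first.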
-
importTreeData
public static void importTreeData(TreeReader parser, TreeImportUtilities.ColumnIndexAllocator columnIndexAllocator, List<Row> rows, String[] recordPath, ImportColumnGroup rootColumnGroup, long limit, boolean trimStrings, boolean storeEmptyStrings, boolean guessDataTypes, boolean includeFileSources, String fileSource, boolean includeArchiveName, String archiveFileName) throws TreeReaderException
Throws:
TreeReaderException
-
findRecord
@Deprecated protected static void findRecord(TreeImportUtilities.ColumnIndexAllocator allocator, List<Row> rows, TreeReader parser, String[] recordPath, int pathIndex, ImportColumnGroup rootColumnGroup, int limit, ImportParameters parameters) throws TreeReaderException
Deprecated. Use the overload of this method that takes every parameter explicitly in its signature.
Parameters:
allocator
rows
parser
recordPath
pathIndex
rootColumnGroup
limit
parameters
Throws:
TreeReaderException
-
findRecord
protected static void findRecord(TreeImportUtilities.ColumnIndexAllocator columnIndexAllocator, List<Row> rows, TreeReader parser, String[] recordPath, int pathIndex, ImportColumnGroup rootColumnGroup, long limit, boolean trimStrings, boolean storeEmptyStrings, boolean guessDataTypes, boolean includeFileSource, String fileSource, boolean includeArchiveName, String archiveFileName) throws TreeReaderException
Parameters:
columnIndexAllocator
rows
parser
recordPath
pathIndex
rootColumnGroup
limit
trimStrings - trim whitespace from strings if true
storeEmptyStrings - store empty strings if true
guessDataTypes - guess whether strings represent numbers and convert them
archiveFileName
includeArchiveName
Throws:
TreeReaderException
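The three documented flags (trimStrings, storeEmptyStrings, guessDataTypes) can be illustrated on a single cell value. This is a hypothetical sketch, not OpenRefine's conversion code: the order of operations (trim, then the empty check, then type guessing), the use of Long and Double, and the regular expressions that keep inputs such as "1f" as text are all assumptions. CellValueSketch and normalize are illustrative names.

```java
import java.util.regex.Pattern;

// Hypothetical illustration of the trimStrings / storeEmptyStrings / guessDataTypes flags;
// not OpenRefine's actual cell-conversion code.
class CellValueSketch {
    private static final Pattern INTEGER = Pattern.compile("[-+]?\\d+");
    private static final Pattern DECIMAL =
            Pattern.compile("[-+]?(\\d+(\\.\\d*)?|\\.\\d+)([eE][-+]?\\d+)?");

    /** Returns null (cell skipped), a Long, a Double, or the (possibly trimmed) String. */
    static Object normalize(String raw, boolean trimStrings, boolean storeEmptyStrings,
                            boolean guessDataTypes) {
        String value = trimStrings ? raw.trim() : raw;
        if (value.isEmpty() && !storeEmptyStrings) {
            return null; // empty strings are dropped unless storeEmptyStrings is set
        }
        if (guessDataTypes) {
            if (INTEGER.matcher(value).matches()) {
                try {
                    return Long.parseLong(value);
                } catch (NumberFormatException tooLarge) {
                    // out of long range: fall through to the decimal check
                }
            }
            // The regex guards against Double.parseDouble accepting "NaN", "Infinity" or "1f".
            if (DECIMAL.matcher(value).matches()) {
                return Double.parseDouble(value);
            }
        }
        return value; // kept as text
    }
}
```

With these rules, "  42 " becomes the number 42 when both trimStrings and guessDataTypes are true, but stays the text "  42 " when trimStrings is false.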
-
skip
protected static void skip(TreeReader parser) throws TreeReaderException
Throws:
TreeReaderException
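This page gives no description for skip. Assuming it advances the parser past the end of the element it is currently positioned on, the usual technique is depth counting, which stays correct even when a nested element shares the skipped element's name. The sketch uses StAX rather than TreeReader; SubtreeSkipper and nextElementAfterSkipping are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical StAX analogue of skip(TreeReader): assumes the reader sits on a START_ELEMENT
// and must be moved to that element's matching END_ELEMENT.
class SubtreeSkipper {

    static void skip(XMLStreamReader reader) throws XMLStreamException {
        int depth = 1; // we are inside the element we want to skip
        while (depth > 0) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                depth++; // nested element, even one with the same name
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                depth--; // depth 0 means the skipped element's own end tag
            }
        }
    }

    /** Demo: skips the first element called name and reports the next element that starts. */
    static String nextElementAfterSkipping(String xml, String name) {
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            boolean skipped = false;
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    if (skipped) {
                        return reader.getLocalName();
                    }
                    if (reader.getLocalName().equals(name)) {
                        skip(reader);
                        skipped = true;
                    }
                }
            }
            return null;
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
    }
}
```

Skipping "b" in "<a><b><b/></b><e/></a>" lands on "e"; stopping at the first end tag named "b" instead would leave the reader inside the outer b element.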
-
processRecord
@Deprecated protected static void processRecord(TreeImportUtilities.ColumnIndexAllocator allocator, List<Row> rows, TreeReader parser, ImportColumnGroup rootColumnGroup, ImportParameters parameter) throws TreeReaderException
Deprecated. Use the overload of this method that takes every parameter explicitly in its signature.
Parameters:
allocator
rows
parser
rootColumnGroup
parameter
Throws:
TreeReaderException
-
processRecord
protected static void processRecord(TreeImportUtilities.ColumnIndexAllocator columnIndexAllocator, List<Row> rows, TreeReader parser, ImportColumnGroup rootColumnGroup, boolean trimStrings, boolean storeEmptyStrings, boolean guessDataTypes, boolean includeFileSources, String fileSource, boolean includeArchiveName, String archiveFileName) throws TreeReaderException
processRecord parses tree data for a single element and its sub-elements, adding the parsed data as a row to the project.
Parameters:
columnIndexAllocator
rows
parser
rootColumnGroup
archiveFileName
includeArchiveName
Throws:
TreeReaderException
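To make "parses a single element and its sub-elements, adding the parsed data as a row" concrete, here is a hypothetical simplification using StAX instead of TreeReader. It does not reproduce OpenRefine's column groups or ImportRecord. Its assumptions are: each leaf element's trimmed text becomes a cell; the column name is the leaf's path below the record element, joined with "/"; and a field that repeats within one record spills into the next row. RecordFlattener and flatten are illustrative names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical simplification, not OpenRefine's column-group logic: leaf text becomes a cell,
// the column name is the leaf's path below the record element, and a repeated field spills
// into the next row of the same record.
class RecordFlattener {

    static List<Map<String, String>> flatten(String recordXml) {
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(recordXml));
            List<Map<String, String>> rows = new ArrayList<>();
            Map<String, Integer> nextRow = new HashMap<>(); // column -> first row with no value yet
            List<String> path = new ArrayList<>();          // open elements; index 0 is the record
            List<StringBuilder> text = new ArrayList<>();   // text collected per open element
            List<Boolean> hasChildren = new ArrayList<>();  // whether each open element has sub-elements
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    if (!hasChildren.isEmpty()) {
                        hasChildren.set(hasChildren.size() - 1, true);
                    }
                    path.add(reader.getLocalName());
                    text.add(new StringBuilder());
                    hasChildren.add(false);
                } else if ((event == XMLStreamConstants.CHARACTERS
                        || event == XMLStreamConstants.CDATA) && !text.isEmpty()) {
                    text.get(text.size() - 1).append(reader.getText());
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    int last = path.size() - 1;
                    String value = text.get(last).toString().trim();
                    // Only non-empty leaves below the record element become cells.
                    if (last > 0 && !hasChildren.get(last) && !value.isEmpty()) {
                        String column = String.join("/", path.subList(1, path.size()));
                        int row = nextRow.getOrDefault(column, 0);
                        while (rows.size() <= row) {
                            rows.add(new LinkedHashMap<>());
                        }
                        rows.get(row).put(column, value);
                        nextRow.put(column, row + 1);
                    }
                    path.remove(last);
                    text.remove(last);
                    hasChildren.remove(last);
                }
            }
            return rows;
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
    }
}
```

For "<book><title>T</title><author>A</author><author>B</author></book>" this yields two rows: {title=T, author=A} and {author=B}. The second author gets its own row while still belonging to the same record.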
-
processFieldAsRecord
protected static void processFieldAsRecord(TreeImportUtilities.ColumnIndexAllocator columnIndexAllocator, List<Row> rows, TreeReader parser, ImportColumnGroup rootColumnGroup, boolean trimStrings, boolean storeEmptyStrings, boolean guessDataType, boolean includeFileSources, String fileSource, boolean includeArchiveName, String archiveFileName) throws TreeReaderException
processFieldAsRecord parses tree data for a single element and its sub-elements, adding the parsed data as a row to the project.
Parameters:
columnIndexAllocator
parser
rootColumnGroup
Throws:
TreeReaderException
-
addImportRecordToProject
protected static void addImportRecordToProject(ImportRecord record, List<Row> project, boolean includeFileSources, String fileSource, boolean includeArchiveName, String archiveName)
-
processSubRecord
@Deprecated protected static void processSubRecord(TreeImportUtilities.ColumnIndexAllocator columnIndexAllocator, List<Row> rows, TreeReader parser, ImportColumnGroup columnGroup, ImportRecord record, int level, ImportParameters parameter) throws TreeReaderException
Deprecated.
Throws:
TreeReaderException
-
processSubRecord
protected static void processSubRecord(TreeImportUtilities.ColumnIndexAllocator columnIndexAllocator, List<Row> rows, TreeReader parser, ImportColumnGroup columnGroup, ImportRecord record, int level, boolean trimStrings, boolean storeEmptyStrings, boolean guessDataType) throws TreeReaderException
Parameters:
columnIndexAllocator
rows
parser
columnGroup
record
Throws:
TreeReaderException