Package org.openrefine.runners.local.pll
Class TextFilePLL
- java.lang.Object
  - org.openrefine.runners.local.pll.PLL<String>
    - org.openrefine.runners.local.pll.TextFilePLL
public class TextFilePLL extends PLL<String>
A PLL whose contents are read from a set of text files. The text files are partitioned using a method similar to Hadoop's, with newlines as partition boundaries. This class aims to produce a number of partitions determined by the default parallelism of the PLL context.
- Author:
- Antonin Delpeuch
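The newline-based partitioning described above can be illustrated with a minimal in-memory sketch. It is not the code of this class: the class name `NewlineSplitSketch` and its methods are hypothetical, and the rule it applies (a split that does not start at offset 0 skips its first, possibly partial, line, while every split finishes the last line it started) is the convention commonly used for Hadoop-style line splits, assumed here for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class NewlineSplitSketch {

    /** Offset just after the next '\n' at or after 'from', or data.length if there is none. */
    public static int nextLineStart(byte[] data, int from) {
        int i = from;
        while (i < data.length && data[i] != '\n') {
            i++;
        }
        return Math.min(i + 1, data.length);
    }

    /** Returns the lines owned by the byte range [start, end] of data. */
    public static List<String> readSplit(byte[] data, int start, int end) {
        List<String> lines = new ArrayList<>();
        if (start == end) {
            return lines; // an empty byte range owns no line
        }
        int pos = start;
        if (start > 0) {
            // The first (possibly partial) line belongs to the previous split.
            pos = nextLineStart(data, start);
        }
        // A line that starts at or before 'end' is read in full, even if it ends later.
        while (pos < data.length && pos <= end) {
            int next = nextLineStart(data, pos);
            int lineEnd = data[next - 1] == '\n' ? next - 1 : next;
            lines.add(new String(data, pos, lineEnd - pos, StandardCharsets.UTF_8));
            pos = next;
        }
        return lines;
    }

    /** Cuts data into numPartitions byte ranges of similar size and reads each one. */
    public static List<List<String>> partition(byte[] data, int numPartitions) {
        List<List<String>> result = new ArrayList<>();
        for (int p = 0; p < numPartitions; p++) {
            int start = (int) ((long) data.length * p / numPartitions);
            int end = (int) ((long) data.length * (p + 1) / numPartitions);
            result.add(readSplit(data, start, end));
        }
        return result;
    }
}
```

Because the byte ranges are chosen without looking at the content, each partition can locate its own first line independently, which is what makes this scheme suitable for reading partitions in parallel.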
Nested Class Summary
Nested Classes
| Modifier and Type | Class | Description |
| --- | --- | --- |
| protected static class | TextFilePLL.TextFilePartition | |
-
Nested classes/interfaces inherited from class org.openrefine.runners.local.pll.PLL
PLL.LastFlush, PLL.PLLExecutionError
Field Summary
-
Fields inherited from class org.openrefine.runners.local.pll.PLL
cachedPartitions, context, id, name
Constructor Summary
Constructors
| Constructor | Description |
| --- | --- |
| TextFilePLL(PLLContext context, String path, Charset encoding) | |
| TextFilePLL(PLLContext context, String path, Charset encoding, boolean ignoreEarlyEOF) | Constructs a PLL out of a text file. |
-
Method Summary
| Modifier and Type | Method | Description |
| --- | --- | --- |
| protected CloseableIterator<String> | compute(Partition partition) | Iterate over the elements of the given partition. |
| List<PLL<?>> | getParents() | Returns the PLLs that this PLL depends on, to compute its contents. |
| io.vavr.collection.Array<? extends Partition> | getPartitions() | |
| void | setProgressHandler(MultiFileReadingProgress progress) | |
-
Methods inherited from class org.openrefine.runners.local.pll.PLL
aggregate, batchPartitions, cacheAsync, collect, collectPartitionsAsync, computePartitionSizes, concatenate, concatenate, count, dropFirstElements, dropLastElements, filter, flatMap, getContext, getId, getPartitionSizes, getQueryTree, hasCachedPartitionSizes, isCached, isEmpty, iterate, iterateFromPartition, iterator, limitPartitions, map, mapPartitions, mapToPair, mapToPair, numPartitions, retainPartitions, runOnPartitions, runOnPartitions, runOnPartitionsAsync, runOnPartitionsAsync, runOnPartitionsWithoutInterruption, runOnPartitionsWithoutInterruption, saveAsTextFile, saveAsTextFileAsync, scanMap, scanMapStream, sort, take, toString, uncache, withCachedPartitionSizes, writeOriginalPartition, writePartition, writePlannedPartition, zipWithIndex
Constructor Detail
-
TextFilePLL
public TextFilePLL(PLLContext context, String path, Charset encoding) throws IOException
- Throws:
IOException
-
TextFilePLL
public TextFilePLL(PLLContext context, String path, Charset encoding, boolean ignoreEarlyEOF) throws IOException
Constructs a PLL out of a text file.
- Parameters:
context - the associated context, whose thread pool will be used
path - the path to the file or directory whose contents should be read
encoding - the encoding in which the files should be read
ignoreEarlyEOF - whether to ignore early ends of files, due to an interrupted write
- Throws:
IOException
Method Detail
-
setProgressHandler
public void setProgressHandler(MultiFileReadingProgress progress)
-
compute
protected CloseableIterator<String> compute(Partition partition)
Description copied from class: PLL
Iterate over the elements of the given partition. This is the method that should be implemented by subclasses. As this method forces computation, ignoring any caching, consumers should not call it directly but rather use PLL.iterate(Partition). Once the iterator is no longer needed, it should be closed. This makes it possible to release the underlying resources supporting it, such as open files or sockets.
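The closing discipline described above can be sketched with plain JDK types. The `ClosingIterator` interface and `LineIterator` class below are hypothetical stand-ins written for this example, not the library's `CloseableIterator`; the point is only that a consumer which stops iterating early still releases the underlying reader, here by means of try-with-resources.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class ClosingIteratorSketch {

    /** Hypothetical stand-in for an iterator that owns a resource. */
    public interface ClosingIterator<T> extends Iterator<T>, AutoCloseable {
        @Override
        void close(); // narrowed so that callers need not handle a checked exception
    }

    /** Iterates over the lines of a reader and closes the reader on close(). */
    public static class LineIterator implements ClosingIterator<String> {
        private final BufferedReader reader;
        private String nextLine;
        public boolean closed = false;

        public LineIterator(BufferedReader reader) {
            this.reader = reader;
            advance();
        }

        private void advance() {
            try {
                nextLine = reader.readLine();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        @Override
        public boolean hasNext() {
            return nextLine != null;
        }

        @Override
        public String next() {
            if (nextLine == null) {
                throw new NoSuchElementException();
            }
            String current = nextLine;
            advance();
            return current;
        }

        @Override
        public void close() {
            try {
                reader.close();
                closed = true;
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    /** Reads at most n lines, then closes the iterator even if lines remain. */
    public static List<String> firstLines(LineIterator source, int n) {
        List<String> out = new ArrayList<>();
        try (LineIterator it = source) {
            while (out.size() < n && it.hasNext()) {
                out.add(it.next());
            }
        }
        return out;
    }
}
```

Tying the release of the resource to the end of the `try` block means the reader is also closed when the loop body throws, which is harder to guarantee with an explicit `close()` call at the end of the method.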
-
getPartitions
public io.vavr.collection.Array<? extends Partition> getPartitions()
- Specified by:
getPartitions in class PLL<String>
- Returns:
the partitions in this list
-
getParents
public List<PLL<?>> getParents()
Description copied from class: PLL
Returns the PLLs that this PLL depends on to compute its contents. This is used for debugging purposes, to display the tree of dependencies of a given PLL.
- Specified by:
getParents in class PLL<String>
- See Also:
PLL.getQueryTree()
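As an illustration of how a parent list can be turned into a dependency tree for debugging, here is a minimal sketch. The `Node` class and `render` method are hypothetical and only mimic the shape of `getParents()`; they make no claim about the actual output format of `PLL.getQueryTree()`. The sketch assumes that a node read directly from files has no parents.

```java
import java.util.Arrays;
import java.util.List;

public class DependencyTreeSketch {

    /** Hypothetical minimal node: a name plus the nodes it depends on. */
    public static class Node {
        final String name;
        final List<Node> parents;

        public Node(String name, Node... parents) {
            this.name = name;
            this.parents = Arrays.asList(parents);
        }

        public List<Node> getParents() {
            return parents;
        }
    }

    /** Renders a node and, indented below it, everything it depends on. */
    public static String render(Node node) {
        StringBuilder sb = new StringBuilder();
        render(node, 0, sb);
        return sb.toString();
    }

    private static void render(Node node, int depth, StringBuilder sb) {
        for (int i = 0; i < depth; i++) {
            sb.append("  ");
        }
        sb.append(node.name).append('\n');
        for (Node parent : node.getParents()) {
            render(parent, depth + 1, sb);
        }
    }
}
```

For example, a node named "filter" built on a node named "map", itself built on a parentless "text file" node, renders as three lines with increasing indentation.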