Package org.openrefine.runners.local.pll
Class SinglePartitionPLL<T>
- java.lang.Object
-
- org.openrefine.runners.local.pll.PLL<T>
-
- org.openrefine.runners.local.pll.SinglePartitionPLL<T>
-
public class SinglePartitionPLL<T> extends PLL<T>
A PLL which wraps an iterable, with a single partition corresponding to the iterable itself. This is typically not very efficient, as it prevents any meaningful parallelization of terminal operations run on this PLL or its derivatives. However, this can be a useful start before the PLL is repartitioned (for instance while it is saved).
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from class org.openrefine.runners.local.pll.PLL
PLL.LastFlush, PLL.PLLExecutionError
-
-
Field Summary
Fields Modifier and Type Field Description protected CloseableIterable<T>
iterable
protected long
knownSize
-
Fields inherited from class org.openrefine.runners.local.pll.PLL
cachedPartitions, context, id, name
-
-
Constructor Summary
Constructors Constructor Description SinglePartitionPLL(PLLContext context, CloseableIterable<T> iterable, long knownSize)
-
Method Summary
All Methods Instance Methods Concrete Methods Modifier and Type Method Description protected CloseableIterator<T>
compute(Partition partition)
Iterate over the elements of the given partition.io.vavr.collection.Array<Long>
computePartitionSizes()
List<PLL<?>>
getParents()
Returns the PLLs that this PLL depends on, to compute its contents.io.vavr.collection.Array<? extends Partition>
getPartitions()
boolean
hasCachedPartitionSizes()
Is this PLL aware of the size of its partitions?-
Methods inherited from class org.openrefine.runners.local.pll.PLL
aggregate, batchPartitions, cacheAsync, collect, collectPartitionsAsync, concatenate, concatenate, count, dropFirstElements, dropLastElements, filter, flatMap, getContext, getId, getPartitionSizes, getQueryTree, isCached, isEmpty, iterate, iterateFromPartition, iterator, limitPartitions, map, mapPartitions, mapToPair, mapToPair, numPartitions, retainPartitions, runOnPartitions, runOnPartitions, runOnPartitionsAsync, runOnPartitionsAsync, runOnPartitionsWithoutInterruption, runOnPartitionsWithoutInterruption, saveAsTextFile, saveAsTextFileAsync, scanMap, scanMapStream, sort, take, toString, uncache, withCachedPartitionSizes, writeOriginalPartition, writePartition, writePlannedPartition, zipWithIndex
-
-
-
-
Field Detail
-
iterable
protected final CloseableIterable<T> iterable
-
knownSize
protected final long knownSize
-
-
Constructor Detail
-
SinglePartitionPLL
public SinglePartitionPLL(PLLContext context, CloseableIterable<T> iterable, long knownSize)
-
-
Method Detail
-
hasCachedPartitionSizes
public boolean hasCachedPartitionSizes()
Description copied from class:PLL
Is this PLL aware of the size of its partitions?- Overrides:
hasCachedPartitionSizes
in classPLL<T>
-
computePartitionSizes
public io.vavr.collection.Array<Long> computePartitionSizes()
- Overrides:
computePartitionSizes
in classPLL<T>
-
compute
protected CloseableIterator<T> compute(Partition partition)
Description copied from class:PLL
Iterate over the elements of the given partition. This is the method that should be implemented by subclasses. As this method forces computation, ignoring any caching, consumers should not call it directly but rather usePLL.iterate(Partition)
. Once the iterator is not needed anymore, it should be closed. This makes it possible to release the underlying resources supporting it, such as open files or sockets.
-
getPartitions
public io.vavr.collection.Array<? extends Partition> getPartitions()
- Specified by:
getPartitions
in classPLL<T>
- Returns:
- the partitions in this list
-
getParents
public List<PLL<?>> getParents()
Description copied from class:PLL
Returns the PLLs that this PLL depends on, to compute its contents. This is used for debugging purposes, to display the tree of dependencies of a given PLL.- Specified by:
getParents
in classPLL<T>
- See Also:
PLL.getQueryTree()
-
-