Package org.openrefine.runners.local.pll
Class IndexedPLL<T>
- java.lang.Object
-
- org.openrefine.runners.local.pll.PLL<Tuple2<Long,T>>
-
- org.openrefine.runners.local.pll.IndexedPLL<T>
-
- Type Parameters:
T
-
public class IndexedPLL<T> extends PLL<Tuple2<Long,T>>
A PLL indexed in sequential order. For each entry, the key is the index of the element in the list. It comes with a partitioner that makes retrieving elements by index more efficient than scanning the entire collection.
- Author:
- Antonin Delpeuch
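The benefit of such a partitioner can be illustrated with a minimal sketch. It is not the OpenRefine implementation: the class `RangeLookupSketch`, the method `partitionFor`, and the `firstIndices` array are hypothetical names, and the sketch assumes that the index of the first element of each partition is known. Under that assumption, a binary search finds the partition holding a given index without reading any element.

```java
import java.util.Arrays;

// Hypothetical illustration only; not part of org.openrefine.runners.local.pll.
final class RangeLookupSketch {

    /**
     * Returns the position of the partition expected to hold the given index.
     * firstIndices[i] is assumed to be the index of the first element of
     * partition i, in non-decreasing order. Returns -1 for a negative index.
     */
    static int partitionFor(long[] firstIndices, long index) {
        int pos = Arrays.binarySearch(firstIndices, index);
        if (pos < 0) {
            // Not an exact hit: binarySearch returns -(insertionPoint) - 1,
            // and the owning partition sits just before the insertion point.
            return -pos - 2;
        }
        // Exact hit: skip over empty partitions that share the same first index.
        while (pos + 1 < firstIndices.length && firstIndices[pos + 1] == index) {
            pos++;
        }
        return pos;
    }

    public static void main(String[] args) {
        long[] firstIndices = {0L, 4L, 9L};
        System.out.println(partitionFor(firstIndices, 5L));
    }
}
```

Only the selected partition then needs to be iterated, starting from its first index, rather than the whole collection.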
-
-
Nested Class Summary
Nested Classes
Modifier and Type    Class    Description
protected static class    IndexedPLL.IndexedPartition
-
Nested classes/interfaces inherited from class org.openrefine.runners.local.pll.PLL
PLL.LastFlush, PLL.PLLExecutionError
-
-
Field Summary
-
Fields inherited from class org.openrefine.runners.local.pll.PLL
cachedPartitions, context, id, name
-
-
Constructor Summary
Constructors
Modifier    Constructor    Description
protected    IndexedPLL(PLL<T> parent)
-
Method Summary
Modifier and Type    Method    Description
protected CloseableIterator<Tuple2<Long,T>>    compute(Partition partition)    Iterate over the elements of the given partition.
List<PLL<?>>    getParents()    Returns the PLLs that this PLL depends on, to compute its contents.
io.vavr.collection.Array<? extends Partition>    getPartitions()
static <T> PairPLL<Long,T>    index(PLL<T> pll)    Create an indexed PLL by indexing an existing PLL.
-
Methods inherited from class org.openrefine.runners.local.pll.PLL
aggregate, batchPartitions, cacheAsync, collect, collectPartitionsAsync, computePartitionSizes, concatenate, concatenate, count, dropFirstElements, dropLastElements, filter, flatMap, getContext, getId, getPartitionSizes, getQueryTree, hasCachedPartitionSizes, isCached, isEmpty, iterate, iterateFromPartition, iterator, limitPartitions, map, mapPartitions, mapToPair, mapToPair, numPartitions, retainPartitions, runOnPartitions, runOnPartitions, runOnPartitionsAsync, runOnPartitionsAsync, runOnPartitionsWithoutInterruption, runOnPartitionsWithoutInterruption, saveAsTextFile, saveAsTextFileAsync, scanMap, scanMapStream, sort, take, toString, uncache, withCachedPartitionSizes, writeOriginalPartition, writePartition, writePlannedPartition, zipWithIndex
-
-
-
-
Method Detail
-
index
public static <T> PairPLL<Long,T> index(PLL<T> pll)
Create an indexed PLL by indexing an existing PLL. This triggers a task to count the number of elements in all partitions but the last one.
- Type Parameters:
T
- Parameters:
pll
- Returns:
-
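Why the last partition does not need to be counted can be seen in a small sketch. This is an illustration under assumptions, not the library's code: `PartitionOffsetsSketch` and `firstIndices` are hypothetical names, and the PLL is assumed to have at least one partition. The first index of each partition is the running sum of the sizes of the partitions before it, so the size of the final partition never enters the computation.

```java
// Hypothetical illustration only; not part of org.openrefine.runners.local.pll.
final class PartitionOffsetsSketch {

    /**
     * sizesButLast[i] is the number of elements in partition i, for every
     * partition except the final one. The result has one entry per partition:
     * the index assigned to that partition's first element.
     */
    static long[] firstIndices(long[] sizesButLast) {
        long[] first = new long[sizesButLast.length + 1];
        for (int i = 0; i < sizesButLast.length; i++) {
            // Each partition starts where the previous one ends.
            first[i + 1] = first[i] + sizesButLast[i];
        }
        return first;
    }

    public static void main(String[] args) {
        // Three partitions; only the sizes of the first two are needed.
        long[] first = firstIndices(new long[] {4L, 5L});
        System.out.println(java.util.Arrays.toString(first));
    }
}
```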
compute
protected CloseableIterator<Tuple2<Long,T>> compute(Partition partition)
Description copied from class: PLL
Iterate over the elements of the given partition. This is the method that subclasses should implement. Because this method forces computation and ignores any caching, consumers should not call it directly but use PLL.iterate(Partition) instead. Once the iterator is no longer needed, it should be closed so that the underlying resources supporting it, such as open files or sockets, can be released.
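The closing requirement can be met with a try-with-resources block, sketched below. The `ClosingIterator` interface and the `Tracked` class are hypothetical stand-ins written for this example; they only assume that the real `CloseableIterator` can be closed like an `AutoCloseable`, which should be checked against the actual type before relying on this pattern.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration only; not part of org.openrefine.runners.local.pll.
final class ClosingIteratorSketch {

    // Stand-in for an iterator that holds resources and must be closed.
    interface ClosingIterator<E> extends Iterator<E>, AutoCloseable {
        @Override
        void close(); // narrowed so that callers need no catch block
    }

    // Wraps a plain iterator and records whether close() was called.
    static final class Tracked<E> implements ClosingIterator<E> {
        private final Iterator<E> delegate;
        boolean closed = false;

        Tracked(Iterator<E> delegate) {
            this.delegate = delegate;
        }

        @Override
        public boolean hasNext() {
            return delegate.hasNext();
        }

        @Override
        public E next() {
            return delegate.next();
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    // Reads every element, closing the iterator even if iteration throws.
    static <E> List<E> drain(ClosingIterator<E> source) {
        List<E> out = new ArrayList<>();
        try (ClosingIterator<E> it = source) {
            while (it.hasNext()) {
                out.add(it.next());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<>();
        rows.add("a");
        rows.add("b");
        Tracked<String> it = new Tracked<>(rows.iterator());
        System.out.println(drain(it) + " closed=" + it.closed);
    }
}
```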
-
getPartitions
public io.vavr.collection.Array<? extends Partition> getPartitions()
- Specified by:
getPartitions in class PLL<Tuple2<Long,T>>
- Returns:
- the partitions in this list
-
getParents
public List<PLL<?>> getParents()
Description copied from class: PLL
Returns the PLLs that this PLL depends on to compute its contents. This is used for debugging purposes, to display the tree of dependencies of a given PLL.
- Specified by:
getParents in class PLL<Tuple2<Long,T>>
- See Also:
PLL.getQueryTree()
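How a list of parents can be turned into a dependency tree for debugging is sketched below. The `Node` class and `render` method are hypothetical and do not reproduce the output format of PLL.getQueryTree(); they only show the recursive walk over parents that such a display relies on.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of org.openrefine.runners.local.pll.
final class DependencyTreeSketch {

    // Stand-in for a PLL: a label plus the nodes it depends on.
    static final class Node {
        final String label;
        final List<Node> parents;

        Node(String label, Node... parents) {
            this.label = label;
            this.parents = Arrays.asList(parents);
        }
    }

    // Renders the node and, indented below it, everything it depends on.
    static String render(Node node) {
        StringBuilder sb = new StringBuilder();
        render(node, 0, sb);
        return sb.toString();
    }

    private static void render(Node node, int depth, StringBuilder sb) {
        for (int i = 0; i < depth; i++) {
            sb.append("  ");
        }
        sb.append(node.label).append('\n');
        for (Node parent : node.parents) {
            render(parent, depth + 1, sb);
        }
    }

    public static void main(String[] args) {
        Node source = new Node("source");
        Node indexed = new Node("indexed", source);
        System.out.print(render(indexed));
    }
}
```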
-
-