Package org.openrefine.runners.local.pll
Class RecordPLL
java.lang.Object
  org.openrefine.runners.local.pll.PLL<Tuple2<Long,Record>>
    org.openrefine.runners.local.pll.RecordPLL

Nested Class Summary
Nested Classes
Modifier and Type        Class
protected static class   RecordPLL.RecordEnd
protected static class   RecordPLL.RecordPartition
-
Nested classes/interfaces inherited from class org.openrefine.runners.local.pll.PLL
PLL.LastFlush, PLL.PLLExecutionError
-
-
Field Summary
Fields
Modifier and Type   Field
protected int       keyColumnIndex
-
Fields inherited from class org.openrefine.runners.local.pll.PLL
cachedPartitions, context, id, name
-
-
Constructor Summary
Constructors
RecordPLL(PairPLL<Long,IndexedRow> grid, int keyColumnIndex)
    Constructs a PLL of records by grouping rows together.
-
Method Summary
Methods (modifier and type, method, description)

protected CloseableIterator<Tuple2<Long,Record>>
compute(Partition partition)
    Iterate over the elements of the given partition.

protected static RecordPLL.RecordEnd
extractRecordEnd(CloseableIterator<IndexedRow> iterator, int keyColumnIndex)

List<PLL<?>>
getParents()
    Returns the PLLs that this PLL depends on, to compute its contents.

io.vavr.collection.Array<? extends Partition>
getPartitions()

static PairPLL<Long,Record>
groupIntoRecords(PairPLL<Long,IndexedRow> grid, int keyColumnIndex)
    Constructs an indexed PLL of records by grouping rows together.

protected static CloseableIterator<Tuple2<Long,Record>>
groupIntoRecords(CloseableIterator<IndexedRow> indexedRows, int keyCellIndex, boolean ignoreFirstRows, List<Row> additionalRows)
-
Methods inherited from class org.openrefine.runners.local.pll.PLL
aggregate, batchPartitions, cacheAsync, collect, collectPartitionsAsync, computePartitionSizes, concatenate, concatenate, count, dropFirstElements, dropLastElements, filter, flatMap, getContext, getId, getPartitionSizes, getQueryTree, hasCachedPartitionSizes, isCached, isEmpty, iterate, iterateFromPartition, iterator, limitPartitions, map, mapPartitions, mapToPair, mapToPair, numPartitions, retainPartitions, runOnPartitions, runOnPartitions, runOnPartitionsAsync, runOnPartitionsAsync, runOnPartitionsWithoutInterruption, runOnPartitionsWithoutInterruption, saveAsTextFile, saveAsTextFileAsync, scanMap, scanMapStream, sort, take, toString, uncache, withCachedPartitionSizes, writeOriginalPartition, writePartition, writePlannedPartition, zipWithIndex
Constructor Detail
-
RecordPLL
public RecordPLL(PairPLL<Long,IndexedRow> grid, int keyColumnIndex)
Constructs a PLL of records by grouping rows together. Any partitioner on the parent PLL can be used to partition the resulting PLL.
Parameters:
grid - the PLL of rows
keyColumnIndex - the index of the column used as record key
-
-
Method Detail
-
groupIntoRecords
public static PairPLL<Long,Record> groupIntoRecords(PairPLL<Long,IndexedRow> grid, int keyColumnIndex)
Constructs an indexed PLL of records by grouping rows together. Any partitioner on the parent PLL will be used on the resulting pair PLL.
Parameters:
grid - the PLL of rows
keyColumnIndex - the index of the column used as record key
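As an illustration of what grouping rows into records by a key column means, the following is a minimal, self-contained sketch in plain Java. It does not use the OpenRefine classes: RecordGroupingSketch, SimpleRecord, and group are hypothetical names, rows are modeled as lists of strings, and the rule that a row starts a new record when its key cell is non-blank is a simplifying assumption rather than the library's exact logic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RecordGroupingSketch {

    /** Hypothetical stand-in for a record: the index of its first row and the rows it spans. */
    public static final class SimpleRecord {
        public final long startRowIndex;
        public final List<List<String>> rows;

        public SimpleRecord(long startRowIndex, List<List<String>> rows) {
            this.startRowIndex = startRowIndex;
            this.rows = rows;
        }
    }

    static boolean isBlank(String cell) {
        return cell == null || cell.trim().isEmpty();
    }

    /**
     * Groups consecutive rows into records (assumed rule, not the library's):
     * a row whose key cell is non-blank opens a new record, and rows with a
     * blank key cell are appended to the record currently being built.
     * Leading rows with a blank key cell form a record of their own.
     */
    public static List<SimpleRecord> group(List<List<String>> rows, int keyColumnIndex) {
        List<SimpleRecord> records = new ArrayList<>();
        List<List<String>> current = null;
        long start = 0;
        for (int i = 0; i < rows.size(); i++) {
            List<String> row = rows.get(i);
            boolean startsRecord = !isBlank(row.get(keyColumnIndex));
            if (startsRecord || current == null) {
                if (current != null) {
                    records.add(new SimpleRecord(start, current));
                }
                current = new ArrayList<>();
                start = i;
            }
            current.add(row);
        }
        if (current != null) {
            records.add(new SimpleRecord(start, current));
        }
        return records;
    }

    public static void main(String[] args) {
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("a", "1"),
                Arrays.asList("", "2"),
                Arrays.asList("b", "3"));
        for (SimpleRecord record : group(rows, 0)) {
            System.out.println("record at row " + record.startRowIndex
                    + " with " + record.rows.size() + " row(s)");
        }
    }
}
```

Each record is keyed by the index of its first row, which mirrors the Long key in the PairPLL<Long,Record> return type. This sketch works on an in-memory list and does not address records that span partition boundaries.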
-
extractRecordEnd
protected static RecordPLL.RecordEnd extractRecordEnd(CloseableIterator<IndexedRow> iterator, int keyColumnIndex)
-
groupIntoRecords
protected static CloseableIterator<Tuple2<Long,Record>> groupIntoRecords(CloseableIterator<IndexedRow> indexedRows, int keyCellIndex, boolean ignoreFirstRows, List<Row> additionalRows)
-
compute
protected CloseableIterator<Tuple2<Long,Record>> compute(Partition partition)
Description copied from class: PLL
Iterate over the elements of the given partition. This is the method that should be implemented by subclasses. As this method forces computation, ignoring any caching, consumers should not call it directly but rather use PLL.iterate(Partition). Once the iterator is no longer needed, it should be closed. This makes it possible to release the underlying resources supporting it, such as open files or sockets.
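The requirement to close the iterator after use can be sketched with a try-with-resources block. This is an illustration only: ResourceIterator and TrackingIterator are hypothetical stand-ins defined here, not the library's CloseableIterator, and the sketch assumes only that the iterator type implements AutoCloseable.

```java
import java.util.Arrays;
import java.util.Iterator;

public class CloseIteratorSketch {

    /** Hypothetical stand-in for an iterator that holds resources and must be closed. */
    public interface ResourceIterator<T> extends Iterator<T>, AutoCloseable {
        @Override
        void close(); // narrowed so that closing throws no checked exception
    }

    /** Wraps an in-memory iterator and records whether close() was called. */
    public static final class TrackingIterator<T> implements ResourceIterator<T> {
        private final Iterator<T> delegate;
        private boolean closed = false;

        public TrackingIterator(Iterator<T> delegate) {
            this.delegate = delegate;
        }

        @Override
        public boolean hasNext() {
            return !closed && delegate.hasNext();
        }

        @Override
        public T next() {
            return delegate.next();
        }

        @Override
        public void close() {
            closed = true; // a real implementation would release files or sockets here
        }

        public boolean isClosed() {
            return closed;
        }
    }

    /** Consumes the iterator and closes it, even if iteration throws. */
    public static <T> int countAndClose(ResourceIterator<T> source) {
        try (ResourceIterator<T> iterator = source) {
            int count = 0;
            while (iterator.hasNext()) {
                iterator.next();
                count++;
            }
            return count;
        }
    }

    public static void main(String[] args) {
        TrackingIterator<String> iterator =
                new TrackingIterator<>(Arrays.asList("a", "b", "c").iterator());
        int count = countAndClose(iterator);
        System.out.println(count + " elements, closed = " + iterator.isClosed());
    }
}
```

Using try-with-resources keeps the close call on every exit path, including exceptions thrown while iterating.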
-
getPartitions
public io.vavr.collection.Array<? extends Partition> getPartitions()
Specified by:
getPartitions in class PLL<Tuple2<Long,Record>>
Returns:
the partitions in this list
-
getParents
public List<PLL<?>> getParents()
Description copied from class: PLL
Returns the PLLs that this PLL depends on, to compute its contents. This is used for debugging purposes, to display the tree of dependencies of a given PLL.
Specified by:
getParents in class PLL<Tuple2<Long,Record>>
See Also:
PLL.getQueryTree()