Package org.openrefine.runners.local.pll
Class OrderedJoinPLL<K,V,W>
- java.lang.Object
-
- org.openrefine.runners.local.pll.PLL<Tuple2<K,Tuple2<V,W>>>
-
- org.openrefine.runners.local.pll.OrderedJoinPLL<K,V,W>
-
- Type Parameters:
K
-V
-W
-
public class OrderedJoinPLL<K,V,W> extends PLL<Tuple2<K,Tuple2<V,W>>>
A PLL which represents the join of two others, assuming both are sorted by keys. The types of joins supported are listed inOrderedJoinPLL.JoinType
.
The partitions of this PLL are taken from the first PLL supplied (left). It is assumed that each key appears at most once in each collection.- Author:
- Antonin Delpeuch
-
-
Nested Class Summary
Nested Classes Modifier and Type Class Description protected static class
OrderedJoinPLL.JoinPartition
static class
OrderedJoinPLL.JoinType
Type of join to perform (using SQL's terminology)-
Nested classes/interfaces inherited from class org.openrefine.runners.local.pll.PLL
PLL.LastFlush, PLL.PLLExecutionError
-
-
Field Summary
-
Fields inherited from class org.openrefine.runners.local.pll.PLL
cachedPartitions, context, id, name
-
-
Constructor Summary
Constructors Constructor Description OrderedJoinPLL(PairPLL<K,V> first, PairPLL<K,W> second, Comparator<K> comparator, OrderedJoinPLL.JoinType joinType)
Constructs a PLL representing the join of two others
-
Method Summary
All Methods Static Methods Instance Methods Concrete Methods Modifier and Type Method Description protected CloseableIterator<Tuple2<K,Tuple2<V,W>>>
compute(Partition partition)
Iterate over the elements of the given partition.io.vavr.collection.Array<Long>
computePartitionSizes()
List<PLL<?>>
getParents()
Returns the PLLs that this PLL depends on, to compute its contents.Optional<Partitioner<K>>
getPartitioner()
io.vavr.collection.Array<? extends Partition>
getPartitions()
boolean
hasCachedPartitionSizes()
Is this PLL aware of the size of its partitions?protected static <K,V,W>
CloseableIterator<Tuple2<K,Tuple2<V,W>>>joinStreams(CloseableIterator<Tuple2<K,V>> firstIterator, CloseableIterator<Tuple2<K,W>> secondIterator, Comparator<K> comparator, OrderedJoinPLL.JoinType joinType)
Merges two key-ordered streams where each key is guaranteed to appear at most once in each stream.-
Methods inherited from class org.openrefine.runners.local.pll.PLL
aggregate, batchPartitions, cacheAsync, collect, collectPartitionsAsync, concatenate, concatenate, count, dropFirstElements, dropLastElements, filter, flatMap, getContext, getId, getPartitionSizes, getQueryTree, isCached, isEmpty, iterate, iterateFromPartition, iterator, limitPartitions, map, mapPartitions, mapToPair, mapToPair, numPartitions, retainPartitions, runOnPartitions, runOnPartitions, runOnPartitionsAsync, runOnPartitionsAsync, runOnPartitionsWithoutInterruption, runOnPartitionsWithoutInterruption, saveAsTextFile, saveAsTextFileAsync, scanMap, scanMapStream, sort, take, toString, uncache, withCachedPartitionSizes, writeOriginalPartition, writePartition, writePlannedPartition, zipWithIndex
-
-
-
-
Constructor Detail
-
OrderedJoinPLL
public OrderedJoinPLL(PairPLL<K,V> first, PairPLL<K,W> second, Comparator<K> comparator, OrderedJoinPLL.JoinType joinType)
Constructs a PLL representing the join of two others- Parameters:
first
- assumed to be sorted by keyssecond
- assumed to be sorted by keyscomparator
- the comparator for the common order of keysjoinType
- whether the join should be inner or outer
-
-
Method Detail
-
getPartitioner
public Optional<Partitioner<K>> getPartitioner()
-
computePartitionSizes
public io.vavr.collection.Array<Long> computePartitionSizes()
-
hasCachedPartitionSizes
public boolean hasCachedPartitionSizes()
Description copied from class:PLL
Is this PLL aware of the size of its partitions?
-
compute
protected CloseableIterator<Tuple2<K,Tuple2<V,W>>> compute(Partition partition)
Description copied from class:PLL
Iterate over the elements of the given partition. This is the method that should be implemented by subclasses. As this method forces computation, ignoring any caching, consumers should not call it directly but rather usePLL.iterate(Partition)
. Once the iterator is not needed anymore, it should be closed. This makes it possible to release the underlying resources supporting it, such as open files or sockets.
-
joinStreams
protected static <K,V,W> CloseableIterator<Tuple2<K,Tuple2<V,W>>> joinStreams(CloseableIterator<Tuple2<K,V>> firstIterator, CloseableIterator<Tuple2<K,W>> secondIterator, Comparator<K> comparator, OrderedJoinPLL.JoinType joinType)
Merges two key-ordered streams where each key is guaranteed to appear at most once in each stream.- Parameters:
firstIterator
- the first stream to joinsecondIterator
- the second stream to joincomparator
- the comparator with respect to which both are sortedjoinType
- the type of join to compute- Returns:
-
getPartitions
public io.vavr.collection.Array<? extends Partition> getPartitions()
-
getParents
public List<PLL<?>> getParents()
Description copied from class:PLL
Returns the PLLs that this PLL depends on, to compute its contents. This is used for debugging purposes, to display the tree of dependencies of a given PLL.- Specified by:
getParents
in classPLL<Tuple2<K,Tuple2<V,W>>>
- See Also:
PLL.getQueryTree()
-
-