java.lang.Object
- org.openrefine.runners.local.pll.PLL<Tuple2<K,V>>
- - org.openrefine.runners.local.pll.PairPLL<K,V>

Type Parameters:

K - the type of keys in the collection

V - the type of values in the collection
```
public class PairPLL<K,V>
extends PLL<Tuple2<K,V>>
```
Adds additional methods for PLLs of keyed collections. The supplied Partitioner determines in which partition an element should be found, depending on its key.

Author:

Antonin Delpeuch

Nested Class Summary
- Nested classes/interfaces inherited from class org.openrefine.runners.local.pll.PLL
  PLL.LastFlush, PLL.PLLExecutionError

Field Summary

Fields
Modifier and Type Field Description

protected Optional<Partitioner<K>> partitioner

protected PLL<Tuple2<K,V>> pll
- Fields inherited from class org.openrefine.runners.local.pll.PLL
  cachedPartitions, context, id, name

Constructor Summary

Constructors
Modifier	Constructor	Description
	`PairPLL(PLL<Tuple2<K,V>> pll, Optional<Partitioner<K>> partitioner)`
`protected`	`PairPLL(PLL<Tuple2<K,V>> pll, Optional<Partitioner<K>> partitioner, io.vavr.collection.Array<Long> partitionSizes)`

Method Summary

All Methods Static Methods Instance Methods Concrete Methods
Modifier and Type	Method	Description
`static <T> PairPLL<Long,T>`	`assumeIndexed(PairPLL<Long,T> pairPLL, long totalRowCount)`	Assuming that the keys of the PairPLL are indices, deduce the partition sizes from the first element of each partition and the total number of elements, creating an appropriate RangePartitioner and adding it to the PLL.
`static <T> PairPLL<Long,T>`	`assumeSorted(PairPLL<Long,T> pairPLL)`	Assumes that a PLL is sorted by key and derive the appropriate partitioner for it.
`ProgressingFuture<Void>`	`cacheAsync()`	Prevent double caching from this PairPLL wrapper and the underlying PLL by defering the caching to the child.
`protected CloseableIterator<Tuple2<K,V>>`	`compute(Partition partition)`	Iterate over the elements of the given partition.
`io.vavr.collection.Array<Long>`	`computePartitionSizes()`
`PairPLL<K,V>`	`dropFirstElements(long n)`	Drops the first n elements at the beginning of the collection.
`PairPLL<K,V>`	`dropLastElements(long n)`	Drops the first n elements at the end of the collection.
`PairPLL<K,V>`	`filter(Predicate<? super Tuple2<K,V>> predicate)`	Derives a new PLL by filtering the collection to only contain elements which match the supplied predicate.
`<W> PairPLL<K,Tuple2<V,W>>`	`fullJoinOrdered(PairPLL<K,W> other, Comparator<K> comparator)`	Assuming both PairPLLs are ordered by key, and each key appears at most once in each dataset, returns an ordered PairPLL with the full (outer) join of both PLLs.
`protected static <K,V> List<Tuple2<K,V>>`	`gatherElementsBefore(K upperBound, int limit, CloseableIterator<Tuple2<K,V>> stream, Comparator<K> comparator)`	Returns the last n elements whose key is strictly less than the supplied upper bound.
`io.vavr.collection.Array<V>`	`get(K key)`	Returns the list of elements of the PLL indexed by the given key.
`io.vavr.collection.Array<Tuple2<K,V>>`	`getByKeys(Set<K> keys)`	Returns the list of elements whose keys match one of the supplied keys.
`List<PLL<?>>`	`getParents()`	Returns directly the parents of the underlying PLL of this PairPLL, so that the PairPLL wrapper is transparent in query trees.
`Optional<Partitioner<K>>`	`getPartitioner()`
`io.vavr.collection.Array<? extends Partition>`	`getPartitions()`
`QueryTree`	`getQueryTree()`
`List<Tuple2<K,V>>`	`getRangeAfter(K from, int limit, Comparator<K> comparator)`	Returns the list of elements starting at the given key and for up to the given number of elements.
`List<Tuple2<K,V>>`	`getRangeBefore(K upperBound, int limit, Comparator<K> comparator)`	Returns the list of elements ending at the given key (excluded) and for up to the given number of elements.
`boolean`	`hasCachedPartitionSizes()`	Is this PLL aware of the size of its partitions?
`<W> PairPLL<K,Tuple2<V,W>>`	`innerJoinOrdered(PairPLL<K,W> other, Comparator<K> comparator)`	Assuming both PairPLLs are ordered by key, and each key appears at most once in each dataset, returns an ordered PairPLL with the inner join of both PLLs.
`boolean`	`isCached()`	Prevent double caching from this PairPLL wrapper and the underlying PLL by defering the caching to the child.
`CloseableIterator<Tuple2<K,V>>`	`iterate(Partition partition)`	Iterate over the elements of the given partition.
`PLL<K>`	`keys()`	Returns a PLL of the keys contained in this collection, in the same order.
`<W> PairPLL<K,Tuple2<V,W>>`	`leftJoinOrdered(PairPLL<K,W> other, Comparator<K> comparator)`	Assuming both PairPLLs are ordered by key, and each key appears at most once in each dataset, returns an ordered PairPLL with the left join of both PLLs.
`PairPLL<K,V>`	`limitPartitions(long limit)`	Limit each partition to contain only their first N elements.
`<W> PairPLL<K,W>`	`mapValues(BiFunction<K,V,W> mapFunction, String mapDescription)`	Returns a PLL obtained by mapping each element and preserving the indexing.
`PairPLL<K,V>`	`retainPartitions(List<Integer> partitionIndices)`	Only retain partitions designated by the given list of indices.
`<W> PairPLL<K,Tuple2<V,W>>`	`rightJoinOrdered(PairPLL<K,W> other, Comparator<K> comparator)`	Assuming both PairPLLs are ordered by key, and each key appears at most once in each dataset, returns an ordered PairPLL with the right join of both PLLs.
`CloseableIterator<Tuple2<K,V>>`	`streamBetweenKeys(Optional<K> from, Optional<K> upTo, Comparator<K> comparator)`	Iterates over the elements of this PLL between the given keys, where both boundaries are optional.
`CloseableIterator<Tuple2<K,V>>`	`streamBetweenKeys(K from, K upTo, Comparator<K> comparator)`	Iterates over the elements of this PLL from the given key (inclusive) and up to the other given key (exclusive).
`CloseableIterator<Tuple2<K,V>>`	`streamFromKey(K from, Comparator<K> comparator)`	Iterates over the elements of this PLL starting from a given key (inclusive).
`CloseableIterator<Tuple2<K,V>>`	`streamUpToKey(K upTo, Comparator<K> comparator)`	Iterates over the elements of this PLL up to the given key (exclusive).
`PLL<Tuple2<K,V>>`	`toPLL()`	Returns the underlying PLL, discarding any partitioner.
`String`	`toString()`
`void`	`uncache()`	Unloads the partition contents from memory
`PLL<V>`	`values()`	Returns a PLL of the values contained in this collection, in the same order.
`PairPLL<K,V>`	`withCachedPartitionSizes(io.vavr.collection.Array<Long> newCachedPartitionSizes)`	Returns a copy of this PairPLL with the given partition sizes, when they are externally known.
`PairPLL<K,V>`	`withPartitioner(Optional<Partitioner<K>> partitioner)`	Returns a copy of this PairPLL with a changed partitioner.

Methods inherited from class org.openrefine.runners.local.pll.PLL
aggregate, batchPartitions, collect, collectPartitionsAsync, concatenate, concatenate, count, flatMap, getContext, getId, getPartitionSizes, isEmpty, iterateFromPartition, iterator, map, mapPartitions, mapToPair, mapToPair, numPartitions, runOnPartitions, runOnPartitions, runOnPartitionsAsync, runOnPartitionsAsync, runOnPartitionsWithoutInterruption, runOnPartitionsWithoutInterruption, saveAsTextFile, saveAsTextFileAsync, scanMap, scanMapStream, sort, take, writeOriginalPartition, writePartition, writePlannedPartition, zipWithIndex

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait

- Field Detail
  - partitioner
```
protected final Optional<Partitioner<K>> partitioner
```
  - pll
```
protected final PLL<Tuple2<K,V>> pll
```
- Constructor Detail
  - PairPLL
```
public PairPLL(PLL<Tuple2<K,V>> pll,
               Optional<Partitioner<K>> partitioner)
```
  - PairPLL
```
protected PairPLL(PLL<Tuple2<K,V>> pll,
                  Optional<Partitioner<K>> partitioner,
                  io.vavr.collection.Array<Long> partitionSizes)
```
- Method Detail
  - getPartitioner
```
public Optional<Partitioner<K>> getPartitioner()
```
    Returns:
    
    the partitioner used to locate elements by key
  - computePartitionSizes
```
public io.vavr.collection.Array<Long> computePartitionSizes()
```
    Overrides:
    
    computePartitionSizes in class PLL<Tuple2<K,V>>
  - hasCachedPartitionSizes
```
public boolean hasCachedPartitionSizes()
```
    Description copied from class: PLL
    
    Is this PLL aware of the size of its partitions?
    
    Overrides:
    
    hasCachedPartitionSizes in class PLL<Tuple2<K,V>>
  - keys
```
public PLL<K> keys()
```
    Returns a PLL of the keys contained in this collection, in the same order.
    
    Returns:
  - values
```
public PLL<V> values()
```
    Returns a PLL of the values contained in this collection, in the same order.
    
    Returns:
  - mapValues
```
public <W> PairPLL<K,W> mapValues(BiFunction<K,V,W> mapFunction,
                                        String mapDescription)
```
    Returns a PLL obtained by mapping each element and preserving the indexing. If a partitioner is set on this PLL, it will also be used by the returned PLL.
    
    Type Parameters:
    
    W -
    
    Parameters:
    
    mapFunction - the function to apply on each element
    
    mapDescription - a short description of the function, for debugging purposes
    
    Returns:
  - get
```
public io.vavr.collection.Array<V> get(K key)
```
    Returns the list of elements of the PLL indexed by the given key. This operation will be more efficient if a partitioner is available, making it possible to scan the relevant partition only. Furthermore, if the PLL is known to be sorted (which happens when it bears a RangePartitioner), then we only scan the relevant partition up to the given key and do not scan the greater keys.
  - getRangeAfter
```
public List<Tuple2<K,V>> getRangeAfter(K from,
                                             int limit,
                                             Comparator<K> comparator)
```
    Returns the list of elements starting at the given key and for up to the given number of elements. This assumes that the PLL is sorted by keys.
    
    Parameters:
    
    from - the first key to return
    
    limit - the maximum number of elements to return
    
    comparator - the ordering on K that is assumed on the PLL
    
    Returns:
  - getRangeBefore
```
public List<Tuple2<K,V>> getRangeBefore(K upperBound,
                                              int limit,
                                              Comparator<K> comparator)
```
    Returns the list of elements ending at the given key (excluded) and for up to the given number of elements. This assumes that the PLL is sorted by keys.
    
    Parameters:
    
    upperBound - the least key not to return
    
    limit - the maximum number of elements to return
    
    comparator - the ordering on K that is assumed on the PLL
    
    Returns:
    
    TODO this could be optimized further, when the partitions are cached in memory, we can iterate from them in reverse
  - gatherElementsBefore
```
protected static <K,V> List<Tuple2<K,V>> gatherElementsBefore(K upperBound,
                                                                          int limit,
                                                                          CloseableIterator<Tuple2<K,V>> stream,
                                                                          Comparator<K> comparator)
```
    Returns the last n elements whose key is strictly less than the supplied upper bound.
    
    Parameters:
    
    stream - the stream to take the elements from, which is assumed to be in increasing order
  - getByKeys
```
public io.vavr.collection.Array<Tuple2<K,V>> getByKeys(Set<K> keys)
```
    Returns the list of elements whose keys match one of the supplied keys.
    
    Parameters:
    
    keys - the keys to look up
    
    Returns:
    
    the list of elements in the order they appear in the PLL
  - streamFromKey
```
public CloseableIterator<Tuple2<K,V>> streamFromKey(K from,
                                                          Comparator<K> comparator)
```
    Iterates over the elements of this PLL starting from a given key (inclusive). This assumes that the PLL is sorted.
    
    Parameters:
    
    from - the first key to start iterating from
    
    comparator - the order used to compare the keys
    
    Returns:
    
    a streams which starts on the first element whose key is greater or equal to the provided one
  - streamUpToKey
```
public CloseableIterator<Tuple2<K,V>> streamUpToKey(K upTo,
                                                          Comparator<K> comparator)
```
    Iterates over the elements of this PLL up to the given key (exclusive). This assumes that the PLL is sorted.
    
    Parameters:
    
    upTo - the key to stop iterating at
    
    comparator - the order used to compare the keys
    
    Returns:
  - streamBetweenKeys
```
public CloseableIterator<Tuple2<K,V>> streamBetweenKeys(K from,
                                                              K upTo,
                                                              Comparator<K> comparator)
```
    Iterates over the elements of this PLL from the given key (inclusive) and up to the other given key (exclusive). This assumes that the PLL is sorted.
    
    Parameters:
    
    from - the key to start iterating from
    
    upTo - the key to stop iterating at
    
    comparator - the order used to compare the keys
    
    Returns:
  - streamBetweenKeys
```
public CloseableIterator<Tuple2<K,V>> streamBetweenKeys(Optional<K> from,
                                                              Optional<K> upTo,
                                                              Comparator<K> comparator)
```
    Iterates over the elements of this PLL between the given keys, where both boundaries are optional. This assumes that the PLL is sorted.
    
    Parameters:
    
    from -
    
    upTo -
    
    comparator -
    
    Returns:
  - filter
```
public PairPLL<K,V> filter(Predicate<? super Tuple2<K,V>> predicate)
```
    Description copied from class: PLL
    
    Derives a new PLL by filtering the collection to only contain elements which match the supplied predicate. The predicate is evaluated lazily, so it can be called multiple times on the same element, depending on the actions called on the returned PLL.
    
    Overrides:
    
    filter in class PLL<Tuple2<K,V>>
    
    Returns:
  - limitPartitions
```
public PairPLL<K,V> limitPartitions(long limit)
```
    Description copied from class: PLL
    
    Limit each partition to contain only their first N elements.
    
    Overrides:
    
    limitPartitions in class PLL<Tuple2<K,V>>
    
    Parameters:
    
    limit - the maximum number of items per partition
    
    Returns:
  - retainPartitions
```
public PairPLL<K,V> retainPartitions(List<Integer> partitionIndices)
```
    Description copied from class: PLL
    
    Only retain partitions designated by the given list of indices. The other partitions are dropped from the PLL. The partitions in the resulting PLL are ordered according to the list of indices supplied.
    
    Overrides:
    
    retainPartitions in class PLL<Tuple2<K,V>>
    
    Parameters:
    
    partitionIndices - the indices of the partitions to retain
  - dropFirstElements
```
public PairPLL<K,V> dropFirstElements(long n)
```
    Drops the first n elements at the beginning of the collection. This also adapts any partitioner to work on the cropped collection.
    
    Overrides:
    
    dropFirstElements in class PLL<Tuple2<K,V>>
    
    Parameters:
    
    n - the number of elements to remove
    
    Returns:
  - dropLastElements
```
public PairPLL<K,V> dropLastElements(long n)
```
    Drops the first n elements at the end of the collection. This also adapts any partitioner to work on the cropped collection.
    
    Overrides:
    
    dropLastElements in class PLL<Tuple2<K,V>>
    
    Parameters:
    
    n - the number of elements to remove
    
    Returns:
  - compute
```
protected CloseableIterator<Tuple2<K,V>> compute(Partition partition)
```
    Description copied from class: PLL
    
    Iterate over the elements of the given partition. This is the method that should be implemented by subclasses. As this method forces computation, ignoring any caching, consumers should not call it directly but rather use PLL.iterate(Partition). Once the iterator is not needed anymore, it should be closed. This makes it possible to release the underlying resources supporting it, such as open files or sockets.
    
    Specified by:
    
    compute in class PLL<Tuple2<K,V>>
    
    Parameters:
    
    partition - the partition to iterate over
    
    Returns:
  - iterate
```
public CloseableIterator<Tuple2<K,V>> iterate(Partition partition)
```
    Description copied from class: PLL
    
    Iterate over the elements of the given partition. If the contents of this PLL have been cached, this will iterate from the cache instead.
    
    Overrides:
    
    iterate in class PLL<Tuple2<K,V>>
    
    Parameters:
    
    partition - the partition to iterate over
    
    Returns:
  - getPartitions
```
public io.vavr.collection.Array<? extends Partition> getPartitions()
```
    Specified by:
    
    getPartitions in class PLL<Tuple2<K,V>>
    
    Returns:
    
    the partitions in this list
  - toPLL
```
public PLL<Tuple2<K,V>> toPLL()
```
    Returns the underlying PLL, discarding any partitioner.
  - assumeIndexed
```
public static <T> PairPLL<Long,T> assumeIndexed(PairPLL<Long,T> pairPLL,
                                                      long totalRowCount)
```
    Assuming that the keys of the PairPLL are indices, deduce the partition sizes from the first element of each partition and the total number of elements, creating an appropriate RangePartitioner and adding it to the PLL.
    If the total row count is unknown (negative) then partition sizes are not inferred.
    Note: this method is static for type-checking purposes (it can only apply to a PairPLL keyed by Long).
    
    Parameters:
    
    pairPLL - the new PLL with the derived partitioner
    
    totalRowCount - the known row count
    
    Returns:
  - assumeSorted
```
public static <T> PairPLL<Long,T> assumeSorted(PairPLL<Long,T> pairPLL)
```
    Assumes that a PLL is sorted by key and derive the appropriate partitioner for it.
    
    Type Parameters:
    
    T -
    
    Parameters:
    
    pairPLL -
    
    Returns:
  - withPartitioner
```
public PairPLL<K,V> withPartitioner(Optional<Partitioner<K>> partitioner)
```
    Returns a copy of this PairPLL with a changed partitioner.
    
    Parameters:
    
    partitioner -
    
    Returns:
  - withCachedPartitionSizes
```
public PairPLL<K,V> withCachedPartitionSizes(io.vavr.collection.Array<Long> newCachedPartitionSizes)
```
    Returns a copy of this PairPLL with the given partition sizes, when they are externally known.
    
    Overrides:
    
    withCachedPartitionSizes in class PLL<Tuple2<K,V>>
    
    Returns:
  - getParents
```
public List<PLL<?>> getParents()
```
    Returns directly the parents of the underlying PLL of this PairPLL, so that the PairPLL wrapper is transparent in query trees. This is because the PairPLL class is a bare wrapper on top of PLL for type-checking purposes, such as to hold a partitioner, without adding any computation on top of the underlying PLL.
    
    Specified by:
    
    getParents in class PLL<Tuple2<K,V>>
    
    See Also:
    
    PLL.getQueryTree()
  - getQueryTree
```
public QueryTree getQueryTree()
```
    Overrides:
    
    getQueryTree in class PLL<Tuple2<K,V>>
    
    Returns:
    
    a tree-based representation of the dependencies of this PLL.
  - toString
```
public String toString()
```
    Overrides:
    
    toString in class PLL<Tuple2<K,V>>
  - isCached
```
public boolean isCached()
```
    Prevent double caching from this PairPLL wrapper and the underlying PLL by defering the caching to the child.
    
    Overrides:
    
    isCached in class PLL<Tuple2<K,V>>
  - cacheAsync
```
public ProgressingFuture<Void> cacheAsync()
```
    Prevent double caching from this PairPLL wrapper and the underlying PLL by defering the caching to the child.
    
    Overrides:
    
    cacheAsync in class PLL<Tuple2<K,V>>
  - uncache
```
public void uncache()
```
    Description copied from class: PLL
    
    Unloads the partition contents from memory
    
    Overrides:
    
    uncache in class PLL<Tuple2<K,V>>
  - innerJoinOrdered
```
public <W> PairPLL<K,Tuple2<V,W>> innerJoinOrdered(PairPLL<K,W> other,
                                                               Comparator<K> comparator)
```
    Assuming both PairPLLs are ordered by key, and each key appears at most once in each dataset, returns an ordered PairPLL with the inner join of both PLLs. This resulting PLL is partitioned with the same partitioner as the left PLL (the instance on which this method is called).
  - leftJoinOrdered
```
public <W> PairPLL<K,Tuple2<V,W>> leftJoinOrdered(PairPLL<K,W> other,
                                                              Comparator<K> comparator)
```
    Assuming both PairPLLs are ordered by key, and each key appears at most once in each dataset, returns an ordered PairPLL with the left join of both PLLs. This resulting PLL is partitioned with the same partitioner as the left PLL (the instance on which this method is called).
  - rightJoinOrdered
```
public <W> PairPLL<K,Tuple2<V,W>> rightJoinOrdered(PairPLL<K,W> other,
                                                               Comparator<K> comparator)
```
    Assuming both PairPLLs are ordered by key, and each key appears at most once in each dataset, returns an ordered PairPLL with the right join of both PLLs. This resulting PLL is partitioned with the same partitioner as the left PLL (the instance on which this method is called).
  - fullJoinOrdered
```
public <W> PairPLL<K,Tuple2<V,W>> fullJoinOrdered(PairPLL<K,W> other,
                                                              Comparator<K> comparator)
```
    Assuming both PairPLLs are ordered by key, and each key appears at most once in each dataset, returns an ordered PairPLL with the full (outer) join of both PLLs. This resulting PLL is partitioned with the same partitioner as the left PLL (the instance on which this method is called).

Modifier and Type	Field	Description
`protected Optional<Partitioner<K>>`	`partitioner`
`protected PLL<Tuple2<K,V>>`	`pll`

Class PairPLL<K,​V>

Nested Class Summary

Nested classes/interfaces inherited from class org.openrefine.runners.local.pll.PLL

Field Summary

Fields inherited from class org.openrefine.runners.local.pll.PLL

Constructor Summary

Method Summary

Methods inherited from class org.openrefine.runners.local.pll.PLL

Methods inherited from class java.lang.Object

Field Detail

partitioner

pll

Constructor Detail

PairPLL

PairPLL

Method Detail

getPartitioner

computePartitionSizes

hasCachedPartitionSizes

keys

values

mapValues

get

getRangeAfter

getRangeBefore

gatherElementsBefore

getByKeys

streamFromKey

streamUpToKey

streamBetweenKeys

streamBetweenKeys

filter

limitPartitions

retainPartitions

dropFirstElements

dropLastElements

compute

iterate

getPartitions

toPLL

assumeIndexed

assumeSorted

withPartitioner

withCachedPartitionSizes

getParents

getQueryTree

toString

isCached

cacheAsync

uncache

innerJoinOrdered

leftJoinOrdered

rightJoinOrdered

fullJoinOrdered

Class PairPLL<K,V>