Package org.openrefine.runners.local.pll
Class PLLContext
- java.lang.Object
  - org.openrefine.runners.local.pll.PLLContext
public class PLLContext extends Object
An object holding the necessary context instances to manipulate partitioned lazy lists (PLLs).

Author:
- Antonin Delpeuch
Constructor Summary
- PLLContext(com.google.common.util.concurrent.ListeningExecutorService executorService, int defaultParallelism, long minSplitSize, long maxSplitSize, long minSplitRowCount, long maxSplitRowCount): Constructor.
Method Summary
- long allocateId()
- protected int getDefaultParallelism(): Returns the default number of partitions that text files should be split into.
- com.google.common.util.concurrent.ListeningExecutorService getExecutorService(): Returns the thread pool used in this context.
- protected long getMaxSplitRowCount(): The maximum size of a partition in number of elements.
- protected long getMaxSplitSize(): The maximum size of a partition in bytes.
- protected long getMinSplitRowCount(): The minimum size of a partition in number of elements.
- protected long getMinSplitSize(): The minimum size of a partition in bytes.
- <T> PLL<T> parallelize(int numPartitions, List<T> rows): Turns a regular list into a Partitioned Lazy List.
- void shutdown()
- <T> PLL<T> singlePartitionPLL(CloseableIterable<T> iterable, long itemCount): Turns a closeable iterable into a Partitioned Lazy List, which has a single partition.
- TextFilePLL textFile(String path, Charset encoding, boolean ignoreEarlyEOF): Loads a text file as a PLL.
Constructor Detail
PLLContext
public PLLContext(com.google.common.util.concurrent.ListeningExecutorService executorService, int defaultParallelism, long minSplitSize, long maxSplitSize, long minSplitRowCount, long maxSplitRowCount)
Constructor.

Parameters:
- executorService - the executor service used to run the threads needed for concurrent operations on PLLs.
- defaultParallelism - the default number of partitions a PLL should be split into, or in other words the default number of processor cores to use for parallel operations.
- minSplitSize - the minimum size (in bytes) of a partition. When possible, the runner avoids creating partitions smaller than this.
- maxSplitSize - the maximum size (in bytes) of a partition. When possible, the runner avoids creating partitions larger than this.
- minSplitRowCount - the minimum size (in number of rows) of a partition. Used when repartitioning an existing PLL.
- maxSplitRowCount - the maximum size (in number of rows) of a partition. Used when repartitioning an existing PLL.
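The row-count bounds are only described as being used when repartitioning an existing PLL. The sketch below assumes one simple policy for that, which may not match OpenRefine's: keep the current number of partitions if the average partition size already lies within [minSplitRowCount, maxSplitRowCount], otherwise pick just enough partitions to respect whichever bound is violated. The class RowCountRepartitioner and its method are hypothetical and not part of OpenRefine.

```java
/**
 * Hypothetical helper, not part of OpenRefine: picks a partition count so that the
 * average partition size stays within [minSplitRowCount, maxSplitRowCount] when possible.
 */
final class RowCountRepartitioner {

    private RowCountRepartitioner() {}

    static int targetPartitionCount(long totalRows, int currentPartitions,
                                    long minSplitRowCount, long maxSplitRowCount) {
        if (currentPartitions < 1 || minSplitRowCount < 1 || maxSplitRowCount < minSplitRowCount) {
            throw new IllegalArgumentException("invalid partition count or bounds");
        }
        if (totalRows <= 0) {
            return 1; // nothing to split: keep a single (empty) partition
        }
        // Average number of rows per partition, rounded up.
        long average = (totalRows + currentPartitions - 1) / currentPartitions;
        if (average > maxSplitRowCount) {
            // Partitions too large: use just enough partitions to respect the maximum.
            return (int) ((totalRows + maxSplitRowCount - 1) / maxSplitRowCount);
        }
        if (average < minSplitRowCount) {
            // Partitions too small: merge so that each one holds at least the minimum.
            return (int) Math.max(1, totalRows / minSplitRowCount);
        }
        return currentPartitions; // already within bounds
    }

    public static void main(String[] args) {
        System.out.println(targetPartitionCount(1_000, 2, 100, 200)); // 5
    }
}
```

For 1,000 rows in 2 partitions with bounds [100, 200], the average of 500 rows exceeds the maximum, so the sketch returns 5 partitions of 200 rows each.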
Method Detail
getExecutorService
public com.google.common.util.concurrent.ListeningExecutorService getExecutorService()
Returns the thread pool used in this context.
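Guava's ListeningExecutorService extends java.util.concurrent.ExecutorService, so code written against the plain ExecutorService interface can run on the pool returned here. The hypothetical PartitionSums class below (not part of OpenRefine) sketches the general idea of concurrent work on partitions: one task per partition is submitted to a shared pool, and the partial results are combined once all tasks finish.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical illustration: one task per partition, submitted to a shared thread pool. */
final class PartitionSums {

    private PartitionSums() {}

    static long sumAll(ExecutorService pool, List<List<Integer>> partitions)
            throws InterruptedException, ExecutionException {
        List<Callable<Long>> tasks = new ArrayList<>();
        for (List<Integer> partition : partitions) {
            // Each task sums one partition independently of the others.
            tasks.add(() -> {
                long sum = 0;
                for (int value : partition) {
                    sum += value;
                }
                return sum;
            });
        }
        long total = 0;
        // invokeAll blocks until every task has completed.
        for (Future<Long> result : pool.invokeAll(tasks)) {
            total += result.get();
        }
        return total;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            System.out.println(sumAll(pool, List.of(List.of(1, 2), List.of(3, 4, 5)))); // 15
        } finally {
            pool.shutdown();
        }
    }
}
```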
textFile
public TextFilePLL textFile(String path, Charset encoding, boolean ignoreEarlyEOF) throws IOException
Loads a text file as a PLL.

Throws:
- IOException
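This page does not document how textFile splits a file. As a rough illustration of how defaultParallelism, minSplitSize and maxSplitSize could interact, the hypothetical TextSplitPlanner below aims for defaultParallelism byte ranges and clamps their size to the [minSplitSize, maxSplitSize] interval. It is an assumption, not OpenRefine's splitting logic, and it ignores line boundaries entirely.

```java
/**
 * Hypothetical planner, not OpenRefine's actual algorithm: aims for defaultParallelism
 * byte ranges, then clamps the range size to [minSplitSize, maxSplitSize].
 */
final class TextSplitPlanner {

    private TextSplitPlanner() {}

    static long splitSize(long fileSize, int defaultParallelism, long minSplitSize, long maxSplitSize) {
        // Ideal size if the file were cut into exactly defaultParallelism pieces (rounded up).
        long ideal = (fileSize + defaultParallelism - 1) / defaultParallelism;
        return Math.min(maxSplitSize, Math.max(minSplitSize, ideal));
    }

    static int partitionCount(long fileSize, int defaultParallelism, long minSplitSize, long maxSplitSize) {
        if (fileSize <= 0) {
            return 1; // an empty file still yields one (empty) partition
        }
        long size = splitSize(fileSize, defaultParallelism, minSplitSize, maxSplitSize);
        return (int) ((fileSize + size - 1) / size);
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // 100 MiB file, 4 cores, splits between 16 MiB and 64 MiB.
        System.out.println(partitionCount(100 * mb, 4, 16 * mb, 64 * mb)); // 4
    }
}
```

With a 100 MiB file, 4 cores and bounds of 16 MiB to 64 MiB, the ideal 25 MiB split already fits the bounds, giving 4 partitions; a 1 MiB file yields a single partition because splits may not go below 16 MiB.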
shutdown
public void shutdown() throws IOException

Throws:
- IOException
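The Javadoc does not say what shutdown() releases. Assuming it stops the thread pool returned by getExecutorService(), a conventional two-phase shutdown of a java.util.concurrent.ExecutorService looks like the hypothetical helper below: stop accepting work, wait for a grace period, then interrupt whatever is still running. Unlike PLLContext.shutdown(), this sketch does not throw IOException.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper, not part of OpenRefine: graceful shutdown of a thread pool. */
final class PoolShutdown {

    private PoolShutdown() {}

    /** Returns true if the pool terminated within the timeout without being forced. */
    static boolean shutdownGracefully(ExecutorService pool, long timeout, TimeUnit unit) {
        pool.shutdown(); // stop accepting new tasks; already-submitted tasks keep running
        try {
            if (pool.awaitTermination(timeout, unit)) {
                return true;
            }
            pool.shutdownNow(); // interrupt tasks still running after the grace period
            return false;
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            return false;
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        pool.submit(() -> System.out.println("task ran"));
        System.out.println(shutdownGracefully(pool, 5, TimeUnit.SECONDS));
    }
}
```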
parallelize
public <T> PLL<T> parallelize(int numPartitions, List<T> rows)
Turns a regular list into a Partitioned Lazy List.

Parameters:
- numPartitions - the desired number of partitions.
- rows - the list of elements to turn into a PLL.
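A minimal sketch of the partitioning step, assuming contiguous partitions of near-equal size. The hypothetical ListPartitioner is not OpenRefine's code, and a real PLL evaluates its partitions lazily rather than building them up front.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of OpenRefine: splits a list into contiguous partitions. */
final class ListPartitioner {

    private ListPartitioner() {}

    static <T> List<List<T>> split(List<T> rows, int numPartitions) {
        if (numPartitions < 1) {
            throw new IllegalArgumentException("numPartitions must be at least 1");
        }
        List<List<T>> partitions = new ArrayList<>(numPartitions);
        int size = rows.size();
        for (int i = 0; i < numPartitions; i++) {
            // Boundaries are spread evenly, so partition sizes differ by at most one.
            int start = (int) ((long) i * size / numPartitions);
            int end = (int) ((long) (i + 1) * size / numPartitions);
            partitions.add(rows.subList(start, end)); // a view of the original list, no copying
        }
        return partitions;
    }

    public static void main(String[] args) {
        System.out.println(split(List.of(1, 2, 3, 4, 5, 6, 7), 3)); // [[1, 2], [3, 4], [5, 6, 7]]
    }
}
```

Because boundaries are computed as i * size / numPartitions, partition sizes differ by at most one; requesting more partitions than there are rows yields some empty partitions.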
singlePartitionPLL
public <T> PLL<T> singlePartitionPLL(CloseableIterable<T> iterable, long itemCount)
Turns a closeable iterable into a Partitioned Lazy List, which has a single partition.

Parameters:
- iterable - the collection of elements of the PLL.
- itemCount - the number of elements in the collection if known, or -1 otherwise. Supplying it may avoid iterating over the collection in some cases.
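The itemCount parameter lets a caller skip a counting pass when the size is already known. The hypothetical KnownCount helper below illustrates that trade-off; it uses the standard java.lang.Iterable rather than OpenRefine's CloseableIterable, so it does not deal with closing resources.

```java
import java.util.List;

/** Hypothetical helper, not part of OpenRefine: counts elements only when the count is unknown. */
final class KnownCount {

    private KnownCount() {}

    /** Returns itemCount when it is known (>= 0); otherwise counts by iterating, which costs a full pass. */
    static <T> long count(Iterable<T> elements, long itemCount) {
        if (itemCount >= 0) {
            return itemCount; // known up front: no iteration needed
        }
        long n = 0;
        for (T ignored : elements) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        List<String> rows = List.of("a", "b", "c");
        System.out.println(count(rows, -1)); // 3, computed by iterating
        System.out.println(count(rows, 3));  // 3, no iteration
    }
}
```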
getDefaultParallelism
protected int getDefaultParallelism()
Returns the default number of partitions that text files should be split into.
getMinSplitSize
protected long getMinSplitSize()
The minimum size of a partition in bytes.
getMaxSplitSize
protected long getMaxSplitSize()
The maximum size of a partition in bytes.
getMinSplitRowCount
protected long getMinSplitRowCount()
The minimum size of a partition in number of elements.
getMaxSplitRowCount
protected long getMaxSplitRowCount()
The maximum size of a partition in number of elements.
allocateId
public long allocateId()
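allocateId() has no description on this page. Judging only from its name and long return type, it plausibly hands out unique identifiers; the sketch below assumes that reading and implements it with an AtomicLong counter, which keeps ids unique even when called from several threads. Both the class IdAllocator and the assumed semantics are hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical id allocator; OpenRefine's allocateId() is undocumented and may behave differently. */
final class IdAllocator {

    private final AtomicLong nextId = new AtomicLong();

    /** Returns a fresh id; safe to call from several threads at once. */
    long allocateId() {
        return nextId.getAndIncrement();
    }

    public static void main(String[] args) {
        IdAllocator ids = new IdAllocator();
        System.out.println(ids.allocateId()); // 0
        System.out.println(ids.allocateId()); // 1
    }
}
```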