Package org.openrefine.runners.local.pll
Class PLLContext
java.lang.Object
    org.openrefine.runners.local.pll.PLLContext
public class PLLContext extends Object
An object holding the necessary context instances to manipulate partitioned lazy lists (PLL).
Author:
- Antonin Delpeuch
-
-
Constructor Summary
Constructor | Description
PLLContext(com.google.common.util.concurrent.ListeningExecutorService executorService, int defaultParallelism, long minSplitSize, long maxSplitSize, long minSplitRowCount, long maxSplitRowCount) | Constructor.
-
Method Summary
Modifier and Type | Method | Description
long | allocateId() |
protected int | getDefaultParallelism() | Returns the default number of partitions that text files should be split into
com.google.common.util.concurrent.ListeningExecutorService | getExecutorService() | Returns the thread pool used in this context
protected long | getMaxSplitRowCount() | The maximum size of a partition in number of elements
protected long | getMaxSplitSize() | The maximum size of a partition in bytes
protected long | getMinSplitRowCount() | The minimum size of a partition in number of elements
protected long | getMinSplitSize() | The minimum size of a partition in bytes
<T> PLL<T> | parallelize(int numPartitions, List<T> rows) | Turns a regular list into a Partitioned Lazy List.
void | shutdown() |
<T> PLL<T> | singlePartitionPLL(CloseableIterable<T> iterable, long itemCount) | Turns a closeable iterable into a Partitioned Lazy List, which has a single partition.
TextFilePLL | textFile(String path, Charset encoding, boolean ignoreEarlyEOF) | Loads a text file as a PLL.
-
-
-
Constructor Detail
-
PLLContext
public PLLContext(com.google.common.util.concurrent.ListeningExecutorService executorService, int defaultParallelism, long minSplitSize, long maxSplitSize, long minSplitRowCount, long maxSplitRowCount)
Constructor.
Parameters:
executorService - the executor service used to run the threads necessary for concurrent operations on PLLs.
defaultParallelism - the default number of partitions a PLL should be split into; in other words, the default number of processor cores to use for parallel operations.
minSplitSize - the minimum size (in bytes) of a partition. The runner will attempt, when possible, not to create partitions smaller than this.
maxSplitSize - the maximum size (in bytes) of a partition. The runner will attempt, when possible, not to create partitions bigger than this.
minSplitRowCount - the minimum size (in number of rows) of a partition. Used when repartitioning an existing PLL.
maxSplitRowCount - the maximum size (in number of rows) of a partition. Used when repartitioning an existing PLL.
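To make the interplay between defaultParallelism, minSplitSize and maxSplitSize more concrete, here is a minimal sketch of how a partition count could be derived from a file size. It is an illustration only: the class SplitSizeSketch, the method partitionCount and the rule that the maximum size takes precedence are all assumptions, not OpenRefine's actual splitting logic.

```java
// Hypothetical sketch: not part of OpenRefine, and not its actual splitting logic.
final class SplitSizeSketch {

    /**
     * Picks a partition count for a file of the given size in bytes. Starts from
     * the default parallelism and adjusts it so that partitions stay, when
     * possible, between minSplitSize and maxSplitSize bytes.
     * Assumes minSplitSize and maxSplitSize are both strictly positive.
     */
    static int partitionCount(long fileSize, int defaultParallelism,
                              long minSplitSize, long maxSplitSize) {
        if (fileSize <= 0) {
            return 1;
        }
        // Fewest partitions that keep each one at or below maxSplitSize (ceiling division).
        long lowest = (fileSize + maxSplitSize - 1) / maxSplitSize;
        // Most partitions that keep each one at or above minSplitSize, but at least one.
        long highest = Math.max(1L, fileSize / minSplitSize);
        // Prefer the default parallelism; if the two bounds conflict,
        // the maximum size wins in this sketch.
        long count = Math.max(lowest, Math.min((long) defaultParallelism, highest));
        return (int) Math.min(count, (long) Integer.MAX_VALUE);
    }
}
```

With a default parallelism of 4, a minimum of 100 bytes and a maximum of 10000 bytes, this sketch yields 4 partitions for a 1000-byte file, 1 partition for a 50-byte file, and 10 partitions for a 100000-byte file.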
-
-
Method Detail
-
getExecutorService
public com.google.common.util.concurrent.ListeningExecutorService getExecutorService()
Returns the thread pool used in this context
-
textFile
public TextFilePLL textFile(String path, Charset encoding, boolean ignoreEarlyEOF) throws IOException
Loads a text file as a PLL.
Throws:
IOException
-
shutdown
public void shutdown() throws IOException
Throws:
IOException
-
parallelize
public <T> PLL<T> parallelize(int numPartitions, List<T> rows)
Turns a regular list into a Partitioned Lazy List.
Parameters:
numPartitions - the desired number of partitions
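As an illustration of what splitting a list into a given number of partitions can look like, the sketch below cuts a list into contiguous slices of near-equal size. The class ParallelizeSketch and its split method are hypothetical names introduced for this example; they do not reproduce how PLLContext builds its partitions, and the result is a plain list of sublists instead of a PLL.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: illustrates partitioning a list, not OpenRefine's implementation.
final class ParallelizeSketch {

    /**
     * Splits rows into numPartitions contiguous slices whose sizes differ by at
     * most one. The slices are views on the original list (no copying).
     */
    static <T> List<List<T>> split(int numPartitions, List<T> rows) {
        if (numPartitions < 1) {
            throw new IllegalArgumentException("numPartitions must be at least 1");
        }
        List<List<T>> partitions = new ArrayList<>(numPartitions);
        int size = rows.size();
        for (int i = 0; i < numPartitions; i++) {
            // Use long arithmetic so that size * i cannot overflow for large lists.
            int start = (int) ((long) size * i / numPartitions);
            int end = (int) ((long) size * (i + 1) / numPartitions);
            partitions.add(rows.subList(start, end));
        }
        return partitions;
    }
}
```

Splitting a ten-element list into three partitions with this sketch gives slices of 3, 3 and 4 elements. When numPartitions exceeds the list size, some slices are empty.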
-
singlePartitionPLL
public <T> PLL<T> singlePartitionPLL(CloseableIterable<T> iterable, long itemCount)
Turns a closeable iterable into a Partitioned Lazy List, which has a single partition.
Parameters:
iterable - the collection of elements of the PLL.
itemCount - if known, the number of elements in the collection. If not known, -1. Supplying this may avoid iterations over the collection in some cases.
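The -1 convention for itemCount can be pictured with the following sketch, in which a known count is returned directly and the collection is only traversed when the count is unknown. ItemCountSketch, UNKNOWN and count are hypothetical names, and a plain java.lang.Iterable stands in for CloseableIterable; how the PLL actually uses the supplied count is not specified here.

```java
// Hypothetical sketch: shows why supplying a known item count can save an iteration.
final class ItemCountSketch {

    /** Sentinel meaning "the number of elements is not known". */
    static final long UNKNOWN = -1L;

    /**
     * Returns the number of elements. When itemCount is non-negative it is
     * trusted and returned as is, without touching the iterable; otherwise
     * the iterable is traversed once to count its elements.
     */
    static <T> long count(Iterable<T> items, long itemCount) {
        if (itemCount >= 0) {
            return itemCount;
        }
        long n = 0;
        for (T ignored : items) {
            n++;
        }
        return n;
    }
}
```

Calling count(items, ItemCountSketch.UNKNOWN) walks the whole collection, whereas count(items, 3) returns immediately, which matters when the underlying source is expensive to read.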
-
getDefaultParallelism
protected int getDefaultParallelism()
Returns the default number of partitions that text files should be split into
-
getMinSplitSize
protected long getMinSplitSize()
The minimum size of a partition in bytes
-
getMaxSplitSize
protected long getMaxSplitSize()
The maximum size of a partition in bytes
-
getMinSplitRowCount
protected long getMinSplitRowCount()
The minimum size of a partition in number of elements
-
getMaxSplitRowCount
protected long getMaxSplitRowCount()
The maximum size of a partition in number of elements
-
allocateId
public long allocateId()
-
-