Class PLLContext


  • public class PLLContext
    extends Object
    An object holding the necessary context instances to manipulate partitioned lazy lists (PLL).
    Author:
    Antonin Delpeuch
    • Constructor Detail

      • PLLContext

        public PLLContext​(com.google.common.util.concurrent.ListeningExecutorService executorService,
                          int defaultParallelism,
                          long minSplitSize,
                          long maxSplitSize,
                          long minSplitRowCount,
                          long maxSplitRowCount)
        Creates a context from an executor service and the default partitioning settings.
        Parameters:
        executorService - the executor service to use to run the threads necessary for concurrent operations on PLLs
        defaultParallelism - the default number of partitions a PLL should be split into, or in other words the default number of processor cores to use for parallel operations.
        minSplitSize - the minimum size (in bytes) of a partition. The runner will attempt, when possible, not to create partitions smaller than this.
        maxSplitSize - the maximum size (in bytes) of a partition. The runner will attempt, when possible, not to create partitions bigger than this.
        minSplitRowCount - the minimum size (in number of rows) of a partition. Used when repartitioning an existing PLL.
        maxSplitRowCount - the maximum size (in number of rows) of a partition. Used when repartitioning an existing PLL.
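        The interplay between defaultParallelism, minSplitSize and maxSplitSize can be illustrated with a small standalone sketch. It is not the runner's actual algorithm: the class SplitSizeSketch and its methods are hypothetical, and the sketch assumes that the runner aims for one partition per unit of parallelism and then clamps the resulting partition size into the configured byte range.

```java
// Hypothetical sketch, not part of PLLContext: shows one plausible way the
// byte-size bounds could constrain a partition size derived from the parallelism.
final class SplitSizeSketch {

    /**
     * Aims for totalBytes / defaultParallelism bytes per partition, then clamps
     * the result into [minSplitSize, maxSplitSize].
     * Assumes defaultParallelism >= 1 and 1 <= minSplitSize <= maxSplitSize.
     */
    static long chooseSplitSize(long totalBytes, int defaultParallelism,
                                long minSplitSize, long maxSplitSize) {
        // Ceiling division, so that defaultParallelism partitions cover all bytes.
        long ideal = (totalBytes + defaultParallelism - 1) / defaultParallelism;
        return Math.max(minSplitSize, Math.min(maxSplitSize, ideal));
    }

    /** Number of partitions needed to cover totalBytes with the given split size (at least 1). */
    static long partitionCount(long totalBytes, long splitSize) {
        return Math.max(1, (totalBytes + splitSize - 1) / splitSize);
    }

    public static void main(String[] args) {
        long splitSize = chooseSplitSize(1000, 4, 100, 200);
        System.out.println("split size: " + splitSize);
        System.out.println("partitions: " + partitionCount(1000, splitSize));
    }
}
```

        Under these assumptions, a 1000-byte input with a parallelism of 4 would ideally use 250-byte partitions, but a maxSplitSize of 200 forces smaller partitions, so more of them are needed than the default parallelism suggests.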
    • Method Detail

      • getExecutorService

        public com.google.common.util.concurrent.ListeningExecutorService getExecutorService()
        Returns the thread pool used in this context
      • parallelize

        public <T> PLL<T> parallelize​(int numPartitions,
                                      List<T> rows)
        Turns a regular list into a Partitioned Lazy List.
        Parameters:
        numPartitions - the desired number of partitions
        rows - the list of elements to turn into a PLL
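        To make the idea of splitting a regular list into a fixed number of partitions concrete, here is a minimal standalone sketch. It does not use PLL or PLLContext: the class PartitionSketch is hypothetical, and the choice of contiguous, near-equal slices is an assumption, not a statement about how parallelize distributes rows.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the PLL implementation: cuts a list into
// numPartitions contiguous slices whose sizes differ by at most one.
final class PartitionSketch {

    static <T> List<List<T>> partition(int numPartitions, List<T> rows) {
        if (numPartitions < 1) {
            throw new IllegalArgumentException("numPartitions must be at least 1");
        }
        List<List<T>> partitions = new ArrayList<>(numPartitions);
        int size = rows.size();
        for (int i = 0; i < numPartitions; i++) {
            // Use long arithmetic so that i * size cannot overflow an int.
            int start = (int) ((long) i * size / numPartitions);
            int end = (int) ((long) (i + 1) * size / numPartitions);
            // subList returns a view of the original list, no elements are copied.
            partitions.add(rows.subList(start, end));
        }
        return partitions;
    }

    public static void main(String[] args) {
        System.out.println(partition(3, List.of(1, 2, 3, 4, 5, 6, 7)));
    }
}
```

        With more partitions than rows, some slices in this sketch are simply empty, so the requested number of partitions is always honored.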
      • singlePartitionPLL

        public <T> PLL<T> singlePartitionPLL​(CloseableIterable<T> iterable,
                                             long itemCount)
        Turns a closeable iterable into a Partitioned Lazy List, which has a single partition.
        Parameters:
        iterable - the collection of elements of the PLL.
        itemCount - if known, the number of elements in the collection. If not known, -1. Supplying this may avoid iterations over the collection in some cases.
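        The role of the -1 convention for itemCount can be sketched as follows. This is a standalone illustration with hypothetical names (ItemCountSketch, count), and it uses a plain Iterable instead of CloseableIterable so that it runs without the library. It shows why supplying a known count can save a full pass over the collection.

```java
// Hypothetical sketch, not part of PLLContext: a count that trusts a
// caller-supplied size when available and falls back to iterating otherwise.
final class ItemCountSketch {

    /** Sentinel meaning "the number of elements is not known". */
    static final long UNKNOWN = -1L;

    static <T> long count(Iterable<T> iterable, long itemCount) {
        if (itemCount >= 0) {
            // The count was supplied: the collection is never traversed.
            return itemCount;
        }
        long n = 0;
        for (T ignored : iterable) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        java.util.List<String> rows = java.util.List.of("a", "b", "c");
        System.out.println(count(rows, UNKNOWN));
        System.out.println(count(rows, 3));
    }
}
```

        The trade-off in this sketch is that a supplied count is trusted without verification, so a caller passing a wrong value would get a wrong size back.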
      • getDefaultParallelism

        protected int getDefaultParallelism()
        Returns the default number of partitions (the defaultParallelism value supplied to the constructor), used for instance when splitting text files
      • getMinSplitSize

        protected long getMinSplitSize()
        The minimum size of a partition in bytes
      • getMaxSplitSize

        protected long getMaxSplitSize()
        The maximum size of a partition in bytes
      • getMinSplitRowCount

        protected long getMinSplitRowCount()
        The minimum size of a partition in number of elements
      • getMaxSplitRowCount

        protected long getMaxSplitRowCount()
        The maximum size of a partition in number of elements
      • allocateId

        public long allocateId()