Class LocalRunner

  • All Implemented Interfaces:
    Runner

    public class LocalRunner
    extends Object
    implements Runner
    The default implementation of the Runner interface. It is optimized for local (single machine) use, with Grids and ChangeData being read off disk lazily. Those objects can be partitioned, allowing for concurrent processing via threads.
    • Field Detail

      • defaultReconciledCellCost

        protected static final int defaultReconciledCellCost
        See Also:
        Constant Field Values
      • defaultUnreconciledCellCost

        protected static final int defaultUnreconciledCellCost
        See Also:
        Constant Field Values
      • pllContext

        protected final PLLContext pllContext
      • defaultParallelism

        protected int defaultParallelism
      • minSplitSize

        protected long minSplitSize
      • maxSplitSize

        protected long maxSplitSize
      • minSplitRowCount

        protected long minSplitRowCount
      • maxSplitRowCount

        protected long maxSplitRowCount
      • reconciledCellCost

        protected int reconciledCellCost
      • unreconciledCellCost

        protected int unreconciledCellCost
      • rowCost

        protected int rowCost
    • Constructor Detail

      • LocalRunner

        public LocalRunner()
    • Method Detail

      • getPLLContext

        public PLLContext getPLLContext()
      • loadGrid

        public Grid loadGrid​(File path)
                      throws IOException
        Description copied from interface: Runner
        Loads a Grid serialized at a given location.
        Specified by:
        loadGrid in interface Runner
        Parameters:
        path - the directory where the Grid is stored
        Returns:
        the grid
        Throws:
        IOException - when loading the grid failed, or when the grid's serialization was incomplete (lacking a _SUCCESS marker)
      • gridFromIterable

        public Grid gridFromIterable​(ColumnModel columnModel,
                                     CloseableIterable<Row> rows,
                                     Map<String,​OverlayModel> overlayModels,
                                     long rowCount,
                                     long recordCount)
        Description copied from interface: Runner
        Creates a Grid from an iterable collection of rows. By default, this just gathers the iterable in a list and delegates to Runner.gridFromList(ColumnModel, List, Map), but implementations may implement a different approach which delays the loading of the collection in memory.
        Specified by:
        gridFromIterable in interface Runner
        rowCount - if the number of rows is known, supply it in this parameter as it might improve efficiency. Otherwise, set to -1.
        recordCount - if the number of records is known, supply it in this parameter as it might improve efficiency. Otherwise, set to -1.
      • incompleteUpperBounds

        protected static List<Long> incompleteUpperBounds​(List<Optional<Long>> firstKeys)
        Given a list of first keys found in the last (n-1) partitions, where n is the total number of partitions of a ChangeData object, compute a list of size n which contains for each partition up to which upper bound it should be filled up with pending change data objects.
      • fillWithIncompleteIndexedData

        protected static <T> CloseableIterator<Tuple2<Long,​IndexedData<T>>> fillWithIncompleteIndexedData​(CloseableIterator<Tuple2<Long,​IndexedData<T>>> originalIterator,
                                                                                                                long initialIndex,
                                                                                                                long upperBound)
        Utility function used to pad a stream of elements read from an incomplete change data with "virtual", pending IndexedData elements to represent those which may be computed later.
        Parameters:
        upperBound - a strict upper bound on the indices to enumerate, or Long.MAX_VALUE if unbounded
      • loadTextFile

        public Grid loadTextFile​(String path,
                                 MultiFileReadingProgress progress,
                                 Charset encoding,
                                 long limit)
                          throws IOException
        Description copied from interface: Runner
        Loads a text file as a Grid with a single column named "Column" and whose contents are the lines in the file, parsed as strings.
        Specified by:
        loadTextFile in interface Runner
        Parameters:
        path - the path to the text file to load
        encoding - TODO
        limit - the maximum number of lines to read
        Throws:
        IOException
      • changeDataFromIterable

        public <T> ChangeData<T> changeDataFromIterable​(CloseableIterable<IndexedData<T>> iterable,
                                                        long itemCount)
        Description copied from interface: Runner
        Creates a ChangeData from an iterable. By default, this just gathers the iterable in a list and delegates to Runner.changeDataFromList(List), but implementations may implement a different approach which delays the loading of the collection in memory.
        Specified by:
        changeDataFromIterable in interface Runner
        itemCount - if the number of items is known, supply it here, otherwise set this parameter to -1.
      • emptyChangeData

        public <T> ChangeData<T> emptyChangeData()
        Description copied from interface: Runner
        Creates an empty change data object of a given type, marked as incomplete.
        Specified by:
        emptyChangeData in interface Runner
      • supportsProgressReporting

        public boolean supportsProgressReporting()
        Description copied from interface: Runner
        Indicates whether this implementation supports progress reporting. If not, progress objects will be left untouched when passed to methods in this interface.
        Specified by:
        supportsProgressReporting in interface Runner
      • getReconciledCellCost

        public int getReconciledCellCost()
        Returns the predicted memory cost of a reconciled cell, when caching grids
      • getUnreconciledCellCost

        public int getUnreconciledCellCost()
        Returns the predicted memory cost of an unreconciled cell, when caching grids
      • getRowCost

        public int getRowCost()
        Returns the predicted memory cost of a row (on top of the cost of its cells), when caching grids