Package org.openrefine.runners.local
Class LocalGrid
- java.lang.Object
-
- org.openrefine.runners.local.LocalGrid
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from interface org.openrefine.model.Grid
Grid.ApproxCount, Grid.Metadata, Grid.PartialAggregation<T extends Serializable>
-
-
Field Summary
Fields Modifier and Type Field Description protected long
cachedRecordCount
protected ColumnModel
columnModel
protected boolean
constructedFromRows
protected PairPLL<Long,IndexedRow>
grid
protected Map<String,OverlayModel>
overlayModels
protected PairPLL<Long,Record>
records
protected LocalRunner
runner
-
Fields inherited from interface org.openrefine.model.Grid
GRID_PATH, METADATA_PATH
-
-
Constructor Summary
Constructors Constructor Description LocalGrid(LocalRunner runner, ColumnModel columnModel, PairPLL<Long,IndexedRow> grid, Map<String,OverlayModel> overlayModels, long cachedRecordCount)
Constructs a grid, supplying all required fields.LocalGrid(LocalRunner runner, PairPLL<Long,Row> grid, ColumnModel columnModel, Map<String,OverlayModel> overlayModels, long cachedRecordCount)
Convenience constructor to construct a grid from a PLL of a slightly different type.LocalGrid(PairPLL<Long,Record> records, LocalRunner runner, ColumnModel columnModel, Map<String,OverlayModel> overlayModels, long rowCount)
Constructs a grid from a grid of records.
-
Method Summary
All Methods Static Methods Instance Methods Concrete Methods Modifier and Type Method Description <T extends Serializable>
TaggregateRecords(RecordAggregator<T> aggregator, T initialState)
Computes the result of a row aggregator on the grid.<T extends Serializable>
Grid.PartialAggregation<T>aggregateRecordsApprox(RecordAggregator<T> aggregator, T initialState, long maxRecords)
Computes the result of a row aggregator on the grid, reading about at most a fixed number of records.<T extends Serializable>
TaggregateRows(RowAggregator<T> aggregator, T initialState)
Computes the result of a row aggregator on the grid.<T extends Serializable>
Grid.PartialAggregation<T>aggregateRowsApprox(RowAggregator<T> aggregator, T initialState, long maxRows)
Computes the result of a row aggregator on the grid, reading about at most a fixed number of rows.protected static <T> CloseableIterator<Tuple2<Long,IndexedData<T>>>
applyRecordChangeDataMapper(RecordChangeDataProducer<T> recordMapper, List<Tuple2<Long,Record>> recordBatch)
protected static <T> CloseableIterator<Tuple2<Long,IndexedData<T>>>
applyRecordChangeDataMapperWithIncompleteData(RecordChangeDataProducer<T> recordMapper, List<Tuple2<Long,Tuple2<Record,IndexedData<T>>>> recordBatch)
protected static <T> CloseableIterator<Tuple2<Long,IndexedData<T>>>
applyRowChangeDataMapper(RowChangeDataProducer<T> rowMapper, List<IndexedRow> rowBatch)
protected static <T> CloseableIterator<Tuple2<Long,IndexedData<T>>>
applyRowChangeDataMapperWithIncompleteData(RowChangeDataProducer<T> rowMapper, List<Tuple2<Long,Tuple2<IndexedRow,IndexedData<T>>>> rowBatch)
boolean
cache()
Attempt to cache this grid in memory.ProgressingFuture<Boolean>
cacheAsync()
Attempt to cache this grid in memory, in an async way.List<Record>
collectRecords()
Returns all records in a list.List<IndexedRow>
collectRows()
Returns all rows in a list.Grid
concatenate(List<Grid> otherGrids)
Concatenates this with other grids, in the given order.Grid
concatenate(Grid other)
Creates a new grid containing all rows in this grid, followed by all rows in the other grid supplied.protected static Grid
concatenateGridList(List<Grid> grids)
long
countMatchingRecords(RecordFilter filter)
Return the number of records which are filtered by this filter.Grid.ApproxCount
countMatchingRecordsApprox(RecordFilter filter, long limit)
Return the number of records matching the given record filter, but by processing about at most a fixed number of records.long
countMatchingRows(RowFilter filter)
Count the number of rows which match a given filter.Grid.ApproxCount
countMatchingRowsApprox(RowFilter filter, long limit)
Return the number of rows matching the given row filter, but by processing about at most a fixed number of row.Grid
dropRows(long rowsToDrop)
Drop the first rows.Grid
flatMapRows(RowFlatMapper mapper, ColumnModel newColumnModel)
Returns a new grid, where the rows have been mapped by the flat mapper.ColumnModel
getColumnModel()
protected Grid.Metadata
getMetadata()
Map<String,OverlayModel>
getOverlayModels()
Record
getRecord(long id)
Returns a record obtained by its id.List<Record>
getRecordsAfter(long start, int limit)
Returns a list of records, starting from a given index and defined by a maximum size.List<Record>
getRecordsAfter(RecordFilter filter, long start, int limit)
Among the filtered subset of records, returns a list of records, starting from a given index and defined by a maximum size.List<Record>
getRecordsBefore(long end, int limit)
Returns a list of consecutive records, ending at a given index (exclusive) and defined by a maximum size.List<Record>
getRecordsBefore(RecordFilter filter, long end, int limit)
Among the filtered subset of records, returns a list of records, ending at a given index (exclusive) and defined by a maximum size.QueryTree
getRecordsQueryTree()
Row
getRow(long id)
Returns a row by index.List<IndexedRow>
getRows(List<Long> rowIndices)
Returns a list of rows corresponding to the row indices supplied.List<IndexedRow>
getRowsAfter(long start, int limit)
Returns a list of rows, starting from a given index and defined by a maximum size.List<IndexedRow>
getRowsAfter(RowFilter filter, long start, int limit)
Among the subset of filtered rows, return a list of rows, starting from a given index and defined by a maximum size.List<IndexedRow>
getRowsBefore(long end, int limit)
Returns a list of consecutive rows, just before the given row index (not included) and up to a maximum size.List<IndexedRow>
getRowsBefore(RowFilter filter, long end, int limit)
Among the subset of filtered rows, return a list of rows, just before the row with a given index (excluded) and defined by a maximum size.QueryTree
getRowsQueryTree()
LocalRunner
getRunner()
protected PLL<IndexedRow>
indexedRows()
boolean
isCached()
Is this grid cached in memory?CloseableIterator<Record>
iterateRecords(RecordFilter filter)
Iterate over records matched by a filter.CloseableIterator<IndexedRow>
iterateRows(RowFilter filter)
Iterate over rows matched by a filter, in the order determined by a sorting configuration.<T> Grid
join(ChangeData<T> changeData, RecordChangeDataJoiner<T> recordJoiner, ColumnModel newColumnModel)
Joins pre-computed change data with the current grid data, record by record.<T> Grid
join(ChangeData<T> changeData, RowChangeDataFlatJoiner<T> rowJoiner, ColumnModel newColumnModel)
Joins pre-computed change data with the current grid data, with a joiner function that can return multiple rows for a given original row.<T> Grid
join(ChangeData<T> changeData, RowChangeDataJoiner<T> rowJoiner, ColumnModel newColumnModel)
Joins pre-computed change data with the current grid data, row by row.Grid
limitRows(long rowLimit)
Only keep the first rows.<T> ChangeData<T>
mapRecords(RecordFilter filter, RecordChangeDataProducer<T> recordMapper, Optional<ChangeData<T>> incompleteChangeData)
Extract change data by applying a function to each filtered record.Grid
mapRecords(RecordMapper mapper, ColumnModel newColumnModel)
Returns a new grid, where the records have been mapped by the mapper<T> ChangeData<T>
mapRows(RowFilter filter, RowChangeDataProducer<T> rowMapper, Optional<ChangeData<T>> incompleteChangeData)
Extract change data by applying a function to each filtered row.Grid
mapRows(RowMapper mapper, ColumnModel newColumnModel)
Returns a new grid, where the rows have been mapped by the mapper.<S extends Serializable>
GridmapRows(RowScanMapper<S> mapper, ColumnModel newColumnModel)
Returns a new grid where the rows have been mapped by the stateful mapper.long
recordCount()
protected PairPLL<Long,Record>
records()
Grid
removeRecords(RecordFilter filter)
Removes all records selected by a filterGrid
removeRows(RowFilter filter)
Removes all rows selected by a filterGrid
reorderRecords(SortingConfig sortingConfig, boolean permanent)
Returns a new grid where records have been reordered according to the configuration supplied.Grid
reorderRows(SortingConfig sortingConfig, boolean permanent)
Returns a new grid where rows have been reordered according to the configuration supplied.long
rowCount()
protected <T,U extends Serializable>
Grid.PartialAggregation<U>sample(PLL<T> source, long sampleSize, U initialValue, BiFunction<U,T,U> fold, BiFunction<U,U,U> combine)
Compute how many partitions and how many elements in those partitions we should process, if we want to get a sample of the given size.void
saveToFile(File file)
Saves the grid to a specified directory, following OpenRefine's format for grid storage.ProgressingFuture<Void>
saveToFileAsync(File file)
Saves the grid to a specified directory, in an asynchronous fashion.protected static String
serializeIndexedRow(IndexedRow indexedRow)
protected boolean
smallEnoughToCacheInMemory()
Heuristic which predicts how much RAM a grid will occupy in memory after caching.String
toString()
void
uncache()
Free up any memory used to cache this grid in memory.Grid
withColumnModel(ColumnModel newColumnModel)
Grid
withOverlayModels(Map<String,OverlayModel> newOverlayModels)
Returns a new grid where the overlay models have changed.
-
-
-
Field Detail
-
runner
protected final LocalRunner runner
-
grid
protected final PairPLL<Long,IndexedRow> grid
-
columnModel
protected final ColumnModel columnModel
-
overlayModels
protected final Map<String,OverlayModel> overlayModels
-
constructedFromRows
protected final boolean constructedFromRows
-
cachedRecordCount
protected long cachedRecordCount
-
-
Constructor Detail
-
LocalGrid
public LocalGrid(LocalRunner runner, ColumnModel columnModel, PairPLL<Long,IndexedRow> grid, Map<String,OverlayModel> overlayModels, long cachedRecordCount)
Constructs a grid, supplying all required fields.- Parameters:
runner
-columnModel
-grid
-overlayModels
-
-
LocalGrid
public LocalGrid(LocalRunner runner, PairPLL<Long,Row> grid, ColumnModel columnModel, Map<String,OverlayModel> overlayModels, long cachedRecordCount)
Convenience constructor to construct a grid from a PLL of a slightly different type.
-
LocalGrid
public LocalGrid(PairPLL<Long,Record> records, LocalRunner runner, ColumnModel columnModel, Map<String,OverlayModel> overlayModels, long rowCount)
Constructs a grid from a grid of records.- Parameters:
records
- the PLL of records for the current grid, which is assumed to have consistent indexingrunner
-columnModel
-overlayModels
-
-
-
Method Detail
-
indexedRows
protected PLL<IndexedRow> indexedRows()
-
getRunner
public LocalRunner getRunner()
-
getColumnModel
public ColumnModel getColumnModel()
- Specified by:
getColumnModel
in interfaceGrid
- Returns:
- the column metadata at this stage of the workflow
-
withColumnModel
public Grid withColumnModel(ColumnModel newColumnModel)
- Specified by:
withColumnModel
in interfaceGrid
- Parameters:
newColumnModel
- the column model to apply to the grid- Returns:
- a copy of this grid with a modified column model.
-
getRow
public Row getRow(long id)
Description copied from interface:Grid
Returns a row by index. Repeatedly calling this method to obtain multiple rows might be inefficient compared to fetching them by batch, depending on the implementation.- Specified by:
getRow
in interfaceGrid
- Parameters:
id
- the row index. This refers to the current position of the row in the grid, which corresponds toIndexedRow.getIndex()
.- Returns:
- the row at the given index
-
getRowsAfter
public List<IndexedRow> getRowsAfter(long start, int limit)
Description copied from interface:Grid
Returns a list of rows, starting from a given index and defined by a maximum size.- Specified by:
getRowsAfter
in interfaceGrid
- Parameters:
start
- the first row id to fetch (inclusive)limit
- the maximum number of rows to fetch- Returns:
- the list of rows with their ids (if any)
-
getRowsAfter
public List<IndexedRow> getRowsAfter(RowFilter filter, long start, int limit)
Description copied from interface:Grid
Among the subset of filtered rows, return a list of rows, starting from a given index and defined by a maximum size.- Specified by:
getRowsAfter
in interfaceGrid
- Parameters:
filter
- the subset of rows to paginate through. This object and its dependencies are required to be serializable.start
- the first row id to fetch (inclusive)limit
- the maximum number of rows to fetch- Returns:
- the list of rows with their ids (if any)
- See Also:
Grid.getRowsBefore(long, int)
-
getRowsBefore
public List<IndexedRow> getRowsBefore(long end, int limit)
Description copied from interface:Grid
Returns a list of consecutive rows, just before the given row index (not included) and up to a maximum size.- Specified by:
getRowsBefore
in interfaceGrid
- Parameters:
end
- the last row id to fetch (exclusive)limit
- the maximum number of rows to fetch- Returns:
- the list of rows with their ids (if any)
- See Also:
Grid.getRowsAfter(long, int)
-
getRowsBefore
public List<IndexedRow> getRowsBefore(RowFilter filter, long end, int limit)
Description copied from interface:Grid
Among the subset of filtered rows, return a list of rows, just before the row with a given index (excluded) and defined by a maximum size.- Specified by:
getRowsBefore
in interfaceGrid
- Parameters:
filter
- the subset of rows to paginate through. This object and its dependencies are required to be serializable.end
- the last row id to fetch (exclusive)limit
- the maximum number of rows to fetch- Returns:
- the list of rows with their ids (if any)
-
getRows
public List<IndexedRow> getRows(List<Long> rowIndices)
Description copied from interface:Grid
Returns a list of rows corresponding to the row indices supplied. By default, this callsGrid.getRow(long)
on all values, but implementations can override this to more efficient strategies if available.
-
iterateRows
public CloseableIterator<IndexedRow> iterateRows(RowFilter filter)
Description copied from interface:Grid
Iterate over rows matched by a filter, in the order determined by a sorting configuration. This might not require loading all rows in memory at once, but might be less efficient thanGrid.collectRows()
if all rows are to be stored in memory downstream.- Specified by:
iterateRows
in interfaceGrid
-
countMatchingRows
public long countMatchingRows(RowFilter filter)
Description copied from interface:Grid
Count the number of rows which match a given filter.- Specified by:
countMatchingRows
in interfaceGrid
- Parameters:
filter
- the row filter- Returns:
- the number of rows for which this filter returns true
-
countMatchingRowsApprox
public Grid.ApproxCount countMatchingRowsApprox(RowFilter filter, long limit)
Description copied from interface:Grid
Return the number of rows matching the given row filter, but by processing about at most a fixed number of row.- Specified by:
countMatchingRowsApprox
in interfaceGrid
- Parameters:
filter
- counts the number of records on which it returns truelimit
- maximum number of records to process
-
collectRows
public List<IndexedRow> collectRows()
Description copied from interface:Grid
Returns all rows in a list. This is inefficient for large datasets as it forces the entire grid to be loaded in memory.- Specified by:
collectRows
in interfaceGrid
-
getRecord
public Record getRecord(long id)
Description copied from interface:Grid
Returns a record obtained by its id. Repeatedly calling this method to obtain multiple records might be inefficient depending on the implementation.- Specified by:
getRecord
in interfaceGrid
- Parameters:
id
- the row id of the first row in the record. This refers to the current position of the record in the grid, which corresponds toRecord.getStartRowId()
.- Returns:
- the corresponding record
-
getRecordsAfter
public List<Record> getRecordsAfter(long start, int limit)
Description copied from interface:Grid
Returns a list of records, starting from a given index and defined by a maximum size.- Specified by:
getRecordsAfter
in interfaceGrid
- Parameters:
start
- the first record id to fetch (inclusive)limit
- the maximum number of records to fetch- Returns:
- the list of records (if any)
- See Also:
Grid.getRecordsBefore(long, int)
-
getRecordsAfter
public List<Record> getRecordsAfter(RecordFilter filter, long start, int limit)
Description copied from interface:Grid
Among the filtered subset of records, returns a list of records, starting from a given index and defined by a maximum size.- Specified by:
getRecordsAfter
in interfaceGrid
- Parameters:
filter
- the filter which defines the subset of records to paginate through This object and its dependencies are required to be serializable.start
- the first record id to fetch (inclusive)limit
- the maximum number of records to fetch- Returns:
- the list of records (if any)
-
getRecordsBefore
public List<Record> getRecordsBefore(long end, int limit)
Description copied from interface:Grid
Returns a list of consecutive records, ending at a given index (exclusive) and defined by a maximum size.- Specified by:
getRecordsBefore
in interfaceGrid
- Parameters:
end
- the last record id to fetch (exclusive)limit
- the maximum number of records to fetch- Returns:
- the list of records (if any)
- See Also:
Grid.getRecordsAfter(long, int)
-
getRecordsBefore
public List<Record> getRecordsBefore(RecordFilter filter, long end, int limit)
Description copied from interface:Grid
Among the filtered subset of records, returns a list of records, ending at a given index (exclusive) and defined by a maximum size.- Specified by:
getRecordsBefore
in interfaceGrid
- Parameters:
filter
- the filter which defines the subset of records to paginate through This object and its dependencies are required to be serializable.end
- the last record id to fetch (exclusive)limit
- the maximum number of records to fetch- Returns:
- the list of records (if any)
-
iterateRecords
public CloseableIterator<Record> iterateRecords(RecordFilter filter)
Description copied from interface:Grid
Iterate over records matched by a filter. This might not require loading all records in memory at once, but might be less efficient thanGrid.collectRecords()
if all records are to be stored in memory downstream.- Specified by:
iterateRecords
in interfaceGrid
-
countMatchingRecords
public long countMatchingRecords(RecordFilter filter)
Description copied from interface:Grid
Return the number of records which are filtered by this filter.- Specified by:
countMatchingRecords
in interfaceGrid
- Parameters:
filter
- the filter to evaluate- Returns:
- the number of records for which this filter evaluates to true
-
countMatchingRecordsApprox
public Grid.ApproxCount countMatchingRecordsApprox(RecordFilter filter, long limit)
Description copied from interface:Grid
Return the number of records matching the given record filter, but by processing about at most a fixed number of records.- Specified by:
countMatchingRecordsApprox
in interfaceGrid
- Parameters:
filter
- counts the number of records on which it returns truelimit
- maximum number of records to process
-
collectRecords
public List<Record> collectRecords()
Description copied from interface:Grid
Returns all records in a list. This is inefficient for large datasets as it forces all records to be loaded in memory.- Specified by:
collectRecords
in interfaceGrid
-
rowCount
public long rowCount()
-
recordCount
public long recordCount()
- Specified by:
recordCount
in interfaceGrid
- Returns:
- the number of records in the table
-
getOverlayModels
public Map<String,OverlayModel> getOverlayModels()
- Specified by:
getOverlayModels
in interfaceGrid
- Returns:
- the overlay models in this state
-
saveToFileAsync
public ProgressingFuture<Void> saveToFileAsync(File file)
Description copied from interface:Grid
Saves the grid to a specified directory, in an asynchronous fashion.- Specified by:
saveToFileAsync
in interfaceGrid
- Parameters:
file
- the directory where to save the grid- Returns:
- a future which completes once the save is complete
-
saveToFile
public void saveToFile(File file) throws IOException
Description copied from interface:Grid
Saves the grid to a specified directory, following OpenRefine's format for grid storage.- Specified by:
saveToFile
in interfaceGrid
- Parameters:
file
- the directory where to save the grid- Throws:
IOException
-
getMetadata
protected Grid.Metadata getMetadata()
-
serializeIndexedRow
protected static String serializeIndexedRow(IndexedRow indexedRow)
-
aggregateRows
public <T extends Serializable> T aggregateRows(RowAggregator<T> aggregator, T initialState)
Description copied from interface:Grid
Computes the result of a row aggregator on the grid.- Specified by:
aggregateRows
in interfaceGrid
-
aggregateRecords
public <T extends Serializable> T aggregateRecords(RecordAggregator<T> aggregator, T initialState)
Description copied from interface:Grid
Computes the result of a row aggregator on the grid.- Specified by:
aggregateRecords
in interfaceGrid
-
aggregateRowsApprox
public <T extends Serializable> Grid.PartialAggregation<T> aggregateRowsApprox(RowAggregator<T> aggregator, T initialState, long maxRows)
Description copied from interface:Grid
Computes the result of a row aggregator on the grid, reading about at most a fixed number of rows. The rows read should be deterministic for a given implementation.- Specified by:
aggregateRowsApprox
in interfaceGrid
-
aggregateRecordsApprox
public <T extends Serializable> Grid.PartialAggregation<T> aggregateRecordsApprox(RecordAggregator<T> aggregator, T initialState, long maxRecords)
Description copied from interface:Grid
Computes the result of a row aggregator on the grid, reading about at most a fixed number of records. The records read should be deterministic for a given implementation.- Specified by:
aggregateRecordsApprox
in interfaceGrid
-
withOverlayModels
public Grid withOverlayModels(Map<String,OverlayModel> newOverlayModels)
Description copied from interface:Grid
Returns a new grid where the overlay models have changed.- Specified by:
withOverlayModels
in interfaceGrid
- Parameters:
newOverlayModels
- the new overlay models to apply to the grid- Returns:
- the changed grid
-
mapRows
public Grid mapRows(RowMapper mapper, ColumnModel newColumnModel)
Description copied from interface:Grid
Returns a new grid, where the rows have been mapped by the mapper.
-
flatMapRows
public Grid flatMapRows(RowFlatMapper mapper, ColumnModel newColumnModel)
Description copied from interface:Grid
Returns a new grid, where the rows have been mapped by the flat mapper.- Specified by:
flatMapRows
in interfaceGrid
- Parameters:
mapper
- the function used to transform rows This object and its dependencies are required to be serializable.newColumnModel
- the column model of the resulting grid- Returns:
- the resulting grid
-
mapRows
public <S extends Serializable> Grid mapRows(RowScanMapper<S> mapper, ColumnModel newColumnModel)
Description copied from interface:Grid
Returns a new grid where the rows have been mapped by the stateful mapper. This can be significantly less efficient than a stateless mapper, so only use this if you really need to rely on state.
-
mapRecords
public Grid mapRecords(RecordMapper mapper, ColumnModel newColumnModel)
Description copied from interface:Grid
Returns a new grid, where the records have been mapped by the mapper- Specified by:
mapRecords
in interfaceGrid
- Parameters:
mapper
- the function used to transform records This object and its dependencies are required to be serializable.newColumnModel
- the column model of the resulting grid- Returns:
- the resulting grid
-
reorderRows
public Grid reorderRows(SortingConfig sortingConfig, boolean permanent)
Description copied from interface:Grid
Returns a new grid where rows have been reordered according to the configuration supplied.- Specified by:
reorderRows
in interfaceGrid
- Parameters:
sortingConfig
- the criteria to sort rowspermanent
- if true, forget the original row ids. If false, store them in the correspondingIndexedRow.getOriginalIndex()
.- Returns:
- the resulting grid
-
reorderRecords
public Grid reorderRecords(SortingConfig sortingConfig, boolean permanent)
Description copied from interface:Grid
Returns a new grid where records have been reordered according to the configuration supplied.- Specified by:
reorderRecords
in interfaceGrid
- Parameters:
sortingConfig
- the criteria to sort recordspermanent
- if true, forget the original record ids. If false, store them in the correspondingRecord.getOriginalStartRowId()
.- Returns:
- the resulting grid
-
removeRows
public Grid removeRows(RowFilter filter)
Description copied from interface:Grid
Removes all rows selected by a filter- Specified by:
removeRows
in interfaceGrid
- Parameters:
filter
- which returns true when we should delete the row- Returns:
- the grid where the matching rows have been removed
-
removeRecords
public Grid removeRecords(RecordFilter filter)
Description copied from interface:Grid
Removes all records selected by a filter- Specified by:
removeRecords
in interfaceGrid
- Parameters:
filter
- which returns true when we should delete the record- Returns:
- the grid where the matching record have been removed
-
mapRows
public <T> ChangeData<T> mapRows(RowFilter filter, RowChangeDataProducer<T> rowMapper, Optional<ChangeData<T>> incompleteChangeData)
Description copied from interface:Grid
Extract change data by applying a function to each filtered row. The calls to the change data producer are batched if requested by the producer.- Specified by:
mapRows
in interfaceGrid
- Type Parameters:
T
- the type of change data that is serialized to disk for each row- Parameters:
filter
- a filter to select which rows to maprowMapper
- produces the change data for each rowincompleteChangeData
- a previously, incompletely fetched version of the same change data, from which the computation should be resumed, to avoid recomputing the items already in the incomplete change data
-
applyRowChangeDataMapper
protected static <T> CloseableIterator<Tuple2<Long,IndexedData<T>>> applyRowChangeDataMapper(RowChangeDataProducer<T> rowMapper, List<IndexedRow> rowBatch)
-
applyRowChangeDataMapperWithIncompleteData
protected static <T> CloseableIterator<Tuple2<Long,IndexedData<T>>> applyRowChangeDataMapperWithIncompleteData(RowChangeDataProducer<T> rowMapper, List<Tuple2<Long,Tuple2<IndexedRow,IndexedData<T>>>> rowBatch)
-
mapRecords
public <T> ChangeData<T> mapRecords(RecordFilter filter, RecordChangeDataProducer<T> recordMapper, Optional<ChangeData<T>> incompleteChangeData)
Description copied from interface:Grid
Extract change data by applying a function to each filtered record. The calls to the change data producer are batched if requested by the producer.- Specified by:
mapRecords
in interfaceGrid
- Type Parameters:
T
- the type of change data that is serialized to disk for each row- Parameters:
filter
- a filter to select which rows to maprecordMapper
- produces the change data for each recordincompleteChangeData
- a previously, incompletely fetched version of the same change data, from which the computation should be resumed, to avoid recomputing the items already in the incomplete change data
-
limitRows
public Grid limitRows(long rowLimit)
Only keep the first rows. Overridden for efficiency when a partitioner is known.
-
dropRows
public Grid dropRows(long rowsToDrop)
Drop the first rows. Overridden for efficiency when partition sizes are known.
-
applyRecordChangeDataMapper
protected static <T> CloseableIterator<Tuple2<Long,IndexedData<T>>> applyRecordChangeDataMapper(RecordChangeDataProducer<T> recordMapper, List<Tuple2<Long,Record>> recordBatch)
-
applyRecordChangeDataMapperWithIncompleteData
protected static <T> CloseableIterator<Tuple2<Long,IndexedData<T>>> applyRecordChangeDataMapperWithIncompleteData(RecordChangeDataProducer<T> recordMapper, List<Tuple2<Long,Tuple2<Record,IndexedData<T>>>> recordBatch)
-
join
public <T> Grid join(ChangeData<T> changeData, RowChangeDataJoiner<T> rowJoiner, ColumnModel newColumnModel)
Description copied from interface:Grid
Joins pre-computed change data with the current grid data, row by row.- Specified by:
join
in interfaceGrid
- Type Parameters:
T
- the type of change data that was serialized to disk for each row- Parameters:
changeData
- the serialized change datarowJoiner
- produces the new row by joining the old row with change datanewColumnModel
- the column model to apply to the new grid
-
join
public <T> Grid join(ChangeData<T> changeData, RowChangeDataFlatJoiner<T> rowJoiner, ColumnModel newColumnModel)
Description copied from interface:Grid
Joins pre-computed change data with the current grid data, with a joiner function that can return multiple rows for a given original row.- Specified by:
join
in interfaceGrid
- Type Parameters:
T
- the type of change data that was serialized to disk for each row- Parameters:
changeData
- the serialized change datarowJoiner
- produces the new row by joining the old row with change datanewColumnModel
- the column model to apply to the new grid
-
join
public <T> Grid join(ChangeData<T> changeData, RecordChangeDataJoiner<T> recordJoiner, ColumnModel newColumnModel)
Description copied from interface:Grid
Joins pre-computed change data with the current grid data, record by record.- Specified by:
join
in interfaceGrid
- Type Parameters:
T
- the type of change data that was serialized to disk for each record- Parameters:
changeData
- the serialized change datarecordJoiner
- produces the new list of rows by joining the old record with change datanewColumnModel
- the column model to apply to the new grid
-
concatenate
public Grid concatenate(Grid other)
Description copied from interface:Grid
Creates a new grid containing all rows in this grid, followed by all rows in the other grid supplied. The overlay models of this grid have priority over the others.The two grids are required to have the same number of columns.
- Specified by:
concatenate
in interfaceGrid
- Parameters:
other
- the grid to concatenate to this one- Returns:
- a new grid, union of the two
-
concatenate
public Grid concatenate(List<Grid> otherGrids)
Description copied from interface:Grid
Concatenates this with other grids, in the given order. This is a variant ofGrid.concatenate(Grid)
which implementations can override to make more efficient than making repeated calls toGrid.concatenate(Grid)
(which is the default implementation).- Specified by:
concatenate
in interfaceGrid
- Parameters:
otherGrids
- the list of other grids to concatenate with this one.- Returns:
- a new grid, union of all those grids
-
isCached
public boolean isCached()
Description copied from interface:Grid
Is this grid cached in memory? If not, its contents are stored on disk.
-
uncache
public void uncache()
Description copied from interface:Grid
Free up any memory used to cache this grid in memory.
-
cacheAsync
public ProgressingFuture<Boolean> cacheAsync()
Description copied from interface:Grid
Attempt to cache this grid in memory, in an async way.- Specified by:
cacheAsync
in interfaceGrid
- Returns:
- a future to keep track of the status of the caching process. The future returns whether the caching succeeded.
-
cache
public boolean cache()
Description copied from interface:Grid
Attempt to cache this grid in memory. If the grid is too big, this can fail.
-
smallEnoughToCacheInMemory
protected boolean smallEnoughToCacheInMemory()
Heuristic which predicts how much RAM a grid will occupy in memory after caching.- Returns:
- whether there is enough free memory to cache this one.
-
sample
protected <T,U extends Serializable> Grid.PartialAggregation<U> sample(PLL<T> source, long sampleSize, U initialValue, BiFunction<U,T,U> fold, BiFunction<U,U,U> combine)
Compute how many partitions and how many elements in those partitions we should process, if we want to get a sample of the given size. This will generally result in using all partitions and dividing the sample size by that number of partitions, but if there are many partitions we might not want to use them all (for instance, if there are more partitions than the desired sample size).- Parameters:
sampleSize
- the number of elements we want to process in total- Returns:
- a tuple, where the first component is the number of partitions and the second one is the number of elements in each of those partitions.
-
getRowsQueryTree
public QueryTree getRowsQueryTree()
-
getRecordsQueryTree
public QueryTree getRecordsQueryTree()
-
-