Package org.openrefine.model.changes
Interface RowChangeDataProducer<T>
-
- Type Parameters:
T
-
- All Superinterfaces:
Serializable
- All Known Implementing Classes:
ColumnAdditionByFetchingURLsOperation.URLFetchingChangeProducer, PerformWikibaseEditsOperation.RowEditingResultsProducer, ReconOperation.ReconChangeDataProducer, RowInRecordChangeDataProducer
public interface RowChangeDataProducer<T> extends Serializable
A function which computes change data to be persisted to disk, to be later joined back to the project to produce the new grid. This data might be serialized because it is volatile or expensive to compute.

Calls to an external resource can be batched by overriding getBatchSize() to specify the size of batches and callRowBatch(List) for the batch processing itself. In that case, the call(long, Row) method does not need a real implementation (it can throw NotImplementedException, for instance).

It is also possible to limit the number of concurrent calls to the producer (for instance for rate-limited resources) by overriding getMaxConcurrency().
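The batching contract described above can be illustrated with a minimal, self-contained sketch. It does not use the OpenRefine classes: `Row`, `IndexedRow` and the nested `RowChangeDataProducer` are reduced local stand-ins whose signatures follow this page and whose default bodies are assumptions, and `UppercaseBatchProducer` is a hypothetical producer invented for illustration. To avoid an external dependency, `UnsupportedOperationException` is used in place of `NotImplementedException`.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class BatchedProducerSketch {

    // Hypothetical stand-ins for OpenRefine's Row and IndexedRow, reduced to one cell per row.
    static class Row implements Serializable {
        final String cell;
        Row(String cell) { this.cell = cell; }
    }

    static class IndexedRow implements Serializable {
        final long index;
        final Row row;
        IndexedRow(long index, Row row) { this.index = index; this.row = row; }
    }

    // Local mirror of the interface: signatures follow this page, default bodies are assumptions.
    interface RowChangeDataProducer<T> extends Serializable {
        T call(long rowId, Row row);

        default List<T> callRowBatch(List<IndexedRow> rows) {
            // Assumed fallback: one individual call per row, in order.
            List<T> results = new ArrayList<>(rows.size());
            for (IndexedRow indexedRow : rows) {
                results.add(call(indexedRow.index, indexedRow.row));
            }
            return results;
        }

        default int getBatchSize() { return 1; }

        default int getMaxConcurrency() { return 0; }
    }

    // Hypothetical producer that only supports batch calls.
    static class UppercaseBatchProducer implements RowChangeDataProducer<String> {
        @Override
        public String call(long rowId, Row row) {
            throw new UnsupportedOperationException("rows are only processed in batches");
        }

        @Override
        public List<String> callRowBatch(List<IndexedRow> rows) {
            // A real producer would send one request to the external resource here.
            List<String> results = new ArrayList<>(rows.size());
            for (IndexedRow indexedRow : rows) {
                results.add(indexedRow.row.cell.toUpperCase(Locale.ROOT));
            }
            return results;
        }

        @Override
        public int getBatchSize() { return 50; }

        @Override
        public int getMaxConcurrency() { return 2; }
    }

    public static void main(String[] args) {
        RowChangeDataProducer<String> producer = new UppercaseBatchProducer();
        List<IndexedRow> batch = Arrays.asList(
                new IndexedRow(0L, new Row("paris")),
                new IndexedRow(1L, new Row("lyon")));
        System.out.println(producer.callRowBatch(batch)); // [PARIS, LYON]
    }
}
```

The batch size of 50 and the concurrency limit of 2 are arbitrary values chosen for the example; suitable values depend on the external resource being called.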
Method Summary
Modifier and Type    Method                                 Description
T                    call(long rowId, Row row)              Compute the change data on a given row.
default List<T>      callRowBatch(List<IndexedRow> rows)    Compute the change data on a batch of consecutive rows.
default int          getBatchSize()                         The size of batches this producer would like to be called on.
default int          getMaxConcurrency()                    The maximum number of concurrent calls to this change data producer.
Method Detail
-
callRowBatch
default List<T> callRowBatch(List<IndexedRow> rows)
Compute the change data on a batch of consecutive rows. This defaults to individual calls if the method is not overridden.
- Parameters:
rows - the list of rows to fetch change data on
- Returns:
a list of the same size
-
getBatchSize
default int getBatchSize()
The size of batches this producer would like to be called on. Smaller batches can be submitted (for instance at the end of a partition). Defaults to 1.
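
To make the remark about smaller batches concrete, here is a sketch of how a caller might cut the rows of one partition into batches. The helper `splitIntoBatches` is hypothetical and is not part of OpenRefine; it only illustrates that the last batch of a partition can hold fewer rows than the requested size.

```java
import java.util.ArrayList;
import java.util.List;

class BatchSplitSketch {

    // Splits consecutive items into batches of at most batchSize elements.
    // The final batch is smaller when the item count is not a multiple of batchSize.
    static <E> List<List<E>> splitIntoBatches(List<E> items, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        List<List<E>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the view so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> rowIds = new ArrayList<>();
        for (int i = 0; i < 7; i++) {
            rowIds.add(i);
        }
        // Seven rows with a batch size of 3 give two full batches and one of size 1.
        System.out.println(splitIntoBatches(rowIds, 3)); // [[0, 1, 2], [3, 4, 5], [6]]
    }
}
```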
-
getMaxConcurrency
default int getMaxConcurrency()
The maximum number of concurrent calls to this change data producer. If 0, there is no limit to the concurrency.
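
One way a runner could honour such a limit is with a counting semaphore, treating 0 as "no limit". The sketch below is an assumption about how the contract might be enforced, not a description of OpenRefine's runners; `runAll` and `measurePeak` are hypothetical helpers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

class ConcurrencyLimitSketch {

    // Runs all tasks on a fixed thread pool. When maxConcurrency is positive, at most
    // that many tasks execute at the same time; 0 means the semaphore is skipped.
    static <T> List<T> runAll(List<Supplier<T>> tasks, int maxConcurrency, int threads)
            throws InterruptedException, ExecutionException {
        Semaphore permits = maxConcurrency > 0 ? new Semaphore(maxConcurrency) : null;
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<T>> futures = new ArrayList<>();
            for (Supplier<T> task : tasks) {
                futures.add(pool.submit(() -> {
                    if (permits != null) {
                        permits.acquire();
                    }
                    try {
                        return task.get();
                    } finally {
                        if (permits != null) {
                            permits.release();
                        }
                    }
                }));
            }
            List<T> results = new ArrayList<>();
            for (Future<T> future : futures) {
                results.add(future.get()); // results keep the order of the tasks
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    // Runs taskCount short tasks and returns the highest number seen running at once.
    static int measurePeak(int taskCount, int maxConcurrency, int threads) throws Exception {
        AtomicInteger active = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        List<Supplier<Integer>> tasks = new ArrayList<>();
        for (int i = 0; i < taskCount; i++) {
            final int id = i;
            tasks.add(() -> {
                peak.accumulateAndGet(active.incrementAndGet(), Math::max);
                try {
                    Thread.sleep(20); // stands in for a slow external call
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                active.decrementAndGet();
                return id;
            });
        }
        runAll(tasks, maxConcurrency, threads);
        return peak.get();
    }

    public static void main(String[] args) throws Exception {
        // With a limit of 2 on a pool of 4 threads, the observed peak never exceeds 2.
        System.out.println("peak with limit 2: " + measurePeak(8, 2, 4));
    }
}
```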