Package org.openrefine.runners.local
Class LocalChangeData<T>
- java.lang.Object
-
- org.openrefine.runners.local.LocalChangeData<T>
-
- All Implemented Interfaces:
ChangeData<T>
,CloseableIterable<IndexedData<T>>
public class LocalChangeData<T> extends Object implements ChangeData<T>
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from interface org.openrefine.util.CloseableIterable
CloseableIterable.Wrapper<T>
-
-
Constructor Summary
Constructors Constructor Description LocalChangeData(LocalRunner runner, PairPLL<Long,IndexedData<T>> grid, io.vavr.collection.Array<Long> parentPartitionSizes, Callable<Boolean> complete, int maxConcurrency)
Constructs a change data.
-
Method Summary
All Methods Static Methods Instance Methods Concrete Methods Modifier and Type Method Description IndexedData<T>
get(long rowId)
Returns the change data at a given row.PairPLL<Long,IndexedData<T>>
getPLL()
Runner
getRunner()
The runner which was used to create this change data.boolean
isComplete()
Whether the entire change data is available to be iterated on statically, without performing any new computation or fetching.CloseableIterator<IndexedData<T>>
iterator()
void
saveToFile(File file, ChangeDataSerializer<T> serializer)
Saves the change data to a specified directory, following OpenRefine's format for change data.ProgressingFuture<Void>
saveToFileAsync(File file, ChangeDataSerializer<T> serializer)
Saves the change data to a specified directory, following OpenRefine's format for change data, in an asynchronous way.protected static <T> CloseableIterator<Tuple2<Long,T>>
wrapStreamWithProgressReporting(long startIdx, CloseableIterator<Tuple2<Long,T>> iterator, TaskSignalling taskSignalling)
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface org.openrefine.util.CloseableIterable
drop, filter, map, take
-
-
-
-
Constructor Detail
-
LocalChangeData
public LocalChangeData(LocalRunner runner, PairPLL<Long,IndexedData<T>> grid, io.vavr.collection.Array<Long> parentPartitionSizes, Callable<Boolean> complete, int maxConcurrency)
Constructs a change data.- Parameters:
grid
- expected not to contain any null value (they should be filtered out first)parentPartitionSizes
- the size of each partition in the grid this change data was generated from (can be null if not available). This is used to compute progress as a percentage of the original grid swept through. This is more efficient than counting the number of elements in each partition of the change data.maxConcurrency
- the maximum number of concurrent calls to the underlying resource (fetcher). This is respected when saving the change data to a file.
-
-
Method Detail
-
iterator
public CloseableIterator<IndexedData<T>> iterator()
- Specified by:
iterator
in interfaceCloseableIterable<T>
-
get
public IndexedData<T> get(long rowId)
Description copied from interface:ChangeData
Returns the change data at a given row. The data encapsulated in thisIndexedData
may be null, but not the return value of this function itself.- Specified by:
get
in interfaceChangeData<T>
- Parameters:
rowId
- the 0-based row index
-
getRunner
public Runner getRunner()
Description copied from interface:ChangeData
The runner which was used to create this change data.- Specified by:
getRunner
in interfaceChangeData<T>
-
saveToFileAsync
public ProgressingFuture<Void> saveToFileAsync(File file, ChangeDataSerializer<T> serializer)
Description copied from interface:ChangeData
Saves the change data to a specified directory, following OpenRefine's format for change data, in an asynchronous way.- Specified by:
saveToFileAsync
in interfaceChangeData<T>
- Parameters:
file
- the directory where to save the gridserializer
- the serializer used to convert the items to strings- Returns:
- a future which completes once the save is complete
-
saveToFile
public void saveToFile(File file, ChangeDataSerializer<T> serializer) throws IOException, InterruptedException
Description copied from interface:ChangeData
Saves the change data to a specified directory, following OpenRefine's format for change data.- Specified by:
saveToFile
in interfaceChangeData<T>
- Parameters:
file
- the directory where to save the gridserializer
- the serializer used to convert the items to strings- Throws:
IOException
InterruptedException
-
isComplete
public boolean isComplete()
Description copied from interface:ChangeData
Whether the entire change data is available to be iterated on statically, without performing any new computation or fetching. This happens when this ChangeData object is loaded back from disk and a suitable completion marker was found.- Specified by:
isComplete
in interfaceChangeData<T>
-
getPLL
public PairPLL<Long,IndexedData<T>> getPLL()
-
wrapStreamWithProgressReporting
protected static <T> CloseableIterator<Tuple2<Long,T>> wrapStreamWithProgressReporting(long startIdx, CloseableIterator<Tuple2<Long,T>> iterator, TaskSignalling taskSignalling)
-
-