Class TextFilePLL


  • public class TextFilePLL
    extends PLL<String>
    A PLL whose contents are read from a set of text files. The files are split into partitions in a way similar to Hadoop's, with partition boundaries aligned on line breaks. This class aims to produce a number of partitions determined by the default parallelism of the PLL context.
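    The line-aligned partitioning described above can be sketched as follows. This is a minimal illustration of the Hadoop-style convention, not the actual implementation of this class: the class name LineSplitSketch and the methods readSplit and splitAll are hypothetical, and the sketch works on an in-memory byte array rather than on files. Each partition covers a byte range; a partition that does not start at offset 0 discards everything up to and including the first line break, and every partition keeps reading lines as long as they start at or before the end of its range.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of Hadoop-style line splitting; not part of the documented API.
class LineSplitSketch {

    /** Returns the lines owned by the byte range [start, end) of the data. */
    static List<String> readSplit(byte[] data, int start, int end) {
        int pos = start;
        if (start > 0) {
            // The line containing (or starting at) 'start' belongs to the previous
            // range, so skip up to and including the next line break.
            while (pos < data.length && data[pos] != '\n') {
                pos++;
            }
            pos++;
        }
        List<String> lines = new ArrayList<>();
        // A line is owned by this range if it starts at or before 'end'.
        while (pos < data.length && pos <= end) {
            int lineStart = pos;
            while (pos < data.length && data[pos] != '\n') {
                pos++;
            }
            // Splitting on the byte 0x0A is safe in UTF-8, where that byte
            // never occurs inside a multi-byte sequence.
            lines.add(new String(data, lineStart, pos - lineStart, StandardCharsets.UTF_8));
            pos++; // move past the line break
        }
        return lines;
    }

    /** Cuts the data into 'count' byte ranges of equal size and reads each one. */
    static List<List<String>> splitAll(byte[] data, int count) {
        int size = (data.length + count - 1) / count;
        List<List<String>> partitions = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            int start = Math.min(i * size, data.length);
            int end = Math.min(start + size, data.length);
            partitions.add(readSplit(data, start, end));
        }
        return partitions;
    }

    public static void main(String[] args) {
        byte[] data = "alpha\nbeta\ngamma\ndelta\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(splitAll(data, 3));
    }
}
```

    With this convention every line is read by exactly one partition, even when a range boundary falls in the middle of a line or exactly at the start of one.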
    Author:
    Antonin Delpeuch
    • Constructor Detail

      • TextFilePLL

        public TextFilePLL(PLLContext context,
                           String path,
                           Charset encoding,
                           boolean ignoreEarlyEOF)
                    throws IOException
        Constructs a PLL from a text file, or from the text files contained in a directory.
        Parameters:
        context - the associated context, whose thread pool will be used
        path - the path to the file or directory whose contents should be read
        encoding - the encoding in which the files should be read
        ignoreEarlyEOF - whether to ignore a premature end of file, such as one caused by an interrupted write
        Throws:
        IOException
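        The effect of a flag like ignoreEarlyEOF can be illustrated with the sketch below. The documentation above does not say how an early end of file is detected, so the sketch makes an assumption: the input is a gzip stream that was cut off mid-write, which Java reports as an EOFException. The class EarlyEofSketch and its methods readLines and truncatedGzip are hypothetical and are not part of this class.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch; assumes an early EOF surfaces as an EOFException.
class EarlyEofSketch {

    /** Reads all complete lines; optionally tolerates a premature end of stream. */
    static List<String> readLines(InputStream in, Charset encoding, boolean ignoreEarlyEOF)
            throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, encoding))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (EOFException e) {
            if (!ignoreEarlyEOF) {
                throw e;
            }
            // Otherwise keep the lines that were read before the stream ended.
        }
        return lines;
    }

    /** Builds a gzip stream of numbered lines and cuts it in half to simulate an interrupted write. */
    static byte[] truncatedGzip(int lineCount) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buffer)) {
            for (int i = 0; i < lineCount; i++) {
                gz.write(("line " + i + "\n").getBytes(StandardCharsets.UTF_8));
            }
        }
        byte[] full = buffer.toByteArray();
        return Arrays.copyOf(full, full.length / 2);
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new GZIPInputStream(new ByteArrayInputStream(truncatedGzip(1000)));
        List<String> lines = readLines(in, StandardCharsets.UTF_8, true);
        System.out.println("Recovered " + lines.size() + " of 1000 lines");
    }
}
```

        With the flag set to false, the same truncated input makes readLines rethrow the EOFException instead of returning a partial result.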
    • Method Detail

      • compute

        protected CloseableIterator<String> compute(Partition partition)
        Description copied from class: PLL
        Iterates over the elements of the given partition. This is the method that subclasses should implement. Because this method forces computation and ignores any caching, consumers should not call it directly but should use PLL.iterate(Partition) instead. Once the iterator is no longer needed, it should be closed, so that the underlying resources supporting it, such as open files or sockets, can be released.
        Specified by:
        compute in class PLL<String>
        Parameters:
        partition - the partition to iterate over
        Returns:
        a closeable iterator over the elements of the given partition
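        The requirement to close the iterator once it is no longer needed can be met with a try-with-resources block, as in the sketch below. The interface ClosableLines is a hypothetical stand-in for the library's CloseableIterator, whose exact definition is not shown on this page; the sketch assumes only that it combines Iterator with a close method. The lines come from an in-memory reader to keep the example self-contained.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical sketch of consuming and closing a closeable iterator.
class CloseableIterationSketch {

    /** Hypothetical stand-in for CloseableIterator<String>. */
    interface ClosableLines extends Iterator<String>, AutoCloseable {
        @Override
        void close(); // narrowed so that it throws no checked exception
    }

    /** Iterates over the lines of a reader and releases the reader on close. */
    static final class ReaderLines implements ClosableLines {
        private final BufferedReader reader;
        private String next;
        private boolean closed = false;

        ReaderLines(BufferedReader reader) {
            this.reader = reader;
            advance();
        }

        private void advance() {
            try {
                next = reader.readLine();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        @Override
        public boolean hasNext() {
            return next != null;
        }

        @Override
        public String next() {
            if (next == null) {
                throw new NoSuchElementException();
            }
            String current = next;
            advance();
            return current;
        }

        @Override
        public void close() {
            closed = true;
            try {
                reader.close();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        boolean isClosed() {
            return closed;
        }
    }

    /** Reads at most 'limit' lines, closing the iterator even if it is not exhausted. */
    static List<String> firstLines(ClosableLines lines, int limit) {
        List<String> result = new ArrayList<>();
        try (ClosableLines it = lines) {
            while (it.hasNext() && result.size() < limit) {
                result.add(it.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        ReaderLines lines = new ReaderLines(new BufferedReader(new StringReader("a\nb\nc\n")));
        System.out.println(firstLines(lines, 2) + " closed=" + lines.isClosed());
    }
}
```

        The try-with-resources block closes the iterator whether the loop finishes normally, stops early, or throws.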
      • getPartitions

        public io.vavr.collection.Array<? extends Partition> getPartitions()
        Specified by:
        getPartitions in class PLL<String>
        Returns:
        the partitions in this list
      • getParents

        public List<PLL<?>> getParents()
        Description copied from class: PLL
        Returns the PLLs that this PLL depends on, to compute its contents. This is used for debugging purposes, to display the tree of dependencies of a given PLL.
        Specified by:
        getParents in class PLL<String>
        See Also:
        PLL.getQueryTree()