Class HistogramState

  • All Implemented Interfaces:
    Serializable

    public class HistogramState
    extends Object
    implements Serializable
    Stores a histogram of numerical values, used in range facets. The histogram divides the interval spanned by the values into bins of uniform width, and the bin size must be an integer power of the bin base (10 by default).
    Author:
    Antonin Delpeuch
    See Also:
    Serialized Form
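The relationship between a value, the bin size and the bin it falls into can be sketched as follows. This is only an illustration: `BinMathSketch`, `binSize` and `binIndex` are hypothetical names that are not part of this class, and the half-open bin intervals are an assumption rather than documented behaviour.

```java
// Illustrative helper; not part of HistogramState.
final class BinMathSketch {

    /** Bin size for a given base and exponent, e.g. base 10 and exponent 2 give 100. */
    static double binSize(int base, int logBinSize) {
        return Math.pow(base, logBinSize);
    }

    /**
     * Integer index of the bin containing the value, assuming (not documented)
     * that bin k covers the half-open interval [k * size, (k + 1) * size).
     */
    static long binIndex(double value, int base, int logBinSize) {
        return (long) Math.floor(value / binSize(base, logBinSize));
    }
}
```

Using `Math.floor` rather than a plain cast keeps negative values in the bin below zero instead of rounding them towards it.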
    • Constructor Detail

      • HistogramState

        public HistogramState(long numericCount,
                              long nonNumericCount,
                              long errorCount,
                              long blankCount,
                              int logBinSize,
                              long minBin,
                              long[] bins)
        Creates a state where multiple distinct numeric values are stored.
        Parameters:
        numericCount - number of numeric values
        nonNumericCount - number of non-numeric values (text, dates…)
        errorCount - number of error values
        blankCount - number of blank values
        logBinSize - size of a bin, expressed as the exponent applied to the base (10)
        minBin - lower boundary of the lowest bin, divided by the bin size
        bins - array of bins, each of which contains the number of values in that bin
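As a rough sketch of how the `minBin` and `bins` arguments could be derived from raw data for a fixed `logBinSize`, consider the following. `HistogramArgsSketch` is a hypothetical class, not the way this library computes its histograms, and it assumes a non-empty input and half-open bins.

```java
// Hypothetical helper that derives constructor-style arguments from raw values.
final class HistogramArgsSketch {
    final long minBin;   // lower boundary of the lowest bin, divided by the bin size
    final long[] bins;   // number of values falling into each bin

    /** Assumes values is non-empty and that bin k covers [k * size, (k + 1) * size). */
    HistogramArgsSketch(double[] values, int logBinSize) {
        double size = Math.pow(10, logBinSize);
        long lowest = Long.MAX_VALUE;
        long highest = Long.MIN_VALUE;
        // First pass: find the range of bin indices that the values occupy.
        for (double v : values) {
            long k = (long) Math.floor(v / size);
            lowest = Math.min(lowest, k);
            highest = Math.max(highest, k);
        }
        minBin = lowest;
        bins = new long[(int) (highest - lowest + 1)];
        // Second pass: count the values in each bin, offset by the lowest index.
        for (double v : values) {
            long k = (long) Math.floor(v / size);
            bins[(int) (k - lowest)]++;
        }
    }
}
```

For the values 3, 12, 17 and 45 with `logBinSize` 1, this yields `minBin` 0 and the counts 1, 2, 0, 0, 1.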
      • HistogramState

        public HistogramState(long numericCount,
                              long nonNumericCount,
                              long errorCount,
                              long blankCount,
                              double singleValue)
        Creates a state where all numeric values are equal to a single value.
    • Method Detail

      • getNumericCount

        public long getNumericCount()
      • getNonNumericCount

        public long getNonNumericCount()
      • getErrorCount

        public long getErrorCount()
      • getBlankCount

        public long getBlankCount()
      • getLogBinSize

        public int getLogBinSize()
      • getBinSize

        public double getBinSize()
      • getMinBin

        public long getMinBin()
        The start of the first bin, represented as an integer. To get the actual value (as a double), multiply by the bin size.
      • getMaxBin

        public long getMaxBin()
        The end of the last bin, represented as an integer. To get the actual value (as a double), multiply by the bin size.
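The conversion described for `getMinBin()` and `getMaxBin()` amounts to a single multiplication. The sketch below uses a hypothetical standalone helper, `BinBoundarySketch.actualBoundary`, and assumes a base of 10; with a real instance, the equivalent would presumably be `getMinBin() * getBinSize()`.

```java
// Illustrative helper; not part of HistogramState.
final class BinBoundarySketch {

    /** Converts an integer bin boundary to its actual value, assuming base 10. */
    static double actualBoundary(long bin, int logBinSize) {
        return bin * Math.pow(10, logBinSize);
    }
}
```

For instance, a boundary of -2 with `logBinSize` 1 corresponds to the actual value -20.0.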
      • getBins

        public long[] getBins()
      • getSingleValue

        public double getSingleValue()
      • rescale

        public HistogramState rescale(int newLogBinSize)
        Given a larger bin size (and therefore coarser bins), returns a new version of this facet state in which neighbouring bins are merged to obtain the desired bin size.
        Parameters:
        newLogBinSize - the new exponent of 10 to use as the bin size
        Returns:
        a new histogram state with the coarser bins
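One way such a merge could work is sketched below. This is not the actual implementation: `RescaleSketch`, `coarseMinBin` and `mergeBins` are hypothetical names, and the sketch assumes base 10, a non-empty `bins` array and a new exponent that is at least as large as the old one.

```java
// Illustrative sketch of merging fine bins into coarser ones.
final class RescaleSketch {

    /** Number of fine bins per coarse bin: 10^(newLog - oldLog). */
    static long factor(int oldLog, int newLog) {
        if (newLog < oldLog) {
            throw new IllegalArgumentException("new bin size must not be smaller");
        }
        long f = 1;
        for (int i = oldLog; i < newLog; i++) {
            f *= 10;
        }
        return f;
    }

    /** Lowest bin index after rescaling; floorDiv keeps negative indices correct. */
    static long coarseMinBin(long minBin, int oldLog, int newLog) {
        return Math.floorDiv(minBin, factor(oldLog, newLog));
    }

    /** Sums the counts of all fine bins that fall into the same coarse bin. */
    static long[] mergeBins(long[] bins, long minBin, int oldLog, int newLog) {
        long f = factor(oldLog, newLog);
        long newMin = Math.floorDiv(minBin, f);
        long newLast = Math.floorDiv(minBin + bins.length - 1, f);
        long[] merged = new long[(int) (newLast - newMin + 1)];
        for (int i = 0; i < bins.length; i++) {
            long coarse = Math.floorDiv(minBin + i, f);
            merged[(int) (coarse - newMin)] += bins[i];
        }
        return merged;
    }
}
```

With counts 1, 2, 3, 4 starting at bin -2 and a rescale from exponent 0 to exponent 1, the first two bins land in coarse bin -1 and the last two in coarse bin 0, giving the counts 3 and 7.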
      • addCounts

        public HistogramState addCounts(long nonNumericCount,
                                        long errorCount,
                                        long blankCount)
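        Adds counts for values outside the numeric range (non-numeric values, errors and blanks).
        Parameters:
        nonNumericCount - number of non-numeric values to add
        errorCount - number of error values to add
        blankCount - number of blank values to add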
        Returns:
        a new histogram state
      • extend

        public HistogramState extend(long newMinBin,
                                     long newMaxBin)
        Extends the bins to new bounds, filling the newly created bins with zeroes.
        Parameters:
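        newMinBin - the new start of the first bin, represented as an integer
        newMaxBin - the new end of the last bin, represented as an integer
        Returns:
        a new histogram state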
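Zero-filled extension of a bin array might look like the following sketch. `ExtendSketch.extendBins` is a hypothetical helper, not this class's implementation, and it assumes that the end of the last bin equals `minBin + bins.length` and that the new bounds enclose the old ones.

```java
// Illustrative sketch of extending a bin array to wider bounds.
final class ExtendSketch {

    /**
     * Copies the existing counts into a wider array covering
     * [newMinBin, newMaxBin); the added bins keep their default value of zero.
     */
    static long[] extendBins(long[] bins, long minBin, long newMinBin, long newMaxBin) {
        long maxBin = minBin + bins.length; // assumed end of the last bin
        if (newMinBin > minBin || newMaxBin < maxBin) {
            throw new IllegalArgumentException("new bounds must enclose the old ones");
        }
        long[] extended = new long[(int) (newMaxBin - newMinBin)];
        System.arraycopy(bins, 0, extended, (int) (minBin - newMinBin), bins.length);
        return extended;
    }
}
```

Extending the counts 4, 5 starting at bin 2 to the bounds 0 and 6 yields 0, 0, 4, 5, 0, 0.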
      • hashCode

        public int hashCode()
        Overrides:
        hashCode in class Object