module Nuggets::Array::HistogramMixin

Constants

FORMATS

Provides some default formats for formatted_histogram.

Example:

(default)         ab  [==]  2
(percent)         xyz [===] 3 (37.50%)
(numeric)          42 [==]  2
(numeric_percent) 123 [=]   1 (12.50%)

The “numeric” variants format the item as a (decimal) number.

HistogramItem

Encapsulates a histogram item and provides the following attributes (see also annotated_histogram):

item

The original item

freq

The item's frequency in the collection

percentage

The item's frequency as a percentage of the collection's size

max_freq

The maximum frequency in the collection

max_freq_length

The “width” of the maximum frequency, i.e. the number of characters in its String representation

max_item_length

The length of the longest item in the collection, measured on each item's String representation
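
To make the attribute list concrete, here is a minimal sketch of such a record. SketchHistogramItem is a hypothetical stand-in, not the library's own definition; the member order is an assumption taken from the HistogramItem.new call in annotated_histogram.

```ruby
# Hypothetical stand-in for HistogramItem; the member order follows the
# HistogramItem.new call shown in annotated_histogram.
SketchHistogramItem = Struct.new(
  :item, :freq, :max_freq, :max_freq_length, :max_item_length, :percentage
)

# Values for the item "b" in %w[a b b c c c]: it occurs 2 times out of 6,
# the most frequent item occurs 3 times, and every item is 1 character wide.
entry = SketchHistogramItem.new('b', 2, 3, 1, 1, 2 / (6 / 100.0))

puts entry.item
puts entry.freq
puts entry.percentage.round(2)
```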

Public Instance Methods

annotated_histogram → anArray
annotated_histogram { |hist_item| ... } → aHash

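Calculates the histogram for array and either yields each histogram item (see HistogramItem) to the block or, without a block, returns an Array of the histogram items. With a block, the return value is the histogram itself: an Array of [item, freq] pairs if the items could be sorted, otherwise the unsorted Hash.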

# File lib/nuggets/array/histogram_mixin.rb, line 96
def annotated_histogram
  hist, items = histogram, []

  percentage = size / 100.0

  max_freq = hist.values.max
  max_freq_length = max_freq.to_s.length

  max_item_length = hist.keys.map { |item| item.to_s.length }.max

  # try to sort the histogram hash
  begin
    hist = hist.sort
  rescue ::ArgumentError
  end

  hist.each { |item, freq|
    hist_item = HistogramItem.new(
      item, freq, max_freq, max_freq_length, max_item_length, freq / percentage
    )

    block_given? ? yield(hist_item) : items << hist_item
  }

  block_given? ? hist : items
end
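
The two calling styles can be tried with the following self-contained sketch. HistogramSketch and its Item struct are hypothetical stand-ins that mirror the listing above so that the example runs without the library; with the library loaded, the same calls would presumably go through the mixin itself.

```ruby
# Hypothetical stand-ins that mirror the documented source.
module HistogramSketch
  Item = Struct.new(
    :item, :freq, :max_freq, :max_freq_length, :max_item_length, :percentage
  )

  def histogram
    hist = Hash.new(0)
    each { |x| hist[block_given? ? yield(x) : x] += 1 }
    hist
  end

  def annotated_histogram
    hist, items = histogram, []
    percentage = size / 100.0
    max_freq = hist.values.max
    max_freq_length = max_freq.to_s.length
    max_item_length = hist.keys.map { |item| item.to_s.length }.max

    begin
      hist = hist.sort
    rescue ArgumentError
      # Keys are not mutually comparable; keep the unsorted Hash.
    end

    hist.each { |item, freq|
      hist_item = Item.new(
        item, freq, max_freq, max_freq_length, max_item_length, freq / percentage
      )
      block_given? ? yield(hist_item) : items << hist_item
    }

    block_given? ? hist : items
  end
end

words = %w[pear fig fig apple fig pear].extend(HistogramSketch)

# Without a block: an Array of items, sorted by item when the keys compare.
items = words.annotated_histogram
items.each { |i| puts "#{i.item}: #{i.freq} (#{i.percentage.round(1)}%)" }

# With a block: each item is yielded instead of being collected.
frequencies = []
words.annotated_histogram { |i| frequencies << i.freq }
```

The max_* attributes are computed once for the whole collection and repeated on every item, which lets a formatter align columns without a second pass.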
formatted_histogram([format[, indicator]]) → aString

Returns the histogram of array as a formatted String according to format, using indicator to draw the frequency bar.

format may be a Symbol indicating one of the provided default formats (see FORMATS) or a format String (see Kernel#sprintf) that will receive the following arguments (in order):

  1. max_item_length (Integer)

  2. item (String)

  3. “frequency_bar” (String)

  4. “padding” (passed as two arguments: the padding width as an Integer, followed by an empty String to be padded)

  5. max_freq_length (Integer)

  6. freq (Integer)

  7. percentage (Float, optional; only passed if the format String contains a literal '%%')

See HistogramItem for further details on the individual arguments.
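
As a sketch of how a custom format String consumes these arguments, the snippet below builds one line by hand. The format String is hypothetical, modelled on the (percent) sample line under FORMATS; the library's actual FORMATS entries may differ. The argument values are made up for illustration, and the padding is passed as a width plus an empty String, as in the source listing of formatted_histogram.

```ruby
# Hypothetical format string; each '*' takes its field width from the
# argument list, and '%%' prints a literal percent sign.
fmt = '%-*s [%s]%-*s %*d (%.2f%%)'

# Illustrative arguments for the item "ab": freq 2 of 8, max_freq 3,
# widest item 3 characters, indicator "=".
args = [
  3, 'ab',   # max_item_length, item (left-justified)
  '==',      # frequency bar: indicator * freq
  1, '',     # padding width (max_freq - freq) and the empty String to pad
  1, 2,      # max_freq_length, freq (right-justified)
  25.0       # percentage
]

line = fmt % args
puts line  # ab  [==]  2 (25.00%)
```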

# File lib/nuggets/array/histogram_mixin.rb, line 142
def formatted_histogram(format = :default, indicator = '=')
  format = FORMATS[format] if FORMATS.key?(format)
  raise ::TypeError, "String expected, got #{format.class}" unless format.is_a?(::String)

  include_percentage = format.include?('%%')
  indicator_length   = indicator.length

  lines = []

  annotated_histogram { |hist|
    arguments = [
      hist.max_item_length, hist.item,                     # item (padded)
      indicator * hist.freq,                               # indicator bar
      (hist.max_freq - hist.freq) * indicator_length, '',  # indicator padding
      hist.max_freq_length, hist.freq                      # frequency (padded)
    ]

    arguments << hist.percentage if include_percentage     # percentage (optional)

    lines << format % arguments
  }

  lines.join("\n")
end
histogram → aHash
histogram { |x| ... } → aHash

Calculates the frequency histogram of the values in array. Returns a Hash that maps each value (or, if a block is given, the block's result for that value) to its frequency.

# File lib/nuggets/array/histogram_mixin.rb, line 68
def histogram
  hist = ::Hash.new(0)
  each { |x| hist[block_given? ? yield(x) : x] += 1 }
  hist
end
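
A short usage sketch follows. HistogramOnly is a hypothetical stand-in that repeats the listing above so the example runs on its own. Without a block, the result should agree with Ruby's built-in Enumerable#tally (Ruby 2.7 and later).

```ruby
# Hypothetical stand-in that mirrors the documented histogram method.
module HistogramOnly
  def histogram
    hist = Hash.new(0)
    each { |x| hist[block_given? ? yield(x) : x] += 1 }
    hist
  end
end

values = [1, 2, 2, 3, 3, 3].extend(HistogramOnly)

# Without a block, each value is counted as-is.
by_value = values.histogram

# With a block, values are grouped by the block's result before counting.
by_parity = values.histogram { |x| x.odd? }

p by_value
p by_parity
p by_value == values.tally
```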
pmf(&block)

Alias for: probability_mass_function

probability_mass_function → aHash
probability_mass_function { |x| ... } → aHash

Calculates the probability mass function (normalized histogram) of the values in array. Returns a Hash that maps each value (or, if a block is given, the block's result for that value) to its probability, i.e. its frequency (see histogram) divided by the size of array.

# File lib/nuggets/array/histogram_mixin.rb, line 82
def probability_mass_function(&block)
  hist, n = histogram(&block), size.to_f
  hist.each { |k, v| hist[k] = v / n }
end
Also aliased as: pmf
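
The normalization can be checked with a small self-contained sketch. PmfSketch is a hypothetical stand-in that repeats the two listings above; the alias is declared with alias_method here, which is an assumption about how the library sets it up.

```ruby
# Hypothetical stand-in that mirrors the documented methods.
module PmfSketch
  def histogram
    hist = Hash.new(0)
    each { |x| hist[block_given? ? yield(x) : x] += 1 }
    hist
  end

  def probability_mass_function(&block)
    hist, n = histogram(&block), size.to_f
    # Overwrites each count with count / n; Hash#each returns the Hash.
    hist.each { |k, v| hist[k] = v / n }
  end

  alias_method :pmf, :probability_mass_function
end

rolls = [1, 1, 2, 4].extend(PmfSketch)

probabilities = rolls.pmf                   # 1 occurs in 2 of 4 rolls
by_parity     = rolls.pmf { |x| x.even? }   # 2 odd rolls, 2 even rolls

p probabilities
p by_parity
p probabilities.values.sum
```

The probabilities sum to 1 for any non-empty array, since each count is divided by the total number of elements.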