module Nuggets::Array::HistogramMixin



FORMATS

Provides some default formats for formatted_histogram. Sample output for each format:


(default)         ab  [==]  2
(percent)         xyz [===] 3 (37.50%)
(numeric)          42 [==]  2
(numeric_percent) 123 [=]   1 (12.50%)

The “numeric” variants format the item as a (decimal) number.


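HistogramItem

Encapsulates a histogram item and provides the following attributes (see also annotated_histogram):

item
The original item

freq
The item's frequency in the collection

percentage
The item's frequency as a percentage of the collection's size

max_freq
The maximum frequency in the collection

max_freq_length
The maximum frequency's “width” (the number of characters in its string representation)

max_item_length
The maximum item length in the collection (the longest string representation of any item)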
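
A minimal sketch of such a record, assuming a plain Struct. The name SketchItem and the sample data are hypothetical; the attribute names match the accessors used by formatted_histogram, and the library's own definition is not shown on this page.

```ruby
# Hypothetical stand-in for HistogramItem (not the library's definition).
SketchItem = Struct.new(
  :item, :freq, :max_freq, :max_freq_length, :max_item_length, :percentage
)

data     = %w[a b b c c c]   # six elements; "c" is the most frequent
freq     = data.count('b')   # frequency of the item "b"
max_freq = data.count('c')   # highest frequency in the collection

entry = SketchItem.new(
  'b', freq,
  max_freq, max_freq.to_s.length,   # maximum frequency and its printed width
  data.map(&:length).max,           # longest item, in characters
  freq / (data.size / 100.0)        # share of the collection, in percent
)

puts entry.freq
puts entry.max_freq_length
puts entry.percentage.round(2)
```

The percentage is computed the same way as in annotated_histogram: the frequency divided by one percent of the collection's size.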

Public Instance Methods

annotated_histogram → anArray
annotated_histogram { |hist_item| ... } → aHash

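Calculates the histogram for array. With a block, yields each histogram item (see HistogramItem) to the block; without a block, returns an Array of the histogram items. Items are processed in sorted order whenever they can be compared with each other.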

# File lib/nuggets/array/histogram_mixin.rb, line 96
def annotated_histogram
  hist, items = histogram, []

  # one percent of the collection's size
  percentage = size / 100.0

  max_freq = hist.values.max
  max_freq_length = max_freq.to_s.length

  max_item_length = hist.keys.map { |item| item.to_s.length }.max

  # try to sort the histogram hash; keep it as is if the items are not comparable
  begin
    hist = hist.sort
  rescue ::ArgumentError
  end

  hist.each { |item, freq|
    hist_item = HistogramItem.new(
      item, freq, max_freq, max_freq_length, max_item_length, freq / percentage
    )

    block_given? ? yield(hist_item) : items << hist_item
  }

  block_given? ? hist : items
end

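
The sorting fallback can be tried in isolation. The following is a sketch only: sorted_pairs is a hypothetical helper, and unlike the method above it returns an Array of pairs in both cases so that the result is easy to compare.

```ruby
# Hypothetical helper: returns the pairs sorted by key, or in insertion
# order when the keys cannot be compared with each other.
def sorted_pairs(hist)
  hist.sort
rescue ArgumentError
  hist.to_a
end

comparable   = sorted_pairs({ 'b' => 2, 'a' => 1 })   # String keys sort fine
incomparable = sorted_pairs({ 1 => 2, 'a' => 1 })     # Integer vs. String keys

p comparable
p incomparable
```

Sorting a Hash compares its key/value pairs as Arrays, so a collection that mixes, say, Integers and Strings raises ArgumentError and keeps its original order.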
formatted_histogram([format[, indicator]]) → aString

Returns the histogram of array as a formatted String according to format, using indicator to draw the frequency bar.

format may be a Symbol indicating one of the provided default formats (see FORMATS) or a format String (see Kernel#sprintf) that will receive the following arguments (in order):

  1. max_item_length (Integer)

  2. item (String)

  3. “frequency_bar” (String)

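  4. “padding” (String; passed as an Integer width followed by an empty String, so that it can be consumed by a %*s directive)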

  5. max_freq_length (Integer)

  6. freq (Integer)

  7. percentage (Float, optional)

See HistogramItem for further details on the individual arguments.
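
To illustrate how a custom format String consumes these arguments, here is a sketch with hand-written values. The format string is an assumption chosen to fit the argument order above; the actual FORMATS values are not shown on this page.

```ruby
# Assumed format string (not taken from FORMATS). Each %*s / %*d directive
# consumes two arguments: a width, then the value to print.
format = '%-*s [%s%*s] %*d (%.2f%%)'

arguments = [
  3, 'xyz',   # max_item_length, item (left-aligned)
  '===',      # frequency bar
  0, '',      # padding width, empty padding string
  1, 3,       # max_freq_length, freq
  37.5        # percentage
]

line = format % arguments
puts line   # prints "xyz [===] 3 (37.50%)"
```

Because the padding is an empty String printed at a given width, shorter bars are filled up with spaces to the length of the longest bar.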

# File lib/nuggets/array/histogram_mixin.rb, line 142
def formatted_histogram(format = :default, indicator = '=')
  format = FORMATS[format] if FORMATS.key?(format)
  raise ::TypeError, "String expected, got #{format.class}" unless format.is_a?(::String)

  include_percentage = format.include?('%%')
  indicator_length   = indicator.length

  lines = []

  annotated_histogram { |hist|
    arguments = [
      hist.max_item_length, hist.item,                     # item (padded)
      indicator * hist.freq,                               # indicator bar
      (hist.max_freq - hist.freq) * indicator_length, '',  # indicator padding
      hist.max_freq_length, hist.freq                      # frequency (padded)
    ]

    arguments << hist.percentage if include_percentage     # percentage (optional)

    lines << format % arguments
  }

  # one line per histogram item
  lines.join("\n")
end

histogram → aHash
histogram { |x| ... } → aHash

Calculates the frequency histogram of the values in array. Returns a Hash that maps each value (or, if a block is given, the block's result for that value) to its frequency.

# File lib/nuggets/array/histogram_mixin.rb, line 68
def histogram
  # unseen keys start at a count of 0
  hist = ::Hash.new(0)
  each { |x| hist[block_given? ? yield(x) : x] += 1 }
  hist
end

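
The counting technique works on any Array without the mixin. A standalone sketch, where frequency_table is a hypothetical name used for illustration only:

```ruby
# Hypothetical standalone version of the counting step: a Hash with a
# default value of 0 lets every key be incremented without initialization.
def frequency_table(values)
  table = Hash.new(0)
  values.each { |value| table[block_given? ? yield(value) : value] += 1 }
  table
end

plain   = frequency_table(%w[a b a c a])
by_size = frequency_table(%w[a bb cc ddd]) { |word| word.length }

p plain     # counts per value
p by_size   # counts per word length
```

With a block, values are grouped by the block's result, so the second call counts how many words share each length.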
probability_mass_function → aHash
probability_mass_function { |x| ... } → aHash

Calculates the probability mass function (normalized histogram) of the values in array. Returns a Hash that maps each value (or, if a block is given, the block's result for that value) to its probability, i.e. its frequency (see histogram) divided by the size of array.

# File lib/nuggets/array/histogram_mixin.rb, line 82
def probability_mass_function(&block)
  hist, n = histogram(&block), size.to_f
  # replace each frequency with its share of the total; returns hist
  hist.each { |k, v| hist[k] = v / n }
end

Also aliased as: pmf
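
The normalization can also be sketched without the mixin. This version swaps in Enumerable#tally and Hash#transform_values (so it assumes Ruby 2.7 or later) instead of the histogram method; pmf_sketch is a hypothetical name.

```ruby
# Hypothetical standalone version: count with tally, then divide each
# count by the number of elements to obtain probabilities.
def pmf_sketch(values)
  total = values.size.to_f
  values.tally.transform_values { |count| count / total }
end

pmf = pmf_sketch(%w[a b a c])

p pmf
p pmf.values.sum   # the probabilities add up to 1.0
```

Converting the size to a Float first avoids integer division, which would otherwise turn every probability below 1 into 0.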