module Nuggets::Array::HistogramMixin

Constants

FORMATS

Provides some default formats for formatted_histogram.

Example:

(default)         ab  [==]  2
(percent)         xyz [===] 3 (37.50%)
(numeric)          42 [==]  2
(numeric_percent) 123 [=]   1 (12.50%)

The “numeric” variants format the item as a (decimal) number.
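As a rough illustration of how such a format is consumed, the sketch below formats two lines with plain String#% (see Kernel#sprintf). The two format strings are assumptions inferred from the example output above, not copied from the library, and the argument values are made up.

```ruby
# Hypothetical format strings modelled on the example output above;
# the actual values stored in FORMATS may differ.
default_format = '%-*s [%s]%*s %*d'
percent_format = '%-*s [%s]%*s %*d (%.2f%%)'

# Arguments: item width, item, bar, padding width, '', frequency width, frequency
default_line = default_format % [3, 'ab', '==', 1, '', 1, 2]

# The percent variant takes the percentage as one extra argument.
percent_line = percent_format % [3, 'xyz', '===', 0, '', 1, 3, 37.5]

puts default_line # ab  [==]  2
puts percent_line # xyz [===] 3 (37.50%)
```

Each `*` in a directive takes its field width from the argument list, which is why widths and values alternate.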

HistogramItem

Encapsulates a histogram item and provides the following attributes (see also annotated_histogram):

item

The original item

freq

The item's frequency in the collection

percentage

The item's frequency as a percentage of the collection's size

max_freq

The maximum frequency in the collection

max_freq_length

The “width” of the maximum frequency, i.e. the length of its string representation

max_item_length

The length of the longest item's string representation in the collection
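To make the attributes concrete, here is a minimal sketch that fills in one such item by hand. HistItem is a hypothetical stand-in for HistogramItem; its member order follows the constructor call in annotated_histogram and is otherwise an assumption, and the sample values are made up.

```ruby
# Hypothetical stand-in for HistogramItem (not the library's definition).
HistItem = Struct.new(
  :item, :freq, :max_freq, :max_freq_length, :max_item_length, :percentage
)

values = %w[ab xyz xyz xyz ab]

# Count occurrences of each value.
counts = Hash.new(0)
values.each { |v| counts[v] += 1 }

max_freq = counts.values.max

ab = HistItem.new(
  'ab',
  counts['ab'],                            # freq
  max_freq,                                # highest frequency of any item
  max_freq.to_s.length,                    # width of that frequency
  counts.keys.map { |k| k.to_s.length }.max, # width of the longest item
  counts['ab'] * 100.0 / values.size       # share of the collection, in percent
)

puts ab.freq            # 2
puts ab.max_item_length # 3
```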

Public Instance Methods

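annotated_histogram → anArray
annotated_histogram { |hist_item| ... } → anArray or aHash

Calculates the histogram for array. With a block, yields each histogram item (see HistogramItem) and returns the histogram itself: an Array of [item, freq] pairs sorted by item, or the unsorted Hash if the items cannot be compared. Without a block, returns an Array of the histogram items.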

    # File lib/nuggets/array/histogram_mixin.rb
    def annotated_histogram
      hist, items = histogram, []

      percentage = size / 100.0

      max_freq = hist.values.max
      max_freq_length = max_freq.to_s.length

      max_item_length = hist.keys.map { |item| item.to_s.length }.max

      # try to sort the histogram hash
      begin
        hist = hist.sort
      rescue ::ArgumentError
      end

      hist.each { |item, freq|
        hist_item = HistogramItem.new(
          item, freq, max_freq, max_freq_length, max_item_length, freq / percentage
        )

        block_given? ? yield(hist_item) : items << hist_item
      }

      block_given? ? hist : items
    end

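The sorting step decides both the order in which items are yielded and what the block form returns. A small sketch of the two cases in plain Ruby (it does not use the library, and the sample hashes are made up):

```ruby
# Frequencies keyed by comparable items: Hash#sort returns an Array of
# [item, freq] pairs ordered by item.
comparable = { 'b' => 1, 'a' => 2 }
sorted = comparable.sort

# Frequencies keyed by items of mixed types cannot be ordered, so sorting
# raises ArgumentError and the unsorted Hash is kept instead.
mixed = { 1 => 1, 'a' => 2 }
ordered =
  begin
    mixed.sort
  rescue ArgumentError
    mixed
  end

p sorted  # [["a", 2], ["b", 1]]
p ordered # the original Hash, unchanged
```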
formatted_histogram([format[, indicator]]) → aString

Returns the histogram of array as a formatted String according to format, using indicator to draw the frequency bar.

format may be a Symbol indicating one of the provided default formats (see FORMATS) or a format String (see Kernel#sprintf) that will receive the following arguments (in order):

  1. max_item_length (Integer)

  2. item (String)

  3. “frequency_bar” (String)

  4. “padding” (passed as two values: an Integer width followed by an empty String)

  5. max_freq_length (Integer)

  6. freq (Integer)

  7. percentage (Float, optional)

See HistogramItem for further details on the individual arguments.

    # File lib/nuggets/array/histogram_mixin.rb
    def formatted_histogram(format = :default, indicator = '=')
      format = FORMATS[format] if FORMATS.key?(format)
      raise ::TypeError, "String expected, got #{format.class}" unless format.is_a?(::String)

      include_percentage = format.include?('%%')
      indicator_length   = indicator.length

      lines = []

      annotated_histogram { |hist|
        arguments = [
          hist.max_item_length, hist.item,                     # item (padded)
          indicator * hist.freq,                               # indicator bar
          (hist.max_freq - hist.freq) * indicator_length, '',  # indicator padding
          hist.max_freq_length, hist.freq                      # frequency (padded)
        ]

        arguments << hist.percentage if include_percentage     # percentage (optional)

        lines << format % arguments
      }

      lines.join("\n")
    end

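The way a custom format String and indicator interact can be sketched without the library. format_histogram below is a hypothetical standalone helper that assembles the same arguments as the method above (without the optional percentage); the format String and sample data are made up.

```ruby
# Hypothetical helper, not part of Nuggets.
def format_histogram(values, format, indicator = '=')
  counts = Hash.new(0)
  values.each { |v| counts[v] += 1 }

  max_freq        = counts.values.max
  max_freq_length = max_freq.to_s.length
  max_item_length = counts.keys.map { |k| k.to_s.length }.max

  counts.sort.map { |item, freq|
    format % [
      max_item_length, item,                     # item, padded to the widest item
      indicator * freq,                          # frequency bar
      (max_freq - freq) * indicator.length, '',  # blanks up to the longest bar
      max_freq_length, freq                      # frequency, padded
    ]
  }.join("\n")
end

puts format_histogram(%w[ab xyz ab xyz xyz], '%-*s | %s%*s | %*d', '#')
# ab  | ##  | 2
# xyz | ### | 3
```

Multiplying the padding by the indicator's length keeps the columns aligned when the indicator is longer than one character.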
histogram → aHash
histogram { |x| ... } → aHash

Calculates the frequency histogram of the values in array. Returns a Hash that maps each value (or, if a block is given, the block's result for that value) to its frequency.

    # File lib/nuggets/array/histogram_mixin.rb
    def histogram
      hist = ::Hash.new(0)
      each { |x| hist[block_given? ? yield(x) : x] += 1 }
      hist
    end

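The difference between the plain and the block form is easiest to see side by side. count_values below is a hypothetical standalone function that mirrors the counting logic above; the word list is made up.

```ruby
# Hypothetical stand-in for Array#histogram, written as a plain function.
def count_values(values)
  counts = Hash.new(0) # missing keys start at 0
  values.each { |x| counts[block_given? ? yield(x) : x] += 1 }
  counts
end

words = %w[apple fig pear fig]

by_value  = count_values(words)                         # keyed by the value itself
by_length = count_values(words) { |word| word.length }  # keyed by the block's result

p by_value  # {"apple"=>1, "fig"=>2, "pear"=>1}
p by_length # {5=>1, 3=>2, 4=>1}
```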
pmf(&block)

Alias for: probability_mass_function

probability_mass_function → aHash
probability_mass_function { |x| ... } → aHash

Calculates the probability mass function (normalized histogram) of the values in array. Returns a Hash that maps each value (or, if a block is given, the block's result for that value) to its probability, i.e. its frequency (via histogram) divided by the size of array.

    # File lib/nuggets/array/histogram_mixin.rb
    def probability_mass_function(&block)
      hist, n = histogram(&block), size.to_f
      hist.each { |k, v| hist[k] = v / n }
    end

Also aliased as: pmf
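The normalization amounts to dividing every frequency by the number of elements, so the resulting probabilities sum to 1. A minimal standalone sketch follows; relative_frequencies is a hypothetical helper rather than the library method, and the input is made up.

```ruby
# Hypothetical helper that mirrors the normalization step.
def relative_frequencies(values)
  counts = Hash.new(0)
  values.each { |x| counts[x] += 1 }

  n = values.size.to_f
  counts.transform_values { |freq| freq / n } # frequency -> probability
end

pmf = relative_frequencies([1, 1, 2, 4])

p pmf            # {1=>0.5, 2=>0.25, 4=>0.25}
p pmf.values.sum # 1.0
```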