137

I have the following Array = ["Jason", "Jason", "Teresa", "Judah", "Michelle", "Judah", "Judah", "Allison"]

How do I produce a count for each identical element?

Where:
"Jason" = 2, "Judah" = 3, "Allison" = 1, "Teresa" = 1, "Michelle" = 1?

or produce a hash Where:

Where: hash = { "Jason" => 2, "Judah" => 3, "Allison" => 1, "Teresa" => 1, "Michelle" => 1 }

1
  • 10
    As of Ruby 2.7 you can use Enumerable#tally. More info here. Commented Jun 7, 2019 at 11:37

15 Answers 15

204

Ruby v2.7+ (latest)

As of ruby v2.7.0 (released December 2019), the core language now includes Enumerable#tally - a new method, designed specifically for this problem:

names = ["Jason", "Jason", "Teresa", "Judah", "Michelle", "Judah", "Judah", "Allison"]

names.tally
#=> {"Jason"=>2, "Teresa"=>1, "Judah"=>3, "Michelle"=>1, "Allison"=>1}

Ruby v2.4+ (EOL ruby version, but still valid code on newer versions)

The following code was not possible in standard ruby when this question was first asked (February 2011), as it uses:

These modern additions to Ruby enable the following implementation:

names = ["Jason", "Jason", "Teresa", "Judah", "Michelle", "Judah", "Judah", "Allison"]

names.group_by(&:itself).transform_values(&:count)
#=> {"Jason"=>2, "Teresa"=>1, "Judah"=>3, "Michelle"=>1, "Allison"=>1}

Ruby v2.2+ (EOL ruby version, but still valid code on newer versions)

For even older ruby versions, without access to the above mentioned Hash#transform_values method, you could instead use Array#to_h, which was added to Ruby v2.1.0 (released December 2013):

names.group_by(&:itself).map { |k,v| [k, v.length] }.to_h
#=> {"Jason"=>2, "Teresa"=>1, "Judah"=>3, "Michelle"=>1, "Allison"=>1}

For even older ruby versions (<= 2.1), there are several ways to solve this, but (in my opinion) there is no clear-cut "best" way. See the other answers to this post.

Sign up to request clarification or add additional context in comments.

4 Comments

I was about to post :P. Is there any discernible difference between using count instead of size/length?
@SagarPandya No, there is no difference. Unlike Array#size and Array#length, Array#count can take an optional argument or block; but if used with neither then its implementation is identical. More specifically, all three methods call LONG2NUM(RARRAY_LEN(ary)) under the hood: count / length
Extra credit! Sort by count .group_by(&:itself).transform_values(&:count).sort_by{|k, v| v}.reverse
@Abram you can sort_by{ |k, v| -v}, no reverse needed! ;-)
133
names.inject(Hash.new(0)) { |total, e| total[e] += 1 ;total}

gives you

{"Jason"=>2, "Teresa"=>1, "Judah"=>3, "Michelle"=>1, "Allison"=>1} 

4 Comments

+1 Like the selected answer, but I prefer the use of inject and no "external" variable.
If you use each_with_object instead of inject you don't have to return (;total) at the block.
For posterity, this is what @mfilej means: array.each_with_object(Hash.new(0)){|string, hash| hash[string] += 1}
From Ruby 2.7, you can simply do: names.tally.
95
names = ["Jason", "Jason", "Teresa", "Judah", "Michelle", "Judah", "Judah", "Allison"]
counts = Hash.new(0)
names.each { |name| counts[name] += 1 }
# => {"Jason" => 2, "Teresa" => 1, ....

Comments

28

Now using Ruby 2.2.0 you can leverage the itself method.

names = ["Jason", "Jason", "Teresa", "Judah", "Michelle", "Judah", "Judah", "Allison"]
counts = {}
names.group_by(&:itself).each { |k,v| counts[k] = v.length }
# counts > {"Jason"=>2, "Teresa"=>1, "Judah"=>3, "Michelle"=>1, "Allison"=>1}

3 Comments

Agree, but I slightly prefer names.group_by(&:itself).map{|k,v| [k, v.count]}.to_h so that you don't have to ever declare a hash object
@andrewkday Taking this one step further, ruby v2.4 added the method: Hash#transform_values which allows us to simplify your code even more: names.group_by(&:itself).transform_values(&:count)
Also, this is a very subtle point (which is likely no longer relevant to future readers!), but note that your code also uses Array#to_h - which was added to Ruby v2.1.0 (released December 2013 - i.e. almost 3 years after the original question was asked!)
26

Ruby 2.7+

Ruby 2.7 is introducing Enumerable#tally for this exact purpose. There's a good summary here.

In this use case:

array.tally
# => { "Jason" => 2, "Judah" => 3, "Allison" => 1, "Teresa" => 1, "Michelle" => 1 }

Docs on the features being released are here.

1 Comment

Fantastic news!
17

There's actually a data structure which does this: MultiSet.

Unfortunately, there is no MultiSet implementation in the Ruby core library or standard library, but there are a couple of implementations floating around the web.

This is a great example of how the choice of a data structure can simplify an algorithm. In fact, in this particular example, the algorithm even completely goes away. It's literally just:

Multiset.new(*names)

And that's it. Example, using https://GitHub.Com/Josh/Multimap/:

require 'multiset'

names = %w[Jason Jason Teresa Judah Michelle Judah Judah Allison]

histogram = Multiset.new(*names)
# => #<Multiset: {"Jason", "Jason", "Teresa", "Judah", "Judah", "Judah", "Michelle", "Allison"}>

histogram.multiplicity('Judah')
# => 3

Example, using http://maraigue.hhiro.net/multiset/index-en.php:

require 'multiset'

names = %w[Jason Jason Teresa Judah Michelle Judah Judah Allison]

histogram = Multiset[*names]
# => #<Multiset:#2 'Jason', #1 'Teresa', #3 'Judah', #1 'Michelle', #1 'Allison'>

4 Comments

Does the MultiSet concept originate from mathematics, or another programming language?
@Andrew Grimm: Both he word "multiset" (de Bruijn, 1970s) and the concept (Dedekind 1888) originated in mathematics. Multiset is governed by strict mathematical rules and supports the typical set operations (union, intersection, complement, ...) in a way that is mostly consistent with the axioms, laws and theorems of "normal" mathematical set theory, although some important laws do not hold when you try to generalize them to multisets. But that's way beyond my understanding of the matter. I use them as a programming data structure, not as a mathematical concept.
To expand a little on that point: "... in a way that is mostly consistent with the axioms ...": "Normal" sets are usually formally defined by a set of axioms (assumptions) called "Zermelo-Frankel set theory". However, one of these axioms: the axiom of extensionality states that a set is defined precisely by its members - e.g. {A, A, B} = {A, B}. This is clearly a violation of the very definition of multi-sets!
...However, without going into too much detail (as this is a software forum, not advanced maths!), one can formally define multi-sets mathematically via axioms for Crisp sets, the Peano axioms and other MultiSet-specific axioms.
14

Enumberable#each_with_object saves you from returning the final hash.

names.each_with_object(Hash.new(0)) { |name, hash| hash[name] += 1 }

Returns:

=> {"Jason"=>2, "Teresa"=>1, "Judah"=>3, "Michelle"=>1, "Allison"=>1}

1 Comment

Agree, each_with_object variant is more readable to me than inject
6

This works.

arr = ["Jason", "Jason", "Teresa", "Judah", "Michelle", "Judah", "Judah", "Allison"]
result = {}
arr.uniq.each{|element| result[element] = arr.count(element)}

2 Comments

+1 For a different approach -- although this has worse theoretical complexity -- O(n^2) (which will matter for some values of n) and does extra work (it has to count for "Judah" 3x, for instance)!. I would also suggest each instead of map (the map result is being discarded)
Thanks for that! I've changed the map to each.Also, I've uniq'ed the array before going through it. Maybe now the complexity issue is solved?
6

The following is a slightly more functional programming style:

array_with_lower_case_a = ["Jason", "Jason", "Teresa", "Judah", "Michelle", "Judah", "Judah", "Allison"]
hash_grouped_by_name = array_with_lower_case_a.group_by {|name| name}
hash_grouped_by_name.map{|name, names| [name, names.length]}
=> [["Jason", 2], ["Teresa", 1], ["Judah", 3], ["Michelle", 1], ["Allison", 1]]

One advantage of group_by is that you can use it to group equivalent but not exactly identical items:

another_array_with_lower_case_a = ["Jason", "jason", "Teresa", "Judah", "Michelle", "Judah Ben-Hur", "JUDAH", "Allison"]
hash_grouped_by_first_name = another_array_with_lower_case_a.group_by {|name| name.split(" ").first.capitalize}
hash_grouped_by_first_name.map{|first_name, names| [first_name, names.length]}
=> [["Jason", 2], ["Teresa", 1], ["Judah", 3], ["Michelle", 1], ["Allison", 1]]

1 Comment

Did I hear functional programming? +1 :-) This is definitely the best way, although it can be argued that is not memory-efficient. Notice also that Facets has a Enumerable#frequency.
5
names = ["Jason", "Jason", "Teresa", "Judah", "Michelle", "Judah", "Judah", "Allison"]
Hash[names.group_by{|i| i }.map{|k,v| [k,v.size]}]
# => {"Jason"=>2, "Teresa"=>1, "Judah"=>3, "Michelle"=>1, "Allison"=>1}

Comments

5
a = [1, 2, 3, 2, 5, 6, 7, 5, 5]
a.each_with_object(Hash.new(0)) { |o, h| h[o] += 1 }

# => {1=>1, 2=>2, 3=>1, 5=>3, 6=>1, 7=>1}

Credit Frank Wambutt

Comments

2

Lots of great implementations here.

But as a beginner I would consider this the easiest to read and implement

names = ["Jason", "Jason", "Teresa", "Judah", "Michelle", "Judah", "Judah", "Allison"]

name_frequency_hash = {}

names.each do |name|
  count = names.count(name)
  name_frequency_hash[name] = count  
end
#=> {"Jason"=>2, "Teresa"=>1, "Judah"=>3, "Michelle"=>1, "Allison"=>1}

The steps we took:

  • we created the hash
  • we looped over the names array
  • we counted how many times each name appeared in the names array
  • we created a key using the name and a value using the count

It may be slightly more verbose (and performance wise you will be doing some unnecessary work with overriding keys), but in my opinion easier to read and understand for what you want to achieve

3 Comments

I don't see how that's any easier to read than the accepted answer, and it's clearly a worse design (doing lots of unnecessary work).
@Tom Lord - I do agree with you on performance (I even mentioned that in my answer) - but as a beginner trying to understand the actual code and steps required, I find it helps to be more verbose and then one can refactor to improve performance and make code more declarative
I agree somewhat with @SamiBirnbaum. This is the only one that uses almost no special ruby knowledge like Hash.new(0). The closest to pseudocode. That can be a good thing for readability but also doing unnecessary work can harm readability for readers who notice it because in more complex cases they will spend a little time thinking they're going crazy trying to figure out why it's done.
2

With ruby 2.6 you can do:

names.to_h{ |name| [name, names.count(name)] }

gives you:

{"Jason"=>2, "Teresa"=>1, "Judah"=>3, "Michelle"=>1, "Allison"=>1}

Comments

1

This is more a comment than an answer, but a comment wouldn't do it justice. If you do Array = foo, you crash at least one implementation of IRB:

C:\Documents and Settings\a.grimm>irb
irb(main):001:0> Array = nil
(irb):1: warning: already initialized constant Array
=> nil
C:/Ruby19/lib/ruby/site_ruby/1.9.1/rbreadline.rb:3177:in `rl_redisplay': undefined method `new' for nil:NilClass (NoMethodError)
        from C:/Ruby19/lib/ruby/site_ruby/1.9.1/rbreadline.rb:3873:in `readline_internal_setup'
        from C:/Ruby19/lib/ruby/site_ruby/1.9.1/rbreadline.rb:4704:in `readline_internal'
        from C:/Ruby19/lib/ruby/site_ruby/1.9.1/rbreadline.rb:4727:in `readline'
        from C:/Ruby19/lib/ruby/site_ruby/1.9.1/readline.rb:40:in `readline'
        from C:/Ruby19/lib/ruby/1.9.1/irb/input-method.rb:115:in `gets'
        from C:/Ruby19/lib/ruby/1.9.1/irb.rb:139:in `block (2 levels) in eval_input'
        from C:/Ruby19/lib/ruby/1.9.1/irb.rb:271:in `signal_status'
        from C:/Ruby19/lib/ruby/1.9.1/irb.rb:138:in `block in eval_input'
        from C:/Ruby19/lib/ruby/1.9.1/irb/ruby-lex.rb:189:in `call'
        from C:/Ruby19/lib/ruby/1.9.1/irb/ruby-lex.rb:189:in `buf_input'
        from C:/Ruby19/lib/ruby/1.9.1/irb/ruby-lex.rb:103:in `getc'
        from C:/Ruby19/lib/ruby/1.9.1/irb/slex.rb:205:in `match_io'
        from C:/Ruby19/lib/ruby/1.9.1/irb/slex.rb:75:in `match'
        from C:/Ruby19/lib/ruby/1.9.1/irb/ruby-lex.rb:287:in `token'
        from C:/Ruby19/lib/ruby/1.9.1/irb/ruby-lex.rb:263:in `lex'
        from C:/Ruby19/lib/ruby/1.9.1/irb/ruby-lex.rb:234:in `block (2 levels) in each_top_level_statement'
        from C:/Ruby19/lib/ruby/1.9.1/irb/ruby-lex.rb:230:in `loop'
        from C:/Ruby19/lib/ruby/1.9.1/irb/ruby-lex.rb:230:in `block in each_top_level_statement'
        from C:/Ruby19/lib/ruby/1.9.1/irb/ruby-lex.rb:229:in `catch'
        from C:/Ruby19/lib/ruby/1.9.1/irb/ruby-lex.rb:229:in `each_top_level_statement'
        from C:/Ruby19/lib/ruby/1.9.1/irb.rb:153:in `eval_input'
        from C:/Ruby19/lib/ruby/1.9.1/irb.rb:70:in `block in start'
        from C:/Ruby19/lib/ruby/1.9.1/irb.rb:69:in `catch'
        from C:/Ruby19/lib/ruby/1.9.1/irb.rb:69:in `start'
        from C:/Ruby19/bin/irb:12:in `<main>'

C:\Documents and Settings\a.grimm>

That's because Array is a class.

Comments

1
arr = ["Jason", "Jason", "Teresa", "Judah", "Michelle", "Judah", "Judah", "Allison"]

arr.uniq.inject({}) {|a, e| a.merge({e => arr.count(e)})}

Time elapsed 0.028 milliseconds

interestingly, stupidgeek's implementation benchmarked:

Time elapsed 0.041 milliseconds

and the winning answer:

Time elapsed 0.011 milliseconds

:)

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.