70
arr = [1,2,1,3,5,2,4]

How can I count how many times each value occurs in the array, with the results sorted by value? I need the following output:

x[1] = 2  
x[2] = 2  
x[3] = 1  
x[4] = 1  
x[5] = 1
2
  • 2
    possible duplicate of How to count duplicates in Ruby Arrays Commented Mar 29, 2011 at 22:33
  • 1
    Why without a loop? There's going to be a loop going on somewhere. Commented Nov 15, 2012 at 16:07

11 Answers

127
x = arr.inject(Hash.new(0)) { |h, e| h[e] += 1 ; h }

7 Comments

Many thanks, Michael and Terw. I like how short this is. Could you please briefly explain the line above? :)
inject "injects" an accumulator into an Enumerable, which in our case is a Hash with a default value of 0. On every iteration, we add one to the value with the key of the current element (e). Finally we return the accumulator. ruby-doc.org/core/classes/Enumerable.html#M001494
The "inject" operation is often called "fold" in functional programming languages, which I think is a more intuitive name.
But that code doesn't sort the hash, so in the end it needs more: Hash[#code here#.sort] or even sort_by.
prefer .each_with_object over inject when building hashes versus arithmetic. See @sawa's answer below.
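
As the comments above point out, the one-liner counts the values but does not order the keys. A minimal sketch of one way to get the sorted output the question asks for, assuming Ruby 1.9+ (where a Hash keeps insertion order) and Ruby 2.1+ for Array#to_h; the names counts and sorted are only illustrative:

```ruby
arr = [1, 2, 1, 3, 5, 2, 4]

# Fold the array into a Hash whose missing keys default to 0.
# The block must return the accumulator (h) so the next iteration receives it.
counts = arr.inject(Hash.new(0)) { |h, e| h[e] += 1; h }

# Hash#sort returns [key, count] pairs ordered by key; to_h rebuilds a Hash
# whose keys are inserted in that sorted order.
sorted = counts.sort.to_h

sorted.each { |value, count| puts "x[#{value}] = #{count}" }
```

On older Rubies without Array#to_h, Hash[counts.sort] should give the same result.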
71

There is a shorter version as of Ruby 2.7: Enumerable#tally.

[1,2,1,3,5,2,4].tally  #=> { 1=>2, 2=>2, 3=>1, 5=>1, 4=>1 }
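
Note that tally returns keys in first-seen order (5 before 4 above), not sorted. A small sketch of how the sorted result from the question could be obtained, assuming Ruby 2.7+; the variable names are illustrative:

```ruby
arr = [1, 2, 1, 3, 5, 2, 4]

# tally counts occurrences; keys appear in the order they were first seen.
counts = arr.tally

# Sorting the [key, count] pairs and rebuilding the Hash orders it by key.
sorted = counts.sort.to_h

p counts.keys  # prints [1, 2, 3, 5, 4]
p sorted.keys  # prints [1, 2, 3, 4, 5]
```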

1 Comment

38

Only available in Ruby 1.9 and later.

Basically the same as Michael's answer, but a slightly shorter way:

x = arr.each_with_object(Hash.new(0)) {|e, h| h[e] += 1}

In similar situations,

  • When the starting element is a mutable object such as an Array, Hash, String, you can use each_with_object, as in the case above.
  • When the starting element is an immutable object such as Numeric, you have to use inject as below.

    sum = (1..10).inject(0) {|sum, n| sum + n} # => 55
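
The reason behind the two rules above can be seen in a short sketch: each_with_object ignores the block's return value and keeps passing the same object, while inject passes the block's return value on as the next accumulator. The variable names here are illustrative only:

```ruby
# Works: the Hash is mutated in place, so the same object accumulates the counts.
counts = [1, 2, 1].each_with_object(Hash.new(0)) { |e, h| h[e] += 1 }

# Does not accumulate: acc + n creates a new Integer that is thrown away,
# and the original 0 is returned.
unchanged = (1..10).each_with_object(0) { |n, acc| acc + n }

# inject feeds each block result into the next iteration, so immutable values work.
total = (1..10).inject(0) { |acc, n| acc + n }

p unchanged  # prints 0
p total      # prints 55
```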

4 Comments

In terms of characters, it's longer. In terms of tokens, it's shorter. Thanks for comment.
Thanks @sawa. It is definitely shorter and faster, which matters because my actual array is in a mutable format and holds a very large amount of data. Thanks once again.
Though I've noticed with this approach that the values aren't in sorted order like the answer said.
This is the cleanest answer. each_with_object has been added to avoid h[e] += 1 ; h
26
arr.group_by(&:itself).transform_values(&:size)
#=> {1=>2, 2=>2, 3=>1, 5=>1, 4=>1}

2 Comments

A thing of beauty!
This is the most "beautiful ruby" answer here, and an excellent example of ruby in general. If you don't have ruby 2.7 (and so can't use #tally), use this.
21

Yet another - similar to others - approach:

result=Hash[arr.group_by{|x|x}.map{|k,v| [k,v.size]}]
  1. Group by each element's value.
  2. Map the grouping to an array of [value, counter] pairs.
  3. Turn the array of pairs into key-value entries within a Hash, so each count is accessible by key, e.g. result[1] #=> 2.
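
The three steps above can be sketched with the intermediate values spelled out. The names groups and pairs are introduced here only for illustration:

```ruby
arr = [1, 2, 1, 3, 5, 2, 4]

# Step 1: group equal elements together.
groups = arr.group_by { |x| x }
# groups is {1=>[1, 1], 2=>[2, 2], 3=>[3], 5=>[5], 4=>[4]}

# Step 2: replace each group with its size, giving [value, counter] pairs.
pairs = groups.map { |k, v| [k, v.size] }
# pairs is [[1, 2], [2, 2], [3, 1], [5, 1], [4, 1]]

# Step 3: build a Hash from the pairs.
result = Hash[pairs]

p result[1]  # prints 2
```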

1 Comment

Cleaner and easier to understand than the accepted answer.
16

Whenever I see someone asserting that something is the fastest for this type of primitive routine, I find it interesting to confirm it, because without confirmation most of us are really just guessing. So I took all of the methods here and benchmarked them.

I took an array of 120 links I extracted from a web page that I needed to group by count and implemented all of these using a seconds = Benchmark.realtime do loop and got all the times.

Assume links is the name of the array I need to count:

#0.00077
seconds = Benchmark.realtime do
  counted_links = {}
  links.each { |e| counted_links[e] = links.count(e) if counted_links[e].nil?}
end
seconds

#0.000232
seconds = Benchmark.realtime do
  counted_links = {}
  links.sort.group_by {|x|x}.each{|x,y| counted_links[x] = y.size}
end

#0.00076
seconds = Benchmark.realtime do 
  Hash[links.uniq.map{ |i| [i, links.count(i)] }]
end

#0.000107 
seconds = Benchmark.realtime do 
  links.inject(Hash.new(0)) {|h, v| h[v] += 1; h}
end

#0.000109
seconds = Benchmark.realtime do 
  links.each_with_object(Hash.new(0)) {|e, h| h[e] += 1}
end

#0.000143
seconds = Benchmark.realtime do 
  links.inject(Hash.new(0)) { |h, e| h[e] += 1 ; h }
end

And then a little bit of ruby to figure out the answer:

times = [0.00077, 0.000232, 0.00076, 0.000107, 0.000109, 0.000143].min
==> 0.000107

So the actual fastest method, ymmv of course, is:

links.inject(Hash.new(0)) {|h, v| h[v] += 1; h}
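
A single timed run of about a tenth of a millisecond is noisy: the two inject variants above are the same code, yet they measured 0.000107 and 0.000143. A hedged sketch of how the comparison could be repeated over many iterations follows. The links array is a made-up stand-in for the 120 extracted links, and the run count of 1,000 is an arbitrary assumption:

```ruby
require 'benchmark'

# Hypothetical stand-in data: 120 URLs with 40 distinct values.
links = Array.new(120) { |i| "http://example.com/#{i % 40}" }

runs = 1_000  # repeat each candidate so per-call noise averages out

candidates = {
  'inject'           => -> { links.inject(Hash.new(0)) { |h, v| h[v] += 1; h } },
  'each_with_object' => -> { links.each_with_object(Hash.new(0)) { |e, h| h[e] += 1 } },
  'uniq + count'     => -> { Hash[links.uniq.map { |i| [i, links.count(i)] }] }
}

# Time `runs` calls of each candidate and record the total in seconds.
timings = {}
candidates.each do |name, code|
  timings[name] = Benchmark.realtime { runs.times { code.call } }
end

# Report fastest first; the actual numbers will vary by machine and Ruby version.
timings.sort_by { |_, secs| secs }.each do |name, secs|
  puts format('%-18s %.6f s for %d runs', name, secs, runs)
end
```

Even with repetition, the ranking may change with array size and the number of distinct values, so it is worth running against data shaped like your own.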

3 Comments

I think "leap in logic" befits your conclusion. :-)
Thanks for the benchmarks, this helped me select the quickest option, which is all i was interested in.
#tally is the fastest option now, unless you need to count based on some derived value, in which case the each_with_object option is faster than map plus tally for large arrays.
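
To illustrate what "counting based on a derived value" in the comment above might look like, here is a small sketch that counts words by their first letter. The words array and variable names are hypothetical, and tally assumes Ruby 2.7+:

```ruby
words = %w[apple avocado banana blueberry cherry]

# Single pass: derive the key (the first letter) and count it directly,
# without building an intermediate array.
by_letter = words.each_with_object(Hash.new(0)) { |w, h| h[w[0]] += 1 }

# Same result via map + tally, which first allocates the mapped array.
by_letter_tally = words.map { |w| w[0] }.tally

p by_letter == by_letter_tally  # prints true
```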
11
x = Hash[arr.uniq.map{ |i| [i, arr.count(i)] }]

Newer Ruby versions (2.1+) have the Array#to_h method:

x = arr.uniq.map{ |i| [i, arr.count(i)] }.to_h

5 Comments

Michael Kohl beat me, but his code should be faster. This code takes about twice as long.
@fl00r..that is interesting..I thought this would be slower, as it loops through the array and then uses the count method on it again. Maybe built-in methods have their advantage. :)
@fl00r: Really? I originally had a version using count, but thought it wouldn't scale well with array length, so replaced it with my current answer. Can you run your benchmark with a somewhat bigger array and compare again?
Not really. I was wrong. Since this is O(n^2), it is faster in benchmarks with small arrays, but it will be incredibly slow with big arrays. My mistake was testing the present array in a million-cycle benchmark, where it was 20% faster.
@fl00r..yeah..definitely this would be slow for larger arrays.
6

I am sure there are better ways,

>> arr.sort.group_by {|x|x}.each{|x,y| print "#{x} #{y.size}\n"}
1 2
2 2
3 1
4 1
5 1

assign x and y values to a hash as needed.

6 Comments

it is not necessary to sort before group_by. arr.group_by {...} will do the same thing
@user102008 The OP implied the results are to be presented in order. Not [2,1].group_by {|x|x} #=> {2=>[2], 1=>[1]} kurumi, what ways are better?
@CarySwoveland: group_by returns a Hash which has no order. The order in which entries in a Hash is iterated is unpredictable.
@user102008, group_by preserves order, at least in MRI 1.9+. AFAIK, it is not documented, but should be, as it's part of the spec.
@CarySwoveland: The values corresponding to each key have an order; sorting might be relevant if you cared about that. But there is no order among the keys, for the very fact that the returned value is a Hash. So it would NOT have anything to do with [2,1].group_by {|x|x} #=> {2=>[2], 1=>[1]}
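
For readers following this exchange: in Ruby 1.9 and later, a Hash iterates its keys in insertion order, so group_by returns keys in first-seen order rather than sorted order. A brief sketch of the behavior under that assumption, with illustrative variable names:

```ruby
# Keys come back in the order they were first inserted, not sorted.
grouped = [2, 1].group_by { |x| x }
p grouped.keys            # prints [2, 1]

# Sorting the input first means the keys are inserted in ascending order.
presorted = [2, 1].sort.group_by { |x| x }
p presorted.keys          # prints [1, 2]

# Alternatively, sort the resulting pairs afterwards (Array#to_h needs Ruby 2.1+).
sorted_after = grouped.sort.to_h
p sorted_after.keys       # prints [1, 2]
```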
6

Just for the record, I recently read about Object#tap here. My solution would be:

Hash.new(0).tap{|h| arr.each{|i| h[i] += 1}}

The #tap method passes the caller to the block and then returns it. This is pretty handy when you have to incrementally build an array/hash.
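
A short sketch of what tap saves here, compared with the equivalent written out longhand. The names counts and h are illustrative:

```ruby
arr = [1, 2, 1, 3, 5, 2, 4]

# tap yields the receiver (the new Hash) to the block, then returns the
# receiver itself, whatever the block returns.
counts = Hash.new(0).tap { |h| arr.each { |i| h[i] += 1 } }

# The same thing without tap needs a temporary variable and separate statements.
h = Hash.new(0)
arr.each { |i| h[i] += 1 }

p counts == h  # prints true
```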


5

This should do it

arr = [1,2,1,3,5,2,4]

puts arr.inject(Hash.new(0)) {|h, v| h[v] += 1; h}
#=> {1=>2, 2=>2, 3=>1, 5=>1, 4=>1}

Comments

2
arr = [1,2,1,3,5,2,4]
r = {}
arr.each { |e| r[e] = arr.count(e) if r[e].nil?}

Outputs

p r
#==> {1=>2, 2=>2, 3=>1, 5=>1, 4=>1}

Comments
