Your code is about as fast as this can get, apart from a few minor issues.
If you have an unneeded entry marking the end of the array:
stats = ['user1', 'user1', 'user1', 'user2', 'user4', 'user2', 'xxx']
I think you should pop it off before counting. It can produce a spurious entry, and its presence forces you to use a conditional test in your loop, which slows your code:
stats = ['user1', 'user1', 'user1', 'user2', 'user4', 'user2', 'xxx']
stats.pop # => "xxx"
stats # => ["user1", "user1", "user1", "user2", "user4", "user2"]
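To see why the trailing marker matters, here is a minimal sketch, assuming 'xxx' is the marker as above. `zeroed` and `unguarded_count` are names I made up for illustration. With a pre-filled plain hash, an unguarded increment raises when it reaches the marker; with a default-0 hash nothing raises, but the marker sneaks in as an extra key.

```ruby
# Hypothetical helper: pre-fill a plain hash with zero counts.
def zeroed(names)
  names.each_with_object({}) { |k, h| h[k] = 0 }
end

# Hypothetical helper: count into the pre-filled hash with no conditional.
def unguarded_count(names, stats)
  h = zeroed(names)
  stats.each { |e| h[e] += 1 }
  h
end

names = ['user1', 'user2', 'user3', 'user4']
stats = ['user1', 'user1', 'user1', 'user2', 'user4', 'user2', 'xxx']

# Plain hash: h['xxx'] is nil, so nil + 1 raises NoMethodError.
begin
  unguarded_count(names, stats)
rescue NoMethodError => err
  puts "marker left in place: #{err.class}"
end

# Hash with a default of 0: no error, but the marker gets counted too.
with_default = stats.each_with_object(Hash.new(0)) { |e, h| h[e] += 1 }
puts with_default.key?('xxx') # true

# Dropping the marker first avoids both problems.
stats.pop if stats.last == 'xxx'
p unguarded_count(names, stats)
```

Once the marker is gone, the unguarded loop is safe as long as every entry in stats is a known name.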
Built-in methods can reduce the code to a single call chain, but they're slower than a loop:
stats.group_by{ |e| e } # => {"user1"=>["user1", "user1", "user1"], "user2"=>["user2", "user2"], "user4"=>["user4"]}
From there it's easy to map the resulting hash into summaries:
stats.group_by{ |e| e }.map{ |k, v| [k, v.size] } # => [["user1", 3], ["user2", 2], ["user4", 1]]
And then into a hash again:
stats.group_by{ |e| e }.map{ |k, v| [k, v.size] }.to_h # => {"user1"=>3, "user2"=>2, "user4"=>1}
or:
Hash[stats.group_by{ |e| e }.map{ |k, v| [k, v.size] }] # => {"user1"=>3, "user2"=>2, "user4"=>1}
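As an aside that depends on your Ruby version: on Ruby 2.7 or newer, `Enumerable#tally` does the group-and-count step in one call. This is a minimal sketch, not something I benchmarked here. `tally_known` is a hypothetical helper name, and merging into a zeroed hash is just my way of keeping names that never appear in stats (such as user3) and dropping anything that isn't a known name.

```ruby
# Hypothetical helper, assuming Ruby 2.7+ for Enumerable#tally.
def tally_known(names, stats)
  zeros  = names.to_h { |n| [n, 0] } # Array#to_h with a block needs Ruby 2.6+
  counts = stats.tally               # occurrences of every element in stats
  # Keep only the known names; names missing from stats stay at 0.
  zeros.merge(counts.slice(*names))
end

names = ['user1', 'user2', 'user3', 'user4']
stats = ['user1', 'user1', 'user1', 'user2', 'user4', 'user2']
p tally_known(names, stats)
```

For the sample data this gives 3 for user1, 2 for user2, 0 for user3, and 1 for user4.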
Using the built-in methods is efficient, and very useful when you're dealing with very large lists, because there's very little redundant looping going on.
Looping over the data like you did is also very fast, and usually faster than the built-in methods when written correctly. Here are some benchmarks showing alternative ways of building the zeroed hash:
require 'fruity' # => true
names = ['user1', 'user2', 'user3', 'user4']
stats = ['user1', 'user1', 'user1', 'user2', 'user4', 'user2']
Hash[names.map {|v| [v, 0]}] # => {"user1"=>0, "user2"=>0, "user3"=>0, "user4"=>0}
Hash[names.zip([0] * names.size )] # => {"user1"=>0, "user2"=>0, "user3"=>0, "user4"=>0}
names.zip([0] * names.size ).to_h # => {"user1"=>0, "user2"=>0, "user3"=>0, "user4"=>0}
hash = {}; names.each{ |k| hash[k] = 0 }; hash # => {"user1"=>0, "user2"=>0, "user3"=>0, "user4"=>0}
compare do
  map_hash { Hash[names.map {|v| [v, 0]}] }
  zip_hash { Hash[names.zip([0] * names.size )] }
  to_h_hash { names.zip([0] * names.size ).to_h }
  hash_braces { hash = {}; names.each{ |k| hash[k] = 0 }; hash }
end
# >> Running each test 2048 times. Test will take about 1 second.
# >> hash_braces is faster than map_hash by 50.0% ± 10.0%
# >> map_hash is faster than to_h_hash by 19.999999999999996% ± 10.0%
# >> to_h_hash is faster than zip_hash by 10.000000000000009% ± 10.0%
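If the fruity gem isn't available, a rough equivalent can be sketched with Ruby's standard Benchmark module. This is only an assumption about how you might reproduce the comparison, not what produced the numbers above. `zero_hash_candidates` is a hypothetical helper, and the iteration count is arbitrary.

```ruby
require 'benchmark'

# Hypothetical helper: the four candidates from the comparison, as lambdas.
def zero_hash_candidates(names)
  {
    'map_hash'    => -> { Hash[names.map { |v| [v, 0] }] },
    'zip_hash'    => -> { Hash[names.zip([0] * names.size)] },
    'to_h_hash'   => -> { names.zip([0] * names.size).to_h },
    'hash_braces' => -> { hash = {}; names.each { |k| hash[k] = 0 }; hash }
  }
end

names = ['user1', 'user2', 'user3', 'user4']
iterations = 50_000 # arbitrary; raise it for steadier timings

# Print user/system/real time for each candidate.
Benchmark.bm(12) do |bm|
  zero_hash_candidates(names).each do |label, candidate|
    bm.report(label) { iterations.times { candidate.call } }
  end
end
```

Benchmark reports raw timings rather than fruity's relative percentages, so expect to compare the "real" column by eye.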
Looking at the conditional in the loop to see how it affects the code:
require 'fruity' # => true
NAMES = ['user1', 'user2', 'user3', 'user4']
STATS = ['user1', 'user1', 'user1', 'user2', 'user4', 'user2', 'xxx']
STATS2 = STATS[0 .. -2]
def build_hash
  h = {}
  NAMES.each{ |k| h[k] = 0 }
  h
end
compare do
  your_way {
    hash = build_hash()
    STATS.each do |item| # basic loop to count the records
      hash[item] += 1 if hash.has_key?(item)
    end
    hash
  }
  my_way {
    hash = build_hash()
    STATS2.each { |e| hash[e] += 1 }
    hash
  }
end
# >> Running each test 512 times. Test will take about 1 second.
# >> my_way is faster than your_way by 27.0% ± 1.0%
Several answers suggested using count, but that approach slows down a lot as your lists grow, because it traverses the stats array once for every name. Walking the stats array once, as you are doing, is always linear in its size, so stick to one of these iterative solutions.
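To make the scaling argument concrete, here is a minimal sketch of the two shapes. `count_per_name` and `single_pass` are names I made up; the first stands in for the count-based suggestions and may not match any particular answer exactly.

```ruby
# Count-based: Array#count walks all of stats once per name,
# so the work grows with names.size * stats.size.
def count_per_name(names, stats)
  names.map { |n| [n, stats.count(n)] }.to_h
end

# Single pass: one walk over stats, with one hash lookup per element.
def single_pass(names, stats)
  counts = {}
  names.each { |k| counts[k] = 0 }
  stats.each { |e| counts[e] += 1 if counts.key?(e) }
  counts
end

names = ['user1', 'user2', 'user3', 'user4']
stats = ['user1', 'user1', 'user1', 'user2', 'user4', 'user2']

# Both produce the same counts; only the amount of work differs.
p count_per_name(names, stats) == single_pass(names, stats) # true
```

With four names the difference is small, but the count-based version repeats the full walk over stats for every additional name.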
stats.each_with_object(Hash[names.map { |str| [str,0] }]) { |str, h| h[str] += 1 if h.key?(str) }. +1 for your answer.
Why would you prefer a solution that must traverse stats once for each element of names to count the number of matches?
["howdy", "hi", "hello"].map {|s| s[0..2] if s.size > 3 } => ["how", nil, "hel"], whereas ["howdy", "hi", "hello"].map {|s| (s.size > 3) ? s[0..2] : s } => ["how", "hi", "hel"].