93

I have an array, and a function that returns a value given a value. Ultimately I want to create a hashmap that has the values of the array as key value, and the result of f(key_value) as the value. Is there a clean, simple way, like similar to each/map of Array, of doing this using block?

So something that is equivalent to

hsh = {}
[1,2,3,4].each do |x|
  hsh[x] = f(x)
end

but looks more similar to this, in that it's simple and one line?

results = array.map { | x | f(x) }
2

9 Answers 9

154

Note that since Ruby 2.1.0 you can also use Array#to_h, like this:

[1,2,3,4].map{ |x| [x, f(x)] }.to_h
Sign up to request clarification or add additional context in comments.

Comments

101

Ruby 2.6.0 enables passing a block to the to_h-method. This enables an even shorter syntax for creating a hash from an array:

[1, 2, 3, 4].to_h { |x| [x, f(x)] }

1 Comment

short and sweet
42

You could also define the function as the hash's default value:

hash = Hash.new {|hash, key| hash[key] = f(key) }

Then when you lookup a value, the hash will calculate and store it on the fly.

hash[10]
hash.inspect #=> { 10 => whatever_the_result_is }

Comments

34

You need each_with_object.

def f x
  x * 2
end

t = [1, 2, 3, 4].each_with_object({}) do |x, memo|
  memo[x] = f(x)
end

t # => {1=>2, 2=>4, 3=>6, 4=>8}

Another one:

t2 = [1, 2, 3, 4].map{|x| [x, f(x)]}
Hash[t2] # => {1=>2, 2=>4, 3=>6, 4=>8}

2 Comments

Just for the completeness (suggested edits are full): this is Ruby >= 1.9.1.378.
@Cadoiz: I am quite sure no one is using a 14yo ruby version. It's not whisky :) But thanks, good note.
25

Check out the Hash::[] method.

Hash[ [1,2,3,4].collect { |x| [x, f(x)] } ]

1 Comment

Note that this doesn't work with lazy enumerators, but Ruby 2.1.0's .to_h does.
11

From Ruby 2.1:

[1, 2, 3, 4].map { |x| [x, f(x)] }.to_h

[EDIT] Using Facets' mash (method to convert enumerable to hashes) (unmaintained?):

[1, 2, 3, 4].mash { |x| [x, f(x)] }

1 Comment

@Cadoiz thx, updated. Not sure facets is being developed anymore.
3

TL;DR: Comparing the different solutions in this Q/A

All tested options are fine. Hash.new with a lookup function performs best. No declaration cost and still approximately as fast if you access every element once. As long as f(...) is cheap, as it's a lazy/on-demand data structure then. If you need your results right away, chose any other one that suits your needs and suffer at most ~16% penalty.


This answer tries to be a summery of the existing other ones to answer a follow-up question the reader might have: which is the best solution? I will provide all the ressources to reproduce/verify my results and compare the existing answers, ordered by decreasing trend (at time of writing). Right now, even ruby2.6 is out-of support for more then a year, so we can probably assume that everything will be supported now. So let's start with this table for an overview:

There are more options such as Enumerable#reduce + Hash#merge! which are not scope of this answer. You can consider this related answer discussing it's advantages, I took some inspiration/structure from there.

So these are the different solutions I have compiled and benchmarked:

Presenting the full solutions in code

a = [1,2,3,4]
def f key
  key
end

g = a.to_h { |x| [x, f(x)] }
k = a.map { |x| [x, f(x)] }.to_h
o = Hash.new {|hash, key| hash[key] = f(key) }
q = a.each_with_object({}) { |x, memo| memo[x] = f(x) }
s = a.index_with(&:c) # not tested

Benchmark script

According to noraj, you should use bmbm and not bm to avoid differences due to the cost of memory allocation and garbage collection. This is an important point I'm discussing later in "Interpretation".

require 'benchmark'

a = (1..1_000_000)
b = a.to_a
def f key
  key
end

# Just for accessing later, not important
g = a.to_h { |x| [x, f(x)] }; h = b.to_h { |x| [x, f(x)] }; i = a.to_h { |x| [x, x] }; j = b.to_h { |x| [x, x] }; k = a.map { |x| [x, f(x)] }.to_h; l = b.map { |x| [x, f(x)] }.to_h; m = a.map { |x| [x, x] }.to_h; n = b.map { |x| [x, x] }.to_h; o = Hash.new {|hash, key| hash[key] = f(key) }; q = a.each_with_object({}) { |x, memo| memo[x] = f(x) }; r = b.each_with_object({}) { |x, memo| memo[x] = f(x) };


Benchmark.bmbm do |x|
  x.report('Declaration of a.to_h { function }:') { a.to_h { |x| [x, f(x)] } }
  x.report('Declaration of b.to_h { function }:') { b.to_h { |x| [x, f(x)] } }
  x.report('Declaration of a.to_h { array }:') { a.to_h { |x| [x, x] } }
  x.report('Declaration of b.to_h { array }:') { b.to_h { |x| [x, x] } }
  x.report('Declaration of a.map { function }.to_h:') { a.map { |x| [x, f(x)] }.to_h }
  x.report('Declaration of b.map { function }.to_h:') { b.map { |x| [x, f(x)] }.to_h }
  x.report('Declaration of a.map { array }.to_h:') { a.map { |x| [x, x] }.to_h }
  x.report('Declaration of b.map { array }.to_h:') { b.map { |x| [x, x] }.to_h }
  x.report('Declaration of Hash.new (with lookup)') { Hash.new {|hash, key| hash[key] = c(key) } }
  x.report('Declaration of a.each_with_object({})') { a.each_with_object({}) { |x, memo| memo[x] = f(x) } }
  x.report('Declaration of a.each_with_object({})') { b.each_with_object({}) { |x, memo| memo[x] = f(x) } }

  x.report('Accessing from a.to_h { function }:') { a.reduce { |sum, x| g[x] } }
  x.report('Accessing from b.to_h { function }:') { a.reduce { |sum, x| h[x] } }
  x.report('Accessing from a.to_h { array }:') { a.reduce { |sum, x| i[x] } }
  x.report('Accessing from b.to_h { array }:') { a.reduce { |sum, x| j[x] } }
  x.report('Accessing from a.map { function }.to_h:') { a.reduce { |sum, x| k[x] } }
  x.report('Accessing from b.map { function }.to_h:') { a.reduce { |sum, x| l[x] } }
  x.report('Accessing from a.map { array }.to_h:') { a.reduce { |sum, x| m[x] } }
  x.report('Accessing from b.map { array }.to_h:') { a.reduce { |sum, x| n[x] } }
  x.report('Accessing from Hash.new (with lookup)') { a.reduce { |sum, x| o[x] } }
  x.report('Accessing from a.each_with_object({})') { a.reduce { |sum, x| q[x] } }
  x.report('Accessing from b.each_with_object({})') { a.reduce { |sum, x| r[x] } }
end

Benchmark results for 5M items iterator/array

                                             user     system      total        real
Declaration of a.to_h { function }:       1.650822   0.091929   1.742751 (  1.742868)
Declaration of b.to_h { function }:       1.170426   0.071894   1.242320 (  1.242395)
Declaration of a.to_h { array }:          1.604802   0.051815   1.656617 (  1.660191)
Declaration of b.to_h { array }:          1.159055   0.007980   1.167035 (  1.167077)
Declaration of a.map { function }.to_h:   1.511910   0.099993   1.611903 (  1.611978)
Declaration of b.map { function }.to_h:   1.381991   0.112121   1.494112 (  1.494172)
Declaration of a.map { array }.to_h:      1.399749   0.091984   1.491733 (  1.491789)
Declaration of b.map { array }.to_h:      1.440968   0.039829   1.480797 (  1.480845)
Declaration of Hash.new (with lookup)     0.000017   0.000001   0.000018 (  0.000009)
Declaration of a.each_with_object({})     1.496914   0.131904   1.628818 (  1.628930)
Declaration of a.each_with_object({})     1.438418   0.180053   1.618471 (  1.618551)

Rehearsal ---------------------------------------------------------------------------
Accessing from a.to_h { function }:       1.005993   0.000323   1.006316 (  1.006360)
Accessing from b.to_h { function }:       0.999888   0.000164   1.000052 (  1.000107)
Accessing from a.to_h { array }:          0.998487   0.000068   0.998555 (  0.998610)
Accessing from b.to_h { array }:          1.003892   0.000139   1.004031 (  1.004061)
Accessing from a.map { function }.to_h:   1.023635   0.000115   1.023750 (  1.023789)
Accessing from b.map { function }.to_h:   1.006790   0.000061   1.006851 (  1.006912)
Accessing from a.map { array }.to_h:      1.005898   0.000204   1.006102 (  1.006155)
Accessing from b.map { array }.to_h:      1.013491   0.000000   1.013491 (  1.013545)
Accessing from Hash.new (with lookup)     2.289310   0.139982   2.429292 (  2.429415)
Accessing from a.each_with_object({})     1.011719   0.000078   1.011797 (  1.011855)
Accessing from b.each_with_object({})     1.002672   0.000026   1.002698 (  1.002755)
----------------------------------------------------------------- total: 12.502935sec

                                              user     system      total        real
Accessing from a.to_h { function }:       1.006216   0.000003   1.006219 (  1.006290)
Accessing from b.to_h { function }:       1.006313   0.000003   1.006316 (  1.006342)
Accessing from a.to_h { array }:          0.991536   0.000049   0.991585 (  0.991623)
Accessing from b.to_h { array }:          0.999372   0.000037   0.999409 (  0.999476)
Accessing from a.map { function }.to_h:   0.988464   0.000073   0.988537 (  0.988588)
Accessing from b.map { function }.to_h:   1.000364   0.000066   1.000430 (  1.000472)
Accessing from a.map { array }.to_h:      0.988162   0.000042   0.988204 (  0.988280)
Accessing from b.map { array }.to_h:      0.997830   0.000049   0.997879 (  0.997946)
Accessing from Hash.new (with lookup)     1.003044   0.000068   1.003112 (  1.003149)
Accessing from a.each_with_object({})     1.006635   0.000000   1.006635 (  1.006668)
Accessing from b.each_with_object({})     0.993825   0.000000   0.993825 (  0.993867)

Interpretation

So based on these results, you can say that there is a bill for memory allocation and garbage collection, which you can see in the rehearsal phase of Hash.new (with lookup f(...)): 2.429415 s / 1.003149 s = 242% (or 1.42 s for building the structure). This means that a preconstructed data structure isn't even guaranteed to be faster than on-demand lookup, I expected the cost to be much higher. When heavily used it could still make sense to just use Hash.new (with lookup f(...)). The consideration to make is when exactly the function is ran: on demand or previously constructed. Imagine having a compute intensive task taking 10s

def f key
  sleep 10
  key
end

If you rarely need a result, just use Hash.new with a lookup function and bam - the declaration has virtually no runtime. Or you might have a strong focus on response time - then it's obv. better to have them up-front (like a dynamic-programming style lookup table). At around 16% deviation, the performance difference isn't huge, so you could also prefer code readability. Starting with a plain array is mostly faster or equally fast so the format of your input data a/b made more of an impact here.

Comments

2

Also, Rails method index_with would be helpful:

a = ['a', 'bsdf', 'wqqwc']
a.index_with(&:size)
=> {"a"=>1, "bsdf"=>4, "wqqwc"=>5}

2 Comments

Turns out I misread the question, and I was actually looking for index_by, which does the opposite (function yields the keys; elements become the values).
You also do index_with {...} and give it a block and it is rails >= 6.. This link also provides an example for that (unfortunately the sugessted edit queue is full).
1

You're looking for each_with_object() method:

elem = [1,2,3,4]
h = elem.each_with_object({}) do |x, res|
  res[x] = x**2
end

puts h

The argument passed to each_with_object({}) is the initial value of an intermediate object that is passed to the block as res variable. In each iteration we're adding new pair key: value to the res Hash and returning the Hash to be used in next iteration.

The method above pre-computes a very practical hash of squared values:

{1=>1, 2=>4, 3=>9, 4=>16}

1 Comment

My RuboCop tells me that each_with_object as in this answer should be preferred.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.