Ruby - mapping an array to hashmap

Question

I have an array and a function f that returns a value given a value. Ultimately I want to create a hashmap that has the values of the array as keys and the result of f(key_value) as the corresponding values. Is there a clean, simple way of doing this using a block, similar to each/map of Array?

So something that is equivalent to

hsh = {}
[1,2,3,4].each do |x|
  hsh[x] = f(x)
end

but looks more similar to this, in that it's simple and one line?

results = array.map { | x | f(x) }

Can anybody say something about how the methods compare? Right now we have .to_h with a block (ruby >= 2.6), .map(...).to_h (ruby >= 2.1) (both supporting lazy enumerators), Hash.new (with lookup f(...)), each_with_object({}) (ruby >= 1.9.1.378), Hash[...], and index_with(&:...) (rails >= 6).
– Cadoiz, Commented Aug 8, 2023 at 12:22
So my follow-up question is: how do these 6 different methods perform? I would expect 1 to be the best, followed by 3 or 6. 4 also iterates through only once, but I remember each_with_object being inferior. 2 and 5 should be slower as they iterate twice. Are my assumptions true? I know that this is also about memory, not only CPU bandwidth. Or is this a whole separate question? (Just for the sake of completeness: Hash[...] (stackoverflow.com/a/13017531/4575793) is ruby >= v1.8.6.287.)
– Cadoiz, Commented Aug 8, 2023 at 12:30

Soron · Accepted Answer · 2016-08-20 11:23:19Z

154

Note that since Ruby 2.1.0 you can also use Array#to_h, like this:

[1,2,3,4].map{ |x| [x, f(x)] }.to_h

edited Aug 20, 2016 at 11:23

Soron

answered Jun 17, 2015 at 9:29

jkjkjk666

Timitry · Accepted Answer · 2019-01-03 13:15:55Z

101

Ruby 2.6.0 enables passing a block to the to_h-method. This enables an even shorter syntax for creating a hash from an array:

[1, 2, 3, 4].to_h { |x| [x, f(x)] }
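A minimal runnable sketch comparing the block form with the older map-then-to_h form. The definition of f is a stand-in (the question does not specify it), and the block form assumes Ruby >= 2.6.

```ruby
# Stand-in for the question's f; the real function is not given.
def f(x)
  x * x
end

keys = [1, 2, 3, 4]

# Ruby >= 2.1: build the [key, value] pairs first, then convert them.
via_map = keys.map { |x| [x, f(x)] }.to_h

# Ruby >= 2.6: hand the block to to_h and skip the intermediate array.
via_block = keys.to_h { |x| [x, f(x)] }

puts via_block.inspect
puts(via_map == via_block)
```

Both forms yield the same hash; the block form just avoids allocating the temporary array of pairs.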

answered Jan 3, 2019 at 13:15

Timitry

1 Comment

devdanke Over a year ago

short and sweet

Zach Kemp · Accepted Answer · 2012-10-22 18:38:00Z

42

You could also define the function as the hash's default value:

hash = Hash.new {|hash, key| hash[key] = f(key) }

Then when you lookup a value, the hash will calculate and store it on the fly.

hash[10]
hash.inspect #=> { 10 => whatever_the_result_is }
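To make the "calculate and store on the fly" behaviour visible, here is a small sketch with a stand-in f (an assumption, written as a lambda so it can count its own calls). It also shows that only lookups via [] trigger the default block.

```ruby
calls = 0

# Stand-in for f (hypothetical); it counts how often it is invoked.
f = lambda do |key|
  calls += 1
  key * 2
end

cache = Hash.new { |hash, key| hash[key] = f.call(key) }

cache[10] # first lookup runs f and stores the result
cache[10] # second lookup is served from the stored entry

puts calls                 # f ran only once
puts cache.key?(3)         # key? does not trigger the default block
puts cache.fetch(3, :none) # fetch with a default bypasses the block as well
puts cache.size            # only the key that was looked up is stored
```

One consequence of this design: iterating the hash (each, keys, size) only sees keys that have already been looked up.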

answered Oct 22, 2012 at 18:38

Zach Kemp

Sergio Tulentsev · Accepted Answer · 2012-10-22 18:32:53Z

34

You need each_with_object.

def f x
  x * 2
end

t = [1, 2, 3, 4].each_with_object({}) do |x, memo|
  memo[x] = f(x)
end

t # => {1=>2, 2=>4, 3=>6, 4=>8}

Another one:

t2 = [1, 2, 3, 4].map{|x| [x, f(x)]}
Hash[t2] # => {1=>2, 2=>4, 3=>6, 4=>8}

answered Oct 22, 2012 at 18:32

Sergio Tulentsev

2 Comments

Cadoiz Over a year ago

Just for the completeness (suggested edits are full): this is Ruby >= 1.9.1.378.

Sergio Tulentsev Over a year ago

@Cadoiz: I am quite sure no one is using a 14yo ruby version. It's not whisky :) But thanks, good note.

Matt Huggins · Accepted Answer · 2012-10-22 18:35:35Z

25

Check out the Hash::[] method.

Hash[ [1,2,3,4].collect { |x| [x, f(x)] } ]

answered Oct 22, 2012 at 18:35

Matt Huggins

1 Comment

Oded Niv Over a year ago

Note that this doesn't work with lazy enumerators, but Ruby 2.1.0's .to_h does.
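A short sketch of the difference this comment describes, using a stand-in f (an assumption). Hash[] accepts a hash, an array of pairs, or an even number of arguments, so a lazy enumerator has to be forced into an array first, while to_h comes from Enumerable and works directly.

```ruby
# Stand-in for f (hypothetical).
def f(x)
  x + 100
end

lazy_pairs = (1..4).lazy.map { |x| [x, f(x)] }

# Enumerable#to_h works on any enumerable, including a lazy one.
from_to_h = lazy_pairs.to_h

# Hash[] rejects the lazy enumerator itself ...
hash_brackets_error =
  begin
    Hash[lazy_pairs]
    nil
  rescue ArgumentError => e
    e.class
  end

# ... but accepts it once it has been turned into an array of pairs.
from_brackets = Hash[lazy_pairs.to_a]

puts hash_brackets_error.inspect
puts(from_to_h == from_brackets)
```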

tokland · Accepted Answer · 2023-08-08 12:27:21Z

11

From Ruby 2.1:

[1, 2, 3, 4].map { |x| [x, f(x)] }.to_h

[EDIT] Using Facets' mash (method to convert enumerable to hashes) (unmaintained?):

[1, 2, 3, 4].mash { |x| [x, f(x)] }

edited Aug 8, 2023 at 12:27

answered Oct 22, 2012 at 19:12

tokland

1 Comment

tokland Over a year ago

@Cadoiz thx, updated. Not sure facets is being developed anymore.

Cadoiz · Accepted Answer · 2023-08-14 06:49:15Z

TL;DR: Comparing the different solutions in this Q/A

All tested options are fine. Hash.new with a lookup function performs best: it has no declaration cost and is still approximately as fast if you access every element once. This holds as long as f(...) is cheap, since the hash is then a lazy/on-demand data structure. If you need your results right away, choose any other option that suits your needs and accept a penalty of at most ~16%.

This answer tries to be a summary of the other existing ones in order to answer a follow-up question the reader might have: which is the best solution? I will provide all the resources needed to reproduce/verify my results and compare the existing answers, ordered by decreasing trend (at the time of writing). Right now, even ruby 2.6 has been out of support for more than a year, so we can probably assume that every option listed here is available to you. So let's start with this table for an overview:

Method                           Version              Lazy Enumerator Support
.to_h { ... }                    ruby >= 2.6          Yes
.map(...).to_h                   ruby >= 2.1          Yes
Hash.new (with lookup f(...))    ruby >= 1.8.6.287    Different approach, is somewhat lazy
each_with_object({})             ruby >= 1.9.1.378    No
Hash[...]                        ruby >= 1.8.6.287    No
index_with(&:...)                rails >= 6           No

There are more options, such as Enumerable#reduce + Hash#merge!, which are outside the scope of this answer. You can consult this related answer discussing its advantages; I took some inspiration/structure from there.
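For reference only, the reduce + merge! variant mentioned above could look roughly like this. It is a sketch with a stand-in f (an assumption) and was not part of the benchmark.

```ruby
# Stand-in for f (hypothetical).
def f(x)
  x * 2
end

a = [1, 2, 3, 4]

# merge! mutates the accumulator and returns it, so the block's value
# is the same hash on every step and reduce can keep passing it along.
via_reduce = a.reduce({}) { |memo, x| memo.merge!(x => f(x)) }

puts via_reduce.inspect
```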

So these are the different solutions I have compiled and benchmarked:

Presenting the full solutions in code

a = [1,2,3,4]
def f key
  key
end

g = a.to_h { |x| [x, f(x)] }
k = a.map { |x| [x, f(x)] }.to_h
o = Hash.new {|hash, key| hash[key] = f(key) }
q = a.each_with_object({}) { |x, memo| memo[x] = f(x) }
s = a.index_with { |x| f(x) } # needs ActiveSupport (Rails >= 6); not tested

Benchmark script

According to noraj, you should use bmbm and not bm to avoid differences due to the cost of memory allocation and garbage collection. This is an important point I'm discussing later in "Interpretation".

require 'benchmark'

a = (1..1_000_000)
b = a.to_a
def f key
  key
end

# Prebuilt hashes, only needed by the access benchmarks below
g = a.to_h { |x| [x, f(x)] }
h = b.to_h { |x| [x, f(x)] }
i = a.to_h { |x| [x, x] }
j = b.to_h { |x| [x, x] }
k = a.map { |x| [x, f(x)] }.to_h
l = b.map { |x| [x, f(x)] }.to_h
m = a.map { |x| [x, x] }.to_h
n = b.map { |x| [x, x] }.to_h
o = Hash.new { |hash, key| hash[key] = f(key) }
q = a.each_with_object({}) { |x, memo| memo[x] = f(x) }
r = b.each_with_object({}) { |x, memo| memo[x] = f(x) }


Benchmark.bmbm do |x|
  x.report('Declaration of a.to_h { function }:') { a.to_h { |x| [x, f(x)] } }
  x.report('Declaration of b.to_h { function }:') { b.to_h { |x| [x, f(x)] } }
  x.report('Declaration of a.to_h { array }:') { a.to_h { |x| [x, x] } }
  x.report('Declaration of b.to_h { array }:') { b.to_h { |x| [x, x] } }
  x.report('Declaration of a.map { function }.to_h:') { a.map { |x| [x, f(x)] }.to_h }
  x.report('Declaration of b.map { function }.to_h:') { b.map { |x| [x, f(x)] }.to_h }
  x.report('Declaration of a.map { array }.to_h:') { a.map { |x| [x, x] }.to_h }
  x.report('Declaration of b.map { array }.to_h:') { b.map { |x| [x, x] }.to_h }
  x.report('Declaration of Hash.new (with lookup)') { Hash.new {|hash, key| hash[key] = f(key) } }
  x.report('Declaration of a.each_with_object({})') { a.each_with_object({}) { |x, memo| memo[x] = f(x) } }
  x.report('Declaration of b.each_with_object({})') { b.each_with_object({}) { |x, memo| memo[x] = f(x) } }

  x.report('Accessing from a.to_h { function }:') { a.reduce { |sum, x| g[x] } }
  x.report('Accessing from b.to_h { function }:') { a.reduce { |sum, x| h[x] } }
  x.report('Accessing from a.to_h { array }:') { a.reduce { |sum, x| i[x] } }
  x.report('Accessing from b.to_h { array }:') { a.reduce { |sum, x| j[x] } }
  x.report('Accessing from a.map { function }.to_h:') { a.reduce { |sum, x| k[x] } }
  x.report('Accessing from b.map { function }.to_h:') { a.reduce { |sum, x| l[x] } }
  x.report('Accessing from a.map { array }.to_h:') { a.reduce { |sum, x| m[x] } }
  x.report('Accessing from b.map { array }.to_h:') { a.reduce { |sum, x| n[x] } }
  x.report('Accessing from Hash.new (with lookup)') { a.reduce { |sum, x| o[x] } }
  x.report('Accessing from a.each_with_object({})') { a.reduce { |sum, x| q[x] } }
  x.report('Accessing from b.each_with_object({})') { a.reduce { |sum, x| r[x] } }
end

Benchmark results for 5M items iterator/array

                                             user     system      total        real
Declaration of a.to_h { function }:       1.650822   0.091929   1.742751 (  1.742868)
Declaration of b.to_h { function }:       1.170426   0.071894   1.242320 (  1.242395)
Declaration of a.to_h { array }:          1.604802   0.051815   1.656617 (  1.660191)
Declaration of b.to_h { array }:          1.159055   0.007980   1.167035 (  1.167077)
Declaration of a.map { function }.to_h:   1.511910   0.099993   1.611903 (  1.611978)
Declaration of b.map { function }.to_h:   1.381991   0.112121   1.494112 (  1.494172)
Declaration of a.map { array }.to_h:      1.399749   0.091984   1.491733 (  1.491789)
Declaration of b.map { array }.to_h:      1.440968   0.039829   1.480797 (  1.480845)
Declaration of Hash.new (with lookup)     0.000017   0.000001   0.000018 (  0.000009)
Declaration of a.each_with_object({})     1.496914   0.131904   1.628818 (  1.628930)
Declaration of b.each_with_object({})     1.438418   0.180053   1.618471 (  1.618551)

Rehearsal ---------------------------------------------------------------------------
Accessing from a.to_h { function }:       1.005993   0.000323   1.006316 (  1.006360)
Accessing from b.to_h { function }:       0.999888   0.000164   1.000052 (  1.000107)
Accessing from a.to_h { array }:          0.998487   0.000068   0.998555 (  0.998610)
Accessing from b.to_h { array }:          1.003892   0.000139   1.004031 (  1.004061)
Accessing from a.map { function }.to_h:   1.023635   0.000115   1.023750 (  1.023789)
Accessing from b.map { function }.to_h:   1.006790   0.000061   1.006851 (  1.006912)
Accessing from a.map { array }.to_h:      1.005898   0.000204   1.006102 (  1.006155)
Accessing from b.map { array }.to_h:      1.013491   0.000000   1.013491 (  1.013545)
Accessing from Hash.new (with lookup)     2.289310   0.139982   2.429292 (  2.429415)
Accessing from a.each_with_object({})     1.011719   0.000078   1.011797 (  1.011855)
Accessing from b.each_with_object({})     1.002672   0.000026   1.002698 (  1.002755)
----------------------------------------------------------------- total: 12.502935sec

                                              user     system      total        real
Accessing from a.to_h { function }:       1.006216   0.000003   1.006219 (  1.006290)
Accessing from b.to_h { function }:       1.006313   0.000003   1.006316 (  1.006342)
Accessing from a.to_h { array }:          0.991536   0.000049   0.991585 (  0.991623)
Accessing from b.to_h { array }:          0.999372   0.000037   0.999409 (  0.999476)
Accessing from a.map { function }.to_h:   0.988464   0.000073   0.988537 (  0.988588)
Accessing from b.map { function }.to_h:   1.000364   0.000066   1.000430 (  1.000472)
Accessing from a.map { array }.to_h:      0.988162   0.000042   0.988204 (  0.988280)
Accessing from b.map { array }.to_h:      0.997830   0.000049   0.997879 (  0.997946)
Accessing from Hash.new (with lookup)     1.003044   0.000068   1.003112 (  1.003149)
Accessing from a.each_with_object({})     1.006635   0.000000   1.006635 (  1.006668)
Accessing from b.each_with_object({})     0.993825   0.000000   0.993825 (  0.993867)

Interpretation

Based on these results, there is a bill for memory allocation and garbage collection, which shows up in the rehearsal phase of Hash.new (with lookup f(...)): 2.429415 s / 1.003149 s = 242% (or about 1.43 s for building the structure). This means that a preconstructed data structure isn't even guaranteed to be faster than on-demand lookup; I expected the cost to be much higher. When heavily used, it can still make sense to just use Hash.new (with lookup f(...)). The consideration to make is when exactly the function is run: on demand or during up-front construction. Imagine having a compute-intensive task taking 10 s:

def f key
  sleep 10
  key
end

If you rarely need a result, just use Hash.new with a lookup function and bam: the declaration has virtually no runtime. If instead you have a strong focus on response time, it's obviously better to compute the values up front (like a dynamic-programming-style lookup table). At around 16% deviation, the performance difference isn't huge, so you could also simply prefer the most readable code. Starting from a plain array is mostly faster than or as fast as starting from a range, so the format of your input data (a vs. b) made more of an impact here.
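The trade-off can be sketched by counting calls instead of sleeping. Here slow_f is a hypothetical stand-in for an expensive f, and the key list is arbitrary.

```ruby
calls = 0

# Stand-in for an expensive f (hypothetical); it counts calls instead of sleeping.
slow_f = lambda do |key|
  calls += 1
  key
end

keys = (1..1_000).to_a

# Up front: every value is computed while the hash is being built.
eager = keys.to_h { |x| [x, slow_f.call(x)] }
eager_calls = calls

# On demand: building is free; each distinct key costs one call on first access.
calls = 0
on_demand = Hash.new { |hash, key| hash[key] = slow_f.call(key) }
[1, 2, 2, 3].each { |key| on_demand[key] }
on_demand_calls = calls

puts "up front: #{eager_calls} calls, on demand: #{on_demand_calls} calls"
```

With a 10 s function, the up-front hash would pay for all 1,000 calls before the first lookup, while the on-demand hash pays only for the keys that are actually requested.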

Sergio Belevskij · Accepted Answer · 2023-01-24 07:17:28Z

2

Also, Rails method index_with would be helpful:

a = ['a', 'bsdf', 'wqqwc']
a.index_with(&:size)
=> {"a"=>1, "bsdf"=>4, "wqqwc"=>5}

edited Jan 24, 2023 at 7:17

answered Sep 20, 2022 at 12:18

Sergio Belevskij

2 Comments

JakeRobb Over a year ago

Turns out I misread the question, and I was actually looking for index_by, which does the opposite (function yields the keys; elements become the values).

Cadoiz Over a year ago

You can also give index_with a block, index_with {...}; it requires rails >= 6. This link also provides an example of that (unfortunately the suggested edit queue is full).
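To illustrate the index_with / index_by distinction without loading ActiveSupport, here is a plain-Ruby sketch (Ruby >= 2.6) that builds the same hashes with to_h. The variable names are made up for illustration.

```ruby
words = ['a', 'bsdf', 'wqqwc']

# Same result as words.index_with(&:size): elements become the keys.
like_index_with = words.to_h { |w| [w, w.size] }

# Same result as words.index_by(&:size): elements become the values.
like_index_by = words.to_h { |w| [w.size, w] }

puts like_index_with.inspect
puts like_index_by.inspect
```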

Tombart · Accepted Answer · 2023-08-14 12:46:56Z

1

You're looking for each_with_object() method:

elem = [1,2,3,4]
h = elem.each_with_object({}) do |x, res|
  res[x] = x**2
end

puts h

The argument passed to each_with_object({}) is the initial value of an intermediate object that is passed to the block as the res variable. In each iteration we add a new key => value pair to the res Hash; the same Hash is passed on to the next iteration (the block's return value is ignored) and is returned at the end.

The method above pre-computes a very practical hash of squared values:

{1=>1, 2=>4, 3=>9, 4=>16}
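As a side-by-side sketch of why each_with_object is convenient here: with reduce (inject), the block's return value becomes the next accumulator, so the hash has to be returned explicitly, and the block parameters come in the opposite order.

```ruby
elem = [1, 2, 3, 4]

# each_with_object: block gets (element, object); its return value is ignored.
with_object = elem.each_with_object({}) { |x, res| res[x] = x**2 }

# reduce: block gets (accumulator, element); its return value is the next
# accumulator, so the hash must be the last expression in the block.
with_reduce = elem.reduce({}) do |res, x|
  res[x] = x**2
  res
end

puts(with_object == with_reduce)
```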

edited Aug 14, 2023 at 12:46

answered Nov 14, 2018 at 10:59

Tombart

1 Comment

Cadoiz Over a year ago

My RuboCop tells me that each_with_object as in this answer should be preferred.
