
I have two arrays, one with data and one with indexes. Are there good ways to delete the elements of data at the positions given in indexes? I could do a simple iteration, but I am wondering what the shortest way is:

data = ['a','b','c','a','b','c','a','b','c']
indexes = [2,5,8]

# some code here

Elements of data should be removed wherever their position appears in indexes. The result should look like this:

['a','b','a','b','a','b']
  • Just a coincidence we're removing all the c's in here? Commented Aug 24, 2015 at 16:02
  • Yep, that's just for demonstration. Commented Aug 24, 2015 at 18:37
  • Poll: should Array#delete_at be changed from delete_at(i) to delete_at(*i)? Commented Aug 25, 2015 at 18:30

5 Answers

data.values_at(*data.each_index.to_a - indexes)
# => ["a", "b", "a", "b", "a", "b"]
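Breaking the one-liner into steps makes it easier to see how it works:

```ruby
data = ['a','b','c','a','b','c','a','b','c']
indexes = [2,5,8]

# Array#each_index enumerates 0...data.size; Array#- drops the unwanted ones.
keep = data.each_index.to_a - indexes  # => [0, 1, 3, 4, 6, 7]

# Array#values_at returns the elements at the surviving positions.
data.values_at(*keep)                  # => ["a", "b", "a", "b", "a", "b"]
```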

2 Comments

No one can find a better solution than this. Great answer.
This is actually perfect.

I would do it as below:

data = ['a','b','c','a','b','c','a','b','c']
indexes = [2,5,8]
data.values_at(*(0...data.size).to_a - indexes)
# => ["a", "b", "a", "b", "a", "b"]

2 Comments

@sawa I feel good that we thought the same way... :) But you were faster.
This is a really good answer, but it should be noted that behind the scenes array subtraction uses iteration.

Doing it without iteration might seem like a good goal, but iteration done right is going to be extremely fast.

Benchmarks are important:

require 'benchmark'

DATA = ['a','b','c','a','b','c','a','b','c']
INDEXES = [2,5,8]

def ttm(data)
  d2 = data.dup
  INDEXES.sort.reverse.each{ |i| d2.delete_at(i) }
  d2
end

def devon_parsons(data)
  new_data = data.each_with_index.reject do |value,index|
    INDEXES.include? index
  end.map(&:first)
  new_data
end

def arup_rakshit(data)
  data.values_at(*(0...data.size).to_a - INDEXES)
end

def sawa(data)
  data.values_at(*data.each_index.to_a - INDEXES)
end

Make sure it's an apples to apples test:

ttm(DATA)           # => ["a", "b", "a", "b", "a", "b"]
devon_parsons(DATA) # => ["a", "b", "a", "b", "a", "b"]
arup_rakshit(DATA)  # => ["a", "b", "a", "b", "a", "b"]
sawa(DATA)          # => ["a", "b", "a", "b", "a", "b"]

Run the benchmarks:

n = 100_000 
Benchmark.bm(13) do |b|
  b.report('ttm:')          { n.times { ttm(DATA)           } }
  b.report('devon_parsons') { n.times { devon_parsons(DATA) } }
  b.report('arup_rakshit')  { n.times { arup_rakshit(DATA)  } }
  b.report('sawa')          { n.times { sawa(DATA)          } }
end

Which results in:

# >>                     user     system      total        real
# >> ttm:            0.130000   0.000000   0.130000 (  0.127559)
# >> devon_parsons   0.530000   0.000000   0.530000 (  0.535929)
# >> arup_rakshit    0.250000   0.000000   0.250000 (  0.255295)
# >> sawa            0.300000   0.010000   0.310000 (  0.305376)

If the data size grows:

DATA2 = DATA * 100
Benchmark.bm(13) do |b|
  b.report('ttm:')          { n.times { ttm(DATA2)           } }
  b.report('devon_parsons') { n.times { devon_parsons(DATA2) } }
  b.report('arup_rakshit')  { n.times { arup_rakshit(DATA2)  } }
  b.report('sawa')          { n.times { sawa(DATA2)          } }
end

The results really change:

# >>                     user     system      total        real
# >> ttm:            0.320000   0.090000   0.410000 (  0.420074)
# >> devon_parsons  39.170000   0.080000  39.250000 ( 39.265062)
# >> arup_rakshit    9.950000   0.010000   9.960000 (  9.975699)
# >> sawa            9.940000   0.020000   9.960000 (  9.959036)

It's really important to test what happens as the array size changes. What might run quickly on a small array can slow dramatically as the array grows. And, too often, what seems like a cool way to do something turns out to be very slow because there are hidden costs. Benchmarks help us figure these things out.

Note: Using sort.reverse is very important. Without it, deleting a lower index first shifts every later element one position to the left, so the remaining indexes point at the wrong elements and the array gets mangled.
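To see why, compare deleting in ascending order against the sorted-descending order used above (a small sketch using the question's data):

```ruby
data = ['a','b','c','a','b','c','a','b','c']
indexes = [2,5,8]

# Ascending order: each delete_at shifts later elements left, so the
# remaining indexes hit the wrong elements (and 8 falls off the end).
bad = data.dup
indexes.each { |i| bad.delete_at(i) }
bad   # => ["a", "b", "a", "b", "c", "b", "c"]

# Descending order: deleting from the back never moves the elements
# that the still-pending indexes refer to.
good = data.dup
indexes.sort.reverse.each { |i| good.delete_at(i) }
good  # => ["a", "b", "a", "b", "a", "b"]
```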


A comment suggested that sort could be further improved to sort_by(&:itself), so benchmark that too:

require 'benchmark'

array = (0..99).to_a.shuffle
n = 100_000 

Benchmark.bm(7) do |b|
  b.report('sort:')    { n.times { array.sort              } }
  b.report('sort_by:') { n.times { array.sort_by(&:itself) } }
end

Resulting in:

              user     system      total        real
sort:     0.460000   0.010000   0.470000 (  0.480236)
sort_by:  3.600000   0.030000   3.630000 (  3.627871)

Increasing the array size:

array = (0..999).to_a.shuffle
Benchmark.bm(13) do |b|
  b.report('sort:')    { n.times { array.sort              } }
  b.report('sort_by:') { n.times { array.sort_by(&:itself) } }
end

Resulting in:

                    user     system      total        real
sort:           9.520000   0.120000   9.640000 (  9.659246)
sort_by:       53.530000   0.720000  54.250000 ( 54.321285)

4 Comments

I have a solution which seems to perform better - is it really better, or a fluke?
Told you mine was in N^2 time :P
If you only care about performance, sort can further be improved to sort_by(&:itself).
sort will run faster than sort_by when comparing basic objects like numbers. See the added benchmark.
new_data = data.each_with_index.reject do |value,index|
  indexes.include? index
end.map(&:first)

New answer that actually works this time. It runs in O(n^2) time, and I don't see a way of doing it without iterating over indexes.
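If the repeated Array#include? scan is the bottleneck, converting indexes to a Set (an assumption added here, not part of the original answer) makes each membership check O(1):

```ruby
require 'set'

data = ['a','b','c','a','b','c','a','b','c']
indexes = [2,5,8]

index_set = indexes.to_set  # O(1) lookups instead of scanning the array

new_data = data.each_with_index.reject do |_value, index|
  index_set.include?(index)
end.map(&:first)

new_data  # => ["a", "b", "a", "b", "a", "b"]
```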



Here is my solution:

data = ['a','b','c','a','b','c','a','b','c']
indexes = [2,5,8]

updated_data = data.dup
indexes.each { |i| updated_data[i] = nil}
updated_data.compact!
p updated_data # Prints ["a", "b", "a", "b", "a", "b"]

As far as benchmarks go, using the Tin Man's harness, it seems to perform the best. Not sure whether that has something to do with the small size of the indexes array.

                    user     system      total        real
ttm:            0.125000   0.000000   0.125000 (  0.113075)
devon_parsons   0.484000   0.000000   0.484000 (  0.491327)
arup_rakshit    0.219000   0.000000   0.219000 (  0.221149)
sawa            0.250000   0.000000   0.250000 (  0.253168)
wandmaker       0.094000   0.016000   0.110000 (  0.095063)

# Run 2 with larger data
                    user     system      total        real
ttm:            0.422000   0.188000   0.610000 (  0.596413)
devon_parsons  39.328000   0.000000  39.328000 ( 39.489394)
arup_rakshit   10.078000   0.562000  10.640000 ( 10.644099)
sawa           10.219000   0.110000  10.329000 ( 10.328250)
wandmaker       0.359000   0.062000   0.421000 (  0.423282)

2 Comments

What if the OP's array contained nils that were significant? You'd be removing data not specified by indexes. You need code to guard against that.
@theTinMan Yeah, that makes sense.
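Following up on that comment, one way to guard against meaningful nils (a sketch, not from the original answer) is to mark positions with a unique sentinel object instead of nil:

```ruby
data = ['a', nil, 'c', 'a', nil, 'c']
indexes = [2,5]

sentinel = Object.new  # unique marker that cannot collide with real data
updated = data.dup
indexes.each { |i| updated[i] = sentinel }
updated.delete(sentinel)  # Array#delete removes every occurrence

updated  # => ["a", nil, "a", nil] -- the significant nils survive
```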
