
I have two arrays, one with data and one with indexes. Are there good ways to delete the elements of data at the positions given in indexes? I could do a simple iteration, but I am wondering what the shortest way is:

data = ['a','b','c','a','b','c','a','b','c']
indexes = [2,5,8]

# some code here

Elements of data should be removed wherever their position appears in indexes. The result should look like this:

['a','b','a','b','a','b']
  • Just a coincidence we're removing all the c's in here? Commented Aug 24, 2015 at 16:02
  • Yep, that's just for demonstration. Commented Aug 24, 2015 at 18:37
  • Poll: should Array#delete_at be changed from delete_at(i) to delete_at(*i)? Commented Aug 25, 2015 at 18:30

5 Answers

data.values_at(*data.each_index.to_a - indexes)
# => ["a", "b", "a", "b", "a", "b"]
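Breaking the one-liner into steps makes it easier to see how it works:

```ruby
data = ['a','b','c','a','b','c','a','b','c']
indexes = [2,5,8]

# Array#each_index enumerates 0...data.size; Array#- drops the unwanted ones.
keep = data.each_index.to_a - indexes  # => [0, 1, 3, 4, 6, 7]

# Array#values_at returns the elements at the surviving positions.
data.values_at(*keep)                  # => ["a", "b", "a", "b", "a", "b"]
```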

2 Comments

No one can find a better solution than this. Great answer.
This is actually perfect.

I would do it as below:

data = ['a','b','c','a','b','c','a','b','c']
indexes = [2,5,8]
data.values_at(*(0...data.size).to_a - indexes)
# => ["a", "b", "a", "b", "a", "b"]

2 Comments

@sawa I feel good that we thought the same way... :) But you were faster.
This is a really good answer, but it should be noted that behind the scenes array subtraction uses iteration.

Doing it without iteration might seem like a good goal, but iteration done right is going to be extremely fast.

Benchmarks are important:

require 'benchmark'

DATA = ['a','b','c','a','b','c','a','b','c']
INDEXES = [2,5,8]

def ttm(data)
  d2 = data.dup
  INDEXES.sort.reverse.each{ |i| d2.delete_at(i) }
  d2
end

def devon_parsons(data)
  new_data = data.each_with_index.reject do |value,index|
    INDEXES.include? index
  end.map(&:first)
  new_data
end

def arup_rakshit(data)
  data.values_at(*(0...data.size).to_a - INDEXES)
end

def sawa(data)
  data.values_at(*data.each_index.to_a - INDEXES)
end

Make sure it's an apples to apples test:

ttm(DATA)           # => ["a", "b", "a", "b", "a", "b"]
devon_parsons(DATA) # => ["a", "b", "a", "b", "a", "b"]
arup_rakshit(DATA)  # => ["a", "b", "a", "b", "a", "b"]
sawa(DATA)          # => ["a", "b", "a", "b", "a", "b"]

Run the benchmarks:

n = 100_000 
Benchmark.bm(13) do |b|
  b.report('ttm:')          { n.times { ttm(DATA)           } }
  b.report('devon_parsons') { n.times { devon_parsons(DATA) } }
  b.report('arup_rakshit')  { n.times { arup_rakshit(DATA)  } }
  b.report('sawa')          { n.times { sawa(DATA)          } }
end

Which results in:

# >>                     user     system      total        real
# >> ttm:            0.130000   0.000000   0.130000 (  0.127559)
# >> devon_parsons   0.530000   0.000000   0.530000 (  0.535929)
# >> arup_rakshit    0.250000   0.000000   0.250000 (  0.255295)
# >> sawa            0.300000   0.010000   0.310000 (  0.305376)

If the data size grows:

DATA2 = DATA * 100
Benchmark.bm(13) do |b|
  b.report('ttm:')          { n.times { ttm(DATA2)           } }
  b.report('devon_parsons') { n.times { devon_parsons(DATA2) } }
  b.report('arup_rakshit')  { n.times { arup_rakshit(DATA2)  } }
  b.report('sawa')          { n.times { sawa(DATA2)          } }
end

The results really change:

# >>                     user     system      total        real
# >> ttm:            0.320000   0.090000   0.410000 (  0.420074)
# >> devon_parsons  39.170000   0.080000  39.250000 ( 39.265062)
# >> arup_rakshit    9.950000   0.010000   9.960000 (  9.975699)
# >> sawa            9.940000   0.020000   9.960000 (  9.959036)

It's really important to test what happens as the array size changes. What might run quickly on a small array can slow dramatically as the array grows. And, too often, what seems like a cool way to do something turns out to be very slow because there are hidden costs. Benchmarks help us figure these things out.

Note: Using sort.reverse is very important. Without it, deleting a lower index first shifts every later element one position to the left, so the remaining indexes point at the wrong elements and the array gets mangled.
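To see why, compare deleting in ascending order against the sorted-descending order used above (a small sketch using the question's data):

```ruby
data = ['a','b','c','a','b','c','a','b','c']
indexes = [2,5,8]

# Ascending order: each delete_at shifts later elements left, so the
# remaining indexes hit the wrong elements (and 8 falls off the end).
bad = data.dup
indexes.each { |i| bad.delete_at(i) }
bad   # => ["a", "b", "a", "b", "c", "b", "c"]

# Descending order: deleting from the back never moves the elements
# that the still-pending indexes refer to.
good = data.dup
indexes.sort.reverse.each { |i| good.delete_at(i) }
good  # => ["a", "b", "a", "b", "a", "b"]
```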


A comment suggested that sort could be further improved to sort_by(&:itself), so benchmark that too:

require 'benchmark'

array = (0..99).to_a.shuffle
n = 100_000 

Benchmark.bm(7) do |b|
  b.report('sort:')    { n.times { array.sort              } }
  b.report('sort_by:') { n.times { array.sort_by(&:itself) } }
end

Resulting in:

              user     system      total        real
sort:     0.460000   0.010000   0.470000 (  0.480236)
sort_by:  3.600000   0.030000   3.630000 (  3.627871)

Increasing the array size:

array = (0..999).to_a.shuffle
Benchmark.bm(13) do |b|
  b.report('sort:')    { n.times { array.sort              } }
  b.report('sort_by:') { n.times { array.sort_by(&:itself) } }
end

Resulting in:

                    user     system      total        real
sort:           9.520000   0.120000   9.640000 (  9.659246)
sort_by:       53.530000   0.720000  54.250000 ( 54.321285)

4 Comments

I have a solution which seems to perform better - is it really better, or a fluke?
Told you mine was in N^2 time :P
If you only care about performance, sort can further be improved to sort_by(&:itself).
sort will run faster than sort_by when comparing basic objects like numbers. See the added benchmark.
new_data = data.each_with_index.reject do |value,index|
  indexes.include? index
end.map(&:first)

New answer that actually works this time. It runs in O(n^2) time, and I don't see a way of doing it without iterating over indexes.
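If the repeated Array#include? scan is the bottleneck, converting indexes to a Set (an assumption added here, not part of the original answer) makes each membership check O(1):

```ruby
require 'set'

data = ['a','b','c','a','b','c','a','b','c']
indexes = [2,5,8]

index_set = indexes.to_set  # O(1) lookups instead of scanning the array

new_data = data.each_with_index.reject do |_value, index|
  index_set.include?(index)
end.map(&:first)

new_data  # => ["a", "b", "a", "b", "a", "b"]
```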



Here is my solution:

data = ['a','b','c','a','b','c','a','b','c']
indexes = [2,5,8]

updated_data = data.dup
indexes.each { |i| updated_data[i] = nil}
updated_data.compact!
p updated_data # Prints ["a", "b", "a", "b", "a", "b"]

As far as benchmarks go, using the Tin Man's harness, it seems to perform the best. Not sure whether that has something to do with the small size of the indexes array.

                    user     system      total        real
ttm:            0.125000   0.000000   0.125000 (  0.113075)
devon_parsons   0.484000   0.000000   0.484000 (  0.491327)
arup_rakshit    0.219000   0.000000   0.219000 (  0.221149)
sawa            0.250000   0.000000   0.250000 (  0.253168)
wandmaker       0.094000   0.016000   0.110000 (  0.095063)

# Run 2 with larger data
                    user     system      total        real
ttm:            0.422000   0.188000   0.610000 (  0.596413)
devon_parsons  39.328000   0.000000  39.328000 ( 39.489394)
arup_rakshit   10.078000   0.562000  10.640000 ( 10.644099)
sawa           10.219000   0.110000  10.329000 ( 10.328250)
wandmaker       0.359000   0.062000   0.421000 (  0.423282)

2 Comments

What if the OP's array contained nils that were significant? You'd be removing data not specified by indexes. You need code to guard against that.
@theTinMan Yeah, that makes sense.
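Following up on that comment, one way to guard against meaningful nils (a sketch, not from the original answer) is to mark positions with a unique sentinel object instead of nil:

```ruby
data = ['a', nil, 'c', 'a', nil, 'c']
indexes = [2,5]

sentinel = Object.new  # unique marker that cannot collide with real data
updated = data.dup
indexes.each { |i| updated[i] = sentinel }
updated.delete(sentinel)  # Array#delete removes every occurrence

updated  # => ["a", nil, "a", nil] -- the significant nils survive
```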
