
Intro

  • I wrote an answer to "Expand records for each month between two dates". Part of my solution was to stack integer sequences generated from a column of integers. There were only 4 integers in the column, so I opted for the 'straightforward' DROP/REDUCE/VSTACK combo. Later, when I tested my formula with 10k integers (stacks), the wait for the result became unbearable, as you can probably guess. Is there a way to generate the result more efficiently?

The Task

  • I have a list (column) of integers in A2:A4. For each integer in turn, I want to generate a sequence from 1 to that integer and stack it below the previously generated one.
    Size  Result
    2     1
    5     2
    3     1
          2
          3
          4
          5
          1
          2
          3
  • I'm currently using the following (slow) formula:

    =LET(data,A2:A4,DROP(REDUCE("",SEQUENCE(ROWS(data)),LAMBDA(rr,r,
        VSTACK(rr,SEQUENCE(INDEX(data,r))))),1))
    
  • You could generate the large dataset with the formula =RANDARRAY(10000,,1,9,1), then copy/paste values and replace A2:A4 with A2:A10001 in your formula.

  • I'm primarily interested in a formula for Excel 365 but solutions for Legacy Excel, Power Query, or VBA might be useful to the community.

  • Shouldn't this work for you? =LET(_a, A2:A4, _b, SEQUENCE(, MAX(_a)), TOCOL(IF(_b <= _a, _b, 0/0), 3)) But I think PQ is better here. Commented Sep 23 at 12:24
  • @MayukhBhattacharya It seems that's it. It runs in less than a second on 10k rows. Post it as an answer and add some explanations, e.g., about 0/0. Commented Sep 23 at 12:40
  • But I found one problem: it returns a #SPILL! error for 10k rows of data, and I'm not sure why, even though the resulting output is only 40k rows. This is my sample: =RANDARRAY(10000, , 2, 6, 1). Can you try it and let me know whether it works for you? Commented Sep 23 at 12:45
  • @MayukhBhattacharya You need to paste as values. It won't work with the RANDARRAY formula. Commented Sep 23 at 12:48
  • Oh, gotcha, understood. OK, wait, let me try with a static array; my question was whether it works with the array. Thanks for the tip 💡 go team. Commented Sep 23 at 12:51

3 Answers

Here is one way using an Excel formula:

=LET(_a, A2#, _b, SEQUENCE(, MAX(_a)), TOCOL(IF(_b <= _a, _b, 0/0), 2))

  • LET() defines named variables inside the formula, which makes it easier to read.
  • The variable _a holds the input array A2#.
  • The variable _b is a horizontal sequence of numbers from 1 to the maximum value of _a.
  • IF() compares every element of _b with every value in _a: where _b is less than or equal to _a it returns _b; otherwise 0/0 produces a #DIV/0! error.
  • TOCOL() then flattens the matrix row by row into a single column. Its second argument controls what is skipped: 2 ignores errors and 3 ignores both blanks and errors, so either one drops the #DIV/0! cells and gives the required output.

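IFS() can also be used in place of IF(). With no fallback condition, IFS() returns #N/A where the test fails, and TOCOL() drops those errors in the same way: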

=LET(_a, A2#, _b, SEQUENCE(, MAX(_a)), TOCOL(IFS(_b <= _a, _b), 2))

Since we are dealing with numbers, we can also divide by the condition: TRUE is coerced to 1 and FALSE to 0, so cells that fail the test become #DIV/0! errors:

=LET(_a, A2#, _b, SEQUENCE(, MAX(_a)), TOCOL(_b/(_b <= _a), 2))
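
The matrix-and-filter idea behind the three formulas above can also be sketched outside Excel. The following is only an illustration in Python, not part of the original answer: the function name stack_sequences_matrix is made up for this example, and None stands in for the #DIV/0! error cells.

```python
def stack_sequences_matrix(sizes):
    """Illustrative stand-in for TOCOL(IF(_b <= _a, _b, 0/0), 2)."""
    if not sizes:
        return []
    widest = max(sizes)                 # SEQUENCE(, MAX(_a)) gives 1..widest
    columns = range(1, widest + 1)
    # Build the full len(sizes) x widest matrix; None marks the "error" cells.
    matrix = [[c if c <= size else None for c in columns] for size in sizes]
    # Flatten row by row and skip the error cells, as TOCOL(..., 2) does.
    return [cell for row in matrix for cell in row if cell is not None]


print(stack_sequences_matrix([2, 5, 3]))
# [1, 2, 1, 2, 3, 4, 5, 1, 2, 3]
```

Note that the intermediate matrix always has rows x MAX(sizes) cells, so a single large value in the input inflates the work for every row.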

2 Comments

Simple and straightforward. I used =RANDARRAY(190000,,1,10,1) to get over 1M resulting rows and put it to the test using your 3rd solution, =LET(data,A2:A190001,s,SEQUENCE(,MAX(data)),TOCOL(s/(s<=data),2)), and it took close to a second. Then I increased the number in the first cell (which sets the number of columns in the matrix): it topped out at 282, taking 6 s with well over 50M data points (most of them errors, of course), before Excel ran out of resources at 283. Tom Sharpe's solution took 2 s and wasn't affected by the number of columns, since it just added the corresponding numbers to the result. Thx
Sounds good, but that one is already in the answer; check all the options!

In PQ, it is as simple as: {1..[Size]}

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlKK1YlWMgWTxkqxsQA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Size = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Size", Int64.Type}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type", "Result", each {1..[Size]}),
    #"Expanded Result" = Table.ExpandListColumn(#"Added Custom", "Result")
in
    #"Expanded Result"

UPDATE: [six screenshots showing the steps in the Power Query interface]

5 Comments

I tested it with 3 records. Then I added 9997 records, but it still considers only the first 3. I would guess it is the first line. Could you make it dynamic? Also, can the "Added Custom" step be generated using the interface and how?
I'll add steps so you can see in the interface how to do this.
When I right-click the table and select Get Data from Table/Range I get this Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content]. What did you select to acquire your 'Source' line? Does that have any benefits?
The original Binary.FromText line is generated when you click Enter Data to create a table manually in PQ. Get Data from Table is probably your best option.
I used my first line and the rest of your code and got it to work. First I forgot the trailing comma, and it wouldn't work. Thanks for your input. I really appreciate it since I'm such a noob in Power Query.

A bit late to the party, but here's another method, which I recall from some early experiments with Google Sheets where you could generate running totals using a trick with SUMIF.

The idea is that you generate one long sequence running from 0 to N-1, where N is the sum of the original data. You then generate running totals of the original data in the usual way using SCAN (also starting from zero). Finally, use XLOOKUP with the 'exact match or next smaller' match mode (-1) and binary search (2) to look up the current value of the sequence in the running totals, then subtract the result from that current value and add 1 (because the sequence is zero-based) to get the answer.

=LET(data,B2:B10001, sum,SUM(data), rtotals, VSTACK(0,SCAN(0,data,LAMBDA(a,c,a+c))), seq,SEQUENCE(sum,1,0), seq+1-XLOOKUP(seq,rtotals,rtotals,,-1,2))

Test data Output
5 1
5 2
2 3
6 4
3 5
5 1
2 2
6 3
3 4
3 5
5 1
6 2
4 1
6 2
3 3
2 4
4 5
5 6
4 1
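
The running-total lookup described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the answer's own code: the function name stack_sequences_lookup is hypothetical, and bisect_right plays the role that XLOOKUP with match mode -1 and binary search plays in the formula.

```python
from bisect import bisect_right
from itertools import accumulate


def stack_sequences_lookup(sizes):
    """Illustrative stand-in for seq + 1 - XLOOKUP(seq, rtotals, rtotals,, -1, 2)."""
    # Running totals starting from zero: VSTACK(0, SCAN(0, data, ...)).
    totals = [0] + list(accumulate(sizes))
    result = []
    for position in range(totals[-1]):   # SEQUENCE(sum, 1, 0) gives 0..sum-1
        # Binary search for the largest running total <= position.
        start = totals[bisect_right(totals, position) - 1]
        result.append(position + 1 - start)
    return result


print(stack_sequences_lookup([2, 5, 3]))
# [1, 2, 1, 2, 3, 4, 5, 1, 2, 3]
```

Unlike the matrix approach, the work here grows with the length of the output rather than with rows x maximum value, which matches the timing behaviour reported in the comments.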

I think you could do this in legacy Excel, but you would need a helper column for the running totals (the standard pre-LAMBDA method involved 2D arrays and was expensive in terms of space). VLOOKUP or INDEX/MATCH should still work. However, it's difficult to say whether it would be useful if you wanted to set something up as part of a larger array formula.

3 Comments

Thanks for stopping by. "Another king emerges!". What a great idea—this is pure gold.
Thank you! I always think one's rep is a fairly accurate indicator of one's standing though - about 61% of a VBasic2008!
Complicated and not quite straightforward. That's what a jewel looks like. I used =RANDARRAY(190000,,1,10,1) to get over 1M resulting rows and put it to the test using =LET(data,A2:A190001,s,SEQUENCE(SUM(data)),t,VSTACK(1,SCAN(1,data,LAMBDA(sr,r,sr+r))),s-XLOOKUP(s,t,t,,-1,2)+1), and it took close to two seconds. Then I increased the number in the first cell, and it wasn't considerably affected, just returning a few more rows. Mayukh's solution took only 1 s but was seriously affected by the increased number of columns, since it creates a matrix. Thx
