So, the BigQuery Scripting feature came out and I thought of giving it a try.

I understand it is still in beta. To get a feel for it, I ran this simple loop of 20,000 iterations several times, and each run took between 5 and 10 minutes (sometimes more) to complete. A few times I had to cancel the job because it showed no sign of finishing.

declare n int64;
declare i int64;
declare k float64;

set i = 0;
set n = 20000;
set k = rand();

loop
  set i = i + 1;
  if i >= n then leave;
  else set k = k*rand();
  end if;
end loop;

select k;

I am wondering whether I am doing something wrong here, or whether scripting is simply not that performant yet.

NOTE: Here is one of the job ids: music-178807:US.bquxjob_366fc627_16da33c0ee1

  • Can you try and be a little more specific? Using wording like "several minutes" and "forever" can be ambiguous. Also, some BigQuery job ids would be helpful for the Google engineers no doubt :) Commented Oct 6, 2019 at 22:58
  • Okay. So, it has been taking variable times between 5 and 10 minutes (and sometimes more). The words like "forever/slow" are actually comparative to the small scale of script :-) Commented Oct 6, 2019 at 23:05
  • Understood, but remember that engineers like specific details :) Can you also provide job ids? Commented Oct 6, 2019 at 23:08
  • Sure. I also added a job id for curious engineers :-) Commented Oct 6, 2019 at 23:19

1 Answer

Scripting in BigQuery is intentionally not nearly as fast as running this type of code in some other language. The expectation is that people will want to use scripting to tie together multiple queries, not to multiply numbers in a loop. Notice also that there is no additional cost for scripting, whereas high performance would probably have to come with a price tag.

2 Comments

I understand what you are trying to convey, and it makes sense to some degree. However, a user who scans a few million rows is already paying the per-byte price for that scan. If they then add scripting to do some iterative processing over the resulting rows and the run time blows up, it's not so nice. That is the point of my question, I guess.
Queries are still the best way to do set-based processing. If you find yourself running a loop over thousands of individual rows, there's probably a better approach.
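
To make the set-based suggestion concrete, here is a minimal sketch of the loop from the question rewritten as a single query. It is an illustration rather than a benchmarked replacement. It assumes only the final product of the 20,000 RAND() draws is needed, and it computes that product as EXP(SUM(LN(r))), since as far as I know BigQuery has no built-in product aggregate. The aliases r, i and k are just names chosen for this example.

```sql
-- Set-based sketch of the loop in the question: multiply 20,000 RAND() draws.
-- Assumption: only the final product matters, not the intermediate values.
SELECT
  EXP(SUM(LN(r))) AS k  -- product of all r, computed as exp of the sum of logs
FROM (
  SELECT RAND() AS r    -- one random draw per generated row
  FROM UNNEST(GENERATE_ARRAY(1, 20000)) AS i
);
```

Two caveats. RAND() returns values in [0, 1), so a draw of exactly 0 is possible in principle and would make LN raise an error. Also, the product of that many values below 1 underflows FLOAT64, so expect k to come back as 0; the loop in the question behaves the same way.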
