I am using Nginx + Lua (OpenResty, LuaJIT), and I did some performance tests on various loops.

local ngx_log = ngx.log
-- https://openresty-reference.readthedocs.io/en/latest/Lua_Nginx_API/#nginx-log-level-constants
local ngx_LOG_TYPE = ngx.STDERR


local N=1e8

t0=os.clock()
a = 0
while a < N do
    a = a + 1
end
t1=os.clock()-t0
ngx_log(ngx_LOG_TYPE,"While Global " .. t1 .. " " .. math.floor(t1/t1*100+0.5))

t0=os.clock()
local a = 0
while a < N do
    a = a + 1
end
t2=os.clock()-t0
ngx_log(ngx_LOG_TYPE,"While Local " .. t2 .. " " .. math.floor(t1/t2*100+0.5))

t0=os.clock()
b = 0
for i=1,N do
    b = b + 1
end
t3=os.clock()-t0
ngx_log(ngx_LOG_TYPE,"For Global " .. t3 .. " " .. math.floor(t1/t3*100+0.5))

t0=os.clock()
local b = 0
for i=1,N do
    b = b + 1
end
t4=os.clock()-t0
ngx_log(ngx_LOG_TYPE,"For Local " .. t4 .. " " .. math.floor(t1/t4*100+0.5))

Here is the output:

[lua] test.lua:14: While Global 0.048999999999999 100, client: 127.0.0.1, server: localhost, request: "GET /test.lua HTTP/1.1", host: "localhost"
[lua] test.lua:22: While Local 0.030000000000001 163, client: 127.0.0.1, server: localhost, request: "GET /test.lua HTTP/1.1", host: "localhost"
[lua] test.lua:30: For Global 0.057000000000002 86, client: 127.0.0.1, server: localhost, request: "GET /test.lua HTTP/1.1", host: "localhost"
[lua] test.lua:38: For Local 0.036000000000001 136, client: 127.0.0.1, server: localhost, request: "GET /test.lua HTTP/1.1", host: "localhost"

They execute very fast, but the while loops are a tiny fraction faster than the for loops.

Should I change all my code to use while loops instead of for loops?

  • Did you repeat the test about 100000x and average the results? Did you know that os.clock() is not a safe/repeatable way to make microsecond measurements? Commented Jul 12 at 15:58
  • @WarrenP What would you use instead of os.clock? Commented Jul 12 at 16:04
  • Lua is usually used as an embedded, extensible scripting language. I would find a real high-resolution timer for Lua and use that: one that uses the operating system’s high-resolution tick counters, not the wall time, which moves when synchronized over the internet and may follow UTC or local time zones. Because Lua is usually embedded, you need to find a way that works with YOUR Lua, e.g. devforum.roblox.com/t/lua-performance-profiling-api/28934 Commented Jul 15 at 15:28

1 Answer

Performance profiling and optimization are specific to a given process. As is, there is almost no point in testing such arbitrary elements of code, as they do not necessarily reflect real world use cases. Performance testing should be reserved for when a real bottleneck is observed, and actual alternatives that produce the same result can be developed and re-tested.

This becomes apparent when you consider that the desired result in the examples shown is an integer, which is achieved here by incrementing a variable some number of times. The comparison becomes rather unfair when you consider that the for loops have to do this incrementation twice.

In other words:

The for loops have the overhead of making available the local control value of i, containing the current iteration. This is in addition to the variable being incremented in the body of the loop.

The while loops use the variable being incremented in the body of the loop as the control, reducing the number of instructions required.
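To make the difference concrete, here is a minimal sketch of the three shapes involved. The function names are hypothetical and not taken from the question, and the third variant is only one possible way to let a for loop carry a single counter.

```lua
-- Hypothetical helper names; a sketch of the point above, not a benchmark.

-- while: the variable being incremented is also the loop control.
local function count_while(n)
    local a = 0
    while a < n do
        a = a + 1
    end
    return a
end

-- for (as in the question): the loop maintains its own control value i,
-- and the body increments a second variable b.
local function count_for(n)
    local b = 0
    for i = 1, n do
        b = b + 1
    end
    return b
end

-- for, using the control value directly, so only one counter is kept.
local function count_for_control(n)
    local last = 0
    for i = 1, n do
        last = i
    end
    return last
end

print(count_while(1000), count_for(1000), count_for_control(1000))
```

All three return the same value for a positive n, but only the first and third do one increment per iteration, so those are the pair that would make a like-for-like comparison.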

Faster than any of these loops is to just write a = 1e8 - so surely using constants is faster than using loops, thus we should always "use constants instead of loops"? Such a generalization is not particularly useful as the use cases obviously differ. Generalizing a performance preference between for and while is equally problematic as their use cases can also differ.

When you do find a certain construct to be more performant than another for the same use case (i.e., they produce the same result), the only remaining question is whether any degradation in code quality is worth the performance bump. If so, then optimize away.
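One way to check the "same use case, same result" condition before trusting a timing is sketched below. The names bench, sum_for and sum_while are hypothetical, and the harness assumes os.clock (CPU time, with limited resolution) is good enough for a rough comparison; it repeats each run and keeps the best time to reduce noise.

```lua
-- Hypothetical harness: run fn(arg) several times, keep the fastest run.
local function bench(fn, arg, repeats)
    local best = math.huge
    local result
    for _ = 1, repeats do
        local t0 = os.clock()
        result = fn(arg)
        local dt = os.clock() - t0
        if dt < best then best = dt end
    end
    return best, result
end

-- Two candidates that are meant to compute the same thing: 1 + 2 + ... + n.
local function sum_for(n)
    local s = 0
    for i = 1, n do s = s + i end
    return s
end

local function sum_while(n)
    local s, i = 0, 1
    while i <= n do
        s = s + i
        i = i + 1
    end
    return s
end

local n = 1e6
local t_for, r_for = bench(sum_for, n, 20)
local t_while, r_while = bench(sum_while, n, 20)

-- Only compare timings once the results are known to match.
assert(r_for == r_while, "candidates must produce the same result")
print(string.format("for:   %.6f s", t_for))
print(string.format("while: %.6f s", t_while))
```

Under a JIT such as LuaJIT, loops this simple may be optimized heavily, so the numbers are only a rough guide until the same measurement is made on real code.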

For example, if you told me the difference in performance between these two constructs below was measured somewhere in the range of micro- to nanoseconds, unless performance was unbelievably critical, I am choosing the one that's vastly easier to understand and maintain, regardless of its relative performance.

for key, value in pairs(t) do
    print(key, value)
end

----

do
    local key = next(t)

    -- compare with nil: false is a valid key and must not end the loop
    while key ~= nil do
        local value = t[key]
        print(key, value)
        key = next(t, key)
    end
end
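As a quick check that the two traversal styles really are interchangeable, the sketch below collects the pairs seen by each and compares them. The sample table and the same_pairs helper are hypothetical, chosen only for illustration.

```lua
-- Illustrative table; any keys and values would do.
local t = { a = 1, b = 2, [1] = "x", [false] = "y" }

-- Collect pairs with the generic for loop.
local seen_for = {}
for key, value in pairs(t) do
    seen_for[key] = value
end

-- Collect pairs with explicit next() calls.
local seen_while = {}
local key, value = next(t)
while key ~= nil do   -- compare with nil so a false key does not stop the loop
    seen_while[key] = value
    key, value = next(t, key)
end

-- Hypothetical helper: true when both tables hold exactly the same pairs.
local function same_pairs(x, y)
    for k, v in pairs(x) do
        if y[k] ~= v then return false end
    end
    for k, v in pairs(y) do
        if x[k] ~= v then return false end
    end
    return true
end

print(same_pairs(seen_for, seen_while))
```

Both loops visit the same entries, which is what makes the readability comparison fair: the only thing that differs is how much code the reader has to parse.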

TL;DR: Do your best to use the correct construct for a given use case and save performance profiling for real world problems.
