2

As I understand Lua, an array is just a table where the keys start at 1 and increment sequentially. However, there are features within Lua to manipulate arrays in the traditional sense (e.g. ipairs, square-bracket syntax, etc.). So Lua does "support" the concept of arrays.

In Lua, you could initialize an array like this:

-- Code Example "A"
a={}
a[1] = "abc"
a[2] = "xyz"
a[3] = "foo"

That works, but is sometimes abbreviated as follows:

-- Code Example "B"
a={ "abc", "xyz", "foo" }

This also works and is equivalent to example "A". Furthermore, Lua also provides a way to initialize an array by specifying each index, as follows:

-- Code Example "C"
a={
    [1] = "abc",
    [2] = "xyz",
    [3] = "foo"
}

The above syntax should have identical semantics to examples "A" and "B". However, example "C" suffers from an insidious problem that examples "A" and "B" do not. In particular, given the variable "a" from example "C", you can't use the normal Lua design pattern to determine that "a" can be treated as an array (as opposed to a table). The well-known Lua pattern for determining if a table is an array is as follows:

table_is_array = (type(a)== "table") and (#a>0) and (next(a,#a)==nil)

In particular, the expression next(a,#a) returns nil for code examples "A" and "B" (which correctly indicates an array), while the same expression returns 1 for code example "C" (which is incorrect for an array). The question is, why do semantically identical array initialization syntaxes produce different results with the Lua "next" function? Clearly, code examples "A", "B", and "C" all do exactly the same thing, yet they produce different results with the Lua "next" function. Worse, the differing results interfere with the most common way to determine if a table can be treated as an array in Lua.

So to recap:

a={}
a[1] = "abc"
a[2] = "xyz"
a[3] = "foo"

t={
    [1] = "abc",
    [2] = "xyz",
    [3] = "foo"
}

print("next(a,#a) = " .. tostring(next(a,#a)))  -- displays nil (correct)
print("next(t,#t) = " .. tostring(next(t,#t)))  -- displays 1 (incorrect)

I need to initialize the table as in example C because in my code, the index values are in variables, similar to this example:

i_abc=1
i_xyz=2
i_foo=3
v_abc="abc"
v_xyz="xyz"
v_foo="foo"
-- Now put the above data into an array
a={
    [i_abc] = v_abc,
    [i_xyz] = v_xyz,
    [i_foo] = v_foo
}

Why does the array-checking pattern fail for example C?

  • Where did you get this "well-known Lua pattern" from? Commented Jul 2 at 4:56
  • I see my question has been edited so the preemptive answers at "why are you doing this?" have been deleted. Nice. Now responders are asking "why are you doing this". This is why I love this platform so much! I shall just accept an answer and move along with my day. Commented Jul 2 at 14:39
  • Feel free to add it back. That was in the edit summary. Commented Jul 2 at 16:06

3 Answers

5

From Lua 5.4, §3.4.9 – Table Constructors, the syntax (in EBNF) is described as:

tableconstructor ::= ‘{’ [fieldlist] ‘}’
fieldlist ::= field {fieldsep field} [fieldsep]
field ::= ‘[’ exp ‘]’ ‘=’ exp | Name ‘=’ exp | exp
fieldsep ::= ‘,’ | ‘;’

The important bits of text from that section are (emphasis mine):

Each field of the form [exp1] = exp2 adds to the new table an entry with key exp1 and value exp2. A field of the form name = exp is equivalent to ["name"] = exp. Fields of the form exp are equivalent to [i] = exp, where i are consecutive integers starting with 1; fields in the other formats do not affect this counting.
[ ... ]
The order of the assignments in a constructor is undefined.

So only the plain syntax of { 'a', 'b', 'c' } influences the internal count of the array-like portion of the table. In the explicit syntax of { [1] = 'a', [2] = 'b', [3] = 'c' }, the keys, despite being densely incrementing integers in this case, are treated purely as associative, and the order in which they are added to the table is not defined by the specification.
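As a minimal sketch (the variable names a, b and c are only for illustration), the three construction styles can be compared through the operations that the language does define for sequences. However the entries are stored internally, the length operator and index-based access should agree for all three:

```lua
-- Three ways of building the same sequence (names are illustrative).
local a = {}
a[1] = "abc"
a[2] = "xyz"
a[3] = "foo"

local b = { "abc", "xyz", "foo" }

local c = { [1] = "abc", [2] = "xyz", [3] = "foo" }

-- Each table is a proper sequence, so its only border is 3 and
-- table.concat (which walks indices 1..#t) sees the same values.
for _, t in ipairs({ a, b, c }) do
  print(#t, table.concat(t, ","))
end
```

Only the enumeration order of next and pairs is left unspecified; the contents of the three tables are the same.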

The order of operations in both your A and B examples is fixed, and it happens to align with how your implementation of Lua behaves under your (mis)use of next. Those results are misleading.

The real problem comes from the use of next (table [, index]):

The order in which the indices are enumerated is not specified, even for numeric indices. (To traverse a table in numerical order, use a numerical for.)
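A small sketch of the numerical for that the manual recommends, applied to a table built with explicit keys. The names t and seen are made up for this example:

```lua
-- Table built with explicit integer keys, as in example "C".
local t = { [1] = "abc", [2] = "xyz", [3] = "foo" }

-- A numerical for visits the indices in order by construction;
-- it does not depend on how next or pairs enumerate the table.
local seen = {}
for i = 1, #t do
  seen[#seen + 1] = i .. "=" .. t[i]
end

print(table.concat(seen, " "))  -- 1=abc 2=xyz 3=foo
```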

The "well-known Lua pattern" you have described for determining if a table is an array is problematic (it falls apart depending on the implementation and on the presence of multiple borders). I would quickly argue that, by default, all Lua tables are arrays (although they may not be a sequence), and you can generally just treat them as such by iterating from index 1 until the first nil-value index (i.e., ipairs).

If you really need to be exact (reject sparse arrays or mixed key types), you can:

  • Check for an empty table with not next(t);

  • or check that an index of 1 contains a value with rawget(t, 1) ~= nil;

  • then, iterate through all key-value pairs (i.e., pairs) and confirm that all key types are numeric integers, and that the number of keys matches the border found by the length operator (#).
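The steps above could be sketched roughly as follows. The function name is_sequence is hypothetical, the code assumes Lua 5.3 or later (for math.type), and treating an empty table as "not an array" is a choice made for this sketch, not something Lua dictates. Besides counting keys, it also tracks the largest key, because a table with holes can have several borders and # may return any of them:

```lua
-- Hypothetical helper: true only for a non-empty table whose keys
-- are exactly the integers 1..n. Assumes Lua 5.3+ (math.type).
local function is_sequence(t)
  if type(t) ~= "table" then
    return false
  end

  local count, max = 0, 0
  for k in pairs(t) do
    -- Reject any key that is not a positive integer.
    if math.type(k) ~= "integer" or k < 1 then
      return false
    end
    count = count + 1
    if k > max then
      max = k
    end
  end

  -- count distinct positive integer keys with largest key == count
  -- means the keys are exactly 1..count (no holes).
  -- An empty table is rejected here by choice.
  return count > 0 and max == count
end

print(is_sequence({ "abc", "xyz", "foo" }))                      -- true
print(is_sequence({ [1] = "abc", [2] = "xyz", [3] = "foo" }))    -- true
print(is_sequence({ [1] = "abc", [3] = "foo" }))                 -- false
```

When this returns true the table has a single border, so #t equals the number of keys. The check is linear in the size of the table, which is usually acceptable when deciding once per value, for instance before serializing it.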

In summary, the behaviour of the table constructor syntax is not a bug in Lua.


Quick aside: it is hard not to level some criticism at your example, because it is difficult to understand how you would arrive at such a use-case. If you are trying to move data between languages, use an interchange format, like the previously mentioned JSON - there are pure Lua and C API implementations available.

If this is machine generated code, change the generation code.


1 Comment

Sure we use JSON as an interchange format. JSON supports arrays and objects as distinct first-class entities. So when encoding a Lua value into JSON, the code needs to determine if the thing in Lua is an array or an object, and this is the issue that falls out. It is possible to put the burden on the Lua developer to create the JSON manually before handing it off, but the decision was to avoid that in this particular application.
1

This is an interesting question that can be explained by how Lua handles tables internally:

In your example with a and t, the key difference lies in how the table is initialized. When you initialize a using syntax like a[1] = "abc", Lua treats the table as a sequential array and maintains the numeric keys in order internally.

But when you initialize t using explicit key-value pairs ({ [1] = "abc", [2] = "xyz", [3] = "foo"}), Lua will process the table as a generic hash table from the start, even though the keys are numeric and sequential in your code. The explicit key definitions override any assumption of sequential order, and Lua does not internally classify it as an array as a result.

This difference explains the results you're seeing in Lua pairs for-loops:

for k, v in pairs(a) do print(k) end
-- Output: 1 2 3 (keys are sequentially maintained)

for k, v in pairs(t) do print(k) end
-- Possible output: 3 1 2 (order is determined by internal hashing mechanics)

It's not really a bug but a "quirk" of how Lua treats tables internally.

1 Comment

This is also an implementation detail: This could absolutely be changed or "fixed" in a minor update one day if someone makes a case for it being a worthwhile optimisation. Tables having an array part is really just an optimisation in the first place, and a simpler implementation may well choose to just have all tables be hash maps.
0

I would like to extend the point 'In Lua, you could initialize an array like this:'

The table library already provides many functions for tables, and a table can have this library attached through the __index metamethod, much as strings have the string library.

-- Code Example D
-- Constructor
a = setmetatable({},  {__index = table})
-- Using metamethod __index (table functions)
a:insert("abc") -- 1
a:insert("xyz") -- 2
a:insert("foo") -- 3
-- Iterate over each entry (1-3) and return all as one single string with concat()
print(a:concat("\n"))
-- "\n" is the separator ;-)
