Lua 5.3: strange behavior with table index

51 views Asked by At

so I was playing around with tables a bit and found this strange behavior:

folder1 = {1, 2, 3}
table.insert(folder1, "test") --4
folder1[5] = "test2"
print("#folder1: " .. #folder1)  --yields 5

folder2 = {1, 2, 3}
table.insert(folder2, "test") --4
folder2[6] = "test2"
print("#folder2: " .. #folder2)  --yields 6

folder3 = {1, 2, 3}
table.insert(folder3, "test") --4
folder3[7] = "test2"
print("#folder3: " .. #folder3)  --yields 4

the first case is correct since it's an array with no gap. The third case is understandable since #table returns the last index before a nil index. But what happens in case two? I was expecting to receive the index 4 here.

Also it only seems to happen with a table.insert before. If I remove the insert command, the output is as expected again:

folder1 = {1, 2, 3}
folder1[4] = "test2"
print("#folder1: " .. #folder1)  --yields 4

folder2 = {1, 2, 3}
folder2[5] = "test2"
print("#folder2: " .. #folder2)  --yields 3

folder3 = {1, 2, 3}
folder3[6] = "test2"
print("#folder3: " .. #folder3)  --yields 3

Can someone explain this to me?

I was reading into the Lua documentation. the function rawlen(table) was looking like the function that would return the true size of the table, but I figured out, that rawlen also just returns the last index before a non-nil index.

1

There are 1 answers

0
Luatic On

Let's unpack what's happening here.

First example: After the operations, folder1 = {1, 2, 3, "test", 5}.

This is a proper sequence, all entries from key 1 to 5 are non-nil, so the length is 5, just as expected.

Second example: After the operations, folder2 = {1, 2, 3, "test", nil, "test2"}. This is not a proper "sequence", it has holes. Your explanation is

The third case is understandable since #table returns the last index before a nil index.

but this definition is wrong; #table returns any index i such that t[i] ~= nil and t[i+1] == nil. It is not guaranteed to be the last or the first index, and in practice it often won't be for performance reasons 1.

In the second example, both 4 and 6 are valid values for #folder2. You are getting 6, so everything is fine here.

Third example: After the operations, folder3 = {1, 2, 3, "test", nil, nil, "test2"}. Both 4 and 7 are valid values for #folder3. The larger gap leads PUC Lua to evaluate #folder3 to 7, which has to do with the implementation details. 1

Your second set of examples can be explained analogeously:

  • #{1, 2, 3, "test2"} is 4, as expected
  • #{1, 2, 3, nil, "test2"} may be either 3 or 5; you get 3, which is fine
  • #{1, 2, 3, nil, nil, "test2"} may be either 3 or 6; you get ``, which is fine

1 Lua tables are implemented using an array and a hash part. The "array" part entries won't always land in the array part, but Lua wants {[3] = 3, [2] = 2, [1] = 1} and {1, 2, 3} to be the same from the programmer's point of view. Thus Lua needs to define #t for hash tables; this means it has to find the length somehow, it can't just rely on an internal length field. Even for arrays this won't work, since the programmer may set entries to nil in the middle of the array, and later on, at the end of the array.

If Lua had to return the first or last boundary index, it would either have to do additional bookkeeping with significant overhead, or it would have to use a linear search. Both aren't viable options.

What Lua does instead is quite clever: It uses a binary search to find any boundary: If t[i] is nil, you know that there will be a boundary j < i such that t[j + 1] is a nil. If t[i] is not nil, you know that there will be a boundary j >= i such that t[j + 1] is nil - you can halve the search space for the boundary in each step. To find an initial upper bound on i, you can just check powers of two i = 2^n until you hit a nil. Furthermore, you can use the max safe integer as an upper bound.

Here is a reimplementation of the table length operator in pure Lua I did a while ago, with some tests included to give you an idea:

local rawget = rawget
local maxint = math.maxinteger or 2^53
function len(tbl)
    if tbl[1] == nil then
        return 0
    end
    if tbl[maxint] then
        return maxint
    end
    -- Quickly ramp numbers up. Could start with high = maxint.
    local low, high = 1, 1
    while rawget(tbl, high) do
        low, high = high, high * 2
        if low > high then -- overflow
            high = maxint
            break
        end
    end
    while low < high do
        local mid = math.ceil((low + high) / 2)
        if tbl[mid] == nil then
            high = mid - 1
        else
            low = mid
        end
    end
    return low
end

assert(len{} == 0)
local t = {}; for i = 1, 1e6 do t[i] = i end; assert(len(t) == 1e6)
print(len{1, 2, 3, 4, 5, 6})
print(len{1, 2, 3, nil, 5, 6})
t = {}
for i = 0, 500 do
    t[2^i] = i
end
print(len(t))

Lua can get tighter upper bounds using less time since it knows the capacity of the hash and array part and can use that as an upper bound; it won't have to do the "try powers of two" trick.