
How to split a string in Lua

Simple case

There are numerous examples online that demonstrate how to split a string in Lua. Most of them handle one specific case and do not cover general use or edge cases.

Here is a typical one:

---A very dumb split string function.
---@param str string
---@param sep? string
---@return string[]
local function split(str, sep)
    sep = sep or "%s"
    local t = {}
    for s in string.gmatch(str, "([^" .. sep .. "]+)") do
        t[#t + 1] = s
    end
    return t
end

In most cases this is probably enough, and this version has the advantage of being very fast. But if you need a more general solution, one that supports a more complex separator pattern or takes UTF-8 encoding into account, the example above will not work.
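
To make these limits concrete, here is a small sketch that runs the simple function above on a few awkward inputs. The name simple_split is only an illustrative rename, and the inputs are made up for the demonstration.

```lua
-- The simple version from above, renamed (hypothetical name) for clarity.
local function simple_split(str, sep)
    sep = sep or "%s"
    local t = {}
    for s in string.gmatch(str, "([^" .. sep .. "]+)") do
        t[#t + 1] = s
    end
    return t
end

-- Empty fields between adjacent separators are silently dropped.
print(table.concat(simple_split("a,,b", ","), "|"))    -- a|b

-- A multi-character separator is treated as a set of single characters,
-- so "ab" splits on every "a" and every "b".
print(table.concat(simple_split("xabyaz", "ab"), "|")) -- x|y|z

-- A string made only of separators yields an empty table.
print(#simple_split("===", "="))                       -- 0
```

The separator is pasted into a character class, which is why it cannot express a multi-character delimiter or preserve empty fields.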

Common solution

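I would like to introduce a function that covers the edge cases exercised in the tests below. It is not as performant as I would like, and it relies on the utf8 library, so it needs Lua 5.3 or later. The result closely follows JavaScript’s String.prototype.split(). One difference: an empty string split by an empty separator yields a table with a single empty string here, whereas JavaScript returns an empty array.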

---Generic split function for splitting the string, taking into account
---UTF-8 encoding.
---@param str string Input string.
---@param sep? string Separator pattern or string (default empty string).
---@param n? number Maximum number of substrings to return: if less than zero, all substrings are returned.
---If 0, an empty table is returned.
---@param offset? number UTF-8 bytes offset (default 1)
---@param plain? boolean Turns off the pattern matching facilities.
---@return string[]
local function split(str, sep, n, offset, plain)
    sep = sep or ""
    offset = offset or 1
    n = n or -1

    ---Result value
    ---@type string[]
    local t = {}

    if n == 0 then
        return t
    end

    local len = utf8.len(str)
    -- If empty string, then return table with single element containing empty string.
    if len == 0 then
        t[#t + 1] = ""
        return t
    end

    local i = 1
    local start = 1
    while true do
        local sepBegin, sepEnd = str:find(sep, start, plain)
        if not sepBegin then
            t[#t + 1] = str:sub(start)
            break
        elseif sepEnd < sepBegin then
            -- If empty separator, then explode string considering UTF8
            t[#t + 1] = str:sub(
                    utf8.offset(str, start),
                    utf8.offset(str, sepBegin + offset) - offset
            )
            if sepBegin < len then
                start = sepBegin + 1
            else
                break
            end
        else
            if sepBegin > start then
                t[#t + 1] = str:sub(start, sepBegin - offset)
            else
                t[#t + 1] = ""
            end
            start = sepEnd + offset
        end
        if n == i then
            break
        end
        i = i + 1
    end

    return t
end
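
The function above leans on three standard-library behaviors that are easy to miss: string.find with an empty pattern, the plain flag, and the difference between byte and character positions. The snippet below is a minimal sketch of each in isolation. The helper char_at is a hypothetical name introduced only for this illustration, and the source file is assumed to be saved as UTF-8.

```lua
-- Requires Lua 5.3 or later for the utf8 library.

-- An empty pattern matches at the start position, and the returned end
-- index is one less than the begin index. The split function uses this
-- (sepEnd < sepBegin) to detect an empty separator.
local b, e = ("abc"):find("", 2)
print(b, e) -- 2 1

-- With plain = true, "." is a literal dot instead of "any character".
print(("a.b"):find(".", 1, true)) -- 2 2
print(("a.b"):find("."))          -- 1 1

-- Byte length and character count differ for non-ASCII text.
local s = "абв"
print(#s, utf8.len(s)) -- 6 3

-- Extract the i-th character by converting character positions to
-- byte positions with utf8.offset.
local function char_at(str, i)
    return str:sub(utf8.offset(str, i), utf8.offset(str, i + 1) - 1)
end
print(char_at(s, 2)) -- б
print(char_at(s, 3)) -- в
```

Note that utf8.len returns a fail value for input that is not valid UTF-8, so callers may want to validate the string first.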

Testing the common solution

The tests are written with the Laura testing library.

local laura = require("laura")
local expect = laura.expect
local describe = laura.describe
local it = laura.it
local split = require("./split")

describe("function split()", function()
    local testPairs = {
        { "", "", { "" } },
        { "a,b,c", "def", { "a,b,c" } },
        { "a,b,c", ",", { "a", "b", "c" } },
        { " xyz ", "", { " ", "x", "y", "z", " " } },
        { "abc def", "", { "a", "b", "c", " ", "d", "e", "f" } },
        { "абв где", "", { "а", "б", "в", " ", "г", "д", "е" } },
        { " a b c", " ", { "", "a", "b", "c" } },
        { "Hello Mike, Hello Jane", "Hello", { "", " Mike, ", " Jane" } },
        {
            "a man a plan a canal panama",
            "a ",
            { "", "man ", "plan ", "canal panama" },
        },
        { "Миру - Мир!", "Мир", { "", "у - ", "!" } },
        {
            "月は明るく輝いているa",
            "",
            {
                "月",
                "は",
                "明",
                "る",
                "く",
                "輝",
                "い",
                "て",
                "い",
                "る",
                "a",
            },
        },
        {
            "hello,world.and.dots",
            "[.,]",
            { "hello", "world", "and", "dots" },
        },
        {
            "/home/user/config",
            "[\\/]",
            { "", "home", "user", "config" },
        },
        { "===", "=", { "", "", "", "" } },
    }

    for _, pair in ipairs(testPairs) do
        local name = string.format(
            'should split "%s" with separator "%s"',
            pair[1],
            pair[2]
        )
        it(name, function()
            expect(split(pair[1], pair[2])).toDeepEqual(pair[3])
        end)
    end

    local testPairsN = {
        { "hello world", " ", 0, {} },
        { "hello world", " ", -1, { "hello", "world" } },
        { "hello world", " ", -999, { "hello", "world" } },
        { "a,b,c,d", ",", 2, { "a", "b" } },
        { "a,b,c,d", ",", 1, { "a" } },
        { "a,b,c,d", ",", 49, { "a", "b", "c", "d" } },
    }

    for _, pair in ipairs(testPairsN) do
        local name = string.format(
            'should split "%s" %d times with separator "%s"',
            pair[1],
            pair[3],
            pair[2]
        )
        it(name, function()
            expect(split(pair[1], pair[2], pair[3])).toDeepEqual(pair[4])
        end)
    end
end)

--[[
SUMMARY
20 of 20 passing
0 failing
0 skipping

time: ~13ms / mem: 302.93KB @ 2025-02-18 15:41:14
pass
--]]

Feedback

For feedback, please check the contacts section. When you write, please say who you are and how you found this page, because spam sometimes gets out of hand. Thank you in advance for your understanding.
