How to split a string in Lua
Simple case
There are numerous examples available online that demonstrate how to split a string in Lua. Most of them are very specific and cover neither the general case nor the edge cases.
Like this one:
---A very dumb split string function.
---@param str string
---@param sep? string
---@return string[]
local function split(str, sep)
    sep = sep or "%s"
    local t = {}
    for s in string.gmatch(str, "([^" .. sep .. "]+)") do
        t[#t + 1] = s
    end
    return t
end
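In most cases this is probably enough, and this version has the advantage of being very fast. However, it treats sep as a set of single characters rather than as a sequence, and it silently drops empty fields. If you need a more general solution with a multi-character separator or a more complex pattern, possibly taking UTF-8 encoding into account, the example above will not work.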
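To make the limits of the simple version concrete, here is a small sketch. It copies the function above under the name simple_split (a name chosen only for this illustration, so that the snippet runs on its own) and feeds it three inputs where the result may be surprising.

```lua
-- Copy of the simple splitter from above; the name simple_split is
-- used only in this illustration.
local function simple_split(str, sep)
    sep = sep or "%s"
    local t = {}
    for s in string.gmatch(str, "([^" .. sep .. "]+)") do
        t[#t + 1] = s
    end
    return t
end

-- 1. Empty fields are dropped, because "+" needs at least one character.
print(table.concat(simple_split("a,,b", ","), "|")) --> a|b

-- 2. A multi-character separator ends up inside a character class, so
--    "ab" splits on every "a" and every "b", not on the sequence "ab".
print(table.concat(simple_split("1ab2a3", "ab"), "|")) --> 1|2|3

-- 3. A leading separator does not produce a leading empty string.
print(#simple_split(",a", ",")) --> 1
```

For comparison, JavaScript's "a,,b".split(",") keeps the empty field in the middle and returns three elements.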
Common solution
I would like to introduce a function that covers these edge cases. Of course, it is not as performant as I would like it to be. It uses the utf8 library, so it needs Lua 5.3 or later. The result is meant to match that of JavaScript's String.prototype.split().
---Generic split function for splitting the string, taking into account
---UTF-8 encoding.
---@param str string Input string.
---@param sep? string Separator pattern or string (default empty string).
---@param n? number Number of splits: if less than zero, all substrings are returned.
---If 0, an empty table is returned.
---@param offset? number UTF-8 bytes offset (default 1)
---@param plain? boolean Turns off the pattern matching facilities.
---@return string[]
local function split(str, sep, n, offset, plain)
    sep = sep or ""
    offset = offset or 1
    n = n or -1
    ---Result value
    ---@type string[]
    local t = {}
    if n == 0 then
        return t
    end
    local len = utf8.len(str)
    -- If empty string, then return table with single element containing empty string.
    if len == 0 then
        t[#t + 1] = ""
        return t
    end
    local i = 1
    local start = 1
    while true do
        local sepBegin, sepEnd = str:find(sep, start, plain)
        if not sepBegin then
            -- No more separators: the rest of the string is the last element.
            t[#t + 1] = str:sub(start)
            break
        elseif sepEnd < sepBegin then
            -- If empty separator, then explode string considering UTF8
            t[#t + 1] = str:sub(
                utf8.offset(str, start),
                utf8.offset(str, sepBegin + offset) - offset
            )
            if sepBegin < len then
                start = sepBegin + 1
            else
                break
            end
        else
            if sepBegin > start then
                t[#t + 1] = str:sub(start, sepBegin - offset)
            else
                t[#t + 1] = ""
            end
            start = sepEnd + offset
        end
        if n == i then
            break
        end
        i = i + 1
    end
    return t
end

-- Export the function so that the tests can load it with require().
return split
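The function above leans on a few standard-library behaviors that are easy to miss when reading it. The sketch below shows them in isolation; it is not part of the split function itself, and it assumes Lua 5.3 or later and a source file saved as UTF-8. The variable names are chosen only for this illustration.

```lua
-- 1. An empty pattern matches at the start position and reports an end
--    index that is one lower. This is what "sepEnd < sepBegin" detects.
local b, e = ("abc"):find("", 2)
assert(b == 2 and e == 1)

-- 2. The fourth argument of string.find (plain) turns patterns off.
--    As a pattern, "." matches any character; as plain text, only a dot.
local patternPos = ("a.b"):find(".")
local plainPos = ("a.b"):find(".", 1, true)
print(patternPos, plainPos) -- prints 1 and 2

-- 3. The # operator and string.sub count bytes; utf8.len counts
--    characters. Each Cyrillic letter here takes two bytes.
local s = "абв"
print(#s, utf8.len(s)) -- prints 6 and 3

-- utf8.offset converts a character index into a byte index, so the
-- second character can be cut out with string.sub.
local second = s:sub(utf8.offset(s, 2), utf8.offset(s, 3) - 1)
print(second) --> б
```

This is why the empty-separator branch converts positions with utf8.offset before calling str:sub, while the other branches can work on the byte positions returned by str:find directly.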
Testing the common solution
The tests are written with the Laura testing library.
local laura = require("laura")
local expect = laura.expect
local describe = laura.describe
local it = laura.it
local split = require("./split")

describe("function split()", function()
    -- Each entry: { input, separator, expected result }
    local testPairs = {
        { "", "", { "" } },
        { "a,b,c", "def", { "a,b,c" } },
        { "a,b,c", ",", { "a", "b", "c" } },
        { " xyz ", "", { " ", "x", "y", "z", " " } },
        { "abc def", "", { "a", "b", "c", " ", "d", "e", "f" } },
        { "абв где", "", { "а", "б", "в", " ", "г", "д", "е" } },
        { " a b c", " ", { "", "a", "b", "c" } },
        { "Hello Mike, Hello Jane", "Hello", { "", " Mike, ", " Jane" } },
        {
            "a man a plan a canal panama",
            "a ",
            { "", "man ", "plan ", "canal panama" },
        },
        { "Миру - Мир!", "Мир", { "", "у - ", "!" } },
        {
            "月は明るく輝いているa",
            "",
            {
                "月",
                "は",
                "明",
                "る",
                "く",
                "輝",
                "い",
                "て",
                "い",
                "る",
                "a",
            },
        },
        {
            "hello,world.and.dots",
            "[.,]",
            { "hello", "world", "and", "dots" },
        },
        {
            "/home/user/config",
            "[\\/]",
            { "", "home", "user", "config" },
        },
        { "===", "=", { "", "", "", "" } },
    }
    for _, pair in ipairs(testPairs) do
        local name = string.format(
            'should split "%s" with separator "%s"',
            pair[1],
            pair[2]
        )
        it(name, function()
            expect(split(pair[1], pair[2])).toDeepEqual(pair[3])
        end)
    end

    -- Each entry: { input, separator, n, expected result }
    local testPairsN = {
        { "hello world", " ", 0, {} },
        { "hello world", " ", -1, { "hello", "world" } },
        { "hello world", " ", -999, { "hello", "world" } },
        { "a,b,c,d", ",", 2, { "a", "b" } },
        { "a,b,c,d", ",", 1, { "a" } },
        { "a,b,c,d", ",", 49, { "a", "b", "c", "d" } },
    }
    for _, pair in ipairs(testPairsN) do
        local name = string.format(
            'should split "%s" %d times with separator "%s"',
            pair[1],
            pair[3],
            pair[2]
        )
        it(name, function()
            expect(split(pair[1], pair[2], pair[3])).toDeepEqual(pair[4])
        end)
    end
end)
--[[
SUMMARY
20 of 20 passing
0 failing
0 skipping
time: ~13ms / mem: 302.93KB @ 2025-02-18 15:41:14
pass
--]]
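The tests above compare tables with toDeepEqual because Lua's == operator compares tables by identity, not by content. If a testing library is not at hand, a rough equivalent for flat lists of strings can be sketched with plain Lua. The helper same_list below is hypothetical and is not part of Laura; it only handles array-like tables without nesting.

```lua
---Hypothetical helper: true when two array-like tables hold the same
---values in the same order. Nested tables are not supported.
---@param a any[]
---@param b any[]
---@return boolean
local function same_list(a, b)
    if #a ~= #b then
        return false
    end
    for i = 1, #a do
        if a[i] ~= b[i] then
            return false
        end
    end
    return true
end

-- Two distinct tables are never equal under ==, even with equal content.
print({ "a" } == { "a" })                    --> false
print(same_list({ "a", "b" }, { "a", "b" })) --> true
print(same_list({ "a", "b" }, { "a" }))      --> false

-- With the split module from this article available, a check could read:
-- local split = require("./split")
-- assert(same_list(split("a,b,c", ","), { "a", "b", "c" }))
```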