Split string by separator
One frequent part of text processing, especially in Advent of Code, is to split a string into substrings based on a separator. For example, in 2015 Day 2, we need to find the length, width and height from an input string where the dimensions are separated by x, e.g. 2x3x4.
Most languages have this functionality as part of their standard library, e.g. PHP has a function called explode and Go has strings.Split.
What about the C standard library? The closest is strtok, which has the following signature:
char *strtok(char *str, const char *delim);
From the man page:
The strtok() function breaks a string into a sequence of zero or more nonempty tokens. On the first call to strtok(), the string to be parsed should be specified in str. In each subsequent call that should parse the same string, str must be NULL.
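As a rough illustration of that calling pattern, here is a minimal sketch that parses the 2x3x4 example. The helper name parse_dimensions and its return convention are assumptions made for this example only; strtok and atoi are the only library calls used.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper for illustration: parses "LxWxH" into dims[0..2].
   Returns the number of dimensions found (at most 3).
   The input buffer must be writable, because strtok writes into it. */
int parse_dimensions(char *input, int dims[3])
{
    int count = 0;

    /* The first call passes the string; every later call passes NULL. */
    for (char *token = strtok(input, "x");
         token != NULL && count < 3;
         token = strtok(NULL, "x")) {
        dims[count++] = atoi(token);
    }
    return count;
}
```

Called with a writable array such as char line[] = "2x3x4", this fills dims with 2, 3 and 4 and returns 3. The loop relies on strtok remembering its position in the string between calls.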
However, this function has a number of issues:
- It returns NULL, instead of the empty string, if the string is empty or consists entirely of delimiters.
- It modifies its first argument, i.e. the string to be split, so you need to operate on a copy of the string.
- It cannot be used on string literals or other constant strings, which makes testing harder.
- It is not thread-safe, because it keeps its position in hidden static state between calls (strtok_r is reentrant, but it is defined by POSIX rather than the C standard library).
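The first three issues can be seen in a small sketch. The helper count_tokens is hypothetical and exists only for this illustration: it has to duplicate its input before calling strtok, and it reports zero tokens for an empty or all-delimiter string because strtok returns NULL straight away.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper for illustration: counts the tokens strtok finds.
   It works on a private copy so that callers can pass string literals. */
size_t count_tokens(const char *text, const char *delim)
{
    size_t len = strlen(text);
    char *copy = malloc(len + 1);
    if (copy == NULL) {
        return 0; /* allocation failed; report no tokens */
    }
    memcpy(copy, text, len + 1);

    size_t count = 0;
    /* strtok overwrites each delimiter it consumes with '\0'. */
    for (char *token = strtok(copy, delim);
         token != NULL;
         token = strtok(NULL, delim)) {
        count++;
    }

    free(copy);
    return count;
}
```

With this helper, "2x3x4" yields 3 tokens, while both "" and "xxx" yield 0. Without the copy, calling strtok directly on a writable buffer holding 2x3x4 replaces the first x with a terminator, so the buffer afterwards reads as just "2". The helper still shares strtok's hidden state, so it is no safer than strtok itself when called from several threads.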
Third-party libraries do offer this functionality, but for the purposes of challenges such as Advent of Code we are restricting ourselves to the standard library. If we didn't have this restriction, we could use g_strsplit from GLib, which supports UTF-8 and works in a similar way to explode in PHP.
For the purposes of this article, we're going to simplify the problem with the following constraints:
- The input will be an array of characters all within the ASCII character set.
- Only a single character can be used as the separator.
- Only a single substring will be returned.
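One possible reading of these constraints, sketched here purely as an assumption, is a function that copies out a single field chosen by position. The name split_field, its parameters and its return codes are all hypothetical and are not taken from this article; only strchr, strlen and memcpy from the standard library are used.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical function for illustration only.
   Copies the field at 0-based position `index` of `input`, split on the
   single character `sep`, into `out` (capacity `out_size` bytes).
   Returns 0 on success, or -1 if the field does not exist, does not fit,
   or `sep` is the terminator. The input string is never modified. */
int split_field(const char *input, char sep, size_t index,
                char *out, size_t out_size)
{
    if (sep == '\0') {
        return -1; /* the terminator cannot act as a separator */
    }

    const char *start = input;

    /* Skip `index` separators to reach the start of the wanted field. */
    for (size_t i = 0; i < index; i++) {
        const char *next = strchr(start, sep);
        if (next == NULL) {
            return -1; /* fewer fields than requested */
        }
        start = next + 1;
    }

    /* The field ends at the next separator or at the end of the string. */
    const char *end = strchr(start, sep);
    size_t len = (end != NULL) ? (size_t)(end - start) : strlen(start);

    if (len + 1 > out_size) {
        return -1; /* no room for the field plus its terminator */
    }
    memcpy(out, start, len);
    out[len] = '\0';
    return 0;
}
```

Under these assumptions, split_field("2x3x4", 'x', 1, out, sizeof out) stores "3" in out. Because the input is const and untouched, the function can be tested directly against string literals, and an empty field such as the middle one in "2xx4" comes back as an empty string rather than NULL.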