Paul Waring

Freelance C Developer based in Manchester, UK

Split string by separator

A frequent task in text processing, especially in Advent of Code, is splitting a string into substrings based on a separator. For example, in 2015 Day 2, we need to extract the length, width and height from an input string where the dimensions are separated by x, e.g. 2x3x4.

Most languages have this functionality as part of their standard library, e.g. PHP has a function called explode and Go has strings.Split.

What about the C standard library? The closest is strtok, which has the following signature:

char *strtok(char *str, const char *delim);

From the man page:

The strtok() function breaks a string into a sequence of zero or more nonempty tokens. On the first call to strtok(), the string to be parsed should be specified in str. In each subsequent call that should parse the same string, str must be NULL.
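To make the calling pattern described in the man page concrete, here is a minimal sketch that splits a string such as 2x3x4 into integers. The helper name split_ints and its parameters are made up for illustration and are not part of the standard library; only strtok and strtol are.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (an assumption for this example, not a standard
 * function): splits str on any of the characters in delim and stores up
 * to max integer fields in out. Returns the number of fields stored.
 *
 * strtok() overwrites separators in str with '\0', so str must be a
 * writable buffer and never a string literal.
 */
size_t split_ints(char *str, const char *delim, int *out, size_t max)
{
    size_t count = 0;

    /* The first call passes the string to parse. */
    char *token = strtok(str, delim);

    while (token != NULL && count < max) {
        out[count++] = (int) strtol(token, NULL, 10);

        /* Subsequent calls pass NULL to continue with the same string. */
        token = strtok(NULL, delim);
    }

    return count;
}
```

Calling split_ints on a writable copy of 2x3x4 with "x" as the delimiter should store 2, 3 and 4. Two things are worth noticing: the input buffer is modified along the way, and because strtok only returns nonempty tokens, an input such as 2xx4 yields two fields rather than three.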

However, this function has a number of issues:

Third-party libraries do offer this functionality, but for the purposes of challenges such as Advent of Code we are restricting ourselves to the standard library. If we didn't have this restriction, we could use g_strsplit from GLib, which supports UTF-8 and works in a similar way to explode in PHP.

For the purposes of this article, we're going to simplify the problem with the following constraints: