Chicken Soup for the Grug-pilled C Programmer Soul

2025/08/29


c is my daily driver

Almost every day, I write C. I am not an embedded developer. I exclusively use regular computers (a Linux desktop and a MacBook Air) and write mostly regular code (small tools, throwaway scripts, GPU experiments, video games).

Unless I have a good reason otherwise, I do it exclusively in C. I legitimately find it to be the best tool for most things I do.

Out of the box, C is a (mostly) well designed language with perhaps the worst standard library and tooling of any commonly used language. It’s truly bad. But those things are not C; they’re just the environment. And even though some of C’s warts are baked in too deeply, the vast majority of them are not.

In other words, C is fixable. It is a beautiful, modern language resting primordial and soupish in an extremely thick cocoon. I’m going to show you how to replace things which we may not think of as replaceable to make C a good language for any program, and we’re going to start with strings.

shill

If you read this far, please wishlist Deep Copy on Steam. It is a handpainted point and click; a mixture between Disco Elysium, Neal Stephenson, and Philip K. Dick. And it’s written in C.

string.h is a special kind of hell

Of all of C’s shortcomings and sins, the worst is the standard library. Fundamentally, computers haven’t changed that much since 1970, but programming languages have changed a great deal, and in this sense the C standard library is truly irrelevant. And within this pasture of footguns and insecure-by-default and hidden mutable state roams the most insidious beast of them all: string.h.

Fundamentally, the issue with C strings is that they are null terminated. I won’t pretend to know why this decision was made; I’ll guarantee that whoever made it was highly intelligent and operating under constraints which would seem alien to us today. But at some point since then, the pendulum crossed some threshold, and then kept swinging for thirty or so more years, and has left us now with a Very Bad Thing.

Null terminated strings mean you cannot:

Plus, of course, the unfathomable number of bugs and security issues that arise from a missing null terminator. Step one to modernizing C is to completely ditch null terminated strings in favor of the humble sp_str_t.

sp_str_t will set you free

typedef struct {
    const c8* data;
    u32 len;
} sp_str_t;
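As a rough sketch of what the struct buys you, here is a view over an existing C string and a way to print a string that has no null terminator. The typedefs for c8 and u32 are my assumption (the post doesn't show them), and demo_str_view and demo_str_print are hypothetical helpers, not part of sp.h.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef char     c8;   /* assumption: c8 is a plain char */
typedef uint32_t u32;  /* assumption: u32 is a 32-bit unsigned integer */

typedef struct {
    const c8* data;
    u32 len;
} sp_str_t;

/* Hypothetical helper: point at an existing C string without copying it. */
static sp_str_t demo_str_view(const c8* cstr) {
    assert(cstr != NULL);
    sp_str_t str = { .data = cstr, .len = (u32)strlen(cstr) };
    return str;
}

/* Hypothetical helper: print exactly str.len bytes, then a newline.
   "%.*s" reads the length as an int argument placed before the pointer. */
static int demo_str_print(sp_str_t str) {
    return printf("%.*s\n", (int)str.len, str.data);
}
```

Shrinking len is enough to get a prefix of the string; nothing is copied and nothing is written into the original buffer.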

The basics of sp_str_t:

Look at how easy it is to manipulate strings once you have this building block:

sp_str_sub()

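sp_str_t sp_str_sub(sp_str_t str, u32 index, u32 len) {
  // Check bounds before building the view; written so index + len can't wrap around
  SP_ASSERT(index <= str.len && len <= str.len - index);
  sp_str_t substr = {
    .len = len,
    .data = str.data + index
  };
  return substr;
}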
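To make the "no copy" part concrete, here is a minimal sketch that splits a key=value line into two views using a substring function like the one above. The typedefs, the SP_ASSERT definition, and demo_split_once are assumptions for illustration; the bounds check is written so that index + len can't overflow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef char     c8;   /* assumption */
typedef uint32_t u32;  /* assumption */
typedef struct { const c8* data; u32 len; } sp_str_t;

#define SP_ASSERT(cond) assert(cond) /* assumption: SP_ASSERT behaves like assert */

static sp_str_t sp_str_sub(sp_str_t str, u32 index, u32 len) {
    SP_ASSERT(index <= str.len && len <= str.len - index);
    sp_str_t substr = { .data = str.data + index, .len = len };
    return substr;
}

/* Hypothetical helper: split at the first occurrence of sep.
   Both halves point into the original buffer; nothing is allocated. */
static bool demo_split_once(sp_str_t str, c8 sep, sp_str_t* left, sp_str_t* right) {
    for (u32 i = 0; i < str.len; i++) {
        if (str.data[i] == sep) {
            *left  = sp_str_sub(str, 0, i);
            *right = sp_str_sub(str, i + 1, str.len - i - 1);
            return true;
        }
    }
    return false;
}
```

With a null terminated string you would have to either copy the key out or write a '\0' over the separator. Here both halves are just a pointer and a length.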

sp_str_trim_right()

sp_str_t sp_str_trim_right(sp_str_t str) {
  while (str.len) {
    c8 c = sp_str_back(str);

    switch (c) {
      case ' ':
      case '\t':
      case '\r':
      case '\n': {
        str.len--;
        break;
      }
      default: {
        return str;
      }
    }
  }

  return str;
}

sp_str_equal()

bool sp_str_equal(sp_str_t a, sp_str_t b) {
  if (a.len != b.len) return false;

  return sp_os_is_memory_equal(a.data, b.data, a.len);
}
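The snippets above lean on two helpers the post doesn't show, sp_str_back() and sp_os_is_memory_equal(). Here is a guess at what they might look like, under demo_ names so nobody mistakes them for the real sp.h code, plus trim and equal rebuilt on top of them so the whole thing compiles on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef char     c8;   /* assumption */
typedef uint32_t u32;  /* assumption */
typedef struct { const c8* data; u32 len; } sp_str_t;

/* Stand-in for sp_str_back(): last byte of a non-empty string. */
static c8 demo_str_back(sp_str_t str) {
    assert(str.len > 0);
    return str.data[str.len - 1];
}

/* Stand-in for sp_os_is_memory_equal(): a thin wrapper over memcmp. */
static bool demo_memory_equal(const void* a, const void* b, size_t len) {
    return len == 0 || memcmp(a, b, len) == 0;
}

static bool demo_str_equal(sp_str_t a, sp_str_t b) {
    if (a.len != b.len) return false;
    return demo_memory_equal(a.data, b.data, a.len);
}

/* Same behavior as the switch-based version: drop trailing whitespace
   by shrinking len, never touching the bytes themselves. */
static sp_str_t demo_str_trim_right(sp_str_t str) {
    while (str.len) {
        c8 c = demo_str_back(str);
        if (c != ' ' && c != '\t' && c != '\r' && c != '\n') break;
        str.len--;
    }
    return str;
}
```

Because every function takes and returns sp_str_t by value, they compose like expressions: demo_str_equal(demo_str_trim_right(a), b).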

sp_str_builder_t is the secret sauce

For more complicated string operations, you should use a string builder API. Any time you need to join strings, pad strings, serialize data to strings, you should use the string builder. A string builder is nothing more than a buffer with a size and a capacity. Here’s the basic API:

SP_API void     sp_str_builder_append(sp_str_builder_t* builder, sp_str_t str);
SP_API void     sp_str_builder_append_c8(sp_str_builder_t* builder, c8 c);
SP_API void     sp_str_builder_append_fmt(sp_str_builder_t* builder, sp_str_t fmt, ...);
SP_API void     sp_str_builder_new_line(sp_str_builder_t* builder);
SP_API sp_str_t sp_str_builder_write(sp_str_builder_t* builder);
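To show how little is behind "a buffer with a size and a capacity", here is a minimal sketch of a builder. It is not the sp.h implementation: the demo_ names are hypothetical, it uses realloc() where the real library would presumably go through its own allocator, and having write reset the builder is my assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef char     c8;   /* assumption */
typedef uint32_t u32;  /* assumption */
typedef struct { const c8* data; u32 len; } sp_str_t;

typedef struct {
    c8* data;  /* mutable buffer */
    u32 len;   /* bytes used */
    u32 cap;   /* bytes allocated */
} demo_builder_t;

/* Grow the buffer (doubling) until `extra` more bytes fit.
   Sketch only: ignores u32 overflow for very large strings. */
static void demo_builder_reserve(demo_builder_t* b, u32 extra) {
    if (b->len + extra <= b->cap) return;
    u32 cap = b->cap ? b->cap : 16;
    while (cap < b->len + extra) cap *= 2;
    c8* data = realloc(b->data, cap);
    assert(data != NULL);
    b->data = data;
    b->cap = cap;
}

static void demo_builder_append(demo_builder_t* b, sp_str_t str) {
    demo_builder_reserve(b, str.len);
    if (str.len) memcpy(b->data + b->len, str.data, str.len);
    b->len += str.len;
}

static void demo_builder_append_c8(demo_builder_t* b, c8 c) {
    demo_builder_reserve(b, 1);
    b->data[b->len++] = c;
}

/* Hand the buffer out as an immutable string and reset the builder.
   In this sketch the caller owns the memory and frees it. */
static sp_str_t demo_builder_write(demo_builder_t* b) {
    sp_str_t out = { .data = b->data, .len = b->len };
    *b = (demo_builder_t){0};
    return out;
}
```

A zero-initialized demo_builder_t is a valid empty builder, which is the same shape as the SP_ZERO_INITIALIZE() pattern used in the post's examples.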

Then, you can implement lots of useful stuff very easily:

sp_str_concat()

sp_str_t sp_str_concat(sp_str_t a, sp_str_t b) {
  return sp_format("{}{}", SP_FMT_STR(a), SP_FMT_STR(b));
}

sp_str_join()

sp_str_t sp_str_join(sp_str_t a, sp_str_t b, sp_str_t join) {
  return sp_format("{}{}{}", SP_FMT_STR(a), SP_FMT_STR(join), SP_FMT_STR(b));
}

sp_str_replace_c8()

sp_str_t sp_str_replace_c8(sp_str_t str, c8 from, c8 to) {
  sp_str_builder_t builder = SP_ZERO_INITIALIZE();

  for (u32 i = 0; i < str.len; i++) {
    c8 c = str.data[i];
    if (c == from) {
      sp_str_builder_append_c8(&builder, to);
    } else {
      sp_str_builder_append_c8(&builder, c);
    }
  }

  return sp_str_builder_write(&builder);
}

The backbone of complex string building is format strings. I prefer to use my own style of format strings, sp_format [2], which trades the compiler support of printf-style format strings for better syntax, support for colors, and easy custom formatters. But sprintf works just fine.
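If you go the printf route, a formatter that returns a length-carrying string is only a few lines. This is a sketch using the standard two-pass vsnprintf() pattern; demo_format is a hypothetical name, the typedefs are assumed, and it uses malloc() rather than whatever allocator the real library would use.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

typedef char     c8;   /* assumption */
typedef uint32_t u32;  /* assumption */
typedef struct { const c8* data; u32 len; } sp_str_t;

/* Hypothetical helper: printf-style formatting into a fresh sp_str_t.
   Returns an empty string (data == NULL) on failure. */
static sp_str_t demo_format(const char* fmt, ...) {
    assert(fmt != NULL);
    sp_str_t out = { .data = NULL, .len = 0 };

    va_list args, measure;
    va_start(args, fmt);
    va_copy(measure, args);

    /* First pass: a NULL buffer of size 0 just reports the required length. */
    int needed = vsnprintf(NULL, 0, fmt, measure);
    va_end(measure);

    if (needed >= 0) {
        /* vsnprintf always writes a terminator, so allocate one extra byte;
           the terminator is not counted in len. */
        c8* buffer = malloc((size_t)needed + 1);
        if (buffer != NULL) {
            vsnprintf(buffer, (size_t)needed + 1, fmt, args);
            out.data = buffer;
            out.len = (u32)needed;
        }
    }

    va_end(args);
    return out;
}
```

The upside of staying with printf syntax is that compilers can check the arguments against the format string, which is the tradeoff mentioned above.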

strings should be immutable

sp_str_builder_t is the only entry point in the library for mutable strings [2]. Although that’s not quite the right way to think about it, because a string builder doesn’t keep a string; it keeps a mutable buffer which you manipulate and from which you produce an immutable string. But that’s bullshit semantics.

Immutability is something I took from functional programming. It solves a ton of problems. When I pass a string into a function, I never have to worry about the state of that string when it returns. For a datatype whose main purpose in programs is, roughly, to be tweaked, trimmed, stripped, joined, and generally fucked with, that’s very useful.

what about allocations?

The problem with immutability is that it forces you to copy; to allocate memory. It’s pretty easy to write a C library with good ergonomics, but I often find the wheels fall off when you try to have a sane, consistent strategy for memory allocation.

Unfortunately, pretty much everything that we want to do with string processing needs an allocation. Temporary buffers for intermediate results, or for final results, or for copies to maintain immutability. But nearly all of this memory is transient. You don’t care where it comes from, and you don’t care where it goes.

This is where RAII looks really nice. These problems just go away with automatic destruction of resources. But this, too, is a falsehood. A programming language is like a river. It defines a gradient of how difficult it is to do so-and-so; a gradient of friction. And people by and large tend to follow that gradient. You tend to do things that are easy to do, and avoid things that are hard to do.

In C++, what’s easy to do is to use a standard heap allocator for everything and make lots and lots of small, one-off heap allocations and have RAII free them automatically. But this is nothing more than the gradient at work.

RAII is not the best way to make frequent small allocations with short lifetimes. But it is the least frictive, at least by default, in C++. We’re in C though, subject to no such gradient, and we can do better regardless. We’re missing one more piece for our string library.

use a global allocator

We’d prefer not to need a separate heap allocation for all those things. It’s slow, and it’s wasteful, and if we could avoid it then we could end up with a string library that is faster despite being immutable.

In my standard library, there is a thread-local context. It is just a global; calling it a context does not make it fancier than what it is. The context holds an allocator, and anything inside sp.h will use this allocator via sp_alloc() when it needs memory. An allocator can be as simple as a trivial wrapper around malloc() or VirtualAlloc() or as complex as a fully custom built allocator.
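Here is roughly what such a context could look like. Everything in this sketch is hypothetical (the demo_ names, the shape of the allocator struct, falling back to malloc() when nothing is installed); it only illustrates the idea of a thread-local global that library code allocates through.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* An allocator is a function pointer plus whatever state it needs. */
typedef void* (*demo_alloc_fn)(void* user_data, size_t size);

typedef struct {
    demo_alloc_fn alloc;
    void* user_data;
} demo_allocator_t;

typedef struct {
    demo_allocator_t allocator;
} demo_context_t;

/* One context per thread (C11 _Thread_local); zero-initialized. */
static _Thread_local demo_context_t demo_context;

/* Trivial allocator: a wrapper around malloc(). */
static void* demo_malloc_alloc(void* user_data, size_t size) {
    (void)user_data;
    return malloc(size);
}

/* Example custom allocator: counts calls, then defers to malloc(). */
static void* demo_counting_alloc(void* user_data, size_t size) {
    size_t* count = user_data;
    assert(count != NULL);
    (*count)++;
    return malloc(size);
}

/* Install a new allocator and return the old one so it can be restored. */
static demo_allocator_t demo_context_set_allocator(demo_allocator_t next) {
    demo_allocator_t previous = demo_context.allocator;
    demo_context.allocator = next;
    return previous;
}

/* What library code would call instead of malloc(). */
static void* demo_alloc(size_t size) {
    demo_alloc_fn fn = demo_context.allocator.alloc;
    if (fn == NULL) fn = demo_malloc_alloc;
    return fn(demo_context.allocator.user_data, size);
}
```

The point of the indirection is that string functions never name an allocator; the caller decides where memory comes from by swapping the context before calling them.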

The problem of these small, frequent allocations then becomes very simple. All we have to do is use a cheap allocator for temporary memory; I use a bump allocator (one of many names for the same thing) which is just a pointer and an offset. It grabs a large block of memory on initialization, and when you allocate, it just increments the offset. Freeing just decrements the offset.
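A bump allocator really is about this small. The sketch below is not the sp.h one; the demo_ names, the 16-byte alignment, and the mark/reset API for "freeing" are my assumptions about one reasonable way to do it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    uint8_t* base;    /* one large block, grabbed once */
    size_t offset;    /* next free byte */
    size_t capacity;  /* size of the block */
} demo_bump_t;

static bool demo_bump_init(demo_bump_t* bump, size_t capacity) {
    bump->base = malloc(capacity);
    bump->offset = 0;
    bump->capacity = capacity;
    return bump->base != NULL;
}

/* Allocate by rounding the offset up to 16 bytes and moving it forward.
   Returns NULL when the block is exhausted. */
static void* demo_bump_alloc(demo_bump_t* bump, size_t size) {
    size_t aligned = (bump->offset + 15) & ~(size_t)15;
    if (aligned > bump->capacity || size > bump->capacity - aligned) return NULL;
    bump->offset = aligned + size;
    return bump->base + aligned;
}

/* Remember the current offset... */
static size_t demo_bump_mark(const demo_bump_t* bump) {
    return bump->offset;
}

/* ...and "free" everything allocated since by moving the offset back. */
static void demo_bump_reset_to(demo_bump_t* bump, size_t mark) {
    assert(mark <= bump->offset);
    bump->offset = mark;
}

static void demo_bump_destroy(demo_bump_t* bump) {
    free(bump->base);
    bump->base = NULL;
    bump->offset = bump->capacity = 0;
}
```

A typical pattern is to take a mark at the top of a frame or a request, do all the string work you want, and reset to the mark at the end. Individual strings are never freed.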

This isn’t a new idea, or an original idea, or even my idea. I ripped this straight from several of the programming folks that I like to follow. And it’s not a panacea, of course. But for this very common use case, it’s incredible.

miscellanea

operator overloading

Operator overloading is pretty nice, but the difference between writing a + b and sp_str_concat(a, b) is meaningless. For more complex operations, sp_str_builder_t and sp_format are superior APIs anyway.

literals

You have to wrap literals in a macro:

#define SP_STR_LIT(cstr) (sp_str_t) { .data = (cstr), .len = strlen(cstr) }

This is a little annoying, but also unimportant. For literal-heavy APIs (e.g. sp_format() – you almost always want to use a literal for the format string), just provide a function that takes a C string (e.g. sp_format_cstr()).
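One possible variation, if the strlen() call at every use bothers you: for true string literals, sizeof gives the length at compile time. This is a sketch, not the sp.h macro. DEMO_STR_LIT and demo_str_from_cstr are hypothetical names, and the "" lit concatenation is there so that passing anything other than a literal fails to compile.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef char     c8;   /* assumption */
typedef uint32_t u32;  /* assumption */
typedef struct { const c8* data; u32 len; } sp_str_t;

/* Literal-only variant: sizeof counts the terminator, so subtract one.
   Adjacent-literal concatenation ("" lit) rejects non-literal arguments. */
#define DEMO_STR_LIT(lit) \
    ((sp_str_t){ .data = ("" lit), .len = (u32)(sizeof(lit) - 1) })

/* Runtime counterpart for C strings that are not literals. */
static sp_str_t demo_str_from_cstr(const c8* cstr) {
    assert(cstr != NULL);
    sp_str_t str = { .data = cstr, .len = (u32)strlen(cstr) };
    return str;
}
```

One visible difference: a literal with an embedded '\0' keeps its full length with the sizeof version, while anything based on strlen() stops at the first zero byte.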

conversion

Yes, you have to copy sp_str_t into a null terminated buffer if you want to call most C libraries. No, it’s not a big deal. The bump allocator fixes this, too.
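The conversion itself is a copy plus one byte. A minimal sketch, with a hypothetical demo_str_to_cstr and malloc() standing in for the bump allocator:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef char     c8;   /* assumption */
typedef uint32_t u32;  /* assumption */
typedef struct { const c8* data; u32 len; } sp_str_t;

/* Copy the string into a fresh buffer with room for a terminator.
   With a bump allocator behind it, nobody would need to free the result;
   with malloc(), as here, the caller does. */
static c8* demo_str_to_cstr(sp_str_t str) {
    assert(str.data != NULL || str.len == 0);
    c8* buffer = malloc((size_t)str.len + 1);
    if (buffer == NULL) return NULL;
    if (str.len) memcpy(buffer, str.data, str.len);
    buffer[str.len] = '\0';
    return buffer;
}
```

This works for substrings too, which is exactly the case where the original buffer has no terminator in the right place.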

You could keep a u32 capacity in sp_str_t, and when allocating tack on an extra byte. When converting to a C string, check the size against the capacity and use that byte as a null terminator if you have it. If not (which is rare; just substrings, pretty much), then you copy. But this is unnecessary.

intrusive pointers

Some libraries (e.g. stb_ds.h) implement data structures by returning a pointer that has a header allocated before it in memory. Then, API functions can accept a regular pointer while still keeping metadata. This seems like a natural choice for strings, since it lets you unify APIs that take a sp_str_t and those that take a const char*.
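For anyone who hasn't seen the trick, here is a minimal sketch of a header-before-pointer string. It is only in the spirit of what stb_ds.h does, not its actual code, and every demo_ name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef char     c8;   /* assumption */
typedef uint32_t u32;  /* assumption */

/* Metadata that lives immediately before the character data. */
typedef struct {
    u32 len;
} demo_istr_header_t;

/* Allocate header + bytes + terminator in one block and return a pointer
   to the bytes, so the result can be passed anywhere a char* is expected. */
static c8* demo_istr_new(const c8* src, u32 len) {
    demo_istr_header_t* header = malloc(sizeof *header + (size_t)len + 1);
    if (header == NULL) return NULL;
    header->len = len;
    c8* data = (c8*)(header + 1);
    if (len) memcpy(data, src, len);
    data[len] = '\0';
    return data;
}

/* Step back from the data pointer to find the header.
   Only valid for pointers that came from demo_istr_new(). */
static u32 demo_istr_len(const c8* str) {
    assert(str != NULL);
    const demo_istr_header_t* header = (const demo_istr_header_t*)str - 1;
    return header->len;
}

static void demo_istr_free(c8* str) {
    if (str != NULL) free((demo_istr_header_t*)str - 1);
}
```

The type system sees only a char*, so nothing stops you from calling demo_istr_len() on a plain C string and reading garbage from before it.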

Ultimately, I decided to say fuck it and skip this. It’s admittedly nice not to have to call sp_str_to_cstr() at API boundaries. But the problem becomes that you no longer know whether a given const char* is actually null terminated; is it a plain C string, or is it one of our strings? The only way to avoid this extreme footgun is to null terminate all strings. And we’re back where we started.

And we bid you good night!

That’s it. My C string code is pretty much exactly as ergonomic as, say, my Python string code, minus a few nice-to-have operators.



  1. I technically cheat this sometimes by adjusting the lengths of strings

  2. This is a printf replacement that’s in the style of std::format; because sp.h is a single header library and thus compiled with your program, we can do some funny business to give some compile time guarantees (e.g. making sure what is passed to SP_FMT_* is of the correct type). Unfortunately, compile time checking of printf style format strings is baked into the compiler, so we can’t get around that. Even still, I find it very useful and a great ergonomic improvement.