
C Concurrency & pthreads: Write Blazing-Fast Multithreaded Programs in 2026

What Are Threads?

Every program you have written so far runs as a single thread of execution — one instruction after another, top to bottom, no shortcuts. A thread is an independent path of execution within the same process. Multiple threads share the same memory space (global variables, heap, open files), but each thread gets its own stack and program counter.

Think of it this way: a process is a restaurant, and threads are the cooks working inside it. They share the same kitchen (memory), the same pantry (global data), and the same order queue. But each cook works on their own task simultaneously, and if they aren’t careful about grabbing the same knife at the same time, things go very wrong.

On modern multi-core CPUs, threads can genuinely run in parallel — not just fake it with time-slicing. That is the key to making your programs dramatically faster on computationally heavy workloads.

Why Use Concurrency?

There are three major reasons to reach for threads:

  1. Performance: Split a massive computation across 8 cores and finish in close to one-eighth the time, provided the work divides cleanly and has little serial setup. Image processing, scientific simulations, and data crunching all benefit enormously.
  2. Responsiveness: A GUI application that does heavy computation on the main thread will freeze. Offload the work to a background thread and the interface stays snappy.
  3. Modeling real-world concurrency: Servers handle thousands of simultaneous connections. A web server that processes requests one at a time would be useless. Threads let you express inherently parallel problems directly.

The tradeoff? Concurrency introduces an entirely new class of bugs — race conditions, deadlocks, and data corruption — that are notoriously difficult to reproduce and debug. This lesson will teach you both the power and the discipline required to use threads correctly.

Creating Threads with pthread_create

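POSIX Threads (pthreads) is the standard threading API on Linux, macOS, and most Unix-like systems. To use it, include <pthread.h> and link with -lpthread. With GCC and Clang, the -pthread flag is the preferred spelling because it sets both the compile-time and link-time options. If you need a refresher on linking libraries, see our lesson on static and dynamic libraries.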

The signature of pthread_create is:

int pthread_create(pthread_t *thread, const pthread_attr_t *attr,
                   void *(*start_routine)(void *), void *arg);

That third parameter is a function pointer — the function the new thread will execute. It must accept a void * and return a void *. Here is the simplest possible threaded program:

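#include <stdio.h>
#include <string.h>
#include <pthread.h>

void *say_hello(void *arg) {
    (void)arg;  /* unused */
    printf("Hello from the new thread!\n");
    return NULL;
}

int main(void) {
    pthread_t tid;

    /* pthread_create returns an error number; it does not set errno */
    int rc = pthread_create(&tid, NULL, say_hello, NULL);
    if (rc != 0) {
        fprintf(stderr, "pthread_create failed: %s\n", strerror(rc));
        return 1;
    }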

    pthread_join(tid, NULL);  /* Wait for thread to finish */
    printf("Thread finished. Back in main.\n");
    return 0;
}

Compile it with: gcc -o threads threads.c -lpthread

Without that pthread_join call, main could return before the new thread prints anything, and returning from main terminates the entire process. Every thread you create should eventually be joined or detached.

Joining Threads with pthread_join

pthread_join blocks the calling thread until the specified thread terminates. It also lets you retrieve the thread’s return value:

void *result;
pthread_join(tid, &result);
/* result now holds whatever the thread returned */

If you do not need a thread’s return value and do not want to wait for it, you can detach it instead with pthread_detach(tid). A detached thread cleans up its own resources automatically when it exits. But be warned: a detached thread can no longer be joined, so any coordination with it has to go through your own mutexes and condition variables.

Here is an example launching multiple threads and joining them all:

#include <stdio.h>
#include <pthread.h>

#define NUM_THREADS 5

void *worker(void *arg) {
    int id = *(int *)arg;
    printf("Thread %d is running\n", id);
    return NULL;
}

int main(void) {
    pthread_t threads[NUM_THREADS];
    int ids[NUM_THREADS];

    for (int i = 0; i < NUM_THREADS; i++) {
        ids[i] = i;
        pthread_create(&threads[i], NULL, worker, &ids[i]);
    }

    for (int i = 0; i < NUM_THREADS; i++) {
        pthread_join(threads[i], NULL);
    }

    printf("All threads completed.\n");
    return 0;
}

Notice that each thread gets its own ids[i]. If you passed &i directly, every thread would read the same variable while the loop in main keeps changing it: a classic race condition bug.

Passing Arguments to Threads

The void *arg parameter lets you pass any data to a thread. For a single integer, you can cast it directly to and from the pointer (some consider this a hack, since the conversion is implementation-defined; go through intptr_t if you do it). For multiple values, use a struct:

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>

typedef struct {
    int id;
    double value;
    char name[32];
} ThreadArgs;

void *process(void *arg) {
    ThreadArgs *args = (ThreadArgs *)arg;
    printf("Thread %d: processing %s (value=%.2f)\n",
           args->id, args->name, args->value);
    free(args);  /* Thread owns this memory */
    return NULL;
}

int main(void) {
    pthread_t tid;

    ThreadArgs *args = malloc(sizeof(ThreadArgs));
    if (!args) { perror("malloc"); return 1; }
    args->id = 1;
    args->value = 3.14;
    snprintf(args->name, sizeof(args->name), "sensor_data");

    pthread_create(&tid, NULL, process, args);
    pthread_join(tid, NULL);
    return 0;
}

A critical rule: the argument data must outlive the thread that reads it. If you allocate it on the stack of a function that returns before the thread reads it, you get undefined behavior. Heap allocation with malloc is the safest approach — just make sure someone frees it.

Race Conditions: The Silent Killer

A race condition occurs when two or more threads access shared data at the same time without synchronization and at least one of them writes to it. The result depends on the unpredictable timing of thread scheduling. Here is a program that demonstrates the problem:

#include <stdio.h>
#include <pthread.h>

int counter = 0;  /* Shared by all threads */

void *increment(void *arg) {
    for (int i = 0; i < 1000000; i++) {
        counter++;  /* NOT atomic! */
    }
    return NULL;
}

int main(void) {
    pthread_t t1, t2;

    pthread_create(&t1, NULL, increment, NULL);
    pthread_create(&t2, NULL, increment, NULL);

    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    printf("Expected: 2000000\n");
    printf("Got:      %d\n", counter);
    return 0;
}

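Run this several times and you will usually get a different wrong answer each run. The operation counter++ typically compiles to three steps: load the value, add one, store the result. Two threads can load the same value, both add one, and both store the same result, losing an increment entirely. This is called a lost update. Formally, the unsynchronized access is a data race, which is undefined behavior in C11 and later, so an optimizing build may collapse the loop and hide the problem; compiling without optimization makes it easier to observe.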

Mutexes: Locking Down Shared Data

A mutex (mutual exclusion lock) ensures that only one thread can execute a critical section of code at a time. The pthread mutex API provides pthread_mutex_lock and pthread_mutex_unlock:

#include <stdio.h>
#include <pthread.h>

int counter = 0;
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void *increment(void *arg) {
    for (int i = 0; i < 1000000; i++) {
        pthread_mutex_lock(&lock);
        counter++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void) {
    pthread_t t1, t2;

    pthread_create(&t1, NULL, increment, NULL);
    pthread_create(&t2, NULL, increment, NULL);

    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    printf("Expected: 2000000\n");
    printf("Got:      %d\n", counter);  /* Always 2000000 now */

    pthread_mutex_destroy(&lock);
    return 0;
}

Now the counter is correct every time. The mutex guarantees that while one thread is inside the lock-unlock section, the other thread waits. The tradeoff is speed — locking and unlocking a mutex millions of times adds overhead. The art of concurrent programming is minimizing the time spent holding locks while still protecting all shared data.

Key mutex rules:

  • Always unlock what you lock — no exceptions, even on error paths
  • Keep critical sections as short as possible
  • Never call unknown or complex functions while holding a lock
  • Always initialize before use and destroy when done

Condition Variables: Thread Communication

Mutexes protect data, but sometimes threads need to wait for a condition to become true. Busy-waiting (spinning in a loop checking a variable) wastes CPU. Condition variables let a thread sleep efficiently until another thread signals that something has changed:

pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t  cond  = PTHREAD_COND_INITIALIZER;
int data_ready = 0;

/* Waiting thread */
pthread_mutex_lock(&mutex);
while (!data_ready) {              /* Always use a while loop, not if */
    pthread_cond_wait(&cond, &mutex);  /* Atomically unlocks mutex and sleeps */
}
/* data_ready is now true, mutex is held */
pthread_mutex_unlock(&mutex);

/* Signaling thread */
pthread_mutex_lock(&mutex);
data_ready = 1;
pthread_cond_signal(&cond);       /* Wake one waiting thread */
pthread_mutex_unlock(&mutex);

The while loop is essential, not optional. Condition variables can experience spurious wakeups — the thread wakes up even though nobody signaled it. The loop re-checks the actual condition and goes back to sleep if it was a false alarm.

Use pthread_cond_signal to wake one waiting thread, or pthread_cond_broadcast to wake all of them.

Deadlocks: When Threads Freeze Forever

A deadlock occurs when two or more threads each hold a lock the other needs, and neither can proceed. Classic scenario:

/* Thread A */                              /* Thread B */
pthread_mutex_lock(&lock1);                 pthread_mutex_lock(&lock2);
pthread_mutex_lock(&lock2); /* blocks */    pthread_mutex_lock(&lock1); /* blocks */
/* Neither thread can proceed: deadlock! */

Four conditions must all be true simultaneously for a deadlock to occur (the Coffman conditions):

  1. Mutual exclusion — resources cannot be shared
  2. Hold and wait — a thread holds one resource while waiting for another
  3. No preemption — locks cannot be forcibly taken away
  4. Circular wait — a circular chain of threads, each waiting for the next

The easiest prevention strategy is to always acquire locks in the same global order. If every thread locks lock1 before lock2, circular wait is impossible. You can also use pthread_mutex_trylock, which returns EBUSY immediately if the lock is unavailable instead of blocking, giving you a chance to release the locks you already hold, back off, and retry.

Thread Safety Best Practices

A function is thread-safe if it can be called simultaneously from multiple threads without causing data corruption. Here are the rules to live by:

Practice                                           | Why It Matters
Prefer local variables over globals                | Stack variables are thread-private: no sharing, no locking needed
Protect all shared mutable state with a mutex      | If two threads can touch it and one can write, you need a lock
Use _r (reentrant) versions of C library functions | strtok_r, localtime_r, etc. do not use static buffers
Minimize shared state                              | Less sharing means fewer locks means fewer bugs
Avoid holding locks during I/O or long operations  | Long-held locks throttle parallelism and invite deadlocks
Use pthread_mutex_destroy on cleanup               | Prevents resource leaks, especially with dynamically initialized mutexes

Also be aware that POSIX requires printf to be thread-safe: each call locks the stream, so the output of a single call is not corrupted. Output from separate calls in different threads can still be interleaved. If several lines must appear together, protect them with a mutex.

Practical Example: Producer-Consumer Pattern

The producer-consumer pattern is the workhorse of concurrent programming — one thread produces data, another consumes it, and a bounded buffer sits between them. This pattern appears everywhere: logging systems, task queues, data pipelines.

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>

#define BUFFER_SIZE 10
#define NUM_ITEMS   20

int buffer[BUFFER_SIZE];
int count = 0;       /* Items currently in buffer */
int in_idx = 0;      /* Next write position */
int out_idx = 0;     /* Next read position */

pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t not_full  = PTHREAD_COND_INITIALIZER;
pthread_cond_t not_empty = PTHREAD_COND_INITIALIZER;

void *producer(void *arg) {
    for (int i = 0; i < NUM_ITEMS; i++) {
        pthread_mutex_lock(&mutex);

        while (count == BUFFER_SIZE) {
            pthread_cond_wait(&not_full, &mutex);
        }

        buffer[in_idx] = i;
        in_idx = (in_idx + 1) % BUFFER_SIZE;
        count++;
        printf("[Producer] Produced: %d (buffer: %d/%d)\n", i, count, BUFFER_SIZE);

        pthread_cond_signal(&not_empty);
        pthread_mutex_unlock(&mutex);
    }
    return NULL;
}

void *consumer(void *arg) {
    for (int i = 0; i < NUM_ITEMS; i++) {
        pthread_mutex_lock(&mutex);

        while (count == 0) {
            pthread_cond_wait(&not_empty, &mutex);
        }

        int item = buffer[out_idx];
        out_idx = (out_idx + 1) % BUFFER_SIZE;
        count--;
        printf("[Consumer] Consumed: %d (buffer: %d/%d)\n", item, count, BUFFER_SIZE);

        pthread_cond_signal(&not_full);
        pthread_mutex_unlock(&mutex);
    }
    return NULL;
}

int main(void) {
    pthread_t prod, cons;

    pthread_create(&prod, NULL, producer, NULL);
    pthread_create(&cons, NULL, consumer, NULL);

    pthread_join(prod, NULL);
    pthread_join(cons, NULL);

    pthread_mutex_destroy(&mutex);
    pthread_cond_destroy(&not_full);
    pthread_cond_destroy(&not_empty);

    printf("All items produced and consumed.\n");
    return 0;
}

Study this example carefully — it combines everything you have learned. The mutex protects the shared buffer and count. The two condition variables coordinate the producer and consumer: the producer waits when the buffer is full, the consumer waits when it is empty. The while loops guard against spurious wakeups.

Practical Example: Parallel Array Sum

Here is a practical performance example — summing a large array using multiple threads. Each thread computes a partial sum over its assigned chunk, then the main thread adds them together. No mutex is needed because each thread writes to its own slot in the results array:

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>

#define ARRAY_SIZE  100000000  /* 100 million */
#define NUM_THREADS 4

typedef struct {
    int *array;
    long start;
    long end;
    long partial_sum;
} ChunkArgs;

void *sum_chunk(void *arg) {
    ChunkArgs *c = (ChunkArgs *)arg;
    long sum = 0;
    for (long i = c->start; i < c->end; i++) {
        sum += c->array[i];
    }
    c->partial_sum = sum;
    return NULL;
}

int main(void) {
    int *array = malloc(ARRAY_SIZE * sizeof(int));
    if (!array) { perror("malloc"); return 1; }

    /* Fill with values */
    for (long i = 0; i < ARRAY_SIZE; i++) {
        array[i] = 1;  /* Sum should be ARRAY_SIZE */
    }

    pthread_t threads[NUM_THREADS];
    ChunkArgs args[NUM_THREADS];
    long chunk_size = ARRAY_SIZE / NUM_THREADS;

    for (int i = 0; i < NUM_THREADS; i++) {
        args[i].array = array;
        args[i].start = i * chunk_size;
        args[i].end   = (i == NUM_THREADS - 1) ? ARRAY_SIZE : (i + 1) * chunk_size;
        pthread_create(&threads[i], NULL, sum_chunk, &args[i]);
    }

    long total = 0;
    for (int i = 0; i < NUM_THREADS; i++) {
        pthread_join(threads[i], NULL);
        total += args[i].partial_sum;
    }

    printf("Total sum: %ld (expected: %d)\n", total, ARRAY_SIZE);
    free(array);
    return 0;
}

This pattern (divide the work, compute in parallel, combine the results) is the foundation of parallel computing. The main thread reads each partial_sum only after pthread_join has returned for that thread, so the value is guaranteed to be visible without a lock. Eliminating unnecessary locking is how you get real speedups.

Compilation and Debugging Tips

Compiling and debugging threaded programs requires some extra tools:

# Compile with pthread support
gcc -o program program.c -lpthread -Wall -Wextra

# Enable Thread Sanitizer (catches race conditions at runtime)
gcc -o program program.c -lpthread -fsanitize=thread -g

# Use Helgrind (Valgrind tool for thread errors)
valgrind --tool=helgrind ./program

ThreadSanitizer (TSan) is incredibly valuable — it instruments your code to detect data races at runtime and reports exactly which memory locations are accessed unsafely. Helgrind is slower but catches additional issues like lock ordering problems and misuse of the pthreads API.

Always test threaded code under load with multiple runs. Race conditions are timing-dependent — a bug that occurs once in a thousand runs is still a bug.

Summary

You now have the tools to write correct, high-performance multithreaded C programs. Here is what to remember:

  • Threads share memory within a process — fast communication, dangerous if unprotected
  • pthread_create launches threads, pthread_join waits for them to finish
  • Pass data to threads through pointers — use structs for multiple values, heap-allocate when lifetime is uncertain
  • Race conditions corrupt shared data — protect critical sections with mutexes
  • Condition variables enable efficient waiting without burning CPU cycles
  • Deadlocks occur when threads wait on each other — prevent them by acquiring locks in a consistent order
  • Design for minimal sharing — the less data threads share, the fewer bugs you will have
  • Use ThreadSanitizer and Helgrind to catch bugs your eyes will miss

Concurrency is the bridge between writing programs that work and writing programs that fly. It is also the domain where C gives you the most power and the least safety net. Master pthreads, and you will understand how operating systems, databases, and web servers actually work under the hood — knowledge that transfers to every language and every platform you will ever touch.

In the next lesson, we will explore C Networking and Sockets, where threads become essential for building servers that handle multiple clients simultaneously.
