One CPU.
Many programs.

The CPU page showed a machine that runs one instruction stream. The memory page showed an address space that belongs to one process. Right now your laptop is running hundreds of programs across a handful of cores, and they don't trample each other. The thing in the middle making that work is the operating system.

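Your code has never had the machine to itself.

Not once.

Every instruction you have ever written ran in a time slice the OS granted.
Every function you have ever called used a stack the OS mapped.
Every file you have ever opened was opened by the OS on your behalf.

Plain arithmetic runs straight on the CPU. Anything beyond that (disks, networks, screens, other programs' memory) is out of direct reach.

It all goes through a middleman.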

The operating system.

Your program lives in a box the OS drew.
It can only see the memory the OS gave it.
It can only use the CPU time the OS allows.
It can only touch the hardware by asking the OS for permission.

This is not a limitation.
It is what makes computing reliable.

Without the OS your program and every other program running on the same machine would share one address space. One set of registers. One CPU.

And the first bug in any one of them would corrupt everything else.

The OS is the thing that decided that could not be allowed.

And built the walls to enforce it.

Beginner// level 01

What an operating system is for

An operating system is, at its heart, just another program. The trick is that it runs in a privileged mode the CPU itself enforces, and every other program runs inside the box the OS draws around it. The OS owns the hardware. Your program asks for things; the OS decides whether and how to give them to you.

Three jobs make up almost everything an OS does:

job 01

Multiplex

Share one CPU between hundreds of programs. Share one disk, one network card, one screen. Make each program think it owns the machine.

job 02

Isolate

Stop programs from reading each other's memory, corrupting each other's files, or crashing the whole machine when one of them dies.

job 03

Abstract

Hide the differences between disks, between network cards, between keyboards. Expose one uniform interface (files, sockets, processes) that programs can target.

User mode and kernel mode

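The CPU has, baked into the silicon, at least two privilege levels: kernel mode (full access to every instruction, every memory address, every device) and user mode (restricted: most instructions allowed, but anything that touches hardware directly traps). x86 defines four rings and ARM several exception levels, but mainstream operating systems put the kernel in the most privileged level they use and applications in the least. The OS kernel runs in kernel mode. Your program runs in user mode. For application code there is no in-between.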

So how does your program ever do anything: open a file, send a packet, allocate memory? It asks the kernel. That request is called a system call.

The CPU's two modes are enforced in silicon. The current privilege level is a couple of bits of CPU state (on x86-64, the low two bits of the code-segment selector). The same CPU you learned about on page 5. The same fetch-decode-execute loop. The privilege level is just another bit pattern the CPU checks before executing certain instructions. ← see: CPU

Every program is, ultimately, a sequence of syscalls

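Underneath println!, printf, fopen, malloc, fetch(), underneath everything that reaches outside the process, is a syscall. (malloc only makes one when it needs more memory from the kernel.) The standard library is mostly a polite, portable wrapper around them.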

Rust• • •
// printf?  read?  open?  In the end, every one of those goes
// through the OS via a *system call*. Here's the same write
// done two ways: the high-level library, and the thin libc
// wrapper that is named after the syscall it issues.

use std::io::Write;

fn main() {
    // High-level: Rust's std::io. Cross-platform; calls into libc,
    // which eventually issues the OS syscall.
    let _ = std::io::stdout().write_all(b"hello via std\n");

    // Low-level (Linux/macOS): libc's write(fd, buf, len). The kernel
    // syscall behind it is #1 on x86_64 Linux; the number differs on
    // other platforms, which is why going through libc keeps it portable.
    extern "C" {
        fn write(fd: i32, buf: *const u8, count: usize) -> isize;
    }
    let msg = b"hello via syscall\n";
    unsafe { write(1, msg.as_ptr(), msg.len()); }
}

C• • •
// On Linux, write() is a libc wrapper around the kernel's
// sys_write, syscall number 1 on x86_64. We can call it
// directly via syscall(2), bypassing the libc wrapper.
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>

int main(void) {
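    // High-level: libc, ultimately a syscall. stdio buffers its
    // output, so flush before writing to fd 1 directly; otherwise
    // the lines can appear out of order when stdout is not a terminal.
    printf("hello via printf\n");
    fflush(stdout);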

    // One layer down: the libc wrapper that names the syscall.
    write(1, "hello via write()\n", 18);

    // Raw: name the syscall by its number.
    syscall(SYS_write, 1, "hello via SYS_write\n", 20);
    return 0;
}

// the syscall instruction

On x86_64 Linux, the actual mechanism is a single instruction: syscall. The user program puts the syscall number in rax, the arguments in registers (rdi, rsi, rdx, and so on), and executes syscall. The CPU traps, switches to kernel mode, and jumps to a fixed handler the OS installed at boot. When the kernel returns, the CPU drops back to user mode at the next instruction. Every "open a file", "send a packet", "fork a process" comes down to one or more of these traps.
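
A minimal sketch of that mechanism from Rust, assuming x86_64 Linux and a compiler with stable inline assembly (Rust 1.59 or later). The helper name raw_write is made up for illustration; on any other target the sketch falls back to the libc wrapper so that it still builds.

```rust
/// Issue write(2) with the `syscall` instruction itself, no libc involved.
/// `raw_write` is an illustrative name, not a standard function.
#[cfg(all(target_os = "linux", target_arch = "x86_64"))]
fn raw_write(fd: i32, buf: &[u8]) -> isize {
    use std::arch::asm;
    let ret: isize;
    unsafe {
        asm!(
            "syscall",
            // rax carries the syscall number in (1 = write) and the result out.
            inlateout("rax") 1isize => ret,
            in("rdi") fd as isize,   // arg 1: file descriptor
            in("rsi") buf.as_ptr(),  // arg 2: buffer address
            in("rdx") buf.len(),     // arg 3: byte count
            // The syscall instruction overwrites rcx and r11.
            lateout("rcx") _,
            lateout("r11") _,
            options(nostack),
        );
    }
    ret // bytes written, or a negative errno value
}

/// Fallback for other targets (an assumption of this sketch): use libc's wrapper.
#[cfg(not(all(target_os = "linux", target_arch = "x86_64")))]
fn raw_write(fd: i32, buf: &[u8]) -> isize {
    extern "C" {
        fn write(fd: i32, buf: *const u8, count: usize) -> isize;
    }
    unsafe { write(fd, buf.as_ptr(), buf.len()) }
}

fn main() {
    let n = raw_write(1, b"hello via the syscall instruction\n");
    println!("kernel reported {n} bytes written");
}
```

The register choices follow the x86_64 Linux calling convention for syscalls; other architectures use a different trap instruction and different registers, which is one reason real programs go through libc.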

How the OS itself starts running

Power on. The CPU jumps to a hardcoded address in firmware (BIOS on old PCs, UEFI on modern ones). Firmware finds a bootloader on disk and runs it. The bootloader loads the OS kernel into memory, then jumps to it. The kernel sets up page tables, starts the scheduler, mounts file systems, and finally launches the first user-mode process: init (systemd on most Linux distributions) on Unix-like systems, smss.exe (the Session Manager) on Windows. From there, init and its descendants start every other process you'll ever run.

That entire chain is just CPUs jumping to addresses. There's no magic. Every step is a continuation of the fetch-decode-execute loop you already know.

Intermediate// level 02

Processes, threads & scheduling

What a process actually is

A process is the OS's bookkeeping for one running program. It's a struct in the kernel containing, roughly:

A page table: its private virtual address space (see the memory page).
The current register state: instruction pointer, stack pointer, and the rest of the CPU's registers, frozen for when this process isn't running.
A table of open file descriptors: small integers that index into kernel-side objects (open files, sockets, pipes).
A process ID, a parent process ID, credentials, signal handlers, working directory.

That's the entire identity of a "running program". On Linux you can read a summary of it: cat /proc/<pid>/status. Selected fields of the struct, formatted for humans.
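
The same file can be read from inside a program. A small sketch, assuming Linux with /proc mounted; status_field is a hypothetical helper written for this example, and the field names are the ones the kernel prints in that file.

```rust
use std::fs;

/// Look up one field (e.g. "Pid", "PPid", "Threads") in /proc/<pid>/status.
/// Returns None if the file or the field is missing (e.g. not on Linux).
fn status_field(pid: &str, key: &str) -> Option<String> {
    let text = fs::read_to_string(format!("/proc/{pid}/status")).ok()?;
    text.lines().find_map(|line| {
        // Each line looks like "Key:\tvalue".
        let (k, v) = line.split_once(':')?;
        (k == key).then(|| v.trim().to_string())
    })
}

fn main() {
    // "self" is a symlink the kernel resolves to the calling process.
    for key in ["Name", "State", "Pid", "PPid", "Threads", "VmRSS"] {
        match status_field("self", key) {
            Some(v) => println!("{key:>8}: {v}"),
            None => println!("{key:>8}: (unavailable)"),
        }
    }
}
```

Every value printed here is a field the kernel keeps for the process; the program never sees the struct itself, only this text rendering of it.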

A process struct in the kernel is just a data structure in memory. The page table pointer, register state, file descriptor table — all of it binary data at a memory address. The OS manages processes the same way your programs manage linked lists and arrays. With pointers. With structs. With the same memory operations you learned on pages 6 and 9. ← see: Memory · ← see: Pointers

Creating processes: fork & exec

Unix has an unusual but elegant model for starting a new program. Two syscalls do it:

fork() clones the calling process. After fork, there are two processes with identical memory, identical file descriptors, identical everything except their PID and fork's return value.
exec() replaces the current process's program with a different binary. Same PID, same file descriptors, brand new code and data. (exec is a family: the syscall is execve, and libc adds wrappers such as execvp that search PATH.)

To run ls from a shell: fork() a copy of the shell, then in the child, exec ls (for example execvp("ls", argv)). The shell stays alive (it's the parent) and waits, and the child becomes ls. Two syscalls, every command in your terminal.

Rust• • •
// fork() asks the kernel to clone the current process.
// Both processes return from fork(): the child sees 0,
// the parent sees the child's PID. Then the OS schedules them
// independently on whatever cores are free.

use std::process;

fn main() {
    extern "C" {
        fn fork() -> i32;
        fn getpid() -> i32;
        fn wait(status: *mut i32) -> i32;
    }

    println!("[parent] starting, pid={}", process::id());

    let pid = unsafe { fork() };
    match pid {
        -1 => panic!("fork failed"),
        0  => {
            // Child branch.
            let cpid = unsafe { getpid() };
            println!("[child]  hello, pid={cpid}");
        }
        n  => {
            // Parent branch: wait for the child.
            let mut status = 0;
            unsafe { wait(&mut status); }
            println!("[parent] child {n} exited");
        }
    }
}

C• • •
#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void) {
    printf("[parent] starting, pid=%d\n", getpid());

    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return 1;
    }

    if (pid == 0) {
        // Child branch.
        printf("[child]  hello, pid=%d\n", getpid());
    } else {
        // Parent branch.
        int status;
        wait(&status);
        printf("[parent] child %d exited\n", pid);
    }
    return 0;
}
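
The two listings above stop at fork. A sketch of the full fork, exec, wait sequence a shell performs might look like the following. It assumes a Unix-like system with libc linked in (as it is for any normal Rust build); run is a hypothetical helper, and the exit-status decoding mirrors what the WIFEXITED and WEXITSTATUS macros do on Linux and macOS.

```rust
use std::ffi::CString;
use std::os::raw::{c_char, c_int};
use std::ptr;

extern "C" {
    fn fork() -> c_int;
    fn execvp(file: *const c_char, argv: *const *const c_char) -> c_int;
    fn waitpid(pid: c_int, status: *mut c_int, options: c_int) -> c_int;
    fn _exit(code: c_int) -> !;
}

/// Run `program` with `args` the way a shell does. Returns its exit code,
/// or -1 if fork failed or the child was killed by a signal.
fn run(program: &str, args: &[&str]) -> i32 {
    // Build a NULL-terminated argv before forking, so the child
    // does not need to allocate.
    let owned: Vec<CString> = std::iter::once(program)
        .chain(args.iter().copied())
        .map(|s| CString::new(s).expect("no NUL bytes in arguments"))
        .collect();
    let mut argv: Vec<*const c_char> = owned.iter().map(|s| s.as_ptr()).collect();
    argv.push(ptr::null());

    match unsafe { fork() } {
        -1 => -1,
        0 => unsafe {
            // Child: replace this process image. execvp returns only on failure.
            execvp(argv[0], argv.as_ptr());
            _exit(127) // the shell convention for "command not found"
        },
        child => {
            // Parent: block until the child terminates.
            let mut status: c_int = 0;
            unsafe { waitpid(child, &mut status, 0) };
            // Low 7 bits zero means a normal exit; the code is in the next byte.
            if status & 0x7f == 0 { (status >> 8) & 0xff } else { -1 }
        }
    }
}

fn main() {
    let code = run("ls", &["-l", "/"]);
    println!("[shell] ls exited with status {code}");
}
```

In everyday Rust, std::process::Command wraps this sequence for you; spelling it out only shows where the two syscalls sit.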

Threads: lightweight processes

A thread is an independent stream of execution that shares its process's address space and file descriptors with other threads. Cheaper to create than a process, faster to switch between, and able to communicate just by reading the same memory.

That last property is also threads' biggest pitfall. If two threads access the same variable without coordination and at least one of them writes, you get a data race: undefined behaviour in C, a compile error in safe Rust. The languages diverge here. C trusts you to use mutexes correctly; Rust's type system tracks which values can cross thread boundaries (the Send and Sync traits) and refuses to compile the unsafe combinations.

Rust prevents data races at compile time. A data race is two threads touching the same memory without coordination, at least one of them writing. The ownership system tracks which references can cross thread boundaries. If two threads could write the same value without synchronization, the code does not compile. C trusts you with mutexes. Rust enforces the contract. ← see: Compile vs Runtime
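
As a small illustration of the coordinated version, here is a sketch in which several threads increment one shared counter. The function name parallel_count is made up for this example; Arc, Mutex and thread::spawn are from the standard library.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Spawn `threads` OS threads that each add 1 to a shared counter `iters` times.
fn parallel_count(threads: usize, iters: u64) -> u64 {
    // Arc shares ownership across threads; Mutex serializes the writes.
    let counter = Arc::new(Mutex::new(0u64));

    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..iters {
                    // The guard returned by lock() is the only way to reach the u64.
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();

    for h in handles {
        h.join().unwrap();
    }
    let total = *counter.lock().unwrap();
    total
}

fn main() {
    println!("total = {}", parallel_count(4, 10_000));
}
```

Swapping the Arc<Mutex<u64>> for a plain mutable u64 captured by every closure is the version the compiler refuses: the closures would each need exclusive access to the same value.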

Scheduling: how the OS shares one CPU

You have 8 cores. You have 600 processes. They don't all fit. Every few milliseconds the OS performs a context switch: it saves the current process's registers into its kernel struct, picks another runnable process, restores its registers, and resumes. Done fast enough, every process feels like it's running constantly.

Picking which process runs next is the scheduler's job. Some classic strategies:

scheduler	rule	fairness	where used
Round-robin	Each runnable process gets a fixed time slice in turn	Equal share	Teaching examples; some real-time systems
Priority	Higher-priority always runs first; ties broken by round-robin	High priority can starve low	Real-time systems, embedded
CFS (Completely Fair Scheduler)	Track each task's share of CPU time; run the one furthest behind	Proportional to weight	Linux 2.6.23 through 6.5 (the desktop / server default until EEVDF replaced it in 6.6)
MLFQ	Multiple priority queues; tasks demote on long runs, promote when interactive	Adaptive	macOS, Windows (variants)
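
To make the first row concrete, here is a toy round-robin simulation. It is a sketch of the policy only: task names, burst lengths and the quantum are made-up inputs in abstract time units, and there is no real context switching involved.

```rust
use std::collections::VecDeque;

/// Simulate round-robin over (name, cpu_time_needed) tasks.
/// Returns each task with the clock time at which it finished, in finishing order.
fn round_robin(tasks: &[(&str, u32)], quantum: u32) -> Vec<(String, u32)> {
    assert!(quantum > 0, "a zero-length time slice would never make progress");

    // The run queue: front runs next, preempted tasks go to the back.
    let mut queue: VecDeque<(String, u32)> =
        tasks.iter().map(|&(name, need)| (name.to_string(), need)).collect();
    let mut clock = 0;
    let mut finished = Vec::new();

    while let Some((name, remaining)) = queue.pop_front() {
        // Run for one quantum, or less if the task finishes early.
        let slice = remaining.min(quantum);
        clock += slice;
        if remaining > slice {
            queue.push_back((name, remaining - slice)); // preempted
        } else {
            finished.push((name, clock)); // done
        }
    }
    finished
}

fn main() {
    for (name, at) in round_robin(&[("A", 5), ("B", 2), ("C", 4)], 2) {
        println!("{name} finished at t={at}");
    }
}
```

With a quantum of 2, the short task B finishes first even though A was queued ahead of it, which is the "equal share" behaviour the table describes.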

// what "100% CPU" actually means

top shows your program at 100% CPU. That doesn't mean it owns the machine. It means that over the last sampling interval it consumed CPU time equal to one full core (a multithreaded program on 8 cores can show up to 800%). The kernel itself, interrupt handlers, and other processes still preempt it. There's no such thing as "all of the CPU forever" on a real OS.

Advanced// level 03

Virtual memory, I/O & the kernel boundary

Virtual memory, revisited

The memory page covered the idea: every process gets its own virtual address space, and the MMU translates virtual to physical at every load and store. The OS is what fills in the table. On every mmap, every fork, every page fault, the kernel adjusts page-table entries and invalidates any stale translations cached in the TLB.

Page faults are the magic. When you touch a virtual address that has no physical page yet, the CPU traps into the kernel. The kernel decides what should be there (a fresh zeroed page, a page from disk, a page being shared with another process), allocates physical RAM, updates the page table, and resumes your program. The instruction that caused the fault re-runs and now succeeds. Your program never knew.
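
A toy model of that sequence, as a sketch only: a HashMap stands in for the multi-level table real hardware walks, frames are handed out by a simple counter, and the type and method names (AddressSpace, translate, handle_fault) are invented for this example.

```rust
use std::collections::HashMap;

const PAGE_SIZE: u64 = 4096;

/// One process's view of memory in this toy model.
struct AddressSpace {
    page_table: HashMap<u64, u64>, // virtual page number -> physical frame number
    next_free_frame: u64,
    faults: u64,
}

impl AddressSpace {
    fn new() -> Self {
        AddressSpace { page_table: HashMap::new(), next_free_frame: 0, faults: 0 }
    }

    /// What the MMU does on every access: split the address, look up the page.
    fn translate(&mut self, vaddr: u64) -> u64 {
        let vpn = vaddr / PAGE_SIZE;
        let offset = vaddr % PAGE_SIZE;
        let hit = self.page_table.get(&vpn).copied();
        let frame = match hit {
            Some(frame) => frame,
            None => self.handle_fault(vpn), // no mapping yet: "trap into the kernel"
        };
        frame * PAGE_SIZE + offset
    }

    /// What the kernel does on a fault: pick a frame, record it, let the access retry.
    fn handle_fault(&mut self, vpn: u64) -> u64 {
        self.faults += 1;
        let frame = self.next_free_frame;
        self.next_free_frame += 1;
        self.page_table.insert(vpn, frame);
        frame
    }
}

fn main() {
    let mut space = AddressSpace::new();
    for vaddr in [0x1234, 0x1FFF, 0x5000, 0x1238] {
        let paddr = space.translate(vaddr);
        println!("virtual {vaddr:#07x} -> physical {paddr:#07x}");
    }
    println!("page faults: {}", space.faults);
}
```

Only the first touch of each page pays for a fault; later accesses to the same page translate straight through, which is why the program "never knew".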

// what "swap" actually does

Swap is the same mechanism, in reverse. Under memory pressure the kernel writes a rarely-touched page to disk and marks its page-table entry "not present". Next time anyone reads that address, page fault → kernel reads the page back from disk → updates the table → resumes. Slow (millions of cycles) but invisible.

Memory-mapped I/O: files as memory

The same machinery makes one of Unix's most beloved tricks work. Instead of read()-ing a file in chunks, mmap asks the kernel to map the file into your address space. The page table now says "addresses X through Y of this process correspond to bytes 0 through N of that file." No data has been copied yet, but as you walk the bytes, page faults pull each page (typically 4 KB) in on demand.

Rust• • •
// Read a file by *mapping* it into memory: the kernel pages in
// each block on demand, on first touch, instead of copying it
// through `read()` into a buffer we allocate. The same syscall
// databases, log indexers, and language runtimes lean on.
// Assumes a 64-bit Linux or macOS target (constant values, i64 off_t).

use std::fs::File;
use std::os::fd::AsRawFd;

fn main() -> std::io::Result<()> {
    extern "C" {
        fn mmap(addr: *mut u8, len: usize, prot: i32,
                flags: i32, fd: i32, off: i64) -> *mut u8;
        fn munmap(addr: *mut u8, len: usize) -> i32;
    }
    const PROT_READ:    i32 = 1;
    const MAP_PRIVATE:  i32 = 2;

    let f = File::open("Cargo.toml")?;
    let len = f.metadata()?.len() as usize;
    if len == 0 {
        // mmap rejects zero-length mappings with EINVAL.
        println!("file is empty");
        return Ok(());
    }
    let ptr = unsafe {
        mmap(std::ptr::null_mut(), len,
             PROT_READ, MAP_PRIVATE, f.as_raw_fd(), 0)
    };
    // On failure mmap returns MAP_FAILED, i.e. (void *)-1, not null.
    if ptr as isize == -1 {
        return Err(std::io::Error::last_os_error());
    }

    // The file is now bytes in our address space, but the kernel
    // need not have read any of it yet. Each page (typically 4 KB)
    // is brought in only when we touch it.
    let bytes = unsafe { std::slice::from_raw_parts(ptr, len) };
    // Lossy conversion: an 80-byte cut may split a multi-byte character.
    let s = String::from_utf8_lossy(&bytes[..len.min(80)]);
    println!("first chars: {s}");

    unsafe { munmap(ptr, len); }
    Ok(())
}

C• • •
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>

int main(void) {
    int fd = open("Makefile", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

    // Ask the kernel for a virtual address window backed by the file.
    // No data is read yet; the page table just gets new entries
    // marked "this region maps to that file."
    char *p = mmap(NULL, st.st_size,
                   PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    // Touching p[0] triggers a page fault, which the kernel
    // services by reading the first 4 KB of the file into RAM
    // and patching the page table. From then on, access is
    // a normal load, no syscall on the fast path.
    fwrite(p, 1, st.st_size < 80 ? st.st_size : 80, stdout);
    putchar('\n');

    munmap(p, st.st_size);
    close(fd);
    return 0;
}

Databases, log indexers, language runtimes (the JVM, the V8 heap), and dynamic linkers all use mmap heavily. It's how a 100 GB log file becomes a normal pointer you can scan.

I/O models: blocking, non-blocking, async

A read() on a network socket can take milliseconds. What does the OS do with your thread while it waits?

blocking

Sleep the thread

Default. Thread is parked on a kernel wait queue; scheduler picks something else. When data arrives, your thread is woken. Simple to write, expensive at scale (1 thread per connection).

non-blocking

Return EAGAIN

Set the fd non-blocking; reads return immediately, with an error if nothing's ready. Your thread polls (wastefully, unless paired with the next idea).

readiness multiplex

epoll / kqueue / IOCP

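One syscall, hand it many fds, block until any is ready. One thread serves thousands of connections. (Windows IOCP and Linux io_uring reach the same goal with a completion model: you submit the operation and are told when it has finished.) The architecture most high-concurrency servers run on.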
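
The second and third models can be seen side by side in a short sketch. It uses poll(), the older portable relative of epoll and kqueue, because it needs no setup calls; the readiness idea is the same. Assumptions: Linux (where nfds_t is unsigned long and POLLIN is 0x001), libc linked in, and local socket pairs standing in for network peers. PollFd, readable and poll_demo are names invented for this example.

```rust
use std::io::{ErrorKind, Read, Write};
use std::os::fd::AsRawFd;
use std::os::raw::{c_int, c_short, c_ulong};
use std::os::unix::net::UnixStream;

/// Mirrors C's `struct pollfd`.
#[repr(C)]
struct PollFd {
    fd: c_int,
    events: c_short,
    revents: c_short,
}

extern "C" {
    fn poll(fds: *mut PollFd, nfds: c_ulong, timeout_ms: c_int) -> c_int;
}

const POLLIN: c_short = 0x001; // "there is data to read"

/// Block until at least one socket is readable (or the timeout expires);
/// return the indices of the readable ones.
fn readable(socks: &[UnixStream], timeout_ms: i32) -> Vec<usize> {
    let mut fds: Vec<PollFd> = socks
        .iter()
        .map(|s| PollFd { fd: s.as_raw_fd(), events: POLLIN, revents: 0 })
        .collect();
    let n = unsafe { poll(fds.as_mut_ptr(), fds.len() as c_ulong, timeout_ms) };
    if n <= 0 {
        return Vec::new(); // timeout or error
    }
    fds.iter()
        .enumerate()
        .filter(|(_, f)| f.revents & POLLIN != 0)
        .map(|(i, _)| i)
        .collect()
}

/// Three idle "connections"; only number 1 receives data.
fn poll_demo() -> (bool, Vec<usize>) {
    let mut readers = Vec::new();
    let mut writers = Vec::new();
    for _ in 0..3 {
        let (r, w) = UnixStream::pair().unwrap();
        r.set_nonblocking(true).unwrap();
        readers.push(r);
        writers.push(w);
    }

    // Non-blocking model: nothing has arrived, so read fails at once
    // with WouldBlock (EAGAIN) instead of sleeping.
    let mut buf = [0u8; 8];
    let would_block = matches!(
        (&readers[0]).read(&mut buf),
        Err(e) if e.kind() == ErrorKind::WouldBlock
    );

    // Readiness model: one call watches every socket.
    writers[1].write_all(b"ping").unwrap();
    (would_block, readable(&readers, 1000))
}

fn main() {
    let (would_block, ready) = poll_demo();
    println!("idle read would block: {would_block}; readable sockets: {ready:?}");
}
```

poll() rescans the whole descriptor list on every call, which is what epoll and kqueue were designed to avoid for very large connection counts; for a few hundred sockets the difference is small.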

Async runtimes (Tokio in Rust, libuv under Node, Go's runtime) are built on top of the third option. The runtime keeps an epoll/kqueue loop, schedules user tasks (futures, goroutines, callbacks) onto a small pool of OS threads, and parks them on I/O instead of blocking the thread. The OS provides the readiness primitive; the language runtime provides the ergonomics.

The kernel boundary, in one diagram

One line, the syscall, is the only way through. Everything in user space funnels through it; the kernel is the only thing that talks to hardware. Lock that boundary down and you get isolation, security, and a stable interface that any user program can target without knowing what hardware it's running on.

Bitcoin Core as an operating system client

Every Bitcoin full node on Earth is a program running inside an OS.

Not a special program. Not a privileged program. A regular user-space process.

Bitcoin Core — the reference implementation written in C++ — uses every OS primitive this page has described.

Processes and threads

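Bitcoin Core spawns a number of named threads on startup. Grouped by role (the real thread list, documented in the project's developer notes, is longer):

The main thread: runs initialization and shutdown.
The net thread (b-net): sends and receives bytes on peer sockets.
The message-handler thread (b-msghand): processes peer messages, including most transaction and block validation.
The script-check threads (b-scriptch.N): verify transaction scripts in parallel.
The HTTP threads (b-http, b-httpworker.N): serve RPC and REST requests.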

Each thread is scheduled by the OS. Each shares the same process address space. Each must coordinate using mutexes to avoid data races.

Rust• • •
use std::sync::{Arc, Mutex};
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;

struct Mempool { /* pending transactions */ }
impl Mempool { fn validate_pending(&mut self) {} }

struct BitcoinNode {
    mempool: Arc<Mutex<Mempool>>,
    running: Arc<AtomicBool>,
}

impl BitcoinNode {
    fn start(&self) {
        let mempool = Arc::clone(&self.mempool);
        let running = Arc::clone(&self.running);

        /* OS creates and schedules this thread */
        thread::spawn(move || {
            while running.load(Ordering::Relaxed) {
                /* Rust enforces: only one writer at a time.
                 * Forget the lock: compile error, not a race. */
                let mut pool = mempool.lock().unwrap();
                pool.validate_pending();
                /* MutexGuard drops here: lock released */
            }
        });
    }
}
/* Same OS primitives. Different safety guarantees.
 * Bitcoin Core (C++) relies on review, Clang thread-safety
 * annotations, and sanitizers to catch races.
 * Rust rejects them at compile time. */

C++• • •
#include <atomic>
#include <mutex>
#include <thread>

std::mutex mempool_mutex;

/* Hypothetical stand-ins for the real work; not Bitcoin Core APIs. */
void poll_peers() {}
void gossip_txns() {}
void validate_next_block() {}

class BitcoinNode {
    std::thread net_thread;
    std::thread validation_thread;
    std::thread rpc_thread;
    /* Read by several threads, so it must be atomic:
     * a plain bool here would itself be a data race. */
    std::atomic<bool> running{true};

    void net_main() {
        /* OS schedules this thread */
        /* manages TCP connections  */
        /* each connection: one socket fd */
        /* multiplexed with poll()/select() */
        while (running) {
            poll_peers();      /* non-blocking I/O */
            gossip_txns();     /* send() syscall   */
        }
    }

    void validation_main() {
        while (running) {
            std::lock_guard<std::mutex> lock(mempool_mutex);
            /* only one thread validates at a time */
            validate_next_block();
        }       /* mutex released here (RAII) */
    }
};
/* C++ trusts you to use mutexes correctly.
 * Forget the lock: silent data race at runtime. */

Syscalls

Every Bitcoin Core operation is built on syscalls:

socket(): create a network socket
connect(): connect to a peer
send(): broadcast a transaction
recv(): receive a new block
open(): open the block database
mmap(): map database files into memory (LevelDB does this for its table files)
poll(): wait for any peer to send data
futex(): fast mutex for thread coordination (Linux)

The entire peer-to-peer Bitcoin network is socket() + send() + recv(). That is it. The OS provides the sockets. Bitcoin Core provides the protocol. TCP/IP carries the binary packets. The blockchain page showed the full picture. This page shows the OS layer it runs on top of.

Memory mapping the UTXO set

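The UTXO set (tens of millions of entries, several gigabytes on disk, growing over time) lives in a LevelDB database called the chainstate. Bitcoin Core keeps hot entries in a cache in its own heap, and on 64-bit POSIX systems LevelDB maps its table files into the address space with mmap() for reads.

The OS does not load those gigabytes into RAM at once. It maps the files into the address space. As LevelDB touches a table, the OS pages the relevant blocks in on demand. Hot pages (recently used) stay in RAM. Cold pages (rarely accessed) are evicted; because they are backed by the file, they can simply be read again later rather than written to swap. The OS manages this automatically.

This is the same mmap() from the advanced section above, at work underneath the database that records every spendable coin.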

epoll and the peer network

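Bitcoin Core allows up to 125 peer connections by default: around 10 outbound, the rest inbound. Up to 125 TCP connections. Up to 125 sockets.

Reading from 125 sockets with 125 blocked threads would cost a stack and a kernel thread structure per connection, almost all of it idle. Instead Bitcoin Core hands the whole set of sockets to one readiness call (poll() on Linux, select() on other platforms; the same idea as epoll and kqueue, with an older interface): one syscall that blocks until any of the sockets has data. One thread. 125 connections. No idle threads.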

This is the "readiness multiplex" I/O model from the section above. In production. On the Bitcoin network.

The OS is not just below Bitcoin. It is what Bitcoin runs inside. Every transaction. Every block. Every peer connection. Mediated by the kernel. One syscall at a time.

What different OSes actually share

Linux, macOS, Windows, FreeBSD and the other BSDs, illumos: they look different on the surface, but the architecture is the same. Privileged kernel, unprivileged user space, syscalls as the only bridge, virtual memory, processes, schedulers, file abstractions. The interfaces differ (POSIX vs Win32 vs Mach), but the shape doesn't. Once you understand one, you can read the others.

The full stack, with the OS in place

// from electrons to your terminal prompt

Electrons gated by transistors form logic gates.
Gates compose into CPUs and memory chips.
The CPU runs fetch-decode-execute over bits in memory.
Bits in memory encode numbers, characters, and instructions.
The kernel is one program, granted privileged access by the CPU's mode bits.
Every other program is run by the kernel, in its own virtual address space, scheduled onto cores, mediated by syscalls.
Your shell, waiting at a prompt, is one of those user-space programs: a descendant of init, exec'd by a terminal emulator or login process, scheduled by the kernel, drawing characters by writing to a file descriptor that ends up in a TTY driver.
And under that driver, eventually, more electrons gating more transistors.

Where to dig in next

The OS is one of the largest topics in computing; this page is the one-screen tour. Natural deep-dives:

Operating Systems: Three Easy Pieces (Arpaci-Dusseau): free online, the kindest modern OS textbook in print.
The Linux Programming Interface (Kerrisk): the canonical reference for what every syscall does.
xv6, MIT's teaching OS: ~10k lines of C. Read it cover to cover in a week.
Writing an OS in Rust (Philipp Oppermann): build a small kernel from scratch on bare metal.

Where the OS appears in BitRoot

The operating system is not an isolated topic. It sits on top of everything below it and beneath everything above it.

01 / binary

The kernel is binary

The OS kernel is binary machine code. The kernel mode bit is a binary flag in a CPU control register. Every syscall is a binary trap instruction.

04 / logic gates

Privilege in silicon

The CPU's privilege levels are implemented in logic gates. Gates check the mode bit before executing privileged instructions. The protection is hardware-enforced.

05 / cpu

The OS owns the CPU

The scheduler controls which process runs on which core. Context switches save and restore the entire CPU register state. The fetch-decode-execute loop serves the OS's will.

06 / memory

Every page table

The OS manages every page table, every virtual address space, every allocation. The stack and heap exist because the OS created them. Virtual memory is an OS abstraction over physical RAM.

09 / pointers

File descriptors

File descriptors are OS-level handles to kernel objects: small integers that index a per-process table of pointers. Socket fd 5 might refer to a TCP connection. File fd 3 might refer to an open file. Inside the kernel, those objects are ordinary structs linked together by pointers.

0A / compile vs runtime

The runtime boundary

Syscalls are the runtime boundary. Your compiled binary contains the syscall instruction statically. The OS decides at runtime whether to grant the request.

0F / networking

The OS owns TCP

TCP/IP is implemented in the kernel. Your program calls send() and recv(). The kernel does packet assembly, routing, and checksums. Every network packet travels through the kernel.

0D / hashing

Hash tables inside

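The OS uses hashing internally. On Linux, the dentry and inode caches that turn paths into files are hash tables, and so is the table that matches an incoming TCP segment to its socket. (Page tables on x86-64 and ARM are multi-level radix trees, not hash maps, though some architectures such as PowerPC have used hashed page tables.) The OS is one of the largest users of hashing.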

14 / recursion

Stack overflow = SIGSEGV

A stack overflow is a page fault at the stack guard page. The OS detects it and sends SIGSEGV. The kernel enforces the boundary in the page table.

10 / distributed systems

Processes on a network

Every node in a distributed system is a process managed by an OS. The OS provides the sockets. The scheduler determines when each node's logic runs.

13 / blockchain

Bitcoin runs inside the OS

Bitcoin Core is a user-space process. Its peer connections (up to 125 by default) are sockets managed by the kernel. Its chainstate database reads table files through mmap(). The mempool is protected by mutexes that ultimately rest on kernel primitives such as futex.

15 / big o

Context switch is O(1)

Saving and restoring registers is a fixed number of operations: O(1). But the real cost is what follows: cold caches and TLB flushes. Big O explains the algorithm. Cache explains the reality.
