Houses on a
numbered street.
An array is the simplest data structure there is: a row of identical boxes, packed next to each other in memory, addressed by number. From that single shape, every string, every image, every audio buffer, every hash table, every CPU's branch predictor table is built. Once you see how arrays really work, half of computer science gets cheaper to think about.
Arrays as a numbered street
Imagine a row of identical houses on one side of a street. Each house is the same size. Each has a number on its door: 0, 1, 2, 3, and so on. If a friend tells you "I'm in house 4", you don't have to walk past every house counting; you just look at the door numbers and go straight there. That's an array.
An array is:
- A sequence of values, all of the same type (all integers, or all bytes, or all whatever).
- Each value sits in a numbered slot called an index.
- The slots are side by side in memory, in order.
That's the whole idea. Every other property of arrays follows from those three.
A picture of an array in memory
Five integers. Five slots. Each slot is 4 bytes wide because an integer is 4 bytes. The whole array takes 20 contiguous bytes in memory.
arr[0], not arr[1]. The last element of a length-N array is arr[N-1]. This catches every programmer at least once, and pretty much always on the day you most want to ship.Declare, access, iterate
fn main() {
// A fixed-size array of five i32s. Type written [T; N].
let scores: [i32; 5] = [91, 84, 78, 66, 100];
// Indexing starts at 0. Out-of-bounds panics, never reads garbage.
println!("first: {}", scores[0]); // 91
println!("last: {}", scores[4]); // 100
// Iterating: idiomatic Rust uses iterators, not index loops.
for (i, s) in scores.iter().enumerate() {
println!("scores[{i}] = {s}");
}
// The whole array is 5 * 4 = 20 bytes on the stack.
// No heap, no allocator, no pointer chase.
println!("size in memory: {} bytes", std::mem::size_of_val(&scores));
}#include <stdio.h>
int main(void) {
// Same array, declared in C.
int scores[5] = {91, 84, 78, 66, 100};
// Indexing starts at 0. C does NOT bounds-check; scores[99] is
// legal at compile time and undefined at runtime.
printf("first: %d\n", scores[0]); // 91
printf("last: %d\n", scores[4]); // 100
for (int i = 0; i < 5; i++)
printf("scores[%d] = %d\n", i, scores[i]);
printf("size in memory: %zu bytes\n", sizeof scores); // 20
return 0;
}scores[99] compiles fine in both languages and produces undefined behaviour in C while panicking cleanly in Rust. That gap is the rest of this page.How indexing actually works
Why is arr[5,000,000] just as fast as arr[0]? Because the CPU doesn't walk through the array. It computes the address of the slot directly:
arr[i] = *(arr + i * sizeof(element))
For an array of 4-byte ints starting at 0x1000:
arr[0] = load from 0x1000 + 0 * 4 = 0x1000
arr[1] = load from 0x1000 + 1 * 4 = 0x1004
arr[2] = load from 0x1000 + 2 * 4 = 0x1008
arr[i] = load from 0x1000 + i * 4
One multiply, one add, one load. Constant time, no matter how big i is.
That's the O(1) in "arrays have O(1) random access." The cost doesn't depend on the size of the array, only on the size of one element. Every other data structure (linked list, tree, hash table) tries to either match or hide this property.
Indexing is pointer arithmetic in disguise
In C, the language makes this completely explicit. arr[i] is a literal synonym for *(arr + i). The compiler will accept either one. It will also accept the cursed-but-legal i[arr], because addition is commutative.
fn main() {
let arr: [i32; 4] = [10, 20, 30, 40];
// Rust's [] is bounds-checked. The same machine code as C
// (load from base + i * 4) plus a runtime "is i < len" check.
// Out-of-bounds panics; never reads adjacent memory.
println!("{}", arr[2]); // 30
// Same 4-byte stride between consecutive elements.
for i in 0..4 {
println!("&arr[{}] = {:p} (val {})", i, &arr[i], arr[i]);
}
// Slice the middle two elements. A slice is (ptr, len), and
// gives you the same fast indexing with the same safety check.
let mid: &[i32] = &arr[1..3];
println!("mid = {:?}", mid); // [20, 30]
// The line below would panic at runtime:
// let bad = arr[5];
// error: index out of bounds: the len is 4 but the index is 5
}#include <stdio.h>
int main(void) {
int arr[4] = {10, 20, 30, 40};
// arr[i] is EXACTLY *(arr + i). The C language defines them equal.
// The compiler turns the index into address arithmetic:
// 1. multiply i by sizeof(int) = 4
// 2. add to the base address of arr
// 3. load 4 bytes from that address
printf("%d\n", arr[2]); // 30
printf("%d\n", *(arr + 2)); // 30 -- same machine code
printf("%d\n", 2[arr]); // 30 -- also legal, identical
// Print each address. They're spaced exactly sizeof(int) apart
// because the elements live contiguously in memory.
for (int i = 0; i < 4; i++)
printf("&arr[%d] = %p (val %d)\n",
i, (void*)&arr[i], arr[i]);
return 0;
}C indexes blindly; Rust checks every time
The machine-level operation is the same: multiply, add, load. The difference is what happens before the load.
| language | what arr[i] does | if i is out of bounds |
|---|---|---|
| C | Multiply, add, load. No check. | Undefined behaviour. Could read adjacent memory, crash, or look fine and corrupt silently. |
| Rust (safe) | Check that i < len, then multiply, add, load. | Panic with a clear message: index out of bounds: the len is N but the index is i. |
Rust (unsafe, get_unchecked) | Multiply, add, load. No check. | Same UB as C. You opted in. |
That bounds check is a single compare-and-branch on a register that's almost always in cache. It costs essentially zero in tight loops because the branch predictor learns it always succeeds. The "cost" of Rust's safety, for arrays, is close to non-existent on real workloads.
Stack arrays vs heap arrays
Two shapes, same indexing experience, different mechanics.
- A stack array is the whole array, sitting in one block on the stack. Compile-time-known length. Freed automatically when the function returns. Zero allocator involvement.
- A heap array (Rust's
Vec, C'smalloc'd buffer) is a small fixed-size header on the stack pointing at a variable-size body on the heap. You can grow it. You pay for the allocator. You have to release it (manually in C, automatically when the owner drops in Rust).
fn main() {
// Vec<T> is the canonical growable array in Rust. Layout: a
// (ptr, len, cap) header on the stack, pointing at a heap buffer.
let mut arr: Vec<i32> = Vec::new();
// push() is amortized O(1). When the heap buffer is full,
// Vec doubles its capacity, memcpy's the old data over, and
// frees the old buffer. The whole machinery is hidden.
for i in 0..10 {
arr.push(i * i);
}
println!("arr[3] = {}", arr[3]); // 9
println!("len = {}, cap = {}", arr.len(), arr.capacity());
// A slice is the safe pointer for an array region. It's (ptr, len),
// valid only while the source array is alive.
let middle: &[i32] = &arr[3..7];
println!("{:?}", middle); // [9, 16, 25, 36]
}#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void) {
// Heap-allocated array. The size is decided at runtime.
int n = 10;
int *arr = malloc(n * sizeof(int));
for (int i = 0; i < n; i++) arr[i] = i * i;
// Indexing syntax is identical to a stack array. Underneath,
// arr is just a pointer to the first byte of a contiguous block.
printf("%d\n", arr[3]); // 9
// Grow it. realloc may move the buffer to a bigger free region,
// copy the old contents over, and free the old one. The pointer
// can change, so we reassign.
n = 20;
arr = realloc(arr, n * sizeof(int));
for (int i = 10; i < 20; i++) arr[i] = i * i;
printf("%d\n", arr[15]); // 225
free(arr);
return 0;
}Cache lines, layouts, and why arrays are everywhere
Cache locality: the real reason arrays win
The CPU page covered the cache hierarchy: registers, L1, L2, L3, RAM. Each level is bigger and slower than the one above. When the CPU fetches memory from RAM, it doesn't fetch one byte. It fetches a cache line (typically 64 bytes) and parks the whole line in L1.
This is why arrays are obscenely fast to walk in order. The first read pulls in a 64-byte line containing the next 16 ints (or 8 doubles, or whatever). The next 15 reads hit cache. Zero memory stalls. The CPU's prefetcher even notices the pattern and starts pulling in the next line before you ask for it.
Now consider a linked list. Each node lives at a different address. Following next is a pointer chase. The CPU has no idea where the next node is until it reads the current one's next field. Cache prefetch can't help. Every step is a cache miss.
Same algorithmic complexity (O(n) to walk). Real-world: the array beats the linked list by 5 to 50 times. The difference is locality, not big-O.
Strings are arrays of bytes
A string is not a special type. It's an array of bytes that happens to contain text. The ASCII page showed how each byte maps to a character; the string is just a sequence of those bytes laid out contiguously.
- C strings: a
char *pointing at a null-terminated array of bytes."hello"in source is a 6-byte array (5 chars + a zero) inrodata. - Rust
&str: a (pointer, length) pair. No null terminator. Guaranteed valid UTF-8 by the type system. - Rust
String: a heap-owned, growable byte array. Effectively aVec<u8>with a UTF-8 guarantee.
The reason Rust's string slicing rules feel finicky is that UTF-8 characters aren't all the same size, so "the third character" and "the third byte" are different questions. The array shape doesn't change; only the interpretation of its elements does.
Multidimensional arrays
A 2D array, conceptually a grid, is in memory just a 1D array with a layout convention.
grid[2][3]: 2 rows × 3 cols, same data, two layouts
This isn't pedantic; it affects performance dramatically. Walking a row-major matrix by rows hits cache nicely; walking it by columns thrashes cache. Matrix multiplication libraries (BLAS, cuBLAS) spend most of their cleverness on traversing in the right order for their target architecture.
Arrays as the universal foundation
Where to dig in next
Arrays look simple but have rabbit holes a kilometre deep:
- SIMD (Single Instruction, Multiple Data). Special CPU instructions that operate on 4, 8, or 16 array elements at once. The
std::archmodule in Rust and intrinsics in C let you write hand-tuned hot loops. - Cache-oblivious algorithms. Sorting and matrix algorithms that beat cache-aware ones by recursive blocking, often without knowing the cache size.
- The Folly & Abseil libraries. Facebook's and Google's drop-in replacements for the C++ standard containers, with much better cache behaviour and growth strategies. The lessons apply to Rust and C too.
- Apache Arrow. A columnar in-memory format for analytics that takes "arrays for everything" to its logical conclusion across language boundaries.
- Data-Oriented Design (Mike Acton's talks). The game-engine philosophy of organising your data as arrays of fields, not arrays of structs, for maximum cache friendliness.