
Spawn & Await

Wyn uses stackful coroutines for concurrency — each spawn creates a lightweight green thread with its own 8MB virtual stack (only ~0.3KB physical). Coroutines are scheduled across OS threads using a work-stealing scheduler, similar to Go goroutines.

Basic Spawn

wyn
fn compute(n: int) -> int {
    var sum = 0
    for i in 0..n { sum = sum + i }
    return sum
}

fn main() -> int {
    var f1 = spawn compute(100000)
    var f2 = spawn compute(200000)

    var total = await f1 + await f2
    println("total = ${total}")
    return 0
}

spawn starts a function in a new coroutine and returns a future. await suspends the current coroutine until the result is ready — it doesn't block the OS thread.

How It Works

spawn f(x) → mmap 8MB virtual stack (only ~0.3KB physical) → init coroutine → enqueue to scheduler
await f     → if not ready, park coroutine → scheduler runs other work
              when result ready, wake coroutine
  • Virtual memory stacks: coroutine stacks are mmap'd — the OS reserves 8MB of address space but only commits physical pages as the stack grows (~0.3KB for simple coroutines, growing automatically up to the 8MB limit)
  • Cooperative scheduling: coroutines yield at await, channel operations, and I/O calls
  • Work-stealing: idle workers steal tasks from busy workers' queues
  • I/O integration: blocking I/O (Http.accept, Socket.recv) automatically yields the coroutine and uses kqueue/epoll to resume when ready
  • Nested spawn: spawning from inside a coroutine works correctly — inner spawns create new coroutines on the scheduler
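A minimal sketch of the nested-spawn point, using only the constructs shown above (`inner` and `outer` are illustrative names, not library functions):

wyn
fn inner(n: int) -> int {
    return n * 2
}

fn outer(n: int) -> int {
    // spawning from inside a coroutine creates a new coroutine on the scheduler
    var f = spawn inner(n)
    return await f + 1
}

fn main() -> int {
    var f = spawn outer(10)
    println("result = ${await f}")  // 21
    return 0
}

The `await` inside `outer` parks only that coroutine; the worker thread moves on to other tasks until `inner` completes.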

Multiple Workers

wyn
fn work(id: int) -> int {
    return id * id
}

var futures = []
for i in 0..10 {
    futures.push(spawn work(i))
}

var total = 0
for i in 0..10 {
    total = total + await futures[i]
}
println("sum of squares = ${total}")

Shared State

wyn
fn worker_add(shared: int, n: int) -> int {
    Task.add(shared, n)
    return 0
}

var shared = Task.value(0)
var f1 = spawn worker_add(shared, 10)
var f2 = spawn worker_add(shared, 20)
await f1
await f2
println(Task.get(shared).to_string())  // 30
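The same `Task` counter combines with the futures-list pattern from the previous section — a hedged sketch assuming `Task.add`/`Task.get` behave as shown above (`bump` is an illustrative name):

wyn
fn bump(counter: int, n: int) -> int {
    Task.add(counter, n)
    return 0
}

var counter = Task.value(0)
var futures = []
for i in 0..5 {
    futures.push(spawn bump(counter, i))
}
for i in 0..5 {
    await futures[i]
}
println(Task.get(counter).to_string())  // 0+1+2+3+4 = 10

Because the workers run concurrently, updates must go through `Task.add` rather than a plain `var`, which would race.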

Performance

Benchmarked on Apple M4. All numbers are real — run the benchmarks yourself in wyn/tests/test_coroutine_bench.wyn and wyn/benchmarks/.

Sequential spawn+await (10K ops)

| Language | Time | Per op |
|---|---|---|
| Go 1.24 | 10ms | 1μs |
| Python 3.12 (asyncio) | 2ms | 0.2μs* |
| Rust (tokio) | ~5ms | ~0.5μs |
| Wyn | 243ms | 24μs |

*Python's async/await is cooperative with no OS thread involvement — fast to create but single-threaded.

Concurrent spawn capacity (create + await all)

| Concurrent | Wyn | Go | Python | Rust (tokio) |
|---|---|---|---|---|
| 1,000 | 13ms | 0.4ms | 6ms | ~1ms |
| 10,000 | 112ms | 3ms | 62ms | ~10ms |
| 50,000 | 559ms | 11ms | 436ms | ~50ms |
| 100,000 | 1,088ms | 23ms | 1,196ms | ~100ms |

Memory efficiency (100K concurrent coroutines)

| Language | Physical RSS |
|---|---|
| Wyn | 11 MB |
| Go | ~50 MB |
| Python | ~200 MB |
| Rust (tokio) | ~30 MB |

Stack depth (recursion inside spawn)

| Language | Max depth | Stack model |
|---|---|---|
| Go | unlimited | copying (grows to 1GB) |
| Rust | 8MB default | OS thread stack |
| Wyn | 500K+ (8MB) | mmap virtual (pages on demand) |
| Python | ~1000 | interpreter limit |
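The depth numbers above can be probed with a recursive function run inside a spawn — a hedged sketch that assumes Wyn has an `if` statement with the same brace style as `for` (not demonstrated elsewhere on this page):

wyn
// hypothetical: assumes `if cond { ... }` syntax
fn depth(n: int) -> int {
    if n == 0 { return 0 }
    return 1 + depth(n - 1)
}

fn main() -> int {
    // ~500K frames fit in the 8MB mmap stack; only touched pages are committed
    var f = spawn depth(500000)
    println("depth = ${await f}")
    return 0
}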

Summary

Go and Rust are significantly faster at coroutine creation — they have years of runtime optimization. Wyn wins on memory efficiency (mmap pages-on-demand) and is competitive with Python on throughput. For typical workloads (web servers, concurrent I/O, CLI tools), all are fast enough. Wyn's advantage is simplicity: spawn f(x) with no imports, no runtime setup, no async coloring.

Architecture

.wyn source → spawn f(x)

coroutine pool (pre-allocated 16KB stacks, reused via spinlock pool)

M:N scheduler (N OS threads, M coroutines)
  ├── per-processor local deque (LIFO, cache-friendly)
  ├── global queue (lock-free stack)
  ├── work-stealing (batch steal up to 32 tasks)
  └── adaptive parking (spin → yield → sleep)

kqueue/epoll I/O loop (non-blocking I/O with coroutine parking)

Known Limitations

  • Coroutine stack is 8MB virtual (mmap) — the OS only commits physical pages as used. Recursion up to 500K+ depth works. Unlike Go, stacks don't grow beyond 8MB (Go can grow to 1GB).
  • Spawn+await is ~24x slower than Go goroutines for sequential create+await (243ms vs 10ms over 10K ops, per the table above). For concurrent workloads the gap is smaller.
  • Fire-and-forget spawns (spawn f(x) without await) are faster than awaited spawns.
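The fire-and-forget point above, as a short sketch (`log_hit` is an illustrative name; the returned future is simply never awaited):

wyn
fn log_hit(id: int) -> int {
    println("hit ${id}")
    return 0
}

// fire-and-forget: the future is dropped, so there is no await-side bookkeeping
spawn log_hit(1)

This is useful for background work whose result you don't need, but note the program gives no completion guarantee for unawaited spawns still running at exit.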


MIT License