Toward a modern vision! I’d like to use this thread solely for discussing these ideas. Even as a purely theoretical exercise it could be genuinely helpful: these “summaries” may help many composers better understand the complexity of programming something like a complete DAW, and of adapting it to current times while keeping the foundation of older software. The amount of code, trial-and-error testing, and research involved is enormous.
Obviously, I had to use AI tools to put this together. I’m wondering where new DAW development like Renoise could go in a hypothetical from-scratch scenario, given current hardware and software capabilities.
By the way, I forgot a crucial approach here: “centralized MIDI control”, letting the user route a manageable number of controllers (for example, 32) and thereby control almost all of Renoise through general routing. I suppose this should be built around the MIDI 2.0 protocol.
Do the following points represent a sensible path to follow, or are there more efficient and effective alternatives for designing this type of software? The discussion focuses heavily on process queuing and related audio topics, as well as sensible use of CPU and GPU resources (how to get the most out of everything currently available).
Rethinking DAW Architecture for Modern CPUs (and Beyond)
I’ve been looking into how DAWs like Renoise or others actually use the CPU, and why—even on powerful multi-core systems—they often don’t scale as well as expected. This post summarizes both how things currently work and what a “from-scratch” modern design could look like.
1. How DAWs Use the CPU Today
Modern DAWs process audio in small buffers (e.g. 64 samples). For each buffer, they:
- Traverse the audio graph (tracks, effects, routing)
- Process each node in order
- Output the result to the audio device
The key constraint is real-time deadlines. For example:
- 64 samples at 48 kHz ≈ 1.33 ms
All processing must finish within that time, or you get dropouts.
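As a quick sanity check, the per-callback budget is just buffer size divided by sample rate. A trivial illustration (my own numbers, not engine code):

```cpp
#include <cstdio>

int main() {
    const double sample_rate = 48000.0;
    const int buffer_sizes[] = {64, 128, 256, 512};
    for (int n : buffer_sizes) {
        // The whole graph must finish within this window, every callback.
        double deadline_ms = 1000.0 * n / sample_rate;
        std::printf("%4d samples @ 48 kHz -> %.2f ms budget\n", n, deadline_ms);
    }
}
```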
Where parallelism exists
- Independent tracks can run on different threads
- Some DAWs distribute tracks across cores
Where it breaks down
- Effect chains are sequential (A → B → C must run in order)
- Feedback loops force single-thread execution
- The master bus becomes a synchronization point
So even on a 16-core CPU:
- One core may be maxed out (critical path)
- Others sit underused
2. The Core Problem: Dependencies
The real bottleneck isn’t “bad threading”—it’s dependency chains.
In any audio graph, there’s always a critical path (longest chain of dependent operations). That path determines total processing time, regardless of how many cores you have.
This is a classic parallel computing limitation (Amdahl’s Law).
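A rough worked example with made-up numbers: if the critical path accounts for 40% of the work in a buffer, the theoretical ceiling is 1 / 0.4 = 2.5x, no matter how many cores are added. A tiny sketch of that arithmetic:

```cpp
#include <cstdio>

// Amdahl's Law: speedup(N) = 1 / (serial + (1 - serial) / N)
double amdahl(double serial_fraction, int cores) {
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores);
}

int main() {
    const double serial = 0.40;              // hypothetical critical-path share
    const int core_counts[] = {2, 4, 8, 16, 64};
    for (int cores : core_counts)
        std::printf("%2d cores -> %.2fx speedup (ceiling %.2fx)\n",
                    cores, amdahl(serial, cores), 1.0 / serial);
}
```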
3. A More Modern Approach: Task-Based Audio Engine
Instead of thinking:
“one track = one thread”
We can move to:
“audio as a graph of small tasks (jobs)”
Key ideas:
a) Audio as a DAG (Directed Acyclic Graph)
Each node (synth, effect, bus) is a unit with explicit dependencies.
b) Job system scheduler
- Break processing into small tasks
- Use a thread pool with work-stealing
- Dynamically distribute load across cores
c) Wavefront parallelism
Process nodes in layers:
- All nodes without dependencies → parallel
- Then next layer → parallel
- etc.
This keeps all cores busy instead of assigning fixed threads.
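To make (a)–(c) a bit more concrete, here is a deliberately minimal, single-threaded sketch of wavefront execution over a DAG. The node layout, names, and plain loops are my simplifications; a real engine would hand each layer (or each ready job individually) to a work-stealing thread pool instead of iterating serially:

```cpp
#include <cstdio>
#include <functional>
#include <vector>

struct Node {
    const char* name;
    std::vector<int> inputs;       // indices of the nodes this one depends on
    std::function<void()> process; // the DSP job for one buffer
};

// Process one buffer in "wavefronts": every node whose inputs are already done
// forms the current layer, and all nodes within a layer are independent.
void process_buffer(std::vector<Node>& graph) {
    std::vector<bool> done(graph.size(), false);
    int finished = 0;
    while (finished < (int)graph.size()) {
        std::vector<int> layer;
        for (int i = 0; i < (int)graph.size(); ++i) {
            if (done[i]) continue;
            bool ready = true;
            for (int dep : graph[i].inputs)
                if (!done[dep]) { ready = false; break; }
            if (ready) layer.push_back(i);
        }
        // In a real engine this inner loop becomes a parallel job dispatch.
        for (int i : layer) {
            graph[i].process();
            done[i] = true;
            ++finished;
        }
    }
}

int main() {
    std::vector<Node> g = {
        {"synth A", {},     []{ std::puts("synth A"); }},
        {"synth B", {},     []{ std::puts("synth B"); }},
        {"reverb",  {1},    []{ std::puts("reverb on B"); }},
        {"master",  {0, 2}, []{ std::puts("master sum"); }},
    };
    process_buffer(g); // layers: {synth A, synth B} -> {reverb} -> {master}
}
```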
4. Real-Time vs Non-Real-Time Separation
One major limitation today is treating everything as equally “real-time”.
A better model:
Real-time domain (strict deadlines)
- Live input
- Monitoring chain
- Low-latency effects
Deferred domain (relaxed timing)
- Long reverbs
- Analysis
- Background rendering
- Non-active tracks
This allows:
- Better CPU utilization
- Larger buffers where possible
- Fewer dropouts
5. Hybrid Buffer Strategy
Instead of fixed buffer sizes:
- Critical nodes → small buffers (e.g. 64 samples)
- Non-critical nodes → large buffers (512–2048 samples)
This improves efficiency without affecting latency where it matters.
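Sections 4 and 5 can be expressed as one per-node policy. A minimal sketch; the field names, the classification rule, and the block sizes are invented purely for illustration:

```cpp
enum class Domain { RealTime, Deferred };

struct NodeInfo {
    bool feeds_live_monitoring;  // live input, or anything it is routed through
    bool currently_audible;      // muted / non-active tracks could be deferred
};

// Hypothetical classification: only audible nodes on the live/monitoring path
// need strict deadlines; everything else can be rendered ahead of time.
Domain classify(const NodeInfo& n) {
    return (n.feeds_live_monitoring && n.currently_audible) ? Domain::RealTime
                                                            : Domain::Deferred;
}

// Hybrid buffer policy: small blocks where latency matters, large blocks elsewhere.
// Deferred nodes would be pre-rendered into a FIFO that the real-time thread drains.
int block_size_for(Domain d) {
    return d == Domain::RealTime ? 64 : 1024;
}
```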
6. Smarter Scheduling (Critical Path First)
The engine can compute the “longest dependency chain” and prioritize those tasks.
Result:
- Reduced risk of glitches
- Better real-time stability
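One cheap way to implement “critical path first” is to precompute, per node, the cost of the heaviest chain still ahead of it and use that value as the scheduling priority. A sketch assuming an index-based graph in topological order and per-node cost estimates (e.g. from profiling); both assumptions are mine:

```cpp
#include <algorithm>
#include <vector>

// cost[i]    = estimated processing time of node i
// outputs[i] = nodes that consume node i's result
// Returns, per node, the cost of the heaviest dependency chain from that node to
// the end of the graph (the node itself included). Scheduling the highest value
// first keeps the critical path moving while other cores take the shorter chains.
std::vector<double> remaining_chain_cost(const std::vector<double>& cost,
                                         const std::vector<std::vector<int>>& outputs)
{
    std::vector<double> remaining(cost.size(), 0.0);
    // Assumes indices are topologically ordered (consumers have higher indices),
    // so walking backwards visits every consumer before its producers.
    for (int i = (int)cost.size() - 1; i >= 0; --i) {
        double best = 0.0;
        for (int consumer : outputs[i])
            best = std::max(best, remaining[consumer]);
        remaining[i] = best + cost[i];
    }
    return remaining;
}
```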
7. Parallel Mixing (Removing the Master Bottleneck)
Instead of summing everything sequentially, use tree-based reduction:
- (A+B), (C+D), then combine the results
This makes even the final mix parallelizable.
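A minimal sketch of pairwise (tree) summing. Here each level runs serially for clarity; in a real engine every pair within a level would be an independent job:

```cpp
#include <cstddef>
#include <vector>

using Buffer = std::vector<float>;

// One "mix" job: add src into dst sample by sample (buffers assumed equal length).
static void mix_into(Buffer& dst, const Buffer& src) {
    for (std::size_t i = 0; i < dst.size(); ++i)
        dst[i] += src[i];
}

// Tree reduction: (A+B), (C+D), ... then combine the partial sums.
// Each level halves the buffer count; pairs within a level are independent.
Buffer tree_mix(std::vector<Buffer> tracks) {
    while (tracks.size() > 1) {
        std::vector<Buffer> next;
        for (std::size_t i = 0; i + 1 < tracks.size(); i += 2) {
            mix_into(tracks[i], tracks[i + 1]);   // independent of every other pair
            next.push_back(std::move(tracks[i]));
        }
        if (tracks.size() % 2 == 1)               // odd buffer out passes through
            next.push_back(std::move(tracks.back()));
        tracks = std::move(next);
    }
    return tracks.empty() ? Buffer{} : std::move(tracks.front());
}
```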
8. SIMD and Data-Oriented Design
Beyond multithreading:
- Use SIMD (AVX, etc.) inside each task
- Optimize memory layout for cache efficiency
This improves per-core performance significantly.
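A small illustration of the data-oriented point: if each channel is stored as one contiguous, non-interleaved float array, an inner loop like this vectorizes trivially. The AVX intrinsics version below is my own example (x86-only, and a compiler will often auto-vectorize the scalar form anyway):

```cpp
#include <immintrin.h>
#include <cstddef>

// Apply a gain to a contiguous block of samples, 8 floats per AVX iteration.
void apply_gain_avx(float* samples, std::size_t count, float gain) {
    const __m256 g = _mm256_set1_ps(gain);
    std::size_t i = 0;
    for (; i + 8 <= count; i += 8) {
        __m256 x = _mm256_loadu_ps(samples + i);
        _mm256_storeu_ps(samples + i, _mm256_mul_ps(x, g));
    }
    for (; i < count; ++i)   // scalar tail for the last few samples
        samples[i] *= gain;
}
```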
9. Extending to CPU + GPU
Once you have a task-based system:
- CPU handles low-latency, sequential work
- GPU can process:
- Convolution reverbs
- Spectral processing
- Granular synthesis
- AI-based tools
The scheduler decides where each task runs.
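The “scheduler decides” part could start as nothing more than a placement heuristic. A deliberately naive sketch; the fields and the 1 ms threshold are invented for illustration:

```cpp
enum class Device { CPU, GPU };

struct TaskProfile {
    double est_cost_ms;       // measured or estimated cost of the task
    bool   latency_critical;  // sits on the live/monitoring path?
    bool   batch_friendly;    // convolution, spectral/FFT, granular, ML inference
};

// Hypothetical rule: the GPU only gets heavy, batch-friendly work that is not on
// a strict low-latency path, since transfer and kernel-launch overhead would
// otherwise eat the entire real-time budget.
Device place(const TaskProfile& t) {
    if (t.latency_critical) return Device::CPU;
    if (t.batch_friendly && t.est_cost_ms > 1.0) return Device::GPU;
    return Device::CPU;
}
```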
10. Integrating Lua (or Scripting) in a Modern DAW
A key question is how scripting systems (like Lua tools in Renoise) could fit into this architecture.
The problem
Lua (and similar scripting languages):
- Are not thread-safe by default
- Use garbage collection (non-deterministic pauses)
- Are not suitable for real-time DSP processing
So running Lua directly in the audio thread is not viable.
The solution: decouple control from processing
Instead of using Lua for DSP, use it as a control and orchestration layer:
a) Lua defines behavior, not audio processing
- Build/modify the audio graph
- Schedule tasks
- Control parameters and automation
- Generate procedural musical logic
b) Sandboxed execution
- Each tool runs in its own isolated state
- Executed outside the real-time audio thread
- Communicates via lock-free messaging or double buffering (see the queue sketch at the end of this section)
c) Job system integration
Lua can:
- Spawn background jobs (analysis, MIDI generation, etc.)
- Interact with the engine’s task system
But heavy work is executed in:
- Native code (C++/SIMD)
- Or GPU kernels
d) Strict domain separation
| Domain | Lua allowed |
|---|---|
| Audio thread | No |
| Worker threads | Indirectly (spawning background jobs, no real-time DSP) |
| UI / logic | Yes |
Result
Lua becomes a kind of DSL (domain-specific language) for:
- procedural composition
- advanced automation
- dynamic routing
While the actual audio processing remains fully parallel, deterministic, and real-time safe.
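The lock-free messaging mentioned in (b) is usually the crux of this separation. A minimal single-producer / single-consumer ring buffer sketch; the Command fields are invented for illustration, and a production queue would need more care (capacity sizing, cache-line padding, batching):

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <optional>

struct Command {                 // what the Lua/control side is allowed to send
    std::uint32_t node_id;
    std::uint32_t param_id;
    float         value;
};

// Single-producer (control/Lua thread) / single-consumer (audio thread) queue.
// No locks, no allocation after construction: safe to drain from the RT callback.
template <std::size_t N>
class CommandQueue {
public:
    bool push(const Command& c) {                 // control thread only
        const auto head = head_.load(std::memory_order_relaxed);
        const auto next = (head + 1) % N;
        if (next == tail_.load(std::memory_order_acquire))
            return false;                         // full: caller retries later
        buf_[head] = c;
        head_.store(next, std::memory_order_release);
        return true;
    }
    std::optional<Command> pop() {                // audio thread only
        const auto tail = tail_.load(std::memory_order_relaxed);
        if (tail == head_.load(std::memory_order_acquire))
            return std::nullopt;                  // empty
        Command c = buf_[tail];
        tail_.store((tail + 1) % N, std::memory_order_release);
        return c;
    }
private:
    std::array<Command, N> buf_{};
    std::atomic<std::size_t> head_{0}, tail_{0};
};
```

In this model the Lua side pushes commands from its own thread, and the audio callback drains the queue at the start of each buffer and applies the parameter or graph changes deterministically.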
11. Why This Isn’t Standard Yet
- Legacy plugin formats (VST/AU) aren’t designed for this
- Real-time audio constraints are unforgiving
- Complexity of scheduling + determinism
- Backward compatibility requirements
12. Bottom Line
Current DAWs aren’t “badly optimized”—they’re constrained by:
- Sequential DSP chains
- Real-time deadlines
- Old architectural assumptions
But a from-scratch design could:
- Use a task-based graph engine
- Fully exploit multi-core CPUs
- Separate real-time and deferred processing
- Integrate scripting (like Lua) safely as a control layer
- Potentially leverage GPU acceleration
This wouldn’t be an incremental improvement—it would be a paradigm shift.
Curious to hear thoughts—especially from people working on DSP, engines, or plugin systems. Where do you see the biggest practical blockers?
Discussion Questions
- Do you think current DAW architectures are fundamentally limited by real-time constraints, or just by legacy design decisions?
- How much parallelism do you actually see in practice in tools like Renoise? Does it match what modern CPUs should be capable of?
- Would a fully task-based audio engine (instead of track-based threading) be viable in a real-world DAW?
- Where do you think the biggest bottleneck is today: CPU scheduling, plugin design, or audio graph structure?
- Could existing plugin standards (VST/AU) adapt to a task-based / heterogeneous system, or would a completely new format be required?
- How would you handle determinism and reproducibility in a highly parallel audio engine?
- Do you see a practical way to separate real-time and deferred processing without breaking workflow expectations?
- Would users accept a system that dynamically changes buffer sizes internally for efficiency?
- How could a tracker-style workflow (like Renoise) evolve within a graph-based engine without losing its precision and speed?
- What role should scripting (e.g. Lua) play in a next-generation DAW: UI only, orchestration, or something deeper?
- Is there any realistic way to safely integrate GPU processing into real-time audio, or is it inherently better suited for offline tasks?
- What lessons could DAWs borrow from game engines (job systems, schedulers, data-oriented design)?
- If you were to design a DAW from scratch today, what would you not keep from existing designs?
- And most importantly: do you think the complexity of such a system is justified by the potential performance gains?
Feasible Roadmap for a Modern Audio Engine (Hypothetical Small Team)
This outlines a realistic development plan for a hypothetical small team (1 lead developer + 2 part-time contributors) aiming to build a next-generation audio engine based on a task-based, graph-driven architecture rather than a traditional DAW.
The goal is not to build a commercial DAW, but a research-grade prototype audio engine that explores modern CPU parallelism, scheduling, and scripting integration.
Phase 0 — Core Design Definition (2–4 weeks)
The focus here is to strictly define scope and prevent overengineering.
- Define the internal data model: audio processing as a Directed Acyclic Graph (DAG)
- Define the execution unit: the “audio job” (a small, independent processing task)
- Define the execution model: thread pool + work-stealing scheduler
- Choose the core technology: C++ or Rust (C++ is more practical for audio ecosystem integration)
- Explicitly exclude at this stage:
- full VST host support
- advanced UI
- plugin ecosystem compatibility
Outcome:
A frozen technical specification and architecture blueprint
Phase 1 — Minimal Audio Engine (2–3 months)
Objective:
Achieve real-time audio playback with a minimal graph system.
- Real-time audio callback (ASIO / CoreAudio / ALSA)
- Basic graph engine:
- oscillator/synth node
- gain node
- mixer node
- Fixed buffer processing (64–128 samples)
- Single-thread or minimal parallelism initially
Outcome:
A functional “hello world” audio engine
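For a sense of how small the Phase 1 deliverable can be, here is a toy sketch of the three node types rendering one block offline (no real audio callback yet; constants and names are mine):

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdio>
#include <vector>

constexpr int    kBlock = 128;
constexpr double kRate  = 48000.0;
constexpr double kTwoPi = 6.283185307179586;

// Oscillator node: writes one block of a sine wave, carrying phase across calls.
void osc(std::vector<float>& out, double freq, double& phase) {
    for (float& s : out) {
        s = (float)std::sin(phase);
        phase += kTwoPi * freq / kRate;
    }
}
// Gain node: scales a block in place.
void gain(std::vector<float>& buf, float g) { for (float& s : buf) s *= g; }
// Mixer node: sums a block into the destination.
void mix(std::vector<float>& dst, const std::vector<float>& src) {
    for (std::size_t i = 0; i < dst.size(); ++i) dst[i] += src[i];
}

int main() {
    std::vector<float> a(kBlock), b(kBlock), master(kBlock, 0.0f);
    double pa = 0.0, pb = 0.0;
    osc(a, 440.0, pa);  gain(a, 0.5f);
    osc(b, 220.0, pb);  gain(b, 0.5f);
    mix(master, a);
    mix(master, b);
    std::printf("peak of the mixed block: %f\n",
                *std::max_element(master.begin(), master.end()));
}
```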
Phase 2 — Parallel Task Scheduler (3–5 months)
This is the core architectural innovation phase.
- Convert the audio graph into a DAG of jobs
- Implement:
- thread pool system
- work-stealing scheduler
- dependency resolution system
- Introduce critical-path prioritization
- Add profiling tools for real-world performance measurement
Outcome:
Fully multi-core capable audio processing engine
Phase 3 — Scripting Integration (Lua or equivalent) (2–3 months)
Introduce a scripting layer for orchestration and control.
- Sandboxed scripting environment per instance
- Message-based communication (no shared memory in the real-time domain)
- API capabilities:
- create/modify audio nodes
- control graph structure
- schedule background tasks
- Strict limitation: no DSP processing inside the scripting layer
Outcome:
A flexible orchestration layer for dynamic audio graph control
Phase 4 — Minimal UI Prototype (2–4 months)
Not a full DAW interface, only a functional research UI:
- Graph visualization (nodes + connections)
- Simple transport controls (play/stop)
- Basic parameter control system
- Optional simplified tracker-style view
Outcome:
Usable interactive prototype for testing engine behavior
Phase 5 — Advanced Optimization Layer (3–6 months)
Focus on performance engineering:
- SIMD optimization (AVX2/AVX-512)
- Lock-free data structures tuning
- Cache-efficient buffer management
- Hybrid buffer size strategy (adaptive processing blocks)
- Advanced latency profiling tools
Outcome:
High-performance, production-grade audio core
Phase 6 (Optional) — GPU / Advanced DSP Integration (indefinite)
Only after CPU engine stability:
- GPU-based convolution processing
- Spectral / batch DSP operations
- Offline rendering pipeline
- Hybrid CPU/GPU task scheduling
Outcome:
Experimental heterogeneous compute audio engine
Realistic Time Estimates
Minimal viable research engine (MVP)
6–9 months
- basic audio graph
- simple scheduler
- scripting integration
- minimal UI
Functional research DAW prototype
12–18 months
- stable multi-core scheduler
- Lua-based orchestration
- usable UI prototype
- profiling and optimization layer
Advanced experimental system
18–30+ months
Only achievable if scope expands to include:
- plugin ecosystem support
- full UI/UX polish
- external compatibility layers
Practical Constraints
Key challenges in such a system are not purely computational:
- Plugin ecosystem complexity (e.g. VST/AU standards)
- Real-time determinism requirements
- Multi-thread debugging complexity
- UI development often dominating total workload
- Audio edge cases in real-time scheduling
Conclusion
This type of system is not intended to compete with established DAWs such as Ableton Live or FL Studio.
Instead, it represents a research-oriented architecture exploration focused on:
- task-based audio computation
- full multi-core CPU utilization
- separation of real-time and deferred processing
- scripting as a control layer rather than DSP execution
- potential future hybrid CPU/GPU audio systems
The most realistic outcome is not a commercial DAW, but a proof-of-concept engine that could influence next-generation audio software design.

