Building VoiceKick: a modern desktop application
Graphical user interface combining voice stream and Whisper
When I started working on VoiceKick, one of the first decisions I faced was whether to build a terminal-based application (TUI) or a modern desktop application with a graphical user interface (GUI). Terminal interfaces are lightweight and often preferred by developers, but I realized a desktop application would be more convenient and accessible for a broader audience, especially non-technical users.
After settling on a GUI, the next step was choosing the right framework to bring VoiceKick to life, which meant balancing performance, ease of development, and user experience. Below is a short video of the application; the source code is in voicekick-dioxus.
Exploring the framework landscape
I researched several frameworks, and a few stood out as the most promising options for building modern desktop applications in Rust:
Tauri: lightweight and fast, using web technologies like HTML, CSS, and JavaScript for the frontend while leveraging Rust for the backend. It seemed like a natural choice for small, fast apps.
Dioxus: inspired by React, Dioxus offers a declarative approach to building GUIs in Rust. It felt more aligned with my preference for modern, component-driven designs.
Iced: a robust framework for building native GUIs, offering a clean API but somewhat limited in styling and modern UI capabilities.
Slint: designed for building sleek, custom user interfaces, though its focus seemed more niche, geared toward applications with intricate visuals.
First iteration: Tauri
Initially, I chose Tauri, drawn by its promise of lightweight desktop apps with a strong Rust backend. The idea of building GUIs with familiar web technologies was appealing. I quickly set up the first version of VoiceKick, which supported basic features such as selecting input devices and displaying waveforms.
However, I ran into issues almost immediately. While Tauri works well for many use cases, debugging was a frustrating experience. After half a day of troubleshooting with little feedback and no actionable insights, I decided it wasn’t worth the time investment, at least not for this project.
Switching to Dioxus
After leaving Tauri behind, I turned to Dioxus, which had some compelling advantages:
Extensive documentation: the documentation is comprehensive and developer-friendly, making it easier to troubleshoot and experiment.
Familiar concepts: it borrows from React’s declarative component model, which I found intuitive and efficient for building UIs.
Flexibility and performance: Dioxus’s desktop renderer builds on Tauri’s own webview and windowing libraries (wry and tao) while focusing on performance and a more ergonomic developer experience.
With Dioxus, I quickly recreated the basic structure of VoiceKick and even added features without the earlier debugging headaches. The framework’s component-based architecture allowed me to iterate faster and stay organized.
Building VoiceKick’s core features
For the first iteration of the VoiceKick desktop application, I implemented two main pages:
Page 1: Voice configuration and waveforms
Input device selection: users can choose the audio input device (e.g., a specific microphone).
Voice detection threshold: a slider allows fine-tuning the voice detection threshold, balancing sensitivity and noise filtering.
Waveform visualization: the page displays real-time audio waveforms, giving users instant feedback on their input. This visualization is crucial for understanding how the app interprets sound.
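The post doesn’t say how VoiceKick decides that a block of audio contains speech or how it turns samples into a waveform. As a minimal sketch, assuming samples arrive as f32 values in [-1.0, 1.0] and assuming a simple energy-based approach (the RMS of each block compared against the slider value), the two ideas could look like this. The functions rms, is_voice, and waveform_peaks are hypothetical names, not VoiceKick’s actual code.

```rust
/// Root-mean-square energy of a block of samples (assumed to be in [-1.0, 1.0]).
fn rms(samples: &[f32]) -> f32 {
    if samples.is_empty() {
        return 0.0;
    }
    let sum_sq: f32 = samples.iter().map(|s| s * s).sum();
    (sum_sq / samples.len() as f32).sqrt()
}

/// Hypothetical voice check: a block counts as speech when its RMS
/// energy exceeds the threshold chosen with the slider.
fn is_voice(samples: &[f32], threshold: f32) -> bool {
    rms(samples) > threshold
}

/// Reduce a block of samples to up to `buckets` peak values, so the
/// waveform can be drawn with a fixed number of bars regardless of sample rate.
fn waveform_peaks(samples: &[f32], buckets: usize) -> Vec<f32> {
    if buckets == 0 || samples.is_empty() {
        return Vec::new();
    }
    let chunk = (samples.len() + buckets - 1) / buckets; // ceiling division
    samples
        .chunks(chunk)
        // Peak = largest absolute amplitude within the chunk.
        .map(|c| c.iter().fold(0.0f32, |m, s| m.max(s.abs())))
        .collect()
}

fn main() {
    // A quiet, noise-like block and a louder, speech-like block.
    let noise = vec![0.01f32, -0.02, 0.015, -0.01];
    let speech = vec![0.4f32, -0.5, 0.3, -0.45];
    let threshold = 0.1;

    println!("noise is voice: {}", is_voice(&noise, threshold));
    println!("speech is voice: {}", is_voice(&speech, threshold));
    println!("peaks: {:?}", waveform_peaks(&speech, 2));
}
```

Raising the threshold makes the check ignore more background noise at the cost of missing quiet speech, which is the trade-off the slider exposes. Real voice activity detectors are often more elaborate than a single energy comparison.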
Page 2: Whisper configuration
Model selection: users can select the Whisper model to use for transcription. The default is TinyEn, Whisper’s smallest English-only model, which trades some accuracy for fast, lightweight transcription.
Language settings: users can specify the spoken language, which improves transcription accuracy in multilingual setups.
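The post names TinyEn as the default but doesn’t show how these settings are stored. A minimal sketch, assuming a plain Rust enum and struct: WhisperModel, WhisperConfig, and effective_language are hypothetical names, and the variant list is an illustrative subset of Whisper’s model sizes. Because the .en models are English-only, the language setting only matters for multilingual models.

```rust
/// Hypothetical subset of Whisper model sizes; the `*En` variants are English-only.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
enum WhisperModel {
    #[default]
    TinyEn,
    Tiny,
    BaseEn,
    Base,
    SmallEn,
    Small,
}

impl WhisperModel {
    fn is_english_only(self) -> bool {
        matches!(
            self,
            WhisperModel::TinyEn | WhisperModel::BaseEn | WhisperModel::SmallEn
        )
    }
}

/// Hypothetical settings backing the Whisper configuration page.
#[derive(Debug, Clone, Default)]
struct WhisperConfig {
    model: WhisperModel,
    /// ISO 639-1 code such as "en" or "de"; None means "let the model detect it".
    language: Option<String>,
}

impl WhisperConfig {
    /// Language handed to the decoder: English-only models always get "en",
    /// multilingual models get whatever the user picked (or None).
    fn effective_language(&self) -> Option<&str> {
        if self.model.is_english_only() {
            Some("en")
        } else {
            self.language.as_deref()
        }
    }
}

fn main() {
    let default_cfg = WhisperConfig::default();
    println!("{:?} -> {:?}", default_cfg.model, default_cfg.effective_language());

    let german = WhisperConfig {
        model: WhisperModel::Base,
        language: Some("de".to_string()),
    };
    println!("{:?} -> {:?}", german.model, german.effective_language());
}
```

Deriving Default with #[default] on TinyEn keeps the default model in one place, so the UI and the transcription backend cannot disagree about it.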
Both pages keep the controls minimal and focused, so the app stays easy to use in real-world scenarios.
Summary
Switching frameworks early in development was an easy decision and ultimately the right one. Here are a few takeaways from the process:
1. Start simple: the first version contains plenty of unwrap and expect calls; it just needs to work. This let me focus on core features before worrying about polish.
2. Prioritize DevEx: a framework that simplifies debugging and iteration saves significant time and frustration.
3. Flexibility matters: choosing a flexible framework like Dioxus made it easier to adapt as the project evolved.
4. Keep the user in mind: choosing a GUI over a TUI may have meant more work up front, but it made VoiceKick accessible to a wider audience.
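To make the “start simple” takeaway concrete, here is a hypothetical example (the function names and the threshold setting are illustrative, not taken from VoiceKick) of how a prototype’s expect call might later become proper error propagation with Result and the ? operator. Both functions parse the same input; only the error handling differs.

```rust
use std::num::ParseFloatError;

/// Prototype style: panic with a message if the value is malformed.
fn parse_threshold_prototype(raw: &str) -> f32 {
    raw.trim().parse().expect("threshold must be a number")
}

/// Later style: hand the error back to the caller so the UI can report it.
fn parse_threshold(raw: &str) -> Result<f32, ParseFloatError> {
    let value: f32 = raw.trim().parse()?;
    Ok(value)
}

fn main() {
    println!("{}", parse_threshold_prototype("0.25"));

    match parse_threshold("loud") {
        Ok(v) => println!("threshold = {v}"),
        Err(e) => println!("invalid threshold: {e}"),
    }
}
```

The prototype version is fine while the only user is the developer; once other people run the app, returning the error lets the GUI show a message instead of crashing.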