What does it look like when a linguist, someone who thinks professionally about rhythm, stress, and information structure, sits down with a TTS tool? We asked Isabella Massardo to walk us through her experience with Voiseed Studio.
The workflow: from edited article to finished audio in 30 minutes or less
Isabella uses Voiseed Studio to produce audio versions of articles published on the GALA Resource Center, a well-known platform focused on localization and globalization topics. Examples include Connecting Communities and Technology and Inclusivity in Tech for Asian Languages.
"Accessibility is part of how we think about reach," she explains. "Offering a high-quality voice version means readers can engage with our articles while commuting, walking, or simply giving their eyes a break, and it makes the content available to people who prefer or need audio."
In her process, Voiseed Studio comes in at the end: the article is already edited and approved before she opens the tool.
"I spent the early weeks experimenting to develop a repeatable process; now I can take an article from text to finished audio file in 10 to 30 minutes, depending on length and complexity. That's a small investment for the quality and accessibility it adds."
What makes the difference: control, multilingual handling, and timing
When it comes to features, Isabella points to a few that genuinely improved the quality of her output.
She settled on the Cassian voice after testing several options: "It balances clarity with a natural cadence." But the most impactful capability has been multilingual pronunciation control.
"Our articles are full of names, acronyms, and terms from different languages, and being able to fine-tune pronunciation makes a real difference. 'GALA' is a good example: we ensure it's pronounced /ˈɡaː.la/ rather than left to chance. The same applies to people's names and language-specific terminology. Once I set these in the project glossary, they're saved, so I'm not re-doing the same work on every article."
She also pays close attention to timing: "I listen to each segment and adjust pauses to improve flow, which makes the audio far easier to follow."
She draws a sharp contrast with the tools she used before. "Compared to the TTS tools I used in the past, the difference is striking. Older solutions offered limited voice options and produced output that was technically impressive but unmistakably robotic. Voiseed's personalization translates into a listening experience that feels professional and natural."
The linguist's eye: preparing text for ears, not eyes
This is where Isabella's background really shapes her approach. Her text preparation has three distinct layers, all of which she completes before generating a single second of audio.
"Moving from page logic to audio logic is, for me, the core of the work. Written and spoken language follow different rules, and a linguist's eye helps spot where a sentence that works visually will trip up the ear."
The first layer is standard editing: spelling, punctuation, clarity. The second is formatting: removing hard returns, standardising spacing, adding full stops to titles so the system handles them correctly.
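The formatting layer is mechanical enough to script. The sketch below is an illustration only, not Isabella's actual tooling: the function name `normalise_for_tts` and the idea of passing the known titles in as a set are assumptions made for the example.

```python
import re


def normalise_for_tts(text: str, titles: set[str]) -> str:
    """Join hard-wrapped lines, standardise spacing, and close titles with a full stop."""
    # Blank lines separate paragraphs; single newlines are treated as hard returns.
    paragraphs = re.split(r"\n\s*\n", text.strip())
    cleaned = []
    for para in paragraphs:
        # Remove hard returns inside a paragraph.
        line = " ".join(part.strip() for part in para.splitlines())
        # Collapse runs of spaces and tabs into a single space.
        line = re.sub(r"[ \t]+", " ", line).strip()
        # Titles get a full stop so the engine treats them as complete units.
        if line in titles and not line.endswith((".", "!", "?")):
            line += "."
        cleaned.append(line)
    return "\n\n".join(cleaned)


if __name__ == "__main__":
    raw = "The workflow\n\nIsabella uses  the tool\nat the end.\n"
    print(normalise_for_tts(raw, {"The workflow"}))
```

Passing titles explicitly, rather than guessing them from line length or capitalisation, keeps the script from adding full stops to lines that only look like headings.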
The third layer is where things get interesting. Isabella built a dedicated Copilot prompt specifically for audio adaptation: "It restructures sentences for listening comprehension (shorter, less subordinated), spells out numbers and symbols where needed, and rewrites visual structures like bullet points into prose that flows naturally when spoken. A bulleted list of three items, for instance, becomes a sentence with clear connectors so the listener can follow without seeing the page."
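Isabella handles this layer with a Copilot prompt, which the article does not reproduce. As a rough, rule-based stand-in for two of the transformations she describes, the sketch below turns a short bulleted list into a sentence with spoken connectors and spells out a few symbols. The function names, the connector wording, and the symbol table are all assumptions chosen for the example.

```python
import re

# Hypothetical connector words; a real adaptation would vary them by context.
ORDINALS = ["first", "second", "third", "fourth", "fifth"]

# Hypothetical symbol table covering only a few common cases.
SPOKEN_SYMBOLS = {"%": "percent", "&": "and", "+": "plus"}


def bullets_to_prose(intro: str, items: list[str]) -> str:
    """Rewrite an intro line plus bullet items as one sentence with connectors."""
    if not items:
        return intro.rstrip(": ") + "."
    if len(items) > len(ORDINALS):
        raise ValueError("list too long to read comfortably as one sentence")
    parts = [
        f"{ORDINALS[i]}, {item.strip().rstrip('.;,')}"
        for i, item in enumerate(items)
    ]
    if len(parts) == 1:
        body = parts[0]
    else:
        body = "; ".join(parts[:-1]) + "; and " + parts[-1]
    return f"{intro.rstrip(': ')}: {body}."


def spell_out_symbols(text: str) -> str:
    """Replace a few symbols with words so they are read aloud predictably."""
    for symbol, word in SPOKEN_SYMBOLS.items():
        text = text.replace(symbol, f" {word} ")
    # Tidy the spacing introduced by the replacements.
    text = re.sub(r"\s+", " ", text).strip()
    return re.sub(r"\s+([.,;:!?])", r"\1", text)


if __name__ == "__main__":
    print(bullets_to_prose(
        "The process has three layers:",
        ["standard editing", "formatting", "audio adaptation"],
    ))
    print(spell_out_symbols("Costs fell 30% & quality rose."))
```

Rules like these only cover the predictable cases. Restructuring long, heavily subordinated sentences still needs the kind of judgment she delegates to the prompt and then reviews herself.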
For pronunciation, she works through the glossary and IPA controls: "Proper nouns and technical terms from non-English languages are the most common cases."
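In Voiseed Studio the glossary is managed inside the tool, and the article does not describe its format. To show the general idea of a reusable pronunciation glossary, the sketch below maps terms to IPA and wraps each occurrence in the standard SSML `phoneme` element. Whether Voiseed Studio accepts SSML is not stated here, so treat the output format, the `GLOSSARY` dictionary, and the `apply_glossary` name as assumptions.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Term -> IPA transcription. The GALA entry uses the transcription quoted in the article.
GLOSSARY = {"GALA": "ˈɡaː.la"}


def apply_glossary(text: str, glossary: dict[str, str]) -> str:
    """Wrap whole-word glossary terms in SSML phoneme elements with IPA."""
    if not glossary:
        return escape(text)
    # Longest terms first so a longer entry wins over a shorter one it contains.
    terms = sorted(glossary, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b")

    pieces = []
    last = 0
    for match in pattern.finditer(text):
        term = match.group(0)
        # Escape the plain text between matches so the result stays valid XML.
        pieces.append(escape(text[last:match.start()]))
        pieces.append(
            f"<phoneme alphabet=\"ipa\" ph={quoteattr(glossary[term])}>"
            f"{escape(term)}</phoneme>"
        )
        last = match.end()
    pieces.append(escape(text[last:]))
    return "".join(pieces)


if __name__ == "__main__":
    print(apply_glossary("GALA publishes articles on localization.", GLOSSARY))
```

Storing the mapping once and applying it to every article mirrors the benefit she describes: the pronunciation work is done a single time per term.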
Then comes timing, which she considers the step that demands the most focus: "I go through the generated audio segment by segment, adjusting pauses, occasionally splitting or merging segments, fine-tuning pace so the text breathes in the right places. That's the step that turns a good audio file into one that genuinely sounds like it was made for listeners."
What's still on the wish list
The most time-consuming part of Isabella's workflow remains the page-to-audio rewrite, though she's clear this is inherent to the format, not a tool limitation.
If she could add one feature, it would be a pre-flight check before generation: "A feature that runs before generation and flags sentences likely to feel long or dense when spoken, suggests where pauses might be needed, etc. Even as suggestions rather than automatic changes, it would compress the prep stage considerably and bring some of the linguistic judgment I currently apply manually."
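No such feature is described as existing, so the following is purely a sketch of what a suggestion-only pre-flight check might look like. It flags sentences by word count and comma count as crude proxies for length and density. The thresholds, the `Flag` structure, and the `preflight` name are assumptions, not anything drawn from Voiseed Studio.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Flag:
    sentence: str
    words: int
    commas: int
    reasons: list[str] = field(default_factory=list)


def preflight(text: str, max_words: int = 25, max_commas: int = 3) -> list[Flag]:
    """Return suggestions for sentences likely to feel long or dense when spoken."""
    # Naive sentence split on terminal punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    flags = []
    for sentence in sentences:
        if not sentence:
            continue
        words = len(sentence.split())
        commas = sentence.count(",")
        reasons = []
        if words > max_words:
            reasons.append(f"{words} words: consider splitting")
        if commas > max_commas:
            reasons.append(f"{commas} commas: consider adding a pause or simplifying")
        if reasons:
            flags.append(Flag(sentence, words, commas, reasons))
    return flags


if __name__ == "__main__":
    sample = "Short one. " + " ".join(["word"] * 30) + "."
    for flag in preflight(sample):
        print(flag.reasons, "->", flag.sentence[:40])
```

The check only reports. It changes nothing in the text, which matches her preference for suggestions over automatic edits.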
On language coverage, English is where she works most, and quality there is consistently strong. "Articles that mix languages (quotations, terminology, names) are where I do the most manual refinement, mostly through the glossary. Anything that streamlined multilingual handling at the segment level would be welcome."




