Text-to-speech API

Edge TTS API
# Overview
Text-to-speech API I built for automated video narration. You send text, it gives you audio + word-by-word subtitle files. I made this for those Minecraft storytelling videos that went viral from late 2023 through 2024, mostly on TikTok and Facebook. It was a cool concept and easy to automate, so I started building it one part at a time - this is the audio + subtitle engine part.
It used Microsoft Edge's neural TTS engine. 400+ voices, 100+ languages, completely free. Hosted on Vercel as serverless functions. Worked great for about 3-4 months. Then Microsoft changed their API. I tried fixing it. Didn't work. In the end I gave up on it, since I had other work going on at the time.
# Timeline
- July 27, 2024 - First commit. Built the whole thing in a day.
- April 16, 2025 - Last attempt at fixing it. Gave up after that.
# Why I Built This
I wanted to make those simple videos myself. You've probably seen them - those creepypasta stories TikTok used to push back in the day. A Reddit horror story over Minecraft parkour footage, with big word-by-word text popping up in the middle of the screen in sync with the audio. They're not as viral as they used to be, but they're still around.
Microsoft Edge has this "Read Aloud" feature with voices that actually sound good. Someone reverse-engineered it and wrapped it into a Python library called edge-tts. I wrapped that in an API, added subtitle generation, and deployed it to Vercel. It can be deployed on a VPS as well, but Vercel had a free tier, so I hosted it there. Worked perfectly for months.
# How It Worked
The flow was pretty simple:
You send text + voice choice. The API calls Microsoft Edge's TTS service. Gets the audio back. Also grabs timing data for each word (like "hello" starts at 0.00s, ends at 0.30s). Converts that into subtitle format. Zips everything together. Done.
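The timing-to-subtitle step can be sketched roughly like this. It's a minimal illustration, not the original code: `WordTiming`, `format_timestamp`, and `words_to_srt` are hypothetical names, and it assumes word offsets and durations arrive in 100-nanosecond ticks (which, as far as I know, is how the Edge service reports word boundaries). It only uses the standard library, so no TTS call is involved.

```python
from dataclasses import dataclass

# Assumption: timings are expressed in 100-nanosecond ticks.
TICKS_PER_MILLISECOND = 10_000


@dataclass
class WordTiming:
    """One spoken word with its start offset and duration, in ticks."""
    text: str
    offset: int
    duration: int


def format_timestamp(ticks: int) -> str:
    """Convert ticks to the SRT timestamp form HH:MM:SS,mmm."""
    total_ms = ticks // TICKS_PER_MILLISECOND
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def words_to_srt(words: list) -> str:
    """Build an SRT document with one cue per word."""
    blocks = []
    for index, word in enumerate(words, start=1):
        start = format_timestamp(word.offset)
        end = format_timestamp(word.offset + word.duration)
        # Each cue: sequence number, time range, text, then a blank line.
        blocks.append(f"{index}\n{start} --> {end}\n{word.text}\n")
    return "\n".join(blocks)


if __name__ == "__main__":
    sample = [
        WordTiming("hello", 0, 3_000_000),          # 0.00s to 0.30s
        WordTiming("world", 3_500_000, 4_000_000),  # 0.35s to 0.75s
    ]
    print(words_to_srt(sample))
```

One cue per word is what gives the "big text popping up word by word" effect; grouping several words per cue would just mean merging adjacent timings before formatting.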
# Technical Stack
Core API
- FastAPI - Web framework
- edge-tts - Python wrapper for Microsoft Edge TTS
- Uvicorn - Server
- Pydantic - Data validation
Deployment
- Vercel - Serverless hosting (Edge-tts-api-on-vercel repo)
- Python 3.9 - Runtime
# Voices & Languages
400+ neural voices. Some I used:
- English: en-US-BrianMultilingualNeural, en-GB-SoniaNeural
- Sinhala: si-LK-SameeraNeural, si-LK-ThiliniNeural (yes, Sinhala included)
The quality was impressive. Not robotic like other TTS engines at the time. It actually sounded like a real person talking, the kind of voiceover you'd normally pay for. And it was completely free.
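The voice names above follow a `locale-VoiceName` pattern, so picking voices per language is mostly string handling. Here's a small sketch of that idea; `split_voice_name` and `group_by_locale` are hypothetical helpers for illustration, and the only assumption is that the part after the last hyphen is the voice and everything before it is the locale.

```python
def split_voice_name(short_name: str) -> tuple:
    """Split 'en-US-BrianMultilingualNeural' into ('en-US', 'BrianMultilingualNeural')."""
    locale, _, voice = short_name.rpartition("-")
    if not locale or not voice:
        raise ValueError(f"unexpected voice name: {short_name!r}")
    return locale, voice


def group_by_locale(short_names: list) -> dict:
    """Group voice names by locale, keeping the input order within each group."""
    groups = {}
    for name in short_names:
        locale, voice = split_voice_name(name)
        groups.setdefault(locale, []).append(voice)
    return groups


if __name__ == "__main__":
    voices = [
        "en-US-BrianMultilingualNeural",
        "en-GB-SoniaNeural",
        "si-LK-SameeraNeural",
        "si-LK-ThiliniNeural",
    ]
    for locale, names in group_by_locale(voices).items():
        print(locale, names)
```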
# Vercel Deployment
That was my first exploration into serverless environments. I set up the Vercel config files and all - all of it was new to me, but I figured it out with help from AI and deployed there. Serverless functions are limited by execution time. I don't remember what the limit was back then, but it was pretty generous. I could run a few minutes' worth of text through it without a problem, and it would spit out the SRT file and the audio file, ready for download. Pretty cool overall.
# Why It Died
Microsoft updated Edge's TTS service. I think they also changed the API endpoints, updated authentication, and so on. The community adapted and updated the library, which meant I needed to change my scripts too to match the new style. But as I said, these videos aren't viral anymore, so why bother fixing something nobody wants. Still, I tried once to fix it.
What I tried:
- Updated library to latest version
- Adjusted the script a bit, since the subtitle part wasn't working correctly
- Poked at the internal API calls
But I ended up calling it a day. It really felt like dead weight at that point, supporting something no one was using - not even me.
# What I Learned
Even though it doesn't work anymore, I learned a lot:
- FastAPI basics - Request models, routers, auto-generated docs
- Async/await - TTS is slow, learned to use async properly
- World of serverless - Understanding serverless environments
- External API risks - Building on undocumented APIs = breakage eventually
- ZIP files - Creating file packages in memory
- Subtitle formats - SRT and VTT
- Vercel deployment - Serverless functions, cold starts, timeouts
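For the "ZIP files in memory" part, the general technique looks something like the sketch below. It's an illustration rather than the project's actual code: `build_zip` and the file names are made up for the example, and the audio bytes are a placeholder. The point is that nothing touches the disk, which suits serverless functions.

```python
import io
import zipfile


def build_zip(files: dict) -> bytes:
    """Pack a {filename: bytes} mapping into a ZIP archive held entirely in memory."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, data in files.items():
            archive.writestr(name, data)
    # The archive is finalized when the context manager exits.
    return buffer.getvalue()


if __name__ == "__main__":
    payload = build_zip({
        "narration.mp3": b"placeholder audio bytes",  # hypothetical file name
        "narration.srt": b"1\n00:00:00,000 --> 00:00:00,300\nhello\n",
    })
    print(f"{len(payload)} bytes, ready to send as the response body")
```

The returned bytes can go straight into an HTTP response with an `application/zip` content type.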
# Would I Do It Again?
Yeah, but differently next time.
I need tools that, once built, sustain themselves. Occasional troubleshooting is fine - but depending on undocumented APIs that break randomly isn't sustainable. If I build this again, I'd use official services or run it on a proper VPS where I have full control.
This was the first project I ever hosted myself, so yeah, I learned a lot. I'd definitely learn even more building it again with what I know now.
It was fun while it lasted. Serverless is cool - I'm a fan now. No server management, but less flexibility. I need to learn where to use it. That's a lesson. And I like lessons. The best lessons are the ones where you break stuff, then fix it yourself multiple times.
Sometimes a project doesn't need to last forever. It just needs to teach you something while it works.