
Template-Based Auto Video Editor

An automated REST API that generates videos from JSON input. Send text, style, and audio, and get a ready-to-post video using template-based overlays on slow-motion clips. Built to keep social accounts active with no manual editing, plus queue handling, rate limiting, job tracking, and auto cleanup.

Python · OpenCV · API · FastAPI · Video-Processing · Webhooks

Vix Video Editor API

# Overview

This is a REST API I built for automated video generation. You send a JSON request with text, style, and audio - and it spits out a custom video with overlays. Right now it's template-based. Text overlays on random slow-motion videos. Perfect for keeping social accounts active with weekly content. No manual editing needed. Just send JSON, get video. Queue-based rendering, rate limiting, job tracking, auto cleanup. The whole package.

# Why I Built This

This one's personal. I wanted to manage a Facebook page, but manual editing was so boring. Even just adding text to a video and waiting 1-2 minutes to render - what a waste of time.

So I spent 2 months building this instead. It actually ran for about 2-3 months. Facebook page tokens expire every 2 months though, so manual updates still suck. I connected this API to an n8n workflow with AI content generation, but the workflow stopped - I need to update the Facebook page token, and I'm too lazy to do that.

But honestly? I don't regret it. This project taught me more than I expected.

# Technical Stack

Core API

  • FastAPI - Modern Python web framework with async support, auto docs, and type validation
  • Python 3.11+ - The language everything runs on
  • SQLAlchemy - ORM for database operations
  • Pydantic - Request/response validation with proper types

Database

  • SQLite - Default database (can swap for PostgreSQL/MySQL)
  • SQLAlchemy ORM - Database models and migrations

Video Processing

  • MoviePy - The unwanted middle child (my first attempt; replaced mid-project)
  • FFmpeg - Video encoding and format conversion - the heavy lifter
  • OpenCV (cv2) - Frame manipulation, video reading and writing
  • Pillow (PIL) - Image processing for text overlays and logos

Deployment & Infrastructure

  • Uvicorn - ASGI server for running FastAPI
  • Gunicorn - Production process manager running Uvicorn workers
  • multiprocessing - Parallel job rendering
  • SystemD - Linux service management
  • Git webhooks - CI/CD deployment automation

# API Architecture

[Diagram: API architecture]

# Tool Evolution - The Journey

First attempt: MoviePy

I wanted this as a side project while building my auto trading bot. I had a free VPS sitting around, so I figured this would be good practice - and maybe a Facebook page would grow while I focused on other things. But then I got sidetracked from my main project onto this one. Let me tell you how it started.

I began with MoviePy. I had an old project that used it, so I thought it would be easy. High-level Python API, simple syntax, good documentation. Seemed perfect.

Then I drowned in installation hell.

On Windows, I installed MoviePy and FFmpeg as executables. When it was time to test - the tools weren't recognized by the system. I tried adding system variables. Didn't work. I tried reinstalling. Didn't work. I spent more than a day troubleshooting. Finally fixed it by hardcoding the paths in the script.

Then came the first render test. I have a 1st gen i7 - not great, but enough for most workloads. The first video took more than 20 minutes. While rendering, I couldn't do anything on my PC. It was using more than 10GB of RAM for a single 1080p video. This is so inefficient.

Still, I thought it wouldn't be a problem once deployed. My server could handle it - it's just time, and I have plenty. I only need one video per day to stay active, right?

Then came deployment on Ubuntu. Same story. Installed MoviePy, installed FFmpeg. Both showed as installed. When it was time to run - MoviePy was 404. Nowhere to be found. I spent hours troubleshooting, checking paths, fixing version mismatches. Finally got it working.

The reality hit hard. The same video that took 20 minutes on Windows now took more than 4 hours on the server. My server is weak, I know - it's an Oracle free tier instance with 1/8 of an AMD CPU core and 1GB RAM. But it's capable of more than this. I started investigating. It was always RAM.

I needed something more memory efficient. Something faster. Something scalable.

Why MoviePy didn't work:

  • Slow rendering with tons of overhead
  • Memory leaks - each VideoClip object held frame data in RAM
  • No control over FFmpeg commands
  • Text rendering was slow - fonts rendered on every frame
  • 4+ hours per video on a 1GB server
  • 20+ minutes on a decent PC
  • 10GB+ RAM usage

The switch: FFmpeg + OpenCV + Pillow

I realized MoviePy is just a wrapper around FFmpeg anyway. Why not use FFmpeg directly? But raw FFmpeg is complex for text overlays. So I built a hybrid pipeline (rough sketch after the list):

  1. OpenCV - reads source video frames (fast, memory efficient)
  2. Pillow - pre-renders text overlays as transparent images (once per video, not per frame)
  3. FFmpeg - encodes everything with proper codecs (fast, hardware acceleration support)
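
Here's roughly what that pipeline looks like. This is a sketch of the idea, not the production code - the real pipeline pipes image frames with the image2pipe flags I mention later, while this simplified version pipes raw RGB frames (and assumes the overlay PNG matches the frame size):

```python
# Sketch only: OpenCV reads frames, Pillow composites a pre-baked overlay,
# FFmpeg encodes from stdin. Function and file names are illustrative.
import subprocess
import cv2
from PIL import Image

def render_clip(src_path: str, overlay_path: str, out_path: str, fps: int = 30):
    cap = cv2.VideoCapture(src_path)
    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))

    # The text overlay was rendered ONCE as a transparent image, not per frame.
    overlay = Image.open(overlay_path).convert("RGBA")

    # FFmpeg reads raw frames from stdin and encodes H.264 - no temp files.
    ffmpeg = subprocess.Popen(
        ["ffmpeg", "-y",
         "-f", "rawvideo", "-pix_fmt", "rgb24",
         "-s", f"{width}x{height}", "-r", str(fps), "-i", "-",
         "-c:v", "libx264", "-pix_fmt", "yuv420p", out_path],
        stdin=subprocess.PIPE,
    )

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # OpenCV gives BGR; convert to RGB and composite the baked overlay.
        img = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)).convert("RGBA")
        img = Image.alpha_composite(img, overlay).convert("RGB")
        ffmpeg.stdin.write(img.tobytes())

    cap.release()
    ffmpeg.stdin.close()
    ffmpeg.wait()
```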

The results:

  • My machine: 45-60 seconds per video (down from 20+ minutes)
  • Server: 8-10 minutes per video (down from 4+ hours)
  • Memory usage stayed flat
  • Could run parallel render jobs without crashing
  • FFmpeg works fine, no installation headaches
  • OpenCV didn't give me headaches
  • Pillow works perfectly
  • Smooth deployment on the server
  • Rendering on 1GB RAM server without issues

4 hours to 10 minutes. That's not just an improvement - that's a completely different ballgame.

Key optimizations I made:

  • Pre-bake text overlays - Instead of rendering text on every frame, render it once with Pillow as a transparent PNG, then composite onto frames. This alone saved massive amounts of CPU (see the sketch after this list).

  • FFmpeg pipe streams - Don't write intermediate files. Pipe frames directly from Python to FFmpeg stdin. No disk I/O bottleneck.

  • Multiprocessing - Each render job runs in a separate process. True parallelism across all CPU cores. Not threading - actual processes.

  • Video info cache - Store source video duration and size in a JSON cache. Don't have to read file headers every single time.

  • Style skip config - Some source videos have issues. Mark them as skipped in a config file so the queue worker doesn't waste time on broken files.

  • Full 60fps processing with effects - Here's the crazy part: we're still rendering full 60fps source videos, applying slow motion effects, blur, color adjustments - all the expensive stuff. And it's STILL 8-10 minutes instead of 4 hours. That's how inefficient MoviePy was.
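
The pre-baking trick from the first bullet, sketched out - the font path and helper name are assumptions:

```python
# Sketch only: bake text onto a transparent canvas once, then composite
# that single image onto every frame instead of re-rendering the text.
from PIL import Image, ImageDraw, ImageFont

def bake_text_overlay(text: str, size=(1080, 1920),
                      font_path="Arial.ttf", font_size=30,
                      fill="white") -> Image.Image:
    canvas = Image.new("RGBA", size, (0, 0, 0, 0))  # fully transparent
    draw = ImageDraw.Draw(canvas)
    font = ImageFont.truetype(font_path, font_size)
    # Center the text using its bounding box.
    left, top, right, bottom = draw.textbbox((0, 0), text, font=font)
    x = (size[0] - (right - left)) // 2
    y = (size[1] - (bottom - top)) // 2
    draw.text((x, y), text, font=font, fill=fill)
    return canvas
```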

# Problems I Ran Into

Let me be real - this wasn't smooth sailing.

Installation hell with MoviePy. Spent a full day troubleshooting why FFmpeg and MoviePy weren't recognized. System variables, re-installs, nothing worked. Finally fixed by hardcoding paths in the script - but it still wasn't fit for the job.

Memory usage with MoviePy. 10GB+ RAM to render a single 1080p video. Couldn't use my PC while rendering. Server would crash after 20 videos.

Deployment version mismatch. MoviePy version on Ubuntu didn't match Python package. Had to downgrade script to match server's ancient packages.

Video format issues. OpenCV reads frames, FFmpeg encodes them. Getting pixel formats right - that was a fight. Color space mismatches, weird resolution issues.

Multiprocessing on Windows vs Linux. multiprocessing.freeze_support() needed for Windows. Manager().dict() for shared state. Lots of trial and error between dev and production.

Text overlay positioning. Pixel-perfect positioning across different resolutions. Had to build a custom coordinate system supporting both ["center", "center"] and [540, 960] formats.

FFmpeg pipe streams. Writing frames through stdin to FFmpeg. Buffer issues, encoding hangs, corrupt output. Took forever to get the pipe command right.

Queue race conditions. Had to use database transactions and proper status locking.

Font caching. Google Fonts API has rate limits. Had to build a local cache with TTL so I'm not hitting their API every request.

# How I Fixed Everything

Here's how I tackled each problem:

Video formats. Standardized everything. OpenCV reads frames → Pillow processes images → FFmpeg encodes to H.264. Every video output is the same format: MP4, 1080x1920 (9:16), H.264 codec.

Multiprocessing. Used multiprocessing.Manager() for shared state. Each render job runs in its own process. The main API stays responsive. Worker dies when job finishes.
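
A minimal sketch of that pattern - one real OS process per render job, with a Manager().dict() for progress reporting (the job ID and progress steps are made up):

```python
# Sketch only: the API process spawns a process per job and reads
# progress back through a Manager-backed shared dict.
import multiprocessing as mp

def render_job(job_id: str, progress) -> None:
    for pct in range(0, 101, 10):
        progress[job_id] = pct  # visible from the parent/API process
        # ... render the next chunk of frames here ...

if __name__ == "__main__":
    mp.freeze_support()  # required on Windows
    manager = mp.Manager()
    progress = manager.dict()
    p = mp.Process(target=render_job, args=("job_abc123xyz", progress))
    p.start()
    p.join()
    print(progress["job_abc123xyz"])  # 100
```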

Text positioning. Built a flexible coordinate system. Can use ["center", "center"] or [540, 960]. Added margin support too. Tested across different resolutions.

FFmpeg pipes. Used subprocess.PIPE with proper buffer sizing. Frame-by-frame write to stdin. Input flags: -y -f image2pipe -c:v mpeg4 -i -. Output flags: -c:v libx264 -pix_fmt yuv420p.

Queue locking. Database query with .order_by(Job.created_at).first() gets the next job. Update status to "rendering" immediately inside a transaction. Commits only if everything works.
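
The claim step looks roughly like this (Job is the project's SQLAlchemy model; any fields beyond status and created_at are my assumption):

```python
# Sketch only: claim the oldest queued job and flip its status inside
# one transaction so two workers can't pick up the same job.
from sqlalchemy.orm import Session

def claim_next_job(session: Session):
    job = (session.query(Job)
                  .filter(Job.status == "in_queue")
                  .order_by(Job.created_at)
                  .first())
    if job is None:
        return None  # queue is empty
    job.status = "rendering"
    session.commit()  # commit only if everything above worked
    return job
```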

Font cache. JSON file cache with 24-hour TTL. Check file mtime. If stale, fetch from Google API, update cache. If API fails, fallback to stale cache.
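
A sketch of that cache logic - the cache file name and API URL handling are assumptions:

```python
# Sketch only: 24-hour TTL cache for the Google Fonts list, with a
# stale-cache fallback when the API call fails.
import json
import time
from pathlib import Path

import requests

CACHE = Path("fonts_cache.json")
TTL_SECONDS = 24 * 3600

def get_fonts(api_url: str) -> dict:
    if CACHE.exists() and time.time() - CACHE.stat().st_mtime < TTL_SECONDS:
        return json.loads(CACHE.read_text())  # fresh enough
    try:
        data = requests.get(api_url, timeout=10).json()
        CACHE.write_text(json.dumps(data))
        return data
    except requests.RequestException:
        if CACHE.exists():
            return json.loads(CACHE.read_text())  # stale fallback
        raise
```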

File cleanup. Check the file's modification time with file_path.stat().st_mtime. If older than OUTPUT_RETENTION_HOURS, delete. Runs every 60 minutes in a background thread.
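
Sketched out, the cleanup thread is about this much code (the output directory name is an assumption):

```python
# Sketch only: background daemon thread that deletes outputs older than
# the retention window, waking up every 60 minutes.
import threading
import time
from pathlib import Path

OUTPUT_DIR = Path("outputs")
OUTPUT_RETENTION_HOURS = 24

def cleanup_loop() -> None:
    while True:
        cutoff = time.time() - OUTPUT_RETENTION_HOURS * 3600
        for file_path in OUTPUT_DIR.glob("*.mp4"):
            if file_path.stat().st_mtime < cutoff:
                file_path.unlink(missing_ok=True)
        time.sleep(3600)  # re-check every 60 minutes

threading.Thread(target=cleanup_loop, daemon=True).start()
```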

# Security & API Design

After everything was working, I wanted to host it. But then the fear of random people using my service kicked in. So I built an entire security system.

Security features I added (a sketch of the key check follows the list):

  • API key authentication with rate limiting
  • Admin endpoints (MASTER_KEY only, IP whitelisted)
  • Daily request limits per key
  • Per-minute rate limits
  • Queue-based rendering - only one video at a time on my weak server
  • All server power goes to single render job
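
Here's a sketch of how an API-key check plus per-minute rate limit can hang together as a FastAPI dependency. The real service tracks keys and limits in the database; this in-memory version just shows the shape:

```python
# Sketch only: API-key auth + naive per-minute rate limit as a dependency.
import time
from collections import defaultdict

from fastapi import Depends, FastAPI, Header, HTTPException

app = FastAPI()
VALID_KEYS = {"catv_xxxxx"}          # stored in the DB in the real service
RATE_LIMIT_PER_MINUTE = 10
_hits = defaultdict(list)            # key -> recent request timestamps

def require_api_key(x_api_key: str = Header(...)) -> str:
    if x_api_key not in VALID_KEYS:
        raise HTTPException(status_code=401, detail="Invalid API key")
    now = time.time()
    window = [t for t in _hits[x_api_key] if now - t < 60]
    if len(window) >= RATE_LIMIT_PER_MINUTE:
        raise HTTPException(status_code=429, detail="Rate limit exceeded")
    window.append(now)
    _hits[x_api_key] = window
    return x_api_key

@app.post("/v1/generate")
def generate(api_key: str = Depends(require_api_key)):
    return {"job_id": "job_abc123xyz", "status": "in_queue"}
```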

I learned a lot about security, rate limiting, IP whitelisting, and resource management through this.

The JSON structure:

I redesigned it around a single JSON payload. One endpoint for everything. The structure lets me control every aspect of the video in one request.

Content pipeline:

  • Gathered 60+ hours of non-copyrighted content
  • Pre-processed all videos to portrait style (1080x1920)
  • Random selection so each video feels unique
  • 30 second clip slowed 2x = 1 minute video
  • 60fps source → 30fps output (smooth, moody look)

The result looks professional. If I saw these videos somewhere, I'd actually buy the design. Hmm - bulk video generation might be a good idea for later.

Why nested JSON?

  • video block - all the video settings (duration, effects, style)
  • audio block - audio settings separate from video
  • text_overlays array - multiple text lines, each with own timing/styling
  • signature - special text overlay that's always there (watermark)
  • logo - overlay images/branding

This structure is flexible. Want 5 text lines? Just add 5 items to the array. Want no audio? Set "audio": "none". Want random style? "style": "random".

Pydantic models validate everything. Bad JSON gets a 422 with exact error messages. No guesswork.
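
For illustration, a trimmed-down version of what those models might look like - field names follow the example request further down, defaults and constraints are my assumptions:

```python
# Sketch only: nested Pydantic models mirroring the JSON payload.
from typing import List, Optional, Union

from pydantic import BaseModel, Field

Position = List[Union[str, int]]  # ["center", "center"] or [540, 960]

class TextOverlay(BaseModel):
    text: str
    font: str = "Arial"
    font_size: int = Field(30, gt=0)
    font_color: str = "white"
    start_time: Optional[float] = None
    end_time: Optional[float] = None
    positionxy: Position = ["center", "center"]
    margin: List[int] = [0, 0, 0, 0]

class VideoSettings(BaseModel):
    style: str = "random"          # "random" picks a source video
    duration: int = Field(15, gt=0)
    speed: float = Field(0.5, gt=0)

class GenerateRequest(BaseModel):
    video: VideoSettings
    audio: dict = {"audio": "none"}
    text_overlays: List[TextOverlay] = []
```

A bad request fails validation before any rendering starts, which is where the 422 with exact error messages comes from.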

# What I Learned

This project was a crash course in modern web development and DevOps.

  • FastAPI - Request models, dependencies, routers, CORS, lifespan context, middleware
  • SQLAlchemy - ORM models, relationships, session management, transactions
  • Pydantic - Data validation, nested models, validators, Field() constraints
  • FFmpeg CLI - Pipe streams, codecs, pixel formats, input/output flags
  • OpenCV - VideoCapture, VideoWriter, frame manipulation, property access
  • Multiprocessing - Process spawning, Manager() shared state, freeze_support
  • Threading - Background workers, daemon threads, thread-safe operations
  • SystemD - Service files, user services, enable/start/restart commands
  • Webhooks - GitHub signature validation, secret headers, payload parsing
  • Asyncio - asynccontextmanager, lifespan events, async/await patterns
  • Security - API key authentication, rate limiting, IP whitelisting, resource management
  • Shell scripting - Bash scripts, deployment automation, service management
  • Linux troubleshooting - Not perfect but enough to level up from beginner
  • Nginx proxy setup - Reverse proxy, multiple endpoints on same server

# Deployment & DevOps

This project was another step toward my DevOps career. I learned a lot about shell scripting and Linux environment troubleshooting. I'm not perfect at it, but I've leveled up enough that I know where to look when I see the same problem again.

Nginx proxy setup:

I needed 2 endpoints - one for the API, another for CI/CD - so I set up an Nginx reverse proxy on the same server.
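
The config boils down to one server block with two proxied locations. A sketch, assuming the API sits on port 8000 and the webhook listener on 9000 (ports and server name are my assumptions):

```nginx
# Sketch only: one server block, two upstreams on the same box.
server {
    listen 80;
    server_name example.com;

    location / {
        proxy_pass http://127.0.0.1:8000;   # FastAPI app (Uvicorn)
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }

    location /webhook {
        proxy_pass http://127.0.0.1:9000;   # CI/CD webhook service
        proxy_set_header Host $host;
    }
}
```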

The DIY webhook story:

I didn't want to manually upload files every time I had an update. But I also didn't want to use GitHub Actions or Jenkins - overkill for this. This isn't a build-focused project like an npm app. A deploy just needs to:

  1. Update the files
  2. Install if requirements.txt changed
  3. Restart the service

So I invented a smaller wheel: I built my own webhook connection. People ask "why reinvent the wheel?" But the wheel they invented is too big for me. So I built my own.

How the webhook works:

[Diagram: webhook deployment flow]

It's a separate service that uses the same repo. No restart needed for the webhook itself. Its only job is to trigger when GitHub receives a new push: it runs deploy.sh, and the whole service updates to the new code.
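
The receiving end is small. Here's a sketch of what such a listener can look like - the endpoint path, secret handling, and deploy.sh location are assumptions:

```python
# Sketch only: verify GitHub's HMAC-SHA256 signature, then run deploy.sh.
import hashlib
import hmac
import os
import subprocess

from fastapi import FastAPI, Header, HTTPException, Request

app = FastAPI()
SECRET = os.environ["WEBHOOK_SECRET"].encode()

@app.post("/webhook")
async def github_webhook(request: Request,
                         x_hub_signature_256: str = Header(...)):
    body = await request.body()
    # GitHub signs the raw payload with the shared secret.
    expected = "sha256=" + hmac.new(SECRET, body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, x_hub_signature_256):
        raise HTTPException(status_code=403, detail="Bad signature")
    # Valid push event: pull new code, reinstall if needed, restart service.
    subprocess.Popen(["/bin/bash", "deploy.sh"])
    return {"status": "deploying"}
```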

Huge time saver.

# API Usage Examples

Generate Video Request

{ "video": { "style": "random", "exposure": 1.0, "brightness": 1.1, "contrast": 1.0, "saturation": 1.2, "fade_in": 0, "fade_out": 0, "duration": 15, "speed": 0.5, "blur": 0 }, "audio": { "audio": "random", "volume": 1.0, "fade_in": 2, "fade_out": 2 }, "text_overlays": [ { "text": "Hello World", "font": "Arial", "font_size": 30, "font_align": "center", "font_color": "white", "start_time": null, "end_time": null, "fade_in": 0.0, "fade_out": 0.0, "positionxy": ["center", "center"], "margin": [0, 0, 0, 0] } ], "signature": { "text": "@mychannel", "font": "Arial", "font_size": 24, "font_align": "center", "font_color": "white", "opacity": 0.5, "positionxy": ["bottom", "center"], "margin": [0, 25, 25, 0] }, "logo": { "name": "logo.png", "size": 150, "positionxy": ["bottom", "center"], "margin": [0, 25, 25, 0], "opacity": 1.0 } }

API Endpoints

Generate Video:

```
POST /v1/generate
X-Api-Key: catv_xxxxx

# Response
{ "job_id": "job_abc123xyz", "status": "in_queue" }
```

Check Status:

```
GET /v1/status/job_abc123xyz

# Response
{
  "job_id": "job_abc123xyz",
  "status": "rendering",
  "progress": 45,
  "queue_position": null,
  "estimated_time_remaining_s": 32
}
```

Download Video:

```
GET /v1/download/job_abc123xyz.mp4

# Returns the video file directly
```
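
Putting the three endpoints together, a client script could look like this. The base URL and key are placeholders, and since the write-up only shows the "in_queue" and "rendering" statuses, the loop just polls until the status changes to anything else:

```python
# Sketch only: submit a job, poll until it finishes, download the video.
import time

import requests

BASE_URL = "https://api.example.com"
HEADERS = {"X-Api-Key": "catv_xxxxx"}

payload = {
    "video": {"style": "random", "duration": 15, "speed": 0.5},
    "audio": {"audio": "random", "volume": 1.0},
    "text_overlays": [{"text": "Hello World",
                       "positionxy": ["center", "center"]}],
}

job = requests.post(f"{BASE_URL}/v1/generate",
                    json=payload, headers=HEADERS).json()
job_id = job["job_id"]

while True:
    status = requests.get(f"{BASE_URL}/v1/status/{job_id}",
                          headers=HEADERS).json()
    if status["status"] not in ("in_queue", "rendering"):
        break  # finished (or failed)
    time.sleep(5)

video = requests.get(f"{BASE_URL}/v1/download/{job_id}.mp4", headers=HEADERS)
with open("output.mp4", "wb") as f:
    f.write(video.content)
```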

# That's the Story

Like I said at the start, this was a side project. But it taught me so much about the career I'm looking for, so I added it here.

The crazy part is the performance win - 4 hours down to 8-10 minutes per video. On a 1GB RAM server. That's not just an improvement. That's proof that the right tooling matters.

I ended up building a production-ready video generation API with queue-based rendering, parallel processing, rate limiting, auto cleanup, security middleware, admin endpoints, and even my own DIY webhook deployment because GitHub Actions felt like overkill. Along the way I picked up FFmpeg, OpenCV, FastAPI, SQLAlchemy, multiprocessing, SystemD services, Nginx proxy, shell scripting, and enough Linux troubleshooting to not feel like a total beginner next time.

The problem this solves is simple: automated video content creation at scale. No manual editing. Send JSON, get video. Sometimes the best projects start with the simplest problems.
