Building PowerTTS: PPT → Audio → Video Pipeline (FastAPI + React)


Problem / Context

  • Creating professional narrated presentations is time-consuming
  • Users (content creators, educators, businesses) typically need to:
    • Record narration per slide manually
    • Sync audio with slides
    • Export to video for sharing
    • Manage multiple voices and languages
  • The traditional workflow spans multiple tools and takes hours
  • PowerTTS goal: Upload a PowerPoint, select a voice, receive a fully narrated video quickly

Constraints

Compliance & Audit

  • File size limit: max 50 MB per PowerPoint (avoids server overload)
  • Strict validation: only .ppt and .pptx are accepted (reduces the risk of malicious uploads)
  • Project isolation: unique UUID per project (prevent data leakage)
  • Cleanup: automatic removal of temporary/expired project files
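
The upload rules above can be sketched roughly as follows. This is a minimal illustration, not the PowerTTS code: the names `validate_upload`, `new_project_dir`, and `UploadRejected` are hypothetical, and the size is assumed to be known before the file is written to disk.

```python
import uuid
from pathlib import Path

MAX_UPLOAD_BYTES = 50 * 1024 * 1024  # 50 MB limit from the constraints above
ALLOWED_EXTENSIONS = {".ppt", ".pptx"}


class UploadRejected(ValueError):
    """Raised when an upload fails validation (hypothetical name)."""


def validate_upload(filename: str, size_bytes: int) -> str:
    """Return the lower-cased extension, or raise UploadRejected."""
    ext = Path(filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise UploadRejected(f"unsupported file type: {ext or '(none)'}")
    if size_bytes > MAX_UPLOAD_BYTES:
        raise UploadRejected("file exceeds the 50 MB limit")
    return ext


def new_project_dir(base_dir: Path) -> Path:
    """Create an isolated working directory named by a random UUID."""
    project_dir = base_dir / str(uuid.uuid4())
    # exist_ok=False: a name collision raises instead of silently sharing a directory
    project_dir.mkdir(parents=True, exist_ok=False)
    return project_dir
```

An extension check only inspects the file name, so a stricter service would likely also inspect the file contents before processing.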

Legacy Data & Compatibility

  • Supports .ppt (legacy) and .pptx (modern)
  • .ppt conversion requires LibreOffice
  • Cross-platform support: Windows (COM automation), Linux, macOS
  • Fallback slide rendering methods: LibreOffice → python-pptx
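
The LibreOffice → python-pptx fallback can be pictured as an ordered chain of renderers, tried one after another. The sketch below is an assumption about the shape of that logic. `render_slides` and the two stand-in renderers are hypothetical and do no real rendering.

```python
from typing import Callable, Optional, Sequence, Tuple

# A renderer takes a presentation path and returns True on success.
Renderer = Callable[[str], bool]


def render_slides(path: str, renderers: Sequence[Tuple[str, Renderer]]) -> Optional[str]:
    """Try each renderer in order; return the name of the first that succeeds."""
    for name, render in renderers:
        try:
            if render(path):
                return name
        except Exception:
            # A crashing renderer is treated like a failed one; try the next.
            continue
    return None


# Hypothetical stand-ins for the real LibreOffice and python-pptx renderers.
def libreoffice_render(path: str) -> bool:
    raise RuntimeError("soffice not available")


def pptx_render(path: str) -> bool:
    return path.lower().endswith(".pptx")


chain = [("libreoffice", libreoffice_render), ("python-pptx", pptx_render)]
print(render_slides("deck.pptx", chain))  # python-pptx
```

Returning the renderer name makes it easy to log which path produced the slide images.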

Timelines

  • Real-time progress tracking for long operations
  • Video generation runs in background threads (avoid blocking API)
  • Timeout handling:
    • LibreOffice conversions: 300s
    • FFmpeg operations: 60s
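
One way to enforce the timeouts above is the `timeout` argument of `subprocess.run`. The sketch below is illustrative only: `run_with_timeout` and `convert_to_pdf` are hypothetical helpers, and the LibreOffice call assumes `soffice` is on the PATH.

```python
import subprocess
import sys
from typing import List

LIBREOFFICE_TIMEOUT_S = 300  # values taken from the list above
FFMPEG_TIMEOUT_S = 60


def run_with_timeout(cmd: List[str], timeout_s: float) -> bool:
    """Run cmd; return True on exit code 0, False on failure or timeout."""
    try:
        result = subprocess.run(cmd, capture_output=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising TimeoutExpired
        return False
    except FileNotFoundError:
        # The executable is not installed or not on the PATH
        return False
    return result.returncode == 0


def convert_to_pdf(src: str, out_dir: str) -> bool:
    """Headless LibreOffice conversion (assumes `soffice` is installed)."""
    cmd = ["soffice", "--headless", "--convert-to", "pdf", "--outdir", out_dir, src]
    return run_with_timeout(cmd, LIBREOFFICE_TIMEOUT_S)


# Demo with the Python interpreter itself, so it runs without LibreOffice.
print(run_with_timeout([sys.executable, "-c", "pass"], timeout_s=10))  # True
```

Returning a boolean rather than raising keeps the caller simple when a failed conversion should just trigger the next fallback.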

Reliability & UX Improvements

  • Retry logic (3 retries) for TTS failures
  • Fallbacks (LibreOffice → python-pptx, MoviePy → FFmpeg)
  • Real-time progress updates (reduces user confusion)
  • Zero-config deployment with Docker Compose
  • Multiple export formats + broad voice selection
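
The retry logic for TTS failures might look something like the sketch below: one initial attempt plus up to three retries, with exponential backoff between them. `with_retries` is a hypothetical helper, and the injectable `sleep` parameter is an assumption added here so the behavior can be tested without waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    fn: Callable[[], T],
    retries: int = 3,
    base_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn; on an exception, retry up to `retries` more times.

    The delay doubles after each failure: base, 2*base, 4*base, ...
    The last exception is re-raised once the retries are used up.
    """
    attempt = 0
    while True:
        try:
            return fn()
        except Exception:
            if attempt >= retries:
                raise
            sleep(base_delay_s * (2 ** attempt))
            attempt += 1
```

A call would wrap the TTS request in a zero-argument function, for example `with_retries(lambda: synthesize(text, voice))`, where `synthesize` stands for whatever function performs the request.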

Lessons Learned

  • Calling FFmpeg directly was faster than going through a wrapper such as MoviePy
  • LibreOffice is unreliable for batch export, so fallbacks are a must
  • Background tasks need a progress-tracking channel (files, Redis, or WebSockets)
  • Edge TTS rate-limits requests, so retry with backoff is essential
  • File-based storage works well for MVP simplicity
  • Docker Compose simplifies deployment
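
To illustrate the file-based progress channel mentioned above, here is a minimal sketch in which a background thread writes a small JSON file that a polling endpoint could read. The file name `progress.json` and all function names are assumptions, and the worker loop is a placeholder for the real TTS and encoding steps.

```python
import json
import threading
from pathlib import Path


def write_progress(project_dir: Path, percent: int, stage: str) -> None:
    """Write progress to a temp file, then rename it over the real one."""
    tmp = project_dir / "progress.json.tmp"
    tmp.write_text(json.dumps({"percent": percent, "stage": stage}))
    # Renaming over the old file means readers do not see a half-written JSON.
    tmp.replace(project_dir / "progress.json")


def read_progress(project_dir: Path) -> dict:
    """Return the latest progress, or a default if the job has not started."""
    path = project_dir / "progress.json"
    if not path.exists():
        return {"percent": 0, "stage": "queued"}
    return json.loads(path.read_text())


def generate_video(project_dir: Path, slide_count: int) -> None:
    """Placeholder worker: the real TTS and encoding would run per slide."""
    for i in range(slide_count):
        percent = (i + 1) * 100 // slide_count
        write_progress(project_dir, percent, f"slide {i + 1}/{slide_count}")


def start_job(project_dir: Path, slide_count: int) -> threading.Thread:
    """Run the worker in a background thread so the API call returns at once."""
    thread = threading.Thread(
        target=generate_video, args=(project_dir, slide_count), daemon=True
    )
    thread.start()
    return thread
```

A status endpoint would then simply return `read_progress(project_dir)` for the project's UUID directory.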

What to Improve Next

  • Add database (PostgreSQL) for multi-user + project history
  • Replace polling with WebSocket progress updates
  • Add queue system (Redis + Celery) for scalable processing
  • Add caching (hash text+voice) to avoid regenerating same TTS
  • Improve slide image fidelity (e.g., Playwright rendering)
  • Add video customization (resolution/FPS/aspect ratio)
  • Add audio post-processing (normalization, background music)
  • Add batch/multi-project processing API
  • Add analytics dashboard (usage + performance metrics)
  • Offload encoding to cloud services for cost/scale
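
As a rough idea of how the text+voice caching item could work, the sketch below derives a cache key from a SHA-256 hash and only calls the synthesizer on a miss. Everything here is hypothetical: `tts_cache_key`, `cached_tts`, the `.mp3` suffix, and the `synthesize` callback are illustrative assumptions.

```python
import hashlib
from pathlib import Path
from typing import Callable


def tts_cache_key(text: str, voice: str) -> str:
    """Hash voice and text together into a stable hex key."""
    h = hashlib.sha256()
    h.update(voice.encode("utf-8"))
    # Separator so ("ab", "c") and ("a", "bc") do not hash to the same key
    h.update(b"\x00")
    h.update(text.encode("utf-8"))
    return h.hexdigest()


def cached_tts(
    text: str,
    voice: str,
    cache_dir: Path,
    synthesize: Callable[[str, str], bytes],
) -> Path:
    """Return the cached audio path, synthesizing only on a cache miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{tts_cache_key(text, voice)}.mp3"
    if not path.exists():
        path.write_bytes(synthesize(text, voice))
    return path
```

If speaking rate or pitch become configurable, they would need to be part of the key as well.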