Build an Auto-Scaling Transcoding Platform Using Python, FFmpeg & AWS

Luis Sena
Jul 13, 2021 · 4 min read

In this post, I’ll describe the building blocks of a resilient self-hosted transcoding platform using open source tools and AWS.

In part two, I’ll share a sample Python project that lets you bootstrap this in minutes.

General principles

When building a system like this, you should never compromise on these:

  • Self-healing (AWS ASG)
  • Retry failed jobs (SQS)
  • Instrumentation
  • Cost efficiency
  • Auto Scaling (AWS ASG)
  • Error logging (Sentry.io)
  • Central logging (AWS CloudWatch)

Most of these can be attained with little effort through SaaS solutions and/or your cloud provider’s services.

Infrastructure diagram

AWS Infra diagram

You can replace the compute layer with Lambda, for example; just bear in mind that Lambda has an execution time limit of 15 minutes.
The cheapest option will always depend on your workload (video length, output formats, number of videos, schedules, etc.), and this architecture can be adapted to whichever execution layer suits you best.

The cheapest option in AWS is to have a pool of workers using an auto-scaling group that provisions spot instances.

Using the proposed architecture, premature instance termination will have no effect on our service. This means we can safely use spot instances and enjoy up to a 90% discount. On average, my total discount has been about 70% versus On-Demand pricing.
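
To make that concrete, here is a rough boto3 sketch of a 100% spot, mixed-instances auto-scaling group. The group name, launch template, subnets, and instance types are all placeholder assumptions; if you use Beanstalk, it manages this group for you and you’d set the spot options through its configuration instead:

```python
import boto3

autoscaling = boto3.client("autoscaling")

# "transcoder-node" and the subnet IDs are placeholders for your own
# launch template (Docker host AMI + user data) and VPC.
autoscaling.create_auto_scaling_group(
    AutoScalingGroupName="transcoder-workers",
    MinSize=0,
    MaxSize=20,
    VPCZoneIdentifier="subnet-aaaa1111,subnet-bbbb2222",
    MixedInstancesPolicy={
        "LaunchTemplate": {
            "LaunchTemplateSpecification": {
                "LaunchTemplateName": "transcoder-node",
                "Version": "$Latest",
            },
            # Listing several instance types lowers the odds of spot starvation.
            "Overrides": [
                {"InstanceType": "c5.2xlarge"},
                {"InstanceType": "c5a.2xlarge"},
                {"InstanceType": "c4.2xlarge"},
            ],
        },
        "InstancesDistribution": {
            # 100% spot: safe here because an interrupted job simply
            # returns to the queue and is retried on another node.
            "OnDemandPercentageAboveBaseCapacity": 0,
            "SpotAllocationStrategy": "capacity-optimized",
        },
    },
)
```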

This will be orders of magnitude cheaper than any SaaS video transcoder on the market.

Transcoder node diagram

Key points for the system (AWS version)

  • SQS: Send your jobs to a queue. Here I’m using SQS, but any queue/stream service, or even Redis, should handle this job just fine.
    Remember to set your message visibility timeout longer than your longest-running transcode job, or the message will become visible again and get picked up by another worker (see the wiring sketch after this list).
  • Beanstalk: the worker environment tier will automatically read from SQS and POST each message to a handler in your code.
  • CPU metric: CPU utilization is the metric best suited for auto-scaling here, since transcoding mainly uses the machine’s CPU (unless you’re using GPU instances).
  • Docker: a Docker image is much easier to manage and update than a custom AMI. It also keeps your CI/CD simple, since you can update the OS and the code at the same time.
  • S3: it will make your life much easier to store both the originals and the transcoded outputs on S3. Make sure you separate them in a way that’s easy to manage: if they sit under different base prefixes, it’s much easier to create policies and other automated AWS behaviors.
  • CloudFront: even if you only intend to distribute to a small number of users, use CloudFront. In my tests, downloading through CloudFront increased download speeds considerably, even for the “first fetch”.
  • Sending messages: if you can get away without special logic, just use AWS automation to send events when new files land on S3 (see the wiring sketch after this list). https://docs.aws.amazon.com/AmazonS3/latest/dev/ways-to-add-notification-config-to-bucket.html
  • Stateless: remember to always keep your system stateless; this will make your life easier in the long run. If you really need to store some session data, use Redis instead of the worker node’s memory/filesystem.
  • Logging: Ship your logs to a central service like CloudWatch, Papertrail, or Logz.io. Beanstalk allows you to ship all machine logs to CloudWatch with the click of a button.
  • Instrumentation: Beanstalk offers some instrumentation out of the box and you can even configure custom metrics like RAM usage using agents inside the machine.
  • Sentry.io: an excellent option to capture all your application errors with complete stack traces and other useful context.
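
To make the SQS and S3 points above concrete, here is a rough sketch of the wiring. The queue name, bucket, ARN, and the two-hour timeout are placeholder assumptions, and the real queue also needs an access policy that allows S3 to deliver to it:

```python
import boto3

sqs = boto3.client("sqs")
s3 = boto3.client("s3")

# Visibility timeout comfortably above the longest expected transcode
# (2 hours here -- tune this to your own workload).
sqs.create_queue(
    QueueName="transcode-jobs",
    Attributes={"VisibilityTimeout": "7200"},
)

# Push an event onto the queue whenever an original lands under uploads/.
s3.put_bucket_notification_configuration(
    Bucket="my-video-bucket",
    NotificationConfiguration={
        "QueueConfigurations": [
            {
                "QueueArn": "arn:aws:sqs:us-east-1:123456789012:transcode-jobs",
                "Events": ["s3:ObjectCreated:*"],
                "Filter": {
                    "Key": {"FilterRules": [{"Name": "prefix", "Value": "uploads/"}]}
                },
            }
        ]
    },
)
```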

FFmpeg command

This command will output HLS. You can easily change it to any other format; I chose HLS in case you need it, since the information for it can be a bit scattered and it offers a lot of options.

FFmpeg command to transcode a video file to HLS
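
A sketch of what such a command can look like for two renditions. The resolutions, bitrates, and segment naming here are illustrative assumptions; verify the exact flags against your FFmpeg version:

```bash
ffmpeg -i "$DOWNLOAD_PATH" \
  -filter_complex "[0:v]split=2[v1][v2];[v1]scale=1280:720[v1out];[v2]scale=854:480[v2out]" \
  -map "[v1out]" -c:v:0 libx264 -b:v:0 3000k \
  -map "[v2out]" -c:v:1 libx264 -b:v:1 1000k \
  -map a:0 -c:a:0 aac -b:a:0 128k \
  -map a:0 -c:a:1 aac -b:a:1 96k \
  -f hls \
  -hls_time 6 \
  -hls_playlist_type vod \
  -master_pl_name master.m3u8 \
  -strftime 1 -strftime_mkdir 1 \
  -hls_segment_filename "$HLS_PATH/version_%v/%Y%m%d-%s.ts" \
  -var_stream_map "v:0,a:0 v:1,a:1" \
  "$HLS_PATH/version_%v/playlist.m3u8"
```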

  • DOWNLOAD_PATH: the path to the original file you want to transcode
  • HLS_PATH: the destination base folder
  • split: splits the input into multiple outputs
  • map: maps the different outputs to the specified formats
  • -f hls: instructs FFmpeg to output the HLS format
  • hls_playlist_type vod: makes sure the full playlist ends up inside the m3u8 file
  • strftime: special param that formats the filename based on the machine’s datetime and auto-creates the needed file structure (very useful)
  • version_%v: gives you a folder for each version (version_0, version_1, …)

The following script shows a very basic example of how an AWS SQS worker could process the videos.

Simple Python script to illustrate the system
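
A minimal sketch of such a worker. The environment variables, message format, and single-rendition FFmpeg call are simplified assumptions; on Beanstalk’s worker tier you’d replace the polling loop with the HTTP handler that receives the POSTed messages:

```python
import json
import logging
import os
import subprocess
import tempfile

import boto3

# Placeholder resources -- replace with your own queue and bucket.
QUEUE_URL = os.environ["QUEUE_URL"]
OUTPUT_BUCKET = os.environ["OUTPUT_BUCKET"]

sqs = boto3.client("sqs")
s3 = boto3.client("s3")


def transcode(download_path, hls_path):
    """Single-rendition HLS for brevity; swap in the full command above."""
    subprocess.run(
        [
            "ffmpeg", "-y", "-i", download_path,
            "-c:v", "libx264", "-c:a", "aac",
            "-f", "hls", "-hls_time", "6", "-hls_playlist_type", "vod",
            "-hls_segment_filename", os.path.join(hls_path, "%03d.ts"),
            os.path.join(hls_path, "playlist.m3u8"),
        ],
        check=True,
    )


def handle_message(body):
    # Assumes the S3 event notification format: bucket + key of the original.
    record = body["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = record["s3"]["object"]["key"]

    with tempfile.TemporaryDirectory() as workdir:
        download_path = os.path.join(workdir, os.path.basename(key))
        s3.download_file(bucket, key, download_path)

        hls_path = os.path.join(workdir, "hls")
        os.makedirs(hls_path)
        transcode(download_path, hls_path)

        # Keep outputs under their own base prefix (see the S3 point above).
        for name in os.listdir(hls_path):
            s3.upload_file(
                os.path.join(hls_path, name),
                OUTPUT_BUCKET,
                f"transcoded/{key}/{name}",
            )


def main():
    while True:
        resp = sqs.receive_message(
            QueueUrl=QUEUE_URL, MaxNumberOfMessages=1, WaitTimeSeconds=20
        )
        for msg in resp.get("Messages", []):
            try:
                handle_message(json.loads(msg["Body"]))
            except Exception:
                # Don't delete: after the visibility timeout the message
                # returns to the queue and is retried, possibly elsewhere.
                logging.exception("transcode failed")
                continue
            sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])


if __name__ == "__main__":
    main()
```

Note that the message is deleted only after a successful transcode, so a crashed or prematurely terminated spot node never loses a job; it just goes back to the queue.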

How does this all sound? Is there anything you’d like me to expand on? Let me know your thoughts in the comments section below (and hit the clap if this was useful)!

Stay tuned for the next post. Follow so you won’t miss it!
