Hello Everyone,

According to one survey, Netflix serves over 6 billion hours of content per month globally. Its backend is not a simple architecture; it is engineered to run efficiently at that scale.

15% of all internet bandwidth usage worldwide is attributed to Netflix.

In this article, let us analyze Netflix's backend architecture by looking at the individual system components.

3 MAIN COMPONENTS OF NETFLIX:

Netflix operates on two clouds: Amazon Web Services (AWS) and Open Connect.

Open Connect Appliances (OCA):

Open Connect is Netflix's custom global content delivery network (CDN). To deliver Netflix content to viewers, its servers, the Open Connect Appliances (OCAs), are placed inside the networks of internet service providers (ISPs) and at internet exchange points (IXPs) all over the world.

Client:

A client is any device on which you play a Netflix video, together with the programs on it that communicate with the Netflix servers.

The Netflix web app is written in React, a choice influenced by several factors, including startup speed, runtime performance, and modularity.

Backend:

The backend includes databases, servers, logging frameworks, application monitoring, recommendation engines, background services, and more. When a user loads the Netflix app, the backend servers in AWS handle all requests.

These backend services include AWS EC2 instances, AWS S3, AWS DynamoDB, Cassandra, Hadoop, Kafka, and others.

Now let us go deeper into the backend architecture:

Backend Architecture:

Each part of the system is a collaboration among a number of loosely coupled services. The microservice design makes it possible to deliver large, complex applications rapidly, frequently, and reliably. An illustration of the backend architecture can be found below.

The client sends a Play request to the backend running on AWS. Netflix uses Amazon's Elastic Load Balancer (ELB) service to route traffic to its services.

AWS ELB forwards that request to the API Gateway Service. Netflix's API gateway is Zuul, which was designed to support dynamic routing, traffic monitoring, security, and resilience to failures at the edge of the cloud deployment.

The application API components contain the core business logic behind Netflix operations. There are several types of API that correspond to different user activities, such as the Signup API and the Discovery/Recommendation API for retrieving movie recommendations. In this case, the Play API handles the request forwarded by the API Gateway Service.

To fulfill the request, the Play API calls a microservice or a sequence of microservices. Microservices are mostly small, stateless programs that can call one another, and there can be thousands of them.
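The request path described so far (gateway, then an application API, then one or more microservices) can be sketched as a toy in-process model. This is only an illustration of prefix-based routing, not Zuul's actual API: the Gateway class, the route prefixes, the handler functions, and the example URL are all hypothetical names chosen for this sketch.

```python
from typing import Callable, Dict

# Hypothetical microservices that a Play API might call (illustrative only).
def check_entitlement(user_id: str, title_id: str) -> bool:
    return bool(user_id and title_id)

def pick_stream_url(title_id: str) -> str:
    return f"https://oca.example.invalid/{title_id}/manifest"

def play_api(request: dict) -> dict:
    """Application API: holds the business logic and calls microservices."""
    if not check_entitlement(request["user_id"], request["title_id"]):
        return {"status": 403}
    return {"status": 200, "url": pick_stream_url(request["title_id"])}

def signup_api(request: dict) -> dict:
    return {"status": 201, "user_id": request["email"].split("@")[0]}

class Gateway:
    """Toy edge gateway: routes a request by its longest matching path prefix."""

    def __init__(self) -> None:
        self.routes: Dict[str, Callable[[dict], dict]] = {}

    def register(self, prefix: str, handler: Callable[[dict], dict]) -> None:
        self.routes[prefix] = handler

    def handle(self, path: str, request: dict) -> dict:
        matches = [p for p in self.routes if path.startswith(p)]
        if not matches:
            return {"status": 404}
        return self.routes[max(matches, key=len)](request)

gateway = Gateway()
gateway.register("/play", play_api)
gateway.register("/signup", signup_api)

response = gateway.handle("/play/start", {"user_id": "u1", "title_id": "t42"})
print(response)
```

In a real deployment each handler would be a network call to a separate service, and routes could be changed at runtime, which is the "dynamic routing" part that a gateway like Zuul provides.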

Microservices can send user activity events or other data to the Stream Processing Pipeline, either for real-time processing of personalized recommendations or for batch processing of business intelligence tasks.

The output of the stream processing pipeline can be persisted to other data stores such as AWS S3, Hadoop HDFS, and Cassandra.

Hystrix & Titus:

Hystrix is a latency and fault tolerance library for distributed services. In any distributed environment with many service dependencies, some of those dependencies will inevitably fail at some point. As more services are added, and as some are taken down or simply fail, monitoring the health and state of every service becomes difficult.

Hystrix helps by providing a straightforward dashboard. The Hystrix library controls the interactions between these distributed services by adding latency tolerance and fault tolerance logic.
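The fault tolerance logic mentioned above is commonly built around the circuit breaker pattern with a fallback. The following is a minimal sketch of that pattern in Python, not Hystrix's own (Java) API; the CircuitBreaker class, its parameters, and the flaky service are hypothetical names used for illustration.

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker: after repeated failures, skip the call and use a fallback."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                return fallback()  # open: do not touch the failing dependency
            # Timeout elapsed: fall through and allow one trial call (half-open).
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            return fallback()
        # Success: close the circuit and reset the failure count.
        self.failures = 0
        self.opened_at = None
        return result

calls = []

def flaky_service():
    calls.append(1)
    raise ConnectionError("dependency is down")

breaker = CircuitBreaker(failure_threshold=2, reset_timeout=60.0)
results = [breaker.call(flaky_service, lambda: "cached-default") for _ in range(5)]
print(results, "real calls made:", len(calls))
```

Only the first two requests reach the failing dependency; the rest are answered immediately from the fallback, which keeps a slow or broken service from tying up its callers.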

Titus, by contrast, is a container management platform that provides scalable, reliable container execution and cloud-native integration with Amazon AWS.

It is built on Apache Mesos, a system for managing clusters of machines and sharing their resources. Netflix runs Titus in production to manage thousands of AWS EC2 instances and launch hundreds of thousands of containers every day for batch and service workloads. Think of it as Netflix's counterpart to Kubernetes.

Stream Processing Pipeline:

Have you noticed that the image shown for each video is chosen specifically for you? Based on the information gathered from your viewing habits and interests, Netflix tries to select the artwork that highlights the aspect of a video most relevant to you.

The Stream Processing Data Pipeline has become Netflix's data backbone for business analytics and personalized recommendation tasks. It is responsible for producing, collecting, processing, aggregating, and moving all microservice events to other data processors in near real time.

This data must be processed sequentially and incrementally, either record by record or over sliding time windows, and it feeds a variety of analytics including correlations, aggregations, filtering, and sampling.
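To make "sliding time windows" concrete, here is a minimal sketch of a per-title play counter over the last 60 seconds of events. It assumes events arrive in timestamp order; the SlidingWindowCounter class and the sample events are hypothetical and are not taken from Netflix's pipeline.

```python
from collections import deque

class SlidingWindowCounter:
    """Counts events per key over the most recent `window_seconds` of event time."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()  # (timestamp, key) pairs in arrival order
        self.counts = {}

    def add(self, timestamp, key):
        # Process one record, then drop everything that fell out of the window.
        self.events.append((timestamp, key))
        self.counts[key] = self.counts.get(key, 0) + 1
        self._evict(timestamp)

    def _evict(self, now):
        cutoff = now - self.window_seconds
        while self.events and self.events[0][0] <= cutoff:
            _, old_key = self.events.popleft()
            self.counts[old_key] -= 1
            if self.counts[old_key] == 0:
                del self.counts[old_key]

window = SlidingWindowCounter(window_seconds=60)
for ts, title in [(0, "A"), (10, "B"), (30, "A"), (70, "A"), (95, "B")]:
    window.add(ts, title)

print(window.counts)
```

Each record updates the aggregate incrementally, so the counter never has to rescan old data, which is the property that makes record-by-record processing practical at high event rates.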

Using Apache Spark for Analyzing Streaming Data:

For movie recommendations, Netflix uses Apache Spark and machine learning. Apache Spark is an open-source unified analytics engine for large-scale data processing.

When a user request arrives, personalized content is generated for that user from aggregated play popularity (the number of times a video is played) and take rate (the ratio of plays to impressions for a given video), together with other explicit signals such as members' viewing histories and past ratings. The following figure depicts the end-to-end infrastructure used to generate movie recommendations.
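The two aggregate signals defined above are simple to compute. The sketch below calculates take rate as plays divided by impressions and sorts a small catalog by it; the VideoStats class, the sample numbers, and the ranking rule are assumptions made for illustration, since the real recommender combines many more signals.

```python
from dataclasses import dataclass

@dataclass
class VideoStats:
    title: str
    plays: int        # play popularity: number of times the video was played
    impressions: int  # number of times the video was shown to members

    @property
    def take_rate(self) -> float:
        # Take rate = plays / impressions; defined as 0.0 when never shown.
        return self.plays / self.impressions if self.impressions else 0.0

def rank_by_take_rate(stats):
    # Assumed rule: highest take rate first, play count as the tie-breaker.
    return sorted(stats, key=lambda s: (s.take_rate, s.plays), reverse=True)

catalog = [
    VideoStats("Title A", plays=500, impressions=2000),
    VideoStats("Title B", plays=90, impressions=100),
    VideoStats("Title C", plays=0, impressions=0),
]

ranked = [s.title for s in rank_by_take_rate(catalog)]
print(ranked)
```

Note that Title A has far more plays than Title B but a lower take rate, which is why popularity and take rate are tracked as separate signals.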

Using SSDs for Caching:

Caching is typically done in RAM, but storing large amounts of data in RAM is expensive. Netflix therefore decided to move some of its cache data to SSDs.

Modern SSD-based storage offers fast data access at a much lower cost than RAM: 1 TB of SSD storage costs substantially less than 1 TB of RAM.
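One way to picture this trade-off is a two-tier cache: a small, fast RAM tier for the hottest entries and a larger, cheaper tier for everything that overflows. The sketch below is a hypothetical illustration, not Netflix's implementation; the TieredCache class is invented for this example, and the "SSD" tier is just a second in-memory dictionary standing in for an SSD-backed store.

```python
from collections import OrderedDict

class TieredCache:
    """Two-tier LRU cache: entries evicted from RAM are demoted to a larger tier."""

    def __init__(self, ram_capacity, ssd_capacity):
        self.ram = OrderedDict()
        self.ssd = OrderedDict()  # stand-in for an SSD-backed store
        self.ram_capacity = ram_capacity
        self.ssd_capacity = ssd_capacity

    def put(self, key, value):
        self.ssd.pop(key, None)  # keep a key in one tier only
        self.ram[key] = value
        self.ram.move_to_end(key)
        if len(self.ram) > self.ram_capacity:
            # Demote the least recently used RAM entry to the second tier.
            old_key, old_value = self.ram.popitem(last=False)
            self.ssd[old_key] = old_value
            if len(self.ssd) > self.ssd_capacity:
                self.ssd.popitem(last=False)  # drop the oldest entry entirely

    def get(self, key):
        if key in self.ram:
            self.ram.move_to_end(key)
            return self.ram[key]
        if key in self.ssd:
            value = self.ssd.pop(key)
            self.put(key, value)  # promote back to RAM on access
            return value
        return None  # cache miss

cache = TieredCache(ram_capacity=2, ssd_capacity=3)
for key, value in [("a", 1), ("b", 2), ("c", 3)]:
    cache.put(key, value)

print(cache.get("a"), list(cache.ram), list(cache.ssd))
```

Frequently used entries stay in the expensive tier, while the cheaper tier absorbs the long tail that would otherwise be evicted and fetched again from the database.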

Netflix uses Elasticsearch for Error Logging and Monitoring:

Netflix uses Elasticsearch for data visualization, customer support, and error detection in its systems.
Elasticsearch is a search engine based on the Lucene library. It provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents.
With Elasticsearch, users can quickly check the state of the system and troubleshoot failures from error logs.
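The core idea behind full-text search over schema-free documents is an inverted index that maps each word to the documents containing it. The toy below illustrates that idea on a few made-up error logs; it does not use the Elasticsearch API, and the TinyLogIndex class, the field names, and the log entries are all hypothetical.

```python
import re
from collections import defaultdict

class TinyLogIndex:
    """Toy inverted index over schema-free JSON-like documents."""

    def __init__(self):
        self.docs = {}
        self.postings = defaultdict(set)  # token -> ids of documents containing it

    @staticmethod
    def _tokens(text):
        return re.findall(r"[a-z0-9]+", text.lower())

    def index(self, doc_id, doc):
        # Documents need not share fields: every value is tokenized as text.
        self.docs[doc_id] = doc
        for value in doc.values():
            for token in self._tokens(str(value)):
                self.postings[token].add(doc_id)

    def search(self, query):
        # Return documents that contain every token in the query.
        tokens = self._tokens(query)
        if not tokens:
            return []
        ids = set.intersection(*(self.postings.get(t, set()) for t in tokens))
        return [self.docs[i] for i in sorted(ids)]

logs = TinyLogIndex()
logs.index(1, {"level": "ERROR", "service": "playback", "message": "Timeout calling license server"})
logs.index(2, {"level": "WARN", "message": "Slow response from license server", "latency_ms": 900})
logs.index(3, {"level": "ERROR", "service": "signup", "message": "Payment declined", "device": "tv"})

hits = logs.search("error timeout")
print(hits)
```

A real search engine adds relevance scoring, sharding across machines, and persistence on top of this structure, but the lookup step follows the same principle.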

OUTRO:

I hope this article helped you understand how Netflix's backend works. We saw an overview of the backend architecture and looked at the individual system components. Netflix's backend is much bigger than this: it runs a huge number of microservices and APIs to serve us without buffering.

https://netflixtechblog.com/engineering-trade-offs-and-the-netflix-api-re-architecture-64f122b277dd?gi=ac2804c64c9

Thank you for Reading 😊

HAPPY LEARNING
