Backend Engineers Guide

API Communication Protocols

API communication protocols describe how different software components exchange data and invoke functionality across networks. They define the transport mechanisms, data formats, interaction styles, and often how developers should structure their requests and responses. These protocols are often cho...

REST

Representational State Transfer, often referred to as REST, is an architectural style used to design web services. It uses a stateless communication model between clients and servers, relies on standard HTTP methods, and focuses on simple but powerful conventions...

gRPC

gRPC is a high-performance open-source framework that was developed at Google for remote procedure calls. It uses the Protocol Buffers (protobuf) serialization format by default and runs over HTTP/2 to support features like full-duplex streaming and efficient compression. Many microservices architec...

State Management

Stateful and stateless designs are common terms in software architecture. They describe how an application handles data over multiple interactions. This set of notes explains the differences between applications that remember information between requests and those that treat every request as a fresh...

Data Transmission

Data transmission in API design describes how information moves between a client and a server. It includes the request format, response format, protocol, headers, authentication details, compression behavior, caching rules, and error-handling signals. Good transmission design helps APIs remain fast...

GraphQL

GraphQL is a query language for APIs that allows clients to request exactly the data they need in a single request. It provides a type system to describe data and offers a more efficient, flexible, and powerful alternative to traditional REST-based architectures. These notes explore the fundamentals...

Web Server Overview

Backend engineers are responsible for setting up and maintaining servers that host web applications, APIs, background jobs, and databases. A web server is not just a machine that returns files. In modern systems, it may also route traffic, terminate HTTPS, proxy requests to application services, enf...

Forward Proxies

A forward proxy sits between clients and the wider internet. Instead of connecting directly to an external service, the client sends the request to the proxy, and the proxy makes the outbound connection on the client's behalf. This pattern is commonly used for egress control, caching, auditing, and ...

Tomcat

Apache Tomcat, often referred to as Tomcat, is an open-source web server and servlet container that implements the Java Servlet, JavaServer Pages (JSP), and WebSocket specifications. Maintained by the Apache Software Foundation, Tomcat serves as a robust and lightweight platform for hosting Java-bas...

Static and Dynamic Content

Web servers deliver two main types of content: static and dynamic. Static content usually consists of files such as HTML, CSS, images, videos, and JavaScript bundles that already exist on the server. These files are served directly to the client without needing extra processing...

Nginx

Nginx is a high-performance web server, reverse proxy, and load balancer that has grown popular for its speed, scalability, and flexibility. It can serve static files extremely quickly, proxy requests to application servers, balance traffic across multiple backends, terminate SSL/TLS connections, an...

Load Balancing

Load balancing is central to designing robust distributed systems. It distributes incoming requests or workloads across multiple servers so that no single machine becomes overloaded. Instead of clients connecting directly to one backend server, they connect to a load balancer, which decides which se...

Apache

Apache HTTP Server (commonly referred to as "Apache") is one of the most widely used web servers in the world. It is maintained by the Apache Software Foundation and offers robust, flexible, and highly configurable capabilities for serving static and dynamic content. Over the decades, Apache has bec...

Reverse Proxies

A reverse proxy is a server that receives incoming requests from external clients and forwards them to one or more internal servers. To the client, the reverse proxy looks like the application server. Behind the scenes, the proxy decides where the request should go...

Supply Chain Attacks

A supply chain attack targets the tools, dependencies, build systems, or distribution channels that an application relies on. Instead of attacking the application directly, the attacker compromises something the application already trusts...

TLS

Transport Layer Security, commonly abbreviated as TLS, is a cryptographic protocol that protects data transmitted over computer networks. It is the modern successor to SSL, or Secure Sockets Layer. Although people still often say "SSL certificate" or "SSL connection," most modern secure web traffic ...

Secure Containers

Containers package an application together with its dependencies, providing portability and reproducibility. That same packaging surface can, however, introduce security risks if images are built carelessly, runtimes are misconfigured, or containers run with unnecessary privileges. This document cov...

Security Vulnerabilities

Backend systems form the foundation of web applications, APIs, and data-driven services. Because the backend often handles authentication, authorization, business logic, databases, payments, personal information, and integrations with other systems, security weaknesses in this layer can have serious...

Credentials Management

Applications almost always need some form of secret. These secrets may include database passwords, API keys, TLS private keys, OAuth client secrets, webhook signing keys, encryption keys, and credentials for third-party services...

Auth

Authentication is the process of verifying who a user is. Authorization is the process of deciding what that user is allowed to access or do...

Third-Party Cookies Vulnerabilities

Third-party cookies are cookies set by a domain other than the website the user is directly visiting. For example, a user may visit news.example.com, but that page may load ads, analytics scripts, social widgets, or tracking pixels from another domain. That external domain can set or receive its own...

Security Best Practices and Measures

Security is a multi-layered concern involving networks, systems, applications, data, users, and operational processes. A single misconfiguration, weak password, missing patch, exposed secret, or insecure dependency can give attackers a path into a system...

Deployment Strategies

A deployment strategy defines how a new version of an application is released to production. The right strategy balances risk, speed, infrastructure cost, and rollback complexity...

CI/CD

Continuous Integration (CI) and Continuous Delivery/Deployment (CD) automate the path from a code commit to running software in production. CI catches integration problems early by building and testing on every commit. CD takes those validated artifacts and delivers them to one or more environments ...

Kubernetes

Kubernetes (K8s) is an open-source container orchestration platform that automates the deployment, scaling, and management of containerised applications across a cluster of machines...

Infrastructure as Code

Infrastructure as Code (IaC) is the practice of defining and managing infrastructure (servers, networks, databases, load balancers) through machine-readable configuration files instead of manual processes or ad-hoc scripts. Changes are committed to version control, reviewed, and applied automaticall...

Docker

Docker packages an application and all of its dependencies into a lightweight, portable unit called a container. Containers share the host OS kernel but run in isolated namespaces, so they typically start in about a second or less and consume far less memory than virtual machines...

WebSockets

WebSockets introduce an event-driven, two-way communication channel between clients and servers over a single TCP connection. Unlike traditional HTTP request-response systems, where the client sends a request and waits for the server to reply, WebSockets allow both sides to send messages whenever th...

Network Communications

Network communications in a backend context involve the flow of data between clients and server-side systems. A client might be a browser, a mobile app, another backend service, an API gateway, or a scheduled job. The server might be a REST API, GraphQL API, gRPC service, WebSocket server, database...

TCP and UDP

Transmission Control Protocol, or TCP, and User Datagram Protocol, or UDP, are foundational Internet protocols that operate on top of IP. IP is responsible for addressing and routing packets between devices, while TCP and UDP define how applications send and receive data through ports on those devic...

HTTP Protocol

Hypertext Transfer Protocol, or HTTP, is the foundational communication protocol of the World Wide Web. It defines how clients and servers exchange messages, how requests are structured, and how responses are returned. A client might be a browser, mobile app, command-line tool, or backend service. A...

Metrics and Analysis

In modern distributed systems, the performance and reliability of communication channels, APIs, and network infrastructure directly affect user experience. A user may not know whether a delay comes from a database, an overloaded API server, packet loss, or a slow dependency, but they will notice tha...

ETL and Pipelines

ETL and ELT are foundational patterns for moving, cleaning, reshaping, and delivering data between systems...

Stream Processing

Stream processing involves ingesting, analyzing, and acting on data as it is produced. Instead of waiting for a complete batch of data to be collected, a stream processing system handles events continuously as they arrive...

Batch Processing

Batch processing is a method for handling large volumes of data by grouping records into a single batch and processing them together. Unlike real-time or stream processing, batch processing does not usually require immediate results. Instead, data is collected over a period of time and processed on ...

Workflow Orchestration

As data pipelines grow beyond a single script, they become networks of interdependent steps that must run in the right order, on a schedule, with retries on failure, and with observable state. Workflow orchestration is the discipline of managing this complexity: defining the execution order of tasks...

Lambda and Kappa Architecture

As data volumes grew and real-time analytics became a business requirement, engineers needed architectural patterns that could handle both historical re-computation and low-latency stream processing reliably. Lambda and Kappa architectures are the two dominant answers to that challenge...

XML

XML, or Extensible Markup Language, is a W3C-standardized markup language designed to encode documents in a format that is both human-readable and machine-readable. Unlike HTML, which has a fixed set of tags, XML lets you define your own vocabulary of elements to represent arbitrary data structures...

Algorithms Summary

The following algorithms and data structures appear frequently in backend and distributed system design. They are useful because system design often involves trade-offs around scale, latency, storage, consistency, correctness, and fault tolerance...

Coordination Services

In large-scale distributed systems, many processes, microservices, or nodes must work together while running on different machines. These systems need a reliable way to agree on shared state, detect failures, elect leaders, coordinate ownership, and distribute configuration...

Indexes

Indexing is one of the most effective ways to optimize database queries. By maintaining auxiliary data structures that map certain key values to their physical or logical locations, indexes allow a database to rapidly locate rows that match a search condition. This reduces the number of full-table s...

Isolation Levels

Isolation levels in relational-database systems govern how simultaneously running transactions perceive one another's changes. They sit on a spectrum that trades consistency guarantees (how "correct" every read is) against concurrency (how many transactions can safely overlap). Choosing the right level ...

Transactions

Database transactions are a cornerstone of reliable data management. They let an application bundle multiple low-level reads and writes into a single, all-or-nothing unit so the database moves cleanly from one consistent state to another, even when dozens of users race to change the same rows or hard...

Halloween Problem

The Halloween Problem is a database execution-plan issue where an UPDATE operation could theoretically update the same row more than once if the database scans rows through an access path that is changed by the update itself...
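The effect described above can be reproduced with a small in-memory simulation. This is only a sketch under stated assumptions: the salary-ordered "index" is a plain sorted list, and the function names (`naive_raise`, `snapshot_raise`) and the sample data are hypothetical, not taken from any database engine.

```python
import bisect


def naive_raise(salaries, threshold, raise_by):
    """Scan a salary-ordered index and update rows in place while scanning.

    Because each update moves the row forward in the same index the scan
    is walking, the scan can meet the row again and update it repeatedly.
    """
    index = sorted((salary, name) for name, salary in salaries.items())
    updates = 0
    pos = 0
    while pos < len(index):
        salary, name = index[pos]
        if salary < threshold:
            # Remove the old index entry and re-insert at the new position.
            del index[pos]
            bisect.insort(index, (salary + raise_by, name))
            updates += 1
            # Do not advance: whatever now sits at `pos` is the next entry.
        else:
            pos += 1
    return {name: salary for salary, name in index}, updates


def snapshot_raise(salaries, threshold, raise_by):
    """Collect qualifying rows first, then update each exactly once."""
    targets = [name for name, salary in salaries.items() if salary < threshold]
    result = dict(salaries)
    for name in targets:
        result[name] += raise_by
    return result, len(targets)


staff = {"ann": 1000, "bob": 2000}
print(naive_raise(staff, 3000, 500))     # rows are raised until they pass the threshold
print(snapshot_raise(staff, 3000, 500))  # each row is raised once
```

Separating the read phase from the write phase, as `snapshot_raise` does, is one common way to avoid the problem; real engines use their own mechanisms, which this sketch does not attempt to model.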

Memcached

Memcached is a high-performance, distributed, in-memory key-value cache. It is designed for one main purpose: storing small pieces of frequently accessed data in RAM so applications can avoid repeatedly querying slower systems such as relational databases, APIs, or disk-backed storage...

Database Caching

Database caching stores frequently requested database query results, rows, aggregates, or relational lookup data in a faster layer so the application does not repeatedly execute the same expensive database operations. The main goal is to reduce database read load, lower query latency, and protect th...

Redis

Redis is a high-performance, in-memory data store commonly used as a cache, message broker, session store, rate limiter, leaderboard engine, and fast key-value database. It is often described as a data structure server because it supports rich built-in data types such as strings, lists, sets, sorted...

Application-Level Caching

Application-level caching stores computed results or frequently accessed objects directly inside the running process or in a dedicated in-process store. Because data never leaves the application's memory space, reads are limited only by CPU and memory bandwidth: no network hop, no serialisation, an...

Memory Map

mmap, short for memory map, is an operating system mechanism that maps a file or device directly into a process's virtual memory address space. Instead of reading file data explicitly with read() into a buffer, the application can access the file as if it were an array in memory. The operating syste...
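As an illustration of accessing a file "as if it were an array", the sketch below uses Python's standard `mmap` module. The helper names (`read_slice_with_mmap`, `demo`) and the temporary file contents are assumptions made for the example.

```python
import mmap
import os
import tempfile


def read_slice_with_mmap(path, start, length):
    """Map the whole file read-only and return a byte slice from it."""
    with open(path, "rb") as f:
        # Length 0 means "map the entire file"; the file must not be empty.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            # Slicing the mapping works like slicing a bytes object.
            return mapped[start:start + length]


def demo():
    """Write a small temporary file, then read part of it through mmap."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"hello, memory-mapped world")
        return read_slice_with_mmap(path, 7, 13)
    finally:
        os.remove(path)


print(demo())
```

No explicit `read()` call appears in `read_slice_with_mmap`; the slice is served from the mapping, and the operating system decides which pages of the file to load.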

HTTP Caching

HTTP caching is the process of storing copies of HTTP responses so that future requests can be served without contacting the origin server. It operates at multiple layers (the browser, intermediate proxies, reverse proxies, and CDN edge nodes) and is controlled primarily through standardised HTTP ...

Static Python Website Netlify

Netlify allows you to easily deploy and manage static websites. A Python-based static site generator like Pelican, MkDocs, or Frozen-Flask produces HTML files that Netlify serves through its global CDN...

CentOS DigitalOcean

DigitalOcean provides cloud-based virtual machines called Droplets that let you deploy and manage CentOS servers. The overall flow looks like this...

Pub/Sub vs Queue

Message queues enable asynchronous, decoupled communication in distributed systems by allowing publishers to send messages to a queue that consumers process independently, typically in first-in, first-out order. This approach reduces direct dependencies between services, enhances reliability and sca...

YAML

YAML, which stands for "YAML Ain't Markup Language," is a human-readable data serialization format designed for configuration files and data exchange between systems. Unlike XML or JSON, YAML relies on whitespace and indentation rather than brackets or tags, making it feel closer to natural prose th...

JSON

JSON, or JavaScript Object Notation, is a lightweight data-interchange format that is easy for humans to read and write and easy for machines to parse and generate. Despite originating from JavaScript, the format is completely language independent and supported by virtually every modern programming ...

Operational Transform

Operational Transform is a foundational technique in distributed systems that enables real-time collaborative editing of shared documents. Originally proposed by Ellis and Gibbs in 1989, OT allows multiple users to concurrently modify the same document while preserving consistency across all partici...

Linearizability

Linearizability is a consistency model that makes a distributed system appear as if there is only a single copy of the data, and every operation takes effect atomically at some point between its invocation and its response. Even when data is replicated across multiple nodes, a linearizable system gu...

Concurrent Writes

Concurrent writes happen when two or more clients write to the same key in a database at the same time, each unaware of the other's write. In replicated systems, these writes may arrive at different replicas in different orders, causing the replicas to diverge and hold conflicting values. Without a ...

Gossip Protocol

The Gossip Protocol is a peer-to-peer communication technique in distributed systems where nodes share information by randomly selecting partners and exchanging state, much like how rumors spread through a social network. It is especially useful in large clusters where nodes frequently join or leave...

Caching Strategies

Caching is a technique used to speed up data retrieval by placing frequently accessed or computationally heavy information closer to the application or the end user. Below is an expanded set of notes on caching, presented with ASCII diagrams and bullet points that emphasize key considerations. Each ...

Content Delivery Networks

A Content Delivery Network (CDN) is a geographically distributed system of servers that delivers web assets such as images, videos, stylesheets, and scripts to each user from a server close to them. By placing cached copies of content at strategic locations around the world, CDNs drast...

Messaging System Integration

In modern distributed architectures, messaging systems form an essential backbone for decoupling services, handling asynchronous communication, and enabling more resilient data flows. They allow separate applications or microservices to interact by sending and receiving messages through well-defined...

Protocol Buffers

Protocol Buffers (often referred to as protobuf) is a language-neutral, platform-independent method for serializing structured data. Originally created at Google, it excels at enabling efficient data interchange between services, storing information in a compact binary format, and sustaining backwar...

Optimistic vs Pessimistic Locking

Locking is about managing concurrent access to shared data. Engineers often make it sound harder than it is, but the core idea is simple: choose between optimistic or pessimistic approaches depending on how costly retries are...
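The two approaches can be contrasted in a few lines. The following is a minimal in-process sketch, not a database implementation: `VersionedStore`, `optimistic_increment`, and `PessimisticCounter` are hypothetical names, and a version counter stands in for whatever change-detection mechanism a real system would use.

```python
import threading


class VersionedStore:
    """Key-value store where every value carries a version number."""

    def __init__(self):
        self._data = {}  # key -> (value, version)
        self._lock = threading.Lock()  # guards only the dict itself

    def read(self, key):
        with self._lock:
            return self._data.get(key, (None, 0))

    def write_if_unchanged(self, key, new_value, expected_version):
        """Optimistic write: succeed only if nobody wrote since our read."""
        with self._lock:
            _, current_version = self._data.get(key, (None, 0))
            if current_version != expected_version:
                return False  # conflict detected; caller decides to retry
            self._data[key] = (new_value, current_version + 1)
            return True


def optimistic_increment(store, key, max_retries=5):
    """Read, compute, then attempt the write; retry on conflict."""
    for _ in range(max_retries):
        value, version = store.read(key)
        if store.write_if_unchanged(key, (value or 0) + 1, version):
            return True
    return False


class PessimisticCounter:
    """Pessimistic variant: hold the lock for the whole read-modify-write."""

    def __init__(self):
        self._lock = threading.Lock()
        self.value = 0

    def increment(self):
        with self._lock:
            self.value += 1
```

The optimistic path does no waiting but may repeat work when a conflict is detected, while the pessimistic path never retries but makes other writers wait. That is the retry-cost trade-off the paragraph above refers to.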

Data Warehousing

Data warehousing unifies large volumes of information from different sources into a centralized repository that supports analytics, reporting, and strategic decision-making. By collecting operational data, transforming it, and then loading it into one or more specialized databases, data warehouses a...

Replication

Replication is a method of maintaining copies of data across multiple nodes in distributed systems, making it useful for improving availability, reducing latency, and distributing load. Below are detailed notes, organized in bullet points, each containing one highlighted word in the middle to emphas...

Types of Databases

Databases store and organize data so that applications and users can retrieve, manage, and manipulate information efficiently. The choice of database often depends on data structure requirements, scale, performance expectations, and the nature of the workload. Over the years, numerous types of datab...