Backend Engineers Guide

XML 🇺🇸

XML, or Extensible Markup Language, is a W3C-standardized markup language designed to encode documents in a format that is both human-readable and machine-readable. Unlike HTML, which has a fixed set of tags, XML lets you define your own vocabulary of elements to represent arbitrary data structures...

YAML 🇺🇸

YAML, which stands for "YAML Ain't Markup Language," is a human-readable data serialization format designed for configuration files and data exchange between systems. Unlike XML or JSON, YAML relies on whitespace and indentation rather than brackets or tags, making it feel closer to natural prose th...

JSON 🇺🇸

JSON, or JavaScript Object Notation, is a lightweight data-interchange format that is easy for humans to read and write and easy for machines to parse and generate. Despite originating from JavaScript, the format is completely language independent and supported by virtually every modern programming ...

Pub Sub vs Queue 🇺🇸

Message queues enable asynchronous, decoupled communication in distributed systems by allowing publishers to send messages to a queue that consumers process independently, typically in first-in, first-out order. This approach reduces direct dependencies between services, enhances reliability and sca...

Content Delivery Networks 🇺🇸

A Content Delivery Network (CDN) is a geographically distributed system of servers that deliver web assets such as images, videos, stylesheets, and scripts to each user from the server closest to them. By placing cached copies of content at strategic locations around the world, CDNs drast...

Caching Strategies 🇺🇸

Caching is a technique used to speed up data retrieval by placing frequently accessed or computationally heavy information closer to the application or the end user. Below is an expanded set of notes on caching, presented with ASCII diagrams and bullet points that emphasize key considerations. Each ...

Redis 🇺🇸

Redis is an open-source, in-memory data store that serves as a high-performance cache, message broker, and general-purpose database. It is often called a "data structure server" because it natively supports rich types like strings, lists, sets, sorted sets, and hashes. Because every operation happen...

Static Python Website Netlify 🇺🇸

Netlify allows you to easily deploy and manage static websites. A Python-based static site generator like Pelican, MkDocs, or Frozen-Flask produces HTML files that Netlify serves through its global CDN...

CentOS Digital Ocean 🇺🇸

Digital Ocean provides cloud-based virtual machines called Droplets that let you deploy and manage CentOS servers. The overall flow looks like this...

Transactions 🇺🇸

Database transactions are a cornerstone of reliable data management. They let an application bundle multiple low-level reads and writes into a single, all-or-nothing unit so the database moves cleanly from one consistent state to another, even when dozens of users race to change the same rows or hard...

Halloween Problem 🇺🇸

The Halloween Problem is a notorious pitfall in relational-database execution plans. First observed by IBM System R researchers on October 31, 1976 (hence the spooky nickname), the phenomenon occurs when an UPDATE statement unwittingly revisits and modifies the same row more than once. Each extra pass ...

Gossip Protocol 🇺🇸

The Gossip Protocol is a peer-to-peer communication technique in distributed systems where nodes share information by randomly selecting partners and exchanging state, much like how rumors spread through a social network. It is especially useful in large clusters where nodes frequently join or leave...

Operational Transform 🇺🇸

Operational Transform is a foundational technique in distributed systems that enables real-time collaborative editing of shared documents. Originally proposed by Ellis and Gibbs in 1989, OT allows multiple users to concurrently modify the same document while preserving consistency across all partici...

Linearizability 🇺🇸

Linearizability is a consistency model that makes a distributed system appear as if there is only a single copy of the data, and every operation takes effect atomically at some point between its invocation and its response. Even when data is replicated across multiple nodes, a linearizable system gu...

Algorithms Summary 🇺🇸

The following algorithms and data structures appear frequently in backend and distributed system design. Each entry explains what the structure is, how it works at a high level, why it matters, and where it is used in practice...

Concurrent Writes 🇺🇸

Concurrent writes happen when two or more clients write to the same key in a database at the same time, each unaware of the other's write. In replicated systems, these writes may arrive at different replicas in different orders, causing the replicas to diverge and hold conflicting values. Without a ...

Database Caching 🇺🇸

Database caching stores frequently used query results or objects in a cache, bringing them closer to the application for faster data retrieval. This reduces load on the primary database and shortens response times, ultimately improving user experience...

Security Vulnerabilities 🇺🇸

Backend systems form the backbone of web applications, and their security vulnerabilities can pose significant threats to data integrity and user privacy. This guide outlines common backend vulnerabilities with concrete examples and provides best practices to mitigate these risks...

Security Best Practices and Measures 🇺🇸

Security is a multi-layered concern involving networks, systems, applications, and user access. A single misconfiguration or overlooked patch can allow attackers to breach systems, steal data, or disrupt services. This document aims to outline key security best practices and measures organizations s...

TLS 🇺🇸

Transport Layer Security, commonly abbreviated as TLS, is a cryptographic protocol that protects data transmissions over computer networks. It succeeds the older SSL (Secure Sockets Layer), and though the term "SSL" is still widely used, most modern "SSL" connections are really TLS. The protocol aim...

Auth 🇺🇸

Authentication is the process of verifying a user's identity, while authorization is the management of access rights to resources...

Third Party Cookies Vulnerabilities 🇺🇸

Third-party cookies are often inserted into a user's browser by domains other than the website the user is directly visiting. While first-party cookies (from the visited domain) are essential for maintaining user sessions and preferences, third-party cookies commonly facilitate cross-site tracking a...

Nginx 🇺🇸

Nginx is a high-performance web server, reverse proxy, and load balancer that has grown popular for its speed, scalability, and flexibility. It can serve static files extremely quickly, proxy requests to application servers, balance traffic across multiple backends, terminate SSL/TLS connections, an...

Load Balancing 🇺🇸

Load balancing is central to designing robust distributed systems. It ensures that incoming requests or workloads are equitably distributed across multiple servers or nodes, thereby preventing any single server from becoming a bottleneck. This technique also boosts system resilience, providing highe...

Web Server Overview 🇺🇸

Backend engineers are responsible for setting up and maintaining servers that host web applications, APIs, and databases. A solid understanding of server management principles is crucial for delivering robust, high-performing, and secure systems...

Tomcat 🇺🇸

Apache Tomcat, often referred to as Tomcat, is an open-source web server and servlet container that implements the Java Servlet, JavaServer Pages (JSP), and WebSocket specifications. Maintained by the Apache Software Foundation, Tomcat serves as a robust and lightweight platform for hosting Java-bas...

Reverse Proxies 🇺🇸

A reverse proxy is a special server that receives incoming requests from external clients and forwards them to one or more internal web servers. By acting as an intermediary, it hides the details of the internal network, providing a single entry point that can improve load balancing, security, cachi...

Static Dynamic Content 🇺🇸

Web servers deliver two main types of content: static and dynamic. Static content usually consists of files (HTML, CSS, images, JavaScript) that rarely change and can be served directly from the file system or a cache. Dynamic content is generated on the fly by server-side logic (such as PHP, Node.j...

Forward Proxies 🇺🇸

Proxies function as intermediaries in the communication flow between clients and servers, performing tasks such as request routing, caching, encryption offloading, and IP masking. By inserting themselves between the client and the destination server, proxies can manage connections in ways that provi...

Apache 🇺🇸

Apache HTTP Server (commonly referred to as "Apache") is one of the most widely used web servers in the world. It is maintained by the Apache Software Foundation and offers robust, flexible, and highly configurable capabilities for serving static and dynamic content. Over the decades, Apache has bec...

Protocol Buffers 🇺🇸

Protocol Buffers (often referred to as protobuf) is a language-neutral, platform-independent method for serializing structured data. Originally created at Google, it excels at enabling efficient data interchange between services, storing information in a compact binary format, and sustaining backwar...

Messaging System Integration 🇺🇸

In modern distributed architectures, messaging systems form an essential backbone for decoupling services, handling asynchronous communication, and enabling more resilient data flows. They allow separate applications or microservices to interact by sending and receiving messages through well-defined...

Batch Processing 🇺🇸

Batch processing is a method for handling large volumes of data by grouping them into a single batch, typically without immediate user interaction. It is often useful in scenarios where tasks can be processed independently and do not require real-time results, such as nightly analytics jobs, buildin...

Stream Processing 🇺🇸

Stream processing involves ingesting, analyzing, and taking action on data as it is produced. This near-real-time or real-time methodology is helpful for applications that need to respond quickly to continuously updating information, such as IoT sensor readings, financial transactions, or social med...

gRPC 🇺🇸

gRPC is a high-performance open-source framework that was developed at Google for remote procedure calls. It uses the Protocol Buffers (protobuf) serialization format by default and runs over HTTP/2 to support features like full-duplex streaming and efficient compression. Many microservices architec...

State Management 🇺🇸

Stateful and stateless designs are common terms in software architecture. They describe how an application handles data over multiple interactions. This set of notes explains the differences between applications that remember information between requests and those that treat every request as a fresh...

GraphQL 🇺🇸

GraphQL is a query language for APIs that allows clients to request exactly the data they need in a single request. It provides a type system to describe data and offers a more efficient, flexible, and powerful alternative to traditional REST-based architectures. These notes explore the fundamentals...

Data Transmission 🇺🇸

Data transmission in API design covers how information is sent and received between a client and a server. This involves choosing data formats, transport protocols, security measures, and techniques to ensure both correctness and efficiency. Whether an application is stateful or stateless affects th...

REST 🇺🇸

Representational State Transfer, often referred to as REST, is an architectural style used to design web services. It uses a stateless communication model between clients and servers, relies on standard HTTP methods, and focuses on simple but powerful conventions. These notes explore the core princi...

API Communication Protocols 🇺🇸

API communication protocols describe how different software components exchange data and invoke functionality across networks. They define the transport mechanisms, data formats, interaction styles, and often how developers should structure their requests and responses. These protocols are often cho...

HTTP Protocol 🇺🇸

Hypertext Transfer Protocol (HTTP) is the foundational communication protocol of the World Wide Web. It follows a client-server model and defines how messages are formatted and transmitted, as well as how servers and clients respond to various commands. HTTP was originally designed for fetching hype...

TCP and UDP 🇺🇸

Transmission Control Protocol (TCP) and User Datagram Protocol (UDP) are foundational Internet protocols that operate on top of IP (Internet Protocol). They determine how data is packaged, addressed, transmitted, and received between devices. TCP prioritizes reliability and ordered delivery. UDP foc...

Metrics and Analysis 🇺🇸

In modern distributed systems, the performance and reliability of communication channels, APIs, and network infrastructure are critical factors that determine user experience. Metrics and analysis offer insights into system behavior under varying loads, help identify bottlenecks, and guide capacity ...

Network Communications 🇺🇸

Network communications in a backend context involve the flow of data between clients (browsers, mobile apps, or other services) and server-side applications or services. This process spans multiple layers, from physical transmission over cables or wireless signals, through protocols such as TCP or U...

Web Sockets 🇺🇸

WebSockets introduce an event-driven, two-way communication channel between clients and servers over a single TCP connection. Unlike traditional HTTP request-response systems, WebSockets enable real-time data exchange with minimal overhead, effectively eliminating the need for repeated polling or lo...

Types of Databases 🇺🇸

Databases store and organize data so that applications and users can retrieve, manage, and manipulate information efficiently. The choice of database often depends on data structure requirements, scale, performance expectations, and the nature of the workload. Over the years, numerous types of datab...

Replication 🇺🇸

Replication is a method of maintaining copies of data across multiple nodes in distributed systems, making it useful for improving availability, reducing latency, and distributing load. Below are detailed notes, organized in bullet points, each containing one highlighted word in the middle to emphas...

Data Warehousing 🇺🇸

Data warehousing unifies large volumes of information from different sources into a centralized repository that supports analytics, reporting, and strategic decision-making. By collecting operational data, transforming it, and then loading it into one or more specialized databases, data warehouses a...

Isolation Levels 🇺🇸

Isolation levels in relational-database systems govern how simultaneously running transactions perceive one another's changes. They sit on a spectrum that trades consistency guarantees (how "correct" every read is) against concurrency (how many transactions can safely overlap). Choosing the right level ...

Indexes 🇺🇸

Indexing is one of the most effective ways to optimize database queries. By maintaining auxiliary data structures that map certain key values to their physical or logical locations, indexes allow a database to rapidly locate rows that match a search condition. This reduces the number of full-table s...

Coordination Services 🇺🇸

In large-scale distributed architectures, multiple processes, microservices, or nodes must operate in concert to achieve consistency, fault tolerance, and robust state management. Coordination services address these challenges by offering primitives like distributed locks, leader election, and confi...

Optimistic vs Pessimistic Locking 🇺🇸

Locking is about managing concurrent access to shared data. Engineers often make it sound harder than it is, but the core idea is simple: choose between optimistic or pessimistic approaches depending on how costly retries are...