Articles

SSH and SCP 🇺🇸

SSH, SFTP, and SCP are network protocols that provide secure data communication and file transfer over insecure networks. Here's a brief overview of each...

Commands 🇺🇸

Let's talk about some seriously useful tricks that'll make your command-line life much easier. Ever find yourself thinking "I know I ran that command yesterday, but what was it again?" or "There has to be a faster way to do this!" Well, you're in luck - the terminal has some fantastic features to he...

Log Files and Journals 🇺🇸

Understanding how logging works in Linux is like learning the language your system uses to communicate. Logs are the detailed records that your system keeps about its activities, and they are invaluable for troubleshooting, monitoring performance, and ensuring security. Let's embark on a journey to ...

Performance Monitoring 🇺🇸

Performance monitoring helps you identify bottlenecks or issues that may be affecting your system's performance. We'll now explore some tools and techniques available for monitoring performance and explain some usage statistics, such as CPU and RAM usage...

XML 🇺🇸

XML, or Extensible Markup Language, is a W3C-standardized markup language designed to encode documents in a format that is both human-readable and machine-readable. Unlike HTML, which has a fixed set of tags, XML lets you define your own vocabulary of elements to represent arbitrary data structures...

YAML 🇺🇸

YAML, which stands for "YAML Ain't Markup Language," is a human-readable data serialization format designed for configuration files and data exchange between systems. Unlike XML or JSON, YAML relies on whitespace and indentation rather than brackets or tags, making it feel closer to natural prose th...

JSON 🇺🇸

JSON, or JavaScript Object Notation, is a lightweight data-interchange format that is easy for humans to read and write and easy for machines to parse and generate. Despite originating from JavaScript, the format is completely language independent and supported by virtually every modern programming ...

Pub Sub vs Queue 🇺🇸

Message queues enable asynchronous, decoupled communication in distributed systems by allowing producers to send messages to a queue that consumers process independently, typically in first-in, first-out order. This approach reduces direct dependencies between services, enhances reliability and sca...

Content Delivery Networks 🇺🇸

A Content Delivery Network (CDN) is a geographically distributed system of servers that deliver web assets such as images, videos, stylesheets, and scripts to users from the server nearest to them. By placing cached copies of content at strategic locations around the world, CDNs drast...

Caching Strategies 🇺🇸

Caching is a technique used to speed up data retrieval by placing frequently accessed or computationally heavy information closer to the application or the end user. Below is an expanded set of notes on caching, presented with ASCII diagrams and bullet points that emphasize key considerations. Each ...

Redis 🇺🇸

Redis is an open-source, in-memory data store that serves as a high-performance cache, message broker, and general-purpose database. It is often called a "data structure server" because it natively supports rich types like strings, lists, sets, sorted sets, and hashes. Because every operation happen...

Static Python Website Netlify 🇺🇸

Netlify allows you to easily deploy and manage static websites. A Python-based static site generator like Pelican, MkDocs, or Frozen-Flask produces HTML files that Netlify serves through its global CDN...

CentOS DigitalOcean 🇺🇸

DigitalOcean provides cloud-based virtual machines called Droplets that let you deploy and manage CentOS servers. The overall flow looks like this...

Transactions 🇺🇸

Database transactions are a cornerstone of reliable data management. They let an application bundle multiple low-level reads and writes into a single, all-or-nothing unit so the database moves cleanly from one consistent state to another—even when dozens of users race to change the same rows or hard...

Halloween Problem 🇺🇸

The Halloween Problem is a notorious pitfall in relational database execution plans. First observed by IBM System R researchers on October 31, 1976 (hence the spooky nickname), the phenomenon occurs when an UPDATE statement unwittingly revisits and modifies the same row more than once. Each extra pass ...

Gossip Protocol 🇺🇸

The Gossip Protocol is a peer-to-peer communication technique in distributed systems where nodes share information by randomly selecting partners and exchanging state, much like how rumors spread through a social network. It is especially useful in large clusters where nodes frequently join or leave...

Operational Transform 🇺🇸

Operational Transform is a foundational technique in distributed systems that enables real-time collaborative editing of shared documents. Originally proposed by Ellis and Gibbs in 1989, OT allows multiple users to concurrently modify the same document while preserving consistency across all partici...

Linearizability 🇺🇸

Linearizability is a consistency model that makes a distributed system appear as if there is only a single copy of the data, and every operation takes effect atomically at some point between its invocation and its response. Even when data is replicated across multiple nodes, a linearizable system gu...

Algorithms Summary 🇺🇸

The following algorithms and data structures appear frequently in backend and distributed system design. Each entry explains what the structure is, how it works at a high level, why it matters, and where it is used in practice...

Concurrent Writes 🇺🇸

Concurrent writes happen when two or more clients write to the same key in a database at the same time, each unaware of the other's write. In replicated systems, these writes may arrive at different replicas in different orders, causing the replicas to diverge and hold conflicting values. Without a ...

Series 🇺🇸

A sequence is an ordered list of numbers that can be viewed as a function mapping each natural number $n$ to a specific value $a_n$. More formally, a sequence $\{a_n\}$ is a function whose domain is the set of natural numbers, and the values are called the terms of the sequence...

Difference Equations 🇺🇸

A difference equation (also known as a recurrence relation) defines each term of a sequence based on previous terms. In some cases, the general term of a sequence is given explicitly (e.g., $a_n = 3n + 2$, resulting in the sequence $5, 8, 11, \dots$). However, more commonly, a difference equation pr...

Autocorrelation Function 🇺🇸

In time series analysis, understanding the relationships between observations at different time lags is crucial for model identification and forecasting. Two essential tools for analyzing these relationships are the Autocorrelation Function (ACF) and the Partial Autocorrelation Function (PACF)...

Autoregressive Models 🇺🇸

Autoregressive (AR) models are fundamental tools in time series analysis, used to describe and forecast time-dependent data. An AR model predicts future values based on a linear combination of past observations. The order of an AR model, denoted as $p$, indicates how many lagged past values are used...

Random Walk 🇺🇸

The random walk is a fundamental and widely used time series model, often applied in finance to represent stock prices and other economic indicators. The idea behind the random walk is that the value of the process at time $t$ is the sum of its value at time $t-1$ and a random shock (or noise). Esse...

Forecasting 🇺🇸

Time series forecasting is a technique used to predict future values based on historical data. It is widely used in various fields, such as finance, economics, and meteorology. In this section, we will discuss the basics of time series forecasting...

Invertibility 🇺🇸

In time series modeling, invertibility is the property of a model that allows the innovation process (also called the noise or disturbance process) to be expressed as a function of the observed series and its past values. This is particularly relevant for Moving Average (MA) models...

Seasonality and Trends 🇺🇸

Seasonality and trends are fundamental components of time series data that significantly affect analysis and forecasting. Understanding and correctly modeling them is essential for accurate predictions...

Financial Time Series Models 🇺🇸

Financial series (prices, returns, exchange rates) often look very different from the classical stationary Gaussian assumptions. Common features include...

Stationarity 🇺🇸

Stationarity is an important idea in time series analysis. A time series is considered stationary if its statistical properties—like the mean, variance, and autocovariance—stay constant over time. This matters because methods like ARIMA and ARMA are designed to work with stationary data, so it’s a g...

ARIMA Models 🇺🇸

ARMA, ARIMA, and SARIMA are models commonly used to analyze and forecast time series data. ARMA (AutoRegressive Moving Average) combines two ideas: using past values to predict current ones (autoregression) and smoothing out noise using past forecast errors (moving average). ARIMA (AutoRegressive In...

Randomness Tests 🇺🇸

When a series looks noisy, it is still useful to check whether the noise is random or whether weak structure (trend or dependence) is present. The tests below are lightweight diagnostics for an IID or weak-dependence null...

Moving Average Models 🇺🇸

Moving Average (MA) models are a fundamental class of univariate time series models used for forecasting and understanding temporal data. Unlike Autoregressive (AR) models, which rely on past values of the series itself, MA models utilize past forecast errors to model the current value of the series...

Statistical Moments and Time Series 🇺🇸

Understanding the behavior of time series data is crucial across various fields such as finance, economics, and engineering. Statistical moments, especially the mean and standard deviation, are essential tools in summarizing and analyzing time series data. This section explores how these statistical...

Autocovariance Function 🇺🇸

Autocovariance functions describe how values of a time series relate to their lagged counterparts, measuring the joint variability between a series at time $t$ and its value at a previous time $t-k$ (where $k$ is the lag). In autoregressive models, these relationships are expressed through coefficie...