Last modified: September 16, 2025
Multithreading refers to the capability of a CPU, or a single core within a multi-core processor, to execute multiple threads concurrently. A thread is the smallest unit of processing that can be scheduled by an operating system. In a multithreaded environment, a program, or process, can perform multiple tasks at the same time, as each thread runs in the same shared memory space. This can be useful for tasks that are IO-bound, as threads can be used to keep the CPU busy while waiting for IO operations to complete. However, because threads share the same memory, they must be carefully synchronized to avoid issues like race conditions, where two threads attempt to modify the same data concurrently, leading to unpredictable outcomes.
+----------------+ +----------------+ +------------------+
| Incoming Tasks | | Pool Manager | | Thread Pool |
| | | | | |
| +-----------+ | | | | +-----------+ |
| | Task 1 |-------------> Assigns Task ---------> | Thread 1 | |
| +-----------+ | | | | +-----------+ |
| +-----------+ | | | | +-----------+ |
| | Task 2 |-------------> Assigns Task ---------> | Thread 2 | |
| +-----------+ | | | | +-----------+ |
| +-----------+ | | | | +-----------+ |
| | Task 3 |-------------> Assigns Task ---------> | Thread 3 | |
| +-----------+ | | | | +-----------+ |
| +-----------+ | | | | +-----------+ |
| | Task 4 |-------------> Assigns Task ---------> | Thread 4 | |
| +-----------+ | | | | +-----------+ |
| +-----------+ | | | +------------------+
| | Task 5 |-------------> Waiting |
| +-----------+ | | |
+----------------+ +----------------+
A web server process, for example, receives a request and assigns it to a thread from its pool for processing. That worker handles the request, completes the task, and returns to the pool, leaving the main thread free to accept new work.
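Below is a minimal C++ sketch of this worker-pool pattern, assuming a single shared task queue guarded by a mutex and a condition variable (a real server would add back-pressure and error handling):

#include <condition_variable>
#include <functional>
#include <iostream>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

int main() {
    std::queue<std::function<void()>> tasks; // incoming work
    std::mutex m;
    std::condition_variable cv;
    bool done = false;

    // Pool of worker threads, each pulling tasks from the shared queue.
    std::vector<std::thread> pool;
    for (int i = 0; i < 4; ++i) {
        pool.emplace_back([&] {
            while (true) {
                std::function<void()> task;
                {
                    std::unique_lock<std::mutex> lock(m);
                    cv.wait(lock, [&] { return done || !tasks.empty(); });
                    if (done && tasks.empty()) return;
                    task = std::move(tasks.front());
                    tasks.pop();
                }
                task(); // run the task outside the lock
            }
        });
    }

    // The "pool manager" side: enqueue incoming tasks.
    for (int i = 1; i <= 5; ++i) {
        {
            std::lock_guard<std::mutex> lock(m);
            tasks.push([i] { std::cout << "Handling task " << i << "\n"; });
        }
        cv.notify_one();
    }

    { std::lock_guard<std::mutex> lock(m); done = true; }
    cv.notify_all();
    for (auto& t : pool) t.join();
    return 0;
}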
Consider an example: two functions, funA() and funB(), where funB() relies on the output of funA(). In a single-threaded program:
funA()
funB()
The order is guaranteed. However, in a multithreaded scenario:
# Thread 1
funA()
# Thread 2
funB()
The execution order becomes unpredictable. If funB() runs before funA() has completed, the result could be incorrect.
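A minimal C++ sketch of the same hazard (note that the unsynchronized read and write of result is itself a data race, shown here only to illustrate the ordering problem):

#include <iostream>
#include <thread>

int result = 0;

void funA() { result = 42; }                  // produces the value
void funB() { std::cout << result << "\n"; }  // may run before funA

int main() {
    std::thread t1(funA);
    std::thread t2(funB); // no ordering is enforced between t1 and t2
    t1.join();
    t2.join();
    return 0; // output may be 0 or 42 depending on scheduling
}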
Analogy:
Imagine a busy kitchen with multiple chefs working on the same dish. They share the same utensils and ingredients. Without coordination, two chefs might grab the same tool or ingredient at the same time, causing confusion or mistakes. Likewise, a data race occurs when multiple threads share data without proper synchronization, leading to unpredictable outcomes.
Example:
#include <iostream>
#include <thread>
#include <vector>
#include <chrono>
// Shared counter variable
int counter = 0;
// Function to increment the counter
void incrementCounter(int numIncrements) {
for (int i = 0; i < numIncrements; ++i) {
// Read, increment, and write back the counter
// This is not an atomic operation and can cause race conditions
counter++;
}
}
int main() {
const int numThreads = 10; // Number of threads
const int incrementsPerThread = 100000; // Increments per thread
std::vector<std::thread> threads;
// Start timer
auto start = std::chrono::high_resolution_clock::now();
// Create and start threads
for (int i = 0; i < numThreads; ++i) {
threads.emplace_back(incrementCounter, incrementsPerThread);
}
// Wait for all threads to finish
for (auto& th : threads) {
th.join();
}
// Stop timer
auto end = std::chrono::high_resolution_clock::now();
std::chrono::duration<double> elapsed = end - start;
// Expected result
int expected = numThreads * incrementsPerThread;
// Output results
std::cout << "Final counter value: " << counter << std::endl;
std::cout << "Expected counter value: " << expected << std::endl;
std::cout << "Time taken: " << elapsed.count() << " seconds" << std::endl;
return 0;
}
Possible Output:
Final counter value: 282345
Expected counter value: 1000000
Time taken: 0.023456 seconds
What is happening:
+----------------------------+
| Shared Counter: 100 |
+----------------------------+
^ ^
| |
+-----+-----+ +---+------+
| Thread 1 | | Thread 2 |
+-----------+ +----------+
| |
| |
| |
[Thread 1] [Thread 2]
Read Counter = 100 Read Counter = 100
| |
| |
| |
[Thread 1] [Thread 2]
Increment: 100 + 1 = 101    Increment: 100 + 1 = 101
| |
| |
| |
[Thread 1] [Thread 2]
Write Counter = 101 Write Counter = 101
| |
+----------------------------+
| Shared Counter: 101 |
+----------------------------+
In this scenario, both threads read the same value (100) before either has a chance to write back the incremented value. This leads to lost updates and an incorrect final result.
What do we mean by a resource?
In the context of computing and multithreading, a resource refers to any hardware or software component that applications and processes need to operate effectively. This includes elements such as CPU time, memory, storage, network bandwidth, files, and shared data structures. Resources are limited and must be managed efficiently to make sure that multiple threads or processes can access them without conflicts. Proper resource management is necessary for maintaining optimal system performance, preventing bottlenecks, and avoiding issues like deadlocks or excessive contention when multiple threads compete for the same assets.
Analogy:
Imagine a single-stall public restroom. If multiple people try to enter simultaneously, chaos ensues. Instead, a lock on the door ensures only one person can use it at a time. Similarly, a mutex ensures exclusive access to a shared resource.
Example:
#include <iostream>
#include <thread>
#include <vector>
#include <mutex>
#include <chrono>
// Shared counter variable
int counter = 0;
// Mutex to protect the counter
std::mutex counterMutex;
// Function to increment the counter with synchronization
void incrementCounterSafe(int numIncrements) {
for (int i = 0; i < numIncrements; ++i) {
std::lock_guard<std::mutex> lock(counterMutex);
counter++;
}
}
int main() {
const int numThreads = 10;
const int incrementsPerThread = 100000;
std::vector<std::thread> threads;
// Start timer
auto start = std::chrono::high_resolution_clock::now();
// Create and start threads
for (int i = 0; i < numThreads; ++i) {
threads.emplace_back(incrementCounterSafe, incrementsPerThread);
}
// Wait for all threads to finish
for (auto& th : threads) {
th.join();
}
// Stop timer
auto end = std::chrono::high_resolution_clock::now();
std::chrono::duration<double> elapsed = end - start;
// Expected result
int expected = numThreads * incrementsPerThread;
// Output results
std::cout << "Final counter value: " << counter << std::endl;
std::cout << "Expected counter value: " << expected << std::endl;
std::cout << "Time taken: " << elapsed.count() << " seconds" << std::endl;
return 0;
}
Possible Output:
Final counter value: 1000000
Expected counter value: 1000000
Time taken: 0.234567 seconds
What is happening:
┌────────────────────────────┐
│ Shared Counter: 100 │
└────────────────────────────┘
▲ ▲
│ │
┌─────┴─────┐ ┌─────┴─────┐
│ Thread 1 │ │ Thread 2 │
└─────┬─────┘ └─────┬─────┘
│ │ WAITING
│ -----------------
▼ |
┌─────────────────────────────────┐ |
│ [Thread 1 acquires mutex] │ |
│ [Thread 1] Read Counter = 100 │ |
│ [Thread 1] Increment to 101 │ |
│ [Thread 1] Write Counter = 101 │ |
│ [Thread 1 releases mutex] │ |
└─────────────────────────────────┘ |
▼
┌────────────────────────────────┐
│ [Thread 2 acquires mutex] │
│ [Thread 2] Read Counter = 101 │
│ [Thread 2] Increment to 102 │
│ [Thread 2] Write Counter = 102 │
│ [Thread 2 releases mutex] │
└────────────────────────────────┘
The mutex ensures that only one thread can modify the shared counter at a time, resulting in a correct final value but with additional locking overhead.
An atomic operation ensures that a read-modify-write sequence completes as one indivisible action. This means no other thread can interrupt or observe a partial update, preventing data races for simple shared variables without needing a heavier synchronization mechanism like a mutex. Atomic operations can apply to various fundamental data types (e.g., int, bool, and pointer types) and, in many implementations, to user-defined types that are trivially copyable and do not exceed a certain size (often the size of a machine word).
In C++, these atomic types are provided by std::atomic<T>, and some specialized versions like std::atomic_flag offer specific functionalities. The standard guarantees that reads and writes to these types occur as single, uninterruptible steps. Operations like load, store, fetch_add, fetch_sub, compare_exchange, and similar can all be made atomic.
What do we gain by using atomics? Lock-free updates to simple shared variables: there is no mutex overhead, no risk of deadlock on that variable, and often better performance under low contention.

What do we lose by using atomics? They cover only simple operations on a single variable; a sequence of atomic operations is not itself atomic, and choosing the right memory ordering takes care.
Analogy:
Imagine a vending machine that dispenses an item the instant you press a button and insert your bill—no one can see a partial transaction or grab the bill mid-transaction. The entire action (paying and getting the item) is handled as a single, uninterruptible event.
Example:
#include <iostream>
#include <thread>
#include <vector>
#include <atomic>
std::atomic<int> counter(0);
void incrementCounterAtomic(int numIncrements) {
for (int i = 0; i < numIncrements; ++i) {
counter.fetch_add(1, std::memory_order_relaxed);
}
}
int main() {
const int numThreads = 10;
const int incrementsPerThread = 100000;
std::vector<std::thread> threads;
// Create and start threads
for (int i = 0; i < numThreads; ++i) {
threads.emplace_back(incrementCounterAtomic, incrementsPerThread);
}
// Wait for all threads to finish
for (auto& th : threads) {
th.join();
}
std::cout << "Final counter value: " << counter << std::endl;
std::cout << "Expected counter value: " << (numThreads * incrementsPerThread) << std::endl;
return 0;
}
What is happening:
Atomic Counter
Thread 1 | Thread 2
-------------------+-------------------
Read & Inc |
| Read & Inc
| |
Write: 101 ----> No Interruption <---- Write: 102
| |
v v
next iteration next iteration
(All increments happen as atomic steps,
so partial updates are never seen)
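Beyond fetch_add, a compare-and-swap loop lets you build arbitrary read-modify-write updates from atomics. A minimal sketch that maintains a running maximum (the observe helper is illustrative, not part of any standard API):

#include <atomic>
#include <iostream>
#include <thread>
#include <vector>

std::atomic<int> maxSeen(0);

void observe(int value) {
    int current = maxSeen.load(std::memory_order_relaxed);
    // Retry until we either install our value or learn it is not the max.
    // On failure, compare_exchange_weak reloads 'current' with the latest value.
    while (value > current &&
           !maxSeen.compare_exchange_weak(current, value)) {
        // loop re-checks the condition with the refreshed 'current'
    }
}

int main() {
    std::vector<std::thread> threads;
    for (int v : {3, 17, 9, 42, 28}) {
        threads.emplace_back(observe, v);
    }
    for (auto& t : threads) t.join();
    std::cout << "Max seen: " << maxSeen.load() << "\n"; // always 42
    return 0;
}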
To clear up common confusion, let’s look at how a data race differs from related concepts such as deadlock and livelock:
A deadlock occurs when two or more threads are blocked, each waiting for a lock that another thread already holds. Because all threads are waiting on one another, no progress can be made, and the system is effectively stuck.
Analogy:
Imagine two cars on a narrow one-lane bridge coming from opposite ends. Each driver refuses to back up, and neither can move forward. Both are blocked indefinitely, waiting for the other to yield.
Example:
#include <iostream>
#include <thread>
#include <mutex>
std::mutex mutexA;
std::mutex mutexB;
void threadFunc1() {
std::lock_guard<std::mutex> lock1(mutexA);
std::this_thread::sleep_for(std::chrono::milliseconds(50)); // simulate work
std::lock_guard<std::mutex> lock2(mutexB);
}
void threadFunc2() {
std::lock_guard<std::mutex> lock1(mutexB);
std::this_thread::sleep_for(std::chrono::milliseconds(50)); // simulate work
std::lock_guard<std::mutex> lock2(mutexA);
}
int main() {
std::thread t1(threadFunc1);
std::thread t2(threadFunc2);
t1.join();
t2.join();
return 0;
}
What is happening:
Thread 1 Thread 2
| |
v v
Lock(mutexA) Lock(mutexB)
| |
|-------Wait(mutexB) <----|
| |
| |-------Wait(mutexA)
v v
BLOCKED BLOCKED
(Each thread holds one lock and waits
for the other lock to be released.
Neither lock is ever freed -> deadlock)
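One common fix is to acquire both mutexes in a single step with std::scoped_lock (C++17), which applies a deadlock-avoidance algorithm. A minimal sketch of the corrected version:

#include <iostream>
#include <mutex>
#include <thread>

std::mutex mutexA;
std::mutex mutexB;

void threadFunc1() {
    std::scoped_lock lock(mutexA, mutexB); // locks both without deadlock
    std::cout << "Thread 1 holds both mutexes\n";
}

void threadFunc2() {
    std::scoped_lock lock(mutexB, mutexA); // argument order no longer matters
    std::cout << "Thread 2 holds both mutexes\n";
}

int main() {
    std::thread t1(threadFunc1);
    std::thread t2(threadFunc2);
    t1.join();
    t2.join();
    return 0;
}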
A livelock occurs when two or more threads actively respond to each other in a way that prevents them from making progress. Unlike a deadlock, the threads are not blocked; they keep "moving," but they continually change their states in a manner that still prevents the system from completing its task.
Analogy:
Picture two people in a narrow hallway who both step aside to let the other pass—only to keep stepping in the same direction repeatedly. They’re not standing still, but neither can get by the other.
Example:
#include <iostream>
#include <thread>
#include <mutex>
#include <atomic>
std::mutex mutex1;
std::mutex mutex2;
std::atomic<bool> is_done(false);
void thread1() {
while (!is_done.load()) {
if (mutex1.try_lock()) {
if (mutex2.try_lock()) {
std::cout << "Thread 1 completes work.\n";
is_done.store(true);
mutex2.unlock();
}
mutex1.unlock();
}
// Thread tries, fails or succeeds,
// then repeats without blocking indefinitely.
}
}
void thread2() {
while (!is_done.load()) {
if (mutex2.try_lock()) {
if (mutex1.try_lock()) {
std::cout << "Thread 2 completes work.\n";
is_done.store(true);
mutex1.unlock();
}
mutex2.unlock();
}
}
}
int main() {
std::thread t1(thread1);
std::thread t2(thread2);
t1.join();
t2.join();
return 0;
}
What is happening:
Thread 1 Thread 2
try_lock(mutex1) try_lock(mutex2)
| |
success? success?
| |
try_lock(mutex2) try_lock(mutex1)
| |
success? success?
| |
release/retry release/retry
| |
v v
loop again loop again
(Threads keep attempting to acquire both locks,
but they often release them and try again at the
same time, never settling and never fully blocking,
thus making no actual forward progress -> livelock)
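A common mitigation is to break the symmetry, for example with a randomized backoff between retries, as in the hedged sketch below (taking both mutexes at once with std::scoped_lock would remove the problem entirely):

#include <chrono>
#include <mutex>
#include <random>
#include <thread>

std::mutex mutex1;
std::mutex mutex2;

void workWithBackoff() {
    std::mt19937 rng(std::random_device{}());
    std::uniform_int_distribution<int> jitter(1, 10);
    while (true) {
        if (mutex1.try_lock()) {
            if (mutex2.try_lock()) {
                // ... do the work ...
                mutex2.unlock();
                mutex1.unlock();
                return;
            }
            mutex1.unlock();
        }
        // Sleep a random amount so the two threads stop retrying in lockstep.
        std::this_thread::sleep_for(std::chrono::milliseconds(jitter(rng)));
    }
}

int main() {
    std::thread t1(workWithBackoff);
    std::thread t2(workWithBackoff);
    t1.join();
    t2.join();
    return 0;
}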
A semaphore is a synchronization mechanism that uses a counter to control how many threads can access a shared resource at once. Each thread performs an atomic wait (or acquire) operation before entering the critical section, which decrements the semaphore’s counter. When a thread finishes its work, it performs a signal (or release) operation, incrementing the counter and allowing other waiting threads to proceed.
Analogy:
Think of a parking garage with a limited number of spaces. Each car (thread) must check if a space is available before entering (acquire). If no space is free, the car must wait. When a car leaves (release), a space opens up for the next waiting car.
Example (using C++20 counting semaphore):
#include <iostream>
#include <thread>
#include <vector>
#include <semaphore>
#include <chrono>
// A counting semaphore initialized to allow 2 concurrent threads
std::counting_semaphore<2> sem(2);
void worker(int id) {
// Acquire a slot
sem.acquire();
std::cout << "Thread " << id << " enters critical section.\n";
// Simulate some work
std::this_thread::sleep_for(std::chrono::milliseconds(100));
std::cout << "Thread " << id << " leaves critical section.\n";
// Release the slot
sem.release();
}
int main() {
std::vector<std::thread> threads;
// Launch multiple threads
for (int i = 0; i < 5; ++i) {
threads.emplace_back(worker, i);
}
// Wait for all to finish
for (auto &t : threads) {
t.join();
}
return 0;
}
What is happening:
[Semaphore with count = 2]
-----------------+-----------------+-----------------
Thread 0 | Thread 1 | Thread 2 ...
tries sem.acquire() |
| |
[Slot1 free, Slot2 free] |
acquires Slot1 -> count=1 |
| |
v |
"In critical section" |
| |
Thread 1 tries sem.acquire() |
acquires Slot2 -> count=0 |
| |
v |
"In critical section" |
... Meanwhile ...
Thread 2 tries sem.acquire()
|
v
Must wait because count=0
Once Thread 0 or 1 calls sem.release():
- count increments by 1
- Thread 2 (or next in line) acquires and enters
Binary Semaphore vs. Mutex
There is a common misconception that a binary semaphore and a mutex are equivalent. While both can restrict access to a resource, their primary use cases differ:
- A mutex has ownership semantics: the thread that locks it is the one that must unlock it, which makes it the natural tool for mutual exclusion around a critical section.
- A binary semaphore has no ownership: one thread may acquire it and a different thread may release it, which makes it the natural tool for signaling between threads, as in the sketch below.
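A minimal C++20 sketch of the signaling use case, where one thread releases a binary semaphore that a different thread acquires (a mutex could not be handed across threads this way):

#include <iostream>
#include <semaphore>
#include <thread>

std::binary_semaphore dataReady(0); // initially unavailable

int sharedValue = 0;

void worker() {
    dataReady.acquire(); // block until another thread signals
    std::cout << "Worker sees value: " << sharedValue << "\n";
}

int main() {
    std::thread t(worker);
    sharedValue = 42;    // prepare the data
    dataReady.release(); // signal from a different thread than the acquirer
    t.join();
    return 0;
}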
Multithreading Automatically Improves Performance
Many developers believe that incorporating multiple threads always leads to faster execution. However, multithreading can also slow down an application if not designed and tuned properly. The overhead of context switching, synchronization, and resource contention can negate performance gains, especially if the tasks are not well-suited for parallelism.
More Threads Equals Better Performance
It is often assumed that creating more threads will consistently boost performance. In reality, once the number of threads exceeds the available CPU cores or the nature of the task’s concurrency limits, performance may degrade. Excessive thread creation can lead to increased scheduling overhead, cache thrashing, and resource contention—ultimately harming efficiency.
Multithreaded Code Is Always Harder to Write and Maintain
While concurrency introduces challenges—such as synchronization, potential race conditions, and timing-related bugs—multithreaded code is not necessarily more difficult to manage than single-threaded code. Modern languages and frameworks provide abstractions (e.g., thread pools, futures, async/await mechanisms) that simplify parallelism. With proper design, testing strategies, and usage of these tools, writing reliable and maintainable multithreaded applications becomes more approachable.
Use case | Pattern summary | C++ (preferred) | Python (preferred) | Why this choice | Shutdown behavior |
High-throughput web server: handle many requests | Worker pool reuses threads for each request | Thread pool (joinable) | ThreadPoolExecutor (joinable) | Limits thread count, good control of lifecycle, back-pressure via queue | Stop accepting work, drain queue, join pool |
Background logging/telemetry uploader | Fire-and-forget uploader reading from a queue | jthread (auto-join) | Daemon thread | Non-critical work; jthread lets you cancel; daemon won’t block exit | On shutdown, try flush; daemons may drop work |
Periodic metrics/health pinger | Loop with sleep to send heartbeats | jthread (auto-join) | Daemon thread | Runs for app lifetime; easy cancel/stop; ok if it ends abruptly | Signal stop, allow one last send if needed |
GUI app: offload long task (keep UI responsive) | Worker thread performs blocking work, reports progress | Joinable thread (or QThread) | Joinable thread | Ensure task finishes/cleans up before closing app | Signal cancel, join before window closes |
Real-time sensor acquisition → queue | Reader thread pushes samples to a bounded queue | Joinable thread (or jthread) | Joinable thread | Data loss unacceptable; deterministic shutdown | Stop signal, flush buffer, join |
Producer→consumer pipeline (download→parse→write) | Stage per thread, connected by queues | Joinable threads or pool | Joinable threads with queue | Back-pressure and orderly teardown | Send sentinels, join in stage order |
Parallel I/O (e.g., web scraping many hosts) | Cap concurrency using pool/executor | Thread pool (joinable) | ThreadPoolExecutor (joinable) | I/O-bound; pooling avoids oversubscription | Shutdown executor, wait=True |
CPU-bound parallel compute (e.g., image filters) | True parallelism for heavy CPU tasks | Thread pool (joinable) | Use processes (multiprocessing) | C++ threads run in parallel; Python GIL limits CPU threads | Join pool / close process pool |
Connection timeout watchdog | Sleeps, then cancels/alerts if overdue | jthread (auto-join) | Daemon thread or Timer | Short-lived helper; safe if it dies at exit | Cancel timer / stop token |
Cache warmer / prefetcher | Preloads likely-needed data in background | Detached thread | Daemon thread | Best-effort; shouldn’t block shutdown | Allow early exit; no guarantees |
Background email/SMS sender in a web app | Queue of messages consumed by workers | Thread pool (joinable) | ThreadPoolExecutor (joinable) | Must ensure delivery or retry logic | Drain queue, join; otherwise use external task queue |
Game asset streaming loader | Loads textures/models while game runs | Joinable thread | Joinable thread | Coordinate with main loop; avoid torn state | Signal cancel, join before scene swap |
Game telemetry/analytics uploader | Buffers and uploads non-critical events | Detached thread | Daemon thread | Don’t stall frame rate or exit | Best-effort flush only |
File system watcher (hot-reload) | Watches dirs and enqueues change events | Joinable thread | Joinable thread | Needs clean shutdown to release handles | Stop watcher, join |
DB connection keepalive / pool maintenance | Occasional pings, cleanup of idle conns | jthread (auto-join) | Daemon thread | Low-importance periodic task | Cancel on stop; ok to skip final ping |
CLI tool spawns background maintenance (e.g., log rotation) | Helper that outlives brief main work | Detached thread | Daemon thread | Main shouldn’t wait; tolerates early exit | No join; rely on OS cleanup |
Small chat server: one thread per client (naïve) | Spawn per connection, handle then exit | Detached thread (small scale only) | Joinable thread (or executor) | Detached avoids bookkeeping; Python prefers controlled join | Prefer pool for scale; join on shutdown |
Market data listener → queue | Network read loop pushes updates | Joinable thread (or jthread) | Joinable thread | Data integrity and ordering matter | Signal stop, flush, join |
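As a concrete illustration of the std::jthread rows above, here is a minimal C++20 sketch of a cancellable periodic pinger; sendHeartbeat is a hypothetical stand-in for real work:

#include <chrono>
#include <iostream>
#include <stop_token>
#include <thread>

void sendHeartbeat() { std::cout << "ping\n"; } // hypothetical placeholder

int main() {
    // std::jthread requests stop and joins automatically in its destructor,
    // so shutdown is deterministic.
    std::jthread pinger([](std::stop_token st) {
        while (!st.stop_requested()) {
            sendHeartbeat();
            std::this_thread::sleep_for(std::chrono::milliseconds(500));
        }
    });
    std::this_thread::sleep_for(std::chrono::seconds(2));
    return 0; // pinger.request_stop() and join() happen here
}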
In C++, every application starts with a single default main thread, which runs the main() function. This main thread can create additional threads, which are useful for performing multiple tasks simultaneously. Since C++11, the Standard Library provides the std::thread class to create and manage threads. Creating a new thread involves defining a function that will execute in parallel and passing it to the std::thread constructor, along with any arguments required by that function.
A new thread in C++ can be created by instantiating a std::thread object. The constructor accepts a callable object (like a function, lambda, or function object) and optional arguments to be passed to it.
#include <iostream>
#include <thread>
void printMessage(const std::string& message) {
std::cout << message << std::endl;
}
int main() {
std::thread t1(printMessage, "Hello from thread!");
t1.join(); // Wait for the thread to finish
return 0;
}
In this example, printMessage is called in a separate thread, and the main thread waits for t1 to complete using join().
The join() function is called on a std::thread object to wait for the associated thread to complete execution. This blocks the calling thread until the thread represented by the std::thread object finishes.
Advantages:
- Guarantees that the thread has finished before the program proceeds, so its results and side effects are safe to use.
- Ensures the thread's resources are released deterministically.

Disadvantages:
- Blocks the calling thread until the joined thread completes, which can stall other work.
t1.join(); // Main thread waits for t1 to finish
Using detach(), a thread is separated from the std::thread object and continues to execute independently. This allows the main thread to proceed without waiting for the detached thread to finish. However, once detached, the thread becomes non-joinable, meaning it cannot be waited on or joined, and it will run independently until completion.
Advantages:
- The main thread is not blocked and does not need to track the detached thread's lifetime.

Disadvantages:
- There is no way to wait for the result or learn when the thread finishes, and a detached thread still running at program exit may access objects that have already been destroyed.
std::thread t2(printMessage, "This is a detached thread");
t2.detach(); // Main thread does not wait for t2
Each thread has a lifecycle, beginning with creation, followed by execution, and ending with termination. Upon termination, the resources held by the thread need to be cleaned up. If a thread object goes out of scope while still joinable (not yet joined or detached), the program will terminate via std::terminate, because it is considered an error to destroy a std::thread object without properly handling its thread.
Arguments can be passed to the thread function through the std::thread constructor. The arguments are copied or moved as necessary. Special care must be taken when passing pointers or references, as these must refer to objects that remain valid throughout the thread's execution.
#include <iostream>
#include <thread>
void printSum(int a, int b) {
std::cout << "Sum: " << (a + b) << std::endl;
}
int main() {
int x = 5, y = 10;
std::thread t(printSum, x, y); // Passing arguments by value
t.join();
return 0;
}
In this example, x and y are passed by value to the printSum function.
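If the thread needs to modify the caller's variable, pass it via std::ref; a minimal sketch (the referenced object must outlive the thread):

#include <functional>
#include <iostream>
#include <thread>

void addTo(int& total, int amount) {
    total += amount;
}

int main() {
    int total = 0;
    // std::ref is required; otherwise the thread would receive a copy.
    std::thread t(addTo, std::ref(total), 5);
    t.join(); // join before reading 'total' so the write is visible
    std::cout << "Total: " << total << std::endl;
    return 0;
}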
Lambda expressions provide a convenient way to define thread tasks inline. They can capture local variables by value or reference, allowing for flexible and concise thread management.
#include <iostream>
#include <thread>
int main() {
int a = 5, b = 10;
std::thread t([a, b]() {
std::cout << "Lambda Sum: " << (a + b) << std::endl;
});
t.join();
return 0;
}
In this case, the lambda captures a and b by value and uses them inside the thread.
std::mutex is used to protect shared data from being accessed simultaneously by multiple threads. It ensures that only one thread can access the critical section at a time, preventing data races.
#include <iostream>
#include <thread>
#include <mutex>
std::mutex mtx;
int sharedCounter = 0;
void increment() {
std::lock_guard<std::mutex> lock(mtx);
++sharedCounter;
}
int main() {
std::thread t1(increment);
std::thread t2(increment);
t1.join();
t2.join();
std::cout << "Shared Counter: " << sharedCounter << std::endl;
return 0;
}
In this example, std::lock_guard automatically locks the mutex on creation and unlocks it on destruction, ensuring the increment operation is thread-safe.
Deadlocks occur when two or more threads are waiting for each other to release resources, resulting in a standstill. To avoid deadlocks, lock multiple resources in a consistent order, use try-lock mechanisms, or employ higher-level concurrency primitives like std::lock or condition variables.
#include <iostream>
#include <thread>
#include <mutex>
std::mutex mutex1;
std::mutex mutex2;
void taskA() {
std::lock(mutex1, mutex2);
std::lock_guard<std::mutex> lock1(mutex1, std::adopt_lock);
std::lock_guard<std::mutex> lock2(mutex2, std::adopt_lock);
std::cout << "Task A acquired both mutexes\n";
}
void taskB() {
std::lock(mutex1, mutex2);
std::lock_guard<std::mutex> lock1(mutex1, std::adopt_lock);
std::lock_guard<std::mutex> lock2(mutex2, std::adopt_lock);
std::cout << "Task B acquired both mutexes\n";
}
int main() {
std::thread t1(taskA);
std::thread t2(taskB);
t1.join();
t2.join();
return 0;
}
Here, std::lock acquires both mutexes using a deadlock-avoidance algorithm, so neither task can end up holding one mutex while waiting forever for the other.
std::condition_variable is used for thread synchronization by allowing threads to wait until they are notified to proceed. This is useful when a thread must wait for some condition to become true.
#include <iostream>
#include <thread>
#include <mutex>
#include <condition_variable>
std::mutex mtx;
std::condition_variable cv;
bool ready = false;
void print_id(int id) {
std::unique_lock<std::mutex> lock(mtx);
cv.wait(lock, [] { return ready; });
std::cout << "Thread " << id << "\n";
}
void set_ready() {
std::unique_lock<std::mutex> lock(mtx);
ready = true;
cv.notify_all();
}
int main() {
std::thread t1(print_id, 1);
std::thread t2(print_id, 2);
std::thread t3(set_ready);
t1.join();
t2.join();
t3.join();
return 0;
}
In this example, cv.wait makes the threads wait until ready becomes true. set_ready changes the condition and notifies all waiting threads.
C++20 introduces std::counting_semaphore and std::binary_semaphore. Semaphores are synchronization primitives that control access to a common resource by multiple threads. They use a counter to allow a fixed number of threads to access a resource concurrently.
#include <iostream>
#include <thread>
#include <semaphore>
std::binary_semaphore semaphore(1);
void task(int id) {
semaphore.acquire();
std::cout << "Task " << id << " is running\n";
std::this_thread::sleep_for(std::chrono::milliseconds(100)); // simulate some work
semaphore.release();
}
int main() {
std::thread t1(task, 1);
std::thread t2(task, 2);
t1.join();
t2.join();
return 0;
}
Here, semaphore.acquire() ensures that only one thread can access the critical section at a time, and semaphore.release() signals that the resource is available again.
C++ provides thread-local storage via the thread_local keyword, allowing data to be local to each thread. This is useful when each thread requires its own instance of a variable, such as when storing non-shared data.
#include <iostream>
#include <thread>
thread_local int localVar = 0;
void increment(int id) {
++localVar;
std::cout << "Thread " << id << ": localVar = " << localVar << std::endl;
}
int main() {
std::thread t1(increment, 1);
std::thread t2(increment, 2);
t1.join();
t2.join();
return 0;
}
In this example, each thread has its own instance of localVar, independent of the other threads.
For cases where synchronization is needed but mutexes are too heavyweight, C++ provides atomic operations via the std::atomic template. This allows for lock-free programming and can be used to implement simple data structures or counters safely in a multithreaded environment.
#include <iostream>
#include <thread>
#include <atomic>
std::atomic<int> atomicCounter(0);
void increment() {
for (int i = 0; i < 100000; ++i) {
++atomicCounter;
}
}
int main() {
std::thread t1(increment);
std::thread t2(increment);
t1.join();
t2.join();
std::cout << "Atomic Counter: " << atomicCounter << std::endl;
return 0;
}
In this example, std::atomic<int> ensures that the increment operation is atomic, preventing data races.
When using atomic operations in C++, we not only specify which operations should be atomic, but also how they synchronize with other memory operations in the program. This “how” is controlled by memory orderings—a set of rules that govern visibility and ordering of reads and writes.
C++ provides six memory order enumerations in std::memory_order:
- std::memory_order_relaxed
- std::memory_order_consume (mostly unimplemented in mainstream compilers)
- std::memory_order_acquire
- std::memory_order_release
- std::memory_order_acq_rel
- std::memory_order_seq_cst
Each ordering offers different guarantees about how operations on one thread become visible to other threads and in what sequence they appear to happen. Understanding these guarantees can greatly affect both the correctness and performance of concurrent code.
Below is a comparison table that summarizes the main C++ memory orderings, their guarantees, common use cases, and potential pitfalls. Use this as a quick reference to decide which ordering is best suited for a particular concurrency scenario.
Memory Order | Brief Description | Guarantees | Common Use Cases | Pitfalls & Advice |
std::memory_order_relaxed | Provides only atomicity, no ordering constraints | The operation itself is atomic (indivisible); no guarantees about visibility or ordering relative to other operations | Simple counters or statistics; non-critical flags where ordering doesn’t matter | Easy to introduce data races if other parts of the program rely on the update’s order; great performance but requires careful design |
std::memory_order_consume | Intended to enforce data dependency ordering (rarely implemented properly) | Theoretically only dependent reads are ordered; in practice, compilers often treat it like acquire | Very specialized, mostly replaced by acquire in real-world code | Not well supported by most compilers; avoid in portable or production code |
std::memory_order_acquire | Prevents following reads/writes from moving before the acquire operation | Ensures that subsequent operations see all side effects that happened before a matching release; acts as a one-way barrier after the load | Loading a “ready” flag to know that data is now valid; synchronizing a consumer who must see the producer’s writes | Only ensures that instructions after the acquire load can’t be reordered before it; must pair with release for full producer-consumer semantics |
std::memory_order_release | Prevents preceding reads/writes from moving after the release operation | Ensures all prior writes are visible to a thread that does an acquire on the same atomic; one-way barrier before the store | Setting a “ready” flag after populating shared data; synchronizing a producer who writes data before signaling availability | Doesn’t prevent instructions after the release from moving before it; must pair with acquire to guarantee another thread will observe the updates |
std::memory_order_acq_rel | Acquire + release in one read-modify-write operation | Combines the effects of acquire and release for RMW ops (e.g., fetch_add, compare_exchange); ensures no reorder before or after the operation | Updating shared state in a single atomic step where you must see previous writes and publish new writes (e.g., lock-free structures) | Can be stronger (thus slower) than needed if you only require a one-way barrier; must be used carefully in highly concurrent scenarios |
std::memory_order_seq_cst | Enforces total sequential consistency across all threads | Provides a single, global order of all sequentially consistent operations; easiest model to reason about, strongest ordering guarantee | When correctness is paramount and performance concerns are secondary; prototyping concurrency code before optimizing | Highest potential performance cost; may introduce unnecessary fences on weaker architectures |
What Do We Gain By Careful Use of Memory Orderings?
- Weaker orderings such as relaxed, acquire, and release can compile to more efficient instructions on some hardware, resulting in better performance compared to using a blanket seq_cst.

What Do We Lose / Need to Beware Of?
- Using seq_cst everywhere can lead to over-synchronization, causing potential performance losses by missing out on possible optimizations.
- Weaker orderings are easier to get wrong; a misplaced relaxed operation can introduce subtle visibility bugs that are hard to reproduce.

Analogy:
Imagine you’re coordinating a relay race:
- A release operation is like handing the baton off—ensuring everything you’ve done (run your segment) is finished before the next runner picks it up.
- An acquire operation is the next runner receiving the baton—ensuring they see everything you did (how far you ran, the state of the race) the moment they take it.
- relaxed would be like running without caring about handing the baton off or receiving it properly—fast, but not synchronized.
- seq_cst would be like having a strict official track judge making sure everyone runs in a strictly observed, universal order—less chance of cheating but more overhead.

Example:
Below is a small snippet that demonstrates release and acquire:
#include <atomic>
#include <vector>
#include <thread>
#include <iostream>
struct SharedData {
int value;
};
std::atomic<bool> ready(false);
SharedData data;
void producer() {
// 1. Write to shared data
data.value = 42;
// 2. Publish that data is ready
ready.store(true, std::memory_order_release);
}
void consumer() {
// Wait until the data is ready
while (!ready.load(std::memory_order_acquire)) {
// spin or sleep
}
// Now it is guaranteed that we see data.value = 42
std::cout << "Shared data value = " << data.value << std::endl;
}
int main() {
std::thread t1(producer);
std::thread t2(consumer);
t1.join();
t2.join();
return 0;
}
- The producer writes data.value = 42 and then calls ready.store(true, std::memory_order_release), ensuring that any subsequent acquire operation on ready will see the updated data.value.
- The consumer spins until ready.load(std::memory_order_acquire) becomes true, and because it’s an acquire load, once it returns true, the consumer also sees data.value = 42.

What is happening:
Producer Thread Consumer Thread
| |
data.value = 42 ...
| |
ready.store(true, release) ready.load(acquire) --> sees true
| |
v |
[ memory fence ] v
sees data.value = 42
- The release store guarantees that all writes before it, including data.value = 42, are visible to another thread that performs an acquire operation.
- Once ready is seen as true, the consumer consistently sees the "before-release" state, such as data.value = 42.

Modern C++ concurrency facilities, such as std::thread, std::mutex, std::lock_guard, and std::future, can greatly simplify thread management and help avoid common pitfalls. These features provide robust and standardized ways to handle concurrency, making the code more maintainable and less error-prone.

Here are some example code snippets demonstrating various aspects of multithreading in C++:
# | Example | Description |
1 | single_worker_thread | Introduce the concept of threads by creating a single worker thread using std::thread. |
2 | thread_subclass | Demonstrate how to create a custom thread class by inheriting std::thread. |
3 | multiple_worker_threads | Show how to create and manage multiple worker threads using std::thread . |
4 | race_condition | Explain race conditions and their impact on multi-threaded applications using C++ examples. |
5 | mutex | Illustrate the use of std::mutex to protect shared resources and avoid race conditions in C++ applications. |
6 | semaphore | Demonstrate the use of std::counting_semaphore to limit the number of concurrent threads accessing a shared resource in C++ applications. |
7 | producer_consumer | Present a classic multi-threading problem (Producer-Consumer) and its solution using C++ synchronization mechanisms like std::mutex and std::condition_variable. |
8 | fetch_parallel | Showcase a practical application of multi-threading for parallel fetching of data from multiple sources using C++ threads. |
9 | merge_sort | Use multi-threading in C++ to parallelize a merge sort algorithm, demonstrating the potential for performance improvements. |
10 | schedule_every_n_sec | Show how to schedule tasks to run periodically at fixed intervals using C++ threads. |
11 | barrier | Demonstrate the use of std::barrier to synchronize multiple threads at a specific point in the execution. |
12 | thread_local_storage | Illustrate the concept of Thread Local Storage (TLS) and how it can be used to store thread-specific data. |
13 | thread_pool | Show how to create and use a thread pool to efficiently manage a fixed number of worker threads for executing multiple tasks. |
14 | reader_writer_lock | Explain the concept of Reader-Writer Locks and their use for efficient access to shared resources with multiple readers and a single writer. |
Python provides built-in support for concurrent execution through the threading module. While the Global Interpreter Lock (GIL) in CPython limits the execution of multiple native threads to one at a time per process, threading is still useful for I/O-bound tasks, where the program spends a lot of time waiting for external events.
To create a new thread, you can instantiate the Thread class from the threading module. The target function to be executed by the thread is passed to the target parameter, along with any arguments required by the function.
import threading
def print_message(message):
print(message)
# Create a thread
t1 = threading.Thread(target=print_message, args=("Hello from thread!",))
t1.start()
t1.join() # Wait for the thread to finish
In this example, the print_message function is executed in a new thread.
Using the join() method ensures that the main thread waits for the completion of the thread. This is important for coordinating threads, especially when the main program depends on the thread's results.
t1.join() # Main thread waits for t1 to finish
Python threads do not have a direct detach() method like C++. However, once started, a thread runs independently, and the main program can continue executing without waiting for it, similar to detached threads in C++. You should still ensure that all threads complete before the program exits to avoid abrupt termination.
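The closest Python analogue to a detached thread is a daemon thread; a minimal sketch (daemon threads are terminated abruptly when the process exits, so they should not hold critical resources):

import threading
import time

def heartbeat():
    while True:
        print("background heartbeat")
        time.sleep(0.5)

# daemon=True: this thread will not keep the process alive
t = threading.Thread(target=heartbeat, daemon=True)
t.start()

time.sleep(2)  # main thread does its work
print("main exiting; the daemon thread dies with the process")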
Python threads are automatically managed by the interpreter. However, you should still ensure that threads are properly joined or allowed to finish their tasks to prevent any issues related to resource management or incomplete executions.
Arguments can be passed to the thread function via the args parameter when creating the Thread object. This allows for flexible and dynamic argument passing.
import threading
def add(a, b):
print(f"Sum: {a + b}")
# Create a thread
t2 = threading.Thread(target=add, args=(5, 10))
t2.start()
t2.join()
Lambda expressions can also be used with threads, providing a concise way to define thread tasks. This is particularly useful for simple operations.
import threading
# Create a thread with a lambda function
t3 = threading.Thread(target=lambda: print("Hello from a lambda thread"))
t3.start()
t3.join()
The Lock class from the threading module is used to ensure that only one thread accesses a critical section of code at a time. This prevents race conditions by locking the shared resource.
import threading
counter = 0
counter_lock = threading.Lock()
def increment():
global counter
with counter_lock:
counter += 1
# Create multiple threads
threads = [threading.Thread(target=increment) for _ in range(10)]
for t in threads:
t.start()
for t in threads:
t.join()
print(f"Counter: {counter}")
In this example, counter_lock ensures that only one thread modifies the counter variable at a time.
Deadlocks can occur when multiple threads are waiting for each other to release resources. In Python, you can avoid deadlocks by carefully planning the order of acquiring locks or by using try-lock mechanisms.
import threading
lock1 = threading.Lock()
lock2 = threading.Lock()
def task1():
with lock1:
print("Task 1 acquired lock1")
with lock2:
print("Task 1 acquired lock2")
def task2():
with lock2:
print("Task 2 acquired lock2")
with lock1:
print("Task 2 acquired lock1")
# Create threads
t4 = threading.Thread(target=task1)
t5 = threading.Thread(target=task2)
t4.start()
t5.start()
t4.join()
t5.join()
In this example, care must be taken to avoid deadlocks by ensuring that locks are acquired in a consistent order.
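The simplest fix is to make every thread acquire the locks in the same global order, as in this sketch:

import threading

lock1 = threading.Lock()
lock2 = threading.Lock()

def task1():
    with lock1:          # both tasks acquire lock1 first...
        with lock2:      # ...then lock2, so a circular wait cannot form
            print("Task 1 acquired both locks")

def task2():
    with lock1:
        with lock2:
            print("Task 2 acquired both locks")

t1 = threading.Thread(target=task1)
t2 = threading.Thread(target=task2)
t1.start()
t2.start()
t1.join()
t2.join()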
Condition variables allow threads to wait for some condition to be true before proceeding. This is useful in producer-consumer scenarios.
import threading
condition = threading.Condition()
item_available = False
def producer():
global item_available
with condition:
item_available = True
print("Producer produced an item")
condition.notify()
def consumer():
global item_available
with condition:
condition.wait_for(lambda: item_available)
print("Consumer consumed an item")
item_available = False
# Create threads
t6 = threading.Thread(target=producer)
t7 = threading.Thread(target=consumer)
t6.start()
t7.start()
t6.join()
t7.join()
Here, the consumer waits for the producer to produce an item before proceeding.
Python's threading module includes Semaphore and BoundedSemaphore for managing access to a limited number of resources.
import threading
import time
sem = threading.Semaphore(2) # Allows up to 2 threads to access the resource
def access_resource(thread_id):
with sem:
print(f"Thread {thread_id} is accessing the resource")
# Simulate some work
time.sleep(1)
# Create multiple threads
threads = [threading.Thread(target=access_resource, args=(i,)) for i in range(5)]
for t in threads:
t.start()
for t in threads:
t.join()
In this example, the semaphore limits access to a resource, allowing only two threads to enter the critical section at a time.
Python provides threading.local() to store data that should not be shared between threads.
import threading
local_data = threading.local()
def process():
local_data.value = 5
print(f"Thread {threading.current_thread().name} has value {local_data.value}")
# Create threads
t8 = threading.Thread(target=process, name="Thread-A")
t9 = threading.Thread(target=process, name="Thread-B")
t8.start()
t9.start()
t8.join()
t9.join()
In this example, each thread has its own local_data value, independent of the others.
In multi-threaded Python programs, there is often confusion regarding whether certain operations are truly atomic. This confusion largely stems from the presence of the Global Interpreter Lock (GIL), which ensures that only one thread executes Python bytecode at any given time. Some developers interpret this to mean that operations like counter += 1 are automatically safe and cannot cause race conditions. However, this is not guaranteed by Python's documentation or design.
While the GIL does prevent multiple threads from running Python bytecode simultaneously, many Python operations, including integer increments, actually consist of several steps under the hood (e.g., loading the current value, creating a new integer, and storing it). These intermediate steps can be interleaved with operations from other threads, making race conditions possible if no additional synchronization mechanism is employed. Therefore, if you need correct and consistent results when multiple threads modify a shared variable, you must use locks (like threading.Lock) or other thread-safe data structures.
Below is an example illustrating the use of a lock to ensure a thread-safe increment of a shared counter:
import threading
counter = 0
counter_lock = threading.Lock()
def safe_increment():
global counter
with counter_lock:
temp = counter
temp += 1
counter = temp
# Create and start threads
threads = [threading.Thread(target=safe_increment) for _ in range(1000)]
for t in threads:
t.start()
for t in threads:
t.join()
print(f"Counter: {counter}")
In this example, counter_lock ensures that the increment operation is effectively atomic by preventing multiple threads from modifying counter at the same time. Without this lock, two or more threads could potentially load the same value of counter, increment it independently, and overwrite each other's updates—resulting in an incorrect final value. Keep in mind that the GIL itself does not guarantee atomicity for these kinds of operations, which is why locks (or other concurrency primitives) are essential when sharing mutable state across threads.
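To see why the lock matters, here is the same counter without synchronization; depending on interpreter version and timing, the final value may come out below 1000000 because the load/add/store steps of different threads can interleave:

import threading

counter = 0

def unsafe_increment():
    global counter
    for _ in range(100_000):
        counter += 1  # several bytecode steps: load, add, store

threads = [threading.Thread(target=unsafe_increment) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# May print less than 1000000 if updates interleave; the GIL does not
# make this read-modify-write atomic.
print(f"Counter: {counter}")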
- For managing a pool of worker threads, concurrent.futures.ThreadPoolExecutor should be used instead of creating threads by hand; a short sketch follows.
- Background threads can be marked as daemons by setting daemon=True (the older thread.setDaemon(True) is deprecated), ensuring that these threads do not prevent the application from closing if they are still running.
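A minimal ThreadPoolExecutor sketch (the executor joins its workers when the with block exits; the fetch function and URLs are hypothetical stand-ins for real I/O-bound work):

from concurrent.futures import ThreadPoolExecutor

def fetch(url):
    # Stand-in for real I/O-bound work (e.g., an HTTP request).
    return f"fetched {url}"

urls = [f"https://example.com/{i}" for i in range(5)]  # hypothetical URLs

with ThreadPoolExecutor(max_workers=3) as pool:
    for result in pool.map(fetch, urls):
        print(result)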
Here are some example code snippets demonstrating various aspects of multithreading in Python:
# | Example | Description |
1 | single_worker_thread | Introduce the concept of threads by creating a single worker thread. |
2 | thread_subclass | Demonstrate how to create a custom thread class by subclassing Thread. |
3 | multiple_worker_threads | Show how to create and manage multiple worker threads. |
4 | race_condition | Explain race conditions and their impact on multi-threaded applications. |
5 | mutex | Illustrate the use of mutexes to protect shared resources and avoid race conditions. |
6 | semaphore | Demonstrate the use of semaphores to limit the number of concurrent threads accessing a shared resource. |
7 | producer_consumer | Present a classic multi-threading problem (Producer-Consumer) and its solution using synchronization mechanisms like mutexes and condition variables. |
8 | fetch_parallel | Showcase a practical application of multi-threading for parallel fetching of data from multiple sources. |
9 | merge_sort | Use multi-threading to parallelize a merge sort algorithm, demonstrating the potential for performance improvements. |
10 | schedule_every_n_sec | Show how to schedule tasks to run periodically at fixed intervals using threads. |
11 | barrier | Demonstrate the use of barriers to synchronize multiple threads at a specific point in the execution. |
12 | thread_local_storage | Illustrate the concept of Thread Local Storage (TLS) and how it can be used to store thread-specific data. |
13 | thread_pool | Show how to create and use a thread pool to efficiently manage a fixed number of worker threads for executing multiple tasks. |
14 | reader_writer_lock | Explain the concept of Reader-Writer Locks and their use for efficient access to shared resources with multiple readers and a single writer. |
Node.js traditionally uses a single-threaded event loop to handle asynchronous operations. However, since version 10.5.0, Node.js has included support for worker threads, which allow multi-threaded execution. This is particularly useful for CPU-intensive tasks (e.g., image processing, cryptography), which can block the event loop and degrade performance in a purely single-threaded environment.
Worker threads in Node.js are provided by the worker_threads module, enabling the creation of additional JavaScript execution contexts. Each worker thread runs in its own isolated V8 instance and does not share state with other worker threads or with the main thread. Instead, communication is accomplished by message passing and, optionally, by sharing specific memory buffers (e.g., SharedArrayBuffer).
To create a new worker thread, you instantiate the Worker class from the worker_threads module. The worker is initialized with a script (or a code string) to execute:
// main.js
const { Worker } = require('worker_threads');
const worker = new Worker('./worker.js'); // Separate file containing worker code
worker.on('message', (message) => {
console.log(`Received message from worker: ${message}`);
});
worker.on('error', (error) => {
console.error(`Worker error: ${error}`);
});
worker.on('exit', (code) => {
console.log(`Worker exited with code ${code}`);
});
// worker.js
const { parentPort } = require('worker_threads');
parentPort.postMessage('Hello from worker');
In this example:
- main.js creates a Worker instance pointing to the worker.js file and registers handlers for three events:
  - message: triggered when the worker sends data back.
  - error: triggered if an uncaught exception occurs in the worker.
  - exit: triggered when the worker stops execution.
- worker.js obtains a reference to parentPort (the communication channel back to the main thread) and sends a message.
and on('message', callback)
. This serialization-based messaging ensures that no implicit shared state is introduced.
// main.js (continued)
worker.postMessage({ command: 'start', data: 'example data' });
// worker.js (continued)
const { parentPort } = require('worker_threads');
parentPort.on('message', (message) => {
console.log(`Worker received: ${JSON.stringify(message)}`);
// Perform CPU-intensive task or other operations
parentPort.postMessage('Processing complete');
});
Here, the main thread sends a structured message to the worker with a command property and some data. The worker, upon receiving it, can process the data and then respond back to the main thread.
Workers can be terminated from either the main thread or within the worker itself.
I. From the main thread, you can call worker.terminate(), which returns a Promise resolving to the exit code:
// main.js
worker.terminate().then((exitCode) => {
console.log(`Worker terminated with code ${exitCode}`);
});
II. Inside the worker, you can terminate execution using process.exit():
// worker.js
process.exit(0); // Graceful exit
Terminating the worker ends its event loop and frees its resources. Any pending operations in the worker are discarded once termination begins.
You can also pass initial data to the worker at creation time through the Worker constructor using the workerData option:
// main.js
const { Worker } = require('worker_threads');
const worker = new Worker('./worker.js', {
workerData: { initialData: 'Hello' }
});
Within worker.js:
// worker.js
const { workerData, parentPort } = require('worker_threads');
console.log(workerData); // { initialData: 'Hello' }
// Do work, then optionally respond
parentPort.postMessage('Worker started with initial data!');
This pattern is useful for small or essential bits of configuration data that the worker needs right from startup.
Some objects (like ArrayBuffer and MessagePort) can be transferred to a worker, meaning the main thread loses ownership and can no longer use the object once it’s transferred. This can be more efficient than copying large data structures. A SharedArrayBuffer, by contrast, is shared rather than transferred: both sides keep access to the same memory.
// main.js
const { Worker } = require('worker_threads');
const buffer = new SharedArrayBuffer(1024);
const worker = new Worker('./worker.js', { workerData: buffer });
In this snippet, a SharedArrayBuffer is provided to the worker. Both the main thread and the worker thread can access and modify this shared memory concurrently, which is useful for scenarios requiring high-performance concurrent access (e.g., streaming or real-time data processing). Synchronization in such cases typically uses Atomics (part of JavaScript’s standard library).
Atomics and SharedArrayBuffer

When sharing memory (via SharedArrayBuffer), JavaScript provides the Atomics object for performing atomic operations (e.g., Atomics.add, Atomics.load, Atomics.store). Unlike higher-level synchronization primitives in other languages (like mutexes or semaphores), JavaScript concurrency with SharedArrayBuffer and Atomics relies on these low-level primitives for correctness.
Example:
// main.js
const { Worker } = require('worker_threads');
const sharedBuffer = new SharedArrayBuffer(4); // Enough for one 32-bit integer
const worker = new Worker('./worker.js', { workerData: sharedBuffer });
// Optionally communicate via messages as well
worker.on('message', (msg) => {
console.log('Message from worker:', msg);
});
// worker.js
const { parentPort, workerData } = require('worker_threads');
// Atomics and Int32Array are global objects, so no import is required.
// Interpret the shared buffer as a 32-bit integer array of length 1
const sharedArray = new Int32Array(workerData);
for (let i = 0; i < 100000; i++) {
// Atomically increment the integer
Atomics.add(sharedArray, 0, 1);
}
// Once done, send a message back
parentPort.postMessage('Incrementing done!');
In this example:
- A SharedArrayBuffer of 4 bytes (enough space for one Int32Array element) is shared between the main thread and the worker.
- The worker performs 100,000 atomic increments with Atomics.add.
- Both threads can access sharedArray[0] safely, without data races.
// main.js
worker.on('error', (error) => {
console.error('Worker error:', error);
});
// worker.js
try {
// perform some operation that might throw
throw new Error('Something went wrong');
} catch (err) {
// Handle locally or propagate
parentPort.postMessage({ error: err.message });
// Optionally re-throw, or process.exit(1) for immediate termination
}
If an uncaught exception occurs in the worker, the main thread’s error event will fire, allowing you to clean up resources or attempt a restart. Consider carefully whether to handle errors in the worker itself or bubble them up to the main thread.

For exchanging large amounts of data, transferable objects or a SharedArrayBuffer should be used when possible, as they allow for direct memory access without the overhead of copying data.

Below is a complete example of using worker threads to calculate prime numbers, demonstrating data passing, message handling, and worker management.
// main.js
const { Worker } = require('worker_threads');
function runService(workerData) {
return new Promise((resolve, reject) => {
const worker = new Worker('./primeWorker.js', { workerData });
worker.on('message', resolve);
worker.on('error', reject);
worker.on('exit', (code) => {
if (code !== 0) {
reject(new Error(`Worker stopped with exit code ${code}`));
}
});
});
}
runService(10).then((result) => console.log(result)).catch((err) => console.error(err));
// primeWorker.js
const { parentPort, workerData } = require('worker_threads');
function isPrime(num) {
for (let i = 2, sqrt = Math.sqrt(num); i <= sqrt; i++) {
if (num % i === 0) return false;
}
return num > 1;
}
const primes = [];
for (let i = 2; i <= workerData; i++) {
if (isPrime(i)) primes.push(i);
}
parentPort.postMessage(primes);
In this example, the main thread delegates the task of finding prime numbers up to a certain limit to a worker thread. The worker calculates the primes and sends the results back to the main thread using parentPort.postMessage().
Here are some example code snippets demonstrating various aspects of multithreading in JavaScript (Node.js):
# | Example | Description |
1 | single_worker_thread | Introduce the concept of threads by creating a single worker thread using Web Workers. |
2 | thread_subclass | Demonstrate how to create a custom thread class by extending the Worker class. |
3 | multiple_worker_threads | Show how to create and manage multiple worker threads using Web Workers. |
4 | race_condition | Explain race conditions and their impact on multi-threaded applications using JavaScript examples. |
5 | mutex | Illustrate the use of Atomics and SharedArrayBuffer to protect shared resources and avoid race conditions in JavaScript applications. |
6 | semaphore | Demonstrate the use of semaphores to limit the number of concurrent threads accessing a shared resource in JavaScript applications using Atomics and SharedArrayBuffer. |
7 | producer_consumer | Present a classic multi-threading problem (Producer-Consumer) and its solution using JavaScript synchronization mechanisms like Atomics and SharedArrayBuffer. |
8 | fetch_parallel | Showcase a practical application of multi-threading for parallel fetching of data from multiple sources using Web Workers. |
9 | merge_sort | Use multi-threading in JavaScript to parallelize a merge sort algorithm, demonstrating the potential for performance improvements. |
10 | schedule_every_n_sec | Show how to schedule tasks to run periodically at fixed intervals using JavaScript and Web Workers. |
11 | barrier | Demonstrate the use of barriers to synchronize multiple threads at a specific point in the execution. |
12 | thread_local_storage | Illustrate the concept of Thread Local Storage (TLS) and how it can be used to store thread-specific data. |
13 | thread_pool | Show how to create and use a thread pool to efficiently manage a fixed number of worker threads for executing multiple tasks. |
14 | reader_writer_lock | Explain the concept of Reader-Writer Locks and their use for efficient access to shared resources with multiple readers and a single writer. |