Article by Ayman Alheraki on January 11 2026 10:33 AM
Among external libraries for handling concurrency and multithreading in modern C++, Intel Threading Building Blocks (TBB) is one of the strongest choices. The library is widely used in parallel programming and provides high-level tools for building multithreaded applications efficiently and safely.
Intel TBB is an open-source library developed by Intel, enabling developers to write applications that leverage multithreading and parallelism efficiently on multi-core systems. The library is specifically designed to provide high performance and reliability in resource and thread management while ensuring data safety through advanced features like Mutexes and Locks.
Automatic Thread Management: The library handles threads on behalf of developers, making it easier to create parallel programs without worrying about complex resource management details.
Dynamic Work Division: TBB offers dynamic work-stealing mechanisms to ensure optimal use of available processors.
Data Safety: The library provides several tools for protecting data across parallel processing, such as Mutexes and Spinlocks.
Modern C++ Support: The library is fully compatible with modern C++ standards and leverages features like Lambdas and Move Semantics.
Scalability: TBB scales efficiently, whether your system has a single processor or hundreds of them.
High-level Parallel Algorithms: The library provides ready-to-use algorithms like parallel_for and parallel_reduce for easy implementation of parallel tasks.
Task-based Parallelism: TBB allows writing task-based programs instead of thread-based ones, which simplifies breaking programs into smaller, independently executable parts.
Mutexes and Spinlocks: The library offers various types of locks, such as Mutex and Spinlock, to protect shared data between threads.
Flow Graph: A tool that lets you design programs based on parallel workflow graphs, enabling complex applications to efficiently manage data flow.
Parallel Algorithms: Intel TBB includes ready-made algorithms that help perform common tasks like parallel loops and reductions.
To start using Intel TBB in a C++ program, follow these steps:
Installing the Library:
You can download the library from the official site or install it via package managers.
On Ubuntu, you can install the library using the following command:
    sudo apt-get install libtbb-dev

A Simple Example Using parallel_for:
Here’s a simple example that demonstrates how to use parallel_for for parallel processing of an array:

    #include <tbb/tbb.h>
    #include <iostream>

    const int N = 1000;
    int arr[N];

    void parallel_sum() {
        tbb::parallel_for(tbb::blocked_range<size_t>(0, N),
            [&](const tbb::blocked_range<size_t>& r) {
                // Each invocation receives one sub-range of [0, N)
                for (size_t i = r.begin(); i != r.end(); ++i) {
                    arr[i] = static_cast<int>(i * 2); // Any calculation
                }
            });
    }

    int main() {
        parallel_sum();
        for (int i = 0; i < 10; ++i) {
            std::cout << arr[i] << " ";
        }
        return 0;
    }

In this example, the workload on the array arr is divided into smaller parts, which are processed in parallel using parallel_for.
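Using Mutex for Data Protection:
When dealing with shared data across threads, you can use a mutex to avoid issues caused by concurrent data access:

    #include <tbb/tbb.h>
    #include <iostream>

    tbb::mutex myMutex;
    int counter = 0;

    void increment_counter() {
        tbb::parallel_for(0, 1000, [&](int i) {
            myMutex.lock();   // Only one thread at a time passes this point
            ++counter;
            myMutex.unlock();
        });
    }

    int main() {
        increment_counter();
        std::cout << "Counter: " << counter << std::endl;
        return 0;
    }

Here, tbb::mutex is used to protect access to the shared variable counter, ensuring data safety when handling parallel tasks. Note that tbb::mutex is not present in every release: the first oneTBB 2021 versions removed it in favor of std::mutex, so check your installed version and substitute std::mutex or tbb::spin_mutex if needed.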
Intel TBB evolves continuously. Its 2021 release was a major revision, published under the name oneTBB (oneAPI Threading Building Blocks), with performance improvements and support for current C++ standards. New features include:
Better support for C++17 parallel standard libraries.
Performance enhancements for multi-core processors.
Direct support for heterogeneous computing.
You can download the latest version of the library through the project’s official GitHub page:
https://github.com/oneapi-src/oneTBB
Alternatively, the library can be installed via the Intel oneAPI toolkit, which includes TBB as part of the package.
The rest of this article shows how to use Intel TBB for concurrency and multithreading in C++ on Windows. Before we start, ensure that Intel TBB is installed on your system. You can download the library from the official Intel site.
parallel_for
This example demonstrates how to parallelize a simple loop with parallel_for.
Download and Install Intel TBB:
Download Intel TBB from the official site.
Follow the installation instructions for Windows.
Link TBB to Your Project:
If you're using Visual Studio:
Go to Project Properties → C/C++ → General → Additional Include Directories, and add the TBB include path (e.g., C:\path_to_tbb\tbb\include).
In Linker → General → Additional Library Directories, add the path to the TBB library (e.g., C:\path_to_tbb\tbb\lib).
In Linker → Input → Additional Dependencies, add tbb.lib.
    #include <tbb/tbb.h>
    #include <iostream>

    const int N = 100000;
    int arr[N];

    void parallel_sum() {
        tbb::parallel_for(tbb::blocked_range<size_t>(0, N),
            [](const tbb::blocked_range<size_t>& r) {
                for (size_t i = r.begin(); i != r.end(); ++i) {
                    arr[i] = static_cast<int>(i * 2); // Some computation
                }
            });
    }

    int main() {
        parallel_sum();
        // Print first 10 results
        for (int i = 0; i < 10; ++i) {
            std::cout << arr[i] << " ";
        }
        std::cout << std::endl;
        return 0;
    }

The parallel_for function automatically divides the loop into chunks and assigns them to different threads for parallel execution.
tbb::blocked_range specifies the range of the loop to be processed in parallel.
Compile and run the program using Visual Studio or gcc on Windows after linking the TBB library.
tbb::mutex
This example shows how to use tbb::mutex to protect shared resources between multiple threads.

    #include <tbb/tbb.h>
    #include <iostream>

    int counter = 0;
    tbb::mutex myMutex;

    void increment_counter() {
        tbb::parallel_for(0, 1000, [](int i) {
            myMutex.lock();
            ++counter;
            myMutex.unlock();
        });
    }

    int main() {
        increment_counter();
        std::cout << "Final counter value: " << counter << std::endl;
        return 0;
    }

The mutex ensures that the shared variable counter is safely incremented without race conditions.
myMutex.lock() and myMutex.unlock() ensure that only one thread can access the counter at a time.
parallel_reduce for Parallel Reduction
The following example demonstrates how to sum an array of integers using parallel_reduce, which is optimized for reduction operations like summing or finding a minimum/maximum value.

    #include <tbb/tbb.h>
    #include <iostream>

    const int N = 100000;
    int arr[N];

    void initialize_array() {
        for (int i = 0; i < N; ++i) {
            arr[i] = i;
        }
    }

    // long long is required: the sum of 0..N-1 is 4,999,950,000,
    // which overflows a 32-bit int.
    long long parallel_sum() {
        return tbb::parallel_reduce(
            tbb::blocked_range<size_t>(0, N),
            0LL, // Identity value for addition
            [](const tbb::blocked_range<size_t>& r, long long init) -> long long {
                for (size_t i = r.begin(); i != r.end(); ++i) {
                    init += arr[i];
                }
                return init;
            },
            [](long long x, long long y) -> long long { return x + y; }
        );
    }

    int main() {
        initialize_array();
        long long total_sum = parallel_sum();
        std::cout << "Total sum: " << total_sum << std::endl;
        return 0;
    }

parallel_reduce divides the workload and then reduces (combines) the results from different threads.
The first lambda function processes the range, while the second combines partial results from different threads.
tbb::task_group
In this example, we use task-based parallelism with tbb::task_group to run independent tasks in parallel.
    #include <tbb/tbb.h>
    #include <iostream>

    void task1() {
        std::cout << "Task 1 is running" << std::endl;
    }

    void task2() {
        std::cout << "Task 2 is running" << std::endl;
    }

    void parallel_tasks() {
        tbb::task_group tg;
        tg.run([] { task1(); }); // Runs task1 in parallel
        tg.run([] { task2(); }); // Runs task2 in parallel
        tg.wait();               // Wait for both tasks to complete
    }

    int main() {
        parallel_tasks();
        return 0;
    }

tbb::task_group allows you to run multiple independent tasks in parallel.
tg.run schedules a task for execution, and tg.wait ensures that the main thread waits for all tasks to complete.
Intel TBB also provides a flow graph API that allows you to define complex task dependencies. Here is an example of a simple flow graph that runs two tasks concurrently, then combines their results.
    #include <tbb/flow_graph.h>
    #include <iostream>
    #include <tuple>

    int main() {
        tbb::flow::graph g;

        tbb::flow::function_node<int, int> task1(g, tbb::flow::unlimited,
            [](const int& input) -> int {
                std::cout << "Task 1 processing " << input << std::endl;
                return input + 1;
            });

        tbb::flow::function_node<int, int> task2(g, tbb::flow::unlimited,
            [](const int& input) -> int {
                std::cout << "Task 2 processing " << input << std::endl;
                return input + 2;
            });

        // Pairs one result from each task into a tuple
        tbb::flow::join_node<std::tuple<int, int>, tbb::flow::queueing> join(g);

        tbb::flow::function_node<std::tuple<int, int>, int> task3(g, tbb::flow::unlimited,
            [](const std::tuple<int, int>& input) -> int {
                int result = std::get<0>(input) + std::get<1>(input);
                std::cout << "Task 3 combining results: " << result << std::endl;
                return result;
            });

        // Wire the graph: task1 and task2 feed the join, the join feeds task3
        tbb::flow::make_edge(task1, tbb::flow::input_port<0>(join));
        tbb::flow::make_edge(task2, tbb::flow::input_port<1>(join));
        tbb::flow::make_edge(join, task3);

        task1.try_put(1);
        task2.try_put(2);
        g.wait_for_all();
        return 0;
    }

The flow graph API allows you to define tasks and specify their dependencies.
In this example, Task 1 and Task 2 run concurrently, and Task 3 combines their results.
Intel TBB provides powerful tools for Concurrency and Multithreading in Modern C++. On Windows, using Intel TBB with Visual Studio is straightforward, and it allows you to create efficient, parallel programs with minimal effort. With features like parallel_for, mutexes, task groups, and the flow graph API, Intel TBB offers flexibility and high performance in multithreaded applications.
The library provides advanced and flexible tools to manage threads and resources dynamically, along with integrated solutions for data protection and performance optimization.