Compared to the cost of creating and managing a process, a thread can be created with much less operating system overhead, and managing threads requires fewer system resources than managing processes.
For example, the following table compares timing results for the fork() subroutine and the pthread_create() subroutine. Each timing reflects 50,000 process/thread creations, was measured with the time utility, and is reported in seconds. No compiler optimization flags were used.
Note: don't expect the system and user times to add up to the real time; these are SMP systems with multiple CPUs/cores working on the problem simultaneously. At best, these are approximations run on local machines, past and present.
Intel 2.6 GHz Xeon E5-2670 (16 cores/node)
Intel 2.8 GHz Xeon 5660 (12 cores/node)
AMD 2.3 GHz Opteron (16 cores/node)
AMD 2.4 GHz Opteron (8 cores/node)
IBM 4.0 GHz POWER6 (8 cpus/node)
IBM 1.9 GHz POWER5 p5-575 (8 cpus/node)
IBM 1.5 GHz POWER4 (8 cpus/node)
Intel 2.4 GHz Xeon (2 cpus/node)
Intel 1.4 GHz Itanium2 (4 cpus/node)
Efficient Communications/Data Exchange:
The primary motivation for considering the use of Pthreads in a high performance computing environment is to achieve optimum performance. In particular, if an application is using MPI for on-node communications, there is a potential that performance could be improved by using Pthreads instead.
MPI libraries usually implement on-node task communication via shared memory, which involves at least one memory copy operation (process to process).
For Pthreads there is no intermediate memory copy required because threads share the same address space within a single process. There is no data transfer, per se. It can be as efficient as simply passing a pointer.
In the worst-case scenario, Pthread communications become more of a cache-to-CPU or memory-to-CPU bandwidth issue. These bandwidths are much higher than those of MPI shared-memory communications.
For example, some local bandwidth comparisons, past and present, are shown below:
Other Common Reasons:
Threaded applications offer potential performance gains and practical advantages over non-threaded applications in several other ways:
Overlapping CPU work with I/O: For example, a program may have sections where it is performing a long I/O operation. While one thread is waiting for an I/O system call to complete, CPU intensive work can be performed by other threads.
Priority/real-time scheduling: tasks which are more important can be scheduled to supersede or interrupt lower priority tasks.
Asynchronous event handling: tasks which service events of indeterminate frequency and duration can be interleaved. For example, a web server can both transfer data from previous requests and manage the arrival of new requests.
A perfect example is the typical web browser, where many interleaved tasks can be happening at the same time, and where tasks can vary in priority.
Another good example is a modern operating system, which makes extensive use of threads. A screenshot of the MS Windows OS and applications using threads is shown below.
Lawrence Livermore National Laboratory
7000 East Avenue • Livermore, CA 94550 | LLNL-WEB-458451
Operated by the Lawrence Livermore National Security, LLC for the
Department of Energy's National Nuclear Security Administration