
Why Pthreads?

Pthreads Overview: Why Pthreads?

Light Weight:

When compared to the cost of creating and managing a process, a thread can be created with much less operating system overhead. Managing threads requires fewer system resources than managing processes.

For example, the following table compares timing results for the fork() subroutine and the pthread_create() subroutine. Timings reflect 50,000 process/thread creations, were measured with the time utility, and are given in seconds. No optimization flags were used.

Note: don’t expect the system and user times to add up to real time, because these are SMP systems with multiple CPUs/cores working on the problem at the same time. At best, these are approximations run on local machines, past and present.

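| Platform | fork() real | fork() user | fork() sys | pthread_create() real | pthread_create() user | pthread_create() sys |
|---|---|---|---|---|---|---|
| Intel 2.6 GHz Xeon E5-2670 (16 cores/node) | 8.1 | 0.1 | 2.9 | 0.9 | 0.2 | 0.3 |
| Intel 2.8 GHz Xeon 5660 (12 cores/node) | 4.4 | 0.4 | 4.3 | 0.7 | 0.2 | 0.5 |
| AMD 2.3 GHz Opteron (16 cores/node) | 12.5 | 1.0 | 12.5 | 1.2 | 0.2 | 1.3 |
| AMD 2.4 GHz Opteron (8 cores/node) | 17.6 | 2.2 | 15.7 | 1.4 | 0.3 | 1.3 |
| IBM 4.0 GHz POWER6 (8 cpus/node) | 9.5 | 0.6 | 8.8 | 1.6 | 0.1 | 0.4 |
| IBM 1.9 GHz POWER5 p5-575 (8 cpus/node) | 64.2 | 30.7 | 27.6 | 1.7 | 0.6 | 1.1 |
| IBM 1.5 GHz POWER4 (8 cpus/node) | 104.5 | 48.6 | 47.2 | 2.1 | 1.0 | 1.5 |
| Intel 2.4 GHz Xeon (2 cpus/node) | 54.9 | 1.5 | 20.8 | 1.6 | 0.7 | 0.9 |
| Intel 1.4 GHz Itanium2 (4 cpus/node) | 54.5 | 1.1 | 22.2 | 2.0 | 1.2 | 0.6 |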

Efficient Communications/Data Exchange:

The primary motivation for using Pthreads in a high performance computing environment is to achieve optimum performance. In particular, if an application uses MPI for on-node communications, performance could potentially be improved by using Pthreads instead.

MPI libraries usually implement on-node task communication via shared memory, which involves at least one memory copy operation (process to process).

For Pthreads there is no intermediate memory copy required because threads share the same address space within a single process. There is no data transfer, per se. It can be as efficient as simply passing a pointer.

In the worst-case scenario, Pthread communications become more of a cache-to-CPU or memory-to-CPU bandwidth issue. These speeds are much higher than those of MPI shared memory communications.

For example, some local comparisons, past and present, are shown below:

Second table

Other Common Reasons:

Threaded applications offer potential performance gains and practical advantages over non-threaded applications in several other ways.

A perfect example is the typical web browser, where many interleaved tasks can be happening at the same time, and where tasks can vary in priority.

Another good example is a modern operating system, which makes extensive use of threads. A screenshot of the MS Windows OS and applications using threads is shown below.

[Figure: MS Windows Resource Monitor screenshot]