++++++++++++++++++++++++++
Multithreading Programming
++++++++++++++++++++++++++

Writing code with threads is hard. Race conditions are common in
multithreaded applications and are usually very hard to reproduce.

Threads and Signals
===================

Threads are bad. Threads with signals are worse :-)

Before OpenBSD 5.2, threads were implemented in user space on OpenBSD, and
sending signals was not reliable at all. Before FreeBSD 8, combining threads
and signals had a lot of issues too.
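
On platforms where threads and signals do work together, the signal mask is a
per-thread attribute. The following is only a minimal sketch, assuming a Unix
platform where Python exposes ``signal.pthread_sigmask()``; the ``worker``
function and the ``masks`` dictionary are hypothetical names used for
illustration.

```python
import signal
import threading

masks = {}

def worker():
    # Block SIGUSR1 in this thread only.
    signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGUSR1})
    # SIG_BLOCK with an empty set returns the current mask without changing it.
    masks["worker"] = signal.pthread_sigmask(signal.SIG_BLOCK, set())

thread = threading.Thread(target=worker)
thread.start()
thread.join()

# The main thread's mask is not affected by what the worker blocked.
masks["main"] = signal.pthread_sigmask(signal.SIG_BLOCK, set())
```

The handler registered with ``signal.signal()`` stays process-wide; only the
mask differs from one thread to another.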

Spawn Child Processes
=====================

Many tests in ``test_threading`` are skipped on FreeBSD <= 6, HP-UX 11 and
NetBSD 5. Between ``fork()`` and ``exec()``, only async-signal-safe functions
are allowed (issues #12316 and #11870), and calling ``fork()`` from a worker
thread is known to trigger problems on some operating systems (issue #3863).
The problematic tests are therefore skipped on platforms known to behave badly.

Fork, Exec and File Descriptors
-------------------------------
I wrote `PEP 446 - Make newly created file descriptors non-inheritable
`_ which breaks backward
compatibility in Python 3.4 "for the good of mankind" (wrote Guido van Rossum).
This PEP makes all file descriptors non-inheritable to fix a race condition
that occurs when two threads each create a child process:

* thread A creates a pipe
* thread B creates child process 1
* thread A makes the pipe non-inheritable
* thread A creates child process 2

Without the PEP, child process 1 created by thread B inherits the file
descriptor created by thread A.

In fact, the race condition is only avoided if thread A is able to make the
file descriptor non-inheritable atomically, that is, at creation time, using
the ``O_CLOEXEC`` or ``SOCK_CLOEXEC`` flag for example.
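
As an illustration, the inheritable flag of a new pipe can be inspected from
Python 3.4 or later. This is a minimal sketch, not a description of how the
PEP is implemented; ``os.pipe2()`` only exists on some Unix platforms, so the
sketch guards its use, and the variable names are chosen for illustration.

```python
import os

# Since Python 3.4, os.pipe() returns non-inheritable file descriptors.
read_fd, write_fd = os.pipe()
default_inheritable = os.get_inheritable(read_fd)
os.close(read_fd)
os.close(write_fd)

# Where available, os.pipe2() takes the O_CLOEXEC flag at creation time,
# so there is no window in which another thread could leak the descriptor.
if hasattr(os, "pipe2"):
    read_fd, write_fd = os.pipe2(os.O_CLOEXEC)
    atomic_inheritable = os.get_inheritable(read_fd)
    os.close(read_fd)
    os.close(write_fd)
else:
    atomic_inheritable = None  # os.pipe2() is not available on this platform
```

A descriptor that must be passed to a child process can still be made
inheritable explicitly with ``os.set_inheritable()``.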

Python subprocess Module
------------------------

In Python 2, the subprocess module is *not* thread-safe: unexpected file
descriptors can be inherited, and the module has other silly issues.

Process-wide
============

Multithreading is hard because many functions of the system C library (libc):

* modify state shared by the whole process (process-wide state)
* are not reentrant
* rely on global state
* are not "async-signal-safe"

On Unix and other platforms, many variables are "process-wide", as opposed
to "per-thread".
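
The current working directory is one such process-wide variable. The sketch
below is only an illustration: a worker thread calls ``os.chdir()`` and the
main thread observes the change. The function and variable names are
hypothetical.

```python
import os
import tempfile
import threading

original_cwd = os.getcwd()

def change_directory(path):
    # chdir() changes the working directory of the whole process.
    os.chdir(path)

with tempfile.TemporaryDirectory() as tmpdir:
    thread = threading.Thread(target=change_directory, args=(tmpdir,))
    thread.start()
    thread.join()
    # The main thread never called chdir(), yet its cwd has changed.
    changed_for_main = os.path.samefile(os.getcwd(), tmpdir)
    # Restore the original directory before the temporary one is removed.
    os.chdir(original_cwd)
```

Any relative path used by the main thread between the worker's ``chdir()`` and
the restore would have been resolved against the temporary directory.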

Examples of process-wide states:

* *Current working directory* aka **cwd**

  * Modified by ``chdir()``, read by ``getcwd()``
  * Most "legacy" filesystem functions taking a filename rely on the "current
    working directory" (cwd), especially when given a relative path. Examples:
    ``open()`` or ``chmod()``.
  * The new Linux "at" functions don't rely on the current working directory.
    Examples: ``openat()`` or ``fchmodat()``.

* Locales

  * Modified by ``setlocale()``
  * For example, used by ``strftime()`` and ``localeconv()``.
  * Functions using "wide character strings" (wcs) avoid some issues.
    Example: ``wcsftime()``.
  * Some libraries don't rely on a global locale but expect a locale argument
  * Recent glibc has a new API to get per-thread locale: XXX

* Unix signals

  * Example of signals: ``SIGINT``, ``SIGSEGV``
  * Signal handlers are registered by ``signal()`` and ``sigaction()``
  * ``raise()`` or ``kill()`` to send a signal to a process
  * Per thread API: ``pthread_kill()`` (send a signal to a thread),
    ``pthread_sigmask()`` (block signals)

* Clock: ``clock_gettime(CLOCK_PROCESS_CPUTIME_ID)`` is process-wide, but
  there is also a per-thread clock: ``CLOCK_THREAD_CPUTIME_ID``
* umask: File mode creation mask

  * Get and set by ``umask()``.
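
To illustrate the "at" functions from the list above: Python exposes them
through the ``dir_fd`` parameter of several ``os`` functions. This is a minimal
sketch, assuming a platform where ``os.open()`` supports ``dir_fd`` (it can be
checked with ``os.supports_dir_fd``); the file name is hypothetical.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    # Open the directory once; the descriptor keeps identifying it even if
    # another thread changes the current working directory afterwards.
    dir_fd = os.open(tmpdir, os.O_RDONLY)
    try:
        # openat()-style call: the relative name is resolved against dir_fd,
        # not against the process-wide current working directory.
        fd = os.open("dirfd_example.txt", os.O_WRONLY | os.O_CREAT, 0o600,
                     dir_fd=dir_fd)
        os.write(fd, b"hello")
        os.close(fd)
        created = os.path.exists(os.path.join(tmpdir, "dirfd_example.txt"))
    finally:
        os.close(dir_fd)
```

Because no call depends on the cwd, the same code can run in several threads,
each with its own directory descriptor.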

Others:

* File descriptors: not really an issue in practice if a FD is only used
  in a single thread.
* Heap memory, ``malloc()``/``free()``: modern ``malloc()`` implementations
  scale with the number of threads and CPUs. Not an issue if a memory block is
  only used in a single thread.
* Users and groups

See also `Ghosts of Unix Past: a historical search for design patterns
`_ by Neil Brown (October, 2010).