See a C++ example and a Java example with threads and performance measurement.
See also a classical pitfall regarding closures in C#, in a threading context.
A thread has a current instruction and a call stack. In more detail, it has the following attributes:
At each moment, a thread can:
This means that each CPU executes instructions from one thread until the thread launches a blocking operation (a read from a file or from the network), its time slice expires, or some higher-priority thread becomes runnable. At that point, the operating system is invoked (by the read syscall, by the timer interrupt, or by the device driver interrupt), saves the registers of the current thread, including the instruction pointer (IP) and the stack pointer (SP), and loads the registers of the next scheduled thread. The last operation effectively restores the context of that thread and jumps to it.
It should be noted that a context switch (jumping from one thread to another) is quite an expensive operation, because it takes some hundreds of instructions and may invalidate a large part of the CPU cache.
Creating and terminating a thread are also expensive.
A process can have one or more threads executing for it. The memory and the open files are per process.
Two threads walk into a bar. The bartender says:
Go I don't away! want a race to get condition last like I time had.
Consider several threads, where each of them adds a value to a shared sum. For instance, each thread processes a sale at a supermarket, and each adds the sale value to the total amount of money the supermarket has.
Since the addition itself is done in some register of the CPU, it is possible to have the following timeline:

So, thread B computes the sum based on the original value of S, not the one computed by thread A, and overwrites the value computed by A.
What we need is for the addition to be executed either fully by A and then fully by B, or vice versa, but never overlapped.
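The lost update can be replayed deterministically in a single thread by spelling out the three steps (load, add, store) that each thread performs. This is only an illustrative sketch: the function name `lost_update_demo` and the locals `reg_a` and `reg_b`, which stand in for the CPU registers of the two threads, are hypothetical.

```cpp
#include <cassert>

// Replays one problematic interleaving of two threads, A and B, that each
// add a value to the shared sum s. The locals reg_a and reg_b stand in for
// the registers of the two threads (hypothetical names).
int lost_update_demo(int s, int value_a, int value_b) {
    int reg_a = s;       // A loads S
    int reg_b = s;       // B loads S (still the original value)
    reg_a += value_a;    // A adds its value in its own register
    reg_b += value_b;    // B adds its value in its own register
    s = reg_a;           // A stores its result
    s = reg_b;           // B stores its result, overwriting A's update
    return s;
}
```

Starting from 100 and adding 10 and 20, this interleaving yields 120 instead of the expected 130: the update made by A is lost.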
These are simple operations, on simple types (integers, booleans, pointers), that are guaranteed to execute atomically. They have hardware support in the CPU. They need to be coupled with memory fences — directives to the compiler and CPU to refrain from performing some re-orderings.
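As a minimal sketch of the shared-sum example done with an atomic variable, the following uses `std::atomic<long>` and `fetch_add`. The function name `atomic_sum` and its parameters are assumptions made for illustration only.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Each of n_threads threads adds `value` to a shared sum, `per_thread` times.
// fetch_add performs the read-modify-write as one atomic operation, so no
// update is lost.
long atomic_sum(int n_threads, int per_thread, long value) {
    assert(n_threads >= 0 && per_thread >= 0);
    std::atomic<long> sum{0};
    std::vector<std::thread> threads;
    for (int t = 0; t < n_threads; ++t) {
        threads.emplace_back([&sum, per_thread, value] {
            for (int i = 0; i < per_thread; ++i) {
                // Relaxed ordering is enough here: only the final total is
                // read, and join() below orders it after all the additions.
                sum.fetch_add(value, std::memory_order_relaxed);
            }
        });
    }
    for (auto& th : threads) th.join();
    return sum.load();
}
```

With 4 threads, 10000 additions each and a value of 3, the result is always 120000, regardless of how the threads are scheduled.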
Operations:
See:
Uses:
Notes about memory order (see C++ atomic variables):
Notes about compare-and-exchange:
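A common way of using compare-and-exchange is a retry loop: read the current value, compute the desired new value, and store it only if nobody changed the variable in the meantime. The sketch below assumes a hypothetical helper, `atomic_store_max`, that raises an atomic integer to at least a given value.

```cpp
#include <atomic>
#include <cassert>

// Atomically raises `target` to at least `candidate` (hypothetical helper).
void atomic_store_max(std::atomic<int>& target, int candidate) {
    int observed = target.load();
    // compare_exchange_weak stores `candidate` only if `target` still equals
    // `observed`. On failure it writes the current value of `target` into
    // `observed`, so the loop condition is re-evaluated on fresh data.
    while (observed < candidate &&
           !target.compare_exchange_weak(observed, candidate)) {
        // nothing to do: `observed` was refreshed by the failed exchange
    }
}
```

The weak variant may fail spuriously even when the values match, which is harmless inside a loop such as this one.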
A mutex can be held by a single thread at a time. If a second thread tries to get the mutex, it waits until the mutex is released by the first thread.
A mutex can be implemented as a spin-lock (via atomic operations), or by going through the operating system (which puts the thread to sleep until the mutex is freed).
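The spin-lock variant can be sketched on top of `std::atomic_flag`. The class name `SpinLock` and the function `spin_locked_sum` are illustrative assumptions; a production mutex would normally fall back to the operating system rather than spin indefinitely.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Minimal spin-lock built on an atomic flag (illustrative class name).
class SpinLock {
public:
    void lock() {
        // test_and_set returns the previous value of the flag: keep trying
        // while some other thread already holds the lock.
        while (flag_.test_and_set(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
    }
    void unlock() { flag_.clear(std::memory_order_release); }

private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Increments a plain (non-atomic) counter from several threads, with every
// increment guarded by the spin-lock.
long spin_locked_sum(int n_threads, int per_thread) {
    assert(n_threads >= 0 && per_thread >= 0);
    SpinLock lock;
    long sum = 0;
    std::vector<std::thread> threads;
    for (int t = 0; t < n_threads; ++t) {
        threads.emplace_back([&lock, &sum, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                lock.lock();
                ++sum;
                lock.unlock();
            }
        });
    }
    for (auto& th : threads) th.join();
    return sum;
}
```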
Mutexes are used to prevent simultaneous access to the same data.
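For the supermarket example, a sketch with `std::mutex` could look as follows. The names `SalesTotal`, `add_sale`, `read`, and `run_sales` are hypothetical and chosen only for this illustration.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Shared total protected by a mutex (hypothetical class for illustration).
class SalesTotal {
public:
    void add_sale(long amount) {
        std::lock_guard<std::mutex> guard(mtx_);  // unlocked at end of scope
        total_ += amount;
    }
    long read() {
        std::lock_guard<std::mutex> guard(mtx_);
        return total_;
    }

private:
    std::mutex mtx_;
    long total_ = 0;
};

// Runs n_threads cashier threads, each recording sales_per_thread sales.
long run_sales(int n_threads, int sales_per_thread, long amount) {
    assert(n_threads >= 0 && sales_per_thread >= 0);
    SalesTotal total;
    std::vector<std::thread> threads;
    for (int t = 0; t < n_threads; ++t) {
        threads.emplace_back([&total, sales_per_thread, amount] {
            for (int i = 0; i < sales_per_thread; ++i) total.add_sale(amount);
        });
    }
    for (auto& th : threads) th.join();
    return total.read();
}
```

Using `std::lock_guard` rather than explicit `lock()` and `unlock()` calls ensures the mutex is released even if the protected code throws.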
Each mutex should have an associated invariant that holds as long as nobody holds that mutex:
Mutexes in various languages:
In a single threaded program, when a function begins execution, it assumes some pre-conditions are met. For instance, consider a function:
void merge(vector<int> const& a, vector<int> const& b, vector<int>& result);
Preconditions:
If the pre-conditions are met, the function promises to deliver the specified post-conditions.
If the pre-conditions are not met, the behavior of the function is undefined (anything may happen, including crashing, corrupting other data, infinite loops, etc).
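A possible implementation of `merge` with its contract spelled out as assertions is sketched below. The exact conditions are assumptions, since they are not reproduced here: both inputs are taken to be sorted in ascending order, and `result` is taken to be a different object from `a` and `b`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>
using std::vector;

void merge(vector<int> const& a, vector<int> const& b, vector<int>& result) {
    // Assumed pre-conditions: inputs sorted ascending, result not aliased.
    assert(std::is_sorted(a.begin(), a.end()));
    assert(std::is_sorted(b.begin(), b.end()));
    assert(&result != &a && &result != &b);

    result.clear();
    result.reserve(a.size() + b.size());
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        // Take from a on ties, so equal elements of a come first.
        if (b[j] < a[i]) result.push_back(b[j++]);
        else result.push_back(a[i++]);
    }
    while (i < a.size()) result.push_back(a[i++]);
    while (j < b.size()) result.push_back(b[j++]);

    // Assumed post-conditions: all elements are present, in sorted order.
    assert(result.size() == a.size() + b.size());
    assert(std::is_sorted(result.begin(), result.end()));
}
```

The assertions only detect violations in debug builds; with `NDEBUG` defined they disappear, and a call that breaks the pre-conditions silently produces an unsorted result.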
In conjunction with classes, we have the concept of a class invariant: a condition that is satisfied by the member data of the class, whenever no member function is in execution.
Any public member function assumes, among its pre-conditions, that the invariant of its class is satisfied. Also, any public member function promises, among its post-conditions, to satisfy the class invariant.
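As a small illustration of a class invariant, consider a hypothetical `SortedBag` class whose member vector is always kept sorted. Every public member function may assume the invariant on entry and must restore it before returning; the `invariant()` check is included here only to make that contract visible.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical container whose invariant is: items_ is sorted ascending.
class SortedBag {
public:
    void insert(int x) {
        assert(invariant());  // assumed on entry
        // Inserting at the upper bound keeps the vector sorted.
        items_.insert(std::upper_bound(items_.begin(), items_.end(), x), x);
        assert(invariant());  // promised on exit
    }

    bool contains(int x) const {
        assert(invariant());
        // binary_search is correct only because the invariant holds.
        return std::binary_search(items_.begin(), items_.end(), x);
    }

    bool invariant() const {
        return std::is_sorted(items_.begin(), items_.end());
    }

private:
    std::vector<int> items_;
};
```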
At a larger scale, there are various invariants satisfied by subsets of the application variables. Consider the case of a bank accounts application: an invariant would be that the account balances and history all reflect the same set of processed transactions (the balance of an account is the sum of all transactions in the account history, and if a transaction appears on the debited account, it also appears on the credited account, and vice versa).
At the beginning of certain functions (for instance, those performing a money transfer, as above), we assume some invariant is satisfied; the same invariant shall be satisfied at the end. Then, sub-functions are invoked, concerned with sub-aspects of the computation to be done; the precise pre- and post-conditions for those functions should be part of their design. However, many bugs arise from a misunderstanding regarding those pre- and post-conditions (in other words, the exact responsibility of each function).
An implicit assumption in a single-threaded program is that nobody else changes a variable; only explicit instructions that exist in the currently executing function.
In multi-threaded applications, it is hard to know when it is safe to assume a certain invariant and when it is safe to assume that a certain variable is not modified.
This is the role of mutexes: a mutex protects certain invariants involving certain variables. When a function acquires a mutex, it can rely on the following:
The function must re-establish the invariant before releasing the mutex.
The above also implies that, in order to modify a variable, a function must make sure that it (or its callers) holds all mutexes that protect invariants concerning that variable.
The invariants may concern the relation between variables and an abstract model. Consider money transfer between accounts in a bank: we can have an abstract (idealized) model where each transfer is instantaneous, changing the balance of both accounts at the same time. In the same model, an audit is also instantaneous.
Now, the real implementation will have variables that cannot be changed at exactly the same time. However, we can come up with an invariant that says that the balance of account A coincides with that in the ideal model. The invariant is protected by the mutex associated with account A. That means, whenever the mutex of A is unlocked, the value of the variable holding the balance of A coincides with the balance of A in the ideal model.
A real transfer from A to B will update the ideal model at some instant in time. The mutex of A will have to be locked during a time interval covering the declared instant of the transaction (in the ideal model) as well as the time when the variable containing the balance of A is updated. The same holds for the mutex of B. As a consequence, there must be an instant when the mutexes of both A and B are locked.
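A transfer that holds both account mutexes at the same time can be sketched with `std::scoped_lock` (C++17), which locks several mutexes using a deadlock-avoidance algorithm. The names `Account`, `transfer`, and `run_transfers` are hypothetical.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical account: mtx protects the invariant that `balance` matches
// the balance of this account in the ideal model.
struct Account {
    std::mutex mtx;
    long balance = 0;
};

void transfer(Account& from, Account& to, long amount) {
    assert(&from != &to);  // locking the same mutex twice would be an error
    // Both mutexes are held while the two balances are updated, so no other
    // thread can observe the state in which only one of them has changed.
    std::scoped_lock lock(from.mtx, to.mtx);
    from.balance -= amount;
    to.balance += amount;
}

// Two threads transfer in opposite directions; returns the total money.
long run_transfers(int rounds) {
    Account a, b;
    a.balance = 1000;
    b.balance = 1000;
    std::thread t1([&a, &b, rounds] {
        for (int i = 0; i < rounds; ++i) transfer(a, b, 1);
    });
    std::thread t2([&a, &b, rounds] {
        for (int i = 0; i < rounds; ++i) transfer(b, a, 1);
    });
    t1.join();
    t2.join();
    return a.balance + b.balance;
}
```

Locking the two mutexes one after the other, in argument order, would risk a deadlock when two threads transfer in opposite directions; `std::scoped_lock` (or a fixed global locking order) avoids that.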
There are two use cases concerning the invariants:
A thread doing case 1 above is incompatible with any other thread accessing any of the variables involved in the invariant. A thread doing case 2 above, however, is compatible with any number of threads doing case 2 (but not with one doing 1).
For this reason, we have read-write mutexes, also called shared mutexes. Such a mutex can be locked in 2 modes:
Caveat: the implementation of a shared mutex must deal with the following dilemma: Suppose several readers hold the mutex in shared mode, and a new (writer) thread attempts to lock it in exclusive mode. What to do if, before all the readers finish, a new reader comes in? If we allow the reader, we run the risk of starving the writer (if we have enough readers to keep at least one active one for a long time). If we deny the reader, we miss a parallelizing opportunity.
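The two locking modes can be sketched with `std::shared_mutex` (C++17): `std::shared_lock` acquires it in shared mode and `std::unique_lock` in exclusive mode. The class `PriceTable` is a hypothetical example; how the library resolves the reader/writer dilemma described above is not specified by the C++ standard.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <shared_mutex>
#include <string>

// Hypothetical read-mostly table protected by a shared mutex.
class PriceTable {
public:
    void set(const std::string& item, long price) {
        // Exclusive mode: no other reader or writer may hold the mutex.
        std::unique_lock<std::shared_mutex> lock(mtx_);
        prices_[item] = price;
    }

    // Returns the price, or -1 if the item is unknown.
    long get(const std::string& item) const {
        // Shared mode: any number of readers may hold the mutex together.
        std::shared_lock<std::shared_mutex> lock(mtx_);
        auto it = prices_.find(item);
        return it == prices_.end() ? -1 : it->second;
    }

private:
    mutable std::shared_mutex mtx_;  // mutable so that get() can be const
    std::map<std::string, long> prices_;
};
```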
Note: starvation occurs when a thread cannot advance because it keeps waiting for a resource that it never gets: although no thread holds the resource continuously, there is always some other thread that requests the resource and has a higher priority than the starving one. The usual solution is to allocate resources on a first-in, first-out basis, so that each thread waiting on a resource eventually gets its turn. A deadlock, on the other hand, is a situation where two or more threads cannot advance because each waits for another one to advance.
A recursive mutex allows a lock operation to succeed if the mutex is already locked by the same thread. The mutex must be unlocked the same number of times it was locked.
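A minimal sketch with `std::recursive_mutex` follows; the class `Counter` and its member functions are hypothetical. Here `add_twice` locks the mutex and then calls `add`, which locks it again from the same thread.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical counter guarded by a recursive mutex.
class Counter {
public:
    void add(int n) {
        std::lock_guard<std::recursive_mutex> guard(mtx_);
        value_ += n;
    }

    void add_twice(int n) {
        std::lock_guard<std::recursive_mutex> guard(mtx_);  // first lock
        add(n);  // second lock by the same thread: succeeds
        add(n);
        // Each lock_guard unlocks once on scope exit, so the number of
        // unlocks matches the number of locks.
    }

    int value() {
        std::lock_guard<std::recursive_mutex> guard(mtx_);
        return value_;
    }

private:
    std::recursive_mutex mtx_;
    int value_ = 0;
};
```

With a plain `std::mutex`, the nested call in `add_twice` would try to lock a mutex the thread already owns, which is undefined behavior and in practice usually a self-deadlock.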
The problem with recursive mutexes is that, if a function attempting to acquire a mutex cannot determine whether the mutex is already locked, it will not be able to determine whether the invariant protected by the mutex holds immediately after the mutex is acquired. On the other hand, if the function can determine whether the mutex is already locked, it has no need for a recursive mutex.
Radu-Lucian LUPŞA