Operating Systems: Three Easy Pieces [Concurrency Ch.1: Concurrency and Threads]

Operating Systems: Three Easy Pieces consists of 3 "Easy Pieces" one of them is Concurrency so I decided to summarizes this part of the book. This chapter discuss what is threads and why they are useful besides some important concepts like critical section, data race and atomic operation.

What is thread

instead of our classic view of a single point of execution within a program (i.e., a single PC where instructions are being fetched from and executed), a multi-threaded program has more than one point of execution, another way to think of this is that each thread is very much like a separate process, except for one difference: they share the same address space and thus can access the same data

Each thread has its own private set of registers it uses for computation; thus, if there are two threads that are running on a single processor, when switching from running one (T1) to running the other (T2), a context switch must take place.

The context switch between threads is quite similar to the context switch between processes, as the register state of T1 must be saved and the register state of T2 restored before running T2.

A major difference between threads and processes concerns the stack. However, in a multi-threaded process, each thread runs independently and of course may call into various routines to do whatever work it is doing. Instead of a single stack in the address space, there will be one per thread.

Why threads

Imagine you are writing a program that performs operations on very large arrays, for example, adding two large arrays together, or incrementing the value of each element in the array by some amount if you are executing the program on a system with multiple processors, you have the potential of speeding up this process considerably by using the processors to each perform a portion of the work.

Another good reason is to avoid blocking program progress due to slow I/O, Imagine that you are writing a program that performs different types of I/O: either waiting to send or receive a message, for an explicit disk I/O to complete, or even (implicitly) for a page fault to finish. Instead of waiting, your program may wish to do something else, including utilizing the CPU to perform computation, or even issuing further I/O requests. Using threads is a natural way to avoid getting stuck

NOTE: The task of transforming your standard single-threaded program into a program that does this sort of work on multiple CPUs is called parallelization Now lets dig in and create our first thread

Thread Creation

We will create 2 threads, T1 and its name will be "A" and T2 and its name will be "B" each one of them will print their given name in concurrency way

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>

void *mythread(void *arg) {
    printf("%s\n", (char *) arg);
    return NULL;
}

int main(int argc, char *argv[]) {
    pthread_t t1, t2;
    printf("main: begin\n");
    pthread_create(&t1, NULL, mythread, "A");
    pthread_create(&t2, NULL, mythread, "B");

    // join waits for the threads to finish
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("main: end\n");
    return 0;
}

After creating the two threads (let’s call them T1 and T2), the main thread calls pthread join(), which waits for a particular thread to complete. It does so twice, thus ensuring T1 and T2 will run and complete before finally allowing the main thread to run again.

Once a thread is created, it may start running right away, alternately it may be put in a “ready” but not “running” state and thus not run yet, but let’s not worry about this possibility quite yet.

Let us examine the possible execution ordering of this little program

First possibility:

main	Thread 1	Thread 2
starts running
prints “main: begin”
creates Thread 1
creates Thread 2
waits for T1
	runs
	Prints "A"
	returns
waits for T2
		runs
		Prints "B"
		returns
prints “main: end”

in this trace thread A done and then thread B but the scheduler doesn't always excite threads in order they written in the code, Remember that: the 2 threads will be at least in ready mode and they maybe running in the same time. lets trace another example

Second possibility:

main	Thread 1	Thread 2
starts running
prints “main: begin”
creates Thread 1
creates Thread 2
		runs
		Prints "B"
		returns
waits for T1
	runs
	Prints "A"
	returns
waits for T2 But its returns returns immediately; T2 is done
prints “main: end”

What runs next is determined by the OS scheduler so it is hard to know what will run at any given moment in time, you need to be careful when write concurrency code.

Shared Data

Remember what we said about why threads are important, incrementing variable by one can be faster if we wrote concurrency code, now lets write code so our variable "counter" will be 20000000(2e7) with 2 threads. So we can run each thread to increment the counter variable by 10000000(1e7)

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>

int max;
volatile int counter = 0; // shared global variable

void *mythread(void *arg) {
    int i; // stack (private per thread) 
    for (i = 0; i < max; i++) {
    counter = counter + 1; // shared: only one
    }
    return NULL;
}

int main(int argc, char *argv[]) {                    
    max = atoi(argv[1]);
    pthread_t p1, p2;
    pthread_create(&p1, NULL, mythread, "A"); 
    pthread_create(&p2, NULL, mythread, "B");
    // join waits for the threads to finish
    pthread_join(p1, NULL); 
    pthread_join(p2, NULL); 
    printf("main: done\n [counter: %d]\n [should: %d]\n", 
       counter, max*2);
    return 0;
}

So we expect to run this code and get the final result is 20000000 Unfortunately, when we run this code, even on a single processor, we don’t necessarily get the desired result. Sometimes, we get: main: done with both (counter = 19345221) try to run it again maybe we can get our desire value but output may be main: done with both (counter = 19221041) Not only is each run wrong, but also yields a different result! A big question remains: why does this happen?

Data Race(Race Conditions)

To understand why this happens, we must understand the code sequence that the compiler generates for the update to counter. In this case, we wish to simply add a number (1) to counter. Thus, the code sequence for doing so might look something like this (in x86);

mov 0x8049a1c, %eax
add $0x1, %eax
mov %eax, 0x8049a1c

This example assumes that the variable counter is located at address 0x8049a1c. In this three-instruction sequence, the x86 mov instruction is used first to get the memory value at the address and put it into register eax. Then, the add is performed, adding 1 (0x1) to the contents of the eax register, and finally, the contents of eax are stored back into memory at the same address.

Let us imagine one (Thread 1) enters this region of code, and is thus about to increment counter by one. It loads the value of counter (let’s say it’s 50 to begin with) into its register eax. Thus, eax=50 for Thread 1. Then it adds one to the register; thus eax=51. Now, something unfortunate happens: a timer interrupt goes off; thus, the OS saves the state of the currently running thread.

OS  Thread1                                      Eax Counter
        before critical section               0       50
        mov 8049a1c,%eax                      50      50
        add $0x1,%eax                         51      50

Now something worse happens: Thread 2 is chosen to run, and it enters this same piece of code. It also executes the first instruction, getting the value of counter and putting it into its eax, The value of counter is still 50 at this point, and thus Thread 2 has eax=50, Then incrementing eax by 1 (thus eax=51), and then saving the contents of eax into counter (address 0x8049a1c). Thus, the global variable counter now has the value 51.

interrupt
save T1
restore T2 
OS  Thread2                                      Eax Counter
        before critical section               0       50
        mov 8049a1c,%eax                      50      50
        add $0x1,%eax                         51      50
        mov %eax,8049a1c                      51      51

Finally, another context switch occurs, and Thread 1 resumes running. Recall that it had just executed the mov and add, and is now about to perform the final mov instruction. Recall also that eax=51. Thus, the final mov instruction executes, and saves the value to memory; the counter is set to 51 again.

interrupt
save T2
restore T1 
OS  Thread1                                      Eax Counter
        mov %eax,8049a1c                      51      51

Sadly our counter will be 51 not 52. What we have demonstrated here is called a race condition (or, more specifically, a data race),With some bad luck (i.e., context switches that occur at untimely points in the execution), we get the wrong result.

So, Because multiple threads executing this code can result in a race condition, we call this code a critical section. A critical section is a piece of code that accesses a shared variable (or more generally, a shared resource) and must not be concurrently executed by more than one thread.

What we really want for this code is what we call mutual exclusion. This property guarantees that if one thread is executing within the critical section, the others will be prevented from doing so.

Atomicity Operations

Atomically, means “as a unit”, which sometimes we take as “all or none.” it should either appear as if all of the actions you wish to group together occurred, or that none of them occurred.

we need to enter our critical section in atomically sequence so no two threads will be in the same critical section at the same time which we will know that deeper in the next chapters

So we reached the end, hope you find anything useful :) feel free to give a feedback

References

Operating Systems: Three Easy Pieces

Omar Morsy