Concurrency and Race Conditions


               Thus far, we have paid little attention to the problem of concurrency, that is, what happens when the system tries to do more than one thing at once. The management of concurrency is, however, one of the core problems in operating systems programming. Concurrency-related bugs are some of the easiest to create and some of the hardest to find. Even expert Linux kernel programmers end up creating concurrency-related bugs on occasion.

 

               In early Linux kernels, there were relatively few sources of concurrency. Symmetric multiprocessing (SMP) systems were not supported by the kernel, and the only cause of concurrent execution was the servicing of hardware interrupts. That approach offers simplicity, but it no longer works in a world that prizes performance on systems with more and more processors, and that insists that the system respond to events quickly. In response to the demands of modern hardware and applications, the Linux kernel has evolved to a point where many more things are going on simultaneously. This evolution has resulted in far greater performance and scalability. It has also, however, significantly complicated the task of kernel programming. Device driver programmers must now factor concurrency into their designs from the beginning, and they must have a strong understanding of the facilities provided by the kernel for concurrency management.

               The purpose of this chapter is to begin the process of creating that understanding. To that end, we introduce facilities that are immediately applied to the scull driver from Chapter 3. Other facilities presented here are not put to use for some time yet. But first, we take a look at what could go wrong with our simple scull driver and how to avoid these potential problems.

   

Pitfalls in scull

               Let us take a quick look at a fragment of the scull memory management code. Deep
               down inside the write logic, scull must decide whether the memory it requires has
               been allocated yet or not. One piece of the code that handles this task is:
                          if (!dptr->data[s_pos]) {
                                dptr->data[s_pos] = kmalloc(quantum, GFP_KERNEL);
                                if (!dptr->data[s_pos])
                                     goto out;
                          }
               Suppose for a moment that two processes (we’ll call them “A” and “B”) are independently attempting to write to the same offset within the same scull device. Each process reaches the if test in the first line of the fragment above at the same time. If the pointer in question is NULL, each process will decide to allocate memory, and each will assign the resulting pointer to dptr->data[s_pos]. Since both processes are assigning to the same location, clearly only one of the assignments will prevail. What will happen, of course, is that the process that completes the assignment second will “win.” If process A assigns first, its assignment will be overwritten by process B. At that point, scull will forget entirely about the memory that A allocated; it only has a pointer to B’s memory. The memory allocated by A, thus, will be dropped and never returned to the system.
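
               The interleaving just described can be replayed step by step in ordinary user-space C. The following is only a sketch under stated assumptions: the names slot, needs_alloc, and replay_race are hypothetical, malloc stands in for kmalloc, and the two "processes" are simulated in a single thread so that the bad ordering happens every time rather than once in a while.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for one scull data slot (dptr->data[s_pos]). */
static void *slot = NULL;

/* The NULL test from the fragment: nonzero means "I must allocate". */
static int needs_alloc(void)
{
    return slot == NULL;
}

/*
 * Replay the bad ordering in one thread:
 * A tests, B tests, A allocates and assigns, B allocates and assigns.
 * Returns the number of buffers no longer reachable through slot.
 */
int replay_race(size_t quantum)
{
    void *a_mem = NULL, *b_mem = NULL;
    int a_allocates, b_allocates, leaked = 0;

    assert(quantum > 0);

    a_allocates = needs_alloc();    /* A sees NULL */
    b_allocates = needs_alloc();    /* B also sees NULL */

    if (a_allocates) {
        a_mem = malloc(quantum);
        slot = a_mem;               /* A assigns first */
    }
    if (b_allocates) {
        b_mem = malloc(quantum);
        slot = b_mem;               /* B assigns second and "wins" */
    }

    if (a_mem != NULL && a_mem != slot)
        leaked++;                   /* A's buffer cannot be found via slot */

    /* Demo-only cleanup; a real driver has lost a_mem for good. */
    free(a_mem);
    free(b_mem);
    slot = NULL;
    return leaked;
}
```

               Calling replay_race(4000) reports one lost buffer whenever both allocations succeed, which is the memory leak described above.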
               This sequence of events is a demonstration of a race condition. Race conditions are a result of uncontrolled access to shared data. When the wrong access pattern happens, something unexpected results. For the race condition discussed here, the result is a memory leak. That is bad enough, but race conditions can often lead to system crashes, corrupted data, or security problems as well. Programmers can be tempted to disregard race conditions as extremely low probability events. But, in the computing world, one-in-a-million events can happen every few seconds, and the consequences can be grave.
               We will eliminate race conditions from scull shortly, but first we need to take a more
               general view of concurrency.

Concurrency and Its Management

               In a modern Linux system, there are numerous sources of concurrency and, therefore, possible race conditions. Multiple user-space processes are running, and they can access your code in surprising combinations of ways. SMP systems can be executing your code simultaneously on different processors. Kernel code is preemptible; your driver’s code can lose the processor at any time, and the process that replaces it could also be running in your driver. Device interrupts are asynchronous events that can cause concurrent execution of your code. The kernel also provides various mechanisms for delayed code execution, such as workqueues, tasklets, and timers, which can cause your code to run at any time in ways unrelated to what the current process is doing. In the modern, hot-pluggable world, your device could simply disappear while you are in the middle of working with it.
               Avoidance of race conditions can be an intimidating task. In a world where anything can happen at any time, how does a driver programmer avoid the creation of absolute chaos? As it turns out, most race conditions can be avoided through some thought, the kernel’s concurrency control primitives, and the application of a few basic principles. We’ll start with the principles, then get into the specifics of how to apply them.

               Race conditions come about as a result of shared access to resources. When two threads of execution* have a reason to work with the same data structures (or hardware resources), the potential for mix-ups always exists. So the first rule of thumb to keep in mind as you design your driver is to avoid shared resources whenever possible. If there is no concurrent access, there can be no race conditions. Carefully written kernel code should therefore have a minimum of sharing. The most obvious application of this idea is to avoid the use of global variables. If you put a resource in a place where more than one thread of execution can find it, there should be a strong reason for doing so.
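
               As a small illustration of the rule, compare a helper that returns a pointer into one static buffer with a helper that writes into storage owned by the caller. This is a user-space sketch, not scull code; the function names dev_name_shared and dev_name_private are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Shared version: every caller receives a pointer to the same buffer. */
const char *dev_name_shared(int minor)
{
    static char name[16];           /* one buffer, visible to all callers */

    snprintf(name, sizeof(name), "scull%d", minor);
    return name;
}

/* Unshared version: the caller owns the storage, so nothing is shared. */
const char *dev_name_private(int minor, char *buf, size_t len)
{
    snprintf(buf, len, "scull%d", minor);
    return buf;
}
```

               Two calls to dev_name_shared return the same address, so the second call silently overwrites the first caller's result even without any concurrency; with two threads the overwrite can happen in the middle of a read. The private version has no such hazard because each caller supplies its own buffer.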
               The fact of the matter is, however, that such sharing is often required. Hardware resources are, by their nature, shared, and software resources also must often be available to more than one thread. Bear in mind as well that global variables are far from the only way to share data; any time your code passes a pointer to some other part of the kernel, it is potentially creating a new sharing situation. Sharing is a fact of life.

               Here is the hard rule of resource sharing: any time that a hardware or software resource is shared beyond a single thread of execution, and the possibility exists that one thread could encounter an inconsistent view of that resource, you must explicitly manage access to that resource. In the scull example above, process B’s view of the situation is inconsistent; unaware that process A has already allocated memory for the (shared) device, it performs its own allocation and overwrites A’s work. In this case, we must control access to the scull data structure. We need to arrange things so that the code either sees memory that has been allocated or knows that no memory has been or will be allocated by anybody else. The usual technique for access management is called locking or mutual exclusion: making sure that only one thread of execution can manipulate a shared resource at any time. Much of the rest of this chapter will be devoted to locking.
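
               To make the idea of mutual exclusion concrete, here is a user-space analogue of the scull fragment with the test and the allocation placed inside one critical section. It is a sketch under assumptions: it uses POSIX threads and a pthread mutex rather than any kernel locking primitive, malloc stands in for kmalloc, and the names slot_lock, writer, and run_locked_writers are hypothetical.

```c
#include <pthread.h>
#include <stdlib.h>

#define NTHREADS 8

static pthread_mutex_t slot_lock = PTHREAD_MUTEX_INITIALIZER;
static void *slot;          /* stand-in for dptr->data[s_pos] */
static int alloc_count;     /* number of allocations actually performed */

/* Each thread plays the role of one writing process. */
static void *writer(void *arg)
{
    size_t quantum = *(size_t *)arg;

    pthread_mutex_lock(&slot_lock);
    if (!slot) {                    /* test and assignment are now atomic */
        slot = malloc(quantum);     /* with respect to the other writers  */
        if (slot)
            alloc_count++;
    }
    pthread_mutex_unlock(&slot_lock);
    return NULL;
}

/* Starts the writers, waits for them, and returns the allocation count. */
int run_locked_writers(size_t quantum)
{
    pthread_t tid[NTHREADS];
    int started = 0, i, count;

    slot = NULL;
    alloc_count = 0;

    for (i = 0; i < NTHREADS; i++)
        if (pthread_create(&tid[started], NULL, writer, &quantum) == 0)
            started++;
    for (i = 0; i < started; i++)
        pthread_join(tid[i], NULL);

    count = alloc_count;
    free(slot);
    slot = NULL;
    return count;
}
```

               However the threads are scheduled, only the first one to take the lock sees a NULL pointer; every later writer sees memory that has already been allocated, so no buffer is lost.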
               * For the purposes of this chapter, a “thread” of execution is any context that is running code. Each process is clearly a thread of execution, but so is an interrupt handler or other code running in response to an asynchronous kernel event.

               First, however, we must briefly consider one other important rule. When kernel code creates an object that will be shared with any other part of the kernel, that object must continue to exist (and function properly) until it is known that no outside references to it exist. The instant that scull makes its devices available, it must be prepared to handle requests on those devices. And scull must continue to be able to handle requests on its devices until it knows that no reference (such as open user-space files) to those devices exists. Two requirements come out of this rule: no object can be made available to the kernel until it is in a state where it can function properly, and references to such objects must be tracked. In most cases, you’ll find that the kernel handles reference counting for you, but there are always exceptions.
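
               The reference-tracking requirement can be sketched with a simple counted object. The code below is a user-space illustration only, built on C11 atomics; it does not show the kernel's own reference counting, and the names shared_obj, obj_create, obj_get, and obj_put are hypothetical.

```c
#include <stdatomic.h>
#include <stdlib.h>

struct shared_obj {
    atomic_int refs;    /* number of outstanding references */
    int *released;      /* demo only: set to 1 when the object is freed */
};

/* The object is fully initialized before anyone else can see it. */
struct shared_obj *obj_create(int *released)
{
    struct shared_obj *obj = malloc(sizeof(*obj));

    if (!obj)
        return NULL;
    atomic_init(&obj->refs, 1);     /* the creator holds the first reference */
    obj->released = released;
    *released = 0;
    return obj;
}

/* Take an additional reference before handing the object to someone else. */
void obj_get(struct shared_obj *obj)
{
    atomic_fetch_add(&obj->refs, 1);
}

/* Drop a reference; returns 1 if this call freed the object. */
int obj_put(struct shared_obj *obj)
{
    if (atomic_fetch_sub(&obj->refs, 1) == 1) {
        *obj->released = 1;
        free(obj);
        return 1;
    }
    return 0;
}
```

               The object stays alive as long as any holder has not yet called obj_put, and it is freed exactly once, by whichever holder drops the last reference.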
               Following the above rules requires planning and careful attention to detail. It is easy to be surprised by concurrent access to resources you hadn’t realized were shared. With some effort, however, most race conditions can be headed off before they bite you, or your users.



 
