Whenever a conversation begins about the “volatile” construct in programming languages, it inevitably leads to “Memory Models”. Unless one is familiar with the concept of memory models, it is hard to grasp the inner workings of “volatile”.
It is already known that “volatile” is related to threading. As the name implies, it marks variables that may change between accesses from different threads (in a nutshell, this sentence describes the motivation behind it). But what it really does stays behind the closed doors of compilers, CPUs, CPU caches, main memory, etc. So what actually happens when multithreaded code with shared variables is compiled and run?
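To make the motivation concrete, here is a minimal Java sketch of the visibility problem that “volatile” addresses (all names are illustrative, not from any particular codebase):

```java
// Sketch of the visibility problem "volatile" addresses (names are illustrative).
// Without "volatile", the JIT compiler may hoist the read of "running" out of
// the loop, so the reader thread might never observe the writer's update.
public class VisibilitySketch {
    // Marking the flag volatile forces every read to see the latest write.
    private static volatile boolean running = true;

    public static void main(String[] args) throws InterruptedException {
        Thread reader = new Thread(() -> {
            while (running) {
                // spin until another thread clears the flag
            }
        });
        reader.start();
        Thread.sleep(100);   // let the reader enter the loop
        running = false;     // visible to the reader thanks to volatile
        reader.join();       // terminates promptly; without volatile it may hang
        System.out.println("reader observed the update");
    }
}
```

If the `volatile` modifier is removed, it becomes legal for the compiler to cache the value of `running`, in which case the reader thread may spin forever.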
Usually, with higher-level languages (especially those that are first compiled to an intermediate byte code), developers need not pay much attention to these details, because the platforms that later compile this byte code to the native language of the machine on which the code runs (JIT compilers, Native Image Generators [NGENs]) give assurances about the behavior of the underlying hardware and of the compiler that turns the higher-level language into byte code or machine code. This assurance is what we call the “Software Memory Model”. So a developer can write code once, and by the magic of byte code it will run on many processor architectures without fine tuning. On the software side, it is a contract between the developer and the software platform provider. The “Hardware Memory Model”, on the other side, is an assurance from the hardware platform provider to the software platform provider about how code executes on the hardware.
As can be seen, the concept of a “memory model” is twofold. The first is the developer’s point of view: how a developer sees the memory model, with the software platform provider responsible for abiding by it. On the hardware side, the second one, manufacturers also define memory models for their processors, so it becomes a contract between the software platform provider and the hardware platform provider. Let us first go over hardware memory models and try to understand what they really are.
Hardware Memory Models
Strong Memory Models (older processors)
(This is a name I have made up to draw the distinction between the memory models.) “Strong memory model” is a name that can be given to the memory models used by older processors. The code given to the processor is executed as is: no reordering, no optimization, and no cache-coherency problems (cache coherency: in multiprocessor systems, the problem of multiple caches holding copies of the same variable from memory). What is written to the cache is directly available to other threads no matter which processor they execute on (in a single-processor system, which most of them were, there is no cache-coherency concern at all). What is read from the cache is the up-to-date value. So no optimization that would change the behavior of the code occurs.
Weak Memory Models (newer processors)
The weak memory model is an attempt to improve software performance. In this model, the CPU is given the freedom to reorder or optimize the program’s instruction sequence (which resides in memory) whenever it sees fit. If a reordering would not break the observable result of the code, the processor may apply it. What this means is that the processor gives no assurance about the order of individual operations, only about the overall result of the execution. This is what is called “as-if-sequential” execution: the overall effect of the software remains the same, and the user or developer sees the execution as if it completed in the original order. The CPU may also apply other optimizations, and it will honor them only as long as they do not change the overall execution effect.
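The classic way to see “as-if-sequential” execution break down across threads is a store-buffering litmus test. In the following Java sketch (names are illustrative), each thread writes one variable and then reads the other; because neither field is volatile, reordering makes `r1 == 0 && r2 == 0` a legal outcome, even though it is impossible in any strict interleaving of the source order:

```java
// Store-buffering litmus test (illustrative names). On a weak memory model
// (and under the Java Memory Model without volatile), each thread's read may
// be satisfied before the other thread's write becomes visible, so both
// reads can observe 0.
public class ReorderingLitmus {
    static int x = 0, y = 0;
    static int r1, r2;

    public static void main(String[] args) throws InterruptedException {
        Thread t1 = new Thread(() -> { x = 1; r1 = y; });
        Thread t2 = new Thread(() -> { y = 1; r2 = x; });
        t1.start(); t2.start();
        t1.join();  t2.join();
        // All four combinations are possible; (0, 0) is the surprising one.
        System.out.println("r1=" + r1 + " r2=" + r2);
    }
}
```

A single run rarely shows the surprising outcome; litmus tests are usually run millions of times to catch it.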
The Ups and Downs of Both Systems
Strong memory models are easy to develop for but have weaker performance characteristics. It is this shortcoming that led to weak memory models, in which the CPU reorders or optimizes the code whenever it sees an opportunity for improvement.
The Hardware Structure That Gives Way to Race Conditions
For the sake of simplicity, we first assume a system with one processor and a single level of cache. Then we will talk about what happens when the system is made up of two such processors.
In this system, the cache is always up-to-date (as long as no memory-mapped I/O occurs). Because there is only one cache, there are no cache-coherency problems. The only problem that surfaces is instruction reordering in multithreaded applications. When a thread reads a variable, it sees the value in the cache, and because there is no other cache (or CPU) that could change this variable in main memory, operations are safe with respect to the last value of the variable. Multithreaded applications on this type of system must still guard against memory-reordering problems, though. What can be done to overcome reordering here? Memory barriers. Volatile accesses can be expensive in this setting, but for the sake of portability to other platforms they are a suitable choice.
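In Java, the memory-barrier role described above is played by volatile accesses: a volatile write acts as a release barrier and a volatile read as an acquire barrier, which makes the safe-publication idiom below work (a minimal sketch with illustrative names):

```java
// Sketch: a volatile write acting as a release barrier when publishing data.
public class SafePublication {
    static int payload;               // plain, non-volatile data
    static volatile boolean ready;    // the volatile flag is the barrier

    public static void main(String[] args) throws InterruptedException {
        Thread reader = new Thread(() -> {
            while (!ready) { /* spin until the flag is set */ }
            // The volatile read of "ready" (acquire) guarantees that the
            // earlier plain write to "payload" is visible here.
            System.out.println("payload=" + payload);
        });
        reader.start();
        payload = 42;   // ordinary write...
        ready = true;   // ...ordered and made visible by the volatile write
        reader.join();
    }
}
```

Without `volatile` on `ready`, both the hardware and the JIT would be free to reorder the two writes, and the reader could observe `ready == true` while still seeing a stale `payload`.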
If the system is made up of two processors and two caches (one per CPU), then a variable is in danger of being manipulated by more than one thread executing on different CPUs. If a thread executing on CPU1 writes to Cache1 while another thread executing on CPU2 writes to Cache2, the same variable can hold two different values in two different caches at the same time. Of course, each cached value will eventually be written to main memory, but the timing of that write can lead to race conditions: the value in main memory may have changed long before another thread accesses its stale copy in its local CPU cache. In this situation, cache coherency protocols take action to keep the caches of a shared-memory system consistent. This consistency can be implemented in many ways on many processor architectures. For example, on Intel x86 processors writes behave much like volatile writes: when a core writes to a variable, the copies of that cache line held by other caches are invalidated, and the new value is propagated at once.
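A race condition of the kind this section describes can be reproduced directly. In this Java sketch (names are illustrative), two threads increment a shared counter without any synchronization; because `count++` is a separate load, add, and store, updates are routinely lost:

```java
// Sketch of a data race (illustrative names): two threads increment a shared,
// non-volatile, non-atomic counter. "count++" is a read-modify-write
// sequence, so concurrent increments can overwrite each other.
public class RaceSketch {
    static int count = 0;

    public static void main(String[] args) throws InterruptedException {
        Runnable work = () -> {
            for (int i = 0; i < 100_000; i++) {
                count++;   // not atomic: load, add, store
            }
        };
        Thread t1 = new Thread(work), t2 = new Thread(work);
        t1.start(); t2.start();
        t1.join();  t2.join();
        // Often prints less than 200000 because increments were lost.
        System.out.println("count=" + count);
    }
}
```

Note that marking `count` volatile would not fix this: volatile guarantees visibility and ordering, not atomicity of a read-modify-write operation.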
Software Memory Models
The first language to define a software memory model was Java (it will be the subject of another article). Java had promised developers that code written once would work on every platform Java runs on. But hardware platforms differ when it comes to memory ordering; every platform had its own memory model. What Java did this time was define its own memory model and promise developers that code written according to it would work everywhere. Later on, other programming languages defined their own memory models, including C#.
Now that we have learned what a memory model is, we are ready to move on to the Java concept of volatile and its bitter history. See you in the next article…
References
1. Sarita V. Adve and Mark D. Hill, “Weak Ordering – A New Definition”
2. Mark Heinrich, “The Performance and Scalability of Distributed Shared Memory Cache Coherence Protocols”, Ph.D. Dissertation, Computer Systems Laboratory, Stanford University, October 1998 (Chapter 2: Cache Coherence Protocols)
3. Intel® 64 and IA-32 Architectures Software Developer’s Manual, Vol. 3A (Section 8.2: Memory Ordering; Chapter 11: Cache Control)
4. James Gosling, Bill Joy, Guy Steele, Gilad Bracha, “The Java Language Specification”, Third Edition (Section 17.4: Memory Model)
5. “Volatile keyword in C# – memory model explained”