This article has been contributed by William Brown, Senior Software Developer at SUSE.
Disclaimer: Blog articles are not part of the official SUSE documentation. They are contributed voluntarily by SUSE’s employees and by third parties. All information found in these article has been compiled with utmost attention to detail. However, this does not guarantee complete accuracy. Therefore, we need to specifically state that neither SUSE LLC, its affiliates, nor the authors may be held liable for possible errors or the consequences thereof.
With the article at hand, I would like to explain different types of concurrent data structures, so that we can explore their properties and when or why they might be useful.
As our computer systems become increasingly parallel and asynchronous, it’s important that our applications are able to work in these environments effectively. Languages like Rust help us to ensure our concurrent structures are safe.
CPU Memory Model Crash Course
In no way is this a thorough, complete, or 100% accurate representation of CPU memory. My goal here just and only is to give you a quick brief on how it works. I highly recommend you read “what every programmer should know about memory” if you want to learn more.
In a CPU we have a view of a memory space. That could be in the order of KB to TB. But it’s a single coherent view of that space.
Of course, over time systems and people have demanded more and more performance. But we also have languages like C, that won’t change from their view of a system as a single memory space, or change how they work. And of course, it turns out C is not a low level language but we like to convince ourselves it is.
To keep working with C and other languages, CPU’s have acquired caches that are transparent to the operation of the memory. You have no control of what is – or is not – in the cache. It “just happens” asynchronously. This is exactly why Spectre and Meltdown happened (and will continue to happen) because these asynchronous behaviours will always have the observable effect of making your CPU faster. Who knew!
Anyway, for this to work, each CPU has multiple layers of cache. At L3 the cache is shared with all the cores on the die. At L1 it is “per CPU”.
Of course it’s a single view into memory. So if address 0xff is in the CPU cache of core 1, and also in cache of core 2, what happens? Well it’s supported! Caches between cores are kept in sync via a state machine called MESI. These states are:
- Exclusive – The cache is the only owner of this value, and it is unchanged.
- Modified – The cache is the only owner of this value, and it has been changed.
- Invalid – The cache holds this value but another cache has changed it.
- Shared – This cache and maybe others are viewing this valid value.
To gloss very heavily over this topic, we want to avoid Invalid. Why? That means two CPUs are contending for the value, causing many attempts to keep each other in check. These contentions cause CPUs to slow down.
We want values to either be in Exclusive/Modified or Shared. In Shared, many CPUs are able to read the value at maximum speed, all the time. In Exclusive/Modified, we know only this CPU is changing the value.
This cache coherency is also why mutexes (see below) and locks exist – they issue the needed CPU commands to keep the caches in the correct states for the memory we are accessing.
Keep in mind Rust’s variables are immutable, and able to share between threads, or mutable and single thread only. Sounds familar? Rust is helping with concurrency by keeping our variables in the fastest possible cache states.
We use data structures in programming to help improve behaviours of certain tasks. Maybe we need to find values quicker, sort contents, or search for things. Data structures are a key element of modern computer performance.
However most data structures are not thread safe. This means only a single CPU can access or change them at a time. Why? Because if a second read them, due to cache-differences in content the second CPU may see an invalid data structure, leading to undefined behaviour.
Mutexes can be used, but this causes other CPUs to stall and wait for the mutex to be released – not really what we want on our system. We want every CPU to be able to process data without stopping!
Thread Safe Data Structures
There exist many types of thread safe data structures that can work on parallel systems. They often avoid mutexes to try and keep CPU’s moving as fast as possible, relying on special atomic CPU operations to keep all the threads in sync.
Multiple classes of these structures exist, which have different properties.
I have mentioned these already, but it’s worth specifying the properties of a mutex. A mutex is a system where a single CPU exists in the mutex. It becomes one “reader/writer” and all other CPUs must wait until the mutex is released by the current CPU holder.
Read Write Lock
Read Write Lock, often also called RWlock, allow one writer OR multiple parallel readers. If a reader is reading, then a writer request is delayed until the readers complete. If a writer is changing data, all new reads are blocked. All readers will always be reading the same data.
These are great for highly concurrent systems provided your data changes infrequently. If you have a writer changing data a lot, this causes your readers to be continually blocking. The delay on the writer is also high due to a potentially high amount of parallel readers that need to exit.
Lock free is a common (and popular) data structure type. These are structures that don’t use a mutex at all, and can have multiple readers and multiple writers at the same time.
The most common and popular structure for lock free is queues, where many CPUs can append items and many can dequeue at the same time. There are also a number of lock free sets which can be updated in the same way.
An interesting part of lock free is that all CPUs are working on the same set – if CPU1 reads a value, then CPU2 writes the same value, the next read from CPU1 will show the new value. This is because these structures aren’t transactional – lock free, but not transactional. There are some times where this is really useful as a property when you need a single view of the world between all threads, and your program can tolerate data changing between reads.
Wait free is a specialization of lock free, where the reader/writer has guaranteed characteristics about the time they will wait to read or write data. This is very detailed and subtle, only affecting real time systems that have strict deadline and performance requirements.
In between all of these is a type of structure called concurrently readable. A concurrently readable structure allows one writer and multiple parallel readers. An interesting property is that when the reader “begins” to read, the view for that reader is guaranteed not to change until the reader completes. This means that the structure is transactional.
An example being if CPU1 reads a value, and CPU2 writes to it, CPU1 would NOT see the change from CPU2 – it’s outside of the read transaction!
In this way there are a lot of read-only immutable data, and one writer mutating and changing things … sounds familiar again? It’s very close to how our CPUs’ cache works!
These structures also naturally lend themselves well to long processing or database systems where you need transactional (ACID) properties. In fact some databases use concurrent readable structures to achieve ACID semantics.
You might have realized – concurrent readability is where my interest lies. So watch out for my next article where I’ll discuss some specific concurrently readable structures that exist today, and ideas for future structures.