I wrote this review for /. and it never got published; basically I never got any reply. So I thought I would post it here, in case it is useful to readers.
Multi-Core Programming – Increasing Performance through Software Multi-threading by Shameem Akhter & Jason Roberts, Intel Press
A funny thing happened on the microprocessors’ race to faster and faster clock speeds: somewhere along the way, people stopped caring about clock speed and started looking at overall system/application performance, and that led to a different, disruptive solution – hyper-threading and now multi-core CPUs. As a result, parallel programming and parallel computing, once the realm of a few esoteric applications, are slowly becoming mainstream. How important is this, enquiring minds want to know. A glimpse of the relevance can be had by glancing at PCWorld’s 100 best products of 2006 – #1 and #2 were the Intel Core Duo and the AMD Dual-Core Athlon 64! Even the iPod came later in the list (for the same enquiring minds – the iPod came 4th after Craigslist.org, Google Earth was 6th and YouTube.com was 9th). In the world of game consoles, multiple execution threads are the norm: the Xbox 360 has three cores, and the PS3’s Cell processor has eight SPE vector units attached to a PowerPC core.
The building blocks of the domain consist of software concepts and mechanisms like parallel programming, parallel computing and multi-threaded programming, combined with hardware paradigms like SMP, hyper-threading and multi-core. All these are intersecting concepts, but they are not quite the same – and that is where this book stands out: conceptual depth in the fundamentals, against the backdrop of multi-core, hyper-threaded microprocessors.
Not that every programmer needs to be intimately familiar with multi-core parallel programming. The majority will write to the higher-order threading models of Java and C# without worrying about the underlying microprocessor architecture. But kernel writers, and folks who write C/C++ code (for example for imaging, visualization, digital media applications and games) that leverages the microprocessor at the bare-metal level, will need a good working knowledge of the microprocessor primitives as well as an understanding of the patterns. For example, the recent Ubuntu Summit in Paris (which is where I started writing this review) had a couple of BOF discussions on various aspects of dual-core as applicable to Ubuntu kernel internals. The discussions centered around a parallel loader with hierarchical parallel tasks, using dual-core for computing the symbol table hash, and making C routines like strcpy() and malloc() “core-aware”.
The organization is straightforward – 11 chapters structured incrementally as follows:
- Chapters 1 – 3 cover Introduction to Multi-Core Architecture, Overview of Threading, and Fundamentals of Parallel Programming. While these chapters start out from the basics, they race from 0 to 1000 mph in three chapters!
- Chapter 4 covers the parallel programming constructs, while the next two chapters (5 & 6) delve into the threading APIs – Windows, POSIX and OpenMP.
- I found Chapter 7 (Solutions to Common Parallel Programming Problems) and Chapter 8 (Multi-threaded Debugging Techniques) the most useful and informative.
- The next two chapters cover the hardware aspects – Chapter 9 is a short chapter on threading on a single processor, and Chapter 10 is a detailed discussion of Threading on Intel Multi-Core Processors. Both chapters are good.
- Chapter 11 covers Intel® Software Development Products, which is OK.
It is interesting to note that the book has a unique (I assume) serial number, and you can register the book at Intel’s web site. But the web site is very anemic, not intuitive, and adds little value – all they have is the code samples in a zip file. Even though the book cover and other materials highlight “immediately usable code”, I don’t think the code is all that useful.
There are two related books from Intel Press, viz. Programming with Hyper-Threading Technology and the just-released second edition of The Software Optimization Cookbook. Another book worth looking into is Patterns for Parallel Programming. I plan to review all three in the near future …
Gory Details …
A normal microprocessor has one thread of execution, called the hardware thread (we will revisit this later), which consists of the architecture state (control registers, interrupt logic et al.), execution units and cache. What engineers found is that because the execution units are 100 or more times faster than a memory fetch, they sit idle while values are being fetched from memory into the registers.
One solution is to add two architectural-state lanes to one execution-unit/cache set. This is called Hyper-Threading Technology, and the performance increase is around 30%. Hyper-threading is logical – to the software it looks like there are two cores, but in reality the two threads are interleaved onto the same execution unit!
The next idea was to actually have two execution-unit/cache pairs in one chip, each with its own architectural state. This is the dual-core architecture. In this case the actual processing power is doubled, and the performance boost is almost double (minus the overhead). The next idea was to combine both, giving four hardware execution threads!
But high performance is not just a matter of adding hardware threads; the software also needs to be written to take advantage of the logical and physical parallelism. High performance = multi-core hardware + hyper-threading + a multi-core-aware scheduler + parallel programming algorithmics.
Software & hardware threads
There is a correlation between software and hardware threads. In some quarters, there is a firm belief that there is no advantage to having more software threads than can be mapped to hardware threads; in fact, the performance impact might be negative, as the extra threads will invalidate each other’s caches and spend time thrashing in and out.
But there is another dimension – compute-intensive tasks vs. I/O-bound interactive tasks. In order for an OS to be sensitive to users and appear responsive, the interactive tasks need to have higher priority and should be given preference when they are runnable – usually they spend their time waiting for user input, for example on a keyboard.
Going back to hardware threads: until the advent of HT and the dual-core philosophy, there was only one hardware thread in a normal microprocessor. But now the picture has changed – an HT microprocessor has two hardware threads; a dual-core one with HT has four; and moving on, the Xbox 360 has six (three cores, each two-way threaded), and IBM’s Cell processor (which is in the PS3) has eight SPEs alongside its PPE! Even with one hardware thread, Linux, for example, has a very sophisticated scheduler. Now, with that many hardware threads, the opportunities are endless. Many new algorithms will also need to be employed, in areas like rendering, visualization and graphics.
As the book says, multi-threaded applications are inherently much more difficult than single-threaded applications, and proper software engineering principles need to be followed. For example, one main difference between hyper-threading and multi-core is that while in single-core or hyper-threading technology the tasks are interleaved, a multi-core architecture actually runs two tasks at the same time! This can surface bugs – for example, if you achieve sequencing through different priorities, a lower-priority thread will run whenever it is runnable, even while the high-priority one is running on another core! I have made this assumption a few times; now I know it should be avoided.
The authors have done a great job of pointing out good parallel programming as well as debugging techniques to avoid synchronization bugs (i.e. multi-threading errors involving race conditions, lock contention, and priority inversions/ceilings) while not impeding performance (i.e. performance bugs from the overuse of locks and mutexes).
At the next level, one does have to think through and understand the parallel programming patterns. One advantage is that the art of parallel programming has a long history, with lots of good material to refer to and apply in the multi-core domain. The major forms of decomposition – task decomposition, data decomposition and data flow – and the associated patterns – task-level parallelism, divide and conquer, geometric decomposition, pipeline and wavefront – are introduced in this book. But for a detailed discussion of the specific patterns, the patterns book would be better, even though it is slightly older and most probably does not cover multi-core-specific patterns.
In Short …
In short, this is a book with conceptual depth that touches all the essential elements of a very complex (and emerging – it will gain more momentum) domain, and it does that very well. The Slashdot crowd will like the book. The writing style is very dry and matter-of-fact, but it doesn’t get in the way of understanding – I was able to cover the last few chapters during my SFO-CDG-SFO flight, in between Sudoku, which is where I am writing this review! And of course, I wanted to be the first to review this book, and that was a good motivation … ;o)
As a foot-note for future exploration, dual-core is a domain much broader than just Intel microprocessors. In fact, many consider AMD to be the leader in this space. It would be informative to see how AMD handles multi-core technology in terms of interfaces, programming mechanics et al. By comparison, the Cell processor and the Xenon in the Xbox 360 need to be programmed specifically. But in the case of the Intel and AMD dual-cores, I am sure a programmer using the Java/C# APIs will not even notice the difference; one coding to the Windows API, the POSIX thread model or OpenMP should not have to code any differently either. Still, it would be instructive to see the differences in the designs and the resultant execution models between the two … The difference would be felt by kernel writers and folks who write optimized code for device drivers and for high-performance digital media and game applications.
Reviewer : Krishna Sankar