
Tech Evolution Insights
Explore the evolution of technology through the years with a focus on transistor counts, clock speeds, and processor advancements. Discover key milestones, including Moore's Law, concurrent programming in Visual Studio 2010, and the impact of cores on processing power.
Presentation Transcript
Barry Wimlett BSc MBCS Technical Specialist Barry@blackmarble.com blackmarble.com
Concurrent Programming: A lap around what's new in Visual Studio 2010 and .NET 4.0
Moore's Law (April 19, 1965): transistor count and computing power double every two* years for the same cost. Law #2: manufacturing plant costs double at the same time. http://www.wired.com/thisdayintech/tag/moores-law/ ftp://download.intel.com/museum/Moores_Law/Articles-Press_Releases/Gordon_Moore_1965_Article.pdf (* originally 18 months)
Some Data

Year  Processor      Transistor Count  Clock (MHz)
1975  6502           4,000             1
1979  8086           30,000            4
1984  286            134,000           12
1987  386SX          270,000           20
1988  386DX          275,000           50
1989  486            1,200,000         60
1993  P60            3,100,000         60
1995  PPro           5,500,000         200
1997  K6-200         8,800,000         200
1997  Pentium 2      7,500,000         233
1999  Athlon         22,000,000        600
1999  Pentium 3      28,000,000        600
2000  P4             42,000,000        1,400
2001  Athlon TBird   37,000,000        1,000
2004  Athlon 64      105,000,000       2,400
2005  Athlon 64 X2   233,000,000       2,400
2007  Phenom         450,000,000       2,200
2009  Phenom II      758,000,000       2,800
[Chart: Transistor Counts & Clock Speeds, 1975-2009 - clock speed (MHz, 0-3000) and transistor count (0-800,000,000) plotted per year for the processors in the table above.]
[Chart: Transistor Count and Clock Speeds (log10), 1975-2009 - log10 of transistor count, log10 of clock speed, and number of cores plotted per year.]
The Honeymoon: Wintel - "Grove giveth, and Gates taketh away." The lazy programmer's dream.
Moore's Law is NOT dead. http://www.engadget.com/2010/05/03/nvidia-vp-says-moores-law-is-dead http://arstechnica.com/business/news/2010/05/moores-law-is-not-dead-its-merely-pining-for-the-fjords.ars Over the last few years clock speeds have flattened out at approximately 3 GHz, limited by silicon technology. Transistor counts continue to increase, but... we get more cores, not faster processors.
Problems With More Cores: it all goes wrong once you leave the package. The shared-flat metaphor for shared resources: cooperation is required, just like on development projects. Blindly throwing more processors at a problem will not necessarily improve performance, and rarely gives linear increases.
[Diagram: Single Core - one CPU core with its cache, connected via the data/address bus to memory and I/O.]
[Diagram: Multi Core - two CPU cores, each with a private cache, sharing a further cache and the data/address bus to memory and I/O.]
[Diagram: Many Core - many CPU cores, each with a private cache, all contending for a shared cache and the data/address bus to memory and I/O.]
Windows 7 / Server 2008 R2: kernel and I/O contention. Windows 7 kernel advances at 128 cores: http://www.osnews.com/story/22501/Microsoft_Kernel_Engineers_Talk_About_Windows_7_s_Kernel
For Windows 7, Microsoft removed several locks that seriously hindered performance - all without breaking a single application. The global dispatcher lock, for instance, is gone completely, and replaced by fine-grained locking which provides 11 types of more specific locks as well as rules on how locks can be obtained so that you no longer run into deadlocks. The pre-7 dispatcher spent 15% of the CPU time waiting to acquire contended locks. "If you think about it, 15% of the time on a 128-processor system is, more than 15 of these CPUs are pretty much full-time just waiting to acquire contended locks. So we're not getting the most out of this hardware," kernel engineer Arun Kishan explained.
That has obviously changed in modern times, and in Vista, this architecture simply gave in. The statistic Wang gave during the talk was pretty... disconcerting. "As you went to 128 processors, SQL Server itself had an 88% PFN lock contention rate. Meaning, nearly one out of every two times it tried to get a lock, it had to spin to wait for it... Which is pretty high, and would only get worse as time went on." The more fine-grained approach in Windows 7 and Windows Server 2008 R2 yields some serious performance improvements: on 32-processor configurations, some operations in SQL and other applications perform 15 times faster than on Vista. And remember, the new fine-grained method has been implemented without any application breakage.
Multi-threading: running work in parallel for compute-intensive tasks.
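A minimal sketch (not from the deck) of splitting CPU-bound work across cores with Parallel.Invoke from the Task Parallel Library; SumOfSquares is just a stand-in for any compute-heavy routine:

using System;
using System.Threading.Tasks;

class ComputeSample
{
    static void Main()
    {
        // Run two independent, CPU-bound computations on separate cores
        // and block until both have finished.
        Parallel.Invoke(
            () => Console.WriteLine("First: {0}", SumOfSquares(10000000)),
            () => Console.WriteLine("Second: {0}", SumOfSquares(20000000)));
    }

    // Stand-in for any expensive, CPU-bound calculation.
    static long SumOfSquares(int n)
    {
        long total = 0;
        for (int i = 0; i < n; i++) total += (long)i * i;
        return total;
    }
}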
Async Programming: BeginFirstThing()... do something else... FinishFirstThing(). Allows the processor to do something more useful while waiting for disk, network or other I/O. Helps keep the UI responsive in Windows apps by letting the UI run while the I/O is done in the background.
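A minimal sketch of that Begin/End split using FileStream.BeginRead/EndRead; the file name "data.txt" is an assumption for illustration only:

using System;
using System.IO;
using System.Text;

class ApmSketch
{
    static void Main()
    {
        var buffer = new byte[4096];
        // Open for asynchronous access (assumes a local file called data.txt exists).
        var stream = new FileStream("data.txt", FileMode.Open, FileAccess.Read,
                                    FileShare.Read, 4096, true);

        // Begin the read: the OS completes the I/O in the background.
        IAsyncResult pending = stream.BeginRead(buffer, 0, buffer.Length, null, null);

        Console.WriteLine("Doing something else while the read is in flight...");

        // Finish the operation: collect the result once the I/O has completed.
        int bytesRead = stream.EndRead(pending);
        Console.WriteLine("Read {0} bytes: {1}", bytesRead,
                          Encoding.UTF8.GetString(buffer, 0, bytesRead));
        stream.Close();
    }
}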
Problems: atomicity of data access operations, locking, deadlocks, race conditions, order of execution. All subtle, infrequent, and difficult to detect and replicate; attaching a debugger affects how the software behaves. Not very unit-testable.
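A tiny sketch (mine, not from the deck) of the race-condition problem: a plain increment on a shared counter loses updates, while Interlocked.Increment stays correct:

using System;
using System.Threading;
using System.Threading.Tasks;

class RaceSketch
{
    static void Main()
    {
        int unsafeCount = 0;
        int safeCount = 0;

        // A million increments spread across the thread pool.
        Parallel.For(0, 1000000, i =>
        {
            unsafeCount++;                        // read-modify-write: not atomic, updates get lost
            Interlocked.Increment(ref safeCount); // atomic increment: always correct
        });

        // unsafeCount usually comes out below 1,000,000 and varies from run to run;
        // safeCount is always exactly 1,000,000.
        Console.WriteLine("Unsafe: {0}  Safe: {1}", unsafeCount, safeCount);
    }
}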
User as a bottleneck: the user does not scale out. Office apps and other heavily interactive software are limited more by the user than by the processor. Think about Excel™ and spreadsheet recalculation: JFDI versus thinking about it too hard; a real-time-scheduling type of problem.
The future: new languages - F#, Axum. Probably as big a change as the shift from assembler to C. Task orientated: less imperative; say what you want doing, not how you want it doing. Making a cup of tea.
Task-Orientated: focus on what we want to achieve, not how to do it. Read-only values where possible (immutability means less conflict). Using workflow and workflow-like programming (Azure) to scale out (PDC 08). Axum @ PDC 09.
New in .NET 4, for NOW: ThreadPool.QueueUserWorkItem is dead, long live Tasks. Work-stealing queues in the ThreadPool. AsParallel and P/LINQ. Concurrent collections/bags/lists/queues/locks: http://msdn.microsoft.com/en-us/library/system.collections.concurrent.aspx Rx Extensions - IObservable<T>. ParallelExtensionsExtras: http://blogs.msdn.com/pfxteam/archive/2010/04/04/9990342.aspx
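As a rough before/after sketch (mine, not the deck's): the old fire-and-forget work item versus a .NET 4 Task you can hold on to:

using System;
using System.Threading;
using System.Threading.Tasks;

class QueueVersusTask
{
    static void Main()
    {
        // .NET 3.5 style: fire-and-forget, no handle to the work or its result.
        ThreadPool.QueueUserWorkItem(_ => Console.WriteLine("Work item done"));

        // .NET 4 style: a Task is a handle you can wait on, compose and read from.
        Task<int> lengthTask = Task.Factory.StartNew(() => "hello".Length);
        Console.WriteLine("Length = {0}", lengthTask.Result); // blocks until the task completes

        Thread.Sleep(100); // give the fire-and-forget item a moment to run before exiting
    }
}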
Parallel Computing and .NET 4
[Diagram: the parallel computing stack.
Tools: Visual Studio 2010 parallel debugger; profiler concurrency analysis.
Managed languages: Visual F#, Axum.
Managed libraries: Parallel LINQ, Task Parallel Library, Rx, data structures, DryadLINQ.
Native libraries: Async Agents Library, Parallel Pattern Library, data structures.
Runtimes: managed concurrency runtime (task scheduler, resource manager, ThreadPool); native concurrency runtime.
Microsoft Research: race detection, fuzzing.
Operating system: Windows 7 / Server 2008 R2, HPC Server, threads, UMS threads.
Key: Visual Studio 2010 / .NET 4; Windows 7 / Server 2008 R2; Research / Incubation; Operating System.]
[Diagram: ThreadPool in .NET 3.5 - the program thread queues work items (Item 1..6) into a single global queue, from which all worker threads take work. Thread management: starvation detection, idle thread retirement.]
[Diagram: ThreadPool in .NET 4 - the program thread queues tasks (Task 1..6) into a lock-free global queue, and each worker thread also has a local work-stealing queue. Thread management: starvation detection, idle thread retirement, hill-climbing.]
Demo Time. All hail Murphy!* (* An appeasement for the mischievous demo gods.)
Easy wins with P/LINQ: uses the TPL. IParallelEnumerable<T>, .AsParallel<T>(). Migration to LINQ is a good first step towards parallelisation. Also Parallel.ForEach. Choose carefully for best performance, but either is probably better than the alternatives. Lots of knobs: http://blogs.msdn.com/pfxteam/archive/2010/04/21/9997559.aspx
public static IEnumerable<T> Zipping<T>(IEnumerable<T> a, IEnumerable<T> b)
{
    return a
        .AsParallel()
        .AsOrdered()
        .Select(element => ExpensiveComputation(element))
        .Zip(
            b.AsParallel()
             .AsOrdered()
             .Select(element => DifferentExpensiveComputation(element)),
            (a_element, b_element) => Combine(a_element, b_element));
}
public static IEnumerable<T> Zipping<T>(IEnumerable<T> a, IEnumerable<T> b)
{
    var numElements = Math.Min(a.Count(), b.Count());
    var result = new T[numElements];

    Parallel.ForEach(a, (element, loopstate, index) =>
    {
        var a_element = ExpensiveComputation(element);
        var b_element = DifferentExpensiveComputation(b.ElementAt(index));
        result[index] = Combine(a_element, b_element);
    });

    return result;
}
TPL - Task is Your New Best Friend. ThreadPool.QueueUserWorkItem is great for fire-and-forget, but what about waiting, canceling, continuing, composing, exceptions, dataflow and debugging? (See the sketch below.)
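A minimal sketch of what Task adds over a queued work item - waiting, continuation, cancellation plumbing and aggregated exceptions - using only .NET 4 TPL types:

using System;
using System.Threading;
using System.Threading.Tasks;

class TaskSketch
{
    static void Main()
    {
        var cts = new CancellationTokenSource();

        // Start a task that observes a cancellation token.
        Task<int> compute = Task.Factory.StartNew(() =>
        {
            cts.Token.ThrowIfCancellationRequested();
            return 6 * 7;
        }, cts.Token);

        // Continuing: schedule follow-up work for when the first task finishes.
        Task print = compute.ContinueWith(t => Console.WriteLine("Answer: {0}", t.Result));

        try
        {
            print.Wait(); // Waiting: block until the whole chain has completed.
        }
        catch (AggregateException ex)
        {
            // Exceptions: failures anywhere in the task graph surface here.
            foreach (var inner in ex.InnerExceptions)
                Console.WriteLine(inner.Message);
        }

        // Calling cts.Cancel() before the task starts would leave it in the Canceled state.
    }
}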
IObserver<T>, IObservable<T>: part of the Rx (Reactive Extensions) framework. The dual of IEnumerable<T>, i.e. push versus pull. Good for replacing events and for asynchronous I/O programming. Used in the Silverlight Toolkit for unit tests. See BurningMonk's article on drag-and-drop and IObservable<T>. cf. producer/consumer.
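A hand-rolled sketch of the push model using only the IObservable<T>/IObserver<T> interfaces that ship in .NET 4 (no Rx library calls); TemperatureSensor and ConsoleObserver are made-up names for illustration:

using System;
using System.Collections.Generic;

// A minimal IObservable<T>: values are pushed to subscribers,
// the reverse of pulling them from an IEnumerable<T>.
class TemperatureSensor : IObservable<double>
{
    private readonly List<IObserver<double>> observers = new List<IObserver<double>>();

    public IDisposable Subscribe(IObserver<double> observer)
    {
        observers.Add(observer);
        return new Unsubscriber(observers, observer);
    }

    public void Publish(double reading)
    {
        foreach (var o in observers) o.OnNext(reading); // push to every subscriber
    }

    private class Unsubscriber : IDisposable
    {
        private readonly List<IObserver<double>> observers;
        private readonly IObserver<double> observer;
        public Unsubscriber(List<IObserver<double>> observers, IObserver<double> observer)
        { this.observers = observers; this.observer = observer; }
        public void Dispose() { observers.Remove(observer); }
    }
}

class ConsoleObserver : IObserver<double>
{
    public void OnNext(double value) { Console.WriteLine("Reading: {0}", value); }
    public void OnError(Exception error) { Console.WriteLine("Error: {0}", error.Message); }
    public void OnCompleted() { Console.WriteLine("Done."); }
}

class Program
{
    static void Main()
    {
        var sensor = new TemperatureSensor();
        using (sensor.Subscribe(new ConsoleObserver()))
        {
            sensor.Publish(20.5); // pushed to the observer as it happens
            sensor.Publish(21.0);
        }
    }
}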
Code Show and Tell: GPS example from MSDN.
New Sync Primitives in .NET 4
Public, and used throughout PLINQ and the TPL; they address many of today's core concurrency issues.
Thread-safe, scalable collections: IProducerConsumerCollection<T>, ConcurrentQueue<T>, ConcurrentStack<T>, ConcurrentBag<T>, ConcurrentDictionary<TKey,TValue>
Phases and work exchange: Barrier, BlockingCollection<T>, CountdownEvent
Partitioning: {Orderable}Partitioner<T>, Partitioner.Create
Exception handling: AggregateException
Initialization: Lazy<T>, LazyInitializer.EnsureInitialized<T>, ThreadLocal<T>
Locks: ManualResetEventSlim, SemaphoreSlim, SpinLock, SpinWait
Cancellation: CancellationToken{Source}
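For instance, BlockingCollection<T> covers the classic producer/consumer hand-off; a minimal sketch (mine, using only types listed above):

using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class ProducerConsumerSketch
{
    static void Main()
    {
        // BlockingCollection<T> wraps a concurrent collection and blocks
        // consumers until items arrive (or the collection is marked complete).
        using (var queue = new BlockingCollection<int>(boundedCapacity: 10))
        {
            var producer = Task.Factory.StartNew(() =>
            {
                for (int i = 0; i < 20; i++) queue.Add(i); // blocks when the bound is hit
                queue.CompleteAdding();                    // no more items are coming
            });

            var consumer = Task.Factory.StartNew(() =>
            {
                // GetConsumingEnumerable blocks while empty and ends after CompleteAdding.
                foreach (int item in queue.GetConsumingEnumerable())
                    Console.WriteLine("Consumed {0}", item);
            });

            Task.WaitAll(producer, consumer);
        }
    }
}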
Parallel Extensions Extras: useful, but either too specific or not mature enough to properly enter the framework. Built on top of .NET 4.0 objects. Not fully tested; being augmented continually, feedback welcome. Includes: LINQ to Tasks, Task<TResult>.ToObservable, additional Task extension methods, BlockingCollectionExtensions, StaTaskScheduler, ConcurrentExclusiveInterleave, additional TaskSchedulers, ReductionVariable<T>, ObjectPool<T>, Pipeline, ParallelDynamicInvoke, AsyncCache. More to come...
LINQ for Tasks
http://blogs.msdn.com/pfxteam/archive/2010/04/04/9990343.aspx
GOAL:

Task<string> result = from x in Task.Factory.StartNew(() => ProduceInt())
                      from y in Task.Factory.StartNew(() => Process(x))
                      select y.ToString();

The LinqToTasks.cs file in ParallelExtensionsExtras provides a set of more complete implementations, covering Select, SelectMany, Where, Join, GroupJoin, GroupBy, OrderBy, and more.
Pipeline: Stage1 -> Stage2 -> Stage3

Pipeline.Create(rawChunk => Compress(rawChunk))
        .Next(compressedChunk => Encrypt(compressedChunk));
Visual Studio: some of the new tooling.
Summary and Links
Understand the Impact of Low-Lock Techniques in Multithreaded Apps: http://msdn.microsoft.com/en-us/magazine/cc163715.aspx
Key Links
Parallel Computing Dev Center: http://msdn.com/concurrency
Code samples: http://code.msdn.microsoft.com/ParExtSamples
Blogs - Managed: http://blogs.msdn.com/pfxteam  Tools: http://blogs.msdn.com/visualizeparallel
Forums: http://social.msdn.microsoft.com/Forums/en-US/category/parallelcomputing
blackmarble.com