The Managed Era
Let's begin by revisiting the recent history of managed development, since it highlights today's challenges. Remember the Java slogan, "write once, run anywhere"? It introduced a paradigm where a complete "safe" single-language stack, built on a virtual machine and a large set of APIs, would let you easily develop an application and target any kind of platform/OS. It was the beginning of the "managed" era. While Java was adopted quite successfully in several industries, it was also rejected by lots of developers who were aware of its memory management caveats and of a JIT that was not as optimized as it should have been (though it has seen impressive improvements over the years), along with a tremendous number of bad design choices: the lack of native structs, no unsafe access, and a route to native code through JNI that is extremely laborious and inefficient (and even recently they were considering getting rid of all native types and making everything an object, what a terrible direction!).
Java also failed at the heart of its slogan: it was in fact not possible to embrace, in a single unified API, all the usages of every target platform/OS, which led to things like Swing, not what anyone would call an optimal UI framework. Also, from the beginning, Java was designed with only a single language in mind, even though lots of people later saw the JIT and bytecode as an opportunity to port scripting languages to the JVM.
Back in the early Java days, Microsoft tried to enter the Java market by integrating some custom language extensions (with the ending we all know) and finally came up with its own managed technology, which was in several respects better conducted and designed: bytecode designed from the ground up, unsafe constructs, native interop, a lightweight but quite efficient JIT plus NGEN, rapid evolution of the C# language, C++/CLI, etc., taking multi-language interop into account from the beginning and without the burden of the Java slogan (though Silverlight on MacOS and Moonlight were a good try).
Both systems share a similar monolithic managed stack: metadata, bytecode, JIT and GC are tightly coupled. Performance-wise, it is also far from perfect: the JIT implies a startup cost, and the executed code is not as fast as it could be, mainly because:
- The JIT performs poor optimizations compared to a full C++ -O2 build, because it needs to generate code fast (also, unlike the Java HotSpot JVM, the .NET JIT is not able to hot-swap already-jitted code with better optimized code).
- Managed array accesses are always bounds-checked (except in simple loops where the JIT can suppress the check when the for-loop limit is provably less than or equal to the array's length; see the sketch after this list).
- The GC can pause all threads to collect objects (though the new GC in .NET 4.5 made some improvements), which can cause unexpected slowdowns in an application.
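To make the bounds-check point concrete, here is a minimal C# sketch (method names are mine, for illustration only) showing the one loop shape the current JIT recognizes well enough to drop the check, next to one where it cannot:

// Bound is the array's own Length: the JIT can prove i is always in range
// and elide the per-element bounds check.
static int SumElided(int[] data)
{
    int sum = 0;
    for (int i = 0; i < data.Length; i++)
        sum += data[i];
    return sum;
}

// Bound comes from an unrelated variable: the JIT keeps a bounds check on
// every data[i] access, even if count never exceeds data.Length at runtime.
static int SumChecked(int[] data, int count)
{
    int sum = 0;
    for (int i = 0; i < count; i++)
        sum += data[i];
    return sum;
}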
In a way, this marks the "decline" of .NET. I don't know much about Microsoft's internal organization, but what is commonly reported is that there is serious competition between divisions, good or bad; and for .NET, over the past few years, Microsoft seemed to be running out of gas (for example, almost no significant improvements in the JIT/NGEN, and lots of pending requests for performance improvements, including things like SIMD that have been asked for for a long time). My guess is that the required changes could only take place as part of a global strategy, with deep support and commitment from all divisions.
In the meantime, Google was starting to push its NativeClient technology, which allows sandboxed native code to run from the browser. Last year, in this frenzy of going native, Microsoft even revealed that the HTML5 implementation in the next IE was "going native"! Sic.
In "Reader Q&A: When will better JITs save managed code?" Herb Sutter, one of the "Going Native" evangelist, provides some interesting insights about what the "Going Native" philosophy is thinking about JIT, with lots of inaccurate facts, but lets just focus on the main one : Even if JIT could improve in the future, managed languages made such a choice of safety over performance, that they are intrinsically doomed to not play in the big leagues. Miguel de Icaza posted a response about it in "Can JITs be faster?" and he explained lots of relevant things about why some of Herb Sutter statements were misleading.
Then WinRT came along to somewhat smooth the lines. By taking part of the .NET philosophy (metadata and some common "managed" types like strings and arrays) and the good old COM model (as a common denominator for native interop), WinRT tries to solve the problem of language interoperability outside the CLR world (thus without performance penalties for C++) and to provide a more "modern" OS API. Is this the definitive answer, the one that will rule them all? So far, not really. It is heading in the direction of a convergence that could lead to great things, but it is still uncertain whether it will take the right track. And what could this "right track" be?
Going Native 2.0: Performance for All
Though safety rules can have a negative impact on performance, managed code is not doomed to be run by a poor JIT compiler (for example, Mono is able to run C# code natively compiled through LLVM on iOS/Linux), and it would be fairly easy to extend the bytecode with more "unsafe" levels to provide fine-grained performance speedups (like suppressing array bounds checking, etc.).
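As a rough sketch of what such an "unsafe level" already looks like today (as a hand-written workaround rather than a bytecode feature): pinning an array and walking it through a raw pointer removes every bounds check, at the price of safety and of the fixed/pinning overhead. A finer-grained switch at the bytecode level could give the same speedup without forcing the code into pointer style.

// Compile with /unsafe. Illustrative only.
static unsafe int SumUnsafe(int[] data)
{
    int sum = 0;
    fixed (int* p = data)
    {
        for (int i = 0; i < data.Length; i++)
            sum += p[i]; // raw pointer access: no bounds check is emitted
    }
    return sum;
}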
But the first problem we can identify today is the lack of a strong cross-language compiler infrastructure: from the compiler used in the IE10 JavaScript JIT, to the .NET JIT and NGEN compilers, to the Visual C++ compiler (to name a few), they all use different code for almost the same laborious and difficult problem of generating efficient machine code. Having a single common compiler is a very important step towards making high-performance code accessible from all languages.
Felix9 on Channel9 found that Microsoft could actually be working on this problem, which is good news, but "performance for all" is only a small part of a bigger picture. In fact, the "right track" mentioned above is a broader integrated architecture, not just an enhanced LLVM-like stack, but one backed by Microsoft's experience in several fields (C++ compilers, JIT, GC, metadata, etc.), a system that would expose a completely externalized and modularized "CLR" composed of:
- An intermediate mid-level language, entirely queryable/reflectable, very similar to LLVM IR or .NET bytecode, defining common datatypes (primitives, string, array, etc.). An API similar to System.Reflection.Emit should be available (see the sketch after this list). Vectorized (SIMD) types should be first-class types, just as int or double are. This IL should not be limited to CPU targets, but should also allow GPU computing (similar to C++ AMP): it should be possible to express HLSL bytecode with this IL, with the benefit of leveraging a common compiler infrastructure (see the following points). Typeless IL should also be possible, to let dynamic languages be expressed more directly.
- A dynamically linked library/executable format, like assemblies in .NET, providing metadata and IL code, and friendly to query/reflection. When developing, code should be linked against assemblies/IL (and not against crappy C/C++ headers).
- An IL-to-native-code compiler, which could be integrated into a JIT, an offline or cloud compiler, or a mixed combination. This compiler should provide vectorization whenever the target platform supports it. IL would be compiled to native code at install/deploy time, based on the target machine architecture (at development time, it could be done after the whole application has been compiled to IL). The compiler stages should be accessible from an API and offer as many extension points as possible (access to IL-to-IL optimizations, pluggable IL-to-native-code transforms). The compiler would be responsible for performing whole-program optimization at deploy time (or at runtime in JIT scenarios). Optimization options should range from fast compilation (like a JIT) to aggressive (offline, or hot-swapped code in a JIT). A profile of the application could also be used to automatically tune local optimizations. This compiler should support advanced JIT scenarios like dynamic hotspot analysis and On-Stack Replacement (aka OSR, which allows computation-heavy code to be replaced at runtime by better optimized code), unlike the current .NET JIT that only compiles a method on its first run. This kind of optimization is really important in dynamic scenarios where type information is sometimes discovered late (as in JavaScript).
- An extensible allocator/memory component allowing several allocators to coexist, where the Garbage Collector would be just one implementation; a major part of an application would still use the GC to manage the lifecycle of most of its objects, leaving the most performance-critical objects to be managed by other allocation schemes (like the reference counting used by COM/WinRT). There is no restriction on using different allocator models in the same application (and this is already what happens when a .NET application does native interop and allocates objects using OS functions).
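To give an idea of what "IL as a first-class, emittable and queryable citizen" means, here is what we can already do today with System.Reflection.Emit: build a tiny (a, b) => a * b + a method at runtime and let the JIT turn it into machine code. The architecture sketched above would expose this same kind of API all the way down to the native compiler.

using System;
using System.Reflection.Emit;

class EmitSketch
{
    static void Main()
    {
        // Build the IL for (a, b) => a * b + a at runtime.
        var method = new DynamicMethod("MulAdd", typeof(int), new[] { typeof(int), typeof(int) });
        ILGenerator il = method.GetILGenerator();
        il.Emit(OpCodes.Ldarg_0);
        il.Emit(OpCodes.Ldarg_1);
        il.Emit(OpCodes.Mul);
        il.Emit(OpCodes.Ldarg_0);
        il.Emit(OpCodes.Add);
        il.Emit(OpCodes.Ret);

        // The JIT compiles it to native code when the delegate is invoked.
        var mulAdd = (Func<int, int, int>)method.CreateDelegate(typeof(Func<int, int, int>));
        Console.WriteLine(mulAdd(3, 4)); // prints 15
    }
}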
In this system, full native interoperability between languages would be straightforward, without sacrificing performance for simplicity or vice versa. Ideally, an OS should be built from the ground up on such a core infrastructure. This is probably what was (is?) behind projects like Redhawk (for the compiler part) or Midori (for the OS part); in such an integrated system, probably only drivers would require some kind of unsafe behavior.
[Update 9 Aug 2012: Felix9 again found that an intermediate bytecode lower level than .NET MSIL, called MDIL, could already be in use, and it could be the intermediate bytecode mentioned just above; though, looking at the related patent "INTERMEDIATE LANGUAGE SUPPORT FOR CHANGE RESILIENCE", there are native x86 registers in the specs that don't fit well with an architecture-independent bytecode. Maybe they would keep MSIL as-is and build on a lower-level MDIL. We will see.]
So what is WinRT tackling in this big picture? Metadata, a bit of sandboxed API, and an embryo of interoperability (through common datatypes and metadata); as we can see, not much, a basic COM++. And as we can easily realize, WinRT is not able to provide advanced optimizations in scenarios where a WinRT API is used: for example, we cannot have a plain struct that exposes inlinable methods. Every method call in WinRT is a virtual call, forced to go through a vtable (and sometimes several virtual calls are needed, for example when a static method is used), so even a simple property get/set goes through a virtual call. This is clearly inefficient. It looks like WinRT only targets coarse-grained APIs, leaving all the fine-grained APIs at the mercy of performance heterogeneity, and restricting the common scenario where we want access to high-performance code everywhere, without going through a layer of virtual calls and non-inlinable code. Using an extended COM model is not what we can call "Building the Future".
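A rough C# analogy of the difference (not WinRT code itself, just an illustration): a property on a plain struct is a direct call the JIT can trivially inline, while the same property accessed through an interface (playing the role of a WinRT vtable here) is a virtual dispatch that cannot be inlined, and passing the struct through the interface even boxes it.

interface ISize
{
    int Width { get; }
}

struct Size : ISize
{
    public int Width { get; set; }
}

static class CallCost
{
    // Direct call on the struct: trivially inlined by the JIT.
    static int Direct(Size s) { return s.Width; }

    // Call through the interface: boxing plus a vtable dispatch, not inlinable.
    static int Virtual(ISize s) { return s.Width; }
}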
Productivity and Performance for C# 6.0
A language like C# would be a perfect candidate for such a modular CLR system, and could easily be mapped to the intermediate bytecode described above. Though to use such a system efficiently, C# should be improved in several areas:
- More unsafe power, where we could turn off "managed" behaviors like array bounds checking (a kind of "super unsafe mode", where we could for example issue CPU prefetch instructions before accessing the next array elements, the kind of "advanced" stuff that is impossible to do with current managed arrays without unsupported tricks).
- A configurable new operator that would integrate different allocator schemes.
- Vectorized types (like HLSL float4) should be added to the core types. This has been requested for a long time (with ugly patches in XNA on Windows Phone to "solve" the problem).
- Lightweight interop to native code (in case we would still be calling native code from C#, unlike in an integrated OS): the current managed-to-unmanaged transition is costly when calling native methods, even without any "fixed" variables. An unsafe transition should be possible without the burden of the x86/x64 prologue/epilogue that the current .NET JIT generates for unmanaged transitions.
- Generics everywhere (in constructors, in implicit conversions), with more advanced constructs (constraints on operators, etc.), closer to C++ template versatility but safer and less cluttered.
- Struct inheritance and struct finalizers (to allow lightweight code to be executed when a method exits, without going through the cumbersome "try/finally" or "using" patterns).
- More meta-programming: allow static method extensions (not only on "this"), allow class mixins (mixing the content of one class into another, useful for things like math functions), and allow modification of classes/types/methods at compile time (for example, methods that would be called at compile time to add methods/properties to a class, very similar to eigenclasses in Ruby meta-programming, instead of using things like T4 template code generation). More broadly, allow DSL-like syntax extensions at several points in the C# parser (Roslyn currently doesn't provide any extension point inside the parser), so that language extensions could be expressed in C# as well (for example, instead of having the Linq syntax hardcoded, we should be able to write it as a parser extension plugin, fully written in C#). [Edit] I have posted a discussion "Meta-Programming and parser extensibility in C# and Roslyn" about what is intended behind this meta-programming idea on the Microsoft Roslyn forum. Check it out! [/Edit]
- A built-in symbol or link type, where we could express a link to a language object (a class, a property, a method) using a simple construct like:
symbol LinkToMyMethod = @MyClass.MyMethod;
instead of using Linq expressions (like () => MyMethod inside MyClass). This would make code using INotifyPropertyChanged more robust, and would simplify all property-based systems like WPF (which currently require an ugly duplication of the member definition); see the sketch just below.
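For reference, here is the kind of Linq-expression helper this construct would replace (names are illustrative, this is just the common pattern): the property is referenced through a lambda so that renaming stays refactor-safe, but an expression tree has to be built and walked at runtime just to recover the string "Name".

using System;
using System.Linq.Expressions;

static class PropertyLink
{
    // Extracts the member name from a lambda like () => Name.
    public static string NameOf<T>(Expression<Func<T>> property)
    {
        return ((MemberExpression)property.Body).Member.Name;
    }
}

class Person
{
    public string Name { get; set; }

    void RaiseNameChanged()
    {
        // Today: an expression tree built and inspected at runtime.
        string propertyName = PropertyLink.NameOf(() => Name);
        // With the proposed construct, roughly: symbol propertyName = @Person.Name;
    }
}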
Next?
We can hope that Microsoft took a top-down approach, addressing a unified OS API for all languages and simple interoperability first, and that they will introduce these more advanced features in a later version of their OS. But this is an optimistic expectation, and it will be interesting to see whether Microsoft effectively takes on this challenge. It was recently revealed that WP8 .NET applications will benefit from a cloud compiler, but so far we don't know much about it: is it just a repackaging of NGEN (which is, again, not performance-oriented, generating code very similar to the current JIT), or a non-public RedHawk compiler?
Microsoft has lots of gold in its backyard: years of advanced native code compilation in its C++ compiler, its JITs, its GC, and all the related R&D projects...
So to summarize this post: .NET must die in favor of a better integrated, performance-oriented common runtime, where managed (safety/productivity) vs native (performance) is no longer a border, and this should be a structural part of the next evolution of the WinRT architecture.
Thank you for this fantastic write-up. "Managed code" developers need to read this.
Personally, I love C# - I like the simplicity and general "feel" of the language. However, today C# is tied to .NET, which limits the possibilities (I use SharpDX for DX - that's why I normally come here ;) ).
Obviously, I would love it if I could write C# (maybe with "pure" WinRT APIs instead of .NET) and compile my application to an efficient native assembly. For me, .NET is just boilerplate I need to run my C# application. I think this is a wish of many developers.
But I think there is a major problem with how Microsoft sold .NET to developers in the early 2000s. Everybody was promoting .NET - most of my colleagues moved from native applications to .NET. Many newer developers have never used native code, especially in corporate environments (mostly Java and .NET), and many core technologies have no native counterpart at all (ASP.NET, WPF, etc.).
How (and this is a problem for Microsoft) can they convince all those developers to jump into cold water? Yes, we developers need to learn every day to stay competitive, but realistically, most developers would not like to switch technologies (it's many years of hard work, after all).
This is a critical issue in the Windows stack that will ultimately decide the fate of the platform. For example, many WPF developers were outraged to see WPF die. Some years later we got Silverlight and, yes, same story there. The same could be said about almost every other development technology at Microsoft (ok, maybe ASP.NET did make quite a bit of progress).
Maybe Microsoft is too big to have a coherent message for developers. Why else would they say "hardware with limited resources needs native code" (~Herb Sutter) while the Windows team is marketing JavaScript (yikes!) and the Windows Phone team only allows managed code for application development?
Oh well, everything is messed up I guess.
I'm 100% with you on that one Alex.
I personally feel lost right now, as it seems we are at a crossroads with many paths to follow in every single field.
Step back a bit and look at our environment: how many technologies do we have that do approximately the same thing, even inside a single ecosystem such as Microsoft's .NET? Way too many.
Targeting mobile, desktop and web at once seems to be a common trend in product strategies because, it's a fact, we no longer know where the end user (and thus the client) is.
Will he be at a restaurant with his smartphone? At home with his tablet? At the office using his desktop? In a client's office with his ARM laptop?...
And the list goes on, with TVs getting direct internet access, consoles and media center boxes getting into our living rooms, internet and cable operators providing multimedia boxes on top of their initial modem/communication offering...
This trend started roughly when mobile phones became very popular in the '90s, and it led us to create these "managed" environments and the "RunOnceCrashEverywhere" mottos.
Now, only huge teams can afford the time and money to target all these platforms, and therefore we have to make real choices: either target performance and native code for a given application, and/or plan for the future, knowing that our product manager will certainly come back saying we need to get the software running on the latest must-have hardware...
All this to make the point: languages and underlying technologies exist because a need is expressed business-wise, and right now the fault is on the business side of things.
Actually, the responsibility is in customers' hands: they will drive us where we should go, and bad business decisions will, as always, get cleaned up in the end.
I was expecting to see a very aggressive, large-scale acquisition strategy emerge from one of the current leaders to dominate and impose "their" (and not necessarily the best) development vision.
However, it seems this isn't likely to happen, as the overall market is way too fragmented.
This applies to standard business applications, games, websites, and even hardware...
Personally, I love writing C# code so much that I would love to be able to use it everywhere, even for direct native code.
From my perspective, managed is useful for 2 reasons:
- easier to share knowledge between developers as you share the same vision of the framework.
- reflection: even if it doesn't offer great performance, it's a feature I use so often to solve issues, create tools and improve efficiency that I couldn't live without it.
Native C# is the goal I would claim for the future. I don't really care whether the framework is provided by Microsoft, Mono, an independent software publisher, or even by me implementing it in native C# to access low-level system features.
As long as this movement doesn't introduce all the ugliness of the C++ world, I don't mind my C# code compiling directly to native code. However, I don't see the big performance penalty of .NET over native. I do a lot of parallel programming in genetics, where we have to handle tons of data and a lot of processing. It's simply fast enough, and the beauty of Parallel.ForEach is unparalleled. The reality is that most of the time it is challenging for a mid-level C++ programmer to write optimized real-world code with a clear performance advantage over managed code. If you bring productivity into the equation, the performance advantage must be crucial to the scenario to justify the extra work. Don't get me wrong, I really like the idea that my code would run faster out of the box, but I don't want to lose the beauty of the C# language and the convenience of the .NET framework.
All great points. At the end of the day it's Idiocracy in full flight, that is to say millions of .NET developers don't care how the APIs in .NET were built, so long as they can TAB-DOT-SHIP their way to victory via VS2010. What happens beneath the C# layer is commonly ignored and often pushed aside into the "too hard" basket.
Dependencies on frameworks that sit on top of .NET are also high on the agenda, along with controls and tooling to match that expectation. The trend here is to be less focused on the fundamentals of programming and more on "how can I type as little as possible to hit a faster shipping date... less code, more money, more victory".
In order to reset .NET you have to take most of that off the table and almost restart the ecosystem. It's at this point that Microsoft becomes most vulnerable to the possibility of alternative language adoption or, worse, tooling/platform adoption(s).
The foothold of the "ECMAScript" agenda, or the push towards it, is hopefully to keep Microsoft's relevance intact long enough for them to reboot the .NET ethos. All of the above gets a nod, a smile and a "sure, why not"... it will light up a lot of geeks who value the power of managing their own garbage collection because of "because I can" mantras... it will do little for the bulk of the community who just finished writing a WinForms / WPF or Silverlight app with a proud "look mah, I'mz a programmaz" expression.
Thanks for your feedback!
@Unknown, that's probably a consequence of .NET not being native enough to be endorsed by all MS divisions, leading to the things you describe. Still, I would expect no less than a large set of the .NET API to be ported to this "WinRT Next", but that could also never happen. It depends a lot on the coherence of MS's architecture strategy across its divisions.
@IndieFreaks, indeed, the market is fragmented and cross-platform development is becoming more challenging, but thanks to a product like Mono, it is really possible today for a .NET developer to target lots of different platforms with a single language plus some platform-specific code. But only Microsoft would be able to drive and impose a coherent architecture in the scenario of a "fusion" of .NET and WinRT. Xamarin is too small, focused only on mobile (where the money currently goes), and big players like Google and Apple have different development philosophies and so far have not been able to provide a coherent and strong development/runtime strategy.
@janhannemann, I fully agree, the purpose of the ".NET must die" sentence is of course not to trade it for something duller! I wrote this post because I love developing in C# so much that I would like this language to be pushed towards the future (and not into the tiny hole of the "Going Native" mirage). The async/await work was great, but we need much more, and first to bring performance back to the negotiating table. A language like C# has tons of opportunities to be improved for better productivity without sacrificing its simplicity.
@Scott Barnes, yes, if Microsoft doesn't tackle this seriously, they will also be in a very vulnerable position, considering the importance of cross-platform development today - mobile and web platforms being the major current trend, two segments where MS is almost inaudible. The market today is completely different from 10 years ago, when .NET was introduced. So the .NET reset is probably much trickier than the naive vision described in my post, but it is important to understand the current challenge of WinRT, what it does and what it does not, so that people won't say "we didn't know", even if, as you said (I lol'd at "look mah" :), a majority of programmers (.NET or "native" coders) don't really see the point and the challenge in all of this.
You wrote:
> (JavaScript being one of the most prominent JIT user)...
Javascript is always interpreted. Because most of the Javascript code that's sent to a browser is never executed, JIT doesn't particularly make sense. I'm not aware of any JIT implementations.
@Tim Roberts, you should probably check again how JavaScript engines run nowadays. Most of them have a JIT, and some of them (like V8, which generates machine code directly: check https://developers.google.com/v8/design#mach_code ) don't even have an interpreter.
I was reading an article the other day called "Broaden your options: Don't fear native code". Interestingly, it was written in 2006 and yet it is still so relevant today.
Anyway, I wanted to add another important factor that I believe contributed to the "decline" of .NET: the success of the iPhone and the resurgence of Objective-C / NeXTSTEP as a viable platform. Remember which technology was dominating the mobile space at that time? Java ME! Apple took advantage of native execution to strike a perfect blow against that entrenched managed mediocrity.
@koumparos, absolutely, iOS native development has probably contributed to the "Going Native" resurgence. Also, with no choice other than managed code on Windows Phone, lots of game companies that had their engine written in C++ didn't bother to port their games to Windows Phone.
ReplyDeleteHey Alex, great post!
I heard Microsoft is hiring to create a native compiler for C#, so maybe (or hopefully) they share some or all of your thoughts.
As you know, I love C#. And I don't want to go back to C++. I prefer to use C# and go unsafe where and when needed (like array checking, fixing primitives, string manipulation, etc.) for all platforms.
By restricting unsafe code on some platforms, and with the lack of a powerful native-image compiler, MSFT pushed big devs away from C# (no stackalloc on the 360, no unsafe ops on WinPhone7). Maybe the news that apps will be precompiled for Win8 will change the game...
Btw, you forgot to mention that NaCl also supports C# (Bastion => Mono).
I forgot to post this link: http://blog.prabir.me/post/LLVM-e28093-Native-C-Compiler.aspx
@Ultrahead, yes, the Mono JIT has been ported to run under NaCl (http://naclports.googlecode.com/svn/trunk/src/experimental/v8/nacljit.pdf). Unlike WinRT, NaCl allows code to be generated at runtime (in WinRT only the official .NET JIT may do so), and still with an interesting sandboxing approach.
Concerning the LLVM# project, it is a bit weird that they didn't take .NET IL bytecode and translate it directly to LLVM IR. IMHO, it's a waste of time and error-prone to re-code a C# parser, and it would have been much easier to start directly from the bytecode.
Maybe they thought they could do it better. Lol.
What do you think about the "D" language?
My two cents: http://amapplease.blogspot.com/#!/2012/08/net-must-die-to-go-or-not-to-go-native.html
On vectorized types: we do like the idea, we like it so much that Mono shipped Mono.Simd (http://tirania.org/blog/archive/2008/Nov-03.html and http://docs.go-mono.com/index.aspx?link=N:Mono.Simd), data types that map to the hardware SIMD on x86 and x86-64. The major problem we ran into is that this is hardly cross-platform. Different PPC CPUs have different features; ARM, SPARC and pretty much everyone else are different again. And real production code is badly hurt if you add if-statements to opt in or out of a particular SIMD optimization across platforms. We found that these types were not as useful as we thought they could be.
Clearly an example where we believed the feature would pay off, and in practice we ran into real limitations.
Where I want to go instead is to SIMD-tune the existing high-level classes in use today for matrices, quaternions, vectors and a handful of others in well-known libraries: teach the VM about some of these and hardware-accelerate them in the way that makes the most sense on a particular architecture.
Miguel
As for some of your feedback on how to improve C#, those are not bad ideas. The "configurable" new operator would just be syntactic sugar, because you can get the advantages of it by writing your own allocator and consuming unsafe data. For example, a pool allocator is trivially written:
class PoolAllocator {
    PoolAllocator (int size);
    void *GetMemory (int n);
    void Release ();
}
It is up to you to cast "void *" to whatever you like.
Want to use the system malloc? [DllImport ("libc")] extern static void * malloc (int size);
So the idea is not a new one, it has been discussed, but nobody has suggested anything that would make a change to the language worth more than learning and building a simple API. If a good idea comes up, it would be good to hear it, but most of the ideas are presented the same way they were presented here, as "it would be nice", with no concrete steps to execute on them.
You are probably thinking that you want a "new" that works with both worlds, one that allows you to mix GC-managed objects with non-GC-managed objects and have each be managed on its own. This breaks the requirements of a GC (the ability to trace all live objects).
Miguel, you completely misunderstood this statement: "with lots of inaccurate claims, but let's just focus on the main one: even if JITs could improve in the future, managed languages made such a choice of safety over performance that they are intrinsically doomed to never play in the big leagues". It was paraphrasing Herb Sutter, nothing else. With everything that you are explaining here, I completely agree.
Having worked with Mono and .NET at the bytecode and assembly level, I know all the optimizations you are mentioning quite well, and I applaud all the work you did in Mono to bring more unsafe power.
[Edit]Ok, you just deleted your comment, I'm going to update the article to avoid confusion[/Edit]
Concerning more specifically the allocator scheme, yes, it is a kind of syntactic sugar, probably the same kind as the C++/CLI syntactic sugar that allows mixing native types with CLR types in the same language. Of course the idea is not new, and I never claimed that this post was full of new thinking. Concerning the GC, yes, I still believe that we could have more options when allocating an object, to choose which GC instance should handle it. Currently all objects are managed by the same single, general-purpose GC, and this is not always efficient: in certain scenarios you know that some specific objects would be better handled in a more specific way. I agree this is of course much less trivial than it sounds (and I would have to scratch my head way more than in this little post if I had to implement it, sure!), as it would require this uber-GC to know which particular GC is attached to an object, which would require some indirection at some point; but whatever the method, it is possible (special memory paging, a special field just after the vtable, specifying it at the type level, etc.).
Concerning SIMD, yes, I do know Mono.Simd (I mentioned it in the article); I even applauded the Mono initiative four years ago when I released NetAsm (http://netasm.codeplex.com/), a dynamic JIT compiler add-on that hacked the .NET CLR by replacing JIT code with custom ASM code.
I was mainly talking about SIMD with massive matrix multiplication implicitly in mind, for example (I'm working on a game engine, Paradox3d.net, with Virgile Bello, remember? ;), not as a general problem solver for all floating-point code. But having done some heavy DSP audio myself in the past, where floating-point instructions dominated, even switching from the pure x86 FPU to SSE2 instructions gave a great boost, even without vectorizing... and for all matrix/vector ops, vectorizing is the key to performance.