Intel's biggest blunder: Itanium

Intel is a very successful company, but sometimes it makes mistakes, and few mistakes are this titanic. It's time to talk about Itanium.

SGI Video
Comments

Oh Itanium… I did NTDDK work, writing drivers, and I got called in to help on an Itanium effort. I had low-level MC680x0, x86, VAX and Alpha experience, so they figured I could help with IA64 too, I suppose. Boy, were the performance anomalies bizarre. You'd have an ISR that was nearly functionally and syntactically identical to an x86 (or later AMD64) equivalent (in C), and yet it would perform terribly, but only under certain circumstances. Given that VLIW/EPIC was supposed to largely predetermine/hint ideal branching and ILP at object generation time, you'd expect fairly consistent performance. Instead, depending on what other ISRs or DPCs had been serviced, performance for our code would vary massively (I literally would break the system into remote windbg and drill through step by step, instruction by instruction). Eventually it became clear that there were a lot of bad pre-fetch gotchas, and gaps in ILP "bundling" that the compiler was padding with NOPs, so the resulting instruction cache miss rate was high (along with MP L2 cache coherency problems). Most of this had no clear recovery path, since IA64 (IPF) was an in-order architecture; they'd really gone all-in on compile-time speculation.
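The ILP "bundling" problem in that comment can be sketched in miniature. This is a toy Python model, not real Itanium encoding or scheduling: the op names are invented, and the 3-slot bundle is only loosely inspired by IA-64's 3-instruction bundles. It shows how a static scheduler that must pack independent ops into fixed-width bundles ends up emitting NOPs whenever a dependency chain leaves slots empty.

```python
# Toy model of the VLIW "bundling" problem: a compiler must pack
# independent ops into fixed-width bundles at build time, and any slot
# it cannot fill becomes an explicit NOP. Everything here is
# hypothetical -- a 3-slot bundle loosely inspired by IA-64's
# 3-instruction bundles, not real Itanium templates.

BUNDLE_WIDTH = 3

def pack_bundles(ops):
    """ops: list of (name, deps) pairs, deps being the set of op names
    that must have issued in an EARLIER bundle. Greedy in-order packing."""
    issued, bundles, pending = set(), [], list(ops)
    while pending:
        bundle = []
        for op in list(pending):
            name, deps = op
            if deps <= issued and len(bundle) < BUNDLE_WIDTH:
                bundle.append(name)
                pending.remove(op)
        if not bundle:
            raise ValueError("dependency cycle in ops")
        issued |= set(bundle)
        # unfilled slots become NOPs: wasted issue slots and icache space
        bundle += ["nop"] * (BUNDLE_WIDTH - len(bundle))
        bundles.append(bundle)
    return bundles

# A serial dependency chain: only one op is ever ready, so two of every
# three slots come out as NOPs, bloating the instruction cache footprint.
chain = [("ld r1", set()), ("add r2", {"ld r1"}),
         ("mul r3", {"add r2"}), ("st r3", {"mul r3"})]
for bundle in pack_bundles(chain):
    print(bundle)
```

With the serial chain above, every bundle carries two NOPs; with four fully independent ops the same packer would need only two bundles. That gap is exactly what compile-time bundling stakes the whole architecture on.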

To fix certain issues literally required hand-coded assembler, which oddly enough was actually quite pleasant. The ISA itself, with its abundant GPRs and clear mnemonics, made for a nice programming front-end (especially compared to our old friend, x86). But none of that amounted to a hill of beans, because the uArch's commitment to predetermined/hinted instruction-level parallelism was just a huge handful. The people writing the compilers were fighting a nearly impossible battle.

Sad thing is, there were a lot of good things about IA64; had Intel taken a slightly less religious approach to the whole affair, things might have gone differently. The problem stemmed from their abundant insecurity around being "the CISC people" during an era when RISC was the be-all and end-all of every flamewar about processor technology; they wanted very badly for IA64 and EPIC (VLIW) to be the next big buzzwords.

I still have, sitting in a storage unit, the McKinley (Itanium 2) HP zx6000 workstation that I was provided for that project (the company didn't want it back!). Thing is, unlike my Amigas, vintage Macs, bread-bin C= machines and my beloved Alpha box, I can't get excited by IA64. The reason is pretty simple: it just wasn't very good (and, as another commenter pointed out, the IA64 bandwagon took down Alpha, PA-RISC and MIPS, all more worthy than IA64).

smakfu

Thirty years ago I overheard DEC engineers say that RISC (reduced instruction set computing) really means "Relegate Important Stuff to the Compiler".

NeilRieck

I was doing research at a university that had an Itanium-based supercomputer. It produced a neat wow factor when you had to mention it in papers, but the thing was a colossal failure, and for most tasks I could outperform it with my Power Mac G5 at home. It probably cost millions of dollars, and certainly tens of thousands a year just in electricity and air conditioning.

benjaminsmith

I can confirm that: although HP-UX on Itanium was not an astounding success, many companies with links to HP had entire server clusters based on that architecture. My second job in IT (around 2007), in the telecom (TLC) sector, started with a migration of lots of custom software (written in C++) from version 11.11 of HP-UX to 11.31. Many years later (around 2013, I think), I was asked to migrate the same software from HP-UX on Itanium to RHEL on Intel. I still remember fighting with the discrepancies not only between the two Unix flavors, but between the two target C++ compilers (aCC vs. gcc), each with its own unique quirks: slightly different handling of IPC ("System V compliant", my foot), very different handling of read-only memory regions, etc.
Fast-forward to my current job: last November I started working in the banking sector. Guess what my first project was? Migrating a server hosting various batch processes (a mixture of pure shell scripting and Java applications) from HP-UX on Itanium to Linux (where, of course, lots of basic Unix commands behave differently, e.g. ps). Fate is an odd mistress, indeed...

QuintusCunctator

AMD calling their server CPUs EPYC is actually such an amazing insult.

liamcampbell

I remember Itanium and the hammer blow that was AMD64, but I didn't know much about it and the deals that were done at the time, nor did I know it continued as a zombie for as long as it has. This was a FANTASTIC video that really answered some questions I had and was well worth watching. Great work!

heidirichter

At a place I worked, we were porting software from 1.1 GHz PA-8900 (Mako) machines running 11i v2, to 11.23 on theoretically 'nice' (tm) 4/8-socket Itanium 2s. Oodles more cores, oodles more MHz and internal memory bandwidth (let's forget the discussion about the 'generous' amount of cache on the Itanium... because maybe she didn't lie and it apparently doesn't matter). Sadly, it all ran at best about 1/4 the speed of the older PA system in the sections of code where it had a good tailwind (other sections were much worse), irrespective of whether we ran single or 120 concurrent threads. Four of HP's optimizer engineers were hired at an eye-watering rate for 3 weeks. In the end the result was "Wait for the newer, better compiler and it'll magically get better... delay the HP product 8 months." We waited (no choice); it didn't happen, but on the plus side they were able to afford lovely 5-star holidays in exotic places that year. It was embarrassing that the old 3-to-4-year-old Pentium III server and the older dual-processor IBM pSeries workstation (275) outran it too. It was all just sad, and it killed three of my favorite RISC architectures.

VKFVAX

There's an odd little bit of symmetry: the CG rendering in Titanic was done on DEC Alpha CPUs.

etansivad

And then there is the story of the completely forgotten Intel iAPX 432 architecture... arguably Intel's first costly mistake..!

orinokonx

I used to work for DEC and remember getting my hands on a 150 MHz Alpha workstation for the first time in the early-to-mid 1990s. It was running Windows NT for Alpha and had a built-in software emulator for x86 Windows support. The first thing I did with it was try to play Doom 2 in the emulator window, and it actually ran, much, much faster than my personal Pentium computer could manage. It was shocking how fast it was. It also freaked me out when I looked at the chip and saw that the heat spreader actually had two gold-plated screws welded onto it to bolt on the heatsink. The Alpha was a beast!

pattonpending

I worked on the SGI version of the IA64 workstation, porting and tuning 3rd-party apps. I would sit in on the SGI Pro64 compiler team meetings, and one time they called in the Intel team to clear up some issues. The Pro64 compiler could handle the multiple instruction queues and all that just fine, provided it could predict what the CPU would do. It had an internal table of opcodes and their execution times (I think it was called a NIBS table), and on the MIPS R10K CPU there was a precise mapping of those, a core tenet of the RISC architecture. There was a whole family of IA64 opcodes that had variable execution times. The compiler guys asked Intel if there was a deterministic execution time for those opcodes. The Intel engineer proudly stood up and said, "Why yes, there is." Then he went on to explain that this or that instruction would take something like 6, 8, 10, or 12 cycles to execute. At that point, the compiler guys just about got up and left. In the RISC world, it's typically a fixed 1 or maybe 2 clock cycles per instruction. In the Intel CISC_trying_to_be_RISC world, there's a formula for the execution time. Faced with that, the compiler team had to use the worst-case execution time for every one of those instructions and pad the rest of the pipeline with NOPs, which killed performance. On the MIPS CPU, they could jam the instruction pipelines nearly full and keep every part of the CPU busy. On Intel, it was far less than that.
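The worst-case padding described in that comment can be put in numbers with a toy Python model. All latencies below are made up for illustration (only the "6, 8, 10, or 12 cycles" range echoes the anecdote; none of it comes from a real latency table): a static scheduler must budget each op at its worst-case latency, and any cycles the hardware doesn't actually need become NOP padding.

```python
# Toy illustration of the variable-latency problem: a static scheduler
# cannot ask the hardware how long an op actually took, so each
# variable-latency op must be budgeted at its WORST case, and the unused
# budget turns into NOP padding. All latency numbers are hypothetical.

def static_cycles(chain, worst_case):
    """Cycles for a dependent chain when the compiler must assume every
    op takes its worst-case latency."""
    return sum(worst_case[op] for op in chain)

def wasted_nop_cycles(chain, worst_case, actual):
    """Cycles spent in padding when ops finish faster than budgeted."""
    return sum(worst_case[op] - actual[op] for op in chain)

chain = ["ld", "mul", "add", "st"]
worst_case = {"ld": 12, "mul": 12, "add": 1, "st": 1}  # "6, 8, 10, or 12 cycles"
typical    = {"ld": 6,  "mul": 8,  "add": 1, "st": 1}

print(static_cycles(chain, worst_case))              # 26-cycle static schedule
print(wasted_nop_cycles(chain, worst_case, typical)) # 10 of them pure padding
```

A fixed-latency RISC like the R10K collapses the `worst_case` and `typical` tables into one, so the static schedule is tight by construction; that is the property the Pro64 team was asking Intel for.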

TheCrawler

A few things: I was not yet working at Intel in 1999 when AMD64 was announced, but I was working at a BIOS vendor, and there is a tiny detail that conflicts with this video. Two BIOS vendors were already booting x86 with EFI by 2000, and we had EFI customers in 2001 (namely a Nokia x86 clamshell set-top box). Thus, EFI has been a continuously selling BIOS product since the beginning; there was never a need to bring it back, as it was always there with a non-zero customer base. Mainly AMD was the holdout. To be fair, Phoenix BIOS was also a holdout, but primarily it was AMD that refused to use or support EFI boot, so any customers using both Intel and AMD would simply use legacy BIOS so they did not have to buy two entirely different BIOS codebases. UEFI, then, was set up specifically to detach EFI from Intel, so that AMD could help drive it as an open project not directly attached to Intel. When AMD finally got on board with UEFI, legacy BIOS finally started to lose its customer base.

rer

Some would say that Itanium was an Intel executive's attempt to eliminate pesky x86 competition (in other words, AMD) via a new architecture for which AMD had neither a license nor a clean-room implementation. Ironically, AMD handed Intel its ass with x86-64, and it must have been humiliating and humbling for Intel to have to approach AMD for an x86-64 ISA license. Hopefully the Intel executive who green-lighted it got fired.

boydpukalo

8:07. That "expectations over time miss reality over and over again" graph is just legendary! :-)

stefankral

Small nit: It was PA-RISC, not RISC-PA. The PA stands for precision architecture.

BerndBausch

I recall Itanium being hot stuff: so unreachable, so untouchable. We had Itanium servers at Novosibirsk State University, but not for ordinary people. And now I find it marked "retro", which causes cognitive dissonance.

If you are interested in VLIW architecture, the Russian Elbrus e2k is also VLIW, but it is a live project: better x86 emulation, aided by CPU ISA adjustments, plus Mono, JVM and JavaScript JITs, and a C++ compiler that is not bleeding edge but still passable.

OCTAGRAM

I still remember when our department at university got its HP Itanium 1 workstation: I ran a few application benchmarks… 20% slower than my AMD K7 PC at home :)

stefankral

Great video, knitting together all the OS and CPU craziness of that era. On the AMD64 you mentioned as one of Itanium's nails in the coffin: I remember just how toasty their TDPs were (hotter these days, of course).

leeselectronicwidgets

I spent about 18 months porting software to Itanium Windows. Nearly all of that was digging out and reporting compiler bugs, of which there were lots. Eventually I got it working fairly well, but few customers took it up. I gradually transitioned to porting the same software to AMD64 Windows. When we shipped that, I got a call from my Intel contact asking how much work AMD64 had been relative to Itanium. "Under 5% of the work of Itanium." "Oh..."

johndallman

Thanks for this. We used to run our telecoms applications and database on HP-UX on PA-RISC machines (please, not "RISC-PA"): some K-class servers, then L-class, and we wondered whether the new Itanium systems would be our future hardware platform. However, the next step for us was the rp4400-series servers, and after that we migrated to Red Hat Linux on the HP c7000 BladeSystem with the x86-64 CPU architecture.

alexjenner