Pentium 4 – when Intel got it SO wrong

Huge clock speeds, hot temperatures, and rubbish performance. We look back at NetBurst, when Intel got its CPU strategy catastrophically wrong.

Like that Simpsons episode where the director’s commentary for The Postman is just Kevin Costner repeatedly apologizing into his microphone, there needs to be a monument to the Pentium 4 somewhere at Intel’s HQ to remind people how badly you can get it wrong.

You might think Intel is a bit behind the competition now, being stuck on a 10nm node, and struggling to get Meteor Lake out the door, but at least Raptor Lake is a half-decent architecture. You’d struggle to say the same for the Pentium 4’s NetBurst architecture.

Let’s put the Pentium 4 in its historical context. Intel came up with the Pentium brand to distinguish its processors from the competition in the post-486 era, and it had worked. People looked for the Pentium brand as a seal of quality, and that had largely continued throughout the Pentium II era. Then, soon after the launch of the Pentium III, AMD brought out its first premium CPU brand, the AMD Athlon.

There were teething problems with Athlon, as you might expect, but it showed that AMD could beat Intel in terms of performance. Not only that but, to Intel’s shame, AMD’s Athlon later beat the Pentium III to the 1GHz finish line in March 2000, just days ahead of Intel’s own 1GHz chip.

First missteps

There were some clues to what was to come with Pentium 4 in the latter days of the mainstream Pentium III, when Intel introduced a 133MHz front side bus (FSB). Now, the 133MHz FSB was a great idea, as it not only bumped up the CPU speed, but also the I/O speed between the CPU and the motherboard chipset’s Northbridge – if you used 133MHz memory with it, you got a load more bandwidth.

The problem was that Intel’s two chipsets for it were both flawed in crucial ways. At the top end was 820, with pricey motherboards and the need for a new type of memory made by Rambus, called Rambus Dynamic RAM (RDRAM). While SDRAM was generally running at up to 133MHz, RDRAM could run at 400MHz. Not only that, but by transferring data on the rise and fall of the clock, much like DDR memory today, it effectively ran at 800MHz.

It wasn’t quite that simple, though. At this time, RDRAM had a 16-bit bus, compared with the 64-bit bus used for SDRAM, and it also had much higher latency. RDRAM still came out ahead on bandwidth, but it wasn’t quite the trouncing people expected. Also, all that bandwidth didn’t make a massive difference when the front side bus only ran at 133MHz anyway. More to the point, RDRAM was much more expensive than SDRAM – around three times the price, in fact.
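To put rough numbers on this, theoretical peak bandwidth is just the bus width in bytes multiplied by the number of transfers per second. The sketch below uses the nominal figures from the text (64-bit PC133 SDRAM, 16-bit RDRAM at 800MHz effective, and a 64-bit 133MHz FSB); the helper name is our own, real-world throughput was lower, and latency isn’t modelled at all.

```python
def peak_bandwidth_mb_s(bus_width_bits: int, effective_mhz: float) -> float:
    """Theoretical peak bandwidth in MB/s (1MB = 10**6 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * effective_mhz  # MHz = millions of transfers per second

buses = {
    "PC133 SDRAM (64-bit, 133MHz)": peak_bandwidth_mb_s(64, 133),
    "PC800 RDRAM (16-bit, 800MHz effective)": peak_bandwidth_mb_s(16, 800),
    "Pentium III FSB (64-bit, 133MHz)": peak_bandwidth_mb_s(64, 133),
}

for name, mb_s in buses.items():
    print(f"{name}: {mb_s:.0f}MB/s")
```

RDRAM’s 1,600MB/s peak is higher, but the CPU itself could only pull around 1,064MB/s across a 133MHz FSB, so much of the extra memory bandwidth had nowhere to go.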

Intel had another option, the cheaper 810 chipset, which supported 133MHz SDRAM, but again had a crucial flaw – there was no AGP slot, so you couldn’t run the latest graphics cards on it. If you were a gamer, you had to buy into 820 and, needless to say, we were more inclined to buy a cheaper SDRAM-based Athlon system instead, or to overclock an Intel 440BX motherboard from the last generation. Intel later addressed this with the SDRAM and AGP-supporting 815 chipset, but by this time it was too late to stop the Athlon onslaught.

Intel needed an answer, and it had one in the works – a brand-new CPU microarchitecture that rewrote many of the previous rules and could be clocked to high heaven. The Pentium 4 would be built on several new technologies and principles, and leave existing CPUs looking like relics of yesteryear. At least, that was the idea.

A die shot of a Prescott Pentium 4, which had a colossal 31-stage pipeline

NetBurst forth

Intel called its new microarchitecture NetBurst, and it represented a very different approach from the Pentium II and III, which had been largely based on the earlier Pentium Pro’s P6 core. At this time, all desktop CPUs had just the one core, and clock speed was the primary indicator of a CPU’s performance. There were no model numbers like we have today – you bought a 1GHz Pentium III, for example, or an 800MHz Athlon.

This is where NetBurst could beat previous CPU microarchitectures, as it was built to be scaled up to super-high clock speeds. In fact, at the Pentium 4 launch in 2000, Intel said it expected the architecture to scale up to 10GHz as fabrication processes were refined over the years. Yes, really.

That didn’t happen, as we now know, but NetBurst could hit very high clock speeds for the time, despite the first ‘Willamette’ chips being built on the same 180nm process as the previous ‘Coppermine’ Pentium III chips. They launched at 1.4GHz and 1.5GHz, with a 1.7GHz model arriving a few months later and a 2GHz model coming out in the summer of 2001.

A Pentium 4C wafer – the ‘C’ denoted an 800MHz front side bus

Stuck in the pipeline

That sounds amazing, you might think. Intel had gone from 1GHz to 2GHz in under two years, and its new microarchitecture was clearly built for high-frequency operation. However, achieving that high clock speed required some fundamental changes to the structure of the microarchitecture, one of which was a large increase in the number of stages in the execution pipeline. With the work split over more stages, each stage had less to do, so each clock cycle could be shorter, enabling Intel to increase the clock speed.
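A common textbook simplification shows why this works, and why it runs out of steam. If the pipeline’s total logic delay is T and every stage adds a fixed latch overhead t, splitting the work into N stages gives a cycle time of roughly T/N + t. The delays below are made-up illustrative numbers, not Intel’s figures, and the function name is hypothetical.

```python
def max_clock_ghz(total_logic_ns: float, latch_overhead_ns: float, stages: int) -> float:
    """Approximate clock ceiling for a pipeline split into equal stages.

    Cycle time = logic per stage + fixed per-stage latch overhead.
    """
    cycle_ns = total_logic_ns / stages + latch_overhead_ns
    return 1.0 / cycle_ns  # a 1ns cycle is 1GHz

# Hypothetical figures: 10ns of logic in total, 0.1ns of overhead per stage.
for stages in (10, 20, 31):
    print(f"{stages:2d} stages -> {max_clock_ghz(10.0, 0.1, stages):.2f}GHz")
```

Doubling the stage count doesn’t double the clock in this model, because the per-stage overhead stays put however thinly the logic is sliced.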

The first Willamette Pentium 4 CPUs had 20 pipeline stages, which increased to a massive 31 stages in the later Pentium 4 CPUs, codenamed Prescott. As a point of comparison, the Pentium III had 14 pipeline stages, and the first Athlon 64 CPUs had 12 pipeline stages – NetBurst had a long pipeline, especially for the time.

There were two main problems with this long pipeline. The first was that the very high clock speeds it was designed to reach required more power and higher voltages, so the CPU generated a lot more heat. The Pentium 4 was the first CPU to really require the large heatsink-and-fan assemblies that we still use today, with some of the first Pentium 4 PCs being equipped with ducts, or ‘wind tunnels’, to link the CPU cooler with the case’s exhaust fan.

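The second problem was that a long pipeline makes a CPU very inefficient at processing code with unpredictable branches. When the CPU reaches a branch, it has to guess which way the code will go before the outcome is known, and a wrong guess wastes all the work queued up behind it. Intel’s plan to avoid that penalty was to make use of advanced branch prediction techniques, so the CPU would guess correctly as often as possible.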
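The basic principle of branch prediction can be sketched with a textbook two-bit saturating counter, which guesses that a branch will go the same way it has recently gone. NetBurst’s real predictor was considerably more sophisticated than this; the sketch only illustrates the idea, and the function name is hypothetical.

```python
def prediction_accuracy(outcomes: list[bool]) -> float:
    """Simulate a single 2-bit saturating counter for one branch.

    Counter values 0-1 predict 'not taken', 2-3 predict 'taken'.
    Returns the fraction of outcomes predicted correctly.
    """
    counter = 2  # start in the 'weakly taken' state
    correct = 0
    for taken in outcomes:
        if (counter >= 2) == taken:
            correct += 1
        # Nudge the counter towards the actual outcome, saturating at 0 and 3.
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return correct / len(outcomes)

# A loop branch: taken 99 times, then falls through once, repeated 10 times.
loop_branch = ([True] * 99 + [False]) * 10
# An erratic branch that flips direction every time.
erratic_branch = [True, False] * 500

print(f"Loop branch:    {prediction_accuracy(loop_branch):.1%}")     # 99.0%
print(f"Erratic branch: {prediction_accuracy(erratic_branch):.1%}")  # 50.0%
```

The loop branch is only mispredicted once per pass, when the loop exits, while the erratic branch defeats this simple scheme half the time.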

If a CPU was performing a task that repeatedly used predictable code branches, such as video encoding, then the CPU could efficiently predict what it needed to do. Loops of code could be handled quickly too, thanks to Intel’s new L1 Trace Cache, which replaced the usual L1 instruction cache with one placed after the decode unit, so any micro-ops held in it were already decoded.
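Very loosely, the benefit of caching after the decoder can be modelled as memoising the decode step: the first pass through a loop decodes each x86 instruction into micro-ops and stores them, and later passes reuse the stored micro-ops. This is a conceptual sketch only; the `decode` function, instruction strings and address scheme are hypothetical, and the real trace cache stored micro-op sequences along the predicted execution path rather than one entry per instruction.

```python
def run(program: list[str], iterations: int, use_trace_cache: bool) -> int:
    """Run a loop body repeatedly and count how many x86 decodes were needed."""
    decodes = 0
    trace_cache: dict[int, list[str]] = {}  # address -> already-decoded micro-ops

    def decode(instruction: str) -> list[str]:
        # Hypothetical decoder: turns one x86 instruction into micro-ops.
        nonlocal decodes
        decodes += 1
        return [f"uop({instruction})"]

    for _ in range(iterations):
        for address, instruction in enumerate(program):
            if use_trace_cache and address in trace_cache:
                uops = trace_cache[address]  # hit: skip the decoder entirely
            else:
                uops = decode(instruction)
                if use_trace_cache:
                    trace_cache[address] = uops
            # uops would now be sent on to the execution units
    return decodes

loop_body = ["mov eax, [esi]", "add eax, ebx", "mov [edi], eax", "dec ecx", "jnz loop"]
print(run(loop_body, 1000, use_trace_cache=False))  # 5000: decoded on every pass
print(run(loop_body, 1000, use_trace_cache=True))   # 5: decoded on the first pass only
```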

For these reasons, Pentium 4 CPUs usually excelled in software with predictable instructions. The problem was that a lot of code didn’t behave this predictably, particularly if you were running lots of legacy applications at once. If the CPU mispredicted a branch, the pipeline would have to be flushed and refilled with the right instructions. That’s not a massive problem if a CPU has a short pipeline, but it quickly makes the CPU inefficient if it has a long pipeline, as every flush throws away more stages’ worth of work.
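A back-of-the-envelope model shows why pipeline length matters so much here. We assume, as a big simplification, that a misprediction costs roughly one cycle per pipeline stage; the branch frequency, misprediction rate and base CPI (cycles per instruction) below are illustrative guesses, not measured figures for any real chip, and the function name is hypothetical.

```python
def effective_ipc(base_cpi: float, branch_fraction: float,
                  mispredict_rate: float, flush_penalty_cycles: int) -> float:
    """Instructions per clock once branch-misprediction flushes are included."""
    stall_cpi = branch_fraction * mispredict_rate * flush_penalty_cycles
    return 1.0 / (base_cpi + stall_cpi)

# Assumed workload: 20% of instructions are branches, 10% of those mispredicted.
for name, penalty in (("12-stage", 12), ("20-stage", 20), ("31-stage", 31)):
    ipc = effective_ipc(base_cpi=0.5, branch_fraction=0.2,
                        mispredict_rate=0.1, flush_penalty_cycles=penalty)
    print(f"{name} pipeline: IPC = {ipc:.2f}")
```

With identical prediction accuracy, the deeper pipeline loses noticeably more IPC simply because each wrong guess costs more cycles.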

The result was that the Pentium 4 could process fewer instructions per clock (IPC) than the Pentium III in a lot of standard software, negating the benefits of those huge clock frequencies.
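Put another way, what counts is instructions per second, which is simply IPC multiplied by clock frequency. The IPC figures in this sketch are hypothetical, picked only to show how a large clock speed advantage can be cancelled out.

```python
def instructions_per_second(ipc: float, clock_ghz: float) -> float:
    """Throughput = instructions per clock x clocks per second."""
    return ipc * clock_ghz * 1e9

# Hypothetical IPC values for a branch-heavy workload.
short_pipeline_chip = instructions_per_second(ipc=1.2, clock_ghz=1.0)
long_pipeline_chip = instructions_per_second(ipc=0.8, clock_ghz=1.5)

print(f"1.0GHz, IPC 1.2: {short_pipeline_chip / 1e9:.2f} billion instructions/s")
print(f"1.5GHz, IPC 0.8: {long_pipeline_chip / 1e9:.2f} billion instructions/s")
```

In this made-up case, a 50 percent clock speed advantage buys nothing at all.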

A relic from Custom PC’s past – the Beat the Office CPU from Issue 1 of the print magazine was an overclocked 2.6GHz Northwood Pentium 4C attached to a Vapochill phase change system at -22.5°C – we had it running stably at 3.54GHz

Missing the bus

Another key difference between the Pentium 4 and its predecessors was its front side bus. As we mentioned earlier, there was a large disparity between the 133MHz Pentium III FSB and the massive bandwidth of RDRAM. Intel aimed to fix this with the Pentium 4 by introducing a quad-pumped FSB, which makes four data transfers per clock cycle.

The FSB still fundamentally ran at 100MHz, but it had an effective frequency of 400MHz. Intel launched the Pentium 4 alongside the 850 chipset, which only supported RDRAM, so the Pentium 4 could now take advantage of all that extra memory bandwidth.
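The arithmetic behind the quad-pumped bus is straightforward. With a 64-bit (8-byte) data path, a 100MHz clock and four transfers per clock, the FSB’s theoretical peak works out to 3.2GB/s, which matches the 850’s two channels of PC800 RDRAM. The helper below is our own, and these are nominal peak figures only.

```python
def peak_mb_s(width_bits: int, clock_mhz: float, transfers_per_clock: int,
              channels: int = 1) -> float:
    """Theoretical peak bandwidth in MB/s (1MB = 10**6 bytes)."""
    return width_bits / 8 * clock_mhz * transfers_per_clock * channels

fsb = peak_mb_s(64, 100, 4)                 # quad-pumped 100MHz FSB ('400MHz')
rdram = peak_mb_s(16, 400, 2, channels=2)   # dual-channel PC800 RDRAM on the 850
p3_fsb = peak_mb_s(64, 133, 1)              # Pentium III 133MHz FSB, for comparison

print(f"Pentium 4 FSB: {fsb:.0f}MB/s")
print(f"850 RDRAM:     {rdram:.0f}MB/s")
print(f"PIII FSB:      {p3_fsb:.0f}MB/s")
```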

Again, though, RDRAM was expensive, and not many people already had RDRAM sticks in their old PCs that they could simply transfer to a new system. Intel started bundling RDRAM with the CPUs in an effort to get people on board, but it was a tough ask.

With disappointing sales and reviews for the Pentium 4, Intel backtracked and launched an SDRAM-supporting chipset for the Pentium 4, the original 845. The performance was dreadful without the extra memory bandwidth, but it did show that Intel’s combination of RDRAM with a quad-pumped FSB worked – the Pentium 4 really needed to be paired with fast memory.

The magic bullet finally came with a revised version of the 845 chipset that supported DDR memory running at 133MHz (266MHz effective). DDR memory was significantly cheaper than RDRAM, and it offered a very sensible compromise compared with Rambus in terms of bang per buck.

The last gasp for Rambus came with the later 850E chipset, which could run RDRAM at an effective frequency of 1066MHz, but had no native support for USB 2 and was pretty much dead in the water on launch. Intel then put all its work into supporting DDR, with the E7205 ‘Granite Bay’ chipset supporting dual-channel memory, where two memory sticks interleave to create more bandwidth.

Then the later 865 and 875 chipsets went on to support 200MHz (400MHz effective) DDR memory in dual-channel mode, a good match for Intel’s latest Pentium 4C chips with their 800MHz FSB. RIP, RDRAM.
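The same sum shows why dual-channel DDR400 and the 800MHz FSB were such a good pairing: both work out to 6.4GB/s at peak, whereas a single channel of DDR266 on the 845 offered only about 2.1GB/s against the 3.2GB/s of a 400MHz FSB. Again, the helper is our own and these are theoretical peaks at nominal clocks.

```python
def peak_mb_s(width_bits: int, effective_mhz: float, channels: int = 1) -> float:
    """Theoretical peak bandwidth in MB/s (1MB = 10**6 bytes)."""
    return width_bits / 8 * effective_mhz * channels

pairings = {
    "845: 400MHz FSB vs single-channel DDR266":
        (peak_mb_s(64, 400), peak_mb_s(64, 266)),
    "865/875: 800MHz FSB vs dual-channel DDR400":
        (peak_mb_s(64, 800), peak_mb_s(64, 400, channels=2)),
}

for name, (fsb, memory) in pairings.items():
    print(f"{name}: FSB {fsb:.0f}MB/s, memory {memory:.0f}MB/s")
```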

Intel introduced Extreme Editions for enthusiasts, with high clock speeds

The 64-bit question

Intel had dropped the ball with the Pentium 4, and it had taken several years to get to the point where it had affordable memory and decent performance. That would be fine if AMD had been resting on its laurels, but in late 2003 AMD unleashed its AMD64 architecture, resulting in its famous Athlon 64 desktop CPUs.

The headline was 64-bit computing. However, it’s worth remembering that, at this time, there was no 64-bit version of Windows XP for AMD64 CPUs (Windows XP Professional x64 Edition didn’t arrive until 2005), and it would take a good few years before 64-bit Windows became standard. AMD had also removed the front side bus from its CPU design, introducing an integrated, on-die memory controller, which reduced memory latency. The first Socket 754 Athlon 64 CPUs only supported single-channel memory, but the Socket 939 CPUs in 2004 supported dual-channel memory.

While the integrated memory controller and 64-bit instructions are often touted as the benefits of the Athlon 64 over NetBurst, the main difference when it came to performance was that the Athlon 64’s K8 core had a much shorter pipeline, with just 12 stages. Intel would later add support for 64-bit instructions to some of its Pentium 4 CPUs, but AMD64 CPUs were just massively more efficient in terms of instructions per clock, thermals and power consumption.

Later revisions

There were several iterations of Pentium 4 over the years, starting with a die shrink from the 180nm Willamette to the 130nm Northwood core, which used the dinky Socket 478 package that had already replaced Socket 423 on later Willamette chips. The top-end Northwood Pentium 4 HT CPUs also brought us an idea that we still use today: executing more than one thread simultaneously on one core, with the HT standing for Hyper-Threading.

The way Hyper-Threading works has changed a bit since then, but the principle is the same – Windows saw a Pentium 4 HT CPU as two logical processors, which shared the resources of the one physical core. Hardly any desktop software was properly multi-threaded at this point, but it worked well in software optimized for dual-CPU setups, such as LightWave.

Next came Gallatin, which introduced the Pentium 4 Extreme Edition with Hyper-Threading and a 3.2GHz clock speed for enthusiasts. Until this time, Pentium 4 CPUs had maintained the 20-stage pipeline, but then came Prescott with its 31 stages, fabricated on a 90nm process, and clock speeds of up to 3.8GHz. By this time, the thermal and power demands of Pentium 4 were looking utterly ridiculous.

The Pentium-M mixed the good ideas from the Pentium 4 with the efficiency of the P6 core, plus a massive load of cache

NetBurst, ahem, bursts

By the time Prescott was launched, the power and thermal demands of NetBurst had already made it redundant in the laptop world. Intel had introduced the Pentium 4-M, but its thermal demands had resulted in thicker laptops and slow performance. Not surprisingly, the first part of Intel to backtrack on NetBurst was its mobile division in 2003, with the introduction of the Pentium-M.

It was fabricated on a 130nm process, and took some of the good ideas from Pentium 4, such as the quad-pumped front side bus and improved branch prediction, but attached them to a core based on the P6 design from the Pentium Pro days, with a much shorter pipeline and a massive load of cache.

The result was a really good mobile CPU, with great performance and decent battery life. People started saying that Intel needed to do the same on the desktop, but Intel had sunk so much investment into NetBurst by this point that it doggedly stuck with it for a few more years, bringing it into the dual-core era and dropping the ‘4’ from the end of the Pentium brand.

It wasn’t until 2006 that Intel finally threw in the towel on NetBurst. There were rumors flying around the Internet on the day before it happened. I phoned an Intel PR rep the next morning and asked him if there was any truth in them, and he laughed at the very idea of it. A few hours later he called me back, sheepishly confirming that Intel was indeed about to completely overhaul its desktop strategy.

NetBurst was in the bin, and so was the Pentium brand as a sign of premium quality. Intel’s next desktop CPUs would feature the Core 2 brand, and would again be based on the P6 core Intel had abandoned years earlier. We’d learned that there was more to CPUs than clock frequency, and that you don’t necessarily have to reinvent the wheel to get ahead in tech.

Thankfully, Intel’s latest CPUs are much better than the Pentium 4. If you’re looking to buy a new processor, then make sure you check out our guide to the best gaming CPU, where we take you through all the best options at a range of prices. One of our favorites from Intel’s current lineup is the Intel Core i5-13600K, which is cheap, fast, and overclockable too.

We hope you’ve enjoyed this personal retrospective about the Pentium 4 and NetBurst architecture. For more articles about the PC’s vintage history, check out our Retro tech page, as well as our writeup of the first Intel x86 CPU, the Intel 8086.