
End all arguments: PS3 vs 360

By: Michael Perry - Published November 13, 2006 at 12:38 AM EST

I wasn’t too clear earlier on the difference between the RSX’s dedicated pixel and vertex shader pipelines and the 360’s unified shader architecture. The 360 GPU has 48 unified pipelines, each capable of accepting either pixel or vertex shader operations. With the older dedicated pixel and vertex pipeline architecture that RSX uses, most of the 24 pixel pipes go idle in a vertex-heavy situation instead of helping out with vertex work.

Or on the flip side, in a pixel-heavy situation those 8 vertex shader pipelines just sit idle and don’t help out the pixel pipes (because they aren’t able to). With the 360’s unified architecture, none of the pipes go idle in a vertex-heavy situation, for example. All 48 unified pipelines can help with either pixel or vertex shader operations when needed, so efficiency is greatly improved and so is overall performance. When pipelines are forced to go idle because they can’t help another set of pipelines with its task, it’s detrimental to performance. This inefficient approach is how all current GPUs operate, including the PS3’s RSX: the pixel pipes can’t help the vertex pipes, or vice versa.

What’s even more impressive about this GPU is that it determines by itself how many pipelines to dedicate to vertex or pixel shader operations at any given time. A programmer is NOT needed to handle any of this; the GPU takes care of it all itself in the quickest, most efficient way possible.

1080p is not a smart resolution to target in any form this generation, but if 360 developers wanted to get serious about 1080p, they could, thanks to Xenos, actually outperform the PS3 at 1080p. (The less efficient GPU always shows its weaknesses against the competition at higher resolutions, so the best way for the RSX to be competitive is to stick to 720p.) In vertex-shader-limited situations the 360’s GPU will literally be 6 times faster than RSX, with 48 pipelines available for vertex work against RSX’s 8.

With a unified shader architecture things are much more efficient than previous architectures allowed, which is extremely important. The 360’s GPU, for example, is 95-99% efficient with 4xAA enabled. With a traditional architecture there are design-related roadblocks that prevent such efficiency.
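
The idle-pipeline argument above can be put into a rough toy model. This is only a sketch: the pipeline counts (8 vertex, 24 pixel, 48 unified) are the ones quoted in this article, while the function names, the work units, and the assumptions that every pipeline has equal throughput and that work balances perfectly are mine, not anything ATI or Nvidia has published.

```python
# Toy model. Pipeline counts (8 vertex, 24 pixel, 48 unified) are the article's;
# equal throughput per pipeline and perfect load balancing are assumptions.

def dedicated_time(vertex_work, pixel_work, vertex_pipes=8, pixel_pipes=24):
    """Frame time when each pool can only run its own kind of work."""
    # Both pools run in parallel, so the slower pool sets the frame time.
    return max(vertex_work / vertex_pipes, pixel_work / pixel_pipes)


def unified_time(vertex_work, pixel_work, pipes=48):
    """Frame time when any pipeline can take either kind of work."""
    return (vertex_work + pixel_work) / pipes


def dedicated_utilization(vertex_work, pixel_work, vertex_pipes=8, pixel_pipes=24):
    """Share of available pipeline-time that does useful work."""
    frame_time = dedicated_time(vertex_work, pixel_work, vertex_pipes, pixel_pipes)
    capacity = frame_time * (vertex_pipes + pixel_pipes)
    return (vertex_work + pixel_work) / capacity


if __name__ == "__main__":
    # Hypothetical workloads, in arbitrary "operations per frame".
    workloads = [("vertex-only", 960, 0), ("balanced", 240, 720), ("pixel-only", 0, 960)]
    for label, v, p in workloads:
        speedup = dedicated_time(v, p) / unified_time(v, p)
        print(label, "unified speedup:", speedup,
              "dedicated utilization:", dedicated_utilization(v, p))
```

In the vertex-only case this model reproduces the 6x figure, and it shows the dedicated design keeping only a quarter of its pipelines busy. A workload that happens to match the 8:24 split keeps the dedicated design fully occupied, so the gap depends heavily on the mix. Real hardware will not behave this neatly.
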
To avoid such roadblocks, which held back previous hardware, the 360 GPU design team created a complex system of hardware threading inside the chip itself. In this case, each thread is a program associated with the shader arrays. The Xbox 360 GPU can manage and maintain state information on 64 separate threads in hardware. There's a thread buffer inside the chip, and the GPU can switch between threads instantaneously in order to keep the shader arrays busy at all times.
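
To see why a pool of hardware threads keeps the shader arrays busy, here is a small simulation sketch. Only the figure of 64 threads comes from the description above. The 1 cycle of ALU work followed by a 15-cycle memory stall, the zero-cost switch, and the function name `alu_utilization` are made-up illustrative assumptions, not Xenos specifications.

```python
def alu_utilization(num_threads, work_cycles=1, wait_cycles=15, total_cycles=1600):
    """Simulate one shader array that switches to any ready thread for free.

    Every thread repeats the same pattern: `work_cycles` of ALU work, then
    `wait_cycles` stalled on a memory fetch. The cycle counts are hypothetical.
    """
    ready_at = [0] * num_threads            # cycle at which each thread may run again
    remaining = [work_cycles] * num_threads  # ALU cycles left before the next stall
    busy = 0
    for cycle in range(total_cycles):
        for t in range(num_threads):
            if ready_at[t] <= cycle:
                # Found a ready thread: the array does useful work this cycle.
                busy += 1
                remaining[t] -= 1
                if remaining[t] == 0:
                    # Thread issues a fetch and sleeps until the data arrives.
                    remaining[t] = work_cycles
                    ready_at[t] = cycle + 1 + wait_cycles
                break
        # If no thread was ready, the array sits idle for this cycle.
    return busy / total_cycles


if __name__ == "__main__":
    for n in (1, 4, 16, 64):
        print(n, "threads ->", alu_utilization(n))
```

Under these assumed numbers a single thread leaves the array idle almost all the time, and utilization climbs as threads are added until there are enough of them to cover the stall. Past that point extra threads are headroom for longer stalls.
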

Want to know why Xenos doesn’t need as much raw horsepower to outperform something like the X1900XTX or the 7900GTX? It makes up for it by being efficient enough to fully achieve its advertised performance numbers, which is an impressive feat. The X1900XTX has a peak pixel fillrate of 10.4 gigasamples a second, while the 7900GTX has a peak pixel fillrate of 15.6 gigasamples a second. Neither can actually achieve and sustain those peak numbers because neither is efficient enough, but they get away with it since they can bank on all that raw power. The performance winner between the two is actually the X1900XTX, despite its lower pixel fillrate (especially at higher resolutions), because it has twice as many pixel pipes and is the more efficient of the two. It’s a testament to how important efficiency is.

So how exactly can the mere 360 GPU stand up to both of those with only a 128-bit memory interface and 500MHz? With 4xFSAA enabled, the 360 GPU achieves AND sustains its peak fillrate of 16 gigasamples per second, thanks to the combination of the unified shader architecture and the excessive amount of bandwidth. That gives it the kind of efficiency that lets it outperform GPUs with far more raw horsepower. I guess it also helps that it’s the single most advanced GPU currently available for purchase anyway.

Things get even better when you factor in Xenos’ MEMEXPORT ability, which enables “streamout” and opens the door for Xenos to achieve DX10-class functionality. It’s a shame about Xenos’ other 16 pipelines, though. Not many are even aware that the 360’s GPU has the exact same number of pipelines as ATI’s unreleased R600, but to improve yields, keep costs down and make the GPU easier to manufacture, Microsoft chose to disable one of the shader arrays containing 16 pipelines.
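
The peak fillrate figures quoted above are simple products of clock speed and per-clock output, which a few lines of arithmetic can reproduce. Only the three totals and the 360 GPU’s 500MHz clock come from this article. The other clock speeds and per-clock unit counts below are commonly published specs as I recall them and should be treated as assumptions, as should the helper name `peak_fillrate_gsamples`.

```python
def peak_fillrate_gsamples(core_mhz, units_per_clock, samples_per_unit=1):
    """Theoretical peak in gigasamples per second: clock x units x samples."""
    return core_mhz * units_per_clock * samples_per_unit / 1000


# Assumed specs (not stated in the article except Xenos' 500MHz clock):
#   X1900XTX: 650MHz core, 16 pixels written per clock
#   7900GTX:  650MHz core, 24 pixel pipelines counted per clock
#   Xenos:    500MHz core, 8 pixels per clock, 4 AA samples per pixel
x1900xtx = peak_fillrate_gsamples(650, 16)
gf7900gtx = peak_fillrate_gsamples(650, 24)
xenos_4xaa = peak_fillrate_gsamples(500, 8, samples_per_unit=4)

if __name__ == "__main__":
    print("X1900XTX:", x1900xtx)
    print("7900GTX:", gf7900gtx)
    print("Xenos with 4xAA:", xenos_4xaa)
```

These are paper numbers only. The argument here is about how much of that peak each chip can sustain, which this arithmetic says nothing about.
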
What MEMEXPORT does is expand the graphics pipeline in a more general-purpose and programmable manner.

I’ll borrow a quote from Dave Baumann since he explains it rather well.

“With the capability to fetch from anywhere in memory, perform arbitrary ALU operations and write the results back to memory, in conjunction with the raw floating point performance of the large shader ALU array, the MEMEXPORT facility does have the capability to achieve a wide range of fairly complex and general purpose operations; basically any operation that can be mapped to a wide SIMD array can be fairly efficiently achieved and in comparison to previous graphics pipelines it is achieved in fewer cycles and with lower latencies. For instance, this is probably the first time that general purpose physics calculation would be achievable, with a reasonable degree of success, on a graphics processor and is a big step towards the graphics processor becoming much more like a vector co-processor to the CPU.”
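
The pattern described in the quote (fetch from memory, run the same ALU operations on every element, write the results back) can be sketched in ordinary code. This is not the Xenos or MEMEXPORT API, which is not shown anywhere in this article. The function name, the particle data, and the choice of a simple gravity step as the “physics calculation” are all hypothetical.

```python
def integrate_particles(positions, velocities, gravity=-9.8, dt=0.1):
    """Advance every particle by one explicit-Euler step, in place.

    Each loop iteration is independent of the others, which is what makes the
    work mappable to a wide SIMD array: on a GPU, each index would be one lane.
    """
    for i in range(len(positions)):
        # Fetch from memory and do the ALU work.
        v = velocities[i] + gravity * dt
        p = positions[i] + v * dt
        # Write the results back to memory for the next pass to read.
        velocities[i] = v
        positions[i] = p


if __name__ == "__main__":
    heights = [0.0, 10.0, 25.0]   # hypothetical particle heights
    speeds = [0.0, 0.0, 0.0]      # hypothetical vertical velocities
    integrate_particles(heights, speeds)
    print(heights, speeds)
```

Because no particle depends on its neighbours within a step, the same instructions can run across the whole array at once, which is the kind of operation the quote says maps well onto the shader ALUs.
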

Even with all of this information, there is still a lot more about this GPU that ATI simply isn’t revealing, and considering they’ll be borrowing technology from this GPU for their future PC products, can you really blame them?

