Your view on AMD's Bulldozer

kureshii:

--- Quote from: kureshii on October 16, 2011, 03:47:26 PM ---It’s late and I’m tired, so comparisons with i5-vs-i7 will have to wait.
--- End quote ---
Screw sleeping.

Hyperthreading gain: calculated as ratio of i7-2600K to i5-2500K performance.
Module +1 core gain: calculated as ratio of FX-8150P 4CU/8C to 4CU/4C performance.

wPrime 32M*
Hyperthreading gain: 10.7 ÷ 7.222 = 48.2% increase
Module +1 core gain: 13.814 ÷ 9.531 = 44.9% increase

3DMark06
Hyperthreading gain: 6758 ÷ 6043 = 11.8% gain
Module +1 core gain: 5803 ÷ 4413 = 31.5% gain

3DMark Vantage
Hyperthreading gain: 23753 ÷ 17503 = 35.7% gain
Module +1 core gain: 19215 ÷ 12102 = 58.8% gain

3DMark 11 Physics
Hyperthreading gain: 8277 ÷ 6422 = 28.9% gain
Module +1 core gain: 6340 ÷ 4289 = 47.8% gain

Cinebench R10
Hyperthreading gain: 22875 ÷ 20381 = 12.2% gain
Module +1 core gain: 20592 ÷ 15033 = 37.0% gain

Cinebench R11.5*
Hyperthreading gain: 6.88 ÷ 5.47 = 25.8% gain
Module +1 core gain: 6 ÷ 3.8 = 57.9% gain

x264 HD
Hyperthreading gain: 36.3 ÷ 27.7 = 31.0% gain
Module +1 core gain: 37.23 ÷ 25.18 = 47.9% gain

Blender*^
Hyperthreading gain: 46.1 ÷ 40.1 = 15.0% gain
Module +1 core gain: 9.76 ÷ 7.16 = 36.3% gain

WinRAR^
Hyperthreading gain: 70.5 ÷ 59.6 = 18.3% gain
Module +1 core gain: 44.67 ÷ 30.27 = 47.6% gain

* Numbers involved are small, margin of error is higher
^ Measurements seem to be different, or are in different units. Be careful when interpreting, since measured factor could be qualitatively different.
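
For anyone who wants to redo the arithmetic, here is a minimal Python sketch of the calculation used above: gain = (score with the feature ÷ score without it) − 1, expressed as a percentage. The scores are copied from three of the benchmarks in this post; the function name `gain_percent` and the dictionary layout are only illustrative choices, not anything from the original reviews.

```python
def gain_percent(with_feature: float, without_feature: float) -> float:
    """Percentage gain of one score over another: (a / b - 1) * 100."""
    return (with_feature / without_feature - 1.0) * 100.0


# Scores as quoted in the post, in the order:
# (i7-2600K, i5-2500K, FX 4CU/8C, FX 4CU/4C)
scores = {
    "3DMark06": (6758, 6043, 5803, 4413),
    "Cinebench R10": (22875, 20381, 20592, 15033),
    "x264 HD": (36.3, 27.7, 37.23, 25.18),
}

for name, (i7, i5, fx_8c, fx_4c) in scores.items():
    ht = gain_percent(i7, i5)            # Hyperthreading gain
    module = gain_percent(fx_8c, fx_4c)  # Module +1 core gain
    print(f"{name}: Hyperthreading {ht:.1f}%, Module +1 core {module:.1f}%")
```

This only works as-is for higher-is-better scores; for timed benchmarks the ratio would have to be flipped.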

Attempt at analysis after some shuteye.

TMRNetShark:

--- Quote from: kureshii on October 16, 2011, 03:47:26 PM ---
--- Quote from: per on October 16, 2011, 03:11:38 PM ---And in our tests Intel's hyperthreading actually is _more_ efficient at boosting performance than AMD's "true cores". That is the most surprising thing to me.

--- End quote ---
How was that test conducted? Do you have a link? I really am quite curious, because AMD’s been hyping their Bulldozer Module as a better-than-hyperthreading technology—on paper, of course. But like we’ve seen with the Pentium 4, theories don’t always work out as expected.

Google turns up no comparison of the sort, but looking at this image gives me an idea for how to go about doing this:

Note that the point here is not to see how a quad-core, quad-module Bulldozer benchmarks against an i5, but to see how much performance hyperthreading gains for Sandy Bridge in various workloads, versus adding a second integer core in each Bulldozer module.

What use is this? It could help isolate Bulldozer’s bottlenecks in various workloads, particularly the single-threaded/lightly-threaded workloads. Those who have been won over by AMD’s octo-core marketing might suddenly remember that the only truly separated resources in each module are the integer cores; both cores in each module still share the same fetch/decode units, as well as L2. The high latency of each cache, though necessary for AMD to achieve those high clock speeds, could adversely affect some memory-heavy benchmarks. The branch predictor is another possible bottleneck; Anand hypothesised this from the AIDA64 Queens benchmark, and it could potentially bottleneck other branch-heavy workloads as well.

My current thought is that the decode units might not be able to keep both cores fully fed for light-compute workloads (where threads do not spend a lot of time in the integer execution units). AnandTech’s review has a comparison of the decode capabilities of Thuban, FX and SB. This does not directly translate to raw instruction decode speed, but results from this test could help examine the implications of this change in decode resources, which could explain some of the discrepancies between Thuban and FX performance.

Add to this AMD’s claim that Windows 7’s scheduler is not optimised for a “non-uniform core architecture”, and you should be able to see how this experiment could lead to interesting results.

[edit] Some interesting results: http://www.xtremesystems.org/forums/showthread.php?275873-AMD-FX-quot-Bulldozer-quot-Review-%284%29-!exclusive!-Excuse-for-1-Threaded-Perf

It’s late and I’m tired, so comparisons with i5-vs-i7 will have to wait.

--- End quote ---

Is that what ASUS's BIOS looks like? :O

Anyways, I read those results in your edit link... interesting to say the least. So instead of each core having its own L2 cache like on the X6... they have 2 cores sharing the SAME L2 cache? Bulldozer really just adds 2 cores, combines cores and L2 caches, and splits the L3 cache into 4. It seems that having all 8 cores makes a difference in those benchmarks though.

kitamesume:
That's not good... the drop in performance from 4 modules to 2 modules is too linear. That would imply that a 2-module part does exactly half of what the 4-module part can do, which puts it below i3 performance. And if you think about it, the i3 runs at well under 50 watts under load, while a linear drop for Bulldozer would put the 2-module part at around 50 watts under load.

PS: i3-2100 is approximately 35% slower than i5-2500K
http://www.anandtech.com/bench/Product/288?vs=289

FX-8150 vs i3-2100
http://www.anandtech.com/bench/Product/434?vs=289

mgz:

--- Quote from: TMRNetShark on October 15, 2011, 01:21:17 AM ---
--- Quote from: mgz on October 15, 2011, 12:48:22 AM ---That's only because you are thinking in current terms and physical items.

As processors shrink, heat output and power consumption generally go down even as the chip gets more powerful. As that continues to scale, it becomes much more feasible to have what is seemingly unthinkable within a very short period of time.
Just read some of the shit Ray Kurzweil or w/e writes. He's more or less a futurist and inventor; I don't care if I spelled his name right.
He just expects our technological advances to follow the same growth curve they have been on for some time. Compare how far we went in 50 years, from sofa-sized computers with the power of a calculator to now: our components are tens of thousands of times faster and more efficient, and so much fucking smaller.

So just slide the scale in your mind and think about that, and then apply that to a concept like integrated graphics and realize that graphics can only get so good with the type of viewing we currently use.

--- End quote ---

I just think there is something wrong with having a dedicated video card chipset on the same die as the CPU. If it went in that direction, there would only be 8-10 different PCs because they would all have the same "APUs". But like you said, the future only means you get more for less space. Then imagine what real dedicated cards will be like. You'll have 48-core GPUs which become the basis of the 1000s of shader/vector/whatever the fuck ATI/Nvidia call them... Graphics will be photorealistic, and that Unreal tech demo is how games like Battlefield will look in 10 years. :P

--- End quote ---
The problem often becomes bottlenecks. At some point the dedicated, separate video card can process the information and do everything faster than it can communicate what it's doing to the rest of the system, and around that point dedicated cards become obsolete. It's like having a person go to your mailbox and get the mail for you. At some point your mail just gets handed to you by the mailman and you're like, well, wtf do I have that guy around for?

Not saying that we are near that point yet. But you understand the concept: at some point they will be 100% integrated, just because it will be slower for them not to be.

kureshii:

--- Quote from: kitamesume on October 16, 2011, 11:38:48 PM ---That's not good... the drop in performance from 4 modules to 2 modules is too linear. That would imply that a 2-module part does exactly half of what the 4-module part can do, which puts it below i3 performance. And if you think about it, the i3 runs at well under 50 watts under load, while a linear drop for Bulldozer would put the 2-module part at around 50 watts under load.
--- End quote ---
You’re looking at it the wrong way. A linear “drop” in performance is good; it means when you buy 2 more cores (over a dual-core) you’re actually getting “two more cores’ worth of performance”, and it means you can get almost linear scaling in performance by adding cores (until you start hitting other bottlenecks in the CPU).

Unfortunately, even in performance scaling, Intel beats AMD when adding more cores/modules. Just see the difference in performance between an i3 and an i7 (note that the i3 does not have Turbo, while the i7 turbos up to 3.5GHz with all four cores in use), and compare it with the gain from adding two more modules below.

I’m hesitant to compare the Blender scores, because as I’ve mentioned in my comparison the metric doesn’t seem to be the same. However, even accounting for the clock speed difference between i3 and i7, one can see that Cinebench R10 and x264 performance scales much more linearly with cores going from dual- to quad-core, than going from dual- to quad-module. (Readers might also notice that these two benchmarks are where FX-8150 did best, matching i7-2600K’s performance or coming really close).

Bulldozer does have Turbo as well, but considering that quad-module load pegs it at 3.9GHz while its max Turbo is 4.2GHz, I don’t think that is enough to explain the less-than-linear scaling. If AMD’s plan is to release huge, many-core processors, I sure hope that core scaling improves.
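
To put a number on “how linear” the scaling is, one rough approach is to divide the measured speedup by the ideal speedup (ratio of core/module counts times ratio of clock speeds). The Python sketch below is only an illustration of that idea: the function name `scaling_efficiency` is made up, the benchmark scores in the example are hypothetical placeholders rather than measured results, and the 3.1GHz i3 clock is an assumption (the 3.5GHz figure is the all-core i7 Turbo mentioned above).

```python
def scaling_efficiency(perf_small: float, perf_big: float,
                       units_small: int, units_big: int,
                       clock_small: float = 1.0,
                       clock_big: float = 1.0) -> float:
    """Fraction of ideal (linear) scaling actually achieved.

    Ideal speedup = (unit ratio) * (clock ratio).
    1.0 means perfectly linear scaling; below 1.0 is sub-linear.
    """
    ideal = (units_big / units_small) * (clock_big / clock_small)
    actual = perf_big / perf_small
    return actual / ideal


# Hypothetical scores, NOT real benchmark data.
# Dual-module to quad-module at the same clock:
print(scaling_efficiency(100, 180, units_small=2, units_big=4))

# Dual-core to quad-core, correcting for an assumed 3.1GHz vs 3.5GHz clock:
print(scaling_efficiency(100, 220, units_small=2, units_big=4,
                         clock_small=3.1, clock_big=3.5))
```

Plugging in real scores from a review for both the i3-to-i7 and the 2-module-to-4-module cases would give directly comparable efficiency figures.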
