Project server code update

jason_gee

Joined: 4 Jun 14
Posts: 109
Credit: 1,043,639
RAC: 0
Message 112988 - Posted: 17 Jun 2014, 12:03:04 UTC - in response to Message 112987.  
Last modified: 17 Jun 2014, 12:25:22 UTC

Ta. Looks like fpops estimate isn't too shabby then. But you need to exclude that contribution.


In both the SETI cases, and in the CPU app case here that I've only looked at cursorily, the fpops_est have been what I would call 'reasonable'. I don't think it would be reasonable to ask project developers to estimate more precisely than what theory tells them (though it's certainly open to debate).

Overnight I've been musing on how to stage/phase/break-up the tests. Since there seems to be evidence of interaction between CPU & GPU scaling, I'd like to start with the coarse CPU scaling first, and prescribe watching all applications for effects when it's engaged.

@Richard, yeah the pfc scale should be compensating handily for any initial SIMD-related disagreement there (it's had enough time), but since the scaling swing is in the opposite direction to GPU, and likely below 1 as at SETI (which implies magical CPU fairies), I believe the coarse scaling correction there should be the first step in isolation. Supporting evidence includes the SIMAP non-SIMD app on the SIMD-aware Android client Whetstone, as well as SETI's uniformly below-1 pfc_scales despite quite tight theoretically based estimates.

The damping component I'll shift to pass 2. It's important, and has proven treatments available, but I feel that analysis of the impact of the unmitigated [CPU app] coarse scaling error is more important at this stage.

[Edit:] i.e. in the examples given here, median credit awarded should be a bit over the prior fixed credit of 1000, rather than 500-600, so the SIMD scaling correction of 1.5^2 = 2.25x seems appropriate given the numbers.

For dominant single precision, that'd be 1.5x for each step in log base 2 of the vector length (so 1.5x for MMX, 2.25x for SSE+, 3.375x for AVX-256, where the app and host support it); the coarse correction will be just fine IMO, as damping will be added in pass 2.

Maximum supported vector length in the app can be regarded as known, and client scheduler requests contain the CPU features and OS, so a min() can be taken between the two.
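
For concreteness, a minimal sketch of that coarse correction (my illustration only, not actual BOINC server code; the function and parameter names are hypothetical):

    // Hedged sketch: not actual BOINC server code. The names coarse_simd_scale,
    // app_max_vector_len and host_vector_len are made up for illustration.
    #include <algorithm>
    #include <cmath>

    // 1.5x per log-base-2 step of the effective vector length, as described
    // above: 1.5 for MMX, 2.25 for SSE+, 3.375 for AVX-256.
    double coarse_simd_scale(int app_max_vector_len, int host_vector_len) {
        // Effective vector length = min() of what the app build supports and
        // what the host's CPU features (from the scheduler request) allow.
        int effective_len = std::min(app_max_vector_len, host_vector_len);
        if (effective_len < 1) effective_len = 1;
        double steps = std::log2((double)effective_len);
        return std::pow(1.5, steps);  // e.g. len 4 (SSE) -> 2.25, len 8 (AVX) -> 3.375
    }
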
On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question. - C Babbage
Richard Haselgrove

Joined: 10 Dec 05
Posts: 450
Credit: 5,409,572
RAC: 0
Message 112989 - Posted: 17 Jun 2014, 13:33:51 UTC - in response to Message 112988.  

@Richard, yeah the pfc scale should be compensating handily for any initial SIMD related disagreement there (It's had enough time), but since the scaling swing is in the opposite direction to GPU, and likely below 1 as at seti (which implies magical CPU fairies), I believe the coarse scaling correction there should be the first step in isolation. Supporting effects include the SIMAP non-SIMD app on SIMD aware android client whetstone, as well as Seti's uniformly below 1 pfc_scales despite quite tight theoretically based estimates.

Are you sure that the SETI rsc_fpops_est are 'tight'?

I remember that when we were helping Josh Von Korff choose initial runtime estimates for Astropulse, we had a rule-of-thumb that the *stock* MB CPU app reached a DCF - the only available scaling factor in those days - of ~0.2 on the then cutting-edge Intel Core2 range (Q6600 and similar). The stock SETI app has had internal despatch of at least some SIMD pathways for a long time, and more have been added over the years.

Knowing Eric's approach to these matters - he's never wanted to exclude anyone from the search for ET, no matter how primitive their hardware - I suspect rsc_fpops_est may have been 'tight' for the mythical cobblestone reference machine (1 GHz FPU only), but never since.

Certainly the same GPU I'm plotting here has a SETI APR of 180, again running two tasks at once. So, even allowing for the inefficient GPU processing of autocorrelations (which I don't think has been reversed into the AR-fpops curve), SETI thinks this card is twice as fast as the stock BOINC code here and at GPUGrid does. At GPUGrid, it runs pretty tight cuda60 code, and since the whole project has revolved around NV and cuda since they dropped the PS3 dead-end, I reckon they know their stuff. They over-pay credit, but that's a manual decision, not BOINC's fault.
jason_gee

Joined: 4 Jun 14
Posts: 109
Credit: 1,043,639
RAC: 0
Message 112990 - Posted: 17 Jun 2014, 14:14:11 UTC - in response to Message 112989.  
Last modified: 17 Jun 2014, 14:44:37 UTC

Are you sure that the SETI rsc_fpops_est are 'tight'?


Yes, to +/- 10% of actual operations w/o overhead (which is not 'paid' at this point). That's still quite a spread in user and machine utilisation terms, which is the entropy that damped responses should be absorbing (as opposed to estimates). That's still approximately 3.3-30x 'tighter' than the coarse scaling error induced by unaccounted-for AVX multiplied by machine utilisation.

Certainly the same GPU I'm plotting here has a SETI APR of 180, again running two tasks at once. So, even allowing for the inefficient GPU processing of autocorrelations (which I don't think has been reversed into the AR-fpops curve), SETI thinks this card is twice as fast as the stock BOINC code here and at GPUGrid does. At GPUGrid, it runs pretty tight cuda60 code, and since the whole project has revolved around NV and cuda since they dropped the PS3 dead-end, I reckon they know their stuff. They over-pay credit, but that's a manual decision, not BOINC's fault.


It's relative. [You use the theoretically best serial algorithm for the estimate, as opposed to reflecting back the implementation, inefficient or otherwise]... On SETI the GPU autocorrelation uses my own 4N FFT method, so in [uncounted] computation it's 4x... Since that's drowned by latencies to the tune of 60% on high-end cards, you roughly double the claim for that portion. Other areas (variable) are more efficient. Then divide the overall claim by about 3.3 (because of the AVX global pfc). The net result is 'shorties' that should be getting ~100 credits getting ~40-60.

A similar effect is happening here, with tasks that should be ~1000 seeing a median of ~500. It's not the project-supplied estimates that are out, but the induced scaling error.
On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question. - C Babbage
Richard Haselgrove

Joined: 10 Dec 05
Posts: 450
Credit: 5,409,572
RAC: 0
Message 112992 - Posted: 17 Jun 2014, 14:46:10 UTC - in response to Message 112990.  

Are you sure that the SETI rsc_fpops_est are 'tight'?

Yes, to +/- 10% of actual operations w/o overhead (which is not 'paid' at this point). That's still quite a spread in user and machine utilisation terms, which is the entropy that damped responses should be absorbing (as opposed to estimates). That's still approximately 3.3-30x 'tighter' than the coarse scaling error induced by unnacounted for AVX multiplied by machine utilisation.

I have to say I'm ... surprised.

For a recently completed task, I can see

<rsc_fpops_est> 14979862651149.264000
<fpops_cumulative> 111064200000000.000000
Flopcounter: 38995606754768.391000

Because I'm running Anon. Plat., the first is scaled to allow the client to show a decent runtime estimate using its internal reference speed. It seems to be using <flops> 9235454731.586426, although the APR on that machine is 122.47 GFLOPS.

The second and third differ by Eric's infamous 'credit multiplier' of x2.85, of course.
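
As a quick cross-check using only the figures quoted above: 111064200000000 / 38995606754768 ≈ 2.848, i.e. the x2.85 multiplier, and 14979862651149 / 9235454731.59 ≈ 1622 seconds (about 27 minutes), which would presumably be the client's runtime estimate from that <flops> value.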

I'll have to load up a machine with a stock app sometime and try that one again.
jason_gee

Joined: 4 Jun 14
Posts: 109
Credit: 1,043,639
RAC: 0
Message 112993 - Posted: 17 Jun 2014, 14:48:23 UTC - in response to Message 112992.  
Last modified: 17 Jun 2014, 14:51:19 UTC

Are you sure that the SETI rsc_fpops_est are 'tight'?

Yes, to +/- 10% of actual operations w/o overhead (which is not 'paid' at this point). That's still quite a spread in user and machine utilisation terms, which is the entropy that damped responses should be absorbing (as opposed to estimates). That's still approximately 3.3-30x 'tighter' than the coarse scaling error induced by unnacounted for AVX multiplied by machine utilisation.

I have to say I'm ... surprised.

For a recently completed task, I can see

<rsc_fpops_est> 14979862651149.264000
<fpops_cumulative> 111064200000000.000000
Flopcounter: 38995606754768.391000

Because I'm running Anon. Plat., the first is scaled to allow the client to show a decent runtime estimate using it's internal reference speed. It seems to be using <flops> 9235454731.586426, although the APR on that machine is 122.47 GFLOPS.

The second and third differ by Eric's imfamous 'credit multiplier' of x2.85, of course.

I'll have to load up a machine with a stock app sometime and try that one again.



'That' <rsc_fpops_est> has already been downscaled. 2.85x would indeed be a great compromise if on average most of the hosts were SSE+-equipped (~2.25x), with about a third taking up AVX (3.375x).
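
As a rough sanity check of that mix (my arithmetic, not from the thread): (2/3) x 2.25 + (1/3) x 3.375 = 2.625, which is at least in the right neighbourhood of the 2.85 multiplier.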
On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question. - C Babbage
Bernd Machenschalk
Volunteer moderator
Project administrator
Project developer

Joined: 15 Oct 04
Posts: 1956
Credit: 6,218,130
RAC: 0
Message 112994 - Posted: 17 Jun 2014, 14:50:34 UTC - in response to Message 112956.  

Attached a new host to Albert, looking through the logs i keep getting the following download error:

14-Jun-2014 06:06:34 [Albert@Home] Giving up on download of EatH_mastercat_1344952579.txt: permanent HTTP error


Was related to the web code update. Shouldn't have affected crunching. Should be fixed now.

BM
Richard Haselgrove

Joined: 10 Dec 05
Posts: 450
Credit: 5,409,572
RAC: 0
Message 112995 - Posted: 17 Jun 2014, 14:56:21 UTC - in response to Message 112993.  

'That' <rsc_fpops_est> has already been downscaled. 2.85x would indeed be a great compromise if on average most of the hosts were SSE+ (~2.25x) equipped with about a third taking up AVX (3.375x).

Yes, I said I was running anon. plat. I'll have to get a raw one on a stock machine.

seeing a median of ~500

?? My own machine - attached after the bulk of the averaging has been done by the population at large - is showing a median of 1716.32 currently. That's climbed from 1644.33 at last night's show.
jason_gee

Joined: 4 Jun 14
Posts: 109
Credit: 1,043,639
RAC: 0
Message 112996 - Posted: 17 Jun 2014, 15:07:36 UTC - in response to Message 112995.  
Last modified: 17 Jun 2014, 15:17:10 UTC

'That' <rsc_fpops_est> has already been downscaled. 2.85x would indeed be a great compromise if on average most of the hosts were SSE+ (~2.25x) equipped with about a third taking up AVX (3.375x).

Yes, I said I was running anon. plat. I'll have to get a raw one on a stock machine.

seeing a median of ~500

?? My own machine - attached after the bulk of the averaging has been done by the population at large - is showing a median of 1716.32 currently. That's climbed from 1644.33 at last night's show.


i.e. median on *my* GPU here early in the piece, which has been running one task at a time, from the figures you posted earlier at http://albert.phys.uwm.edu/forum_thread.php?id=9017&postid=112911

We know that after that it's been going up; it's not clear yet IMO whether that's a controlled rise/correction or another instability.

[There are two scales fighting: the host app version driving it upward, and the global PFC scale selected from the underclaiming CPU app driving it down; which one wins is a matter of numbers.]
On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question. - C Babbage
Richard Haselgrove

Joined: 10 Dec 05
Posts: 450
Credit: 5,409,572
RAC: 0
Message 112997 - Posted: 17 Jun 2014, 15:17:31 UTC - in response to Message 112996.  
Last modified: 17 Jun 2014, 15:20:50 UTC

i.e. median on *my* GPU here early in the piece, which has been running one task at a time, from the figures you posted earlier at http://albert.phys.uwm.edu/forum_thread.php?id=9017&postid=112911

We know that after that it's been going up, It's not clear yet IMO whether that's a controlled rise/correction or another instability.

Still rising inexorably here - this is roughly the last three days (right margin is midnight UTC tonight). Horizontal lines are 1K and 10K, still logarithmic. My minimum (of 60) is 1355



Edit - to your edit: there's no CPU app here - that was Eric's point.

'Binary Radio Pulsar Search' and 'Binary Radio Pulsar Search (Arecibo, GPU)' are deployed as different Applications on this project, not just different app_versions.
Eyrie

Joined: 20 Feb 14
Posts: 47
Credit: 2,410
RAC: 0
Message 112998 - Posted: 17 Jun 2014, 15:23:49 UTC - in response to Message 112993.  

Are you sure that the SETI rsc_fpops_est are 'tight'?

Yes, to +/- 10% of actual operations w/o overhead (which is not 'paid' at this point). That's still quite a spread in user and machine utilisation terms, which is the entropy that damped responses should be absorbing (as opposed to estimates). That's still approximately 3.3-30x 'tighter' than the coarse scaling error induced by unnacounted for AVX multiplied by machine utilisation.

I have to say I'm ... surprised.

For a recently completed task, I can see

<rsc_fpops_est> 14979862651149.264000
<fpops_cumulative> 111064200000000.000000
Flopcounter: 38995606754768.391000

Because I'm running Anon. Plat., the first is scaled to allow the client to show a decent runtime estimate using it's internal reference speed. It seems to be using <flops> 9235454731.586426, although the APR on that machine is 122.47 GFLOPS.

The second and third differ by Eric's imfamous 'credit multiplier' of x2.85, of course.

I'll have to load up a machine with a stock app sometime and try that one again.



'That' <rsc_fpops_est> has already been downscaled. 2.85x would indeed be a great compromise if on average most of the hosts were SSE+ (~2.25x) equipped with about a third taking up AVX (3.375x).

Flops under anon are [unless supplied in app_info.xml] raw Whetstone for CPU and some mystical figure for GPU. At some point it was just 10x CPU, and I am seeing roughly 10x for Eve.

rsc_fpops_est for anon is a combination of APR and the original rsc_fpops_est, iirc.
Queen of Aliasses, wielder of the SETI rolling pin, Mistress of the red shoes, Guardian of the orange tree, Slayer of very small dragons.
jason_gee

Send message
Joined: 4 Jun 14
Posts: 109
Credit: 1,043,639
RAC: 0
Message 112999 - Posted: 17 Jun 2014, 15:24:09 UTC - in response to Message 112997.  

Still rising inexorably here - this is roughly the last three days (right margin is midnight UTC tonight). Horizontal lines are 1K and 10K, still logarithmic. My minimum (of 60) is 1355


No certainty in how that will react to correcting the CPU app scale as patch one. In an ideal world it wouldn't react at all (though I firmly believe it will react visibly, I'm open to surprises). That's what I want to watch, because I suspect it should bump up a bit further, then level, drop and oscillate (for subsequent smoothing in patch two). Anyway, with such steady work it should have stabilised by now and hasn't, so I'm rolling up my sleeves for the first pass (CPU coarse scale correction).
On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question. - C Babbage
Eyrie

Joined: 20 Feb 14
Posts: 47
Credit: 2,410
RAC: 0
Message 113000 - Posted: 17 Jun 2014, 15:25:54 UTC - in response to Message 112996.  
Last modified: 17 Jun 2014, 15:29:19 UTC


[There's two scales fighting, the host app version driving it upward, and the global PFC scale selected from the underclaiming CPU app driving it down, which one wins is a matter of numbers]

There is no underclaiming CPU app!

edit: what Richard said... global pfc are for GPU apps only - but there are several of those according to applications
Queen of Aliasses, wielder of the SETI rolling pin, Mistress of the red shoes, Guardian of the orange tree, Slayer of very small dragons.
jason_gee

Joined: 4 Jun 14
Posts: 109
Credit: 1,043,639
RAC: 0
Message 113001 - Posted: 17 Jun 2014, 15:26:59 UTC - in response to Message 112998.  
Last modified: 17 Jun 2014, 15:28:12 UTC

CPU flops under anon are [unless supplied in app_info.xml] raw whetstone for CPU and some mystical figure for GPU. At some point it was just 10X CPU and I am seeing roughly 10x for Eve.

rsc_fpops_est for anon is a combination of APR and rsc_fpops_est iirc.


Yep, that's pretty much the way I remember it (spaghetti, including assorted mysticism and voodoo :P). Once we get the CPU mis-scaling out of the picture, pass 2 might either be focussed on smoothing/damping, GPU scale correction, or both. In any of those cases, I'll let what happens with the CPU guide the choice.
On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question. - C Babbage
jason_gee

Joined: 4 Jun 14
Posts: 109
Credit: 1,043,639
RAC: 0
Message 113002 - Posted: 17 Jun 2014, 15:39:20 UTC - in response to Message 113000.  
Last modified: 17 Jun 2014, 15:51:04 UTC

... global pfc are for GPU apps only


Whatever the web pages call them (probably not the same thing), in code the global pfc_scale for a suite of applications is taken from the lowest-claiming application for given tasks (by unstable averages). That's then used to downscale the estimate of every other application wholesale, and that is how the underclaiming SSE-AVX apps are dividing down the credit. They are claiming fewer operations than it actually takes to do a task, as evidenced by pfc_scales being below 1 (magical fairy CPU applications doing work for free).

I'm telling you it takes no fewer than n log n operations to do an FFT ...
- AVX CPU + app tells me it did it in n log n / 3.3. 'I'm magic.'
- Server stupidly sets pfc_scale to 1/3.3: 'This magic app wins.'
- Jason says 'hang on a minute, the BOINC client uses a broken, inapplicable non-SIMD Whetstone to calculate that... Ya canna defy the laws of physics; there are no CPU fairies or magic... It's using SIMD and doing up to 8 operations per cycle for an average of ~3.3x throughput' (scaling is about 50%).
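
As a stripped-down illustration of that sequence (my sketch only, NOT the actual credit.cpp logic; all names and numbers are hypothetical):

    // Hedged sketch restating the three bullet points above in code form;
    // not BOINC server code.
    #include <algorithm>
    #include <cstdio>

    int main() {
        double theoretical_ops    = 1.0e12;                // "no fewer than n log n" ops for the task
        double avx_claimed_ops    = theoretical_ops / 3.3; // SIMD app measured by non-SIMD Whetstone
        double honest_claimed_ops = theoretical_ops;       // a claim matching the real work

        // Server takes the lowest claimer as the reference scale...
        double pfc_scale = std::min(avx_claimed_ops, honest_claimed_ops) / theoretical_ops; // ~1/3.3

        // ...and every other app version's claim is downscaled by it wholesale,
        // so the honest claim ends up divided by ~3.3.
        printf("pfc_scale ~= %.3f, honest claim granted as %.3g ops\n",
               pfc_scale, honest_claimed_ops * pfc_scale);
        return 0;
    }
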
On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question. - C Babbage
Eyrie

Joined: 20 Feb 14
Posts: 47
Credit: 2,410
RAC: 0
Message 113004 - Posted: 17 Jun 2014, 15:56:49 UTC - in response to Message 113002.  

... global pfc are for GPU apps only


whatever the web pages call them (probably not the same thing), in code the global pfc_scale for a suite of applications is the lowest claiming application for given tasks (by unstable averages). That's then used to downscale the estimate of every other application wholesale, and is how the underclaiming SSE-AVX apps are dividing the credit. They are claiming fewer operations than it actually takes to do a task, evidenced by pfc_scales being below 1 ( magical fairy CPU applications doing work for free )


That would be whatever _GPU_ application claims lowest!
AVX doesn't come into play there. Don't ask me what happens when there is no CPU version to take the lead - well, we see what happens: we end up pretty low (compared to the dip CPU apps take, as per my 'day zero' analysis. Where did I put that? Wiki?)

Take your pick of opencl_ati for Win, Mac, Linux, and CUDA versions for Win, Mac and Linux.
Queen of Aliasses, wielder of the SETI rolling pin, Mistress of the red shoes, Guardian of the orange tree, Slayer of very small dragons.
jason_gee

Joined: 4 Jun 14
Posts: 109
Credit: 1,043,639
RAC: 0
Message 113005 - Posted: 17 Jun 2014, 16:03:20 UTC - in response to Message 113004.  
Last modified: 17 Jun 2014, 16:04:58 UTC

... global pfc are for GPU apps only


whatever the web pages call them (probably not the same thing), in code the global pfc_scale for a suite of applications is the lowest claiming application for given tasks (by unstable averages). That's then used to downscale the estimate of every other application wholesale, and is how the underclaiming SSE-AVX apps are dividing the credit. They are claiming fewer operations than it actually takes to do a task, evidenced by pfc_scales being below 1 ( magical fairy CPU applications doing work for free )


that would be whatever _GPU_ application claims lowest!
AVX doesn't come into play there. Don't ask me what happens when there is no CPU version to take the lead. well we see what happens we end up pretty low (compared to the dip CPU apps take as per my 'day zero' analysis. where did O put that? wiki?)

take your pick of opencl_ati for win, mac linux and cuda versions for win mac and linux.


I'm pretty sure the CPU apps will be claiming far fewer operations, so will be chosen as the scale. Here is a bit special because of those aggregated GPU tasks, so how that's woven in is another question (i.e. treated as GPU-only? or as 16x a CPU task?).

With the current logic, pure GPU-only projects would likely grant from 3.3x-100x the credit, and the initial estimates would not be all that bad.
On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question. - C Babbage
Richard Haselgrove

Joined: 10 Dec 05
Posts: 450
Credit: 5,409,572
RAC: 0
Message 113006 - Posted: 17 Jun 2014, 16:03:39 UTC - in response to Message 113002.  

... global pfc are for GPU apps only

whatever the web pages call them (probably not the same thing), in code the global pfc_scale for a suite of applications is the lowest claiming application for given tasks (by unstable averages). That's then used to downscale the estimate of every other application wholesale, and is how the underclaiming SSE-AVX apps are dividing the credit. They are claiming fewer operations than it actually takes to do a task, evidenced by pfc_scales being below 1 ( magical fairy CPU applications doing work for free )

Well, you're the one who's been walking the code, but let's standardise on one set of terminology.

Using *BOINC* terminology, the hierarchy is

Project
|__> Application
     |__> App_version (each separate binary executable)


The 'underclaiming SSE-AVX' *executables* are a suite within the BRP4 (CPU only) Application. That is different from, and will have (if I understand you correctly) a *different* pfc_scale from, the suite within the BRP4G (GPU only) Application.

According to the CreditNew whitepaper, normalisation is applied across versions of each application, not across the project as a whole. So, in the specific case in point (BRP4 and BRP4G), the GPU app will not be scaled by CPU concerns, because there isn't a CPU executable within the GPU suite.
jason_gee

Joined: 4 Jun 14
Posts: 109
Credit: 1,043,639
RAC: 0
Message 113007 - Posted: 17 Jun 2014, 16:08:06 UTC - in response to Message 113006.  
Last modified: 17 Jun 2014, 16:09:13 UTC

... global pfc are for GPU apps only

whatever the web pages call them (probably not the same thing), in code the global pfc_scale for a suite of applications is the lowest claiming application for given tasks (by unstable averages). That's then used to downscale the estimate of every other application wholesale, and is how the underclaiming SSE-AVX apps are dividing the credit. They are claiming fewer operations than it actually takes to do a task, evidenced by pfc_scales being below 1 ( magical fairy CPU applications doing work for free )

Well, you're the one who's been walking the code, but let's standardise on one set of terminology.

Using *BOINC* terminology, the hierarchy is

Project
|__> Application
     |__> App_version (each separate binary executable)


The 'underclaiming SSE-AVX' *executables* are a suite within the BRP4 (CPU only) Application. That is different from, and will have a (if I understand you correctly) *different* pfc_scale from the suite of BRP4G (GPU only)

According to the CreditNew whitepaper, normalisation is applied across versions of each application, not across the project as a whole. So, in the specific case at point (BRP4 and BRP4G), the GPU app will not be scaled by CPU concerns, because there isn't a CPU executable within the GPU suite.


That's where it depends on how they hooked in those *4G and *5G aggregates of 16 tasks. If the estimate is standalone, then there is no visible reason it should give us 3-second estimates for hour-long tasks. If it is hooked in via a multiple of the CPU app's pfc_scale and/or the CPU app estimate, then that would explain it.
On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question. - C Babbage
Eyrie

Joined: 20 Feb 14
Posts: 47
Credit: 2,410
RAC: 0
Message 113008 - Posted: 17 Jun 2014, 16:09:11 UTC - in response to Message 113005.  

... global pfc are for GPU apps only


whatever the web pages call them (probably not the same thing), in code the global pfc_scale for a suite of applications is the lowest claiming application for given tasks (by unstable averages). That's then used to downscale the estimate of every other application wholesale, and is how the underclaiming SSE-AVX apps are dividing the credit. They are claiming fewer operations than it actually takes to do a task, evidenced by pfc_scales being below 1 ( magical fairy CPU applications doing work for free )


that would be whatever _GPU_ application claims lowest!
AVX doesn't come into play there. Don't ask me what happens when there is no CPU version to take the lead. well we see what happens we end up pretty low (compared to the dip CPU apps take as per my 'day zero' analysis. where did O put that? wiki?)

take your pick of opencl_ati for win, mac linux and cuda versions for win mac and linux.


I'm pretty sure the CPU apps will be claiming far fewer operations so be chosen as scale. Here is a bit special because of those aggregate GPU tasks, so how that's weaved in is another question ( i.e. treated as GPU Only? or 16x a CPU task ?).

With the current logic pure GPU only projects would likely grant from 3.3x-100x the credit, and initial estimates not be all that bad.


JASON, CPU and GPU have different APPS here!

There is a BRP app and a BRPG app. That's like MB and AP on SETI...

Or are you saying pfc for MB and AP are the same? :P
Queen of Aliasses, wielder of the SETI rolling pin, Mistress of the red shoes, Guardian of the orange tree, Slayer of very small dragons.
jason_gee

Joined: 4 Jun 14
Posts: 109
Credit: 1,043,639
RAC: 0
Message 113009 - Posted: 17 Jun 2014, 16:10:35 UTC - in response to Message 113008.  
Last modified: 17 Jun 2014, 16:11:29 UTC

JASON, CPU and GPU have different APPS here!

there is a BRP app and a BRPG app. thta's like MB and AP on SETI...

or are you saying pfc for MB and AP are the same? :P


No, and don't yell at me please ;)

I am saying that the BRP4G and 5G estimates come from CPU app estimates multiplied by 16 ---> different app & hardware, but estimates are linked.
On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question. - C Babbage



This material is based upon work supported by the National Science Foundation (NSF) under Grant PHY-0555655 and by the Max Planck Gesellschaft (MPG). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the investigators and do not necessarily reflect the views of the NSF or the MPG.

Copyright © 2024 Bruce Allen for the LIGO Scientific Collaboration