Project server code update

WARNING: This website is obsolete! Please follow this link to get to the new Albert@Home website!

Author	Message
Claggy Send message Joined: 29 Dec 06 Posts: 78 Credit: 4,040,969 RAC: 0	Message 112956 - Posted: 14 Jun 2014, 15:56:44 UTC Last modified: 14 Jun 2014, 16:02:35 UTC Attached a new host to Albert, looking through the logs i keep getting the following download error: 14-Jun-2014 06:06:32 [Albert@Home] Started download of eah_slide_05.png 14-Jun-2014 06:06:32 [Albert@Home] Started download of eah_slide_07.png 14-Jun-2014 06:06:32 [Albert@Home] Started download of eah_slide_08.png 14-Jun-2014 06:06:33 [Albert@Home] Finished download of eah_slide_07.png 14-Jun-2014 06:06:33 [Albert@Home] Started download of EatH_mastercat_1344952579.txt 14-Jun-2014 06:06:34 [Albert@Home] Finished download of eah_slide_05.png 14-Jun-2014 06:06:34 [Albert@Home] Finished download of eah_slide_08.png 14-Jun-2014 06:06:34 [Albert@Home] Giving up on download of EatH_mastercat_1344952579.txt: permanent HTTP error On this new host (as well as on my HD7770) i'm still getting the very short estimates for Perseus Arm Survey GPU tasks, so i've added two zero's to the rsc_fpops values so they'll complete. Computer 11441 Claggy ID: 112956 · Reply Quote

Snow Crash Send message Joined: 11 Aug 13 Posts: 10 Credit: 5,011,603 RAC: 0	Message 112965 - Posted: 16 Jun 2014, 10:29:34 UTC - in response to Message 112955. Last modified: 16 Jun 2014, 10:39:49 UTC updated details [u]OS BOINC CPU GPU Utilization Factor[/u] Win7 7.2.42 980x 670 0.50 Win7 7.2.42 920 7950 0.33 [u]Win7 7.2.42 4670k 7850 0.50 [/u] ID: 112965 · Reply Quote

Richard Haselgrove Send message Joined: 10 Dec 05 Posts: 450 Credit: 5,409,572 RAC: 0	Message 112966 - Posted: 16 Jun 2014, 11:12:41 UTC Last modified: 16 Jun 2014, 11:52:04 UTC The GPU plot continues to thicken. Some of the machines I picked to monitor seem to have dropped out of the running, so I added one of my own yesterday. It seems to have started in line with current trends, but like the others, there seem to be distinct 'upper' and 'lower' credit groupings. Why would that be? I've also noticed something that doesn't show with this style of plotting against reporting time: when I've been infilling late validations, the credit awards - in line with the trendlines - have been much higher than their contemporaries got, and the double population is visible there too. For example, Claggy's WU 606387 from 5 June was awarded over 7K overnight. That should show up in the changing averages: Jason Holmis Claggy Zombie ZombieM Jacob RH Host: 11363 2267 9008 6490 6109 10320 5367 GTX 780 GTX 660 GT 650M TITAN 680MX FX3800M GTX 670 Credit for BRP4G, GPU Maximum 1584.75 1495.38 10951.9 2031.40 11847.5 10951.9 4137.85 Minimum 115.82 88.84 153.90 91.50 94.88 508.73 1355.49 Average 813.32 688.57 4120.83 1074.52 2010.65 2743.09 1917.71 Median 973.50 539.19 3037.38 1122.70 1591.80 1456.42 1641.43 Std Dev 523.51 428.48 3074.79 393.82 1823.08 3177.59 703.78 nSamples 36 52 48 387 189 17 21 (Why does the new web code double-space [ pre ]?) Edit - corrected copy/paste error on application type. ID: 112966 · Reply Quote

Eyrie Send message Joined: 20 Feb 14 Posts: 47 Credit: 2,410 RAC: 0	Message 112967 - Posted: 16 Jun 2014, 12:14:08 UTC Interesting. Of course credit is at time of validation not reporting against a moving scale, yadda yadda. So, late validations earn higher. the wingmate is then either a slow host or has a larger cache. the first is in line with expectations iirc, the second would be puzzling. then again if you are observig a chaotic system it's hard to know what may or may not happen anyway... Queen of Aliasses, wielder of the SETI rolling pin, Mistress of the red shoes, Guardian of the orange tree, Slayer of very small dragons. ID: 112967 · Reply Quote

jason_gee Send message Joined: 4 Jun 14 Posts: 109 Credit: 1,043,639 RAC: 0	Message 112968 - Posted: 16 Jun 2014, 12:46:54 UTC - in response to Message 112967. Last modified: 16 Jun 2014, 12:51:59 UTC Interesting. Of course credit is at time of validation not reporting against a moving scale, yadda yadda. So, late validations earn higher. the wingmate is then either a slow host or has a larger cache. the first is in line with expectations iirc, the second would be puzzling. then again if you are observig a chaotic system it's hard to know what may or may not happen anyway... Yes, the two mixed and partially overlapping time domains for the credit mechanism (issue time and validation time) both oscillate & interact. Those two are enough combined to setup a resonance by themselves ( additive & subtractive), but then with the natural variation in the real-world processing thrown in, you introduce another one or more 'bodies', making the system an unpredictable n-body problem. For our purposes the dominant overlap is in those time domains, so sufficient damping to each control point to place them in separate time domains should be enough to break the chaotic behaviour. e.g. global pfc_scale -> minimise the coarse scaling error and vary smoothly over some 100's to 1000's of task validations host (app version) scale -> vary smoothly over some 10's of validations, small enough to respond to hardware or app change in 'reasonable' time (number of tasks) client side (currently either disabled or broken by project DCF not beign per application) - vary smoothly over a few to 10 tasks Once separated in time, then this coarse->fine->finer tuning is difficult to destabilise, even with fairly sudden aggressive change at any point. On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question. - C Babbage ID: 112968 · Reply Quote

Richard Haselgrove Send message Joined: 10 Dec 05 Posts: 450 Credit: 5,409,572 RAC: 0	Message 112969 - Posted: 16 Jun 2014, 13:23:33 UTC - in response to Message 112968. The few wingmates I've spot-checked against high credit awards (my own and others) don't seem to be outside the normal population. They too show bi-modal distributions, and high credits don't seem to be correlated with varying runtimes. I'm trying to run the GTX 670 as stable as possible, and runtimes are very steady. Which leaves the moving average theory. Two more tasks have both reported and validated since I posted the graph: they both got over 2,000 credits, which suggests the average is still moving upwards. ID: 112969 · Reply Quote

zombie67 [MM] Send message Joined: 10 Oct 06 Posts: 130 Credit: 30,924,459 RAC: 0	Message 112970 - Posted: 16 Jun 2014, 13:30:16 UTC Just FYI, I had to take my two CUDA machines off albert for a while. I need to help a team mate at another project. I will be back. Dublin, California Team: SETI.USA ID: 112970 · Reply Quote

jason_gee Send message Joined: 4 Jun 14 Posts: 109 Credit: 1,043,639 RAC: 0	Message 112971 - Posted: 16 Jun 2014, 13:47:23 UTC - in response to Message 112969. Last modified: 16 Jun 2014, 13:51:58 UTC The few wingmates I've spot-checked against high credit awards (my own and others) don't seem to be outside the normal population. They too show bi-modal distributions, and high credits don't seem to be correlated with varying runtimes. I'm trying to run the GTX 670 as stable as possible, and runtimes are very steady. Which leaves the moving average theory. Two more tasks have both reported and validated since I posted the graph: they both got over 2,000 credits, which suggests the average is still moving upwards. Yes the beauty of testing here has been the relatively consistent runtimes on a given host/gpu, which tends to demonstrate that much of the instability is artificially induced (as opposed to a direct function of the noisy elapsed times). The upward trend would appear to be a drift instability, and bets are off as to whether it continues indefinitely, levels off, or (my guess) starts a downward cycle after some peak. The bi-modal characteristic appears to be that oscillation between saturation and cuttoff, most likely turn-around time related. That'd be predominantly proportional error ('overshoot'). [post scaling by the average of the validated claims isntt going to help that, though it was a valiant attempt over the original choice of taking the minimum claim] On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question. - C Babbage ID: 112971 · Reply Quote

Eyrie Send message Joined: 20 Feb 14 Posts: 47 Credit: 2,410 RAC: 0	Message 112972 - Posted: 16 Jun 2014, 13:48:18 UTC Heisenberg? My Maths teacher? Chaos theory? Zen? Superstitious pidgeons? In a chaotic system, a trend observed may be genuine or coincidental. And in any case, we shouldn't be putting too much effort into observing the scintillations of a soapbubble we intend to burst. Of course the colourpattern is fascinating indeed... Queen of Aliasses, wielder of the SETI rolling pin, Mistress of the red shoes, Guardian of the orange tree, Slayer of very small dragons. ID: 112972 · Reply Quote

jason_gee Send message Joined: 4 Jun 14 Posts: 109 Credit: 1,043,639 RAC: 0	Message 112973 - Posted: 16 Jun 2014, 13:56:03 UTC - in response to Message 112972. Last modified: 16 Jun 2014, 13:59:27 UTC Heisenberg? My Maths teacher? Chaos theory? Zen? Superstitious pidgeons? In a chaotic system, a trend observed may be genuine or coincidental. And in any case, we shouldn't be putting too much effort into observing the scintillations of a soapbubble we intend to burst. Of course the colourpattern is fascinating indeed... Good. what I'm doing at the moment is drafting goals for the first patch. A lot of that will be involved with confirming/rejecting the particular suspects for the purposes of isolation. I do think the observations, or 'getting a feel' for the character of what it's doing, is important at the moment... and the patterns are pretty to look at... but you're right, I'd like to have the full first pass patch ready for trials by the time Bernd's back on duty... Outline for pass 1: Objectives and scope Resources Procedure Results Discussion of Results Conclusions and further work (plan for pass 2) On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question. - C Babbage ID: 112973 · Reply Quote

Richard Haselgrove Send message Joined: 10 Dec 05 Posts: 450 Credit: 5,409,572 RAC: 0	Message 112975 - Posted: 16 Jun 2014, 15:14:37 UTC Last modified: 16 Jun 2014, 15:16:34 UTC I've been asked to pull out stats on runtime and turnround, in case they explain anything - I don't think they do. Jason Holmis Claggy Zombie ZombieM Jacob RH Host: 11363 2267 9008 6490 6109 10320 5367 GTX 780 GTX 660 GT 650M TITAN 680MX FX3800M GTX 670 Credit for BRP4G, GPU Maximum 1584.75 1495.38 10952.0 2031.40 11847.5 10952.0 4137.85 Minimum 115.82 88.84 153.90 91.50 94.88 508.73 1355.49 Average 813.32 688.57 4120.83 1074.52 2010.65 2743.09 1890.82 Median 973.50 539.19 3037.38 1122.70 1591.80 1456.42 1644.33 Std Dev 523.51 428.48 3074.79 393.82 1823.08 3177.59 614.08 nSamples 36 52 48 387 189 17 28 Runtime (seconds) Maximum 4401.98 5088.99 11295.0 5383.71 23977.4 11774.3 4138.67 Minimum 3259.10 3294.83 8136.47 1908.26 1512.16 11515.5 4061.45 Average 3668.57 4483.11 8910.76 4194.81 4216.60 11623.3 4102.53 Median 3603.24 4608.05 8837.27 4228.18 4183.74 11603.6 4109.83 Std Dev 314.13 506.27 554.34 555.88 1981.31 70.73 25.21 Turnround (days) Maximum 2.11 3.91 1.70 3.44 2.94 5.57 0.72 Minimum 0.15 0.07 0.14 0.24 1.52 0.18 0.15 Average 1.14 1.97 0.69 2.10 2.32 1.64 0.44 Median 0.73 1.90 0.74 2.00 2.42 0.88 0.48 Std Dev 0.79 0.98 0.35 0.57 0.32 1.68 0.17 Zombie's Mac had just two of those extra-long runtimes: WU 612084, 1,328.76 cr WU 612045, 6,419.08 cr Edit - I updated my own column since this morning, the others are unchanged. ID: 112975 · Reply Quote

Eyrie Send message Joined: 20 Feb 14 Posts: 47 Credit: 2,410 RAC: 0	Message 112976 - Posted: 16 Jun 2014, 15:28:43 UTC hmm. maybe. would need to ball eye the code again to get more certainity. smaple size is a bit small too - maybe in a few days. Queen of Aliasses, wielder of the SETI rolling pin, Mistress of the red shoes, Guardian of the orange tree, Slayer of very small dragons. ID: 112976 · Reply Quote

Holmis Send message Joined: 4 Jan 05 Posts: 104 Credit: 2,104,736 RAC: 0	Message 112977 - Posted: 16 Jun 2014, 15:57:48 UTC I'll add that my GTX660Ti is running 2 task at a time mixing BRP4G and BRP5 from Albert and BRP5 from Einstein. The Intel HD4000 is running single tasks. Here's an updated Excel file with data and plots from host 2267 and the following searches: BRP4X64, BRP4G, S6CasA, BRP5 (iGPU) and BRP5 (Nvidia GPU). ID: 112977 · Reply Quote

jason_gee Send message Joined: 4 Jun 14 Posts: 109 Credit: 1,043,639 RAC: 0	Message 112978 - Posted: 16 Jun 2014, 16:40:03 UTC - in response to Message 112975. Last modified: 16 Jun 2014, 16:58:16 UTC ... in case they explain anything - I don't think they do. ... Well explore any possibility for sure. From an engineering perspective, understanding the spoon doing the stirring is probably a good idea [though asking it what it's doing might not reveal much :) ]. Probably we won't try getting rid of the spoon, but instead what it's mixing... fine granulated sugar and some honey, instead of a solid hunk of extra thick molasses. On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question. - C Babbage ID: 112978 · Reply Quote

Holmis Send message Joined: 4 Jan 05 Posts: 104 Credit: 2,104,736 RAC: 0	Message 112979 - Posted: 16 Jun 2014, 18:11:02 UTC - in response to Message 112912. Nvidia GPU: <flops>12454544406626.100000</flops> <plan_class>BRP5-cuda32-nv301</plan_class> That's 12454 GFlops or 12,45 TeraFlops! Boinc reports it @ 2985 GFlops peak in the startup messages. And the APR for the BRP4G Nvidia tasks is 58.1 GFlops when running 2 at a time. If the BRP5 app gets the same APR then the initial speed estimate is that the card is 214 times as fast as it actually is!!! A follow up on my post about initial estimates, the 11th BRP5 task has now been validated and the APR has been calculated to 30.16 GFlops when running 2 tasks at a time. So the initial estimate was that the card was a whooping 412,9 times faster than actual! =O ID: 112979 · Reply Quote

jason_gee Send message Joined: 4 Jun 14 Posts: 109 Credit: 1,043,639 RAC: 0	Message 112980 - Posted: 16 Jun 2014, 18:36:08 UTC - in response to Message 112979. Last modified: 16 Jun 2014, 18:40:04 UTC Nvidia GPU: <flops>12454544406626.100000</flops> <plan_class>BRP5-cuda32-nv301</plan_class> That's 12454 GFlops or 12,45 TeraFlops! Boinc reports it @ 2985 GFlops peak in the startup messages. And the APR for the BRP4G Nvidia tasks is 58.1 GFlops when running 2 at a time. If the BRP5 app gets the same APR then the initial speed estimate is that the card is 214 times as fast as it actually is!!! A follow up on my post about initial estimates, the 11th BRP5 task has now been validated and the APR has been calculated to 30.16 GFlops when running 2 tasks at a time. So the initial estimate was that the card was a whooping 412,9 times faster than actual! =O Yeah, for my 780 initial guesstimates were 3 seconds, and came in about an hour, so about a thousand-fold discrepancy. Room for optimisation in the application doesn't account for the whole discrepancy (of course :) ). looks like there'll be some digging along these lines to do. Just some factors to look at with new GPU apps & hosts might include: - CPU app underclaim becomes global scaling reference, we know these leave out SIMD (sse->avx) on raw claims, so look more efficient than they are, because certain numbers come out 'impossible' if the logic were right (e.g. pfc_scale < 1) [perhaps up to 10x discrepancy) - GPU overestimate of performance (attempting to be generous) [could be in the ballpark of another 10x discrepancy] - Actual utilisation variation ( e.g. from people using their machines, multiple tasks per GPU...more...) [ combined perhaps up to another 10x] - room for optimisation of the application(s), or inherent serial limitations [maybe from 1x to very big number discrepancy, let's go with another 10x] So all-told with guesstimates probably 1000x discrepancy is easily plausible, just among known factors, and there may be more... That's completely ignoring the possibility of hard-logic flaws, which are entirely possible, even likely. That all will probably be dependant on some of the customisations and special situations here at Albert, stil lto be examined in context. Thes include if & how some apps are 'pure GPU' or otherwise packages of multiple CPU-like tasks, how they are wired in and scaled. On the surface it looks like there may well be multiple of these things in play. These all seem connected, so I suspect we'll need to make patch one multi-step, so we make switch in one small change at a time, and watch for weird interactions. For example, Fix CPU side coarse scale error then watch GPU estimate &/or credit blow-out in response... LoL On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question. - C Babbage ID: 112980 · Reply Quote

Eyrie Send message Joined: 20 Feb 14 Posts: 47 Credit: 2,410 RAC: 0	Message 112984 - Posted: 17 Jun 2014, 10:35:30 UTC - in response to Message 112978. ... in case they explain anything - I don't think they do. ... Well explore any possibility for sure. From an engineering perspective, understanding the spoon doing the stirring is probably a good idea [though asking it what it's doing might not reveal much :) ]. Probably we won't try getting rid of the spoon, but instead what it's mixing... fine granulated sugar and some honey, instead of a solid hunk of extra thick molasses. 'There is no spoon' ;) Queen of Aliasses, wielder of the SETI rolling pin, Mistress of the red shoes, Guardian of the orange tree, Slayer of very small dragons. ID: 112984 · Reply Quote

Eyrie Send message Joined: 20 Feb 14 Posts: 47 Credit: 2,410 RAC: 0	Message 112985 - Posted: 17 Jun 2014, 10:53:20 UTC - in response to Message 112980. Nvidia GPU: <flops>12454544406626.100000</flops> <plan_class>BRP5-cuda32-nv301</plan_class> That's 12454 GFlops or 12,45 TeraFlops! Boinc reports it @ 2985 GFlops peak in the startup messages. And the APR for the BRP4G Nvidia tasks is 58.1 GFlops when running 2 at a time. If the BRP5 app gets the same APR then the initial speed estimate is that the card is 214 times as fast as it actually is!!! A follow up on my post about initial estimates, the 11th BRP5 task has now been validated and the APR has been calculated to 30.16 GFlops when running 2 tasks at a time. So the initial estimate was that the card was a whooping 412,9 times faster than actual! =O Yeah, for my 780 initial guesstimates were 3 seconds, and came in about an hour, so about a thousand-fold discrepancy. Room for optimisation in the application doesn't account for the whole discrepancy (of course :) ). looks like there'll be some digging along these lines to do. Just some factors to look at with new GPU apps & hosts might include: - CPU app underclaim becomes global scaling reference, we know these leave out SIMD (sse->avx) on raw claims, so look more efficient than they are, because certain numbers come out 'impossible' if the logic were right (e.g. pfc_scale < 1) [perhaps up to 10x discrepancy) - GPU overestimate of performance (attempting to be generous) [could be in the ballpark of another 10x discrepancy] - Actual utilisation variation ( e.g. from people using their machines, multiple tasks per GPU...more...) [ combined perhaps up to another 10x] - room for optimisation of the application(s), or inherent serial limitations [maybe from 1x to very big number discrepancy, let's go with another 10x] So all-told with guesstimates probably 1000x discrepancy is easily plausible, just among known factors, and there may be more... That's completely ignoring the possibility of hard-logic flaws, which are entirely possible, even likely. That all will probably be dependant on some of the customisations and special situations here at Albert, stil lto be examined in context. Thes include if & how some apps are 'pure GPU' or otherwise packages of multiple CPU-like tasks, how they are wired in and scaled. On the surface it looks like there may well be multiple of these things in play. These all seem connected, so I suspect we'll need to make patch one multi-step, so we make switch in one small change at a time, and watch for weird interactions. For example, Fix CPU side coarse scale error then watch GPU estimate &/or credit blow-out in response... LoL Must be picking up extra factors from somewhere - might be due to the fact that (as Eric stated) there is no damping of GPu by CPU figures. But there is one or more massive scaling errors lurking around. Dang. That code is a nightmare to walk. might not all be flops scaling error of course, need to cast an eye over the rsc_fpops_est calculation as well... Queen of Aliasses, wielder of the SETI rolling pin, Mistress of the red shoes, Guardian of the orange tree, Slayer of very small dragons. ID: 112985 · Reply Quote

Richard Haselgrove Send message Joined: 10 Dec 05 Posts: 450 Credit: 5,409,572 RAC: 0	Message 112986 - Posted: 17 Jun 2014, 11:22:14 UTC - in response to Message 112985. Last modified: 17 Jun 2014, 11:29:40 UTC might not all be flops scaling error of course, need to cast an eye over the rsc_fpops_est calculation as well... Since I'm not running anonymous platform, rsc_fpops_est isn't being scaled. I'm seeing fpops_est 280,000,000,000,000 FLOPs 69,293,632,242 APR 69.29 GFLOPS Order-of-magnitude sanity check: 280 e12 fpops 70 e9 flops/sec --> 4 e3 runtime estimate. Correct on both server and client, and sane. Edit: identical card in same host has APR of 156.36 at GPUGrid. I'm running one task at a time there, and two at a time here, so I'd say that the speed estimates agree. IOW, runtime estimates are OK once the initial gross mis-estimation has been overcome: this bit of the quest is for credit purposes only. ID: 112986 · Reply Quote

Eyrie Send message Joined: 20 Feb 14 Posts: 47 Credit: 2,410 RAC: 0	Message 112987 - Posted: 17 Jun 2014, 11:34:24 UTC - in response to Message 112986. Ta. Looks like fpops estimate isn't too shabby then. But you need to exclude that contribution. Queen of Aliasses, wielder of the SETI rolling pin, Mistress of the red shoes, Guardian of the orange tree, Slayer of very small dragons. ID: 112987 · Reply Quote