Project server code update

WARNING: This website is obsolete! Please follow this link to get to the new Albert@Home website!

Author	Message
Claggy Joined: 29 Dec 06 Posts: 78 Credit: 4,040,969 RAC: 0	Message 113219 - Posted: 2 Jul 2014, 23:27:23 UTC - in response to Message 113217. Yeah, I've got something similar: 13.01 cr for 150 minutes of HD7770 work: https://albert.phys.uwm.edu/workunit.php?wuid=619367 Claggy

juan BFB Joined: 10 Dec 12 Posts: 8 Credit: 1,674,320 RAC: 0	Message 113220 - Posted: 3 Jul 2014, 0:39:02 UTC Last modified: 3 Jul 2014, 0:41:49 UTC OK. I will start one host at a time to see what is happening; that is going to take some days, since the caches are already loaded.

Snow Crash Joined: 11 Aug 13 Posts: 10 Credit: 5,011,603 RAC: 0	Message 113221 - Posted: 3 Jul 2014, 1:37:49 UTC Last modified: 3 Jul 2014, 1:38:22 UTC
July 3, 29, 2014 04:00 UTC (switched to BRP5)
https://albert.phys.uwm.edu/show_host_detail.php?hostid=9649
BRP5 2x using 1 cpu thread each (app_config), GPU utilization = 92%
running an additional 4x Skynet POGs cpu WUs
GPU 7950 mem=1325, gpu=1150, pcie v2 x16
OS Win7 x64 Home Premium
CPU 980X running at 3.41 GHz with HT off
MEM Triple channel 1600 (7.7.7.20.2)
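For anyone wanting to copy the "BRP5 2x using 1 cpu thread each (app_config)" setup above, here is a minimal sketch that writes the app_config.xml text. The element names (app_config, app, name, gpu_versions, gpu_usage, cpu_usage) are BOINC's client-side app_config format; the app name "einsteinbinary_BRP5" is an assumption, so check client_state.xml for the name your project actually uses.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, tasks_per_gpu: int, cpus_per_task: float) -> str:
    """Return app_config.xml text that runs roughly `tasks_per_gpu` tasks per GPU."""
    if tasks_per_gpu < 1:
        raise ValueError("tasks_per_gpu must be at least 1")
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    gpu = ET.SubElement(app, "gpu_versions")
    # A GPU task claiming 0.5 of a GPU lets the client fit two on one card.
    ET.SubElement(gpu, "gpu_usage").text = str(1 / tasks_per_gpu)
    # CPU budget reserved for each GPU task (one full thread in the post above).
    ET.SubElement(gpu, "cpu_usage").text = str(cpus_per_task)
    return ET.tostring(root, encoding="unicode")


# "einsteinbinary_BRP5" is an assumed app name, used here for illustration only.
print(build_app_config("einsteinbinary_BRP5", tasks_per_gpu=2, cpus_per_task=1.0))
```

Generating the file keeps gpu_usage as the reciprocal of the task count, which is the number people usually get wrong when switching between 1x and 2x.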

juan BFB Joined: 10 Dec 12 Posts: 8 Credit: 1,674,320 RAC: 0	Message 113222 - Posted: 3 Jul 2014, 8:01:31 UTC - in response to Message 113217. Last modified: 3 Jul 2014, 8:02:21 UTC Quote: "Well, here's the first conundrum: All Binary Radio Pulsar Search (Perseus Arm Survey) tasks for computer 5367. After 200 minutes of solid GTX 670 work on Perseus, I earn the princely sum of ... 15 credits!" I get almost the same 15 cr for 10k to 20k secs of running time with a 690. That's what I would call "credit deflation": https://albert.phys.uwm.edu/results.php?hostid=10352&offset=0&show_names=0&state=4&appid=27
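To put a number on "credit deflation", the credit figures quoted so far can be turned into credits per hour and compared with BOINC's cobblestone definition (200 credits per day for a host sustaining 1 GFLOPS). This is a rough sketch: it treats each task as if it had the GPU to itself and ignores concurrency and CPU time.

```python
# BOINC's cobblestone: 200 credits per day for a host sustaining 1 GFLOPS.
CREDITS_PER_GFLOPS_DAY = 200.0


def credit_rate(credits: float, minutes: float) -> tuple:
    """Return (credits per hour, GFLOPS-equivalent) for a single task."""
    per_hour = credits / (minutes / 60.0)
    return per_hour, per_hour * 24.0 / CREDITS_PER_GFLOPS_DAY


# Figures quoted in the thread; concurrency and CPU time are ignored.
for label, credits, minutes in (
    ("HD7770, WU 619367", 13.01, 150),
    ("GTX 670, host 5367", 15.0, 200),
):
    per_hour, gflops = credit_rate(credits, minutes)
    print(f"{label}: {per_hour:.2f} cr/h, about {gflops:.2f} GFLOPS equivalent")
```

For the GTX 670 figures this works out to 4.5 credits per hour, which the cobblestone yardstick equates to roughly 0.54 GFLOPS sustained: a tiny fraction of what the card is rated for.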

jason_gee Joined: 4 Jun 14 Posts: 109 Credit: 1,043,639 RAC: 0	Message 113223 - Posted: 3 Jul 2014, 13:27:48 UTC Yeah, this looks a lot like the sort of discrepancies I see in simulations. It will definitely be worth putting a 1.4 app onramp into the spreadsheets, to see how well the models reflect reality. On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question. - C Babbage

Richard Haselgrove Joined: 10 Dec 05 Posts: 450 Credit: 5,409,572 RAC: 0	Message 113225 - Posted: 3 Jul 2014, 22:38:23 UTC - in response to Message 113223. Most of the BRP5 'Perseus Arm' tasks I've seen so far belong to old WUs which have been lying around in the database for some time, with multiple failures. I'm not sure whether anybody has checked if that affects the credit granting process, even if only through the averages shifting between initial creation and final validation. I don't think it does, because I don't think anything about the prevailing averages is stored in the task record when it's created from the WU, but I haven't looked at the database schema or the code. But I've just validated the first 'clean' case, with only two replications: WU 625789, for 12.62 credits.

juan BFB Joined: 10 Dec 12 Posts: 8 Credit: 1,674,320 RAC: 0	Message 113226 - Posted: 4 Jul 2014, 1:37:23 UTC Last modified: 4 Jul 2014, 1:38:07 UTC Richard, the WU you mention was validated against one of my hosts, also with a 670. Something caught my attention: the crunching times. Yours take about 12k secs, mine 7.5k secs. I run 1 WU at a time, and my 670 (EVGA FTW) is driven by a slow i5 versus your powerful i7. Can you tell me why the times differ, since both GPUs are relatively similar? BTW, the 12,62 credits received are really amazing. :)

Richard Haselgrove Joined: 10 Dec 05 Posts: 450 Credit: 5,409,572 RAC: 0	Message 113227 - Posted: 4 Jul 2014, 6:46:08 UTC - in response to Message 113226. That seems simple - I'm running two at a time, so effective throughput would be one task every 6k seconds (on your figures - I haven't looked at the data for BRP5 in any detail yet). The efficiency gain from running two together is probably more significant than the i5/i7 difference.
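The "one task every 6k seconds" arithmetic is just wall time divided by the number of tasks sharing the card. A small sketch using the approximate run times quoted in the thread, assuming both tasks of a pair take about the same wall time and the GPU is never idle:

```python
def seconds_per_completion(wall_seconds: float, concurrent: int) -> float:
    """Average gap between finished tasks when `concurrent` tasks share one GPU."""
    if concurrent < 1:
        raise ValueError("concurrent must be at least 1")
    return wall_seconds / concurrent


# Approximate run times quoted in the thread (GTX 670 in both hosts).
single = seconds_per_completion(7_500, 1)   # i5 host, one task at a time
paired = seconds_per_completion(12_000, 2)  # i7 host, two tasks at a time
print(single, paired, round(single / paired, 2))  # 7500.0 6000.0 1.25
```

On these figures the 2x host completes about 25% more work per hour, even though each individual task looks slower.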

juan BFB Joined: 10 Dec 12 Posts: 8 Credit: 1,674,320 RAC: 0	Message 113228 - Posted: 4 Jul 2014, 10:02:32 UTC Last modified: 4 Jul 2014, 10:03:14 UTC Thanks, yes, that easily explains the crunching time differences. It seems I misunderstood something again. I had the idea we were asked, for the test period, to run 1 WU at a time to avoid any noise carrying over from one task to the other.

Richard Haselgrove Joined: 10 Dec 05 Posts: 450 Credit: 5,409,572 RAC: 0	Message 113229 - Posted: 4 Jul 2014, 13:00:04 UTC - in response to Message 113228. Sorry about that. We've all been pretty much making it up as we go along. I think I made that choice some time before somebody else posted the "one at a time" suggestion: I decided it was better to keep "steady as she goes", since there would be more noise in the results if I kept changing the utilisation factor. Most of the time while running Arecibo tasks I got an incredibly stable run time: that counts for more in extended tests, where it's the measured APR that counts, and little (if any) weight is given to the theoretical "peak GFLOPS" the card is capable of.
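To illustrate why a steady utilisation factor matters once the measured APR takes over from peak GFLOPS, here is a deliberately simplified model. It assumes APR is the reciprocal of the mean run time per estimated flop, and that a host falls back to peak flops until it has a hypothetical MIN_HOST_SAMPLES results; BOINC's real bookkeeping may differ in detail, and all inputs are invented.

```python
from statistics import mean, pstdev

# Hypothetical threshold: samples needed before the measured rate replaces peak flops.
MIN_HOST_SAMPLES = 10


def apr_flops(fpops_est: float, elapsed: list) -> float:
    """Simplified APR: reciprocal of the mean run time per estimated flop."""
    return 1.0 / mean(t / fpops_est for t in elapsed)


def projected_flops(peak_flops: float, fpops_est: float, elapsed: list) -> float:
    """Fall back to peak flops during the onramp, then use the measured APR."""
    if len(elapsed) < MIN_HOST_SAMPLES:
        return peak_flops
    return apr_flops(fpops_est, elapsed)


def spread(elapsed: list) -> float:
    """Coefficient of variation of run times; 0.0 means perfectly steady."""
    return pstdev(elapsed) / mean(elapsed)


# Made-up inputs for illustration only, not measurements from the thread.
FPOPS_EST = 1.0e15
PEAK_FLOPS = 2.5e12
steady = [12_000.0 + d for d in (0, 40, -25, 10, -5, 30, -20, 15, -10, 5)]
mixed = [12_000.0, 7_500.0] * 5  # utilisation switched between 2x and 1x

for label, runs in (("steady", steady), ("mixed", mixed)):
    gflops = projected_flops(PEAK_FLOPS, FPOPS_EST, runs) / 1e9
    print(f"{label}: projected {gflops:.1f} GFLOPS, run-time spread {spread(runs):.3f}")
```

Switching between 1x and 2x mid-test mixes two run-time populations, so the spread jumps and the averaged rate describes neither configuration well.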

Holmis Joined: 4 Jan 05 Posts: 104 Credit: 2,104,736 RAC: 0	Message 113230 - Posted: 4 Jul 2014, 20:00:18 UTC Last modified: 4 Jul 2014, 20:01:37 UTC Got my 4th validation for the v1.40 BRP5 app earlier today, and credits are on the rise: the first two got 12,62, the 3rd got 12,73 and the 4th a whopping 15,41! The 12,73 one was against Richard, both of us running v1.40, and the last one was an older WU against Snow Crash on v1.39.

Richard Haselgrove Joined: 10 Dec 05 Posts: 450 Credit: 5,409,572 RAC: 0	Message 113233 - Posted: 5 Jul 2014, 21:52:29 UTC - in response to Message 113230. The server seems to have accepted that the 'conservative' values for BRP5 v1.40 were correct after all:
[AV#934] (BRP5-cuda32-nv301) adjusting projected flops based on PFC avg: 19.76G
According to WU 619924, the figures for v1.39 were rather different.

jason_gee Joined: 4 Jun 14 Posts: 109 Credit: 1,043,639 RAC: 0	Message 113234 - Posted: 6 Jul 2014, 6:02:42 UTC - in response to Message 113233. Last modified: 6 Jul 2014, 6:24:07 UTC Yeah, I see it: there are 3 app_versions in the same app id, so it'll do its wacky averaging thing [aka 'normalisation', but not] to create a min_avg_pfc. [Edit:] Ugh, a lot more than 3; make that ~22. Since a number of those older ones are well beyond their 100 samples, this will have ramifications for the codewalking, because the nvers thresholds for scaling will be engaged.
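A toy model of the cross-version averaging described above may help with the codewalking. It is a deliberately simplified stand-in, not the BOINC server source: it assumes min_avg_pfc is simply the lowest average PFC among app_versions that have reached the 100-sample threshold mentioned in the thread, and that only those mature versions get scaled toward it. Version names and figures are invented.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

MIN_VERSION_SAMPLES = 100  # per the thread: no scaling below 100 samples


@dataclass
class AppVersion:
    name: str
    pfc_samples: int  # validated results contributing to the average
    pfc_avg: float    # average PFC value (arbitrary units in this sketch)


def min_avg_pfc(versions: List[AppVersion]) -> Optional[float]:
    """Lowest average PFC among versions that have enough samples."""
    mature = [v.pfc_avg for v in versions if v.pfc_samples >= MIN_VERSION_SAMPLES]
    return min(mature) if mature else None


def version_scales(versions: List[AppVersion]) -> Dict[str, float]:
    """Scale mature versions down to the minimum; leave young versions at 1.0."""
    floor = min_avg_pfc(versions)
    scales = {}
    for v in versions:
        if floor is None or v.pfc_samples < MIN_VERSION_SAMPLES:
            scales[v.name] = 1.0
        else:
            scales[v.name] = floor / v.pfc_avg
    return scales


# Invented figures: two mature versions and a brand-new one still in its onramp.
versions = [
    AppVersion("old-gpu", pfc_samples=5_000, pfc_avg=2.0),
    AppVersion("older-cpu", pfc_samples=900, pfc_avg=4.0),
    AppVersion("new-gpu", pfc_samples=12, pfc_avg=1.0),
]
print(min_avg_pfc(versions), version_scales(versions))
```

In this toy model the new version's lower average is ignored until it crosses the threshold, so the older, mature versions set the floor: one way ~22 legacy app_versions could dominate what a fresh v1.40 sees.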

Richard Haselgrove Joined: 10 Dec 05 Posts: 450 Credit: 5,409,572 RAC: 0	Message 113235 - Posted: 6 Jul 2014, 7:15:58 UTC - in response to Message 113234. WU 618702 looks perkier - v1.39/v1.40 cross-validation.

jason_gee Joined: 4 Jun 14 Posts: 109 Credit: 1,043,639 RAC: 0	Message 113236 - Posted: 6 Jul 2014, 8:11:07 UTC - in response to Message 113235. That's certainly more like the credits I expected from the models. I suspect that the cross-app normalisation / averaging business may be quite valid, even needed, for credit purposes. It just royally screws with the time estimates before a new host/app version engages host scaling (which we've been calling onramp periods). Rectifying that will probably need all our walkthrough efforts compared in detail to fill any knowledge gaps, but seeing something resembling expected behaviour is a good start. Having no incorrectly scaled CPU app to contend with in the mix means the credit part should be around the right region, even if quite noisy & prone to destabilisation.

Eyrie Joined: 20 Feb 14 Posts: 47 Credit: 2,410 RAC: 0	Message 113246 - Posted: 7 Jul 2014, 7:56:46 UTC - in response to Message 113234. Last modified: 7 Jul 2014, 8:06:21 UTC Oh F***. To be fair, we did ask for details to be inherited by new versions, to limit the onramp damage. It probably does the opposite of what would be clever. edit: An app_version doesn't get scaled until it has 100 samples, but it may be picking up scaling in other parts. Queen of Aliasses, wielder of the SETI rolling pin, Mistress of the red shoes, Guardian of the orange tree, Slayer of very small dragons.

jason_gee Joined: 4 Jun 14 Posts: 109 Credit: 1,043,639 RAC: 0	Message 113247 - Posted: 7 Jul 2014, 8:34:21 UTC - in response to Message 113246. Yeah, a cross-check of walkthroughs should help. The big problem is that there are at least 16 possible general starting states, multiplied across wingmen into many combinations. I'm going to resist the temptation to model all 256 base combinations, and instead think in terms of reducing the number of states: for example, correcting the system in places so that CPU & GPU are considered the same much earlier in the sequence, removing the need for onramps, and perhaps even considering whether stock & anon are really different enough to warrant completely separate codepaths, as they have in places.
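The 16-to-256 blow-up, and the payoff from merging code paths, can be shown with a quick enumeration. The four binary factors below are hypothetical: the post only says "at least 16" states and names CPU/GPU, stock/anon and onramps as candidates, so treat this as an illustration of the counting, not a list of the real states.

```python
from itertools import product

# Hypothetical factors: the thread says "at least 16" states but does not list them.
FACTORS = {
    "device": ("CPU", "GPU"),
    "install": ("stock", "anonymous platform"),
    "host_scaling": ("onramp", "engaged"),
    "version_scaling": ("onramp", "engaged"),
}


def starting_states(factors):
    """All combinations of factor values for one task."""
    return list(product(*factors.values()))


def wingman_pairs(states):
    """Ordered (task, wingman) pairs for a two-replica workunit."""
    return list(product(states, repeat=2))


def without(factors, name):
    """Drop a factor once it no longer selects a separate code path."""
    return {k: v for k, v in factors.items() if k != name}


states = starting_states(FACTORS)
print(len(states), len(wingman_pairs(states)))  # 16 256

merged = starting_states(without(FACTORS, "device"))
print(len(merged), len(wingman_pairs(merged)))  # 8 64
```

Because pairs grow with the square of the state count, treating CPU and GPU identically early on would cut the modelling work by a factor of four, not two.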

Eyrie Joined: 20 Feb 14 Posts: 47 Credit: 2,410 RAC: 0	Message 113250 - Posted: 11 Jul 2014, 6:58:09 UTC Last modified: 11 Jul 2014, 7:24:05 UTC We've just got a fresh release of FGRP, version 1.12. The apps are identical to 1.11. This _should_ solve the time limit exceeded problem, but more bugs may be lurking. edit: you may have to opt in for the app. edit2: To be more precise, you may have to allow both beta apps and FGRP. If anybody runs into further -197 time limit exceeded errors with FGRP [or any other app], please report them ASAP. Please always include the host ID. We can glean most variables from database dumps now, but if you can also state your peak_flops (from the BOINC startup messages), that would be very helpful. We have more or less finished the analysis and are contemplating how best to address the issues we identified as problem areas from the live run. You can only do so much from theory [i.e. code reading]; you always need the actual data too, to get a complete picture.
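Why peak_flops matters for -197 errors: as far as I understand the client, it aborts a task once elapsed time exceeds the workunit's rsc_fpops_bound divided by the speed estimate it was given, so an overestimated speed shrinks the allowed run time. A minimal sketch with invented numbers (the bound and flops values are illustrative, not taken from any real workunit):

```python
EXIT_TIME_LIMIT_EXCEEDED = 197  # shown as -197 (0xc5) in task reports


def max_elapsed_seconds(rsc_fpops_bound: float, projected_flops: float) -> float:
    """Run-time ceiling derived from the workunit's flop bound and the speed estimate."""
    return rsc_fpops_bound / projected_flops


def outcome(elapsed: float, rsc_fpops_bound: float, projected_flops: float) -> int:
    """0 if the task finishes inside the ceiling, else the time-limit error."""
    if elapsed > max_elapsed_seconds(rsc_fpops_bound, projected_flops):
        return EXIT_TIME_LIMIT_EXCEEDED
    return 0


# Invented numbers: the same 12,000 s task under two speed estimates.
BOUND = 2.0e16
for flops in (1.0e12, 2.0e12):  # realistic vs. overestimated (near peak_flops)
    limit = max_elapsed_seconds(BOUND, flops)
    print(f"flops {flops:.1e}: limit {limit:.0f} s, exit {outcome(12_000, BOUND, flops)}")
```

Doubling the speed estimate halves the ceiling, which is why a host whose projected flops sit close to its peak_flops is the first to hit -197.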

Richard Haselgrove Joined: 10 Dec 05 Posts: 450 Credit: 5,409,572 RAC: 0	Message 113252 - Posted: 11 Jul 2014, 8:39:39 UTC - in response to Message 113250. Please cross-refer to thread 'Errors - 197 (0xc5) EXIT_TIME_LIMIT_EXCEEDED' in the 'Problems and bug reports' area before carrying out the tests that Eyrie requested.