Project server code update |
Author | Message |
---|---|
Claggy Joined: 29 Dec 06 Posts: 78 Credit: 4,040,969 RAC: 0 |
Yeah, I've got something similar, 13.01 cr for 150 minutes of HD7770 work: https://albert.phys.uwm.edu/workunit.php?wuid=619367 Claggy |
juan BFB Joined: 10 Dec 12 Posts: 8 Credit: 1,674,320 RAC: 0 |
OK. I will start one host at a time to see what is happening; that is going to take some days since the caches are already loaded. |
Snow Crash Joined: 11 Aug 13 Posts: 10 Credit: 5,011,603 RAC: 0 |
July 3, 29, 2014 04:00 UTC (switched to BRP5) https://albert.phys.uwm.edu/show_host_detail.php?hostid=9649 BRP5 2x using 1 cpu thread each (app_config), GPU utilization = 92%, running an additional 4x Skynet POGs cpu WUs. GPU: 7950, mem=1325, gpu=1150, pcie v2 x16. OS: Win7 x64 Home Premium. CPU: 980X running at 3.41 GHz with HT off. MEM: Triple channel 1600 (7.7.7.20.2) |
juan BFB Joined: 10 Dec 12 Posts: 8 Credit: 1,674,320 RAC: 0 |
Well, here's the first conundrum: almost the same 15 cr for 10k to 20k secs of running time with a 690. That's what I could call "credit deflation". https://albert.phys.uwm.edu/results.php?hostid=10352&offset=0&show_names=0&state=4&appid=27 |
jason_gee Joined: 4 Jun 14 Posts: 109 Credit: 1,043,639 RAC: 0 |
Yeah, looks a lot like the sort of discrepancies I see in simulations. Will definitely be worth putting a 1.4 app onramp into the spreadsheets, to see how well the models reflect reality. On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question. - C Babbage |
Richard Haselgrove Joined: 10 Dec 05 Posts: 450 Credit: 5,409,572 RAC: 0 |
Most of the BRP5 'Perseus Arm' tasks I've seen so far have been old WUs which have been lying around in the database for some time, with multiple failures - not sure whether anybody has looked to see if that affects the credit granting process - even if only by the averages shifting between initial creation and final validation (I don't think so, because I don't think anything about the prevailing averages is stored into the task record when it's created from the WU - but I haven't looked at the database schema or the code). But I've just validated the first 'clean', two-replications-only case: WU 625789, for 12.62 credits. |
juan BFB Joined: 10 Dec 12 Posts: 8 Credit: 1,674,320 RAC: 0 |
Richard, the WU you talk about was validated against one of my hosts, also with a 670. Something caught my attention: the crunching times. Yours takes about 12k secs, mine 7.5k secs. I run 1 WU at a time and my 670 (EVGA FTW) is powered by a slow i5 vs your powerful i7. Can you tell me why the times differ, since both GPUs are relatively similar? BTW, the 12.62 credits received are really amazing. :) |
Richard Haselgrove Joined: 10 Dec 05 Posts: 450 Credit: 5,409,572 RAC: 0 |
That seems simple - I'm running two at a time, so effective throughput would be one task every 6k seconds (on your figures - I haven't looked at the data for BRP5 in any detail yet). The efficiency gain from running two together is probably more significant than the i5/i7 difference. |
juan BFB Joined: 10 Dec 12 Posts: 8 Credit: 1,674,320 RAC: 0 |
Thanks, yes, that easily explains the crunching time differences. Seems like I misunderstood something again. I had the idea we were asked to run 1 WU at a time during the test period, to avoid any noise transferring from one task to the other. |
Richard Haselgrove Joined: 10 Dec 05 Posts: 450 Credit: 5,409,572 RAC: 0 |
"Thanks, yes, that easily explains the crunching time differences. Seems like I misunderstood something again. I had the idea we were asked to run 1 WU at a time during the test period, to avoid any noise transferring from one task to the other." Sorry about that. We've all been pretty much making it up as we go along. I think I made that choice some time before somebody else posted the "one at a time" suggestion: I decided it was better to keep "steady as she goes" - there would be more noise in the results if you keep changing the utilisation factor. Most of the time while running Arecibo tasks I got an incredibly stable run time: that counts for more in extended tests, where it's the measured APR that counts, and little (if any) weight is given to the theoretical "peak GFLOPS" the card is capable of. |
Holmis Joined: 4 Jan 05 Posts: 104 Credit: 2,104,736 RAC: 0 |
Got my 4th validation for the v1.40 BRP5 app in earlier today and credits are on the rise: the first two got 12.62, the 3rd got 12.73 and the 4th a whopping 15.41! The 12.73 one was against Richard, both running v1.40, and the last one was an older WU against Snow Crash on v1.39. |
Richard Haselgrove Joined: 10 Dec 05 Posts: 450 Credit: 5,409,572 RAC: 0 |
The server seems to have accepted that the 'conservative' values for BRP5 v1.40 were correct after all: [AV#934] (BRP5-cuda32-nv301) adjusting projected flops based on PFC avg: 19.76G According to WU 619924, the figures for v1.39 were rather different. |
jason_gee Joined: 4 Jun 14 Posts: 109 Credit: 1,043,639 RAC: 0 |
"The server seems to have accepted that the 'conservative' values for BRP5 v1.40 were correct after all:" Yeah, I see it with 3 app_versions in the same app id, so it'll do its wacky averaging thing [aka 'normalisation', but not], to create a min_avg_pfc. [Edit:] Ugh, a lot more than 3, make that ~22. Since a number of those older ones are well beyond their 100 samples, this will have ramifications for the codewalking, because nvers thresholds for scaling will be engaged. |
Richard Haselgrove Joined: 10 Dec 05 Posts: 450 Credit: 5,409,572 RAC: 0 |
WU 618702 looks perkier - v1.39/v1.40 cross-validation. |
jason_gee Joined: 4 Jun 14 Posts: 109 Credit: 1,043,639 RAC: 0 |
"WU 618702 looks perkier - v1.39/v1.40 cross-validation." That's certainly more like the credits I expected from the models. I suspect that the cross-app normalisation / averaging business may be quite valid/needed for credit purposes. It just royally screws with the time estimates before a new host/app version engages host scaling (which we've been calling onramp periods). Rectifying that will probably need all our walkthrough efforts compared in detail to fill any knowledge gaps, but basically seeing something resembling expected behaviour is a good start. Having no incorrectly scaled CPU app to contend with in the mix means the credit part should be around the right region, even if quite noisy & prone to destabilisation. |
Eyrie Joined: 20 Feb 14 Posts: 47 Credit: 2,410 RAC: 0 |
"The server seems to have accepted that the 'conservative' values for BRP5 v1.40 were correct after all:" Oh F***, to be fair we did ask for details to be inherited by new versions, to limit the onramp damage. Probably does the opposite of what would be clever. edit: app_version doesn't get scaled until it has 100 samples, but it may be picking up scaling in other parts. Queen of Aliasses, wielder of the SETI rolling pin, Mistress of the red shoes, Guardian of the orange tree, Slayer of very small dragons. |
jason_gee Joined: 4 Jun 14 Posts: 109 Credit: 1,043,639 RAC: 0 |
Yeah, cross check of walkthroughs should help. Big problem is at least 16 possible general starting states, multiplied across wingmen for many combinations. I'm going to resist the temptation to model all 256 base combinations, and think in terms of reducing the number of states... for example, correct the system in places so that CPU & GPU are treated the same much earlier in the sequence, remove the need for onramps, and perhaps even consider whether stock & anon are really different enough to warrant completely separate codepaths, as they have in places. |
Eyrie Joined: 20 Feb 14 Posts: 47 Credit: 2,410 RAC: 0 |
We've just got a fresh release of FGRP, version 1.12. The apps are identical to 1.11. This _should_ solve the time limit exceeded problem, but more bugs may be lurking. edit: you may have to opt in for the app. edit2: To be more precise, you may have to allow both beta apps and FGRP. Anybody who runs into further -197 time limit exceeded errors with FGRP [or any other app], please report ASAP. Please always include the host ID - we can glean most variables from database dumps now, but if you can also state your peak_flops (from BOINC startup messages) that would be very helpful. We have more or less finished analysis and are contemplating how we can best address the issues that the live run established as problem areas. You can only do so much from the theory [i.e. code reading]; you always need the actual data too, to get a complete picture. |
Richard Haselgrove Joined: 10 Dec 05 Posts: 450 Credit: 5,409,572 RAC: 0 |
Please cross-refer to thread 'Errors - 197 (0xc5) EXIT_TIME_LIMIT_EXCEEDED' in the 'Problems and bug reports' area before carrying out the tests that Eyrie requested. |
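
The run-time comparison between Richard's and juan's hosts earlier in the thread (about 12k secs per task running two at a time, versus about 7.5k secs running one at a time, both for 12.62 credits) can be sketched as simple arithmetic. This is only an illustration under stated assumptions: the function names are hypothetical and not part of BOINC, the figures are the rounded ones quoted in the posts, and it assumes concurrent tasks on one GPU all take the same elapsed time.

```python
def seconds_per_task(elapsed_seconds: float, concurrent_tasks: int) -> float:
    """Effective wall-clock seconds per finished task on one GPU.

    Assumption: every concurrently running task takes the same elapsed time,
    so N tasks finish in the time one of them reports.
    """
    if concurrent_tasks < 1:
        raise ValueError("concurrent_tasks must be at least 1")
    return elapsed_seconds / concurrent_tasks


def credits_per_hour(credit_per_task: float, elapsed_seconds: float,
                     concurrent_tasks: int) -> float:
    """Credit earned per hour of GPU wall-clock time (hypothetical helper)."""
    effective = seconds_per_task(elapsed_seconds, concurrent_tasks)
    return credit_per_task * 3600.0 / effective


# Rounded figures quoted in the thread; host labels are just for display.
hosts = {
    "Richard (2 at a time)": (12000.0, 2),
    "juan (1 at a time)": (7500.0, 1),
}

for name, (elapsed, concurrent) in hosts.items():
    effective = seconds_per_task(elapsed, concurrent)
    rate = credits_per_hour(12.62, elapsed, concurrent)
    print(f"{name}: {effective:.0f} s per task, {rate:.2f} credits/hour")
```

On these figures the host running two tasks at once finishes a task every 6k seconds, which is the number Richard gives, so its longer per-task run time does not mean lower throughput.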