Project server code update

Claggy

Joined: 29 Dec 06
Posts: 78
Credit: 4,040,969
RAC: 0
Message 113219 - Posted: 2 Jul 2014, 23:27:23 UTC - in response to Message 113217.  

Yea, I've got something similar, 13.01 cr for 150 minutes of HD7770 work:

https://albert.phys.uwm.edu/workunit.php?wuid=619367

Claggy
ID: 113219
juan BFB

Joined: 10 Dec 12
Posts: 8
Credit: 1,674,320
RAC: 0
Message 113220 - Posted: 3 Jul 2014, 0:39:02 UTC
Last modified: 3 Jul 2014, 0:41:49 UTC

OK. I will start one host at a time to see what is happening. That is going to take some days, since the caches are already loaded.
ID: 113220
Snow Crash

Joined: 11 Aug 13
Posts: 10
Credit: 5,011,603
RAC: 0
Message 113221 - Posted: 3 Jul 2014, 1:37:49 UTC
Last modified: 3 Jul 2014, 1:38:22 UTC

July 3, 29, 2014 04:00 UTC (switched to BRP5)
https://albert.phys.uwm.edu/show_host_detail.php?hostid=9649

BRP5   2x using 1 cpu thread each (app_config), GPU utilization = 92%
       running an additional 4x Skynet POGs cpu WUs
GPU    7950 mem=1325, gpu=1150, pcie v2 x16
OS     Win7 x64 Home Premium
CPU    980X running at 3.41 GHz with HT off
MEM    Triple channel 1600 (7.7.7.20.2)
ID: 113221
juan BFB

Joined: 10 Dec 12
Posts: 8
Credit: 1,674,320
RAC: 0
Message 113222 - Posted: 3 Jul 2014, 8:01:31 UTC - in response to Message 113217.  
Last modified: 3 Jul 2014, 8:02:21 UTC

Well, here's the first conundrum:

All Binary Radio Pulsar Search (Perseus Arm Survey) tasks for computer 5367

After 200 minutes of solid GTX 670 work on Perseus, I earn the princely sum of ... 15 credits!

Almost the same here: 15 cr for 10k to 20k secs of running time with a 690. That's what I would call "credit deflation".

https://albert.phys.uwm.edu/results.php?hostid=10352&offset=0&show_names=0&state=4&appid=27
ID: 113222
jason_gee

Joined: 4 Jun 14
Posts: 109
Credit: 1,043,639
RAC: 0
Message 113223 - Posted: 3 Jul 2014, 13:27:48 UTC

Yeah, looks a lot like the sort of discrepancies I see in simulations.

Will definitely be worth putting a 1.4 app onramp into the spreadsheets, to see how well the models reflect reality.
On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question. - C Babbage
ID: 113223
Richard Haselgrove

Joined: 10 Dec 05
Posts: 450
Credit: 5,409,572
RAC: 0
Message 113225 - Posted: 3 Jul 2014, 22:38:23 UTC - in response to Message 113223.  

Most of the BRP5 'Perseus Arm' tasks I've seen so far have come from old WUs which have been lying around in the database for some time, with multiple failures. I'm not sure whether anybody has looked to see if that affects the credit granting process, even if only through the averages shifting between initial creation and final validation. (I don't think it does, because I don't think anything about the prevailing averages is stored in the task record when it's created from the WU, but I haven't looked at the database schema or the code.)

But I've just validated the first 'clean', two replications only case:

WU 625789

For 12.62 credits.
ID: 113225
juan BFB

Joined: 10 Dec 12
Posts: 8
Credit: 1,674,320
RAC: 0
Message 113226 - Posted: 4 Jul 2014, 1:37:23 UTC
Last modified: 4 Jul 2014, 1:38:07 UTC

Richard

The WU you mention was validated against one of my hosts, which also has a 670.

Something caught my attention: the crunching times. Yours take about 12k secs, mine 7.5k secs. I run 1 WU at a time, and my 670 (EVGA FTW) is powered by a slow i5 versus your powerful i7. Can you tell me why the times differ, since both GPUs are relatively similar?

BTW, the 12.62 credits received are really amazing. :)
ID: 113226
Richard Haselgrove

Joined: 10 Dec 05
Posts: 450
Credit: 5,409,572
RAC: 0
Message 113227 - Posted: 4 Jul 2014, 6:46:08 UTC - in response to Message 113226.  

That seems simple - I'm running two at a time, so effective throughput would be one task every 6k seconds (on your figures - I haven't looked at the data for BRP5 in any detail yet). The efficiency gain from running two together is probably more significant than the i5/i7 difference.
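
The arithmetic behind that comparison can be sketched as follows. This is only an illustration using the round figures quoted in the thread (about 12k seconds with two tasks running together, about 7.5k seconds with one); the function name is made up for the example and is not part of BOINC.

```python
def effective_seconds_per_task(elapsed_seconds: float, concurrent_tasks: int) -> float:
    """Average wall-clock seconds per completed task when several tasks share one GPU."""
    if concurrent_tasks < 1:
        raise ValueError("concurrent_tasks must be at least 1")
    return elapsed_seconds / concurrent_tasks


# Round figures from the thread, not measured values.
two_at_a_time = effective_seconds_per_task(12_000, 2)  # two tasks sharing the GPU
one_at_a_time = effective_seconds_per_task(7_500, 1)   # a single task on the GPU

print(f"two at a time: {two_at_a_time:.0f} s per task")
print(f"one at a time: {one_at_a_time:.0f} s per task")
```

On these figures the host running two tasks at once finishes a task every 6k seconds on average, slightly ahead of the 7.5k seconds for the host running one at a time, even though each individual task takes longer.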
ID: 113227
juan BFB

Joined: 10 Dec 12
Posts: 8
Credit: 1,674,320
RAC: 0
Message 113228 - Posted: 4 Jul 2014, 10:02:32 UTC
Last modified: 4 Jul 2014, 10:03:14 UTC

Thanks, yes, that easily explains the crunching time differences. Seems like I misunderstood something again. I had the idea we were asked to run 1 WU at a time for the test period, to avoid any noise carrying over from one task to the other.
ID: 113228
Richard Haselgrove

Joined: 10 Dec 05
Posts: 450
Credit: 5,409,572
RAC: 0
Message 113229 - Posted: 4 Jul 2014, 13:00:04 UTC - in response to Message 113228.  

Sorry about that. We've all been pretty much making it up as we go along. I think I made that choice some time before somebody else posted the "one at a time" suggestion: I decided it was better to keep "steady as she goes" - there would be more noise in the results if you keep changing the utilisation factor.

Most of the time while running Arecibo tasks I got an incredibly stable run time: that counts for more in extended tests, where it's the measured APR that counts, and little (if any) weight is given to the theoretical "peak GFLOPS" the card is capable of.
ID: 113229
Holmis

Joined: 4 Jan 05
Posts: 104
Credit: 2,104,736
RAC: 0
Message 113230 - Posted: 4 Jul 2014, 20:00:18 UTC
Last modified: 4 Jul 2014, 20:01:37 UTC

Got my 4th validation for the v1.40 BRP5 app in earlier today, and credits are on the rise: the first two got 12.62, the 3rd got 12.73 and the 4th a whopping 15.41!
The 12.73 one was against Richard, both of us running v1.40, and the last one was an older WU against Snow Crash on v1.39.
ID: 113230
Richard Haselgrove

Joined: 10 Dec 05
Posts: 450
Credit: 5,409,572
RAC: 0
Message 113233 - Posted: 5 Jul 2014, 21:52:29 UTC - in response to Message 113230.  

The server seems to have accepted that the 'conservative' values for BRP5 v1.40 were correct after all:

[AV#934] (BRP5-cuda32-nv301) adjusting projected flops based on PFC avg: 19.76G

According to WU 619924, the figures for v1.39 were rather different.
ID: 113233
jason_gee

Joined: 4 Jun 14
Posts: 109
Credit: 1,043,639
RAC: 0
Message 113234 - Posted: 6 Jul 2014, 6:02:42 UTC - in response to Message 113233.  
Last modified: 6 Jul 2014, 6:24:07 UTC

Yeah I see it with 3 app_versions in the same app id, so it'll do its wacky averaging thing [aka 'normalisation', but not], to create a min_avg_pfc.

[Edit:]
Ugh, a lot more than 3, make that ~22. Since a number of those older ones are well beyond their 100 samples, this will have ramifications for the codewalking, because nvers thresholds for scaling will be engaged.
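
As a rough illustration of the averaging being described, the sketch below takes the lowest average PFC among the app versions that have passed the 100-sample threshold and scales the other eligible versions against it. It is a simplified model under stated assumptions, not the server's code: the class, the field names, the sample data and the exact scaling rule are all hypothetical, and only the 100-sample threshold and the min_avg_pfc name come from the thread.

```python
from dataclasses import dataclass
from typing import List, Optional

MIN_VERSION_SAMPLES = 100  # threshold mentioned in the thread


@dataclass
class AppVersion:
    name: str
    pfc_avg: float  # average peak FLOP count per task (hypothetical values)
    pfc_n: int      # number of validated samples behind that average


def min_avg_pfc(versions: List[AppVersion]) -> Optional[float]:
    """Lowest average PFC among versions with enough samples, or None if there are none."""
    eligible = [v.pfc_avg for v in versions if v.pfc_n >= MIN_VERSION_SAMPLES]
    return min(eligible) if eligible else None


def version_scale(version: AppVersion, versions: List[AppVersion]) -> float:
    """Scale factor for one version; versions below the threshold are left unscaled."""
    floor = min_avg_pfc(versions)
    if floor is None or version.pfc_n < MIN_VERSION_SAMPLES:
        return 1.0
    return floor / version.pfc_avg


# Made-up numbers: two old versions past the threshold, one new version still on its onramp.
versions = [
    AppVersion("v1.39", pfc_avg=4.0e13, pfc_n=250),
    AppVersion("v1.33", pfc_avg=3.0e13, pfc_n=400),
    AppVersion("v1.40", pfc_avg=2.0e13, pfc_n=12),
]

for v in versions:
    print(v.name, version_scale(v, versions))
```

In this toy model the new version contributes nothing to the minimum until it has its 100 samples, so the old versions alone set the reference level, which is the kind of interaction the post is worried about.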
ID: 113234
Richard Haselgrove

Joined: 10 Dec 05
Posts: 450
Credit: 5,409,572
RAC: 0
Message 113235 - Posted: 6 Jul 2014, 7:15:58 UTC - in response to Message 113234.  

WU 618702 looks perkier - v1.39/v1.40 cross-validation.
ID: 113235
jason_gee

Joined: 4 Jun 14
Posts: 109
Credit: 1,043,639
RAC: 0
Message 113236 - Posted: 6 Jul 2014, 8:11:07 UTC - in response to Message 113235.  

That's certainly more like the credits I expected from the models. I suspect that the cross app normalisation / averaging business may be quite valid/needed for credit purposes. It just royally screws with the time estimates before a new host/app version engages host scaling (the periods we've been calling onramps).

Rectifying that will probably need all our walkthrough efforts compared in detail to fill any knowledge gaps, but basically seeing something resembling expected behaviour is a good start. Having no incorrectly scaled CPU app to contend with in the mix means the credit part should be around the right region, even if quite noisy & prone to destabilisation.
ID: 113236
jason_gee

Joined: 4 Jun 14
Posts: 109
Credit: 1,043,639
RAC: 0
Message 113237 - Posted: 6 Jul 2014, 8:11:10 UTC - in response to Message 113235.  
Last modified: 6 Jul 2014, 8:12:04 UTC

... double post
ID: 113237
Eyrie

Joined: 20 Feb 14
Posts: 47
Credit: 2,410
RAC: 0
Message 113246 - Posted: 7 Jul 2014, 7:56:46 UTC - in response to Message 113234.  
Last modified: 7 Jul 2014, 8:06:21 UTC

Oh F***

To be fair, we did ask for details to be inherited by new versions, to limit the onramp damage. It probably does the opposite of what would be clever.

edit: app_version doesn't get scaled until it has 100 samples, but it may be picking up scaling in other parts.
Queen of Aliasses, wielder of the SETI rolling pin, Mistress of the red shoes, Guardian of the orange tree, Slayer of very small dragons.
ID: 113246
jason_gee

Joined: 4 Jun 14
Posts: 109
Credit: 1,043,639
RAC: 0
Message 113247 - Posted: 7 Jul 2014, 8:34:21 UTC - in response to Message 113246.  

Yeah, a cross-check of walkthroughs should help. The big problem is that there are at least 16 possible general starting states, multiplied across wingmen for many combinations. I'm going to resist the temptation to model all 256 base combinations, and think instead in terms of reducing the number of states: for example, correct the system in places so that CPU & GPU are treated the same much earlier in the sequence, remove the need for onramps, and perhaps even consider whether stock & anon are really different enough to warrant the completely separate codepaths they have in places.
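
To make the counting concrete, here is a small sketch of how 16 starting states turn into 256 wingman combinations, and how merging one distinction shrinks the space. The post does not list the dimensions behind the 16 states, so the four two-valued flags below are assumptions picked purely to reach that number; only CPU/GPU, stock/anon and onramps are mentioned in the thread at all.

```python
from itertools import product

# Hypothetical dimensions: four two-valued flags give 2**4 = 16 starting states.
FLAGS = {
    "device": ("cpu", "gpu"),
    "app_source": ("stock", "anon"),
    "host_scaling": ("onramp", "scaled"),
    "version_scaling": ("below_threshold", "above_threshold"),
}


def starting_states(flags):
    """Every combination of flag values, one tuple per starting state."""
    return list(product(*flags.values()))


def wingman_pairs(states):
    """Every ordered pairing of a host's state with its wingman's state."""
    return list(product(states, repeat=2))


states = starting_states(FLAGS)
pairs = wingman_pairs(states)
print(len(states), len(pairs))

# Treating CPU and GPU the same removes one flag and halves the state count.
merged = {name: values for name, values in FLAGS.items() if name != "device"}
merged_states = starting_states(merged)
print(len(merged_states), len(wingman_pairs(merged_states)))
```

Each flag that can be dropped halves the number of states and quarters the number of pairings, which is why reducing states looks more attractive than modelling every combination.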
ID: 113247
Eyrie

Joined: 20 Feb 14
Posts: 47
Credit: 2,410
RAC: 0
Message 113250 - Posted: 11 Jul 2014, 6:58:09 UTC
Last modified: 11 Jul 2014, 7:24:05 UTC

We've just got a fresh release of FGRP to version 1.12. Apps are identical to 1.11. This _should_ solve the time limit exceeded problem, but more bugs may be lurking.

edit: you may have to opt in for the app.
edit2: To be more precise, you may have to allow both beta apps and FGRP.

Anybody who runs into further -197 time limit exceeded errors with FGRP [or any other app], please report ASAP. Please always include the host ID - we can glean most variables from database dumps now, but if you can also state your peak_flops (from BOINC startup messages) that would be very helpful.

We have more or less finished analysis and are contemplating how we can best address the issues that we established as problem areas from the live run. You can only do so much from the theory [i.e. code reading]; you always need the actual data too, to get a complete picture.
ID: 113250
Richard Haselgrove

Joined: 10 Dec 05
Posts: 450
Credit: 5,409,572
RAC: 0
Message 113252 - Posted: 11 Jul 2014, 8:39:39 UTC - in response to Message 113250.  

Please cross-refer to thread 'Errors - 197 (0xc5) EXIT_TIME_LIMIT_EXCEEDED' in the 'Problems and bug reports' area before carrying out the tests that Eyrie requested.
ID: 113252

This material is based upon work supported by the National Science Foundation (NSF) under Grant PHY-0555655 and by the Max Planck Gesellschaft (MPG). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the investigators and do not necessarily reflect the views of the NSF or the MPG.

Copyright © 2024 Bruce Allen for the LIGO Scientific Collaboration