Sending work

Message boards : News : Sending work

Previous · 1 . . . 4 · 5 · 6 · 7

Profile Bernd Machenschalk
Volunteer moderator
Project administrator
Project developer
Joined: 15 Oct 04
Posts: 1956
Credit: 6,218,130
RAC: 0
Message 111574 - Posted: 14 Dec 2011, 22:11:07 UTC - in response to Message 111572.  

This is quite a mission impossible from a BOINC perspective, isn't it? If the initial estimate is too far off, the WUs will ALL get terminated prematurely, and the server will NEVER get a valid result with which to adjust its estimate of the computation performance, which is exactly what it needs to set a good max elapsed time limit in the first place!


Yep, that occurred to me, too.

I already added some code to our plan-class stuff that should allow me to play around with the flops estimation a bit. I intend to do this tomorrow, together with some more analysis of the scheduler code (sched_version.cpp).
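For readers unfamiliar with the scheduler internals: a plan class can scale the host's benchmarked FLOPS into a "projected" speed, which the server then uses for runtime estimates and limits. A minimal sketch of that idea (the struct, field names, and the 20%-of-peak cap are illustrative assumptions, not the actual sched_version.cpp code):

```cpp
#include <algorithm>
#include <cmath>

// Illustrative host data; field names are assumptions,
// not the actual BOINC scheduler structures.
struct HostInfo {
    double p_fpops;        // benchmarked FLOPS of one CPU core
    double gpu_peak_flops; // theoretical peak FLOPS of the GPU
};

// A plan class projects the expected app speed from the host's
// benchmark. Overestimating leads to "maximum elapsed time exceeded"
// aborts; underestimating only makes client-side estimates pessimistic.
double projected_flops(const HostInfo& h, double speedup) {
    double flops = h.p_fpops * speedup;
    // Never project above a fraction of theoretical GPU peak:
    // real OpenCL apps reach only a small part of it.
    return std::min(flops, 0.2 * h.gpu_peak_flops);
}
```

Tuning the `speedup` factor is exactly the kind of "playing around" described above: it directly moves both the client's runtime estimate and the server-side abort threshold.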

BM
ID: 111574
Profile Bikeman (Heinz-Bernd Eggenstein)
Volunteer moderator
Project administrator
Project developer
Joined: 28 Aug 06
Posts: 1483
Credit: 1,864,017
RAC: 0
Message 111575 - Posted: 14 Dec 2011, 22:19:43 UTC
Last modified: 14 Dec 2011, 22:36:02 UTC

@Jord: Well, because I manually fiddled with client_state.xml, a lot of workunits finally completed (and even validated, though I'm not sure that matters), and my card took ca. 4500 s per WU.

I think your card takes around 33k sec to complete a WU, so no matter what the theoretical (computed) peak performance is, the server would be right to assign your card a roughly 7 times lower performance.

Why is your card slower? The debugging output actually gives a hint: because of the physical capabilities of your card, the app was forced to re-size its internal processing layout to make it fit. I'm afraid it will require some deep analysis to find out whether this re-sizing leads to differences in the result of the computation, and whether those differences are tolerable (==> validator adjustment) or intolerable (maybe the re-sizing has a bug).

It would be instructive to see whether the debugging output in question is common to all 4xxx series cards. Ah, but that's a different subject.

@Bernd: my host is now almost out of work (was on nomorework) so I'll give it a try tomorrow or whenever it's ready.

CU
HB
ID: 111575
robertmiles

Joined: 16 Nov 11
Posts: 19
Credit: 4,468,368
RAC: 0
Message 111576 - Posted: 14 Dec 2011, 22:54:54 UTC

Over on GPUGRID, I saw something about them finding that the HD4xxx series cards had some type of memory access problem - a limit on the amount of graphics memory each processor on the GPU can access before it starts using a much lower bandwidth path to the computer's main memory instead. I haven't kept up with whether more recent software updates have removed this restriction.
ID: 111576
Profile pragmatic prancing periodic problem child, left
Joined: 26 Jan 05
Posts: 1639
Credit: 70,000
RAC: 0
Message 111577 - Posted: 14 Dec 2011, 23:14:54 UTC - in response to Message 111575.  

I think your card takes around 33k sec to complete a WU, so no matter what the theoretical (computed) peak performance is, the server would be right to assign your card a roughly 7 times lower performance.

7 or 60? Quite some difference. But OK, I am running with a changed flops value, still only 11 digits long but different from what Albert gave me. Since its estimates are all too low (you're right about the ~32k seconds), I've made it think that the tasks are actually longer, not shorter.

Just too bad I'm still quite busy with Skyrim. That hacks into the time anything else can use the GPU. ;-)
Jord.

BOINC FAQ Service

They say most of your brain shuts down in cryo-sleep. All but the primitive side, the animal side. No wonder I'm still awake.
ID: 111577
Profile Bernd Machenschalk
Volunteer moderator
Project administrator
Project developer
Joined: 15 Oct 04
Posts: 1956
Credit: 6,218,130
RAC: 0
Message 111580 - Posted: 15 Dec 2011, 19:02:22 UTC - in response to Message 111577.  
Last modified: 15 Dec 2011, 19:02:56 UTC

How (projected_)flops is calculated differs significantly depending on how many tasks your host has successfully computed with this app version. Maybe this value differs between your hosts.

Anyway, I did change the scheduler (the "projected_flops" supplied by the plan classes should be much lower now). At least it should not overestimate the actual flops anymore, which could lead to "maximum time exceeded" errors. Time estimates on the client side may be far off now, though. Have a try.

BM
ID: 111580
Profile Bikeman (Heinz-Bernd Eggenstein)
Volunteer moderator
Project administrator
Project developer
Joined: 28 Aug 06
Posts: 1483
Credit: 1,864,017
RAC: 0
Message 111581 - Posted: 15 Dec 2011, 20:44:28 UTC - in response to Message 111580.  
Last modified: 15 Dec 2011, 20:45:37 UTC

Hmmm... I get the same cut-off time as before (even though I reset the Albert project before allowing new work). In addition, the app now seems to be configured to use a full CPU core.

http://albert.phys.uwm.edu/result.php?resultid=66620

CU
HB
ID: 111581
Profile Bernd Machenschalk
Volunteer moderator
Project administrator
Project developer
Joined: 15 Oct 04
Posts: 1956
Credit: 6,218,130
RAC: 0
Message 111582 - Posted: 15 Dec 2011, 21:16:35 UTC - in response to Message 111581.  

Ok, scheduler reverted. Needs further investigation.

BM
ID: 111582
Profile pragmatic prancing periodic problem child, left
Joined: 26 Jan 05
Posts: 1639
Credit: 70,000
RAC: 0
Message 111585 - Posted: 16 Dec 2011, 0:32:33 UTC - in response to Message 111580.  
Last modified: 16 Dec 2011, 0:33:15 UTC

How (projected_)flops is calculated differs significantly depending on how many tasks your host has successfully computed with this app version. Maybe this value differs between your hosts.

LOL, like zero times for me? None of the tasks I do validate, remember?

As for testing your over_flops, what do I do with the extra tasks? Stupid BOINC always fetches 6 tasks, no matter that it then takes ~9 days to do them... Though that does give you ~9 days to come up with a better schedule(r). ;-)
Jord.

BOINC FAQ Service

They say most of your brain shuts down in cryo-sleep. All but the primitive side, the animal side. No wonder I'm still awake.
ID: 111585
robertmiles

Joined: 16 Nov 11
Posts: 19
Credit: 4,468,368
RAC: 0
Message 111586 - Posted: 16 Dec 2011, 0:54:09 UTC
Last modified: 16 Dec 2011, 0:55:32 UTC

Have you thought of starting with a certain number of dummy tasks, to be replaced with similar information from tasks actually completed as soon as there are enough of them?

Some BOINC projects limit the number of tasks any computer can have downloaded and in progress at first, with this limit relaxed as soon as there are enough tasks successfully completed by that computer to get a better idea of how often it can handle yet another workunit.
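The BOINC scheduler does have a mechanism along these lines: each host carries a per-day result quota that shrinks when results fail and grows back when they validate. A simplified sketch of that idea (names and the exact halving/doubling policy are loosely modeled on the scheduler's max_results_day handling, not copied from it):

```cpp
#include <algorithm>

// Simplified per-host daily quota, loosely modeled on the BOINC
// scheduler's max_results_day mechanism (details simplified).
struct HostQuota {
    int max_results_day; // current per-day limit for this host
    int project_cap;     // project-wide daily result quota
};

// On an error or invalid result, cut the quota (never below 1),
// so a misbehaving host stops draining work.
void on_bad_result(HostQuota& q) {
    q.max_results_day = std::max(1, q.max_results_day / 2);
}

// On a valid result, grow the quota back toward the project cap.
void on_valid_result(HostQuota& q) {
    q.max_results_day = std::min(q.project_cap, q.max_results_day * 2);
}
```

Under a policy like this, a host returning only invalid work should converge to 1 task per device per day, which is the behavior Jord asks about further down the thread.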
ID: 111586
Profile Bernd Machenschalk
Volunteer moderator
Project administrator
Project developer
Joined: 15 Oct 04
Posts: 1956
Credit: 6,218,130
RAC: 0
Message 111592 - Posted: 16 Dec 2011, 13:21:13 UTC - in response to Message 111586.  

Some BOINC projects limit the number of tasks any computer can have downloaded and in progress at first, with this limit relaxed as soon as there are enough tasks successfully completed by that computer to get a better idea of how often it can handle yet another workunit.


That's certainly an option to limit the effect of the runtime estimation / work fetch going mad. But actually I'd like to understand and fix what's going wrong in the first place.

For now I raised the FLOPS estimate, and thus the FLOPS limit, by a factor of 10 for newly generated workunits. It will take some time (usually about 1.5 d) until the first tasks from that batch are sent out, though.

BM
ID: 111592
robertmiles

Joined: 16 Nov 11
Posts: 19
Credit: 4,468,368
RAC: 0
Message 111594 - Posted: 16 Dec 2011, 17:30:54 UTC

I've read that at least some of the BOINC versions never initialize one of the variables often used in runtime estimation. You may want to add reporting of the variables you use so you can check for signs of this.
ID: 111594
Profile pragmatic prancing periodic problem child, left
Joined: 26 Jan 05
Posts: 1639
Credit: 70,000
RAC: 0
Message 111601 - Posted: 17 Dec 2011, 1:23:32 UTC - in response to Message 111592.  

That's certainly an option to limit the effect of the runtime estimation / work fetch going mad. But actually I'd like to understand and fix what's going wrong in the first place.

There is something weird going on with the number of tasks one can get per day. As you can see from my double zero credit & RAC, I haven't had a single task validate yet. So by now, the number of tasks I am allowed to download for the v1.19 app should be 1, maybe 2.

Yesterday it was 26, now it is 32. Why is it going up?
I am not returning any valid work. Shouldn't it, like in the old days, keep going down until it eventually gives me only 1 task per device (CPU core or GPU) per day? As it is now, I can continue doing 'bad work' ad infinitum.
Jord.

BOINC FAQ Service

They say most of your brain shuts down in cryo-sleep. All but the primitive side, the animal side. No wonder I'm still awake.
ID: 111601
Profile Bernd Machenschalk
Volunteer moderator
Project administrator
Project developer
Joined: 15 Oct 04
Posts: 1956
Credit: 6,218,130
RAC: 0
Message 111609 - Posted: 20 Dec 2011, 10:26:47 UTC
Last modified: 20 Dec 2011, 10:29:50 UTC

I incorporated D.A.'s recent fix for using a "conservative flops estimate" in case "we don't have enough statistics" (i.e. too few valid results) into the scheduler running on Albert.

Let's see whether this helps ...
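The shape of that fix, as described here, is a selection between two estimates depending on sample count. A hedged sketch of the logic (threshold and names are illustrative assumptions, not the actual commit):

```cpp
// Illustrative selection between a measured and a conservative FLOPS
// estimate; threshold and names are assumptions, not D.A.'s code.
double estimate_flops(int n_valid_results,
                      double measured_avg_flops,
                      double conservative_flops) {
    const int MIN_SAMPLES = 10; // assumed statistics threshold
    if (n_valid_results < MIN_SAMPLES) {
        // Too little data: use the deliberately low estimate so the
        // runtime limit (rsc_fpops_bound / flops) stays generous and
        // tasks are not aborted before the first valid results arrive.
        return conservative_flops;
    }
    return measured_avg_flops;
}
```

This breaks the chicken-and-egg problem described earlier in the thread: hosts with no validated results get a pessimistic speed, so their tasks can finish and feed the statistics.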

BM

PS: I also added some logging that should write the client's max runtime for every job sent to the scheduler log. You may spot it in the logs for your hosts.
ID: 111609
Profile Bikeman (Heinz-Bernd Eggenstein)
Volunteer moderator
Project administrator
Project developer
Joined: 28 Aug 06
Posts: 1483
Credit: 1,864,017
RAC: 0
Message 111622 - Posted: 22 Dec 2011, 22:46:41 UTC - in response to Message 111609.  
Last modified: 25 Dec 2011, 21:13:49 UTC

Hi!

I just got this:

2011-12-22 22:39:37.5065 [PID=14669]    [version] Checking plan class 'atiOpenCL'
2011-12-22 22:39:37.5065 [PID=14669]    [version] host_flops: 2.972295e+09, 	speedup: 15.00, 	projected_flops: 4.458442e+10, 	peak_flops: 4.176000e+12, 	peak_flops_factor: 1.00


Still, the estimated CPU time as displayed by boinccmd for such a task is below 50 seconds ... :-( It will actually take almost 100 times longer.
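The numbers in that log line are at least internally consistent: projected_flops is host_flops scaled by the plan class's speedup of 15. The relation itself is visible in the log; the code below is just a calculator for it (the runtime_estimate formula is the standard BOINC relation between fpops estimate and speed, shown with illustrative numbers):

```cpp
#include <cmath>

// Reproduce the relation visible in the scheduler log line:
// projected_flops = host_flops * speedup.
double projected_from_log(double host_flops, double speedup) {
    return host_flops * speedup;
}

// The client's initial runtime estimate is then roughly
// rsc_fpops_est / projected_flops; a projected_flops that is ~100x
// too high yields exactly the absurd "below 50 seconds" estimate.
double runtime_estimate(double rsc_fpops_est, double projected_flops) {
    return rsc_fpops_est / projected_flops;
}
```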

HB
ID: 111622
Profile Bikeman (Heinz-Bernd Eggenstein)
Volunteer moderator
Project administrator
Project developer
Joined: 28 Aug 06
Posts: 1483
Credit: 1,864,017
RAC: 0
Message 111625 - Posted: 25 Dec 2011, 21:16:15 UTC - in response to Message 111622.  

I guess I got a few tasks from the old batch.

Now everything is fine: the runtime estimate is reasonably pessimistic, and tasks validate OK.

HB
ID: 111625
Profile Oliver Behnke
Volunteer moderator
Project administrator
Project developer

Joined: 4 Sep 07
Posts: 130
Credit: 8,545,955
RAC: 0
Message 111646 - Posted: 4 Jan 2012, 11:58:47 UTC - in response to Message 111573.  

No hint from Oliver either. Where is he, by the way? Seems like he evaporated. ;-)


Sort of, holiday season... :-)

Happy new year!
ID: 111646
Profile Oliver Behnke
Volunteer moderator
Project administrator
Project developer

Joined: 4 Sep 07
Posts: 130
Credit: 8,545,955
RAC: 0
Message 111647 - Posted: 4 Jan 2012, 12:02:15 UTC - in response to Message 111575.  

It would be instructive to see whether the debugging output in question is common to all 4xxx series cards. Ah, but that's a different subject.


They will. The 4xxx series doesn't support local memory; it's emulated via global memory, which incurs a big performance penalty. Also, this series only allows 64 work items per work group when local memory is used, hence the re-sizing. However, I doubt that the re-sizing actually affects the accuracy of the computation, but if it does, it needs to be fixed!
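The re-sizing described here can be pictured as shrinking a 2D work-group tile until it fits the device limit. This is only an illustration of the clamping idea, not the app's actual layout code; the 16x16 starting tile is an assumption:

```cpp
// Illustrative work-group "re-size": halve the larger tile dimension
// until the total work items fit the device limit (64 on HD 4xxx
// when local memory is used). Not the actual Albert/Einstein app code.
struct WorkGroup { int x, y; };

WorkGroup fit_work_group(WorkGroup wg, int max_items) {
    while (wg.x * wg.y > max_items) {
        if (wg.x >= wg.y) wg.x /= 2; else wg.y /= 2;
    }
    return wg;
}
```

A 16x16 tile (256 work items) would shrink to 8x8 under a 64-item limit; whether reduced tiles accumulate floating-point results in a different order (and hence produce slightly different output) is exactly the accuracy question raised above.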

Oliver
ID: 111647
oz

Joined: 28 Feb 05
Posts: 10
Credit: 1,285,478
RAC: 0
Message 111660 - Posted: 6 Jan 2012, 20:38:39 UTC

Hi,

I also have a task aborted due to:

exceeded elapsed time limit 19036.53 (28000000.00G/1470.86G)

Afterwards the GPU is in a bad state and requires a reboot. All other downloaded OpenCL tasks are then started by BOINC and immediately aborted with:

Output file p2030.20100913.G44.55+00.20.N.b6s0g0.00000_2424_1_3 for task p2030.20100913.G44.55+00.20.N.b6s0g0.00000_2424_1 absent

This only stops once the daily task quota is reached.
I have previously finished atiOpenCL tasks with ~50000 s runtimes successfully.
System: Linux Ubuntu Oneiric
OpenCL: ATI GPU 0: Juniper (driver version CAL 1.4.1646, device version OpenCL 1.1 AMD-APP-SDK-v2.5 (684.213), 1024MB)
Catalyst 11.11
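The abort message above actually shows the formula at work: the limit is the workunit's fpops bound divided by the flops rate the server assigned to this host (28,000,000 GFLOP / 1470.86 GFLOPS, roughly 19,036.5 s). Checking that arithmetic:

```cpp
#include <cmath>

// The elapsed-time limit reported in the abort message:
// limit = rsc_fpops_bound / assigned_flops, both values
// being visible in the "(28000000.00G/1470.86G)" part.
double elapsed_time_limit(double rsc_fpops_bound, double flops) {
    return rsc_fpops_bound / flops;
}
```

So a host that genuinely needs ~50,000 s is aborted at ~19,000 s unless either the fpops bound is raised or the assigned flops rate is lowered, which is the knob Bernd has been adjusting throughout this thread.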
ID: 111660
Profile Oliver Behnke
Volunteer moderator
Project administrator
Project developer

Joined: 4 Sep 07
Posts: 130
Credit: 8,545,955
RAC: 0
Message 111776 - Posted: 31 Jan 2012, 11:21:05 UTC - in response to Message 111647.  


They will. The 4xxx series doesn't support local memory; it's emulated via global memory, which incurs a big performance penalty. Also, this series only allows 64 work items per work group when local memory is used, hence the re-sizing. However, I doubt that the re-sizing actually affects the accuracy of the computation, but if it does, it needs to be fixed!


Well, it turned out it does indeed! We'll fix it ASAP.

Oliver
ID: 111776
Profile Oliver Behnke
Volunteer moderator
Project administrator
Project developer

Joined: 4 Sep 07
Posts: 130
Credit: 8,545,955
RAC: 0
Message 111779 - Posted: 1 Feb 2012, 12:19:53 UTC - in response to Message 111776.  

Ok, bug fix implemented and tested. We'll release v1.20 shortly...

Oliver
ID: 111779




This material is based upon work supported by the National Science Foundation (NSF) under Grant PHY-0555655 and by the Max Planck Gesellschaft (MPG). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the investigators and do not necessarily reflect the views of the NSF or the MPG.

Copyright © 2024 Bruce Allen for the LIGO Scientific Collaboration