Where does work go when it's being resent and aborted...

Profile pragmatic prancing periodic problem child, left
Joined: 26 Jan 05
Posts: 1639
Credit: 70,000
RAC: 0
Message 111610 - Posted: 21 Dec 2011, 9:03:33 UTC
Last modified: 21 Dec 2011, 9:46:12 UTC

Where does work go when it's being resent by the server, but actively denied by the client (7.0.3)?

I have a task that the server keeps resending to me, but it never reaches my computer, because I told BOINC (through a GPU exclusion in cc_config.xml) to keep the Albert application einsteinbinary_BRP4 off my ATI GPU.

Just before BOINC fetched work for this project and application, though, I had reset the project. That confuses BOINC, which then logs things like:
21/12/2011 03:27:45 | Albert@Home | Config: excluded GPU. Type: ATI. App: einsteinbinary_BRP4. Device: 0
21/12/2011 03:27:45 | Albert@Home | A GPU exclusion in your cc_config.xml file specifies a non-existent application 'einsteinbinary_BRP4'. Existing applications:

Existing applications being none, as we have reset the project... which wipes everything out.
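
For anyone who wants to see the shape of that check, here is a minimal sketch in Python. It is not the client's actual code. The exclude_gpu block mirrors what the log line reports (Type: ATI, App: einsteinbinary_BRP4, Device: 0); the project URL in it is my assumption, and the function name unknown_excluded_apps is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical cc_config.xml content. Type, app and device come from the log
# line above; the project URL is an assumption.
CC_CONFIG = """
<cc_config>
  <options>
    <exclude_gpu>
      <url>http://albert.phys.uwm.edu/</url>
      <device_num>0</device_num>
      <type>ATI</type>
      <app>einsteinbinary_BRP4</app>
    </exclude_gpu>
  </options>
</cc_config>
"""

def unknown_excluded_apps(config_xml, known_apps):
    """Return app names used in <exclude_gpu> blocks that are not in known_apps."""
    root = ET.fromstring(config_xml)
    missing = []
    for excl in root.iter("exclude_gpu"):
        app = excl.findtext("app")
        if app and app not in known_apps:
            missing.append(app)
    return missing

# Right after a project reset the client knows no applications at all.
after_reset = unknown_excluded_apps(CC_CONFIG, known_apps=set())
# Once the application is known again, the exclusion resolves cleanly.
after_download = unknown_excluded_apps(CC_CONFIG, known_apps={"einsteinbinary_BRP4"})
print(after_reset, after_download)
```

With an empty application list every exclusion that names an app looks "non-existent", which matches the warning above.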

So then we do a work request, which we get work for:
21/12/2011 03:28:29 | Albert@Home | [sched_op] Starting scheduler request
21/12/2011 03:28:29 | Albert@Home | [work_fetch] request: CPU (0.00 sec, 0.00 inst) ATI (51840.00 sec, 1.00 inst)
21/12/2011 03:28:29 | Albert@Home | Sending scheduler request: To fetch work.
21/12/2011 03:28:29 | Albert@Home | Requesting new tasks for ATI
21/12/2011 03:28:29 | Albert@Home | [sched_op] CPU work request: 0.00 seconds; 0.00 CPUs
21/12/2011 03:28:29 | Albert@Home | [sched_op] ATI work request: 51840.00 seconds; 1.00 CPUs
21/12/2011 03:28:31 | Albert@Home | Scheduler request completed: got 1 new tasks
21/12/2011 03:28:31 | Albert@Home | [sched_op] Server version 613
21/12/2011 03:28:31 | Albert@Home | Project requested delay of 60 seconds
21/12/2011 03:28:31 | Albert@Home | [task] result state=NEW for p2030.20100913.G44.54-00.26.S.b3s0g0.00000_144_6 from handle_scheduler_reply
21/12/2011 03:28:31 | Albert@Home | [sched_op] estimated total CPU task duration: 0 seconds
21/12/2011 03:28:31 | Albert@Home | [sched_op] estimated total ATI task duration: 46193 seconds
21/12/2011 03:28:31 | Albert@Home | [sched_op] Deferring communication for 1 min 0 sec
21/12/2011 03:28:31 | Albert@Home | [sched_op] Reason: requested by project

Then we get a warning that we're doing something wrong:
21/12/2011 03:30:47 | Albert@Home | [cpu_sched_debug] insufficient ATI for p2030.20100913.G44.54-00.26.S.b3s0g0.00000_144_6
This sets the task as "GPU missing, Ready to start" in BOINC Manager's Tasks list.

However, moments later we try to top off our cache a little, with:
21/12/2011 03:31:08 | Albert@Home | [sched_op] Starting scheduler request
21/12/2011 03:31:08 | Albert@Home | [work_fetch] request: CPU (0.00 sec, 0.00 inst) ATI (51840.00 sec, 1.00 inst)
21/12/2011 03:31:08 | Albert@Home | Sending scheduler request: To fetch work.
21/12/2011 03:31:08 | Albert@Home | Requesting new tasks for ATI
21/12/2011 03:31:08 | Albert@Home | [sched_op] CPU work request: 0.00 seconds; 0.00 CPUs
21/12/2011 03:31:08 | Albert@Home | [sched_op] ATI work request: 51840.00 seconds; 1.00 CPUs
21/12/2011 03:31:11 | Albert@Home | Scheduler request completed: got 1 new tasks
21/12/2011 03:31:11 | Albert@Home | [sched_op] Server version 613
21/12/2011 03:31:11 | Albert@Home | Project requested delay of 60 seconds
21/12/2011 03:31:11 | Albert@Home | [error] Missing coprocessor for task p2030.20100913.G44.54-00.26.S.b5s0g0.00000_3136_2; aborting
21/12/2011 03:31:11 | Albert@Home | [task] result state=ABORTED for p2030.20100913.G44.54-00.26.S.b5s0g0.00000_3136_2 from RESULT::abort_inactive

21/12/2011 03:31:11 | Albert@Home | [sched_op] estimated total CPU task duration: 0 seconds
21/12/2011 03:31:11 | Albert@Home | [sched_op] estimated total ATI task duration: 0 seconds
21/12/2011 03:31:11 | Albert@Home | [work_fetch] backing off ATI 852 sec
21/12/2011 03:31:11 | Albert@Home | [sched_op] Deferring communication for 1 min 0 sec
21/12/2011 03:31:11 | Albert@Home | [sched_op] Reason: requested by project

And way later at 05:44:05 we have that same work request:
21/12/2011 05:44:05 | Albert@Home | [work_fetch] request: CPU (0.00 sec, 0.00 inst) ATI (51840.00 sec, 1.00 inst)
21/12/2011 05:44:08 | Albert@Home | Scheduler request completed: got 1 new tasks
21/12/2011 05:44:08 | Albert@Home | [error] Missing coprocessor for task p2030.20100913.G44.54-00.26.S.b5s0g0.00000_3136_2; aborting
21/12/2011 05:44:08 | Albert@Home | [task] result state=ABORTED for p2030.20100913.G44.54-00.26.S.b5s0g0.00000_3136_2 from RESULT::abort_inactive

The client actively refuses it, so we end up with http://albert.phys.uwm.edu/host_sched_logs/0/835 showing
[user_messages] [HOST#835] MSG(low) Resent lost task p2030.20100913.G44.54-00.26.S.b5s0g0.00000_3136_2
for a task that never actually gets to my system, since BOINC aborts it immediately. Ad infinitum, or until the deadline anyway, since we're excluding your application and your project... ;-)
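
To spot this kind of loop in a longer log, something like the following Python sketch could help. It only assumes the "Missing coprocessor for task ...; aborting" wording seen in the lines above; the sample LOG is copied from this post, and the names count_aborts and repeat_offenders are made up for illustration.

```python
import re
from collections import Counter

# Sample lines copied from the log excerpts in this post.
LOG = """\
21/12/2011 03:28:31 | Albert@Home | [task] result state=NEW for p2030.20100913.G44.54-00.26.S.b3s0g0.00000_144_6 from handle_scheduler_reply
21/12/2011 03:31:11 | Albert@Home | [error] Missing coprocessor for task p2030.20100913.G44.54-00.26.S.b5s0g0.00000_3136_2; aborting
21/12/2011 05:44:08 | Albert@Home | [error] Missing coprocessor for task p2030.20100913.G44.54-00.26.S.b5s0g0.00000_3136_2; aborting
"""

# Matches the abort message and captures the task name before the semicolon.
ABORT_RE = re.compile(r"Missing coprocessor for task (\S+); aborting")

def count_aborts(log_text):
    """Count how often each task name appears in an abort-on-receipt message."""
    counts = Counter()
    for line in log_text.splitlines():
        match = ABORT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

aborts = count_aborts(LOG)
# Tasks aborted more than once are candidates for a resend/abort loop.
repeat_offenders = {task: n for task, n in aborts.items() if n > 1}
print(repeat_offenders)
```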

Fun, these different sides of BOINC, no? ;-)
Jord.

BOINC FAQ Service

They say most of your brain shuts down in cryo-sleep. All but the primitive side, the animal side. No wonder I'm still awake.
Profile Bernd Machenschalk
Volunteer moderator
Project administrator
Project developer
Joined: 15 Oct 04
Posts: 1956
Credit: 6,218,130
RAC: 0
Message 111612 - Posted: 21 Dec 2011, 13:09:39 UTC

Isn't this task reported as aborted? Then the server shouldn't resend it...

BM
Profile pragmatic prancing periodic problem child, left
Message 111614 - Posted: 21 Dec 2011, 18:26:52 UTC - in response to Message 111612.  

No, the task isn't reported at all. As far as I can see, it's blocked at the time that the scheduler contact is being done. Not very neat.

21/12/2011 19:25:40 | Albert@Home | [sched_op] Starting scheduler request
21/12/2011 19:25:40 | Albert@Home | [work_fetch] request: CPU (0.00 sec, 0.00 inst) ATI (51840.00 sec, 1.00 inst)
21/12/2011 19:25:40 | Albert@Home | Sending scheduler request: To fetch work.
21/12/2011 19:25:40 | Albert@Home | Requesting new tasks for ATI
21/12/2011 19:25:40 | Albert@Home | [sched_op] CPU work request: 0.00 seconds; 0.00 CPUs
21/12/2011 19:25:40 | Albert@Home | [sched_op] ATI work request: 51840.00 seconds; 1.00 CPUs
21/12/2011 19:25:40 | Albert@Home | [http] HTTP_OP::init_post(): http://albert.phys.uwm.edu/EinsteinAtHome_cgi/cgi
21/12/2011 19:25:40 | Albert@Home | [http] HTTP_OP::libcurl_exec(): ca-bundle set
21/12/2011 19:25:40 | Albert@Home | [http] [ID#1] Info: Connection #1 seems to be dead!
21/12/2011 19:25:40 | Albert@Home | [http] [ID#1] Info: Closing connection #1
21/12/2011 19:25:41 | Albert@Home | [http] [ID#1] Info: About to connect() to albert.phys.uwm.edu port 80 (#1)
21/12/2011 19:25:41 | Albert@Home | [http] [ID#1] Info: Trying 129.89.61.67...
21/12/2011 19:25:41 | Albert@Home | [http] [ID#1] Info: Connected to albert.phys.uwm.edu (129.89.61.67) port 80 (#1)
21/12/2011 19:25:41 | Albert@Home | [http] [ID#1] Sent header to server: POST /EinsteinAtHome_cgi/cgi HTTP/1.1
21/12/2011 19:25:41 | Albert@Home | [http] [ID#1] Sent header to server: User-Agent: BOINC client (windows_x86_64 7.0.3)
21/12/2011 19:25:41 | Albert@Home | [http] [ID#1] Sent header to server: Host: albert.phys.uwm.edu
21/12/2011 19:25:41 | Albert@Home | [http] [ID#1] Sent header to server: Accept: */*
21/12/2011 19:25:41 | Albert@Home | [http] [ID#1] Sent header to server: Accept-Encoding: deflate, gzip
21/12/2011 19:25:41 | Albert@Home | [http] [ID#1] Sent header to server: Content-Type: application/x-www-form-urlencoded
21/12/2011 19:25:41 | Albert@Home | [http] [ID#1] Sent header to server: Content-Length: 9881
21/12/2011 19:25:41 | Albert@Home | [http] [ID#1] Sent header to server: Expect: 100-continue
21/12/2011 19:25:41 | Albert@Home | [http] [ID#1] Sent header to server:
21/12/2011 19:25:41 | Albert@Home | [http] [ID#1] Received header from server: HTTP/1.1 100 Continue
21/12/2011 19:25:41 | Albert@Home | [http] [ID#1] Received header from server: HTTP/1.1 200 OK
21/12/2011 19:25:41 | Albert@Home | [http] [ID#1] Received header from server: Date: Wed, 21 Dec 2011 18:25:27 GMT
21/12/2011 19:25:41 | Albert@Home | [http] [ID#1] Received header from server: Server: Apache/2.2.3 (CentOS)
21/12/2011 19:25:41 | Albert@Home | [http] [ID#1] Received header from server: Transfer-Encoding: chunked
21/12/2011 19:25:41 | Albert@Home | [http] [ID#1] Received header from server: Content-Type: text/xml
21/12/2011 19:25:41 | Albert@Home | [http] [ID#1] Received header from server:
21/12/2011 19:25:41 | | [http_xfer] [ID#1] HTTP: wrote 1316 bytes
21/12/2011 19:25:41 | | [http_xfer] [ID#1] HTTP: wrote 1460 bytes
21/12/2011 19:25:41 | | [http_xfer] [ID#1] HTTP: wrote 1296 bytes
21/12/2011 19:25:41 | | [http_xfer] [ID#1] HTTP: wrote 156 bytes
21/12/2011 19:25:42 | | [http_xfer] [ID#1] HTTP: wrote 1460 bytes
21/12/2011 19:25:42 | | [http_xfer] [ID#1] HTTP: wrote 1460 bytes
21/12/2011 19:25:42 | | [http_xfer] [ID#1] HTTP: wrote 1460 bytes
21/12/2011 19:25:42 | | [http_xfer] [ID#1] HTTP: wrote 1460 bytes
21/12/2011 19:25:42 | | [http_xfer] [ID#1] HTTP: wrote 1460 bytes
21/12/2011 19:25:42 | | [http_xfer] [ID#1] HTTP: wrote 4380 bytes
21/12/2011 19:25:42 | | [http_xfer] [ID#1] HTTP: wrote 4380 bytes
21/12/2011 19:25:42 | | [http_xfer] [ID#1] HTTP: wrote 1088 bytes
21/12/2011 19:25:42 | Albert@Home | [http] [ID#1] Info: Connection #1 to host albert.phys.uwm.edu left intact
21/12/2011 19:25:42 | Albert@Home | Scheduler request completed: got 1 new tasks
21/12/2011 19:25:42 | Albert@Home | [sched_op] Server version 613
21/12/2011 19:25:42 | Albert@Home | Resent lost task p2030.20100913.G44.54-00.26.S.b5s0g0.00000_3136_2
21/12/2011 19:25:42 | Albert@Home | Project requested delay of 60 seconds
21/12/2011 19:25:42 | Albert@Home | [error] Missing coprocessor for task p2030.20100913.G44.54-00.26.S.b5s0g0.00000_3136_2; aborting
21/12/2011 19:25:42 | Albert@Home | [task] result state=ABORTED for p2030.20100913.G44.54-00.26.S.b5s0g0.00000_3136_2 from RESULT::abort_inactive
21/12/2011 19:25:42 | Albert@Home | [sched_op] estimated total CPU task duration: 0 seconds
21/12/2011 19:25:42 | Albert@Home | [sched_op] estimated total ATI task duration: 0 seconds
21/12/2011 19:25:42 | Albert@Home | [work_fetch] backing off ATI 663 sec
21/12/2011 19:25:42 | Albert@Home | [sched_op] Deferring communication for 1 min 0 sec
21/12/2011 19:25:42 | Albert@Home | [sched_op] Reason: requested by project
21/12/2011 19:25:42 | | [statefile] set dirty: RPC complete
21/12/2011 19:25:42 | | [work_fetch] Request work fetch: RPC complete
21/12/2011 19:25:42 | | [statefile] Writing state file
21/12/2011 19:25:42 | | [statefile] Done writing state file

Jord.

Profile Bernd Machenschalk
Message 111615 - Posted: 21 Dec 2011, 18:44:23 UTC - in response to Message 111614.  

No, the task isn't reported at all. As far as I can see, it's blocked at the time that the scheduler contact is being done. Not very neat.


Well, reporting the aborted task should prevent the scheduler from sending it again. The client should definitely report aborted tasks before asking for work. Looks like a client bug to me.

BM
Profile pragmatic prancing periodic problem child, left
Message 111618 - Posted: 21 Dec 2011, 19:08:58 UTC - in response to Message 111615.  

The problem here is that it's not even downloaded. Physically there is nothing of the task on the client. It's only the scheduler that says "I'm going to send you work", but right at the door it is told that the work isn't wanted, and it is blocked before it can send one bit of it.

So since there's nothing about the task on the system, there's nothing to report about it.

Sure, it's a bug, but not just one of the client.
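
A toy model in Python of how the two sides feed each other. This is my reading of the behaviour in the logs, not BOINC source code; run_contacts and all the names in it are hypothetical. The server resends anything it thinks the host holds but the host did not list, and the client aborts on receipt. Whether the loop ends depends only on whether the abort ever gets reported.

```python
def run_contacts(n_contacts, client_reports_abort):
    """Toy model of 'resend lost task' versus abort-on-receipt. Not BOINC code."""
    server_in_progress = {"task_A"}  # tasks the server believes this host holds
    client_tasks = set()             # tasks the client actually has (stays empty here)
    pending_reports = set()          # aborted tasks queued for the next report
    resends = 0
    for _ in range(n_contacts):
        # The client first reports whatever aborts it has queued.
        for task in pending_reports:
            server_in_progress.discard(task)
        pending_reports.clear()
        # The server resends every task the client did not list as present.
        lost = server_in_progress - client_tasks
        resends += len(lost)
        for task in lost:
            # The client cannot run the task (excluded GPU) and aborts it on receipt.
            if client_reports_abort:
                pending_reports.add(task)
            # Otherwise nothing is kept, so there is nothing to report later.
    return resends

print(run_contacts(5, client_reports_abort=False))
print(run_contacts(5, client_reports_abort=True))
```

Without a report the task is resent on every contact; with a report it is resent once and then dropped on the server side.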
Jord.

Profile pragmatic prancing periodic problem child, left
Message 111621 - Posted: 22 Dec 2011, 8:56:38 UTC

It gets worse.
Last night I stopped excluding the GPU for this project. The BOINC developers are not available at the moment, so there is no use testing this right now.

So after the BOINC restart (GPU detection decisions happen at BOINC start), I went to the Albert web site, set the project preferences there to not use the ATI GPU, and then made sure that BOINC on this system knew about it (Update).

I knew that the one task would be resent eventually. I wasn't going to worry about that; when I see it, I'll abort it. Since no work validates for the HD4850 anyway, there's no use running it for the 9 to 10 hours, unless you just want to add to your electricity bill because you're rich. ;-)

And so I was a tad confused when I just found this:
22/12/2011 06:06:15 | Albert@Home | [sched_op] Starting scheduler request
22/12/2011 06:06:15 | Albert@Home | Sending scheduler request: To fetch work.
22/12/2011 06:06:15 | Albert@Home | Requesting new tasks for ATI
22/12/2011 06:06:15 | Albert@Home | [sched_op] CPU work request: 0.00 seconds; 0.00 CPUs
22/12/2011 06:06:15 | Albert@Home | [sched_op] ATI work request: 43384.32 seconds; 0.00 CPUs
22/12/2011 06:06:18 | Albert@Home | Scheduler request completed: got 1 new tasks
22/12/2011 06:06:18 | Albert@Home | [sched_op] Server version 613
22/12/2011 06:06:18 | Albert@Home | Resent lost task p2030.20100913.G44.54-00.26.S.b5s0g0.00000_3136_2
22/12/2011 06:06:18 | Albert@Home | Project requested delay of 60 seconds
22/12/2011 06:06:18 | Albert@Home | [task] result state=NEW for p2030.20100913.G44.54-00.26.S.b5s0g0.00000_3136_2 from handle_scheduler_reply
22/12/2011 06:06:18 | Albert@Home | [sched_op] estimated total CPU task duration: 0 seconds
22/12/2011 06:06:18 | Albert@Home | [sched_op] estimated total ATI task duration: 46193 seconds
22/12/2011 06:06:18 | Albert@Home | [sched_op] Deferring communication for 1 min 0 sec
22/12/2011 06:06:18 | Albert@Home | [sched_op] Reason: requested by project

That is a work request.
That is a work request for the ATI.
But it shouldn't be asking for work for the ATI.
Because what have my project preferences said since last night?

<project_preferences>
<resource_share>800</resource_share>
<no_cpu>0</no_cpu>
<no_ati>1</no_ati>
<no_cuda>1</no_cuda>
<no_cuda>1</no_cuda>
<no_ati>1</no_ati>
<project_specific>
<graphics fps="20" quality="low" width="800" height="600" />
<also_run_cpu>0</also_run_cpu>
</project_specific>
</project_preferences>


Why there's a "no CUDA" twice in the account_albert.phys.uwm.edu.xml file, I don't know... Maybe the "no ATI" needs that as well. ;-)
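
A quick way to check a preferences fragment like this for repeated tags, sketched in Python. The PREFS string is the fragment quoted above, pasted in as-is; the variable names are made up, and reading no_ati as "do not use the ATI GPU" is my interpretation of the setting.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# The <project_preferences> fragment as quoted in this post.
PREFS = """
<project_preferences>
<resource_share>800</resource_share>
<no_cpu>0</no_cpu>
<no_ati>1</no_ati>
<no_cuda>1</no_cuda>
<no_cuda>1</no_cuda>
<no_ati>1</no_ati>
<project_specific>
<graphics fps="20" quality="low" width="800" height="600" />
<also_run_cpu>0</also_run_cpu>
</project_specific>
</project_preferences>
"""

root = ET.fromstring(PREFS)
# Count the direct children by tag name to find anything listed more than once.
tag_counts = Counter(child.tag for child in root)
duplicates = sorted(tag for tag, n in tag_counts.items() if n > 1)
# findtext returns the first match, which is enough to see the setting is on.
ati_disabled = root.findtext("no_ati") == "1"
print(duplicates, ati_disabled)
```

On this fragment both no_ati and no_cuda come up as duplicates, and no_ati is set, so the ATI work request above is not what the preferences ask for.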
Jord.




This material is based upon work supported by the National Science Foundation (NSF) under Grant PHY-0555655 and by the Max Planck Gesellschaft (MPG). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the investigators and do not necessarily reflect the views of the NSF or the MPG.

Copyright © 2024 Bruce Allen for the LIGO Scientific Collaboration