Where does work go when it's being resent and aborted... |
Author | Message |
---|---|
pragmatic prancing periodic problem child, left Send message Joined: 26 Jan 05 Posts: 1639 Credit: 70,000 RAC: 0 |
Where does work go when it's being resent by the server, but actively denied by the client (7.0.3)?

I have this task being resent to me, but it doesn't get to my computer, because I told my BOINC to exclude the Albert project from using the einsteinbinary_BRP4. Yet just before it fetched work for this project and binary, I had reset the project. Then BOINC will be confused, as it'll say stuff like:

21/12/2011 03:27:45 | Albert@Home | Config: excluded GPU. Type: ATI. App: einsteinbinary_BRP4. Device: 0
21/12/2011 03:27:45 | Albert@Home | A GPU exclusion in your cc_config.xml file specifies a non-existent application 'einsteinbinary_BRP4'. Existing applications:

Existing applications being none, as we have reset the project... which wipes everything out. So then we do a work request, which we get work for:

21/12/2011 03:28:29 | Albert@Home | [sched_op] Starting scheduler request
21/12/2011 03:28:29 | Albert@Home | [work_fetch] request: CPU (0.00 sec, 0.00 inst) ATI (51840.00 sec, 1.00 inst)
21/12/2011 03:28:29 | Albert@Home | Sending scheduler request: To fetch work.
21/12/2011 03:28:29 | Albert@Home | Requesting new tasks for ATI
21/12/2011 03:28:29 | Albert@Home | [sched_op] CPU work request: 0.00 seconds; 0.00 CPUs
21/12/2011 03:28:29 | Albert@Home | [sched_op] ATI work request: 51840.00 seconds; 1.00 CPUs
21/12/2011 03:28:31 | Albert@Home | Scheduler request completed: got 1 new tasks
21/12/2011 03:28:31 | Albert@Home | [sched_op] Server version 613
21/12/2011 03:28:31 | Albert@Home | Project requested delay of 60 seconds
21/12/2011 03:28:31 | Albert@Home | [task] result state=NEW for p2030.20100913.G44.54-00.26.S.b3s0g0.00000_144_6 from handle_scheduler_reply
21/12/2011 03:28:31 | Albert@Home | [sched_op] estimated total CPU task duration: 0 seconds
21/12/2011 03:28:31 | Albert@Home | [sched_op] estimated total ATI task duration: 46193 seconds
21/12/2011 03:28:31 | Albert@Home | [sched_op] Deferring communication for 1 min 0 sec
21/12/2011 03:28:31 | Albert@Home | [sched_op] Reason: requested by project

Then we get a warning that we're doing something wrong:

21/12/2011 03:30:47 | Albert@Home | [cpu_sched_debug] insufficient ATI for p2030.20100913.G44.54-00.26.S.b3s0g0.00000_144_6

This sets the task as "GPU missing, Ready to start" in BOINC Manager's Tasks list. However, moments later we try to top off our cache a little, with:

21/12/2011 03:31:08 | Albert@Home | [sched_op] Starting scheduler request
21/12/2011 03:31:08 | Albert@Home | [work_fetch] request: CPU (0.00 sec, 0.00 inst) ATI (51840.00 sec, 1.00 inst)
21/12/2011 03:31:08 | Albert@Home | Sending scheduler request: To fetch work.
21/12/2011 03:31:08 | Albert@Home | Requesting new tasks for ATI
21/12/2011 03:31:08 | Albert@Home | [sched_op] CPU work request: 0.00 seconds; 0.00 CPUs
21/12/2011 03:31:08 | Albert@Home | [sched_op] ATI work request: 51840.00 seconds; 1.00 CPUs
21/12/2011 03:31:11 | Albert@Home | Scheduler request completed: got 1 new tasks
21/12/2011 03:31:11 | Albert@Home | [sched_op] Server version 613
21/12/2011 03:31:11 | Albert@Home | Project requested delay of 60 seconds
21/12/2011 03:31:11 | Albert@Home | [error] Missing coprocessor for task p2030.20100913.G44.54-00.26.S.b5s0g0.00000_3136_2; aborting
21/12/2011 03:31:11 | Albert@Home | [task] result state=ABORTED for p2030.20100913.G44.54-00.26.S.b5s0g0.00000_3136_2 from RESULT::abort_inactive
21/12/2011 03:31:11 | Albert@Home | [sched_op] estimated total CPU task duration: 0 seconds
21/12/2011 03:31:11 | Albert@Home | [sched_op] estimated total ATI task duration: 0 seconds
21/12/2011 03:31:11 | Albert@Home | [work_fetch] backing off ATI 852 sec
21/12/2011 03:31:11 | Albert@Home | [sched_op] Deferring communication for 1 min 0 sec
21/12/2011 03:31:11 | Albert@Home | [sched_op] Reason: requested by project

And way later at 05:44:05 we have that same work request:

21/12/2011 05:44:05 | Albert@Home | [work_fetch] request: CPU (0.00 sec, 0.00 inst) ATI (51840.00 sec, 1.00 inst)
21/12/2011 05:44:08 | Albert@Home | Scheduler request completed: got 1 new tasks
21/12/2011 05:44:08 | Albert@Home | [error] Missing coprocessor for task p2030.20100913.G44.54-00.26.S.b5s0g0.00000_3136_2; aborting
21/12/2011 05:44:08 | Albert@Home | [task] result state=ABORTED for p2030.20100913.G44.54-00.26.S.b5s0g0.00000_3136_2 from RESULT::abort_inactive

Which is actively denied, so we end with http://albert.phys.uwm.edu/host_sched_logs/0/835 showing

[user_messages] [HOST#835] MSG(low) Resent lost task p2030.20100913.G44.54-00.26.S.b5s0g0.00000_3136_2

which does not ever get to my system, since BOINC is aborting it immediately. Ad infinitum --or until deadline anyway-- since we're excluding your application and your project... ;-)

Fun, these different sides of BOINC, no? ;-)

Jord.
BOINC FAQ Service
They say most of your brain shuts down in cryo-sleep. All but the primitive side, the animal side. No wonder I'm still awake. |
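One way to confirm that the same task keeps coming back is to count the abort messages per task name in a saved client log. The following is a minimal sketch, not part of BOINC: the function name `count_aborts` and the `sample` list are made up for illustration, and the pattern only assumes the "[error] Missing coprocessor for task ...; aborting" wording seen in the excerpts above.

```python
import re
from collections import Counter

# Matches the client's abort message as it appears in the log excerpts above.
# The task name is everything up to the semicolon.
ABORT_RE = re.compile(r"\[error\] Missing coprocessor for task ([^;\s]+); aborting")


def count_aborts(log_lines):
    """Return a Counter mapping task name -> number of 'Missing coprocessor' aborts."""
    counts = Counter()
    for line in log_lines:
        match = ABORT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Three lines copied from the post above, used here as sample input.
sample = [
    "21/12/2011 03:31:11 | Albert@Home | [error] Missing coprocessor for task "
    "p2030.20100913.G44.54-00.26.S.b5s0g0.00000_3136_2; aborting",
    "21/12/2011 03:31:11 | Albert@Home | [work_fetch] backing off ATI 852 sec",
    "21/12/2011 05:44:08 | Albert@Home | [error] Missing coprocessor for task "
    "p2030.20100913.G44.54-00.26.S.b5s0g0.00000_3136_2; aborting",
]

# Tasks that were aborted more than once are the ones stuck in the resend loop.
repeats = {task: n for task, n in count_aborts(sample).items() if n > 1}
print(repeats)
```

Run against a full log file (for example by passing an open file object instead of `sample`), any task listed in `repeats` is one the server has resent at least twice.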
Bernd Machenschalk Volunteer moderator Project administrator Project developer Send message Joined: 15 Oct 04 Posts: 1956 Credit: 6,218,130 RAC: 0 |
Isn't this task reported as aborted? Then the server shouldn't resend it... BM |
pragmatic prancing periodic problem child, left Send message Joined: 26 Jan 05 Posts: 1639 Credit: 70,000 RAC: 0 |
No, the task isn't reported at all. As far as I can see, it's blocked at the time that the scheduler contact is being done. Not very neat.

21/12/2011 19:25:40 | Albert@Home | [sched_op] Starting scheduler request
21/12/2011 19:25:40 | Albert@Home | [work_fetch] request: CPU (0.00 sec, 0.00 inst) ATI (51840.00 sec, 1.00 inst)
21/12/2011 19:25:40 | Albert@Home | Sending scheduler request: To fetch work.
21/12/2011 19:25:40 | Albert@Home | Requesting new tasks for ATI
21/12/2011 19:25:40 | Albert@Home | [sched_op] CPU work request: 0.00 seconds; 0.00 CPUs
21/12/2011 19:25:40 | Albert@Home | [sched_op] ATI work request: 51840.00 seconds; 1.00 CPUs
21/12/2011 19:25:40 | Albert@Home | [http] HTTP_OP::init_post(): http://albert.phys.uwm.edu/EinsteinAtHome_cgi/cgi
21/12/2011 19:25:40 | Albert@Home | [http] HTTP_OP::libcurl_exec(): ca-bundle set
21/12/2011 19:25:40 | Albert@Home | [http] [ID#1] Info: Connection #1 seems to be dead!
21/12/2011 19:25:40 | Albert@Home | [http] [ID#1] Info: Closing connection #1
21/12/2011 19:25:41 | Albert@Home | [http] [ID#1] Info: About to connect() to albert.phys.uwm.edu port 80 (#1)
21/12/2011 19:25:41 | Albert@Home | [http] [ID#1] Info: Trying 129.89.61.67...
21/12/2011 19:25:41 | Albert@Home | [http] [ID#1] Info: Connected to albert.phys.uwm.edu (129.89.61.67) port 80 (#1)
21/12/2011 19:25:41 | Albert@Home | [http] [ID#1] Sent header to server: POST /EinsteinAtHome_cgi/cgi HTTP/1.1
21/12/2011 19:25:41 | Albert@Home | [http] [ID#1] Sent header to server: User-Agent: BOINC client (windows_x86_64 7.0.3)
21/12/2011 19:25:41 | Albert@Home | [http] [ID#1] Sent header to server: Host: albert.phys.uwm.edu
21/12/2011 19:25:41 | Albert@Home | [http] [ID#1] Sent header to server: Accept: */*
21/12/2011 19:25:41 | Albert@Home | [http] [ID#1] Sent header to server: Accept-Encoding: deflate, gzip
21/12/2011 19:25:41 | Albert@Home | [http] [ID#1] Sent header to server: Content-Type: application/x-www-form-urlencoded
21/12/2011 19:25:41 | Albert@Home | [http] [ID#1] Sent header to server: Content-Length: 9881
21/12/2011 19:25:41 | Albert@Home | [http] [ID#1] Sent header to server: Expect: 100-continue
21/12/2011 19:25:41 | Albert@Home | [http] [ID#1] Sent header to server:
21/12/2011 19:25:41 | Albert@Home | [http] [ID#1] Received header from server: HTTP/1.1 100 Continue
21/12/2011 19:25:41 | Albert@Home | [http] [ID#1] Received header from server: HTTP/1.1 200 OK
21/12/2011 19:25:41 | Albert@Home | [http] [ID#1] Received header from server: Date: Wed, 21 Dec 2011 18:25:27 GMT
21/12/2011 19:25:41 | Albert@Home | [http] [ID#1] Received header from server: Server: Apache/2.2.3 (CentOS)
21/12/2011 19:25:41 | Albert@Home | [http] [ID#1] Received header from server: Transfer-Encoding: chunked
21/12/2011 19:25:41 | Albert@Home | [http] [ID#1] Received header from server: Content-Type: text/xml
21/12/2011 19:25:41 | Albert@Home | [http] [ID#1] Received header from server:
21/12/2011 19:25:41 | | [http_xfer] [ID#1] HTTP: wrote 1316 bytes
21/12/2011 19:25:41 | | [http_xfer] [ID#1] HTTP: wrote 1460 bytes
21/12/2011 19:25:41 | | [http_xfer] [ID#1] HTTP: wrote 1296 bytes
21/12/2011 19:25:41 | | [http_xfer] [ID#1] HTTP: wrote 156 bytes
21/12/2011 19:25:42 | | [http_xfer] [ID#1] HTTP: wrote 1460 bytes
21/12/2011 19:25:42 | | [http_xfer] [ID#1] HTTP: wrote 1460 bytes
21/12/2011 19:25:42 | | [http_xfer] [ID#1] HTTP: wrote 1460 bytes
21/12/2011 19:25:42 | | [http_xfer] [ID#1] HTTP: wrote 1460 bytes
21/12/2011 19:25:42 | | [http_xfer] [ID#1] HTTP: wrote 1460 bytes
21/12/2011 19:25:42 | | [http_xfer] [ID#1] HTTP: wrote 4380 bytes
21/12/2011 19:25:42 | | [http_xfer] [ID#1] HTTP: wrote 4380 bytes
21/12/2011 19:25:42 | | [http_xfer] [ID#1] HTTP: wrote 1088 bytes
21/12/2011 19:25:42 | Albert@Home | [http] [ID#1] Info: Connection #1 to host albert.phys.uwm.edu left intact
21/12/2011 19:25:42 | Albert@Home | Scheduler request completed: got 1 new tasks
21/12/2011 19:25:42 | Albert@Home | [sched_op] Server version 613
21/12/2011 19:25:42 | Albert@Home | Resent lost task p2030.20100913.G44.54-00.26.S.b5s0g0.00000_3136_2
21/12/2011 19:25:42 | Albert@Home | Project requested delay of 60 seconds
21/12/2011 19:25:42 | Albert@Home | [error] Missing coprocessor for task p2030.20100913.G44.54-00.26.S.b5s0g0.00000_3136_2; aborting
21/12/2011 19:25:42 | Albert@Home | [task] result state=ABORTED for p2030.20100913.G44.54-00.26.S.b5s0g0.00000_3136_2 from RESULT::abort_inactive
21/12/2011 19:25:42 | Albert@Home | [sched_op] estimated total CPU task duration: 0 seconds
21/12/2011 19:25:42 | Albert@Home | [sched_op] estimated total ATI task duration: 0 seconds
21/12/2011 19:25:42 | Albert@Home | [work_fetch] backing off ATI 663 sec
21/12/2011 19:25:42 | Albert@Home | [sched_op] Deferring communication for 1 min 0 sec
21/12/2011 19:25:42 | Albert@Home | [sched_op] Reason: requested by project
21/12/2011 19:25:42 | | [statefile] set dirty: RPC complete
21/12/2011 19:25:42 | | [work_fetch] Request work fetch: RPC complete
21/12/2011 19:25:42 | | [statefile] Writing state file
21/12/2011 19:25:42 | | [statefile] Done writing state file

Jord. |
Bernd Machenschalk Volunteer moderator Project administrator Project developer Send message Joined: 15 Oct 04 Posts: 1956 Credit: 6,218,130 RAC: 0 |
"No, the task isn't reported at all. As far as I can see, it's blocked at the time that the scheduler contact is being done. Not very neat."

Well, reporting the aborted task should prevent the scheduler from sending it again. The client should definitely report aborted tasks before asking for work. Looks like a client bug to me.

BM |
pragmatic prancing periodic problem child, left Send message Joined: 26 Jan 05 Posts: 1639 Credit: 70,000 RAC: 0 |
The problem here is that it's not even downloaded. There's physically nothing of the task on the client. The scheduler only says "I'm going to send you work", and right at the door it's told that the work isn't necessary, so it's blocked before it can send one bit about it. Since there's nothing about the task on the system, there's nothing to report about it. Sure, it's a bug, but not only a client bug.

Jord. |
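The disagreement in the last two posts can be illustrated with a toy model. This is a simplified sketch under stated assumptions, not BOINC code: it assumes the server resends any task it believes the host holds but the host does not list, and that the client either drops a refused task silently or queues an abort report for the next contact. The function `simulate` and all of its variable names are hypothetical.

```python
def simulate(rounds, report_aborts):
    """Toy model of the resend loop; returns how often the task was resent.

    server_in_progress: tasks the server believes this host still holds.
    client_tasks: tasks the client actually keeps (always empty here).
    pending_reports: aborted tasks the client will report on its next contact.
    """
    task = "p2030.20100913.G44.54-00.26.S.b5s0g0.00000_3136_2"
    server_in_progress = {task}
    client_tasks = set()
    pending_reports = set()
    resends = 0
    for _ in range(rounds):
        # On each scheduler contact, reported tasks are cleared on the server first.
        server_in_progress -= pending_reports
        pending_reports.clear()
        # The server resends anything it thinks the host has but the host did not list.
        lost = server_in_progress - client_tasks
        resends += len(lost)
        for t in lost:
            # The client refuses the task on arrival (missing coprocessor).
            if report_aborts:
                pending_reports.add(t)
            # Without a report nothing is kept, so nothing can be reported later.
    return resends


print(simulate(5, report_aborts=False))  # resent on every contact
print(simulate(5, report_aborts=True))   # resent once, then reported and dropped
```

In this model the loop ends only when the abort is reported, which matches the argument that the client should report aborted tasks before asking for work. Whether the real client has anything to report for a task that was refused on arrival is exactly the open question in the thread.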
pragmatic prancing periodic problem child, left Send message Joined: 26 Jan 05 Posts: 1639 Credit: 70,000 RAC: 0 |
It gets worse.

Last night I stopped excluding the GPU for this project, as the BOINC developers are not available for the moment, so it's no use testing this at this time. So after the BOINC restart (GPU detection decisions happen at BOINC start), I went to the Albert web site and set there not to use the ATI GPU, then made sure that BOINC on this system knew about that (Update).

I knew that the one task would be resent eventually, not going to worry about that, when I see it I'll abort it. Since no work validates for the HD4850 anyway, there's no use running it for the 9 to 10 hours, unless you just want to add to your electricity bill because you're rich. ;-)

And so I was a tad confused when I just found this:

22/12/2011 06:06:15 | Albert@Home | [sched_op] Starting scheduler request
22/12/2011 06:06:15 | Albert@Home | Sending scheduler request: To fetch work.
22/12/2011 06:06:15 | Albert@Home | Requesting new tasks for ATI
22/12/2011 06:06:15 | Albert@Home | [sched_op] CPU work request: 0.00 seconds; 0.00 CPUs
22/12/2011 06:06:15 | Albert@Home | [sched_op] ATI work request: 43384.32 seconds; 0.00 CPUs
22/12/2011 06:06:18 | Albert@Home | Scheduler request completed: got 1 new tasks
22/12/2011 06:06:18 | Albert@Home | [sched_op] Server version 613
22/12/2011 06:06:18 | Albert@Home | Resent lost task p2030.20100913.G44.54-00.26.S.b5s0g0.00000_3136_2
22/12/2011 06:06:18 | Albert@Home | Project requested delay of 60 seconds
22/12/2011 06:06:18 | Albert@Home | [task] result state=NEW for p2030.20100913.G44.54-00.26.S.b5s0g0.00000_3136_2 from handle_scheduler_reply
22/12/2011 06:06:18 | Albert@Home | [sched_op] estimated total CPU task duration: 0 seconds
22/12/2011 06:06:18 | Albert@Home | [sched_op] estimated total ATI task duration: 46193 seconds
22/12/2011 06:06:18 | Albert@Home | [sched_op] Deferring communication for 1 min 0 sec
22/12/2011 06:06:18 | Albert@Home | [sched_op] Reason: requested by project

That is a work request. That is a work request for the ATI. But it shouldn't be asking work for the ATI. Since what do my project preferences say since last night?

<project_preferences>
    <resource_share>800</resource_share>
    <no_cpu>0</no_cpu>
    <no_ati>1</no_ati>
    <no_cuda>1</no_cuda>
    <no_cuda>1</no_cuda>
    <no_ati>1</no_ati>
    <project_specific>
        <graphics fps="20" quality="low" width="800" height="600" />
        <also_run_cpu>0</also_run_cpu>
    </project_specific>
</project_preferences>

Why there's a "no CUDA" twice in the account_albert.phys.uwm.edu.xml file, I don't know... Maybe that the "no ATI" needs that as well. ;-)

Jord. |
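The preferences block quoted above can be checked mechanically for repeated elements and for the value of `no_ati`. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the variable names are made up for illustration, and the XML string is the snippet exactly as pasted in the post, not the full account file.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# The <project_preferences> block as quoted in the post above.
prefs_xml = """<project_preferences>
    <resource_share>800</resource_share>
    <no_cpu>0</no_cpu>
    <no_ati>1</no_ati>
    <no_cuda>1</no_cuda>
    <no_cuda>1</no_cuda>
    <no_ati>1</no_ati>
    <project_specific>
        <graphics fps="20" quality="low" width="800" height="600" />
        <also_run_cpu>0</also_run_cpu>
    </project_specific>
</project_preferences>"""

root = ET.fromstring(prefs_xml)

# Count the direct children by tag name to spot elements that appear more than once.
tag_counts = Counter(child.tag for child in root)
duplicates = sorted(tag for tag, n in tag_counts.items() if n > 1)

# findtext() returns the text of the first matching direct child.
ati_disabled = root.findtext("no_ati") == "1"

print("duplicated elements:", duplicates)
print("ATI disabled in preferences:", ati_disabled)
```

On the snippet as pasted, both `no_ati` and `no_cuda` show up twice, and every copy is set to 1, so the preferences themselves do say not to use the ATI GPU.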