Message boards : Problems and Bug Reports : Errors - 197 (0xc5) EXIT_TIME_LIMIT_EXCEEDED
Jacob Klein | Joined: 6 Nov 11 | Posts: 16 | Credit: 2,938,967 | RAC: 0

I have received some errors lately, for app: Gamma-ray pulsar search #3 v1.11 (FGRPopencl-nvidia).

The error is:

Outcome: Computation error
Client state: Compute error
Exit status: 197 (0xc5) EXIT_TIME_LIMIT_EXCEEDED

<core_client_version>7.4.2</core_client_version>
<![CDATA[
<message>
exceeded elapsed time limit 136.04 (300000.00G/2205.26G)
</message>
]]>

Is the app's rsc_fpops_bound value set incorrectly?

http://albert.phys.uwm.edu/workunit.php?wuid=604516
http://albert.phys.uwm.edu/workunit.php?wuid=604518
http://albert.phys.uwm.edu/workunit.php?wuid=604531
http://albert.phys.uwm.edu/workunit.php?wuid=604554
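The numbers in that message encode the abort calculation: the client kills a task once its elapsed time exceeds rsc_fpops_bound divided by the host's projected speed. A minimal sketch of the arithmetic, using the values from the message above (the function name is illustrative, not BOINC's actual code):

```python
# Sketch of the limit behind "exceeded elapsed time limit 136.04
# (300000.00G/2205.26G)": limit = rsc_fpops_bound / projected_flops.

def elapsed_time_limit(rsc_fpops_bound, projected_flops):
    """Seconds a task may run before EXIT_TIME_LIMIT_EXCEEDED (197)."""
    return rsc_fpops_bound / projected_flops

# Values from the error message: 300000.00 GFLOP bound, 2205.26 GFLOPS projected.
limit = elapsed_time_limit(300000.00e9, 2205.26e9)
print(f"{limit:.2f} s")  # 136.04 s: far too short for a real FGRP task
```

So the bound itself is plausible; it is the 2205.26 GFLOPS projected speed (GPU peak flops) that makes the allowance absurdly small.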
Claggy | Joined: 29 Dec 06 | Posts: 78 | Credit: 4,040,969 | RAC: 0

We're in the middle of BOINC server software testing here; see the news threads. The rsc_fpops_bound is OK; the server is supplying ridiculous speed estimates for the initial tasks.

Claggy
Eyrie | Joined: 20 Feb 14 | Posts: 47 | Credit: 2,410 | RAC: 0

You need to edit client_state.xml to supply 100x higher rsc_fpops_bound values, to get you past the 11 validations needed for APR to drive rsc_fpops_est. Albert is running vanilla CreditNew code, and David is using GPU peak flops to estimate GPU speeds, which turned out to be a pretty daft assumption. We'll be moving to a redesigned credit scheme over the next few weeks or so; you can try to keep up with the 'project code updated' news thread for that development.

Queen of Aliasses, wielder of the SETI rolling pin, Mistress of the red shoes, Guardian of the orange tree, Slayer of very small dragons.
Jacob Klein | Joined: 6 Nov 11 | Posts: 16 | Credit: 2,938,967 | RAC: 0

Thanks, guys. I just found that thread and subscribed to it; I figured an issue like this deserved to be reported in its own thread. I do not plan on editing my client_state.xml file, because if Albert fixes it server-side, I'd like the ability to test that fix.

Thanks again,
Jacob
Richard Haselgrove | Joined: 10 Dec 05 | Posts: 450 | Credit: 5,409,572 | RAC: 0

Trouble is, if you don't take precautions like that, you'll never complete any tasks, and never be able to explore any other aspects of the server code. I'm expecting that when the server code is next updated, I'll let it run for a few more hours to check we haven't introduced any new bugs, then force my 'graphing' host to get a new HostID and do it all over again with a completely clean application_details record.
Jacob Klein | Joined: 6 Nov 11 | Posts: 16 | Credit: 2,938,967 | RAC: 0

It's no trouble for me; it's just wasting a bit of my resources. I am not attached to working out any additional server/scheduler bugs; I am attached simply to testing that units complete. And I had a problem with that, and I reported it. Until a fix is implemented server-side, it looks like the units will continue to waste resources. It is unfortunate. I hope they fix it.
Richard Haselgrove | Joined: 10 Dec 05 | Posts: 450 | Credit: 5,409,572 | RAC: 0

> It's no trouble for me; it's just wasting a bit of my resources. I am not attached to working out any additional server/scheduler bugs; I am attached simply to testing that units complete. And I had a problem with that, and I reported it.

Well, if you continue to waste resources after you've been told what's going on and why, be our guest. It's your electricity bill. It's going to be fixed, though it may not be "they" that does it.
Jacob Klein | Joined: 6 Nov 11 | Posts: 16 | Credit: 2,938,967 | RAC: 0

I am happy to be your guest, as you and the team figure out the scheduler issues.
tjreuter | Joined: 11 Feb 05 | Posts: 25 | Credit: 2,084,544 | RAC: 0

I have the same errors, and my wing(wo)man with nVidia cards also has this error. If a task is done by a CPU, it is validated, so I think it has something to do with the GPU app. Only Gamma-ray pulsar search #3 v1.11 (FGRPopencl-nvidia) has this error (at my side).

Greetings from,
TJ.
Eyrie | Joined: 20 Feb 14 | Posts: 47 | Credit: 2,410 | RAC: 0

The error is due to an initialisation bug in the CreditNew code. If you want to prevent it, you need to make a manual edit to client_state.xml: increase the <rsc_fpops_bound> value for the GPU tasks by at least two orders of magnitude (add 2-3 zeros). If you don't feel confident enough for that, you can either try to juggle what tasks you get until the patch has been deployed (e.g. opt out of GPU tasks), stop crunching for Albert until the patch is in, or just let the tasks error out [as Jacob is doing]; once the patch is in, things should correct themselves. For the first two options, you need to monitor the news thread(s) for patch announcements.

Queen of Aliasses, wielder of the SETI rolling pin, Mistress of the red shoes, Guardian of the orange tree, Slayer of very small dragons.
Richard Haselgrove | Joined: 10 Dec 05 | Posts: 450 | Credit: 5,409,572 | RAC: 0

Specifically:

1. Fetch some FGRP (Gamma-ray pulsar search) work.
2. Exit BOINC completely.
3. Edit <rsc_fpops_bound> as Eyrie describes. You'll find it in the <workunit> definition for each of the tasks you've downloaded.
4. Restart BOINC, and allow the tasks to run and report as usual. Probably best to set 'No New Tasks' while you do this.

Once you've reported and validated 11 tasks, the procedure should no longer be necessary. If you didn't get 11 validations from the first batch, repeat as needed.
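For anyone uncomfortable hand-editing the file, the adjustment described above can be scripted. This is an unofficial sketch, not project-supplied tooling: it multiplies every <rsc_fpops_bound> value it finds by 100, so if you run other projects on the same client, restrict the edit to the FGRP workunits by hand instead. Stop BOINC first and keep a backup.

```python
import re
import shutil

def raise_fpops_bound(xml_text, factor=100.0):
    """Multiply every <rsc_fpops_bound> value in client_state.xml
    text by `factor` and return the rewritten text."""
    def bump(match):
        value = float(match.group(1)) * factor
        return f"<rsc_fpops_bound>{value:.6e}</rsc_fpops_bound>"
    return re.sub(r"<rsc_fpops_bound>([^<]+)</rsc_fpops_bound>", bump, xml_text)

# Usage sketch; the path varies by platform (this is the usual Windows default):
# path = r"C:\ProgramData\BOINC\client_state.xml"
# shutil.copy(path, path + ".bak")   # keep a backup before touching the file
# with open(path) as f:
#     text = f.read()
# with open(path, "w") as f:
#     f.write(raise_fpops_bound(text))
```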
Jacob Klein | Joined: 6 Nov 11 | Posts: 16 | Credit: 2,938,967 | RAC: 0

Just wanted to post to mention that I will likely no longer be testing these problematic work units. Now that my RacerX machine has [GTX 660 Ti + GTX 660 Ti + GTX 460] instead of [GTX 660 Ti + GTX 460 + GTS 240], I will likely not be running any more [Albert/Einstein/SETI/SETIBETA] on it. I had only been running those 4 projects on the GTS 240, since it couldn't run any other projects, but it is now in storage. I pulled it out because the R343+ NVIDIA drivers are dropping support for pre-Fermi GPUs, so I replaced it with another GTX 660 Ti, and all 3 of my GPUs can now focus on GPUGrid.

Thanks,
Jacob
Richard Haselgrove | Joined: 10 Dec 05 | Posts: 450 | Credit: 5,409,572 | RAC: 0

> I have the same errors, but my wing(wo)man with nVidia cards also have this error. If done by a CPU then it is validated. So I think it has something to do with the GPU app.

@tjreuter: Could you possibly unhide your host(s) at this project, or give us a direct link to the one you're having problems with? It would help us give you more specific advice, and it would also help us (and the project) understand more clearly why this problem happens in the first place.
tjreuter | Joined: 11 Feb 05 | Posts: 25 | Credit: 2,084,544 | RAC: 0

Rigs should be visible now, Richard. However, I have checked out the Gamma-ray pulsar search here. (At Einstein@Home, those tasks work, though.)

Greetings from,
TJ.
Richard Haselgrove | Joined: 10 Dec 05 | Posts: 450 | Credit: 5,409,572 | RAC: 0

Yes, visible now, thanks. I assume we're talking about the errored Gamma-ray pulsar search #3 tasks for computer 7731, issued yesterday.

Unfortunately, the application details for host 7731 show no APR for that app, because none of the tasks completed successfully. And the server log at https://albert.phys.uwm.edu/host_sched_logs/7/7731 isn't much use either, because the last scheduler contact was to report work only, with no new work requested.

What I'd like to see, if at all possible, is a copy of the server log for an example of a work request where an FGRP task was issued. It would look something like:

2014-07-01 17:18:03.1608 [PID=30917] [version] Checking plan class 'FGRPopencl-nvidia'

Note that in my case (from host 11362), the server is estimating on the last line that the task will run for 746 seconds (which is what I'm seeing locally too), and won't be thrown out with a time limit error for over four hours. That's calculated from "using conservative projected flops: 20.12G" a few lines above (which is a new one on me). Since your tasks error out in under 4 minutes, I assume your initial estimates must have been 20 times smaller than that: 12 seconds or something.

What I'd ideally like to see is a similar server log from your machine, showing the GFlops value it's using to calculate your runtime. You have to be quick to catch it: there seem to be very few tasks around at the moment, and I had to try several times. Then you have to capture the server log within a minute, otherwise another attempt will overwrite the successful one (unless you set NNT before your computer asks again).

There's something very odd about the way the Albert server is setting these estimated speeds, and we haven't fully got to the bottom of it yet.
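The gulf Richard describes falls straight out of the same bound arithmetic from the opening post. Assuming those tasks carry the same 300000.00G rsc_fpops_bound quoted there (an assumption; the logs for these hosts aren't shown), the "conservative projected flops" figure gives roughly a four-hour allowance, while a peak-flops-style estimate gives barely two minutes:

```python
# Same limit arithmetic as the error message, applied to the two projected
# speeds mentioned in this thread. The 300000.00G bound is assumed to match
# the one quoted in the opening post.
BOUND = 300000.00e9  # rsc_fpops_bound, in floating-point operations

for label, gflops in [("conservative projected flops", 20.12),
                      ("peak-flops estimate", 2205.26)]:
    limit_s = BOUND / (gflops * 1e9)
    print(f"{label}: {gflops} GFLOPS -> limit {limit_s / 60:.1f} min")
# conservative projected flops: 20.12 GFLOPS -> limit 248.5 min
# peak-flops estimate: 2205.26 GFLOPS -> limit 2.3 min
```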
tjreuter | Joined: 11 Feb 05 | Posts: 25 | Credit: 2,084,544 | RAC: 0

Thank you, Richard, for your swift reply. I will see if I can "catch" this server log in the coming days. By the way, all your assumptions are correct.

Greetings from,
TJ.
Claggy | Joined: 29 Dec 06 | Posts: 78 | Credit: 4,040,969 | RAC: 0

My HD 7770's estimates have just got to the point where one of the apps for Binary Radio Pulsar Search (Perseus Arm Survey) now completes without error; the other app version is still erroring out at 422 seconds.

All Binary Radio Pulsar Search (Perseus Arm Survey) tasks for computer 8143

Claggy
Richard Haselgrove | Joined: 10 Dec 05 | Posts: 450 | Credit: 5,409,572 | RAC: 0

I think we're going to have real problems with the Gamma-ray pulsar search #3 app for a while.

I posted that my host 11362 was getting runtime estimates of 12 minutes, time allowed 4 hours, at 20 GHz. It turns out that two of the three tasks I've returned so far would have exceeded bounds if I hadn't inoculated them, so my GTX 470 GPU is running at an effective rate of 1 GHz or less. As is described elsewhere, this app is very much still a work in progress: very little work is done on the GPU, and most of it still on the CPU. It wants a full CPU core, and uses it to the hilt.

Similarly, TJ's GTX 660 has been taking around three hours for the matching tasks over at the main Einstein project. That makes even more of a mockery of the server dishing out a bounds limit of four minutes for his machine: his speed must be mis-estimated by a factor of 1,000 or so.

And to put the icing on the cake, all three of my returned results have been paired with different anonymous Intel HD 2500 GPUs running with the dodgy OpenCL 1.1 driver that Claggy noticed. Inconclusive, the lot of them. It's going to take a while to get the server averages back into kilter...
tullio | Joined: 22 Jan 05 | Posts: 796 | Credit: 137,342 | RAC: 0

Most of my gamma-ray units finish in time but are not validated. All on CPU.

Tullio
Claggy | Joined: 29 Dec 06 | Posts: 78 | Credit: 4,040,969 | RAC: 0

> And to put the icing on the cake, all three of my returned results have been paired with different anonymous Intel HD 2500 GPUs running with the dodgy OpenCL 1.1 driver that Claggy noticed. Inconclusive, the lot of them. It's going to take a while to get the server averages back into kilter...

I've got something like 26 inconclusives spread across all these Intel GPU hosts, all of them running OpenCL 1.1 drivers; most of them are anonymous, with an i3-3220, HD Graphics 2500, and BOINC 7.0.64:

https://albert.phys.uwm.edu/show_host_detail.php?hostid=4792
https://albert.phys.uwm.edu/show_host_detail.php?hostid=5414
https://albert.phys.uwm.edu/show_host_detail.php?hostid=9043
https://albert.phys.uwm.edu/show_host_detail.php?hostid=9046
https://albert.phys.uwm.edu/show_host_detail.php?hostid=9048
https://albert.phys.uwm.edu/show_host_detail.php?hostid=9041
https://albert.phys.uwm.edu/show_host_detail.php?hostid=9045
https://albert.phys.uwm.edu/show_host_detail.php?hostid=9089
https://albert.phys.uwm.edu/show_host_detail.php?hostid=9090
https://albert.phys.uwm.edu/show_host_detail.php?hostid=9091
https://albert.phys.uwm.edu/show_host_detail.php?hostid=9094
https://albert.phys.uwm.edu/show_host_detail.php?hostid=9095
https://albert.phys.uwm.edu/show_host_detail.php?hostid=9099
https://albert.phys.uwm.edu/show_host_detail.php?hostid=9101
https://albert.phys.uwm.edu/show_host_detail.php?hostid=9106
https://albert.phys.uwm.edu/show_host_detail.php?hostid=9114
https://albert.phys.uwm.edu/show_host_detail.php?hostid=9115
https://albert.phys.uwm.edu/show_host_detail.php?hostid=9119
https://albert.phys.uwm.edu/show_host_detail.php?hostid=9122
https://albert.phys.uwm.edu/show_host_detail.php?hostid=9129
https://albert.phys.uwm.edu/show_host_detail.php?hostid=10714

Can we have an OpenCL 1.2 requirement put into the FGRPopencl-intel_gpu app, please?

Claggy