Errors - 197 (0xc5) EXIT_TIME_LIMIT_EXCEEDED

Jacob Klein
Joined: 6 Nov 11
Posts: 16
Credit: 2,938,967
RAC: 0
Message 113122 - Posted: 23 Jun 2014, 3:12:02 UTC
Last modified: 23 Jun 2014, 3:13:05 UTC

I have received some errors lately, for app:
Gamma-ray pulsar search #3 v1.11 (FGRPopencl-nvidia)

The error is:
Outcome: Computation error
Client state: Compute error
Exit status: 197 (0xc5) EXIT_TIME_LIMIT_EXCEEDED
<core_client_version>7.4.2</core_client_version>
<![CDATA[
<message>
exceeded elapsed time limit 136.04 (300000.00G/2205.26G)
</message>

Is the app's rsc_fpops_bound value set incorrectly?
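
For anyone reading the numbers in that message: a minimal sketch, assuming (as the message format suggests) that the client gets its limit by dividing the work unit's rsc_fpops_bound by the speed estimate for the app version. The function name is hypothetical; only the two figures come from the error message.

```python
def elapsed_time_limit(rsc_fpops_bound, projected_flops):
    """Seconds allowed before the task is aborted (assumed: bound / flops)."""
    return rsc_fpops_bound / projected_flops

# Values from the error message: 300000.00G / 2205.26G
limit_s = elapsed_time_limit(300000.00e9, 2205.26e9)
print(f"{limit_s:.2f} s")  # 136.04 s
```

With the bound fixed, the limit shrinks as the speed estimate grows, so an inflated speed estimate is enough to produce this error.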

http://albert.phys.uwm.edu/workunit.php?wuid=604516
http://albert.phys.uwm.edu/workunit.php?wuid=604518
http://albert.phys.uwm.edu/workunit.php?wuid=604531
http://albert.phys.uwm.edu/workunit.php?wuid=604554
ID: 113122
Claggy
Joined: 29 Dec 06
Posts: 78
Credit: 4,040,969
RAC: 0
Message 113123 - Posted: 23 Jun 2014, 7:11:34 UTC - in response to Message 113122.  

We're in the middle of BOINC server software testing here (see the news threads). The rsc_fpops_bound is OK; the server is supplying ridiculous speed estimates for the initial tasks.

Claggy
ID: 113123
Eyrie
Joined: 20 Feb 14
Posts: 47
Credit: 2,410
RAC: 0
Message 113124 - Posted: 23 Jun 2014, 8:46:07 UTC

You need to edit client_state.xml to supply 100x higher rsc_fpops_bound values, to get you past the 11 validations needed to get APR to drive rsc_fpops_est.

Albert is running vanilla creditNew code, and David is using GPU peak flops to estimate GPU speeds, which turned out to be a pretty daft assumption.

We'll be moving to a redesigned Credit Scheme over the next few weeks or so - you can try and keep up with the 'project code updated' news thread for that development.
Queen of Aliasses, wielder of the SETI rolling pin, Mistress of the red shoes, Guardian of the orange tree, Slayer of very small dragons.
ID: 113124
Jacob Klein
Joined: 6 Nov 11
Posts: 16
Credit: 2,938,967
RAC: 0
Message 113126 - Posted: 23 Jun 2014, 11:56:37 UTC - in response to Message 113124.  

Thanks guys. I just found that thread, and subscribed to it. I just figured that an issue like this deserved to be reported in its own thread.

I do not plan on editing my client_state.xml file, because, if Albert fixes it server-side, I'd like the ability to test that fix.

Thanks again,
Jacob
ID: 113126
Richard Haselgrove
Joined: 10 Dec 05
Posts: 450
Credit: 5,409,572
RAC: 0
Message 113127 - Posted: 23 Jun 2014, 14:21:04 UTC - in response to Message 113126.  

Trouble is, if you don't take precautions like that, you'll never complete any tasks, and never be able to explore any other aspects of the server code.

I'm expecting that when the server code is next updated, I'll let it run for a few more hours to check we haven't introduced any new bugs, then force my 'graphing' host to get a new HostID and do it all over again with a completely clean application_details record.
ID: 113127
Jacob Klein
Joined: 6 Nov 11
Posts: 16
Credit: 2,938,967
RAC: 0
Message 113128 - Posted: 23 Jun 2014, 14:27:30 UTC - in response to Message 113127.  

It's no trouble for me. It's just wasting a bit of my resources. I am not attached to work out any additional server/scheduler bugs. I am attached simply to test that units complete. And I had a problem with that. And I reported it.

Until server-side implements a fix, it looks like the units will continue to waste resources. It is unfortunate. I hope they fix it.
ID: 113128
Richard Haselgrove
Joined: 10 Dec 05
Posts: 450
Credit: 5,409,572
RAC: 0
Message 113129 - Posted: 23 Jun 2014, 14:37:46 UTC - in response to Message 113128.  

> It's no trouble for me. It's just wasting a bit of my resources. I am not attached to work out any additional server/scheduler bugs. I am attached simply to test that units complete. And I had a problem with that. And I reported it.
>
> Until server-side implements a fix, it looks like the units will continue to waste resources. It is unfortunate. I hope they fix it.

Well, if you continue to waste resources after you've been told what's going on and why, be our guest. It's your electricity bill.

It's going to be fixed, though it may not be "they" that does it.
ID: 113129
Jacob Klein
Joined: 6 Nov 11
Posts: 16
Credit: 2,938,967
RAC: 0
Message 113130 - Posted: 23 Jun 2014, 15:33:43 UTC

I am happy to be your guest, as you and the team figure out the scheduler issues.
ID: 113130
tjreuter
Joined: 11 Feb 05
Posts: 25
Credit: 2,084,544
RAC: 0
Message 113194 - Posted: 30 Jun 2014, 18:24:58 UTC

I have the same errors, and my wing(wo)men with NVIDIA cards also have this error. If the task is done by a CPU, then it is validated. So I think it has something to do with the GPU app.
Only Gamma-ray pulsar search #3 v1.11 (FGRPopencl-nvidia) has this error (on my side).
Greetings from,
TJ.
ID: 113194
Eyrie
Joined: 20 Feb 14
Posts: 47
Credit: 2,410
RAC: 0
Message 113195 - Posted: 1 Jul 2014, 8:31:13 UTC

The error is due to an initialisation bug in the creditNew code.

If you want to prevent it, you need to make a manual adjustment (edit) to client_state.xml. Specifically, you need to increase the <rsc_fpops_bound> value for the GPU tasks by at least two orders of magnitude (add 2-3 zeros).

If you don't feel confident enough for that, you can juggle which tasks you get until the patch has been deployed (e.g. opt out of GPU tasks), stop crunching for Albert until the patch is in, or just let them error out [as Jacob is doing]; when the patch is in, it should correct itself.

In the first two cases, you need to monitor the news thread(s) for patch announcements.
Queen of Aliasses, wielder of the SETI rolling pin, Mistress of the red shoes, Guardian of the orange tree, Slayer of very small dragons.
ID: 113195
Richard Haselgrove
Joined: 10 Dec 05
Posts: 450
Credit: 5,409,572
RAC: 0
Message 113196 - Posted: 1 Jul 2014, 8:39:22 UTC - in response to Message 113195.  
Last modified: 1 Jul 2014, 8:43:01 UTC

Specifically:

1. Fetch some FGRP (Gamma-ray pulsar search) work.
2. Exit BOINC completely.
3. Edit <rsc_fpops_bound> as Eyrie describes. You'll find it in the <workunit> definition for each of the tasks you've downloaded.
4. Restart BOINC, and allow the tasks to run and report as usual. Probably best to set 'No New Tasks' while you do this.

Once you've reported and validated 11 tasks, the procedure should no longer be necessary. If you didn't get 11 validations from the first batch, repeat as needed.
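
The edit itself can be scripted. The following is only a sketch under stated assumptions: the function name and the sample work unit are hypothetical, the values are assumed to be plain decimal numbers, and the text is treated as a string rather than parsed as XML. It scales every <rsc_fpops_bound> it finds, whereas Eyrie's advice applies to the GPU tasks. Exit BOINC and back up client_state.xml before trying anything like this on the real file.

```python
import re

# Matches the opening tag, the numeric value, and the closing tag.
BOUND_RE = re.compile(r"(<rsc_fpops_bound>)([^<]+)(</rsc_fpops_bound>)")

def scale_fpops_bound(xml_text, factor=100.0):
    """Return xml_text with every rsc_fpops_bound value multiplied by factor."""
    def repl(match):
        new_value = float(match.group(2)) * factor
        return f"{match.group(1)}{new_value:f}{match.group(3)}"
    return BOUND_RE.sub(repl, xml_text)

# Hypothetical work unit entry, for illustration only.
sample = """<workunit>
    <name>example_wu</name>
    <rsc_fpops_est>15000000000000.000000</rsc_fpops_est>
    <rsc_fpops_bound>300000000000000.000000</rsc_fpops_bound>
</workunit>"""

patched = scale_fpops_bound(sample)
print(patched)
```

Only the bound is touched; <rsc_fpops_est> is left alone, so the runtime estimate shown by the client should not change.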
ID: 113196
Jacob Klein
Joined: 6 Nov 11
Posts: 16
Credit: 2,938,967
RAC: 0
Message 113199 - Posted: 1 Jul 2014, 12:40:35 UTC
Last modified: 1 Jul 2014, 12:41:09 UTC

Just wanted to post to mention that I will likely no longer be testing these problematic work units.

Now that my RacerX machine has [GTX 660 Ti + GTX 660 Ti + GTX 460] instead of [GTX 660 Ti + GTX 460 + GTS 240], I will likely not be running any more [Albert/Einstein/SETI/SETIBETA] on it. I had only been running those 4 projects on the GTS 240, since it couldn't run any other projects, but it is now in storage. I pulled it out because the R343+ NVIDIA drivers are dropping support for pre-Fermi GPUs, so I replaced it with another GTX 660 Ti, and all 3 of my GPUs can now focus on GPUGrid.

Thanks,
Jacob
ID: 113199
Richard Haselgrove
Joined: 10 Dec 05
Posts: 450
Credit: 5,409,572
RAC: 0
Message 113200 - Posted: 1 Jul 2014, 12:42:44 UTC - in response to Message 113194.  

> I have the same errors, and my wing(wo)men with NVIDIA cards also have this error. If the task is done by a CPU, then it is validated. So I think it has something to do with the GPU app.
> Only Gamma-ray pulsar search #3 v1.11 (FGRPopencl-nvidia) has this error (on my side).

@ tjreuter,

Could you possibly unhide your host(s) at this project, or give us a direct link to the one you're having problems with?

It would help us to give you more specific advice, and it would also help us (and the project) to understand more clearly why this problem happens in the first place.
ID: 113200
tjreuter
Joined: 11 Feb 05
Posts: 25
Credit: 2,084,544
RAC: 0
Message 113201 - Posted: 1 Jul 2014, 16:08:28 UTC - in response to Message 113200.  

>> I have the same errors, and my wing(wo)men with NVIDIA cards also have this error. If the task is done by a CPU, then it is validated. So I think it has something to do with the GPU app.
>> Only Gamma-ray pulsar search #3 v1.11 (FGRPopencl-nvidia) has this error (on my side).
>
> @ tjreuter,
>
> Could you possibly unhide your host(s) at this project, or give us a direct link to the one you're having problems with?
>
> It would help us to give you more specific advice, and it would also help us (and the project) to understand more clearly why this problem happens in the first place.

Rigs should be visible now, Richard. However, I have unchecked the Gamma-ray pulsar search. (At Einstein@home, they work though.)
Greetings from,
TJ.
ID: 113201
Richard Haselgrove
Joined: 10 Dec 05
Posts: 450
Credit: 5,409,572
RAC: 0
Message 113202 - Posted: 1 Jul 2014, 17:36:38 UTC - in response to Message 113201.  

>>> I have the same errors, and my wing(wo)men with NVIDIA cards also have this error. If the task is done by a CPU, then it is validated. So I think it has something to do with the GPU app.
>>> Only Gamma-ray pulsar search #3 v1.11 (FGRPopencl-nvidia) has this error (on my side).
>>
>> @ tjreuter,
>>
>> Could you possibly unhide your host(s) at this project, or give us a direct link to the one you're having problems with?
>>
>> It would help us to give you more specific advice, and it would also help us (and the project) to understand more clearly why this problem happens in the first place.
>
> Rigs should be visible now, Richard. However, I have unchecked the Gamma-ray pulsar search. (At Einstein@home, they work though.)

Yes, visible now, thanks.

I assume we're talking about

Error Gamma-ray pulsar search #3 tasks for computer 7731 - tasks issued yesterday.

Unfortunately, Application details for host 7731 shows no APR for that app, because none of the tasks completed successfully.

And the server log https://albert.phys.uwm.edu/host_sched_logs/7/7731 isn't much use either, because the last scheduler contact was to report work only, with no new work requested.

What I'd like to see, if at all possible, is a copy of the server log for an example of a work request where an FGRP task was issued. It would look something like

2014-07-01 17:18:03.1608 [PID=30917] [version] Checking plan class 'FGRPopencl-nvidia'
2014-07-01 17:18:03.1608 [PID=30917] [version] plan_class_spec: parsed project prefs setting 'gpu_util_fgrp' : true : 1.000000
2014-07-01 17:18:03.1609 [PID=30917] [version] [AV#913] (FGRPopencl-nvidia) using conservative projected flops: 20.12G
2014-07-01 17:18:03.1609 [PID=30917] [version] Best app version is now AV913 (29.38 GFLOP)
2014-07-01 17:18:03.1610 [PID=30917] [version] [AV#913] (FGRPopencl-nvidia) 11362
2014-07-01 17:18:03.1610 [PID=30917] [version] Best version of app hsgamma_FGRP3 is [AV#913] (20.12 GFLOPS)
2014-07-01 17:18:03.1610 [PID=30917] [send] est delay 0, skipping deadline check
2014-07-01 17:18:03.1629 [PID=30917] [send] Sending app_version hsgamma_FGRP3 2 111 FGRPopencl-nvidia; projected 20.12 GFLOPS
2014-07-01 17:18:03.1630 [PID=30917] [CRITICAL] No filename found in [WU#605548 LATeah0109C_32.0_99_-5.66e-10]
2014-07-01 17:18:03.1630 [PID=30917] [send] est. duration for WU 605548: unscaled 745.62 scaled 745.95
2014-07-01 17:18:03.1630 [PID=30917] [send] [HOST#11362] sending [RESULT#1453006 LATeah0109C_32.0_99_-5.66e-10_0] (est. dur. 745.95s (0h12m25s95)) (max time 14912.31s (4h08m32s31))

Note that in my case (from host 11362) the server is estimating - last line - that the task will run for 746 seconds (which is what I'm seeing locally too), and won't be thrown out with a time limit error for over four hours.

That's calculated from "using conservative projected flops: 20.12G" a few lines above (which is a new one on me). Since your tasks error out in under 4 minutes, and the time limit in my log is 20 times the estimate, I assume your initial estimates must have been 20 times smaller than that limit - 12 seconds or something.
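
The arithmetic behind that guess, as a small sketch. It assumes the ratio between time limit and estimate seen in the log for host 11362 also holds for other hosts; the function name is hypothetical, and the 240 s and 136.04 s limits are the figures mentioned in this thread.

```python
# Figures from the log lines quoted above (host 11362).
unscaled_estimate_s = 745.62
max_time_s = 14912.31

# Ratio between the time limit and the runtime estimate (close to 20).
limit_to_estimate = max_time_s / unscaled_estimate_s

def implied_estimate(observed_limit_s, ratio=limit_to_estimate):
    """Runtime estimate implied by the limit at which a task was aborted."""
    return observed_limit_s / ratio

print(round(limit_to_estimate, 1))         # 20.0
print(round(implied_estimate(240.0), 1))   # 12.0 (tasks failing at about 4 minutes)
print(round(implied_estimate(136.04), 1))  # 6.8 (the limit from the first post)
```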

What I'd ideally like to see is a similar server log from your machine, showing the GFlops value it's using to calculate your runtime. You have to be quick to catch it: there seem to be very few tasks around at the moment, and I had to try several times. Then, you have to capture the server log within a minute, otherwise another attempt will overwrite the successful one (unless you set NNT before your computer asks again). There's something very odd about the way the Albert server is setting these estimated speeds, and we haven't fully got to the bottom of it yet.
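
If you do manage to capture a log, the interesting numbers can be pulled out with a few regular expressions. This is a rough sketch that assumes the log lines look like the two samples copied from above; the pattern names and the function are hypothetical.

```python
import re

# Two lines copied from the sample server log above.
LOG = """\
2014-07-01 17:18:03.1629 [PID=30917] [send] Sending app_version hsgamma_FGRP3 2 111 FGRPopencl-nvidia; projected 20.12 GFLOPS
2014-07-01 17:18:03.1630 [PID=30917] [send] [HOST#11362] sending [RESULT#1453006 LATeah0109C_32.0_99_-5.66e-10_0] (est. dur. 745.95s (0h12m25s95)) (max time 14912.31s (4h08m32s31))
"""

PATTERNS = {
    "projected_gflops": re.compile(r"projected ([\d.]+) GFLOPS"),
    "est_duration_s": re.compile(r"est\. dur\. ([\d.]+)s"),
    "max_time_s": re.compile(r"max time ([\d.]+)s"),
}

def extract_estimates(log_text):
    """Return the first value found for each pattern, as floats."""
    found = {}
    for key, pattern in PATTERNS.items():
        match = pattern.search(log_text)
        if match:
            found[key] = float(match.group(1))
    return found

estimates = extract_estimates(LOG)
print(estimates)
```

A very large projected GFLOPS figure next to a very short max time would point at the speed estimate rather than at rsc_fpops_bound.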
ID: 113202
tjreuter
Joined: 11 Feb 05
Posts: 25
Credit: 2,084,544
RAC: 0
Message 113204 - Posted: 1 Jul 2014, 20:41:04 UTC - in response to Message 113202.  
Last modified: 1 Jul 2014, 20:41:34 UTC

Thank you, Richard, for your swift reply. I will see if I can "catch" this server log in the coming days.

By the way, all your assumptions are correct.
Greetings from,
TJ.
ID: 113204
Claggy
Joined: 29 Dec 06
Posts: 78
Credit: 4,040,969
RAC: 0
Message 113205 - Posted: 1 Jul 2014, 20:47:12 UTC

My HD7770's estimates have just got to the point where one of the apps for Binary Radio Pulsar Search (Perseus Arm Survey) now completes without error,
the other app version is still erroring at 422 seconds.

All Binary Radio Pulsar Search (Perseus Arm Survey) tasks for computer 8143

Claggy
ID: 113205
Richard Haselgrove
Joined: 10 Dec 05
Posts: 450
Credit: 5,409,572
RAC: 0
Message 113208 - Posted: 2 Jul 2014, 10:47:40 UTC

I think we're going to have real problems with the Gamma-ray pulsar search #3 app for a while.

I posted that my host 11362 was getting runtime estimates of 12 minutes, time allowed 4 hours, @ 20 GFLOPS.

Turns out that two of the three tasks I've returned so far would have exceeded bounds if I hadn't inoculated them. So my GTX 470 GPU is running at an effective rate of 1 GFLOPS or less. As is described elsewhere, this app is very much still a work-in-progress, where very little work is done on the GPU, and most of it is still done on the CPU - it wants a full CPU core, and uses it to the hilt.

Similarly, TJ's GTX 660 has been taking around three hours for the matching tasks over at the main Einstein project. So that makes even more of a mockery of the server dishing out a bounds limit of four minutes for his machine - his speed must be mis-estimated by a factor of 1,000 or so.
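
Rough numbers for both cases, as a sketch only. It assumes the effective speed scales as projected speed times estimated runtime divided by actual runtime; the function name is hypothetical, and the 12-second estimate for TJ's host is the guess made earlier in this thread, not a measured value.

```python
def effective_gflops(projected_gflops, estimated_s, actual_s):
    """Speed implied by how long the task really took (assumed scaling)."""
    return projected_gflops * estimated_s / actual_s

# Host 11362: a task that only just reaches the 14912.31 s limit.
at_limit = effective_gflops(20.12, 745.62, 14912.31)
print(round(at_limit, 2))  # 1.01, so anything slower is below about 1 GFLOPS

# TJ's host: about three hours actual vs. an assumed 12 s estimate.
misestimate_factor = (3 * 3600) / 12.0
print(misestimate_factor)  # 900.0
```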

And to put the icing on the cake, all three of my returned results have been paired with different anonymous Intel HD 2500 GPUs running with the dodgy OpenCL 1.1 driver that Claggy noticed. Inconclusive, the lot of them. It's going to take a while to get the server averages back into kilter...
ID: 113208
tullio
Joined: 22 Jan 05
Posts: 796
Credit: 137,342
RAC: 0
Message 113210 - Posted: 2 Jul 2014, 15:42:25 UTC

Most of my gamma ray units finish in time but are not validated. All on CPU.
Tullio
ID: 113210
Claggy
Joined: 29 Dec 06
Posts: 78
Credit: 4,040,969
RAC: 0
Message 113231 - Posted: 5 Jul 2014, 20:58:06 UTC - in response to Message 113208.  

> And to put the icing on the cake, all three of my returned results have been paired with different anonymous Intel HD 2500 GPUs running with the dodgy OpenCL 1.1 driver that Claggy noticed. Inconclusive, the lot of them. It's going to take a while to get the server averages back into kilter...

I've got something like 26 inconclusives spread across all these Intel GPU hosts, all of them running OpenCL 1.1 drivers. Most of them are anonymous, with an i3-3220, an HD Graphics 2500 and BOINC 7.0.64:

https://albert.phys.uwm.edu/show_host_detail.php?hostid=4792
https://albert.phys.uwm.edu/show_host_detail.php?hostid=5414
https://albert.phys.uwm.edu/show_host_detail.php?hostid=9043
https://albert.phys.uwm.edu/show_host_detail.php?hostid=9046
https://albert.phys.uwm.edu/show_host_detail.php?hostid=9048
https://albert.phys.uwm.edu/show_host_detail.php?hostid=9041
https://albert.phys.uwm.edu/show_host_detail.php?hostid=9045
https://albert.phys.uwm.edu/show_host_detail.php?hostid=9048
https://albert.phys.uwm.edu/show_host_detail.php?hostid=9089
https://albert.phys.uwm.edu/show_host_detail.php?hostid=9090
https://albert.phys.uwm.edu/show_host_detail.php?hostid=9091
https://albert.phys.uwm.edu/show_host_detail.php?hostid=9094
https://albert.phys.uwm.edu/show_host_detail.php?hostid=9095
https://albert.phys.uwm.edu/show_host_detail.php?hostid=9099
https://albert.phys.uwm.edu/show_host_detail.php?hostid=9101
https://albert.phys.uwm.edu/show_host_detail.php?hostid=9106
https://albert.phys.uwm.edu/show_host_detail.php?hostid=9114
https://albert.phys.uwm.edu/show_host_detail.php?hostid=9115
https://albert.phys.uwm.edu/show_host_detail.php?hostid=9119
https://albert.phys.uwm.edu/show_host_detail.php?hostid=9122
https://albert.phys.uwm.edu/show_host_detail.php?hostid=9129
https://albert.phys.uwm.edu/show_host_detail.php?hostid=10714

Can we have an OpenCL 1.2 requirement put into the FGRPopencl-intel_gpu app, please?

Claggy
ID: 113231



This material is based upon work supported by the National Science Foundation (NSF) under Grant PHY-0555655 and by the Max Planck Gesellschaft (MPG). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the investigators and do not necessarily reflect the views of the NSF or the MPG.

Copyright © 2024 Bruce Allen for the LIGO Scientific Collaboration