Message boards : Problems and Bug Reports : Errors - 197 (0xc5) EXIT_TIME_LIMIT_EXCEEDED
Richard Haselgrove · Joined: 10 Dec 05 · Posts: 450 · Credit: 5,409,572 · RAC: 0
I've already reported to Bernd by email: "And, (4), a quite different gripe. https://albert.phys.uwm.edu/results.php?hostid=11362 is plodding through some FGRP #3 OpenCL tasks. EVERY SINGLE ONE (sorry for shouting) has been paired with a different one from a sequence of apparently identical, anonymous 'Intel(R) Core(TM) i3-3220 CPU' hosts with HD 2500 iGPUs. So far, I've returned results paired with hosts:" He replied: "I'll take a look; not sure this will fit in today, though" - 'today' being Thursday 03 July.
Claggy · Joined: 29 Dec 06 · Posts: 78 · Credit: 4,040,969 · RAC: 0
Got my first invalid, where my task was matched against two OpenCL 1.1 Intel GPUs: Workunit 603716.
Claggy
Jacob Klein · Joined: 6 Nov 11 · Posts: 16 · Credit: 2,938,967 · RAC: 0
Let's keep this thread on the topic of its subject, please.
Richard Haselgrove · Joined: 10 Dec 05 · Posts: 450 · Credit: 5,409,572 · RAC: 0
"Let's keep this thread on the topic of its subject, please."
Your last contribution to the subject, five days ago, was: "Just wanted to post to mention that I will likely no longer be testing these problematic work units." Have you come back to testing, and if so, what have you discovered in the meantime about the cause of the problems?
Jacob Klein · Joined: 6 Nov 11 · Posts: 16 · Credit: 2,938,967 · RAC: 0
I have not resumed testing on this issue, and do not anticipate doing so. I replaced my GTS 240 with a second GTX 660 Ti, and am focusing on GPUGrid and Poem.
Regarding the issue: it seemed to be bad estimations (based on the existing GTX 660 Ti, which had a local exclude_gpu option set on this project) for tasks that ran on the uber-weak GTS 240. As I said, rsc_fpops_bound was busted, and the problem was server-side. I don't think it's fixed yet, though I don't know for sure.
Regards, Jacob
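Jacob's diagnosis is worth making concrete. The client aborts a task with 197 (0xc5) once its elapsed time exceeds rsc_fpops_bound divided by the app version's speed estimate, so a bound sized against a fast card's estimate becomes an impossibly tight deadline on a slow one. A minimal numerical sketch (all figures are illustrative, not either card's real specs):

```python
def max_elapsed_time(rsc_fpops_bound, flops_estimate):
    """Wall-clock limit (seconds) before the client aborts a task
    with 197 (0xc5) EXIT_TIME_LIMIT_EXCEEDED."""
    return rsc_fpops_bound / flops_estimate

rsc_fpops_est = 5e14      # estimated work in the task (flops)
rsc_fpops_bound = 5e15    # abort threshold, often ~10x the estimate
fast_gpu_flops = 1e12     # speed estimate carried over from a fast card
slow_gpu_flops = 5e10     # what a much weaker card actually delivers

limit = max_elapsed_time(rsc_fpops_bound, fast_gpu_flops)   # 5000 s
actual = rsc_fpops_est / slow_gpu_flops                     # 10000 s
print(actual > limit)   # True: the slow card blows through the limit
```

With a correct per-card speed estimate the limit would instead be 5e15 / 5e10 = 100,000 s, and the task would finish comfortably inside it.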
Richard Haselgrove · Joined: 10 Dec 05 · Posts: 450 · Credit: 5,409,572 · RAC: 0
"I have not resumed testing on this issue, and do not anticipate doing so. I replaced my GTS 240 with a second GTX 660 Ti, and am focusing on GPUGrid and Poem."
Exactly. Specifically, during what we are calling "stage 2 of the onramp", between 100 global validations for the project as a whole and 11 local validations for the individual host - the phase during which flops determined by "PFC avg" can be seen in the server logs. If you're not testing any more, and we understand that much, why do you wish to prevent us discussing other matters of mutual interest in this thread?
Jacob Klein · Joined: 6 Nov 11 · Posts: 16 · Credit: 2,938,967 · RAC: 0
This thread is for the "EXIT_TIME_LIMIT_EXCEEDED" error a user might get running these newer apps. A forum search on that error will find this thread. I am actually still monitoring this thread for an answer. Any other problem, such as bad OpenCL versions generating bad results and bad validations, deserves its own thread. Thanks.
Claggy · Joined: 29 Dec 06 · Posts: 78 · Credit: 4,040,969 · RAC: 0
"Exactly. Specifically, during what we are calling 'stage 2 of the onramp', between 100 global validations for the project as a whole and 11 local validations for the individual host - the phase during which flops determined by 'PFC avg' can be seen in the server logs."
And to get to those 100 global validations and 11 local validations, tasks need to validate; having masses of hosts throwing inconclusives into the works is slowing down the process of recovering from the -197 errors, at least for the Gamma-ray pulsar search #3.
Claggy
Jacob Klein · Joined: 6 Nov 11 · Posts: 16 · Credit: 2,938,967 · RAC: 0
I hardly know anything about how the server does its calculations. I had a problem, I reported the problem, and at some point I was hoping to receive an answer to the problem. In the meantime, I was expecting the thread to stay on-topic, to make it easier to find an answer in the future. Maybe I'm old-fashioned.
If you (the Albert team) need me to do additional testing on the "197 (0xc5) EXIT_TIME_LIMIT_EXCEEDED" problem that my computer was receiving, I'd have to swap hardware to do it, but could do so. Let me know if you'd like to request that. Thanks, Jacob
Eyrie · Joined: 20 Feb 14 · Posts: 47 · Credit: 2,410 · RAC: 0
I think I found the problem. FGRP has been updated to version 1.12 as a result - the apps are identical. If anybody runs into further -197 time limit exceeded errors, please report here ASAP. Please always include your host ID - we can glean most variables from database dumps now, but if you can also state your peak_flops (from the BOINC startup messages) that would be very helpful.
Queen of Aliasses, wielder of the SETI rolling pin, Mistress of the red shoes, Guardian of the orange tree, Slayer of very small dragons.
Richard Haselgrove · Joined: 10 Dec 05 · Posts: 450 · Credit: 5,409,572 · RAC: 0
Unfortunately, I think Eyrie has jumped the gun on this one. Speaking specifically about FGRP (Gamma-ray pulsar search #3) only:
My NVidia 420M laptop (host 11359) has just been allocated new work from the v1.12 run. It was sent out with the 'conservative' (first-stage onramp) speed estimate of 27.76 GFlops: that's very close to the 23.59 GFlops the same host achieves on BRP4G-cuda32-nv301.
BUT: FGRP is a beta app which makes very little use of the GPU as yet. It runs much, much slower than BRP4G-cuda32-nv301 on my hardware. The tasks would have got error 197 if I hadn't taken precautions. I can't say whether the problem is Einstein's programming or NVidia's OpenCL implementation, but at this initial stage for the new app_version we can't blame BOINC.
But we're back to square one with the validation count. Could testers please run more of these tasks (with an edited <rsc_fpops_bound>, so they can complete)? We still need to test how BOINC handles the transitions at 100 validations for the app_version across the project as a whole, and 11 validations for each individual host.
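For anyone following Richard's suggestion to run tasks with an edited <rsc_fpops_bound>, here is a hedged sketch of bulk-raising the bound in a client_state.xml. The x100 factor, the regex approach, and the sample string are all illustrative, not an official procedure; stop the BOINC client and back the file up before touching it for real:

```python
# Sketch: multiply every <rsc_fpops_bound> value in BOINC's
# client_state.xml so slow GPUs don't hit the -197 abort limit.
# Only run against a STOPPED client, on a backed-up copy.
import re

def raise_bounds(xml_text, factor=100.0):
    def bump(match):
        new = float(match.group(1)) * factor
        return "<rsc_fpops_bound>%e</rsc_fpops_bound>" % new
    return re.sub(r"<rsc_fpops_bound>([^<]+)</rsc_fpops_bound>",
                  bump, xml_text)

sample = "<workunit><rsc_fpops_bound>5.0e15</rsc_fpops_bound></workunit>"
print(raise_bounds(sample))
# <workunit><rsc_fpops_bound>5.000000e+17</rsc_fpops_bound></workunit>
```

Editing the file while the client is running would be overwritten (or corrupt the client's state), which is why the stop-first caveat matters.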
Richard Haselgrove · Joined: 10 Dec 05 · Posts: 450 · Credit: 5,409,572 · RAC: 0
OK, guys, gals, and fellow alpha-testers. Again, speaking specifically about the FGRP (Gamma-ray pulsar search #3) GPU apps only: I've got a workaround for the -197 EXIT_TIME_LIMIT_EXCEEDED.
First, set up an app_config.xml file containing this section:
    <app_config>
       <app>
          <name>hsgamma_FGRP3</name>
          <max_concurrent>1</max_concurrent>
          <gpu_versions>
             <gpu_usage>1</gpu_usage>
             <cpu_usage>1</cpu_usage>
          </gpu_versions>
       </app>
    </app_config>
Make BOINC read the file, and check that it's been found properly: this is important, else the next stage will try to make your computer run 10 tasks at once....
Second, check that you understand the host/venue mapping for your fleet, and identify which host(s) and venue(s) will be running the FGRP3/GPU tests. Go to the Albert@Home preferences page for your account, and for the venue(s) you've selected for the test, set the "GPU utilization factor of FGRP apps" low. Really low. Crazy low, like 0.1. This is madness (don't say you haven't been warned), but with app_config.xml keeping the lid on your machine, it works.
Third, allow and fetch new FGRP3/GPU work for the machine. You should see the estimated run time for any work already cached on the machine jump five-fold; that's your confirmation that the setting has been transferred properly.
That should get us through to the first 100 validations for the project as a whole. Watch out for further advice on how we might handle phase 2.
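Why the low utilization factor helps, under the assumption (not confirmed in this thread) that the server scales the app version's flops estimate in proportion to the factor: a smaller flops estimate stretches both the predicted runtime and the abort limit (rsc_fpops_bound / flops), while app_config.xml stops the client from actually running ten tasks per GPU. A toy calculation, all numbers illustrative:

```python
def estimates(factor, rsc_fpops_est=5e14, rsc_fpops_bound=5e15,
              peak_flops=1e12):
    """Return (estimated runtime, -197 abort limit) in seconds for a
    given GPU utilization factor, assuming the flops estimate scales
    linearly with the factor (an assumption, not documented here)."""
    flops_est = peak_flops * factor
    return rsc_fpops_est / flops_est, rsc_fpops_bound / flops_est

runtime_half, limit_half = estimates(0.5)   # 1000 s estimate, 10000 s limit
runtime_low, limit_low = estimates(0.1)     # 5000 s estimate, 50000 s limit
print(runtime_low / runtime_half)           # 5.0
```

The 5x ratio would match Richard's observed "five-fold" jump if the previous factor was 0.5, but that starting value is a guess on my part.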
Claggy · Joined: 29 Dec 06 · Posts: 78 · Credit: 4,040,969 · RAC: 0
I hope the validate-error problem has been fixed; I'm just about to start my first, and all my wingmen that have completed this WU already have validate errors: https://albert.phys.uwm.edu/workunit.php?wuid=603947
Otherwise this is going to be another real long journey.
Claggy
Claggy · Joined: 29 Dec 06 · Posts: 78 · Credit: 4,040,969 · RAC: 0
That WU is completed, and is showing as 'Completed, waiting for validation', but the 'In progress' wingman is an OpenCL 1.1 Intel GPU, so it's either going to be inconclusive or a validate error. A lot of my other WUs were round-1 inconclusives with OpenCL 1.1 Intel GPUs, so they should validate straight away.
Claggy
Richard Haselgrove · Joined: 10 Dec 05 · Posts: 450 · Credit: 5,409,572 · RAC: 0
Well, we're making progress with the validations. As at 22:00 UTC (Bernd has kindly given us some access to the statistics), we had the following validations, all from Windows/64:
    plan_class              n
    FGRPopencl-ati          9
    FGRPopencl-intel_gpu    5
    FGRPopencl-nvidia      20
I'm still not sure what will happen when that last line reaches 100, but it looks like I won't have to keep watching overnight.
Trotador · Joined: 15 May 13 · Posts: 6 · Credit: 26,130,548 · RAC: 0
I continue to get the "Maximum elapsed time exceeded" error in all Binary Radio Pulsar Search (Perseus Arm Survey) v1.39 (BRP5-cuda32-nv270) tasks. The rest of the WUs seem to run OK.
Trotador · Joined: 15 May 13 · Posts: 6 · Credit: 26,130,548 · RAC: 0
Good thing I posted: the last four units have finished OK! :)