[New release] BRP app v1.22 feedback thread |
Message boards : Problems and Bug Reports : [New release] BRP app v1.22 feedback thread
| Author | Message |
|---|---|
|
Hi, | |
| ID: 111876 | | |
|
Just to be sure: By "latest Catalyst driver (>=12.1)" you mean "AMD Catalyst™ 11.11 - Revision number 12.1"? | |
| ID: 111878 | | |
|
the one available from here http://www2.ati.com/drivers/linux/amd-driver-installer-12.1-x86.x86_64.run | |
| ID: 111879 | | |
|
Should we dump WUs still using the 1.21 app? I've noticed a few inconclusives and errors on that app. I'm not seeing any problems from other GPU projects. | |
| ID: 111880 | | |
Just to be sure: By "latest Catalyst driver (>=12.1)" you mean "AMD Catalyst™ 11.11 - Revision number 12.1"? That's just because of AMD's sloppy web editing. The one you found is the 12.1 driver. Oliver | |
| ID: 111881 | | |
Should we dump WUs still using the 1.21 app? I've noticed a few inconclusives and errors on that app. I'm not seeing any problems from other GPU projects. Depends on what errors you saw. If they are memory-related you probably want to reset your project. You'll be resent the same tasks but they'll be crunched with the latest app version, which requires less memory (OpenCL only). Cheers, Oliver | |
| ID: 111882 | | |
|
Task comes in with <rsc_fpops_est>300000000000000.000000</rsc_fpops_est> which tells BOINC the task is going to take 210 hours and a bit, so BOINC will run it for a long time in panic mode. Can we please get a reasonable fpops estimate, one that doesn't immediately throw Albert tasks in High Priority? | |
| ID: 111883 | | |
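To see where a figure like 210 hours can come from, here is a minimal sketch. It assumes a simplified model in which the estimated runtime is just `rsc_fpops_est` divided by the speed (in FLOPS) that the client assumes for the app version. The real client applies further corrections, and the function names below are made up for illustration.

```python
# Simplified model (an assumption, not BOINC's actual code):
# estimated runtime = rsc_fpops_est / assumed device speed in FLOPS.

def estimated_hours(rsc_fpops_est: float, assumed_flops: float) -> float:
    """Runtime estimate in hours under the simplified model."""
    return rsc_fpops_est / assumed_flops / 3600.0


def implied_flops(rsc_fpops_est: float, hours: float) -> float:
    """Device speed that would produce the given runtime estimate."""
    return rsc_fpops_est / (hours * 3600.0)


RSC_FPOPS_EST = 300000000000000.0  # value quoted in the post above

# Speed the client would have to assume to arrive at 210 hours.
speed = implied_flops(RSC_FPOPS_EST, 210.0)
print(f"implied speed: {speed / 1e9:.2f} GFLOPS")
print(f"round trip:    {estimated_hours(RSC_FPOPS_EST, speed):.1f} h")
```

Under this model a 210 hour estimate corresponds to an assumed speed far below what a GPU delivers, so the same `rsc_fpops_est` can produce very different estimates depending on the speed figure in use.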
|
Are there actually tasks available? I'm not receiving anything for my GPU. | |
| ID: 111884 | | |
|
You have to leave a CPU free, otherwise you're unlikely to get work. | |
| ID: 111885 | | |
|
Freed up 2 threads out of 8 and still nothing. Still no work. I'm currently running Moo! so I suspended the project as well. No tasks. | |
| ID: 111887 | | |
Are there actually tasks available? I'm not receiving anything for my GPU. Yes there are and your config looks fine so far. Please have a look at the BOINC event log: did BOINC recognize your AMD GPU as an OpenCL device? According to our logs that doesn't seem to be the case. You might need to reinstall the Catalyst driver. Also, remember to start the X server and make sure that BOINC can access the X display. If "clinfo" exists on your system you may use it to verify that your GPU is properly enumerated by OpenCL. Cheers, Oliver | |
| ID: 111888 | | |
|
No X windows here, I'm running win7. | |
| ID: 111889 | | |
No X windows here, I'm running win7 Oops, sorry :-) When you start the BOINC client it'll list all GPUs in the event log (advanced view). For AMD/ATI devices it might talk about CAL and OpenCL - we're interested only in the latter. You should find the list of GPUs more or less at the top of the event log, right before the registered projects are mentioned. Oliver | |
| ID: 111890 | | |
|
My first OpenCL WU with the v1.22 app validated against a CUDA result. No problem (see this Task). However, I believe that I am using an earlier version of Catalyst (< 12.1). I'll have to check that when I'm home from work. | |
| ID: 111891 | | |
|
This happens in 7.0.18:
Started download of p2030.20111110.G39.19-00.79.N.b2s0g0.00000_3648.binary
Finished download of p2030.20111110.G39.19-00.79.N.b2s0g0.00000_3648.binary first.
01-Mar-2012 08:39:39 [Albert@Home] Sending scheduler request: To fetch work.
01-Mar-2012 08:39:39 [Albert@Home] Reporting 4 completed tasks, requesting new tasks for CPU
01-Mar-2012 08:39:48 [Albert@Home] Scheduler request completed: got 1 new tasks
01-Mar-2012 08:39:48 [Albert@Home] Resent lost task h1_0059.95_S6GC1__39_S6LV1A_1
01-Mar-2012 08:39:51 [Albert@Home] Starting task h1_0059.95_S6GC1__39_S6LV1A_1 using einstein_S6LV1 version 110 (SSE2) in slot 10
01-Mar-2012 08:39:52 [Albert@Home] Computation for task h1_0059.95_S6GC1__39_S6LV1A_1 finished
01-Mar-2012 08:39:52 [Albert@Home] Output file h1_0059.95_S6GC1__39_S6LV1A_1_0 for task h1_0059.95_S6GC1__39_S6LV1A_1 absent
01-Mar-2012 08:41:39 [Albert@Home] Sending scheduler request: To fetch work.
01-Mar-2012 08:41:39 [Albert@Home] Reporting 1 completed tasks, requesting new tasks for CPU
01-Mar-2012 08:41:41 [Albert@Home] Scheduler request completed: got 4 new tasks
01-Mar-2012 08:41:43 [Albert@Home] Starting task h1_0059.95_S6GC1__35_S6LV1A_1 using einstein_S6LV1 version 110 (SSE2) in slot 10
01-Mar-2012 08:41:43 [Albert@Home] Starting task h1_0059.95_S6GC1__33_S6LV1A_1 using einstein_S6LV1 version 110 (SSE2) in slot 11
01-Mar-2012 08:41:43 [Albert@Home] Starting task h1_0059.95_S6GC1__34_S6LV1A_1 using einstein_S6LV1 version 110 (SSE2) in slot 12
01-Mar-2012 08:41:44 [Albert@Home] Computation for task h1_0059.95_S6GC1__35_S6LV1A_1 finished
01-Mar-2012 08:41:44 [Albert@Home] Output file h1_0059.95_S6GC1__35_S6LV1A_1_0 for task h1_0059.95_S6GC1__35_S6LV1A_1 absent
01-Mar-2012 08:41:44 [Albert@Home] Starting task h1_0059.95_S6GC1__36_S6LV1A_1 using einstein_S6LV1 version 110 (SSE2) in slot 10
01-Mar-2012 08:41:45 [Albert@Home] Computation for task h1_0059.95_S6GC1__33_S6LV1A_1 finished
01-Mar-2012 08:41:45 [Albert@Home] Output file h1_0059.95_S6GC1__33_S6LV1A_1_0 for task h1_0059.95_S6GC1__33_S6LV1A_1 absent
01-Mar-2012 08:41:46 [Albert@Home] Computation for task h1_0059.95_S6GC1__34_S6LV1A_1 finished
01-Mar-2012 08:41:46 [Albert@Home] Output file h1_0059.95_S6GC1__34_S6LV1A_1_0 for task h1_0059.95_S6GC1__34_S6LV1A_1 absent
01-Mar-2012 08:41:47 [Albert@Home] Computation for task h1_0059.95_S6GC1__36_S6LV1A_1 finished
01-Mar-2012 08:41:47 [Albert@Home] Output file h1_0059.95_S6GC1__36_S6LV1A_1_0 for task h1_0059.95_S6GC1__36_S6LV1A_1 absent | |
| ID: 111892 | | |
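The pattern in the log above (tasks that finish a second or so after they start and then report an absent output file) can be picked out mechanically. The following is a rough sketch, assuming event log lines in the same layout as those quoted above; the function name and the five-second threshold are arbitrary choices for illustration.

```python
import re
from datetime import datetime

# Matches the two line types of interest in the quoted event log layout.
LINE_RE = re.compile(
    r"(\d{2}-\w{3}-\d{4} \d{2}:\d{2}:\d{2}) \[[^\]]+\] "
    r"(Starting task|Computation for task) (\S+)"
)


def short_lived_tasks(log_text, max_seconds=5):
    """Return {task_name: elapsed_seconds} for tasks that finished
    within max_seconds of being started."""
    started = {}
    suspicious = {}
    for stamp, kind, task in LINE_RE.findall(log_text):
        when = datetime.strptime(stamp, "%d-%b-%Y %H:%M:%S")
        if kind == "Starting task":
            started[task] = when
        elif task in started:
            elapsed = (when - started[task]).total_seconds()
            if elapsed <= max_seconds:
                suspicious[task] = elapsed
    return suspicious


sample = """\
01-Mar-2012 08:39:51 [Albert@Home] Starting task h1_0059.95_S6GC1__39_S6LV1A_1 using einstein_S6LV1 version 110 (SSE2) in slot 10
01-Mar-2012 08:39:52 [Albert@Home] Computation for task h1_0059.95_S6GC1__39_S6LV1A_1 finished
"""
print(short_lived_tasks(sample))
```

A task that ends this quickly almost certainly failed at startup, so its stderr output on the task page is the next place to look.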
Well, S6LV1 tasks are not the same as BRP tasks. S6LV1 tasks re-use data already present on your host while BRP data is only used once, for a single WU. Looking at the error output of the failed S6LV1 task should tell us what happened. Please open another thread for that problem if it persists. This thread is meant to discuss BRP v1.22 only. Cheers, Oliver | |
| ID: 111893 | | |
No X windows here, I'm running win7 He may need to uninstall the drivers, run driver sweep, then reinstall the 12.1 drivers. Leaving old drivers wreaks havoc on the OpenCL apps at Seti. Probably the same here. His Card is recognized as a 6900 series and he is running the 7.0.18 BOINC so that isn't a problem. Could running vbox be a problem? | |
| ID: 111894 | | |
Don't know but could be if vbox acquires the GPU somehow... Oliver | |
| ID: 111895 | | |
|
No it doesn't. No GPUs are being used on T4T or the Vboxwrapper test project (the only two projects at this time where VBox is being used), other than for showing graphics of sorts. And then these projects require Vbox 4.1.4 or higher, as far as I know. | |
| ID: 111896 | | |
|
Tasks still come in expecting to run for 205 hours. <time_stats> <on_frac>0.939516</on_frac> <connected_frac>0.783900</connected_frac> <active_frac>0.392607</active_frac> <gpu_active_frac>0.392447</gpu_active_frac> <last_update>1330725382.604116</last_update> </time_stats> Of course, because active_frac is only about 0.39, BOINC thinks that the estimated 205 hours of work will really take 205h / (39 / 100) = 525h of wall-clock time (or almost 22 days). A tad difficult to do in 14 days. So it'll run from start to finish in high priority. And as we can see in here, DCF is no longer really used with BOINC 7. Not that it matters, DCF is 7.5, way too high to use reliably. So, pretty please, can the fpops estimate be adjusted enough that they don't come in thinking to take 200+ hours? ____________ Jord. BOINC FAQ Service They say most of your brain shuts down in cryo-sleep. All but the primitive side, the animal side. No wonder I'm still awake. | |
| ID: 111897 | | |
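The arithmetic in the post above can be written out as a short sketch. It assumes, as the post does, that the wall-clock time is simply the estimated runtime divided by the fraction of time the client is actively computing; the function name is made up for illustration.

```python
def wall_clock_hours(estimated_hours: float, active_frac: float) -> float:
    """Wall-clock hours needed if the host only computes for
    active_frac of the time (simplified model from the post)."""
    return estimated_hours / active_frac


# Rounded figure used in the post (39%).
hours = wall_clock_hours(205.0, 0.39)
days = hours / 24.0
print(f"{hours:.0f} h, about {days:.1f} days")

# Same calculation with the unrounded <active_frac> value from <time_stats>.
exact_hours = wall_clock_hours(205.0, 0.392607)
print(f"{exact_hours:.0f} h with the unrounded active_frac")
```

Either way the result is well above a 14 day deadline, which is why the client runs these tasks in high priority from the start.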
|
Since the v1.21/v1.22 update, I have got a lot of validation errors. | |
| ID: 111900 | | |
|
Same for me, my results | |
| ID: 111901 | | |
Same for me, my results Besides, there is this error in the log: [19:14:19][1216][INFO ] Checkpoint file unavailable: status.cpt (No such file or directory). Might it explain why the calculation time is so huge compared to other platforms? | |
| ID: 111902 | | |
Not on my machine. vbox (test4theory, 2 cpu's) and three albert BRP's (2 ati, one nvidia) are running fine together. Some are waiting for validation, some are validated, no errors or invalids. win7 x64 8GB 7.0.12 ____________ | |
| ID: 111903 | | |
|
Hi Jord, So, pretty please, can the fpops estimate be adjusted enough that they don't come in thinking to take 200+ hours? I'll forward this to Bernd but he's pretty overwhelmed with more important topics right now and the BOINC devs are of little help analyzing this right now. Please bear with us. Cheers, Oliver | |
| ID: 111904 | | |
Same for me, my results 1) Please read my intro post of this thread. The Mac version is known to produce invalid results. We already disabled it. 2) Your GPU is simply not that efficient, that's why it takes so long. It's not about the platform but the GPU. 3) The message you quote is an "INFO" message, so no, it's not the reason. It's normal when a fresh dataset is being analyzed for the first time - there can't be any checkpoint then. HTH, Oliver | |
| ID: 111905 | | |
Since v1.21/v1.22 update, I have got a lot of validation errors. Same as for "[AF>Le_Pommier] McRoger": no working OS X OpenCL app for the time being... Oliver | |
| ID: 111906 | | |
Hi Jord, It may be quite easy. I changed <rsc_fpops_est>300000000000000.000000</rsc_fpops_est> to <rsc_fpops_est>30000000000000.000000</rsc_fpops_est> (one zero less) and restarted BOINC. Estimated time on a new task is now 15 hours, which is more in line than the original 208 hours. ____________ Jord. BOINC FAQ Service They say most of your brain shuts down in cryo-sleep. All but the primitive side, the animal side. No wonder I'm still awake. | |
| ID: 111907 | | |
|
It's not. The reason isn't a flaw in our runtime estimation (stored in the work unit definition) but BOINC's new automatic runtime estimation system (a.k.a. new credit system) we're also testing here on albert... | |
| ID: 111909 | | |
It's not. The reason isn't a flaw in our runtime estimation (stored in the work unit definition) but BOINC's new automatic runtime estimation system (a.k.a. new credit system) we're also testing here on albert... ... and it looks like it's faulty? ____________ | |
| ID: 111910 | | |
|
I've gotten a great deal of invalids and inconclusives. Something's wrong and I don't think it's my GPU. | |
| ID: 111911 | | |
|
Just wanted to post that I finally got some Albert tasks! | |
| ID: 111912 | | |
Well, let's say it's non-optimal, in particular for GPU apps. The runtime estimates are determined for every application version independently. Thus after each newly released version BOINC needs some time to gather statistics to come up with a valid/reasonable runtime estimate. Don't worry, we won't be using this new system over on einstein until it proves reliable, but we need to test it here in order to improve (fix) it at all - as soon as time permits. Best, Oliver | |
| ID: 111913 | | |
|
Today I had a lot of atiOpenCL tasks aborted after exactly 24:14 min.
133328 43214 7 Mar 2012 | 16:58:05 UTC 8 Mar 2012 | 6:11:00 UTC Error while computing 1,454.57 580.95 --- Binary Radio Pulsar Search v1.22 (atiOpenCL)
133327 39414 7 Mar 2012 | 16:59:13 UTC 8 Mar 2012 | 6:11:00 UTC Error while computing 1,453.71 578.01 --- Binary Radio Pulsar Search v1.22 (atiOpenCL)
133326 39395 7 Mar 2012 | 16:59:13 UTC 8 Mar 2012 | 6:11:00 UTC Error while computing 1,454.23 582.54 --- Binary Radio Pulsar Search v1.22 (atiOpenCL)
133325 39432 7 Mar 2012 | 16:59:13 UTC 8 Mar 2012 | 8:30:35 UTC Error while computing 1,453.80 582.52 --- Binary Radio Pulsar Search v1.22 (atiOpenCL)
133324 39441 7 Mar 2012 | 16:59:13 UTC 8 Mar 2012 | 8:30:35 UTC Error while computing 1,453.70 586.05 --- Binary Radio Pulsar Search v1.22 (atiOpenCL)
133323 43314 7 Mar 2012 | 16:58:05 UTC 8 Mar 2012 | 6:11:00 UTC Error while computing 1,454.00 584.04 --- Binary Radio Pulsar Search v1.22 (atiOpenCL)
133321 39279 7 Mar 2012 | 16:55:50 UTC 8 Mar 2012 | 6:11:00 UTC Error while computing 1,453.96 582.06 --- Binary Radio Pulsar Search v1.22 (atiOpenCL)
133320 37403 7 Mar 2012 | 17:00:24 UTC 8 Mar 2012 | 8:30:35 UTC Error while computing 1,454.57 614.77 --- Binary Radio Pulsar Search v1.22 (atiOpenCL)
133319 36932 7 Mar 2012 | 17:00:24 UTC 8 Mar 2012 | 8:30:35 UTC Error while computing 1,453.83 607.69 --- Binary Radio Pulsar Search v1.22 (atiOpenCL)
133318 44053 7 Mar 2012 | 17:00:25 UTC 8 Mar 2012 | 10:38:10 UTC Error while computing 1,453.83 662.62 --- Binary Radio Pulsar Search v1.22 (atiOpenCL)
133317 38006 7 Mar 2012 | 17:01:34 UTC 8 Mar 2012 | 12:57:42 UTC Error while computing 1,454.31 665.01 --- Binary Radio Pulsar Search v1.22 (atiOpenCL)
133316 43437 7 Mar 2012 | 16:58:05 UTC 8 Mar 2012 | 6:11:00 UTC Error while computing 1,454.19 582.07 --- Binary Radio Pulsar Search v1.22 (atiOpenCL)
Bikeman's:
133387 44311 7 Mar 2012 | 17:42:19 UTC 8 Mar 2012 | 9:42:23 UTC Error while computing 947.54 846.04 --- Binary Radio Pulsar Search v1.22 (atiOpenCL)
133348 44229 7 Mar 2012 | 17:42:19 UTC 8 Mar 2012 | 9:42:23 UTC Error while computing 946.95 840.90 --- Binary Radio Pulsar Search v1.22 (atiOpenCL)
133346 44226 7 Mar 2012 | 17:43:26 UTC 8 Mar 2012 | 10:07:15 UTC Error while computing 947.58 838.51 --- Binary Radio Pulsar Search v1.22 (atiOpenCL)
133314 39550 7 Mar 2012 | 17:41:09 UTC 8 Mar 2012 | 4:31:47 UTC Error while computing 946.85 838.29 --- Binary Radio Pulsar Search v1.22 (atiOpenCL)
133274 44093 7 Mar 2012 | 17:41:09 UTC 8 Mar 2012 | 4:31:47 UTC Error while computing 947.08 842.30 --- Binary Radio Pulsar Search v1.22 (atiOpenCL)
130790 44395 7 Mar 2012 | 17:43:27 UTC 8 Mar 2012 | 10:43:45 UTC Error while computing 946.86 844.07 --- Binary Radio Pulsar Search v1.22 (atiOpenCL)
130749 44374 7 Mar 2012 | 17:42:19 UTC 8 Mar 2012 | 9:42:23 UTC Error while computing 947.30 837.06 --- Binary Radio Pulsar Search v1.22 (atiOpenCL)
PS: Bikeman's tasks end earlier due to better hardware | |
| ID: 111914 | | |
|
Now that's strange. Looks like BOINC's borked runtime estimation again. Thanks for reporting... | |
| ID: 111915 | | |
|
Hi, | |
| ID: 111920 | | |
|
All 1.22 WUs are erroring out with max time elapsed http://albert.phys.uwm.edu/results.php?userid=128605&offset=0&show_names=0&state=5&appid= | |
| ID: 111924 | | |
|
I don't know if this affects OpenCL in any way, but the Catalyst 12.2 drivers do cause anti-aliasing problems in some games. After upgrading to these drivers I noticed that all fine, mist-like graphics in Skyrim turned into lots of square pixels. This can only be fixed by disabling AA and enabling FSAA instead. | |
| ID: 111925 | | |
|
And again... | |
| ID: 111926 | | |
|
As Tullio posted in another thread, the Albert WUs are slower than the Einstein WUs. | |
| ID: 111928 | | |
|
Hi,
<app_version>
    <app_name>einsteinbinary_BRP4</app_name>
    <version_num>122</version_num>
    <platform>i686-pc-linux-gnu</platform>
    <avg_ncpus>0.150000</avg_ncpus>
    <max_ncpus>1.000000</max_ncpus>
    <flops>4127438621653.708496</flops>
    <plan_class>atiOpenCL</plan_class>
    <api_version>7.0.18</api_version>
    <file_ref>
        <file_name>einsteinbinary_BRP4_1.22_i686-pc-linux-gnu__atiOpenCL</file_name>
        <main_program/>
    </file_ref>
    <file_ref>
        <file_name>einsteinbinary_BRP4_1.00_graphics_i686-pc-linux-gnu</file_name>
        <open_name>graphics_app</open_name>
    </file_ref>
    <coproc>
        <type>ATI</type>
        <count>1.000000</count>
    </coproc>
    <gpu_ram>377487360.000000</gpu_ram>
</app_version> | |
| ID: 111929 | | |
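One possible reading of the "max time elapsed" aborts, sketched under two loudly stated assumptions that are not confirmed anywhere in this thread: that the client aborts a task once its elapsed time exceeds `rsc_fpops_bound / flops`, and that `rsc_fpops_bound` is 20 times `rsc_fpops_est`. The `<flops>` value is the one from the `<app_version>` block above, which may belong to a different host than the aborted tasks reported earlier.

```python
FLOPS = 4127438621653.708496        # <flops> from the app_version block above
RSC_FPOPS_EST = 300000000000000.0   # <rsc_fpops_est> quoted earlier in the thread
BOUND_FACTOR = 20.0                 # hypothetical: rsc_fpops_bound / rsc_fpops_est


def estimated_seconds(fpops_est: float, flops: float) -> float:
    """Runtime estimate under the simplified model est / flops."""
    return fpops_est / flops


def max_elapsed_seconds(fpops_est: float, flops: float, factor: float) -> float:
    """Elapsed-time limit if the bound is factor * fpops_est (assumed)."""
    return fpops_est * factor / flops


est = estimated_seconds(RSC_FPOPS_EST, FLOPS)
limit = max_elapsed_seconds(RSC_FPOPS_EST, FLOPS, BOUND_FACTOR)
minutes, seconds = divmod(limit, 60)
print(f"estimate: {est:.1f} s")
print(f"limit:    {limit:.1f} s ({int(minutes)} min {seconds:.0f} s)")
```

If those assumptions hold, an inflated `<flops>` figure would shrink the time limit to about 24 minutes, which is in the same range as the run times of roughly 1,454 seconds reported for the aborted tasks. This is a plausibility check, not a confirmed diagnosis.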
|
Hi | |
| ID: 111930 | | |
|
WUID 47277, run time: 29,286.40 seconds. [00:38:31][368][INFO ] Checkpoint committed! Activated exception handling... [02:14:57] 46805 has this: [04:51:58][3600][INFO ] Checkpoint committed! Activated exception handling... [21:55:34] And from there on in, they slow down. 46559 ran from start to finish without exception handling (aka a break), and as such it ran in 'normal' time. Now, the troubling thing is that it doesn't do this with all tasks. WUID 47791 has a run time of 6,306.80 seconds, yet it also has this: [00:47:25][4336][INFO ] Checkpoint committed! Activated exception handling... [00:48:18] That was a BOINC exit & restart. The other two were stops of the task itself while BOINC continued running. ____________ Jord. BOINC FAQ Service They say most of your brain shuts down in cryo-sleep. All but the primitive side, the animal side. No wonder I'm still awake. | |
| ID: 111933 | | |
|
I have an invalid. | |
| ID: 111934 | | |
All 1.22 WUs are erroring out with max time elapsed http://albert.phys.uwm.edu/results.php?userid=128605&offset=0&show_names=0&state=5&appid= Looks like the client was at fault. I upgraded to 7.0.23 and now have a WU in progress. ____________ | |
| ID: 111937 | | |
|
I have been away for a bit due to my motherboard dying, and when I got back up and running with the rebuild I was waiting for 7.0.25 to go live so I could run Milkyway with Albert without using a beta version for a live project. | |
| ID: 111958 | | |
|
I've seen a message elsewhere saying that OpenCL workunits tend to need much more CPU use than running similar workunits using CUDA. This implies that slow CPUs will slow down OpenCL workunits much more than they slow down CUDA workunits. | |
| ID: 111959 | | |
This implies that slow CPUs will slow down OpenCL workunits much more than they slow down CUDA workunits. Understood. However, that's why I checked Anandtech's benchmarks to see just how much faster the 2600k was than my cpu. The benchmarks do not reflect a 66% performance difference so there is something else going on. Also, unless I read the charts wrong, comparing the GFLOPS between the two video cards, theoretically the 6950 should smoke the 550Ti in SP output (2253 vs. 691.2). So, back to my original question, is the OpenCL app that unoptimized compared to the CUDA app? ____________ | |
| ID: 111960 | | |
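For scale, the theoretical single-precision figures quoted in the post above work out as follows. This is plain arithmetic on the two numbers given there; peak GFLOPS says nothing about how well a particular app uses the hardware.

```python
# Theoretical single-precision peak figures quoted in the post (GFLOPS).
HD6950_GFLOPS = 2253.0
GTX550TI_GFLOPS = 691.2

ratio = HD6950_GFLOPS / GTX550TI_GFLOPS
print(f"The 6950 has about {ratio:.2f}x the theoretical SP throughput")
```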
|
Here is a WU from Seti@Home Beta's OpenCL application: | |
| ID: 111961 | | |
|
Hmm, the OpenCL app uses a full CPU core to work. | |
| ID: 111962 | | |
Hmm, the OpenCL app uses a full CPU core to work.
Hi! In terms of CPU usage, the OpenCL app should in theory be comparable to the NVIDIA/CUDA app, but we have seen huge differences in CPU usage with different driver versions from ATI. So the only advice I can give now is to try different drivers, sorry. Please let us know any results for your card (e.g. which driver worked better wrt CPU usage).
From the previous message: My conclusion from all this is then, that the Albert AMD OpenCL application isn't quite as optimized as the Albert CUDA application. Can anyone confirm/deny?
It's fair to say that the CUDA app is more optimized to NVIDIA cards than the OpenCL app is optimized to ATI cards, yes. This has several reasons:
* OpenCL is a multi-vendor platform while CUDA is NVIDIA only. If you write OpenCL code you want to keep the vendor-independence. It would be great if we could have just one code base; it remains to be seen whether this will be realistic without too much impact on performance on either platform.
* The OpenCL app for the pulsar search is a port of the CUDA app, which came out first of course, so it's not specifically tuned to the strengths of ATI cards...yet
* The first priority is, needless to say, to get the app to a point where it runs on all our target platforms (OSX, Linux, Windows) and produces scientifically sound results that cross-validate with the CUDA and CPU apps. As has been mentioned elsewhere, the level of support (tools, libraries, bugfixing, drivers...) is certainly more mature for CUDA/NVIDIA than for OpenCL/ATI, so almost all our efforts currently have to be directed into "making it work at all" and less can be spent on "optimizing".
On the other hand the ATI cards are, without question, fine pieces of hardware! So I'm quite optimistic that already the first OpenCL app that will go into production on E@H will have a decent performance/Watt ratio.
Stay tuned and thanks for helping us test the thing here on Albert@Home! HBE ____________ | |
| ID: 111964 | | |
|
I did only now realise that my card is not supported because you demand a min workgroup size of 256. | |
| ID: 111969 | | |
I did only now realise that my card is not supported because you demand a min workgroup size of 256. We don't, we just set a preferred value. If your GPU doesn't support it, the value is dynamically adjusted accordingly. Cheers, Oliver | |
| ID: 111973 | | |
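The behaviour Oliver describes (a preferred value that is adjusted when the GPU cannot support it) can be illustrated with a very small sketch. This is not the app's actual code: the function name is hypothetical, and it assumes the adjustment is a simple clamp to the device's reported maximum.

```python
def effective_workgroup_size(preferred: int, device_max: int) -> int:
    """Use the preferred work-group size unless the device's
    maximum is smaller (assumed clamping behaviour)."""
    if preferred < 1 or device_max < 1:
        raise ValueError("work-group sizes must be positive")
    return min(preferred, device_max)


# A device limited to 128 work-items per group falls back to 128.
print(effective_workgroup_size(256, 128))
# A device that supports 1024 keeps the preferred 256.
print(effective_workgroup_size(256, 1024))
```

In a real OpenCL host program the device limit would come from a device or kernel query rather than a hard-coded number.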
I did only now realise that my card is not supported because you demand a min workgroup size of 256. Ah, that sounds good. But please have a look at this result, because I don't see any indication that the work group size is adjusted. Another point is this: [04:23:10][4764][INFO ] Checkpoint file unavailable: status.cpt (No such file or directory). ------> Starting from scratch... This appears NOT only at the beginning of the WU, but also after nearly 2 hours of runtime. Is that the app or is it BOINC? http://albert.phys.uwm.edu/result.php?resultid=189038 ____________ Christoph | |
| ID: 111977 | | |
|
Hi! | |
| ID: 111978 | | |
|
Ah, these tiny bits of info... now I remember that I read about that somewhere. | |
| ID: 111980 | | |