[New release] BRP app v1.22 feedback thread


oz Send message Joined: 28 Feb 05 Posts: 10 Credit: 1,285,478 RAC: 0	Message 111929 - Posted: 15 Mar 2012, 8:43:00 UTC - in response to Message 111920. Hi, what exactly did you do in client_state.xml? When I change the <flops> entry in the atiOpenCL application section, it is automatically reset by the application after a while, and tasks end before finishing.
<app_version>
    <app_name>einsteinbinary_BRP4</app_name>
    <version_num>122</version_num>
    <platform>i686-pc-linux-gnu</platform>
    <avg_ncpus>0.150000</avg_ncpus>
    <max_ncpus>1.000000</max_ncpus>
    <flops>4127438621653.708496</flops>
    <plan_class>atiOpenCL</plan_class>
    <api_version>7.0.18</api_version>
    <file_ref>
        <file_name>einsteinbinary_BRP4_1.22_i686-pc-linux-gnu__atiOpenCL</file_name>
        <main_program/>
    </file_ref>
    <file_ref>
        <file_name>einsteinbinary_BRP4_1.00_graphics_i686-pc-linux-gnu</file_name>
        <open_name>graphics_app</open_name>
    </file_ref>
    <coproc>
        <type>ATI</type>
        <count>1.000000</count>
    </coproc>
    <gpu_ram>377487360.000000</gpu_ram>
</app_version>
ID: 111929 · Reply Quote
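For anyone who wants to check the <flops> value without editing the file by hand, here is a minimal sketch that reads it from an <app_version> block. It uses only the fields quoted in the post; the function name `read_flops` is made up for illustration, and the snippet only reads a copied fragment, it does not touch a live client_state.xml.

```python
import xml.etree.ElementTree as ET

# Fragment copied from the post above (only the fields used here).
APP_VERSION_XML = """
<app_version>
    <app_name>einsteinbinary_BRP4</app_name>
    <version_num>122</version_num>
    <flops>4127438621653.708496</flops>
    <plan_class>atiOpenCL</plan_class>
</app_version>
"""


def read_flops(xml_text):
    """Return (app_name, plan_class, flops) from one <app_version> block.

    Hypothetical helper for illustration; not part of BOINC.
    """
    node = ET.fromstring(xml_text)
    return (
        node.findtext("app_name"),
        node.findtext("plan_class"),
        float(node.findtext("flops")),
    )


name, plan_class, flops = read_flops(APP_VERSION_XML)
# Print the value in TFLOPS so it is easier to judge at a glance.
print(f"{name} [{plan_class}]: {flops / 1e12:.2f} TFLOPS")
```

Seen this way, the quoted entry is roughly 4.1 TFLOPS, which makes it easier to compare against whatever value you tried to set by hand.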

choks Send message Joined: 24 Feb 05 Posts: 5 Credit: 1,110,845 RAC: 0	Message 111930 - Posted: 16 Mar 2012, 14:02:23 UTC - in response to Message 111929. Hi, Once the jobs had been loaded and a couple had been aborted, I disabled requests for new jobs, changed <flops>, and waited for the remaining jobs to complete. It looks like this is no longer required, because the jobs I got today are processing OK, so it seems to be fixed. Christophe ID: 111930 · Reply Quote

pragmatic prancing periodic problem child, left Send message Joined: 26 Jan 05 Posts: 1639 Credit: 70,000 RAC: 0	Message 111933 - Posted: 18 Mar 2012, 1:07:17 UTC - in response to Message 111926. WUID 47277, run time: 29,286.40 seconds. WUID 46805, run time: 39,079.57 seconds. WUID 46559, run time: 4,538.00 seconds. 47277 has this: [00:38:31][368][INFO ] Checkpoint committed! Activated exception handling... [02:14:57] 46805 has this: [04:51:58][3600][INFO ] Checkpoint committed! Activated exception handling... [21:55:34] And from there on in, they slow down. 46559 ran from start to finish without exception handling (aka a break), and as such it ran in 'normal' time. Now, the troubling thing is that it doesn't do this with all tasks. WUID 47791 has a run time of 6,306.80 seconds, yet it also has this: [00:47:25][4336][INFO ] Checkpoint committed! Activated exception handling... [00:48:18] That was a BOINC exit & restart. The other two were stops of the task itself while BOINC continued running. Jord. BOINC FAQ Service They say most of your brain shuts down in cryo-sleep. All but the primitive side, the animal side. No wonder I'm still awake. ID: 111933 · Reply Quote

Christoph Send message Joined: 25 Aug 05 Posts: 48 Credit: 208,211 RAC: 0	Message 111934 - Posted: 19 Mar 2012, 19:45:34 UTC Last modified: 19 Mar 2012, 19:47:19 UTC I have an invalid. http://albert.phys.uwm.edu/result.php?resultid=140891 Oh, and the long runtimes which Ageless has are normal to me. I will set NNT to other projects to see if the times go down when my tasks run in one go. Christoph ID: 111934 · Reply Quote

Trog Dog Send message Joined: 25 Nov 05 Posts: 204 Credit: 64,008 RAC: 0	Message 111937 - Posted: 24 Mar 2012, 20:00:56 UTC - in response to Message 111924. All 1.22 WUs are erroring out with "max time elapsed": http://albert.phys.uwm.edu/results.php?userid=128605&offset=0&show_names=0&state=5&appid= Running on BOINC 7.0.20, ATI drivers 12.2. Looks like the client was at fault: I upgraded to 7.0.23 and now have a WU in progress. ID: 111937 · Reply Quote

Infusioned Send message Joined: 11 Feb 05 Posts: 45 Credit: 149,000 RAC: 0	Message 111958 - Posted: 15 Apr 2012, 22:12:48 UTC - in response to Message 111937. I have been away for a bit due to my motherboard dying, and when I got back up and running with the rebuild I was waiting for 7.0.25 to go live so I could run Milkyway with Albert without using a beta version for a live project. So, poking through my WU times, I am hovering at ~5,900 GPU seconds and ~3,200 CPU seconds per WU.
AMD Phenom II x4 975 (couldn't find an 1100T for non-ripoff prices and am waiting for Piledriver [not happy with Bulldozer])
AMD HD 6950
8G DDR3 1600
Win 7 x64
BOINC 7.0.25
This WU vs. i2600k Sandybridge/550Ti shows the 2600k coming in at 1/3 the time of my CPU. However, Anandtech Bench does not show the 2600k as 66% faster. Also, Wikipedia shows the AMD HD 6950 SP GFLOPS at 2253 and the NVIDIA GTX 550Ti SP GFLOPS at 691.2, but the 550Ti time is 2/3 of mine. So, my question is, what gives? Is the OpenCL app that unoptimized compared to the CUDA app? ID: 111958 · Reply Quote
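To put numbers on the comparison in the post above, here is a rough arithmetic sketch. It uses only the figures quoted there (peak single-precision GFLOPS and the 2/3 run-time ratio); the variable and function names are made up for illustration, and peak GFLOPS are theoretical values that say little about how any particular app actually performs.

```python
# Figures quoted in the post above.
HD6950_GFLOPS = 2253.0     # theoretical single-precision peak, per the post
GTX550TI_GFLOPS = 691.2    # theoretical single-precision peak, per the post
TIME_RATIO_550TI = 2 / 3   # 550Ti run time as a fraction of the HD 6950's


def speed_ratio_from_time(time_fraction):
    """A task that finishes in `time_fraction` of the time runs 1/time_fraction as fast."""
    return 1.0 / time_fraction


# How much faster the HD 6950 is on paper.
theoretical_advantage = HD6950_GFLOPS / GTX550TI_GFLOPS

# How much faster the 550Ti is in the observed run times.
observed_550ti_speedup = speed_ratio_from_time(TIME_RATIO_550TI)

# Gap between the paper advantage and the observed result.
shortfall = theoretical_advantage * observed_550ti_speedup

print(f"HD 6950 on paper:     {theoretical_advantage:.2f}x the 550Ti")
print(f"550Ti observed:       {observed_550ti_speedup:.2f}x the HD 6950")
print(f"Gap versus the paper: {shortfall:.2f}x")
```

The same helper also shows that finishing in 1/3 of the time corresponds to running about 3x as fast, which is a much larger gap than "66% faster".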

robertmiles Send message Joined: 16 Nov 11 Posts: 19 Credit: 4,468,368 RAC: 0	Message 111959 - Posted: 16 Apr 2012, 3:39:42 UTC - in response to Message 111958. Last modified: 16 Apr 2012, 3:42:21 UTC I've seen a message elsewhere saying that OpenCL workunits tend to need much more CPU use than running similar workunits using CUDA. This implies that slow CPUs will slow down OpenCL workunits much more than they slow down CUDA workunits. ID: 111959 · Reply Quote

Infusioned Send message Joined: 11 Feb 05 Posts: 45 Credit: 149,000 RAC: 0	Message 111960 - Posted: 16 Apr 2012, 13:17:59 UTC - in response to Message 111959. Last modified: 16 Apr 2012, 13:19:36 UTC "This implies that slow CPUs will slow down OpenCL workunits much more than they slow down CUDA workunits." Understood. However, that's why I checked Anandtech's benchmarks to see just how much faster the 2600k was than my CPU. The benchmarks do not reflect a 66% performance difference, so there is something else going on. Also, unless I read the charts wrong, comparing the GFLOPS between the two video cards, theoretically the 6950 should smoke the 550Ti in SP output (2253 vs. 691.2). So, back to my original question: is the OpenCL app that unoptimized compared to the CUDA app? ID: 111960 · Reply Quote

Infusioned Send message Joined: 11 Feb 05 Posts: 45 Credit: 149,000 RAC: 0	Message 111961 - Posted: 20 Apr 2012, 14:26:02 UTC - in response to Message 111960. Here is a WU from Seti@Home Beta's OpenCL application: http://setiweb.ssl.berkeley.edu/beta/workunit.php?wuid=3973426 I am 58327 and someone with a GTX 590 GPU, Intel 2600k CPU, Cuda OpenCL client is 56759. My CPU seconds are 1463 and theirs are 2061. My GPU seconds are 3244 and theirs are 2198. My CPU time is actually lower (75% of the 2600k's), but my GPU time is ~150% of the GTX 590's (which again is curious, given the GFLOP numbers). My conclusion from all this, then, is that the Albert AMD OpenCL application isn't quite as optimized as the Albert CUDA application. Can anyone confirm/deny? ID: 111961 · Reply Quote
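The percentages in the post above can be re-derived from the raw seconds it quotes. This is just the arithmetic, with made-up names, and it says nothing about why the times differ.

```python
# CPU and GPU seconds quoted in the post above for the same SETI@home Beta WU.
MY_CPU_S, THEIR_CPU_S = 1463, 2061
MY_GPU_S, THEIR_GPU_S = 3244, 2198


def percent_of(mine, theirs):
    """Express `mine` as a percentage of `theirs`."""
    return 100.0 * mine / theirs


cpu_pct = percent_of(MY_CPU_S, THEIR_CPU_S)
gpu_pct = percent_of(MY_GPU_S, THEIR_GPU_S)

print(f"CPU time: {cpu_pct:.0f}% of the other host's")
print(f"GPU time: {gpu_pct:.0f}% of the other host's")
```

The results come out at roughly 71% for CPU time and 148% for GPU time, close to the rounded figures given in the post.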

Falconet Send message Joined: 20 Jan 12 Posts: 1 Credit: 0 RAC: 0	Message 111962 - Posted: 22 Apr 2012, 17:03:27 UTC Hmm, the OpenCL app uses a full CPU core to work. Is there any way to lower that usage? ID: 111962 · Reply Quote

Bikeman (Heinz-Bernd Eggenstein) Volunteer moderator Project administrator Project developer Send message Joined: 28 Aug 06 Posts: 1483 Credit: 1,864,017 RAC: 0	Message 111964 - Posted: 24 Apr 2012, 9:25:11 UTC - in response to Message 111962. "Hmm, the OpenCL app uses a full CPU core to work. Is there any way to lower that usage?" Hi! In terms of CPU usage, the OpenCL app should in theory be comparable to the NVIDIA/CUDA app, but we have seen huge differences in CPU usage with different driver versions from ATI. So the only advice I can give now is to try different drivers, sorry. Please let us know any results for your card (e.g. which driver worked better wrt CPU usage). From the previous message: "My conclusion from all this, then, is that the Albert AMD OpenCL application isn't quite as optimized as the Albert CUDA application. Can anyone confirm/deny?" It's fair to say that the CUDA app is more optimized for NVIDIA cards than the OpenCL app is for ATI cards, yes. There are several reasons for this:
* OpenCL is a multi-vendor platform while CUDA is NVIDIA only. If you write OpenCL code you want to keep the vendor-independence. It would be great if we could have just one code base; it remains to be seen whether this will be realistic without too much impact on performance on either platform.
* The OpenCL app for the pulsar search is a port of the CUDA app, which of course came out first, so it's not specifically tuned to the strengths of ATI cards... yet.
* The first priority is, needless to say, to get the app to a point where it runs on all our target platforms (OSX, Linux, Windows) and produces scientifically sound results that cross-validate with the CUDA and CPU apps. As has been mentioned elsewhere, the level of support (tools, libraries, bugfixing, drivers...) is certainly more mature for CUDA/NVIDIA than for OpenCL/ATI, so almost all our efforts currently have to be directed into "making it work at all" and less can be spent on "optimizing".
On the other hand, the ATI cards are, without question, fine pieces of hardware! So I'm quite optimistic that even the first OpenCL app that goes into production on E@H will have a decent performance/Watt ratio. Stay tuned and thanks for helping us test the thing here on Albert@Home! HBE ID: 111964 · Reply Quote

Christoph Send message Joined: 25 Aug 05 Posts: 48 Credit: 208,211 RAC: 0	Message 111969 - Posted: 25 Apr 2012, 12:28:56 UTC Last modified: 25 Apr 2012, 12:38:03 UTC I did only now realise that my card is not supported because you demand a min workgroup size of 256. I have 128. HD 5450. Can you lower that? Otherwise I will stop crunching with my GPU here for now. Raistmer is waiting for more results over at SETI Beta. Christoph ID: 111969 · Reply Quote

Oliver Behnke Volunteer moderator Project administrator Project developer Send message Joined: 4 Sep 07 Posts: 130 Credit: 8,545,955 RAC: 0	Message 111973 - Posted: 27 Apr 2012, 9:13:54 UTC - in response to Message 111969. "I did only now realise that my card is not supported because you demand a min workgroup size of 256." We don't, we just set a preferred value. If your GPU doesn't support it, the value is dynamically adjusted accordingly. Cheers, Oliver ID: 111973 · Reply Quote
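As an illustration of "preferred value, adjusted if the GPU doesn't support it", here is a minimal sketch. It is not the app's actual code: the function name and the simple clamp-to-maximum rule are assumptions. In OpenCL the device limit would typically come from a query such as CL_DEVICE_MAX_WORK_GROUP_SIZE; here it is just passed in as a number.

```python
PREFERRED_WORKGROUP_SIZE = 256  # preferred value mentioned in the thread


def effective_workgroup_size(device_max, preferred=PREFERRED_WORKGROUP_SIZE):
    """Use the preferred size when the device allows it, else the device maximum.

    Hypothetical helper; the real app may adjust the value differently.
    """
    if device_max < 1:
        raise ValueError("device_max must be a positive work-group size")
    return min(preferred, device_max)


# A device limited to 128 (the figure reported for the HD 5450 in this thread)
# falls back to 128; a device that supports 256 or more keeps the preferred value.
for device_max in (128, 256, 1024):
    print(device_max, "->", effective_workgroup_size(device_max))
```

Under this assumption a card with a maximum of 128 would still run, just with smaller work groups.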

Christoph Send message Joined: 25 Aug 05 Posts: 48 Credit: 208,211 RAC: 0	Message 111977 - Posted: 27 Apr 2012, 12:10:46 UTC - in response to Message 111973. "I did only now realise that my card is not supported because you demand a min workgroup size of 256." "We don't, we just set a preferred value. If your GPU doesn't support it, the value is dynamically adjusted accordingly. Cheers, Oliver" Ah, that sounds good, but please have a look at this result, because I don't see any indication that the work group size is adjusted. Another point is this: [04:23:10][4764][INFO ] Checkpoint file unavailable: status.cpt (No such file or directory). ------> Starting from scratch... This appears NOT only at the beginning of the WU, but also after nearly 2 hours of runtime. Is that the app or is it BOINC? http://albert.phys.uwm.edu/result.php?resultid=189038 Christoph ID: 111977 · Reply Quote

Bikeman (Heinz-Bernd Eggenstein) Volunteer moderator Project administrator Project developer Send message Joined: 28 Aug 06 Posts: 1483 Credit: 1,864,017 RAC: 0	Message 111978 - Posted: 27 Apr 2012, 13:36:53 UTC - in response to Message 111977. Hi! Each task that you download is actually a bundle of 8 independent sub-tasks. When one sub task is finished, processing of the next one begins, using its own checkpoints. So it is normal that there will be exactly 8 instances of the "starting from scratch" message in the logs per task. CU Heinz-Bernd ID: 111978 · Reply Quote
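To check a task log against the "8 sub-tasks per bundle" explanation above, one could simply count the "Starting from scratch" messages. The sketch below does that on a synthetic log: the message text follows the line quoted earlier in the thread, while the timestamps, the sample log, and the function names are made up for illustration. A task that was interrupted before its first checkpoint might well show extra occurrences, so treat a mismatch as a hint rather than proof of a problem.

```python
SUBTASKS_PER_TASK = 8  # bundle size stated in the post above
FRESH_START_MARKER = "Starting from scratch"


def count_fresh_starts(log_text):
    """Count log lines reporting that a sub-task began without a checkpoint."""
    return sum(1 for line in log_text.splitlines() if FRESH_START_MARKER in line)


def matches_expected_bundle(log_text, subtasks=SUBTASKS_PER_TASK):
    """True when the log shows exactly one fresh start per sub-task."""
    return count_fresh_starts(log_text) == subtasks


# Synthetic log: the message text follows the thread, the timestamps are made up.
sample_log = "\n".join(
    f"[0{hour}:00:00][4764][INFO ] Checkpoint file unavailable: status.cpt "
    "(No such file or directory). ------> Starting from scratch..."
    for hour in range(SUBTASKS_PER_TASK)
)

print(count_fresh_starts(sample_log), matches_expected_bundle(sample_log))
```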

Christoph Send message Joined: 25 Aug 05 Posts: 48 Credit: 208,211 RAC: 0	Message 111980 - Posted: 27 Apr 2012, 19:50:52 UTC - in response to Message 111978. Ah, these tiny bits of info... now I remember that I read about that somewhere. Thank you for reminding me. Christoph ID: 111980 · Reply Quote