[New release] BRP app v1.23/1.24 (OpenCL) feedback thread

[VENETO] boboviz Joined: 6 Oct 06 Posts: 7 Credit: 344,106 RAC: 0	Message 112007 - Posted: 2 May 2012, 15:08:55 UTC On my AMD HD 6850 I'm running two BOINC projects: Albert@Home and POEM@Home (3 GPU WUs on 1 CPU). When I download an Albert@Home GPU WU, the POEM WUs go into the "suspended" state and the Albert@Home WU doesn't start, i.e. there is no work on the GPU at all. If I restart the BOINC client, the POEM WUs remain suspended, but the Albert WU starts and runs OK... Is this normal?

Infusioned Joined: 11 Feb 05 Posts: 45 Credit: 149,000 RAC: 0	Message 112008 - Posted: 2 May 2012, 17:00:48 UTC - in response to Message 112007. Last modified: 2 May 2012, 17:04:19 UTC p2030.20110421.G41.29-00.40.S.b0s0g0.00000_1728_0: for some reason this WU is showing 0% GPU load and 25% CPU load. My initial reaction was that this must be an error; however, you can see that the GPU clock was down to 725 from 840. http://img140.imageshack.us/img140/883/b0s0g00000017280.jpg

Infusioned Joined: 11 Feb 05 Posts: 45 Credit: 149,000 RAC: 0	Message 112009 - Posted: 2 May 2012, 17:50:57 UTC - in response to Message 112008. p2030.20110421.G41.29-00.40.S.b0s0g0.00000_1504_1 using einsteinbinary_BRP4 version 123 (atiOpenCL) http://img15.imageshack.us/img15/3065/b0s0g00000015041.jpg

Bikeman (Heinz-Bernd Eggenstein) Volunteer moderator Project administrator Project developer Joined: 28 Aug 06 Posts: 1483 Credit: 1,864,017 RAC: 0	Message 112010 - Posted: 2 May 2012, 18:11:36 UTC - in response to Message 112007. boboviz wrote: "On my AMD HD 6850 I'm running two BOINC projects: Albert@Home and POEM@Home (3 GPU WUs on 1 CPU). When I download an Albert@Home GPU WU, the POEM WUs go into the 'suspended' state and the Albert@Home WU doesn't start [...] Is this normal?" Hmm... theoretically it is possible that the Albert task thought it didn't have enough memory and waited for some to become available, which happened after the restart... Still, this looks suspicious. Thanks for reporting. One question though: is this reproducible, e.g. after each new WU download from Albert? Cheers HBE
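Bikeman's theory is that the task may have waited for memory to become free. A minimal sketch of that kind of wait-and-retry logic follows; `query_free_mb` is a hypothetical callback standing in for whatever memory query the real app performs (the thread does not describe the BRP app's actual check), and the polling and timeout values are arbitrary.

```python
import time
from typing import Callable


def wait_for_memory(query_free_mb: Callable[[], int], required_mb: int,
                    poll_s: float = 1.0, timeout_s: float = 60.0,
                    sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until at least required_mb is free; give up after timeout_s.

    query_free_mb is hypothetical: it stands in for the app's real query.
    """
    waited = 0.0
    while True:
        if query_free_mb() >= required_mb:
            return True      # enough memory: start processing
        if waited >= timeout_s:
            return False     # still not enough: report instead of hanging
        sleep(poll_s)        # otherwise wait and check again
        waited += poll_s
```

Without the timeout branch, a task in this situation would simply sit idle until another process released memory, which would look much like the "doesn't start until a restart" symptom described above.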

Bikeman (Heinz-Bernd Eggenstein) Volunteer moderator Project administrator Project developer Joined: 28 Aug 06 Posts: 1483 Credit: 1,864,017 RAC: 0	Message 112011 - Posted: 2 May 2012, 18:18:57 UTC - in response to Message 112008. Last modified: 2 May 2012, 18:38:44 UTC Infusioned wrote: "p2030.20110421.G41.29-00.40.S.b0s0g0.00000_1728_0: for some reason this WU is showing 0% GPU load and 25% CPU load. [...] http://img140.imageshack.us/img140/883/b0s0g00000017280.jpg" Strange... I guess it's this one: http://albert.phys.uwm.edu/result.php?resultid=197941, which finished in about the same time as other tasks. Let's see if it validates. But I would expect a lower GPU temperature if the load had really been 0% for a longer time, so I actually suspect that the readout is wrong. The app does have phases (at the beginning of each of the 8 subtasks) with exclusively CPU load, but these last only a couple of seconds, not minutes. THX HBE
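Bikeman's argument separates short CPU-only phases (a few seconds at the start of each of the 8 subtasks) from a genuinely idle GPU. A small sketch of that distinction, assuming GPU-load samples taken at a fixed interval, for example from a monitoring tool's log; the 30-second threshold and the function name are illustrative choices, not anything from the BRP app.

```python
from typing import List, Tuple


def idle_periods(load_pct: List[float], interval_s: float,
                 min_duration_s: float = 30.0) -> List[Tuple[float, float]]:
    """Return (start_s, duration_s) for each run of 0% GPU load lasting at
    least min_duration_s. Shorter dips are treated as normal CPU-only phases."""
    periods = []
    run_start = None
    # A non-zero sentinel at the end closes any run that reaches the last sample.
    for i, load in enumerate(list(load_pct) + [1.0]):
        if load == 0.0:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            duration = (i - run_start) * interval_s
            if duration >= min_duration_s:
                periods.append((run_start * interval_s, duration))
            run_start = None
    return periods
```

With one-second samples, a 3-second dip between busy stretches is ignored, while a 40-second stretch of 0% is reported; only the latter would match a WU that is really "showing 0% GPU load" for minutes.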

Infusioned Joined: 11 Feb 05 Posts: 45 Credit: 149,000 RAC: 0	Message 112013 - Posted: 2 May 2012, 22:15:48 UTC - in response to Message 112011. p2030.20110421.G41.29-00.40.S.b0s0g0.00000_1920_1 using einsteinbinary_BRP4 version 123 (atiOpenCL) http://img96.imageshack.us/img96/6813/b0s0g00000019201.jpg

Infusioned Joined: 11 Feb 05 Posts: 45 Credit: 149,000 RAC: 0	Message 112014 - Posted: 2 May 2012, 23:11:55 UTC - in response to Message 112013. Last modified: 2 May 2012, 23:15:43 UTC Digging through some of the stderr outputs, I noticed that the atiOpenCL app is doing an awful lot of checkpointing. Curious to see whether the CUDA app was the same, I looked into one of my WUs: http://albert.phys.uwm.edu/workunit.php?wuid=68681

My (atiOpenCL) output (abbreviated):
[06:49:19][3424][INFO ] Starting data processing...
[06:49:19][3424][INFO ] Using OpenCL platform provided by: Advanced Micro Devices, Inc.
[06:49:19][3424][INFO ] Using OpenCL device "Cayman" by: Advanced Micro Devices, Inc.
[06:49:19][3424][INFO ] Checkpoint file unavailable: status.cpt (No such file or directory).
------> Starting from scratch...
[06:49:19][3424][INFO ] Header contents:
------> Original WAPP file: ./p2030.20110421.G41.29-00.40.S.b0s0g0.00000_DM126.40
...
[06:50:25][3424][INFO ] Checkpoint committed!
[06:51:30][3424][INFO ] Checkpoint committed!
[06:52:35][3424][INFO ] Checkpoint committed!
[06:53:41][3424][INFO ] Checkpoint committed!
[06:54:46][3424][INFO ] Checkpoint committed!
[06:55:52][3424][INFO ] Checkpoint committed!
[06:56:58][3424][INFO ] Checkpoint committed!
[06:58:03][3424][INFO ] Checkpoint committed!
[06:59:08][3424][INFO ] Checkpoint committed!
[07:00:15][3424][INFO ] Checkpoint committed!
[07:01:20][3424][INFO ] Checkpoint committed!
[07:02:25][3424][INFO ] Checkpoint committed!
[07:03:30][3424][INFO ] Checkpoint committed!
[07:04:36][3424][INFO ] Checkpoint committed!
[07:05:41][3424][INFO ] Checkpoint committed!
[07:06:47][3424][INFO ] Checkpoint committed!
[07:07:53][3424][INFO ] Checkpoint committed!
[07:08:58][3424][INFO ] Checkpoint committed!
[07:09:25][3424][INFO ] OpenCL shutdown complete!
[07:09:25][3424][INFO ] Data processing finished successfully!
...

And then it repeats the process for:
Original WAPP file: ./p2030.20110421.G41.29-00.40.S.b0s0g0.00000_DM126.50
Original WAPP file: ./p2030.20110421.G41.29-00.40.S.b0s0g0.00000_DM126.60
Original WAPP file: ./p2030.20110421.G41.29-00.40.S.b0s0g0.00000_DM126.70
Original WAPP file: ./p2030.20110421.G41.29-00.40.S.b0s0g0.00000_DM126.80
Original WAPP file: ./p2030.20110421.G41.29-00.40.S.b0s0g0.00000_DM126.90
Original WAPP file: ./p2030.20110421.G41.29-00.40.S.b0s0g0.00000_DM127.00
Original WAPP file: ./p2030.20110421.G41.29-00.40.S.b0s0g0.00000_DM127.10

Checkpointing each WAPP file once per minute, 20 times.

Compare this to the BRP3cuda32 app (abbreviated):
[12:27:01][5004][INFO ] Starting data processing...
[12:27:01][5004][INFO ] CUDA global memory status (initial GPU state, including context):
------> Used in total: 218 MB (807 MB free / 1025 MB total) -> Used by this application (assuming a single GPU task): 0 MB
[12:27:01][5004][INFO ] Using CUDA device #0 "GeForce GTX 560" (336 CUDA cores / 1105.44 GFLOPS)
[12:27:01][5004][INFO ] Version of installed CUDA driver: 4020
[12:27:01][5004][INFO ] Version of CUDA driver API used: 3020
[12:27:01][5004][INFO ] Checkpoint file unavailable: status.cpt (No such file or directory).
------> Starting from scratch...
[12:27:01][5004][INFO ] Header contents:
------> Original WAPP file: ./p2030.20110421.G41.29-00.40.S.b0s0g0.00000_DM126.40
...
[12:27:31][5004][INFO ] Checkpoint committed!
[12:28:01][5004][INFO ] Checkpoint committed!
[12:28:31][5004][INFO ] Checkpoint committed!
[12:29:01][5004][INFO ] Checkpoint committed!
[12:29:31][5004][INFO ] Checkpoint committed!
[12:30:02][5004][INFO ] Checkpoint committed!
[12:30:32][5004][INFO ] Checkpoint committed!
[12:31:01][5004][INFO ] Data processing finished successfully!
...

which then also repeats for:
Original WAPP file: ./p2030.20110421.G41.29-00.40.S.b0s0g0.00000_DM126.50
Original WAPP file: ./p2030.20110421.G41.29-00.40.S.b0s0g0.00000_DM126.60
Original WAPP file: ./p2030.20110421.G41.29-00.40.S.b0s0g0.00000_DM126.70
Original WAPP file: ./p2030.20110421.G41.29-00.40.S.b0s0g0.00000_DM126.80
Original WAPP file: ./p2030.20110421.G41.29-00.40.S.b0s0g0.00000_DM126.90
Original WAPP file: ./p2030.20110421.G41.29-00.40.S.b0s0g0.00000_DM127.00
Original WAPP file: ./p2030.20110421.G41.29-00.40.S.b0s0g0.00000_DM127.10

Checkpointing each WAPP file once per minute, 5 times.

So, my questions are:
* What is checkpointing? Saving an intermediate state (variables) so that, if the calculations get interrupted, you don't have to start over?
* Is the atiOpenCL app checkpointing more? Or are the two apps doing the same amount of work (calculations), and the CUDA app/GTX 560 is simply doing more work per unit of time and therefore only needs to checkpoint 5 times vs. my 20?
* Is the GTX 560/CUDA app really 4x (20/5 = 4) faster than the HD 6950/atiOpenCL? The 6950 shows 2253 SP GFLOPS vs. 1088.6 SP GFLOPS for the GTX 560.
http://en.wikipedia.org/wiki/Comparison_of_AMD_graphics_processing_units
http://en.wikipedia.org/wiki/Comparison_of_Nvidia_graphics_processing_units

To semi-answer that: GPU time indicates a 2.503x advantage for the GTX 560/CUDA over the HD 6950/atiOpenCL. The CPU time for the CUDA app, however, is 4.24x less than that of the OpenCL app. Anandtech Bench shows the 2500K to be slightly better than my AMD 975BE in single-threaded, multi-threaded, and total MIPS (7-Zip test), but nothing earth-shattering. http://www.anandtech.com/bench/Product/288?vs=435

I know you said before that the OpenCL app uses far more CPU than the CUDA app. Perhaps the OpenCL standard is still immature, AMD has crappy drivers, or a mix of both? Regardless, I really commend everyone's efforts. Having done a fair bit of coding myself, I know what a pain this can all be.
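Checkpoint counts and spacing can be read straight off stderr excerpts like the ones above. A minimal sketch, assuming the `[HH:MM:SS][id][LEVEL] message` line format shown in those excerpts (the second bracketed field, presumably a process ID, is ignored), that reports the number of checkpoints, their mean spacing, and the elapsed time for one WAPP file:

```python
import re
from datetime import datetime
from statistics import mean

# Matches lines such as "[06:50:25][3424][INFO ] Checkpoint committed!"
LINE = re.compile(r"^\[(\d{2}:\d{2}:\d{2})\]\[\d+\]\[\w+\s*\] (.*)$")


def checkpoint_stats(log_text: str):
    """Return (n_checkpoints, mean_interval_s, elapsed_s) for one excerpt.

    Assumes the excerpt does not cross midnight (times carry no date).
    """
    times, checkpoints = [], []
    for line in log_text.splitlines():
        m = LINE.match(line.strip())
        if not m:
            continue  # skip continuation lines such as "------> ..." and "..."
        t = datetime.strptime(m.group(1), "%H:%M:%S")
        times.append(t)
        if m.group(2).startswith("Checkpoint committed"):
            checkpoints.append(t)
    gaps = [(b - a).total_seconds() for a, b in zip(checkpoints, checkpoints[1:])]
    elapsed = (times[-1] - times[0]).total_seconds() if times else 0.0
    return len(checkpoints), (mean(gaps) if gaps else 0.0), elapsed
```

Fed the CUDA excerpt for DM126.40, it counts 7 checkpoints about 30 s apart over 240 s; fed the atiOpenCL excerpt, it counts 18 checkpoints about 65 s apart over 1206 s.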

Infusioned Joined: 11 Feb 05 Posts: 45 Credit: 149,000 RAC: 0	Message 112015 - Posted: 2 May 2012, 23:27:37 UTC - in response to Message 112014. Last modified: 2 May 2012, 23:29:40 UTC p2030.20110421.G41.29-00.40.S.b0s0g0.00000_1928_0 using einsteinbinary_BRP4 version 123 (atiOpenCL): this one seems to have some weird GPU load spottiness at around the 20% completion mark, but seems to have steadied out at 23% load. http://img210.imageshack.us/img210/4024/b0s0g00000019280.jpg Edit: I take that back; I noticed spottiness again, so I ran the latest 3 versions of GPU-Z side by side to see whether there was a bug in one of them. There doesn't appear to be, as they all report the same load %. http://img196.imageshack.us/img196/7073/gpuzcomparison.jpg

Infusioned Joined: 11 Feb 05 Posts: 45 Credit: 149,000 RAC: 0	Message 112016 - Posted: 3 May 2012, 2:08:36 UTC - in response to Message 112015. p2030.20110421.G41.29-00.40.S.b0s0g0.00000_2504_0 using einsteinbinary_BRP4 version 123 (atiOpenCL) http://img140.imageshack.us/img140/4502/b0s0g00000025040.jpg

Christoph Joined: 25 Aug 05 Posts: 48 Credit: 208,211 RAC: 0	Message 112017 - Posted: 3 May 2012, 8:12:36 UTC For me the new app takes a full CPU core when it is running. Is that intentional? Christoph

[VENETO] boboviz Joined: 6 Oct 06 Posts: 7 Credit: 344,106 RAC: 0	Message 112018 - Posted: 3 May 2012, 12:33:26 UTC - in response to Message 112010. Bikeman wrote: "One question though: is this reproducible, e.g. after each new WU download from Albert? Cheers HBE" My PC has 8 GB DDR3 on Win7 64-bit; is that enough? If I keep downloading and running A@H WUs, they take precedence over POEM. After the last A@H WU, POEM restarts correctly, and if I download another A@H WU the situation occurs again... :-( I forgot to mention: during the no-GPU-use state, 1 CPU core is in use (as if A@H were running).

Bikeman (Heinz-Bernd Eggenstein) Volunteer moderator Project administrator Project developer Joined: 28 Aug 06 Posts: 1483 Credit: 1,864,017 RAC: 0	Message 112020 - Posted: 3 May 2012, 16:24:11 UTC - in response to Message 112017. Christoph wrote: "For me the new app takes a full CPU core when it is running. Is that intentional?" Hi! This depends on the driver version you are using; e.g. you can see from the screenshots posted here that this is not generally the case for this app. Cheers HB

Christoph Joined: 25 Aug 05 Posts: 48 Credit: 208,211 RAC: 0	Message 112021 - Posted: 3 May 2012, 20:25:57 UTC - in response to Message 112020. Last modified: 3 May 2012, 20:27:59 UTC Bikeman wrote: "Hi! This depends on the driver version you are using; e.g. you can see from the screenshots posted here that this is not generally the case for this app." I was not precise enough in my statement. BOINC reserves a full core ('1 CPUs + 1 ATI GPU'), and as I understand it, that should not be the case. Here is a log snippet: 03.05.2012 22:25:33 | Albert@Home | [rr_sim_detail] 339385.57: starting p2030.20110421.G41.06+00.53.N.b6s0g0.00000_1448_2 (1.00 CPU + 1.00 ATI) Oh, and the driver version is 12.3; as far as I read, that high CPU usage is solved there. Christoph
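The reservation Christoph is asking about appears in the trailing parenthesis of his log line. A small sketch that extracts those figures, assuming the `(x CPU + y GPUTYPE)` form seen in this one snippet; other BOINC client versions or GPU types may log differently, and the regex is only fitted to this example.

```python
import re
from typing import Optional, Tuple

# Fitted to the tail of the snippet above: "(1.00 CPU + 1.00 ATI)"
RESERVATION = re.compile(r"\((\d+(?:\.\d+)?) CPU \+ (\d+(?:\.\d+)?) (\w+)\)\s*$")


def parse_reservation(line: str) -> Optional[Tuple[float, float, str]]:
    """Return (cpu_fraction, gpu_fraction, gpu_type), or None if absent."""
    m = RESERVATION.search(line)
    if m is None:
        return None
    return float(m.group(1)), float(m.group(2)), m.group(3)
```

For the snippet above this yields `(1.0, 1.0, 'ATI')`, i.e. one full core reserved alongside the GPU.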

Bikeman (Heinz-Bernd Eggenstein) Volunteer moderator Project administrator Project developer Joined: 28 Aug 06 Posts: 1483 Credit: 1,864,017 RAC: 0	Message 112022 - Posted: 3 May 2012, 20:27:55 UTC - in response to Message 112014.
Infusioned wrote: "So, my questions are: * What is checkpointing? Saving an intermediate state (variables) so that, if the calculations get interrupted, you don't have to start over?"
Exactly.
"* Is the atiOpenCL app checkpointing more? Or are the two apps doing the same amount of work (calculations), and the CUDA app/GTX 560 is simply doing more work per unit of time and therefore only needs to checkpoint 5 times vs. my 20?"
By default, all apps checkpoint every 60 seconds. All workunits, whether they get picked up by a CPU, an NVIDIA GPU or an ATI GPU, do the same amount of work, but the faster the processing, the fewer checkpoints happen during the execution time.
"* Is the GTX 560/CUDA app really 4x (20/5 = 4) faster than the HD 6950/atiOpenCL? The 6950 shows 2253 SP GFLOPS vs. 1088.6 SP GFLOPS for the GTX 560. http://en.wikipedia.org/wiki/Comparison_of_AMD_graphics_processing_units http://en.wikipedia.org/wiki/Comparison_of_Nvidia_graphics_processing_units"
Well, the FLOPS numbers are just theoretical peak performance and not very meaningful. But anyway, it's fair to say that our CUDA app and the libraries we used with it are more mature and better optimized than our OpenCL app. We have some ideas in the pipeline for further improving the OpenCL app, and hopefully we can implement them in a timeframe of weeks rather than months; stay tuned.
"To semi-answer that, [...] I know you said before that the OpenCL app uses far more CPU than the CUDA app. Perhaps the OpenCL standard is still immature, AMD has crappy drivers, or a mix of both? Regardless, I really commend everyone's efforts. Having done a fair bit of coding myself, I know what a pain this can all be."
Indeed :-). I don't think one should blame the OpenCL standard; it's not fundamentally different from CUDA anyway. It's the implementations of the standard and the drivers that are causing a few troubles. Neither AMD nor NVIDIA seems too enthusiastic about OpenCL anymore, I'm afraid. CU HB
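Bikeman's answer (checkpoint on a fixed time interval, same amount of work on every device) means the checkpoint count follows runtime rather than work done. A toy simulation of such a time-based checkpoint policy, assuming a hypothetical workload of 1200 equal steps and the 60-second default he mentions; it illustrates the principle and is not the BRP app's code.

```python
def count_checkpoints(total_steps: int, seconds_per_step: float,
                      interval_s: float = 60.0) -> int:
    """Simulate a worker that saves its state whenever at least interval_s
    of (simulated) wall-clock time has passed since the last checkpoint."""
    clock = 0.0
    last_checkpoint = 0.0
    checkpoints = 0
    for _ in range(total_steps):
        clock += seconds_per_step          # do one step of work
        if clock - last_checkpoint >= interval_s:
            checkpoints += 1               # save intermediate state
            last_checkpoint = clock
    return checkpoints


# Same 1200 steps of work on a slow and a (hypothetically) 4x faster device:
slow = count_checkpoints(1200, 1.0)    # 1200 s of runtime
fast = count_checkpoints(1200, 0.25)   # 300 s of runtime
```

Here `slow` is 20 and `fast` is 5: the work is identical and only the runtime differs, so the ratio of checkpoint counts is just the ratio of runtimes.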

Bikeman (Heinz-Bernd Eggenstein) Volunteer moderator Project administrator Project developer Joined: 28 Aug 06 Posts: 1483 Credit: 1,864,017 RAC: 0	Message 112023 - Posted: 3 May 2012, 20:34:00 UTC - in response to Message 112021. Christoph wrote: "I was not precise enough in my statement. BOINC reserves a full core ('1 CPUs + 1 ATI GPU'), and as I understand it, that should not be the case. [...] (1.00 CPU + 1.00 ATI) Oh, and the driver version is 12.3; as far as I read, that high CPU usage is solved there." Yup, the allocation of a full core by BOINC is configured on the server side. This is to prevent CPUs from getting overcommitted for users with older drivers, where a full CPU is indeed taken by the driver... it's a conservative choice. We will look into how to handle this when we go live with the app; e.g. we could make the CPU allocation dependent on the driver version, as we once did for NVIDIA, where a similar driver problem existed under Linux, iirc. Cheers HB
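Making the CPU allocation depend on the driver version could look roughly like the lookup below. This is a hypothetical sketch, not the project's scheduler configuration: the function name, the (major, minor) version tuples, the 12.3 cut-off (taken from Christoph's reading) and the 0.2 reduced fraction are all illustrative assumptions.

```python
from typing import Tuple


def cpu_fraction_for_ati(driver: Tuple[int, int],
                         fixed_since: Tuple[int, int] = (12, 3),
                         reduced: float = 0.2) -> float:
    """Return the CPU fraction to reserve per ATI GPU task.

    Older drivers are assumed to keep a whole core busy, so they get the
    conservative 1.0; newer ones get a reduced reservation. All thresholds
    here are hypothetical.
    """
    # Tuples compare element by element: (12, 10) > (12, 3), as intended.
    return reduced if driver >= fixed_since else 1.0
```

Comparing (major, minor) tuples rather than floats matters here: as a float, 12.10 would equal 12.1 and wrongly sort before 12.3.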

Christoph Joined: 25 Aug 05 Posts: 48 Credit: 208,211 RAC: 0	Message 112024 - Posted: 3 May 2012, 20:44:22 UTC - in response to Message 112023. Ah, OK. Looks like I missed that detail somehow. So I can stop scratching my head. Christoph

Christoph Joined: 25 Aug 05 Posts: 48 Credit: 208,211 RAC: 0	Message 112026 - Posted: 3 May 2012, 21:28:50 UTC I got one computation error. It probably has something to do with my reboot yesterday. I also had one with WCG; that was the reason for the reboot. http://albert.phys.uwm.edu/result.php?resultid=201372 Christoph

Bikeman (Heinz-Bernd Eggenstein) Volunteer moderator Project administrator Project developer Joined: 28 Aug 06 Posts: 1483 Credit: 1,864,017 RAC: 0	Message 112027 - Posted: 3 May 2012, 21:38:40 UTC - in response to Message 112026. Christoph wrote: "I got one computation error. [...] http://albert.phys.uwm.edu/result.php?resultid=201372" Hi! Thanks very much for reporting this. It might actually point to a real problem in the code that affects only some cards, namely those with certain restrictions that the app has to take into account when generating work for the GPU. Stay tuned, I hope I can install a fix tomorrow. HB
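The thread doesn't say which card restriction is involved. One common restriction of this kind in OpenCL is the device's maximum work-group size (queried via `CL_DEVICE_MAX_WORK_GROUP_SIZE`); in OpenCL 1.x the global work size must also be an exact multiple of the local size. A sketch of adapting a 1-D launch to such a limit, with a hypothetical function name and preferred size, and no claim that this is the bug Bikeman had in mind:

```python
from typing import Tuple


def launch_sizes(n_items: int, preferred_local: int,
                 device_max_local: int) -> Tuple[int, int]:
    """Pick (global_size, local_size) for a 1-D kernel launch.

    The local size is clamped to the device limit, and the global size is
    rounded up to a multiple of it; the kernel must then ignore the padded
    work-items whose index is >= n_items.
    """
    local = max(1, min(preferred_local, device_max_local))
    global_size = ((n_items + local - 1) // local) * local
    return global_size, local
```

A card that reports a limit of 64 would get `(1024, 64)` for 1000 items instead of an invalid launch with 256 work-items per group.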

Infusioned Joined: 11 Feb 05 Posts: 45 Credit: 149,000 RAC: 0	Message 112030 - Posted: 4 May 2012, 1:29:39 UTC - in response to Message 112027. Two more errors with:
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
http://albert.phys.uwm.edu/result.php?resultid=199760
http://albert.phys.uwm.edu/result.php?resultid=199762
I just read through the 7.0.27 change log, and there is some stuff in it about addressing this error. I've installed 7.0.27; I'll see if it helps.

astro-marwil Joined: 28 May 05 Posts: 47 Credit: 1,633 RAC: 0	Message 112031 - Posted: 4 May 2012, 7:15:55 UTC - in response to Message 111974. I've been new here since yesterday afternoon. The change from BOINC 7.0.25 to 7.0.26 was straightforward, except that it took astonishingly long (some minutes?) until my existing tasks were running again. Setting up AaH in my BOINC was more complicated, as AaH is not included in the list of projects in BOINC Manager/Tools/Add a project or project manager. I helped myself by clicking on EaH and replacing "Einstein" with "Albert" in the URL. It took a while to find this way. Why isn't AaH included in the list of projects? In the AaH preferences I set the GPU utilization factor to 0.5, with BRP4 checked and S6LV1 unchecked. I was somewhat astonished to find in the task log AaH running with (1 CPU + 0.5 NVIDIA GPU) and 1 CPU waiting to run S6LV1, whereas BRP4 tasks from EaH run with (0.2 CPU + 0.5 NVIDIA GPU) and all 4 CPUs crunching S6LV1 tasks. The CPU load dropped to about 90%, whereas before I always had 100% load. While AaH was running, the desktop was very sluggish; most of the time I had to wait some seconds before any action could be performed. This was also the case during the phases when the AaH task was waiting. The desktop was no longer sluggish once the AaH project was suspended. This is a very uncomfortable way of operating. It kept running for quite a while, but about 15 minutes before the AaH task finished, I found that within the last (almost exactly) 20-minute interval, 3 of the running BRP4 tasks from EaH had been marked "Error while computing", all with exit code 1002. The AaH task itself ended fine. This morning it was validated by an ATI card running under Linux on an Intel CPU. The running time is about a factor of 3 longer, whereas the CPU time is comparable (AaH/EaH). Because of the various reported malfunctions I suspended AaH.
It's nice that the task was validated, especially as the counterpart was of a very different type. It shows you are on the right track, and when the next version is available, I will try again. Kind regards Martin