[New release] BRP app v1.23/1.24 (OpenCL) feedback thread

Profile Bikeman (Heinz-Bernd Eggenstein)
Volunteer moderator
Project administrator
Project developer
Joined: 28 Aug 06
Posts: 1447
Credit: 1,758,241
RAC: 2,113
Message 111974 - Posted: 27 Apr 2012 | 9:52:51 UTC

Hi,

We just released BRP4 v1.23 for ATI OpenCL under Linux and Windows which adds a number of improvements.

Notes:
* Now handles work units compatible with those on Einstein@Home (previously workunits on Albert were tweaked to work around a limitation in the OpenCL code)
* OpenCL GPU memory usage reduced
* modest performance improvement
* minor bug fixes
* better selection of work group size for kernels

* Known issue: no OpenCL support for Mac OS X for the time being (we're still looking into a potential Apple bug)

* Please use the latest Catalyst driver (>=12.1) and BOINC client (>=7.0.26). Note that this BOINC version is still a development version (but fixes some OpenGL related problems), it can be downloaded from here:

http://boinc.berkeley.edu/dl/

Without updating to this BOINC version, you will not be able to get OpenCL work on Albert!

Let's try and collect your feedback to this specific release (and this one only) in this thread.


Thanks,
Heinz-Bernd
____________

TRuEQ & TuVaLu
Joined: 11 Sep 06
Posts: 74
Credit: 119,220
RAC: 291
Message 111979 - Posted: 27 Apr 2012 | 16:09:20 UTC

I posted an answer in another thread that might also belong here.

http://albert.phys.uwm.edu/forum_thread.php?id=8883&nowrap=true#111976

Infusioned
Joined: 11 Feb 05
Posts: 45
Credit: 149,000
RAC: 0
Message 111985 - Posted: 28 Apr 2012 | 22:40:43 UTC - in response to Message 111979.

GPU load is steady at 20-21%, and CPU load literally bounces: 5%,15%,6%,14%,4%,17%, etc. with 17 being the highest I've seen.
____________

Profile Bikeman (Heinz-Bernd Eggenstein)
Volunteer moderator
Project administrator
Project developer
Joined: 28 Aug 06
Posts: 1447
Credit: 1,758,241
RAC: 2,113
Message 111986 - Posted: 29 Apr 2012 | 7:11:19 UTC - in response to Message 111985.

Thanks for the feedback.

We would also be interested to hear about graphics RAM usage, especially when crunching workunits that were generated beginning from 28th of May.

Cheers
HBE
____________

Infusioned
Joined: 11 Feb 05
Posts: 45
Credit: 149,000
RAC: 0
Message 111987 - Posted: 29 Apr 2012 | 16:42:36 UTC - in response to Message 111986.
Last modified: 29 Apr 2012 | 16:47:22 UTC


Thanks for the feedback.

We would also be interested to hear about graphics RAM usage, especially when crunching workunits that were generated beginning from 28th of May.

Cheers
HBE


March? April? Or May of last year?



wu 4/29/2012 9:29:59 AM | Albert@Home | Starting task p2030.20110421.G41.06+00.53.N.b6s0g0.00000_3728_0 using einsteinbinary_BRP4 version 123 (atiOpenCL) in slot 0




____________

Infusioned
Joined: 11 Feb 05
Posts: 45
Credit: 149,000
RAC: 0
Message 111988 - Posted: 29 Apr 2012 | 16:46:21 UTC - in response to Message 111987.

Also, is there a way to make them thumbnails in my post and when you click them they link to larger images (just to not annoy people with really large images)?
____________

steffen_moeller
Joined: 9 Feb 05
Posts: 13
Credit: 395,435
RAC: 0
Message 111989 - Posted: 29 Apr 2012 | 17:38:25 UTC - in response to Message 111986.

HD 5670, 1GB RAM, Windows 7 Home, Catalyst version 12.4
uses 521 MB, 50 MB dynamic, load 90%, temperature 71.5 deg. Celsius
http://albert.phys.uwm.edu/result.php?resultid=198490

Profile Bikeman (Heinz-Bernd Eggenstein)
Volunteer moderator
Project administrator
Project developer
Joined: 28 Aug 06
Posts: 1447
Credit: 1,758,241
RAC: 2,113
Message 111990 - Posted: 29 Apr 2012 | 18:14:22 UTC - in response to Message 111989.

Hi

Thanks again for the feedback.

Oops...I meant workunits generated on 28th of April, not May :-)

Cheers
HB
____________

Profile Bikeman (Heinz-Bernd Eggenstein)
Volunteer moderator
Project administrator
Project developer
Joined: 28 Aug 06
Posts: 1447
Credit: 1,758,241
RAC: 2,113
Message 111991 - Posted: 29 Apr 2012 | 18:48:18 UTC - in response to Message 111990.

One more thing: while the workunit mentioned above was sent only recently, it was already generated on the 23rd of April, so it is still one of the "tweaked" workunits. Once the newly generated workunits are sent out, we should see reduced memory usage and a modest performance increase.

Cheers
HB
____________

steffen_moeller
Joined: 9 Feb 05
Posts: 13
Credit: 395,435
RAC: 0
Message 111992 - Posted: 29 Apr 2012 | 19:14:27 UTC - in response to Message 111991.

One more thing: while the workunit mentioned above was sent only recently, it was already generated on the 23rd of April, so it is still one of the "tweaked" workunits. Once the newly generated workunits are sent out, we should see reduced memory usage and a modest performance increase.


Does this mean we are 5 community-days late with processing? If so, I suggest simply not sending out anything that does not bring additional insights. Hm, thinking again, you have certainly done that and I was just too quick when I read the announcement. Ah, wait, you also expect the tweaking itself to affect performance, so you need to have the same new app run on both tweaked and regular workunits?!?

Steffen

____________

Infusioned
Joined: 11 Feb 05
Posts: 45
Credit: 149,000
RAC: 0
Message 111993 - Posted: 29 Apr 2012 | 22:11:58 UTC - in response to Message 111991.
Last modified: 29 Apr 2012 | 22:12:29 UTC

One more thing: while the workunit mentioned above was sent only recently, it was already generated on the 23rd of April, so it is still one of the "tweaked" workunits. Once the newly generated workunits are sent out, we should see reduced memory usage and a modest performance increase.

Cheers
HB


I was afraid of that. However, I didn't know how to decipher what date p2030.20110421.G41.06+00.53.N.b6s0g0.00000_3728_0 ... Never mind, I just realized that 20110421 means April 21, 2011.
____________

Profile Bikeman (Heinz-Bernd Eggenstein)
Volunteer moderator
Project administrator
Project developer
Joined: 28 Aug 06
Posts: 1447
Credit: 1,758,241
RAC: 2,113
Message 111994 - Posted: 30 Apr 2012 | 9:49:23 UTC - in response to Message 111993.

Hi

It's OK that the new app version is first crunching through some of the old workunits, to make sure we didn't break anything or significantly degrade performance even for the code paths that are used only with those old workunits. The support for the old, "tweaked" workunits will stay in the code in case we need it again later.

The timestamps of 2011 that you might see in the logs or workunit file names refer to the time when the raw data for the workunit was recorded at the radio telescope. This is not crucial for the question we are discussing here.
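
For anyone who wants to pull that recording date out of a task name, here is a minimal sketch. It assumes the layout `p2030.<YYYYMMDD>.<rest>`, which is inferred only from the names quoted in this thread; the function name `observation_date` is made up for illustration and is not part of BOINC or the BRP app.

```python
import re
from datetime import date

# Assumed layout, inferred only from task names quoted in this thread:
#   p<digits>.<YYYYMMDD>.<rest of name>
TASK_NAME_RE = re.compile(r"^p\d+\.(\d{4})(\d{2})(\d{2})\.")


def observation_date(task_name):
    """Return the date embedded in a task name, or None if the name does not match."""
    match = TASK_NAME_RE.match(task_name)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    return date(year, month, day)


# Task name taken from the log line quoted earlier in the thread.
print(observation_date("p2030.20110421.G41.06+00.53.N.b6s0g0.00000_3728_0"))
# prints 2011-04-21
```

Note that this only recovers the date the raw data was recorded. The workunit creation date is not encoded in the name and has to be looked up on the workunit page.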

Cheers
HB
____________

Profile Bikeman (Heinz-Bernd Eggenstein)
Volunteer moderator
Project administrator
Project developer
Joined: 28 Aug 06
Posts: 1447
Credit: 1,758,241
RAC: 2,113
Message 111996 - Posted: 1 May 2012 | 0:03:26 UTC - in response to Message 111994.
Last modified: 1 May 2012 | 0:04:58 UTC

Hi!

I've seen the first "new" workunits being completed now, e.g. this one:

http://albert.phys.uwm.edu/workunit.php?wuid=68037

This should give a rough idea what to expect:

good:
* this one validated against a CUDA task
* compared to older OpenCL tasks on the same host, the new app with the new workunits seems to show a 10-20% performance increase.

still needs improvement:
* CPU usage seems to be higher than for the CUDA app. I'm not sure how much of this is caused by the driver rather than the app itself
* overall performance is in the right ballpark as compared to the CUDA app, but there should be a bit more room for improvement.


Still, I think if this trend is confirmed by more results and validation is successful, we have a release candidate for Einstein@Home. We will have to upgrade the server side BOINC software to a version that supports OpenCL (as here on Albert@Home), tho.

So with your continued help as beta testers for the OpenCL app here, we are now closing in on going into production with the ATI app.


Cheers
HB
____________

Infusioned
Joined: 11 Feb 05
Posts: 45
Credit: 149,000
RAC: 0
Message 111997 - Posted: 1 May 2012 | 0:47:43 UTC - in response to Message 111996.
Last modified: 1 May 2012 | 0:51:02 UTC

I'm not sure why, but I've thrown 3 errors recently:

http://albert.phys.uwm.edu/workunit.php?wuid=67888
http://albert.phys.uwm.edu/workunit.php?wuid=66586
http://albert.phys.uwm.edu/workunit.php?wuid=66147


edit:

Upon examination, all the wu's that errored start with this:

<core_client_version>7.0.26</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
____________

Profile Bikeman (Heinz-Bernd Eggenstein)
Volunteer moderator
Project administrator
Project developer
Joined: 28 Aug 06
Posts: 1447
Credit: 1,758,241
RAC: 2,113
Message 111999 - Posted: 1 May 2012 | 13:49:32 UTC - in response to Message 111997.

Thanks for the feedback. I think we have seen this particular error with other apps as well, and it might even be a general BOINC issue...definitely needs some investigation.

I see your host has now a mix of old and new WUs and I understand that the speedup is about 20%. If you can provide any numbers for the Video RAM usage, that would be cool.

Cheers
HB
____________

terencewee*
Joined: 2 Feb 12
Posts: 5
Credit: 4,500
RAC: 0
Message 112001 - Posted: 1 May 2012 | 21:46:55 UTC

Using this host.

It's a mobile workstation, i7-820qm, FirePro Mobility 7820 (Juniper-based).
Driver Package: 8.911.3.3-120309a-136336C
Catalyst version: 11.11

I was running POEM++ OpenCL x3 WU on it.
Pause all running WU.
Exit BOINC.

Re-launch BOINC.
Select Albert WU.
Resume

@ ~0.018%, the screen starts to show multi-colored square dots
But it continues running.
Pause Albert WU.
Move mouse/window, dots disappear.
Resume Albert WU.
Driver restarts/recover @ ~0.320%.

Pause Albert WU.
Exit BOINC

Restart machine.

Login, launch BOINC.
Resume Albert WU.

No dots, continues running to completion (I hope).

Hope this can be rectified before release.

clinfo dump:
Number of platforms: 1
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 1.1 AMD-APP (831.4)
Platform Name: AMD Accelerated Parallel Processing
Platform Vendor: Advanced Micro Devices, Inc.
Platform Extensions: cl_khr_icd cl_amd_event_callback cl_amd_offline_devices cl_khr_d3d10_sharing
Platform Name: AMD Accelerated Parallel Processing
Number of devices: 2
Device Type: CL_DEVICE_TYPE_GPU
Device ID: 4098
Board name: ATI FirePro M7820
Max compute units: 10
Max work items dimensions: 3
Max work items[0]: 256
Max work items[1]: 256
Max work items[2]: 256
Max work group size: 256
Preferred vector width char: 16
Preferred vector width short: 8
Preferred vector width int: 4
Preferred vector width long: 2
Preferred vector width float: 4
Preferred vector width double: 0
Native vector width char: 16
Native vector width short: 8
Native vector width int: 4
Native vector width long: 2
Native vector width float: 4
Native vector width double: 0
Max clock frequency: 700Mhz
Address bits: 32
Max memory allocation: 536870912
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 8
Max image 2D width: 8192
Max image 2D height: 8192
Max image 3D width: 2048
Max image 3D height: 2048
Max image 3D depth: 2048
Max samplers within kernel: 16
Max size of kernel argument: 1024
Alignment (bits) of base address: 2048
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: No
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: Yes
Round to +ve and infinity: Yes
IEEE754-2008 fused multiply-add: Yes
Cache type: None
Cache line size: 0
Cache size: 0
Global memory size: 1073741824
Constant buffer size: 65536
Max number of constant args: 8
Local memory type: Scratchpad
Local memory size: 32768
Kernel Preferred work group size multiple: 64
Error correction support: 0
Unified memory for Host and Device: 0
Profiling timer resolution: 1
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: No
Queue properties:
Out-of-Order: No
Profiling : Yes
Platform ID: 000007FEF1FBC9C8
Name: Juniper
Vendor: Advanced Micro Devices, Inc.
Device OpenCL C version: OpenCL C 1.1
Driver version: CAL 1.4.1607 (VM)
Profile: FULL_PROFILE
Version: OpenCL 1.1 AMD-APP (831.4)
Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_popcnt cl_khr_d3d10_sharing
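
As an aside on the "better selection of work group size" item in the release notes: two values in a dump like the one above are the usual inputs for that choice, `Max work group size` (256 here) and `Kernel Preferred work group size multiple` (64 here). The sketch below is not the BRP4 app's actual logic, which this thread does not show. It only illustrates one common heuristic, and the function names and the 1000-item workload are hypothetical.

```python
def pick_local_size(max_work_group_size, preferred_multiple):
    """Largest multiple of preferred_multiple that still fits the device limit.

    Heuristic for illustration only; real kernels may also be limited by
    register or local-memory usage, which is ignored here.
    """
    if max_work_group_size <= 0 or preferred_multiple <= 0:
        raise ValueError("sizes must be positive")
    if preferred_multiple > max_work_group_size:
        return max_work_group_size
    return (max_work_group_size // preferred_multiple) * preferred_multiple


def padded_global_size(work_items, local_size):
    """Round work_items up to a multiple of local_size.

    OpenCL 1.x requires the global size to be evenly divisible by the
    local size, so the kernel has to skip the padding items itself.
    """
    return ((work_items + local_size - 1) // local_size) * local_size


# Values from the clinfo dump above: max work group size 256, preferred multiple 64.
local = pick_local_size(256, 64)
print(local)                            # prints 256
print(padded_global_size(1000, local))  # prints 1024 (1000 is a made-up workload)
```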



____________
--
terencewee*
Sicituradastra.

Infusioned
Joined: 11 Feb 05
Posts: 45
Credit: 149,000
RAC: 0
Message 112002 - Posted: 2 May 2012 | 0:19:05 UTC - in response to Message 111999.
Last modified: 2 May 2012 | 0:21:40 UTC

I see your host has now a mix of old and new WUs



? I poked through my history and all my wu's have 20110421 in them. I started aborting batches to try and get some new ones, but no dice so far. Unless I am mistaken, the 20110421 is the datestamp for when the data was recorded? Or is that the datestamp from when it was split?

I have the day off tomorrow so I will abort/babysit Boinc to try and get some newer ones.
____________

Infusioned
Joined: 11 Feb 05
Posts: 45
Credit: 149,000
RAC: 0
Message 112003 - Posted: 2 May 2012 | 2:32:20 UTC - in response to Message 112002.
Last modified: 2 May 2012 | 2:35:20 UTC

p2030.20110421.G41.29-00.40.S.b0s0g0.00000_744_0 using einsteinbinary_BRP4 version 123 (atiOpenCL)


GPU-Z & Task Manager:
http://img7.imageshack.us/img7/7159/p203020110421g41290040s.jpg
____________

Profile Bikeman (Heinz-Bernd Eggenstein)
Volunteer moderator
Project administrator
Project developer
Joined: 28 Aug 06
Posts: 1447
Credit: 1,758,241
RAC: 2,113
Message 112004 - Posted: 2 May 2012 | 11:18:20 UTC - in response to Message 112002.

I see your host has now a mix of old and new WUs



? I poked through my history and all my wu's have 20110421 in them.


This is not the WU creation date, you can see that one by following the WU link in the results list. It seems that the first "new" WUs were generated around 13:00 UTC on 27th of April already. When looking at your results, you will notice the results will fall into one of two narrow ranges of runtime, where the newer results (newer by WU creation time) run about 20% faster.

Cheers
HB





____________

Infusioned
Joined: 11 Feb 05
Posts: 45
Credit: 149,000
RAC: 0
Message 112006 - Posted: 2 May 2012 | 13:05:06 UTC - in response to Message 112004.
Last modified: 2 May 2012 | 13:05:16 UTC

p2030.20110421.G41.29-00.40.S.b0s0g0.00000_1264_1


http://img809.imageshack.us/img809/154/b0s0g00000012641.jpg
____________

[VENETO] boboviz
Joined: 6 Oct 06
Posts: 4
Credit: 54,500
RAC: 2,055
Message 112007 - Posted: 2 May 2012 | 15:08:55 UTC

On my AMD HD 6850 I'm running 2 BOINC projects: Albert@Home and POEM@Home (3 GPU WUs on 1 CPU). When I download an Albert@Home GPU WU, the POEM WUs enter the "suspended" state and the Albert@Home WU doesn't start, i.e. no work on the GPU. If I restart the BOINC client, the POEM WUs remain suspended, but the Albert WU starts and runs OK...
Is this normal??

Infusioned
Joined: 11 Feb 05
Posts: 45
Credit: 149,000
RAC: 0
Message 112008 - Posted: 2 May 2012 | 17:00:48 UTC - in response to Message 112007.
Last modified: 2 May 2012 | 17:04:19 UTC

p2030.20110421.G41.29-00.40.S.b0s0g0.00000_1728_0


For some reason this wu is showing 0% GPU load and 25% CPU load. My initial reaction was that this must be an error, however, you can see the GPU clock was down to 725 from 840.

http://img140.imageshack.us/img140/883/b0s0g00000017280.jpg
____________

Infusioned
Joined: 11 Feb 05
Posts: 45
Credit: 149,000
RAC: 0
Message 112009 - Posted: 2 May 2012 | 17:50:57 UTC - in response to Message 112008.

p2030.20110421.G41.29-00.40.S.b0s0g0.00000_1504_1 using einsteinbinary_BRP4 version 123 (atiOpenCL)


http://img15.imageshack.us/img15/3065/b0s0g00000015041.jpg
____________

Profile Bikeman (Heinz-Bernd Eggenstein)
Volunteer moderator
Project administrator
Project developer
Joined: 28 Aug 06
Posts: 1447
Credit: 1,758,241
RAC: 2,113
Message 112010 - Posted: 2 May 2012 | 18:11:36 UTC - in response to Message 112007.

On my AMD HD 6850 I'm running 2 BOINC projects: Albert@Home and POEM@Home (3 GPU WUs on 1 CPU). When I download an Albert@Home GPU WU, the POEM WUs enter the "suspended" state and the Albert@Home WU doesn't start, i.e. no work on the GPU. If I restart the BOINC client, the POEM WUs remain suspended, but the Albert WU starts and runs OK...
Is this normal??


hmmm....theoretically it is possible that the Albert task *thought* it didn't have enough memory and waited for some to get available, which happened after the reboot...still, this looks suspicious. Thanks for reporting.

One question tho: is this reproducible, e.g. after each new WU download from Albert?

Cheers
HBE
____________

Profile Bikeman (Heinz-Bernd Eggenstein)
Volunteer moderator
Project administrator
Project developer
Joined: 28 Aug 06
Posts: 1447
Credit: 1,758,241
RAC: 2,113
Message 112011 - Posted: 2 May 2012 | 18:18:57 UTC - in response to Message 112008.
Last modified: 2 May 2012 | 18:38:44 UTC

p2030.20110421.G41.29-00.40.S.b0s0g0.00000_1728_0


For some reason this wu is showing 0% GPU load and 25% CPU load. My initial reaction was that this must be an error, however, you can see the GPU clock was down to 725 from 840.

http://img140.imageshack.us/img140/883/b0s0g00000017280.jpg



Strange...this is this one I guess:

http://albert.phys.uwm.edu/result.php?resultid=197941 which has finished in about the same time as other tasks. Let's see if it validates.

But I would expect a lower GPU temperature if the load had really been 0% for a longer time, so actually I suspect that the readout is wrong. The app does have phases (at the beginning of each of the 8 subtasks) when there is exclusively CPU load, but this will last only a couple of seconds, not minutes.


THX

HBE
____________

Infusioned
Joined: 11 Feb 05
Posts: 45
Credit: 149,000
RAC: 0
Message 112013 - Posted: 2 May 2012 | 22:15:48 UTC - in response to Message 112011.

p2030.20110421.G41.29-00.40.S.b0s0g0.00000_1920_1 using einsteinbinary_BRP4 version 123 (atiOpenCL)


http://img96.imageshack.us/img96/6813/b0s0g00000019201.jpg
____________

Infusioned
Joined: 11 Feb 05
Posts: 45
Credit: 149,000
RAC: 0
Message 112014 - Posted: 2 May 2012 | 23:11:55 UTC - in response to Message 112013.
Last modified: 2 May 2012 | 23:15:43 UTC

Digging through some of the stderr outputs I notice the atiOpenCl app is doing an awful lot of checkpointing. Curious to see if the cuda app was the same, I looked into one of my wu's:

http://albert.phys.uwm.edu/workunit.php?wuid=68681



My (atiOpenCL) output (abbreviated):


[06:49:19][3424][INFO ] Starting data processing...
[06:49:19][3424][INFO ] Using OpenCL platform provided by: Advanced Micro Devices, Inc.
[06:49:19][3424][INFO ] Using OpenCL device "Cayman" by: Advanced Micro Devices, Inc.
[06:49:19][3424][INFO ] Checkpoint file unavailable: status.cpt (No such file or directory).
------> Starting from scratch...
[06:49:19][3424][INFO ] Header contents:
------> Original WAPP file: ./p2030.20110421.G41.29-00.40.S.b0s0g0.00000_DM126.40
...
[06:50:25][3424][INFO ] Checkpoint committed!
[06:51:30][3424][INFO ] Checkpoint committed!
[06:52:35][3424][INFO ] Checkpoint committed!
[06:53:41][3424][INFO ] Checkpoint committed!
[06:54:46][3424][INFO ] Checkpoint committed!
[06:55:52][3424][INFO ] Checkpoint committed!
[06:56:58][3424][INFO ] Checkpoint committed!
[06:58:03][3424][INFO ] Checkpoint committed!
[06:59:08][3424][INFO ] Checkpoint committed!
[07:00:15][3424][INFO ] Checkpoint committed!
[07:01:20][3424][INFO ] Checkpoint committed!
[07:02:25][3424][INFO ] Checkpoint committed!
[07:03:30][3424][INFO ] Checkpoint committed!
[07:04:36][3424][INFO ] Checkpoint committed!
[07:05:41][3424][INFO ] Checkpoint committed!
[07:06:47][3424][INFO ] Checkpoint committed!
[07:07:53][3424][INFO ] Checkpoint committed!
[07:08:58][3424][INFO ] Checkpoint committed!
[07:09:25][3424][INFO ] OpenCL shutdown complete!
[07:09:25][3424][INFO ] Data processing finished successfully!
...


And then repeats the process for:

Original WAPP file: ./p2030.20110421.G41.29-00.40.S.b0s0g0.00000_DM126.50
Original WAPP file: ./p2030.20110421.G41.29-00.40.S.b0s0g0.00000_DM126.60
Original WAPP file: ./p2030.20110421.G41.29-00.40.S.b0s0g0.00000_DM126.70
Original WAPP file: ./p2030.20110421.G41.29-00.40.S.b0s0g0.00000_DM126.80
Original WAPP file: ./p2030.20110421.G41.29-00.40.S.b0s0g0.00000_DM126.90
Original WAPP file: ./p2030.20110421.G41.29-00.40.S.b0s0g0.00000_DM127.00
Original WAPP file: ./p2030.20110421.G41.29-00.40.S.b0s0g0.00000_DM127.10

Checkpointing each WAPP file once per minute, 20 times.



Comparing to the BRP3cuda32 app (abbreviated):


[12:27:01][5004][INFO ] Starting data processing...
[12:27:01][5004][INFO ] CUDA global memory status (initial GPU state, including context):
------> Used in total: 218 MB (807 MB free / 1025 MB total) -> Used by this application (assuming a single GPU task): 0 MB
[12:27:01][5004][INFO ] Using CUDA device #0 "GeForce GTX 560" (336 CUDA cores / 1105.44 GFLOPS)
[12:27:01][5004][INFO ] Version of installed CUDA driver: 4020
[12:27:01][5004][INFO ] Version of CUDA driver API used: 3020
[12:27:01][5004][INFO ] Checkpoint file unavailable: status.cpt (No such file or directory).
------> Starting from scratch...
[12:27:01][5004][INFO ] Header contents:
------> Original WAPP file: ./p2030.20110421.G41.29-00.40.S.b0s0g0.00000_DM126.40
...
[12:27:31][5004][INFO ] Checkpoint committed!
[12:28:01][5004][INFO ] Checkpoint committed!
[12:28:31][5004][INFO ] Checkpoint committed!
[12:29:01][5004][INFO ] Checkpoint committed!
[12:29:31][5004][INFO ] Checkpoint committed!
[12:30:02][5004][INFO ] Checkpoint committed!
[12:30:32][5004][INFO ] Checkpoint committed!
[12:31:01][5004][INFO ] Data processing finished successfully!
...


which then also repeats for:

Original WAPP file: ./p2030.20110421.G41.29-00.40.S.b0s0g0.00000_DM126.50
Original WAPP file: ./p2030.20110421.G41.29-00.40.S.b0s0g0.00000_DM126.60
Original WAPP file: ./p2030.20110421.G41.29-00.40.S.b0s0g0.00000_DM126.70
Original WAPP file: ./p2030.20110421.G41.29-00.40.S.b0s0g0.00000_DM126.80
Original WAPP file: ./p2030.20110421.G41.29-00.40.S.b0s0g0.00000_DM126.90
Original WAPP file: ./p2030.20110421.G41.29-00.40.S.b0s0g0.00000_DM127.00
Original WAPP file: ./p2030.20110421.G41.29-00.40.S.b0s0g0.00000_DM127.10

Checkpointing each WAPP file once per minute, 5 times.




So, my questions are:
* What is checkpointing? Saving an intermediate state (variables) so that, if the calculations get interrupted, you don't have to start over?

* Is the atiOpenCL app checkpointing more? Or is it that the two apps are doing the same amount of work (calcs), and it's just that the CUDA app/GTX 560 is doing more work per unit time and therefore only needs to checkpoint 5 vs. my 20 times?

* Is the GTX 560/CUDA app really 4x (20/5=4) faster than the HD6950/atiOpenCL? The 6950 shows 2253 SP GFLOPS vs. the GTX 560 SP GFLOPS of 1088.6.
http://en.wikipedia.org/wiki/Comparison_of_AMD_graphics_processing_units
http://en.wikipedia.org/wiki/Comparison_of_Nvidia_graphics_processing_units

To semi-answer that, GPU time indicates a 2.503x increase for the GTX 560/CUDA vs. the atiOpenCL/HD6950. The CPU time for the CUDA app is, however, 4.24x less than that of the OpenCL app. Anandtech Bench shows the 2500k vs. my AMD 975BE to be slightly better in single-threaded, multi-threaded, and total MIPS (7-Zip test), but nothing earth-shattering.
http://www.anandtech.com/bench/Product/288?vs=435

I know you said before that the OpenCL app uses way more CPU than the CUDA app. Perhaps the OpenCL standard is still immature, AMD has crappy drivers, or a mix of both? Regardless, I really commend everyone's efforts. Having done a fair bit of coding myself, I know what a pain this can all be.
____________

Infusioned
Joined: 11 Feb 05
Posts: 45
Credit: 149,000
RAC: 0
Message 112015 - Posted: 2 May 2012 | 23:27:37 UTC - in response to Message 112014.
Last modified: 2 May 2012 | 23:29:40 UTC

p2030.20110421.G41.29-00.40.S.b0s0g0.00000_1928_0 using einsteinbinary_BRP4 version 123 (atiOpenCL)

This one seems to have some weird GPU load spottiness at about the 20% completion mark, but seems to have steadied out at 23% load.

http://img210.imageshack.us/img210/4024/b0s0g00000019280.jpg

Edit:
I take that back, I noticed spottiness again, so I ran the latest 3 versions of GPU-Z side-by-side just to see if there was a bug in one of the versions. There doesn't appear to be as they all report the same load %.

http://img196.imageshack.us/img196/7073/gpuzcomparison.jpg
____________

Infusioned
Joined: 11 Feb 05
Posts: 45
Credit: 149,000
RAC: 0
Message 112016 - Posted: 3 May 2012 | 2:08:36 UTC - in response to Message 112015.

p2030.20110421.G41.29-00.40.S.b0s0g0.00000_2504_0 using einsteinbinary_BRP4 version 123 (atiOpenCL)


http://img140.imageshack.us/img140/4502/b0s0g00000025040.jpg
____________

Christoph
Joined: 25 Aug 05
Posts: 48
Credit: 148,613
RAC: 19
Message 112017 - Posted: 3 May 2012 | 8:12:36 UTC

For me the new app takes a full CPU core when it is running. Is that by intention?
____________
Christoph

[VENETO] boboviz
Joined: 6 Oct 06
Posts: 4
Credit: 54,500
RAC: 2,055
Message 112018 - Posted: 3 May 2012 | 12:33:26 UTC - in response to Message 112010.


One question tho: is this reproducible, e.g. after each new WU download from Albert?

Cheers
HBE


My PC has 8 GB DDR3 on Win7 64-bit, is that enough?
If I continue to download and run A@H WUs, they take precedence over POEM.
After the last A@H WU, POEM restarts correctly, and if I download another A@H WU the situation occurs again... :-(
I forgot: during the no-GPU-use state, 1 CPU core is in use (as if A@H were running)

Profile Bikeman (Heinz-Bernd Eggenstein)
Volunteer moderator
Project administrator
Project developer
Joined: 28 Aug 06
Posts: 1447
Credit: 1,758,241
RAC: 2,113
Message 112020 - Posted: 3 May 2012 | 16:24:11 UTC - in response to Message 112017.

For me the new app takes a full CPU core when it is running. Is that by intention?


Hi!

This will depend on the driver version you are using, e.g. you can see from the screenshots posted here that this is not in general the case for this app.

Cheers
HB

____________

Christoph
Joined: 25 Aug 05
Posts: 48
Credit: 148,613
RAC: 19
Message 112021 - Posted: 3 May 2012 | 20:25:57 UTC - in response to Message 112020.
Last modified: 3 May 2012 | 20:27:59 UTC

For me the new app takes a full CPU core when it is running. Is that by intention?


Hi!

This will depend on the driver version you are using, e.g. you can see from the screenshots posted here that this is not in general the case for this app.

Cheers
HB

I was not exact enough in my statement. BOINC reserves a full core. '1 CPUs + 1 ATI GPU' and as I understand it that should not be the case.

Here one log snippet: 03.05.2012 22:25:33 | Albert@Home | [rr_sim_detail] 339385.57: starting p2030.20110421.G41.06+00.53.N.b6s0g0.00000_1448_2 (1.00 CPU + 1.00 ATI)

Oh, my driver version is 12.3; as far as I have read, the high CPU usage is solved in that version.
____________
Christoph

Profile Bikeman (Heinz-Bernd Eggenstein)
Volunteer moderator
Project administrator
Project developer
Joined: 28 Aug 06
Posts: 1447
Credit: 1,758,241
RAC: 2,113
Message 112022 - Posted: 3 May 2012 | 20:27:55 UTC - in response to Message 112014.



So, my questions are:
* What is checkpointing? Saving an intermediate state (variables) so that, if the calculations get interrupted, you don't have to start over?


Exactly


* Is the atiOpenCL app checkpointing more? Or is it that the two apps are doing the same amount of work (calcs), and it's just that the CUDA app/GTX 560 is doing more work per unit time and therefore only needs to checkpoint 5 vs. my 20 times?

By default, all apps are checkpointing every 60 seconds. All workunits, whether they will get picked up by a CPU, NVIDIA GPU or ATI GPU will do the same amount of work, but the faster the processing is, the fewer checkpoints will happen during the execution time.
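
To check this on your own results, you can count the "Checkpoint committed!" lines in the stderr output and look at the gaps between them. The following is a rough sketch, assuming the `[HH:MM:SS][pid][INFO ] message` line layout seen in the excerpts posted in this thread; `checkpoint_stats` is a hypothetical helper name, and a subtask that runs across midnight is not handled.

```python
import re
from datetime import datetime

# Line layout assumed from the stderr excerpts quoted in this thread.
LINE_RE = re.compile(r"^\[(\d{2}:\d{2}:\d{2})\]\[\d+\]\[INFO \] (.*)$")

# First four checkpoint lines of the atiOpenCL excerpt posted above.
SAMPLE = """\
[06:50:25][3424][INFO ] Checkpoint committed!
[06:51:30][3424][INFO ] Checkpoint committed!
[06:52:35][3424][INFO ] Checkpoint committed!
[06:53:41][3424][INFO ] Checkpoint committed!
"""


def checkpoint_stats(log_text):
    """Return (checkpoint_count, mean_gap_seconds) for one subtask's log text.

    mean_gap_seconds is None when fewer than two checkpoints were found.
    Timestamps carry no date, so a run crossing midnight is not supported.
    """
    stamps = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match and match.group(2) == "Checkpoint committed!":
            stamps.append(datetime.strptime(match.group(1), "%H:%M:%S"))
    if len(stamps) < 2:
        return len(stamps), None
    gaps = [(later - earlier).total_seconds()
            for earlier, later in zip(stamps, stamps[1:])]
    return len(stamps), sum(gaps) / len(gaps)


count, mean_gap = checkpoint_stats(SAMPLE)
print(count, round(mean_gap, 1))  # prints 4 65.3
```

A mean gap close to the configured interval together with a smaller count simply means the subtask finished sooner, not that less work was done.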


* Is the GTX 560/CUDA app really 4x (20/5=4) faster than the HD6950/atiOpenCL? The 6950 shows 2253 SP GFLOPS vs. the GTX 560 SP GFLOPS of 1088.6.
http://en.wikipedia.org/wiki/Comparison_of_AMD_graphics_processing_units
http://en.wikipedia.org/wiki/Comparison_of_Nvidia_graphics_processing_units


Well, the Flops numbers are just theoretical peak performance and not too meaningful. But anyway, it's fair to say that our CUDA app and the libraries we used with it are more mature and optimized than our OpenCL app. We have some ideas in the pipeline how to further improve the OpenCL app and hopefully we can implement them in a timeframe of weeks rather than months, stay tuned.


To semi-answer that, GPU time indicates a 2.503x increase for the GTX 560/CUDA vs. the atiOpenCL/HD6950. The CPU time for the CUDA app is, however, 4.24x less than that of the OpenCL app. Anandtech Bench shows the 2500k vs. my AMD 975BE to be slightly better in single-threaded, multi-threaded, and total MIPS (7-Zip test), but nothing earth-shattering.
http://www.anandtech.com/bench/Product/288?vs=435

I know you said before that the OpenCL app uses way more CPU than the CUDA app. Perhaps the OpenCL standard is still immature, AMD has crappy drivers, or a mix of both? Regardless, I really commend everyone's efforts. Having done a fair bit of coding myself, I know what a pain this can all be.


Indeed :-). I don't think one should blame the OpenCL standard, it's not different from CUDA anyway. It's the implementation of the standard and the drivers that are causing a few troubles. Neither AMD nor NVIDIA seem too enthusiastic about OpenCL anymore, I'm afraid.

CU
HB



____________

Profile Bikeman (Heinz-Bernd Eggenstein)
Volunteer moderator
Project administrator
Project developer
Joined: 28 Aug 06
Posts: 1447
Credit: 1,758,241
RAC: 2,113
Message 112023 - Posted: 3 May 2012 | 20:34:00 UTC - in response to Message 112021.

For me the new app takes a full CPU core when it is running. Is that by intention?


Hi!

This will depend on the driver version you are using, e.g. you can see from the screenshots posted here that this is not in general the case for this app.

Cheers
HB

I was not exact enough in my statement. BOINC reserves a full core. '1 CPUs + 1 ATI GPU' and as I understand it that should not be the case.

Here one log snippet: 03.05.2012 22:25:33 | Albert@Home | [rr_sim_detail] 339385.57: starting p2030.20110421.G41.06+00.53.N.b6s0g0.00000_1448_2 (1.00 CPU + 1.00 ATI)

Oh, my driver version is 12.3; as far as I have read, the high CPU usage is solved in that version.


Yup, the allocation of a full core by BOINC is something that is configured at the server side. This was to prevent the situation where CPUs will get overcommitted for those users with older drivers where indeed a full CPU is taken by the driver...it's a conservative choice. We will look into the question how to handle this when we go live with the app, e.g. we could make the CPU allocation dependent on the driver version as we once did for NVIDIA where a similar driver problem existed under Linux, iirc.
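
To make the idea of a driver-dependent CPU allocation concrete, here is a purely hypothetical sketch. It is not the BOINC scheduler's code or configuration format. The 12.3 threshold is taken only from the remark above that the high CPU usage seems solved in that Catalyst version, and the 0.5 fraction for newer drivers is a placeholder, not a measured value.

```python
def parse_version(text):
    """Turn a Catalyst version string such as '12.3' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def cpu_reservation(driver_version, fixed_in="12.3", reduced_fraction=0.5):
    """Return the CPU fraction to budget for one GPU task.

    Hypothetical policy: reserve a full core for drivers older than
    fixed_in (assumed threshold), and reduced_fraction (placeholder)
    for drivers at or above it.
    """
    if parse_version(driver_version) >= parse_version(fixed_in):
        return reduced_fraction
    return 1.0


print(cpu_reservation("11.11"))  # prints 1.0 (older driver, full core reserved)
print(cpu_reservation("12.4"))   # prints 0.5 (placeholder fraction)
```

Comparing tuples rather than raw strings matters here, because as plain text "11.11" and "12.3" would not be ordered by their numeric parts in every case.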

Cheers
HB

____________

Christoph
Joined: 25 Aug 05
Posts: 48
Credit: 148,613
RAC: 19
Message 112024 - Posted: 3 May 2012 | 20:44:22 UTC - in response to Message 112023.

Ah, ok. Looks like I missed that detail somehow. So I can stop scratching my head.
____________
Christoph

Christoph
Joined: 25 Aug 05
Posts: 48
Credit: 148,613
RAC: 19
Message 112026 - Posted: 3 May 2012 | 21:28:50 UTC

I got one computation error. Probably it has something to do with my reboot yesterday.
I had one also with WCG, that was the reason for the reboot.

http://albert.phys.uwm.edu/result.php?resultid=201372
____________
Christoph

Profile Bikeman (Heinz-Bernd Eggenstein)
Volunteer moderator
Project administrator
Project developer
Joined: 28 Aug 06
Posts: 1447
Credit: 1,758,241
RAC: 2,113
Message 112027 - Posted: 3 May 2012 | 21:38:40 UTC - in response to Message 112026.

I got one computation error. Probably it has something to do with my reboot yesterday.
I had one also with WCG, that was the reason for the reboot.

http://albert.phys.uwm.edu/result.php?resultid=201372



Hi!

Thanks very much for reporting this. It might actually point to a real problem in the code, one that affects only cards with certain restrictions that the app has to take into account when generating work for the GPU.

Stay tuned, I hope I can install a fix tomorrow.

HB
____________

Infusioned
Joined: 11 Feb 05
Posts: 45
Credit: 149,000
RAC: 0
Message 112030 - Posted: 4 May 2012 | 1:29:39 UTC - in response to Message 112027.

Two more errors with:

<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>

http://albert.phys.uwm.edu/result.php?resultid=199760
http://albert.phys.uwm.edu/result.php?resultid=199762


I just read through the 7.0.27 change log and there is some stuff about trying to address this error. I installed 7.0.27, I'll see if this helps.
____________

astro-marwil
Joined: 28 May 05
Posts: 47
Credit: 1,633
RAC: 3
Message 112031 - Posted: 4 May 2012 | 7:15:55 UTC - in response to Message 111974.

I'm new here as of yesterday afternoon.
The change from BOINC 7.0.25 to .26 was straightforward, except that it took astonishingly long (some minutes?) until my existing tasks started running again.
Adding AaH to my BOINC was more complicated, as AaH is not included in the list of projects in BOINC Manager/Tools/Add a project or project manager. I helped myself by clicking on EaH and replacing Einstein with Albert in the URL. It took a while to find this way. Why isn't AaH included in the list of projects? In the AaH preferences I set the GPU utilization factor to 0.5, with BRP4 checked and S6LV1 unchecked.
I was somewhat astonished to find in the task log AaH running (1 CPU + 0.5 NVIDIA GPU) and 1 CPU waiting to run S6LV1, whereas BRP4 tasks from EaH run with (0.2 CPU + 0.5 NVIDIA GPU) and all 4 CPUs crunching S6LV1 tasks. The CPU load was reduced to about 90%, whereas before I always had 100% load. While AaH was running the desktop was very sticky; most of the time I had to wait some seconds before any action could be performed. This was also the case while the AaH task was waiting. The desktop was no longer sticky when the AaH project was suspended. This is a very uncomfortable way of operating.
So it ran for quite a while, but about 15 minutes before the AaH task finished, I found that within the preceding interval of almost exactly 20 minutes, 3 of the running BRP4 tasks from EaH had been marked as "Error while computing", all with exit code 1002. The AaH task itself ended fine. This morning it was validated against an ATI card running under Linux on an Intel CPU. The run time is about a factor of 3 longer, whereas the CPU time is comparable (AaH vs. EaH).

Because of the various reported malfunctions I suspended AaH. It's nice that the task was validated, especially as the counterpart was of a very different type. It shows you are on a very good path, and when the next version is available, I will try again.

Kind regards
Martin

____________

tullio
Joined: 22 Jan 05
Posts: 787
Credit: 50,786
RAC: 101
Message 112032 - Posted: 4 May 2012 | 12:20:30 UTC

Albert@home runs well on my Linux box, all results are validated. I have no GPU. I got a validation error on Einstein@home, on a Gamma-ray pulsar search unit.
Tullio
____________

EselTreiber
Joined: 29 Apr 08
Posts: 2
Credit: 44,670
RAC: 14
Message 112035 - Posted: 4 May 2012 | 18:00:30 UTC

Feedback from Ubuntu 12.04_amd64 with Catalyst 12.4 /HD6950@6870:
BOINC: latest SVN version.

Runs fine, no computation errors if all dependencies are installed. (32bit libraries)

2 Tasks on one GPU give me 90-94% GPU-utilisation with CPU load of 12-14% (Core i7 4.3GHz) per Workunit.

Performance is (compared to nvidia) 1/2 of a GTX 470.

steffen_moeller
Joined: 9 Feb 05
Posts: 13
Credit: 395,435
RAC: 0
Message 112036 - Posted: 4 May 2012 | 20:27:25 UTC - in response to Message 112031.

While AaH was running the desktop was very sticky; most of the time I had to wait some seconds before any action could be performed. This was also the case while the AaH task was waiting. The desktop was no longer sticky when the AaH project was suspended. This is a very uncomfortable way of operating.

... uncomfortable, but caused by the graphics card's compute load interfering with your regular display, and not a defect of albert@home from what I grasp. I observe this with my graphics card on Linux, too. The only way out that I am aware of is to not allow GPU computing while the machine is in use. How much RAM does your card have, btw? I do not observe this behaviour on a 1 GB ATI HD 5670 card running Albert on Windows, but I do with a 512 MB HD 5770 card (running PrimeGrid or so because of memory constraints), and there it is very much unbearable. Is anyone dual booting and observing the issue under Linux but not under Windows? Steffen
____________

Christoph
Joined: 25 Aug 05
Posts: 48
Credit: 148,613
RAC: 19
Message 112037 - Posted: 4 May 2012 | 22:28:01 UTC - in response to Message 112027.
Last modified: 4 May 2012 | 23:09:04 UTC

Hi,

I have two more erroneous WUs: http://albert.phys.uwm.edu/result.php?resultid=201372
and http://albert.phys.uwm.edu/result.php?resultid=201360

Both have the same error: [23:54:11][5900][ERROR] Error during OpenCL kernel setup: PS_R3 (error: -55)
[23:54:11][5900][ERROR] Demodulation failed (error: 2019)!

It is a bit different from my last failure. I just told BM to copy all messages in case you need more info. Hope it works; at the moment BM is hanging and using one full core and around 700 MB of memory...

EDIT: Looks like I need to kill BOINC. Still stuck. The export did not happen. Which file are the messages saved in?

EDIT 2: So it was only the Manager that crashed. When I started BoincTask it told me that 4 tasks are running... Does anybody know an add-on that saves the messages to a file outside BOINC?
____________
Christoph

astro-marwil
Joined: 28 May 05
Posts: 47
Credit: 1,633
RAC: 3
Message 112038 - Posted: 5 May 2012 | 14:05:03 UTC - in response to Message 112036.
Last modified: 5 May 2012 | 14:47:17 UTC

Hallo Steffen!
Thank you for your response.

... but caused by the graphics card interfering with your regular display and is not a defect by albert@home from what I grasp.

This task was running on a GTX550Ti with 1 GB of RAM in slot 0. At the same time a BRP4 task from EaH was running on the same card (0.5 mode). So you are probably right. I didn't check the memory load of the GPU, as in EaH I can easily run 3 tasks at a time. I don't know how much memory the OpenCL task requires. A memory load that was probably too high might also be the reason for the long run time. I will pay attention to that next time.

Thank you for this hint.
Kind regards
martin
____________

Infusioned
Joined: 11 Feb 05
Posts: 45
Credit: 149,000
RAC: 0
Message 112039 - Posted: 5 May 2012 | 15:33:54 UTC - in response to Message 112038.
Last modified: 5 May 2012 | 15:34:23 UTC

p2030.20110421.G41.18+00.30.N.b6s0g0.00000_1832_2 using einsteinbinary_BRP4 version 123 (atiOpenCL)


CPU usage is up a little (steady at ~16% [.16*4cores = ~64%]), but so is GPU usage (45%). All in all, everything is looking good.

http://img585.imageshack.us/img585/6087/b6s0g00000018322.jpg
____________

Infusioned
Joined: 11 Feb 05
Posts: 45
Credit: 149,000
RAC: 0
Message 112042 - Posted: 5 May 2012 | 19:33:53 UTC - in response to Message 112039.

p2030.20110421.G41.18+00.30.N.b6s0g0.00000_1400_4 using einsteinbinary_BRP4 version 123 (atiOpenCL)

http://img842.imageshack.us/img842/3608/b6s0g00000014004.jpg
____________

Infusioned
Joined: 11 Feb 05
Posts: 45
Credit: 149,000
RAC: 0
Message 112045 - Posted: 6 May 2012 | 3:02:27 UTC - in response to Message 112042.

This wu seems to be wreaking havoc. I completed it ok, but everyone else is erroring out. Your client errored too, Bikeman, but I presume that is because your client is 6.12.33?

http://albert.phys.uwm.edu/workunit.php?wuid=69493



So far:

atiOpenCL: (mine)
Completed ok.


atiOpenCL:
<core_client_version>7.0.27</core_client_version>
<![CDATA[
<message>
There was an error while deleting the color transform. (0x7e3) - exit code 2019 (0x7e3)
</message>


BRP3Cuda32:
<core_client_version>6.12.33</core_client_version>
<![CDATA[
<message>
- exit code -1073741819 (0xc0000005)
</message>


atiOpenCL:
<core_client_version>7.0.26</core_client_version>
<![CDATA[
<message>
There was an error while deleting the color transform. (0x7e3) - exit code 2019 (0x7e3)
</message>
<stderr_txt>
____________

Infusioned
Joined: 11 Feb 05
Posts: 45
Credit: 149,000
RAC: 0
Message 112046 - Posted: 6 May 2012 | 3:05:36 UTC - in response to Message 112045.

This wu seems to be wreaking havoc. I completed it ok, but everyone else is erroring out. Your client errored too, Bikeman, but I presume that is because your client is 6.12.33?

http://albert.phys.uwm.edu/workunit.php?wuid=69493

...



Seems to be the same types of problems with this wu also:

http://albert.phys.uwm.edu/workunit.php?wuid=69486

____________

ahorek's team
Joined: 16 Dec 05
Posts: 2
Credit: 116,013
RAC: 80
Message 112047 - Posted: 6 May 2012 | 13:41:32 UTC

I got the same errors on my notebook with a Mobile Radeon 5450 with 1 GB VRAM:
Result: http://albert.phys.uwm.edu/result.php?resultid=204994
I'm using the newest drivers (1.4.1720) and BOINC client 7.0.27. Previous versions of the Albert app worked.

On my other machine, with a Radeon 5650, there is no problem. Runtime is about 4.5 h/WU, memory consumption 450 MB, and load 90% with a dedicated CPU core (without it only 30%).

Log:
<core_client_version>7.0.27</core_client_version>
<![CDATA[
<message>
There was an error while deleting the color transform. (0x7e3) - exit code 2019 (0x7e3)
</message>
<stderr_txt>
Activated exception handling...
[13:48:03][3088][INFO ] Starting data processing...
[13:48:04][3088][INFO ] Using OpenCL platform provided by: Advanced Micro Devices, Inc.
[13:48:04][3088][INFO ] Using OpenCL device "Cedar" by: Advanced Micro Devices, Inc.
[13:48:05][3088][WARN ] Kernel "kernelTimeSeriesMeanReduction" exceeds device-specific maximum work group size (requested: 256)!
------> Reducing kernel's work group size to allowed maximum of: 128 work items
[13:48:05][3088][WARN ] Kernel "kernelPowerSpectrum" exceeds device-specific maximum work group size (requested: 256)!
------> Reducing kernel's work group size to allowed maximum of: 128 work items
[13:48:05][3088][WARN ] Kernel "kernelHarmonicSumming" exceeds device-specific maximum work group size (requested: 256)!
------> Reducing kernel's work group size to allowed maximum of: 128 work items
[13:48:06][3088][INFO ] Checkpoint file unavailable: status.cpt (No such file or directory).
------> Starting from scratch...
[13:48:06][3088][INFO ] Header contents:
------> Original WAPP file: ./p2030.20110421.G41.18+00.30.N.b6s0g0.00000_DM192.00
------> Sample time in microseconds: 65.4762
------> Observation time in seconds: 274.62705
------> Time stamp (MJD): 55672.41520535187
------> Number of samples/record: 0
------> Center freq in MHz: 1214.289551
------> Channel band in MHz: 0.33605957
------> Number of channels/record: 960
------> Nifs: 1
------> RA (J2000): 190551.040699
------> DEC (J2000): 73613.7874002
------> Galactic l: 0
------> Galactic b: 0
------> Name: G41.18+00.30.N
------> Lagformat: 0
------> Sum: 1
------> Level: 3
------> AZ at start: 0
------> ZA at start: 0
------> AST at start: 0
------> LST at start: 0
------> Project ID: --
------> Observers: --
------> File size (bytes): 0
------> Data size (bytes): 0
------> Number of samples: 4194304
------> Trial dispersion measure: 192 cm^-3 pc
------> Scale factor: 0.00569057
[13:48:13][3088][INFO ] Seed for random number generator is 1158596523.
[13:48:57][3088][INFO ] Derived global search parameters:
------> f_A probability = 0.08
------> single bin prob(P_noise > P_thr) = 1.32531e-008
------> thr1 = 18.139
------> thr2 = 21.241
------> thr4 = 26.2686
------> thr8 = 34.6478
------> thr16 = 48.9581
[13:48:58][3088][ERROR] Error during OpenCL kernel setup: PS_R3 (error: -55)
[13:48:58][3088][ERROR] Demodulation failed (error: 2019)!
13:48:58 (3088): called boinc_finish

</stderr_txt>
]]>
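
The three WARN entries in the log above show the app lowering a requested work group size of 256 to the device limit of 128. Below is a minimal sketch of that kind of clamping, written in Python for brevity and not taken from the app's code. The function names are made up for illustration, and the rounding of the global size reflects the OpenCL 1.x rule that the global work size must be a multiple of the work group size.

```python
def clamp_work_group(requested, device_max):
    """Never ask for more work items per group than the device allows."""
    return min(requested, device_max)


def padded_global_size(n_items, local_size):
    """Round the global work size up to the next multiple of local_size."""
    return ((n_items + local_size - 1) // local_size) * local_size


# Values from the log: 256 requested, 128 allowed on the "Cedar" device.
local_size = clamp_work_group(256, 128)
global_size = padded_global_size(4194304, local_size)
```

For what it is worth, -55 is the value of CL_INVALID_WORK_ITEM_SIZE in the OpenCL headers; whether the log prints raw OpenCL status codes is an assumption.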
____________

X1900AIW
Joined: 6 May 12
Posts: 2
Credit: 1,000
RAC: 0
Message 112049 - Posted: 6 May 2012 | 18:45:28 UTC
Last modified: 6 May 2012 | 18:48:29 UTC

    Hardware: Desktop-GPU, ATI Radeon HD5450, 1024 MB DDR3, (650/800 MHz)
    Software: Catalyst 12.3, BOINC 7.0.26 (x64), Windows 7/64
    RAM-Usage: Taskmanager during GPU-process: ~207 MB (max)
    no visible GPU-Usage (by AMD Overdrive), computing the workunits took just a few seconds until failure
    Each workunit failed, so I stopped processing.





Stderr output

<core_client_version>7.0.26</core_client_version>
<![CDATA[
<message>
There was an error while deleting the color transform. (0x7e3) - exit code 2019 (0x7e3)
</message>
<stderr_txt>
Activated exception handling...
[20:24:32][5108][INFO ] Starting data processing...
[20:24:33][5108][INFO ] Using OpenCL platform provided by: Advanced Micro Devices, Inc.
[20:24:33][5108][INFO ] Using OpenCL device "Cedar" by: Advanced Micro Devices, Inc.
[20:24:34][5108][WARN ] Kernel "kernelTimeSeriesMeanReduction" exceeds device-specific maximum work group size (requested: 256)!
------> Reducing kernel's work group size to allowed maximum of: 128 work items
[20:24:34][5108][WARN ] Kernel "kernelPowerSpectrum" exceeds device-specific maximum work group size (requested: 256)!
------> Reducing kernel's work group size to allowed maximum of: 128 work items
[20:24:34][5108][WARN ] Kernel "kernelHarmonicSumming" exceeds device-specific maximum work group size (requested: 256)!
------> Reducing kernel's work group size to allowed maximum of: 128 work items
[20:24:35][5108][INFO ] Checkpoint file unavailable: status.cpt (No such file or directory).
------> Starting from scratch...
[20:24:35][5108][INFO ] Header contents:
------> Original WAPP file: ./p2030.20110421.G41.29-00.40.S.b0s0g0.00000_DM42.40
------> Sample time in microseconds: 65.4762
------> Observation time in seconds: 274.62705
------> Time stamp (MJD): 55672.400301627786
------> Number of samples/record: 0
------> Center freq in MHz: 1214.289551
------> Channel band in MHz: 0.33605957
------> Number of channels/record: 960
------> Nifs: 1
------> RA (J2000): 190804.6872
------> DEC (J2000): 71149.1882019
------> Galactic l: 0
------> Galactic b: 0
------> Name: G41.29-00.40.S
------> Lagformat: 0
------> Sum: 1
------> Level: 3
------> AZ at start: 0
------> ZA at start: 0
------> AST at start: 0
------> LST at start: 0
------> Project ID: --
------> Observers: --
------> File size (bytes): 0
------> Data size (bytes): 0
------> Number of samples: 4194304
------> Trial dispersion measure: 42.4 cm^-3 pc
------> Scale factor: 0.00758342
[20:24:40][5108][INFO ] Seed for random number generator is 1157054464.
[20:25:10][5108][INFO ] Derived global search parameters:
------> f_A probability = 0.08
------> single bin prob(P_noise > P_thr) = 1.32531e-008
------> thr1 = 18.139
------> thr2 = 21.241
------> thr4 = 26.2686
------> thr8 = 34.6478
------> thr16 = 48.9581
[20:25:10][5108][ERROR] Error during OpenCL kernel setup: PS_R3 (error: -55)
[20:25:10][5108][ERROR] Demodulation failed (error: 2019)!
20:25:10 (5108): called boinc_finish

</stderr_txt>
]]>

    Alex
    Joined: 1 Mar 05
    Posts: 58
    Credit: 313,342
    RAC: 238
    Message 112053 - Posted: 7 May 2012 | 18:02:51 UTC
    Last modified: 7 May 2012 | 18:34:30 UTC

    I gave it a new chance (some weeks ago my system crashed every 20 min).
    Looks good so far!




    GPU usage is perfect when running 2 apps at a time
    CPU usage needs some rework

    BM 7.0.27
    CCC 12.4
    edit:

    figures from the other GPU HD6950

    ____________

    Profile Bikeman (Heinz-Bernd Eggenstein)
    Volunteer moderator
    Project administrator
    Project developer
    Joined: 28 Aug 06
    Posts: 1447
    Credit: 1,758,241
    RAC: 2,113
    Message 112054 - Posted: 7 May 2012 | 20:25:28 UTC - in response to Message 112053.

    Hi all

    Thanks for the testing, we really appreciate it!

    Some progress report:

    Today we identified the mysterious cause for the CUDA Windows App 1.24 crashing. We also found and hopefully fixed the problem with some OpenCL app errors (the one with "kernel setup: PS_R3" in the logs). If all goes well the fixed versions will be launched tomorrow, Tuesday, on Albert.

    All in all we are still "GO" for an OpenCL launch in this or next week :-). Stay tuned.

    Cheers
    HB


    ____________

    Christoph
    Joined: 25 Aug 05
    Posts: 48
    Credit: 148,613
    RAC: 19
    Message 112055 - Posted: 7 May 2012 | 21:08:45 UTC

    This sounds very good!
    ____________
    Christoph

    Alex
    Joined: 1 Mar 05
    Posts: 58
    Credit: 313,342
    RAC: 238
    Message 112056 - Posted: 7 May 2012 | 22:40:44 UTC

    Good news!

    'I' crunched 7 ATI wu's today, 3 already validated, 4 pending.
    HD6950: 2 wu's in 1:35 2 GB Ram
    HD5850: 2 wu's in 2:50 , 1 wu in 1:40 1 GB Ram

    win7 x 64, i7 2800, 8GB Ram, CCC 12.4, BM 7.0.27
    ____________

    ahorek's team
    Joined: 16 Dec 05
    Posts: 2
    Credit: 116,013
    RAC: 80
    Message 112057 - Posted: 8 May 2012 | 12:37:08 UTC

    So, problem solved, now it works again with v1.24. Here are screenshots of my machines crunching Albert. Is the CPU/GPU memory usage normal? They differ a lot.



    ____________

    Profile Bikeman (Heinz-Bernd Eggenstein)
    Volunteer moderator
    Project administrator
    Project developer
    Joined: 28 Aug 06
    Posts: 1447
    Credit: 1,758,241
    RAC: 2,113
    Message 112058 - Posted: 8 May 2012 | 13:20:22 UTC

    looks good, thanks!

    I don't fully understand the difference in memory usage, but it could be caused by the different capabilities of the cards. Anyone else here with a 54xx ?

    Cheers
    HB
    ____________

    Christoph
    Joined: 25 Aug 05
    Posts: 48
    Credit: 148,613
    RAC: 19
    Message 112060 - Posted: 8 May 2012 | 14:38:19 UTC - in response to Message 112058.

    I have a 5450 but not yet the new app. SETI is right now on the GPU. There were some Ghosts wu lingering in my account so I allowed work to get them going. Sometime tomorrow maybe I will pickup new work here.
    ____________
    Christoph

    TRuEQ & TuVaLu
    Joined: 11 Sep 06
    Posts: 74
    Credit: 119,220
    RAC: 291
    Message 112061 - Posted: 8 May 2012 | 17:21:02 UTC
    Last modified: 8 May 2012 | 17:21:39 UTC

    ATI 4850(512MB) no tasks.

    ATI 5850(1024) running with 0.94cpu and 0.5gpu alongside a milkyway task.
    progress of task is 44% and ticking.

    Profile Bikeman (Heinz-Bernd Eggenstein)
    Volunteer moderator
    Project administrator
    Project developer
    Joined: 28 Aug 06
    Posts: 1447
    Credit: 1,758,241
    RAC: 2,113
    Message 112062 - Posted: 9 May 2012 | 8:14:26 UTC - in response to Message 112061.

    ATI 4850(512MB) no tasks.

    ATI 5850(1024) running with 0.94cpu and 0.5gpu alongside a milkyway task.
    progress of task is 44% and ticking.



    Hi!

    Only OpenCL 1.1 capable cards are supported by this app, that's why the 4850 won't get jobs.
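
A small sketch of how such a capability check could be done, assuming the usual `CL_DEVICE_VERSION` string layout "OpenCL <major>.<minor> <vendor-specific text>". This is not how the scheduler actually decides; the function names and the sample strings are hypothetical.

```python
import re


def opencl_version(device_version):
    """Extract (major, minor) from a CL_DEVICE_VERSION style string."""
    match = re.match(r"OpenCL (\d+)\.(\d+)", device_version)
    if match is None:
        raise ValueError("unexpected version string: %r" % device_version)
    return int(match.group(1)), int(match.group(2))


def supports_app(device_version, minimum=(1, 1)):
    """True if the device reports at least the required OpenCL version."""
    return opencl_version(device_version) >= minimum
```

With made-up strings, `supports_app("OpenCL 1.1 vendor-text")` is true while `supports_app("OpenCL 1.0 vendor-text")` is false.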

    Cheers
    HB
    ____________

    TRuEQ & TuVaLu
    Joined: 11 Sep 06
    Posts: 74
    Credit: 119,220
    RAC: 291
    Message 112063 - Posted: 9 May 2012 | 10:25:17 UTC

    I've run a few tasks with 1.24 and it looks fine.

    60%-70% GPU usage and 0.932 CPU reserved, but it doesn't use more than 40% and has an average CPU usage of 25%.

    It works well with the 0.5 option and is able to run alongside Milkyway SETI POEM and Primegrid without a problem.

    TRuEQ & TuVaLu
    Joined: 11 Sep 06
    Posts: 74
    Credit: 119,220
    RAC: 291
    Message 112064 - Posted: 9 May 2012 | 12:12:02 UTC

    I must add this:

    When a CPU project is running on 1 core and an Albert GPU task starts, the CPU project goes into the "waiting to run" state.

    Any chance that you can make the Albert GPU task count as a GPU task and not, as it does now, as a CPU task?

    Profile Bikeman (Heinz-Bernd Eggenstein)
    Volunteer moderator
    Project administrator
    Project developer
    Joined: 28 Aug 06
    Posts: 1447
    Credit: 1,758,241
    RAC: 2,113
    Message 112065 - Posted: 9 May 2012 | 16:16:24 UTC

    Hi all,

    In preparation for "the launch", we are currently experimenting with validator settings. This will cause an artificially high rate of invalid results in the next few hours, but this allows us to collect some important data. So nothing to worry about :-)

    Cheers
    HB

    ____________

    Christoph
    Joined: 25 Aug 05
    Posts: 48
    Credit: 148,613
    RAC: 19
    Message 112066 - Posted: 9 May 2012 | 16:28:35 UTC

    I have my older task done but it is still running. On trying to save all Messages to Memory to dump them BM is hanging again. Will report via Alpha Email list.
    ____________
    Christoph

    [VENETO] boboviz
    Joined: 6 Oct 06
    Posts: 4
    Credit: 54,500
    RAC: 2,055
    Message 112067 - Posted: 9 May 2012 | 19:04:17 UTC

    I don't understand.
    With version 1.23 I had the strange "stop and go" behavior.
    Now with 1.24, another problem...
    With version 1.23, my 4-core CPU ran 3 CPU WUs of another project and 1 A@H GPU WU on my GPU card. Now the A@H GPU task indicates the use of 0.95 CPU, but I run 4 CPU WUs and the GPU WU is VERY slow (40% after 5h). If I crunch only 3 CPU WUs (suspending the others), the GPU WU speeds up and finishes very quickly.
    I tried restarting the client, but nothing changed.
    I don't use an XML configuration file.

    TRuEQ & TuVaLu
    Joined: 11 Sep 06
    Posts: 74
    Credit: 119,220
    RAC: 291
    Message 112068 - Posted: 9 May 2012 | 19:26:16 UTC
    Last modified: 9 May 2012 | 19:27:12 UTC

    I don't know if this is a BM thing or an Albert app thing.
    But the downloaded tasks show an estimated runtime of 20 hours and they all get done within about 1-2 hours.
    I use BM 7.0.27
    All other projects adjust as they should... except POEM@Home, which sometimes shows strange numbers...
    Have I run too few tasks?

    TRuEQ & TuVaLu
    Joined: 11 Sep 06
    Posts: 74
    Credit: 119,220
    RAC: 291
    Message 112069 - Posted: 10 May 2012 | 9:50:32 UTC

    I am now running 1 Albert task on my ATI 5850 with 0.932 CPU and 1 SETI AP task on the same GPU.

    I also noticed 2 CPU tasks running at the same time on my 2 cores.
    One of the CPU projects runs fully on 1 core and the other CPU project runs on the Albert core with 0.1 CPU.

    I cannot be sure that the CPU tasks use different cores though...
    I lack the knowledge of how to trace the threads.
    My conclusion is that I now have 1 CPU project running with 0.1 CPU on the same core that feeds the GPU for Albert.

    If this is correct, is there any chance that you can free up some more of the CPU reserved for the running Albert task? Albert uses less than 50% of 1 CPU core.

    Ver Greeneyes
    Joined: 18 Nov 11
    Posts: 6
    Credit: 530,237
    RAC: 1,098
    Message 112070 - Posted: 10 May 2012 | 12:44:05 UTC - in response to Message 112069.

    I cannot be sure that the CPU tasks use different cores though...
    I lack the knowledge of how to trace the threads.

    Your operating system's scheduler should take care of this unless you force specific applications to use specific cores. Basically the way it works is that applications/threads get 'time slices' from the OS scheduler, which is how it can run multiple applications side by side on a single core. Between time slices the scheduler might decide to continue to run a thread on a different core depending on how busy each core is - that's why you generally see even single-threaded applications using a bit of each core: because they spend about equal time running on each one.
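
    A toy model of that effect, assuming (purely for illustration) that the scheduler simply rotates a single-threaded task over the cores one time slice at a time. Real OS schedulers are far more involved and `simulate_migration` is a made-up name, but the per-core picture comes out the same: each core shows an equal share of one thread's work.

```python
from collections import Counter


def simulate_migration(n_slices, n_cores):
    """Return the fraction of time slices one thread spent on each core."""
    usage = Counter()
    for slice_index in range(n_slices):
        # Stand-in for "pick a suitable core for the next time slice".
        core = slice_index % n_cores
        usage[core] += 1
    return {core: usage[core] / n_slices for core in range(n_cores)}
```

    On 4 cores the thread ends up with a quarter of its slices on each core, which is why a per-core usage graph cannot tell you which core "belongs" to which task.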

    X1900AIW
    Joined: 6 May 12
    Posts: 2
    Credit: 1,000
    RAC: 0
    Message 112071 - Posted: 10 May 2012 | 15:22:06 UTC - in response to Message 112049.

      Hardware: Desktop-GPU, ATI Radeon HD5450, 1024 MB DDR3, (650/800 MHz)
      Software: Catalyst 12.3, BOINC 7.0.26 (x64), Windows 7/64
      RAM-Usage: Taskmanager during GPU-process: ~207 MB (max)
      no visible GPU-Usage (by AMD Overdrive), computing the workunits took just a few seconds until failure
      Each workunit failed, so I stopped processing.



    New day, new test: it's running!
    RAM-usage System: 208 MB RAM max., 84 MB at the moment
    GPU: 43 percent usage with (4,596 CPUs + 1 ATI GPU)
    GPU: 95 percent usage with (3,596 CPUs + 1 ATI GPU)
    GPU Temp: 50 stock, 65 degrees @Albert@home
    estimated runtime for the Albert-Workunit: 24 hours, dead line in 14 days.

    It runs slowly, but with noticeable lags @95 percent GPU usage. CPU usage in BOINC is suboptimal with 3,596 CPUs.

    Infusioned
    Joined: 11 Feb 05
    Posts: 45
    Credit: 149,000
    RAC: 0
    Message 112075 - Posted: 12 May 2012 | 16:42:53 UTC - in response to Message 112071.

    These wu's show BRPCUDA32 v1.25 throwing errors:

    http://albert.phys.uwm.edu/workunit.php?wuid=69412
    http://albert.phys.uwm.edu/workunit.php?wuid=70631
    http://albert.phys.uwm.edu/workunit.php?wuid=70986
    http://albert.phys.uwm.edu/workunit.php?wuid=71008

    Most of which are from the same host (GTX 480), with one from this host (GTX285).


    <core_client_version>7.0.25</core_client_version>
    <![CDATA[
    <message>
    Cannot create a symbolic link in a registry key that already has subkeys or values. (0x3fc) - exit code 1020 (0x3fc)
    </message>
    <stderr_txt>
    Activated exception handling...
    [08:07:13][4260][INFO ] Starting data processing...
    [08:07:13][4260][ERROR] Couldn't initialize CUDA driver API (error: 100)!
    [08:07:13][4260][ERROR] Demodulation failed (error: 1020)!
    08:07:13 (4260): called boinc_finish

    </stderr_txt>
    ]]>




    Also, the BRPSSE3 v1.22 client is throwing errors:

    http://albert.phys.uwm.edu/workunit.php?wuid=70871
    http://albert.phys.uwm.edu/workunit.php?wuid=70837

    (from the same host)


    <core_client_version>6.10.60</core_client_version>
    <![CDATA[
    <message>
    too many exit(0)s
    </message>
    ]]>

    ____________

    Christoph
    Joined: 25 Aug 05
    Posts: 48
    Credit: 148,613
    RAC: 19
    Message 112076 - Posted: 12 May 2012 | 16:43:48 UTC

    So, I finally caught a running task. HD5450, 1 GB memory, max workgroup 128.
    Memory use as per GPU-Z: 416 dedicated, around 70 dynamic. GPU load 96%
    ____________
    Christoph

    terencewee*
    Joined: 2 Feb 12
    Posts: 5
    Credit: 4,500
    RAC: 0
    Message 112077 - Posted: 13 May 2012 | 7:40:49 UTC
    Last modified: 13 May 2012 | 8:06:04 UTC

    v1.24

    The app still corrupts the screen with square dots during the initial start-up.
    But there is no driver restart.

    Memory usage is 369MB(dedicated), ~39MB(dynamic).

    Seems faster.

    Will be running consecutive WUs this round using this host.

    1st result reported.

    2nd WU ran fine without any square dots on screen.
    Memory usage is higher.
    475MB(dedicated), ~38MB(dynamic).
    ____________
    --
    terencewee*
    Sicituradastra.

    terencewee*
    Joined: 2 Feb 12
    Posts: 5
    Credit: 4,500
    RAC: 0
    Message 112079 - Posted: 13 May 2012 | 11:12:58 UTC

    First WU awaiting validation.

    Second WU completed & validated against a CUDA device. Good job!

    Processing third WU - no square dots on screen.
    Memory usage back to 369MB(dedicated), ~38MB(dynamic).

    Looks like there may be a problem with initial run.

    Scenario 1: Reboot > BOINC > run WU.
    Square dots on screen, no driver restart.
    Consecutive WU ran fine, with no square dots on screen, no driver restart.

    Scenario 2: Reboot > runs some apps > BOINC > run WU.
    Will report back tomorrow.

    ____________
    --
    terencewee*
    Sicituradastra.

    Profile Bikeman (Heinz-Bernd Eggenstein)
    Volunteer moderator
    Project administrator
    Project developer
    Joined: 28 Aug 06
    Posts: 1447
    Credit: 1,758,241
    RAC: 2,113
    Message 112082 - Posted: 14 May 2012 | 15:28:20 UTC

    Hi all!

    What we are beginning to see as a trend is that HD 6900 series cards have a far harder time producing cross-validating results than both older and newer cards (meaning they seem to produce less accurate results with the current app).

    The difference is not dramatic but I wonder whether the HD 6900 owners are experiencing this in other projects as well?

    Cheers
    HB





    ____________

    Infusioned
    Joined: 11 Feb 05
    Posts: 45
    Credit: 149,000
    RAC: 0
    Message 112083 - Posted: 14 May 2012 | 19:27:42 UTC - in response to Message 112082.
    Last modified: 14 May 2012 | 19:28:35 UTC

    Wow. I find that very strange as the 69xx series cards are double precision vs. the single precision of the NVIDIA and single precision AMD (54xx-57xx, 63xx-68xx, 73xx-76xx) cards.

    At Milkyway double precision cards are required. I haven't had any validation errors with my 6950.


    http://en.wikipedia.org/wiki/Comparison_of_AMD_graphics_processing_units
    ____________

    Profile Bikeman (Heinz-Bernd Eggenstein)
    Volunteer moderator
    Project administrator
    Project developer
    Joined: 28 Aug 06
    Posts: 1447
    Credit: 1,758,241
    RAC: 2,113
    Message 112084 - Posted: 14 May 2012 | 22:33:01 UTC - in response to Message 112083.

    The Einstein@Home app does not need (and does not use) any double precision arithmetic on the GPU, so this should not be a factor.

    At the moment the higher validation failure rate for 6900 series cards is just an observation of correlation, no claim of causality :-), as the number of cards on the Albert@Home project is just too small. It could be an indirect effect, e.g. the FFT lib could choose to switch to a different, but less accurate, code path on 6900 cards because of differences in the runtime characteristics. We'll look into it. Any experience wrt this from other projects is welcome.
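
    To illustrate how a different code path alone can change a single precision result, here is a small sketch that has nothing to do with the actual FFT library. It emulates 32-bit float addition with the standard `struct` module and sums the same three numbers in two different orders; `f32` and `sum_f32` are names made up for this example.

```python
import struct


def f32(x):
    """Round a Python float to the nearest IEEE 754 single precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sum_f32(values):
    """Add values left to right, rounding to single precision at each step."""
    total = 0.0
    for value in values:
        total = f32(total + f32(value))
    return total


order_a = sum_f32([1e8, 1.0, -1e8])   # the 1.0 is absorbed by the large term
order_b = sum_f32([1e8, -1e8, 1.0])   # the large terms cancel first
```

    Mathematically both sums are 1, but in single precision the first order loses the 1.0 entirely, so two correct implementations can legitimately disagree in the last digits.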

    Cheers
    HB



    ____________

    Infusioned
    Joined: 11 Feb 05
    Posts: 45
    Credit: 149,000
    RAC: 0
    Message 112085 - Posted: 15 May 2012 | 3:14:24 UTC - in response to Message 112084.
    Last modified: 15 May 2012 | 3:15:54 UTC

    The Einstein@Home app does not need (and does not use) any double precision arithmetic on the GPU, so this should not be a factor.



    I am aware. The point I was trying to make, though, was that how the math is coded matters greatly and does impact precision of the final answer. Let's take for example pi^16 (exaggerated for show) with 3 different approximations for pi.


    3
    9
    27
    81
    243
    729
    2187
    6561
    19683
    59049
    177147
    531441
    1594323
    4782969
    14348907
    43046721


    3.1
    9.61
    29.791
    92.3521
    286.29151
    887.503681
    2751.261411
    8528.910374
    26439.62216
    81962.8287
    254084.769
    787662.7838
    2441754.63
    7569439.352
    23465261.99
    72742312.17


    3.141592654
    9.869604401
    31.00627668
    97.40909103
    306.0196848
    961.3891936
    3020.293228
    9488.531016
    29809.09933
    93648.04748
    294204.018
    924269.1815
    2903677.271
    9122171.182
    28658145.97
    90032220.84


    I did these in Excel, with the last set of calculations using Excel's actual pi() function (the values shown are obviously truncated for display).


    So, **in general**, the more precision you start with, the better your final answer (depending on a host of other things I forget from my numerical computation class), but you pay for it with computation time. But I'm sure I'm not telling you guys anything new.
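
For anyone who wants to reproduce the three columns above without a spreadsheet, here is a minimal sketch in Python. The helper name `power_table` is made up for this illustration, and the relative-error column is an addition that is not part of the original Excel tables. It only demonstrates how an error in the starting value compounds over 16 multiplications; it says nothing about how the BRP app itself computes anything.

```python
import math

def power_table(base, max_exponent=16):
    """Return [base**1, ..., base**max_exponent] built by repeated multiplication."""
    values = []
    current = 1.0
    for _ in range(max_exponent):
        current *= base
        values.append(current)
    return values

# The three starting approximations for pi used in the tables above.
approximations = {"3": 3.0, "3.1": 3.1, "pi": math.pi}
reference = power_table(math.pi)[-1]

for label, base in approximations.items():
    final = power_table(base)[-1]
    rel_error = abs(final - reference) / reference
    print(f"{label:>4}: {final:.10g}  relative error vs. pi**16: {rel_error:.1%}")
```

The last entry of each table is what the loop prints, so the output can be compared directly with the final row of each column above.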

    Just out of curiosity, was the Einstein app ever run in double precision and compared to results of single precision calculations? I presume it was based on "does not need", but I'd be interested to know the difference.



    At the moment the higher validation failure rate for 6900 series cards is just an observation of correlation, no claim of causality :-), as the number of cards on the Albert@Home project is just too small. It could be an indirect effect, e.g. the FFT lib could choose to switch to a different, but less accurate, code path on 6900 cards because of differences in the runtime characteristics. We'll look into it. Any experience wrt this from other projects is welcome.


    All my above hot air aside, I could have sworn I remember reading somewhere about the accuracy of OpenCL results and a statement to the effect of "it seems AMD has ditched some precision in favor of speed"; however, I thought that was rectified with new Catalyst drivers. Maybe send a PM to Raistmer on the Seti@Home Beta boards. I'm more than positive he will know (I think he's the one who originally posted it).
    ____________

    Profile Bikeman (Heinz-Bernd Eggenstein)
    Volunteer moderator
    Project administrator
    Project developer
    Joined: 28 Aug 06
    Posts: 1447
    Credit: 1,758,241
    RAC: 2,113
    Message 112086 - Posted: 15 May 2012 | 14:21:58 UTC

    Hi all!


    With the help of your continued support, we were able to put the ATI/OpenCL app into production on Einstein@Home today.


    http://einstein.phys.uwm.edu/forum_thread.php?id=9446

    The apps are the same as used here, but note that the minimum BOINC client version was again increased to 7.0.27 (the most recent development version atm).

    We will continue to improve the app so that we will again need beta testers at Albert@Home in the near future, but just now you will probably want to scale back the work at Albert a bit and throw your ATI-cards on the Einstein@Home production project.

    Thanks again,
    HB
    ____________

    Christoph
    Joined: 25 Aug 05
    Posts: 48
    Credit: 148,613
    RAC: 19
    Message 112087 - Posted: 15 May 2012 | 14:38:32 UTC - in response to Message 112086.

    Hi all!


    With the help of your continued support, we were able to put the ATI/OpenCL app into production on Einstein@Home today.


    http://einstein.phys.uwm.edu/forum_thread.php?id=9446

    The apps are the same as used here, but note that the minimum BOINC client version was again increased to 7.0.27 (the most recent development version atm).

    We will continue to improve the app so that we will again need beta testers at Albert@Home in the near future, but just now you will probably want to scale back the work at Albert a bit and throw your ATI-cards on the Einstein@Home production project.

    Thanks again,
    HB


    That sounds great. You will cancel the unsent WUs?
    Then we could just keep our machines polling the servers and will get new work and apps as soon as you have them.
    ____________
    Christoph

    zombie67 [MM]
    Joined: 10 Oct 06
    Posts: 110
    Credit: 13,286,248
    RAC: 37,239
    Message 112088 - Posted: 16 May 2012 | 13:38:21 UTC

    My 7970 is producing nothing but validation errors:

    http://albert.phys.uwm.edu/results.php?hostid=2209&offset=0&show_names=0&state=4&appid=

    Any ideas why?
    ____________

    Profile Bikeman (Heinz-Bernd Eggenstein)
    Volunteer moderator
    Project administrator
    Project developer
    Joined: 28 Aug 06
    Posts: 1447
    Credit: 1,758,241
    RAC: 2,113
    Message 112089 - Posted: 16 May 2012 | 20:32:34 UTC - in response to Message 112088.

    Hmmm... GPU temperature is ok??

    I see one result that IS valid, so it's not like strictly all results are junk.

    Out of curiosity I would underclock the card and see what happens.

    Sometimes hardware just fails, e.g. I have one fairly old NVIDIA GT 9800 that tends to produce long runs of invalid results, and then again returns to normal. I have a strong suspicion that for that particular card this correlates strongly with (room) temperature. I consider it semi-broken by now and shut it down during summer. So there can be a grey zone between good and broken.

    Cheers
    HB
    ____________

    Profile Bikeman (Heinz-Bernd Eggenstein)
    Volunteer moderator
    Project administrator
    Project developer
    Joined: 28 Aug 06
    Posts: 1447
    Credit: 1,758,241
    RAC: 2,113
    Message 112090 - Posted: 16 May 2012 | 20:45:16 UTC - in response to Message 112085.
    Last modified: 16 May 2012 | 20:45:51 UTC


    Just out of curiosity, was the Einstein app ever run in double precision and compared to results of single precision calculations? I presume it was based on "does not need", but I'd be interested to know the difference.


    If memory serves me right, the BRP (then called ABP-) app started with code that indeed used double precision for some parts of its computations, and ran only on CPUs. When the idea came up to implement a GPU version, the code was changed to use single precision in those parts (almost all of the code) that were supposed to go on the GPU. At that point the scientists made sure that the ability to find pulsars wasn't compromised by this change. Note that the task of the app is not to determine the characteristics of a pulsar detection to extremely high precision (this is done in post-processing pulsar candidates and using re-observations), but to find candidate signals that stick out of the noise sufficiently clearly to follow up on them. While this statement is simplifying things quite a bit, it gives you an intuitive idea why single precision is ok for this search.
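
To make the "sticks out of the noise" argument a bit more concrete, here is a toy sketch in Python. It is emphatically not the BRP pipeline: the signal, the noise level, the naive DFT, and the helper names `to_float32` and `power_spectrum` are all assumptions made up for this illustration. Single precision is emulated by rounding every intermediate value to a 32-bit float via the standard `struct` module.

```python
import math
import random
import struct

def to_float32(x):
    """Round a Python float (a double) to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def power_spectrum(samples, single_precision=False):
    """Naive DFT power spectrum; optionally round every intermediate to float32."""
    rnd = to_float32 if single_precision else (lambda v: v)
    n = len(samples)
    powers = []
    for k in range(n // 2):
        re = im = 0.0
        for t, x in enumerate(samples):
            angle = -2.0 * math.pi * k * t / n
            re = rnd(re + rnd(rnd(x) * rnd(math.cos(angle))))
            im = rnd(im + rnd(rnd(x) * rnd(math.sin(angle))))
        powers.append(rnd(re * re + im * im))
    return powers

# Toy data: a sine wave in frequency bin 5 buried in uniform noise (values are arbitrary).
random.seed(1)
N, SIGNAL_BIN = 64, 5
samples = [math.sin(2.0 * math.pi * SIGNAL_BIN * t / N) + random.uniform(-0.5, 0.5)
           for t in range(N)]

double_power = power_spectrum(samples)
single_power = power_spectrum(samples, single_precision=True)
peak_double = max(range(len(double_power)), key=double_power.__getitem__)
peak_single = max(range(len(single_power)), key=single_power.__getitem__)
print("peak bin (double):", peak_double, " peak bin (single):", peak_single)
```

In this toy setup both precisions pick out the same frequency bin, and the peak powers differ only in the trailing digits. That mirrors the idea that a candidate search needs the peak to be found, not measured to many decimal places.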

    Cheers
    HB
    ____________

    Infusioned
    Joined: 11 Feb 05
    Posts: 45
    Credit: 149,000
    RAC: 0
    Message 112091 - Posted: 17 May 2012 | 0:41:08 UTC - in response to Message 112090.
    Last modified: 17 May 2012 | 0:45:41 UTC

    Ah I understand. You need a way to cut through all the junk and the volunteers are the garbage filter; which means good enough detection is ok. Understood.


    Also, I checked my Milkyway@Home history to see if I was having validation issues there:

    http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=429181

    and all my work is validated instantly because they are set to a minimum quorum of 1. I don't know if that's due to the fact that I have 44 million credit and that I am being considered a trusted source (if such a thing is even designated by the server), or that's just how the project is. I don't remember it being that way (I thought it used to be quorum of 2).

    So now, that makes me nervous. If my results are off, the project isn't comparing them. And, the project is double precision so that means the results need to be accurate.
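
As a rough illustration of why the quorum size matters here, the following Python sketch compares results from several hosts within a relative tolerance. Validators are project-specific, so this is not the code used by Milkyway@Home or Einstein@Home; the function names `results_agree` and `validate` and the tolerance value are hypothetical.

```python
import math

def results_agree(a, b, rel_tol=1e-5):
    """Two result vectors agree if they have equal length and all entries are close."""
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=rel_tol) for x, y in zip(a, b)
    )

def validate(results, min_quorum=2, rel_tol=1e-5):
    """Return the indices of the first group of mutually consistent results
    that reaches min_quorum, or an empty list if no such group exists."""
    for candidate in results:
        group = [j for j, other in enumerate(results)
                 if results_agree(candidate, other, rel_tol)]
        if len(group) >= min_quorum:
            return group
    return []

# Three hosts report on the same work unit; the third one is off.
reported = [[1.0, 2.0], [1.0000001, 2.0], [1.5, 2.0]]
print(validate(reported, min_quorum=2))      # hosts 0 and 1 form a quorum
print(validate([[1.5, 2.0]], min_quorum=1))  # a lone result always "validates"
```

With a quorum of 1, a single result is accepted without any comparison, which is exactly the situation described above; a project running that way has to rely on other safeguards.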
    ____________

    Profile Bikeman (Heinz-Bernd Eggenstein)
    Volunteer moderator
    Project administrator
    Project developer
    Joined: 28 Aug 06
    Posts: 1447
    Credit: 1,758,241
    RAC: 2,113
    Message 112092 - Posted: 17 May 2012 | 5:58:04 UTC - in response to Message 112091.
    Last modified: 17 May 2012 | 5:58:57 UTC

    I don't want to get too far off topic here, but it happens there is a paper specifically on the validation strategies for the type of simulation that is done at Milkyway@Home, written by the MW scientists: http://www.cs.rpi.edu/~szymansk/papers/dais10.pdf. Just to cure your nervousness :-)

    Cheers
    HB
    ____________

    zombie67 [MM]
    Joined: 10 Oct 06
    Posts: 110
    Credit: 13,286,248
    RAC: 37,239
    Message 112093 - Posted: 17 May 2012 | 15:33:32 UTC - in response to Message 112089.

    Hmmm... GPU temperature is ok??


    It is OC'd slightly. I will move back to stock and see if that makes a difference.
    ____________

    Infusioned
    Joined: 11 Feb 05
    Posts: 45
    Credit: 149,000
    RAC: 0
    Message 112094 - Posted: 17 May 2012 | 15:54:50 UTC - in response to Message 112092.
    Last modified: 17 May 2012 | 16:21:38 UTC

    I don't want to get too far off topic here, but it happens there is a paper specifically on the validation strategies for the type of simulation that is done at Milkyway@Home, written by the MW scientists: http://www.cs.rpi.edu/~szymansk/papers/dais10.pdf. Just to cure your nervousness :-)

    Cheers
    HB


    Excellent. I will read it in chunks to break up the day as I need breaks from my work. Thanks.


    Edit:
    Ok, I lied; I read it all just now. So it seems that bad results aren't quite so bad, but still negatively affect things. And, ironically enough, they do have trusted/untrusted host status for users.

    I will try to dig more on this because I see I have a lot of inconclusive results for Einstein now. For what it is worth, I know there was an issue with NVIDIA cards silently overflowing and generating bad numbers on the Seti Beta app. However, that still doesn't excuse bad numbers from AMD 6xxx cards if that's the issue.
    ____________

    zombie67 [MM]
    Joined: 10 Oct 06
    Posts: 110
    Credit: 13,286,248
    RAC: 37,239
    Message 112095 - Posted: 17 May 2012 | 22:55:15 UTC
    Last modified: 17 May 2012 | 23:02:10 UTC

    Looks like reducing the OC solved it. I also upgraded from 12.3 to 12.4. So I can't be 100% sure. But whatever the case, it's working again.

    Also, FWIW, I am running 3 at a time (.33), and still only ~45% GPU load. And this is with cores reserved, so the CPU has only ~90% load. Is it possible to get to >90% GPU load? Is there an upper limit on the number of simultaneous tasks?
    ____________

    Profile Bikeman (Heinz-Bernd Eggenstein)
    Volunteer moderator
    Project administrator
    Project developer
    Joined: 28 Aug 06
    Posts: 1447
    Credit: 1,758,241
    RAC: 2,113
    Message 112096 - Posted: 18 May 2012 | 17:13:23 UTC - in response to Message 112095.

    Looks like reducing the OC solved it. I also upgraded from 12.3 to 12.4. So I can't be 100% sure. But whatever the case, it's working again.

    Also, FWIW, I am running 3 at a time (.33), and still only ~45% GPU load. And this is with cores reserved, so the CPU has only ~90% load. Is it possible to get to >90% GPU load? Is there an upper limit on the number of simultaneous tasks?


    The upper limit is reached when the Video RAM is exhausted. So per GB of VRAM you should be able to execute at least 2, possibly 3 instances. It's hard to tell where the "sweet spot" is to maximize the overall output, so some experimentation with the number of "reserved" CPU cores (cores not allocated to CPU apps) and # of GPU jobs in parallel is the best way to find out.
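
The rule of thumb above can be turned into a quick back-of-the-envelope calculation. This is only a sketch in Python: the function names are hypothetical, the "2 to 3 per GB" range is taken from the post above and applies to this app only, and the result is a starting point for experimentation, not a guarantee.

```python
def estimate_parallel_tasks(vram_gb, tasks_per_gb=(2, 3)):
    """Return a (low, high) estimate of simultaneous tasks that fit in VRAM."""
    low = int(vram_gb * tasks_per_gb[0])
    high = int(vram_gb * tasks_per_gb[1])
    return low, high

def gpu_fraction_per_task(tasks):
    """GPU share per task when `tasks` run at once, e.g. 3 tasks -> about 0.33."""
    return 1.0 / tasks

low, high = estimate_parallel_tasks(3)  # e.g. a card with 3 GB of video RAM
print(f"3 GB VRAM: roughly {low} to {high} tasks")
print(f"GPU fraction for 3 tasks: {gpu_fraction_per_task(3):.2f}")
```

The second function reproduces the ".33" figure quoted above for three tasks at a time; trying the next higher task count while watching GPU load and run times is still the only way to find the sweet spot.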

    CU
    HB


    ____________

    Infusioned
    Joined: 11 Feb 05
    Posts: 45
    Credit: 149,000
    RAC: 0
    Message 112098 - Posted: 24 May 2012 | 1:22:28 UTC - in response to Message 112096.

    A little update:

    I PM'd Raistmer on the Seti Beta boards and asked him to read the last bit of this thread. He said he did not notice a higher failure rate with the 69xx series cards during his development of AMD apps.

    Also, poking through my MW wu's, I validate just fine against:

    CPU:
    171830352
    171730343
    171601831
    171601829
    171650656
    171850223
    171838869

    Anonymous GPU:
    171940837

    Other 69xx: (making sure my card isn't defective)
    171917181
    171954514

    NVIDIA OpenCL:
    171784516

    HD 58xx GPU:
    171907299


    So, at this point, I am inclined to believe that my card in particular isn't defective, and that the 69xx series cards are producing valid results.

    Should I go back to doing Albert or Einstein wu's?
    ____________

    Profile Bikeman (Heinz-Bernd Eggenstein)
    Volunteer moderator
    Project administrator
    Project developer
    Joined: 28 Aug 06
    Posts: 1447
    Credit: 1,758,241
    RAC: 2,113
    Message 112100 - Posted: 26 May 2012 | 18:22:40 UTC - in response to Message 112098.

    Hi!


    The issue with the HD 6900 series is this: There is a specific function (used by the FFT lib we are using for the OpenCL apps) that is computed with less accuracy on HD 6900 cards than on others. This is confirmed by AMD. It is not even a defect or bug, because the OpenCL standard allows this behavior.

    To deal with it, we made an app that uses a more accurate, but somewhat slower variant of this function. On Einstein@Home, this special app version is now delivered to HD6900 cards running the OpenCL app.

    Bottom line: it is safe (validation wise) to resume computations on Einstein@Home with HD6900 cards.
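
To illustrate the accuracy/speed trade-off in general terms, here is a small Python sketch. The thread does not name the function in question, so the sketch uses a stand-in: a sine evaluated with a short polynomial (`fast_sin`, a made-up name) versus the library sine. It is an assumption-laden toy, not the FFT library's code and not the fix that was actually deployed.

```python
import math

def accurate_sin(x):
    """Reference variant: the math library's sine."""
    return math.sin(x)

def fast_sin(x):
    """Hypothetical low-accuracy stand-in: 5th-order Taylor polynomial
    x - x**3/6 + x**5/120, cheaper but less accurate away from zero."""
    x2 = x * x
    return x * (1.0 - x2 / 6.0 * (1.0 - x2 / 20.0))

def max_abs_error(approx, points=1000, upper=math.pi / 2):
    """Largest deviation from math.sin on an evenly spaced grid over [0, upper]."""
    return max(
        abs(approx(i * upper / points) - math.sin(i * upper / points))
        for i in range(points + 1)
    )

print(f"fast variant, max error:     {max_abs_error(fast_sin):.2e}")
print(f"accurate variant, max error: {max_abs_error(accurate_sin):.2e}")
```

Both variants are "correct" in the sense that each stays within its own documented accuracy, yet results computed with one may no longer agree closely enough with results computed with the other, which is the kind of mismatch a cross-validation step picks up.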

    Cheers
    HB

    ____________

    Infusioned
    Joined: 11 Feb 05
    Posts: 45
    Credit: 149,000
    RAC: 0
    Message 112102 - Posted: 28 May 2012 | 1:04:06 UTC - in response to Message 112100.

    I'm glad you got to the bottom of things. I guess that means that the next card I add will be a 79xx card instead of another 69xx. I can't imagine why AMD thought worse accuracy was acceptable considering their whole push for compute oriented video cards and APUs. Then again, maybe that's why things were changed with the 7xxx cards (assuming you had no errors with those)?

    Hats off for all the hard work in getting this app developed.
    ____________

    Profile Bikeman (Heinz-Bernd Eggenstein)
    Volunteer moderator
    Project administrator
    Project developer
    Joined: 28 Aug 06
    Posts: 1447
    Credit: 1,758,241
    RAC: 2,113
    Message 112103 - Posted: 1 Jun 2012 | 0:06:22 UTC - in response to Message 112102.
    Last modified: 1 Jun 2012 | 0:07:06 UTC

    I'm glad you got to the bottom of things. I guess that means that the next card I add will be a 79xx card instead of another 69xx. I can't imagine why AMD thought worse accuracy was acceptable considering their whole push for compute oriented video cards and APUs. Then again, maybe that's why things were changed with the 7xxx cards (assuming you had no errors with those)?


    It's actually not something you can blame AMD for (and they were quite helpful in diagnosing this issue). The function in question is documented to have implementation-dependent accuracy. It was probably not a good idea for the author of the 3rd party FFT lib to make use of this function, but that's just my personal opinion. We will get rid of this part of the code to make sure this doesn't hit us again with future cards.

    Cheers
    HB
    ____________

    robertmiles
    Joined: 16 Nov 11
    Posts: 17
    Credit: 594,975
    RAC: 2,664
    Message 112104 - Posted: 2 Jun 2012 | 3:07:53 UTC - in response to Message 112103.

    When you're able to try it on both HD 69xx cards and similar HD 79xx cards, could you give us the relative speeds of the two?

    Some of us would like that information before deciding which card to buy next.

    zombie67 [MM]
    Joined: 10 Oct 06
    Posts: 110
    Credit: 13,286,248
    RAC: 37,239
    Message 112105 - Posted: 8 Jun 2012 | 13:38:41 UTC
    Last modified: 8 Jun 2012 | 13:40:36 UTC

    * Known issue: no OpenCL support for Mac OS X for the time being (we're still looking into a potential Apple bug)


    I could swear that I saw a message yesterday, talking about how this was fixed (hopefully). But I can't find it now, and I cannot get any tasks for my mac. Was I hallucinating?

    Edit: It was over at Collatz. D'oh!
    ____________

    Profile Bikeman (Heinz-Bernd Eggenstein)
    Volunteer moderator
    Project administrator
    Project developer
    Joined: 28 Aug 06
    Posts: 1447
    Credit: 1,758,241
    RAC: 2,113
    Message 112106 - Posted: 11 Jun 2012 | 14:08:05 UTC - in response to Message 112105.

    * Known issue: no OpenCL support for Mac OS X for the time being (we're still looking into a potential Apple bug)


    I could swear that I saw a message yesterday, talking about how this was fixed (hopefully). But I can't find it now, and I cannot get any tasks for my mac. Was I hallucinating?

    Edit: It was over at Collatz. D'oh!


    Maybe you had sort of a vision, because I've just released, here on Albert, a version that indeed might work on Macs for AMD/OpenCL under OSX (Lion). :-)

    Cheers
    HBE
    ____________


    This material is based upon work supported by the National Science Foundation (NSF) under Grant PHY-0555655 and by the Max Planck Gesellschaft (MPG). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the investigators and do not necessarily reflect the views of the NSF or the MPG.

    Copyright © 2013 Bruce Allen for the LIGO Scientific Collaboration