[New release] BRP app v1.28 feedback thread

Profile Bikeman (Heinz-Bernd Eggenstein)
Volunteer moderator
Project administrator
Project developer
Joined: 28 Aug 06
Posts: 1473
Credit: 1,845,340
RAC: 1,725
Message 112164 - Posted: 8 Aug 2012 | 17:37:23 UTC

We just released the first app versions of the new BRP4 v1.28 release. Over the next few days we will release this app for all supported platforms. Today we released

- OpenCL (ATI/AMD), 32 bit, for Linux,
- OpenCL (ATI/AMD), 32 bit, Windows and
- OpenCL (ATI/AMD) OSX (Lion and later).

All versions require BOINC 7.0.25 or later, but we encourage volunteers to actually use the latest recommended BOINC versions, currently 7.0.28 or 7.0.31.

Versions for CPU and NVIDIA CUDA GPU will follow soon, plus some app versions that are native 64 bit apps.

New in this app version:
- Various performance improvements
- cross-validation improvements for HD 69xx GPUs

Cheers
HBE
____________

zombie67 [MM]
Joined: 10 Oct 06
Posts: 110
Credit: 13,385,878
RAC: 3,967
Message 112166 - Posted: 9 Aug 2012 | 2:36:50 UTC

Hmm. My ATI OSX/ML machine can't get work. 7.0.31.
____________

Profile Bikeman (Heinz-Bernd Eggenstein)
Volunteer moderator
Project administrator
Project developer
Joined: 28 Aug 06
Posts: 1473
Credit: 1,845,340
RAC: 1,725
Message 112168 - Posted: 9 Aug 2012 | 8:25:24 UTC - in response to Message 112166.

Hmm. My ATI OSX/ML machine can't get work. 7.0.31.


We had a temporary problem on Albert, BRP4 work should now be available again.

Cheers
HBE
____________

zombie67 [MM]
Joined: 10 Oct 06
Posts: 110
Credit: 13,385,878
RAC: 3,967
Message 112172 - Posted: 9 Aug 2012 | 15:32:03 UTC

Got some. Thanks!
____________

TRuEQ & TuVaLu
Joined: 11 Sep 06
Posts: 75
Credit: 153,850
RAC: 1,440
Message 112173 - Posted: 12 Aug 2012 | 17:23:34 UTC

I did 1 task.

http://albert.phys.uwm.edu/results.php?hostid=3755

Was faster than the older one.

Good work.

Stephan Goll
Joined: 13 Dec 05
Posts: 17
Credit: 336,966
RAC: 131
Message 112174 - Posted: 12 Aug 2012 | 22:00:00 UTC

http://albert.phys.uwm.edu/workunit.php?wuid=98764
Looking good so far. Great work.
Stephan


____________

Petrion
Joined: 30 Apr 08
Posts: 1
Credit: 96,090
RAC: 0
Message 112175 - Posted: 12 Aug 2012 | 22:21:16 UTC
Last modified: 12 Aug 2012 | 22:24:26 UTC

Crunched roughly 50 so far with no problems, using an HD 6850 with BOINC v.7.28 and an OC'd i5 2500K (4.5GHz).

Run time dropped a lot, from an average of 3,800 secs to 2,100 secs doing 1 WU at a time. Now if you could reduce the CPU test WUs from 16,000+ secs I wouldn't mind. :)

Profile Gary Roberts
Volunteer moderator
Joined: 9 Feb 05
Posts: 1685
Credit: 85,000
RAC: 0
Message 112176 - Posted: 13 Aug 2012 | 0:27:09 UTC - in response to Message 112164.


- OpenCL (ATI/AMD) OSX (Lion and later).

I have two pretty much identical hosts - 1859 and 1868 that now have this app. They are both running the latest BOINC (7.0.31) and both have AMD 5750 1GB GPUs. The only hardware difference is that the first one has 4GB RAM whilst the second has 8GB.

The one with smaller RAM has run times around 7Ksecs whilst the other is taking 18Ksecs????

Both machines run EAH tasks on all 4 CPUs and both are doing S6LV1 tasks in around 17-18Ksecs. All figures quoted were for tasks crunched over the weekend when there was no user sitting at the keyboard. These machines are used during office hours on weekdays. When idle, both machines power down the screen quite quickly (maybe 3 mins) so the only thing I can now think of is whether or not the screensaver process is still running when the screen turns off. I'll have to check that next time I have access to those machines.

Anyone got any other suggestions? At some point I will release a CPU core from crunching to see what difference that makes and later I may attempt to run two tasks simultaneously - maybe not when you think of the extra heat that might be produced. The aluminium cases are running pretty hot already :-).

____________
Cheers,
Gary.

Profile Bikeman (Heinz-Bernd Eggenstein)
Volunteer moderator
Project administrator
Project developer
Joined: 28 Aug 06
Posts: 1473
Credit: 1,845,340
RAC: 1,725
Message 112177 - Posted: 13 Aug 2012 | 10:16:51 UTC

Hi Gary!

Freeing a CPU core is probably a good thing to try.

Another idea: What I'm always a bit suspicious about (BOINC-performance-wise) is OSX's power saving feature that automatically switches between a CPU's built-in GPU (e.g. the Intel stuff in Sandy/Ivy Bridge CPUs) and the dedicated graphics card. Could it be that this feature is set differently on those two hosts?

Cheers
HB

____________

Alex
Joined: 1 Mar 05
Posts: 64
Credit: 322,756
RAC: 315
Message 112178 - Posted: 13 Aug 2012 | 21:37:24 UTC

Congratulations!

My pc did a BRP 1.28 in ~1980 sec. HD6900 Series GPU, i7 with free cpu, 7.0.31, CCC 12.6
On HD5850 a wu takes ~3450sec.
AMD-apps are grown up now.
____________

Alex
Joined: 1 Mar 05
Posts: 64
Credit: 322,756
RAC: 315
Message 112182 - Posted: 14 Aug 2012 | 11:09:38 UTC
Last modified: 14 Aug 2012 | 11:13:52 UTC

Fantastic!



The AMD is an HD6950 running a single WU @ 82% GPU load, the NVIDIA is a GTX680, unknown settings.

Edit: oops, the picture is missing...


wu 997334
task 279100 runtime 1,981.83 Binary Radio Pulsar Search v1.28 (opencl-ati)
task 279099 runtime 11,162.70 Binary Radio Pulsar Search v1.25 (BRP3cuda32)
____________

Profile Bikeman (Heinz-Bernd Eggenstein)
Volunteer moderator
Project administrator
Project developer
Joined: 28 Aug 06
Posts: 1473
Credit: 1,845,340
RAC: 1,725
Message 112183 - Posted: 14 Aug 2012 | 12:34:08 UTC - in response to Message 112178.
Last modified: 14 Aug 2012 | 12:43:56 UTC

Congratulations!

My pc did a BRP 1.28 in ~1980 sec. HD6900 Series GPU, i7 with free cpu, 7.0.31, CCC 12.6
On HD5850 a wu takes ~3450sec.
AMD-apps are grown up now.


Hi,

actually the app should be even faster. Did you free a CPU core on the machine with the HD 5850 as well? I have a HD 5850 that is doing a single unit in ca. 1800 sec (!!), although that is on a comparatively fast dual core 3 GHz Core2 (no hyperthreading!) with only one other task running on the CPU.

Cheers
HB

EDIT: Ah I see....2 cards in one box I guess. That is distorting the numbers a bit of course, but I'm glad to see that you see a substantial performance increase.

HB
____________

Alex
Joined: 1 Mar 05
Posts: 64
Credit: 322,756
RAC: 315
Message 112184 - Posted: 14 Aug 2012 | 13:21:17 UTC - in response to Message 112183.



Hi,

actually the app should be even faster. Did you free a CPU core on the machine with the HD 5850 as well? I have a HD 5850 that is doing a single unit in ca. 1800 sec (!!), although that is on a comparatively fast dual core 3 GHz Core2 (no hyperthreading!) with only one other task running on the CPU.

Cheers
HB





OK, it's the same machine, which has both GPUs installed. The CPU is an i7; it's a machine for normal daily business (office and control system design), which sometimes runs 100% for BOINC, actually exclusively for Albert (I'm working here right now).
CPU load for all 4 cores/8 threads is well below 30%.
The 'smaller' GPU is in an x8 PCIe slot, maybe this also makes a difference.
And since two generations of GPUs are installed, I can never be sure that I have the optimal drivers installed.
Anyway, I'm happy that the AMD GPUs are much more usable now for DC jobs than in earlier days, when just Milkyway had a pretty fast AMD app.

Alexander

____________

Steve Hawker*
Joined: 22 Jun 12
Posts: 1
Credit: 13,284
RAC: 16
Message 112185 - Posted: 14 Aug 2012 | 18:14:29 UTC - in response to Message 112184.

Seeing as there are few projects that will crunch WUs on my MacBook's GPU, I'm always interested to see how it pans out on a new release.

1. The time estimates are way off. The initial estimate was >50 hours; the task crunched in about 7 hours. IIRC this is down from 12 hours.
2. The CPU utilization was 0.564 of a core. I think this is high (Collatz grabs just 0.02 of a core). Even so, I think this is a reduction.
3. I still only got 500 credits. For 7 hours of crunching, Collatz gives me ~6,500.
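
To put point 3 in per-hour terms, here is a minimal Python sketch. It only uses the figures quoted in this post (500 credits and ~6,500 credits, each for about 7 hours of crunching); the function name is illustrative and not part of any BOINC tool.

```python
def credits_per_hour(credits: float, hours: float) -> float:
    """Average credit granted per hour of crunching."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return credits / hours


# Approximate figures as reported in the post, not measured independently.
albert_rate = credits_per_hour(500, 7)
collatz_rate = credits_per_hour(6500, 7)

print(f"Albert BRP4: {albert_rate:.1f} credits/h")
print(f"Collatz:     {collatz_rate:.1f} credits/h")
print(f"Ratio:       {collatz_rate / albert_rate:.1f}x")
```

On these numbers the gap is roughly a factor of 13 per hour of GPU time.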

I'm not whining about the credits - I read, understood and accepted the notice. But this is a feedback thread after all.

I enjoy the credit game as much as the next cruncher but I don't always pick the highest paying project. But I'd like a decent whack even so. What I'd like to know is, if and when these apps move to production, what will be the going rate? I have a very slight preference for astronomy projects over mathematical projects but 500 credits won't get my GPU. I know, I'm just one and there's plenty more dedicated fish in the sea. Nonetheless, this is my feedback.

I'll keep coming back and crunching a WU each time there's a new app. After all, you went to the trouble of making an OSX app, the least I can do is test it.

Thanks!

Alex
Joined: 1 Mar 05
Posts: 64
Credit: 322,756
RAC: 315
Message 112186 - Posted: 14 Aug 2012 | 19:22:14 UTC

I could not resist ..

I 'borrowed' my old testsystem and tried some wu's.
AMD A8 3870 APU: wu 100169 runtime 6,489.60
HD5830 runtime ~ 2,915.96
____________

Profile Bikeman (Heinz-Bernd Eggenstein)
Volunteer moderator
Project administrator
Project developer
Joined: 28 Aug 06
Posts: 1473
Credit: 1,845,340
RAC: 1,725
Message 112187 - Posted: 15 Aug 2012 | 7:45:29 UTC

Hi!

Thanks for the feedback!

As for credits, our yardstick used to be SETI@Home, so we tuned the credits so that the CPU apps would grant roughly as many credits as the SETI@Home CPU apps on the same hardware in the same time. The GPU apps would grant as many credits per UNIT as the CPU apps (that is: all results are treated equally, whether done on CPU or GPU). With SETI@Home now using the "Credit New" BOINC crediting system, things might have diverged a bit. Still, my feeling is that E@H will probably try in the future to stay in line with the bigger (in terms of number of participants) "mainstream" projects and will not try to "out-credit" all the other projects :-). Just my personal idea, tho.

Cheers
HB

____________

Profile Bikeman (Heinz-Bernd Eggenstein)
Volunteer moderator
Project administrator
Project developer
Joined: 28 Aug 06
Posts: 1473
Credit: 1,845,340
RAC: 1,725
Message 112191 - Posted: 21 Aug 2012 | 14:14:10 UTC

Thanks all for testing, the OpenCL app versions are now released on Einstein@Home.

Next we will release the CUDA apps based on the same codebase to Albert@Home. The performance increase will not be as high as for the OpenCL versions, which still had to do a bit of catching up with the CUDA versions, so we should now have CUDA and OpenCL apps that are on par, performance-wise, so to speak.


CU

HBE
____________

Vlatko
Joined: 9 Aug 12
Posts: 4
Credit: 149,500
RAC: 0
Message 112192 - Posted: 23 Aug 2012 | 19:10:18 UTC

Is the CUDA version released?
I am seeing that the BRP4 v1.28 version is running in my BOINC Manager.

Profile Bikeman (Heinz-Bernd Eggenstein)
Volunteer moderator
Project administrator
Project developer
Joined: 28 Aug 06
Posts: 1473
Credit: 1,845,340
RAC: 1,725
Message 112193 - Posted: 24 Aug 2012 | 9:26:39 UTC - in response to Message 112192.

Is the CUDA version released?
I am seeing that the BRP4 v1.28 version is running in my BOINC Manager.


Indeed, the Windows CUDA app is out now, I'll launch the Linux and Mac OSX versions today and then write an announcement.

From what I've seen so far, the CUDA Windows app should reduce the runtime of a BRP4 task in a way that scales more with the CPU speed than with GPU parameters, e.g. on modern CPUs (and if running just one unit at a time), the overall runtime should be reduced by roughly 400-700 seconds. For slower cards that take 2 h per unit, this doesn't amount to much, but for faster cards that take less than half an hour, this should be a substantial speed-up. Any feedback on speed-up in different configurations (2 tasks at a time, multiple cards,...) is welcome.
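
As a rough illustration of why a fixed saving matters more on fast cards, the following Python sketch applies the 400-700 second range quoted above to a 2 h task and a 30 min task. The baselines are just the two examples from the paragraph; the function name is made up for this sketch.

```python
def relative_saving(baseline_s: float, saved_s: float) -> float:
    """Fraction of the baseline runtime removed by a fixed saving in seconds."""
    if not 0 <= saved_s < baseline_s:
        raise ValueError("saving must be non-negative and below the baseline")
    return saved_s / baseline_s


# Baselines taken from the paragraph above: 2 h and half an hour per unit.
examples = (("slow card, 2 h per unit", 7200), ("fast card, 30 min per unit", 1800))

for label, baseline_s in examples:
    low = relative_saving(baseline_s, 400)
    high = relative_saving(baseline_s, 700)
    print(f"{label}: runtime shorter by {low:.1%} to {high:.1%}")
```

Under these assumptions the same saving is under 10% of the runtime on the slow card but more than 20% on the fast one.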

Cheers
HB

____________

Jeroen
Joined: 25 Nov 05
Posts: 12
Credit: 602,406
RAC: 457
Message 112194 - Posted: 24 Aug 2012 | 14:25:28 UTC - in response to Message 112193.

I am looking forward to testing out BRP4 v1.28 for Linux. Thanks for the updates!
____________

Vlatko
Joined: 9 Aug 12
Posts: 4
Credit: 149,500
RAC: 0
Message 112195 - Posted: 24 Aug 2012 | 19:39:43 UTC

The speed is amazing. Going from 3800s to 2200s for a single task, and 3500s for 2 tasks. That is on PCIe x16 v1.1, not on v3.0.
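
A small Python sketch of what those figures mean for throughput. It assumes that "3500s for 2 tasks" means two tasks running side by side on the GPU and both finishing after about 3500 s; the post does not spell this out, and the helper name is illustrative.

```python
def effective_time_per_task(batch_runtime_s: float, tasks_in_parallel: int) -> float:
    """Wall-clock seconds per finished task when several tasks share one GPU."""
    if tasks_in_parallel < 1:
        raise ValueError("need at least one task")
    return batch_runtime_s / tasks_in_parallel


single_s = effective_time_per_task(2200, 1)  # one task at a time
dual_s = effective_time_per_task(3500, 2)    # assumed: two concurrent tasks

# Throughput is the inverse of time per task, so the gain is single/dual - 1.
gain = single_s / dual_s - 1.0

print(f"1 task at a time:  {single_s:.0f} s per task")
print(f"2 tasks at a time: {dual_s:.0f} s per task")
print(f"Throughput gain from running two at once: {gain:.0%}")
```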

Great work

Holmis
Joined: 4 Jan 05
Posts: 70
Credit: 168,662
RAC: 1,516
Message 112196 - Posted: 25 Aug 2012 | 16:02:27 UTC

Yesterday I upgraded my GPU to a factory overclocked GTX 660Ti.

I've run about 15 units, two at a time, and observed a speedup of about 700s compared to the BRP4 v1.25 on Einstein. At the same time the CPU time per result has decreased by about 400s; that's more than half on my system.

v1.28 on Albert, x2, run time ~2170s and CPU time ~360s.
v1.25 on Einstein, x2, run time ~2900 and CPU time ~770s.

GPU-load also increased from ~80% to 95+%.
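
The two version lines above can be turned into relative reductions with a few lines of Python. This is only a sketch over the approximate figures in this post; the `Timing` class and `reduction` helper are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Timing:
    run_s: float  # wall-clock run time per result, two tasks at a time
    cpu_s: float  # CPU time per result


def reduction(old: float, new: float) -> float:
    """Fractional reduction going from old to new."""
    return (old - new) / old


# Approximate values as quoted in the post.
v125 = Timing(run_s=2900, cpu_s=770)  # v1.25 on Einstein, x2
v128 = Timing(run_s=2170, cpu_s=360)  # v1.28 on Albert, x2

run_cut = reduction(v125.run_s, v128.run_s)
cpu_cut = reduction(v125.cpu_s, v128.cpu_s)

print(f"Run time: {v125.run_s - v128.run_s:.0f} s less ({run_cut:.0%})")
print(f"CPU time: {v125.cpu_s - v128.cpu_s:.0f} s less ({cpu_cut:.0%})")
```

That works out to about a quarter off the run time and a bit more than half off the CPU time, in line with the description above.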

Great work!
____________

Vlatko
Joined: 9 Aug 12
Posts: 4
Credit: 149,500
RAC: 0
Message 112197 - Posted: 26 Aug 2012 | 17:45:21 UTC - in response to Message 112196.

How about a single work unit? I want to see if a 660Ti is faster than a GTX 580.

Holmis
Joined: 4 Jan 05
Posts: 70
Credit: 168,662
RAC: 1,516
Message 112198 - Posted: 26 Aug 2012 | 18:06:12 UTC - in response to Message 112197.

How about a single work unit? I want to see if a 660Ti is faster than a GTX 580.

I'd like to test that but as of right now I can't get any more work for BRP4, probably because the server status page says 0 tasks to send...
____________

Alex
Joined: 1 Mar 05
Posts: 64
Credit: 322,756
RAC: 315
Message 112199 - Posted: 26 Aug 2012 | 18:12:35 UTC

BRP3Cuda32
GTX550Ti 1793 / 303 single wu
i3 win7/64 7.0.31 x16 slot
____________

Jeroen
Joined: 25 Nov 05
Posts: 12
Credit: 602,406
RAC: 457
Message 112200 - Posted: 26 Aug 2012 | 23:51:06 UTC

I ran the new CUDA 1.28 app via one of my Windows systems today. I have not been able to get much work today but the two tasks that ran via my GTX 580, completed at 834 seconds each. This is with one task running at a time. GPU load was at approximately 90-91% while running one task.

If memory serves me right, the previous application ran at around 1360 seconds per task with the 1.25 app via this system. This is a very decent improvement in performance. Thanks for the work put into optimizing the BRP4 applications.

Profile Bikeman (Heinz-Bernd Eggenstein)
Volunteer moderator
Project administrator
Project developer
Joined: 28 Aug 06
Posts: 1473
Credit: 1,845,340
RAC: 1,725
Message 112201 - Posted: 27 Aug 2012 | 8:50:08 UTC

Thanks for the feedback. That's actually a bit more of a speedup than I had expected based on some tests on slower hardware. Definitely in relative terms, the speedup is more pronounced on faster cards.

I will now install the Linux CUDA app on Albert, stand by for more tests. I'm eager to see those GTX 680 .... ;-)

Cheers
HB


____________

Holmis
Joined: 4 Jan 05
Posts: 70
Credit: 168,662
RAC: 1,516
Message 112202 - Posted: 27 Aug 2012 | 11:57:54 UTC - in response to Message 112197.

How about a single work unit? I want to see if a 660Ti is faster than a GTX 580.

Got hold of 2 BRP4 tasks and ran them one at a time on my overclocked GTX660Ti (Core@1201.9 MHz).
Run time: 1183.69 and 1175.39 seconds for an average of 1179.54 s.
GPU load ~84%.
This on Win7 x64, PCI-E 3.0x16.
____________

Jeroen
Joined: 25 Nov 05
Posts: 12
Credit: 602,406
RAC: 457
Message 112204 - Posted: 28 Aug 2012 | 2:40:06 UTC - in response to Message 112201.
Last modified: 28 Aug 2012 | 2:41:07 UTC

Here are some preliminary numbers for the GTX 680.

One task per GPU

System #1 - Single GPU

x16 3.0 - 721 seconds

System #2 - Multi GPU

x16 3.0 - 785 seconds
x8 3.0 - 901 seconds

Overall, the performance looks great so far. I want to do some more testing with multiple tasks running at once, different PCI-E configurations, and with the CPU dedicated for BRP4 GPU only. The above tests were done with ~50% CPU load from running other CPU tasks at the same time.
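
For a quick comparison of the three preliminary timings above, here is a short Python sketch. It only restates the posted numbers as relative differences; the dictionary labels are illustrative, and with ~50% CPU load from other tasks the differences should not be read as a clean PCI-E measurement.

```python
# Preliminary GTX 680 run times from the post, one task per GPU (seconds).
runtimes_s = {
    "single GPU, x16 3.0": 721,
    "multi GPU, x16 3.0": 785,
    "multi GPU, x8 3.0": 901,
}

baseline_s = runtimes_s["single GPU, x16 3.0"]

# Extra run time of each configuration relative to the single-GPU system.
extra = {name: t / baseline_s - 1.0 for name, t in runtimes_s.items()}

for name, frac in extra.items():
    print(f"{name}: {runtimes_s[name]} s ({frac:+.1%} vs. single GPU x16)")

# Within the multi-GPU system: x8 slot compared with the x16 slot.
x8_vs_x16 = runtimes_s["multi GPU, x8 3.0"] / runtimes_s["multi GPU, x16 3.0"] - 1.0
print(f"x8 vs. x16 in the same system: {x8_vs_x16:+.1%}")
```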

archae86
Joined: 6 Dec 05
Posts: 412
Credit: 48,185
RAC: 171
Message 112205 - Posted: 28 Aug 2012 | 3:27:55 UTC

I have a possible finding--not even a little bit sure--but am posting in case others might spot such a thing.

I've got two different hosts with the same GPU, a GTX 460. Neither has had downclocking problems for some weeks, but I found both downclocked severely today, with the problem persisting through system reboot.

It might just barely be possible that running the current Albert BRP1.28 CUDA app, or on an even less likely note the Albert 0.29 Gamma Ray Pulsar application--or switching back and forth from those to the current Einstein applications was involved.

More likely something else in my system's history was the problem, but I thought I'd post the suspicion in case someone else sees something.

I'm not even sure what the true downclock frequency was, as different sources reported different numbers, but it was 405 MHz or less. The reduction in power consumption and GPU temperature, combined with exceedingly high reported GPU utilization but very slow progress on the WU, was persuasive that downclocking was at hand.
____________

Vlatko
Joined: 9 Aug 12
Posts: 4
Credit: 149,500
RAC: 0
Message 112206 - Posted: 28 Aug 2012 | 12:39:00 UTC - in response to Message 112205.

I sometimes have the same issue. It only happens when I overclock the GPU +130 MHz from baseline. After several hours the screen goes blank and GPU-Z shows a core speed of 400 MHz; then I reboot and everything goes back to normal.
I think it is some fail-safe method that Nvidia uses. Also, I noticed that when BOINC is not running the core speed drops to 50 MHz and goes to 800 MHz instantaneously if demanding GPU work is needed.

zombie67 [MM]
Joined: 10 Oct 06
Posts: 110
Credit: 13,385,878
RAC: 3,967
Message 112207 - Posted: 29 Aug 2012 | 0:55:26 UTC - in response to Message 112177.
Last modified: 29 Aug 2012 | 1:01:52 UTC

Another idea: What I'm always a bit suspicious about (BOINC-performance-wise) is OSX's power saving feature that automatically switches between a CPU's built-in GPU (e.g. the Intel stuff in Sandy/Ivy Bridge CPUs) and the dedicated graphics card. Could it be that this feature is set differently on those two hosts?


Just to be clear, that feature can be turned off, so that the higher performance GPU is always used.

Also, be sure to be clear on which GPUs we are talking about. For example, there is the HD 5850 and there is the Mobility HD 5850. While the numbers are the same, they are completely different things. The HD 5850 uses the Cypress PRO core, and is 2088 GFLOPS. The Mobility HD 5850 is the Juniper chip, and is 800-1000 GFLOPS, depending on the clock. The mobility versions are used in laptops, including the MacBooks.
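
For reference, the GFLOPS figures above can be reproduced with the usual peak single-precision estimate of stream processors x 2 FLOPs per cycle x clock. The sketch below is Python; the shader counts and clocks (1440 at 725 MHz for the desktop card, 800 at 500-625 MHz for the mobility part) are assumptions taken from commonly published specifications, not from this thread.

```python
def peak_gflops(stream_processors: int, clock_mhz: float) -> float:
    """Theoretical single-precision peak, counting a multiply-add as 2 FLOPs."""
    return stream_processors * 2 * clock_mhz / 1000.0


# Assumed specs (published figures, not stated in the post above).
desktop_5850 = peak_gflops(1440, 725)
mobility_5850_low = peak_gflops(800, 500)
mobility_5850_high = peak_gflops(800, 625)

print(f"HD 5850 (Cypress PRO):       {desktop_5850:.0f} GFLOPS")
print(f"Mobility HD 5850 (Juniper):  {mobility_5850_low:.0f}-{mobility_5850_high:.0f} GFLOPS")
```

These are theoretical peaks; actual BRP4 run times depend on much more than this number.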
____________

Stephan Goll
Joined: 13 Dec 05
Posts: 17
Credit: 336,966
RAC: 131
Message 112208 - Posted: 30 Aug 2012 | 9:49:55 UTC
Last modified: 30 Aug 2012 | 10:03:35 UTC

After some time I decided to look for Albert again, this time with my nvidia. The reason was:
30-Aug-2012 08:53:39 [Albert@Home] Requesting new tasks for ATI
30-Aug-2012 08:53:44 [Albert@Home] Scheduler request completed: got 0 new tasks

To my surprise there was work:

30-Aug-2012 10:41:08 [Albert@Home] Started download of einsteinbinary_BRP4_1.28_i686-pc-linux-gnu__BRP3cuda32nv270

Hmmm. It looks like there is not only a 1.28 OpenCL binary, and it looks like we can expect a newer CUDA binary. At the moment we have 1.24 in Einstein.

Bernd or Bikeman, can you please give a bit more information? :)
Thanks,
Stephan
PS: Ah ... I see that there was a bit of information, but mostly for Windows. Now let's see what my Linux box can do.
____________

Profile Bikeman (Heinz-Bernd Eggenstein)
Volunteer moderator
Project administrator
Project developer
Joined: 28 Aug 06
Posts: 1473
Credit: 1,845,340
RAC: 1,725
Message 112209 - Posted: 31 Aug 2012 | 8:13:27 UTC - in response to Message 112208.

Hi!

Indeed, the 1.28 CUDA app will be released on Einstein@Home shortly ...probably today.

Cheers
HB


____________

Jeroen
Joined: 25 Nov 05
Posts: 12
Credit: 602,406
RAC: 457
Message 112211 - Posted: 31 Aug 2012 | 14:06:51 UTC
Last modified: 31 Aug 2012 | 14:07:10 UTC

The older cards are also running well with the new version.

8800GT G92 512 MB - x16 slot @ 5.0 GT/s

1.28: 2940 seconds
1.24: ~3600 seconds


This material is based upon work supported by the National Science Foundation (NSF) under Grant PHY-0555655 and by the Max Planck Gesellschaft (MPG). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the investigators and do not necessarily reflect the views of the NSF or the MPG.

Copyright © 2013 Bruce Allen for the LIGO Scientific Collaboration