FGRP #3 feedback

Alex
Joined: 1 Mar 05
Posts: 88
Credit: 398,734
RAC: 0
Message 112763 - Posted: 4 Jan 2014, 8:18:36 UTC

I currently run the app on my notebooks only. One of them has a built-in GTX470m. GPU usage never exceeded 43%, with very long periods at 1-2%.
Checkpointing does not seem to be in its final form yet: I stopped crunching a WU at 88.888% with a remaining time of < 10 min; after restarting, the remaining time jumped to > 25 min.
Work distribution does not follow the preferences. Although I've set them to get FGRP #3 only, I got other WUs too, most of them CasA. The GPU utilization factor of 0.5 is also ignored. With a 'memory used' of only 127 MB I could easily run 5 or 6 of these WUs, but that would require 6- or 8-core CPUs, or in other words FGRP #3-only PCs.
If this is the final version as far as GPU usage goes, a way must be found to run a stable combination of, let's say, 2 Perseus Arm WUs together with one FGRP #3; otherwise the GPU would sit idle for long periods.
Another way would be to make the GPU utilization factor a setting per WU type rather than per PC venue.

Alexander
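
To make the trade-off above concrete, here is a minimal sketch of how many tasks could share one GPU. It is not BOINC or project code: the function names, the assumption of one CPU core per GPU task, and the 1536 MB of GPU memory are all hypothetical; only the 0.5 factor and the 127 MB per task come from the post.

```python
import math


def tasks_per_gpu(utilization_factor: float) -> int:
    """Tasks that fit on one GPU if each claims `utilization_factor` of it."""
    if not 0 < utilization_factor <= 1:
        raise ValueError("utilization factor must be in (0, 1]")
    # The small epsilon guards against floating-point results such as 4.999...
    return math.floor(1.0 / utilization_factor + 1e-9)


def concurrent_tasks(utilization_factor: float,
                     gpu_memory_mb: int,
                     task_memory_mb: int,
                     free_cpu_cores: int,
                     cpu_cores_per_task: float = 1.0) -> int:
    """Tasks that can run at once, limited by the factor, GPU memory and CPU cores."""
    by_factor = tasks_per_gpu(utilization_factor)
    by_memory = gpu_memory_mb // task_memory_mb
    by_cpu = math.floor(free_cpu_cores / cpu_cores_per_task + 1e-9)
    return min(by_factor, by_memory, by_cpu)


# Hypothetical notebook: 1536 MB of GPU memory (assumed) and 4 free CPU cores (assumed).
print(concurrent_tasks(0.5, 1536, 127, 4))  # limited by the 0.5 factor
print(concurrent_tasks(0.2, 1536, 127, 4))  # limited by the CPU cores
```

Under these assumptions the CPU cores, not the GPU memory, become the limit as soon as the factor drops below 0.25, which matches the point that running 5 or 6 tasks would need a 6- or 8-core CPU.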
ID: 112763
DF1DX
Joined: 5 Mar 13
Posts: 4
Credit: 63,982
RAC: 0
Message 112764 - Posted: 4 Jan 2014, 15:13:06 UTC

Result for FGRP #3: 3060s, i5-3570K @ 4.2 GHz, no GPU

BTW: Is AVX support (also for S6-CasA etc.) possible?

For example, one WU Asteroids@home (MacBookPro Late 2013 2.4GHz Haswell):

110 min OSX-SSE3 and
90 min OSX-AVX.

HNY Jürgen.
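
For reference, the two Asteroids@home run times quoted above can be turned into a speedup figure. This is plain arithmetic on the numbers in the post; the function names are only illustrative.

```python
def speedup(baseline_minutes: float, optimized_minutes: float) -> float:
    """How many times faster the optimized build is than the baseline."""
    return baseline_minutes / optimized_minutes


def time_saved_percent(baseline_minutes: float, optimized_minutes: float) -> float:
    """Run time saved by the optimized build, as a percentage of the baseline."""
    return 100.0 * (baseline_minutes - optimized_minutes) / baseline_minutes


# 110 min with OSX-SSE3 versus 90 min with OSX-AVX, as reported above.
print(round(speedup(110, 90), 2))
print(round(time_saved_percent(110, 90), 1))
```

That works out to a speedup of about 1.22x, or roughly 18% less run time per WU on that one machine; it says nothing about how much FGRP or S6-CasA would gain.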

ID: 112764
Alex
Joined: 1 Mar 05
Posts: 88
Credit: 398,734
RAC: 0
Message 112765 - Posted: 4 Jan 2014, 15:33:29 UTC - in response to Message 112764.  
Last modified: 4 Jan 2014, 15:34:06 UTC


BTW: Is AVX support (also for S6-CasA etc.) possible?



MrS posted this idea over at Einstein@Home in the Android thread.
I think it's a good idea, since more and more CPUs have this feature and a lot of resources are wasted by not using it.

On the other hand, we should be aware that this project is run very professionally: they are working on an improved algorithm and CUDA 5.5, have ARM WUs for Linux and Android, support all types of GPUs, and plan to introduce the wisdom file for the FFT on different platforms, and so on. They are working hard to improve the project, which is the core of the success here (remember: the PetaFlop barrier has been crossed and the project is stable above 1 PF).

Alexander
ID: 112765
[VENETO] boboviz
Joined: 6 Oct 06
Posts: 7
Credit: 344,106
RAC: 0
Message 112766 - Posted: 5 Jan 2014, 15:35:01 UTC - in response to Message 112763.  

I currently run the app on my notebooks only. One of them has a built-in GTX470m. GPU usage never exceeded 43%, with very long periods at 1-2%.
Checkpointing does not seem to be in its final form yet: I stopped crunching a WU at 88.888% with a remaining time of < 10 min; after restarting, the remaining time jumped to > 25 min.


Same here....
ID: 112766
zablociak
Joined: 10 Aug 12
Posts: 8
Credit: 5,858,908
RAC: 0
Message 112767 - Posted: 5 Jan 2014, 17:44:51 UTC

My host https://albert.phys.uwm.edu/results.php?hostid=9923 (GTX 560, not overclocked, with the 301.42 driver; i7 860 @ 2.8 GHz) calculated 29 FGRP#3 FGRPopencl-nvidia tasks with an average of 650 s of run/CPU time (A@H was the only project running at that time). All WUs finished without error, but 23 of them have so far been marked "Validate error" and 6 are still pending. Can you take a look at these results?

Regards
Luke
ID: 112767
DF1DX
Joined: 5 Mar 13
Posts: 4
Credit: 63,982
RAC: 0
Message 112768 - Posted: 8 Jan 2014, 19:29:20 UTC
Last modified: 8 Jan 2014, 19:44:38 UTC

Nice work.

Please add an option "Use Intel GPU" for FGRPopencl-intel_gpu-lion. Albert sent my host 10045 (OS X) only CPU work units.

Jürgen
ID: 112768
zablociak
Joined: 10 Aug 12
Posts: 8
Credit: 5,858,908
RAC: 0
Message 112769 - Posted: 12 Jan 2014, 15:51:55 UTC
Last modified: 12 Jan 2014, 15:52:34 UTC

Now everything runs smoothly and without "validate error" problems on the Nvidia GTX560, the AMD HD 7970 and different CPUs (Phenom II X4, i5 Ivy and i7 Lynnfield).
I hope we can test a production version of FGRP #3 soon (I mean "real" samples with full GPU load).

[Images not preserved: GTX560, HD7970]


Regards
Luke
ID: 112769
Bernd Machenschalk
Volunteer moderator
Project administrator
Project developer
Joined: 15 Oct 04
Posts: 1956
Credit: 6,218,130
RAC: 0
Message 112771 - Posted: 14 Jan 2014, 11:08:58 UTC

Well, the FGRP GPU app is in the first stage of development, similar to the first BRP GPU app: only the FFT is run on the GPU; all other computation is still done on the CPU. For the time being we are more concerned with getting it running at all and giving the same results as the CPU version (i.e. validation). Further improvement is planned over time.

The text for the "GPU utilization factor" preference setting actually reads "GPU utilization factor of BRP apps", which obviously does not apply to FGRP apps. We'll add another such setting for FGRP.

BM
ID: 112771
archae86
Posts: 414
Credit: 67,924
RAC: 0
Message 112773 - Posted: 21 Jan 2014, 18:36:13 UTC

One of my hosts today received an FGRP3 opencl_nvidia task.

It progressed from 0 to 97% indicated completion over about a 50 minute period, with very sporadic indicated GPU usage ranging from 0 to perhaps 60%.

Subsequently it sat at 97.297% indicated completion for over twenty-five wall-clock minutes, steadily using one core of CPU, no discernible GPU, and no indicated progress.

I've suspended it for the time being, and am willing to give it more time if it seems likely to generate some useful information by doing so, but it seems likely it has hung up.

The host is a Windows 7 machine with a Xeon E5620 CPU running hyperthreaded. The graphics processor is a GTX660 with 327.23 driver installed.
ID: 112773
Claggy
Joined: 29 Dec 06
Posts: 78
Credit: 4,040,969
RAC: 0
Message 112774 - Posted: 21 Jan 2014, 19:36:32 UTC - in response to Message 112773.  

That was quite normal when I ran both Nvidia and ATI tasks; it'll finish eventually.

Claggy
ID: 112774
archae86
Joined: 6 Dec 05
Posts: 414
Credit: 67,924
RAC: 0
Message 112775 - Posted: 21 Jan 2014, 20:57:20 UTC - in response to Message 112774.  

Claggy wrote:
it'll finish eventually

I turned it back on, and it finished, after spending well over half as long in the terminal phase of no visible progress and no apparent GPU use as it did for the first 97% of reported progress. Thanks for steering me to continue.

ID: 112775
Eyrie
Joined: 20 Feb 14
Posts: 47
Credit: 2,410
RAC: 0
Message 112807 - Posted: 24 Feb 2014, 18:28:29 UTC

This host [CPU only] got this task on Friday, with a 3-day deadline on a 3.5/0.1 day cache. It went immediately into EDF, of course. When I investigated why I'd gotten something that should not have passed the deadline test, I found something like 'estimated delay 0, deadline test skipped' for that task. A second task of the same type was checked and summarily rejected - I got a BRP instead.
Unfortunately I missed out on keeping the server log for that work fetch.

According to a check-in from around BOINC 6.6, estimated delays are used for deadline calculations. A quick check of my sched_request files for various projects found that the field was 0 for all of them, regardless of project cache. I've not yet had the time to check in the code how that delay is calculated. Since this is the first time I've received a task with a shorter deadline than the minimum cache, I suspect minimal overall impact - but I do shut down over the weekend, so the 3.5 day setting is fairly realistic.
It might be a slight misconfiguration on Albert, or I've found another bug...
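
To illustrate why a zero estimated delay matters, here is a minimal sketch of a deadline check. It is not the BOINC scheduler's code: the function and parameter names are hypothetical, and treating the 3.5-day cache as the delay is only an assumption made for comparison. The 3-day deadline and the 1:20 estimate are the figures from this post.

```python
def passes_deadline_test(estimated_delay_days: float,
                         estimated_runtime_days: float,
                         deadline_days: float) -> bool:
    """True if the task is expected to finish before its deadline."""
    return estimated_delay_days + estimated_runtime_days <= deadline_days


# Initial run time estimate of 1:20 (80 minutes), converted to days.
runtime_days = 80 / (24 * 60)

# With the reported delay of 0, the task looks safe to send.
print(passes_deadline_test(0.0, runtime_days, 3.0))

# Assumption: if the delay reflected the 3.5-day cache, the task would be refused.
print(passes_deadline_test(3.5, runtime_days, 3.0))
```

With a delay of 0 only the run time estimate counts, so almost any task fits inside a 3-day deadline; with a delay of 3.5 days the same task fails the check.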



So, when I received that task it had an estimate of 1:20 and was running and checkpointing nicely. It got to around 50% in something like 35 minutes, so I assumed the estimate was fairly spot on and didn't worry about the deadline any more. About two hours later I checked on the task again and, to my extreme astonishment, it had dropped back to around 4%. A quick check of stderr in the slot found it was doing skygrid 2/25, so 4% was a good fraction. Elapsed time was eventually 7 hours and I returned it in time, but I was left wondering why

a) I'd gotten the task in the first place and
b) how come its fraction completed dropped like that.

I suppose the crappy initial estimate is to be expected, but underestimating by 6x is suboptimal. David's new formula for calculating estimated time remaining doesn't really help there either.
Queen of Aliasses, wielder of the SETI rolling pin, Mistress of the red shoes, Guardian of the orange tree, Slayer of very small dragons.
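
The jump described above can be reproduced with a simple linear extrapolation of the remaining time. This is only a sketch, not the BOINC client's formula: both functions are hypothetical, the mapping from skygrid index to fraction done is an assumption, and the 155 minutes of elapsed time (35 minutes plus about two hours) is an approximation of the figures in the post.

```python
def fraction_from_skygrid(current_index: int, total: int) -> float:
    """Fraction done, assuming every sky grid before `current_index` is finished."""
    if not 1 <= current_index <= total:
        raise ValueError("current_index must be between 1 and total")
    return (current_index - 1) / total


def remaining_minutes(elapsed_minutes: float, fraction_done: float) -> float:
    """Remaining time, assuming progress stays proportional to elapsed time."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_minutes * (1.0 - fraction_done) / fraction_done


# About 50% after 35 minutes: the estimate looks fine.
print(round(remaining_minutes(35, 0.50)))

# Roughly two hours later the task reports skygrid 2/25, i.e. 4% done.
late_fraction = fraction_from_skygrid(2, 25)
print(round(remaining_minutes(155, late_fraction)))
```

The first call suggests about 35 minutes to go, the second roughly 3720 minutes (62 hours), while the task actually finished after 7 hours in total. Any estimate driven by the reported fraction will swing like this when the fraction itself is reset.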
ID: 112807



This material is based upon work supported by the National Science Foundation (NSF) under Grant PHY-0555655 and by the Max Planck Gesellschaft (MPG). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the investigators and do not necessarily reflect the views of the NSF or the MPG.

Copyright © 2024 Bruce Allen for the LIGO Scientific Collaboration