Wrong GPU reported in output file.

Alez
Joined: 8 Apr 13
Posts: 7
Credit: 4,335,055
RAC: 0
Message 112416 - Posted: 10 Apr 2013, 9:17:27 UTC

I noticed on this computer of mine running Albert
http://albert.phys.uwm.edu/show_host_detail.php?hostid=6816

that the CUDA tasks, i.e. Binary Radio Pulsar Search v1.33 (BRP4cuda32nv301),
example task here
http://albert.phys.uwm.edu/result.php?resultid=727451
correctly report that they are running on an nVidia GTX 650.

The OpenCL tasks, however, i.e. Gamma-ray pulsar search #2 v1.05 (FGRPopencl-nvidia),
example task here
http://albert.phys.uwm.edu/result.php?resultid=725134
report that they are running on an nVidia GT 610, which they are not. That card is present in the system but is not used to run Albert; only the GTX 650 is presently doing that work.
The output file also reports the GT 610 as device 0, whereas device 0 is in fact an nVidia GTX 660 Ti. As reported by BOINC, the GT 610 is device 1 and the GTX 650 is device 2. I have also noticed that these units run for a little while, then seem to stop doing any work, with 0% load on the GPU, yet they keep running and the percentage complete slowly rises.
Bikeman (Heinz-Bernd Eggenstein)
Volunteer moderator
Project administrator
Project developer
Joined: 28 Aug 06
Posts: 1483
Credit: 1,864,017
RAC: 0
Message 112417 - Posted: 11 Apr 2013, 12:12:00 UTC - in response to Message 112416.  
Last modified: 11 Apr 2013, 12:19:30 UTC

Hi!

Thanks for reporting this. This is a very nice system for tests in a multi-GPU environment. Plus some of the tasks show failures during execution but do not terminate (as they should).

We'll look into this.

Just to make sure:

"reports that it is running on a nVidia GT 610, which they are not. That card is present in the system but is not used to run Albert, only the GTX 650 is presently doing that work."


I understand you are sure that the GT 610 is not used by BOINC? Or are you saying that it's not supposed to be running BOINC tasks? E.g. is GPU-Z showing the 610 as idle?

Are you using the 32 bit or 64 bit version of BOINC?


Thx
HBE
Neil Newell

Joined: 9 Jan 13
Posts: 13
Credit: 4,081,564
RAC: 0
Message 112418 - Posted: 11 Apr 2013, 14:22:49 UTC - in response to Message 112417.  

Not sure if this is relevant to the issue, but BOINC certainly seems to mis-report the GPUs present on the web pages (the start-up messages are correct). For example this host is actually a GTX570 and a GTX460 but shows as 2xGTX460, while this host shows as 2xGTX580 whereas it's really a GTX580 and a GTX570 (n.b. these hosts are only on einstein, not albert).
Richard Haselgrove

Joined: 10 Dec 05
Posts: 450
Credit: 5,409,572
RAC: 0
Message 112419 - Posted: 11 Apr 2013, 16:19:47 UTC - in response to Message 112418.  

"Not sure if this is relevant to the issue, but BOINC certainly seems to mis-report the GPUs present on the web pages (the start-up messages are correct). For example this host is actually a GTX570 and a GTX460 but shows as 2xGTX460, while this host shows as 2xGTX580 whereas it's really a GTX580 and a GTX570 (n.b. these hosts are only on einstein, not albert)."

That problem is due to a well-known design limitation in the BOINC back-end database which drives the website: it wasn't given a separate relational table which would allow multiple individual (and different) GPUs to be associated with a single host.

The BOINC client itself does enumerate the GPUs individually, and (E&OE) reports them correctly to the server. In theory, the scheduler should be able to handle disparate GPUs correctly - it's just the subsequent cosmetic reporting which is broken.
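
To make the limitation concrete, here is a minimal sketch of the two layouts using an in-memory SQLite database. The table and column names are invented for illustration and are not BOINC's actual schema; the choice of which model the flattened row keeps is also arbitrary, picked only to reproduce the "2xGTX460" display mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Flattened layout (hypothetical): one model string and a count per host.
    CREATE TABLE host_flat (
        id INTEGER PRIMARY KEY,
        gpu_model TEXT,
        gpu_count INTEGER
    );
    -- Relational layout (hypothetical): one row per physical GPU.
    CREATE TABLE host (id INTEGER PRIMARY KEY);
    CREATE TABLE host_gpu (
        host_id INTEGER REFERENCES host(id),
        device_num INTEGER,
        model TEXT
    );
    """
)


def store_flat(host_id, models):
    # Only one model fits in the row; keeping the last one is an arbitrary choice.
    conn.execute(
        "INSERT INTO host_flat VALUES (?, ?, ?)",
        (host_id, models[-1], len(models)),
    )


def store_relational(host_id, models):
    conn.execute("INSERT INTO host VALUES (?)", (host_id,))
    conn.executemany(
        "INSERT INTO host_gpu VALUES (?, ?, ?)",
        [(host_id, num, model) for num, model in enumerate(models)],
    )


def describe_flat(host_id):
    model, count = conn.execute(
        "SELECT gpu_model, gpu_count FROM host_flat WHERE id = ?", (host_id,)
    ).fetchone()
    return f"{count}x{model}"


def describe_relational(host_id):
    rows = conn.execute(
        "SELECT model FROM host_gpu WHERE host_id = ? ORDER BY device_num",
        (host_id,),
    )
    return ", ".join(model for (model,) in rows)


models = ["GTX570", "GTX460"]  # the mixed host described in this thread
store_flat(1, models)
store_relational(1, models)
print(describe_flat(1))        # the second model is lost
print(describe_relational(1))  # both models survive
```

With the flattened row the website can only print a count and a single model, while the one-row-per-GPU table keeps each card. This only illustrates the display problem; it says nothing about how the scheduler itself stores or uses the data.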
Neil Newell

Joined: 9 Jan 13
Posts: 13
Credit: 4,081,564
RAC: 0
Message 112420 - Posted: 11 Apr 2013, 17:17:28 UTC - in response to Message 112419.  

Interesting - thanks for the info, I'd been wondering about it for a while (plus someone asked why my "2xGTX460" system was so fast!).

Alez
Joined: 8 Apr 13
Posts: 7
Credit: 4,335,055
RAC: 0
Message 112421 - Posted: 11 Apr 2013, 20:24:59 UTC - in response to Message 112417.  

The GT 610 drives the display and also runs Milkyway and SETI. I use a cc_config.xml file to control what runs on which GPU; those are the only two projects running on that card and, truth be told, the only two it can run. During the tests all GPUs were in use. The split was as follows:
GTX 660 Ti - PrimeGrid
GTX 650 - Albert / Einstein
GT 610 - SETI / Milkyway. The display is also connected to this GPU
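
For anyone wanting a similar split, here is a rough sketch that generates a cc_config.xml from a project-to-device map. It is not the poster's actual file. The element names (`exclude_gpu`, `url`, `device_num`, `use_all_gpus`) are as I recall them from the BOINC client configuration documentation, so please check them against it. Only the Albert URL comes from this thread; the other two URLs are placeholders, and the device numbers are the ones BOINC reports above.

```python
import xml.etree.ElementTree as ET

# Which BOINC device numbers each project may use.
ALLOWED = {
    "http://albert.phys.uwm.edu/": {2},  # GTX 650
    "http://primegrid.example/": {0},    # placeholder URL, GTX 660 Ti
    "http://milkyway.example/": {1},     # placeholder URL, GT 610
}


def build_cc_config(allowed, device_count):
    """Exclude every device a project is not explicitly allowed to use."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    # Assumed to be needed so the client does not ignore the slower cards.
    ET.SubElement(options, "use_all_gpus").text = "1"
    for url, devices in allowed.items():
        for dev in range(device_count):
            if dev in devices:
                continue
            exclude = ET.SubElement(options, "exclude_gpu")
            ET.SubElement(exclude, "url").text = url
            ET.SubElement(exclude, "device_num").text = str(dev)
    return root


config = build_cc_config(ALLOWED, device_count=3)
ET.indent(config)  # pretty-printing helper, needs Python 3.9 or later
print(ET.tostring(config, encoding="unicode"))
```

The sketch works by exclusion because, as far as I know, the client has no "only run project X on device N" option. Note that the device numbers here are BOINC's own numbering, which is exactly the numbering the science app appears not to share.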

I'm running BOINC 7.0.60 (x64) under Windows 7 Home Premium x64, SP1.

I noticed the failures on the OpenCL apps and have suspended them for now. All the CUDA apps work correctly.



Bikeman (Heinz-Bernd Eggenstein)
Volunteer moderator
Project administrator
Project developer
Joined: 28 Aug 06
Posts: 1483
Credit: 1,864,017
RAC: 0
Message 112424 - Posted: 12 Apr 2013, 13:41:13 UTC

Hi!

Interesting.

I guess the problem is that the BOINC client and the science app are enumerating the devices in a different order, so the science app actually tries to run on the 610, and because there's already other stuff running there, it can't get enough RAM. So the root cause is a problem in device enumeration and a secondary problem is not handling the exception/error condition correctly.
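
As a rough illustration of that guess (not the actual app code), the sketch below translates the device number assigned by the client into the app's own index by matching on a stable identifier instead of reusing the number directly. The BOINC-side order is the one reported in this thread. The app-side order is invented apart from the GT 610 sitting at index 0, and the PCI bus IDs are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    name: str
    pci_bus_id: str  # hypothetical stable identifier, not taken from the thread


class DeviceNotFound(RuntimeError):
    """Raised when the assigned device is missing from the app's own list."""


def resolve_device(boinc_devices, app_devices, boinc_index):
    """Translate a BOINC device number into the app's own device index."""
    wanted = boinc_devices[boinc_index]
    for app_index, dev in enumerate(app_devices):
        if dev.pci_bus_id == wanted.pci_bus_id:
            return app_index
    # Fail loudly instead of silently falling back to some other card.
    raise DeviceNotFound(f"{wanted.name} is not visible to the app")


GTX_660_TI = Device("GTX 660 Ti", "01:00.0")
GT_610 = Device("GT 610", "02:00.0")
GTX_650 = Device("GTX 650", "03:00.0")

BOINC_ORDER = [GTX_660_TI, GT_610, GTX_650]  # as reported by BOINC
APP_ORDER = [GT_610, GTX_650, GTX_660_TI]    # assumed order seen by the app

assigned = 2  # BOINC assigns the GTX 650
naive = APP_ORDER[assigned]
resolved = APP_ORDER[resolve_device(BOINC_ORDER, APP_ORDER, assigned)]
print(f"naive: {naive.name}, resolved: {resolved.name}")
```

Reusing the number directly picks the wrong card under the assumed order, while matching on the identifier finds the GTX 650. Raising an error when no match exists also touches the secondary problem: a task that cannot get its device should stop rather than keep running.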

Two bugs in one report...not bad :-) ! Thanks again

Cheers
HB
Alez
Joined: 8 Apr 13
Posts: 7
Credit: 4,335,055
RAC: 0
Message 112425 - Posted: 12 Apr 2013, 16:01:21 UTC
Last modified: 12 Apr 2013, 16:06:33 UTC

That would explain the zero load on the GTX 650. I wonder if the issue could be compounded by the different device enumerations.
For BOINC, the GTX 660 Ti is 0, the GT 610 is 1 and the GTX 650 is 2.
However, as the display is run from the GT 610, Windows would regard that as the primary device. Could the conflict come from the app using the Windows device listing rather than the BOINC device enumeration?

If it was trying to run on the 610 then I'm actually surprised that the card didn't crash, as it's not a very powerful card, and the run times of the units (Milkyway) running on the card didn't indicate another project running simultaneously. Also, would trying to run on a card that is already running a different app not cause a lock file error? Or is that only when projects try to use the same slots in the BOINC folder?
If Albert was trying to access the GT 610, it never actually managed to run on the card. It may have thought it was, but it was never physically running on the card.
Let me know if you need me to try the test app again.
Alex
Bikeman (Heinz-Bernd Eggenstein)
Volunteer moderator
Project administrator
Project developer
Joined: 28 Aug 06
Posts: 1483
Credit: 1,864,017
RAC: 0
Message 112491 - Posted: 2 May 2013, 14:36:18 UTC - in response to Message 112425.  

Hi all,

Thanks for your patience. Hopefully the bug discussed in this thread is finally fixed in the new version discussed here: http://albert.phys.uwm.edu/forum_thread.php?id=8975.

Thanks for the testing!!!

Cheers
HBE





This material is based upon work supported by the National Science Foundation (NSF) under Grant PHY-0555655 and by the Max Planck Gesellschaft (MPG). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the investigators and do not necessarily reflect the views of the NSF or the MPG.

Copyright © 2024 Bruce Allen for the LIGO Scientific Collaboration