Wrong GPU reported in output file. |
Author | Message |
---|---|
Alez Joined: 8 Apr 13 Posts: 7 Credit: 4,335,055 RAC: 0 |
I noticed on this computer of mine running Albert (http://albert.phys.uwm.edu/show_host_detail.php?hostid=6816) that the CUDA tasks, i.e. Binary Radio Pulsar Search v1.33 (BRP4cuda32nv301; example task: http://albert.phys.uwm.edu/result.php?resultid=727451 ), correctly report that they are running on an NVIDIA GTX 650. The OpenCL tasks, however, i.e. Gamma-ray pulsar search #2 v1.05 (FGRPopencl-nvidia; example task: http://albert.phys.uwm.edu/result.php?resultid=725134 ), report that they are running on an NVIDIA GT 610, which they are not. That card is present in the system but is not used to run Albert; only the GTX 650 is presently doing that work. The output file also reports the GT 610 as device 0, whereas device 0 is in fact an NVIDIA GTX 660 Ti. The GT 610 is device 1 and the GTX 650 is device 2, as reported by BOINC. I have also noticed that these units run for a little bit, then seem to stop doing any work, with 0% load on the GPU, but they keep running and the percentage complete slowly rises. |
Bikeman (Heinz-Bernd Eggenstein) Volunteer moderator Project administrator Project developer Joined: 28 Aug 06 Posts: 1483 Credit: 1,864,017 RAC: 0 |
Hi! Thanks for reporting this. This is a very nice system for tests in a multi-GPU environment. Plus, some of the tasks show failures during execution but do not terminate (as they should). We'll look into this. Just to make sure, you wrote: "reports that it is running on a nVidia GT 610, which they are not. That card is present in the system but is not used to run Albert, only the GTX 650 is presently doing that work." I understand you are sure that the GT 610 is not used by BOINC? Or are you saying that it's not supposed to be running BOINC tasks? E.g. is GPU-Z showing the 610 as idle? Are you using the 32-bit or 64-bit version of BOINC? Thx HBE |
Neil Newell Joined: 9 Jan 13 Posts: 13 Credit: 4,081,564 RAC: 0 |
Not sure if this is relevant to the issue, but BOINC certainly seems to mis-report the GPUs present on the web pages (the start-up messages are correct). For example this host is actually a GTX570 and a GTX460 but shows as 2xGTX460, while this host shows as 2xGTX580 whereas it's really a GTX580 and a GTX570 (n.b. these hosts are only on einstein, not albert). |
Richard Haselgrove Joined: 10 Dec 05 Posts: 450 Credit: 5,409,572 RAC: 0 |
"Not sure if this is relevant to the issue, but BOINC certainly seems to mis-report the GPUs present on the web pages (the start-up messages are correct). For example this host is actually a GTX570 and a GTX460 but shows as 2xGTX460, while this host shows as 2xGTX580 whereas it's really a GTX580 and a GTX570 (n.b. these hosts are only on einstein, not albert)." That problem is due to a well-known design limitation in the BOINC back-end database which drives the website: it wasn't given a separate relational table which would allow multiple individual (and different) GPUs to be associated with a single host. The BOINC client itself does enumerate the GPUs individually, and (E&OE) reports them correctly to the server. In theory, the scheduler should be able to handle disparate GPUs correctly; it's just the subsequent cosmetic reporting which is broken. |
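To make the limitation described above concrete, here is a minimal sketch in Python using an in-memory SQLite database. It is not BOINC's actual schema: the table and column names (`host`, `gpu_summary`, `host_gpu`, `device_num`, `model`) and the device numbers are hypothetical, chosen only to contrast a single summary field per host with a separate per-GPU table. The GPU models are the ones from Neil Newell's first example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema, for illustration only (not BOINC's real tables).
conn.executescript("""
CREATE TABLE host (
    id INTEGER PRIMARY KEY,
    gpu_summary TEXT              -- one free-text field for all GPUs
);
CREATE TABLE host_gpu (
    host_id INTEGER REFERENCES host(id),
    device_num INTEGER,           -- assumed numbering, not from the thread
    model TEXT,
    PRIMARY KEY (host_id, device_num)
);
""")

# Single-field design: only one model and a count fit into the summary.
conn.execute("INSERT INTO host VALUES (1, '2xGTX460')")

# Per-GPU table: each card gets its own row, so mixed models survive.
conn.executemany(
    "INSERT INTO host_gpu VALUES (?, ?, ?)",
    [(1, 0, "GTX570"), (1, 1, "GTX460")],
)

summary = conn.execute(
    "SELECT gpu_summary FROM host WHERE id = 1"
).fetchone()[0]
gpus = [
    row[0]
    for row in conn.execute(
        "SELECT model FROM host_gpu WHERE host_id = 1 ORDER BY device_num"
    )
]

print("summary field:", summary)
print("per-GPU rows:", gpus)
```

With only the summary field, the web page has no way to show that the two cards differ; the per-GPU rows keep both models.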
Neil Newell Joined: 9 Jan 13 Posts: 13 Credit: 4,081,564 RAC: 0 |
Interesting - thanks for the info, I'd been wondering about it for a while (plus someone asked why my "2xGTX460" system was so fast!). |
Alez Joined: 8 Apr 13 Posts: 7 Credit: 4,335,055 RAC: 0 |
Hi! The GT 610 is used to run the display, plus Milkyway and SETI. I use a cc_config.xml file to control what runs on which GPU, and these are the only two projects running on that card and, truth be told, the only two projects that it can run. During the tests conducted, all GPUs were being used. The split was as follows: GTX 660 Ti: PrimeGrid; GTX 650: Albert / Einstein; GT 610: SETI / Milkyway (the display is also connected to this GPU). I'm running BOINC 7.0.60 (x64) under Windows 7 Home Premium x64, SP1. I noticed the failures on the OpenCL apps and have suspended them just now. All the CUDA apps work correctly. |
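The post does not show the cc_config.xml itself. One way such a split can be expressed is the `<exclude_gpu>` option of the BOINC client configuration, and the sketch below assumes that this is the mechanism meant. It uses Python to generate the XML that would keep Albert off every device except device 2 (the GTX 650 in BOINC's numbering). The helper name `build_cc_config` is hypothetical, and the project URL is assumed to be the Albert address used elsewhere in this thread; it would have to match the URL the client has the project attached under.

```python
import xml.etree.ElementTree as ET


def build_cc_config(project_url, allowed_device, device_count, gpu_type="NVIDIA"):
    """Return cc_config.xml text excluding every device except allowed_device."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    for device_num in range(device_count):
        if device_num == allowed_device:
            continue  # leave this device available to the project
        exclude = ET.SubElement(options, "exclude_gpu")
        ET.SubElement(exclude, "url").text = project_url
        ET.SubElement(exclude, "device_num").text = str(device_num)
        ET.SubElement(exclude, "type").text = gpu_type
    return ET.tostring(root, encoding="unicode")


# Assumed values: three GPUs, Albert allowed only on BOINC device 2.
xml_text = build_cc_config(
    "http://albert.phys.uwm.edu/", allowed_device=2, device_count=3
)
print(xml_text)
```

Note that the device numbers in such a file are BOINC's own, which is exactly why a science app that numbers the cards differently can end up on the wrong one.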
Bikeman (Heinz-Bernd Eggenstein) Volunteer moderator Project administrator Project developer Joined: 28 Aug 06 Posts: 1483 Credit: 1,864,017 RAC: 0 |
Hi! Interesting. I guess the problem is that the BOINC client and the science app are enumerating the devices in a different order, so the science app actually tries to run on the 610, and because there's already other stuff running there, it can't get enough RAM. So the root cause is a problem in device enumeration and a secondary problem is not handling the exception/error condition correctly. Two bugs in one report...not bad :-) ! Thanks again Cheers HB |
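The enumeration mismatch suspected above can be illustrated with a small Python sketch. The BOINC order is the one reported in this thread. The app-side order is an assumption: only "GT 610 at index 0" comes from the output file, and the positions of the other two cards are made up for the example. The function names are hypothetical, and this is not the fix that was actually applied to the science app.

```python
# Device order as reported by the BOINC client in this thread.
BOINC_ORDER = ["GTX 660 Ti", "GT 610", "GTX 650"]

# Assumed order seen by the science app's own enumeration.
# Only the GT 610 at index 0 is taken from the thread; the rest is a guess.
APP_ORDER = ["GT 610", "GTX 650", "GTX 660 Ti"]


def naive_pick(boinc_index):
    """Use the BOINC index directly in the app's own list (the suspected bug)."""
    return APP_ORDER[boinc_index]


def translate_index(boinc_index):
    """Map a BOINC index to the app's index by matching the device name."""
    name = BOINC_ORDER[boinc_index]
    return APP_ORDER.index(name)


assigned = 2  # BOINC assigns Albert to its device 2
print("BOINC meant:    ", BOINC_ORDER[assigned])
print("naive pick:     ", naive_pick(assigned))
print("translated pick:", APP_ORDER[translate_index(assigned)])
```

Matching by name is only good enough for a sketch: it breaks as soon as a host has two identical cards, so a real implementation would need some identifier that is stable across both enumerations.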
Alez Joined: 8 Apr 13 Posts: 7 Credit: 4,335,055 RAC: 0 |
That would explain the zero load on the GTX 650. I wonder if the issue could be compounded by the different device enumerations. For BOINC, the GTX 660 Ti is 0, the GT 610 is 1 and the GTX 650 is 2; however, as the display is run from the GT 610, Windows would regard that as the primary device. Could the conflict come from the app using the Windows device listing rather than the BOINC device enumeration? If it was trying to run on the 610, then I'm actually surprised that the card didn't crash, as it's not a very powerful card, and the run times of the units (Milkyway) running on the card didn't indicate another project running simultaneously. Also, would trying to run on a card already running a different app not cause a lock file error? Or is that only when projects try to use the same slots in the BOINC folder? If Albert was trying to access the GT 610, it never actually managed to run on the card. It may have thought it was, but it was never physically running on the card. Let me know if you require me to try the test app again. Alex |
Bikeman (Heinz-Bernd Eggenstein) Volunteer moderator Project administrator Project developer Joined: 28 Aug 06 Posts: 1483 Credit: 1,864,017 RAC: 0 |
Hi all, Thanks for your patience. Hopefully the bug discussed in this thread is finally fixed in the new version discussed here: http://albert.phys.uwm.edu/forum_thread.php?id=8975 . Thanks for the testing! Cheers HBE |