Sending work

Message boards : News : Sending work

Previous · 1 . . . 3 · 4 · 5 · 6 · 7 · Next

robertmiles

Joined: 16 Nov 11
Posts: 19
Credit: 4,468,368
RAC: 0
Message 111510 - Posted: 8 Dec 2011, 12:40:17 UTC - in response to Message 111498.  


Can the developers in the meantime fix the driver detection?


Sorry, not up to us. I'm not sure whether the BOINC devs can do anything about it since this might even be an AMD driver issue.


You may be talking about two different things here.

Jord, what exactly do you think should be fixed?

I do see that displaying the ATI CAL/driver version on the host web pages appears broken (on Albert), and possibly the string in the DB is, too.

In the scheduler the ATI "driver" version is stored as "char version[50]" and "int version_num" in coproc_ati, and in "char opencl_driver_version[32]" in opencl_device_prop. These could in principle be used in app_plan(), though we don't check this yet.

BM


I just read something related on the boinc_dev mailing list. It seems that BOINC 7.0.1 and 7.0.2 don't allocate enough digits in one of the places they store ATI version numbers, and are therefore likely to get at least some of the version numbers wrong.
ID: 111510
Profile Gaurav Khanna

Joined: 8 Nov 04
Posts: 12
Credit: 2,818,895
RAC: 0
Message 111511 - Posted: 8 Dec 2011, 15:15:05 UTC

Since the upgrade to 7.0.2 I'm not getting any work for the GPUs ..

08-Dec-2011 07:20:02 [Albert@Home] Sending scheduler request: To fetch work.
08-Dec-2011 07:20:02 [Albert@Home] Requesting new tasks for ATI
08-Dec-2011 07:20:05 [Albert@Home] Scheduler request completed: got 0 new tasks
08-Dec-2011 07:20:05 [Albert@Home] No tasks sent
08-Dec-2011 07:58:09 [Albert@Home] Sending scheduler request: To fetch work.
08-Dec-2011 07:58:09 [Albert@Home] Requesting new tasks for NVIDIA
08-Dec-2011 07:58:11 [Albert@Home] Scheduler request completed: got 0 new tasks
08-Dec-2011 07:58:11 [Albert@Home] No tasks sent

host details are here:
http://albert.phys.uwm.edu/show_host_detail.php?hostid=1396

Any thoughts?
ID: 111511
Profile Bernd Machenschalk
Volunteer moderator
Project administrator
Project developer
Joined: 15 Oct 04
Posts: 1956
Credit: 6,218,130
RAC: 0
Message 111513 - Posted: 8 Dec 2011, 19:31:03 UTC - in response to Message 111511.  

Click on the "last contact" link to see the scheduler logs.

In the case of your host I see

2011-12-08 15:42:31.5128 [PID=32492]    [version] Checking plan class 'atiOpenCL'
2011-12-08 15:42:31.5128 [PID=32492]    [version] GPU RAM required min: 536870912.000000, supplied: 0
2011-12-08 15:42:31.5128 [PID=32492]    [version] [AV#459] app_plan() returned false


Hm - looks like there's something wrong with the GPU RAM size reporting. I'll look into that tomorrow.

BM
ID: 111513
Profile Bernd Machenschalk
Volunteer moderator
Project administrator
Project developer
Joined: 15 Oct 04
Posts: 1956
Credit: 6,218,130
RAC: 0
Message 111514 - Posted: 8 Dec 2011, 19:56:28 UTC - in response to Message 111513.  
Last modified: 8 Dec 2011, 19:57:07 UTC

Hm ... according to sched_request the ATI device isn't OpenCL capable:
<coproc_ati>
   <count>1</count>
   <name>ATI Radeon HD 5800 series (Cypress)</name>
   <available_ram>1024458752.000000</available_ram>
   <have_cal>1</have_cal>
   <have_opencl>0</have_opencl>
   <req_secs>0.000000</req_secs>
   <req_instances>0.000000</req_instances>
   <estimated_delay>0.000000</estimated_delay>
   <peak_flops>50000000000.000000</peak_flops>
   <CALVersion>1.4.815</CALVersion>
   <target>8</target>
   <localRAM>1024</localRAM>
   <uncachedRemoteRAM>1788</uncachedRemoteRAM>
   <cachedRemoteRAM>508</cachedRemoteRAM>
   <engineClock>0</engineClock>
   <memoryClock>0</memoryClock>
   <wavefrontSize>64</wavefrontSize>
   <numberOfSIMD>20</numberOfSIMD>
   <doublePrecision>1</doublePrecision>
   <pitch_alignment>256</pitch_alignment>
   <surface_alignment>256</surface_alignment>
   <maxResource1DWidth>16384</maxResource1DWidth>
   <maxResource2DWidth>16384</maxResource2DWidth>
   <maxResource2DHeight>16384</maxResource2DHeight>
    <atirt_detected/>
</coproc_ati>
ID: 111514
Profile STE/E

Joined: 18 Jan 05
Posts: 144
Credit: 7,886,269
RAC: 0
Message 111515 - Posted: 8 Dec 2011, 20:14:16 UTC - in response to Message 111514.  

Hm ... according to sched_request the ATI device isn't OpenCL capable:
<coproc_ati>
   <count>1</count>
   <name>ATI Radeon HD 5800 series (Cypress)</name>
   <available_ram>1024458752.000000</available_ram>
   <have_cal>1</have_cal>
   <have_opencl>0</have_opencl>
   <req_secs>0.000000</req_secs>
   <req_instances>0.000000</req_instances>
   <estimated_delay>0.000000</estimated_delay>
   <peak_flops>50000000000.000000</peak_flops>
   <CALVersion>1.4.815</CALVersion>
   <target>8</target>
   <localRAM>1024</localRAM>
   <uncachedRemoteRAM>1788</uncachedRemoteRAM>
   <cachedRemoteRAM>508</cachedRemoteRAM>
   <engineClock>0</engineClock>
   <memoryClock>0</memoryClock>
   <wavefrontSize>64</wavefrontSize>
   <numberOfSIMD>20</numberOfSIMD>
   <doublePrecision>1</doublePrecision>
   <pitch_alignment>256</pitch_alignment>
   <surface_alignment>256</surface_alignment>
   <maxResource1DWidth>16384</maxResource1DWidth>
   <maxResource2DWidth>16384</maxResource2DWidth>
   <maxResource2DHeight>16384</maxResource2DHeight>
    <atirt_detected/>
</coproc_ati>


I don't think CAL version 1.4.815 is OpenCL capable ...

STE\/E
ID: 111515
Profile Gaurav Khanna

Joined: 8 Nov 04
Posts: 12
Credit: 2,818,895
RAC: 0
Message 111516 - Posted: 8 Dec 2011, 20:49:03 UTC - in response to Message 111515.  

Weird .. with BOINC 6.13 the same ATI GPU completed scores of work units:

For example:
http://albert.phys.uwm.edu/result.php?resultid=53182

And I use the ATI GPU for OpenCL development all the time ..
ID: 111516
Profile Gaurav Khanna

Joined: 8 Nov 04
Posts: 12
Credit: 2,818,895
RAC: 0
Message 111517 - Posted: 8 Dec 2011, 20:50:44 UTC

What exactly does BOINC do to check for OpenCL?
ID: 111517
Profile Oliver Behnke
Volunteer moderator
Project administrator
Project developer

Joined: 4 Sep 07
Posts: 130
Credit: 8,545,955
RAC: 0
Message 111518 - Posted: 9 Dec 2011, 9:35:08 UTC
Last modified: 9 Dec 2011, 10:06:54 UTC


I don't think Cal version 1.4.815 is OpenCL Capable ...


It is, but you have to install the SDK and register the OpenCL ICD yourself. Installing the driver is not sufficient.

Gaurav, have you changed anything in your setup?

Oliver
ID: 111518
Profile Oliver Behnke
Volunteer moderator
Project administrator
Project developer

Joined: 4 Sep 07
Posts: 130
Credit: 8,545,955
RAC: 0
Message 111521 - Posted: 9 Dec 2011, 10:10:21 UTC - in response to Message 111517.  
Last modified: 9 Dec 2011, 14:03:18 UTC

What exactly does BOINC do to check for OpenCL?


It uses libOpenCL via late binding to query a few basic properties. With driver version 10.8 (as you use) you have to make sure it's available, as it's not installed automatically in the usual library paths. Better still, upgrade your driver to 11.7 (11.11 on Linux!). That version of the Catalyst driver installs the OpenCL runtime all by itself. Our app officially requires at least 11.7 anyway, as we build it using SDK 2.5.

Oliver
ID: 111521
Profile Gaurav Khanna

Joined: 8 Nov 04
Posts: 12
Credit: 2,818,895
RAC: 0
Message 111531 - Posted: 9 Dec 2011, 21:22:42 UTC - in response to Message 111521.  

Thanks Oliver, that was helpful. Figured it out .. it was a 64-bit vs 32-bit issue. The ATI app is 32-bit, but the BOINC client I built is 64-bit, so I had to make both the 32-bit and 64-bit OpenCL libs available in my library path.

It's working now (finished a result successfully):
http://albert.phys.uwm.edu/result.php?resultid=54521

ID: 111531
Profile x3mEn

Joined: 21 Jun 11
Posts: 9
Credit: 10,000
RAC: 0
Message 111540 - Posted: 11 Dec 2011, 13:42:59 UTC

Aborting task p2030.20100912.G57.94-00.24.S.b5s0g0.00000_416_1: exceeded elapsed time limit 6394.64 (2800000.00G/432.72G)

3 WUs were aborted after ~6395 sec of run time for the same reason:
p2030.20100912.G57.94-00.24.S.b5s0g0.00000_416_1 - 6,395.56 sec
p2030.20100913.G44.55+00.20.C.b5s0g0.00000_1824_2 - 6,395.53 sec
p2030.20100912.G57.94-00.24.S.b5s0g0.00000_744_0 - 6,395.32 sec

WTF?
ID: 111540
Profile Bikeman (Heinz-Bernd Eggenstein)
Volunteer moderator
Project administrator
Project developer
Joined: 28 Aug 06
Posts: 1483
Credit: 1,864,017
RAC: 0
Message 111541 - Posted: 11 Dec 2011, 16:17:11 UTC - in response to Message 111540.  
Last modified: 11 Dec 2011, 16:18:45 UTC

Hi!

My guess is that the way BOINC calculates the maximum allowed elapsed time for a task to finish has changed with version 7.x. So the workunits would now have to be generated with a higher number of estimated/max floating point ops per task. Because BRP4 workunits are all created equal, no matter on which platform they will eventually get crunched, one value of estimated floating point ops must be used for CPU, NVIDIA/CUDA and ATI/OpenCL (and all supported BOINC client versions). No trivial task.

Until this gets fixed on the server side for new workunits, one could theoretically do a workaround in the client_state.xml file to prevent more WUs from erroring out (and wrecking your quota):

- Stop BOINC
- open the client_state.xml file in an editor
- replace occurrences of the following two lines

    <rsc_fpops_est>140000000000000.000000</rsc_fpops_est>
    <rsc_fpops_bound>2800000000000000.000000</rsc_fpops_bound>

with (say)
    <rsc_fpops_est>1400000000000000.000000</rsc_fpops_est>
    <rsc_fpops_bound>28000000000000000.000000</rsc_fpops_bound>


(If you use that BOINC instance for other projects as well, you should check that only WUs of the Albert@Home project are changed.)


CU
HBE
ID: 111541
Profile Bernd Machenschalk
Volunteer moderator
Project administrator
Project developer
Joined: 15 Oct 04
Posts: 1956
Credit: 6,218,130
RAC: 0
Message 111554 - Posted: 13 Dec 2011, 14:11:49 UTC - in response to Message 111541.  

My guess is that the way BOINC calculates the maximum allowed elapsed time for a task to finish has changed with version 7.x.


No, it's worse - it's the new server code ("credit new") that does this. Although we are still using server-assigned credit here on Albert, the run time estimation etc. is handled by the new system. This is part of what we intend to test here.

BM
ID: 111554
Profile Bikeman (Heinz-Bernd Eggenstein)
Volunteer moderator
Project administrator
Project developer
Joined: 28 Aug 06
Posts: 1483
Credit: 1,864,017
RAC: 0
Message 111557 - Posted: 13 Dec 2011, 19:20:44 UTC - in response to Message 111554.  

oh....

Were you able to compensate for this effect? That is, will newly generated WUs have a chance to be computed in time without the manual editing I described above?

Thx,
HB
ID: 111557
Profile Bernd Machenschalk
Volunteer moderator
Project administrator
Project developer
Joined: 15 Oct 04
Posts: 1956
Credit: 6,218,130
RAC: 0
Message 111562 - Posted: 14 Dec 2011, 10:46:12 UTC - in response to Message 111557.  
Last modified: 14 Dec 2011, 14:09:37 UTC

So far I haven't received any of the help I asked the BOINC devs for, so I'm still analyzing and digging through the code myself.

Indeed it currently looks like some things have changed on both ends - client and server - and I still need to understand how these changes work together.

BM

Edit:

Hm, apparently rsc_fpops_est and rsc_fpops_bound pass through the server code unchanged; these are still the values written by the WUG...

In the Client there is:
    max_elapsed_time = rp->wup->rsc_fpops_bound/rp->avp->flops;

where rsc_fpops_bound should be what gets passed from the server, and avp->flops the "flops" of the "app version". Apparently that's also sent by the server for the app it sends.

Bikeman, what's that in your case? There must be something like

<app_version>
...
    <flops>62700155339.574905</flops>
...
</app_version>

in the client_state.xml you edited.
ID: 111562
Profile Bikeman (Heinz-Bernd Eggenstein)
Volunteer moderator
Project administrator
Project developer
Joined: 28 Aug 06
Posts: 1483
Credit: 1,864,017
RAC: 0
Message 111569 - Posted: 14 Dec 2011, 19:24:29 UTC - in response to Message 111562.  
Last modified: 14 Dec 2011, 19:27:21 UTC

Hi!

Let me see...

<app_version>
    <app_name>einsteinbinary_BRP4</app_name>
    <version_num>119</version_num>
    <platform>i686-pc-linux-gnu</platform>
    <avg_ncpus>0.150000</avg_ncpus>
    <max_ncpus>1.000000</max_ncpus>
    <flops>2402526409719.028320</flops>
    <plan_class>atiOpenCL</plan_class>


(but that is after I made the editing).

As different users see different cut-off times, I would have expected this to scale with the BOINC benchmark result for the individual graphics card. For mine, it seems to be this:

<coproc_ati>
   <count>1</count>
   <name>ATI Radeon HD 5800 series (Cypress)</name>
   <available_ram>1002438656.000000</available_ram>
   <have_cal>1</have_cal>
   <have_opencl>1</have_opencl>
   <peak_flops>4176000000000.000000</peak_flops>


CU
HB
ID: 111569
Profile Bernd Machenschalk
Volunteer moderator
Project administrator
Project developer
Joined: 15 Oct 04
Posts: 1956
Credit: 6,218,130
RAC: 0
Message 111570 - Posted: 14 Dec 2011, 20:12:58 UTC - in response to Message 111569.  
Last modified: 14 Dec 2011, 20:20:57 UTC

The "peak_flops" is not benchmarked. It's merely a theoretical upper bound derived basically from the number of "cores" times the clock frequency. Thus the peak_flops should be identical for similar devices.

According to the CreditNew description "The scheduler adjusts this [peak_flops], using the elapsed time statistics, to get the app_version.flops_est it sends to the client (from which job durations are estimated)."

BM
ID: 111570
Profile pragmatic prancing periodic problem child, left
Joined: 26 Jan 05
Posts: 1639
Credit: 70,000
RAC: 0
Message 111571 - Posted: 14 Dec 2011, 20:17:33 UTC - in response to Message 111569.  

<app_version>
    <app_name>einsteinbinary_BRP4</app_name>
    <version_num>119</version_num>
    <platform>i686-pc-linux-gnu</platform>
    <avg_ncpus>0.150000</avg_ncpus>
    <max_ncpus>1.000000</max_ncpus>
    <flops>2402526409719.028320</flops>
    <plan_class>atiOpenCL</plan_class>


Yes, let's see. Local stuff, from the HD4850
<app_version>
    <app_name>einsteinbinary_BRP4</app_name>
    <version_num>109</version_num>
    <platform>windows_intelx86</platform>
    <avg_ncpus>0.200000</avg_ncpus>
    <max_ncpus>1.000000</max_ncpus>
    <flops>33321667311.708328</flops>
    <plan_class>ATIOpenCL</plan_class>
    <api_version>6.13.8</api_version>

<coproc_ati>
   <count>1</count>
   <name>ATI Radeon HD 5800 series (Cypress)</name>
   <available_ram>1002438656.000000</available_ram>
   <have_cal>1</have_cal>
   <have_opencl>1</have_opencl>
   <peak_flops>4176000000000.000000</peak_flops>

For the actual card, I'll do the whole shebang:
<coproc_ati>
   <count>1</count>
   <name>ATI Radeon HD 4700/4800 (RV740/RV770)</name>
   <available_ram>1040187392.000000</available_ram>
   <have_cal>1</have_cal>
   <have_opencl>1</have_opencl>
   <peak_flops>2000000000000.000000</peak_flops>
   <CALVersion>1.4.1607</CALVersion>
   <target>5</target>
   <localRAM>1024</localRAM>
   <uncachedRemoteRAM>2047</uncachedRemoteRAM>
   <cachedRemoteRAM>2047</cachedRemoteRAM>
   <engineClock>625</engineClock>
   <memoryClock>900</memoryClock>
   <wavefrontSize>64</wavefrontSize>
   <numberOfSIMD>10</numberOfSIMD>
   <doublePrecision>1</doublePrecision>
   <pitch_alignment>256</pitch_alignment>
   <surface_alignment>4096</surface_alignment>
   <maxResource1DWidth>8192</maxResource1DWidth>
   <maxResource2DWidth>8192</maxResource2DWidth>
   <maxResource2DHeight>8192</maxResource2DHeight>
    <atirt_detected/>
   <coproc_opencl>
      <name>ATI RV770</name>
      <vendor>Advanced Micro Devices, Inc.</vendor>
      <vendor_id>4098</vendor_id>
      <available>1</available>
      <half_fp_config>0</half_fp_config>
      <single_fp_config>62</single_fp_config>
      <double_fp_config>63</double_fp_config>
      <endian_little>1</endian_little>
      <execution_capabilities>1</execution_capabilities>
      <extensions>cl_amd_fp64 cl_khr_gl_sharing cl_amd_device_attribute_query cl_khr_d3d10_sharing </extensions>
      <global_mem_size>1073741824</global_mem_size>
      <local_mem_size>16384</local_mem_size>
      <max_clock_frequency>625</max_clock_frequency>
      <max_compute_units>10</max_compute_units>
      <opencl_platform_version>OpenCL 1.1 AMD-APP-SDK-v2.5 (793.1)</opencl_platform_version>
      <opencl_device_version>OpenCL 1.0 AMD-APP-SDK-v2.5 (793.1)</opencl_device_version>
      <opencl_driver_version>CAL 1.4.1607</opencl_driver_version>
   </coproc_opencl>
</coproc_ati>


How come my flops are so low on the einsteinbinary? Only 33321667311 for Windows versus Heinz's 2402526409719 for Linux? Yeah, I get it, those are flops estimated by the server, but heck. When the peak flops my GPU can do is 2000000000000, or 60 times the flops estimate my GPU is getting, no wonder a) work is estimated so (s)low and b) I have a TDCF of 9.59! Plonk.
Jord.

BOINC FAQ Service

They say most of your brain shuts down in cryo-sleep. All but the primitive side, the animal side. No wonder I'm still awake.
ID: 111571
Profile Bikeman (Heinz-Bernd Eggenstein)
Volunteer moderator
Project administrator
Project developer
Joined: 28 Aug 06
Posts: 1483
Credit: 1,864,017
RAC: 0
Message 111572 - Posted: 14 Dec 2011, 21:40:30 UTC - in response to Message 111571.  

Hi!

This is quite a mission impossible from a BOINC perspective, isn't it? If the initial estimation is off by too much, the WUs will ALL get terminated prematurely, and the server will NEVER get a valid result to adjust its estimate of the computation performance, which is exactly what it needs to provide a good estimate of the max elapsed time in the first place!!

I think (dreaming) that it would be best if apps could optionally have a self-profiling mode. E.g. if the app info defined for the app in question includes a special tag (say ), then the BOINC client could call the app with all the WU command line items of a workunit plus the additional command line option --benchmark-only. The app would then do a short test run (app developers would know best how to do that) that returns an estimate of the runtime for the entire workunit.

HB



ID: 111572
Profile pragmatic prancing periodic problem child, left
Joined: 26 Jan 05
Posts: 1639
Credit: 70,000
RAC: 0
Message 111573 - Posted: 14 Dec 2011, 21:53:21 UTC - in response to Message 111572.  
Last modified: 14 Dec 2011, 22:08:06 UTC

This is quite a mission impossible from a BOINC perspective, isn't it? If the initial estimation is off too much, the WUs will ALL get terminated prematurely, and the server will NEVER get a valid result to adjust the estimation of the computation performance, which is actually needed to provide a good estimation for the max elapsed time in the first place!!

If no work ever validates, like in my situation, then the adjustment will also never happen. Why will the work never validate? That's still where I come up clueless - no hint from Oliver either. Where is he, by the way? Seems like he evaporated. ;-)

... The app would then do a short test run (app developers would know best how to do that) that returns an estimation of the runtime for the entire workunit.

Either that, or allow the user to set the flops value for all OpenCL-capable hardware. Currently it can only be set for CUDA work, and then only when using the anonymous platform.

But really, why is the flops value for my hardware set so low on the server, when the peak flops show that it can do way better? For the HD5800 the number of digits for the peak flops and the 'actual flops' is the same (13). For my HD4850 it's 13 digits for the peak flops and 11 for the actual flops. There's got to be something wrong there.

I think I'll get work, then exit BOINC, adjust the flops number in client_state.xml, restart BOINC and see if that will make a difference. Maybe that will even validate work.

Edit: hehe, I edited the flops value to <flops>1919403979592.269394</flops>, now all tasks think they'll run for 11 minutes. That's gonna wreck my DCF completely. ;-)

Maybe I should make it even less...
Jord.

BOINC FAQ Service

They say most of your brain shuts down in cryo-sleep. All but the primitive side, the animal side. No wonder I'm still awake.
ID: 111573



This material is based upon work supported by the National Science Foundation (NSF) under Grant PHY-0555655 and by the Max Planck Gesellschaft (MPG). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the investigators and do not necessarily reflect the views of the NSF or the MPG.

Copyright © 2024 Bruce Allen for the LIGO Scientific Collaboration