Message boards : News : Sending work
robertmiles · Joined: 16 Nov 11 · Posts: 19 · Credit: 4,468,368 · RAC: 0
I just read something related on the boinc_dev mailing list. It seems that BOINC 7.0.1 and 7.0.2 don't allocate enough digits in one of the places they store ATI version numbers, and are therefore likely to get at least some of the version numbers wrong.
Gaurav Khanna · Joined: 8 Nov 04 · Posts: 12 · Credit: 2,818,895 · RAC: 0
Since the upgrade to 7.0.2 I'm not getting any work for the GPUs:

08-Dec-2011 07:20:02 [Albert@Home] Sending scheduler request: To fetch work.
08-Dec-2011 07:20:02 [Albert@Home] Requesting new tasks for ATI
08-Dec-2011 07:20:05 [Albert@Home] Scheduler request completed: got 0 new tasks
08-Dec-2011 07:20:05 [Albert@Home] No tasks sent
08-Dec-2011 07:58:09 [Albert@Home] Sending scheduler request: To fetch work.
08-Dec-2011 07:58:09 [Albert@Home] Requesting new tasks for NVIDIA
08-Dec-2011 07:58:11 [Albert@Home] Scheduler request completed: got 0 new tasks
08-Dec-2011 07:58:11 [Albert@Home] No tasks sent

Host details are here: http://albert.phys.uwm.edu/show_host_detail.php?hostid=1396

Any thoughts?
Bernd Machenschalk (Volunteer moderator, Project administrator, Project developer) · Joined: 15 Oct 04 · Posts: 1956 · Credit: 6,218,130 · RAC: 0
Click on the "last contact" link to see the scheduler logs. In the case of your host I see:

2011-12-08 15:42:31.5128 [PID=32492] [version] Checking plan class 'atiOpenCL'
2011-12-08 15:42:31.5128 [PID=32492] [version] GPU RAM required min: 536870912.000000, supplied: 0
2011-12-08 15:42:31.5128 [PID=32492] [version] [AV#459] app_plan() returned false

Hm - looks like there's something wrong with the GPU RAM size reporting. I'll look into that tomorrow.

BM
Bernd Machenschalk (Volunteer moderator, Project administrator, Project developer) · Joined: 15 Oct 04 · Posts: 1956 · Credit: 6,218,130 · RAC: 0
Hm ... according to sched_request the ATI device isn't OpenCL capable:

<coproc_ati>
   <count>1</count>
   <name>ATI Radeon HD 5800 series (Cypress)</name>
   <available_ram>1024458752.000000</available_ram>
   <have_cal>1</have_cal>
   <have_opencl>0</have_opencl>
   <req_secs>0.000000</req_secs>
   <req_instances>0.000000</req_instances>
   <estimated_delay>0.000000</estimated_delay>
   <peak_flops>50000000000.000000</peak_flops>
   <CALVersion>1.4.815</CALVersion>
   <target>8</target>
   <localRAM>1024</localRAM>
   <uncachedRemoteRAM>1788</uncachedRemoteRAM>
   <cachedRemoteRAM>508</cachedRemoteRAM>
   <engineClock>0</engineClock>
   <memoryClock>0</memoryClock>
   <wavefrontSize>64</wavefrontSize>
   <numberOfSIMD>20</numberOfSIMD>
   <doublePrecision>1</doublePrecision>
   <pitch_alignment>256</pitch_alignment>
   <surface_alignment>256</surface_alignment>
   <maxResource1DWidth>16384</maxResource1DWidth>
   <maxResource2DWidth>16384</maxResource2DWidth>
   <maxResource2DHeight>16384</maxResource2DHeight>
   <atirt_detected/>
</coproc_ati>
STE/E · Joined: 18 Jan 05 · Posts: 144 · Credit: 7,886,269 · RAC: 0
> Hm ... according to sched_request the ATI device isn't OpenCL capable:

I don't think CAL version 1.4.815 is OpenCL capable ...

STE\/E
Gaurav Khanna · Joined: 8 Nov 04 · Posts: 12 · Credit: 2,818,895 · RAC: 0
Weird .. with BOINC 6.13 the same ATI GPU completed scores of work units. For example: http://albert.phys.uwm.edu/result.php?resultid=53182

And I use the ATI GPU for OpenCL development all the time ..
Gaurav Khanna · Joined: 8 Nov 04 · Posts: 12 · Credit: 2,818,895 · RAC: 0
What exactly does BOINC do to check for OpenCL?
Oliver Behnke (Volunteer moderator, Project administrator, Project developer) · Joined: 4 Sep 07 · Posts: 130 · Credit: 8,545,955 · RAC: 0
It is, but you have to install the SDK and register the OpenCL ICD yourself. Installing the driver is not sufficient.

Gaurav, have you changed anything in your setup?

Oliver
Oliver Behnke (Volunteer moderator, Project administrator, Project developer) · Joined: 4 Sep 07 · Posts: 130 · Credit: 8,545,955 · RAC: 0
> What exactly does BOINC do to check for OpenCL?

It uses libOpenCL via late binding to query a few basic properties. On 10.8 (as you use) you have to make sure it's available, as it's not installed automatically in the usual library paths. Better still, upgrade your driver to 11.7 (11.11 on Linux!): that version of the Catalyst driver installs the OpenCL runtime all by itself. Our app officially requires at least 11.7 anyway, as we build it using SDK 2.5.

Oliver
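To make "late binding" concrete: the idea is to load libOpenCL at runtime instead of linking against it at build time, so the client still starts on machines without an OpenCL runtime. The sketch below is an illustration of that pattern (not BOINC's actual code); `clGetPlatformIDs` is the real OpenCL entry point, while the surrounding logic and function name are mine.

```python
# Hypothetical sketch of a late-binding OpenCL probe: try to load the
# OpenCL ICD loader at runtime and ask it how many platforms it knows.
import ctypes
import ctypes.util

def opencl_available():
    """Return True if an OpenCL runtime can be loaded and reports platforms."""
    libname = ctypes.util.find_library("OpenCL")
    if libname is None:
        return False  # no libOpenCL in the library search path
    try:
        lib = ctypes.CDLL(libname)
    except OSError:
        return False  # e.g. wrong bitness: 32-bit app, 64-bit-only library
    num = ctypes.c_uint(0)
    # clGetPlatformIDs(0, NULL, &num) fills in the platform count
    if lib.clGetPlatformIDs(0, None, ctypes.byref(num)) != 0:
        return False
    return num.value > 0

print(opencl_available())
```

This also shows why Gaurav's later 32-bit/64-bit fix matters: the `CDLL` load fails if only a library of the wrong bitness is installed, and a late-binding probe then simply reports "no OpenCL".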
Gaurav Khanna · Joined: 8 Nov 04 · Posts: 12 · Credit: 2,818,895 · RAC: 0
Thanks Oliver, that was helpful. Figured it out .. it was a 64-bit vs. 32-bit issue. The ATI app is 32-bit, but the BOINC client I built is 64-bit, so I had to make both the 32-bit and 64-bit OpenCL libs available in my library path. It's working now (finished a result successfully): http://albert.phys.uwm.edu/result.php?resultid=54521
x3mEn · Joined: 21 Jun 11 · Posts: 9 · Credit: 10,000 · RAC: 0
Aborting task p2030.20100912.G57.94-00.24.S.b5s0g0.00000_416_1: exceeded elapsed time limit 6394.64 (2800000.00G/432.72G)

3 WUs were aborted after ~6395 sec of run time for the same reason:

p2030.20100912.G57.94-00.24.S.b5s0g0.00000_416_1 - 6,395.56 sec
p2030.20100913.G44.55+00.20.C.b5s0g0.00000_1824_2 - 6,395.53 sec
p2030.20100912.G57.94-00.24.S.b5s0g0.00000_744_0 - 6,395.32 sec

WTF?
Bikeman (Heinz-Bernd Eggenstein) (Volunteer moderator, Project administrator, Project developer) · Joined: 28 Aug 06 · Posts: 1483 · Credit: 1,864,017 · RAC: 0
Hi!

My guess is that the way BOINC calculates the maximum allowed elapsed time for a task to finish has changed with version 7.x, so the workunits would now have to be generated with a higher number of estimated/max floating-point operations per task. Because BRP4 workunits are all created equal, no matter on which platform they eventually get crunched, one value of estimated floating-point ops must be used for CPU, NVIDIA/CUDA and ATI/OpenCL (and all supported BOINC client versions). No trivial task.

Until this gets fixed on the server side for new workunits, one could theoretically work around it in the client_state.xml file to prevent more WUs erroring out (and wrecking your quota):

- Stop BOINC.
- Open the client_state.xml file in an editor.
- Replace occurrences of the following two lines

<rsc_fpops_est>140000000000000.000000</rsc_fpops_est>
<rsc_fpops_bound>2800000000000000.000000</rsc_fpops_bound>

with (say)

<rsc_fpops_est>1400000000000000.000000</rsc_fpops_est>
<rsc_fpops_bound>28000000000000000.000000</rsc_fpops_bound>

(You should check that only WUs of the Albert@Home project are changed if you use that BOINC instance for other projects as well.)

CU
HBE
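The manual edit above can also be sketched as a small script. This is an illustration, not an official BOINC tool: it assumes the usual flat layout of client_state.xml, in which each <workunit> element follows the <project> block it belongs to, and it addresses the multi-project caveat by only touching workunits under a project whose master URL contains "albert". Stop BOINC and back up the file before trying anything like this.

```python
# Sketch: multiply rsc_fpops_est/rsc_fpops_bound by 10 for Albert@Home
# workunits in client_state.xml (BOINC stopped, file backed up first).
import xml.etree.ElementTree as ET

def scale_fpops(path, project_substr="albert", factor=10):
    tree = ET.parse(path)
    current_url = ""
    for node in tree.getroot():
        if node.tag == "project":
            # remember which project the following entries belong to
            current_url = node.findtext("master_url", "")
        elif node.tag == "workunit" and project_substr in current_url:
            for tag in ("rsc_fpops_est", "rsc_fpops_bound"):
                elem = node.find(tag)
                if elem is not None:
                    elem.text = "%f" % (float(elem.text) * factor)
    tree.write(path)
```

Note that rewriting the file with a generic XML library can reorder whitespace; the hand edit HBE describes is the safer route on a live client.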
Bernd Machenschalk (Volunteer moderator, Project administrator, Project developer) · Joined: 15 Oct 04 · Posts: 1956 · Credit: 6,218,130 · RAC: 0
> My guess is that the way BOINC calculates the maximum allowed elapsed time for a task to finish has changed with version 7.x.

No, it's worse - it's the new server code ("CreditNew") that does this. Although we are still using server-assigned credit here on Albert, the run time estimation etc. is handled by the new system. This is part of what we intend to test here.

BM
Bikeman (Heinz-Bernd Eggenstein) (Volunteer moderator, Project administrator, Project developer) · Joined: 28 Aug 06 · Posts: 1483 · Credit: 1,864,017 · RAC: 0
oh.... Were you able to compensate for this effect? That is, will newly generated WUs have a chance to be computed in time without the manual editing I described above?

Thx,
HB
Bernd Machenschalk (Volunteer moderator, Project administrator, Project developer) · Joined: 15 Oct 04 · Posts: 1956 · Credit: 6,218,130 · RAC: 0
So far I didn't get any help from the BOINC devs that I asked, so I'm still analyzing and digging through the code myself. Indeed it currently looks like some things have changed on both ends - client and server - and I still need to understand how these changes work together.

BM

Edit: Hm, apparently rsc_fpops_est and rsc_fpops_bound pass through the server code unchanged; those are still the values written by the WUG. In the client there is:

max_elapsed_time = rp->wup->rsc_fpops_bound/rp->avp->flops;

where rsc_fpops_bound should be what it gets passed from the server, and avp->flops the "flops" of the app version. Apparently that's also sent by the server for the app it sends. Bikeman, what's that in your case? There must be something like

<app_version>
...
<flops>62700155339.574905</flops>
...
</app_version>

in the client_state.xml you edited.
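For concreteness, the quoted client line can be replayed against the numbers in x3mEn's abort message earlier in this thread ("exceeded elapsed time limit 6394.64 (2800000.00G/432.72G)"). The operands in that message are printed rounded for display, so the recomputed limit lands near, but not exactly at, the logged 6394.64 s.

```python
# max_elapsed_time = rsc_fpops_bound / app_version.flops, using the
# (display-rounded) operands from the abort message in this thread.
rsc_fpops_bound = 2800000.00e9   # 2800000.00 GFLOP per task
app_version_flops = 432.72e9     # 432.72 GFLOPS estimated app speed

max_elapsed_time = rsc_fpops_bound / app_version_flops
print(round(max_elapsed_time, 2))  # ~6470 s with the rounded operands
```

The key point survives the rounding: the limit scales inversely with the app version's flops estimate, so an overestimated flops value (or an underestimated rsc_fpops_bound) directly shrinks the time window before the client aborts a task.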
Bikeman (Heinz-Bernd Eggenstein) (Volunteer moderator, Project administrator, Project developer) · Joined: 28 Aug 06 · Posts: 1483 · Credit: 1,864,017 · RAC: 0
Hi!

Let me see...

<app_version>
   <app_name>einsteinbinary_BRP4</app_name>
   <version_num>119</version_num>
   <platform>i686-pc-linux-gnu</platform>
   <avg_ncpus>0.150000</avg_ncpus>
   <max_ncpus>1.000000</max_ncpus>
   <flops>2402526409719.028320</flops>
   <plan_class>atiOpenCL</plan_class>

(but that is after I made the edit). As different users see different cut-off times, I would have expected this to scale with the BOINC benchmark result for the individual graphics card. For mine, it seems to be this:

<coproc_ati>
   <count>1</count>
   <name>ATI Radeon HD 5800 series (Cypress)</name>
   <available_ram>1002438656.000000</available_ram>
   <have_cal>1</have_cal>
   <have_opencl>1</have_opencl>
   <peak_flops>4176000000000.000000</peak_flops>

CU
HB
Bernd Machenschalk (Volunteer moderator, Project administrator, Project developer) · Joined: 15 Oct 04 · Posts: 1956 · Credit: 6,218,130 · RAC: 0
The "peak_flops" is not benchmarked. It's merely a theoretical upper bound, derived basically from the number of "cores" times the clock frequency, so peak_flops should be identical for similar devices. According to the CreditNew description: "The scheduler adjusts this [peak_flops], using the elapsed time statistics, to get the app_version.flops_est it sends to the client (from which job durations are estimated)."

BM
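As a sanity check, the "cores times clock" rule can be tried against the HD4850 attributes Jord posts later in this thread (10 SIMDs, wavefront size 64, engineClock 625 MHz). The factor of 5 ALUs per VLIW5 thread processor is my assumption about how the client weights these pre-GCN ATI chips; with it, the arithmetic reproduces the reported peak_flops exactly.

```python
# Assumed shape of the client's theoretical ATI peak-FLOPS estimate:
# SIMDs x wavefront size x 5 ALUs (VLIW5) x engine clock.
number_of_simd = 10        # <numberOfSIMD> from the thread
wavefront_size = 64        # <wavefrontSize>
alus_per_thread = 5        # assumed VLIW5 weighting
engine_clock_hz = 625e6    # <engineClock> 625 MHz

peak_flops = number_of_simd * wavefront_size * alus_per_thread * engine_clock_hz
print(peak_flops)  # 2e12, matching <peak_flops>2000000000000.000000</peak_flops>
```

This also explains the earlier sched_request with engineClock 0: with no clock reported, no meaningful theoretical peak can be derived from these attributes.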
pragmatic prancing periodic problem child, left · Joined: 26 Jan 05 · Posts: 1639 · Credit: 70,000 · RAC: 0
> <app_version>
>    <app_name>einsteinbinary_BRP4</app_name>
>    <version_num>119</version_num>
>    <platform>i686-pc-linux-gnu</platform>
>    <avg_ncpus>0.150000</avg_ncpus>
>    <max_ncpus>1.000000</max_ncpus>
>    <flops>2402526409719.028320</flops>
>    <plan_class>atiOpenCL</plan_class>

Yes, let's see. Local stuff, from the HD4850:

<app_version>
   <app_name>einsteinbinary_BRP4</app_name>
   <version_num>109</version_num>
   <platform>windows_intelx86</platform>
   <avg_ncpus>0.200000</avg_ncpus>
   <max_ncpus>1.000000</max_ncpus>
   <flops>33321667311.708328</flops>
   <plan_class>ATIOpenCL</plan_class>
   <api_version>6.13.8</api_version>

<coproc_ati>
   <count>1</count>
   <name>ATI Radeon HD 5800 series (Cypress)</name>
   <available_ram>1002438656.000000</available_ram>
   <have_cal>1</have_cal>
   <have_opencl>1</have_opencl>
   <peak_flops>4176000000000.000000</peak_flops>

For the actual card, I'll do the whole shebang:

<coproc_ati>
   <count>1</count>
   <name>ATI Radeon HD 4700/4800 (RV740/RV770)</name>
   <available_ram>1040187392.000000</available_ram>
   <have_cal>1</have_cal>
   <have_opencl>1</have_opencl>
   <peak_flops>2000000000000.000000</peak_flops>
   <CALVersion>1.4.1607</CALVersion>
   <target>5</target>
   <localRAM>1024</localRAM>
   <uncachedRemoteRAM>2047</uncachedRemoteRAM>
   <cachedRemoteRAM>2047</cachedRemoteRAM>
   <engineClock>625</engineClock>
   <memoryClock>900</memoryClock>
   <wavefrontSize>64</wavefrontSize>
   <numberOfSIMD>10</numberOfSIMD>
   <doublePrecision>1</doublePrecision>
   <pitch_alignment>256</pitch_alignment>
   <surface_alignment>4096</surface_alignment>
   <maxResource1DWidth>8192</maxResource1DWidth>
   <maxResource2DWidth>8192</maxResource2DWidth>
   <maxResource2DHeight>8192</maxResource2DHeight>
   <atirt_detected/>
   <coproc_opencl>
      <name>ATI RV770</name>
      <vendor>Advanced Micro Devices, Inc.</vendor>
      <vendor_id>4098</vendor_id>
      <available>1</available>
      <half_fp_config>0</half_fp_config>
      <single_fp_config>62</single_fp_config>
      <double_fp_config>63</double_fp_config>
      <endian_little>1</endian_little>
      <execution_capabilities>1</execution_capabilities>
      <extensions>cl_amd_fp64 cl_khr_gl_sharing cl_amd_device_attribute_query cl_khr_d3d10_sharing </extensions>
      <global_mem_size>1073741824</global_mem_size>
      <local_mem_size>16384</local_mem_size>
      <max_clock_frequency>625</max_clock_frequency>
      <max_compute_units>10</max_compute_units>
      <opencl_platform_version>OpenCL 1.1 AMD-APP-SDK-v2.5 (793.1)</opencl_platform_version>
      <opencl_device_version>OpenCL 1.0 AMD-APP-SDK-v2.5 (793.1)</opencl_device_version>
      <opencl_driver_version>CAL 1.4.1607</opencl_driver_version>
   </coproc_opencl>
</coproc_ati>

How come my flops value on the einsteinbinary app is so low - only 33,321,667,311 for Windows versus Heinz's 2,402,526,409,719 for Linux? Yeah, I get it, those are flops estimated by the server, but heck: when the peak flops my GPU can do is 2,000,000,000,000, or 60 times the estimated amount, no wonder a) work is estimated so (s)low and b) I have a TDCF of 9.59!

Plonk.

Jord.
BOINC FAQ Service
They say most of your brain shuts down in cryo-sleep. All but the primitive side, the animal side. No wonder I'm still awake.
Bikeman (Heinz-Bernd Eggenstein) (Volunteer moderator, Project administrator, Project developer) · Joined: 28 Aug 06 · Posts: 1483 · Credit: 1,864,017 · RAC: 0
Hi!

This is quite a mission impossible from a BOINC perspective, isn't it? If the initial estimation is off too much, the WUs will ALL get terminated prematurely, and the server will NEVER get a valid result to adjust the estimation of the computation performance - which is exactly what it needs to provide a good estimation for the max elapsed time in the first place!!

I think (dreaming) that it would be best if apps could optionally be self-profiling. E.g. if the app info defined for the app in question includes a special tag, the app would then do a short test run (app developers would know best how to do that) that returns an estimation of the runtime for the entire workunit.

HB
pragmatic prancing periodic problem child, left · Joined: 26 Jan 05 · Posts: 1639 · Credit: 70,000 · RAC: 0
> This is quite a mission impossible from a BOINC perspective, isn't it? If the initial estimation is off too much, the WUs will ALL get terminated prematurely, and the server will NEVER get a valid result to adjust the estimation of the computation performance, which is actually needed to provide a good estimation for the max elapsed time in the first place!!

If no work ever validates, like in my situation, then the adjustment will also never happen. Why will the work never validate? That's still where I come up clueless - no hint from Oliver either. Where is he, by the way? Seems like he evaporated. ;-)

> ... The app would then do a short test run (app developers would know best how to do that) that returns an estimation of the runtime for the entire workunit.

Either that, or allow the user to set the amount of flops for all OpenCL-capable hardware. Now it can only be set for CUDA work, and then only when using the anonymous platform.

But really, why is the flops count for my hardware set so low on the server, when the peak flops show that it can do way better? For the HD5800 the number of digits for the peak flops and the 'actual flops' is the same (13). For my HD4850 it's 13 for the peak flops and 11 for the actual flops. There's got to be something wrong there.

I think I'll get work, then exit BOINC, adjust the flops number in client_state.xml, restart BOINC and see if that makes a difference. Maybe that will even validate work.

Edit: hehe, I edited the flops value to <flops>1919403979592.269394</flops>, and now all tasks think they'll run for 11 minutes. That's gonna wreck my DCF completely. ;-) Maybe I should make it even less...

Jord.
BOINC FAQ Service
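Jord's "11 minutes" can be cross-checked with numbers already in the thread: the original BRP4 rsc_fpops_est of 1.4e14 (from HBE's workaround post), the hand-edited <flops> value, and the TDCF of 9.59 Jord mentioned earlier. The assumption here is that the client multiplies the raw estimate (fpops_est / flops) by the duration correction factor before displaying it, which is how DCF is meant to work.

```python
# Cross-check: does rsc_fpops_est / edited_flops x DCF land near 11 minutes?
rsc_fpops_est = 140000000000000.0      # 1.4e14 FLOP per BRP4 task (from thread)
edited_flops = 1919403979592.269394    # Jord's hand-edited <flops>
dcf = 9.59                             # Jord's task duration correction factor

raw_estimate = rsc_fpops_est / edited_flops   # ~73 s without DCF
shown_estimate = raw_estimate * dcf           # ~700 s with DCF applied
print(round(shown_estimate / 60, 1))          # ~11.7 minutes
```

The numbers are consistent with what Jord saw, which also illustrates his closing worry: once tasks finish in far more than the estimated time, the client will inflate the DCF again, swinging all estimates in the other direction.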