Message boards : News : Sending work
robertmiles · Joined: 16 Nov 11 · Posts: 19 · Credit: 4,468,368 · RAC: 0
I just read something related on the boinc_dev mailing list. It seems that BOINC 7.0.1 and 7.0.2 don't allocate enough digits in one of the places they store ATI version numbers, and are therefore likely to get at least some of the version numbers wrong.
Gaurav Khanna · Joined: 8 Nov 04 · Posts: 12 · Credit: 2,818,895 · RAC: 0
Since the upgrade to 7.0.2 I'm not getting any work for the GPUs:

08-Dec-2011 07:20:02 [Albert@Home] Sending scheduler request: To fetch work.
08-Dec-2011 07:20:02 [Albert@Home] Requesting new tasks for ATI
08-Dec-2011 07:20:05 [Albert@Home] Scheduler request completed: got 0 new tasks
08-Dec-2011 07:20:05 [Albert@Home] No tasks sent
08-Dec-2011 07:58:09 [Albert@Home] Sending scheduler request: To fetch work.
08-Dec-2011 07:58:09 [Albert@Home] Requesting new tasks for NVIDIA
08-Dec-2011 07:58:11 [Albert@Home] Scheduler request completed: got 0 new tasks
08-Dec-2011 07:58:11 [Albert@Home] No tasks sent

Host details are here: http://albert.phys.uwm.edu/show_host_detail.php?hostid=1396

Any thoughts?
Bernd Machenschalk (Volunteer moderator, Project administrator, Project developer) · Joined: 15 Oct 04 · Posts: 1956 · Credit: 6,218,130 · RAC: 0
Click on the "last contact" link to see the scheduler logs. In the case of your host I see:

2011-12-08 15:42:31.5128 [PID=32492] [version] Checking plan class 'atiOpenCL'
2011-12-08 15:42:31.5128 [PID=32492] [version] GPU RAM required min: 536870912.000000, supplied: 0
2011-12-08 15:42:31.5128 [PID=32492] [version] [AV#459] app_plan() returned false

Hm - looks like there's something wrong with the GPU RAM size reporting. I'll look into that tomorrow.

BM
Bernd Machenschalk (Volunteer moderator, Project administrator, Project developer) · Joined: 15 Oct 04 · Posts: 1956 · Credit: 6,218,130 · RAC: 0
Hm ... according to sched_request the ATI device isn't OpenCL capable:

<coproc_ati>
   <count>1</count>
   <name>ATI Radeon HD 5800 series (Cypress)</name>
   <available_ram>1024458752.000000</available_ram>
   <have_cal>1</have_cal>
   <have_opencl>0</have_opencl>
   <req_secs>0.000000</req_secs>
   <req_instances>0.000000</req_instances>
   <estimated_delay>0.000000</estimated_delay>
   <peak_flops>50000000000.000000</peak_flops>
   <CALVersion>1.4.815</CALVersion>
   <target>8</target>
   <localRAM>1024</localRAM>
   <uncachedRemoteRAM>1788</uncachedRemoteRAM>
   <cachedRemoteRAM>508</cachedRemoteRAM>
   <engineClock>0</engineClock>
   <memoryClock>0</memoryClock>
   <wavefrontSize>64</wavefrontSize>
   <numberOfSIMD>20</numberOfSIMD>
   <doublePrecision>1</doublePrecision>
   <pitch_alignment>256</pitch_alignment>
   <surface_alignment>256</surface_alignment>
   <maxResource1DWidth>16384</maxResource1DWidth>
   <maxResource2DWidth>16384</maxResource2DWidth>
   <maxResource2DHeight>16384</maxResource2DHeight>
   <atirt_detected/>
</coproc_ati>
STE/E · Joined: 18 Jan 05 · Posts: 144 · Credit: 7,886,269 · RAC: 0
> Hm ... according to sched_request the ATI device isn't OpenCL capable:

I don't think CAL version 1.4.815 is OpenCL capable ...

STE\/E
Gaurav Khanna · Joined: 8 Nov 04 · Posts: 12 · Credit: 2,818,895 · RAC: 0
Weird .. with BOINC 6.13 the same ATI GPU completed scores of work units. For example: http://albert.phys.uwm.edu/result.php?resultid=53182

And I use the ATI GPU for OpenCL development all the time ..
Gaurav Khanna · Joined: 8 Nov 04 · Posts: 12 · Credit: 2,818,895 · RAC: 0
What exactly does BOINC do to check for OpenCL?
Oliver Behnke (Volunteer moderator, Project administrator, Project developer) · Joined: 4 Sep 07 · Posts: 130 · Credit: 8,545,955 · RAC: 0
It is, but you have to install the SDK and register the OpenCL ICD yourself. Installing the driver is not sufficient.

Gaurav, have you changed anything in your setup?

Oliver
Oliver Behnke (Volunteer moderator, Project administrator, Project developer) · Joined: 4 Sep 07 · Posts: 130 · Credit: 8,545,955 · RAC: 0
> What exactly does BOINC do to check for OpenCL?

It uses libOpenCL via late binding to query a few basic properties. On 10.8 (as you use) you have to make sure it's available, as it's not installed automatically in the usual library paths. Better still, upgrade your driver to 11.7 (11.11 on Linux!): that version of the Catalyst driver installs the OpenCL runtime all by itself. Our app officially requires at least 11.7 anyway, as we build it using SDK 2.5.

Oliver
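To make "late binding" concrete: the idea is to load libOpenCL at runtime instead of linking against it at build time, so the client still starts on machines without an OpenCL runtime. The sketch below is an illustration of that pattern (not BOINC's actual code); `clGetPlatformIDs` is the real OpenCL entry point, while the surrounding logic and function name are mine.

```python
# Hypothetical sketch of a late-binding OpenCL probe: try to load the
# OpenCL ICD loader at runtime and ask it how many platforms it knows.
import ctypes
import ctypes.util

def opencl_available():
    """Return True if an OpenCL runtime can be loaded and reports platforms."""
    libname = ctypes.util.find_library("OpenCL")
    if libname is None:
        return False  # no libOpenCL in the library search path
    try:
        lib = ctypes.CDLL(libname)
    except OSError:
        return False  # e.g. wrong bitness: 32-bit app, 64-bit-only library
    num = ctypes.c_uint(0)
    # clGetPlatformIDs(0, NULL, &num) fills in the platform count
    if lib.clGetPlatformIDs(0, None, ctypes.byref(num)) != 0:
        return False
    return num.value > 0

print(opencl_available())
```

This also shows why Gaurav's later 32-bit/64-bit fix matters: the `CDLL` load fails if only a library of the wrong bitness is installed, and a late-binding probe then simply reports "no OpenCL".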
Gaurav Khanna · Joined: 8 Nov 04 · Posts: 12 · Credit: 2,818,895 · RAC: 0
Thanks Oliver, that was helpful. Figured it out .. it was a 64-bit vs. 32-bit issue. The ATI app is 32-bit, but the BOINC client I built is 64-bit, so I had to make both the 32-bit and 64-bit OpenCL libs available in my library path. It's working now (finished a result successfully): http://albert.phys.uwm.edu/result.php?resultid=54521
x3mEn · Joined: 21 Jun 11 · Posts: 9 · Credit: 10,000 · RAC: 0
Aborting task p2030.20100912.G57.94-00.24.S.b5s0g0.00000_416_1: exceeded elapsed time limit 6394.64 (2800000.00G/432.72G)

3 WUs were aborted after ~6395 sec of run time for the same reason:

p2030.20100912.G57.94-00.24.S.b5s0g0.00000_416_1 - 6,395.56 sec
p2030.20100913.G44.55+00.20.C.b5s0g0.00000_1824_2 - 6,395.53 sec
p2030.20100912.G57.94-00.24.S.b5s0g0.00000_744_0 - 6,395.32 sec

WTF?
Bikeman (Heinz-Bernd Eggenstein) (Volunteer moderator, Project administrator, Project developer) · Joined: 28 Aug 06 · Posts: 1483 · Credit: 1,864,017 · RAC: 0
Hi!

My guess is that the way BOINC calculates the maximum allowed elapsed time for a task to finish has changed with version 7.x, so the workunits would now have to be generated with a higher number of estimated/max floating-point operations per task. Because BRP4 workunits are all created equal, no matter on which platform they eventually get crunched, one value of estimated floating-point ops must be used for CPU, NVIDIA/CUDA and ATI/OpenCL (and all supported BOINC client versions). No trivial task.

Until this gets fixed on the server side for new workunits, one could theoretically work around it in the client_state.xml file to prevent more WUs erroring out (and wrecking your quota):

- Stop BOINC.
- Open the client_state.xml file in an editor.
- Replace occurrences of the following two lines

<rsc_fpops_est>140000000000000.000000</rsc_fpops_est>
<rsc_fpops_bound>2800000000000000.000000</rsc_fpops_bound>

with (say)

<rsc_fpops_est>1400000000000000.000000</rsc_fpops_est>
<rsc_fpops_bound>28000000000000000.000000</rsc_fpops_bound>

(You should check that only WUs of the Albert@Home project are changed if you use that BOINC instance for other projects as well.)

CU
HBE
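The manual edit above can also be sketched as a small script. This is an illustration, not an official BOINC tool: it assumes the usual flat layout of client_state.xml, in which each <workunit> element follows the <project> block it belongs to, and it addresses the multi-project caveat by only touching workunits under a project whose master URL contains "albert". Stop BOINC and back up the file before trying anything like this.

```python
# Sketch: multiply rsc_fpops_est/rsc_fpops_bound by 10 for Albert@Home
# workunits in client_state.xml (BOINC stopped, file backed up first).
import xml.etree.ElementTree as ET

def scale_fpops(path, project_substr="albert", factor=10):
    tree = ET.parse(path)
    current_url = ""
    for node in tree.getroot():
        if node.tag == "project":
            # remember which project the following entries belong to
            current_url = node.findtext("master_url", "")
        elif node.tag == "workunit" and project_substr in current_url:
            for tag in ("rsc_fpops_est", "rsc_fpops_bound"):
                elem = node.find(tag)
                if elem is not None:
                    elem.text = "%f" % (float(elem.text) * factor)
    tree.write(path)
```

Note that rewriting the file with a generic XML library can reorder whitespace; the hand edit HBE describes is the safer route on a live client.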
Bernd Machenschalk (Volunteer moderator, Project administrator, Project developer) · Joined: 15 Oct 04 · Posts: 1956 · Credit: 6,218,130 · RAC: 0
> My guess is that the way BOINC calculates the maximum allowed elapsed time for a task to finish has changed with version 7.x.

No, it's worse - it's the new server code ("CreditNew") that does this. Although we are still using server-assigned credit here on Albert, the run time estimation etc. is handled by the new system. This is part of what we intend to test here.

BM
Bikeman (Heinz-Bernd Eggenstein) (Volunteer moderator, Project administrator, Project developer) · Joined: 28 Aug 06 · Posts: 1483 · Credit: 1,864,017 · RAC: 0
oh.... Were you able to compensate for this effect? That is, will newly generated WUs have a chance to be computed in time without the manual editing I described above?

Thx,
HB
Bernd Machenschalk (Volunteer moderator, Project administrator, Project developer) · Joined: 15 Oct 04 · Posts: 1956 · Credit: 6,218,130 · RAC: 0
So far I didn't get any help from the BOINC devs that I asked, so I'm still analyzing and digging through the code myself. Indeed it currently looks like some things have changed on both ends - client and server - and I still need to understand how these changes work together.

BM

Edit: Hm, apparently rsc_fpops_est and rsc_fpops_bound pass through the server code unchanged; those are still the values written by the WUG. In the client there is:

max_elapsed_time = rp->wup->rsc_fpops_bound/rp->avp->flops;

where rsc_fpops_bound should be what it gets passed from the server, and avp->flops the "flops" of the app version. Apparently that's also sent by the server for the app it sends. Bikeman, what's that in your case? There must be something like

<app_version>
...
<flops>62700155339.574905</flops>
...
</app_version>

in the client_state.xml you edited.
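For concreteness, the quoted client line can be replayed against the numbers in x3mEn's abort message earlier in this thread ("exceeded elapsed time limit 6394.64 (2800000.00G/432.72G)"). The operands in that message are printed rounded for display, so the recomputed limit lands near, but not exactly at, the logged 6394.64 s.

```python
# max_elapsed_time = rsc_fpops_bound / app_version.flops, using the
# (display-rounded) operands from the abort message in this thread.
rsc_fpops_bound = 2800000.00e9   # 2800000.00 GFLOP per task
app_version_flops = 432.72e9     # 432.72 GFLOPS estimated app speed

max_elapsed_time = rsc_fpops_bound / app_version_flops
print(round(max_elapsed_time, 2))  # ~6470 s with the rounded operands
```

The key point survives the rounding: the limit scales inversely with the app version's flops estimate, so an overestimated flops value (or an underestimated rsc_fpops_bound) directly shrinks the time window before the client aborts a task.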
Bikeman (Heinz-Bernd Eggenstein) (Volunteer moderator, Project administrator, Project developer) · Joined: 28 Aug 06 · Posts: 1483 · Credit: 1,864,017 · RAC: 0
Hi!

Let me see...

<app_version>
   <app_name>einsteinbinary_BRP4</app_name>
   <version_num>119</version_num>
   <platform>i686-pc-linux-gnu</platform>
   <avg_ncpus>0.150000</avg_ncpus>
   <max_ncpus>1.000000</max_ncpus>
   <flops>2402526409719.028320</flops>
   <plan_class>atiOpenCL</plan_class>

(but that is after I made the edit). As different users see different cut-off times, I would have expected this to scale with the BOINC benchmark result for the individual graphics card. For mine, it seems to be this:

<coproc_ati>
   <count>1</count>
   <name>ATI Radeon HD 5800 series (Cypress)</name>
   <available_ram>1002438656.000000</available_ram>
   <have_cal>1</have_cal>
   <have_opencl>1</have_opencl>
   <peak_flops>4176000000000.000000</peak_flops>

CU
HB
Bernd Machenschalk (Volunteer moderator, Project administrator, Project developer) · Joined: 15 Oct 04 · Posts: 1956 · Credit: 6,218,130 · RAC: 0
The "peak_flops" is not benchmarked. It's merely a theoretical upper bound, derived basically from the number of "cores" times the clock frequency, so peak_flops should be identical for similar devices. According to the CreditNew description: "The scheduler adjusts this [peak_flops], using the elapsed time statistics, to get the app_version.flops_est it sends to the client (from which job durations are estimated)."

BM
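As a sanity check, the "cores times clock" rule can be tried against the HD4850 attributes Jord posts later in this thread (10 SIMDs, wavefront size 64, engineClock 625 MHz). The factor of 5 ALUs per VLIW5 thread processor is my assumption about how the client weights these pre-GCN ATI chips; with it, the arithmetic reproduces the reported peak_flops exactly.

```python
# Assumed shape of the client's theoretical ATI peak-FLOPS estimate:
# SIMDs x wavefront size x 5 ALUs (VLIW5) x engine clock.
number_of_simd = 10        # <numberOfSIMD> from the thread
wavefront_size = 64        # <wavefrontSize>
alus_per_thread = 5        # assumed VLIW5 weighting
engine_clock_hz = 625e6    # <engineClock> 625 MHz

peak_flops = number_of_simd * wavefront_size * alus_per_thread * engine_clock_hz
print(peak_flops)  # 2e12, matching <peak_flops>2000000000000.000000</peak_flops>
```

This also explains the earlier sched_request with engineClock 0: with no clock reported, no meaningful theoretical peak can be derived from these attributes.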
pragmatic prancing periodic problem child, left · Joined: 26 Jan 05 · Posts: 1639 · Credit: 70,000 · RAC: 0
> <app_version>
>    <app_name>einsteinbinary_BRP4</app_name>
>    <version_num>119</version_num>
>    <platform>i686-pc-linux-gnu</platform>
>    <avg_ncpus>0.150000</avg_ncpus>
>    <max_ncpus>1.000000</max_ncpus>
>    <flops>2402526409719.028320</flops>
>    <plan_class>atiOpenCL</plan_class>

Yes, let's see. Local stuff, from the HD4850:

<app_version>
   <app_name>einsteinbinary_BRP4</app_name>
   <version_num>109</version_num>
   <platform>windows_intelx86</platform>
   <avg_ncpus>0.200000</avg_ncpus>
   <max_ncpus>1.000000</max_ncpus>
   <flops>33321667311.708328</flops>
   <plan_class>ATIOpenCL</plan_class>
   <api_version>6.13.8</api_version>

<coproc_ati>
   <count>1</count>
   <name>ATI Radeon HD 5800 series (Cypress)</name>
   <available_ram>1002438656.000000</available_ram>
   <have_cal>1</have_cal>
   <have_opencl>1</have_opencl>
   <peak_flops>4176000000000.000000</peak_flops>

For the actual card, I'll do the whole shebang:

<coproc_ati>
   <count>1</count>
   <name>ATI Radeon HD 4700/4800 (RV740/RV770)</name>
   <available_ram>1040187392.000000</available_ram>
   <have_cal>1</have_cal>
   <have_opencl>1</have_opencl>
   <peak_flops>2000000000000.000000</peak_flops>
   <CALVersion>1.4.1607</CALVersion>
   <target>5</target>
   <localRAM>1024</localRAM>
   <uncachedRemoteRAM>2047</uncachedRemoteRAM>
   <cachedRemoteRAM>2047</cachedRemoteRAM>
   <engineClock>625</engineClock>
   <memoryClock>900</memoryClock>
   <wavefrontSize>64</wavefrontSize>
   <numberOfSIMD>10</numberOfSIMD>
   <doublePrecision>1</doublePrecision>
   <pitch_alignment>256</pitch_alignment>
   <surface_alignment>4096</surface_alignment>
   <maxResource1DWidth>8192</maxResource1DWidth>
   <maxResource2DWidth>8192</maxResource2DWidth>
   <maxResource2DHeight>8192</maxResource2DHeight>
   <atirt_detected/>
   <coproc_opencl>
      <name>ATI RV770</name>
      <vendor>Advanced Micro Devices, Inc.</vendor>
      <vendor_id>4098</vendor_id>
      <available>1</available>
      <half_fp_config>0</half_fp_config>
      <single_fp_config>62</single_fp_config>
      <double_fp_config>63</double_fp_config>
      <endian_little>1</endian_little>
      <execution_capabilities>1</execution_capabilities>
      <extensions>cl_amd_fp64 cl_khr_gl_sharing cl_amd_device_attribute_query cl_khr_d3d10_sharing </extensions>
      <global_mem_size>1073741824</global_mem_size>
      <local_mem_size>16384</local_mem_size>
      <max_clock_frequency>625</max_clock_frequency>
      <max_compute_units>10</max_compute_units>
      <opencl_platform_version>OpenCL 1.1 AMD-APP-SDK-v2.5 (793.1)</opencl_platform_version>
      <opencl_device_version>OpenCL 1.0 AMD-APP-SDK-v2.5 (793.1)</opencl_device_version>
      <opencl_driver_version>CAL 1.4.1607</opencl_driver_version>
   </coproc_opencl>
</coproc_ati>

How come my flops value on the einsteinbinary app is so low - only 33,321,667,311 for Windows versus Heinz's 2,402,526,409,719 for Linux? Yeah, I get it, those are flops estimated by the server, but heck: when the peak flops my GPU can do is 2,000,000,000,000, or 60 times the estimated amount, no wonder a) work is estimated so (s)low and b) I have a TDCF of 9.59!

Plonk.

Jord.
BOINC FAQ Service
They say most of your brain shuts down in cryo-sleep. All but the primitive side, the animal side. No wonder I'm still awake.
Bikeman (Heinz-Bernd Eggenstein) (Volunteer moderator, Project administrator, Project developer) · Joined: 28 Aug 06 · Posts: 1483 · Credit: 1,864,017 · RAC: 0
Hi!

This is quite a mission impossible from a BOINC perspective, isn't it? If the initial estimation is off too much, the WUs will ALL get terminated prematurely, and the server will NEVER get a valid result to adjust the estimation of the computation performance - which is exactly what it needs to provide a good estimation for the max elapsed time in the first place!!

I think (dreaming) that it would be best if apps could optionally be self-profiling. E.g. if the app info defined for the app in question includes a special tag, the app would then do a short test run (app developers would know best how to do that) that returns an estimation of the runtime for the entire workunit.

HB
pragmatic prancing periodic problem child, left · Joined: 26 Jan 05 · Posts: 1639 · Credit: 70,000 · RAC: 0
> This is quite a mission impossible from a BOINC perspective, isn't it? If the initial estimation is off too much, the WUs will ALL get terminated prematurely, and the server will NEVER get a valid result to adjust the estimation of the computation performance, which is actually needed to provide a good estimation for the max elapsed time in the first place!!

If no work ever validates, like in my situation, then the adjustment will also never happen. Why will the work never validate? That's still where I come up clueless - no hint from Oliver either. Where is he, by the way? Seems like he evaporated. ;-)

> ... The app would then do a short test run (app developers would know best how to do that) that returns an estimation of the runtime for the entire workunit.

Either that, or allow the user to set the amount of flops for all OpenCL-capable hardware. Now it can only be set for CUDA work, and then only when using the anonymous platform.

But really, why is the flops count for my hardware set so low on the server, when the peak flops show that it can do way better? For the HD5800 the number of digits for the peak flops and the 'actual flops' is the same (13). For my HD4850 it's 13 for the peak flops and 11 for the actual flops. There's got to be something wrong there.

I think I'll get work, then exit BOINC, adjust the flops number in client_state.xml, restart BOINC and see if that makes a difference. Maybe that will even validate work.

Edit: hehe, I edited the flops value to <flops>1919403979592.269394</flops>, and now all tasks think they'll run for 11 minutes. That's gonna wreck my DCF completely. ;-) Maybe I should make it even less...

Jord.
BOINC FAQ Service
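Jord's "11 minutes" can be cross-checked with numbers already in the thread: the original BRP4 rsc_fpops_est of 1.4e14 (from HBE's workaround post), the hand-edited <flops> value, and the TDCF of 9.59 Jord mentioned earlier. The assumption here is that the client multiplies the raw estimate (fpops_est / flops) by the duration correction factor before displaying it, which is how DCF is meant to work.

```python
# Cross-check: does rsc_fpops_est / edited_flops x DCF land near 11 minutes?
rsc_fpops_est = 140000000000000.0      # 1.4e14 FLOP per BRP4 task (from thread)
edited_flops = 1919403979592.269394    # Jord's hand-edited <flops>
dcf = 9.59                             # Jord's task duration correction factor

raw_estimate = rsc_fpops_est / edited_flops   # ~73 s without DCF
shown_estimate = raw_estimate * dcf           # ~700 s with DCF applied
print(round(shown_estimate / 60, 1))          # ~11.7 minutes
```

The numbers are consistent with what Jord saw, which also illustrates his closing worry: once tasks finish in far more than the estimated time, the client will inflate the DCF again, swinging all estimates in the other direction.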