WARNING: This website is obsolete! Please follow this link to get to the new Albert@Home website!

Posts by Claggy

21) Message boards : Problems and Bug Reports : Errors - 197 (0xc5) EXIT_TIME_LIMIT_EXCEEDED (Message 113123)
Posted 23 Jun 2014 by Claggy
Post:
We're in the middle of Boinc server software testing here (see the news threads). The rsc_fpops_bound is O.K.; the server is supplying ridiculous speed estimates for the initial tasks.

Claggy
22) Message boards : Problems and Bug Reports : question about "quorum" for units (Message 113111)
Posted 19 Jun 2014 by Claggy
Post:
It will be sent in time; the scheduler just has to wait for the right moment to send it, and it won't necessarily send both tasks at the same time.

Claggy
23) Message boards : News : Project server code update (Message 113086)
Posted 18 Jun 2014 by Claggy
Post:
So you're saying that a host which has a very low actual throughput, relative to its marketing rating, will 'claim high' for credit?


My HD7770 against another HD7770 (3,215):

https://albert.phys.uwm.edu/workunit.php?wuid=620885

My HD7770 against another HD7770 (4,555):

https://albert.phys.uwm.edu/workunit.php?wuid=618068

against a HD 7500/7600/8500/8600 series (2,927):

https://albert.phys.uwm.edu/workunit.php?wuid=620828

against a HD 5800/5900 series (2,897):

https://albert.phys.uwm.edu/workunit.php?wuid=620875

against a HD 6900 series (3,409):

https://albert.phys.uwm.edu/workunit.php?wuid=619539

against a GeForce G210 (3,218):

https://albert.phys.uwm.edu/workunit.php?wuid=620250

against a 8800GTX (3,013):

https://albert.phys.uwm.edu/workunit.php?wuid=619497

against a 8800GTX (4,890):

https://albert.phys.uwm.edu/workunit.php?wuid=617804

against a 9600 GT (3,525):

https://albert.phys.uwm.edu/workunit.php?wuid=618083

against a 9600 GT (3,258):

https://albert.phys.uwm.edu/workunit.php?wuid=618072

against a 9600 GT (3,374):

https://albert.phys.uwm.edu/workunit.php?wuid=618075

against a 9600 GT (3,441):

https://albert.phys.uwm.edu/workunit.php?wuid=618080

against a NVS 4200M (4,598):

https://albert.phys.uwm.edu/workunit.php?wuid=606864

against a GT 555M (4,229):

https://albert.phys.uwm.edu/workunit.php?wuid=612309

against a GTX 670M (3,388):

https://albert.phys.uwm.edu/workunit.php?wuid=617797

against a GTX 680 (3,363)

https://albert.phys.uwm.edu/workunit.php?wuid=617769

Against Richard's GTX670 (all around 2400):

https://albert.phys.uwm.edu/workunit.php?wuid=620884

https://albert.phys.uwm.edu/workunit.php?wuid=620851

https://albert.phys.uwm.edu/workunit.php?wuid=620495

https://albert.phys.uwm.edu/workunit.php?wuid=620346

I guess AMDs, legacy NVs, and modern mobile NVs have a high peak flops rating relative to their actual throughput.
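That pattern would fit CreditNew-style accounting, where a result's raw claim scales with the device's *advertised* peak flops rather than its delivered throughput. A minimal sketch of that relationship (illustrative numbers and function name, not the actual server code):

```python
# Sketch of why low-efficiency devices "claim high" when the raw claim
# is elapsed time x advertised peak flops (illustrative, not server code).

def raw_claim(elapsed_secs, peak_flops):
    """Peak FLOP count: elapsed time times the device's advertised peak speed."""
    return elapsed_secs * peak_flops

# Two hypothetical GPUs finishing the same workunit in the same time
# (actual work done is identical; only the marketing rating differs):
efficient   = raw_claim(elapsed_secs=1200, peak_flops=1.075e12)  # GTX 460-class rating
inefficient = raw_claim(elapsed_secs=1200, peak_flops=3.584e12)  # HD 7770-class rating

# Same runtime, ~3.3x larger raw claim for the higher-rated card.
print(inefficient / efficient)
```

So a card whose real throughput lags its GFLOPS rating inflates every claim it co-validates.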

Claggy
24) Message boards : News : Project server code update (Message 113066)
Posted 18 Jun 2014 by Claggy
Post:
I've started documenting the wingmates who co-validate my 'high outlier' credit scores, but no pattern has emerged yet.

Validated with different app versions, like x86 on one and x64 on another?

I've been running a number of CPU hosts on and off for months, mostly ARM. Before the upgrade the best app, i.e. the Neon app, was the only one sent to my ARM hosts, unless I aborted tasks to drive the Max tasks per day down low enough.
(My 2012 HTC One S and the 1.43 Neon app only produced validate errors, and the scheduler wouldn't send the 1.43 VFP app unless I did that; it completed 5 of those O.K.)
The 1.44 Neon app is good though, and has completed over 200 now with hardly a problem; no more VFP tasks have been sent.
The two Parallellas were only doing Neon tasks beforehand; afterwards they started picking up non-Neon tasks. They are at 11 and 10 validations so far for non-Neon, and 21 and 23 for Neon.
The 2012 Nexus 7 had done only Neon tasks beforehand; afterwards it picked up VFP tasks, and has done 8 of those against 37 Neon. The VFP app is about half the speed of the Neon app.
The C2D T8100 Linux x64 host beforehand only picked up x64 BRP tasks; it's completed 271 so far, and SSE2 x86 tasks have never been sent.

On the HD7770 it picked up 1.34 windows_x86_64 (BRP4G-opencl-ati) and 1.34 windows_intelx86 (BRP4G-opencl-ati) work. These are different apps, with different file sizes; the x86_64 app had some validations from beforehand.
Afterwards they were failing with max time exceeded errors for a few days, and the x86 work got sent when the x64 Max tasks per day got too low.
Since the x64 tasks got a reasonable speed estimate, their tasks complete O.K.; x86 tasks haven't been sent again, and I have no idea which app is faster.
It looks as if there are scheduler differences between sending CPU and GPU apps; I would have expected some x86 work to be sent.

It's similar with the Perseus Arm Survey: I've had work from 1.39 windows_x86_64 (BRP5-opencl-ati) and 1.39 windows_intelx86 (BRP5-opencl-ati). The x64 app has some validations, the x86 none.
While you can't tell the difference from the tasks page, Boinc Manager shows different duration estimates for the two: 33 secs for x64, 20 secs for x86. stderr.txt doesn't seem to tell them apart.

CreditNew seems to use different calculations depending on whether an app version is above or below a sample level; could it be that one app version is above the sample level and the other isn't?

Claggy
25) Message boards : News : Project server code update (Message 113035)
Posted 17 Jun 2014 by Claggy
Post:
Unfortunately I missed the server log for a fetch - just got a 'report only' RPC instead. Could you grab a log if it does another work_fetch, please?

I did another request, and suspended network:

https://albert.phys.uwm.edu/host_sched_logs/8/8143

Claggy

[version] [AV#911] (FGRPopencl-ati) adjusting projected flops based on PFC avg: 2950.33G


That's not teraflops (speed); that's the peak flop count, as in the number of operations.

(verifying in code now)

*scratch that* looks broken, walking the lot with beer


Boinc startup says:

17/06/2014 18:17:17 | | CAL: ATI GPU 0: AMD Radeon HD 7700 series (Capeverde) (CAL version 1.4.1848, 1024MB, 984MB available, 3584 GFLOPS peak)
17/06/2014 18:17:17 | | OpenCL: AMD/ATI GPU 0: AMD Radeon HD 7700 series (Capeverde) (driver version 1348.5 (VM), device version OpenCL 1.2 AMD-APP (1348.5), 1024MB, 984MB available, 3584 GFLOPS peak)
17/06/2014 18:17:17 | | OpenCL CPU: Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz (OpenCL driver vendor: Advanced Micro Devices, Inc., driver version 1348.5 (sse2,avx), device version OpenCL 1.2 AMD-APP (1348.5))

The GTX460 always had a much lower GFLOPS peak value, but was a lot more effective at Seti v6, v7 and AP v6; the exceptions being here and the OpenCL Gamma-ray pulsar search #3 1.07 app, where the HD7770 was a little faster:

https://albert.phys.uwm.edu/host_app_versions.php?hostid=8143

Gamma-ray pulsar search #3 1.07 windows_x86_64 (FGRPopencl-ati)
Number of tasks completed 13
Max tasks per day 45
Number of tasks today 0
Consecutive valid tasks 13
Average processing rate 3.55 GFLOPS
Average turnaround time 0.37 days

Gamma-ray pulsar search #3 1.07 windows_x86_64 (FGRPopencl-nvidia)
Number of tasks completed 12
Max tasks per day 44
Number of tasks today 0
Consecutive valid tasks 12
Average processing rate 2.87 GFLOPS
Average turnaround time 0.88 days


http://boinc.berkeley.edu/dev/forum_thread.php?id=8767&postid=51659

04/12/2013 21:25:07 | | CUDA: NVIDIA GPU 0: GeForce GTX 460 (driver version 331.58, CUDA version 6.0, compute capability 2.1, 1024MB, 854MB available, 1075 GFLOPS peak)
04/12/2013 21:25:07 | | CAL: ATI GPU 0: AMD Radeon HD 7700 series (Capeverde) (CAL version 1.4.1848, 1024MB, 984MB available, 3584 GFLOPS peak)
04/12/2013 21:25:07 | | OpenCL: NVIDIA GPU 0: GeForce GTX 460 (driver version 331.58, device version OpenCL 1.1 CUDA, 1024MB, 854MB available, 1075 GFLOPS peak)
04/12/2013 21:25:07 | | OpenCL: AMD/ATI GPU 0: AMD Radeon HD 7700 series (Capeverde) (driver version 1348.4 (VM), device version OpenCL 1.2 AMD-APP (1348.4), 1024MB, 984MB available, 3584 GFLOPS peak)
04/12/2013 21:25:07 | | OpenCL CPU: Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz (OpenCL driver vendor: Advanced Micro Devices, Inc., driver version 1348.4 (sse2,avx), device version OpenCL 1.2 AMD-APP (1348.4))

Claggy
26) Message boards : News : Project server code update (Message 113029)
Posted 17 Jun 2014 by Claggy
Post:
Unfortunately I missed the server log for a fetch - just got a 'report only' RPC instead. Could you grab a log if it does another work_fetch, please?

I did another request, and suspended network:

https://albert.phys.uwm.edu/host_sched_logs/8/8143

Claggy
27) Message boards : News : Project server code update (Message 113025)
Posted 17 Jun 2014 by Claggy
Post:
For your info, my i7-2600K/HD7770 is now picking up Gamma-ray pulsar search #3 tasks. The initial CPU estimates look O.K. at 4 hrs 55 mins; the ATI estimates are at 5 seconds.
(This application type has CPU, Nvidia, ATI and Intel apps across Windows, Mac and Linux (But no Intel app on Linux))

Claggy


Whetstone, flops and rsc_fpops_est for GPU and CPU?

edit: 'please' - sorry ::)


CPU p_fpops is 4514900817.923695

HD7770 peak_flops is 3584000000000.000000

flops for the CPU app_version of hsgamma_FGRP3 is 845960315.482654

flops for the ATI GPU app_version of hsgamma_FGRP3 is 2950327174499.708000

rsc_fpops_est is 15000000000000.000000, with rsc_fpops_bound at 300000000000000.000000
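Those figures account for the durations in the scheduler log: the unscaled estimate is rsc_fpops_est divided by the projected app_version flops, and the max time comes from rsc_fpops_bound the same way. A quick arithmetic check against the values quoted above:

```python
# Check the scheduler's duration figures from the values quoted above.
rsc_fpops_est   = 15_000_000_000_000.0    # 15e12 ops
rsc_fpops_bound = 300_000_000_000_000.0   # 300e12 ops
ati_flops       = 2_950_327_174_499.708   # projected flops for the ATI app_version

est_duration = rsc_fpops_est / ati_flops    # the log's "unscaled 5.08"
max_time     = rsc_fpops_bound / ati_flops  # the log's "max time 101.68s"

print(round(est_duration, 2), round(max_time, 2))  # 5.08 101.68
```

With the projected flops inflated to ~2950 GFLOPS, a task that really needs an hour blows through that 101-second limit immediately.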

With a Gamma-ray pulsar search #3-only request I got:

https://albert.phys.uwm.edu/host_sched_logs/8/8143

2014-06-17 17:18:23.1994 [PID=2155 ] [send] CPU: req 8330.13 sec, 0.00 instances; est delay 0.00
2014-06-17 17:18:23.1995 [PID=2155 ] [send] AMD/ATI GPU: req 8692.21 sec, 0.00 instances; est delay 0.00
2014-06-17 17:18:23.1995 [PID=2155 ] [send] work_req_seconds: 8330.13 secs
2014-06-17 17:18:23.1995 [PID=2155 ] [send] available disk 95.78 GB, work_buf_min 95040
2014-06-17 17:18:23.1995 [PID=2155 ] [send] on_frac 0.923624 active_frac 0.985800 gpu_active_frac 0.984082
2014-06-17 17:18:23.1995 [PID=2155 ] [send] CPU features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 sse4_2 popcnt aes syscall nx lm vmx tm2 pbe
2014-06-17 17:18:23.3103 [PID=2155 ] [mixed] sending locality work first


2014-06-17 17:18:23.3223 [PID=2155 ] [version] get_app_version(): getting app version for WU#604131 (LATeah0109C_32.0_0_-1.48e-10) appid:30
2014-06-17 17:18:23.3223 [PID=2155 ] [version] looking for version of hsgamma_FGRP3
2014-06-17 17:18:23.3224 [PID=2155 ] [version] Checking plan class 'FGRPopencl-ati'
2014-06-17 17:18:23.3234 [PID=2155 ] [version] reading plan classes from file '/BOINC/projects/AlbertAtHome/plan_class_spec.xml'
2014-06-17 17:18:23.3234 [PID=2155 ] [version] plan_class_spec: parsed project prefs setting 'gpu_util_fgrp' : true : 1.000000
2014-06-17 17:18:23.3234 [PID=2155 ] [version] [AV#911] (FGRPopencl-ati) adjusting projected flops based on PFC avg: 2950.33G
2014-06-17 17:18:23.3234 [PID=2155 ] [version] Best app version is now AV911 (85.84 GFLOP)
2014-06-17 17:18:23.3235 [PID=2155 ] [version] Checking plan class 'FGRPopencl-intel_gpu'
2014-06-17 17:18:23.3235 [PID=2155 ] [version] plan_class_spec: parsed project prefs setting 'gpu_util_fgrp' : true : 1.000000
2014-06-17 17:18:23.3235 [PID=2155 ] [version] [version] No Intel GPUs found
2014-06-17 17:18:23.3235 [PID=2155 ] [version] [AV#912] app_plan() returned false
2014-06-17 17:18:23.3235 [PID=2155 ] [version] Checking plan class 'FGRPopencl-nvidia'
2014-06-17 17:18:23.3235 [PID=2155 ] [version] plan_class_spec: parsed project prefs setting 'gpu_util_fgrp' : true : 1.000000
2014-06-17 17:18:23.3235 [PID=2155 ] [version] plan_class_spec: No NVIDIA GPUs found
2014-06-17 17:18:23.3235 [PID=2155 ] [version] [AV#925] app_plan() returned false
2014-06-17 17:18:23.3235 [PID=2155 ] [version] [AV#911] (FGRPopencl-ati) adjusting projected flops based on PFC avg: 2950.33G
2014-06-17 17:18:23.3235 [PID=2155 ] [version] Best version of app hsgamma_FGRP3 is [AV#911] (2950.33 GFLOPS)
2014-06-17 17:18:23.3236 [PID=2155 ] [send] est delay 0, skipping deadline check
2014-06-17 17:18:23.3264 [PID=2155 ] [send] Sending app_version hsgamma_FGRP3 7 111 FGRPopencl-ati; projected 2950.33 GFLOPS
2014-06-17 17:18:23.3265 [PID=2155 ] [CRITICAL] No filename found in [WU#604131 LATeah0109C_32.0_0_-1.48e-10]
2014-06-17 17:18:23.3265 [PID=2155 ] [send] est. duration for WU 604131: unscaled 5.08 scaled 5.59
2014-06-17 17:18:23.3265 [PID=2155 ] [send] [HOST#8143] sending [RESULT#1450173 LATeah0109C_32.0_0_-1.48e-10_1] (est. dur. 5.59s (0h00m05s59)) (max time 101.68s (0h01m41s68))
2014-06-17 17:18:23.3291 [PID=2155 ] [locality] send_old_work(LATeah0109C_32.0_0_-1.48e-10_1) sent result created 344.0 hours ago [RESULT#1450173]
2014-06-17 17:18:23.3291 [PID=2155 ] [locality] Note: sent NON-LOCALITY result LATeah0109C_32.0_0_-1.48e-10_1
2014-06-17 17:18:23.3292 [PID=2155 ] [locality] send_results_for_file(h1_0997.00_S6Direct)
2014-06-17 17:18:23.3365 [PID=2155 ] [locality] in_send_results_for_file(h1_0997.00_S6Direct, 0) prev_result.id=1488887

Claggy
28) Message boards : News : Project server code update (Message 113019)
Posted 17 Jun 2014 by Claggy
Post:
For your info, my i7-2600K/HD7770 is now picking up Gamma-ray pulsar search #3 tasks. The initial CPU estimates look O.K. at 4 hrs 55 mins; the ATI estimates are at 5 seconds.
(This application type has CPU, Nvidia, ATI and Intel apps across Windows, Mac and Linux (But no Intel app on Linux))

All Gamma-ray pulsar search #3 tasks for computer 8143

Claggy
29) Message boards : News : Web code updated (Message 112982)
Posted 16 Jun 2014 by Claggy
Post:
Hm, I can't reproduce that. For me it shows my name and "log out" as expected...

You're an administrator; the rest of us don't need to be logged onto that page, and can't log on because we aren't administrators.

Claggy
30) Message boards : Problems and Bug Reports : 'User aborted' (Message 112963)
Posted 15 Jun 2014 by Claggy
Post:
The event log is too small, it only goes back to the 13th (I'm running some 40 projects). The same PC is running Einstein on its HD 4000 IGP since...

Look at stdoutdae.txt or stdoutdae.old in your Boinc Data directory; you'll find they go back further.

Claggy
31) Message boards : Problems and Bug Reports : 'User aborted' (Message 112960)
Posted 14 Jun 2014 by Claggy
Post:
Hm,

201 (0xc9) EXIT_MISSING_COPROC,

https://albert.phys.uwm.edu/result.php?resultid=1490485

I wonder if the client aborted them, and there's a mismatch between what the client says and what the web code reports.

What does the Event log say?

Claggy
32) Message boards : Problems and Bug Reports : 'User aborted' (Message 112958)
Posted 14 Jun 2014 by Claggy
Post:
Your computers are hidden, so there is no evidence of what is happening:

https://albert.phys.uwm.edu/show_user.php?userid=108127

Claggy
33) Message boards : News : Project server code update (Message 112956)
Posted 14 Jun 2014 by Claggy
Post:
Attached a new host to Albert; looking through the logs I keep getting the following download error:

14-Jun-2014 06:06:32 [Albert@Home] Started download of eah_slide_05.png
14-Jun-2014 06:06:32 [Albert@Home] Started download of eah_slide_07.png
14-Jun-2014 06:06:32 [Albert@Home] Started download of eah_slide_08.png
14-Jun-2014 06:06:33 [Albert@Home] Finished download of eah_slide_07.png
14-Jun-2014 06:06:33 [Albert@Home] Started download of EatH_mastercat_1344952579.txt
14-Jun-2014 06:06:34 [Albert@Home] Finished download of eah_slide_05.png
14-Jun-2014 06:06:34 [Albert@Home] Finished download of eah_slide_08.png
14-Jun-2014 06:06:34 [Albert@Home] Giving up on download of EatH_mastercat_1344952579.txt: permanent HTTP error

On this new host (as well as on my HD7770) I'm still getting the very short estimates for Perseus Arm Survey GPU tasks, so I've added two zeros to the rsc_fpops values so they'll complete.
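Appending two zeros multiplies rsc_fpops_est by 100, which stretches the client's duration estimate by the same factor, since the estimate is roughly rsc_fpops_est divided by the projected flops. A sketch with illustrative numbers (the flops and rsc_fpops_est values below are made up, not this host's actual figures):

```python
# Effect of "adding two zeros" to rsc_fpops_est on the duration estimate
# (estimate ~= rsc_fpops_est / projected flops). Illustrative values only.

projected_flops = 3.0e12             # hypothetical inflated projection
original_est    = 4.5e11             # hypothetical rsc_fpops_est
padded_est      = original_est * 100 # two zeros appended

print(original_est / projected_flops)  # 0.15 s: far too short to finish
print(padded_est / projected_flops)    # 15.0 s: 100x longer estimate
```

The same factor of 100 applies to rsc_fpops_bound, which is what keeps the tasks from hitting the max-time abort.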

Computer 11441

Claggy
34) Message boards : News : Web code updated (Message 112954)
Posted 14 Jun 2014 by Claggy
Post:
Server status page should work again.

BM

It shows "Log In" at the top right, but I'm already logged in. I think it's supposed to show my username and "Log out". Clicking on it indeed prompts me to log in (again).

Seti Beta had that a year or two ago; I believe they removed/hid it.

Claggy
35) Message boards : News : Web code updated (Message 112950)
Posted 13 Jun 2014 by Claggy
Post:
I'd noticed that the website availability had been erratic earlier today. Not a problem in itself, and I can work round some of the odder side effects.

But I also got a few download errors on workunits around the same time. They showed as 'download error' on GW (CasA) data files like 'h1_0072.55_S6Direct'. The actual error code was

ERR_HTTP_PERMANENT  -224
    // represents HTTP 404 or 416 error

I'm still getting a fair amount of those:

<message>
WU download error: couldn't get input files:
<file_xfer_error>
<file_name>h1_1000.20_S6Direct</file_name>
<error_code>-224 (permanent HTTP error)</error_code>
<error_message>permanent HTTP error</error_message>
</file_xfer_error>
<file_xfer_error>
<file_name>l1_1000.20_S6Direct</file_name>
<error_code>-224 (permanent HTTP error)</error_code>
<error_message>permanent HTTP error</error_message>
</file_xfer_error>

https://albert.phys.uwm.edu/result.php?resultid=1488062

Claggy
36) Message boards : News : Web code updated (Message 112937)
Posted 12 Jun 2014 by Claggy
Post:
...but still can't get work for my intel GPU mac mini. I went to see if there was work available, but that page is blown up right now.

The only Mac intel_gpu app deployed here is for Gamma-ray pulsar search #3, and as far as I know they aren't producing work for it at the moment.

Albert applications

Claggy
37) Message boards : News : Project server code update (Message 112924)
Posted 11 Jun 2014 by Claggy
Post:
I tried to get those BRP (Arecibo, GPU) tasks resent, but got them expired instead (I had 'use ATI GPU' set to No), so I managed to get fresh GPU tasks, a mixture of BRP (Arecibo, GPU) and BRP (Perseus Arm Survey).
The (Arecibo, GPU) tasks now have estimates of 13 minutes, while they take an hour, so they are now completable; the (Perseus Arm Survey) tasks have estimates of 16 seconds, so they aren't. I'll let the ones I have run and error:

All tasks for computer 8143

Application details for host 8143

Claggy
38) Message boards : News : Project server code update (Message 112921)
Posted 11 Jun 2014 by Claggy
Post:
Oh, you're going to love this one

		Jason	Holmis	Claggy	Zombie	Zombie (Mac)
Host:		11363	2267	9008	6490	6109
		GTX 780	GTX 660	GT 650M	TITAN	GTX 680MX

Credit for BRP4G (GPU)						

Maximum		1170.48	1036.86	10239.0	1654.85	11847.50
Minimum		115.82	88.84	153.90	25.79	94.88
Average		548.33	463.98	3875.88	874.96	2256.70
Median		468.80	390.21	2977.38	865.33	1591.80
Std Dev		431.90	268.52	2873.26	362.30	2395.61

I'll upload a graph after lunch, when my monitor has cooled down and I've stopped laughing.

For your info, my GT650M is running one task at a time, and I'm only running two CPU tasks at a time too.
(It runs very hot; the 2.5GHz i5-3210M is a dual core with hyper-threading. With it running in its turbo mode of 2.89GHz the CPU cores sit at 99°C;
add another core crunching, or the Intel GPU crunching, and it starts downclocking, both CPU and Nvidia GPU.)

Since I've now got Intel GPU tasks, the CPU is fluctuating between 1.90GHz and 2.89GHz in 0.1GHz steps, i.e. 2.89, 2.79, 2.69, 2.59, 2.50, 2.40, 2.20, 2.10, etc.,
and the GT650M is switching between 950MHz and 118MHz, while the HD Graphics 4000 is switching between 950MHz, 1.0GHz, 1.05GHz and 1.10GHz,
so expect all task durations to fluctuate. ;-)

Claggy
39) Message boards : News : Project server code update (Message 112894)
Posted 6 Jun 2014 by Claggy
Post:
Got some of those tasks resent again; still the same, tasks are predicted to take 16 seconds. This host hasn't completed its 11 validations of that app_version yet, so it's using the initial estimate, and not its app_version APR yet:

Binary Radio Pulsar Search (Arecibo, GPU) 1.34 windows_x86_64 (BRP4G-opencl-ati)
Number of tasks completed 7
Max tasks per day 1
Number of tasks today 0
Consecutive valid tasks 0
Average processing rate 61.916362373902
Average turnaround time 0.82 days
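The switch described above (initial projected estimate until enough validated results, then the host app_version's measured APR) can be sketched as follows. The 11-validation threshold comes from the post above; the function and the initial-flops figure are illustrative, not the real server code:

```python
# Sketch: which speed figure the scheduler uses for duration estimates.
# The 11-validation threshold is the one mentioned above; names and the
# initial flops value are illustrative assumptions.

MIN_CONSECUTIVE_VALID = 11

def flops_for_estimate(consecutive_valid, apr_flops, initial_flops):
    if consecutive_valid < MIN_CONSECUTIVE_VALID:
        return initial_flops  # still on the (often wildly wrong) initial projection
    return apr_flops          # enough history: trust the measured APR

# This host: 0 consecutive valid tasks, so the measured ~61.9 GFLOPS APR
# is ignored and the inflated initial projection stays in force.
used = flops_for_estimate(0, apr_flops=61.9e9, initial_flops=3.584e12)
print(used == 3.584e12)  # True: initial estimate still used
```

That is why the 16-second predictions persist: every max-time abort resets the consecutive-valid count, so the host never reaches the threshold where its APR would take over.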


Claggy
40) Message boards : News : Project server code update (Message 112887)
Posted 5 Jun 2014 by Claggy
Post:
I got some of those tasks resent:

https://albert.phys.uwm.edu/host_sched_logs/8/8143

Claggy





This material is based upon work supported by the National Science Foundation (NSF) under Grant PHY-0555655 and by the Max Planck Gesellschaft (MPG). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the investigators and do not necessarily reflect the views of the NSF or the MPG.

Copyright © 2024 Bruce Allen for the LIGO Scientific Collaboration