
Posts by Holmis

1) Message boards : News : Project server code update (Message 113230)
Posted 4 Jul 2014 by Profile Holmis
Post:
My 4th validation for the v1.40 BRP5 app came in earlier today and credits are on the rise: the first two got 12,62, the 3rd got 12,73 and the 4th a whopping 15,41!
The 12,73 one was validated against Richard, both of us running v1.40, and the last one was an older WU against Snow Crash on v1.39.
2) Message boards : Problems and Bug Reports : Amazing credit (Message 113218)
Posted 2 Jul 2014 by Profile Holmis
Post:
Well, this is a test project and what we're testing right now is the credit system! If you'll bear with us we will probably get "Credit New" patched up and ready to go! =)

For a lot more info, try to keep up with the News section as that's where the discussion of the ongoing tests is held.
3) Message boards : News : Project server code update (Message 113215)
Posted 2 Jul 2014 by Profile Holmis
Post:
Roger that, will keep a close watch on things until I've completed my first 11 tasks then.
4) Message boards : News : Project server code update (Message 113213)
Posted 2 Jul 2014 by Profile Holmis
Post:
... some company would be nice, but be warned: we're half expecting to fall over the 'EXIT_TIME_LIMIT_EXCEEDED' problem at some stage with BRP5 Beta...

I just downloaded my first v1.40 BRP5 and I'd say it's looking pretty good so far! The estimated completion time shown in Boinc is 5h03m08s.
These are the relevant lines from the scheduler log:

2014-07-02 19:35:03.2067 [PID=25783] [version] Best version of app einsteinbinary_BRP5 is [AV#934] (24.74 GFLOPS)
2014-07-02 19:35:03.2067 [PID=25783] [send] est delay 0, skipping deadline check
2014-07-02 19:35:03.2067 [PID=25783] [version] get_app_version(): getting app version for WU#625766 (PB0020_006A1_164) appid:27
2014-07-02 19:35:03.2067 [PID=25783] [version] returning cached version: [AV#934]
2014-07-02 19:35:03.2067 [PID=25783] [send] est delay 0, skipping deadline check
2014-07-02 19:35:03.3000 [PID=25783] [send] Sending app_version einsteinbinary_BRP5 2 140 BRP5-cuda32-nv301; projected 24.74 GFLOPS
2014-07-02 19:35:03.3001 [PID=25783] [send] est. duration for WU 625766: unscaled 18188.26 scaled 18306.56
2014-07-02 19:35:03.3001 [PID=25783] [send] [HOST#2267] sending [RESULT#1514790 PB0020_006A1_164_4] (est. dur. 18306.56s (5h05m06s55)) (max time 363765.12s (101h02m45s11))

And I've got this in the application details:

Binary Radio Pulsar Search (Perseus Arm Survey) 1.40 windows_intelx86 (BRP5-cuda32-nv301)
Number of tasks completed   0
Max tasks per day	    0
Number of tasks today	    1
Consecutive valid tasks	    0
Average turnaround time	    0.00 days

For v1.39 the tasks took less than 5 hours and the APR was 21.91 GFlops.
Whatever was changed seems to be working with regard to the initial estimates, assuming that the app and workload are more or less the same. Keep up the good work!
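
For anyone who wants to sanity-check those scheduler figures, the arithmetic is just the work estimate divided by the projected speed. A minimal Python sketch, assuming the BRP5 <rsc_fpops_est> of 450000 GFlops quoted further down this page and the 20x ratio implied by the max time in the log (an illustration of the arithmetic, not the actual server code):

# Reproduce the scheduler's duration figures for the BRP5 task above
rsc_fpops_est = 450000e9              # estimated floating point ops per task
rsc_fpops_bound = 20 * rsc_fpops_est  # assumed 20x limit before the client aborts
projected_flops = 24.74e9             # "projected 24.74 GFLOPS" from the log

est_duration = rsc_fpops_est / projected_flops   # ~18189 s, log says 18188.26
max_time = rsc_fpops_bound / projected_flops     # ~363784 s, log says 363765.12
print(f"est {est_duration/3600:.2f} h, max {max_time/3600:.2f} h")  # ~5.05 h / ~101.05 h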
5) Message boards : Problems and Bug Reports : Stats are not being exported (Message 113192)
Posted 30 Jun 2014 by Profile Holmis
Post:
FYI stats have been generated as of 30-Jun-2014 00:35 (UTC?) and should show up on the stats sites soon.
6) Message boards : News : Project server code update (Message 113118)
Posted 20 Jun 2014 by Profile Holmis
Post:
Intel GPU:
<flops>581007031069.074340</flops>
<plan_class>BRP5-opencl-intel_gpu</plan_class>

That's 581 GFlops! Boinc reports it @ 147 GFlops peak in the startup messages.

Another follow-up, although it's been extensively discussed already.

11 BRP5 Intel GPU tasks have now been validated and the APR has been calculated to 10.78 GFlops running one task at a time. The initial estimate was that the iGPU was 53.9 times faster than it actually is. The peak value reported by Boinc in the startup messages is 13.6 times faster, and if that had been used the tasks would have finished without me having to increase the rsc_fpops_bound value to avoid Boinc aborting the tasks with "maximum time limit exceeded".
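
Just to spell out where those factors come from, here's the trivial Python arithmetic with the numbers already quoted above (nothing else assumed):

# How far off the initial iGPU speed estimate was
initial_estimate = 581.0   # GFlops, the <flops> value from client_state.xml
measured_apr     = 10.78   # GFlops, APR after 11 validated tasks
boinc_peak       = 147.0   # GFlops, peak reported in Boinc's startup messages

print(initial_estimate / measured_apr)  # ~53.9x faster than actual
print(boinc_peak / measured_apr)        # ~13.6x, what using the peak would have given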
7) Message boards : News : Project server code update (Message 113098)
Posted 19 Jun 2014 by Profile Holmis
Post:
Here are some graphs from my validated tasks; to keep the post shorter I'll post links to the pictures.

BRP4X64 - A bit unstable, varies around 50 credits/task ±5 credits and with a few outliers.

BRP4G - Clear upward trend with no sign of coming back down. A few of the high outliers are late validations.

S6CasA (CPU only) - Not as unstable as BRP4X64 but a bigger difference between high and low, 270 - 330 credits/task. Low number of completed tasks so far.

BRP5 iGPU - The start of an upward trend with only 10 valid tasks so far.

BRP5 Nvidia - Big difference in high and low (20,59 - 8091,06) credits/task. Few completed tasks but might also be starting an upward trend.

And as before here's a link to the Excel document with all the data and graphs.
8) Message boards : News : Project server code update (Message 112979)
Posted 16 Jun 2014 by Profile Holmis
Post:
Nvidia GPU:
<flops>12454544406626.100000</flops>
<plan_class>BRP5-cuda32-nv301</plan_class>

That's 12454 GFlops or 12,45 TeraFlops! Boinc reports it @ 2985 GFlops peak in the startup messages. And the APR for the BRP4G Nvidia tasks is 58.1 GFlops when running 2 at a time.
If the BRP5 app gets the same APR then the initial speed estimate is that the card is 214 times as fast as it actually is!!!

A follow-up on my post about initial estimates: the 11th BRP5 task has now been validated and the APR has been calculated to 30.16 GFlops when running 2 tasks at a time.
So the initial estimate was that the card was a whopping 412.9 times faster than it actually is! =O
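
Same arithmetic as for the iGPU, just with the Nvidia numbers from above (plain Python, nothing assumed beyond the quoted figures):

# How far off the initial Nvidia speed estimate was
initial_estimate = 12454.5   # GFlops, the <flops> value from client_state.xml
apr_brp4g        = 58.1      # GFlops, established APR for BRP4G running 2 at a time
apr_brp5         = 30.16     # GFlops, APR for BRP5 after 11 validated tasks

print(initial_estimate / apr_brp4g)  # ~214x, the guess from the earlier post
print(initial_estimate / apr_brp5)   # ~413x, the factor actually measured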
9) Message boards : News : Project server code update (Message 112977)
Posted 16 Jun 2014 by Profile Holmis
Post:
I'll add that my GTX660Ti is running 2 tasks at a time, mixing BRP4G and BRP5 from Albert and BRP5 from Einstein. The Intel HD4000 is running single tasks.

Here's an updated Excel file with data and plots from host 2267 and the following searches: BRP4X64, BRP4G, S6CasA, BRP5 (iGPU) and BRP5 (Nvidia GPU).
10) Message boards : News : Project server code update (Message 112915)
Posted 11 Jun 2014 by Profile Holmis
Post:
The initial GPU guesses seem to rely on marketing flops figures with some sort of scaling. There is a coarse error there because achieving anywhere near rated peak GFlops on a GPU is extremely challenging... i.e. it's a guess, and not a very good one.

Might there be a typo in there that multiplies rather than divides, or just a missed sign? It feels like the scaling goes in the opposite direction of what it's supposed to.

I've been following some of the discussion on the Seti boards about this credit system and the code walk, and I look forward to the testing of a hopefully more stable and functional system here.
I understand that it's difficult to say, but is there a timetable for when we start testing what are hopefully improvements to the system?
11) Message boards : News : Project server code update (Message 112912)
Posted 11 Jun 2014 by Profile Holmis
Post:
I noticed that I've been assigned tasks from 2 "new" applications, BRP5 tasks for both the Intel and Nvidia GPUs. Neither of those has an established APR, so I got another shot at the initial estimates.

Here's some numbers from my client_state.xml:

Intel GPU:
<flops>581007031069.074340</flops>
<plan_class>BRP5-opencl-intel_gpu</plan_class>

That's 581 GFlops! Boinc reports it @ 147 GFlops peak in the startup messages.

Nvidia GPU:
<flops>12454544406626.100000</flops>
<plan_class>BRP5-cuda32-nv301</plan_class>

That's 12454 GFlops or 12,45 TeraFlops! Boinc reports it @ 2985 GFlops peak in the startup messages. And the APR for the BRP4G Nvidia tasks is 58.1 GFlops when running 2 at a time.
If the BRP5 app gets the same APR then the initial speed estimate is that the card is 214 times as fast as it actually is!!!

Question:
How come the system estimates both resources to be much faster than what Boinc reports as their peak speed? Where's the logic in that?

All downloaded BRP5 tasks come with <rsc_fpops_est>450000000000000.000000</rsc_fpops_est>, or 450000 GFlops.

Crunching the numbers gives a time estimate for the Intel GPU app of 774.5 seconds or 12m54s. The 1st task has been running for 12m55s and has reached 1.8% done...
For the tasks assigned to the Nvidia card the estimate is 36 seconds. The first 2 tasks have been running for 1h8m and have reached about 30% done...

I've resorted to adding a few zeros to the <rsc_fpops_bound> to prevent Boinc from aborting the tasks with "maximum time limit exceeded".
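
If anyone wants to reproduce those time estimates, it's just <rsc_fpops_est> divided by the <flops> value the scheduler assigned. A quick Python check with the numbers above (an illustration of the arithmetic, not the client code):

# Initial BRP5 time estimates from the client_state.xml values above
rsc_fpops_est = 450000e9            # <rsc_fpops_est> per BRP5 task

intel_flops  = 581007031069.07      # <flops> for BRP5-opencl-intel_gpu
nvidia_flops = 12454544406626.10    # <flops> for BRP5-cuda32-nv301

print(rsc_fpops_est / intel_flops)   # ~774.5 s (12m54s); task was at 1.8% after 12m55s
print(rsc_fpops_est / nvidia_flops)  # ~36.1 s; tasks were at ~30% after 1h08m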
12) Message boards : News : Project server code update (Message 112904)
Posted 10 Jun 2014 by Profile Holmis
Post:
Following Richard's example I've put together plots of the credit awarded to host 2267 since the server upgrade.

Plot of credit for BRP4X64, APR=4.13

Plot of credit for BRP4G, APR=58.02

Plot of credit for S6CasA, APR=3.14

And finally, if anyone's interested, here's the Excel document with both the data and plots.

To summarize:
BRP4X64 is all over the place but "always" lower than the fixed credit before the upgrade.
BRP4G took a nose dive and is slowly recovering; at least it appears to be going in the right direction.
S6CasA only has 9 validated tasks so I can't really tell, but it seems to behave like BRP4X64.
13) Message boards : News : Project server code update (Message 112900)
Posted 6 Jun 2014 by Profile Holmis
Post:
So it would perhaps be a good idea - most helpful - to fire through some extra CasA/GW tasks, so the baseline for those catches up after the slow start. But we're just going into a long (3-day) weekend in Germany, so there's no rush. Just keep taking the tablets as usual, and see how dirty the laundry gets.

Roger that, will run some extra CasA tasks and then let the server decide.

Can anybody beat Zombie for variability?

Well, the server seems to think I've had too much and is now issuing credit between 88.84 and 127.05 per BRP4G task. Wish I could get 10,000+ for a task, that would be good for my RAC! =)
14) Message boards : News : Project server code update (Message 112898)
Posted 6 Jun 2014 by Profile Holmis
Post:
We're still generating the baseline - as you noticed, it took a few attempts to disable the previous fixed credits: now we can see and quantify the scale of the problem. There was another glitch with the CasA (GW) tasks this morning, so they still haven't properly started.

But rest assured, there are people editing away in the background even as I type.

So a few questions about this test of the credit system:

Do we mere mortals need to do anything special, or do we just run tasks and let the wizards take care of things in the background?

Is there something I or any other regular user can do to help and/or speed things up?

Should I/we focus on a specific search or run them all?
15) Message boards : Problems and Bug Reports : BRP application v 1.33 feedback thread (Message 112889)
Posted 5 Jun 2014 by Profile Holmis
Post:
To follow up on my last post, my host has now accumulated over 10 valid BRP4G tasks so the server-side estimates have kicked in.

Freshly downloaded BRP4G tasks have an estimated time to completion of 1h22m12s and the observed completion time is within a few minutes of that when running 2 tasks at a time. So this seems to be working as it should.

Digging a bit deeper, the "Average processing rate" is @ 56.766 according to the application details page for host 2267. I didn't take note of the average PFC in the server logs but I believe it was around 3000.
So the server thinks the GPU is about 50 times faster than it actually is?
If one has to guess the speed/power of some component, isn't it better to assume it's slower than it actually is?
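
For reference, the new estimate lines up with the APR almost exactly. A quick Python check, assuming the BRP4G <rsc_fpops_est> of 280000 GFlops quoted elsewhere in this thread:

# Check Boinc's new BRP4G estimate against the measured APR
rsc_fpops_est = 280000e9   # <rsc_fpops_est> for a BRP4G task
apr = 56.766e9             # average processing rate from the application details page

est = rsc_fpops_est / apr                      # ~4932 s
h, rem = divmod(est, 3600)
m, s = divmod(rem, 60)
print(f"{int(h)}h{int(m):02d}m{int(s):02d}s")  # 1h22m12s, matching the estimate above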
16) Message boards : News : Project server code update (Message 112888)
Posted 5 Jun 2014 by Profile Holmis
Post:
I enabled another debug flag (debug_array) to possibly get a grip on the app selection issue.

This means that the scheduler log excerpts that you see published for your hosts will get even longer. Please don't post these here in all gory detail, these are kept for ~200d on the server for the devs & admins anyway.

BM

I just made a work request for CPU work and was granted 10 S6CasA tasks and one BRP4 task.
In my Einstein@home prefs the BRP4 search is not selected but Beta-apps are.

Unfortunately Boinc contacted the scheduler again before I could check the server log, so I missed it. I just wanted to point out that there should be 2 logs at around 15:46 today.

This is the first line from the second contact, the first contact that assigned the CPU tasks should have occurred a few minutes before this one.
2014-06-05 15:46:56.9050 [PID=16227] Request: [USER#xxxxx] [HOST#2267] [IP xxx.xxx.xxx.226] client 7.2.42
17) Message boards : News : Project server code update (Message 112885)
Posted 5 Jun 2014 by Profile Holmis
Post:
I tried asking for more tasks for my Nvidia GPU and got the following in Boinc's Event log:

05/06/2014 12:17:53 | Albert@Home | Requesting new tasks for NVIDIA
05/06/2014 12:17:53 | Albert@Home | [sched_op] CPU work request: 0.00 seconds; 0.00 devices
05/06/2014 12:17:53 | Albert@Home | [sched_op] NVIDIA work request: 102560.41 seconds; 0.00 devices
05/06/2014 12:17:53 | Albert@Home | [sched_op] intel_gpu work request: 0.00 seconds; 0.00 devices
05/06/2014 12:17:55 | Albert@Home | Scheduler request completed: got 0 new tasks
05/06/2014 12:17:55 | Albert@Home | [sched_op] Server version 703
05/06/2014 12:17:55 | Albert@Home | Project requested delay of 60 seconds
05/06/2014 12:17:55 | Albert@Home | [sched_op] Deferring communication for 00:01:00
05/06/2014 12:17:55 | Albert@Home | [sched_op] Reason: requested by project

As you can see there was no reason given for why I didn't receive any tasks.
Next step was checking the server contact log and I found this:

2014-06-05 10:17:54.8969 [PID=8307 ]    [version] Checking plan class 'BRP4G-cuda32-nv301'
2014-06-05 10:17:54.8969 [PID=8307 ]    [version] plan_class_spec: parsed project prefs setting 'gpu_util_brp' : true : 0.500000
2014-06-05 10:17:54.8969 [PID=8307 ]    [version] [AV#716] daily quota exceeded

So the reason was that I've already had my fill for the day.
Checking the Application details for my host gives:

Binary Radio Pulsar Search (Arecibo, GPU) 1.33 windows_intelx86 (BRP4G-cuda32-nv301)
Number of tasks completed  13
Max tasks per day	   45
Number of tasks today      54
Consecutive valid tasks    13
Average processing rate    56.59266205016
Average turnaround time    0.29 days

So I'm over the daily quota, but why didn't the scheduler tell me so in the reply to Boinc?
18) Message boards : Problems and Bug Reports : BRP application v 1.33 feedback thread (Message 112872)
Posted 4 Jun 2014 by Profile Holmis
Post:
Recently I got a bunch of BRP v1.33 tasks (BRP4G-cuda32-nv301). BOINC shows me that the estimated time to complete is 00:01:11. However, after 00:23:54 run time (approx. 9.240% completed) BOINC kills the task with the message "Aborting task p2030.20131124.G176.16-01.04.S.b4s0g0.00000_464_0: exceeded elapsed time limit 1432.71 (5600000.00G/3908.68G)" One example of these is result #1454983.

There were some problems this morning with the updated server code, and - as Bernd says in message 112866

Our plan class specs that were (semi-)automatically converted for the new server code were somewhat broken, causing probably all kinds of oddities for GPU tasks.

This is probably one of them...

I just downloaded 25 BRP4G tasks with an estimated completion time of 15 seconds, wish it was true =)
I found this in client_state.xml for each of these tasks:

<rsc_fpops_est>280000000000000.000000</rsc_fpops_est>
<rsc_fpops_bound>5600000000000000.000000</rsc_fpops_bound>

If I'm understanding this right, the tasks will error out with "exceeded elapsed time limit" when they have run for 20x what they were estimated to take.
I've edited client_state.xml and added a few zeros to the rsc_fpops_bound value, and hope that will give the tasks enough time to actually finish.
Let's see what happens when the host asks for work again.
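
To spell out where that abort threshold comes from and what padding the bound does, here's my reading of the numbers Boinc prints, as a small Python sketch (not the actual client source):

# The "exceeded elapsed time limit 1432.71 (5600000.00G/3908.68G)" message above
rsc_fpops_bound = 5600000e9   # from client_state.xml, 20x the 280000 GFlops estimate
device_flops    = 3908.68e9   # the (far too high) speed estimate used for the GPU

print(rsc_fpops_bound / device_flops)         # ~1432.7 s, elapsed time at which the task is killed
print(rsc_fpops_bound * 1000 / device_flops)  # "adding a few zeros": ~1.43 million s of headroom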
19) Message boards : Problems and Bug Reports : BRP application v 1.33 feedback thread (Message 112841)
Posted 30 May 2014 by Profile Holmis
Post:
Can the project Administrators/Scientists please look into this problem?
Over 24 hours have passed since I originally posted and the 3 tasks remain unsent to a 3rd wing man for validation.
Not sure if this is a SCHEDULER problem or a lack of available wing men for this type of work:

Workunit# 594225
on 20 May 2014 | 15:06:40 UTC my PC returned the completed task;
on 23 May 2014 | 9:16:23 UTC my wing man Aborted their task;
on 23 May 2014 | 9:16:28 UTC a 3rd task was generated but 2 Days, 5.75 Hours later it has yet to be sent out to another PC for computation.

Same thing has occurred with different date/times for Workunits 594230 and 594236.

It's not a problem. Einstein/Albert employs a scheduler that will send out tasks to computers that already have the right data files. Why increase bandwidth utilisation for both server and client when the work can just wait for the right client to come along and save on that download? It may just have to wait days or weeks for that client to turn up.

Claggy

Added to that explanation: the scheduler will not wait forever, there is a maximum time before the tasks get sent to the next host asking for that type of work. I don't know what that time is set to here and now, but over on Einstein it used to be set to 7 days/1 week. That might have been changed since I picked up that info, it's been several years...
20) Message boards : Problems and Bug Reports : OpenCL tasks - Low GPU% on 331.58 drivers? (Message 112755)
Posted 29 Nov 2013 by Profile Holmis
Post:
2013-11-29 11:21:39.6275 [PID=11032] [HOST#9649] Sending [RESULT#1212496 LATeah0069U_48.0_500_-4.01e-10_1] (est. dur. 1601.52s (0h26m41s52)) (max time 31151.75s (8h39m11s74))
2013-11-29 11:21:39.6300 [PID=11032] [locality] send_old_work(LATeah0069U_48.0_500_-4.01e-10_1) sent result created 347.4 hours ago [RESULT#1212496]
2013-11-29 11:21:39.6300 [PID=11032] [locality] Note: sent NON-LOCALITY result LATeah0069U_48.0_500_-4.01e-10_1
2013-11-29 11:21:39.6300 [PID=11032] [locality] send_new_file_work(): try to send old work

I think the reason you get tasks from the Gamma Ray search is that the scheduler is resending lost work, that is, work that's been allocated but for some reason is not present on your machine. Task selection via prefs doesn't apply when resending work, so you'll keep getting them until there are no more missing tasks, but you should not get any new ones.

I checked host #9649 (the posted log is from a scheduler contact by this host) and there are no in-progress Gamma Ray tasks, so that host should not get any more of them unless you opt into that search again.


This material is based upon work supported by the National Science Foundation (NSF) under Grant PHY-0555655 and by the Max Planck Gesellschaft (MPG). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the investigators and do not necessarily reflect the views of the NSF or the MPG.

Copyright © 2024 Bruce Allen for the LIGO Scientific Collaboration