
Posts by jason_gee

1) Message boards : News : Project server code update (Message 113268)
Posted 16 Jul 2014 by jason_gee
Post:
Edit - or it might have been CPU normalisation kicking in. We have Win32/SSE above threshold now as well, and Win32/plain will reach it any time now (99 valid at 08:00 UTC)


With last server data dated 16th, looks like a bit of an interesting illustration going on there. Lowest pfc average, with n > 100, is indeed FGRPSSE with a value of ~10.6. Opencl nv seems to be ~144.

Now the NV-OpenCL figure is expected to be about 2x what it should be, due to the mechanism normalising to 10% efficiency instead of the more realistic 5%... so picture the NV one as 'corrected': ~144/2 -> 72 (rough is good enough here)

CPU SSE has an approximate underclaim of 1.5^2 = 2.25x, so we take 10.6 * 2.25 -> 23.85 'corrected' for the CPU case (again, rough is better than uncorrected inputs)

So now we know the relative efficiencies of the implementations, a much tighter ~3x spread than the original uncorrected (noisy) numbers suggest. Right now credit is awarded based on the minimum pfc app, so about a third of what the GPU one would be 'asking'.
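
(For anyone wanting to check the arithmetic, here's a back-of-the-envelope sketch in C++; the figures are the rough ones quoted above, not exact server values.)

#include <cstdio>

int main() {
    // Approximate app-version pfc averages quoted above (n > 100 samples).
    double pfc_cpu_sse   = 10.6;   // FGRPSSE (Win32/SSE CPU app)
    double pfc_opencl_nv = 144.0;  // OpenCL NV app

    // The NV figure is ~2x inflated: the mechanism normalises to 10%
    // efficiency instead of the more realistic ~5%.
    double nv_corrected  = pfc_opencl_nv / 2.0;      // ~72

    // CPU SSE underclaims by roughly 1.5^2 = 2.25x.
    double cpu_corrected = pfc_cpu_sse * 1.5 * 1.5;  // ~23.85

    printf("corrected NV : %.2f\n", nv_corrected);
    printf("corrected CPU: %.2f\n", cpu_corrected);
    printf("spread       : ~%.1fx\n", nv_corrected / cpu_corrected);  // ~3x
    return 0;
}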

Intuitive eyeballs say the GPU population is going to be larger, by sheer throughput. The 'right' credit is in between the corrected CPU and GPU figures, weighted a fair bit to the GPU case. There's tools for determining that too, better than averages.

Net effect of the simplified/corrected/improper-assumption-removed mechanism would be an even higher quality (more trustworthy) number in between the CPU & GPU case, with a weighting bonus encouraging optimisation, and inherently rejecting likely fraudulent claims (another possible source of noise disturbances). So likely in the region of ~2x what win CPU SSE only validations would award now.

I'm surprised how well that correlates with the seti@home astropulse case, and it points the bone directly at the seti@home multibeam case for AVX enabled app underclaims with no peak flops correction.

Wow, we nailed this to the wall good and proper.
2) Message boards : News : Project server code update (Message 113261)
Posted 15 Jul 2014 by jason_gee
Post:
1) The Arecibo GPU apps seem to have settled down. Just a few validations trickling in from the hosts I've been monitoring, and all (except Claggy's laptop) seem to be +/- 2,000 credits - about double what Bernd thought the tasks were worth before we started.
...
I've also plotted the same hosts' results for BRP5 (Perseus Arm). The logarithmic plot looks similar to the lower half of the 'trumpet' graph that emerged from Arecibo. Remember that we saw ridiculously low numbers to start with: we still haven't reached Bernd's previous assessment of value.
...


Having quite a bit more understanding of the nature of the beast now, the major challenges making predictions with the current mechanism implementation are twofold.
First, in the GPU-only sense, we see a discrepancy between the chosen normalisation efficiency point (for credit purposes) of 10%, and the 'actual' efficiency of somewhere in the region of ~5% for single-task-per-GPU operation. This amounts to an effective (roughly 2x) inflation of the GPU application's award.

Second, and a little more insidious: once you understand the limitations of average-based numerical control applied to noisy populations, it quickly becomes clear that the uncertainty in any specific number (partly reflected in the standard deviations) guarantees that many of the figures intended for comparing hosts, applications and credits, and for cheat detection/prevention, are effectively arbitrary relative to what users and the project expect those numbers to mean.

Tools (algorithms etc.) exist to improve these situations: making useful estimates, and handling the various kinds of 'noise' (host changes, real measurement error, and an unlimited range of usage variations) to a standard at or beyond end-user expectations.

Refining these mechanisms with such design tools will ultimately reduce the development and maintenance overhead constantly dogging the Boinc codebase, while simultaneously making the system more resilient and adaptive to future change. There is also the angle that high-quality numbers can potentially be useful in wider scientific contexts, beyond Credit/RAC and individual needs, with applications in distributed computing, computer science, and engineering, probably among others.

@All:
In those lights, I'd like to thank everyone here for helping out. I'm progressing to a detailed simulation and design phase, that will take some time to get right. Please keep collecting, observing, commenting etc, and we're on the right road.

Jason
3) Message boards : News : Project server code update (Message 113247)
Posted 7 Jul 2014 by jason_gee
Post:
Yeah, cross-checking the walkthroughs should help. The big problem is at least 16 possible general starting states, multiplied across wingmen into many combinations. I'm going to resist the temptation to model all 256 base combinations and think instead in terms of reducing that number of states... for example, correct the system in places so that CPU & GPU are considered the same much earlier in the sequence, remove the need for onramps, and perhaps even consider whether stock & anon are really different enough to warrant the completely separate codepaths they have in places.
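
(Purely to illustrate the blow-up, here is one way those counts can arise; the actual state dimensions in the walkthroughs may differ, so treat the labels as placeholders.)

#include <cstdio>

int main() {
    // Four illustrative binary dimensions for a host's starting state.
    const int resource   = 2;  // e.g. CPU vs GPU app version
    const int platform   = 2;  // e.g. stock vs anonymous platform
    const int host_stats = 2;  // e.g. host below vs above its sample threshold
    const int av_stats   = 2;  // e.g. app version below vs above its threshold

    int per_host = resource * platform * host_stats * av_stats;  // 16
    int per_pair = per_host * per_host;                          // 256 host+wingman combos

    printf("states per host: %d, per host+wingman pair: %d\n", per_host, per_pair);
    return 0;
}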
4) Message boards : News : Project server code update (Message 113237)
Posted 6 Jul 2014 by jason_gee
Post:
... double post
5) Message boards : News : Project server code update (Message 113236)
Posted 6 Jul 2014 by jason_gee
Post:
WU 618702 looks perkier - v1.39/v1.40 cross-validation.


That's certainly more like the credits I expected from the models. I suspect that the cross app normalisation / averaging business may be quite valid/needed for credit purposes. It just royally screws with the time estimates before a new host/app version engages host scaling (which we've been calling onramp periods)

Rectifying that will probably need all our walkthrough efforts compared in detail to fill any knowledge gaps, but basically seeing something resembling expected behaviour is a good start. Having no incorrectly scaled CPU app to contend with in the mix means the credit part should be around the right region, even if quite noisy & prone to destabilisation.
6) Message boards : News : Project server code update (Message 113234)
Posted 6 Jul 2014 by jason_gee
Post:
The server seems to have accepted that the 'conservative' values for BRP5 v1.40 were correct after all:

[AV#934] (BRP5-cuda32-nv301) adjusting projected flops based on PFC avg: 19.76G

According to WU 619924, the figures for v1.39 were rather different.


Yeah I see it with 3 app_versions in the same app id, so it'll do its wacky averaging thing [aka 'normalisation', but not], to create a min_avg_pfc.

[Edit:]
Ugh, a lot more than 3; make that ~22. Since a number of those older ones are well beyond their 100 samples, this will have ramifications for the codewalking, because those versions' sample-count thresholds for scaling will be engaged.
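
(For illustration only, and not the actual server code: a rough sketch of the 'min over app version pfc averages' idea, where only versions past their sample threshold contribute.)

#include <cstdio>
#include <vector>

struct AppVersionStats {
    int    n;        // validated samples contributing to the average
    double pfc_avg;  // running pfc average for this app version
};

// Illustrative threshold; the server uses a constant like MIN_VERSION_SAMPLES.
const int SAMPLE_THRESHOLD = 100;

double min_avg_pfc(const std::vector<AppVersionStats>& versions) {
    double best = -1;
    for (const auto& av : versions) {
        if (av.n <= SAMPLE_THRESHOLD) continue;   // ignore sparse versions
        if (best < 0 || av.pfc_avg < best) best = av.pfc_avg;
    }
    return best;  // -1 if no version is over threshold yet
}

int main() {
    // ~22 app versions share one app id here; made-up figures, only some over threshold.
    std::vector<AppVersionStats> versions = {
        {250, 10.6}, {120, 23.0}, {99, 7.5}, {180, 144.0},
    };
    printf("min_avg_pfc = %.2f\n", min_avg_pfc(versions));  // 10.6 in this example
    return 0;
}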
7) Message boards : News : Project server code update (Message 113223)
Posted 3 Jul 2014 by jason_gee
Post:
Yeah, looks a lot like the sort of discrepancies I see in simulations.

Will definitely be worth putting a 1.4 app onramp into the spreadsheets, to see how well the models reflect reality
8) Message boards : News : Project server code update (Message 113209)
Posted 2 Jul 2014 by jason_gee
Post:
I wonder why Claggy's laptop gets such variable credit?


Multiple tasks on smaller GPU, each running longer, will generate higher raw peak flop claims (pfc's) then that's averaged with the wingman's (Yellow triangle on dodgy diagram). So result can be anywhere from normal range to jackpot, as we previously assessed, depending on the wingman's claim. Though the prevalence of the jackpot conditions is less obvious, the noise in the system is still there.
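
(A simplified numerical sketch of that effect, with made-up numbers rather than anything pulled from the server:)

#include <cstdio>

int main() {
    double peak_flops = 500e9;  // hypothetical GPU peak (marketing) flops

    // Raw claim is roughly peak_flops * elapsed time. One task at a time:
    double pfc_single = peak_flops * 1000.0;   // task takes ~1000 s wall time

    // Three tasks at once: each takes ~3x the wall time, but the full
    // peak_flops is still charged against each, so the raw claim triples.
    double pfc_multi  = peak_flops * 3000.0;

    // The claim is then (roughly) averaged with the wingman's.
    double avg_with_normal  = (pfc_multi + pfc_single) / 2.0;  // ~2x normal
    double avg_with_similar = (pfc_multi + pfc_multi)  / 2.0;  // ~3x, 'jackpot'

    printf("single-task claim    : %.3g\n", pfc_single);
    printf("avg w/ normal wingman: %.3g\n", avg_with_normal);
    printf("avg w/ similar wing  : %.3g\n", avg_with_similar);
    return 0;
}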

I'm just running a single GPU task on both my GPU hosts, (the T8100's 128Mb 8400M GS doesn't count).

Claggy

Could be the wingmen. (There's a number of combinations of wingmen types that'll give random results between two regions. Two similar wingmen tend to cancel with averaging and become 'normal')

Conversely, when he's paired with me - now back to lower, stable, runtimes - no jackpot, no bonus. Sorry 'bout that.


LoL, yep, throwing the dice to get an answer is as good as any ;)
9) Message boards : News : Project server code update (Message 113206)
Posted 2 Jul 2014 by jason_gee
Post:
I wonder why Claggy's laptop gets such variable credit?


Multiple tasks on smaller GPU, each running longer, will generate higher raw peak flop claims (pfc's) then that's averaged with the wingman's (Yellow triangle on dodgy diagram). So result can be anywhere from normal range to jackpot, as we previously assessed, depending on the wingman's claim. Though the prevalence of the jackpot conditions is less obvious, the noise in the system is still there.

I'm just running a single GPU task on both my GPU hosts, (the T8100's 128Mb 8400M GS doesn't count).

Claggy


Could be the wingmen. (There's a number of combinations of wingmen types that'll give random results between two regions. Two similar wingmen tend to cancel with averaging and become 'normal')
10) Message boards : News : Project server code update (Message 113198)
Posted 1 Jul 2014 by jason_gee
Post:
I wonder why Claggy's laptop gets such variable credit?


Multiple tasks on smaller GPU, each running longer, will generate higher raw peak flop claims (pfc's) then that's averaged with the wingman's (Yellow triangle on dodgy diagram). So result can be anywhere from normal range to jackpot, as we previously assessed, depending on the wingman's claim. Though the prevalence of the jackpot conditions is less obvious, the noise in the system is still there.
11) Message boards : News : Project server code update (Message 113191)
Posted 30 Jun 2014 by jason_gee
Post:
Um, if you don't mind, I think it might be best to wait a little time. The administrators on this project are based in Europe, and as you know Jason is ahead of our time-zone, in Australia. I think it might be better to wait 12 hours or so, until we have a chance to compare notes by email when the lab opens in the morning.

After all, we don't want to use up our entire supply of unattached new hosts in one hit, or else we won't have anything left to test Jason's patches with....


Yes, unhooking that normalisation (which divides by ~0.1, multiplies the GPU GFlops by ~10x into absurd levels, and shrinks time estimates) *safely* is going to take quite some preparation. That same mechanism is hooked into credit (where it does make sense), so quite a lot of back and forth for clarification, discussion and debate will be needed to get it 'right', and part of that is going to be me communicating effectively (which isn't always easy :)).

The other aspect is that some bandaids will be painful to rip off, and still other odd artefacts might be hiding inside... and the only way to tell for sure is to open it up.

The next few days will tell if we're all on the same page (but looking from different angles is fine). To me though, we are well through the tricky bits of understanding the current system enough to say it needs to be a lot better.
12) Message boards : News : Project server code update (Message 113186)
Posted 29 Jun 2014 by jason_gee
Post:
At least one of those must be upside down.


In a sense yes. GPU app+device+conditions efficiency would be actual/peak, and must be less than 1 (and it is, e.g. it should be around 0.05 for a single-task Cuda GPU). Normalisation could be viewed as turning it upside down. It'll raise the GFlops & shrink the time estimate artificially --> the exact opposite of the kind of behaviour we want for new hosts/apps.
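
(Rough numbers to show the inversion; illustrative values only, not measurements:)

#include <cstdio>

int main() {
    double peak_flops = 500e9;   // hypothetical marketing/peak GPU flops
    double wu_fpops   = 1.0e15;  // hypothetical work unit flop estimate
    double efficiency = 0.05;    // ~5% for a single Cuda task per GPU

    // What we'd want: usable flops = efficiency * peak, so a longer estimate.
    double sensible_est = wu_fpops / (efficiency * peak_flops);

    // What the normalised path effectively does: divide peak by ~0.1,
    // inflating the GFlops and shrinking the estimate.
    double inflated_flops = peak_flops / 0.1;
    double broken_est     = wu_fpops / inflated_flops;

    printf("sensible estimate  : %.0f s\n", sensible_est);  // ~40000 s
    printf("normalised estimate: %.0f s\n", broken_est);    // ~200 s
    return 0;
}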

A bit will become clearer when I have the next dodgy diagram ready. Getting bogged down in broken code is a bit of a red-herring at the moment, as there are design level issues to tackle first.

In particular, debugging the normalisation, including the absurd GFlops numbers it produces, is pointless in the context of estimates. That's because neither the time nor Gflops should be being normalised [AT ALL], so it all gets disabled in estimates, and restricted to credit related uses where it's applicable to get the same credit claims from different apps.

Well, we do (crudely) have two separate cases to deal with.

1) initial attach. We have to get rid of that divide-by-almost-zero, or hosts can't run. They get the absurdly low runtime estimate/bound and error when they exceed it.

2) steady state. In my (political) opinion, trying to bring back client-side DCF will be flogging one dead horse too many. We need some sort of server-side control of runtime estimates, so that client scheduling works and user expectations are met. I'm happy to accept that the new version will be different to the one we have now, and look forward to seeing it.

OK, I'll get out of your hair, and take my coffee downstairs to grab some more stats.


LoL, always appreciate bouncing it around, thanks. At the moment it's a bit like pointing to a bucket of kittens and saying 'that's not the flower-pot I ordered!'. Yeah it's possible to debate over the intent versus function more, but when push comes to shove it's just wrong & gives wacky numbers. Not really any more complicated than that in some sense ;)
13) Message boards : News : Project server code update (Message 113184)
Posted 29 Jun 2014 by jason_gee
Post:
At least one of those must be upside down.


In a sense yes. GPU app+device+conditions efficiency would be actual/peak, and must be less than 1 (and it is, e.g. it should be around 0.05 for a single-task Cuda GPU). Normalisation could be viewed as turning it upside down. It'll raise the GFlops & shrink the time estimate artificially --> the exact opposite of the kind of behaviour we want for new hosts/apps.

A bit will become clearer when I have the next dodgy diagram ready. Getting bogged down in broken code is a bit of a red-herring at the moment, as there are design level issues to tackle first.

In particular, debugging the normalisation, including the absurd GFlops numbers it produces, is pointless in the context of estimates. That's because neither the time nor Gflops should be being normalised [AT ALL], so it all gets disabled in estimates, and restricted to credit related uses where it's applicable to get the same credit claims from different apps.
14) Message boards : News : Project server code update (Message 113183)
Posted 29 Jun 2014 by jason_gee
Post:
See edit to my last. In my view, if the relevant numbers are all <<1, we should be multiplying by them, not dividing by them.

Out of coffee error - going shopping. Back soon.


The main issue is really that he starts with real marketing flops (more or less usable), works out an average efficiency there (yuck but still OK-ish), but then he normalises to some other app version... IOW multiplies by some arbitrary large number (or divides by some fraction if you prefer) with no connection to real throughputs or efficiencies in this device+app.

That's OK for a relative number for credit (debatable)... but totally useless for time and throughput estimates (which are absolute estimates). Improper normalisation shrank your estimate by multiplying the projected_flops up to 10x+ the already-bloated marketing flops.
15) Message boards : News : Project server code update (Message 113181)
Posted 29 Jun 2014 by jason_gee
Post:
App version pfc is normalised to 0.1 (design flaw), and any real samples would have driven it toward 0.05 or lower. So that text should be 10-20x+ the marketing flops, which is NOT the intent, nor remotely correct design. It's gibberish.

The advice given to project administrators in http://boinc.berkeley.edu/trac/wiki/AppPlanSpec is:

<gpu_peak_flops_scale>x</gpu_peak_flops_scale>
scale GPU peak speed by this (default 1).

I'm wondering whether they put in 0.1, expecting this to be a multiplier (real flops are lower than peak flops), but end up dividing by 0.1 instead? And from what you say, 'default 1' doesn't match the code either?


Nope [0.1 is hardwired via a 'magic number'], and 1 wouldn't be right for GPU anyway. Correct would be ~0.05, don't normalise (except for credit), and enable and set a default host_scale of 1 from the start... which would yield a projected flops (before convergence) of 0.05 x 1 * peak_flops... basically one twentieth of the marketing flops... then [let it] scale itself.
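
(A minimal sketch of that starting point; the names and numbers are illustrative, not the actual scheduler variables:)

#include <cstdio>

int main() {
    double peak_flops = 500e9;  // hypothetical GPU marketing/peak flops
    double gpu_eff    = 0.05;   // sensible single-task GPU efficiency
    double host_scale = 1.0;    // enabled from the start, default 1

    // Initial estimate, before any convergence: ~1/20 of the marketing flops.
    double projected_flops = gpu_eff * host_scale * peak_flops;
    printf("initial projected_flops  : %.2f GFLOPS\n", projected_flops / 1e9);

    // As validated results arrive, host_scale drifts toward the host's real
    // throughput ratio (illustrative update, not the real convergence code).
    double measured_flops = 30e9;  // e.g. wu_fpops / elapsed_time
    host_scale = measured_flops / (gpu_eff * peak_flops);
    projected_flops = gpu_eff * host_scale * peak_flops;
    printf("converged projected_flops: %.2f GFLOPS\n", projected_flops / 1e9);
    return 0;
}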
16) Message boards : News : Project server code update (Message 113179)
Posted 29 Jun 2014 by jason_gee
Post:
There you go. App version pfc average (!) is 3584 GFLOPS / 34968.78 ~= 0.102**

[Edit:]
** unfortunately, that's improperly normalised, so it's meaningless without the normalisation reference app version figure, as per the red ellipse on the diagram... so the true figure will likely be around 0.02 or so, but it's anybody's guess without knowing which app version is at 0.1
17) Message boards : News : Project server code update (Message 113177)
Posted 29 Jun 2014 by jason_gee
Post:
right, that's what I meant by line numbers (with brief description)

Claggy's case:
if (av.pfc.n > MIN_VERSION_SAMPLES) {
    hu.projected_flops = hu.peak_flops/av.pfc.get_avg();
    if (config.debug_version_select) {
        log_messages.printf(MSG_NORMAL,
            "[version] [AV#%d] (%s) adjusting projected flops based on PFC avg: %.2fG\n",
            av.id, av.plan_class, hu.projected_flops/1e9
        );
    }
}

That is his marketing flops estimate: peak_flops / the app version's pfc average.

App version pfc is normalised to 0.1 (design flaw), and any real samples would have driven it toward 0.05 or lower. So that text should be 10-20x+ the marketing flops, which is NOT the intent, nor remotely correct design. It's gibberish.
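
(Plugging illustrative numbers into that line shows the scale of the problem:)

#include <cstdio>

int main() {
    // hu.projected_flops = hu.peak_flops / av.pfc.get_avg(), roughly:
    double peak_flops = 500e9;  // hypothetical marketing/peak flops

    // pfc average normalised to 0.1 by design -> 10x the marketing flops.
    printf("pfc_avg 0.10 -> projected %.0f GFLOPS\n", peak_flops / 0.10 / 1e9);

    // Real samples would drive the average toward ~0.05 -> 20x.
    printf("pfc_avg 0.05 -> projected %.0f GFLOPS\n", peak_flops / 0.05 / 1e9);
    return 0;
}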
18) Message boards : News : Project server code update (Message 113175)
Posted 29 Jun 2014 by jason_gee
Post:
It's a bit of a stretch to examine border cases when the standard setup doesn't even work right. IMO let's start at the common case & work outward, because I guarantee if the numbers come up flaky there, then they aren't going to be magically better with incompatible server and clients.

For the present question (treblehit's example), the old Project DCF specifically isn't involved on Albert in any way (even though it's maintained by the client). It's the improper normalisation with inactive host scaling appearing in another form.

... however...

since both host_scale and the pfc_scales are somewhat noisy and unstable 'per-app DCFs' in disguise, and improperly normalised, it amounts to the familiar sets of wacky number symptoms. If you keep looking for those you will find them everywhere, because the entire system is dependent on them, and you'd just end up swearing Project DCF is active server side. In a sense, through a lot of spaghetti, it is, though it isn't called that, and it is per app version and per host app version instead.

i.e. forget Project DCF (for now), use pfc_scale & host_scale.
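
(To illustrate the 'per-app DCF in disguise' point, and not the actual code paths: a client-side DCF and a per host+app version scale end up doing the same multiplicative job on a runtime estimate.)

#include <cstdio>

int main() {
    double wu_fpops        = 1.0e15;  // hypothetical work unit flop estimate
    double projected_flops = 25e9;    // server's projected flops for this app version

    // Old scheme: the client scales a whole project's estimates by one DCF.
    double dcf     = 1.8;
    double est_old = dcf * wu_fpops / projected_flops;

    // New scheme: an equivalent correction folded into the flops, but kept
    // per host + app version on the server side.
    double host_scale = 1.0 / 1.8;
    double est_new    = wu_fpops / (projected_flops * host_scale);

    printf("DCF-style estimate       : %.0f s\n", est_old);  // both ~72000 s
    printf("host_scale-style estimate: %.0f s\n", est_new);
    return 0;
}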
19) Message boards : News : Project server code update (Message 113173)
Posted 29 Jun 2014 by jason_gee
Post:
Now on the server side, that 'Best version of app' string comes from sched_version.cpp (scheduler inbuilt functions) and uses the following resources:
app->name, bavp->avp->id, bavp->host_usage.projected_flops/1e9

That projected_flops is set during app version selection; as the number of samples will be < 10, the flops will be adjusted based on the pfc sample average for the app version (there will be 100 of those from other users).

Since that's normalised elsewhere (see the red ellipse on the dodgy diagram), the net effect translates the pfc of 0.1 used for the original estimate to 1, so peak_flops is multiplied by 10-20x.


Richard do you want code line numbers for that ?
20) Message boards : News : Project server code update (Message 113172)
Posted 29 Jun 2014 by jason_gee
Post:
I was thinking that they were using Einstein customisations here that might not be needed; looking at robl's Einstein log shows it's the durations that get scaled there:


Yeah, they were before. Bernd had to do quite a lot of work to get here onto updated stock server code. Now (here) it should be pretty close or identical (for our purposes) to current Boinc master, IIRC.




