
Posts by pragmatic prancing periodic problem child, left

1) Message boards : News : Main database corrupted (Message 112287)
Posted 30 Nov 2012 by Profile pragmatic prancing periodic problem child, left
Post:
Uploading has nothing to do with a database. Reporting does.
Uploading is merely moving data from your hard drive to a hard drive on a server at the project. But that server does need to be responsive and available, something it may not be, seeing how people get HTTP Error 500s in this thread.
2) Message boards : Problems and Bug Reports : [New release] BRP app v1.22 feedback thread (Message 111933)
Posted 18 Mar 2012 by Profile pragmatic prancing periodic problem child, left
Post:
WUID 47277, run time: 29,286.40 seconds.
WUID 46805, run time: 39,079.57 seconds.
WUID 46559, run time: 4,538.00 seconds.

47277 has this:
[00:38:31][368][INFO ] Checkpoint committed!
Activated exception handling...
[02:14:57]

46805 has this:
[04:51:58][3600][INFO ] Checkpoint committed!
Activated exception handling...
[21:55:34]

And from there on in, they slow down. 46559 ran from start to finish without exception handling (aka a break), and as such it ran in 'normal' time.

Now, the troubling thing is that it doesn't do this with all tasks. WUID 47791 has a run time of 6,306.80 seconds, yet it also has this:
[00:47:25][4336][INFO ] Checkpoint committed!
Activated exception handling...
[00:48:18]

That was a BOINC exit & restart. The other two were stops of the task itself while BOINC continued running.
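
To put numbers on the slowdown, here is a minimal Python sketch that divides each run time quoted above by the uninterrupted one. Using WUID 46559 as the baseline is an assumption (the tasks may not be the same size), and the function name is made up for illustration.

```python
# Run times in seconds, copied from the post above.
run_times = {
    47277: 29286.40,  # task stopped while BOINC kept running
    46805: 39079.57,  # task stopped while BOINC kept running
    47791: 6306.80,   # BOINC exit and restart
    46559: 4538.00,   # ran start to finish without a break
}

# Assumption: the uninterrupted task is a fair baseline for the others.
BASELINE_WUID = 46559


def slowdown_factors(times, baseline_wuid):
    """Return each task's run time divided by the baseline task's run time."""
    baseline = times[baseline_wuid]
    return {wuid: seconds / baseline for wuid, seconds in times.items()}


factors = slowdown_factors(run_times, BASELINE_WUID)
for wuid, factor in sorted(factors.items(), key=lambda item: item[1]):
    print(f"WUID {wuid}: {factor:.2f}x the uninterrupted run time")
```

On these figures the two tasks that were stopped individually come out at roughly 6.5x and 8.6x the baseline, while the one that went through a BOINC exit and restart is at about 1.4x.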
3) Message boards : Problems and Bug Reports : [New release] BRP app v1.22 feedback thread (Message 111926)
Posted 13 Mar 2012 by Profile pragmatic prancing periodic problem child, left
Post:
And again...
The normal average run time of OpenCL tasks on my ATI HD6850 is around 6,200 seconds when not interrupted.

When interrupted (by exiting BOINC, suspending BOINC, or suspending the task through exclusive_app or a switch between applications), the task run time increases to 31,000 - 36,000 seconds (!!). (task list)

4) Message boards : Problems and Bug Reports : [New release] BRP app v1.22 feedback thread (Message 111925)
Posted 13 Mar 2012 by Profile pragmatic prancing periodic problem child, left
Post:
I don't know if this affects OpenCL in any way, but the Catalyst 12.2 drivers do cause anti-aliasing problems in some games. After upgrading to these drivers, I noticed that all fine mist-like graphics in Skyrim turned into lots of square pixels. This can only be fixed by disabling AA and enabling FSAA instead.

ATI says it's a game problem, not their drivers, but heck if something works before and doesn't after changing the drivers, then how can that be the game's problem when that one hasn't changed literally a bit?
5) Message boards : Problems and Bug Reports : [New release] BRP app v1.22 feedback thread (Message 111907)
Posted 5 Mar 2012 by Profile pragmatic prancing periodic problem child, left
Post:
Hi Jord,

So, pretty please, can the fpops estimate be adjusted enough that they don't come in thinking to take 200+ hours?


I'll forward this to Bernd but he's pretty overwhelmed with more important topics right now and the BOINC devs are of little help analyzing this right now. Please bear with us.

It may be quite easy.

I changed <rsc_fpops_est>300000000000000.000000</rsc_fpops_est> to <rsc_fpops_est>30000000000000.000000</rsc_fpops_est> (one zero fewer) and restarted BOINC. The estimated time on a new task is now 15 hours, which is much more realistic than the original 208 hours.
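
For anyone wondering how the fpops estimate feeds into the time estimate, here is a minimal Python sketch. It assumes the simplest possible model, where the estimate is just rsc_fpops_est divided by an effective speed; the function names are hypothetical and the real client applies further corrections that are not modelled here.

```python
SECONDS_PER_HOUR = 3600.0


def implied_flops(rsc_fpops_est, estimated_hours):
    """Effective speed that would turn this fpops estimate into this many hours."""
    return rsc_fpops_est / (estimated_hours * SECONDS_PER_HOUR)


def estimated_hours(rsc_fpops_est, effective_flops):
    """Simplified estimate: floating point operations divided by speed, in hours."""
    return rsc_fpops_est / effective_flops / SECONDS_PER_HOUR


ORIGINAL_FPOPS_EST = 3e14  # value the task arrived with
EDITED_FPOPS_EST = 3e13    # same value with one zero removed

# Back out the speed implied by the original 208 hour estimate.
speed = implied_flops(ORIGINAL_FPOPS_EST, 208.0)

print(f"Implied effective speed: {speed:.3e} flops")
print(f"Estimate after the edit: {estimated_hours(EDITED_FPOPS_EST, speed):.1f} hours")
```

This purely proportional model predicts 20.8 hours after the edit, whereas the client showed 15 hours, so other factors clearly play a part. The point is only that cutting the fpops estimate by a factor of ten cuts the time estimate by about the same order.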
6) Message boards : Problems and Bug Reports : [OpenCL] app v1.20/v1.21 feedback thread (Message 111899)
Posted 3 Mar 2012 by Profile pragmatic prancing periodic problem child, left
Post:
About run times.
One of my tasks finished on my ATI HD6850 2GB versus an Nvidia GTX570:
http://albert.phys.uwm.edu/workunit.php?wuid=40741

I'd say, continue sending work like this to CUDA only. What's the use, really, when OpenCL is so slow? Or maybe it's just OpenCL on my GPU that's so slow, seeing how the next task was a clincher to see who validated on an all OpenCL show. All I know is that that task restarted multiple times, and these OpenCL tasks don't like to be restarted.

Anyway, I've run my cache dry for now, in anticipation of my new motherboard and CPU. If all is well, I'll be changing later today from my present Asrock H55DE3 with an i3-530 to an Asrock Extreme3 Gen3 with an i5-2500K (although I wish it could've been an i7-2600K).

Wonder what that does for Windows 7... ;-)
7) Message boards : Problems and Bug Reports : [New release] BRP app v1.22 feedback thread (Message 111897)
Posted 3 Mar 2012 by Profile pragmatic prancing periodic problem child, left
Post:
Tasks still come in expecting to run for 205 hours.
So I still have Albert tasks running in a panic.

03/03/2012 02:47:11 | Albert@Home | [rr_sim] Result p2030.20111110.G39.19-00.79.N.b3s0g0.00100_864_3 projected to miss deadline.
03/03/2012 02:47:11 | Albert@Home | [rr_sim] Project has 1 projected ATI deadline misses
03/03/2012 02:47:31 | Albert@Home | [rr_sim] p2030.20111110.G39.19-00.79.N.b3s0g0.00100_864_3 misses deadline by 614511.77

<time_stats>
    <on_frac>0.939516</on_frac>
    <connected_frac>0.783900</connected_frac>
    <active_frac>0.392607</active_frac>
    <gpu_active_frac>0.392447</gpu_active_frac>
    <last_update>1330725382.604116</last_update>
</time_stats>


Of course, that's because BOINC scales the 205-hour estimate by the fraction of time it is actually active (about 39%, per the time_stats above): 205h / (39 / 100) = 525h, or almost 22 days. A tad difficult to do in 14 days, so the task will run from start to finish in high priority. And as we can see here, DCF is no longer really used with BOINC 7. Not that it matters: at 7.5, the DCF is way too high to use reliably.
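
As a rough check of that deadline arithmetic, here is a minimal Python sketch. It assumes, as the post does, that the estimate is simply divided by the active fraction; the names are made up for illustration and are not BOINC's own.

```python
HOURS_PER_DAY = 24.0


def wall_clock_hours(estimated_hours, active_frac):
    """Scale a compute-time estimate by the fraction of time BOINC is active."""
    return estimated_hours / active_frac


def misses_deadline(estimated_hours, active_frac, deadline_days):
    """True if the scaled estimate does not fit inside the deadline."""
    needed = wall_clock_hours(estimated_hours, active_frac)
    return needed > deadline_days * HOURS_PER_DAY


ESTIMATED_HOURS = 205.0  # estimate the task arrived with
ACTIVE_FRAC = 0.392607   # <active_frac> from the time_stats above
DEADLINE_DAYS = 14.0     # deadline mentioned in the post

needed_hours = wall_clock_hours(ESTIMATED_HOURS, ACTIVE_FRAC)
print(f"Wall-clock time needed: {needed_hours:.0f} h "
      f"({needed_hours / HOURS_PER_DAY:.1f} days)")
print("Projected to miss deadline:",
      misses_deadline(ESTIMATED_HOURS, ACTIVE_FRAC, DEADLINE_DAYS))
```

With the unrounded active_frac this gives about 522 hours rather than the 525 obtained from a rounded 39%, which is still close to 22 days against a 14 day deadline.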

So, pretty please, can the fpops estimate be adjusted enough that they don't come in thinking to take 200+ hours?
8) Message boards : Problems and Bug Reports : [New release] BRP app v1.22 feedback thread (Message 111896)
Posted 2 Mar 2012 by Profile pragmatic prancing periodic problem child, left
Post:
No it doesn't. No GPUs are being used on T4T or the Vboxwrapper test project (the only two projects at this time where VBox is being used), other than for showing graphics of sorts. And then these projects require Vbox 4.1.4 or higher, as far as I know.

I'll go with driver corruption as well. It certainly never hurts to completely clean out the previous drivers and then install the newer ones fresh.
9) Message boards : Problems and Bug Reports : [New release] BRP app v1.22 feedback thread (Message 111883)
Posted 28 Feb 2012 by Profile pragmatic prancing periodic problem child, left
Post:
Task comes in with <rsc_fpops_est>300000000000000.000000</rsc_fpops_est> which tells BOINC the task is going to take 210 hours and a bit, so BOINC will run it for a long time in panic mode. Can we please get a reasonable fpops estimate, one that doesn't immediately throw Albert tasks in High Priority?
10) Message boards : Problems and Bug Reports : On the restart (Message 111856)
Posted 16 Feb 2012 by Profile pragmatic prancing periodic problem child, left
Post:
I see a lot of things like these: http://albert.phys.uwm.edu/workunit.php?wuid=38376, where the task starts and doesn't finish in the same run, but is either suspended or exited and later restarted. When it restarts, it runs for ages (33,431s).

When it starts and runs continuously, it's something like http://albert.phys.uwm.edu/workunit.php?wuid=37540 (7,184s). So why will a task that was suspended or exited take up to 4.5 times as long as normal to finish when later continued? This happened with v1.19 as well.
11) Message boards : Problems and Bug Reports : Error -5 and 2021 on all GPU workunits (Message 111834)
Posted 7 Feb 2012 by Profile pragmatic prancing periodic problem child, left
Post:
Are there any other reasons that this message could come about?

Try a reboot of the system. It's possible something else is stuck in the GPU's memory. Only a full power cycle can fix that.

BOINC only checks at start-up how much memory is available; it can't do that at any time afterwards. I'm not sure if the science app can do it either. Oliver?

You do seem to have the driver bug where BOINC shows half of the memory that's actually there. Apparently you have a 1024MB GPU, but it shows only 512MB with 992MB available. That's a bug in the drivers. There's nothing we can do to fix that; that's something ATI should fix.
12) Message boards : Problems and Bug Reports : Error -5 and 2021 on all GPU workunits (Message 111819)
Posted 6 Feb 2012 by Profile pragmatic prancing periodic problem child, left
Post:
AMD ATI Radeon HD 4700/4800 (RV740/RV770) (512MB) driver: 1.4.1664

The ATIOpenCL application needs at least 490MB of free memory on the video card. At startup, BOINC states how much memory the card has and how much of it is detected as free. If this value is under 490MB, tasks will error out because the FFT setup cannot continue. (source post by Oliver).
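
The rule above boils down to a single comparison. Here is a minimal Python sketch of it; the 490MB threshold comes from the post, while the function name and the example values are hypothetical and this is not how the application itself performs the check.

```python
MIN_FREE_MB = 490  # free video memory the ATIOpenCL app needs, per the post


def enough_gpu_memory(free_mb, min_free_mb=MIN_FREE_MB):
    """Return True if the reported free video memory meets the threshold."""
    return free_mb >= min_free_mb


# Hypothetical readings of the "free" value BOINC reports at startup.
for free_mb in (480, 490, 992):
    verdict = "should start" if enough_gpu_memory(free_mb) else "expected to error out"
    print(f"{free_mb}MB free: {verdict}")
```

On a 512MB card that leaves very little headroom, so anything else holding on to video memory can push the free amount under the limit.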
13) Message boards : Problems and Bug Reports : [OpenCL] app v1.20/v1.21 feedback thread (Message 111798)
Posted 2 Feb 2012 by Profile pragmatic prancing periodic problem child, left
Post:
Great, initial estimated time to completion, 286 hours. No wonder it went into HP. But that DCF of 11 is way out of whack.
i3-530, Win 7 - 64bit, 8GB RAM, ATI HD6850 2GB, Catalysts 11.12, BOINC 7.0.12
14) Message boards : Problems and Bug Reports : Running on ATI (Message 111784)
Posted 1 Feb 2012 by Profile pragmatic prancing periodic problem child, left
Post:
It's his http://albert.phys.uwm.edu/show_host_detail.php?hostid=1894 system, which has an AMD 5830 in it plus an APU (Accelerated Processing Unit) called Beavercreek. That APU contains a Radeon HD 6550D GPU (with 400 GPU cores), which is comparable to an HD5550.
15) Message boards : Problems and Bug Reports : Running on ATI (Message 111774)
Posted 28 Jan 2012 by Profile pragmatic prancing periodic problem child, left
Post:
OK, I ended up aborting all my work in cache, after I found a couple of things.

a) For some unknown reason running an Albert task on my ATI GPU would make my internet connection stop working, leaving me with loads upon loads of page loading faults, even between same forum pages. Disable the Albert task and the internet would fly.

b) Checking in GPU-Z, I found that all these tasks would start with some GPU load before tapering off to no load at all, or only a small bump every 5 minutes or so. I've seen this happen before with Albert GPU tasks. A reboot of the system didn't help in this matter.

I'll test again after I've found some I/O recording and debugging programs, to see if I can show you what I see.
16) Message boards : Problems and Bug Reports : Running on ATI (Message 111770)
Posted 28 Jan 2012 by Profile pragmatic prancing periodic problem child, left
Post:
Flagging a weird one.
I caught this one running at zero GPU load for more than an hour, before I aborted it. http://albert.phys.uwm.edu/result.php?resultid=107592

It had been running for well over 4 hours and had an estimated time of another 4 hours, was at 41% done. Its progress was increasing. But still, I didn't trust it.
17) Message boards : Problems and Bug Reports : Wrong estimates of "Remaining" time (Message 111767)
Posted 27 Jan 2012 by Profile pragmatic prancing periodic problem child, left
Post:
Using the anonymous platform is a bit too much to expect from a new user. It's even a lot to expect from a normal user. And even advanced users don't get everything right the first time around.

Put <flops>209876543210</flops> into the file and then adjust it if it turns out to be very wrong.

Right, so you just add an imaginary number and go adjust that in case it doesn't work. Where's the science in that? It also makes all of BOINC a hands-on experience, with you having to exit BOINC, tinker with some files and restart BOINC for every task, or else the TDCF will be out of whack again. Which it will be anyway when a new application is released and the resource flops estimate is not completely correct.

But all that aside, it would be nice if there were an easy way to calculate the flops value of every piece of hardware you have in your computer. Because it's not just the flops value for the GPU that you need here, but also that for the CPU.

There are 11 different GPUs used on this project alone, which means 11 different flops values for those GPUs alone. Added to that, there are a lot more different CPUs on this project. Are you starting to see the problem?

And that's not counting the people who have e.g. 8 different GPUs in their system plus a 12 core CPU with HT. I'd love to see you help them all to the correct values... forget sleep, forget food, forget family, forget friends, forget TV, you're needed here, 24/7/365.
18) Message boards : Problems and Bug Reports : Wrong estimates of "Remaining" time (Message 111761)
Posted 27 Jan 2012 by Profile pragmatic prancing periodic problem child, left
Post:
As an addendum to what Gary said: it's even more difficult to calculate the progress bar.

- Some projects run work in 15% segments, quickly going to 90% before sitting there literally for hours seemingly doing nothing. (most projects using Autodock have this problem)
- Some projects run in 2% increments. The Gamma Ray application at Einstein will do this.
- Some projects will run to 100% and over it for several minutes. Einstein's Gravitational Wave S6 app will do this continuously.
- And then there's at least one project that runs to 100%, resets to zero and starts again. Enigma.
- Nothing said about the various wrapper apps with their own weird things out there.
- All that is topped off by variable run times within a single application. SETI runs high angle, low angle and normal angle ranges, and all that work runs for varying lengths of time. The devs there can't see which is which when they split the work, so it's astronomically impossible to give them correct flops numbers.

As you can see, if it were as simple as you state, someone would've used it already. :-)
19) Message boards : Problems and Bug Reports : Experiences running atiOpenCL app on OS X Lion (10.7.2) (Message 111757)
Posted 27 Jan 2012 by Profile pragmatic prancing periodic problem child, left
Post:
BOINC 7.0.12 is out, with lots of OpenCL bug-fixes. Links available in the change log thread.
20) Message boards : Problems and Bug Reports : Work Cache Size in BOINC 7.0.X (Message 111756)
Posted 27 Jan 2012 by Profile pragmatic prancing periodic problem child, left
Post:
David (Anderson) will look at GPU work fetch problems in the BOINC 7 clients, since these seem to ignore a lot of factors. I managed to get 35 OpenCL tasks in on a CE of 0.1 + AD of 1. Been running those for 3 days already. ;-)





This material is based upon work supported by the National Science Foundation (NSF) under Grant PHY-0555655 and by the Max Planck Gesellschaft (MPG). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the investigators and do not necessarily reflect the views of the NSF or the MPG.

Copyright © 2024 Bruce Allen for the LIGO Scientific Collaboration