Posts by Gary Roberts

1) Message boards : Problems and Bug Reports : [New release] BRP app v1.28 feedback thread (Message 112176) Posted 13 Aug 2012 by Gary Roberts Post: "- OpenCL (ATI/AMD) OSX (Lion and later)." I have two pretty much identical hosts, 1859 and 1868, that now have this app. They are both running the latest BOINC (7.0.31) and both have AMD 5750 1GB GPUs. The only hardware difference is that the first one has 4GB RAM whilst the second has 8GB. The one with smaller RAM has run times around 7Ksecs whilst the other is taking 18Ksecs? Both machines run EAH tasks on all 4 CPUs and both are doing S6LV1 tasks in around 17-18Ksecs. All figures quoted were for tasks crunched over the weekend when there was no user sitting at the keyboard. These machines are used during office hours on weekdays. When idle, both machines power down the screen quite quickly (maybe 3 mins) so the only thing I can now think of is whether or not the screensaver process is still running when the screen turns off. I'll have to check that next time I have access to those machines. Anyone got any other suggestions? At some point I will release a CPU core from crunching to see what difference that makes and later I may attempt to run two tasks simultaneously - maybe not, when you think of the extra heat that might be produced. The aluminium cases are running pretty hot already :-).
2) Message boards : News : New App S6LV1 (Message 111864) Posted 21 Feb 2012 by Gary Roberts Post: "I don't know if a cross validation problem from 1.03 to 1.07 is of interest. My first CPU invalid here: http://albert.phys.uwm.edu/result.php?resultid=115586" If you look at the whole quorum rather than just your result, it doesn't appear to be just a cross version validation problem since it was initially a disagreement between two 1.03 tasks. Eventually, just one of the 1.03 tasks did validate with a 1.07 task.
3) Message boards : Problems and Bug Reports : Running on ATI (Message 111787) Posted 1 Feb 2012 by Gary Roberts Post: "It's his http://albert.phys.uwm.edu/show_host_detail.php?hostid=1894 system ...." Yep, I'd already browsed quite a few results on that host and discovered the two different devices returning those results. Thanks for the info on exactly what GPUs they were.
4) Message boards : Problems and Bug Reports : Running on ATI (Message 111785) Posted 1 Feb 2012 by Gary Roberts Post: OK, I understand. I was interested in your findings with the integrated GPU. I had a look at some of your completed results. I see two AMD GPUs so I assume the one referred to as 'Beaver Creek' is the integrated one. Looks like it has quite good performance for an integrated GPU. Very interesting for a budget machine.
5) Message boards : Problems and Bug Reports : Running on ATI (Message 111782) Posted 1 Feb 2012 by Gary Roberts Post: "pls check your PM" Are you trying to keep it a secret? :-). Please post in the thread for everyone's benefit. I'm sure many others would be interested too.
6) Message boards : Problems and Bug Reports : So many failed wu's (Message 111772) Posted 28 Jan 2012 by Gary Roberts Post: "Gary, are you sure that everything runs correctly here?" I can't find where I said that!! :-). In fact, quite the opposite. I said there was obviously a problem with the 1.02 LV1 app and that there now seemed to be an issue with the 1.03 version as well. I also tried to explain the reason for "can't validate" that you saw with your 1.02 LV1 tasks. I wouldn't say everything runs correctly here. I would say that things are likely to fall over from time to time - exactly what you should expect from an alpha test project. For example, take a look at this quorum in which your host is participating. Forgetting about the errors, there are already 3 successfully completed tasks which can't seem to agree with each other and so there is now a new task to try to break the deadlock. This is exactly the sort of information the Devs need to work out the bugs. Even though you seem to regard it as a waste, you are performing a valuable service and I'm sure it's necessary and appreciated.
7) Message boards : Problems and Bug Reports : Work Cache Size in BOINC 7.0.X (Message 111771) Posted 28 Jan 2012 by Gary Roberts Post: I've done a bit more experimenting with work cache under 7.0.X. For CPU tasks anyway, the 'Connect every' value (CE) seems to be the low water mark and BOINC will request more CPU work when the work on hand drops below CE. The additional days value (AD) itself is not the high water mark as I previously suggested. The high water mark (as it was previously) is CE + AD. So for a CPU work cache to be kept at around or just above 1 day at all times, you need, say, CE=1.0 and AD=0.1. This seems to prevent a big influx of tasks when a CPU work request is made. So far, those settings seem to work OK with the OpenCL app too.
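The water-mark behaviour described in post 7 (request work when the cache drops below CE, refill to CE plus the additional days value) can be illustrated with a small simulation. This is only a minimal sketch under stated assumptions, not BOINC's scheduling code: the function name `simulate_cache`, the parameter names, the one-hour time step, and the idea that every request refills the cache exactly to the high water mark are all hypothetical simplifications.

```python
def simulate_cache(ce_days, extra_days, hours):
    """Simulate a work cache that drains in real time.

    Hypothetical model: one hour of work is consumed per hour of
    wall-clock time. When the work on hand reaches the low water mark
    (ce_days) a request is made that refills the cache to the high
    water mark (ce_days + extra_days).

    Returns a list of (hour, days_requested) tuples.
    """
    low = ce_days
    high = ce_days + extra_days
    on_hand = high          # assume the cache starts full
    requests = []
    for hour in range(hours):
        on_hand = max(0.0, on_hand - 1.0 / 24.0)
        if on_hand <= low:
            requests.append((hour, high - on_hand))
            on_hand = high
    return requests


# Settings suggested in the post: CE=1.0 plus 0.1 additional days.
steady = simulate_cache(1.0, 0.1, hours=72)
# The "0+6" style of cache from the earlier post, over two weeks.
bursty = simulate_cache(0.0, 6.0, hours=14 * 24)

print("steady:", len(steady), "requests, largest",
      max(size for _, size in steady), "days")
print("bursty:", len(bursty), "requests, largest",
      max(size for _, size in bursty), "days")
```

Under these assumptions the first configuration makes many small requests while the second makes a few very large ones, which is the "big influx" the post is trying to avoid on a test project.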
8) Message boards : Problems and Bug Reports : Experiences running atiOpenCL app on OS X Lion (10.7.2) (Message 111760) Posted 27 Jan 2012 by Gary Roberts Post: Thanks. I found another suitable iMac, late 2010, an i5-760 with a HD 5750 1024MB GPU, just like the previous one. BOINC 7.0.12 is now installed and the first atiOpenCLLion task is underway. If this one makes it through the night I'll put 7.0.12 on the very first iMac I tried and see if that makes any difference to the reported GPU RAM. Maybe I can get that Collatz task underway after all. It's still sitting there suspended.
9) Message boards : Problems and Bug Reports : Wrong estimates of "Remaining" time (Message 111759) Posted 27 Jan 2012 by Gary Roberts Post: And what would you do if parts of the calculation go slowly and other parts speed right up? You can't just assume that there's always a constant rate of progress. Also what do you do if the normal use of the machine is quite variable? The science app gets out of the way when other heavy CPU jobs are running. A short test measurement of performance could be very heavily skewed - one way or the other. BOINC makes continuous adjustments over the duration of the task which is probably the safest way to do things. If you are certain there is a better way to handle this, you should talk to the BOINC Devs. It's beyond the scope of what individual projects want to deal with.
10) Message boards : Problems and Bug Reports : Work Cache Size in BOINC 7.0.X (Message 111755) Posted 27 Jan 2012 by Gary Roberts Post: Long time BOINC users will be familiar with the two settings for cache size - Connect every x.xx days (CE) and the additional days setting (AD). Previously, the recommendation has always been to keep CE close to reality, i.e. very low or zero if you are always connected, and to just use AD to control the cache size. With the new BOINCs, things have changed drastically. The two values should be more or less regarded as low and high water marks. So if you have CE close to zero, your cache will almost completely drain before refilling and if you have AD quite large, you will get a lot of work in one big hit when you refill. Since this is a test project and apps may change suddenly without prior notice and work could be canceled without prior notice, it could be quite a waste if one of these events occurred just after you had done a big top-up. So, common sense says that (if at all possible) you should keep the two values close to each other and at reasonably low values. If you wanted to keep a 1 day cache (and you used to have 0.0 + 1.0 for the two settings), you should use something like 0.9 and 1.0 so that the cache would be kept topped up all the time without draining all the way to zero. Here is an example of a work cache with rather undesirable settings. A big top-up has recently occurred and if you go back through the history, you will see these occurring at intervals of about 6 days. In between, the cache drains to zero. This appears to be something like a 0+6 cache which I think is undesirable for this test project.
11) Message boards : Problems and Bug Reports : Experiences running atiOpenCL app on OS X Lion (10.7.2) (Message 111753) Posted 27 Jan 2012 by Gary Roberts Post: Most of my machines run Linux and none of those have anything but an el cheapo mobo with integrated graphics. I have some Windows machines with ATI HD 4850 (512MB) cards but they are 'in production' at Milkyway and my suspicion is that the app here won't run on them anyway (insufficient memory and OpenCL 1.0). However, courtesy of my daughter's business, I do have access to a nice group of iMacs of the late 2009 to mid 2011 variety, with a variety of different GPUs and all running Lion 10.7.2. I tried a 27" iMac, late 2009, Core 2 Duo, HD 4670 GPU. OS X reports the GPU as having 256MB RAM and BOINC 7.0.11 says it is OpenCL 1.0 and 512MB RAM. Because of the memory discrepancy I gave Collatz a spin first (lower memory requirements) but that immediately locks up the entire machine and a power cycle is needed to restart. I then moved on to a 21", early 2011 i5-2400S with a HD 6750M 512MB GPU. BOINC 7.0.11 sees this as OpenCL 1.1 and 512MB so at least it looks hopeful. Once again I attached to Collatz and this time was immediately rewarded with tasks that run and validate. So, time to try Albert. A task runs for a few seconds but then errors out. The stderr output says
    <message> process exited with code 229 (0xe5, -27) </message>
and
    [18:40:16][1258][ERROR] Error in OpenCL context: [CL_MEM_OBJECT_ALLOCATION_FAILURE] : OpenCL Error : clEnqueueNDRangeKernel failed: memory usage (268435456 bytes) is more than the device can support (201326592 bytes)
    [18:40:16][1258][ERROR] Error during OpenCL FFT setup (error: -4)
    [18:40:16][1258][ERROR] Demodulation failed (error: 2021)!
along with a raft of warnings about unused variables. OK so not enough free memory to run the app, it would appear. As I said, the machine is in a work environment so not really suited to running these tests. 
However I was running it on a public holiday and there was nothing else running on that machine and it had just been rebooted after the install of 7.0.11. Does anyone know how to free up extra memory on an OS X machine without making it difficult for the real user to do what they need to do when it isn't a public holiday? :-). So, onto a third possible candidate. This machine is late 2010, an i5-760 with a HD 5750 1024MB GPU. BOINC 7.0.11 agrees and also says OpenCL 1.1. Collatz runs fine as expected and this time so does the Albert app. The first task done has even validated against one done by the CUDA app, so I'm quite happy about that. It's now not a public holiday any more and the machine is in use and I haven't had any complaints so far (there are 4 FGRP CPU tasks and an atiOpenCLLion task running). All the iMacs in this office are running the standard project anyway so the users are accustomed to seeing the BOINC icon in the dock. They seem to be able to do their work just fine.
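For reference, the two byte counts in the clEnqueueNDRangeKernel error quoted in post 11 convert to round binary sizes. A quick check (the helper name `to_mib` is just for illustration):

```python
def to_mib(n_bytes):
    """Convert a byte count to mebibytes (1 MiB = 2**20 bytes)."""
    return n_bytes / 2**20


requested = 268435456   # "memory usage" figure from the error message
supported = 201326592   # "device can support" figure from the error message

print(to_mib(requested))              # 256.0
print(to_mib(supported))              # 192.0
print(to_mib(requested - supported))  # 64.0
```

So the app asked for 256 MiB against a reported limit of 192 MiB, a shortfall of 64 MiB, on a GPU that BOINC lists as 512MB. The log itself does not say what the 192 MiB limit represents.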
12) Message boards : Problems and Bug Reports : Wrong estimates of "Remaining" time (Message 111751) Posted 26 Jan 2012 by Gary Roberts Post: "... "remaining" time is near 83 hours - while it should be equal to some value near 9 hours." Sure, but I think you'll find that's just the way BOINC works when the original estimate is significantly wrong. The remaining time should be decreasing relatively quickly - e.g. by the time you get to 20% completed (~1 extra hour), maybe the value will be 60 or 50 or 40 hours but it certainly won't be 8 hours. It should progressively get closer to reality but it won't become very close until much nearer to the finish. If the remaining time is continuing to stay near 80 hours then maybe you really do have something to worry about :-).
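The behaviour described in post 12 (an estimate that starts badly wrong and only converges near the end) is what you would get from blending a fixed up-front estimate with a projection based on progress so far. The sketch below is an assumption made for illustration, not BOINC's actual formula: the function name `remaining_hours`, the linear weighting by fraction done, and the example figures (a 90 hour initial estimate for a task that really takes 10 hours) are all hypothetical.

```python
def remaining_hours(static_total, elapsed, fraction_done):
    """Blend a static estimate with a progress-based projection.

    Hypothetical weighting: the progress-based figure is trusted in
    proportion to the fraction of the task already completed.
    """
    if fraction_done <= 0:
        return static_total
    # Remaining time if the rate of progress so far continues.
    projected = elapsed * (1 - fraction_done) / fraction_done
    # Remaining time according to the original (wrong) estimate.
    static = static_total * (1 - fraction_done)
    return fraction_done * projected + (1 - fraction_done) * static


# Hypothetical task: estimated at 90 hours, really takes 10 hours.
for percent in (10, 20, 50, 90):
    f = percent / 100
    est = remaining_hours(90.0, elapsed=10.0 * f, fraction_done=f)
    print(f"{percent}% done: estimated {est:.1f} h, actual {10.0 * (1 - f):.1f} h")
```

With these made-up numbers the estimate at 20% done is still several times the true remaining time, and it only gets close in the last part of the run, which matches the pattern the post describes.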
13) Message boards : Problems and Bug Reports : WUs cancelled? (Message 111750) Posted 26 Jan 2012 by Gary Roberts Post: "OK, but they might at least set the quorum at 1 ...." Think about what that would do. Are we really trying to see just that a task can finish or do we want to be sure that all versions of the app on a range of platforms can get compatible answers? Once the app runs reliably, increasing the quorum significantly would probably speed up the process by identifying incompatible answers more quickly. Take a look at this quorum. There were three successful completions but one was deemed to be wrong. And notice the three different platforms. More work for the Devs :-).
14) Message boards : Problems and Bug Reports : So many failed wu's (Message 111740) Posted 24 Jan 2012 by Gary Roberts Post: "http://albert.phys.uwm.edu/results.php?userid=43613&offset=0&show_names=0&state=4&appid=" You can't use a link to your results like this because only you have permission to access your own account. You need to select the host showing the problem and then link to the results for that host - like this. Note that all of these results are for the S6LV1 app currently being tested. There is an active thread in the news section for this app and if you read there you can see others reporting similar problems. You should join the discussion there. "I have a lot of wu's that show 'validation error' or 'can't validate'. It's not on one machine only, so there seems to be a problem with the app." In the above link for your host (id=1767), you can see the bottom set of results were for the previous app v1.02. I guess when the problem with that app was identified, the outstanding tasks for that version were all pulled, causing the "can't validate" message. There is obviously a validation problem showing up with the new version so you should expect further announcements from Bernd - probably in the other thread. "If this helps the project and the app can be improved I will continue, otherwise it's a waste of time and power." Of course it helps the project by having a small, select group of volunteers discover the problems before inflicting them on the whole community. That latter scenario would really be a waste of time and power!! It also helps the time-strapped Devs if the volunteers can keep up with the active thread. Welcome to the joys of alpha testing :-). We were all fully warned :-). By participating this way, you are performing a valuable (and much appreciated) service. I hope you will continue helping. As problems continue to get solved, new app versions may be released at any time. 
If that happens and you are paying attention, you can limit the waste on your machines by simply aborting all remaining tasks for the deprecated version and replacing them with 'new version' tasks. You can also limit the waste by keeping your cache of work on hand quite small. Thanks for your continuing support.

This material is based upon work supported by the National Science Foundation (NSF) under Grant PHY-0555655 and by the Max Planck Gesellschaft (MPG). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the investigators and do not necessarily reflect the views of the NSF or the MPG.

Copyright © 2024 Bruce Allen for the LIGO Scientific Collaboration