[New release] BRP app v1.23/1.24 (OpenCL) feedback thread |
Message boards : Problems and Bug Reports : [New release] BRP app v1.23/1.24 (OpenCL) feedback thread
| Author | Message |
|---|---|
|
Hi, | |
| ID: 111974 | | |
|
I posted an answer in another thread that might also belong here. | |
| ID: 111979 | | |
|
GPU load is steady at 20-21%, and CPU load literally bounces: 5%,15%,6%,14%,4%,17%, etc. with 17 being the highest I've seen. | |
| ID: 111985 | | |
|
Thanks for the feedback. | |
| ID: 111986 | | |
March? April? Or May of last year? wu 4/29/2012 9:29:59 AM | Albert@Home | Starting task p2030.20110421.G41.06+00.53.N.b6s0g0.00000_3728_0 using einsteinbinary_BRP4 version 123 (atiOpenCL) in slot 0 ____________ | |
| ID: 111987 | | |
|
Also, is there a way to make them thumbnails in my post and when you click them they link to larger images (just to not annoy people with really large images)? | |
| ID: 111988 | | |
|
HD 5670, 1GB RAM, Windows 7 Home, Catalyst version 12.4 | |
| ID: 111989 | | |
|
Hi | |
| ID: 111990 | | |
|
One more thing: while the workunit mentioned above was sent only recently, it was already generated on the 23rd of April, so it is still one of the "tweaked" workunits. Once the newly generated workunits are sent out, we should see reduced memory usage and a modest performance increase. | |
| ID: 111991 | | |
One more thing: while the workunit mentioned above was sent only recently, it was generated already on the 23rd of April, so it is still one of the "tweaked" workunits. Once the newly generated workunits are reached out, we should see a reduced memory usage and some modest performance increase. Does this mean we are 5 community-days late with processing? If so, I suggest to just stop everything from being sent that does not bring additional insights. Hm, thinking again, you have certainly done that and I was just too quick when I read the announcement. Ah, wait, you expect an impact on the performance also from the tweaking, so you need to have the same new app performed both on tweaked and regular workunits ?!? Steffen ____________ | |
| ID: 111992 | | |
One more thing: while the workunit mentioned above was sent only recently, it was generated already on the 23rd of April, so it is still one of the "tweaked" workunits. Once the newly generated workunits are reached out, we should see a reduced memory usage and some modest performance increase. I was afraid of that. However, I didn't know how to decipher what date p2030.20110421.G41.06+00.53.N.b6s0g0.00000_3728_0 ... Never mind, I just realized that 20110421 means April 21, 2011. ____________ | |
| ID: 111993 | | |
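The date field discussed above can be pulled out of a task name mechanically. The following is a minimal sketch in Python, assuming the layout `p2030.<YYYYMMDD>.<further fields>` inferred only from the task names quoted in this thread; the function name `date_from_task_name` is made up for illustration. As noted elsewhere in the thread, this date is not the workunit creation date.

```python
import re
from datetime import date, datetime

# Assumed layout, inferred only from names quoted in this thread:
#   p2030.<YYYYMMDD>.<further fields>
TASK_NAME = re.compile(r"^p\d+\.(\d{8})\.")


def date_from_task_name(name: str) -> date:
    """Return the date embedded in the second dot-separated field."""
    match = TASK_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognised task name: {name!r}")
    return datetime.strptime(match.group(1), "%Y%m%d").date()


if __name__ == "__main__":
    name = "p2030.20110421.G41.06+00.53.N.b6s0g0.00000_3728_0"
    print(date_from_task_name(name))  # 2011-04-21
```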
|
Hi | |
| ID: 111994 | | |
|
Hi! | |
| ID: 111996 | | |
|
I'm not sure why, but I've thrown 3 errors recently: | |
| ID: 111997 | | |
|
Thanks for the feedback, I think we have seen this particular error also with other apps and it might even be a general BOINC issue... definitely needs some investigation. | |
| ID: 111999 | | |
|
Using this host. | |
| ID: 112001 | | |
I see your host has now a mix of old and new WUs ? I poked through my history and all my wu's have 20110421 in them. I started aborting batches to try and get some new ones, but no dice so far. Unless I am mistaken, the 20110421 is the datestamp for when the data was recorded? Or is that the datestamp from when it was split? I have the day off tomorrow so I will abort/babysit Boinc to try and get some newer ones. ____________ | |
| ID: 112002 | | |
|
p2030.20110421.G41.29-00.40.S.b0s0g0.00000_744_0 using einsteinbinary_BRP4 version 123 (atiOpenCL) | |
| ID: 112003 | | |
I see your host has now a mix of old and new WUs This is not the WU creation date, you can see that one by following the WU link in the results list. It seems that the first "new" WUs were generated around 13:00 UTC on 27th of April already. When looking at your results, you will notice the results will fall into one of two narrow ranges of runtime, where the newer results (newer by WU creation time) run about 20% faster. Cheers HB ____________ | |
| ID: 112004 | | |
|
p2030.20110421.G41.29-00.40.S.b0s0g0.00000_1264_1 | |
| ID: 112006 | | |
|
On my AMD HD 6850 I'm running 2 BOINC projects: Albert@Home and POEM@Home (3 GPU WUs on 1 CPU). When I download an Albert@Home GPU WU, the POEM WUs enter the "suspended" state and the Albert@Home WU doesn't start, i.e. no work on the GPU. If I reboot the BOINC client, the POEM WUs remain suspended, but the Albert WU starts and runs OK... | |
| ID: 112007 | | |
|
p2030.20110421.G41.29-00.40.S.b0s0g0.00000_1728_0 | |
| ID: 112008 | | |
|
p2030.20110421.G41.29-00.40.S.b0s0g0.00000_1504_1 using einsteinbinary_BRP4 version 123 (atiOpenCL) | |
| ID: 112009 | | |
In my AMD HD 6850 i'm running 2 boinc projects: albert@home and poem@home (3 gpu wu in 1 cpu). When i download an Albert@home gpu wu, the poem wus entered in "suspended" state and albert@home wu doesn't start - aka, no work on gpu. If i reboot boinc client, the poem wu remain suspended, but albert wu starts and runs ok... hmmm....theoretically it is possible that the Albert task *thought* it didn't have enough memory and waited for some to get available, which happened after the reboot...still, this looks suspicious. Thanks for reporting. One question tho: is this reproducible, e.g. after each new WU download from Albert? Cheers HBE ____________ | |
| ID: 112010 | | |
p2030.20110421.G41.29-00.40.S.b0s0g0.00000_1728_0 Strange... I guess this is the one: http://albert.phys.uwm.edu/result.php?resultid=197941 which has finished in about the same time as other tasks. Let's see if it validates. But I would expect a lower GPU temperature if the load had really been 0% for a longer time, so actually I suspect that the readout is wrong. The app does have phases (at the beginning of each of the 8 subtasks) when there is exclusively CPU load, but this will last only a couple of seconds, not minutes. THX HBE ____________ | |
| ID: 112011 | | |
|
p2030.20110421.G41.29-00.40.S.b0s0g0.00000_1920_1 using einsteinbinary_BRP4 version 123 (atiOpenCL) | |
| ID: 112013 | | |
|
Digging through some of the stderr outputs I notice the atiOpenCl app is doing an awful lot of checkpointing. Curious to see if the cuda app was the same, I looked into one of my wu's:
[06:49:19][3424][INFO ] Starting data processing...
[06:49:19][3424][INFO ] Using OpenCL platform provided by: Advanced Micro Devices, Inc.
[06:49:19][3424][INFO ] Using OpenCL device "Cayman" by: Advanced Micro Devices, Inc.
[06:49:19][3424][INFO ] Checkpoint file unavailable: status.cpt (No such file or directory).
------> Starting from scratch...
[06:49:19][3424][INFO ] Header contents:
------> Original WAPP file: ./p2030.20110421.G41.29-00.40.S.b0s0g0.00000_DM126.40
...
[06:50:25][3424][INFO ] Checkpoint committed!
[06:51:30][3424][INFO ] Checkpoint committed!
[06:52:35][3424][INFO ] Checkpoint committed!
[06:53:41][3424][INFO ] Checkpoint committed!
[06:54:46][3424][INFO ] Checkpoint committed!
[06:55:52][3424][INFO ] Checkpoint committed!
[06:56:58][3424][INFO ] Checkpoint committed!
[06:58:03][3424][INFO ] Checkpoint committed!
[06:59:08][3424][INFO ] Checkpoint committed!
[07:00:15][3424][INFO ] Checkpoint committed!
[07:01:20][3424][INFO ] Checkpoint committed!
[07:02:25][3424][INFO ] Checkpoint committed!
[07:03:30][3424][INFO ] Checkpoint committed!
[07:04:36][3424][INFO ] Checkpoint committed!
[07:05:41][3424][INFO ] Checkpoint committed!
[07:06:47][3424][INFO ] Checkpoint committed!
[07:07:53][3424][INFO ] Checkpoint committed!
[07:08:58][3424][INFO ] Checkpoint committed!
[07:09:25][3424][INFO ] OpenCL shutdown complete!
[07:09:25][3424][INFO ] Data processing finished successfully!
...
And then repeats the process for:
Original WAPP file: ./p2030.20110421.G41.29-00.40.S.b0s0g0.00000_DM126.50
Original WAPP file: ./p2030.20110421.G41.29-00.40.S.b0s0g0.00000_DM126.60
Original WAPP file: ./p2030.20110421.G41.29-00.40.S.b0s0g0.00000_DM126.70
Original WAPP file: ./p2030.20110421.G41.29-00.40.S.b0s0g0.00000_DM126.80
Original WAPP file: ./p2030.20110421.G41.29-00.40.S.b0s0g0.00000_DM126.90
Original WAPP file: ./p2030.20110421.G41.29-00.40.S.b0s0g0.00000_DM127.00
Original WAPP file: ./p2030.20110421.G41.29-00.40.S.b0s0g0.00000_DM127.10
Checkpointing each WAPP file once per minute, 20 times.
Comparing to the BRP3cuda32 app (abbreviated):
[12:27:01][5004][INFO ] Starting data processing...
[12:27:01][5004][INFO ] CUDA global memory status (initial GPU state, including context):
------> Used in total: 218 MB (807 MB free / 1025 MB total)
-> Used by this application (assuming a single GPU task): 0 MB
[12:27:01][5004][INFO ] Using CUDA device #0 "GeForce GTX 560" (336 CUDA cores / 1105.44 GFLOPS)
[12:27:01][5004][INFO ] Version of installed CUDA driver: 4020
[12:27:01][5004][INFO ] Version of CUDA driver API used: 3020
[12:27:01][5004][INFO ] Checkpoint file unavailable: status.cpt (No such file or directory).
------> Starting from scratch...
[12:27:01][5004][INFO ] Header contents:
------> Original WAPP file: ./p2030.20110421.G41.29-00.40.S.b0s0g0.00000_DM126.40
...
[12:27:31][5004][INFO ] Checkpoint committed!
[12:28:01][5004][INFO ] Checkpoint committed!
[12:28:31][5004][INFO ] Checkpoint committed!
[12:29:01][5004][INFO ] Checkpoint committed!
[12:29:31][5004][INFO ] Checkpoint committed!
[12:30:02][5004][INFO ] Checkpoint committed!
[12:30:32][5004][INFO ] Checkpoint committed!
[12:31:01][5004][INFO ] Data processing finished successfully!
...
which then also repeats for:
Original WAPP file: ./p2030.20110421.G41.29-00.40.S.b0s0g0.00000_DM126.50
Original WAPP file: ./p2030.20110421.G41.29-00.40.S.b0s0g0.00000_DM126.60
Original WAPP file: ./p2030.20110421.G41.29-00.40.S.b0s0g0.00000_DM126.70
Original WAPP file: ./p2030.20110421.G41.29-00.40.S.b0s0g0.00000_DM126.80
Original WAPP file: ./p2030.20110421.G41.29-00.40.S.b0s0g0.00000_DM126.90
Original WAPP file: ./p2030.20110421.G41.29-00.40.S.b0s0g0.00000_DM127.00
Original WAPP file: ./p2030.20110421.G41.29-00.40.S.b0s0g0.00000_DM127.10
Checkpointing each WAPP file once per minute, 5 times.
So, my questions are:
* What is checkpointing? Saving an intermediate state (variables) so that, if calculations get interrupted, you don't have to start over?
* Is the atiOpenCL app checkpointing more? Or is it that the two apps are doing the same amount of work (calcs), and it's just that the CUDA app/GTX 560 is doing more work per unit time and therefore only needs to checkpoint 5 vs. my 20 times?
* Is the GTX 560/CUDA app really 4x (20/5=4) faster than the HD6950/atiOpenCL? The 6950 shows 2253 SP GFLOPS vs. the GTX 560 SP GFLOPS of 1088.6.
http://en.wikipedia.org/wiki/Comparison_of_AMD_graphics_processing_units
http://en.wikipedia.org/wiki/Comparison_of_Nvidia_graphics_processing_units
To semi-answer that, GPU time indicates a 2.503x increase for the GTX560/CUDA vs. the atiOpenCL/HD6950. The CPU time for the CUDA app is, however, 4.24x less than that of the OpenCL app. Anandtech Bench shows the 2500k vs. my AMD 975BE to be slightly better in single-threaded, multi-threaded, and total MIPS (7-Zip test), but nothing earth shattering. http://www.anandtech.com/bench/Product/288?vs=435
I know you said before that the OpenCL app uses way more CPU than the CUDA app. Perhaps the OpenCL standard is still immature, AMD has crappy drivers, or a mix of both? Regardless, I really commend everyone's efforts. Having done a fair bit of coding myself, I know what a pain this can all be.
____________ | |
| ID: 112014 | | |
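For anyone wanting to measure the checkpoint spacing rather than eyeball it, the comparison in the post above can be sketched in Python. This assumes the stderr line format shown in the post; the excerpt is the first four checkpoint lines of the OpenCL log quoted there, and the helper name `checkpoint_intervals` is made up for illustration.

```python
import re
from datetime import datetime

# First four checkpoint lines from the OpenCL log quoted in the post above.
LOG_EXCERPT = """\
[06:50:25][3424][INFO ] Checkpoint committed!
[06:51:30][3424][INFO ] Checkpoint committed!
[06:52:35][3424][INFO ] Checkpoint committed!
[06:53:41][3424][INFO ] Checkpoint committed!
"""

CHECKPOINT = re.compile(
    r"^\[(\d{2}:\d{2}:\d{2})\]\[\d+\]\[INFO \] Checkpoint committed!"
)


def checkpoint_intervals(log_text):
    """Return the seconds between consecutive 'Checkpoint committed!' lines.

    The log carries no date, so a run that crosses midnight is not handled.
    """
    times = []
    for line in log_text.splitlines():
        match = CHECKPOINT.match(line.strip())
        if match:
            times.append(datetime.strptime(match.group(1), "%H:%M:%S"))
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


if __name__ == "__main__":
    print(checkpoint_intervals(LOG_EXCERPT))  # [65, 65, 66]
```

Feeding in a full stderr dump instead of the excerpt gives the spacing for a whole task.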
|
p2030.20110421.G41.29-00.40.S.b0s0g0.00000_1928_0 using einsteinbinary_BRP4 version 123 (atiOpenCL) | |
| ID: 112015 | | |
|
p2030.20110421.G41.29-00.40.S.b0s0g0.00000_2504_0 using einsteinbinary_BRP4 version 123 (atiOpenCL) | |
| ID: 112016 | | |
|
For me the new app takes a full CPU core when it is running. Is that by intention? | |
| ID: 112017 | | |
My PC has 8 GB DDR3 on Win7 64-bit; is that enough? If I continue to download and run A@H WUs, those WUs take precedence over POEM. After the last A@H WU, POEM restarts correctly, and if I download another A@H WU the situation occurs again... :-( I forgot: during the no-GPU-use state, 1 CPU core is in use (as if A@H were running) | |
| ID: 112018 | | |
For me the new app takes a full CPU core when it is running. Is that by intention? Hi! This will depend on the driver version you are using, e.g. you can see from the screenshots posted here that this is not in general the case for this app. Cheers HB ____________ | |
| ID: 112020 | | |
For me the new app takes a full CPU core when it is running. Is that by intention? I was not exact enough in my statement. BOINC reserves a full core: '1 CPUs + 1 ATI GPU', and as I understand it that should not be the case. Here is one log snippet: 03.05.2012 22:25:33 | Albert@Home | [rr_sim_detail] 339385.57: starting p2030.20110421.G41.06+00.53.N.b6s0g0.00000_1448_2 (1.00 CPU + 1.00 ATI) Oh, the driver version is 12.3; as far as I have read, the high CPU usage is solved there. ____________ Christoph | |
| ID: 112021 | | |
Exactly
By default, all apps are checkpointing every 60 seconds. All workunits, whether they will get picked up by a CPU, NVIDIA GPU or ATI GPU will do the same amount of work, but the faster the processing is, the fewer checkpoints will happen during the execution time.
Well, the Flops numbers are just theoretical peak performance and not too meaningful. But anyway, it's fair to say that our CUDA app and the libraries we used with it are more mature and optimized than our OpenCL app. We have some ideas in the pipeline how to further improve the OpenCL app and hopefully we can implement them in a timeframe of weeks rather than months, stay tuned.
Indeed :-). I don't think one should blame the OpenCL standard, it's not different from CUDA anyway. It's the implementation of the standard and the drivers that are causing a few troubles. Neither AMD nor NVIDIA seem too enthusiastic about OpenCL anymore, I'm afraid. CU HB ____________ | |
| ID: 112022 | | |
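The relationship described in the reply above (fixed checkpoint interval, same amount of work, so a faster device commits fewer checkpoints) comes down to one division. A minimal Python sketch, assuming the 60-second default stated in the reply; the runtimes below are hypothetical round numbers, not measurements from this thread.

```python
def expected_checkpoints(runtime_seconds: float, interval_seconds: float = 60.0) -> int:
    """Estimate how many checkpoints fit into one run at a fixed interval."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return int(runtime_seconds // interval_seconds)


if __name__ == "__main__":
    # Hypothetical runtimes for the same piece of work on a slower and a faster device.
    for label, runtime in (("slower device", 600), ("faster device", 240)):
        print(label, expected_checkpoints(runtime))
```

The checkpoint count therefore tracks wall-clock runtime only and says nothing about how much work each app does.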
For me the new app takes a full CPU core when it is running. Is that by intention? Yup, the allocation of a full core by BOINC is something that is configured at the server side. This was to prevent the situation where CPUs will get overcommitted for those users with older drivers where indeed a full CPU is taken by the driver...it's a conservative choice. We will look into the question how to handle this when we go live with the app, e.g. we could make the CPU allocation dependent on the driver version as we once did for NVIDIA where a similar driver problem existed under Linux, iirc. Cheers HB ____________ | |
| ID: 112023 | | |
|
Ah, ok. Looks like I missed that detail somehow. So I can stop scratching my head. | |
| ID: 112024 | | |
|
I got one computation error. Probably it has something to do with my reboot yesterday. | |
| ID: 112026 | | |
I got one computation error. Probably it has something to do with my reboot yesterday. Hi! Thanks very much for reporting this, this might actually point to a real problem in the code which affects only some cards that have certain restrictions that the app has to take into account when generating work for the GPU. Stay tuned, I hope I can install a fix tomorrow. HB ____________ | |
| ID: 112027 | | |
|
Two more errors with: | |
| ID: 112030 | | |
|
I'm new here since yesterday afternoon. | |
| ID: 112031 | | |
|
Albert@home runs well on my Linux box, all results are validated. I have no GPU. I got a validation error on Einstein@home, on a Gamma-ray pulsar search unit. | |
| ID: 112032 | | |
|
Feedback from Ubuntu 12.04_amd64 with Catalyst 12.4 /HD6950@6870: | |
| ID: 112035 | | |
During running AaH the desktop was very sticky, most time I had to wait some seconds before any activity could be performed. This was also during the phases of waiting of the AaH task. The desktop was no longer sticky when the AaH project was suspended. This is a very uncomfortable way of operation. ... uncomfortable, but caused by the graphics card interfering with your regular display and not a defect in albert@home, from what I grasp. I observe this with my graphics card on Linux, too. The only way out that I am aware of is to not allow GPU computing while the machine is in use. How much RAM does your card have, btw? I do not observe this behaviour on a 1GB ATI HD 5670 card running albert on Windows, but I do with a HD 5770 512MB card (running prime grid or so because of memory constraints), and there it is pretty much unbearable. Anyone dual booting and observing the issue under Linux but not with Windows? Steffen ____________ | |
| ID: 112036 | | |
|
Hi, | |
| ID: 112037 | | |
|
Hello Steffen! ... but caused by the graphics card interfering with your regular display and is not a defect by albert@home from what I grasp. This task was running on a GTX550Ti with 1 GB of RAM in slot 0. At the same time a BRP4 task from EaH was running on the same card (0.5 mode). So you are probably right. I didn't check the memory load of the GPU, as in EaH I can easily run 3 tasks at a time. I don't know how much memory the OpenCL task requires. A memory load that is probably too high might also be the reason for the long run time. I will pay attention to that next time. Thank you for this hint. Kind regards martin ____________ | |
| ID: 112038 | | |
|
p2030.20110421.G41.18+00.30.N.b6s0g0.00000_1832_2 using einsteinbinary_BRP4 version 123 (atiOpenCL) | |
| ID: 112039 | | |
|
p2030.20110421.G41.18+00.30.N.b6s0g0.00000_1400_4 using einsteinbinary_BRP4 version 123 (atiOpenCL) | |
| ID: 112042 | | |
|
This wu seems to be wreaking havoc. I completed it ok, but everyone is erroring out. Your client errored too Bikeman, but I presume that is because your client is 6.12.33? | |
| ID: 112045 | | |
This wu seems to be wreaking havoc. I completed it ok, but everyone is erroring out. Your client errored too Bikeman, but I presume that is because your client is 6.12.33? Seems to be the same types of problems with this wu also: http://albert.phys.uwm.edu/workunit.php?wuid=69486 ____________ | |
| ID: 112046 | | |
|
Got same errors on my notebook with Mobile Radeon 5450 1GB vram: | |
| ID: 112047 | | |
Software: Catalyst 12.3, BOINC 7.0.26 (x64), Windows 7/64
RAM usage: Task Manager during GPU processing: ~207 MB (max)
No visible GPU usage (per AMD Overdrive); computing the workunits took just a few seconds until failure.
Each workunit failed, so I stopped processing.
Stderr output
<core_client_version>7.0.26</core_client_version>
<![CDATA[
<message>
Beim Löschen der Farbtransformation ist ein Fehler aufgetreten. (0x7e3) - exit code 2019 (0x7e3)
</message>
<stderr_txt>
Activated exception handling...
[20:24:32][5108][INFO ] Starting data processing...
[20:24:33][5108][INFO ] Using OpenCL platform provided by: Advanced Micro Devices, Inc.
[20:24:33][5108][INFO ] Using OpenCL device "Cedar" by: Advanced Micro Devices, Inc.
[20:24:34][5108][WARN ] Kernel "kernelTimeSeriesMeanReduction" exceeds device-specific maximum work group size (requested: 256)!
------> Reducing kernel's work group size to allowed maximum of: 128 work items
[20:24:34][5108][WARN ] Kernel "kernelPowerSpectrum" exceeds device-specific maximum work group size (requested: 256)!
------> Reducing kernel's work group size to allowed maximum of: 128 work items
[20:24:34][5108][WARN ] Kernel "kernelHarmonicSumming" exceeds device-specific maximum work group size (requested: 256)!
------> Reducing kernel's work group size to allowed maximum of: 128 work items
[20:24:35][5108][INFO ] Checkpoint file unavailable: status.cpt (No such file or directory).
------> Starting from scratch...
[20:24:35][5108][INFO ] Header contents:
------> Original WAPP file: ./p2030.20110421.G41.29-00.40.S.b0s0g0.00000_DM42.40
------> Sample time in microseconds: 65.4762
------> Observation time in seconds: 274.62705
------> Time stamp (MJD): 55672.400301627786
------> Number of samples/record: 0
------> Center freq in MHz: 1214.289551
------> Channel band in MHz: 0.33605957
------> Number of channels/record: 960
------> Nifs: 1
------> RA (J2000): 190804.6872
------> DEC (J2000): 71149.1882019
------> Galactic l: 0
------> Galactic b: 0
------> Name: G41.29-00.40.S
------> Lagformat: 0
------> Sum: 1
------> Level: 3
------> AZ at start: 0
------> ZA at start: 0
------> AST at start: 0
------> LST at start: 0
------> Project ID: --
------> Observers: --
------> File size (bytes): 0
------> Data size (bytes): 0
------> Number of samples: 4194304
------> Trial dispersion measure: 42.4 cm^-3 pc
------> Scale factor: 0.00758342
[20:24:40][5108][INFO ] Seed for random number generator is 1157054464.
[20:25:10][5108][INFO ] Derived global search parameters:
------> f_A probability = 0.08
------> single bin prob(P_noise > P_thr) = 1.32531e-008
------> thr1 = 18.139
------> thr2 = 21.241
------> thr4 = 26.2686
------> thr8 = 34.6478
------> thr16 = 48.9581
[20:25:10][5108][ERROR] Error during OpenCL kernel setup: PS_R3 (error: -55)
[20:25:10][5108][ERROR] Demodulation failed (error: 2019)!
20:25:10 (5108): called boinc_finish
</stderr_txt>
]]> | |
| ID: 112049 | | |
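The warnings in the log above show the app lowering a requested work group size of 256 to the device limit of 128. Below is a minimal Python sketch of that clamping step, plus a lookup for a few OpenCL status codes as defined in the Khronos `cl.h` header. It is an assumption that the `-55` printed by the app is a raw OpenCL status code, and the function names are made up for illustration; this is not the app's actual code.

```python
# A few OpenCL status codes as defined in the Khronos cl.h header.
OPENCL_ERRORS = {
    -5: "CL_OUT_OF_RESOURCES",
    -54: "CL_INVALID_WORK_GROUP_SIZE",
    -55: "CL_INVALID_WORK_ITEM_SIZE",
}


def clamp_work_group_size(requested: int, device_max: int) -> int:
    """Return the work group size to use, never exceeding the device limit."""
    if requested <= 0 or device_max <= 0:
        raise ValueError("sizes must be positive")
    return min(requested, device_max)


def describe_opencl_error(code: int) -> str:
    """Map a status code to its symbolic name, if it is in the table above."""
    return OPENCL_ERRORS.get(code, f"unknown OpenCL error {code}")


if __name__ == "__main__":
    print(clamp_work_group_size(256, 128))  # 128, as in the WARN lines above
    print(describe_opencl_error(-55))       # CL_INVALID_WORK_ITEM_SIZE
```

If the assumption holds, a size that is reduced for the work group but not for a dependent work item dimension would be one plausible way to end up with such an error on cards with a 128 limit.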
|
I gave it a new chance (some weeks ago my system crashed every 20 min). | |
| ID: 112053 | | |
|
Hi all | |
| ID: 112054 | | |
|
This sounds very good! | |
| ID: 112055 | | |
|
Good news! | |
| ID: 112056 | | |
|
So, problem solved, now it works again with v1.24. Here are screenshots of my machines crunching Albert. Is the CPU/GPU memory usage normal? They differ a lot. | |
| ID: 112057 | | |
|
looks good, thanks! | |
| ID: 112058 | | |
|
I have a 5450 but not yet the new app. SETI is right now on the GPU. There were some Ghosts wu lingering in my account so I allowed work to get them going. Sometime tomorrow maybe I will pickup new work here. | |
| ID: 112060 | | |
|
ATI 4850(512MB) no tasks. | |
| ID: 112061 | | |
ATI 4850(512MB) no tasks. Hi! Only OpenCL 1.1 capable cards are supported by this app, that's why the 4850 won't get jobs Cheers HB ____________ | |
| ID: 112062 | | |
|
I've run a few tasks with 1.24 and it looks fine. | |
| ID: 112063 | | |
|
I must add this: | |
| ID: 112064 | | |
|
Hi all, | |
| ID: 112065 | | |
|
I have my older task done but it is still running. On trying to save all Messages to Memory to dump them BM is hanging again. Will report via Alpha Email list. | |
| ID: 112066 | | |
|
I don't understand. | |
| ID: 112067 | | |
|
I don't know if this is a BM thing or albert app thing. | |
| ID: 112068 | | |
|
I am now running 1 albert task on my ati 5850 with 0.932cpu and 1 seti ap task on the same gpu | |
| ID: 112069 | | |
I cannot be sure that the cpu cores use different cores though... Your operating system's scheduler should take care of this unless you force specific applications to use specific cores. Basically the way it works is that applications/threads get 'time slices' from the OS scheduler, which is how it can run multiple applications side by side on a single core. Between time slices the scheduler might decide to continue to run a thread on a different core depending on how busy each core is - that's why you generally see even single-threaded applications using a bit of each core: because they spend about equal time running on each one. | |
| ID: 112070 | | |
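The remark above about forcing an application onto specific cores can be tried out directly. A minimal Python sketch, assuming a Linux host: `os.sched_getaffinity` and `os.sched_setaffinity` are only available on some platforms (not on Windows or macOS), so the helper returns `None` elsewhere. The helper name `pin_current_process` is made up for illustration.

```python
import os


def pin_current_process(cpu_index: int):
    """Restrict the calling process to a single core.

    Returns the new affinity set, or None on platforms that lack
    os.sched_setaffinity (for example Windows and macOS).
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    allowed = os.sched_getaffinity(0)  # 0 means "the calling process"
    if cpu_index not in allowed:
        raise ValueError(f"core {cpu_index} is not available to this process")
    os.sched_setaffinity(0, {cpu_index})
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    if hasattr(os, "sched_getaffinity"):
        first_core = min(os.sched_getaffinity(0))
        print("now restricted to:", pin_current_process(first_core))
    else:
        print("CPU affinity is not exposed by the os module on this platform")
```

Without such pinning the scheduler is free to move the thread between cores from one time slice to the next, which matches the behaviour described above.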
New day, new test: it's running!
RAM usage (system): 208 MB max., 84 MB at the moment
GPU: 43 percent usage with (4,596 CPUs + 1 ATI GPU)
GPU: 95 percent usage with (3,596 CPUs + 1 ATI GPU)
GPU temp: 50 degrees stock, 65 degrees with Albert@home
Estimated runtime for the Albert workunit: 24 hours, deadline in 14 days.
It runs slowly, and with noticeable lags at 95 percent GPU usage. CPU usage in BOINC is suboptimal with 3,596 CPUs. | |
| ID: 112071 | | |
|
These wu's show BRPCUDA32 v1.25 throwing errors: <core_client_version>7.0.25</core_client_version> <![CDATA[ <message> Cannot create a symbolic link in a registry key that already has subkeys or values. (0x3fc) - exit code 1020 (0x3fc) </message> <stderr_txt> Activated exception handling... [08:07:13][4260][INFO ] Starting data processing... [08:07:13][4260][ERROR] Couldn't initialize CUDA driver API (error: 100)! [08:07:13][4260][ERROR] Demodulation failed (error: 1020)! 08:07:13 (4260): called boinc_finish </stderr_txt> ]]> Also, the BRPSSE3 v1.22 client is throwing errors: http://albert.phys.uwm.edu/workunit.php?wuid=70871 http://albert.phys.uwm.edu/workunit.php?wuid=70837 (from the same host) <core_client_version>6.10.60</core_client_version> <![CDATA[ <message> too many exit(0)s </message> ]]> ____________ | |
| ID: 112075 | | |
|
So, finally caught a running task. HD5450, 1 GB memory, max work group size 128. | |
| ID: 112076 | | |
|
v1.24 | |
| ID: 112077 | | |
|
First WU awaiting validation. | |
| ID: 112079 | | |
|
Hi all! | |
| ID: 112082 | | |
|
Wow. I find that very strange as the 69xx series cards are double precision vs. the single precision of the NVIDIA and single precision AMD (54xx-57xx, 63xx-68xx, 73xx-76xx) cards. | |
| ID: 112083 | | |
|
The Einstein@Home app does not need (and does not use) any double precision arithmetic on the GPU, so this should not be a factor. | |
| ID: 112084 | | |
The Einstein@Home app does not need (and does not use) any double precision arithmetic on the GPU, so this should not be a factor. I am aware. The point I was trying to make, though, was that how the math is coded matters greatly and does impact precision of the final answer. Let's take for example pi^16 (exaggerated for show) with 3 different approximations for pi, listing the 1st through 16th powers:
pi = 3: 3, 9, 27, 81, 243, 729, 2187, 6561, 19683, 59049, 177147, 531441, 1594323, 4782969, 14348907, 43046721
pi = 3.1: 3.1, 9.61, 29.791, 92.3521, 286.29151, 887.503681, 2751.261411, 8528.910374, 26439.62216, 81962.8287, 254084.769, 787662.7838, 2441754.63, 7569439.352, 23465261.99, 72742312.17
pi = 3.141592654: 3.141592654, 9.869604401, 31.00627668, 97.40909103, 306.0196848, 961.3891936, 3020.293228, 9488.531016, 29809.09933, 93648.04748, 294204.018, 924269.1815, 2903677.271, 9122171.182, 28658145.97, 90032220.84
I did these in excel with the last set of calculations using the actual pi() function in excel (which obviously shows decimal truncations). So, **in general**, the more precision you start with, the better your final answer (depending on a host of other things I forget from my numerical computation class), but you pay for it with computation time. But I'm sure I'm not telling you guys anything new. Just out of curiosity, was the Einstein app ever run in double precision and compared to results of single precision calculations? I presume it was based on "does not need", but I'd be interested to know the difference.
All my above hot air aside, I could have sworn I remember reading somewhere about the accuracy of OpenCL results and a statement to the effect of "it seems AMD has ditched some precision in lieu of speed", however I thought that was rectified with new catalyst drivers. Maybe send a PM to Raistmer on the Seti@Home Beta boards. I'm more than positive he will know (I think he's the one who originally posted it). ____________ | |
| ID: 112085 | | |
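The spreadsheet experiment in the post above is easy to reproduce. The following Python sketch recomputes the three power sequences and compares the relative error of each approximation of pi with the relative error of its 16th power. Note that it illustrates how an inaccurate input gets amplified, not the difference between single and double precision arithmetic; the helper names are made up for illustration.

```python
import math


def power_sequence(base: float, max_exponent: int = 16) -> list:
    """Return base**1, base**2, ..., base**max_exponent."""
    return [base ** n for n in range(1, max_exponent + 1)]


def relative_error(approx: float, exact: float) -> float:
    """Return |approx - exact| / |exact|."""
    return abs(approx - exact) / abs(exact)


if __name__ == "__main__":
    for approx in (3.0, 3.1, 3.141592654):
        err_input = relative_error(approx, math.pi)
        err_power = relative_error(approx ** 16, math.pi ** 16)
        print(f"pi ~ {approx}: input error {err_input:.3e}, "
              f"error after 16th power {err_power:.3e}")
```

For small input errors the error after the 16th power is roughly 16 times the input error, since (1 + e)^16 is approximately 1 + 16e.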
|
Hi all! | |
| ID: 112086 | | |
Hi all! That sounds great. You will cancel the unsent WUs? Then we could just keep our machines polling the servers and will get new work and apps as soon as you have them. ____________ Christoph | |
| ID: 112087 | | |
|
My 7970 is producing nothing but validation errors: | |
| ID: 112088 | | |
|
Hmmm... GPU temperature is ok?? | |
| ID: 112089 | | |
If memory serves me right, the BRP (then called ABP-) app started with code that indeed used double precision for some parts of its computations, and ran only on CPUs. When the idea came up to implement a GPU version, the code was changed to use single precision in those parts (almost all of the code) that were supposed to go on the GPU. At that point the scientists made sure that the ability to find pulsars wasn't compromised by this change. Note that the task of the app is not to determine the characteristics of a pulsar detection to extremely high precision (this is done in post-processing pulsar candidates and using re-observations), but to find candidate signals that stick out of the noise sufficiently clear to follow up on them. While this statement is simplifying things quite a bit, it gives you an intuitive idea why single precision is ok for this search. Cheers HB ____________ | |
| ID: 112090 | | |
|
Ah I understand. You need a way to cut through all the junk and the volunteers are the garbage filter; which means good enough detection is ok. Understood. | |
| ID: 112091 | | |
|
I don't want to get too far off topic here, but it happens there is a paper specifically on the validation strategies for the type of simulation that is done at Milkyway@Home, written by the MW scientists: http://www.cs.rpi.edu/~szymansk/papers/dais10.pdf. Just to cure your nervousness :-) | |
| ID: 112092 | | |
Hmmm... GPU temperature is ok?? It is OC slightly. I will move back to stock and see if that makes a difference. ____________ | |
| ID: 112093 | | |
I don't want to get too far off topic here, but it happens there is a paper specifically on the validation strategies for the type of simulation that is done at Milkyway@Home, written by the MW scientists: http://www.cs.rpi.edu/~szymansk/papers/dais10.pdf. Just to cure your nervousness :-) Excellent. I will read it in chunks to break up the day as I need breaks from my work. Thanks. Edit: Ok I lied I read it all just now. So it seems that bad results aren't quite so bad, but still negatively affect things. And, ironically enough, they do have trusted/untrusted host status for users. I will try to dig more on this because I see I have a lot of inconclusive results for Einstein now. For what it is worth, I know there was an issue with NVIDIA cards silently overflowing and generating bad numbers on the Seti Beta app. However, that still doesn't excuse bad numbers from AMD 6xxx cards if that's the issue. ____________ | |
| ID: 112094 | | |
|
Looks like reducing the OC solved it. I also upgraded from 12.3 to 12.4. So I can't be 100% sure. But whatever the case, It's working again. | |
| ID: 112095 | | |
Looks like reducing the OC solved it. I also upgraded from 12.3 to 12.4. So I can't be 100% sure. But whatever the case, It's working again. The upper limit is reached when the Video RAM is exhausted. So per GB of VRAM you should be able to execute at least 2, possibly 3 instances. It's hard to tell where the "sweet spot" is to maximize the overall output, so some experimentation with the number of "reserved" CPU cores (cores not allocated to CPU apps) and # of GPU jobs in parallel is the best way to find out. CU HB ____________ | |
| ID: 112096 | | |
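The VRAM rule of thumb in the reply above can be written as a one-line estimate. A minimal Python sketch; the per-task memory figures are hypothetical values picked only to reproduce the "at least 2, possibly 3 per GB" range, since the thread does not state the app's actual footprint.

```python
def max_parallel_gpu_tasks(vram_mb: int, per_task_mb: int) -> int:
    """Upper bound on simultaneous GPU tasks before video RAM is exhausted."""
    if per_task_mb <= 0:
        raise ValueError("per_task_mb must be positive")
    return vram_mb // per_task_mb


if __name__ == "__main__":
    # Hypothetical per-task footprints on a 1 GB card.
    print(max_parallel_gpu_tasks(1024, 400))  # 2
    print(max_parallel_gpu_tasks(1024, 340))  # 3
```

This only gives the ceiling; as the reply says, the number of tasks that maximizes overall output has to be found by experiment.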
|
A little update: | |
| ID: 112098 | | |
|
Hi! | |
| ID: 112100 | | |
|
I'm glad you got to the bottom of things. I guess that means that the next card I add will be a 79xx card instead of another 69xx. I can't imagine why AMD thought worse accuracy was acceptable considering their whole push for compute oriented video cards and APUs. Then again, maybe that's why things were changed with the 7xxx cards (assuming you had no errors with those)? | |
| ID: 112102 | | |
I'm glad you got to the bottom of things. I guess that means that the next card I add will be a 79xx card instead of another 69xx. I can't imagine why AMD thought worse accuracy was acceptable considering their whole push for compute oriented video cards and APUs. Then again, maybe that's why things were changed with the 7xxx cards (assuming you had no errors with those)? It's actually not something you can blame AMD for (and they were quite helpful in diagnosing this issue). The function in question is documented to have implementation dependent accuracy. It was probably not a good idea for the author of the 3rd party FFT lib to make use of this function, but that's just my personal opinion. We will get rid of this part of code to make sure this doesn't hit us again with future cards. Cheers HB ____________ | |
| ID: 112103 | | |
|
When you're able to try it on both HD 69xx cards and similar HD 79xx cards, could you give us the relative speeds of the two? | |
| ID: 112104 | | |
* Known issue: no OpenCL support for Mac OS X for the time being (we're still looking into a potential Apple bug) I could swear that I saw a message yesterday, talking about how this was fixed (hopefully). But I can't find it now, and I cannot get any tasks for my mac. Was I hallucinating? Edit: It was over at Collatz. D'oh! ____________ | |
| ID: 112105 | | |
* Known issue: no OpenCL support for Mac OS X for the time being (we're still looking into a potential Apple bug) Maybe you had sort of a vision, because I've just released, here on Albert, a version that indeed might work on Macs for AMD/OpenCL under OSX (Lion). :-) Cheers HBE ____________ | |
| ID: 112106 | | |