
Posts by nenym

1) Message boards : News : Project server code update (Message 113119)
Posted 20 Jun 2014 by Profile nenym
Post:
Additional notes.

Cherry-picking.
It is very difficult to prevent it selectively, because it can be done not only by aborting tasks but also by killing the process via the task manager. And an abort may be real cherry-picking, or:
- missed deadlines (could be sorted out),
- unexpected reasons (hardware fault of the host...),
- the end of a challenge (the Pentathlon, the PG challenge series - PG explicitly asks for unneeded tasks to be aborted),
- overestimated fpops and the consequent avoidance of panic mode...
Lowering the daily quota (according to the current formula N = N - number_of_errored_tasks) is not enough to prevent it, because it can simply be bypassed by finishing some work from time to time.
From my point of view it is mission impossible.

Regulation process.
Reaching the asymptote can be accelerated by putting bounds on granted credit (independently of rsc_fpops_est) on the validator side for a sort/batch of WUs, if that makes sense. Yes, it is additional work for developers and administrators, though it could be partially automated using a "reference" machine. On the other hand, it may theoretically be the wrong way, because it introduces two regulation parameters for the same quantity.
I suspect you did not apply credit bounds precisely so that design or implementation flaws in the raw algorithm stay visible.
2) Message boards : News : Project server code update (Message 113116)
Posted 20 Jun 2014 by Profile nenym
Post:
Thanks jason_gee for the explanation.
I agree with you: a fixed credit scheme (and a "measured work done" scheme - Rosetta, AQUA, SETI MB years ago) means a lot of complications for developers; on the other hand it is popular regardless of the target RAC.

Notes (theory):
Maybe the D part of the regulation (feedback) is too strong, or the R part is too weak, because I see the same waves as on an oscilloscope trace of a badly designed regulator that tends to oscillate (at some frequencies) and misses the asymptote. In that case the I part has no chance to win. A good test of stability is a repeated Dirac impulse; theory e.g. here.

I know that CreditNew works very well for some projects (WEP-M+2, SETI AP ...), but those have units with similar run times. On the other hand, for projects with very different run times it behaves like an RNG (LHC tasks vary 1:3 on my i5-4570S).
Run time (s) / CPU time (s) / Credit
20,679.88 / 20,046.41 / 291.51
346.51 / 336.59 / 1.84

Notes (how to game/cheat CreditNew):
a) in a stable state (CPU time ~ run time), start a high CPU load with a third application (e.g. GPU crunching, AutoCAD 3D rendering...); the credit per CPU time rises,
b) do a "notepad overclock" = a false benchmark. It works for about 10 tasks, then the credit normalises. If the tasks are long enough, the cheated credit is noticeable. After those 10 tasks you have to crunch another project and come back after roughly 100 tasks' worth of time; in that case the system "forgets".

Cherry-picking: where do you see the root cause of it, on the project's side or on the cruncher's side?

The current system penalises optimisation: that is what I really hate about CreditNew (and the benchmark-based CreditOld too). I lived in a "socialist" country for 34 years, and believe me, we do not all have the same stomachs. If I work hard and do a lot (optimised AVX/FMA3 apps), I eat more and must visit the gym, the chiropractor, etc. (pay more for electricity, CPU, motherboard, PSU and cooling), and I expect a higher salary (more credit). That is what David Lenin Anderson misunderstood.
3) Message boards : News : Project server code update (Message 113114)
Posted 19 Jun 2014 by Profile nenym
Post:
Notes from an ordinary cruncher.

Some observations I have not seen mentioned here (maybe my observations are poor, or they are trivial for you experts):
- the run time of Intel_GPU apps depends on the type of CPU app being crunched on the CPU; AVX/FMA3 applications in particular have a strong effect (PG LLR AVX/AVX2/FMA3, Asteroids AVX on Haswell, Beal and MindModeling SSE2 too),
- the CPU time of CPU apps depends on the type of Intel_GPU app running concurrently; e.g. Collatz mini has almost no effect, while Einstein BRP and SETI AP apps can double the CPU time,
- the run time of CUDA apps depends on the type of Intel_GPU app running concurrently (not sure about CUDA OpenCL and ATI OpenCL) - the GPU load of the CUDA app stays the same,
- the run time of some types of GPU apps can be strongly shortened by manipulating the CPU process priority (if the priority of the BRP process is set to Realtime, the run time on the Intel_GPU is halved; a configuration sketch follows below).
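Only a sketch, since client versions differ: newer BOINC clients document cc_config.xml options named process_priority and process_priority_special (the latter applying to GPU/coprocessor apps) that raise task priority without touching the Task Manager. Whether these options exist in a given client, and whether the highest value really corresponds to Windows "Realtime", is an assumption to verify against the client documentation.

<cc_config>
    <options>
        <!-- assumed option names; values roughly 0 = lowest ... 4 = highest -->
        <process_priority>2</process_priority>
        <process_priority_special>4</process_priority_special>  <!-- priority for GPU apps -->
    </options>
</cc_config>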

A bit OT, but... it's my point of view:
What I can see for the time being:
- hard work and analysis,
- David's RNG looks like a fixed credit scheme compared to the credit granted here for GPU apps.
No offense, but are you sure you have a chance to tame a system as chaotic as the BOINC space? It is a really big undertaking.
The simplest way I see is a fixed credit scheme for the tasks of an application whose length varies by about ±15 % on a "standard" machine. Your work is hard and great, but what is the goal? If it is a "fair" credit scheme for the tasks of an application using different app_ver, platform and plan_class (SSEx, AVXx, FMA3, Intel GPU, CUDA, OpenCL) with a big spread of lengths... what is the fair credit? The same credit for the same reference WU regardless of the crunching machine (close to a fixed credit scheme), or credit dependent on a "benchmark" (i.e. varying credit for the same reference WU), which is nonsense in the world of AVX/FMA3 and GPU hosts? What I see is an effort to reach the benchmark asymptote.
Despite my point of view, my machines stay here and help to find the way.

It is very interesting for me to follow your work and analysis, as my job deals with chaotic systems too (to be clear: partially predictable by a Poisson/binomial distribution, i.e. "memoryless").
4) Message boards : News : Project server code update (Message 112925)
Posted 12 Jun 2014 by Profile nenym
Post:
I have fixed the fpops Intel_GPU issue using an app_info.xml containing the tag
<flops>14479075542.794144</flops>
for the BRP4, BRP4G and BRP5 Intel_GPU applications. It seems to work on both the HD4000 and the HD4600. Is this the correct way?
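For illustration, a minimal sketch of where such a <flops> element sits in an anonymous-platform app_info.xml; the app name, version number, plan class and executable file name below are placeholders, not the real Albert@Home values:

<app_info>
    <app>
        <name>einsteinbinary_BRP4</name>
    </app>
    <file_info>
        <name>BRP4_opencl-intel_gpu.exe</name>  <!-- placeholder executable name -->
        <executable/>
    </file_info>
    <app_version>
        <app_name>einsteinbinary_BRP4</app_name>
        <version_num>134</version_num>           <!-- placeholder version -->
        <plan_class>opencl-intel_gpu</plan_class>
        <avg_ncpus>0.2</avg_ncpus>
        <coproc>
            <type>intel_gpu</type>
            <count>1</count>
        </coproc>
        <flops>14479075542.794144</flops>        <!-- fixed fpops estimate -->
        <file_ref>
            <file_name>BRP4_opencl-intel_gpu.exe</file_name>
            <main_program/>
        </file_ref>
    </app_version>
</app_info>

The client uses the <flops> value for its duration estimates, so a realistic value should keep the estimated remaining time (and panic mode) under control.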
5) Message boards : News : Project server code update (Message 112867)
Posted 4 Jun 2014 by Profile nenym
Post:
A BRP4G CUDA task is running OK on the 9600GT / XP 32-bit, driver 335.28.
6) Message boards : News : Project server code update (Message 112859)
Posted 4 Jun 2014 by Profile nenym
Post:
Seems to be OK.
04/06/2014 12:37:42 | Albert@Home | Sending scheduler request: Requested by user.
04/06/2014 12:37:42 | Albert@Home | Requesting new tasks for CPU and NVIDIA GPU
04/06/2014 12:37:45 | Albert@Home | Scheduler request completed: got 0 new tasks
04/06/2014 12:37:45 | Albert@Home | No tasks sent
04/06/2014 12:37:45 | Albert@Home | Tasks for CPU are available, but your preferences are set to not accept them

7) Message boards : News : Project server code update (Message 112856)
Posted 4 Jun 2014 by Profile nenym
Post:
OK, if Albert is not only for testing applications but also for the credit system (like SETI Beta and RALPH), I have no problem helping to generate a baseline. It is important to know that. In that case I have no problem with low and random credit.
8) Message boards : News : Project server code update (Message 112852)
Posted 4 Jun 2014 by Profile nenym
Post:
Does the problem persist?

We are testing the behavior of "CreditNew" on this project and will try to fix it if necessary. Be prepared for the unexpected!

BM
04/06/2014 10:53:32 | Albert@Home | Sending scheduler request: Requested by user.
04/06/2014 10:53:32 | Albert@Home | Requesting new tasks for CPU and NVIDIA GPU
04/06/2014 10:53:36 | Albert@Home | Scheduler request failed: HTTP internal server error
The machine has been restarted.

Note: local time is UTC+2 (Prague).

If you are going to use Dave's random number generator, I will leave the project. Some CPU projects have tamed it into a number generator with an expected and acceptable range, but no GPU project has succeeded in that. Good luck.

EDIT: Before leaving I'll try my favorite joke - using app_info to get a BRP4 CPU task crunched by the intel_gpu. I expect a credit of 0.5 instead of 62.5. It can be seen as WU 590960.
9) Message boards : News : Project server code update (Message 112847)
Posted 2 Jun 2014 by Profile nenym
Post:
Scheduler request failed: HTTP internal server error

is what I get
The same here
10) Message boards : Problems and Bug Reports : Upload/Download server (Message 111447)
Posted 30 Nov 2011 by Profile nenym
Post:
More notes:
The download of one Albert task failed. I updated the project:
"Albert server" wrote:
30/11/2011 22:23:22 Albert@Home Reporting 1 completed tasks, not requesting new tasks
30/11/2011 22:23:26 Albert@Home Scheduler request completed
30/11/2011 22:23:26 Albert@Home Message from server: This project doesn't support computers of type x86_64-pc-linux-gnu
and the failed task was not reported.
The main project, Einstein, has no problem reporting finished or failed tasks on this host when <no_alt_platform> is set to 1.
11) Message boards : Problems and Bug Reports : Upload/Download server (Message 111434)
Posted 29 Nov 2011 by Profile nenym
Post:
It is not about the server resetting <no_alt_platform>.
Step by step:
- to receive Albert (and Optima) tasks, I set the feature to 0,
- while crunching the Albert (and Optima) tasks, I set the feature to 1 so as not to receive 32-bit tasks from Correlizer, WEP-M+2 etc.,
- when an Optima task is finished and uploaded, it is reported (with the standard message "app xxx is not available for your type of computer"), but it really is reported and credited,
- when an Albert task is finished and uploaded, the Albert server only sends the standard message after an update, and the task is not reported,
- to report the Albert task, I have to set the feature back to 0.
I think it is not a client issue. I think the server does not accept the report of a finished 32-bit task when the host excludes tasks of 32-bit applications.
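For clarity, the file being toggled is cc_config.xml in the BOINC data directory; a minimal sketch of the relevant part (all other options omitted), with the client told to reread its config files after each change:

<cc_config>
    <options>
        <no_alt_platform>1</no_alt_platform>  <!-- 0 = accept/report alternate-platform (32-bit) work, 1 = refuse it -->
    </options>
</cc_config>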
12) Message boards : Problems and Bug Reports : Upload/Download server (Message 111414)
Posted 25 Nov 2011 by Profile nenym
Post:
A minor server issue occurred. I am using the cc_config <no_alt_platform>1</no_alt_platform> feature on an Ubuntu 10.04 64-bit host (X6 1090T + GTX260). To receive an Albert CUDA task I have to switch to <no_alt_platform>0</no_alt_platform>; that is no problem. While the downloaded CUDA tasks are being crunched, the feature is switched back to <no_alt_platform>1</no_alt_platform>. The issue occurs when a finished task is uploaded and the project is updated. The server sends the standard message (app xxx is not available for your type of computer), but the finished and uploaded tasks are also not reported. To report these tasks I have to switch the feature to <no_alt_platform>0</no_alt_platform>.
Is this required?
13) Message boards : Problems and Bug Reports : 1015 (0x3f7) (Message 111410)
Posted 24 Nov 2011 by Profile nenym
Post:
Hi Ageless.
I am not a newbie at testing (see the DD forum, where we have both reported a lot). I reported the problem after
- a clean install of the GPU driver,
- updating BOINC to 6.13.12.
That task was crunched by the new CUDA app, Radio Pulsar Search v1.07 (BRP3cuda32), sent by the server.
I do not report errors when I think my machine is not suitable for testing, e.g. the host with an ATI HD 4770 + core 6.12.34 (PG PPSE AtiOpenCL works fine there).
I do use app_info with the GTX260, but only for crunching the main project; it makes no sense to me to use it when testing new apps.
14) Message boards : Problems and Bug Reports : 1015 (0x3f7) (Message 111261)
Posted 15 Nov 2011 by Profile nenym
Post:
All my CUDA tasks errored out. Win XP 32bit, 9600GT, driver 285.58, core 6.13.12.
<core_client_version>6.13.12</core_client_version>
<![CDATA[
<message>
The registry is corrupted. The structure of one of the files containing registry data is corrupted, or the system's memory image of the file is corrupted, or the file could not be recovered because the alternate copy or log was absent or corrupted. (0x3f7) - exit code 1015 (0x3f7)
</message><stderr_txt>
Activated exception handling...
[23:52:52][4044][INFO ] Starting data processing...
[23:52:52][4044][INFO ] CUDA global memory status (initial GPU state, including context):
------> Used in total: 51 MB (461 MB free / 512 MB total) -> Used by this application (assuming a single GPU task): 0 MB
[23:52:52][4044][INFO ] Using CUDA device #0 "GeForce 9600 GT" (64 CUDA cores / 349.44 GFLOPS)
[23:52:52][4044][INFO ] Version of installed CUDA driver: 4010
[23:52:52][4044][INFO ] Version of CUDA driver API used: 3020
[23:52:53][4044][INFO ] Checkpoint file unavailable: status.cpt (No such file or directory).
------> Starting from scratch...
[23:52:53][4044][INFO ] Header contents:
------> Original WAPP file: ./p2030.20100913.G48.73+01.03.S.b5s0g0.00000_DM545.60
------> Sample time in microseconds: 65.4762
------> Observation time in seconds: 274.62705
------> Time stamp (MJD): 55453.031578180256
------> Number of samples/record: 0
------> Center freq in MHz: 1214.289551
------> Channel band in MHz: 0.33605957
------> Number of channels/record: 960
------> Nifs: 1
------> RA (J2000): 191721.825399
------> DEC (J2000): 142523.136101
------> Galactic l: 0
------> Galactic b: 0
------> Name: G48.73+01.03.S
------> Lagformat: 0
------> Sum: 1
------> Level: 3
------> AZ at start: 0
------> ZA at start: 0
------> AST at start: 0
------> LST at start: 0
------> Project ID: --
------> Observers: --
------> File size (bytes): 0
------> Data size (bytes): 0
------> Number of samples: 4194304
------> Trial dispersion measure: 545.6 cm^-3 pc
------> Scale factor: 0.118517
[23:52:55][4044][INFO ] Seed for random number generator is 1149629056.
[23:52:56][4044][INFO ] Derived global search parameters:
------> f_A probability = 0.08
------> single bin prob(P_noise > P_thr) = 9.93986e-009
------> thr1 = 18.4267
------> thr2 = 21.5421
------> thr4 = 26.5915
------> thr8 = 35.0049
------> thr16 = 49.3672
[23:52:56][4044][INFO ] CUDA global memory status (GPU setup complete):
------> Used in total: 312 MB (200 MB free / 512 MB total) -> Used by this application (assuming a single GPU task): 261 MB
[23:52:56][4044][ERROR] Error launching CUDA TSP kernel (error: 1)
[23:52:56][4044][ERROR] Demodulation failed (error: 1015)!
23:52:56 (4044): called boinc_finish

</stderr_txt>





