BRP application v 1.33 feedback thread |
Author | Message |
---|---|
archae86 Send message Joined: 6 Dec 05 Posts: 414 Credit: 67,924 RAC: 0 |
I assume 1.33 is the version that uses file compression to greatly reduce download traffic. I applaud the attempt to obtain this improvement. While Comcast has stopped posting my bandwidth consumption where I can see it, when I could last look, just two GTX460 hosts running BRP were using up about half my allowed monthly traffic. So I'm happy to report that both of my CUDA Windows 7 hosts have returned a stock of v1.33 work. Already 4307 has 2/5 validated, and the other, 4306, has 5/13 validated. Execution timings look in line with recent Einstein 1.32 work on the same hosts. This is neither a problem nor a bug report, but this board seemed the most suitable. |
Jeroen Send message Joined: 25 Nov 05 Posts: 12 Credit: 638,256 RAC: 0 |
I have 1.33 running on one host so far. 9 tasks have completed: 3 have validated and the other 6 are pending validation. The file size reduction is very significant, from 2 MB to 475 KB per file. Thank you. |
Bikeman (Heinz-Bernd Eggenstein) Volunteer moderator Project administrator Project developer Send message Joined: 28 Aug 06 Posts: 1483 Credit: 1,864,017 RAC: 0 |
Hi! Yup, 1.33 is a new version which is testing compression of the input files, plus it uses a newer version of the BOINC API code, which is recommended for the next generation of BOINC clients. Unfortunately this new BOINC API version introduced a bug that broke all but the OSX versions of the OpenCL BRP app versions :-(. There is also a problem with the Linux 32 bit CPU app version (doesn't link zlib statically). We plan to publish a new, corrected suite of BRP4 apps on Albert for testing next week. Cheers HB |
tullio Send message Joined: 22 Jan 05 Posts: 796 Credit: 137,342 RAC: 0 |
But it runs OK on my SuSE Linux 12.1 32-bit. Tullio |
Alex Send message Joined: 1 Mar 05 Posts: 88 Credit: 398,734 RAC: 0 |
27 validated, 3 pending (win cuda wu's). Looks good so far, returning to Einstein. |
skgiven Send message Joined: 14 Oct 12 Posts: 9 Credit: 4,734,887 RAC: 0 |
I'm getting <0.5% failure rate: 934 Valid, 253 In Progress, 52 Pending, 3 Invalid and 1 Error: The error task was from a couple of days ago on this machine: (I was actually using 7.0.42 at the time, now using 7.0.44) Stderr output <core_client_version>7.0.44</core_client_version> <![CDATA[ <stderr_txt> </stderr_txt> <message> upload failure: <file_xfer_error> <file_name>p2030.20120218.G178.84-02.08.C.b1s0g0.00000_2952_3_0</file_name> <error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>p2030.20120218.G178.84-02.08.C.b1s0g0.00000_2952_3_1</file_name> <error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>p2030.20120218.G178.84-02.08.C.b1s0g0.00000_2952_3_2</file_name> <error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>p2030.20120218.G178.84-02.08.C.b1s0g0.00000_2952_3_3</file_name> <error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>p2030.20120218.G178.84-02.08.C.b1s0g0.00000_2952_3_4</file_name> <error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>p2030.20120218.G178.84-02.08.C.b1s0g0.00000_2952_3_5</file_name> <error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>p2030.20120218.G178.84-02.08.C.b1s0g0.00000_2952_3_6</file_name> <error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>p2030.20120218.G178.84-02.08.C.b1s0g0.00000_2952_3_7</file_name> <error_code>-161</error_code> </file_xfer_error> </message> ]]> 3 Invalid tasks from the 13th and 14th (same host): http://albert.phys.uwm.edu/result.php?resultid=470974 (standard error shown below) http://albert.phys.uwm.edu/result.php?resultid=470633 http://albert.phys.uwm.edu/result.php?resultid=468200 Name p2030.20120219.G177.98-03.39.S.b0s0g0.00000_48_1 Workunit 184410 Created 13 Jan 2013 | 9:55:17 UTC Sent 14 Jan 2013 | 2:05:30 UTC Received 14 Jan 2013 | 21:17:43 UTC Server state Over Outcome Validate error (58:00111010) Client state Done Exit 
status 0 (0x0) Computer ID 5305 Report deadline 28 Jan 2013 | 2:05:30 UTC Run time 1,112.41 CPU time 197.95 Validate state Invalid Credit 0.00 Application version Binary Radio Pulsar Search v1.33 (BRP4cuda32nv301) Stderr output <core_client_version>7.0.42</core_client_version> <![CDATA[ <stderr_txt> Activated exception handling... [20:58:27][308800][INFO ] Starting data processing... [20:58:28][308800][INFO ] CUDA global memory status (initial GPU state, including context): ------> Used in total: 90 MB (1959 MB free / 2049 MB total) -> Used by this application (assuming a single GPU task): 0 MB [20:58:28][308800][INFO ] Using CUDA device #1 "GeForce GTX 660 Ti" (0 CUDA cores / 0.00 GFLOPS) [20:58:28][308800][INFO ] Version of installed CUDA driver: 5000 [20:58:28][308800][INFO ] Version of CUDA driver API used: 3020 [20:58:28][308800][INFO ] Checkpoint file unavailable: status.cpt (No such file or directory). ------> Starting from scratch... [20:58:28][308800][INFO ] Header contents: ------> Original WAPP file: ./p2030.20120219.G177.98-03.39.S.b0s0g0.00000_DM4.80 ------> Sample time in microseconds: 65.4762 ------> Observation time in seconds: 274.62705 ------> Time stamp (MJD): 55976.964674255258 ------> Number of samples/record: 0 ------> Center freq in MHz: 1214.289551 ------> Channel band in MHz: 0.33605957 ------> Number of channels/record: 960 ------> Nifs: 1 ------> RA (J2000): 52736.3790016 ------> DEC (J2000): 284603.9856 ------> Galactic l: 0 ------> Galactic b: 0 ------> Name: G177.98-03.39.S ------> Lagformat: 0 ------> Sum: 1 ------> Level: 3 ------> AZ at start: 0 ------> ZA at start: 0 ------> AST at start: 0 ------> LST at start: 0 ------> Project ID: -- ------> Observers: -- ------> File size (bytes): 0 ------> Data size (bytes): 0 ------> Number of samples: 4194304 ------> Trial dispersion measure: 4.8 cm^-3 pc ------> Scale factor: 0.00102345 [20:58:29][308800][INFO ] Seed for random number generator is 1173636489. 
[20:58:29][308800][INFO ] Derived global search parameters: ------> f_A probability = 0.08 ------> single bin prob(P_noise > P_thr) = 1.32531e-008 ------> thr1 = 18.139 ------> thr2 = 21.241 ------> thr4 = 26.2686 ------> thr8 = 34.6478 ------> thr16 = 48.9581 [20:58:29][308800][INFO ] CUDA global memory status (GPU setup complete): ------> Used in total: 294 MB (1755 MB free / 2049 MB total) -> Used by this application (assuming a single GPU task): 204 MB [21:00:48][308800][INFO ] Statistics: count dirty SumSpec pages 12644 (not checkpointed), Page Size 1024, fundamental_idx_hi-window_2: 329052 [21:00:48][308800][INFO ] Data processing finished successfully! [21:00:48][308800][INFO ] Starting data processing... [21:00:48][308800][INFO ] CUDA global memory status (initial GPU state, including context): ------> Used in total: 90 MB (1959 MB free / 2049 MB total) -> Used by this application (assuming a single GPU task): 0 MB [21:00:48][308800][INFO ] Using CUDA device #1 "GeForce GTX 660 Ti" (0 CUDA cores / 0.00 GFLOPS) [21:00:48][308800][INFO ] Version of installed CUDA driver: 5000 [21:00:48][308800][INFO ] Version of CUDA driver API used: 3020 [21:00:48][308800][INFO ] Checkpoint file unavailable: status.cpt (No such file or directory). ------> Starting from scratch... 
[21:00:48][308800][INFO ] Header contents: ------> Original WAPP file: ./p2030.20120219.G177.98-03.39.S.b0s0g0.00000_DM4.90 ------> Sample time in microseconds: 65.4762 ------> Observation time in seconds: 274.62705 ------> Time stamp (MJD): 55976.96467425322 ------> Number of samples/record: 0 ------> Center freq in MHz: 1214.289551 ------> Channel band in MHz: 0.33605957 ------> Number of channels/record: 960 ------> Nifs: 1 ------> RA (J2000): 52736.3790016 ------> DEC (J2000): 284603.9856 ------> Galactic l: 0 ------> Galactic b: 0 ------> Name: G177.98-03.39.S ------> Lagformat: 0 ------> Sum: 1 ------> Level: 3 ------> AZ at start: 0 ------> ZA at start: 0 ------> AST at start: 0 ------> LST at start: 0 ------> Project ID: -- ------> Observers: -- ------> File size (bytes): 0 ------> Data size (bytes): 0 ------> Number of samples: 4194304 ------> Trial dispersion measure: 4.9 cm^-3 pc ------> Scale factor: 0.00102345 [21:00:49][308800][INFO ] Seed for random number generator is 1171635415. [21:00:50][308800][INFO ] Derived global search parameters: ------> f_A probability = 0.08 ------> single bin prob(P_noise > P_thr) = 1.32531e-008 ------> thr1 = 18.139 ------> thr2 = 21.241 ------> thr4 = 26.2686 ------> thr8 = 34.6478 ------> thr16 = 48.9581 [21:00:50][308800][INFO ] CUDA global memory status (GPU setup complete): ------> Used in total: 294 MB (1755 MB free / 2049 MB total) -> Used by this application (assuming a single GPU task): 204 MB [21:03:07][308800][INFO ] Statistics: count dirty SumSpec pages 14138 (not checkpointed), Page Size 1024, fundamental_idx_hi-window_2: 329052 [21:03:07][308800][INFO ] Data processing finished successfully! [21:03:07][308800][INFO ] Starting data processing... 
[21:03:07][308800][INFO ] CUDA global memory status (initial GPU state, including context): ------> Used in total: 90 MB (1959 MB free / 2049 MB total) -> Used by this application (assuming a single GPU task): 0 MB [21:03:07][308800][INFO ] Using CUDA device #1 "GeForce GTX 660 Ti" (0 CUDA cores / 0.00 GFLOPS) [21:03:07][308800][INFO ] Version of installed CUDA driver: 5000 [21:03:07][308800][INFO ] Version of CUDA driver API used: 3020 [21:03:07][308800][INFO ] Checkpoint file unavailable: status.cpt (No such file or directory). ------> Starting from scratch... [21:03:07][308800][INFO ] Header contents: ------> Original WAPP file: ./p2030.20120219.G177.98-03.39.S.b0s0g0.00000_DM5.00 ------> Sample time in microseconds: 65.4762 ------> Observation time in seconds: 274.62705 ------> Time stamp (MJD): 55976.96467425119 ------> Number of samples/record: 0 ------> Center freq in MHz: 1214.289551 ------> Channel band in MHz: 0.33605957 ------> Number of channels/record: 960 ------> Nifs: 1 ------> RA (J2000): 52736.3790016 ------> DEC (J2000): 284603.9856 ------> Galactic l: 0 ------> Galactic b: 0 ------> Name: G177.98-03.39.S ------> Lagformat: 0 ------> Sum: 1 ------> Level: 3 ------> AZ at start: 0 ------> ZA at start: 0 ------> AST at start: 0 ------> LST at start: 0 ------> Project ID: -- ------> Observers: -- ------> File size (bytes): 0 ------> Data size (bytes): 0 ------> Number of samples: 4194304 ------> Trial dispersion measure: 5 cm^-3 pc ------> Scale factor: 0.00102345 [21:03:09][308800][INFO ] Seed for random number generator is 1171635415. 
[21:03:09][308800][INFO ] Derived global search parameters: ------> f_A probability = 0.08 ------> single bin prob(P_noise > P_thr) = 1.32531e-008 ------> thr1 = 18.139 ------> thr2 = 21.241 ------> thr4 = 26.2686 ------> thr8 = 34.6478 ------> thr16 = 48.9581 [21:03:09][308800][INFO ] CUDA global memory status (GPU setup complete): ------> Used in total: 294 MB (1755 MB free / 2049 MB total) -> Used by this application (assuming a single GPU task): 204 MB [21:05:26][308800][INFO ] Statistics: count dirty SumSpec pages 13713 (not checkpointed), Page Size 1024, fundamental_idx_hi-window_2: 329052 [21:05:26][308800][INFO ] Data processing finished successfully! [21:05:26][308800][INFO ] Starting data processing... [21:05:26][308800][INFO ] CUDA global memory status (initial GPU state, including context): ------> Used in total: 90 MB (1959 MB free / 2049 MB total) -> Used by this application (assuming a single GPU task): 0 MB [21:05:26][308800][INFO ] Using CUDA device #1 "GeForce GTX 660 Ti" (0 CUDA cores / 0.00 GFLOPS) [21:05:26][308800][INFO ] Version of installed CUDA driver: 5000 [21:05:26][308800][INFO ] Version of CUDA driver API used: 3020 [21:05:26][308800][INFO ] Checkpoint file unavailable: status.cpt (No such file or directory). ------> Starting from scratch... 
[21:05:26][308800][INFO ] Header contents: ------> Original WAPP file: ./p2030.20120219.G177.98-03.39.S.b0s0g0.00000_DM5.10 ------> Sample time in microseconds: 65.4762 ------> Observation time in seconds: 274.62705 ------> Time stamp (MJD): 55976.964674249153 ------> Number of samples/record: 0 ------> Center freq in MHz: 1214.289551 ------> Channel band in MHz: 0.33605957 ------> Number of channels/record: 960 ------> Nifs: 1 ------> RA (J2000): 52736.3790016 ------> DEC (J2000): 284603.9856 ------> Galactic l: 0 ------> Galactic b: 0 ------> Name: G177.98-03.39.S ------> Lagformat: 0 ------> Sum: 1 ------> Level: 3 ------> AZ at start: 0 ------> ZA at start: 0 ------> AST at start: 0 ------> LST at start: 0 ------> Project ID: -- ------> Observers: -- ------> File size (bytes): 0 ------> Data size (bytes): 0 ------> Number of samples: 4194304 ------> Trial dispersion measure: 5.1 cm^-3 pc ------> Scale factor: 0.00102345 [21:05:28][308800][INFO ] Seed for random number generator is 1173636489. [21:05:28][308800][INFO ] Derived global search parameters: ------> f_A probability = 0.08 ------> single bin prob(P_noise > P_thr) = 1.32531e-008 ------> thr1 = 18.139 ------> thr2 = 21.241 ------> thr4 = 26.2686 ------> thr8 = 34.6478 ------> thr16 = 48.9581 [21:05:28][308800][INFO ] CUDA global memory status (GPU setup complete): ------> Used in total: 294 MB (1755 MB free / 2049 MB total) -> Used by this application (assuming a single GPU task): 204 MB [21:07:45][308800][INFO ] Statistics: count dirty SumSpec pages 13239 (not checkpointed), Page Size 1024, fundamental_idx_hi-window_2: 329052 [21:07:45][308800][INFO ] Data processing finished successfully! [21:07:45][308800][INFO ] Starting data processing... 
[21:07:45][308800][INFO ] CUDA global memory status (initial GPU state, including context): ------> Used in total: 90 MB (1959 MB free / 2049 MB total) -> Used by this application (assuming a single GPU task): 0 MB [21:07:45][308800][INFO ] Using CUDA device #1 "GeForce GTX 660 Ti" (0 CUDA cores / 0.00 GFLOPS) [21:07:45][308800][INFO ] Version of installed CUDA driver: 5000 [21:07:45][308800][INFO ] Version of CUDA driver API used: 3020 [21:07:45][308800][INFO ] Checkpoint file unavailable: status.cpt (No such file or directory). ------> Starting from scratch... [21:07:45][308800][INFO ] Header contents: ------> Original WAPP file: ./p2030.20120219.G177.98-03.39.S.b0s0g0.00000_DM5.20 ------> Sample time in microseconds: 65.4762 ------> Observation time in seconds: 274.62705 ------> Time stamp (MJD): 55976.964674247123 ------> Number of samples/record: 0 ------> Center freq in MHz: 1214.289551 ------> Channel band in MHz: 0.33605957 ------> Number of channels/record: 960 ------> Nifs: 1 ------> RA (J2000): 52736.3790016 ------> DEC (J2000): 284603.9856 ------> Galactic l: 0 ------> Galactic b: 0 ------> Name: G177.98-03.39.S ------> Lagformat: 0 ------> Sum: 1 ------> Level: 3 ------> AZ at start: 0 ------> ZA at start: 0 ------> AST at start: 0 ------> LST at start: 0 ------> Project ID: -- ------> Observers: -- ------> File size (bytes): 0 ------> Data size (bytes): 0 ------> Number of samples: 4194304 ------> Trial dispersion measure: 5.2 cm^-3 pc ------> Scale factor: 0.00102345 [21:07:46][308800][INFO ] Seed for random number generator is 1173636489. 
[21:07:47][308800][INFO ] Derived global search parameters: ------> f_A probability = 0.08 ------> single bin prob(P_noise > P_thr) = 1.32531e-008 ------> thr1 = 18.139 ------> thr2 = 21.241 ------> thr4 = 26.2686 ------> thr8 = 34.6478 ------> thr16 = 48.9581 [21:07:47][308800][INFO ] CUDA global memory status (GPU setup complete): ------> Used in total: 294 MB (1755 MB free / 2049 MB total) -> Used by this application (assuming a single GPU task): 204 MB [21:08:27][308800][INFO ] Checkpoint committed! [21:10:03][308800][INFO ] Statistics: count dirty SumSpec pages 6747 (not checkpointed), Page Size 1024, fundamental_idx_hi-window_2: 329052 [21:10:03][308800][INFO ] Data processing finished successfully! [21:10:03][308800][INFO ] Starting data processing... [21:10:03][308800][INFO ] CUDA global memory status (initial GPU state, including context): ------> Used in total: 90 MB (1959 MB free / 2049 MB total) -> Used by this application (assuming a single GPU task): 0 MB [21:10:03][308800][INFO ] Using CUDA device #1 "GeForce GTX 660 Ti" (0 CUDA cores / 0.00 GFLOPS) [21:10:03][308800][INFO ] Version of installed CUDA driver: 5000 [21:10:03][308800][INFO ] Version of CUDA driver API used: 3020 [21:10:03][308800][INFO ] Checkpoint file unavailable: status.cpt (No such file or directory). ------> Starting from scratch... 
[21:10:03][308800][INFO ] Header contents: ------> Original WAPP file: ./p2030.20120219.G177.98-03.39.S.b0s0g0.00000_DM5.30 ------> Sample time in microseconds: 65.4762 ------> Observation time in seconds: 274.62705 ------> Time stamp (MJD): 55976.964674245086 ------> Number of samples/record: 0 ------> Center freq in MHz: 1214.289551 ------> Channel band in MHz: 0.33605957 ------> Number of channels/record: 960 ------> Nifs: 1 ------> RA (J2000): 52736.3790016 ------> DEC (J2000): 284603.9856 ------> Galactic l: 0 ------> Galactic b: 0 ------> Name: G177.98-03.39.S ------> Lagformat: 0 ------> Sum: 1 ------> Level: 3 ------> AZ at start: 0 ------> ZA at start: 0 ------> AST at start: 0 ------> LST at start: 0 ------> Project ID: -- ------> Observers: -- ------> File size (bytes): 0 ------> Data size (bytes): 0 ------> Number of samples: 4194304 ------> Trial dispersion measure: 5.3 cm^-3 pc ------> Scale factor: 0.00102345 [21:10:04][308800][INFO ] Seed for random number generator is 1173636489. [21:10:05][308800][INFO ] Derived global search parameters: ------> f_A probability = 0.08 ------> single bin prob(P_noise > P_thr) = 1.32531e-008 ------> thr1 = 18.139 ------> thr2 = 21.241 ------> thr4 = 26.2686 ------> thr8 = 34.6478 ------> thr16 = 48.9581 [21:10:05][308800][INFO ] CUDA global memory status (GPU setup complete): ------> Used in total: 294 MB (1755 MB free / 2049 MB total) -> Used by this application (assuming a single GPU task): 204 MB [21:12:21][308800][INFO ] Statistics: count dirty SumSpec pages 11398 (not checkpointed), Page Size 1024, fundamental_idx_hi-window_2: 329052 [21:12:21][308800][INFO ] Data processing finished successfully! [21:12:21][308800][INFO ] Starting data processing... 
[21:12:22][308800][INFO ] CUDA global memory status (initial GPU state, including context): ------> Used in total: 90 MB (1959 MB free / 2049 MB total) -> Used by this application (assuming a single GPU task): 0 MB [21:12:22][308800][INFO ] Using CUDA device #1 "GeForce GTX 660 Ti" (0 CUDA cores / 0.00 GFLOPS) [21:12:22][308800][INFO ] Version of installed CUDA driver: 5000 [21:12:22][308800][INFO ] Version of CUDA driver API used: 3020 [21:12:22][308800][INFO ] Checkpoint file unavailable: status.cpt (No such file or directory). ------> Starting from scratch... [21:12:22][308800][INFO ] Header contents: ------> Original WAPP file: ./p2030.20120219.G177.98-03.39.S.b0s0g0.00000_DM5.40 ------> Sample time in microseconds: 65.4762 ------> Observation time in seconds: 274.62705 ------> Time stamp (MJD): 55976.964674243056 ------> Number of samples/record: 0 ------> Center freq in MHz: 1214.289551 ------> Channel band in MHz: 0.33605957 ------> Number of channels/record: 960 ------> Nifs: 1 ------> RA (J2000): 52736.3790016 ------> DEC (J2000): 284603.9856 ------> Galactic l: 0 ------> Galactic b: 0 ------> Name: G177.98-03.39.S ------> Lagformat: 0 ------> Sum: 1 ------> Level: 3 ------> AZ at start: 0 ------> ZA at start: 0 ------> AST at start: 0 ------> LST at start: 0 ------> Project ID: -- ------> Observers: -- ------> File size (bytes): 0 ------> Data size (bytes): 0 ------> Number of samples: 4194304 ------> Trial dispersion measure: 5.4 cm^-3 pc ------> Scale factor: 0.00102345 [21:12:23][308800][INFO ] Seed for random number generator is 1173636489. 
[21:12:23][308800][INFO ] Derived global search parameters: ------> f_A probability = 0.08 ------> single bin prob(P_noise > P_thr) = 1.32531e-008 ------> thr1 = 18.139 ------> thr2 = 21.241 ------> thr4 = 26.2686 ------> thr8 = 34.6478 ------> thr16 = 48.9581 [21:12:23][308800][INFO ] CUDA global memory status (GPU setup complete): ------> Used in total: 294 MB (1755 MB free / 2049 MB total) -> Used by this application (assuming a single GPU task): 204 MB [21:14:39][308800][INFO ] Statistics: count dirty SumSpec pages 11084 (not checkpointed), Page Size 1024, fundamental_idx_hi-window_2: 329052 [21:14:39][308800][INFO ] Data processing finished successfully! [21:14:39][308800][INFO ] Starting data processing... [21:14:39][308800][INFO ] CUDA global memory status (initial GPU state, including context): ------> Used in total: 90 MB (1959 MB free / 2049 MB total) -> Used by this application (assuming a single GPU task): 0 MB [21:14:39][308800][INFO ] Using CUDA device #1 "GeForce GTX 660 Ti" (0 CUDA cores / 0.00 GFLOPS) [21:14:39][308800][INFO ] Version of installed CUDA driver: 5000 [21:14:39][308800][INFO ] Version of CUDA driver API used: 3020 [21:14:39][308800][INFO ] Checkpoint file unavailable: status.cpt (No such file or directory). ------> Starting from scratch... 
[21:14:39][308800][INFO ] Header contents: ------> Original WAPP file: ./p2030.20120219.G177.98-03.39.S.b0s0g0.00000_DM5.50 ------> Sample time in microseconds: 65.4762 ------> Observation time in seconds: 274.62705 ------> Time stamp (MJD): 55976.964674241019 ------> Number of samples/record: 0 ------> Center freq in MHz: 1214.289551 ------> Channel band in MHz: 0.33605957 ------> Number of channels/record: 960 ------> Nifs: 1 ------> RA (J2000): 52736.3790016 ------> DEC (J2000): 284603.9856 ------> Galactic l: 0 ------> Galactic b: 0 ------> Name: G177.98-03.39.S ------> Lagformat: 0 ------> Sum: 1 ------> Level: 3 ------> AZ at start: 0 ------> ZA at start: 0 ------> AST at start: 0 ------> LST at start: 0 ------> Project ID: -- ------> Observers: -- ------> File size (bytes): 0 ------> Data size (bytes): 0 ------> Number of samples: 4194304 ------> Trial dispersion measure: 5.5 cm^-3 pc ------> Scale factor: 0.00102345 [21:14:41][308800][INFO ] Seed for random number generator is 1173636489. [21:14:41][308800][INFO ] Derived global search parameters: ------> f_A probability = 0.08 ------> single bin prob(P_noise > P_thr) = 1.32531e-008 ------> thr1 = 18.139 ------> thr2 = 21.241 ------> thr4 = 26.2686 ------> thr8 = 34.6478 ------> thr16 = 48.9581 [21:14:41][308800][INFO ] CUDA global memory status (GPU setup complete): ------> Used in total: 294 MB (1755 MB free / 2049 MB total) -> Used by this application (assuming a single GPU task): 204 MB [21:16:58][308800][INFO ] Statistics: count dirty SumSpec pages 10223 (not checkpointed), Page Size 1024, fundamental_idx_hi-window_2: 329052 [21:16:58][308800][INFO ] Data processing finished successfully! 21:16:58 (308800): called boinc_finish </stderr_txt> ]]> |
Eyrie Send message Joined: 20 Feb 14 Posts: 47 Credit: 2,410 RAC: 0 |
I realise this is rather late feedback, but I've only just (re-)attached. I got a bunch of errors with the CPU app. The error is simple: out of memory. The host only has 3 GB and it was rather strained trying to run BOINC with Einstein/Albert and memory-heavy Rosetta tasks AND a memory-heavy game. With LAIM (leave applications in memory) in effect, suspending BOINC would free the CPUs but not the memory. So, when BRP tried to start up there wasn't enough memory to be had [no idea if making my pagefile larger would help] and a whole bunch of tasks bit the bullet. The error made it into stderr, so the app did notice that there was not enough memory. Since that is very often a transient condition, resulting from the user doing something memory-heavy, it would be nice if the app could invoke 'temporary exit' instead of a hard exit. That way BOINC will try to start the task again at a later time, hopefully with more free memory, and the task will be able to run instead of producing a cacheful of errors. Queen of Aliasses, wielder of the SETI rolling pin, Mistress of the red shoes, Guardian of the orange tree, Slayer of very small dragons. |
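The request above amounts to a small decision rule: treat an out-of-memory failure as transient and ask the client to retry later, and only give up after repeated attempts. The following is a minimal sketch of that rule in Python, not the BRP application's code. The names `ExitDecision` and `decide_exit`, the 300-second delay, and the retry budget of 5 are all hypothetical; in a real BOINC application the "temporary" branch would presumably map onto the API's temporary-exit call.

```python
from dataclasses import dataclass


@dataclass
class ExitDecision:
    kind: str      # "temporary" (retry later) or "hard" (report an error)
    delay_s: int   # seconds to wait before the retry; 0 for hard exits
    reason: str    # human-readable explanation for the log


def decide_exit(error: Exception, retries_so_far: int,
                max_retries: int = 5, delay_s: int = 300) -> ExitDecision:
    """Treat out-of-memory as transient until the retry budget is spent.

    All thresholds here are illustrative assumptions, not project settings.
    """
    if isinstance(error, MemoryError) and retries_so_far < max_retries:
        return ExitDecision("temporary", delay_s,
                            "not enough memory, will retry later")
    return ExitDecision("hard", 0, f"unrecoverable: {type(error).__name__}")


# A first out-of-memory failure is deferred rather than reported as an error.
print(decide_exit(MemoryError(), retries_so_far=0))
```

The retry budget is there so that a host which can never satisfy the memory requirement eventually reports a real error instead of deferring forever.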
Dr Who Fan Send message Joined: 3 May 14 Posts: 3 Credit: 191,726 RAC: 0 |
Not sure if this is a SCHEDULER problem or a lack of available wingmen for this type of work. Workunit# 594225: on 20 May 2014 | 15:06:40 UTC my PC returned the completed task; on 23 May 2014 | 9:16:23 UTC my wingman aborted their task; on 23 May 2014 | 9:16:28 UTC a 3rd task was generated, but 2 days, 5.75 hours later it has yet to be sent out to another PC for computation. The same thing has occurred, with different dates/times, for Workunits 594230 and 594236. |
Dr Who Fan Send message Joined: 3 May 14 Posts: 3 Credit: 191,726 RAC: 0 |
Can the project Administrators/Scientists please look into this problem? Over 24 hours have passed since I originally posted and the 3 tasks remain unsent to a 3rd wingman for validation. "Not sure if this is a SCHEDULER problem or a lack of available wingmen for this type of work:" |
Claggy Send message Joined: 29 Dec 06 Posts: 78 Credit: 4,040,969 RAC: 0 |
"Can the project Administrators/Scientists please look into this problem?" It's not a problem. Einstein/Albert employs a scheduler that sends tasks to computers that already have the right data files. Why increase bandwidth utilisation for server and client when the scheduler can just wait for the right client to come along and save on that download? It may just have to wait days or weeks for that client. Claggy |
Holmis Send message Joined: 4 Jan 05 Posts: 104 Credit: 2,104,736 RAC: 0 |
"Can the project Administrators/Scientists please look into this problem?" To add to that explanation: the scheduler will not wait forever. There is a maximum time before the tasks get sent to the next host asking for that type of work. I don't know what that time is set to here and now, but over on Einstein it used to be set to 7 days. That might have changed since I picked up that info; it's been several years... |
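The behaviour described in the two replies above (prefer a host that already holds the data file, but stop waiting after a maximum time) can be illustrated with a toy model. This is only a sketch under stated assumptions, not the project's scheduler: the function `pick_host`, the host dictionaries, the file and host names, and the 7-day default are all made up for illustration.

```python
from typing import Optional


def pick_host(needed_file: str, requesting_hosts: list[dict],
              days_waiting: float, max_wait_days: float = 7.0) -> Optional[str]:
    """Return the name of the host that should receive the task, or None to keep waiting."""
    # First choice: any requesting host that already has the data file,
    # so nothing needs to be downloaded again.
    for host in requesting_hosts:
        if needed_file in host["files"]:
            return host["name"]
    # Fallback: once the task has waited long enough, send it to the
    # next host asking for work, accepting the extra download.
    if days_waiting >= max_wait_days and requesting_hosts:
        return requesting_hosts[0]["name"]
    return None


hosts = [{"name": "hostA", "files": {"data_0001"}},
         {"name": "hostB", "files": {"data_0002"}}]
print(pick_host("data_0002", hosts, days_waiting=1.0))   # hostB already has the file
print(pick_host("data_0003", hosts, days_waiting=1.0))   # nobody has it yet: keep waiting
print(pick_host("data_0003", hosts, days_waiting=8.0))   # timeout reached: first requester
```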
Richard Haselgrove Send message Joined: 10 Dec 05 Posts: 450 Credit: 5,409,572 RAC: 0 |
Mind you, the explanations offered by both Claggy and Holmis only apply to Gravity Wave (CasA) tasks - they're the only ones which use the locality scheduler. The workunits reported were - properly for this thread - BRP jobs, which download fresh data with every task. I suspect the real explanation was more prosaic - replacement tasks are put at the back of the queue, and with things being quiet here until testing resumed this morning, there were probably very few active computers plodding their way through that queue. Anyway, all WUs have been completed and validated now. |
GonoszTopi Send message Joined: 21 Jan 14 Posts: 1 Credit: 302,619 RAC: 0 |
Recently I got a bunch of BRP v1.33 tasks (BRP4G-cuda32-nv301). BOINC shows me that the estimated time to complete is 00:01:11. However, after 00:23:54 run time (approx. 9.240% completed) BOINC kills the task with the message "Aborting task p2030.20131124.G176.16-01.04.S.b4s0g0.00000_464_0: exceeded elapsed time limit 1432.71 (5600000.00G/3908.68G)" One example of these is result #1454983. |
Richard Haselgrove Send message Joined: 10 Dec 05 Posts: 450 Credit: 5,409,572 RAC: 0 |
"Recently I got a bunch of BRP v1.33 tasks (BRP4G-cuda32-nv301). BOINC shows me that the estimated time to complete is 00:01:11. However, after 00:23:54 run time (approx. 9.240% completed) BOINC kills the task with the message "Aborting task p2030.20131124.G176.16-01.04.S.b4s0g0.00000_464_0: exceeded elapsed time limit 1432.71 (5600000.00G/3908.68G)" One example of these is result #1454983." There were some problems this morning with the updated server code and, as Bernd says in message 112866: "Our plan class specs that were (semi-)automatically converted for the new server code were somewhat broken, causing probably all kinds of oddities for GPU tasks." This is probably one of them... |
Holmis Send message Joined: 4 Jan 05 Posts: 104 Credit: 2,104,736 RAC: 0 |
"Recently I got a bunch of BRP v1.33 tasks (BRP4G-cuda32-nv301). BOINC shows me that the estimated time to complete is 00:01:11. However, after 00:23:54 run time (approx. 9.240% completed) BOINC kills the task with the message "Aborting task p2030.20131124.G176.16-01.04.S.b4s0g0.00000_464_0: exceeded elapsed time limit 1432.71 (5600000.00G/3908.68G)" One example of these is result #1454983." I just downloaded 25 BRP4G tasks with an estimated completion time of 15 seconds, wish it was true =) I found this in client_state.xml for each of these tasks: <rsc_fpops_est>280000000000000.000000</rsc_fpops_est> <rsc_fpops_bound>5600000000000000.000000</rsc_fpops_bound> If I'm understanding this right, the tasks will error out with "exceeded elapsed time limit" once they have run for 20x what they were estimated to take. I've edited client_state.xml and added a few zeros to the rsc_fpops_bound value, and hope that will give the tasks enough time to actually finish. Let's see what happens when the host asks for work again. |
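The numbers in this post and in the quoted abort message can be checked against each other. The sketch below parses the two values quoted from client_state.xml and divides them by the speed shown in the abort message (3908.68G, taken here to mean 3908.68 GFLOPS, which is an assumption about the message format). The `<workunit>` wrapper element is added only so the fragment parses on its own. Note that the 3908.68G figure comes from the quoted poster's host, so the 15-second estimate seen on the other host implies a different assumed speed.

```python
import xml.etree.ElementTree as ET

# Values as quoted from client_state.xml; the wrapper element is illustrative.
fragment = """<workunit>
<rsc_fpops_est>280000000000000.000000</rsc_fpops_est>
<rsc_fpops_bound>5600000000000000.000000</rsc_fpops_bound>
</workunit>"""

wu = ET.fromstring(fragment)
fpops_est = float(wu.findtext("rsc_fpops_est"))
fpops_bound = float(wu.findtext("rsc_fpops_bound"))

# Speed from the abort message "(5600000.00G/3908.68G)", assumed to be in FLOPS.
assumed_flops = 3908.68e9

bound_ratio = fpops_bound / fpops_est        # how many times the estimate a task may run
estimated_s = fpops_est / assumed_flops      # estimated run time in seconds
limit_s = fpops_bound / assumed_flops        # elapsed-time limit in seconds

print(f"bound / estimate = {bound_ratio:.0f}x")
print(f"estimated run time = {estimated_s:.1f} s")
print(f"elapsed time limit = {limit_s:.2f} s")
```

Under these assumptions the ratio is 20, the estimate is about 71.6 seconds (the 00:01:11 shown by BOINC), and the limit is 1432.71 seconds (the 00:23:54 at which the task was killed), which agrees with the 20x reading above.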
Holmis Send message Joined: 4 Jan 05 Posts: 104 Credit: 2,104,736 RAC: 0 |
To follow up on my last post: my host has now accumulated over 10 valid BRP4G tasks, so the server-side estimates have kicked in. Freshly downloaded BRP4G tasks have an estimated time to completion of 1h22m12s, and the observed completion time is within a few minutes of that when running 2 tasks at a time. So this seems to be working as it should. Digging a bit deeper, the "Average processing rate" is 56.766 according to the application details page for host 2267. I didn't take note of the average PFC in the server logs but I believe it was around 3000. So the server thinks the GPU is about 50 times faster than it actually is? If one has to guess the speed/power of some component, is it not better to assume it's slower than it actually is? |
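The figures in this follow-up are consistent with a simple estimate of the form "estimated floating-point operations divided by processing rate". The sketch below assumes the "Average processing rate" of 56.766 is in GFLOPS and reuses the rsc_fpops_est value of 280000000000000 quoted earlier in the thread for BRP4G tasks; both are assumptions, and the real server-side calculation may involve more terms. The initial figure of 3000 is the poster's own recollection.

```python
# Assumed inputs (see the lead-in): units and formula are not confirmed by the project.
fpops_est = 280000000000000.0   # rsc_fpops_est quoted from client_state.xml
apr_gflops = 56.766             # "Average processing rate" for host 2267
initial_gflops = 3000.0         # speed the poster recalls the server starting with

seconds = fpops_est / (apr_gflops * 1e9)
hours, rem = divmod(int(seconds), 3600)
minutes, secs = divmod(rem, 60)
eta = f"{hours}h{minutes:02d}m{secs:02d}s"

overestimate = initial_gflops / apr_gflops   # how optimistic the initial guess was

print(f"estimated time to completion: {eta}")
print(f"initial guess / measured rate: {overestimate:.1f}x")
```

With these inputs the estimate comes out at 1h22m12s, the same value BOINC displayed, and the initial guess is roughly 53 times the measured rate, in line with the "about 50 times" observation.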
Eyrie Send message Joined: 20 Feb 14 Posts: 47 Credit: 2,410 RAC: 0 |
Thanks for reminding me, we need to track down the starting point for the GPU PFC calculations. FWIW, the CPU starting point is the Whetstone benchmark that BOINC runs. Queen of Aliasses, wielder of the SETI rolling pin, Mistress of the red shoes, Guardian of the orange tree, Slayer of very small dragons. |
Rasputin42 Send message Joined: 14 Jan 12 Posts: 13 Credit: 282,604 RAC: 0 |
6/6/2014 9:41:16 AM | Albert@Home | Server error: recompile needed One CUDA WU errored out after 14h. What does that mean? What needs recompiling? |
Richard Haselgrove Send message Joined: 10 Dec 05 Posts: 450 Credit: 5,409,572 RAC: 0 |
What needs recompiling? (at least one of) the daemons running on the server. Nothing we can do: I've emailed Bernd. |
Richard Haselgrove Send message Joined: 10 Dec 05 Posts: 450 Credit: 5,409,572 RAC: 0 |
Server is working again. |