BRP application v 1.33 feedback thread

archae86
Joined: 6 Dec 05
Posts: 414
Credit: 67,924
RAC: 0
Message 112314 - Posted: 10 Jan 2013, 5:14:59 UTC

I assume 1.33 is the version which employs file compression in order greatly to reduce the download network traffic. I applaud the attempt to obtain this improvement. While Comcast has stopped posting my bandwidth consumption where I can see it, when last I could look just two GTX460 hosts running BRP were using up about half my allowed monthly traffic.

So I'm happy to report that both of my CUDA Windows 7 hosts have returned a stock of v1.33 work. Already host 4307 has 2/5 validated, and the other, 4306, has 5/13 validated. Execution timings look in line with recent Einstein 1.32 work on the same hosts.

This report is neither a problem nor a bug report, but this board seemed most nearly suitable.
ID: 112314

Jeroen
Joined: 25 Nov 05
Posts: 12
Credit: 638,256
RAC: 0
Message 112321 - Posted: 11 Jan 2013, 2:14:14 UTC - in response to Message 112314.  
Last modified: 11 Jan 2013, 2:30:45 UTC

I have 1.33 running on one host so far. Nine tasks have completed: 3 have validated and the other 6 are pending validation.

The file size reduction is very significant, from 2MB to 475K per file. Thank you.
ID: 112321

Bikeman (Heinz-Bernd Eggenstein)
Volunteer moderator
Project administrator
Project developer
Joined: 28 Aug 06
Posts: 1483
Credit: 1,864,017
RAC: 0
Message 112322 - Posted: 11 Jan 2013, 14:52:46 UTC

Hi!


Yup, 1.33 is a new version which is testing compression of the input files, plus it uses a newer version of the BOINC API code, which is recommended for the next generation of BOINC clients.

Unfortunately this new BOINC API version introduced a bug that broke all but the OSX versions of the OpenCL BRP app versions :-(.

There is also a problem with the Linux 32 bit CPU app version (doesn't link zlib statically).

We plan to publish a new, corrected suite of BRP4 apps on Albert for testing next week.

Cheers
HB

ID: 112322

tullio
Joined: 22 Jan 05
Posts: 796
Credit: 137,342
RAC: 0
Message 112323 - Posted: 12 Jan 2013, 16:32:53 UTC - in response to Message 112322.  

But it runs OK on my SuSE Linux 12.1 32-bit.
Tullio
ID: 112323

Alex
Joined: 1 Mar 05
Posts: 88
Credit: 398,734
RAC: 0
Message 112324 - Posted: 12 Jan 2013, 17:01:07 UTC

27 validated, 3 pending (win cuda wu's).
Looks good so far, returning to Einstein.
ID: 112324

skgiven
Joined: 14 Oct 12
Posts: 9
Credit: 4,734,887
RAC: 0
Message 112328 - Posted: 18 Jan 2013, 13:11:12 UTC - in response to Message 112324.  
Last modified: 18 Jan 2013, 13:53:22 UTC

I'm getting <0.5% failure rate:
934 Valid, 253 In Progress, 52 Pending, 3 Invalid and 1 Error:
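
The "<0.5%" figure can be checked from the counts above. This is only a sketch: the post does not say which denominator was used, so the choice below (tasks with a final verdict, and alternatively all tasks) is an assumption, and the variable names are made up for illustration.

```python
# Task counts copied from the post above.
valid, in_progress, pending, invalid, error = 934, 253, 52, 3, 1

failed = invalid + error

# Assumption: only tasks with a final verdict count towards the rate.
decided = valid + invalid + error
# Alternative denominator: every task listed, finished or not.
total = decided + in_progress + pending

rate_decided = 100.0 * failed / decided
rate_all = 100.0 * failed / total

print(f"{failed} failures out of {decided} decided tasks: {rate_decided:.2f}%")
print(f"{failed} failures out of {total} tasks overall: {rate_all:.2f}%")
```

Either way of counting stays below 0.5%.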

The error task was from a couple of days ago on this machine:
(I was actually using 7.0.42 at the time, now using 7.0.44)

Stderr output

<core_client_version>7.0.44</core_client_version>
<![CDATA[
<stderr_txt>

</stderr_txt>
<message>
upload failure: <file_xfer_error>
<file_name>p2030.20120218.G178.84-02.08.C.b1s0g0.00000_2952_3_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>p2030.20120218.G178.84-02.08.C.b1s0g0.00000_2952_3_1</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>p2030.20120218.G178.84-02.08.C.b1s0g0.00000_2952_3_2</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>p2030.20120218.G178.84-02.08.C.b1s0g0.00000_2952_3_3</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>p2030.20120218.G178.84-02.08.C.b1s0g0.00000_2952_3_4</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>p2030.20120218.G178.84-02.08.C.b1s0g0.00000_2952_3_5</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>p2030.20120218.G178.84-02.08.C.b1s0g0.00000_2952_3_6</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>p2030.20120218.G178.84-02.08.C.b1s0g0.00000_2952_3_7</file_name>
<error_code>-161</error_code>
</file_xfer_error>

</message>
]]>


3 Invalid tasks from the 13th and 14th (same host):

http://albert.phys.uwm.edu/result.php?resultid=470974 (standard error shown below)
http://albert.phys.uwm.edu/result.php?resultid=470633
http://albert.phys.uwm.edu/result.php?resultid=468200

Name: p2030.20120219.G177.98-03.39.S.b0s0g0.00000_48_1
Workunit: 184410
Created: 13 Jan 2013 | 9:55:17 UTC
Sent: 14 Jan 2013 | 2:05:30 UTC
Received: 14 Jan 2013 | 21:17:43 UTC
Server state: Over
Outcome: Validate error (58:00111010)
Client state: Done
Exit status: 0 (0x0)
Computer ID: 5305
Report deadline: 28 Jan 2013 | 2:05:30 UTC
Run time: 1,112.41
CPU time: 197.95
Validate state: Invalid
Credit: 0.00
Application version: Binary Radio Pulsar Search v1.33 (BRP4cuda32nv301)
Stderr output

<core_client_version>7.0.42</core_client_version>
<![CDATA[
<stderr_txt>
Activated exception handling...
[20:58:27][308800][INFO ] Starting data processing...
[20:58:28][308800][INFO ] CUDA global memory status (initial GPU state, including context):
------> Used in total: 90 MB (1959 MB free / 2049 MB total) -> Used by this application (assuming a single GPU task): 0 MB
[20:58:28][308800][INFO ] Using CUDA device #1 "GeForce GTX 660 Ti" (0 CUDA cores / 0.00 GFLOPS)
[20:58:28][308800][INFO ] Version of installed CUDA driver: 5000
[20:58:28][308800][INFO ] Version of CUDA driver API used: 3020
[20:58:28][308800][INFO ] Checkpoint file unavailable: status.cpt (No such file or directory).
------> Starting from scratch...
[20:58:28][308800][INFO ] Header contents:
------> Original WAPP file: ./p2030.20120219.G177.98-03.39.S.b0s0g0.00000_DM4.80
------> Sample time in microseconds: 65.4762
------> Observation time in seconds: 274.62705
------> Time stamp (MJD): 55976.964674255258
------> Number of samples/record: 0
------> Center freq in MHz: 1214.289551
------> Channel band in MHz: 0.33605957
------> Number of channels/record: 960
------> Nifs: 1
------> RA (J2000): 52736.3790016
------> DEC (J2000): 284603.9856
------> Galactic l: 0
------> Galactic b: 0
------> Name: G177.98-03.39.S
------> Lagformat: 0
------> Sum: 1
------> Level: 3
------> AZ at start: 0
------> ZA at start: 0
------> AST at start: 0
------> LST at start: 0
------> Project ID: --
------> Observers: --
------> File size (bytes): 0
------> Data size (bytes): 0
------> Number of samples: 4194304
------> Trial dispersion measure: 4.8 cm^-3 pc
------> Scale factor: 0.00102345
[20:58:29][308800][INFO ] Seed for random number generator is 1173636489.
[20:58:29][308800][INFO ] Derived global search parameters:
------> f_A probability = 0.08
------> single bin prob(P_noise > P_thr) = 1.32531e-008
------> thr1 = 18.139
------> thr2 = 21.241
------> thr4 = 26.2686
------> thr8 = 34.6478
------> thr16 = 48.9581
[20:58:29][308800][INFO ] CUDA global memory status (GPU setup complete):
------> Used in total: 294 MB (1755 MB free / 2049 MB total) -> Used by this application (assuming a single GPU task): 204 MB
[21:00:48][308800][INFO ] Statistics: count dirty SumSpec pages 12644 (not checkpointed), Page Size 1024, fundamental_idx_hi-window_2: 329052
[21:00:48][308800][INFO ] Data processing finished successfully!
[21:00:48][308800][INFO ] Starting data processing...
[21:00:48][308800][INFO ] CUDA global memory status (initial GPU state, including context):
------> Used in total: 90 MB (1959 MB free / 2049 MB total) -> Used by this application (assuming a single GPU task): 0 MB
[21:00:48][308800][INFO ] Using CUDA device #1 "GeForce GTX 660 Ti" (0 CUDA cores / 0.00 GFLOPS)
[21:00:48][308800][INFO ] Version of installed CUDA driver: 5000
[21:00:48][308800][INFO ] Version of CUDA driver API used: 3020
[21:00:48][308800][INFO ] Checkpoint file unavailable: status.cpt (No such file or directory).
------> Starting from scratch...
[21:00:48][308800][INFO ] Header contents:
------> Original WAPP file: ./p2030.20120219.G177.98-03.39.S.b0s0g0.00000_DM4.90
------> Sample time in microseconds: 65.4762
------> Observation time in seconds: 274.62705
------> Time stamp (MJD): 55976.96467425322
------> Number of samples/record: 0
------> Center freq in MHz: 1214.289551
------> Channel band in MHz: 0.33605957
------> Number of channels/record: 960
------> Nifs: 1
------> RA (J2000): 52736.3790016
------> DEC (J2000): 284603.9856
------> Galactic l: 0
------> Galactic b: 0
------> Name: G177.98-03.39.S
------> Lagformat: 0
------> Sum: 1
------> Level: 3
------> AZ at start: 0
------> ZA at start: 0
------> AST at start: 0
------> LST at start: 0
------> Project ID: --
------> Observers: --
------> File size (bytes): 0
------> Data size (bytes): 0
------> Number of samples: 4194304
------> Trial dispersion measure: 4.9 cm^-3 pc
------> Scale factor: 0.00102345
[21:00:49][308800][INFO ] Seed for random number generator is 1171635415.
[21:00:50][308800][INFO ] Derived global search parameters:
------> f_A probability = 0.08
------> single bin prob(P_noise > P_thr) = 1.32531e-008
------> thr1 = 18.139
------> thr2 = 21.241
------> thr4 = 26.2686
------> thr8 = 34.6478
------> thr16 = 48.9581
[21:00:50][308800][INFO ] CUDA global memory status (GPU setup complete):
------> Used in total: 294 MB (1755 MB free / 2049 MB total) -> Used by this application (assuming a single GPU task): 204 MB
[21:03:07][308800][INFO ] Statistics: count dirty SumSpec pages 14138 (not checkpointed), Page Size 1024, fundamental_idx_hi-window_2: 329052
[21:03:07][308800][INFO ] Data processing finished successfully!
[21:03:07][308800][INFO ] Starting data processing...
[21:03:07][308800][INFO ] CUDA global memory status (initial GPU state, including context):
------> Used in total: 90 MB (1959 MB free / 2049 MB total) -> Used by this application (assuming a single GPU task): 0 MB
[21:03:07][308800][INFO ] Using CUDA device #1 "GeForce GTX 660 Ti" (0 CUDA cores / 0.00 GFLOPS)
[21:03:07][308800][INFO ] Version of installed CUDA driver: 5000
[21:03:07][308800][INFO ] Version of CUDA driver API used: 3020
[21:03:07][308800][INFO ] Checkpoint file unavailable: status.cpt (No such file or directory).
------> Starting from scratch...
[21:03:07][308800][INFO ] Header contents:
------> Original WAPP file: ./p2030.20120219.G177.98-03.39.S.b0s0g0.00000_DM5.00
------> Sample time in microseconds: 65.4762
------> Observation time in seconds: 274.62705
------> Time stamp (MJD): 55976.96467425119
------> Number of samples/record: 0
------> Center freq in MHz: 1214.289551
------> Channel band in MHz: 0.33605957
------> Number of channels/record: 960
------> Nifs: 1
------> RA (J2000): 52736.3790016
------> DEC (J2000): 284603.9856
------> Galactic l: 0
------> Galactic b: 0
------> Name: G177.98-03.39.S
------> Lagformat: 0
------> Sum: 1
------> Level: 3
------> AZ at start: 0
------> ZA at start: 0
------> AST at start: 0
------> LST at start: 0
------> Project ID: --
------> Observers: --
------> File size (bytes): 0
------> Data size (bytes): 0
------> Number of samples: 4194304
------> Trial dispersion measure: 5 cm^-3 pc
------> Scale factor: 0.00102345
[21:03:09][308800][INFO ] Seed for random number generator is 1171635415.
[21:03:09][308800][INFO ] Derived global search parameters:
------> f_A probability = 0.08
------> single bin prob(P_noise > P_thr) = 1.32531e-008
------> thr1 = 18.139
------> thr2 = 21.241
------> thr4 = 26.2686
------> thr8 = 34.6478
------> thr16 = 48.9581
[21:03:09][308800][INFO ] CUDA global memory status (GPU setup complete):
------> Used in total: 294 MB (1755 MB free / 2049 MB total) -> Used by this application (assuming a single GPU task): 204 MB
[21:05:26][308800][INFO ] Statistics: count dirty SumSpec pages 13713 (not checkpointed), Page Size 1024, fundamental_idx_hi-window_2: 329052
[21:05:26][308800][INFO ] Data processing finished successfully!
[21:05:26][308800][INFO ] Starting data processing...
[21:05:26][308800][INFO ] CUDA global memory status (initial GPU state, including context):
------> Used in total: 90 MB (1959 MB free / 2049 MB total) -> Used by this application (assuming a single GPU task): 0 MB
[21:05:26][308800][INFO ] Using CUDA device #1 "GeForce GTX 660 Ti" (0 CUDA cores / 0.00 GFLOPS)
[21:05:26][308800][INFO ] Version of installed CUDA driver: 5000
[21:05:26][308800][INFO ] Version of CUDA driver API used: 3020
[21:05:26][308800][INFO ] Checkpoint file unavailable: status.cpt (No such file or directory).
------> Starting from scratch...
[21:05:26][308800][INFO ] Header contents:
------> Original WAPP file: ./p2030.20120219.G177.98-03.39.S.b0s0g0.00000_DM5.10
------> Sample time in microseconds: 65.4762
------> Observation time in seconds: 274.62705
------> Time stamp (MJD): 55976.964674249153
------> Number of samples/record: 0
------> Center freq in MHz: 1214.289551
------> Channel band in MHz: 0.33605957
------> Number of channels/record: 960
------> Nifs: 1
------> RA (J2000): 52736.3790016
------> DEC (J2000): 284603.9856
------> Galactic l: 0
------> Galactic b: 0
------> Name: G177.98-03.39.S
------> Lagformat: 0
------> Sum: 1
------> Level: 3
------> AZ at start: 0
------> ZA at start: 0
------> AST at start: 0
------> LST at start: 0
------> Project ID: --
------> Observers: --
------> File size (bytes): 0
------> Data size (bytes): 0
------> Number of samples: 4194304
------> Trial dispersion measure: 5.1 cm^-3 pc
------> Scale factor: 0.00102345
[21:05:28][308800][INFO ] Seed for random number generator is 1173636489.
[21:05:28][308800][INFO ] Derived global search parameters:
------> f_A probability = 0.08
------> single bin prob(P_noise > P_thr) = 1.32531e-008
------> thr1 = 18.139
------> thr2 = 21.241
------> thr4 = 26.2686
------> thr8 = 34.6478
------> thr16 = 48.9581
[21:05:28][308800][INFO ] CUDA global memory status (GPU setup complete):
------> Used in total: 294 MB (1755 MB free / 2049 MB total) -> Used by this application (assuming a single GPU task): 204 MB
[21:07:45][308800][INFO ] Statistics: count dirty SumSpec pages 13239 (not checkpointed), Page Size 1024, fundamental_idx_hi-window_2: 329052
[21:07:45][308800][INFO ] Data processing finished successfully!
[21:07:45][308800][INFO ] Starting data processing...
[21:07:45][308800][INFO ] CUDA global memory status (initial GPU state, including context):
------> Used in total: 90 MB (1959 MB free / 2049 MB total) -> Used by this application (assuming a single GPU task): 0 MB
[21:07:45][308800][INFO ] Using CUDA device #1 "GeForce GTX 660 Ti" (0 CUDA cores / 0.00 GFLOPS)
[21:07:45][308800][INFO ] Version of installed CUDA driver: 5000
[21:07:45][308800][INFO ] Version of CUDA driver API used: 3020
[21:07:45][308800][INFO ] Checkpoint file unavailable: status.cpt (No such file or directory).
------> Starting from scratch...
[21:07:45][308800][INFO ] Header contents:
------> Original WAPP file: ./p2030.20120219.G177.98-03.39.S.b0s0g0.00000_DM5.20
------> Sample time in microseconds: 65.4762
------> Observation time in seconds: 274.62705
------> Time stamp (MJD): 55976.964674247123
------> Number of samples/record: 0
------> Center freq in MHz: 1214.289551
------> Channel band in MHz: 0.33605957
------> Number of channels/record: 960
------> Nifs: 1
------> RA (J2000): 52736.3790016
------> DEC (J2000): 284603.9856
------> Galactic l: 0
------> Galactic b: 0
------> Name: G177.98-03.39.S
------> Lagformat: 0
------> Sum: 1
------> Level: 3
------> AZ at start: 0
------> ZA at start: 0
------> AST at start: 0
------> LST at start: 0
------> Project ID: --
------> Observers: --
------> File size (bytes): 0
------> Data size (bytes): 0
------> Number of samples: 4194304
------> Trial dispersion measure: 5.2 cm^-3 pc
------> Scale factor: 0.00102345
[21:07:46][308800][INFO ] Seed for random number generator is 1173636489.
[21:07:47][308800][INFO ] Derived global search parameters:
------> f_A probability = 0.08
------> single bin prob(P_noise > P_thr) = 1.32531e-008
------> thr1 = 18.139
------> thr2 = 21.241
------> thr4 = 26.2686
------> thr8 = 34.6478
------> thr16 = 48.9581
[21:07:47][308800][INFO ] CUDA global memory status (GPU setup complete):
------> Used in total: 294 MB (1755 MB free / 2049 MB total) -> Used by this application (assuming a single GPU task): 204 MB
[21:08:27][308800][INFO ] Checkpoint committed!
[21:10:03][308800][INFO ] Statistics: count dirty SumSpec pages 6747 (not checkpointed), Page Size 1024, fundamental_idx_hi-window_2: 329052
[21:10:03][308800][INFO ] Data processing finished successfully!
[21:10:03][308800][INFO ] Starting data processing...
[21:10:03][308800][INFO ] CUDA global memory status (initial GPU state, including context):
------> Used in total: 90 MB (1959 MB free / 2049 MB total) -> Used by this application (assuming a single GPU task): 0 MB
[21:10:03][308800][INFO ] Using CUDA device #1 "GeForce GTX 660 Ti" (0 CUDA cores / 0.00 GFLOPS)
[21:10:03][308800][INFO ] Version of installed CUDA driver: 5000
[21:10:03][308800][INFO ] Version of CUDA driver API used: 3020
[21:10:03][308800][INFO ] Checkpoint file unavailable: status.cpt (No such file or directory).
------> Starting from scratch...
[21:10:03][308800][INFO ] Header contents:
------> Original WAPP file: ./p2030.20120219.G177.98-03.39.S.b0s0g0.00000_DM5.30
------> Sample time in microseconds: 65.4762
------> Observation time in seconds: 274.62705
------> Time stamp (MJD): 55976.964674245086
------> Number of samples/record: 0
------> Center freq in MHz: 1214.289551
------> Channel band in MHz: 0.33605957
------> Number of channels/record: 960
------> Nifs: 1
------> RA (J2000): 52736.3790016
------> DEC (J2000): 284603.9856
------> Galactic l: 0
------> Galactic b: 0
------> Name: G177.98-03.39.S
------> Lagformat: 0
------> Sum: 1
------> Level: 3
------> AZ at start: 0
------> ZA at start: 0
------> AST at start: 0
------> LST at start: 0
------> Project ID: --
------> Observers: --
------> File size (bytes): 0
------> Data size (bytes): 0
------> Number of samples: 4194304
------> Trial dispersion measure: 5.3 cm^-3 pc
------> Scale factor: 0.00102345
[21:10:04][308800][INFO ] Seed for random number generator is 1173636489.
[21:10:05][308800][INFO ] Derived global search parameters:
------> f_A probability = 0.08
------> single bin prob(P_noise > P_thr) = 1.32531e-008
------> thr1 = 18.139
------> thr2 = 21.241
------> thr4 = 26.2686
------> thr8 = 34.6478
------> thr16 = 48.9581
[21:10:05][308800][INFO ] CUDA global memory status (GPU setup complete):
------> Used in total: 294 MB (1755 MB free / 2049 MB total) -> Used by this application (assuming a single GPU task): 204 MB
[21:12:21][308800][INFO ] Statistics: count dirty SumSpec pages 11398 (not checkpointed), Page Size 1024, fundamental_idx_hi-window_2: 329052
[21:12:21][308800][INFO ] Data processing finished successfully!
[21:12:21][308800][INFO ] Starting data processing...
[21:12:22][308800][INFO ] CUDA global memory status (initial GPU state, including context):
------> Used in total: 90 MB (1959 MB free / 2049 MB total) -> Used by this application (assuming a single GPU task): 0 MB
[21:12:22][308800][INFO ] Using CUDA device #1 "GeForce GTX 660 Ti" (0 CUDA cores / 0.00 GFLOPS)
[21:12:22][308800][INFO ] Version of installed CUDA driver: 5000
[21:12:22][308800][INFO ] Version of CUDA driver API used: 3020
[21:12:22][308800][INFO ] Checkpoint file unavailable: status.cpt (No such file or directory).
------> Starting from scratch...
[21:12:22][308800][INFO ] Header contents:
------> Original WAPP file: ./p2030.20120219.G177.98-03.39.S.b0s0g0.00000_DM5.40
------> Sample time in microseconds: 65.4762
------> Observation time in seconds: 274.62705
------> Time stamp (MJD): 55976.964674243056
------> Number of samples/record: 0
------> Center freq in MHz: 1214.289551
------> Channel band in MHz: 0.33605957
------> Number of channels/record: 960
------> Nifs: 1
------> RA (J2000): 52736.3790016
------> DEC (J2000): 284603.9856
------> Galactic l: 0
------> Galactic b: 0
------> Name: G177.98-03.39.S
------> Lagformat: 0
------> Sum: 1
------> Level: 3
------> AZ at start: 0
------> ZA at start: 0
------> AST at start: 0
------> LST at start: 0
------> Project ID: --
------> Observers: --
------> File size (bytes): 0
------> Data size (bytes): 0
------> Number of samples: 4194304
------> Trial dispersion measure: 5.4 cm^-3 pc
------> Scale factor: 0.00102345
[21:12:23][308800][INFO ] Seed for random number generator is 1173636489.
[21:12:23][308800][INFO ] Derived global search parameters:
------> f_A probability = 0.08
------> single bin prob(P_noise > P_thr) = 1.32531e-008
------> thr1 = 18.139
------> thr2 = 21.241
------> thr4 = 26.2686
------> thr8 = 34.6478
------> thr16 = 48.9581
[21:12:23][308800][INFO ] CUDA global memory status (GPU setup complete):
------> Used in total: 294 MB (1755 MB free / 2049 MB total) -> Used by this application (assuming a single GPU task): 204 MB
[21:14:39][308800][INFO ] Statistics: count dirty SumSpec pages 11084 (not checkpointed), Page Size 1024, fundamental_idx_hi-window_2: 329052
[21:14:39][308800][INFO ] Data processing finished successfully!
[21:14:39][308800][INFO ] Starting data processing...
[21:14:39][308800][INFO ] CUDA global memory status (initial GPU state, including context):
------> Used in total: 90 MB (1959 MB free / 2049 MB total) -> Used by this application (assuming a single GPU task): 0 MB
[21:14:39][308800][INFO ] Using CUDA device #1 "GeForce GTX 660 Ti" (0 CUDA cores / 0.00 GFLOPS)
[21:14:39][308800][INFO ] Version of installed CUDA driver: 5000
[21:14:39][308800][INFO ] Version of CUDA driver API used: 3020
[21:14:39][308800][INFO ] Checkpoint file unavailable: status.cpt (No such file or directory).
------> Starting from scratch...
[21:14:39][308800][INFO ] Header contents:
------> Original WAPP file: ./p2030.20120219.G177.98-03.39.S.b0s0g0.00000_DM5.50
------> Sample time in microseconds: 65.4762
------> Observation time in seconds: 274.62705
------> Time stamp (MJD): 55976.964674241019
------> Number of samples/record: 0
------> Center freq in MHz: 1214.289551
------> Channel band in MHz: 0.33605957
------> Number of channels/record: 960
------> Nifs: 1
------> RA (J2000): 52736.3790016
------> DEC (J2000): 284603.9856
------> Galactic l: 0
------> Galactic b: 0
------> Name: G177.98-03.39.S
------> Lagformat: 0
------> Sum: 1
------> Level: 3
------> AZ at start: 0
------> ZA at start: 0
------> AST at start: 0
------> LST at start: 0
------> Project ID: --
------> Observers: --
------> File size (bytes): 0
------> Data size (bytes): 0
------> Number of samples: 4194304
------> Trial dispersion measure: 5.5 cm^-3 pc
------> Scale factor: 0.00102345
[21:14:41][308800][INFO ] Seed for random number generator is 1173636489.
[21:14:41][308800][INFO ] Derived global search parameters:
------> f_A probability = 0.08
------> single bin prob(P_noise > P_thr) = 1.32531e-008
------> thr1 = 18.139
------> thr2 = 21.241
------> thr4 = 26.2686
------> thr8 = 34.6478
------> thr16 = 48.9581
[21:14:41][308800][INFO ] CUDA global memory status (GPU setup complete):
------> Used in total: 294 MB (1755 MB free / 2049 MB total) -> Used by this application (assuming a single GPU task): 204 MB
[21:16:58][308800][INFO ] Statistics: count dirty SumSpec pages 10223 (not checkpointed), Page Size 1024, fundamental_idx_hi-window_2: 329052
[21:16:58][308800][INFO ] Data processing finished successfully!
21:16:58 (308800): called boinc_finish

</stderr_txt>
]]>
ID: 112328

Eyrie
Joined: 20 Feb 14
Posts: 47
Credit: 2,410
RAC: 0
Message 112808 - Posted: 25 Feb 2014, 14:37:30 UTC

I realise this is rather late feedback, but I've only just (re-)attached.

I got a bunch of errors - CPU app.

The error is simple - out of mem.
The host only has 3 GB and it was rather strained trying to run BOINC with Einstein/Albert and memory heavy Rosetta tasks AND a memory heavy game. With LAIM ("leave applications in memory") in effect, suspending BOINC would free the CPUs but not the memory.

So, when BRP tried to start up there wasn't enough memory to be had [no idea if making my pagefile larger would help] and a whole bunch of tasks bit the bullet.

The error made it into stderr, so the app did notice that there was not enough memory. Since that very often is a transient condition, resulting from the user doing something memory heavy, it would be nice if the app could invoke 'temporary exit' instead of hard exits. That way boinc will try to start the task again at a later time, hopefully with more free mem, and the task will be able to run, instead of producing a cacheful of errors.

Queen of Aliasses, wielder of the SETI rolling pin, Mistress of the red shoes, Guardian of the orange tree, Slayer of very small dragons.
ID: 112808

Dr Who Fan
Joined: 3 May 14
Posts: 3
Credit: 191,726
RAC: 0
Message 112836 - Posted: 25 May 2014, 14:00:55 UTC

Not sure if this is a SCHEDULER problem or a lack of available wingmen for this type of work:

Workunit# 594225
on 20 May 2014 | 15:06:40 UTC my PC returned the completed task;
on 23 May 2014 | 9:16:23 UTC my wingman aborted their task;
on 23 May 2014 | 9:16:28 UTC a 3rd task was generated but 2 Days, 5.75 Hours later it has yet to be sent out to another PC for computation.

Same thing has occurred with different date/times for Workunits 594230 and 594236.


ID: 112836

Dr Who Fan
Joined: 3 May 14
Posts: 3
Credit: 191,726
RAC: 0
Message 112837 - Posted: 27 May 2014, 5:36:05 UTC - in response to Message 112836.  
Last modified: 27 May 2014, 5:36:53 UTC

Can the project Administrators/Scientists please look into this problem?
Over 24 hours have passed since I originally posted and the 3 tasks remain unsent to a 3rd wingman for validation.
Not sure if this is a SCHEDULER problem or a lack of available wingmen for this type of work:

Workunit# 594225
on 20 May 2014 | 15:06:40 UTC my PC returned the completed task;
on 23 May 2014 | 9:16:23 UTC my wingman aborted their task;
on 23 May 2014 | 9:16:28 UTC a 3rd task was generated but 2 Days, 5.75 Hours later it has yet to be sent out to another PC for computation.

Same thing has occurred with different date/times for Workunits 594230 and 594236.


ID: 112837

Claggy
Joined: 29 Dec 06
Posts: 78
Credit: 4,040,969
RAC: 0
Message 112838 - Posted: 27 May 2014, 18:54:45 UTC - in response to Message 112837.  

Can the project Administrators/Scientists please look into this problem?
Over 24 hours have passed since I originally posted and the 3 tasks remain unsent to a 3rd wingman for validation.
Not sure if this is a SCHEDULER problem or a lack of available wingmen for this type of work:

Workunit# 594225
on 20 May 2014 | 15:06:40 UTC my PC returned the completed task;
on 23 May 2014 | 9:16:23 UTC my wingman aborted their task;
on 23 May 2014 | 9:16:28 UTC a 3rd task was generated but 2 Days, 5.75 Hours later it has yet to be sent out to another PC for computation.

Same thing has occurred with different date/times for Workunits 594230 and 594236.

It's not a problem. Einstein/Albert employs a scheduler that sends tasks to computers that already have the right data files. Why increase bandwidth utilisation for server and client when it can just wait for the right client to come along and save that download? It may just have to wait days or weeks for that client.

Claggy
ID: 112838

Holmis
Joined: 4 Jan 05
Posts: 104
Credit: 2,104,736
RAC: 0
Message 112841 - Posted: 30 May 2014, 19:08:14 UTC - in response to Message 112838.  

Can the project Administrators/Scientists please look into this problem?
Over 24 hours have passed since I originally posted and the 3 tasks remain unsent to a 3rd wingman for validation.
Not sure if this is a SCHEDULER problem or a lack of available wingmen for this type of work:

Workunit# 594225
on 20 May 2014 | 15:06:40 UTC my PC returned the completed task;
on 23 May 2014 | 9:16:23 UTC my wingman aborted their task;
on 23 May 2014 | 9:16:28 UTC a 3rd task was generated but 2 Days, 5.75 Hours later it has yet to be sent out to another PC for computation.

Same thing has occurred with different date/times for Workunits 594230 and 594236.

It's not a problem. Einstein/Albert employs a scheduler that sends tasks to computers that already have the right data files. Why increase bandwidth utilisation for server and client when it can just wait for the right client to come along and save that download? It may just have to wait days or weeks for that client.

Claggy

To add to that explanation: the scheduler will not wait forever. There is a maximum time before the tasks get sent to the next host asking for that type of work. I don't know what that time is set to here and now, but over on Einstein it used to be set to 7 days. That might have changed since I picked up that info; it's been several years...
ID: 112841

Richard Haselgrove
Joined: 10 Dec 05
Posts: 450
Credit: 5,409,572
RAC: 0
Message 112869 - Posted: 4 Jun 2014, 13:31:35 UTC

Mind you, the explanations offered by both Claggy and Holmis only apply to Gravity Wave (CasA) tasks - they're the only ones which use the locality scheduler.

The workunits reported were - properly for this thread - BRP jobs, which download fresh data with every task. I suspect the real explanation was more prosaic - replacement tasks are put at the back of the queue, and with things being quiet here until testing resumed this morning, there were probably very few active computers plodding their way through that queue.

Anyway, all WUs have been completed and validated now.
ID: 112869

GonoszTopi
Joined: 21 Jan 14
Posts: 1
Credit: 302,619
RAC: 0
Message 112870 - Posted: 4 Jun 2014, 16:49:54 UTC

Recently I got a bunch of BRP v1.33 tasks (BRP4G-cuda32-nv301). BOINC shows me that the estimated time to complete is 00:01:11. However, after 00:23:54 run time (approx. 9.240% completed) BOINC kills the task with the message "Aborting task p2030.20131124.G176.16-01.04.S.b4s0g0.00000_464_0: exceeded elapsed time limit 1432.71 (5600000.00G/3908.68G)" One example of these is result #1454983.
ID: 112870

Richard Haselgrove
Joined: 10 Dec 05
Posts: 450
Credit: 5,409,572
RAC: 0
Message 112871 - Posted: 4 Jun 2014, 17:23:30 UTC - in response to Message 112870.  

Recently I got a bunch of BRP v1.33 tasks (BRP4G-cuda32-nv301). BOINC shows me that the estimated time to complete is 00:01:11. However, after 00:23:54 run time (approx. 9.240% completed) BOINC kills the task with the message "Aborting task p2030.20131124.G176.16-01.04.S.b4s0g0.00000_464_0: exceeded elapsed time limit 1432.71 (5600000.00G/3908.68G)" One example of these is result #1454983.

There were some problems this morning with the updated server code, and - as Bernd says in message 112866

Our plan class specs that were (semi-)automatically converted for the new server code were somewhat broken, causing probably all kinds of oddities for GPU tasks.

This is probably one of them...
ID: 112871
Holmis

Joined: 4 Jan 05
Posts: 104
Credit: 2,104,736
RAC: 0
Message 112872 - Posted: 4 Jun 2014, 18:35:43 UTC - in response to Message 112871.  

Recently I got a bunch of BRP v1.33 tasks (BRP4G-cuda32-nv301). BOINC shows me that the estimated time to complete is 00:01:11. However, after 00:23:54 run time (approx. 9.240% completed) BOINC kills the task with the message "Aborting task p2030.20131124.G176.16-01.04.S.b4s0g0.00000_464_0: exceeded elapsed time limit 1432.71 (5600000.00G/3908.68G)" One example of these is result #1454983.

There were some problems this morning with the updated server code, and - as Bernd says in message 112866

Our plan class specs that were (semi-)automatically converted for the new server code were somewhat broken, causing probably all kinds of oddities for GPU tasks.

This is probably one of them...

I just downloaded 25 BRP4G tasks with an estimated completion time of 15 seconds, wish it was true =)
I found this in client_state.xml for each of these tasks:

  <rsc_fpops_est>280000000000000.000000</rsc_fpops_est>
  <rsc_fpops_bound>5600000000000000.000000</rsc_fpops_bound>

If I'm understanding this right, the tasks will error out with "exceeded elapsed time limit" once they have run for 20x what they were estimated to take.
I've edited the client_state.xml and added a few zeros to the rsc_fpops_bound value and hope that will give the tasks enough time to actually finish.
Let's see what happens when the host asks for work again.
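
As a rough check, the numbers in the abort message quoted above seem to fit this reading. The sketch below assumes the client gets both the estimate and the limit by dividing the fpops values by the "3908.68G" flops figure printed in that message. That figure comes from the host that reported the abort, not from this one, and the variable and function names are illustrative only, not BOINC's.

```python
# Values quoted in this thread. The flops figure is taken from the abort
# message ("5600000.00G/3908.68G"); treating it as the host's estimated
# speed for this app version is an assumption.
RSC_FPOPS_EST = 280000000000000.0      # <rsc_fpops_est>
RSC_FPOPS_BOUND = 5600000000000000.0   # <rsc_fpops_bound>
HOST_FLOPS = 3908.68e9                 # "3908.68G"


def seconds_for(fpops: float, flops: float) -> float:
    """Time in seconds to perform `fpops` operations at `flops` per second."""
    return fpops / flops


estimated_s = seconds_for(RSC_FPOPS_EST, HOST_FLOPS)
limit_s = seconds_for(RSC_FPOPS_BOUND, HOST_FLOPS)
bound_ratio = RSC_FPOPS_BOUND / RSC_FPOPS_EST

print(f"estimated run time: {estimated_s:.2f} s")
print(f"elapsed time limit: {limit_s:.2f} s")
print(f"bound / estimate:   {bound_ratio:.0f}x")
```

With those inputs the limit works out to about 1432.71 seconds, matching the abort message, and the estimate to about 71.6 seconds, in line with the 00:01:11 reported above.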
ID: 112872
Holmis

Joined: 4 Jan 05
Posts: 104
Credit: 2,104,736
RAC: 0
Message 112889 - Posted: 5 Jun 2014, 16:28:28 UTC - in response to Message 112872.  

To follow up on my last post: my host has now accumulated over 10 valid BRP4G tasks, so the server-side estimates have kicked in.

Freshly downloaded BRP4G tasks have an estimated time to completion of 1h22m12s, and the observed completion time is within a few minutes of that when running 2 tasks at a time. So this seems to be working as it should.

Digging a bit deeper, the "Average processing rate" is 56.766 according to the application details page for host 2267. I didn't take note of the average PFC in the server logs, but I believe it was around 3000.
So the server thinks the GPU is about 50 times faster than it actually is?
If one has to guess the speed/power of some component, is it not better to assume it's slower than it actually is?
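
The arithmetic behind those two figures can be sketched as follows. It assumes the "Average processing rate" is in GFLOPS, that rsc_fpops_est is still the 280000000000000 seen earlier in client_state.xml, and that the initial figure really was about 3000 (recalled from memory, so only approximate). The names are illustrative only.

```python
# All three inputs come from this thread; the units are assumptions.
RSC_FPOPS_EST = 2.8e14    # assumed unchanged from the earlier client_state.xml
APR_GFLOPS = 56.766       # "Average processing rate", assumed to be GFLOPS
INITIAL_GFLOPS = 3000.0   # rough figure recalled from the server logs


def hms(seconds: float) -> str:
    """Format a duration in seconds as e.g. 1h02m03s (fractions dropped)."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}h{minutes:02d}m{secs:02d}s"


estimate_s = RSC_FPOPS_EST / (APR_GFLOPS * 1e9)
overestimate = INITIAL_GFLOPS / APR_GFLOPS

print("estimated run time:", hms(estimate_s))
print(f"initial guess / measured rate: {overestimate:.1f}x")
```

Under those assumptions the estimate comes out at 1h22m12s, the same as the client shows, and the initial guess is roughly 53 times the measured rate.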
ID: 112889
Eyrie

Joined: 20 Feb 14
Posts: 47
Credit: 2,410
RAC: 0
Message 112890 - Posted: 5 Jun 2014, 16:34:23 UTC

Thanks for reminding me, we need to track down the starting point for the GPU PFC calculations. FWIW, the CPU starting point is the Whetstone benchmark it runs.
Queen of Aliasses, wielder of the SETI rolling pin, Mistress of the red shoes, Guardian of the orange tree, Slayer of very small dragons.
ID: 112890
Rasputin42

Joined: 14 Jan 12
Posts: 13
Credit: 282,604
RAC: 0
Message 112891 - Posted: 6 Jun 2014, 7:45:24 UTC

6/6/2014 9:41:16 AM | Albert@Home | Server error: recompile needed

One CUDA WU errored out after 14h.
What does that mean?
What needs recompiling?
ID: 112891
Richard Haselgrove

Joined: 10 Dec 05
Posts: 450
Credit: 5,409,572
RAC: 0
Message 112892 - Posted: 6 Jun 2014, 8:00:38 UTC - in response to Message 112891.  

What needs recompiling?

(at least one of) the daemons running on the server.

Nothing we can do: I've emailed Bernd.
ID: 112892
Richard Haselgrove

Joined: 10 Dec 05
Posts: 450
Credit: 5,409,572
RAC: 0
Message 112893 - Posted: 6 Jun 2014, 9:27:17 UTC

Server is working again.
ID: 112893




This material is based upon work supported by the National Science Foundation (NSF) under Grant PHY-0555655 and by the Max Planck Gesellschaft (MPG). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the investigators and do not necessarily reflect the views of the NSF or the MPG.

Copyright © 2024 Bruce Allen for the LIGO Scientific Collaboration