Difference: FullProductionJune2008 (117 vs. 118)

Revision 118 - 2011-04-21 - PeterWinter

Line: 1 to 1
 
Changed:
<
<
-- BrendanKiburg - 05 Jun 2008

Summary

  • Summary Table :

| dataset | runs (ana) | total mu-e (pred) | observed mu-e | mu-e / midas run | comments |
| Run 9 Mu+ | | 1.4e9 | | | Not processed during production |
| Run 9 Mu- | | 3.5e9 | | | Not processed during production |
| Run 10 Mu+ | 2906 (2712) | 1.56e9 | 1.79e9 | 6.145e5 | all files appear to be merged properly |
| ds_151 | 777 (776) | | 3.95e8 | 5.09e5 | |
| ds_152 | 777 (651) | | 5.88e8 | 9.03e5 | |
| ds_153 | 777 (734) | | 5.91e8 | 8.05e5 | |
| ds_154 | 575 (551) | | 2.12e8 | 3.85e5 | only difference: beam momentum is +1.27 instead of +1.275 |
| Run 10 Mu- | 7524 (7369) | 8.6e9 | 7.39e9 | 1.00e6 | After rehandling substantial merging problems |
| ds_101 | 300 (298) | | 2.74e8 | 9.21e5 | |
| ds_102 | 1000 (996) | | 9.83e8 | 9.87e5 | |
| ds_103 | 1000 (991) | | 1.01e9 | 1.02e6 | |
| ds_104 | 1000 (976) | | 1.07e9 | 1.10e6 | |
| ds_105 | 1000 (982) | | 1.08e9 | 1.10e6 | |
| ds_106 | 473 (454) | | 4.49e8 | 9.90e5 | |
| ds_107 | 875 (846) | | 8.80e8 | 1.04e6 | |
| ds_108 | 850 (814) | | 7.17e8 | 8.80e5 | |
| ds_109 | 850 (838) | | 7.26e8 | 8.66e5 | |
| ds_110 | 64 (64) | | 7.28e7 | 1.14e6 | |
| ds_111 | 53 (53) | | 6.22e7 | 1.17e6 | |
| ds_112 | 59 (57) | | 6.32e7 | 1.11e6 | |
| Run 11 Mu+ | 5554 (5220) | 5.4e9 | | | |
| ds_251 | 799 (794) | | | | |
| ds_252 | 750 (722) | | | | |
| ds_253 | 700 (679) | | | | |
| ds_254 | 665 (629) | | | | |
| ds_255 | 852 (692) | | | | |
| ds_256 | 655 (623) | | | | |
| ds_257 | 512 (477) | | | | |
| ds_258 | 296 (289) | | | | |
| ds_259 | 249 (242) | | | | |
| ds_260 | 21 (21) | | | | |
| ds_261 | 38 (35) | | | | |
| ds_262 | 17 (17) | | | | |
| Run 11 Mu- | 6101 (5878) | 6.0e9 | 3.31e9 | 5.63e5 | Lots of muons getting vetoed by the TPC spark definition |
| ds_201 | 949 (944) | | 5.87e8 | 6.22e5 | |
| ds_202 | 900 (826) | | 4.38e8 | 5.30e5 | |
| ds_203 | 900 (899) | | 7.84e8 | 8.72e5 | |
| ds_204 | 1000 (921) | | 2.28e8 | 2.48e5 | problems while merging |
| ds_205 | 967 (957) | | 3.39e8 | 3.54e5 | |
| ds_206 | 451 (431) | | 3.11e8 | 7.21e5 | |
| ds_207 | 423 (404) | | 3.57e8 | 8.83e5 | |
| ds_208 | 267 (258) | | 2.48e8 | 9.63e5 | |
| ds_209 | 165 (163) | | 3.58e6 | 2.20e4 | |
| ds_210 | 65 (61) | | 1.86e5 | 3.06e3 | |
| ds_211 | 7 (7) | | 7.75e6 | 1.11e6 | |
| ds_212 | 4 (4) | | 3.03e6 | 7.58e5 | |
| ds_213 | 3 (3) | | 1.80e6 | 5.99e5 | |
| Total | 22085 (21179) | | | | 95.90% success (raw -> mta histos) |

  • There were 22085 raw midas files that were considered "good"
  • We processed (or attempted to process) 19410 of them (88%)
  • The failure rate was small while things were working properly ( < 3%)
  • Several large-scale crashes slowed overall progress (tungsten unavailable, my home directory filling up and breaking all of my analysis jobs, etc.).
  • All trend_trees transferred to NPL for web page interfacing
  • Merging files for post-processing analysis on NCSA
  • New complications with tungsten cluster --> see below
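The derived columns of the summary table can be cross-checked with a few lines of Python. This is only a sketch: the input numbers are copied from the Run 10 Mu+ rows and the Total row, and the assumption that "mu-e / midas run" divides by the analyzed run count (the number in parentheses) is an inference from the tabulated values, not something stated on this page.

```python
# (total runs, analyzed runs, observed mu-e) copied from the summary table.
DATASETS = {
    "ds_151": (777, 776, 3.95e8),
    "ds_152": (777, 651, 5.88e8),
    "ds_153": (777, 734, 5.91e8),
    "ds_154": (575, 551, 2.12e8),
}


def mu_e_per_run(analyzed_runs, observed_mu_e):
    """Observed mu-e pairs per analyzed midas run (assumed definition)."""
    return observed_mu_e / analyzed_runs


def success_rate(analyzed_runs, total_runs):
    """Fraction of raw midas files that made it to mta histograms."""
    return analyzed_runs / total_runs


for name, (total, analyzed, observed) in DATASETS.items():
    print(f"{name}: {mu_e_per_run(analyzed, observed):.2e} mu-e / run")

# Total row: 21179 analyzed out of 22085 good raw files.
print(f"overall: {100 * success_rate(21179, 22085):.2f}% success")
```

The same two functions can be pointed at any other row once its observed mu-e count is filled in.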

Termination of Tungsten

Today we received a message from NCSA announcing the termination of the tungsten cluster. The shutdown will happen within the next 3 weeks, so we need to react immediately:

July 1, 2008

Dear colleagues:

On July 21, 2008 the Dell PowerEdge 1750 cluster at NCSA, known as Tungsten, will be retired after more than 4 years of highly productive computational service. This notice is to inform you of the details surrounding the final shutdown of Tungsten. First and foremost, we want you to be aware that all remaining machine specific allocations, by default, will be transferred to NCSA's Dell cluster known as Abe. The transfer rate will be to the users benefit at a one to one ratio. Information on the system is available on our hardware page at: http://www.ncsa.uiuc.edu/UserInfo/Resources/Hardware/Intel64Cluster/.

The unused balance of machine specific MRAC/LRAC and NCSA DAC allocations on Tungsten will be transferred to Abe at a ratio of 1 to 1 by default. Transfer of remaining allocation to another NCSA/TeraGrid resource is available, at the request of the project's Principal Investigator (PI). These requests will only be approved if resources are available and appropriate on the system to which the PI desires to transfer their balance. These allocations will expire on their original expiration date. All requests for transfers, other than the default to Abe, should be sent to the Allocations department at allocations@ncsa.uiuc.edu . The request must list the project's Principal Investigator, the project PSN, the amount of SUs to be transferred, and the machine to which transfer will be made by July 15, 2008 in order to guarantee that the transfer will be made before the account is moved to Abe.

Abe will also become part of the TeraGrid roaming resources as of July 1, 2008. If you are an existing TeraGrid roaming account user and you can utilize Abe, please send email to allocations@ncsa.uiuc.edu requesting your roaming project be added.

All user processes on Tungsten will be terminated at 11:59 PM, July 20, 2008. After this time, user access will be denied to tungsten.ncsa.uiuc.edu and all batch hosts. The home directories of Tungsten will be mounted on the head nodes of Abe till August 31, 2008 to ease the transition. Users who have data in the projects directory will be contacted to determine individual needs. Data remaining in the scratch directories will not be moved to another resource. At this time, it is our intention to move all applications software supported on Tungsten over to Abe. If there are unforeseen issues, we will contact the users with dependencies on those codes.

If you require assistance in migrating your codes or data, please contact the HPC Consulting Office at 217-244-1144 or by e-mail at consult@ncsa.uiuc.edu.

If you have questions or concerns, please contact Mike Pflugmacher, Assistant Director of Computing and Data Management, by phone at 217-244-1337 or by email at mikep@ncsa.uiuc.edu.

Thank you,

Mike Pflugmacher

As a follow-up, we found out that the tungsten machines are being traded to Dell as part of an expansion of the Abe cluster. Here are some links to information about the two clusters:

Tungsten

Abe

We will need to set up all of our code on the new system and deal with the normal problems of system migration. Additionally, this needs to be done in the short term, as tungsten will become unavailable 7-10 days before the MuCap collaboration meeting.

| topic | tungsten | abe | comments |
| Architecture | 32-bit | 64-bit | Need to debug code and make it compatible with the new system; improved speed on 64-bit machine? |
| Processor speed | 3.2 GHz | 2.33 GHz | |
| Number of cores | 2560 | 9600 | |
| Cores / node | 2 | 8 | Introduces questions because of I/O limitations |
| Memory / core | 1.5 GB | 1 GB or 2 GB | Half of the system has 8 GB / 8 cores, the other half has 16 GB / 8 cores; we need ~2 GB / mta job, so this will be a limitation |
| Peak performance | 16.4 TFlop | 89.47 TFlop | |
| Queuing system | LSF | Torque | David says scripts can't take input parameters; we need to rewrite our scripts |
| Number of login machines | 3 | unknown | Hopefully there will be more than on tungsten |
| Project space | 2-10 TB | unknown | |
| MSS access | Yes | unknown | |
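To see why the memory row matters, here is a rough estimate of how many mta jobs fit on one node at once. It is a sketch under stated assumptions: the ~2 GB per mta job figure comes from the table, and the per-node memory (3 GB on tungsten, 8 GB or 16 GB on Abe) is derived from the memory-per-core and cores-per-node rows. The function name and the simple "memory or cores, whichever runs out first" model are illustrative, not a description of how the batch system actually schedules jobs.

```python
def jobs_per_node(node_memory_gb, cores_per_node, memory_per_job_gb=2.0):
    """Concurrent jobs a node can hold, limited by both memory and core count."""
    by_memory = int(node_memory_gb // memory_per_job_gb)
    return min(by_memory, cores_per_node)


# Tungsten: 2 cores/node at 1.5 GB/core, i.e. 3 GB per node.
print("tungsten:", jobs_per_node(3.0, 2))

# Abe: 8 cores/node, with either 8 GB or 16 GB per node (assumed from the table).
print("abe (8 GB nodes):", jobs_per_node(8.0, 8))
print("abe (16 GB nodes):", jobs_per_node(16.0, 8))
```

Under these assumptions only the 16 GB Abe nodes can keep all 8 cores busy with mta jobs, while the 8 GB nodes leave half of their cores idle.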

Production Information

The plan is to do a full production pass before our remaining 85 kSU expire on June 30th. To accomplish this, I am setting up a table that will allow me to check off tasks for each run group (as defined in SarasDataClassificationWork). Entries whose names end in a number are subsets of the run group directly above them.

data_set_name nruns data set from MSS cleanup mu run clean fails mta run clean fails mta merge mta to NPL to MSS clean trend_trees on NPL and NCSA free space
prod_run10_mu-_a 4773   June 5          
       
prod_rn10_mu-_a1 300 101 6/11 tunc x x x x x x x x x mu:300 / 300, mta: 300/300, removed skim/tree from projects
prod_rn10_mu-_a2 1000 102 6/12 tunc x x x x x x x x x mu: 998/998, mta 998/998 rerun mta ; Merged
prod_rn10_mu-_a3 1000 103 6/12 tunc x x x x x x x x x files removed ; Merged
prod_rn10_mu-_a4 1000 104 6/17 tund x x x x x x x x   files removed ; merging tund 188
prod_rn10_mu-_a5 1000 105 6/23 tunb x x x x x x x x   files removed ; Merged
prod_rn10_mu-_a6 473 106 6/24 tunc x x x x x x x x   files removed ; Merged
prod_run10_mu-_b 2575              
       
prod_rn10_mu-_b1 875 107 6/25 tunc x x x x x x x x   files removed ; Merged
prod_rn10_mu-_b2 850 108 6/27 tund x x x x x x x x   files removed ; Merged
prod_rn10_mu-_b3 850 109 6/27 tunb x x x x x x x x   files removed ; Merged
prod_run10_mu+_a 2331              
       
prod_rn10_mu+_a1 777 151 6/17 tunc x x x x x x x x   files removed ; Merged
prod_rn10_mu+_a2 777 152 6/18 tunb x x x x x x x x   files removed ; Merged
prod_rn10_mu+_a3 777 153 6/22 tunc x x x x x x x x   files removed ; Merged
prod_run10_mu+_b 575 154 6/23 tunc x x x x x x x x   files removed ; Merged
prod_run11_mu-_a 2749   June 12          
       
prod_rn11_mu-_a1 949 201 6/12 tund x x x x x x x x x (#59772) Merged
prod_rn11_mu-_a2 900 202 6/12 tund x x x x x x x x x Merged
prod_rn11_mu-_a3 900 203 6/15 tunc x x x x x x x x x Merged
prod_run11_mu-_b 1967              
       
prod_rn11_mu-_b1 1000 204 6/8 tunc x x x x x   x s   check 244, need to check this after 244
prod_rn11_mu-_b2 967 205 6/9 tunc x x x x x x x x   files removed ; Merged
prod_run11_mu+_a 1549              
       
prod_rn11_mu+_a1 799 251 6/25 tunb x x x x x x x x   files removed ; Merged
prod_rn11_mu+_a2 750 252 6/27 tunc x x x x x x x x   files removed ; Merged
prod_run11_mu+_b 1365              
       
prod_rn11_mu+_b1 700 253 6/28 tunb x x x x x x x x   files removed ; Merged
prod_rn11_mu+_b2 665 254 6/29 tund x x x x x x x x   files removed ; Merged
prod_run11_mu-_c 451 206 6/29 tunc x x x x x x x x   files removed ; Merged
prod_run11_mu-_d 423 207 6/29 tunc x x x x x x x x   files removed ; Merged
prod_run11_mu+_c 452 255 6/29 tunb x x x x x x x x   ready to remove ; Merged
prod_run11_mu+_d 200 256 6/29 tund x x x x x x x x   files removed ; Merged
Total 22085 ds            
       

Remaining runs for new allocation

data_set_name nruns data set from MSS cleanup mu run clean fails mta run clean fails mta merge mta to NPL to MSS clean trend_trees on NPL and NCSA free space
prod_run10_mu-_c 64 110 7/2 tunb x x x x x x x x   files removed ; Merged
prod_run10_mu-_d 53 111 7/2 tund x x x x x x x x   files removed ; Merged
prod_run10_mu-_e 59 112 7/2 tunb x x x x x x x x   files removed ; Merged
prod_run11_mu-_e 267 208 7/2 tunc x x x x x x x x   files removed ; Merged
prod_run11_mu-_f 165 209 7/2 tunb x x x x x x x x   files removed ; Merged
prod_run11_mu-_g 65 210 7/2 tunb x x x x x x x x   Merged
prod_run11_mu-_h 7 211 7/1 tunb x x x x x x x x   files removed ; Merged
prod_run11_mu-_i 4 212 7/1 tunc x x x x x x x x   files removed ; Merged
prod_run11_mu-_j 3 213 7/1 tund x x x x x x x x   files removed ; Merged
prod_run11_mu+_c2 400 255 7/1 tunb x x x x x x x x   files removed ; Merged
prod_run11_mu+_d2 455 256 7/1 tunc x x x x x x s s   files removed ; Merged
prod_run11_mu_e 512 257 7/1 tund x x x x x x x x   files removed ; Merged
prod_run11_mu+_f 296 258 7/2 tunb x x x x x x x x   files removed ; Merged
prod_run11_mu_g 249 259 7/2 tunc x x x x x x x x   Merged
prod_run11_mu_h 21 260 7/2 tund x x x x x x x x   files removed ; Merged
prod_run11_mu_i 38 261 7/1 tunb x x x x x x x x   files removed ; Merged
prod_run11_mu_j 17 262 7/1 tund x x x x x x x x   files removed ; Merged
prod_run11_mu_k (misc) 28                       files removed ; Merged
prod_rn11_mu-_b244.txt 219 244 7/1 tund x x x x x   x s   checking mss (tund)
Total 2894                       246 ready for sending to mss/npl (02:37 July 2nd)

Scripting procedures:

  • Located in /u/ac/bkiburg/jobscripts/
    • generate_sublists.pl : produces a job list from a list of run numbers.
    • get_runs_from_mss.pl : takes a list. Specify the psi_run # and output directory in text. Creates a list of bad "runs".
    • cleanup_failures.pl : takes a source file and a run #. Specify the raw midas directory in text. Removes the midas files in the list and relaunches get_runs_from_mss for the list.
    • check_directory.pl : compares the size of every file in a list with the mss file size and creates a summary list of runs with size differences. Can also be used to check the files transferred from projects --> mss.
    • ncsa_jobscript.pl : manages a list of files to analyze per node. There is an mta version as well.
    • ncsa_compute.pl : does the analysis for one file. There is an mta version as well.
    • ls_ncsa.pl : creates lists of the files and their sizes in all the output file directories on mss.
    • trend_to_npl.pl : copies the trendplot files from ncsa -> npl and then launches the hadd command to merge the files. Links are also created in the master list.
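The size comparison that check_directory.pl performs can be sketched as follows. This is not the actual script (which is Perl and is not reproduced on this page); the function name, the dictionary of MSS sizes, and the choice to report missing files with None are all assumptions made for illustration.

```python
import os


def find_size_mismatches(local_paths, mss_sizes):
    """Return (name, local_size, mss_size) for every file whose sizes differ.

    mss_sizes maps a file name to the size recorded on the MSS. A file that is
    missing locally, or absent from the MSS listing, is reported with None in
    the corresponding position.
    """
    mismatches = []
    for path in local_paths:
        name = os.path.basename(path)
        local_size = os.path.getsize(path) if os.path.exists(path) else None
        mss_size = mss_sizes.get(name)
        if local_size is None or mss_size is None or local_size != mss_size:
            mismatches.append((name, local_size, mss_size))
    return mismatches
```

The MSS size listing would have to come from something like the output of ls_ncsa.pl, parsed into a name-to-size dictionary; runs that appear in the returned list are the candidates for re-fetching.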

Data Locations

  • In the short term, data can be found on tungsten at :
    • /usr/projects/bkiburg/run10_4/fullprod/ds_1##/
    • mu+ run10 starts with ds_151
    • /usr/projects/bkiburg/run11_4/fullprod/ds_2##/
    • mu+ run11 starts with ds_251
  • In the long term, this data will be shuttled back over to the MSS
  • I plan to retain all of the mta and trend plots if possible (read: under 2 TB)
  • Trees will be re-transferred for subsequent passes (transfer time should be the limiting factor for those analyses)
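The directory layout above maps directly from a data set number to a path. A minimal sketch, assuming that every data set follows the ds_1## / ds_2## pattern listed here and that mu+ data sets are exactly those numbered 51 and up within each run (the function names are hypothetical):

```python
BASE_DIR = "/usr/projects/bkiburg"


def dataset_dir(ds_number):
    """Directory on tungsten holding the full-production output of a data set."""
    if 100 <= ds_number < 200:
        run_dir = "run10_4"
    elif 200 <= ds_number < 300:
        run_dir = "run11_4"
    else:
        raise ValueError(f"unknown data set number: {ds_number}")
    return f"{BASE_DIR}/{run_dir}/fullprod/ds_{ds_number}/"


def muon_charge(ds_number):
    """mu+ data sets start at ds_151 (run10) and ds_251 (run11); the rest are mu-."""
    return "mu+" if ds_number % 100 >= 51 else "mu-"


print(dataset_dir(151), muon_charge(151))
print(dataset_dir(204), muon_charge(204))
```

Once the data is shuttled back to the MSS the base path will change, so BASE_DIR is the only line that should need editing.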

Organizational Todo List

  • Write scripts to parse output looking for rare errors
  • Find parameters to add to trend plots (from mta for now, could move to mu later)
  • prod_rn10_mu-_a1.txt: prod run 10 , mu- subset a1

>
>
Please go to https://muon.npl.washington.edu/twiki/bin/view/Main/FullProductionJune2008
 
 