*************************************************************************
*
*  RC5DES *NIX Terminal Spawner
*
*  [Checks Load, Console, and Remote Users]
*
*  Version : 1.0.6
*  Date    : 10/22/1999
*
*  Designed and "Coded" By:
*
*  Brad Mertz: bphantom@xmission.com
*   - Automation Scripts, Crontab, and Documentation.
*
*  Chris Grahn: grahn@eng.utah.edu
*   - User Load/Console/Remote Logins Checker Scripts.
*
*  Copyright : (C) Brad Mertz/Chris Grahn 1999
*
*  This program is free software; you can redistribute it and/or
*  modify it under the terms of the GNU General Public License
*  as published by the Free Software Foundation; version 2,
*  June 1991.
*
*  This program is distributed in the hope that it will be useful,
*  but WITHOUT ANY WARRANTY; without even the implied warranty of
*  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
*  GNU General Public License for more details:
*
*  http://www.fsf.org/copyleft/gpl.html
*
*************************************************************************

We would like to say thanks to:

Jacek Radajewski - For the idea on how to do remote RSH spawning of
                   specified terminals and not to mention the EXCELLENT
                   disk-less template scripts!
                   http://www.sci.usq.edu.au/staff/jacek/beowulf

Team AnandTech   - For the will to assimilate like mad! WoooMooo!
                   http://www.anandtech.com/html/rc5.cfm

Tavish Robinson  - For a few pointers in Perl.

CocaCola Company - For that needed caffeine boost at four in the morning.

------

DISCLAIMER:

***!  You MUST HAVE permission to operate the Distributed.Net RC5DES
***!  client on any of the machines the scripts spawn.

The scripts included in this package are meant to remotely spawn RC5DES
clients (http://www.distributed.net) on *NIX machines via RSH. RSH is NOT
a secure program and can be compromised. For our purposes, a temp account
with minimal access was used.

Note: Even though these scripts are presented as-is, they are still being
fine-tuned.
Chris and I (Brad) would very much appreciate any feedback, modifications,
suggestions, or whatnot on how to make this better.


Table of Contents:

    Section 1 - What Comes In This Package
    Section 2 - General Setup
    Section 3 - Explanation of the Script Procedures
    Section 4 - Overview of the Scripts
    Section 5 - Configuring of the Hostnames
    Section 6 - Crontab Setup and Usage
    Section 7 - Possible Network Congestion
    Section 8 - Checkpoint Files
    Section 9 - Additional Notes and Thoughts


**** Section 1 - What Comes In This Package:

 - A bunch of scripts
 - Pre-configured RC5DES .ini files w/o email address (run -config)
 - Two README files for your enjoyment


**** Section 2 - General Setup:

Each OS (Sparc, UltraSparc, and X86) has its own home directory for
operation. Since the buffer files could be incompatible, each OS gets its
own set of buffers. The clients then share the buffers that are specific
to their own OS. I hope that makes sense.... :)
(The scripts and .ini files are already set to this)

The path /rc5/ needs to be changed to reflect where you extracted the files
(usually your home directory). You MUST EDIT all scripts that came in this
package, because they all default to this path.

I was originally going to include the binaries of the Sparc, UltraSparc,
and X86 RC5DES clients, but decided it would be in your (and my) best
interest to download them yourself. Here is the link to the files:

    http://www.distributed.net/clients.html

These are the versions we used:

    Sparc      - v2.7112.444 (Solaris 2.x, non-ultra, mt)
    UltraSparc - v2.7103.427 (Solaris 2.x, UltraSparc, mt)
    Linux      - v2.7111.442 (Linux glibc 2.1, x86, mt)

Untar (tar zxvf) the X86 Linux client to a temp directory and

    mv RC5DES /rc5/x86/rc5_x86
    chmod 700 /rc5/x86/rc5_x86

Delete the rest of the files that were included with the client tarball.
Do the same thing for the remaining Sparc (rc5_sparc) & UltraSparc
(rc5_ultra) clients.
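
If you would rather not type the mv/chmod pair three times, the install
step above can be wrapped in a small helper. This is only a sketch and is
not part of the package: RC5_HOME and install_client are made-up names, and
it assumes each client tarball unpacks a binary called RC5DES as described
above.

```shell
#!/bin/sh
# Sketch only: installs one unpacked client binary into the package layout.
# RC5_HOME and install_client are illustrative names, not part of the package.

RC5_HOME="${RC5_HOME:-/rc5}"   # change to wherever you extracted the package

# install_client ARCH UNPACK_DIR
#   ARCH       - x86, sparc, or ultra
#   UNPACK_DIR - temp directory where the client tarball was extracted
install_client() {
    arch=$1; src=$2
    if [ ! -f "$src/RC5DES" ]; then
        echo "no RC5DES binary in $src" >&2
        return 1
    fi
    # Same two steps as the manual instructions: rename, then lock down.
    mv "$src/RC5DES" "$RC5_HOME/$arch/rc5_$arch" || return 1
    chmod 700 "$RC5_HOME/$arch/rc5_$arch"
}

# Example calls (the /tmp paths are placeholders):
#   install_client x86   /tmp/rc5-x86
#   install_client sparc /tmp/rc5-sparc
#   install_client ultra /tmp/rc5-ultra
```

The helper refuses to do anything if the binary is missing, so a typo in
the temp directory does not leave a half-installed client behind.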

Directory of /rc5/x86/ will show (and similar for the other dirs):

    drwx------   1 users       4096 Sep  8 10:32 ./
    drwx------   2 users       4096 Oct 22 13:41 ../
    -rwx------   1 users     629232 Jul 23 03:30 rc5_x86*
    -rw-------   1 users        126 Oct 22 14:21 rc5_x86.ini
    -rw-------   1 users         43 Sep  8 10:39 x86_nodes

If the file names vary at all, either the scripts will not work or you will
need to modify the scripts.

Once that is all completed, enter each RC5 directory and execute
./rc5_* -config (ex: rc5_x86), <1> and add your email address. You will
also want to adjust the block threshold and preferredblocksize settings.
Save the configuration when you are finished and copy the rc5_*.ini file to
/rc5/backup/rc5_*.ini . You are now ready to proceed.


**** Section 3 - Explanation of the Script Procedures:

 - Execute one of the OS specific spawners (auto*_up)
 - The spawner grabs a hostname from the host lookup file (*_nodes)
 - RSH into hostname and execute the Terminal Load Checker script
   (*_script)
 - Terminal Load Checker script checks User Load, Console Login, and
   Remote Logins
 - If any specified parameter (load too high, user on Console, too many
   remote logins) is exceeded, script aborts and exits.
 - If specified parameters are met, script executes RC5DES.

For the LAB environment Chris and I were using, guidelines had to be set on
how the clients were to be used. The Sparcs were occupied/utilized on a
low/medium basis. The UltraSparcs were occupied/utilized almost all the
time. The X86 systems were brand new to the Lab, so public usage was
restricted.

Since the Sparcs and UltraSparcs could be occupied/utilized at any time,
the Terminal Load Checker script must be run every half hour. This meant
the RC5DES client must run for 29 minutes and shut down. A minute later, a
cronjob would restart the Terminal Load Checker script, which would
determine if it was safe to reload the client. The X86 boxes could easily
run for a full hour, at which point they shut down and are rechecked.
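
To make the checker steps listed at the top of this section a bit more
concrete, here is a rough sketch of the go/no-go decision. It is NOT the
shipped *_script: the function name safe_to_start and the limits MAX_LOAD
and MAX_REMOTE are assumptions picked for illustration, and the readings
are passed in as arguments rather than gathered from the system.

```shell
#!/bin/sh
# Hypothetical sketch of the Terminal Load Checker decision.
# This is NOT the shipped *_script; names and limits are assumptions.

MAX_LOAD="${MAX_LOAD:-0.50}"    # assumed load average ceiling
MAX_REMOTE="${MAX_REMOTE:-1}"   # assumed number of tolerated remote logins

# safe_to_start LOAD CONSOLE_USERS REMOTE_LOGINS
# Returns 0 only when all three checks pass.
safe_to_start() {
    load=$1; console=$2; remote=$3

    # Load averages are decimals, so compare them with awk, not test(1).
    awk -v l="$load" -v m="$MAX_LOAD" 'BEGIN { exit !(l + 0 <= m + 0) }' \
        || return 1                               # load too high
    [ "$console" -eq 0 ] || return 1              # somebody is on the console
    [ "$remote" -le "$MAX_REMOTE" ] || return 1   # too many remote logins
    return 0
}

# Demo with made-up readings. A real checker would gather these from
# tools such as uptime and who, whose output format varies by OS.
if safe_to_start 0.12 0 1; then
    echo "OK to start RC5DES"
else
    echo "Busy, not starting"
fi
```

Keeping the decision separate from the data gathering means the limits can
be tried out by hand before any client is actually started.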

The whole half hour shutdown process was pretty simple with the use of the
exitrc5.now control file. Once the file is generated, all operating RC5
clients immediately shut down. Very slick!


**** Section 4 - Overview of the Scripts:

Manual testing of the scripts is crucial before setting up your crontab.
The /rc5/automated/ directory contains all the scripts you need to play
with:

These first three files start RSH sessions, which execute their Terminal
Load Checker (*_script) counterparts:

    -rwx------  autosparc_up*     - Startup script for Sparc RC5 clients
    -rwx------  autoultra_up*     - Startup script for USparc RC5 clients
    -rwx------  autox86_up*       - Startup script for X86 RC5 clients

The flush_all script starts RSH sessions for each OS type. Each OS then
runs its corresponding flush_* (example: flush_x86) script:

    -rwx------  flush_all*        - Client buffer update > spawns next three
    -rwx------  flush_sparc*      - Sparc buffer update script
    -rwx------  flush_ultra*      - USparc buffer update script
    -rwx------  flush_x86*        - X86 buffer update script

The next two scripts both create exitrc5.now files in the client
directories. They may seem redundant unless you have clients which run
longer than others (which is what we did - 30 min for Sparc & USparc,
60 min for X86):

    -rwx------  fullhr_shutdown*  - Client shutdown for **:59
    -rwx------  halfhr_shutdown*  - Client shutdown for **:29

These are the main scripts that determine if the client is able to load or
not. If they fail any of the set parameters, the script terminates and
exits its RSH session.

    -rwx------  sparc_script*     - Load Checker Script for Sparcs
    -rwx------  ultra_script*     - Load Checker Script for USparcs
    -rwx------  x86_script*       - Load Checker Script for X86s


**** Section 5 - Configuring of the Hostnames:

In each OS home directory, there is a file called *_nodes
(ex: /rc5/x86/x86_nodes). This is the file auto*_up reads to find the
hostnames it can start the client on.
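
For illustration only, a spawner that consumes such a file could look
roughly like the sketch below. The shipped auto*_up scripts may well do it
differently; spawn_nodes, RUN, and DELAY are names invented here. By
default the sketch is a dry run that only prints the rsh command it would
issue for each host.

```shell
#!/bin/sh
# Hypothetical sketch of an auto*_up style loop; the shipped scripts may
# differ. spawn_nodes, RUN, and DELAY are illustrative names.

RUN="${RUN-echo}"     # dry run by default: print commands, do not run them
DELAY="${DELAY:-0}"   # seconds to pause between hosts

# spawn_nodes NODES_FILE REMOTE_SCRIPT
# Hostnames in NODES_FILE may be separated by any whitespace.
spawn_nodes() {
    nodes=$1; script=$2
    for host in $(cat "$nodes"); do
        # rsh -n takes stdin from /dev/null; & backgrounds the session so
        # the loop can move on while the remote client keeps running.
        $RUN rsh -n "$host" "$script" &
        sleep "$DELAY"
    done
}

# Example dry run:
#   spawn_nodes /rc5/x86/x86_nodes /rc5/automated/x86_script
```

Setting RUN to an empty string before the sketch is loaded makes it issue
the rsh commands for real, so try the dry run first and check that every
printed hostname is one you have permission to use.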

Let's say I have eight X86 Linux boxes with hostnames zip-01 through
zip-08, and zip-03 and zip-04 cannot have RC5 loaded. So I edit
/rc5/x86/x86_nodes and add:

    zip-01
    zip-02
    zip-05
    zip-06
    zip-07
    zip-08

Be sure not to add machines that are incompatible with the installed client
binary (i.e.: accidentally adding a Sparc hostname into the X86 node list
is not good).


**** Section 6 - Crontab Setup and Usage:

Please read README.CRONTAB for more information on how to set up the
crontab.

At **:00 or **:30, a script (auto*_up) is run which starts up the RC5DES
clients. At **:29 or **:59, a script (halfhr_shutdown or fullhr_shutdown)
is run which creates an exitrc5.now in the specified client's home
directory. The clients immediately (quite fast in fact!) shut down
operations and exit.

For example (this is the default setup for the scripts):

I want all OS clients to start up at **:00, but I also want the Sparcs and
UltraSparcs to restart at **:30. So I edit the crontab to use 0,30 * as
startup times for the autosparc_up and autoultra_up scripts. I edit the
halfhr_shutdown script to "touch /rc5/sparc/exitrc5.now" and
"touch /rc5/ultra/exitrc5.now". All OS clients must shut down at **:59, so
in fullhr_shutdown I add a touch exitrc5.now line for each client
directory.

Since there can be over 100 machines munching on blocks, I added a
flush_all script which runs at **:50 every hour. You want to make sure the
buffers are updated (fetch/flush) before the clients are shut down and
restarted. This is very important because I once had at least four
machines all trying to fetch/flush when they were restarted! I wound up
with locked buffers and lost completed blocks. ARGH!


**** Section 7 - Possible Network Congestion:

As you can imagine, when around 115 computers are spawned almost
simultaneously, there is a LOT of network activity (not to mention server
activity) for the first 15-30 seconds! Now depending on whose network this
is, you may get your butt kicked.
:)

If "scheduled" network utilization is not a problem, then you may run into
another problem. With over 100 RSH sessions opening all at the same time,
various errors would pop up. From time to time it would be RSH login
errors, unavailable receiving ports (can't remember the exact error), and a
few other errors.

To alleviate this problem, the sleep command was used to pause the script
(auto*_up) just long enough to allow the remote login to occur and start up
the RC5 client. Usually a pause of a second or two was inserted, which
cured most of the problems.

Unfortunately this brings up another problem: 80 Sparcs with a two second
delay (80 x 2 sec = 160 sec) means the final client will be almost three
minutes late in starting up and only have about 25 minutes of operation
(damn :))!

So you need to do some calculating of your available hardware:

    ##  Type           KeyRate       ##    Combined      Utilization
    ----------------------------------------------------------------
    11  X86 boxes    : 1.1Mkey/s  x  11  = 12.1Mkey/s    Low Usage
    25  UltraSparcs  : 550kkey/s  x  25  = 13.75Mkey/s   High Usage
    80  Sparcs       : 60kkey/s   x  80  = 4.8Mkey/s     Medium Usage

This means my X86 boxes are the most critical for getting started. So I
set up a cronjob with a start time of **:00 and place a two second pause
between each host (autox86_up).

The UltraSparcs are usually in use, which means the Terminal Load Checker
is going to fail most of the time. So they are started at **:01 & **:30
with one second of delay (autoultra_up).

The Sparcs add up to a decent keyrate in quantity and usually have a
low/medium usage, except it would take forever to spawn 80 Sparcs with a
delay. So let's start them ALL instantaneously at **:01 & **:30 and take
our chances with RSH errors.


**** Section 8 - Checkpoint Files:

With the clients starting and shutting down up to 48 times a day,
checkpoint files are critical! Checkpoint files are enabled by default by
the *_script files.
This makes sure that when the clients come back online, they will resume
working on the same block they were working on before shutdown. Works like
a charm!

Note: If you decide to redo the .ini file, make sure the following line is
set (or deleted altogether) in the rc5_*.ini files:

    noexitfilecheck=0

If this is set to =1, the exitrc5.now check is disabled and the shutdown
scripts will no longer function.


**** Section 9 - Additional Notes and Thoughts:

When the auto*_up scripts start up, they will spawn many RSH sessions which
will be reported in ps. These will exist until the RC5 client shuts down
and exits the RSH login.

If you log into one of the remote computers when RC5 is running, a ps may
show a defunct pid and whatnot. This seems perfectly normal (...in the eye
of the beholder) and they will clear out when the RC5 client shuts down.

Kill(ing) the pids of RSH on the host machine before the RC5DES client
finishes WILL NOT stop RC5 on the remote computer. The quickest way to
bring down the node(s) is using the exitrc5.now method.

Enjoy!