Jan 28 05:08:29 eric unix: WARNING: [AFT1] EDP event on CPU0 Data access at TL=0, errID 0x0014cf26.3d155b76
Jan 28 05:08:29 eric AFSR 0x00000000.00404000<EDP> AFAR 0x00000002.de45a4c0
Jan 28 05:08:29 eric AFSR.PSYND 0x4000(Score 95) AFSR.ETS 0x00 Fault_PC 0x100211860
Jan 28 05:08:29 eric UDBH 0x0000 UDBH.ESYND 0x00 UDBL 0x0000 UDBL.ESYND 0x00
Jan 28 05:08:29 eric unix: [AFT2] errID 0x0014cf26.3d155b76 PA=0x00000002.de45a4c0
Jan 28 05:08:29 eric E$tag 0x00000000.09c05bc8 E$State: Modified E$parity 0x04
Jan 28 05:08:29 eric unix: [AFT2] E$Data (0x00): 0x3fa57bfc.eda65795 *Bad* PSYND=0x4000
Jan 28 05:08:29 eric unix: [AFT2] E$Data (0x08): 0x3fa77bfc.eda65795
Jan 28 05:08:29 eric unix: [AFT2] E$Data (0x10): 0x3fa77bfc.eda65795
Jan 28 05:08:29 eric unix: [AFT2] E$Data (0x18): 0x3fa77bfc.eda65795
Jan 28 05:08:29 eric unix: [AFT2] E$Data (0x20): 0x3fa77bfc.eda65795
Jan 28 05:08:29 eric unix: [AFT2] E$Data (0x28): 0x3fa77bfc.eda65795
Jan 28 05:08:29 eric unix: [AFT2] E$Data (0x30): 0x3fa77bfc.eda65795
Jan 28 05:08:29 eric unix: [AFT2] E$Data (0x38): 0x3fa77bfc.eda65795
Jan 28 05:08:29 eric unix: [AFT2] errID 0x0014cf26.3d155b76 AFAR was derived from E$Tag
Jan 28 05:08:29 eric unix: NOTICE: Scheduling clearing of error on page 0x00000002.de45a000
Jan 28 05:08:29 eric unix: [AFT3] errID 0x0014cf26.3d155b76 Above Error is in User Mode
Jan 28 05:08:29 eric and is fatal: will reboot
Jan 28 05:08:29 eric unix: WARNING: [AFT1] initiating reboot due to above error in pid 23908 (l502.exe)
Jan 28 05:08:41 eric unix: NOTICE: Previously reported error on page 0x00000002.de45a000 cleared
Jan 28 05:08:43 eric unix: pseudo-device: pm0
Jan 28 05:08:43 eric unix: pm0 is /pseudo/pm@0
Jan 28 05:08:44 eric syslogd: going down on signal 15
Jan 28 05:09:30 eric unix: syncing file systems...
Jan 28 05:09:32 eric unix: done
Jan 28 05:16:15 eric unix: ^MSunOS Release 5.7 Version Generic_106541-42 64-bit [UNIX(R) System V Release 4.0]
Jan 28
-- call no 3737 6109
On the evening of the 14th Eric suddenly stopped talking to me --- no login from anywhere was possible. On the 15th tried the console --- no luck. Got the ok prompt though, and synced disks; got:
dumping to c0t0d0s1
Interrupt bitset after 10 seconds card/firmware failure [repeated four times]
Fast Data Access MMU Miss
Then booted at the ok prompt and got:
WARNING forceload of misc/md_trans failure
WARNING forceload of misc/md_raid failure
WARNING forceload of misc/md_hostspares failure
All seems ok though...
...to be a local-queue-only system (no old Galaxy) --- see the separate document.
Made some changes to /etc/inetd.conf so that at next boot in.talkd, in.fingerd and in.uucpd will be blocked at the service/inetd level (in addition to at IP Filter level and router level).
For details see Security Journal.
Removed LSF (pkgrm SUNWlsf) from Eric as the license has expired and we need the space (/opt is part of /).
Formatted the second new "scratch" disk, c2t2d0s6 and mounted on /export/simonh for want of something more appropriate for now.
Formatted the "spare" disks, c0t3 and c2t3 and mounted on /export/little_star* (Twinkle, twinkle, little star, how I wonder what you are...).
Edited /etc/vfstab to ensure all gets remounted at next boot.
Finally managed to get SUNWhpc installed and running on Eric with Ian. It would not install and work! Sun's support people were worse than useless, suggesting the problem was our LDAP authentication, which we proved was not the case after an install on mir.csu (Simon's Solaris7/openldap machine).
Files:
/opt/SUNWhpc
/etc/init/sunhpc.*
and associated links in /etc/.
For reasons best known to itself SUNWhpc would not install
Replaced the two 9Gb disks concatenated into /scratch (c0t2d0 and c2t2d0s0) with two 36Gb disks.
/scratch would not umount, so determined which processes were using the slice with ~mpciish2/bin/lsof:
lsof | grep scratch | awk -F" " print... | sort | uniq
and killed said processes; umount was then successful. Swapped the disks, checked the partition table with format, and finally used newfs and mounted.
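The awk step above is elided, but presumably it pulled the process IDs out of the lsof listing. A minimal sketch of that idea, not the command actually run: it assumes lsof's default column layout (PID in the second field, file name last), the function name pids_using is made up for illustration, and on Solaris 7 the -v option may need nawk rather than the old awk.

```shell
# Hypothetical helper: read lsof-style output on stdin and print the unique
# PIDs of processes holding files whose name starts with the given path.
# Assumes lsof's default layout: COMMAND PID USER ... NAME (PID is field 2).
# Note: a prefix match, so /scratch would also match /scratch2.
pids_using() {
    # $1: mount point, e.g. /scratch
    awk -v mp="$1" 'index($NF, mp) == 1 { print $2 }' | sort -u
}

# Usage sketch (inspect the list before killing anything):
#   lsof | pids_using /scratch
```

Solaris's own fuser -c /scratch should give a similar list without lsof.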
Edited /etc/opt/SUNWmd/md.cf and md.tab to reflect changes.
Edited /etc/vfstab to reflect changes.
Files:
/usr/opt/SUNWmd/* /etc/opt/SUNWmd/*
MechEng
Given email from John Chinn to this effect, have determined which appear to be dead mecheng accounts and have tarred, gzipped them and moved to /export/umist/simonh* and deleted entries from /etc/passwd, shadow and auto_home.
Civil
Based on the names given in the entry for 02/11/22 in this journal, am tarring and gzipping home-dirs of said civil engineers to /export/umist/simonh* and removing the home-dirs themselves. Have deleted entries from /etc/passwd etc.
Installed tcpdump from the sunfreeware binary to investigate problems. It complained that libcrypto.so.??? was missing. Google showed that this is part of openssh/openssl. Investigation showed that the OpenSSL package on sunfreeware.com, which installs itself in /usr/local/ssl, contains said library and was already installed on Cosmos (but not Eric), so adding /usr/local/ssl/lib to LD_LIBRARY_PATH solved the problem. Had a good play with tcpdump.
Edited /etc/syslog.conf and restarted (kill -HUP) syslogd on both Cosmos and Eric so logs are now copied to Gresh's logserver. Added this
# gresh's log server :
*.info	@130.88.???.???
to /etc/syslog.conf. N.B. the whitespace must consist of tabs, not spaces --- use the wrong one and an error message appears in /var/adm/messages, cf.
eric syslogd: line 44: unknown priority name "info @130.88.120.194"
which clearly shows that syslogd is getting its knickers in a twist...
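The tab problem can be caught before syslogd is restarted. A minimal sketch, assuming any active line with no tab at all is suspect; the function name and the loghost name in the usage line are invented for illustration, and on Solaris 7 nawk may be needed for the \t escape.

```shell
# Hypothetical helper: list active (non-comment, non-blank) lines of a
# syslog.conf-style file that contain no tab, i.e. lines where selector and
# action are probably separated by spaces.
find_space_separated() {
    # $1: path to the file to check
    awk '!/^[ \t]*(#|$)/ && !/\t/ { print NR ": " $0 }' "$1"
}

# Usage sketch:
#   find_space_separated /etc/syslog.conf
# A line such as "*.err @loghost" (spaces only) would be reported with its
# line number; no output means every active line has at least one tab.
```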
Accounts listed below were removed. Also, chemistry accounts in /export/umist were moved to /export/chem.
Envelope-to: [email protected]
Delivery-date: Mon, 09 Dec 2002 16:09:06 +0000
From: "Steven Y Liem" <[email protected]>
To: <[email protected]>
Subject: RE: eric accounts
Date: Mon, 9 Dec 2002 16:08:59 -0000
Content-Type: text/plain; charset="us-ascii"
X-Priority: 3 (Normal)
X-MSMail-Priority: Normal
In-reply-to: <[email protected]>
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1106
Importance: Normal
X-Scanner: exiscan for exim4 (http://duncanthrax.net/exiscan/) *18LQSr-0007Fo-00*dAjrC1N2GiQ*
X-Scanner: exiscan for exim4 (http://duncanthrax.net/exiscan/) *18LQSv-0005kG-00*EcGy3ra7Jxw*

Hi Simon,

The following accounts can be savely removed:

Mcdas01, mcdas02, mcdst03, mcdst04, mcdap01, mcdst00, mcdsskl, mcdssmpi, mcdssjc, mcdsslvw, mcdigfa2.

Cheers
Steven
See this for details.
Shut down access to Eric via in.rexecd by commenting out the in.rexecd entries in /etc/hosts.allow (have default-deny in /etc/hosts.deny). Sent reminder email to same people as below.
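For the record, the commenting-out can be previewed before touching the live file. A minimal sketch only: the function name is made up, the sample rule in the usage note is invented (the real hosts.allow entries are not recorded here), and the stock Solaris 7 sed may not understand [[:space:]].

```shell
# Hypothetical helper: print a hosts.allow-style file with every rule for the
# named daemon commented out. The file itself is not modified.
comment_out_daemon() {
    # $1: daemon name (dots act as regex wildcards, harmless for in.rexecd)
    # $2: path to the file
    sed "s/^[[:space:]]*$1[[:space:]]*:/#&/" "$2"
}

# Usage sketch:
#   comment_out_daemon in.rexecd /etc/hosts.allow | diff /etc/hosts.allow -
```

With a rule like "in.rexecd: .umist.ac.uk" in place, the diff would show just that line gaining a leading #; with default-deny in /etc/hosts.deny the service is then refused for everyone.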
Envelope-to: [email protected] Delivery-date: Tue, 26 Nov 2002 14:26:27 +0000 From: Dr Simon Hood <[email protected]> To: patrick.o'[email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], Subject: rexec, telent and ssh to eric Reply-to: [email protected] Date: Tue, 26 Nov 2002 14:26:16 +0000 X-Scanner: exiscan for exim4 (http://duncanthrax.net/exiscan/) *18GgfI-0005cj-00*o95zl2UGVZ6* X-Scanner: exiscan for exim4 (http://duncanthrax.net/exiscan/) *18GgfT-0000cw-00*jDz8szPBPIM* Hi All, over the coming weeks and months security on Eric will be tightened significantly. Main changes: 1. The r-commands will we disabled. This means that eXceed users must change their configuration to use a method other than rexec (or rsh, rlogin). For the present telnet is the simplest option to use. 2. telnet and ftp access will soon be disabled from outside .umist.ac.uk --- ssh and scp are available instead and are more secure. If you have questions regarding these please email me. (Note that you will need a recent ssh client, and this must be configured to use keyboard-interactive authentication.) Please reconfigure your eXceed installation so that it does not use an r-command! Regards Dr Simon Hood, ISD.
First steps of plan developed to get IP Filter on Cosmos and Eric with default-deny; also replace telnet and ftp with ssh and scp. For details of the evolving plans see this.
Envelope-to: [email protected]
Delivery-date: Fri, 22 Nov 2002 11:35:07 +0000
From: "Steven Y Liem" <[email protected]>
To: <[email protected]>
Subject: RE: eric dead accounts
Date: Fri, 22 Nov 2002 11:34:57 -0000
Content-Type: text/plain; charset="us-ascii"
X-Priority: 3 (Normal)
X-MSMail-Priority: Normal
In-Reply-To: <[email protected]>
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1106
Importance: Normal
X-Scanner: exiscan for exim4 (http://duncanthrax.net/exiscan/) *18FC5I-0004bm-00*Ea/fFyUQPIg*
X-Scanner: exiscan for exim4 (http://duncanthrax.net/exiscan/) *18FC5T-000319-00*YKbOIHJQBHI*

Hi Simon,

I think the following accounts can be safely removed:

Mcdas01, mcdas02, mcdst03, mcdst04, mcdap01, mcdst00, mcdsskl, mcdssmpi, mcdssjc, mcdsslvw, mcdigfa2.

The last 5 accounts should be backed up before removing.

Regards,
STeven
Based on info from Prof Burdekin began process of backing up and deleting the following accounts from Eric:
Prof Burdekin said... "Of the persons listed in your e-mail in my group the following have left UMIST and can be deleted from future use - if they have any files left on the system please let me know as these must be backed up."
So...
account    location                      kbytes   name
mcgidkpk   /export/umist --> civil       569816   K.P. Kou
mcgihkk4   /export/umistmisc --> civil   3024849  K. Kuntiyawichai
mcgijis2   /export/civil1 --> civil      712342   I. Sbokos
mcgizsu2   /export/civil                 6        S. Ucsnik
mcgiztd2   /export/civil                 32       T. Dohr
mcgsswz2   /export/civil                 9065021  W. Zhao
Moved all accounts to /export/civil and disabled each by hacking its entry in /etc/passwd.
Copied /var/yp from Cosmos to Eric and adjusted as seemed appropriate. Added some sensible contents to /var/yp/ypfiles/passwd and auto_home and security/passwd.
Started NIS/YP on Eric by calling
/lib/netsvc/yp/ypstart
and we got
starting NIS (YP server) services: ypserv ypbind ypxfrd \
    rpc.yppasswdd rpc.ypupdated done.
which resulted in
/usr/ucb/ps auxww | grep yp
root ... /usr/lib/netsvc/yp/ypserv -d
root ... /usr/lib/netsvc/yp/ypbind
root ... /usr/lib/netsvc/yp/ypxfrd
root ... /usr/lib/netsvc/yp/rpc.yppasswdd -D /var/yp/ypfiles -m
root ... /usr/lib/netsvc/yp/rpc.ypupdated
which is the same as Cosmos so looks well.
Running /var/yp/Makefile failed with "ypservers: no such map", or words to that effect. After a quick look on Google, added a ypservers map thus:
cd /var/yp
echo eric eric | makedbm - tmpmap
mv tmpmap.dir yp.eric.umist/ypservers.dir
mv tmpmap.pag yp.eric.umist/ypservers.pag
then
[root@eric yp]# ypcat ypservers
eric
[root@eric yp]# touch ypfiles/auto_home ypfiles/passwd \
    ypfiles/security/passwd.adjunct
[root@eric yp]# /usr/ccs/bin/make
updated passwd
...
Good! Removed all evidence of mpciish2 from /etc/passwd, shadow and auto_home and found could not log in as mpciish2. Added nis to the passwd entry in /etc/nsswitch.conf and all is well --- can log in as mpciish2.
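A quick way to confirm what the passwd lookup order actually is after such an edit. A minimal sketch with a made-up function name; the expected "files nis" in the usage note is an assumption about how the entry was written, and on Solaris 7 nawk may be needed for -v.

```shell
# Hypothetical helper: print the lookup sources configured for one database
# (e.g. passwd) in an nsswitch.conf-style file.
nss_sources() {
    # $1: database name, $2: path to the file
    awk -v db="$1:" '$1 == db { $1 = ""; sub(/^ +/, ""); print }' "$2"
}

# Usage sketch:
#   nss_sources passwd /etc/nsswitch.conf
#   ypmatch mpciish2 passwd     # check that the NIS map serves the entry
```

If the first command prints something like "files nis" and ypmatch returns the entry, logins for accounts held only in the NIS map should work.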
Envelope-to: [email protected]
Delivery-date: Fri, 15 Nov 2002 12:48:26 +0000
From: "Steven Y Liem" <[email protected]>
To: <[email protected]>
Subject: RE: eric accounts
Date: Fri, 15 Nov 2002 12:48:11 -0000
Content-Type: text/plain; charset="us-ascii"
X-Priority: 3 (Normal)
X-MSMail-Priority: Normal
Importance: Normal
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1106
In-Reply-To: <[email protected]>
X-Scanner: exiscan for exim4 (http://duncanthrax.net/exiscan/) *18CftQ-0000OA-00*avBW9OFXohA*
X-Scanner: exiscan for exim4 (http://duncanthrax.net/exiscan/) *18Cfta-0007VF-00*6RjdqRzU3iI*

Hi Simon,

This is to confirm what we discussed on the phone, that you can remove mcdsswkd from eric's disk.

Regards,
Steven
Renamed /var/nis/NIS_COLD_START so that /etc/init.d/rpc does not start NIS+ on an Eric (re)boot.
Problem with mounting shared (exported) eric filesystems on clients (which do have permission to do this). Original entry in /etc/nsswitch.conf for RPC was nisplus [NOTFOUND=return] files, now just files. Having taken out NIS+ yesterday, it seemed likely that RPC and/or nfsd and related stuff needed restarting. Did so; problem solved. Details follow...
Initial problem:
mount eric:/export/mecheng/<username> /home/<username>
nfs mount: eric:/export/mecheng/mcjifmta: server not responding : RPC: Timed out
nfs mount: retrying: /home/<username>
Restarted /usr/sbin/rpcbind on server (eric) and then got
mount eric:/export/mecheng/mcjifmta /home/mcjifmta
nfs mount: eric: : RPC: Program not registered
nfs mount: retrying: /home/mcjifmta
on a Solaris client and
mount eric.umist.ac.uk:/export/umist/isd/mpciish2 /mnt/eric
mount: RPC: Unable to receive; errno = Connection refused
on a Linux client. Google suggested killing nfsd, restarting mountd and then starting nfsd again, so given
/usr/ucb/ps auxww | grep -i nfs
root 252 ... /usr/lib/nfs/lockd
daemon 253 ... /usr/lib/nfs/statd
root 504 ... /usr/lib/nfs/mountd
root 506 ... /usr/lib/nfs/nfsd -a 16
did
kill 506
kill -HUP 504
...oh shit, mountd disappeared, so
/usr/lib/nfs/mountd
/usr/lib/nfs/nfsd -a 16
/usr/ucb/ps auxww | grep -i nfs
leaving
root ... /usr/lib/nfs/mountd
root ... /usr/lib/nfs/nfsd -a 16
root ... /usr/lib/nfs/lockd
daemon ... /usr/lib/nfs/statd
All sorted, it now appears.
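"RPC: Program not registered" means the daemons had not (re)registered with the restarted rpcbind, which can be checked directly rather than inferred from ps. A minimal sketch, assuming the usual rpcinfo -p layout with the service name in the fifth column; the function name is made up and this was not part of the original fix.

```shell
# Hypothetical helper: read `rpcinfo -p` output on stdin and print each of
# the named services that is NOT registered with rpcbind.
# Assumes the usual columns: program vers proto port service.
missing_rpc_services() {
    listing=$(cat)
    for svc in "$@"; do
        printf '%s\n' "$listing" |
            awk -v s="$svc" '$5 == s { found = 1 } END { exit !found }' ||
            echo "$svc"
    done
}

# Usage sketch, from the server or any client:
#   rpcinfo -p eric | missing_rpc_services mountd nfs
```

No output means both mountd and nfs are registered; anything printed is a daemon that still needs restarting.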
Copied contents of passwd.org_dir (NIS+ map) to /etc/passwd and put suitable place-holders in /etc/shadow. Ensured contents of /etc/nsswitch.conf had nisplus entries removed (commented out). Then studied contents of /etc/init.d/rpc and found that needed to stop rpc.nisd, nis_cachemgr and rpc.nispasswdd. Did so. Seemed ok.
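The copy can be scripted along these lines. A sketch only: it assumes the table dump is colon-separated with the seven standard passwd fields first, and the locked-account marker *LK* is just one possible place-holder, not necessarily the one used here. Function names and the output file names in the usage note are made up.

```shell
# Hypothetical helpers: turn colon-separated passwd-table lines (stdin) into
# /etc/passwd lines and matching /etc/shadow place-holder lines.
# Assumption: fields 1-7 are name, password, uid, gid, gecos, home, shell.
to_passwd() {
    awk -F: 'BEGIN { OFS = ":" } NF >= 7 { print $1, "x", $3, $4, $5, $6, $7 }'
}

to_shadow_placeholder() {
    # *LK* (locked) is an assumed place-holder; nine fields per shadow line
    awk -F: 'NF >= 7 { print $1 ":*LK*:::::::" }'
}

# Usage sketch (write to scratch files and review before merging):
#   niscat passwd.org_dir | to_passwd > /tmp/passwd.new
#   niscat passwd.org_dir | to_shadow_placeholder > /tmp/shadow.new
```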
Envelope-to: [email protected]
Delivery-date: Fri, 08 Nov 2002 17:07:33 +0000
From: "Steven Y Liem" <[email protected]>
To: <[email protected]>
Subject: RE: eric disk space
Date: Fri, 8 Nov 2002 17:07:30 -0000
Content-Type: text/plain; charset="us-ascii"
X-Priority: 3 (Normal)
X-MSMail-Priority: Normal
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1106
Importance: Normal
In-Reply-To: <[email protected]>
X-Scanner: exiscan for exim4 (http://duncanthrax.net/exiscan/) *18ACbP-0004gC-00*xjYikpUGu6A*
X-Scanner: exiscan for exim4 (http://duncanthrax.net/exiscan/) *18ACbU-0008T2-00*JYZ6OPim4U.*

Hi Simon,

I believe the following users should be retained on eric:

mcdapyl
mcdap00
Mcdapjc
Mcdijuc2
Mcdi7ps4
Mcdi7bn2
Mcdaspm
mcdstpp
mcdst00

I am still waiting for some of the supervisors to confirm whether it is OK to remove other usernames.

Also, bsub/bjobs doesn't seem to work anymore. I have a project student who requires it to run mpi jobs.

By the way, when is eric going to be reconfigured and will it have a dedicated queue for parallel jobs ?

Regards,
Steven
Envelope-to: [email protected]
Delivery-date: Tue, 12 Nov 2002 13:30:05 +0000
From: "Professor F.M.Burdekin" <[email protected]>
To: [email protected]
Date: Tue, 12 Nov 2002 13:29:56 -0000
Content-type: text/plain; charset=US-ASCII
Subject: Re: eric reconfiguration
Reply-to: [email protected]
CC: [email protected]
X-Confirm-Reading-To: [email protected]
X-pmrqc: 1
Priority: normal
In-reply-to: <[email protected]>
X-Scanner: exiscan for exim4 (http://duncanthrax.net/exiscan/) *18Bb79-0001d7-00*4oJ8jBROQk.*
X-Scanner: exiscan for exim4 (http://duncanthrax.net/exiscan/) *18Bb7F-0005pZ-00*NTRV4TzZ5OM*

Dear Dr Hood,

Of the persons listed in your e-mail in my group the following have left UMIST and can be deleted from future use - if they have any files left on the system please let me know as these must be backed up.

K.P. Kou, K. Kuntiyawichai, I. Sbokos, S. Ucsnik, T. Dohr, W. Zhao.

The following persons are still at UMIST and must be given continued access:

F.M. Burdekin, K.C. Leong, Y. Tkach, E. Aja de Retana, N. Cunliffe, E. El Dardiry.

Please note that I purchased an additional disk specifically for the use of my group, and E aja de Retana in particular, and that disk is not to be made available to persons outside my group.

Regards,
Michael Burdekin

Professor F.M. Burdekin
Tel No. 0161-200 4600
Fax No. 0161-200 4601
For authentication added these lines to /etc/pam.conf:
sshd2 auth sufficient /usr/lib/security/pam_unix.so.1
sshd2 auth required /usr/lib/security/pam_ldap.so.1 try_first_pass
For sshd the important lines in /etc/ssh2/sshd2_config are
AllowedAuthentications keyboard-interactive
and
AuthKbdInt.Required pam
The details are as for Cosmos
Changed eric from half-duplex to full-duplex networking --- or, rather, Pete Smith did by reconfiguring the switch port into which Eric is plugged. Eric simply reported as a message that the network had changed. Simple.
Migrated Yun Steven Liem, mcdapyl to mcdssyl on Eric. Backed up files to /scratch, used Solstice to create new user and moved old files to new home-dir (well, renamed it). Easy-peasy.
Patched Eric with the Recommended and Security patch cluster from sunsolve.sun.com.
Problems:
First needed to rm some stuff from the / partition to create space.
Then:
-- got repeated failure of patches, as summarised by /var/sadm/install_data/Solaris_7_Recommended_log and detailed in /var/sadm/patch/*/log, such as
   pkgadd: ERROR: checkinstall script did not complete successfully
-- a search on Google suggested that when installing a patch, the Solaris 2.5+ patch installation procedure executes the script "checkinstall" with uid nobody. If any of the patch files cannot be read by nobody, or if any part of the path leading up to the patch directory is inaccessible to nobody, an error similar to the one above appears.
-- it turned out that the directory from which I was installing the patches was not readable by nobody (e.g., su nobody, then pwd to check this), so I changed permissions of the dirs in the path and all was well... patches installed.
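The offending directory can be found without su-ing to nobody. A minimal sketch, assuming only the classic mode bits matter (ACLs and group membership are ignored); the function name and the path in the usage line are made up.

```shell
# Hypothetical helper: walk from the given absolute path up to / and print
# every directory that lacks read or search permission for "other" users
# (which is how uid nobody normally gets in).
dirs_closed_to_others() {
    case $1 in
        /*) p=$1 ;;
        *) echo "need an absolute path" >&2; return 2 ;;
    esac
    while :; do
        # characters 8-10 of the ls -ld mode string are the "other" bits
        perms=$(ls -ld "$p" | cut -c8-10)
        case $perms in
            r?[xt]) ;;          # readable and searchable by others
            *) echo "$p" ;;
        esac
        if [ "$p" = "/" ]; then break; fi
        p=$(dirname "$p")
    done
}

# Usage sketch:
#   dirs_closed_to_others /var/tmp/patches/7_Recommended
```

Any directory printed needs chmod o+rx (or the patches moving elsewhere) before checkinstall can run as nobody.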
As at 2002 Nov 05 1347, /etc/nsswitch.conf looked like this.
eUMISTified Eric with nisplus still in place --- followed exactly what I did for Cosmos and can now authenticate as mpciish2 with both local (nisplus) and LDAP/eUMIST passwords.
Outline:
-- use Cosmos as a template for eUMISTifying Eric:
   -- copy over libs and make (and remove) s-links as necessary:
      -- lib
   -- copy over and/or edit config files as necessary:
      -- /etc/ldap.conf
      -- /etc/nsswitch.conf
      -- /etc/pam.conf
-- as at 2002 Nov 04, 0920, still have nisplus (not nis) but have successfully eUMISTified eric: can log in as mpciish2 with both local (nisplus) password and eUMIST password!