model restarts logged for Tue 18/Mar/2014
2014_03_18 11:27 h1susetmx
2014_03_18 11:53 h1susetmx
2014_03_18 12:03 h1susitmx
2014_03_18 12:32 h1susitmy
2014_03_18 12:51 h1susetmx
2014_03_18 12:52 h1susitmx
2014_03_18 12:55 h1susitmy
2014_03_18 13:01 h1susbs
2014_03_18 13:05 h1suspr3
2014_03_18 13:07 h1sussr3
2014_03_18 13:08 h1susetmy
2014_03_18 13:12 h1dc0
2014_03_18 13:14 h1broadcast0
2014_03_18 13:14 h1fw0
2014_03_18 13:14 h1fw1
2014_03_18 13:14 h1nds0
2014_03_18 13:14 h1nds1
2014_03_18 13:21 h1broadcast0
2014_03_18 13:27 h1broadcast0
2014_03_18 16:22 h1iopsush2a
2014_03_18 16:22 h1susprm
2014_03_18 16:24 h1suspr3
2014_03_18 16:25 h1susmc1
2014_03_18 16:27 h1susmc3
All restarts expected; the multiple h1broadcast0 restarts were due to memory issues.
The large number of restarts is due to Tuesday maintenance work.
One fan has been secured and there is only one fan running now at X end.
I turned off three gowning rooms at ~10:40 am (2 in the VEA, 1 at the entrance)
Ski turned off one of the two HVAC fans at 10:57.
The TMS lab is still operating - I will ask Corey/Keita about this.
For the first time since the injection path rework on Friday, we have the arm cavity locking reliably. It seems like our difficulties locking over the last several days really were due to seismic motion, not some problem we created while reworking the path. I briefly had the yaw WFS working tonight, and saw transmission up to 1000 counts with the cavity locked stably.
The IMC is also now locking stably, although I can't do the handoff. Given how many problems we've had since today's maintenance, I'm not going to try to chase down why the handoff doesn't work until some of these other problems are fixed.
Leaving the arm cavity locking, no WFS on.
The links to filter modules from the quad drive align screens don't work, except for M0. The M0 links are (for example) H1SUSITMX_M0_DRIVEALIGN_L2L.adl while the lower stages link to H1SUS-ITMX_L1_DRIVEALIGN_L2L.adl
The H1SUSITMX screens exist in /ligo/rtcds/lho.h1/medm/h1susitmx, the H1SUS-ITMX ones don't.
I'm assuming that this worked in the past, since Keita and Arnaud have been loading filters in these filter banks.
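As a quick way to confirm which naming convention the installed screens actually follow, a check like the following could be run (a minimal sketch; the directory and filenames are just the examples quoted above):

    import os

    # Directory and example screen names quoted above; adjust per optic/stage.
    medm_dir = "/ligo/rtcds/lho.h1/medm/h1susitmx"
    candidates = [
        "H1SUSITMX_M0_DRIVEALIGN_L2L.adl",   # naming used by the working M0 link
        "H1SUS-ITMX_L1_DRIVEALIGN_L2L.adl",  # naming used by the broken lower-stage links
    ]

    for name in candidates:
        path = os.path.join(medm_dir, name)
        print(path, "->", "exists" if os.path.exists(path) else "MISSING")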
For some reason (maybe related to the CDS boogie man???) the supervised guardian nodes (i.e. the nodes running under the guardian supervision infrastructure on h1guardian0) are unable to talk to any of the h1ecatx1 TwinCAT IOC channels.
Sheila first noticed this when guard:ALS_XARM was not able to connect to the H1:ALS-X_LOCK_ERROR_STATUS channel. The truly weird thing, though, is that all other channel access methods can access the h1ecatx1 channels just fine. We can caget channels from the command line and from ezca in python. We can do this on operator workstations and even from the terminal on h1guardian0. It's only the supervised guardian nodes that can't connect to the channels.
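For the record, the test that works everywhere except in the supervised nodes is essentially just a channel access read with a timeout; a minimal sketch of that kind of check using pyepics (assuming pyepics is what's available in the environment):

    import epics

    # Channel the supervised ALS_XARM node could not connect to.
    chan = "H1:ALS-X_LOCK_ERROR_STATUS"

    # caget returns None if the connection times out.
    value = epics.caget(chan, timeout=5.0)
    if value is None:
        print("no connection to", chan)
    else:
        print(chan, "=", value)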
I tried reloading the code and restarting the guardian nodes; nothing helped. The same problem appeared regardless of which node was used.
I'm at a loss.
There's clearly something different about the environment inside the guardian supervision infrastructure that makes this kind of failure even possible, although I honestly have no idea what the issue could be.
I'm going to punt on trying to spend any more time diagnosing the problem, since I'm just going to chalk it up to the other weirdness. Hopefully things will fix themselves tomorrow.
Some of the digital video cameras used to have names that appeared on the links in the H1VIC_DIGITAL_OVERVIEW.adl screen.
The names disappeared sometime this afternoon, maybe related to the deleted files?
The overview is one of the files that went missing. I had an older copy saved elsewhere which I restored so as to provide access to the cameras, but this does not have the (later) name updates. Once the backup copies are retrieved from tape, I should be able to restore the latest version. It was also an oversight that these were not in SVN, which will also be corrected.
I have restored the latest version of the camera screens; they have also been added to the userapps repo in cds/h1/medm, and symlinked in the /opt/rtcds/lho/h1/medm/cds area.
J. Kissel, S. Dwyer, D. Sigg, J. Rollins, A. Pele, H. Radkins, D. Gustafson, F. Clara, C. Reed, J. Batch, D. Barker

So, maintenance day was awesome. In summary, we had two problems that took us 4 hours to identify:
(1) The h1sush2a IOP model's DAC enable bit had gone bad after Sheila's restart of the h1suspr3 model. This resulted in the mode cleaner misalignment, because alignment offsets were not getting out to the SUS. This has happened several times before; see for example LHO aLOGs 10375 or 8964.
(2) Some of the /opt/rtcds/lho/h1/ directory structure disappeared for an as-yet-unknown reason. This resulted in the mode cleaner blowing up the FSS every time it tried to LOCK.

The story, in chronological order for the record:
- Kiwamu and I flip HXTS signs, store new safe snapshots, and leave with the IMC locked (see LHO aLOG 10837).
- Sheila installs optical lever BLRMS into PR3 by restarting the h1suspr3 front-end code, which is on the h1sush2a computer (see LHO aLOG 10837).
- Restarting the SUS front-end model trips the HAM2-ISI watchdogs (but NOT HPI), because the ISI loses communication with the SUS.
- Found the IMC TRANS and REFL cameras grayed out, appearing to be broken / not reporting real data.
- Sent Fil out to check if the camera's analog path had been screwed up somehow. Confirmed OK.
- Fil launched Cyrus, who rebooted the camera server processes on the relevant cameras; no effect.
- Sheila, Arnaud, Hugh, and Daniel launch into the "has the alignment changed since we last had a good lock, and to what SEI trips does the change correspond" DataViewer trending game. Red herrings everywhere; Daniel finally identifies that the IMC is just misaligned. The cameras look bad because some automatic exposure, gain, or aperture was searching for light and found none.
- Jim restarts the DAQ (see LHO aLOG 10832).
- Jamie identifies that the h1sush2a IOP model's DAC enable bit had gone bad, based on his previous experience with the problem. We all look at h1sush2a's IOP CDS bit word (one bit in the middle of a bit word, un-intuitively related to MC1 and MC3), and find that, sure enough, the bit is red (see the bit-check sketch after this entry).
- Talk to Jim, who says the way to solve the problem is to kill all user front-end processes, restart the IOP process, and then restart all the user processes.
- Performing Jim's FIXIT works, and we can drive out of MC1 and MC3 (and the rest on h1sush2a, PRM and PR3) again.
- We used the guardian to bring each SUS back into the aligned state, and turned on the IMC guardian.
- The IMC aligns, but is constantly flashing, breaking the FSS lock during attempts to lock.
- After bringing up all the front-end processes, while the DAC enable problems disappeared, we were left with IPC errors in the CDS bit words for 3 of the 4 SUS.
- Looking to clear these errors because they're a suspect, we tried opening the GDS TP screens to hit the "Diag Reset" button, only to discover that MEDM throws an error when the SUS GDS_TP screens are called, complaining they don't exist.
- Looking into the /opt/rtcds/lho/h1/medm/ folder, we find that *all* SUS, and several more automatically generated MEDM directories, have disappeared.
- Launched Jim and Dave to investigate, while continuing to explore why the IMC won't lock (see LHO aLOG 10850).
- Sheila and I toggle switches and BURT restore h1lscepics, h1ascimcepics, and h1ecatc1plc2 to 2014-03-18 14:00 PT with no effect.
- Sheila checks the analog IMC error signal (demod phase) and cabling at the ISC racks. Everything looks great.
- Jim, beginning to replace all of the missing MEDM files, starts systematically make-installing all missing models.
- When he gets to h1susmc2, BLAMO -- the IMC locks right up.
- Dick titles the aLOG.

As I said, it is still unclear why all of these files went missing, but some ~200 files were apparently *non-MEDM* files, and *some* of those files were essential for the IMC to lock. YAAAAAAAY MAINTENANCE DAY YAAAAAAAAY. The investigation into why we lost files from /opt/rtcds continues; in the meantime, Jim has reinstalled all models with the missing files, and things appear functional, but we've tested little else than the IMC.
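For future reference, checking a single bit of an IOP CDS status word from a script is just a masked read. A rough sketch is below; the channel name and bit position here are placeholders (they should be read off the actual h1sush2a CDS/GDS_TP screen), not the real values:

    import epics

    # Placeholders: the real DCUID in the channel name and the real bit index
    # for the h1sush2a IOP DAC-enable bit must be taken from the CDS overview.
    STATE_WORD_CHANNEL = "H1:FEC-XX_STATE_WORD"
    DAC_ENABLE_BIT = 0

    word = epics.caget(STATE_WORD_CHANNEL, timeout=5.0)
    if word is None:
        print("could not read", STATE_WORD_CHANNEL)
    else:
        ok = bool(int(word) & (1 << DAC_ENABLE_BIT))
        print("DAC enable bit is", "OK" if ok else "BAD")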
A massive number of files were deleted from /opt/rtcds under mysterious circumstances this afternoon between 1:20 and 4:20 PM PDT. We checked for intruders, and inquired about creative or rogue scripts, but found nothing. The largest number of files were deleted from /opt/rtcds/lho/h1/medm. To restore the medm files, make install was performed on the following models: h1iopsusb123, h1susitmy, h1susbs, h1susitmx, h1susmc1, h1susmc3, h1susprm, h1suspr3, h1iopsush2b, h1susim, h1susmc2, h1suspr2, h1sussr2, h1sussr3, h1sussrm, h1susomc, h1iopsusauxb123, h1susauxb123, h1iopsusauxh2, h1iopsusauxh34, h1iopsusauxh56, h1susauxh56, h1iopseib1, h1iopseib2, h1iopseib3, h1hpiitmx, h1iopseih16, h1hpiham6, h1isiham6, h1hpiham3, h1isiham2, h1isiham3, h1iopseih45, h1hpiham4, h1hpiham5, h1isiham5, h1ioppemmy, h1pemmy, h1ioppsl0, h1pslfss, h1pslpmc, h1psldbb, h1iopoaf0, h1peml0, h1tcscs, h1odcmaster, h1omc, h1iopasc0, h1ascimc, h1sushtts, h1ioppemmx, h1pemmx, h1iopsusey, h1iopsusex, h1iopseiey, h1hpietmy, h1isietmy, h1isietmx, h1iopiscey, h1pemey, h1iscex, h1odcy, h1odcx, h1iopsusauxey, h1iopsusauxex, h1susauxex. Additionally, h1susetmx required a make and make install to be performed because the h1susetmxepics directory was missing from the build, even though it had been built shortly after noon today. The SITEMAP.adl file or symbolic link was missing from medm, so a symbolic link was created to the SITEMAP.adl file in userapps. Links to other adl files were missing as well. We do have a complete list of missing files and will evaluate which files should be restored from backup tapes tomorrow.
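A simple way to spot which of the per-model MEDM directories are still missing is to walk the expected paths; a sketch along these lines (model list abbreviated here, path as quoted above):

    import os

    MEDM_ROOT = "/opt/rtcds/lho/h1/medm"

    # Abbreviated subset of the model list above; extend as needed.
    models = ["h1susitmy", "h1susbs", "h1susitmx", "h1susmc1",
              "h1susmc2", "h1suspr3", "h1susetmx"]

    missing = [m for m in models if not os.path.isdir(os.path.join(MEDM_ROOT, m))]
    print("missing MEDM directories:", missing if missing else "none")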
I was logged in to see why the system wasn't running, and the machine crashed. It restarted, and I burt restored all three PLCs (h1ecatx1 PLC 1, 2, and 3) to 14:00 today.
We are currently investigating a loss of files from the h1boot /opt/rtcds file system. It appears to have happened between 1pm and 4:20pm (when it was discovered). We are restoring the files both from tape backup and by reinstalling from model RCG install scripts. We are investigating the cause of the loss.
(Corey, Jax, Keita)
Continued with getting a GREEN beam onto the TMS table. We had to make more position changes for the GREEN Top Periscope Mirrors (this required using longer screws for the top dog clamps [see photo]). Then most of the time was spent pointing the beam onto the TMS. Once on the TMS, the beam was pointed onto the QPDs, first with a TMS Periscope mirror, but mostly via the ISCTEY mirrors by Jax. We went through several iterations of tweaking to stay centered on the mirrors and to also have a beam nicely centered on the ISCTEY periscope mirrors (this involved some nap time in the chamber while Jax did all the work outside [see photo]).
Through this alignment work, we could see the retro-reflected beam off the ETM on the TMS Telescope Mirrors. On the Secondary mirror the spot looks to be low and a little off in yaw. To fix the pitch, we will need to pitch the TMS "back" via the slider weights under the table.
Additionally, during this alignment work, the QPD servo was engaged & is operational!
We aligned the ALSY green beam to the TMSY, and confirmed that the QPD servo works.
Tomorrow we'll rebalance the TMSY to get the ALS beam retro-reflected off of the ETMY.
Filiberto & Jeff
The Apollo crew spotted a frayed nylon tag line in the cable tray over HAM5 (good catch Apollo). This tag line has been removed from over the HAM5 chamber, and will be replaced before any more cable pulling is done in this area.
This afternoon, while adjusting the AOSEMs on the H1-SR3 Intermediate Mass, the UL Magnet/Dumbbell assembly was knocked off when it made contact with the ceramic boards holding the PD/LED. We will reattach a new Magnet/Dumbbell assembly tomorrow.
Mitchell
The two monolithic plates have been assembled and the 0-1 blade spring posts have been added. A cleaning in the granite table clean room has been set up. Apollo will be working with C&B tomorrow to wrap and deliver the optical tables, which are next up for a helicoil makeover.
Sheila, Jeff, Jim
We have added OpLev BLRMS to the suspension models for suspensions that have OpLevs; this change is described in ECR E1400155.
This involved changes to h1susetmx and h1susitmx (since these are not using the quad master right now), as well as to QUAD_MASTER, BSFM_MASTER, and HLTS_MASTER.
There is also a new MEDM screen in sus/common/medm/SUS_CUST_OPLEV_BLRMS.
I also added a link to this screen in the OpLev screens: SUS_CUST_QUAD_L3_OPLEV.adl, SUS_CUST_HLTS_M3_OPLEV.adl, and SUS_CUST_BSFM_M3_OPLEV.adl.
All of these changes are now committed to the svn.
The models are running, and Jim has restarted the DAQ, so the new channels are available. We also checked, using DTT and a spectrum, that the displayed values are accurate.
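For anyone unfamiliar with the new channels: a band-limited RMS is just the RMS of a band-passed signal. The front-end implementation uses the filter modules described in the ECR; the offline sketch below is only meant to illustrate the quantity (band edges and sample rate are illustrative):

    import numpy as np
    from scipy.signal import butter, sosfilt

    def blrms(data, fs, f_lo, f_hi, order=4):
        """Band-limited RMS: band-pass the timeseries, then take the RMS."""
        sos = butter(order, [f_lo, f_hi], btype="bandpass", fs=fs, output="sos")
        band = sosfilt(sos, data)
        return np.sqrt(np.mean(band**2))

    # Example on fake data sampled at 256 Hz, for the 0.3-1 Hz band.
    fs = 256.0
    x = np.random.randn(int(60 * fs))
    print("0.3-1 Hz BLRMS:", blrms(x, fs, 0.3, 1.0))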
The H1 SUS ITMY PUM coil balancing coefficients got lost in this reboot. I've restored their values according to LHO aLOG 10625, captured, and committed a new safe.snap.
I also placed two StripTools on the monitor on the wall next to the PEM FOMs: the top one for the microseism band, 30-100 mHz, and one for the suspension resonance frequencies, 0.3-1 Hz. I moved the OSEM screen over to the other side of the control room. We can change the settings on these plots to make them more useful if it turns out they are too hard to read.
TMS Work (Corey, Jax, Keita, Margot)
Apollo roughly positioned the ISCT-EY Table (via floor markings).
The chamber floor was first cleaned upon entry for work (we did not do an exit floor wipe). Margot then entered the chamber to remove the First Contact from the inside surfaces of the pair of TMS viewports (and then went out and removed the First Contact from the outer surfaces). She mentioned "finger prints" on the outside surface of one of the TMS viewports; Margot will document this in the DCC.
By eye, we checked the position of the table; we ended up pushing the table about 3" west. We then attached ducting between the chamber and the table. Then we went in to check our line of sight from the TMS Table to the Table Periscope. This required us to move the GREEN periscope an inch east, and also the top periscope mirror down 4-6". At this point the laser wasn't quite making it down the duct. This is about where we ended things last evening.
Dust Monitor Check Of Purge Air For BSC10 (Corey, John, Keita)
John asked whether there was dust coming from the purge air. So, with the handheld Dust Monitor in hand, I crawled under the ACB and took a few measurements (the Dust Monitor was running continuously). I took measurements at four spots above the input of the purge air. At the highest point, I had readings which would hover between 1500-2500 counts of 0.5um particles. At the lowest point (with the sensor of the Dust Monitor right at the input), the 0.5um counts could be kept at basically 0 counts (with flashes of a few hundred, possibly). So, it would seem the purge air is relatively clean.
These 2 statements seem contradictory:
"At the highest point, I had readings which would hover between 1500-2500 counts of 0.5um particles. [At the lowest point ...] So, it would seem the purge air is relatively clean."
Maybe John can elaborate on where he thinks the high counts are "coming from" if not the purge? Is it just that the air is turbulent in the chamber and stirring up the 1500-2500 counts already in the chamber?
Corey saw high background levels in the beam manifold but was able to drive the particle counts to zero by moving the detector close to the purge port at the floor of the beam manifold. I walked around the VEA sampling and found low levels throughout. Counts inside cleanrooms were close to or equal to zero except at the open BSC door. There was activity here as well as equipment staged. Corey was inside and Keita was outside at the BSC entrance. The overhead work platform reduces the effectiveness of the clean room in this location. Inside, the arm cavity baffle obstructs access to the beam manifold, so any work in the beam manifold requires a person to lie down and slide under the baffle. This may very well abrade clothing. I recommend we establish a horizontal clean flow as we have while working in HAM chambers.
A reminder that the purge air can only provide 25--50 cfm of air flow into the chamber. In a 6 foot diameter tube (beam manifold) this translates to air velocities of only 0.6 to 1.2 feet per MINUTE. Think how far you walk in a minute.
My impression is we have a reservoir of particulate in the vacuum chamber from the series of operations which have taken place - for example there have been two cartridge installs and one removal. Also this cartridge is an early assembly - probably assembled prior to some of the "in process" cleaning steps we have adopted.
Yes, I should elaborate a little (I was quickly entering the aLOG during the morning meeting). So we did measure counts while I was in BSC10, and it seemed like we had steady counts in the several thousands [for 0.5um counts with continuous sampling]. When I took measurements along the purge air plume, I would get up to 2500 at the most at the top of the plume (6' high). As I went closer and closer down to the purge air inlet, the counts started to drop, and it was zero right at the inlet.
So the picture looked as though we have a baseline of particles floating in the chamber. In the turbulent air above the purge air inlet the counts waver a bit, but the counts decrease the closer you are to the inlet. So above the purge we have particles moving around more (vs. further away from the purge, these particles are more "statically" floating...perhaps they are more on the floor when someone like me isn't shuffling them into the air).
Basically we have particles all over the surfaces in the chamber/tube. They may be gently floating around or resting on the floor. They get rustled around when we work in-chamber & also get blown around and away from the Purge Air inlet. We need to remove these particles...which I know is obvious and daunting.
We're going to continue wiping these particles on the floor toward the door, but I'm not sure what that accomplishes. Hopefully the particles get attached to our wet wipes, but I wonder if they just get pushed to the edge of the floor, fall over the edge of the temporary floor, and rain down to the bottom of the chamber. Sad sad.
Are these raw or normalized counts?
The other thing to note is that I did an svn update on that computer right before it crashed; it might be worth looking at what was included in the update to see if it changed the behavior of the IOC somehow.
(As also discussed in person.) This may be due to a difference in the environment setup between the Guardian supervisors, and a login shell. The EPICS gateway processes are still in place on the FE subnet, as we have not changed the data concentrator or FE systems to directly broadcast to other subnets. So, the channel behavior will be dependent on the setting of the EPICS_CA_ADDR_LIST environment variable, specifically whether CA will traverse the gateway or route through the core switch. The problem described sounds a lot like the issue the gateway has with reconnecting to Beckhoff IOCs, if the Guardian processes are connecting to the gateway then this would explain the behavior described. Jaime was going to look at the Guardian environment setup as time permits, to see how it differs from the current cdscfg setup.
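One low-effort way to test this is to dump the channel access environment from inside a supervised node (e.g. via a log statement in the node code) and compare it to a login shell; a sketch, making no assumption about what the values should be:

    import os

    # Print the channel-access environment seen by the current process.
    for var in ("EPICS_CA_ADDR_LIST", "EPICS_CA_AUTO_ADDR_LIST",
                "EPICS_CA_MAX_ARRAY_BYTES"):
        print(var, "=", os.environ.get(var, "<unset>"))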