Reports until 23:06, Wednesday 30 October 2013
Logbook Admin Bug (CDS)
jeffrey.kissel@LIGO.ORG - posted 23:06, Wednesday 30 October 2013 - last comment - 23:08, Wednesday 30 October 2013(8326)
tar.gz files fail to upload
I was looking to upload the source code tarball for the record in LHO aLOG 8325, but after selecting the file in the browser and hitting the "upload file" button, the file just disappears from the browse field without uploading anything.

We've requested that *any* file type should be uploadable, as long as it's under the 10 MB limit (see LHO aLOG 3798 and subsequent comments), but I guess the infrastructure doesn't allow for it?

P.S. While checking whether we've requested this before ('cause I thought we had), I rediscovered a report of a bug that's still present:
.
Comments related to this report
jeffrey.kissel@LIGO.ORG - 23:08, Wednesday 30 October 2013 (8327)
For the record, the exact tarball (if it doesn't change before tomorrow when we fix the problem) lives in
/opt/rtcds/lho/h1/target/h1sussr3/src/sources.tar.gz
in case the failure has something to do with this specific tarball, but I doubt it.
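As a quick sanity check that this isn't simply the 10 MB size limit, the file size can be read off directly (using the path above):

  ls -lh /opt/rtcds/lho/h1/target/h1sussr3/src/sources.tar.gz

If that reports something under the limit, the failure is the file type rather than the size.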
H1 SUS (CDS, DAQ)
jeffrey.kissel@LIGO.ORG - posted 22:52, Wednesday 30 October 2013 - last comment - 05:21, Thursday 31 October 2013(8325)
ECR E1300578 and E1300261 Progress -- HLTS Models -- and Crashed the Data Concentrator / Framebuilder
J. Kissel

I've now updated the HLTS front-end Simulink models as per ECR E1300578, similar to the QUADs, TMTS, and BSFM, as described in G1301192. After successful compilation of both H1SUSPR3 and H1SUSSR3, Fabrice informed me that Arnaud was gathering some data looking for long-term drift on PR3, so I only installed and restarted SR3. Of course, upon successful compilation and install, I went to restart the front end with the new process and it hung halfway through, completely taking down the data concentrator / frame builder / DAQ, as well as the entire h1sush56 front end. I attach a screenshot of the CDS overview screen. *sigh*. The reigning king of finding crazy obscure bugs in CDS and exercising them wins again. The only debugging I've done is trying to reboot the data concentrator once, by doing the following:

controls@opsws3:models 0$ telnet h1dc0 8087
Trying 10.101.0.20...
Connected to h1dc0.cds.ligo-wa.caltech.edu.
Escape character is '^]'.
daqd> shutdown
OK
Connection closed by foreign host.
controls@opsws3:models 1$
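For reference, the same shutdown can be issued non-interactively (a sketch; assumes a netcat variant that supports -q, e.g. netcat-traditional):

  echo shutdown | nc -q 1 h1dc0 8087   # -q 1: linger briefly after EOF so daqd receives the command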

This brought *some* of the front ends back up to green status (the SEI and SUS computers at the end stations), but the corner is cooked.

Sorry Arnaud, and anyone else who was gathering data overnight.

Giving up for the night and will continue fighting the good fight tomorrow morning.

I tried uploading the source from the target area, but the aLOG doesn't like tar.gz's at all.

------
Here's the status of the sus corner of the SVN repo as a result of my work (the command that produces this listing is sketched after it):

MM      common/models/HAUX_MASTER.mdl                  # haven't started on the HAUX yet
MM      common/models/HLTS_MASTER.mdl                  # changes complete, but don't want to commit until I can successfully start the front end process
M       common/models/SIXOSEM_T_STAGE_MASTER.mdl       # same as above
MM      common/models/MC_MASTER.mdl                    # haven't started on the HSTS yet
M       common/models/OMCS_MASTER.mdl                  # haven't started on the OMCS yet
M       common/models/SIXOSEM_T_WD_AC_MASTER.mdl       # changes complete, but don't want to commit until I can successfully start the front end process
M       common/models/SIXOSEM_T_WD_DC_MASTER.mdl       # changes complete, but don't want to commit until I can successfully start the front end process
MM      common/models/HSTS_MASTER.mdl                  # haven't started on the HSTS yet

M       h1/filterfiles/H1SUSTMSX.txt                   # Haven't committed since new code has been installed, these still need a hand clean up of now-vestigial filter banks
M       h1/filterfiles/H1SUSTMSY.txt                   #     | 
M       h1/filterfiles/H1SUSBS.txt                     #     | 
M       h1/filterfiles/H1SUSSR3.txt                    #     | 
M       h1/filterfiles/H1SUSETMX.txt                   #     | 
M       h1/filterfiles/H1SUSETMY.txt                   #     | 
M       h1/filterfiles/H1SUSITMX.txt                   #     | 
M       h1/filterfiles/H1SUSITMY.txt                   #     V
 
M       h1/models/h1susprm.mdl                         # haven't started on the HSTS yet
M       h1/models/h1sussrm.mdl                         #     |
M       h1/models/h1suspr2.mdl                         #     V
M       h1/models/h1suspr3.mdl                         # changes complete, but don't want to commit until I can successfully start the front end process
M       h1/models/h1sussr2.mdl                         # haven't started on the HSTS yet
M       h1/models/h1sussr3.mdl                         # changes complete, but don't want to commit until I can successfully start the front end process
M       h1/models/h1susomc.mdl                         # haven't started on the OMCS yet
M       h1/models/h1susmc1.mdl                         # haven't started on the HSTS yet
M       h1/models/h1susmc2.mdl                         #     |
M       h1/models/h1susmc3.mdl                         #     V
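For reference, the listing above is ordinary svn status output; a sketch of regenerating it, assuming the usual userapps checkout location (a second-column M flags a property change in addition to a content change):

  cd /opt/rtcds/userapps/release/sus        # adjust if the working copy lives elsewhere
  svn status common/models h1/models h1/filterfiles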


Images attached to this report
Comments related to this report
keith.thorne@LIGO.ORG - 05:21, Thursday 31 October 2013 (8328)
The front-end models are running; however, data shipping to the data concentrator is not working (or only partially working).
What is needed is to restart the mx_stream processes on each front end.
  ** There should be a script 'restart_all_mxstreams.sh' in /opt/rtcds/lho/h1/target/h1dc0.  If you log into the boot server as 'controls' you should be able to run this script smoothly.
All this script (should) do is ssh onto each front end, then run /etc/init.d/mx_stream stop followed by /etc/init.d/mx_stream start.
* You can do this manually on each front end to see if it fixes the problem.
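A minimal sketch of doing it by hand (the host list is illustrative only -- use the real set of front ends feeding h1dc0 -- and the init scripts may need sudo depending on how they are set up):

  for fe in h1susb123 h1sush2b h1sush56 h1pemmx ; do
      echo "=== ${fe} ==="
      ssh ${fe} '/etc/init.d/mx_stream stop; /etc/init.d/mx_stream start'
  done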

[ And yes, we need more complete info and helpful docs, consistent at both sites. ]
H1 CDS
patrick.thomas@LIGO.ORG - posted 19:16, Wednesday 30 October 2013 (8324)
Restarted Conlog after setting paths for caRepeater
I noticed that Conlog was not reconnecting to some PEM channels after their IOCs were restarted.

One hypothesis is that it may need to have the caRepeater running, which it has not been.

I stopped Conlog around 18:12. I set the environment variable $PATH to include the path to the caRepeater binary. I restarted Conlog. I got an error not finding a library needed by caRepeater. I stopped Conlog and set the environment variable $LD_LIBRARY_PATH to include the path to the library. I restarted Conlog and it ran without any further errors. These environment variables are set in /home/controls/bashrc_import which is sourced by /home/controls/.bashrc on h1conlog.
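For the record, the additions to bashrc_import amount to something like the following (the EPICS install paths shown are placeholders -- use wherever EPICS base actually lives on h1conlog):

  # make caRepeater and the libraries it needs findable by Conlog
  export PATH=${PATH}:/opt/epics/base/bin/linux-x86_64                          # placeholder path
  export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/opt/epics/base/lib/linux-x86_64    # placeholder path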

It remains to be seen if this fixes the problem.
H1 SEI
hugh.radkins@LIGO.ORG - posted 16:49, Wednesday 30 October 2013 (8321)
WBSC9 ETMx SEI HEPI Actuators--5 of 8 connected

Should get the remaining Actuators attached tomorrow morning.

LHO General
gerardo.moreno@LIGO.ORG - posted 16:35, Wednesday 30 October 2013 (8320)
Operator Summary

Today's activities:
- Jim W, to LVEA, lock HEPI HAM02.
- Sheila, RefCav locking lesson to operator.
- Richard M, to LVEA, cable work under HAM02.
- Apollo crew, to LVEA, move IOT2L.
- Apollo crew, door prep work, HAM02.
- Betsy, LVEA, SR2 work.
- Jim W, to LVEA, HEPI lock HAM03.
- Apollo crew, to LVEA, door prep work, HAM03
- Filiberto, LVEA and X-end, ESD measurements.
- Mitchel & Thomas, MCB assembly work, West bay area.
- Hugh and Greg, X-end, HEPI work.
- Jim B and Dave B, to Y-End, troubleshooting.

Vendors:
- Porta potty service.

H1 PSL (PSL)
gerardo.moreno@LIGO.ORG - posted 16:35, Wednesday 30 October 2013 (8319)
H1 PSL Changes

(Sheila, Gerardo)

Sheila showed me how to lock the reference cavity.
One change was made to get the system to behave: Sheila lowered the resonant threshold from 0.9 V to 0.5 V.
The reference cavity was able to lock manually, but now it appears misaligned when locked.

H1 CDS
david.barker@LIGO.ORG - posted 16:03, Wednesday 30 October 2013 - last comment - 18:04, Wednesday 30 October 2013(8318)
h1pemmx, testing 2.8 code.

Jim, Cyrus, Dave

Rolf added a new feature to RCG 2.8 to permit a front end to run without an IRIG-B card (GPS time is obtained via EPICS Channel Access from a remote IOC). We are in the process of testing this on h1pemmx. 

To prepare for the test, I added the line "remoteGPS=1" to the CDS block on h1ioppemmx. I added a cdsEzCaRead part, reading the GPS time from the DAQ data concentrator on channel H1:DAQ-DC0_GPS. I svn updated the trunk area, and compiled h1ioppemmx and h1pemmx against the latest trunk.

Test 1: keep the IRIG-B card in the computer and restart the IOP model several times. We noticed that the offset between the GPS time from IOPPEMMX and its reference DC0 changes from restart to restart, but stays synchronized to within a second.
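One way to spot-check this by hand is to read both GPS channels with the EPICS command-line tools; H1:DAQ-DC0_GPS is the reference named above, while the IOP-side channel name below is only a placeholder (read the actual record name off the GDS_TP screen):

  # compare the data concentrator's GPS time with the IOP's remote-GPS value
  caget H1:DAQ-DC0_GPS H1:IOP-PEMMX_GPS    # second channel name is a placeholder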

We are in the process of test 2, removing the IRIG-B card from h1pemmx. At the same time, Cyrus is reconfiguring the X-arm switching systems for the front-end and DAQ switches, which will permit replacement of two Netgear switches at MX with media converters. The use of full switches to support a single front-end computer is obviously wasteful.

On completion of today's IRIG-B tests, we will re-install the IRIG-B card and reload the 2.7.2 version of the IOP and PEM code. While this test is in progress, the DAQ status from MX is 0x2000 and its data is bad.

Comments related to this report
david.barker@LIGO.ORG - 17:26, Wednesday 30 October 2013 (8322)

The test is complete; the pemmx front end has been reverted to its original state (IRIG-B card installed, 2.7.2 code running).

The test was a SUCCESS: the IOP ran without an IRIG-B card. This is indicated by a ZERO time in the IRIG-B diagnostics on the GDS_TP MEDM screen (see attached).

One problem found was with the replacement of the DAQ network switch with a media converter. This caused the DAQ data from all the other front ends which share the second 10 GigE card on h1dc0 to go bad. We tried to restart the mx_streamer on h1pemmx, but that only made matters worse and the data from all the FEs went bad for a few seconds. I'll leave it to Cyrus to add more details. We reinstalled the Netgear switch for the DAQ, but kept the media converter for the FE network as this showed no problems.

Images attached to this comment
cyrus.reed@LIGO.ORG - 18:04, Wednesday 30 October 2013 (8323)

The media converters I tried are bridging media converters, which means they act like a small 1-port switch with 1 uplink.  I went with these because when the computer is powered off, the embedded IPMI interface negotiates at 100 Mbps, not 1 Gbps, and a standard media converter will not negotiate this rate (it is fixed to the fiber rate).  A bridging converter therefore maintains access to the IPMI management interface on the front-end computer at all times, not just when it is booted and connected to the switch at 1 Gbps.  However, the switching logic in these media converters does not support Jumbo Frames, which, when used on the DAQ network, corrupts the Open-MX data.  I've confirmed this by looking at the documentation again and comparing to a non-bridging version.  So, I'll need to obtain some additional non-bridging media converters for use on the DAQ network; these should work better for this application as they are strictly Layer 1 devices with no Layer 2 functionality.
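A simple way to confirm whether jumbo frames survive a given path is a do-not-fragment ping sized for a 9000-byte MTU (a sketch; run from a front end toward the data concentrator's address on the DAQ network):

  # 8972-byte payload + 20-byte IP header + 8-byte ICMP header = one 9000-byte MTU frame
  # -M do sets the don't-fragment bit, so any hop that can't pass jumbo frames fails loudly
  ping -c 3 -M do -s 8972 h1dc0    # substitute the data concentrator's DAQ-network address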

H1 CDS
david.barker@LIGO.ORG - posted 15:54, Wednesday 30 October 2013 (8317)
script0 froze this morning at 6am, needed a reboot

Jim, Dave.

Eagle-eyed Kyle noticed that the MEDM screen snapshots stopped working at 6am this morning. script0 was pingable, but we could not ssh onto it. Its console was frozen, and it had to be rebooted. We restarted the MEDM screen snapshot program.

H1 CDS
david.barker@LIGO.ORG - posted 15:52, Wednesday 30 October 2013 (8316)
h1susey restarts

Jim and Dave

We restarted the user and IOP models on h1susey several times while investigating the DAC status bits (a follow-on from yesterday's ITMX,Y issue). We did not find any problems at EY; the status bits are consistent with the AI units being powered down. We wanted to try powering them up, but they are missing the +15V supply.

H1 AOS
douglas.cook@LIGO.ORG - posted 15:15, Wednesday 30 October 2013 (8315)
ETMx alignment
Jason and I cut away the Ameristat from around the legs of tripods and realigned the instruments to have them ready in the AM.

I added new scribe lines to the ACB targets to represent the new horizontal centerlines. 
H1 INS (INS, SEI)
jim.warner@LIGO.ORG - posted 14:42, Wednesday 30 October 2013 (8314)
HAMs 2 & 3 HEPI locked

This morning at ~9:30 I locked HAM2 HEPI. At ~1:30pm, HAM3 HEPI got a similar treatment. Offsets from the floating position for both were about 100 cts (100 cts / [(7.87 V/mm) × (1638 cts/V)] ≈ 0.0078 mm ≈ 0.0003"), which is what Hugh reported he shot for when locking.

LHO General (PEM)
patrick.thomas@LIGO.ORG - posted 13:44, Wednesday 30 October 2013 (8313)
Increased dust in beer garden
Started around 9:00 AM on October 29. Plot attached.
Non-image files attached to this report
H1 PSL (PSL)
peter.king@LIGO.ORG - posted 12:04, Wednesday 30 October 2013 (8312)
Second Loop In Air Cables Installed On HAM2
R. McCarthy, P. King

The in-air cables (D1300464) used for the outer loop power stabilisation photodetector array were installed (see the
attached pictures).  Looking at the flange from left to right: on the left-hand side subflange the cables S1301012
and S1301013 were installed, and on the right-hand side subflange the cables S1301014 and S1301015 were installed.  These
were attached to the black coloured mating pieces and are face-to-face flush, as shown in the second attached picture.
Images attached to this report
H1 SUS
arnaud.pele@LIGO.ORG - posted 11:16, Wednesday 30 October 2013 (8311)
ITMX free of rubbing

Results for the main and reaction chains of ITMX (in chamber, in air, suspension undamped, ISI locked) show a very good match with the model and with the Phase 2b (test stand) measurements, so the suspension is free of rubbing for alignment; cf. the first attached PDF for the main chain and the second for the reaction chain. The last two PDFs show the transfer functions against the model (third PDF for the main chain, fourth for the reaction chain). As usual, the reaction chain's pitch is off due to the stiffness added by the cables.

Non-image files attached to this report
H1 SUS
arnaud.pele@LIGO.ORG - posted 19:19, Tuesday 29 October 2013 (8308)
ITMX measurements still pending

I had trouble getting ITMX data with MATLAB from last Friday's measurement because of a typo I made in the code when updating the new channel names. However, I was able to recover the main chain data with DTT using the right channel names, and it looks healthy; cf. the attached document.

I ran a new set of measurements overnight on Monday for the reaction chain, but this time it failed because of a drive issue. The details have been logged by Jeff, cf. aLOG 8279.

So I started again tonight and it seems to be working fine for now.

Non-image files attached to this report
H1 SUS (CDS)
jeffrey.kissel@LIGO.ORG - posted 18:50, Tuesday 29 October 2013 - last comment - 20:04, Tuesday 29 October 2013(8306)
h1susb123 Computer Woes
D. Barker, J. Batch, J. Kissel, A. Pele

Arnaud has been having trouble this week taking transfer functions on ITMX. After a lot of chasing our tails, and finding a few bugs in the infrastructure work that I've been doing (see LHO aLOG 8247, and  [sorry for the lack of aLOGging, slaps own wrist, was on a deadline for Stuart]), we finally discovered that the analog "keep-alive" signal that is sent from the I/O chassis to the AI Chassis for all the SUS on the h1susb123 (H1SUSITMX, H1SUSITMY, H1SUSBS) was failing. It had apparently failed on Sunday at 7pm PT, when I was here fixing an unrelated bug in the *library parts* for the QUAD (which means, if it *was* me, it would have affected all QUAD models, and Arnaud has successfully driven / damped H1SUSETMX since Sunday). It's now fixed, with a hard power cycle of the front end and IO chassis. (Some rather upsetting) Details below.

---------

The symptoms we identified:
- The IOP Model output on the SUS OVERVIEW screen was showing zeros when we expected to have output signal
- The IOP's GDS_TP screen showed the 5th bit was red -- explained on pg 18 of T1100625 to be 
     "Anti-imaging (AI) chassis watchdog (18bit DAC modules only / [on the] IOP [screen] only): For 18 bit DAC modules, the IOP [front end model] sends a 1 [Hz] heartbeat to the connected AI chassis via a [16 bit] binary output [card] on these modules. The AI chassis, in turn, returns a bit via the DAC binary input register to indicate receipt."
(Don't worry, even with my [edits], it still doesn't even make sense, even to *me*.)
- The LED near the input of the AI chassis was OFF (not red, just off)
- The switch to flip in the DAC duotone signal on the 31st channel of ADC0 in the IO chassis, which is controlled by the same 16 bit BIO card, was malfunctioning in that -- when watching the 31st channel of ADC0 in dataviewer -- we saw the signal flip from noise to *zero* instead of noise to the typical pretty duotone sinewaves.

Welp. Looks like we need to add yet ANOTHER watchdog layer to the overview screen. 

Dave's conjecture is that somehow this 16-bit BIO card got into a bad state on Sunday. It's unclear how, though, since I was just stopping and restarting the user models, and was not playing with the IOP, nor was I turning the front end or the IO chassis on and off.

Anyways. How do we solve any problem with computers? Power down and power back up. *sigh*.

There's now a reasonably successful procedure for gracefully bringing a front end / IO chassis up and down, without affecting other front ends. Here's what I got from picking Dave's brain and watching over his shoulder (a condensed sketch of the shutdown commands follows the steps):
(1) Kill the user model processes running on that front end.
     ]$ ssh h1susb123
     ]$ killh1susitmy
     ]$ killh1susitmx
     ]$ killh1susbs
(2) Kill the IOP process running on the front end
     ]$ killh1iopsusb123
(3) Remove the front end from the IPC / Dolphin network, so you don't crash every other front end. Note: you should only do this step if you're powering down the front end and IO chassis; it's not necessary when just stopping and starting front-end processes.
      ]$ sudo -s
      ]$ /opt/DIS/sbin/dxtool prepare-shutdown 0
(4) Turn off the front end gracefully* (still as super user)
      ]$ poweroff
* This didn't work for us. The front end powered down, and then immediately began rebooting itself and bringing all the models back up. So, we had to 
   - wait for it to finish rebooting and bringing up the models
   - kill the front end processes again
   - Go into the Mass Storage Room (MSR) and hold the power button until it powered down.
(5) Turn off the IO chassis by going to the CDS Highbay, and flicking the rocker switch on the front of the chassis.**
** This doesn't work FOR ANY IO CHASSIS. Jim informs me that the rocker switch is wired to the wrong pins on the motherboard. For every IO chassis. Yeah. So, one has to disconnect the DC power in the back of the rack by unscrewing properly secured cables, risking powering down the chassis unevenly. Similarly on power up. TOTAL BADNESS. Apparently at LLO, they've installed lamp-style rocker switches right on the cable to work around this problem and badness. (a) Why don't we have this already at LHO? (b) Was this an accepted, global, CDS fix? (c) Why can't we just re-wire the IO chassis?
(6) Turn on the IO chassis via the same rocker switch in the front (assuming you've reconnected the DC power and, like I did, flipped the rocker to the off position beforehand expecting it to work.)
(7) Use monit (the remote controller of front ends' power that I still know too little about) to gracefully turn on the front end.
Upon power up, the front end is gracefully inserted back into the Dolphin network, the IOP front-end process is restarted, and then the user front-end processes***.
*** Because I've been making a bunch of changes to the EPICS variables in these models, and haven't yet had the chance to update the safe.snaps, the start-up process takes much longer to restore the snap (trying to reconcile the differences, I presume), which means the $(IFO):FEC-$(DCUID)_BURT_RESTORE flag doesn't get set before the process looks for its timing synchronization signal, and it just hangs there red and dead claiming no sync. You have to then hit the button (when the EPICS gateway catches up, some time later), restart the front-end process (which captures that this bit is now set), then it happily picks up the IOP timing sync and springs back to life.
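Condensed, the shutdown half of the above looks roughly like this (steps 1 through 4, using h1susb123 as in this instance; the IO-chassis power handling of steps 5 and 6 is deliberately left out, and see Keith's comment below for a cleaner one-line shutdown):

     ]$ ssh h1susb123
     ]$ killh1susitmy ; killh1susitmx ; killh1susbs    # step 1: stop the user models
     ]$ killh1iopsusb123                               # step 2: stop the IOP
     ]$ sudo /opt/DIS/sbin/dxtool prepare-shutdown 0   # step 3: only if powering the machine down
     ]$ sudo poweroff                                  # step 4: may still need the physical button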


That's the process. Don't you feel better? 
Comments related to this report
keith.thorne@LIGO.ORG - 20:04, Tuesday 29 October 2013 (8309)
If the front-end is not locked up, you can simply shut it down with:
sudo shutdown -hP now
   (This command will shutdown the Dolphin client as well)

If you want to shutdown all models on a front-end:
sudo /etc/kill_models.sh

We have a lot of power outages at LLO, hence the invention of in-line power kill switches, as it is a long way to the DC power room.  David K. may already have them fabbed - we will check.

H1 CDS (SUS)
james.batch@LIGO.ORG - posted 10:16, Tuesday 29 October 2013 - last comment - 20:06, Tuesday 29 October 2013(8296)
Restarted models on h1sush2b
At about 9:03 PDT, there was a timing error detected by the DACs on h1sush2b, which resulted in all outputs being set to 0.  The only remedy is to restart the IOP model, so I killed h1susim, restarted h1iopsush2b, then started h1susim again.  This has cleared the DAC error.  I burt restored to 8:00 PDT.
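For reference, the manual sequence amounts to something like the following on h1sush2b (the kill/start script names follow the usual RCG naming convention; see Keith's comment below for the one-command version):

  killh1susim         # stop the user model first
  killh1iopsush2b     # then stop the IOP
  starth1iopsush2b    # restart the IOP to clear the DAC timing error
  starth1susim        # bring the user model back, then burt restore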
Comments related to this report
keith.thorne@LIGO.ORG - 20:06, Tuesday 29 October 2013 (8310)
A simple 'sudo /etc/startWorld.sh' will do all this.