J. Kissel, R. McCarthy, M. Pirello, O. Patane, D. Barker, B. Weaver

2025-04-06 Power outage: LHO:83753

Among the things that did not recover nicely from the 2025-04-06 power outage was the +18V DC power supply to the SUS ITMY / ITMX / BS rack, SUS-C5. The power supply lives in VDC-C1 U23-U21 (left-hand side if staring at the rack from the front); see D2300167. More details to come, but we replaced both +/-18V power supplies, and the SUS ITMY PUM OSEM satamp did not survive the power-up, so we replaced that too.

Took out:
+18V Power Supply S1300278
-18V Power Supply S1300295
ITMY PUM SatAmp S1100122

Replaced with:
+18V Power Supply S1201919
-18V Power Supply S1201915
ITMY PUM SatAmp S1000227
There was a PSL Beckhoff chassis that needed to be powered on. There is an alog saying that I configured the PSL PLC and IOC to start automatically, so maybe that powered-off chassis is what kept them from doing so?

I physically power cycled the BRS Beckhoff machine at end X. It was unreachable from remote desktop and was in a bad, frozen state when I connected to it from the KVM switch.

I started the end X NCAL PLC and IOC, the end X mains power monitoring PLC and IOC, and the corner station mains power monitoring PLC and IOC.
On the end X mains power monitoring Beckhoff machine I had to enable the tcioc firewall profile for the private network as well.
Oli, Ibrahim, RyanC
We took a look at the OSEMs' current positions for the suspensions post power outage to make sure the offsets are still correct. The previously referenced "golden time" was GPS 1427541769 (the last DRMI lock before the vent). While we did compare against this time, we mainly set them back to where they were before the power outage. (A sketch of one way to pull these archived values is below, after the lists of values.)
Input:
IM1_P: 368.5 -> 396.1, IM1_Y: -382.7 -> -385.4
IM2_P: 558.0 -> 792.0, IM2_Y: -174.7 -> -175.7
IM3_P: -216.3 -> -207.7, IM3_Y: 334.0 -> 346.0
IM4_P: -52.4 -> -92.4, IM4_Y: 379.5 -> 122.5
SR2_P: -114.3 -> -117.6, SR2_Y: 255.2 -> 243.6
SRM_P: 2540.3 -> 2478.3, SRM_Y: -3809.1 -> -3825.1
SR3_P: 439.8 -> 442.4, SR3_Y: -137.7 -> -143.9
Output:
ZM6_P: 1408.7 -> 811.7, ZM6_Y: -260.1 -> -206.1
OM1_P: -70.9 -> -90.8, OM1_Y: 707.2 -> 704.5
OM2_P: -1475.8 -> -1445.0, OM2_Y: -141.2 -> -290.8
OM3_P: Didn't adjust, OM3_Y: Didn't adjust
Input:
PRM_P: -1620.8 -> -1672 (Changed by -51.2), PRM_Y: 579.6 -> -75.6 (Changed by -655.2)
PR2_P: 1555 -> 1409 (Changed by -146), PR2_Y: 2800.7 -> -280.8 (Changed by -3081.5)
PR3_P: -122.2 -> -151 (Changed by -28.8), PR3_Y: -100 -> -232.4 (Changed by -132.4)
MC1_P: 833.3 -> 833.3, MC1_Y: -2230.6 -> -2230.6 (No change)
MC2_P: 591.5 -> 591.5, MC2_Y: -580.4 -> -580.4 (No change)
MC3_P: -20.3 -> -20.3, MC3_Y: -2431.1 -> -2431.1 (No change)
Attached are plots showing the offsets (and their relevant M1 OSEMs) before and after each suspension was realigned.
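For reference, here is a minimal sketch of one way to pull these archived values back out, assuming NDS2/frame access from a control room machine with gwpy installed. This is not necessarily how the numbers above were actually read off; the channel shown is just the OM1 example quoted later in this log, and should be swapped for the optic/DOF of interest.

# Sketch: look up where an optic was sitting at the "golden" DRMI time
# (GPS 1427541769, quoted above) from the archived slow data.
from gwpy.time import tconvert
from gwpy.timeseries import TimeSeries

GOLDEN_GPS = 1427541769                      # last DRMI lock before the vent
CHANNEL = "H1:SUS-OM1_M1_DAMP_P_INMON"       # example channel; swap in the optic/DOF of interest

print("golden time in UTC:", tconvert(GOLDEN_GPS))

# Average a short stretch around the reference time to get a single
# "where was it pointing" number.
data = TimeSeries.get(CHANNEL, GOLDEN_GPS - 30, GOLDEN_GPS + 30)
print(f"{CHANNEL} around GPS {GOLDEN_GPS}: {data.value.mean():.1f}")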
Comparing QUAD, BS, and TMS pointing before and after the outage. All had come back up from the power outage with slightly different OPTICALIGN OFFSET values for P and Y: when systems came back up, the OPTICALIGN OFFSETS were read from the SUS SDF files, and since those channels aren't monitored by SDF they held older offset values. I set the offset values back to what they were before the power outage, but still had to adjust them to get the top masses pointed back to where they were before the outage.
'Before' refers to the OPTICALIGN OFFSET values before the outage, and 'After' is what I changed those values to in order to get the DRIFTMON channels to match where they were before the outage. (A sketch of scripting this check is below the table.)
SUS  | DOF | Before | After
ITMX | P   | -114.5 | -109.7
ITMX | Y   | 110.1  | 110.1 (no change)
ITMY | P   | 1.6    | 2.0
ITMY | Y   | -17.9  | -22.4
ETMX | P   | -36.6  | -45.7
ETMX | Y   | -146.2 | -153.7
ETMY | P   | 164.6  | 160.6
ETMY | Y   | 166.7  | 166.7 (no change)
BS   | P   | 96.7   | 96.7 (no change)
BS   | Y   | -393.7 | -393.7 (no change)
TMSX | P   | -88.9  | -89.9
TMSX | Y   | -94.3  | -92.3
TMSY | P   | 79.2   | 82.2
TMSY | Y   | -261.9 | -261.9 (no change)
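A minimal sketch of scripting this kind of check and restore with pyepics is below. The channel names follow the usual H1 SUS pattern for the QUAD top mass (M0) but are assumptions that should be verified against the MEDM screens before use; the written value is just the ITMX 'Before' number from the table above.

from epics import caget, caput

optic = "ITMX"
# Assumed channel names -- check against the SUS MEDM screens before trusting.
offset_ch = f"H1:SUS-{optic}_M0_OPTICALIGN_P_OFFSET"
drift_ch = f"H1:SUS-{optic}_M0_DRIFTMON_P"

print("current OPTICALIGN offset:", caget(offset_ch))
print("current driftmon:", caget(drift_ch))

# Restore the pre-outage ('Before') offset, then trim in small steps until
# the driftmon matches its pre-outage value.
caput(offset_ch, -114.5)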
The HAM7/8 suspensions were brought back; details in alog 83774.
Betsy, Oli, Camilla
We rechecked that everything was put back to this reference time, looking at the M1 DAMP channels for each optic, e.g. H1:SUS-OM1_M1_DAMP_{P,Y}_INMON.
For each ndscope, the first t-cursor is on the original "good DRMI time" we had planned to go back to, and the second t-cursor is on the time we actually went back to.
I checked the rest of the optics and verified that they all got put back to where they were pointing before the power outage. I've also used one of the horizontal cursors to mark where each optic was at the "good DRMI time", and the other cursor marks where the optic is currently. A scripted version of this check is sketched below.
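Here is a rough sketch of doing the same cross-check without ndscope, assuming these slow DAMP INMON channels are reachable through NDS2 via gwpy. The optic list and the 10-count tolerance are illustrative choices, not what was actually used.

from gwpy.time import tconvert
from gwpy.timeseries import TimeSeries

GOOD_DRMI_GPS = 1427541769            # "good DRMI time" from above
NOW_GPS = int(tconvert()) - 120       # a recent, already-archived stretch

for optic in ("OM1", "OM2", "SR2", "SR3", "SRM"):
    for dof in ("P", "Y"):
        chan = f"H1:SUS-{optic}_M1_DAMP_{dof}_INMON"
        then = TimeSeries.get(chan, GOOD_DRMI_GPS, GOOD_DRMI_GPS + 60).value.mean()
        now = TimeSeries.get(chan, NOW_GPS, NOW_GPS + 60).value.mean()
        flag = "" if abs(now - then) < 10 else "   <-- check this one"
        print(f"{chan}: {then:9.1f} -> {now:9.1f}{flag}")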
J. Warner, J. Kissel, M. Pirello

2025-04-06 Power outage: LHO:83753

Among the things that did not recover nicely from the 2025-04-06 power outage was the aLIGO BSC ISI Interface Chassis (D1002432) for Corner 1 of ISI ITMY. Page 1 of the wiring diagram for a generic BSC SEI (HPI and ISI) system, D0901301, shows that the ST1 and ST2 CPSs, ST1 L4Cs, and ST2 GS13s of a given corner (Corner 1, Corner 2, and Corner 3) are all read out by the same chassis. In the instance of this drawing, the rack for ISI ITMY is SEI-C4, and this Corner 1 chassis lives in SEI-C4 U40.

This morning, Jim identified that this particular interface chassis had failed -- after hearing that Tony tried repeatedly to re-isolate the platform last night and couldn't -- by opening the ISI ITMY overview screen and finding that all of these Corner 1 sensors' signals were reading out a "noisy zero", i.e. 0 +/- a few counts. See the attached time-machine screenshot, where the Corner 1 sensors are highlighted with yellow arrows showing less than 2 counts.

Jim, Marc, and I went out to the CER SEI-C4 and found that power cycling the chassis would not help. There happened to be a spare chassis resting on top of the existing Corner 1 chassis, so we just made the swap. We suspect that the chassis' power regulator didn't survive the current surge of the rack's DC power supplies losing power and coming back (perhaps unequally), but we'll do a post-mortem later.

Taken out:
Interface chassis S1201320

Replaced with:
Interface chassis S1203892

The attached images show the top of the SEI-C4 rack BEFORE and AFTER.
Slow start on vent tasks today -- besides the lack of water onsite this morning and the particularly nasty power outage, which are making things not come back up very well, we popped into HAM2 to keep moving on a few of the next steps. Corey has captured detailed pictures of components and layouts, and Camilla and I have logged all of the LSC and ASC PD serial numbers and cable numbers. We removed all connections at these PD boxes, and Daniel is out making the RF cable meter-length measurements. Ops were realigning all suspensions to a golden DRMI time they chose as a reference for any fall-back times. Jason and Ryan are troubleshooting other PSL items that are misbehaving. We are gearing up to take a side road from today's plan to look for flashes out of HAM2, to convince ourselves that the PSL PZT and the alignment restoration of some suspensions are somewhat correct.
Betsy and I also removed the septum plate VP cover to allow the PSL beam into HAM2 for alignment check work (alog 83794); it was placed on HAM1.
In preparation for disconnecting cables in HAM1, I turned off the following DC interface chassis:
LSC RF PD DC interface chassis in ISC R4 (REFL-A, POP-A among others), LSC RF PD DC interface chassis in ISC R1 (REFL-B among others), and ASC WFS DC interface chassis in ISC R4 (REFL_A, REFL_B among others).
Daniel will perform TDR to measure RF in-vac cable length from outside.
Turning off the DC interface for LSC REFL_B somehow interfered with FSS locking. It turns out that the DC interface provides power to (and maybe fast readback of the DC output of) the FSS RFPD.
Since the point of powering down was to safely disconnect the DC in-vac cable from LSC REFL_B, and since the cable was safely disconnected, I restored the power and the FSS relocked right away.
J. Kissel, M. Pirello

2025-04-06 Power outage: LHO:83753

Among the things that did not recover nicely from the 2025-04-06 power outage was the Timing Comparator D1001370 that lives in ISC-C2 U40 (see component C261 on pg 3 of D1900511-v9). The symptom was that its time-synchronizing FPGA was caught in a bad state, and the timing fanout in the CER Beckhoff status for the comparator was reporting that H1:SYS-TIMING_C_FO_A_PORT_13_NODE_UPLINKUP was in error (a channel value of zero instead of one).

We didn't know any of this at the start of the investigation. At that point, we only knew of an error by following through the automatically generated "SYS" screens (see attached guide):
SITEMAP > SYS > Timing > Corner A button [which had a red status light] > TIMING C_FO_A screen Port 13, dynamically marked with a "C" for comparator [whose number was red, and whose status light was red] > Hitting the "C" opens the subscreen for TIMING C_FO_A NODE 13, which shows the red "Uplink Down" message in the middle right.
The screenshot shows the NODE 13 screen both in the "now" fixed green state and in a time-machined "broken" state.

Going out to the CER, we found that the status light for Digital Port 13 == Analog Port 14 on the timing fanout (D080534; ISC-C3 U11) was blinking. Marc tried moving the comms cable to analog port 16, because "sometimes these things have bad ports." That didn't work, so we moved it back to analog port 14. That port's comms fiber cable was not labeled, so we followed it physically to find its connection to the SQZ timing comparator (again in ISC-C2 U40, thankfully "right next door"), only to find its "up" status light also blinking.

Marc suggested that the comparators may lose sync, so we power cycled it. This chassis doesn't have a power switch, so we simply disconnected and reconnected its +/-18 V power cable. After waiting ~2 minutes, all status lights turned green. #FIXEDIT
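For the record, the error state is also visible directly from EPICS; a quick sketch with pyepics is below. This only reads the status bit named above, it doesn't diagnose the FPGA itself.

from epics import caget

chan = "H1:SYS-TIMING_C_FO_A_PORT_13_NODE_UPLINKUP"
value = caget(chan)
print(chan, "=", value, "(1 = uplink up, 0 = the error state we saw)")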
I've been trying to recover seismic systems this morning after the power outage. So far I've gotten the HEPI pump stations back up, and all the ISIs are damping. Worked with Patrick to get both BRSs back up and running, but their heating was off for a while and it will take some time for them to get back to a good state.
ETMY and HAM8, however, have some issue with the binary readbacks for their coil drivers. I've checked the racks and the overtemp relay lights are all good, but the binary readbacks indicate a number of the overtemp relays are tripped. Not sure what the cause is yet, but fixing these isn't a priority today. Working with Dave, I power cycled both the binary input chassis and the coil drivers for HAM8, but that didn't fix the readbacks. Similar to this alog, we just put in test offsets to get the bit word clean and will wait for Fil to be available to try opening the expansion chassis. These tables should be able to isolate now, not that they are needed. Dave has filed an FRS and the test offset shows up in SDF.
And now the EY bit word has fixed itself, so I've removed the fix Dave put in.
As of 12:16 here is the latest CDS not-working list (some cleared from last list, some added).
Slow controls DEV4 EX chassis terminal missing
NCALEX
Diode Room Beckhoff (h1pslctrl0)
BRS EX and EY
HWS
EX Mains Mon
Weather stations (EX and EY)
PWRCS and PWREX
R. Short, J. Oberling, P. Thomas, J. Hanks
We restarted the PSL after yesterday's power outage. Some notes:
Once the PSL Beckhoff was restarted we noticed that the output power seemed unusually low; we traced this to the calibration settings being out of date. When running a newer version of the software we have to remember to grab the persistent settings file (port_851.bootdata from C:\TwinCAT\3.1\Boot\PLC), which holds all of the trip points, sensor calibrations, and a running tally of operating hours. Not sure why the system lost this information now; I don't think we've ever seen this happen with a system restart and I currently have no explanation.

We were able to get the PD and LD monitor calibration settings from my alog from February, when I changed the pump diode operating currents, and were able to grab the operating hours using ndscope (we looked at what they were reading when the power outage happened and set the operating hours back to that point + 1 hour, since the system had been running for ~1 hour at that point).

One thing to note, however: the persistent operating hour data is now completely wrong. The software tracks operating hours 2 ways: a user-updatable value and a locked value. The former allows us to change operating hours when we install a new component (like swapping chillers or installing a new NPRO/Amplifier after a failure), while the latter tracks total uptime of said components (i.e. the total number of hours an NPRO, any NPRO, has been running in the system). It's these latter operating hours that are completely bogus, as we have no way to update them if the persistent settings file is lost (as I said, it's a locked value).
For future reference I've attached a screenshot of the current system settings table; the operating hours that are now wrong are in the column labeled OPHRS A.
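A sketch of that trend lookup done with gwpy instead of ndscope is below. The operating-hours channel name is a placeholder (not a real H1 channel), and the outage GPS time shown is only the date; both would need to be read off the PSL Beckhoff screens and ndscope.

from gwpy.time import tconvert
from gwpy.timeseries import TimeSeries

# Placeholders -- substitute the real PSL operating-hours channel and the
# actual GPS time of the 2025-04-06 power outage.
OPHRS_CH = "H1:PSL-EXAMPLE_NPRO_OPHRS"        # NOT a real channel name
OUTAGE_GPS = int(tconvert("2025-04-06 00:00:00"))

# Last archived value just before the outage, plus the ~1 hour the system
# had been running after it came back up.
hours = TimeSeries.get(OPHRS_CH, OUTAGE_GPS - 600, OUTAGE_GPS).value[-1]
print("operating hours to restore:", hours + 1.0)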
PSL Beckhoff network connection was restored with the following command:
netsh interface ipv4 add neighbors "Ethernet" 10.105.0.1 <mac-address>
where "Ethernet" was the name of the interface as given by "ipconfig"
and <mac-address> is the MAC address of the 10.105.0.1 gateway.
This adds a permanent entry to the arp table.
A similar entry had to be added to the RGA workstation to communicate with the corner station RGAs after the last network upgrade, but that arp entry had to be deleted after the installation of sw-lvea-aux1 using

netsh interface ipv4 delete neighbors ....

See this Microsoft KB entry: https://support.microsoft.com/en-us/topic/cannot-delete-static-arp-entries-by-using-the-netsh-command-on-a-computer-that-is-running-windows-vista-or-windows-server-2008-08096675-0a9b-81b3-b325-6438af4450bc
Mon Apr 07 10:07:39 2025 Fill completed in 7min 36secs
BSC8 annulus volume has been vented with nitrogen. Randy has been notified that door bolt removal can proceed.
Came in to find all IFO systems down. Working through recovery now.
The SUSB123 power supply seemed to have tripped off. ITM and BS OSEM counts were all sitting very low, around 3000. Once the power supply was flipped back on, OSEM counts returned to normal for ITMX and BS, but now the ITMY coil driver filter banks are flashing ROCKER SWITCH DEATH. Jeff and Richard are back in the CER cycling the coil drivers to hopefully fix that.
Power for ISI ITMY is also down and is being worked on to bring it back.
Most vent work is currently on hold while the focus is on getting systems back online.
Cycling the coil drivers worked to fix that issue with the ITMY coil drivers. They needed to turn the power back off, turn the connected chassis off, then turn the power back on and then each chassis back on one by one.
The ITMY ISI GS13 that failed was replaced, and work is still going on to bring the ITMY ISI back.
There are some timing errors that need to be corrected and a problem with DEV4 at EX.
Once the ISI was back, Elenna and I brought all optics in HAM7/8 back to ALIGNED. Elenna put ZM4 and FC1 back to their pre-power-outage positions, as they had changed ~200 urad from the SQZ/FC ASC signals being zeroed. Everything else (ZM1, ZM2, ZM3, ZM5, OPO, FC2) changed by <5-10 urad, so we're leaving them as is.
The counts were pretty low over the weekend, peaking at ~30 counts of 0.3 um particles and 10 counts of 0.5 um particles.
The EY dust monitor died with the power outage and has not come back with a process restart; it's having connection issues.
The EY dust monitor came back overnight.
By no means a complete list and in no particular order:
FMCS EPICS
CER Timing Fanout 14th port
Diode room PSL Beckhoff
HEPI pump controller, corner station
Slow controls DEV4 terminal error
h1cdsrfm long range dolphin
DTS
ncalex
BRS EX, EY
HWS
EX mains mon
FMCS-EPICS and the DTS have been recovered