
SSDS Operation

This document covers the basics of how the SSDS operates. The idea is that if you see certain symptoms, you can come here and find the basics of how to restart things to try to get them running again.

Web Pages

  1. Main Page: http://new-ssds.mbari.org/ssds/
  2. Device Listing: http://new-ssds.mbari.org/ssds/device.jsp
  3. To create a new device: http://new-ssds.mbari.org/ssds/newDevice.jsp

Servers and Restarting

SSDS has three main components that run to keep the system operational.

SSDS Ingest/Transmogrify

The SSDS Ingest/Transmogrify components are message-driven beans that run inside a JBoss container on a server named ‘bob.shore.mbari.org’. They write packets to the database on dione.mbari.org.
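
A quick way to check that this component is up (a sketch; the ‘JBoss’ service name is taken from the restart steps below) is to ssh into bob and look at the service and the java process:

    sudo systemctl status JBoss    # should report active (running)
    ps -ef | grep java             # the JBoss java process should be listed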

SSDS Core

The main parts of the SSDS (including the web application) run on a server named new-ssds.mbari.org, as components inside a JBoss container on that server.
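
A quick health check here (a sketch; assumes curl is available wherever you run it) is simply to see whether the web application answers:

    curl -I http://new-ssds.mbari.org/ssds/    # an HTTP 200 response means the web app is up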

SSDS Updatebot

The last component is a standalone Java application running on pismo.shore.mbari.org called updatebot. It listens for metadata packets that are republished from new-ssds.mbari.org and also crawls all the metadata in the metadata database on dione. It’s responsible for finding files that need conversion to NetCDF and doing that conversion.
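
A quick check that UpdateBot is alive (a sketch, using the service name and log path from the UpdateBot section below) is to ssh into pismo and run:

    sudo systemctl status updatebot           # should report active (running)
    tail /opt/ssds/logs/ssds-updateBot.log    # timestamps should show recent activity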

Restarting everything

The order of restarting things does matter for the SSDS system. To restart everything cleanly, follow the steps below (a condensed per-host command summary appears after the list):

  1. Open a terminal/command line window and ssh into bob.shore.mbari.org as yourself. Run:

    sudo systemctl stop JBoss
    
  2. Wait for the java process to stop. You can check for the java process by running:

    ps -ef | grep java
    
  3. In another terminal/command line window, ssh into pismo.shore.mbari.org as yourself, then switch over to the ssdsadmin user by running the command below (NOTE: the last time I did this, the systemctl command in the next step asked me for the ssdsadmin password, which I don’t think exists, so I exited back to my own login and ran the systemctl command from there, and that worked OK):

    sudo -u ssdsadmin -i
    
  4. Now, stop the updatebot process by running:

    sudo systemctl stop updatebot
    
  5. Wait for the java process to stop again, using ‘ps -ef | grep java’ and looking for the updatebot process.

  6. In yet another terminal/command line window, ssh into new-ssds.mbari.org as yourself and then change into the JBoss bin directory using:

    cd /opt/jboss/bin
    
  7. You can then shut down the JBoss server using:

    ./shutdown.sh -S -s 134.89.2.25
    
  8. Wait for the java process to stop, again using ‘ps -ef | grep java’ and looking for the JBoss process.

  9. Change into the JBoss log directory using:

    cd /opt/jboss/server/default/log
    
  10. Now restart the JBoss server on new-ssds by running:

    sudo systemctl start jboss
    
  11. You can monitor the startup of the JBoss server by running:

    tail -f server.log
    
  12. Watch the log file; you will know the JBoss server has started when you see the line containing “Started in xxxx”. The log file may continue to show more entries after startup, and you may even see errors related to JMS as it tries to connect to the JMS server on bob.shore.mbari.org, which has not been started yet. Leave the log file tailing in that window.

  13. Go back to the terminal window that is connected to bob.shore.mbari.org and run:

    sudo systemctl start JBoss
    
  14. Watch the startup logs, looking for the “Started in xxxx” line, by running:

    tail -f /opt/jboss/server/default/log/server.log
    
  15. Back in the new-ssds window where the log file tail is running, you should see “Reconnected to JMS provider”; this is new-ssds connecting to JMS on bob.

  16. Now, move over to the window that is connected to pismo.shore.mbari.org
  17. Move to the logs directory by using:

    cd /opt/ssds/logs
    
  18. Start the UpdateBot Java client by running:

    sudo systemctl start updatebot
    
  19. You can then tail the UpdateBot log file by running:

    tail -f ssds-updatebot.log
    
  20. You should see a bunch of log entries start to fly by; that is how you know UpdateBot is crawling the SSDS metadata.

  21. Visit http://new-ssds.mbari.org/ssds/
  22. If you see a web page, everything should be good to go.
  23. You can Ctrl-C out of the log file tailing in all three windows and log out.
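
For quick reference, the whole sequence above boils down to the following per-host commands (a condensed sketch of the steps above, not a script to run blindly; run each block in its own ssh session on the named host, in order, and wait for the condition noted in the comment before moving on):

    # 1. bob.shore.mbari.org -- stop Ingest/Transmogrify
    sudo systemctl stop JBoss
    ps -ef | grep java                                  # wait until the JBoss java process is gone

    # 2. pismo.shore.mbari.org -- stop UpdateBot
    sudo systemctl stop updatebot
    ps -ef | grep java                                  # wait until the updatebot java process is gone

    # 3. new-ssds.mbari.org -- bounce the core JBoss server
    cd /opt/jboss/bin
    ./shutdown.sh -S -s 134.89.2.25
    ps -ef | grep java                                  # wait until the JBoss java process is gone
    sudo systemctl start jboss
    tail -f /opt/jboss/server/default/log/server.log    # wait for "Started in xxxx"

    # 4. bob.shore.mbari.org -- start Ingest/Transmogrify
    sudo systemctl start JBoss
    tail -f /opt/jboss/server/default/log/server.log    # wait for "Started in xxxx"; new-ssds should then log "Reconnected to JMS provider"

    # 5. pismo.shore.mbari.org -- start UpdateBot
    sudo systemctl start updatebot
    tail -f /opt/ssds/logs/ssds-updatebot.log           # crawl activity should start appearing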

Raw Data Monitor

I have a cron job running on pismo under my account (kgomes) that runs at the top of the hour and checks certain data streams to see if there is recent data (within the last 5 hours). The script is located at:

    /u/kgomes/scripts/ssds_data_checker.pl
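
Since the job runs at the top of the hour, the crontab entry under the kgomes account presumably looks something like the following (a guess at the exact form; run ‘crontab -l’ as kgomes on pismo to see the real entry):

    0 * * * * /u/kgomes/scripts/ssds_data_checker.pl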

To change who gets the emails, edit the script and change the comma-separated list of email addresses on line 11, which looks like:

    my $to = 'To: kgomes@mbari.org';

To change the length of the data gap to allow, change line 23, which reads:

    my $expectedModificationInterval = .21;

The number here is actually a fraction of 24 hours (I know, weird), so .21 works out to roughly 5 hours (.21 × 24 ≈ 5), which matches the 5-hour window mentioned above. If you want to edit the list of instruments that are being monitored, you edit the file:

    /u/kgomes/scripts/rawPacketIdsToMonitor.txt

It’s pretty self-explanatory. If a particular instrument fails, I will edit this file and comment out the particular instrument ID so it doesn’t keep throwing messages.

UpdateBot

There is a Java process called ‘UpdateBot’ that runs on pismo and handles all the metadata updates when mooring metadata and AUV missions are uploaded. Oftentimes it just hangs or loses its mind somehow. The process connects to the SSDS instance running on new-ssds.mbari.org and listens for JMS messages that contain metadata. It also automatically crawls all the metadata in SSDS every 12 hours to see if anything needs processing. One way to see if it’s working is to check both the process listing and the log files. To check the process listing, ssh into pismo and run:

    ps -ef | grep java

and you should see a process listing that looks like:

    ssdsadm+  83796     1  7 11:13 ?        00:00:25 /usr/java/jdk1.8.0_121/bin/java -Duser.timezone=UTC -Xms512m -Xmx1024m -classpath /opt/ssds/updatebot/ssds-updatebot-client-new-ssds.jar moos.ssds.clients.updateBot.UpdateBotRunner

If there is no process listed like this, chances are it died. The process ID file (PID file) is located at /var/tmp/updatebot.pid and should contain the PID that was assigned when it was last started. To check the log file, you can tail the /opt/ssds/logs/ssds-updateBot.log file and see if the date and time stamps show any activity in the last 12 hours. If the dates are more than a day old, something is wrong and it should be restarted.
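
A couple of one-liners that make those checks quicker (a sketch, using the PID file and log path above):

    ps -p $(cat /var/tmp/updatebot.pid)                  # is the PID from the PID file still running?
    find /opt/ssds/logs/ssds-updateBot.log -mmin -720    # prints the log path only if it was modified in the last 12 hours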

You need sudo privileges to run systemctl. To stop updatebot, use:

    sudo systemctl stop updatebot

To start updatebot use:

    sudo systemctl start updatebot

And then tail the log file in /opt/ssds/logs/ssds-updateBot.log and you should see activity as it crawls the metadata on startup.

Troubleshooting

Symptom: Data stops flowing in high data rate instruments

There are usually two possible problems here: either the sender (usually SIAM, the JMS producer) or the receiver (SSDS). I created a new JBoss (now called Wildfly) server running on ssds-ingest.shore.mbari.org just for this experiment because the publishing rates are so high (10 Hz). The old JMS would not work, so I spun up this new server. The first thing I usually do is check to see if there are any errors on the Wildfly instance running on ssds-ingest.shore.mbari.org. To check this:

  1. ssh into ssds-ingest.shore.mbari.org (you might have to check with IS to get access).
  2. cd to /opt/wildfly-10.0.0.Final/standalone/log and run ‘tail -n 500 server.log’ or you can ‘tail -f server.log’.
  3. If there are very current stacktrace exceptions (or tail -f shows them happening now), it could be that the JMS got backlogged and is having problems keeping up with the flow of messages. Sometimes this can happen if there are network problems as the component running here is trying to write entries to a SQL database on dione. If the SQL server was rebooted this can cause issues as well. If it looks like this could be the problem, I just restart wildfly using:
    sudo /sbin/service wildfly stop
    ps -ef | grep java      # keep running this until the wildfly java process is gone
    sudo /sbin/service wildfly start
    

If you don’t see anything really wrong on the SSDS side here, you can ping Tom O’Reilly, who is in charge of the SIAM instance that is collecting data from the instruments and publishing to SSDS.