Overview

A set of software scripts written by Rich Schramm captures the engineering data for the MARS node. The scripts run in a couple of different places. The general flow is: Python scripts running on the pmacs machine gather the data and write it to CSV files on the pmacs server, which are then exposed via a web server. Scripts running on pismo read those files, figure out which data has not yet been published, and publish the new data to a RabbitMQ exchange. Consumers running on pismo read in those messages and write the data to the SSDS_Data database on Dione. There are also scripts on pismo that monitor the number of messages in the RabbitMQ queues and make sure there are enough consumers to read in the data.

    flowchart TB
      subgraph Beach Hut and Wetside
        npc["NPC Node Power Controller"]
        mvc["MVC Medium Voltage Converter"]
        lvc["LVC Low Voltage Converter Controller"]
        10kv["10KV DC Power Supply"]
        pscPower["PSC Power Supply Controller"]
        ha7["HA7 Onewire Sensor Network"]
      end
      subgraph MARS Lab
        subgraph pmacs-server["pmacs.mars.mbari.org server"]
          uw-pmacs-server["UW PMACS Server"]
          psc2web["psc2web.py"]
          onewire2web["onewire2web.py"]
        end
        pmacs-console["PMACS Console"]
        npc-files[("NPC CSV Files")]
        onewire-files[("onewire daily files")]
        psc-files[("psc daily files")]
        apache["Apache Web Server"]
      end
      subgraph messaging.shore.mbari.org
        management-console["Management Console"]
        subgraph ssds-vhost
          1772-queue["1772 Queue"]
          1773-queue["1773 Queue"]
          xxxx-queue["XXXX Queue"]
        end
      end
      subgraph pismo.shore.mbari.org
        subgraph npc2rabbit["npc2rabbit.py"]
          external-loads["External Loads 1-8"]
          internal-loads["Internal Loads (LV)"]
          ground-fault["GroundFault Test (GF)"]
          medium-vc["Medium Voltage Converter (MV)"]
        end
        psc2rabbit["psc2rabbit.py"]
      end
      subgraph dione
        ssdsdb[("SSDS_Data Database")]
      end
      subgraph pismo-2["pismo.shore.mbari.org"]
        qmonitor["qmonitor.py"]
        1772-consumer["consumer_1.py 1772"]
        1773-consumer["consumer_1.py 1773"]
        xxxx-consumer["consumer_1.py XXXX"]
      end
      10kv --> pscPower
      npc --> mvc
      npc --> lvc
      mvc --> uw-pmacs-server
      lvc --> uw-pmacs-server
      uw-pmacs-server --> npc-files
      npc-files --> apache
      pscPower --> psc2web
      psc2web --> psc-files
      ha7 --> onewire2web
      onewire2web --> onewire-files
      onewire-files --> apache
      psc-files --> apache
      apache ---> psc2rabbit
      apache ---> npc2rabbit
      psc2rabbit -- PB --> 1772-queue
      psc2rabbit -- PB --> xxxx-queue
      npc2rabbit -- PB --> 1773-queue
      npc2rabbit -- PB --> xxxx-queue
      management-console <--> qmonitor
      qmonitor --> 1772-consumer
      qmonitor --> 1773-consumer
      qmonitor --> xxxx-consumer
      1772-queue -- PB --> 1772-consumer
      1773-queue -- PB --> 1773-consumer
      xxxx-queue -- PB --> xxxx-consumer
      1772-consumer --> ssdsdb
      1773-consumer --> ssdsdb
      xxxx-consumer --> ssdsdb

Notes:

  1. The PMACS Server (pmacs.mars.mbari.org) contains the raw NPC log files and an Apache web server
  2. External Loads 1-8, Internal Loads (LV), GroundFault Test (GF), and Medium Voltage Converter (MV) are pseudo-devices that are derived from the NPC
  3. npc2rabbit.py, psc2rabbit.py, onewire2rabbit.py, the RabbitMQ message 'Topic' exchange, psc2web.py, onewire2web.py, and the onewire and psc daily files are all code and data stores that were added for the purpose of storing data in the SSDS
  4. PB stands for "Google Protocol Buffers"
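The actual Protocol Buffers message layout used by these scripts is not documented here. Purely to illustrate the approach, a hypothetical schema for a single device sample might look like the following; every field name here is an assumption, not the real wire format:

    // Hypothetical .proto sketch -- the real MARS message schema is not
    // captured in this document; this only illustrates the PB approach.
    syntax = "proto2";

    message DeviceRecord {
      optional int64 device_id = 1;      // SSDS Device ID (e.g. 1772)
      optional int64 timestamp_ms = 2;   // sample time, epoch milliseconds
      repeated double values = 3;        // the parsed CSV fields for this sample
    }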

pmacs.mars.mbari.org

The first step in the process happens on the server pmacs.mars.mbari.org. All of the code that runs on the pmacs server is located in a BitBucket repo here. There is a crontab under the 'pmacs' account that runs two scripts every 15 minutes to check whether the onewire2web and psc2web Python scripts are running. Here are the two scripts at the time of this writing:

restart_psc_2web

    #! /bin/csh
    # if it's gone - start it
    set server_pid=`ps -ef | grep python | grep -v 'sh -c' | grep -v awk | awk '(/psc2web/){print $2}'`
    if ( "$#server_pid" == 0 ) then
        echo "the psc2web task is stopped - it is being restarted via a cron job"
        echo "psc2web restart attempted by cron" | mail -s "psc2web attempting restart" kgomes@mbari.org
        # echo "psc2web restart attempted by cron" | mail -s "psc2web attempting restart" dacr@mbari.org rich@mbari.org
        /usr/bin/python /home/pmacs/bin/psc2web.py &
    endif

restart_onewire_2web

    #! /bin/csh
    # if it's gone - start it
    set server_pid=`ps -ef | grep python | grep -v 'sh -c' | grep -v awk | awk '(/onewire2web/){print $2}'`
    if ( "$#server_pid" == 0 ) then
        echo "the onewire2web task is stopped - it is being restarted via a cron job"
        # echo "onewire2web restart attempted by cron" | mail -s "onewire2web attempting restart" rich@mbari.org
        # echo "onewire2web restart attempted by cron" | mail -s "onewire2web attempting restart" dacr@mbari.org kheller@mbari.org rich@mbari.org
        /usr/bin/python /home/pmacs/bin/onewire2web.py &
    endif

The cron scripts send their output to /var/lib/pmacs/archive/(psc|onewire) and the log files are usually empty unless something strange happens.
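The crontab entries themselves are not captured above. Assuming the restart scripts live in /home/pmacs/bin (their location is a guess; only the 15-minute schedule and the archive paths are stated above), they would look something like:

    # assumed crontab for the 'pmacs' account -- script paths are guesses
    */15 * * * * /home/pmacs/bin/restart_psc_2web >> /var/lib/pmacs/archive/psc/restart.log 2>&1
    */15 * * * * /home/pmacs/bin/restart_onewire_2web >> /var/lib/pmacs/archive/onewire/restart.log 2>&1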

pismo.shore.mbari.org

There are two pieces running on pismo. There are the scripts that read the CSV files from the Apache web server and publish messages to RabbitMQ and then there are the SSDS related processes that consume those messages and ingest them into the SSDS_Data database.

xxx2rabbit.py

There are a couple of Python scripts that run on pismo that read the CSV files from the Apache server on pmacs.mars.mbari.org and publish the new lines to the RabbitMQ exchange.
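The real xxx2rabbit scripts are not reproduced here. Purely as a sketch of the pattern, the loop below fetches new CSV text over HTTP (using a byte-range offset so only unread lines come back) and publishes one message per record. The URL, exchange name, and use of the raw CSV line as the message body are all assumptions; the real scripts serialize with Protocol Buffers.

```python
# Hypothetical sketch of the xxx2rabbit pattern: fetch new CSV lines from the
# Apache server on pmacs and publish each record to a RabbitMQ exchange.
# The CSV layout, exchange name, and message body format are assumptions.
import csv
import io
import urllib.request


def parse_csv_records(text):
    """Split raw CSV text into a list of field lists, skipping blank lines."""
    return [row for row in csv.reader(io.StringIO(text)) if row]


def publish_new_lines(url, channel, exchange, routing_key, offset=0):
    """Read the CSV file at `url` starting at byte `offset`, publish each
    record, and return the new offset to resume from on the next poll."""
    req = urllib.request.Request(url, headers={"Range": "bytes=%d-" % offset})
    with urllib.request.urlopen(req) as resp:
        text = resp.read().decode("utf-8", errors="replace")
    for row in parse_csv_records(text):
        # `channel` is assumed to be a pika-style channel object; the real
        # scripts publish Protocol Buffers payloads, not the raw line.
        channel.basic_publish(exchange=exchange,
                              routing_key=routing_key,
                              body=",".join(row))
    return offset + len(text.encode("utf-8"))
```

Tracking a byte offset is one way to "figure out which data has not been published"; the real scripts expose this as the -o/-r options described later in this document.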

qmonitor

There are scripts that run as separate processes that consume the messages and write them to the SSDS_Data database. The code repo for these scripts is in BitBucket. Under the ssdsadmin account, there is a crontab entry that looks like:

    0,10,20,30,40,50 * * * * /usr/bin/python /opt/ssds/qmonitor/bin/qmonitor.py >> /opt/ssds/qmonitor/log/cronjob.output

The qmonitor script first sets up a rotating log file for itself in /opt/ssds/qmonitor/log/qmonitor.log. It uses credentials stored in /etc/ssh/keys/ssdsadmin/.netrc to log in to the RabbitMQ management console. It then grabs the list of all the queues from the management console (as a JSON object) and parses each of the channels into a Python object that contains all the relevant channel information; in addition to the basic channel information, it gets information about consumers and publishers too. It then uses all that information to create Queue objects, which contain all the information about the queues on RabbitMQ.

The qmonitor script then loops over the list of Queues, checks that the queue name is all digits (by convention, any queue whose name is all digits is an SSDS Device ID and needs a consumer), and looks at the number of consumers for that queue. If there are none, it starts one using the consumer_1.py script.

Each consumer creates a log file named after the device ID in the /opt/ssds/qmonitor/log directory. The consumer creates a connection to the SSDS_Data database on Dione using credentials from the /etc/ssh/keys/ssdsadmin/.netrc file. It makes sure there is a table with the correct device ID in the database and creates one if it's not there. Next, it declares a durable queue and binds it to the exchange with a routing key that is the device ID. It then starts listening for messages; when a message arrives, it parses it using Protocol Buffers and writes an entry to the SSDS_Data database.
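The queue-selection step of qmonitor can be sketched as below. The dict keys mirror the JSON the RabbitMQ management API returns for queues; the consumer launch command is inferred from the paths mentioned in this document, and the exact argument handling of the real consumer_1.py is an assumption.

```python
# Hypothetical sketch of qmonitor's decision logic: from the management API's
# queue list, find the all-digit queues (the SSDS Device ID convention) that
# have no consumer attached, and start one consumer per unattended queue.
import subprocess


def queues_needing_consumer(queues):
    """Return names of queues whose name is all digits (device IDs, by
    convention) and that currently have zero consumers."""
    return [q["name"] for q in queues
            if q["name"].isdigit() and q.get("consumers", 0) == 0]


def start_consumers(queues):
    """Launch one consumer_1.py per unattended device-ID queue.
    The command line here is an assumption based on the paths above."""
    for device_id in queues_needing_consumer(queues):
        subprocess.Popen(["/usr/bin/python",
                          "/opt/ssds/qmonitor/bin/consumer_1.py", device_id])
```

Keying the convention off "name is all digits" is what lets qmonitor ignore RabbitMQ's own internal queues while still picking up any newly added device stream automatically.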

This doc is a capture of an old system diagram of the MARS data management pieces.

Dated approx: 03-Oct-2011

It's not quite right... the xxx2rabbit.py Python modules are shown pointing at the pmacs server; they should really point at the Apache web server, which provides access to the CSV files stored on the server.

[Picture1: original system diagram image]

  • Account is ssdsadm on pismo
  • There is a crontab that runs the following scripts:
    • Every minute: restart_npc_2rabbit
      • Looks for a PID for the npc2rabbit and if it does not find one, it starts one using /usr/bin/python /opt/ssds/mars2ssds/bin/npc2rabbit.py -p -s 30 -c 88000 -d 0.003 &
      • -p: turn on publishing
      • -s: sleep interval in seconds between HTTP page requests
      • -f: start date (localtime) as string 'yyyy-mm-dd'; default is today's date
      • -o: offset the initial HTML read by n bytes
      • -n: stop after n days; default is 100 years
      • -r: resume processing from the last run; overrides the -f argument!
      • -c: max number of lines to read each wakeup; minimum=1, default=3600
      • -d: delay (decimal seconds) between lines to throttle uploads to the target server; minimum=0.0, default=0.0
    • Every 15 minutes: restart_psc_2rabbit
      • Looks for a PID for psc2rabbit and if it does not find one, it starts one using /usr/bin/python /opt/ssds/mars2ssds/bin/psc2rabbit.py -p -s 30 -c 88000 -d 0.003 &
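The option list above can be expressed as an argparse definition. This is a reference reconstruction, not the real script's parser; only the defaults explicitly stated above (-c and -d) are encoded, and any behavior beyond parsing is out of scope.

```python
# Hypothetical argparse equivalent of the npc2rabbit.py / psc2rabbit.py
# options listed above; defaults follow the option descriptions where stated.
import argparse


def build_parser():
    p = argparse.ArgumentParser(prog="npc2rabbit.py")
    p.add_argument("-p", action="store_true",
                   help="turn on publishing")
    p.add_argument("-s", type=int,
                   help="sleep interval in seconds between HTTP page requests")
    p.add_argument("-f",
                   help="start date (localtime) as 'yyyy-mm-dd'; "
                        "default is today's date")
    p.add_argument("-o", type=int, default=0,
                   help="offset the initial HTML read by n bytes")
    p.add_argument("-n", type=int,
                   help="stop after n days; default is 100 years")
    p.add_argument("-r", action="store_true",
                   help="resume from the last run; overrides -f")
    p.add_argument("-c", type=int, default=3600,
                   help="max lines to read each wakeup (minimum 1)")
    p.add_argument("-d", type=float, default=0.0,
                   help="delay in seconds between lines to throttle uploads")
    return p
```

Parsing the crontab invocation shown above, `-p -s 30 -c 88000 -d 0.003`, yields publishing on, a 30-second poll interval, up to 88000 lines per wakeup, and a 3-millisecond inter-line delay.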

TODO

  • Move diagram to Mermaid
  • Add the Apache web server to the diagram between the pmacs data stores and the xxx2rabbit scripts running on pismo, since those scripts read the text files through the web server.
  • The XML files in the cfg directory were there with the idea that they would be published in the stream so that SSDS had the metadata for each stream, but we never implemented this.
  • Add a rolling logger to keep the number of log files down
  • Turn off debug mode in these scripts when running in production