MiniROV Data Processing
This page documents the software that integrates data from the MiniROV into the Expedition (Expd) database.
VideoLab personnel (Lonny) are responsible for creating the DiveNumber/Ship/Scientist entries using the MiniROV Divelog Application. This is a web application that connects to the MINIROV_DiveLog database on the MSSQL server on perseus and allows the user to enter and edit dives in that database. Currently only Lonny has the password, in order to reduce entry errors. Scheduled tasks on machine coredata(8) periodically pull new dive entries over into the Expd database. The MinirovDivelog application is described in detail in the DiveLog section of the se-ie-doc site.
Minirov_Divelog database to Expd Database Ingest
Note
Divelog ingest into Expd is completely asynchronous and decoupled from the routine raw minirov data file processing. Dive entries are convenient for downstream uses such as the Expd web app or 'by dive' data searches, but they are not needed for successful raw data ingest and processing.
Divelog data flow is shown below
An entry in the crontab (shown below) runs the runMinirovDivelogUploader shell script.
#merge from minirovdivlog database to Expd.Dive table
00 * * * * /u/coredata/minirov-cos8/scripts/runMinirovDivelogUploader >> /u/coredata/minirov/runlogs/minirovDivelogUploader.log 2>&1
The script itself consists of the following
#! /bin/sh
echo activating the python environment
source /u/coredata/minirov-cos8/venv-minirov/bin/activate
python /u/coredata/minirov-cos8/scripts/minirovdivelog2expd.py
exit 0
The shell script activates a python virtual environment and invokes the script minirovdivelog2expd.py that does all the work. minirovdivelog2expd.py queries the MINIROV_DiveLog Database for new Dive entries, and reformats them for entry into the EXPD database Dive table. The Dive insert is done using the sp_insertDive() stored procedure to ensure expd database referential integrity.
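As a rough illustration of the flow described above (find dives not yet in Expd, then insert each one through the stored procedure), here is a minimal sketch. The row field names (DiveNumber, Ship, ChiefScientist) and the parameter order passed to sp_insertDive are hypothetical; they are not taken from minirovdivelog2expd.py. The cursor is assumed to be any DB-API cursor using the %s parameter style, as pymssql does.

```python
# Hypothetical sketch: field names and the sp_insertDive parameter order are
# assumptions for illustration, not the contents of minirovdivelog2expd.py.


def find_new_dives(divelog_rows, expd_dive_numbers):
    """Return divelog rows whose dive number is not yet in the Expd Dive table."""
    known = set(expd_dive_numbers)
    return [row for row in divelog_rows if row["DiveNumber"] not in known]


def insert_dives(cursor, rows):
    """Insert each row through the stored procedure rather than a raw INSERT.

    `cursor` is any DB-API cursor that accepts %s-style parameters.
    Returns the number of rows submitted.
    """
    for row in rows:
        params = (row["DiveNumber"], row["Ship"], row["ChiefScientist"])
        cursor.execute("EXEC sp_insertDive %s, %s, %s", params)
    return len(rows)
```

Routing every insert through the procedure keeps the integrity checks in one place on the database side instead of duplicating them in the script.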
SEE ALSO Create an Expedition and links to MiniROV dives for non-MBARI ship Expeditions
Mini ROV Data Processing
The Pilot 'Contract'
- MiniROV pilot is responsible for logging data at sea and transferring it to shore. The landing root directory path is: //atlas.shore.mbari.org/ProjectLibrary/901123.ROV1k/Data/Logs
- Files will be of ‘type’ CTD, NAV, DVL, ROV
- Files are to be organized there to subdirs by ‘type’
- Note that ‘ROV_xxx’ files land in a subdir called 'Vehicle' instead of 'ROV'
- All files are CSV with header tags describing column names
- All of the data are timestamped with GMT seconds since 1-1-1970 (utcsecs)
- Logging starts/stops when the vehicle enters/leaves the water.
- Data files should start/end within a few seconds of each other.
- CTD and NAV are required.
- DVL (for altimeter) and ROV (for heading) are optional
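The contract above can be summarized in a few lines of Python. This is only a sketch written for Python 3: the subdirectory names for CTD, NAV and DVL are assumed to match the type names exactly, and the helper names are made up for illustration.

```python
from datetime import datetime, timezone

# Landing root and the ROV -> "Vehicle" exception come from the contract above.
# The exact spelling of the other subdirectory names is an assumption.
LANDING_ROOT = "//atlas.shore.mbari.org/ProjectLibrary/901123.ROV1k/Data/Logs"
SUBDIR_BY_TYPE = {"CTD": "CTD", "NAV": "NAV", "DVL": "DVL", "ROV": "Vehicle"}
REQUIRED_TYPES = {"CTD", "NAV"}  # DVL and ROV are optional


def landing_dir(file_type):
    """Directory where the pilot drops files of the given type."""
    return LANDING_ROOT + "/" + SUBDIR_BY_TYPE[file_type]


def utcsecs_to_datetime(utcsecs):
    """Convert seconds since 1970-01-01 (utcsecs) to an aware UTC datetime."""
    return datetime.fromtimestamp(float(utcsecs), tz=timezone.utc)


def missing_required(types_present):
    """List the required file types that a dive is missing."""
    return sorted(REQUIRED_TYPES - set(types_present))
```

For example, `missing_required(["CTD", "DVL"])` reports that NAV is absent.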
Step 1. Archive, inspect and register original data files
The schematic above shows pieces involved in the initial archive and registration of minirov raw data files.
The sequence diagram below shows the main actions between the pieces. Cron runs the runMinirovFileArchiver shell script. The shell script activates a python virtual environment and invokes the script minirovfilearchiver.py that does all the work. It is invoked four times, once for each file type.
The crontab entry on coredata8 to run the runMinirovFileArchiver shell script
#sweep files from Dales directories to atlas archive and register with database load table
30 17 * * * /u/coredata/minirov-cos8/scripts/runMinirovFileArchiver >> /u/coredata/minirov/runlogs/minirovFileArchiver.log 2>&1
The contents of the runMinirovFileArchiver shell script.
#! /bin/sh
echo activating the python environment
source /u/coredata/minirov-cos8/venv-minirov/bin/activate
python /u/coredata/minirov-cos8/scripts/minirovfilearchiver.py NAV
python /u/coredata/minirov-cos8/scripts/minirovfilearchiver.py CTD
python /u/coredata/minirov-cos8/scripts/minirovfilearchiver.py DVL
python /u/coredata/minirov-cos8/scripts/minirovfilearchiver.py ROV
exit 0
minirovfilearchiver.py
- Crawls the //Atlas/ProjectLibrary/901123.ROV1k/Data/Logs/[NAV|CTD|DVL|ROV] directories looking for data files that have not been registered in the MinirovLogfiles database table.
- Copies them to the //Atlas/ShipData/minirov/logs/[NAV|CTD|DVL|ROV]/YYYY directory as the official permanent archive.
- NOTE: files on atlas in ShipData/minirov are automatically write-protected (an IS admin is needed to modify or delete them)
- Files are also scanned to see if we can determine start/end times, number of records, and a crude test is done to see if columns contain ‘realistic’ data.
- Registers ALL files (even empty ones) in a database table on Expd (Table: MinirovLogfiles), along with start/endtimes etc. from above.
- On successful archive and registration, the isArchived bit is set in the MinirovLogfiles table and the isBlocked flag is cleared. The isBlocked and isProcessed flags are currently read in the query but not used; they are kept for backward compatibility with how other core ship/ROV data loads are handled.
- If 'valid' data are found within the file, the archiver also registers the file along with its time and number of records metadata in the type-specific load tracking table e.g. MinirovRawCtdLoad, MinirovRawNavLoad, MinirovRawRovLoad or MinirovRawDvlLoad (this is where device-specific processing is controlled and tracked by subsequent steps in the minirov workflow)
- The type-specific tables have flags initialized that ARE used by subsequent steps in the minirov workflow (isLoaded, isBlocked, isProcessed etc).
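The file scan described in the list above (start/end times, record count, and a crude realism check) might look roughly like the following Python 3 sketch. The time column name `utcsecs`, the cutoff of 2000-01-01, and the monotonic-time test are illustrative assumptions; the checks in minirovfilearchiver.py may differ.

```python
import csv
import io


def scan_logfile(text, time_column="utcsecs", min_valid_secs=946684800.0):
    """Summarize a CSV log: start/end time, record count, crude validity flag.

    `time_column` and `min_valid_secs` (2000-01-01 UTC) are assumptions made
    for this sketch, not values taken from the real archiver.
    """
    reader = csv.DictReader(io.StringIO(text))
    times = []
    for row in reader:
        try:
            times.append(float(row[time_column]))
        except (KeyError, TypeError, ValueError):
            continue  # skip rows without a usable timestamp
    if not times:
        # Empty or unreadable files are still registered, just without metadata.
        return {"start": None, "end": None, "records": 0, "valid": False}
    # Crude realism test: plausible epoch and timestamps that never go backwards.
    valid = min(times) >= min_valid_secs and times == sorted(times)
    return {"start": min(times), "end": max(times),
            "records": len(times), "valid": valid}
```

A file that yields `valid` False would still get a MinirovLogfiles entry but, following the list above, would not be registered in a type-specific load table.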
Step 2. Minirov CTD Load
Nightly CTD processing on coredata(8)
The schematic above shows the pieces involved in the ingest of raw CTD data into the expd database.
Cron runs the runMinirovCtdUploader shell script.
00 20 * * * /u/coredata/minirov-cos8/scripts/runMinirovCtdUploader >> /u/coredata/minirov/runlogs/minirovCtdUploader.log 2>&1
#! /bin/sh
echo activating the python environment
source /u/coredata/minirov-cos8/venv-minirov/bin/activate
python /u/coredata/minirov-cos8/scripts/ctdfileloader.py
exit 0
The shell script activates a python virtual environment and invokes the script ctdfileloader.py that does all the work.
- This script queries the MinirovRawCtdLoad table to see if it can find any files that have not been loaded.
- If it finds a CTD file that has not been loaded, it parses the file and writes the data as rows into the proper MinirovRawCTD_YYYY table.
- It then updates the entry in the MinirovRawCtdLoad table to record whether the file loaded successfully or failed.
- Note: Clearing the isLoaded field of the load table will cascade delete dependent rows in the MinirovRawCTD_YYYY table if data needs to be backed out.
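Choosing the "proper MinirovRawCTD_YYYY table" can be sketched as below (Python 3). This assumes the year is taken from each sample's UTC timestamp and that the time field is called `utcsecs`; ctdfileloader.py may instead key on the file's start time.

```python
from datetime import datetime, timezone


def raw_ctd_table(utcsecs):
    """Name of the yearly raw CTD table for a sample time (epoch seconds, UTC)."""
    year = datetime.fromtimestamp(float(utcsecs), tz=timezone.utc).year
    return "MinirovRawCTD_%d" % year


def group_rows_by_table(rows, time_key="utcsecs"):
    """Bucket parsed rows by destination table.

    A file that spans New Year (UTC) ends up split across two yearly tables.
    """
    buckets = {}
    for row in rows:
        buckets.setdefault(raw_ctd_table(row[time_key]), []).append(row)
    return buckets
```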
Nightly CTD processing on draco
Final processing and merge into the expd database CTD tables for all ROVs is done by the Microsoft Task Scheduler on draco.
draco d:\EXPD_Data_Loads\RovCtd\processRawMinirovCtd.bat and processMinirovCtdData_perseus.pl
REM -- Run the perl script to process data from the raw ctd tables
perl D:\EXPD_Data_Loads\RovCtd\processMinirovCtdData_perseus.pl D:\EXPD_Data_Loads\Log\EXPD_processminirovctd.txt
exit /b 0
- Every night, ‘Task Scheduler’ on Draco (a Windows machine) runs a script that looks for files in the MinirovRawCtdLoad table that are loaded but have isProcessed = 0.
- On success, the core CTD raw 1-second and 15-second binned tables are loaded and the isProcessed field of the load table is set.
- Note: Clearing the isProcessed field of the load table will cascade delete dependent rows in the processed CTD data tables if data needs to be backed out.
- NOTE: We compute salinity using standard mbari algorithms
- NOTE: O2 QC flags are currently set to suspect.
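To make the idea of the 15-second binned table concrete, here is a small Python sketch of fixed-width time binning. The production processing is done by the Perl script on draco; the bin alignment (bins starting at multiples of 15 seconds) and the use of a simple mean are assumptions for illustration only.

```python
def bin_average(samples, bin_secs=15):
    """Average (utcsecs, value) samples into fixed-width time bins.

    Returns a list of (bin_start_secs, mean_value) sorted by bin start.
    """
    sums = {}
    for t, value in samples:
        bin_start = int(t // bin_secs) * bin_secs
        total, count = sums.get(bin_start, (0.0, 0))
        sums[bin_start] = (total + value, count + 1)
    return [(start, total / count)
            for start, (total, count) in sorted(sums.items())]
```

With 1-second input, each output row summarizes up to 15 raw samples.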
Step 3. Minirov NAV load (plus ROV,DVL)
Nightly Nav processing on coredata(8)
The schematic above shows the pieces involved in the ingest of raw minirov NAV data into the MBARI RovNavEdit processing stream.
Cron runs the runMinirovNavUploader shell script.
30 20 * * * /u/coredata/minirov-cos8/scripts/runMinirovNavUploader >> /u/coredata/minirov/runlogs/minirovNavUploader.log 2>&1
#! /bin/sh
echo activating the python environment
source /u/coredata/minirov-cos8/venv-minirov/bin/activate
python /u/coredata/minirov-cos8/scripts/minirov2mbsystem.py
exit 0
The shell script activates a python virtual environment and invokes the script minirov2mbsystem.py that does most of the work. It uses the MinirovRawNavLoad, MinirovRawCtdLoad, MinirovRawRovLoad and MinirovRawDvlLoad tables to find datasets that need to be processed into the MBARI RovNav editing pipeline. If it finds datasets that meet the criteria, it spawns the MBSystem program mbminirovnav to produce an 'mb165' format file and stages it in the RovNavEdit share on Atlas for video lab personnel to clean.
Note
The main criterion is that the files must all begin within 60 seconds of each other to be considered a dataset.
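The 60-second criterion can be sketched as follows. This is a hypothetical matching strategy, not the logic of minirov2mbsystem.py: it anchors on a NAV file, picks the closest-starting file of each other type, and treats NAV and CTD as required (per the pilot contract) with ROV and DVL optional.

```python
REQUIRED = ("NAV", "CTD")  # assumption: mirrors the pilot contract


def starts_agree(starts, tolerance_secs=60):
    """True when every start time is within tolerance_secs of every other."""
    return max(starts) - min(starts) <= tolerance_secs


def match_dataset(nav, candidates, tolerance_secs=60):
    """Pair one NAV file with the closest-starting file of each other type.

    `nav` is (filename, start_utcsecs); `candidates` maps a type name to a
    list of such tuples. Returns {type: filename}, or None when a required
    type has no file starting close enough.
    """
    dataset = {"NAV": nav[0]}
    starts = [nav[1]]
    for file_type in ("CTD", "ROV", "DVL"):
        files = candidates.get(file_type, [])
        if files:
            name, start = min(files, key=lambda f: abs(f[1] - nav[1]))
            if starts_agree(starts + [start], tolerance_secs):
                dataset[file_type] = name
                starts.append(start)
    if all(t in dataset for t in REQUIRED):
        return dataset
    return None
```

Checking the spread of all chosen start times, rather than each file against NAV alone, enforces "within 60 seconds of each other" for the whole set.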
The actual Expd database load is performed by Windows Task Scheduler on Draco. See D:\EXPD_Data_Loads\Navdata\load_minirovnav.bat
Installation Steps
First clone the repo from bitbucket into user coredata's home directory on coredata8
cd ~
git clone https://youruserid@bitbucket.org/mbari/minirovdatamanagement.git
You should now have a folder: ~/minirovdatamanagement
Next we set up a python2.7 virtual environment. The virtualenv tool for python 2.7 should already be installed in /bin by IS. Let's check it:
which virtualenv
/bin/virtualenv
virtualenv --version
virtualenv 20.0.20 from /usr/lib/python2.7/site-packages/virtualenv/__init__.pyc
So now we can create the virtual environment venv-minirov
cd ~
mkdir minirov-cos8
cd minirov-cos8
virtualenv venv-minirov
And test we can activate it
Note
We need to be in a bash shell, not csh, to activate the environment (csh does not work; see the [following issue](https://github.com/conda/conda/issues/3176)).
And while we are there, we can go ahead and install the pymssql package
cd ~/minirov-cos8/
bash
source ./venv-minirov/bin/activate
pip install pymssql
deactivate
exit
Next we make the scripts and runlogs directories and copy over the scripts from the repo source.
cd ~/minirov-cos8
mkdir scripts
mkdir runlogs
cp ~/minirovdatamanagement/scripts/* ./scripts
Next we patch the ‘official’ netrc.py that comes with python to comment out the lines at approx. 106-110 that reject a .netrc file whose permissions are too permissive. We currently use NTFS directories on atlas that do not expose unix-style file permissions. We will use the file netrc_2.7.patch from our scripts directory. The patched result will be placed in our local virtualenv python site-packages directory.
Note
This hack also requires that any python script that connects to the MSSQL server include a step that modifies the module search path so that site-packages is the first location python looks in, and therefore finds our modified netrc.py.
cd ~/minirov-cos8/scripts
cp /usr/lib64/python2.7/netrc.py ../venv-minirov/lib/python2.7/site-packages/netrc.py
patch -u -b ../venv-minirov/lib/python2.7/site-packages/netrc.py -i ./netrc_2.7.patch
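The search-path step mentioned in the note above can be sketched as follows. The helper name is made up for illustration, and the site-packages path is derived from the install steps on this page; the actual scripts may do this differently.

```python
import sys


def prefer_site_packages(site_packages_dir):
    """Put the virtualenv's site-packages first on sys.path.

    Must run before the first `import netrc` (directly or through another
    module): an already-imported module is cached in sys.modules and will
    not be looked up again.
    """
    if site_packages_dir in sys.path:
        sys.path.remove(site_packages_dir)
    sys.path.insert(0, site_packages_dir)


# Path taken from the install steps above (assumed home directory /u/coredata).
prefer_site_packages(
    "/u/coredata/minirov-cos8/venv-minirov/lib/python2.7/site-packages")
import netrc  # resolves to the patched copy when it exists at that path
```

If the directory does not contain a netrc.py, the import silently falls back to the standard library copy, so it is worth printing `netrc.__file__` once to confirm which one was loaded.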
Check for the correct perseus expd database login in the /u/coredata/.netrc file. I do not document that setup here for obvious security reasons. Talk to other developers in the group about how to configure that file.
Now we can check that the database connections work using the dbtest.py script.
cd ~/minirov-cos8/scripts
./runDatabaseConnectTester.sh
activating the python environment
(3, u'Schramm', u'Rich', u'rich@mbari.org', u'MBARI', True, True, UUID('b0331cc1-ac18-4ab3-9309-23fe51930f66'))
did it3
ID=3, LastName=Schramm
done
You will also need the MBSystem program mbminirovnav installed (See Karen Salamy)
Lastly make sure cron jobs are correctly set.
###### minirov processing #####
00 * * * * /u/coredata/minirov-cos8/scripts/runMinirovDivelogUploader >> /u/coredata/minirov-cos8/runlogs/minirovDivelogUploader.log 2>&1
30 17 * * * /u/coredata/minirov-cos8/scripts/runMinirovFileArchiver >> /u/coredata/minirov-cos8/runlogs/minirovFileArchiver.log 2>&1
00 20 * * * /u/coredata/minirov-cos8/scripts/runMinirovCtdUploader >> /u/coredata/minirov-cos8/runlogs/minirovCtdUploader.log 2>&1
30 20 * * * /u/coredata/minirov-cos8/scripts/runMinirovNavUploader >> /u/coredata/minirov-cos8/runlogs/minirovNavUploader.log 2>&1
Current issues
- No Flmr in PlatformLookup table in EXPD - it's there as of 6/1/21 -rs
- No data in 901123.ROV1k Share that I can see. - yes, see Project share on Atlas 901123ROV1k/Data/Logs -rs
- Question: There is a MinirovRawNavLoad table, but no MinirovRawNavData_YYYY tables? Correct, data are loaded thru the RovNavEdit workflow directly into the same tables as Ventana and DockRicketts. -rs