ROVCTD O2 Reprocessing Example
revised 21-Apr-2021
1) Preliminaries
Why might we need to back out and reprocess data?
- The O2 sensors on the CTD have slope/offset factors provided by Ed Peltzer, determined from Winkler titrations. When Ed revises those factors, we need to manually force the system to back out and reprocess the CTD data in the Expd database over a specified datetime range.
- One or more sensors failed at sea on a long cruise and the calibration was not updated before more data was logged.
- There was a mistake in the calibration config file.
- The RovctdConfigFileMonitorService failed to pick up a change to the calibration file. Though rare, this has happened when Erich or the pilots have upgraded, repaired, or modified the computer.
This document covers an example of the most complicated use case, where the sensors have already been replaced AND Ed determines that the slope/offsets of some of the dives logged with an 'older' calibration need changing.
Once this use case is understood, the simpler case of just a slope/offset change within normal day-to-day operations should become clear.
In either case, everything starts with Erich modifying the CTDCoeff.ini file on the shipboard rc-rctd or wf-rctd Windows computers where the LabVIEW GUI is run.
The first several steps below check the workflow that happens at sea to get the LabVIEW config file from the CTD tech's computer into the at-sea database:
- Erich edits the file using Ed’s new slope/offsets and saves it.
- The RovCtdCfgFileMonitorService running on the same computer should detect the change and will update the shipboard database on Glacierwind or Alaskanwind (typically takes < 30 seconds).
- Then, if swapping sensors out, Erich will need to update the CTDCoeff.ini file a second time with his new sensors.
The RovCtdCfgFileMonitorService is documented HERE
Warning
Your goal is to establish the proper chronological series of calibration entries in both the on-board and on-shore databases using the automated processes. This will save you a lot of by-hand corrections and confusion in the downstream processing.
2) Check the CTDCoeff.ini File
Note
I encourage you to periodically check the date stamp of the CTDCoeff.ini on each system and compare it to the datestamps in the calibration tables in Expd. If you RemoteDesktop into rc-rctd or wf-rctd, ask first, as you will bump whoever is logged on!
Check the file in use on the ship by walking over to the boat or by using RemoteDesktop.
Below is a sample screengrab of the directory from the WesternFlyer's wf-rctd. I will use the data shown there in the discussion that follows as a 'use case' where we have to do the two sets of updates (new slope/offsets so we can back-correct old data, followed by a sensor swap and processing of new sensor data).
Figure 1: Screengrab of CTD Coefficient files on WesternFlyer
Examine the results (see Figure 1):
- Note the timestamp of the current CTDCoeff.ini (8/6/2020 10:25); this should be the file with the ‘new’ sensors.
- Also note that in this case Erich has carefully made himself a safety copy of the prior file.
In our 'complex example' use case, that prior file should be the one with the new slope/offsets from Ed. Hopefully Erich left it named exactly “CTDCoeff.ini” for long enough (~30 seconds) that it got picked up by the file monitor service.
3) Verify Cfg Loaded to Shipboard Database
Next we need to verify that both cfgs came through to the shipboard database. On alaskanwind.wf.mbari.org (remotely via Aquadata Studio), run the following query against the DOCR_CTD_Config table of the DOCR_CTD_Config database. (Sorry, they probably should have been given different names....)
(On the Rachel Carson it would be server = Glacierwind and the VNTA_CTD_Config database.)
SELECT top 10 * FROM [dbo].[DOCR_CTD_Config] ORDER BY eventtime desc
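If you need the same check on the Rachel Carson side, a sketch of the equivalent query is below; it assumes the Ventana config table mirrors its database name (VNTA_CTD_Config), the same way the DocRicketts one does:
-- run on Glacierwind against the VNTA_CTD_Config database (table name assumed to mirror the database name)
SELECT top 10 * FROM [dbo].[VNTA_CTD_Config] ORDER BY eventtime desc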
Figure 2a: The on-ship database
Here are the query results.
Figure 2b: Query run against on-ship database
Examine the results in Figure 2b above.
- Note the ‘localtime’ column shows two updates. The most recent one matches the most recent file in Figure 1. The second-most-recent, while not an exact match, is consistent with Erich having modified the slope/offsets, letting it post, then renaming (saving) it as a backup.
- Note the ‘id’ column shows the database-assigned key for the entries. Jot those down; we will use those values next.
4) Verify Cfg Loaded to On-shore Database
Next we will verify the workflow that gets the config changes from the shipboard databases to the on-shore database.
We need to verify that both of those cfgs came through to the on-shore database. Using Aquadata Studio, connect to the expd database on perseus (as user=expddba).
- First check that they made it from Alaskanwind to the staging table in the EXPD database on perseus:
SELECT top 10 * FROM [dbo].[RovCtdCfg_DocRicketts] ORDER BY eventtime desc
Figure 3: Checking whether the cfgs made it from the ship to the staging table in Expd on perseus
Examine the results (see Figure 3):
- Make sure all events have come across, using the ‘id’ keys obtained from Figure 2b (a spot-check query follows this list).
- If not, open Task Scheduler on draco, run the DocRickettsCtdCfgToPerseus job (or Ventana’s equivalent), and then re-run the above.
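For a direct spot check of the ids you jotted down from Figure 2b (344 and 345 in this example; substitute your own values), something like:
-- confirm the specific shipboard ids arrived in the staging table
SELECT * FROM [dbo].[RovCtdCfg_DocRicketts] WHERE id IN (344, 345)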
5) Verify Cfg Cascades to RovCtdCfg
Next check that the calibration cascades up to the parsed Expd RovCtdCfg table; this is the workflow we want to verify. On perseus (as expddba):
SELECT top 10 * FROM [dbo].[RovCtdCfg] ORDER BY startdtg desc
Figure 4: Query run against the parsed RovCtdCfg table on perseus
Examine the results (see Figure 4):
- You’ll notice combined data for both vehicles is in this table.
- Don’t expect the start/end dtgs to match the CTDCoeff.ini file timestamps; they aren’t supposed to. (startdtg and enddtg define the range of dives over which a calibration is to be applied.)
- What is important is that the cfgfileid_fk fields match the ‘id’ fields of what came off of the ships.
- You will notice that, compared to Figure 3 above, the most recent configs from the DocRicketts haven’t cascaded up yet.
- In this case I happened to be working on this the same day Erich made his updates. Since the cascade is set to run in the evening, they haven't synced up yet (the task EXPD_NightlyRovCtdLoadToPerseus had not run yet).
- Also note how the old, last-known docr config (cfgfileid_fk = 342) is valid beginning with a startdtg in October 2019 and no enddtg specified (this is the sign to the CTD processing that it is the ‘most current’ calibration).
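Since this table mixes both vehicles, it can help to filter to just the DocRicketts while checking; for example:
-- same check, restricted to the DocRicketts rows
SELECT top 10 * FROM [dbo].[RovCtdCfg] WHERE vehicle = 'docr' ORDER BY startdtg desc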
6) Force a Cascade If Needed
So, based on the results above, we need to go to draco and force the cascade to happen.
Using RemoteDesktop, log in to draco as SHORE\DB_EXPD_TASKEXEC
Open/Run TaskScheduler
Execute job: EXPD_NightlyRovCtdLoadToPerseus_early (it includes a step that runs the calibration cascade, specifically the script updateRovCtdCfg_perseus.pl)
Now re-run the check above on perseus (as expddba):
SELECT top 10 * FROM [dbo].[RovCtdCfg] ORDER BY startdtg desc
Figure 5: RovCtdCfg after the forced cascade
Examine the results again carefully (see Figure 5):
- Now there are entries for the two new cfgfileid_fk’s.
- The cascade update program also closed out (set the enddtg of) the old October 2019 config (cfgfileid_fk=342).
- Note also that there is an earlier cfgfileid_fk=343, but its start/end span is prior to 342’s. This is not a problem, as fk’s are not always sequential; what is important is the time-span.
- The cascade update also closed out cfgfileid_fk = 344 (because it also found a newer cfgfileid_fk = 345).
- 344 is the one with the slope/offsets we want to use to reprocess Ed’s dive series.
- 345 is the one with the slope/offsets we want to use going forward with the new sensors. Note that its enddtg is null, indicating to future processing that it is the ‘most current’ calibration.
The above still isn’t quite what we need. Next we will have to manually update the start and end times of cfgfileid_fk’s 342 and 344 to accommodate the time-span of the dives Ed has specified as needing reprocessing with the new slope and offset. That is, we need to set the endtime of 342 and the starttime of 344 to match the start of the first dive in the series.
7) Adjust Calibration Start/End Datetimes
Ed said: “New correction factors to use beginning with docr-1218 and running through docr-1264 and beyond ...”
On perseus (as expddba)
SELECT * FROM [dbo].[Dive] WHERE RovName = 'docr' AND DiveNumber = 1218
Figure 6: Dive table record for docr dive 1218
Examine the results (see Figure 6): there is the starttime we need. NOTE: IT IS GMT.
Using the starttime from dive 1218 above (Figure 6), use the query below and Aquadata Studio’s editing feature to edit the enddtg of cfgfileid_fk 342 and the startdtg of cfgfileid_fk 344.
SELECT top 10 * FROM [dbo].[RovCtdCfg] WHERE vehicle = 'docr' ORDER BY startdtg desc
Figure 7: RovCtdCfg rows for docr after editing the start/end times
Examine the results (see Figure 7):
Warning
The text and figures below contain an error. The DiveStartDtg from the Dive table is in GMT, but the startdtg and enddtg fields of the RovCtdCfg table are LOCAL time. So I should have converted GMT to local before making that entry. Sorry!
Note
There is a trigger on the RovCtdCfg table startdtg and enddtg fields that converts them back into the GMT fields further into the table, so make your changes to the local-time fields.
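For reference, the grid edits amount to something like the T-SQL sketch below. It is illustrative only: the Pacific time zone is an assumption about our local time, AT TIME ZONE requires SQL Server 2016+, and if in doubt, do the conversion by hand and use Aquadata Studio's editor as described.
-- look up dive 1218's GMT start and convert it to local time (time zone assumed to be Pacific)
DECLARE @gmtStart datetime = (SELECT DiveStartDtg FROM [dbo].[Dive] WHERE RovName = 'docr' AND DiveNumber = 1218);
DECLARE @localStart datetime = CONVERT(datetime, @gmtStart AT TIME ZONE 'UTC' AT TIME ZONE 'Pacific Standard Time');
-- close out the old calibration one second before dive 1218 started...
UPDATE [dbo].[RovCtdCfg] SET enddtg = DATEADD(second, -1, @localStart) WHERE vehicle = 'docr' AND cfgfileid_fk = 342;
-- ...and start the reprocessing calibration when dive 1218 started
UPDATE [dbo].[RovCtdCfg] SET startdtg = @localStart WHERE vehicle = 'docr' AND cfgfileid_fk = 344;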
- Note how the enddtg for cfgfileid_fk=342 has been adjusted to close one second before dive 1218 started, and cfgfileid_fk=344’s startdtg begins when dive 1218 started.
- Note the startdtg and enddtg time progression now makes logical sense:
  - Dives prior to 1218 will use id 342.
  - Dives starting with 1218 up thru 8/6/20 10:25:05 will now use id 344.
  - Any dives beyond 8/6/20 10:25 will use id 345.
- Also, please notice that the ‘id’ for cfgfileid_fk 342 is 1138; that is what has been used up to this point to process the data. We will see that id again in a moment.
- Next we will back out data from dives 1218 thru current (or whatever range Ed specified).
8) Prepare for the Reprocessing (Back Out Old Data)
Now that the calibrations have been updated, we need to back out the processed data so that the new calibrations are applied. Following from above, we will back out data from DocRicketts dives 1218 thru current. Since data loads are done by “day-of-year”, we need to find that for dive 1218:
SELECT divenumber, divestartdtg, datepart(yy, divestartdtg) as yr, datepart(dy, divestartdtg) as doy
FROM [dbo].[DiveSummary]
where rovname = 'docr'
AND divenumber = 1218
The above returns yr=2019, doy=319.
Note
Likewise, if Ed had specified an ending dive, we would retrieve its ‘diveenddtg’ values, as sketched below.
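A sketch of that lookup, treating docr-1264 from Ed's note as a hypothetical ending dive:
-- same lookup, but on the dive's end datetime
SELECT divenumber, diveenddtg, datepart(yy, diveenddtg) as yr, datepart(dy, diveenddtg) as doy
FROM [dbo].[DiveSummary]
where rovname = 'docr'
AND divenumber = 1264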
Next we will use Aquadata Studio’s editing feature to set isProcessed to 0 any place it’s a 1 for a dive in the range (this causes the processing to cascade-delete all dependent rows).
Warning
Cascade delete can’t span years, so if your reprocessing work spans year boundaries (like our example does), we do it in two chunks.
Below we look up the loaded data beginning with day 319 of 2019:
SELECT *
FROM [dbo].[RawCtdLoad] where deviceid = 1672 and yr = 2019
and yrday >= 319
order by yrday desc
Warning
The condition needs to be >= 319 (not > 319) because we want to include day 319!
Figure 8: Looking up loaded RawCtdLoad data beginning with day 319 of 2019
We want to set isProcessed to 0 any place it’s a 1, and let the processing cascade-delete (a SQL sketch of the equivalent bulk update follows this list):
- In Aquadata Studio, use execute-edit on the above.
- Scroll to the right and notice the ‘actionDescription’ column status is ‘Processing completed thru binavg’.
- Set several of the isProcessed bits to 0, then ‘save and refresh’ (or use the ‘cmd-s’ shortcut on a Mac).
- After several seconds, the ‘actionDescription’ status should change to ‘xxxx processed data rows unloaded’.
- Repeat until all yrdays in the range are unloaded.
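If you prefer to do a whole year's range in one statement, the grid edits amount to something like the sketch below; it assumes the unload cascade fires on a direct UPDATE just as it does on a grid edit (if unsure, stick with the grid editor):
-- flag every still-processed day in the 2019 portion of the range for unload
UPDATE [dbo].[RawCtdLoad]
SET isProcessed = 0
where deviceid = 1672 and yr = 2019
and yrday >= 319
and isProcessed = 1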
Then do the next year if needed. Just modify the same query as needed (see below) and repeat setting the isProcessed bit.
SELECT *
FROM [dbo].[RawCtdLoad] where deviceid = 1672 and yr = 2020
and yrday > 0
order by yrday desc
So now all the ‘old’ processed data is gone (both the 1Hz and the 15-sec bin-averaged).
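A quick sanity check (a sketch spanning both years) should now return zero rows:
-- anything returned here is still marked processed and needs another pass
SELECT yr, yrday, isProcessed
FROM [dbo].[RawCtdLoad]
where deviceid = 1672
and ((yr = 2019 and yrday >= 319) or yr = 2020)
and isProcessed = 1
order by yr, yrday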
All that’s left is for draco to reprocess the raw data using the new calibrations. It can be done now by force, or left to the nightly processing to pick it up. Once that is done, you will likely need to go back in and set the QC flags to what Ed recommends.
9) Manually Start a Processing Run
Here is how to manually kick off reprocessing on draco if you don’t feel like waiting for the normal middle-of-the-night run.
- Using RemoteDesktop, log in to draco as SHORE\DB_EXPD_TASKEXEC.
- Open/Run Windows TaskScheduler.
- Execute job: EXPD_NightlyRovCtdLoadToPerseus_early
Note it can take from several minutes up to several hours depending on how much data you are reprocessing. You can get some comfort that things are proceeding either by looking at the log file on draco, or by re-running the RawCtdLoad queries and watching the isProcessed flags being reset to ‘1’ and the actionDescriptions updating.
Or this query adds a helpful filter on actionDescription:
SELECT *
FROM [dbo].[RawCtdLoad]
where deviceid = 1672
and yr = 2019
and yrday >= 319
and actionDescription not like '%Load successful, 0 rows loaded%'
order by yrday desc
Figure 9 - Note how isProcessed flags are starting to show complete processing
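If you prefer a single progress number, counting days by isProcessed state works too (a sketch using the same filters as above):
-- days remaining (isProcessed = 0) vs. days done (isProcessed = 1)
SELECT isProcessed, count(*) as ndays
FROM [dbo].[RawCtdLoad]
where deviceid = 1672
and yr = 2019
and yrday >= 319
GROUP BY isProcessed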
10) Manually Force a DiveSummary Update
Next, if you are doing a forced reprocessing: after all processing has completed, you should also force an update of the DiveSummary table.
Using RemoteDesktop, log in to draco as SHORE\DB_EXPD_TASKEXEC
Open/Run TaskScheduler
Execute job: EXPD_UpdateDirtyDiveSummaries
11) Update QC Flags
You will need to re-apply QC flags to any dives that Ed has instructed to be flagged.
That process is fully described HERE: https://bitbucket.org/mbari/se-ie-doc/src/master/docs/systems/expedition-database/o2-qc-management.md