Operations and Troubleshooting
Navproc and Data Logging Flow
This section describes the operation and troubleshooting of the various components of the system.
Navproc
- For navproc, everything is started with the script `navproc_start`. The location of that script depends on which vessel you are on (or whether you are in simulation), but it is located in the `/home/ops/CoreNav/navproc-process/bin/[rc|sim|wf]` directory. This script sets the location of the `core_nav_[rc|sim|wf].ini` file, which should be located in that same `/home/ops/CoreNav/navproc-process/bin/[rc|sim|wf]` directory. It also sets the location of the `bot_proc.cfg` file, which is also located there. Also, in development this startup launches a GUI, but it should not when run in production on the ship.
- The start script then uses the procman-sheriff, which reads the `bot_proc.cfg` file to launch the various publishers located in `/home/ops/CoreNav/navproc-process/src`. Each of these publishers reads from the `core_nav_[rc|sim|wf].ini` file to figure out which serial ports it should connect to and what the communication parameters are, as well as other parameters the process might need (LCM channel names, etc.). If you run a `ps -ef | grep CoreNav` you should see something like this:

```
ops@corenav-rc-1:~/CoreNav/navproc-process/bin/sim$ ps -ef | grep CoreNav
ops    282536  282530  0 11:27 pts/1    00:00:01 /home/ops/CoreNav/navproc-process/src/octans_pub/octans_pub /home/ops/CoreNav/navproc-process/bin/sim/core_nav_sim.ini
ops    282537  282530  0 11:27 pts/2    00:00:03 /home/ops/CoreNav/navproc-process/src/digiquartz_pub/digiquartz_pub /home/ops/CoreNav/navproc-process/bin/sim/core_nav_sim.ini
ops    282538  282530  0 11:27 pts/4    00:00:02 /home/ops/CoreNav/navproc-process/src/mbaridepth_pub/mbaridepth_pub /home/ops/CoreNav/navproc-process/bin/sim/core_nav_sim.ini
ops    282539  282530  0 11:27 pts/5    00:00:01 /home/ops/CoreNav/navproc-process/src/seabirdctd_pub/seabirdctd_pub /home/ops/CoreNav/navproc-process/bin/sim/core_nav_sim.ini
ops    282543  282530  0 11:27 pts/6    00:00:01 /home/ops/CoreNav/navproc-process/src/nmeaGPS_pub/nmeaGPS_pub /home/ops/CoreNav/navproc-process/bin/sim/core_nav_sim.ini
ops    282545  282530  0 11:27 pts/7    00:00:00 /home/ops/CoreNav/navproc-process/src/ventanaCSP_pub/ventanaCSP_pub /home/ops/CoreNav/navproc-process/bin/sim/core_nav_sim.ini
ops    282546  282530  0 11:27 pts/8    00:00:02 /home/ops/CoreNav/navproc-process/src/shipGyro_pub/shipGyro_pub /home/ops/CoreNav/navproc-process/bin/sim/core_nav_sim.ini
ops    282547  282530  0 11:27 pts/9    00:00:01 /home/ops/CoreNav/navproc-process/src/winfrog_pub/winfrog_pub /home/ops/CoreNav/navproc-process/bin/sim/core_nav_sim.ini
ops    282548  282530  0 11:27 pts/10   00:00:00 /home/ops/CoreNav/navproc-process/src/dvecsIn_pub/dvecsIn_pub /home/ops/CoreNav/navproc-process/bin/sim/core_nav_sim.ini
ops    282549  282530  0 11:27 pts/11   00:00:04 /home/ops/CoreNav/navproc-process/src/dvecsOut_pub/dvecsOut_pub /home/ops/CoreNav/navproc-process/bin/sim/core_nav_sim.ini
ops    282551  282530  0 11:27 pts/13   00:00:05 /home/ops/CoreNav/navproc-process/src/lapbox_hd_pub/lapbox_hd_pub /home/ops/CoreNav/navproc-process/bin/sim/core_nav_sim.ini
ops    282554  282530  0 11:27 pts/14   00:00:03 /home/ops/CoreNav/navproc-process/src/vorne_ser_out/vorne_ser_out /home/ops/CoreNav/navproc-process/bin/sim/core_nav_sim.ini
ops    282568    1094  0 11:27 ?        00:00:00 /bin/bash /home/ops/CoreNav/navproc-process/bin/sim/navproc_gui
ops    282942  282530  0 11:29 pts/15   00:00:01 /home/ops/CoreNav/navproc-process/src/serial_output/serial_output /home/ops/CoreNav/navproc-process/bin/sim/core_nav_sim.ini serout_mesotech
ops    283151    1094  0 11:30 ?        00:00:00 /bin/bash /home/ops/CoreNav/navproc-process/bin/sim/navproc_spy
ops    284293  273753  0 11:36 pts/23   00:00:00 grep --color=auto CoreNav
```
- Each of the `_pub` processes starts reading from its respective serial port, parses the inputs it receives, constructs LCM messages, and publishes those on LCM channels.
- Other useful bin commands are:
- kill_all_deputies: To be documented
- navproc_gui: To be documented
- navproc_kill: To be documented
- navproc_lcm_logger: To be documented
- navproc_lcm_url: To be documented
- navproc_ser_show: To be documented
- navproc_show: To be documented
- navproc_spy: To be documented
- navproc_start: To be documented
- show_all_deputies: To be documented
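The read/parse/publish loop that each `_pub` process implements can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the actual publisher code: `parse_sentence`, `publish`, and `pump_once` are hypothetical names, the sentence format is an invented comma-delimited stand-in, and the real publishers use serial ports and LCM bindings configured from the .ini file.

```python
import time

def parse_sentence(line):
    # Hypothetical comma-delimited sensor sentence; the real publishers
    # (e.g. seabirdctd_pub) parse their own instrument's serial protocol.
    fields = line.strip().split(",")
    return {"name": fields[0], "values": fields[1:],
            "utime": int(time.time() * 1e6)}

def publish(channel, message):
    # Stand-in for an LCM publish call; returns what would be sent so the
    # sketch is runnable without LCM installed.
    return (channel, message)

def pump_once(serial_line, channel):
    # One iteration of the read -> parse -> publish loop.
    return publish(channel, parse_sentence(serial_line))

ch, msg = pump_once("CTD,3.51,34.2", "SEABIRD_CTD")
print(ch, msg["values"])
```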
Logging
- While the `/usr/local/bin/logr` utility is built from the navproc-logger project along with the other software (navproc-common and navproc-process), the actual process management happens under the utilities and scripts from the corelogging and bcserver BitBucket repositories.
- The bcserver repository was checked out to the `/home/ops/rich` directory and was built and installed to `/usr/local/bin` from there (see the README.md in that project for more instructions).
- The corelogging project was cloned into the `/home/ops/corelogging` directory.
- All the bcserver and logging software is started and managed by cron jobs. At the time of this writing, the crontab looks like this:
```
# crontab for ouser=ops on Rachel Carson
* * * * * /home/ops/corelogging/rc/scripts/startBdfserver > /home/ops/corelogging/rc/logs/out_bdf_server 2>&1

# we want to make sure these start at boot so we get the ports we need.
@reboot /home/ops/corelogging/rc/scripts/startBdfserver > /home/ops/corelogging/rc/logs/out_bdf_server 2>&1
@reboot /home/ops/corelogging/rc/scripts/startServer shipnavserver > /home/ops/corelogging/rc/logs/out_shipnav_server 2>&1
@reboot /home/ops/corelogging/rc/scripts/startServer rovctdserver > /home/ops/corelogging/rc/logs/out_rovctd_server 2>&1
@reboot /home/ops/corelogging/rc/scripts/startServer videoserver > /home/ops/corelogging/rc/logs/out_video_server 2>&1
@reboot /home/ops/corelogging/rc/scripts/startServer m3rsserver > /home/ops/corelogging/rc/logs/out_m3rs_server 2>&1
@reboot /home/ops/corelogging/rc/scripts/startServer dataprobeserver > /home/ops/corelogging/rc/logs/out_dataprobe_server 2>&1
@reboot /home/ops/corelogging/rc/scripts/startServer winfrogserver > /home/ops/corelogging/rc/logs/out_winfrog_server 2>&1

* * * * * /home/ops/corelogging/rc/scripts/startServer shipnavserver > /home/ops/corelogging/rc/logs/out_shipnav_server 2>&1
* * * * * /home/ops/corelogging/rc/scripts/startServer rovctdserver > /home/ops/corelogging/rc/logs/out_rovctd_server 2>&1
* * * * * /home/ops/corelogging/rc/scripts/startServer videoserver > /home/ops/corelogging/rc/logs/out_video_server 2>&1
* * * * * /home/ops/corelogging/rc/scripts/startServer m3rsserver > /home/ops/corelogging/rc/logs/out_m3rs_server 2>&1
* * * * * /home/ops/corelogging/rc/scripts/startServer dataprobeserver > /home/ops/corelogging/rc/logs/out_dataprobe_server 2>&1
* * * * * /home/ops/corelogging/rc/scripts/startServer winfrogserver > /home/ops/corelogging/rc/logs/out_winfrog_server 2>&1

* * * * * /home/ops/corelogging/rc/scripts/startLogr eventlogr > /home/ops/corelogging/rc/logs/out_event_logr 2>&1
* * * * * /home/ops/corelogging/rc/scripts/startLogr shipnavlogr > /home/ops/corelogging/rc/logs/out_shipnav_logr 2>&1
* * * * * /home/ops/corelogging/rc/scripts/startLogr rovctdlogr > /home/ops/corelogging/rc/logs/out_rovctd_logr 2>&1
* * * * * /home/ops/corelogging/rc/scripts/startLogr videologr > /home/ops/corelogging/rc/logs/out_video_logr 2>&1
* * * * * /home/ops/corelogging/rc/scripts/startLogr dataprobelogr > /home/ops/corelogging/rc/logs/out_dataprobe_logr 2>&1
* * * * * /home/ops/corelogging/rc/scripts/startLogr m3rslogr > /home/ops/corelogging/rc/logs/out_m3rs_logr 2>&1
* * * * * /home/ops/corelogging/rc/scripts/startLogr winfroglogr > /home/ops/corelogging/rc/logs/out_winfrog_logr 2>&1
* * * * * /home/ops/corelogging/rc/scripts/startLogr gtdprologr > /home/ops/corelogging/rc/logs/out_gtdpro_logr 2>&1
* * * * * /home/ops/corelogging/rc/scripts/startLogr lodestarlogr > /home/ops/corelogging/rc/logs/out_lodestar_logr 2>&1
#* * * * * /home/ops/corelogging/rc/scripts/startLogr mrisilogr > /home/ops/corelogging/rc/logs/out_mrisi_logr 2>&1

#------- Nightly data gzip and transfer to shore via annonymous FTP ####**** DO NOT ENABLE ###
59 23 * * * /home/ops/corelogging/rc/scripts/rcAnnoFtpTransfer.perl rc > /home/ops/corelogging/rc/logs/out_nightly_trnsfr.log 2>&1

#### publish ship position via amqp, every even minute, try to kill it a minute later if its hung
##### DISABLED TEMPORARILLY ON 24MAR2021 SINCE THEY ARE GOING TO SEA ON OLD NAVPROC
#####0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58 * * * * /home/ops/corelogging/common/shippositiongrabber/scripts/runNewShipPositionGrabber rc > /home/ops/corelogging/rc/logs/out_grabber.log 2>&1
0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58 * * * * /home/ops/corelogging/common/shippositiongrabber/runShipPositionGrabber ship-position-grabber.py messaging.shore.mbari.org trackingvhost R_CARSON /home/ops/corelogging/rc/logs localhost 54003 > /home/ops/corelogging/rc/logs/out_grabber.log 2>&1
1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47,49,51,53,55,57,59 * * * * /home/ops/corelogging/common/shippositiongrabber/killShipPositionGrabber
```
Let's walk through what each script does:
- startBdfserver: This script first does a `ps` and looks for the taskname `bdfserver`; if it is not running, it starts the Perl script named `bdfserver.pl`. The `bdfserver.pl` script starts a process that listens for UDP requests on a specific port (54022), runs a `df` to get disk usage, parses the `df` response, and replies to the request. This is so that a UDP client can request information about the disk usage on that machine.
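The request/reply pattern that bdfserver.pl implements can be sketched in Python. This is a self-contained illustration, not the actual Perl code: it binds an ephemeral localhost port instead of 54022, answers a single request, and uses `shutil.disk_usage` in place of parsing `df` output; the real request payload format is not documented here, so any datagram is treated as a request.

```python
import shutil
import socket
import threading

def serve_one_request(sock):
    # Any incoming datagram is treated as a disk-usage request (assumption;
    # the real bdfserver protocol is not documented here).
    _, addr = sock.recvfrom(1024)
    usage = shutil.disk_usage("/")  # stand-in for running and parsing `df`
    reply = "total=%d used=%d free=%d" % (usage.total, usage.used, usage.free)
    sock.sendto(reply.encode(), addr)

# Server side, bound to an ephemeral port (the real bdfserver uses 54022).
server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))
port = server.getsockname()[1]
threading.Thread(target=serve_one_request, args=(server,), daemon=True).start()

# Client side: send any datagram, read back the usage report.
client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.settimeout(5)
client.sendto(b"df?", ("127.0.0.1", port))
report, _ = client.recvfrom(1024)
print(report.decode())
```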
- startLogr: This script first does a `ps` and looks for a logr process with the argument specified on the command line (e.g. eventlogr) and, if it does not find one, starts one. The logr process is given a 'TASKNAME' and then uses that taskname to look up the correct section of `/home/ops/corelogging/"$PLATFORM"/cfgs/core_logging.ini`, where PLATFORM is rc, sim, or wf. For example, if the taskname is 'rovctdlogr', it looks up this out of the .ini file:

```
##############################################################################
[rovctdlogr]
##############################################################################
#name of coredata root directory
coredataRoot = /home/ops/corelogging/sim
#coredata console output logs dirctory
coredataLogsDir = logs
#coredata cfg files directory
coredataCfgsDir = cfgs
#coredata data files directory
coredataDataDir = data
# name of logr items config file (in the above cfgs dirctory)
coredataLogItemsCfg = rovctdlogr.cfg
#platform may be used in forming filenames or tagging datafiles etc
#typically wfly or rcsn
coredataPlatform = sim
#Broadcast and client UDP port assignments
coredataBroadcastPort = 54006
clientConnectPort = 54007
```
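The section layout above can be read with a standard INI parser. A minimal Python sketch (assuming standard INI semantics; whether the real logr parses the file with exactly these rules is not specified here):

```python
import configparser
import io

# A trimmed copy of the [rovctdlogr] section shown above.
INI_TEXT = """
[rovctdlogr]
coredataRoot = /home/ops/corelogging/sim
coredataCfgsDir = cfgs
coredataLogItemsCfg = rovctdlogr.cfg
coredataBroadcastPort = 54006
clientConnectPort = 54007
"""

cfg = configparser.ConfigParser()
cfg.read_file(io.StringIO(INI_TEXT))

section = cfg["rovctdlogr"]
items_cfg = section["coredataLogItemsCfg"]            # the *.cfg item map
bcast_port = section.getint("coredataBroadcastPort")  # logr broadcasts here
client_port = section.getint("clientConnectPort")     # bcserver answers here
print(items_cfg, bcast_port, client_port)
```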
- The key parameters here are coredataLogItemsCfg, coredataBroadcastPort, and clientConnectPort. The coredataLogItemsCfg entry points to a text file in the same directory as the .ini file that specifies the mappings between the LCM items and the legacy-style data manager items. For example, here is the rovctdlogr.cfg file:
```
# ROV CTD Logging Configuration
# Data logging is controlled by one of two modes:
#
# 1) UPDATE with a positive rate in seconds
#    and specifying all DM items on the list to be logged as 'Slave'
#
# 2) UPDATE with a positive rate in seconds
#    and specifying ** ONE ** a DM item on the
#    list to be logged as 'Trigger'
#
# UPDATE may be followed by a positive
# timeout value in seconds.
# set UPDATE 0 if any item has UpdateMode set to be used as a TRIGGER
# ** only one item can be set a trigger...****

UPDATE 5

# ItemTypes must be one of: BOOL, INT, DOUBLE or STRING

# MAX_AGE (in seconds): if payload timestamp exceeds MAX_AGE then set
# item values on output to NO_PUB  MaxAge = 0 skips the test. Generally
# MAX_AGE should be set fairly large, its useful to detect a hung or crashed
# publisher

# Time = yes then pre-pend the item's timestamp
# Set# = yes the pre-pend the item's sequence number
# Allow_no_prov  If the subscriber status is 'stale' we normally
# set the item's fields to NO_PROV, you can overide this by setting Allow_no_provider
#
# Channel     Type               getItemName  getItemType Alias                Format Time Seq# UpdateMode MaxAge Allow_No_Prov
# ---------------------------------------------------------------------------------------------------------------------------------------------
LOG SEABIRD_CTD lcm_seabirdctd_sub INWATER      BOOL   ROV.CTD.INWATER      %1d    No  No Slave   30 No
LOG SEABIRD_CTD lcm_seabirdctd_sub TEMPERATURE  Double ROV.CTD.TEMPERATURE  %3.4lf Yes No Trigger 30 No
LOG SEABIRD_CTD lcm_seabirdctd_sub CONDUCTIVITY Double ROV.CTD.CONDUCTIVITY %3.5lf No  No Slave   30 No
LOG SEABIRD_CTD lcm_seabirdctd_sub PRESSURE     Double ROV.CTD.PRESSURE     %5.1lf No  No Slave   30 No
LOG SEABIRD_CTD lcm_seabirdctd_sub ANALOG1      Double ROV.CTD.ANALOG1      %3.4lf No  No Slave   30 No
LOG SEABIRD_CTD lcm_seabirdctd_sub ANALOG2      Double ROV.CTD.ANALOG2      %3.4lf No  No Slave   30 No
LOG SEABIRD_CTD lcm_seabirdctd_sub ANALOG3      Double ROV.CTD.ANALOG3      %3.4lf No  No Slave   30 No
LOG SEABIRD_CTD lcm_seabirdctd_sub ANALOG4      Double ROV.CTD.ANALOG4      %3.4lf No  No Slave   30 No
LOG VENTANA_CSP lcm_ventanaCSP_sub PRESSURE     Double ROV.PRESSURE         %lf    Yes No Slave   30 No
LOG WINFROG     lcm_winfrog_sub    LAT          Double ROV.POSITION.LAT     %lf    No  No Slave   30 No
LOG WINFROG     lcm_winfrog_sub    LON          Double ROV.POSITION.LON     %lf    No  No Slave   30 No
LOG NMEA_GPS    lcm_nmeaGPS_sub    LAT          Double SHIP.GPS.LAT         %3.1lf No  No Slave   30 No
LOG VENTANA_CSP lcm_ventanaCSP_sub CTDPOWER     INT    ROV.CTD.POWER        %d     No  No Slave   30 No
```
- In the above configuration, you can see that the LCM channel named SEABIRD_CTD, with type lcm_seabirdctd_sub and item name INWATER, is mapped to an alias (data manager name) of ROV.CTD.INWATER.
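A hypothetical parser for one LOG line, to make the column-to-alias mapping concrete. The field names below are labels taken from the column header comment, not identifiers from the actual logr source:

```python
def parse_log_line(line):
    # Split one whitespace-delimited LOG line from a *.cfg file into named
    # fields, following the column header shown in the file's comments.
    cols = line.split()
    if not cols or cols[0] != "LOG":
        return None  # skip comments, UPDATE lines, blanks
    keys = ("channel", "lcm_type", "item", "item_type", "alias",
            "fmt", "time", "seq", "update_mode", "max_age", "allow_no_prov")
    return dict(zip(keys, cols[1:]))

entry = parse_log_line(
    "LOG SEABIRD_CTD lcm_seabirdctd_sub INWATER BOOL "
    "ROV.CTD.INWATER %1d No No Slave 30 No")
print(entry["channel"], "->", entry["alias"])  # SEABIRD_CTD -> ROV.CTD.INWATER
```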
- In general, the `logr` process is responsible for reading in the LCM messages specified in the .cfg file, writing those items to a log file in the `/home/ops/corelogging/sim/data` directory under the corresponding 'alias', and then pushing out UDP messages on the specified port. There is also a `/home/ops/corelogging/sim/logs` directory where logging information from these processes is kept.
- startServer: This script takes the name of a corresponding logr (e.g. rovctdlogr). It first looks for a `bcserver` process that is already running for that logr and, if one does not exist, starts it. The script uses the `/home/ops/corelogging/"$PLATFORM"/cfgs/core_logging.ini` file to find the corresponding UDP ports and then starts the `bcserver` process, which consumes messages from the `coredataBroadcastPort` and caches that data. When a client makes a request on the `clientConnectPort`, the bcserver returns the most recent message. This was done to prevent flooding the network with UDP messages and to give out data only when requested.
- simAnnoFtpTransfer.perl:
- runShipPositionGrabber: See the Ship Position Grabber docs
- killShipPositionGrabber: See the Ship Position Grabber docs
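The bcserver cache-and-answer design described above can be sketched in Python. This is an illustration of the pattern only, not the actual bcserver code: ephemeral localhost ports stand in for coredataBroadcastPort/clientConnectPort, and the request payload is arbitrary, since the real bcserver request format is not documented here.

```python
import selectors
import socket
import threading
import time

def run_cache(bcast_sock, client_sock):
    # The bcserver pattern: continuously consume datagrams arriving on the
    # broadcast socket, cache only the most recent one, and hand that cached
    # message to a client whenever one asks on the client socket.
    latest = b"NO_DATA"
    sel = selectors.DefaultSelector()
    sel.register(bcast_sock, selectors.EVENT_READ, "broadcast")
    sel.register(client_sock, selectors.EVENT_READ, "client")
    while True:
        for key, _ in sel.select():
            if key.data == "broadcast":
                latest, _addr = bcast_sock.recvfrom(4096)
            else:
                _req, addr = client_sock.recvfrom(64)
                client_sock.sendto(latest, addr)

# Wire it up on ephemeral localhost ports; the real rovctdserver uses
# coredataBroadcastPort 54006 and clientConnectPort 54007.
bcast = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
bcast.bind(("127.0.0.1", 0))
serve = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
serve.bind(("127.0.0.1", 0))
threading.Thread(target=run_cache, args=(bcast, serve), daemon=True).start()

# A logr would be broadcasting here; send one hypothetical record.
feeder = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
feeder.sendto(b"ROV.CTD.TEMPERATURE,3.5123", bcast.getsockname())
time.sleep(0.5)  # let the cache thread consume the broadcast first

# A client (e.g. DataProbe) asks and gets the most recent cached message.
client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.settimeout(5)
client.sendto(b"get", serve.getsockname())
answer, _ = client.recvfrom(4096)
print(answer.decode())
```

Note the design point this demonstrates: the broadcast traffic is absorbed by one listener, and everyone else pulls the latest value on demand instead of every consumer receiving every UDP message.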
Below is a summary of the component diagram (the original rendering did not survive): logr UDP broadcasts feed bcserver instances — for example `bcserver -bport 54006 -sport 54007 -id rovctdserver` for the ROV CTD data, with shipnavserver serving port 54003 — and DataProbe on a remote computer requests data from ports 54003 and 54007.
Helpful Tips with Logging
- When you are looking at log files, keep in mind that if a data item contains commas, it will throw off any CSV parsing. Sometimes data is written in its raw form as one item that is labeled in the header, but that item may (and sometimes does) contain commas. The item is not delimited by anything else, so the commas will break the alignment of any generic CSV parser.
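A quick demonstration of the failure mode, with a hypothetical raw item that carries embedded commas:

```python
# Three named columns in the header...
header = ["LOGHOST.SYSTEM.UTC", "ROV.RAW.SENTENCE", "ROV.CTD.TEMPERATURE"]

# ...but the raw sentence item (hypothetical content) contains commas itself,
# and nothing escapes or quotes it in the .dat file.
row = "1616612345.0,$PASHR,3.51,34.22,3.5123"

fields = row.split(",")
# Naive splitting yields 5 fields for a 3-column header; every column after
# the raw item is shifted right, so a generic CSV parser misaligns the data.
print(len(header), "columns but", len(fields), "fields")
```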
- In Dataprobe, there are 'Logr Age' items that are monitored. When Dataprobe reads data from a logr (through bcserver), it simply looks at the first column, which should always be 'LOGHOST.SYSTEM.UTC', and uses that value to decide whether the logger has stopped working (i.e., when the timestamp's age grows beyond some range of values).
- You will often see 'NO_PROV', 'NO_PUB' or 'NO_ITEM' in the log files.
- NO_PROV: In the logr code, if the LCM message itself has the 'payload_stale' attribute set, the logr records a NO_PROV for that item. The LCM code makes the decision about whether the item is stale; the logr just records NO_PROV when it is.
- NO_PUB: In the logr code, each item is checked, and if the age of its timestamp is larger than the 'MaxAge' defined in the associated *.cfg file, a NO_PUB is written.
- NO_ITEM: When logr is configured using the *.cfg file, it is set up to make LCM requests for specific items from LCM types. If the LCM type does not have that item, it returns false for the 'item_valid' flag. When the logr sees a false item_valid flag, it records a 'NO_ITEM' in the log file; this often means the *.cfg file does not match what the LCM messages can provide. The LCM type code is buried in the corenav project under navproc-common/lcm_types/subscribers/.
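The three markers can be summarized as a decision rule. This is a reconstruction from the descriptions above, not the actual logr source, and the argument names are invented:

```python
def item_marker(item_valid, payload_stale, age_s, max_age_s):
    # Which special marker (if any) logr would record for an item,
    # reconstructed from the described behavior.
    if not item_valid:
        return "NO_ITEM"   # the *.cfg asks for an item the LCM type lacks
    if payload_stale:
        return "NO_PROV"   # the LCM subscriber says the provider is stale
    if max_age_s > 0 and age_s > max_age_s:
        return "NO_PUB"    # payload timestamp older than MaxAge
    return None            # the normal value is logged

print(item_marker(True, False, 45, 30))  # NO_PUB
```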
- WINFROG.GLL.FIX.AGE: This field is a bit funky, so I thought I would record this here. The Winfrog system outputs a GLL message which is ingested by Navproc. The GLL format only contains hours, minutes, and seconds. The navproc parser (winfrog_pub) grabs those numbers and then uses the system clock on the navproc machine to try to convert them to an actual date and time. There are a couple of problems. The Winfrog machine runs on local time, so the hours will never match UTC hours. The parser does its best to calculate time and date, but this causes GLL.FIX.AGE to look a little strange. When the ROV is operating and getting fixes underwater, GLL.FIX.AGE will be the difference between the clock on the Winfrog machine and UTC (in seconds). So, for example, if the timezone were 7 hours off, GLL.FIX.AGE should hold steady at 7*60*60 = 25,200 seconds. When the ROV is on deck and not getting underwater fixes, the hours, minutes, and seconds stop updating. As the UTC clock hours scroll through the day, the GLL.FIX.AGE number will change and could range from -86,400 to 86,400.
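The steady-state arithmetic from the example above, checked in Python (assuming a whole-hour offset between the Winfrog machine's local time and UTC; the function name is mine, not from navproc):

```python
def gll_fix_age_s(winfrog_utc_offset_h):
    # While underwater fixes are updating, WINFROG.GLL.FIX.AGE settles at the
    # Winfrog machine's offset from UTC, because the GLL sentence carries
    # local-time HH:MM:SS that navproc compares against its own UTC clock.
    return winfrog_utc_offset_h * 60 * 60

print(gll_fix_age_s(7))  # 25200, matching the 7-hour example above
```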
Cutting back to old Navproc
Navproc Cut-over
- Change IP address to rcnavproc1
- rcnavproc1.rc.mbari.org settings
- IP Address: 134.89.22.19
- Default Route: 134.89.22.1
- Subnet Mask: 255.255.255.0
- DNS: 134.89.22.43, 134.89.22.28
- Add Digi serial ports
- digi addresses for Rachel Carson
- rctermserva.rc.mbari.org (134.89.22.61)
- rctermservb.rc.mbari.org (134.89.22.62)
- Enable auto-start in /etc/rc.local
Troubleshooting Running Processes
- Run LCM Spy to look at 'logr' strings
- Tail the .dat files in /coredata/data
- Use the dynamic XOJO UDP client to troubleshoot. This only works for neatly formatted CSV messages and headers.
- You can use the simple UDP client to just get the raw data messages, not parsed.
- There is an 'eventlogr.dat' file that gives you a listing of the processes that are starting and stopping.
- Document usage of `netstat -unlp` to examine the UDP ports in use and which processes hold them.
- If you need to look at what is coming out of a serial port, you can do that using `dinc`. Make sure all the navproc processes have stopped (you can do that using `navproc_kill` from ~/CoreNav/navproc-process/bin/[rc|sim|wf]) and then use `dinc` on the port you want to view. To exit dinc, type `~.`