SSDS Transmogrify and Ingest Architecture
In order to handle data streams in the SSDS, a set of components was developed to process instrument data. These components are known as "Transmogrify" (a nod to Bill Watterson) and "Ingest". They take in data and metadata streams and perform various tasks on them as they are brought into the SSDS. Two messaging infrastructures are used to ingest data into the SSDS: JMS and AMQP. JMS was the original messaging scheme and works really well, especially since you can deploy Message Driven Beans in JBoss and let JBoss manage the bean pools. This, however, limited publishing clients to Java. While that was fine for our initial implementation for the MOOS program, we wanted to open up the messaging to non-Java clients, so we added an application that connects to an AMQP broker and processes messages. This allows non-Java clients to publish packets to the SSDS.
Here is a logical diagram of the message flow for ingesting data into the SSDS.

The JMS flow consumers are all implemented as Message Driven Beans (EJB) and are wired in series, with Transmogrify feeding into Ingest. The original reason for having both was that Transmogrify handled data and metadata from the SIAM system, which had some special characteristics and needed to be converted to the generic message format the SSDS expects. Anyone who wanted to send the generic format could bypass Transmogrify and send straight to Ingest. As with all things, patchwork becomes production, and Transmogrify lives on to this day.
Here is a sequence diagram of the basic steps that occur when a packet is submitted to the SSDS via JMS (at least through Transmogrify and Ingest).

After the ingest step, if the packet is a metadata packet, it is forwarded to the RuminateMDB (covered on another page). The protocol buffers ingest MDB (IngestProtoMDB) works the same way, except that it first converts the incoming protocol buffer into an SSDSDevicePacket, which is then persisted in the same way as in the IngestMDB.
Transmogrify Packet Structure
Our initial and primary publisher of data for the SSDS was the SIAM infrastructure. Initially, SIAM published serialized Java objects of its DevicePacket classes. In the SSDS, we created SSDSDevicePacket subclasses to map the SIAM DevicePacket information to the SSDS world. Here is a class diagram of the classes involved:

The light-colored classes are the SIAM classes. The SSDSDevicePacket classes subclass the SIAM classes and add functionality to help convert between the two. The PacketUtility class is a helper used to convert between the various serialization formats and objects. The SSDSDeviceProto class and its inner MessagePacket class are the Java classes generated from the Google protocol buffers definition.
As a side note, the idea behind the SSDSGeoLocatedDevicePacket was that, if a device could be linked to another device that provides geospatial data, all of its SSDSDevicePackets could be correlated by time with that device and each packet tagged with a geospatial reference. That work is still to be done.
When the Transmogrify component receives a SIAM DevicePacket, it converts it to an SSDSDevicePacket using the PacketUtility class. Here are the attributes on the various classes and how they map to each other.
| DevicePacket | MetadataPacket | SensorDataPacket | DeviceMessagePacket | SensorStatusPacket | SSDSDevicePacket | SSDSGeoLocatedDevicePacket | SSDSDevicePacketProto MessagePacket | Description |
|---|---|---|---|---|---|---|---|---|
| sourceID | inherited | inherited | inherited | inherited | inherited | inherited | sourceID | This is the ID of the device that generated the packet |
| systemTime | inherited | inherited | inherited | inherited | inherited | inherited | Split into two fields timestampSeconds and timestampNanoseconds | This is the timestamp (in epoch milliseconds) when the packet was created by the system. This may or may not match instrument time if the clocks are not synchronized |
| sequenceNo | inherited | inherited | inherited | inherited | inherited | inherited | sequenceNumber | This is a number that should indicate the order of generation of the packet from the device |
| metadataRef | inherited | inherited | inherited | inherited | inherited | inherited | metadataSequenceNumber | This is the sequence number of the packet that contains the metadata that describes the contents of this packet. If it is a MetadataPacket, this has no meaning. |
| parentID | inherited | inherited | inherited | inherited | inherited | inherited | parentID | This is the SSDS ID of the device to which the generated device was connected when it generated this packet. Null means no parent. |
| recordType | inherited | inherited | inherited | inherited | inherited | inherited, and also equal to the local recordType | packetSubType | This defines the "type" of record this packet contains. Devices can send many forms of records, error messages, etc., and this helps define what is actually in the payload of the message. There are three main options: -1 = the record type has not been defined; 0 = metadata packet, which contains information about the instrument or other aspects of the observatory (the SSDS definition of a metadata packet encompasses all the various metadata packets in SIAM, so MetadataPacket and DeviceMessagePacket from the SIAM world are both simply tagged with record type 0); 1+ = data packets of any kind (the record type lets the device driver writer group messages that usually share the same format). Because the serialized-class method is no longer used, Transmogrify ignores SensorStatusPackets, which were developed later and use a different serialization method. |
| X | bytes | X | X | X | dataBuffer | inherited | bufferBytes | This is a payload that contains information like service properties, SSDS XML, etc. In the SSDSDevicePacket constructor, the bytes buffer is mapped into the dataBuffer |
| X | cause | X | X | X | otherBuffer | inherited | bufferTwoBytes | Another set of bytes that was meant to hold information about why the metadata packet was generated. In the SSDSDevicePacket constructor, it is mapped to the otherBuffer |
| X | X | dataBuffer | X | X | dataBuffer | inherited | bufferBytes | This is the sample from the device that is packaged in an array of bytes. In the SSDSDevicePacket constructor, the dataBuffer is mapped to the dataBuffer |
| X | X | X | message | X | dataBuffer | inherited | bufferBytes | This is the message contents that are packaged into an array of bytes. In the SSDSDevicePacket constructor, the _message is mapped to the dataBuffer |
| X | X | X | X | statusBytes | X | X | X | This is the message about the instrument status as an array of bytes. Since we broke from serialized objects before this class existed, SSDS ignores this type of object. |
| X | X | X | X | cause | X | X | X | Some message, as an array of bytes, that describes why the status message was sent. Since we broke from serialized objects before this class existed, SSDS ignores this type of object. |
| X | X | X | X | X | dataDescriptionVersion | inherited | dataDescriptionVersion | This indicates minor metadata changes that were not significant enough to create new SSDS "buckets" (the actual storage files used before SSDS moved to a database). |
| X | X | X | X | X | packetType | inherited | packetType | This is an integer to define what type of packet this is: 0 = MetadataPacket; 1 = SensorDataPacket; 2 = DeviceMessagePacket |
| X | X | X | X | X | X | longitude | X | Longitude where the packet was generated |
| X | X | X | X | X | X | latitude | X | Latitude where the packet was generated |
| X | X | X | X | X | X | depth | X | Depth (m) where the packet was generated |
We quickly ran into versioning and deserialization issues, so SIAM instead began publishing its packets as javax.jms.BytesMessages with a structured payload produced by the export method of the Exportable interface. What comes across in a BytesMessage is therefore an array of bytes laid out as follows:

where:
| Name | Type | Description |
|---|---|---|
| StreamID | java.lang.short | This states that the bytes come from a SIAM ExportablePacket class. SIAM uses constants defined in the org.mbari.siam.distributed.Exportable.java class to enumerate values like this, and for this field the value is always 0x0100. SSDS doesn't really care, so it essentially ignores this field. |
| DevicePacketVersion | java.lang.long | This is the "serialVersionUID" on the class that was used to export the bytes. It is the version of SIAM class that generated the byte array. SSDS does not really care and as of this writing, it is always 0. |
| SourceID | java.lang.long | The ID of the device that the message was generated by. |
| Timestamp | java.lang.long | Epoch milliseconds (number of elapsed milliseconds since 1/1/1970 00:00:00) that the packet was generated by the device |
| SequenceNumber | java.lang.long | A number that is supposed to show the order of generation of packets from the device |
| MetadataRef | java.lang.long | This is the sequence number of the packet that contains the metadata that describes the information in this packet |
| ParentID | java.lang.long | The ID of the device that the generated device was attached to when it generated the packet |
| RecordType | java.lang.long | The type of record that this packet contains: 0 = MetadataPacket, 1 = Non-MetadataPacket (Data and other) |
| SecondStreamID | java.lang.short | This defines the type of DevicePacket that was used to construct the byte array. The values are as follows: MetadataPacket = 0x0101, SensorDataPacket = 0x0102, DeviceMessagePacket = 0x0103, SummaryPacket = 0x0102 (same as SensorDataPacket) |
| SecondPacketVersion | java.lang.long | This is the "serialVersionUID" on the class that was used to export the bytes. It is the version of SIAM class that generated the byte array. As of this writing, it is the same as the DevicePacketVersion. Since currently it is always 0, SSDS ignores it. |
| FirstBufferLength | java.lang.int | This is the length of the array that holds the bytes of the first buffer |
| FirstBuffer | java.lang.byte [] | This is the byte array that represents the first buffer |
| SecondBufferLength | java.lang.int | This is the length of the array that holds the bytes of the second buffer. |
| SecondBuffer | java.lang.byte [] | This is the array that holds the bytes of the second buffer. |
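As a rough illustration of the layout above, the following sketch reads such a payload with java.io.DataInputStream. It assumes the SIAM export method writes each field big-endian through java.io.DataOutput, in exactly the order shown in the table; SiamExportParser and SiamRecord are hypothetical names, not the PacketUtility API.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

/** Hypothetical reader for the SIAM export layout (not the SSDS PacketUtility API). */
class SiamExportParser {

    /** One exported SIAM packet; field names follow the table above. */
    record SiamRecord(short streamId, long devicePacketVersion, long sourceId, long timestamp,
                      long sequenceNumber, long metadataRef, long parentId, long recordType,
                      short secondStreamId, long secondPacketVersion,
                      byte[] firstBuffer, byte[] secondBuffer) {}

    /** Assumes big-endian java.io.DataOutput encoding with fields in table order. */
    static SiamRecord parse(byte[] payload) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(payload))) {
            short streamId = in.readShort();          // always 0x0100, ignored by SSDS
            long devicePacketVersion = in.readLong(); // serialVersionUID, currently 0
            long sourceId = in.readLong();
            long timestamp = in.readLong();           // epoch milliseconds
            long sequenceNumber = in.readLong();
            long metadataRef = in.readLong();
            long parentId = in.readLong();
            long recordType = in.readLong();
            short secondStreamId = in.readShort();    // 0x0101, 0x0102 or 0x0103
            long secondPacketVersion = in.readLong();
            byte[] firstBuffer = readBuffer(in);
            byte[] secondBuffer = readBuffer(in);
            return new SiamRecord(streamId, devicePacketVersion, sourceId, timestamp,
                    sequenceNumber, metadataRef, parentId, recordType,
                    secondStreamId, secondPacketVersion, firstBuffer, secondBuffer);
        }
    }

    /** Reads an int length prefix followed by exactly that many bytes. */
    private static byte[] readBuffer(DataInputStream in) throws IOException {
        int length = in.readInt();
        if (length < 0) {
            throw new IOException("Invalid buffer length: " + length);
        }
        byte[] buffer = new byte[length];
        in.readFully(buffer); // throws EOFException if the payload is truncated
        return buffer;
    }
}
```

Reading through DataInputStream keeps the code aligned with the table row by row; a truncated payload surfaces as an EOFException rather than silently producing a short buffer.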
To handle both types of input (serialized DevicePackets and the BytesMessage structure), Transmogrify converts both to a common format that carries the information from either kind of message. Since the BytesMessage structure encompasses all the information in a DevicePacket, we simply used that byte structure: in Transmogrify, a DevicePacket is converted to an SSDSDevicePacket, which is then converted to the same BytesMessage structure using the methods in the PacketUtility class. Once the DevicePacket object has been converted to a SIAMFormatByteArray, Transmogrify uses the PacketUtility class to convert that message to an SSDSFormatByteArray, which is then sent on to the Ingest component. Here is the mapping used to convert from SIAMFormatByteArray to SSDSFormatByteArray.

Notes on the conversion:
- The dashed lines mean that, as of this writing, SSDS ignores those attributes in the array. The arrows are still drawn because, if SIAM decides to use these attributes, SSDS may have to take them into account when constructing its byte array.
- The DevicePacketVersion, SecondStreamID, and SecondPacketVersion are used to determine the correct packetType (although, right now, DevicePacketVersion and SecondPacketVersion are ignored).
- For MetadataPackets, the packetType is 1.
- For SensorDataPackets, the packetType is 0 (SummaryPackets come across as SensorDataPacket and are differentiated by their recordType).
- For DeviceMessagePackets, the packetType is 4.
- If the incoming packet is a MetadataPacket, the packetSubType is set to 0. Otherwise, it is set to the RecordType field.
- The RecordType is set to zero if the packet is a MetadataPacket and set equal to the RecordType from SIAM if not a MetadataPacket.
- The buffers are swapped if it is a MetadataPacket. It always seemed more logical that way, since the "cause" buffer was rarely populated.
- The timestamp (epoch milliseconds) is split into seconds and nanoseconds.
WARNING: Because Java byte arrays are indexed with 32-bit signed integers, the largest message payload that SSDS can convert is 2GB. While this does not seem like a major restriction, it can be hit if somebody publishes over plain JMS (or another transport) with a payload larger than 2GB. SSDS will simply ignore such a message.
Ingest Packet Structure
So now all messages come into Ingest in the format the SSDS expects (i.e., one that matches the SSDS view of the world). For the diagram in the previous section, the attributes in the SSDS Byte Array are:
| Attribute | Type | Description |
|---|---|---|
| sourceID | java.lang.long | This is what is known as the SSDS ID for the device (i.e. DeviceID) that actually generated the packet of information. |
| parentID | java.lang.long | This is the SSDS ID for the parent that the device was connected to when it sent the packet. If the ID is zero (0), then the generating device was not connected to a parent. |
| packetType | java.lang.int | This is the "type" of packet being sent. It is an enumerated value with the following meanings: 0 = Data Packet, 1 = Metadata Packet, 4 = Device Message Packet (values 2 and 3 are not defined here). |
| packetSubType | java.lang.long | This is the equivalent of the "recordType" listed in the Transmogrify component. It is used to provide the hook to tell the client applications what type of record was sent. It really only has meaning in the context of data packets as a device can often send data packets of different formats. This basically tells the application which record form is being sent in this packet. |
| metadataSequenceNumber | java.lang.long | Also referred to as dataDescriptionID |
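| dataDescriptionVersion | java.lang.long | Indicates minor metadata changes that were not significant enough to create a new SSDS "bucket" |
| timestampSeconds | java.lang.long | Whole seconds since the epoch (1/1/1970 00:00:00) of the packet timestamp |
| timestampNanoseconds | java.lang.long | Sub-second part of the packet timestamp, in nanoseconds |
| sequenceNumber | java.lang.long | A number indicating the order in which the device generated its packets |
| bufferLen | java.lang.int | The length of the first buffer, in bytes |
| bufferBytes | java.lang.byte [bufferLen] | The first buffer, holding the main payload of the packet |
| bufferTwoLen | java.lang.int | The length of the second buffer, in bytes |
| bufferTwoBytes | java.lang.byte [bufferTwoLen] | The second buffer (for metadata packets, this holds the "cause" bytes) |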
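The following sketch serializes the attributes above, in table order, with java.io.DataOutputStream. It assumes big-endian encoding and the listed Java widths (long = 8 bytes, int = 4 bytes); SsdsPacket is a hypothetical name, not the PacketUtility implementation.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

/** Hypothetical container for the SSDS byte array attributes, in table order. */
record SsdsPacket(long sourceId, long parentId, int packetType, long packetSubType,
                  long metadataSequenceNumber, long dataDescriptionVersion,
                  long timestampSeconds, long timestampNanoseconds, long sequenceNumber,
                  byte[] bufferBytes, byte[] bufferTwoBytes) {

    /** Writes every field big-endian via DataOutputStream (an assumption, see above). */
    byte[] toBytes() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeLong(sourceId);
            out.writeLong(parentId);
            out.writeInt(packetType);            // 0 = data, 1 = metadata, 4 = device message
            out.writeLong(packetSubType);
            out.writeLong(metadataSequenceNumber);
            out.writeLong(dataDescriptionVersion);
            out.writeLong(timestampSeconds);
            out.writeLong(timestampNanoseconds);
            out.writeLong(sequenceNumber);
            out.writeInt(bufferBytes.length);    // bufferLen
            out.write(bufferBytes);
            out.writeInt(bufferTwoBytes.length); // bufferTwoLen
            out.write(bufferTwoBytes);
        }
        return bytes.toByteArray();
    }
}
```

Under these assumptions the fixed header is 68 bytes (eight longs and one int), so a packet carrying the 13-byte payload "Hello Ingest!" and an empty second buffer serializes to 89 bytes.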
The Ingest Message Driven Bean (MDB) then takes that byte array and, using a PacketOutput that corresponds to the packet's source ID, metadataSequenceNumber, packetSubType, and parentID, writes the packet to disk. It then uses a PacketSQLOutput to write the same packet to a table in the database.
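One way to picture "a PacketOutput that corresponds to" these four attributes is a composite key that selects, and lazily creates, one output per stream. This is a hypothetical sketch: PacketOutputRegistry and StreamKey are illustrative names, the output type is left generic, and the real PacketOutput construction is not shown on this page.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical registry that keeps one output object per packet stream. */
final class PacketOutputRegistry<O> {

    /** The four attributes that identify a stream, per the Ingest description. */
    record StreamKey(long sourceId, long metadataSequenceNumber, long packetSubType, long parentId) {}

    private final Map<StreamKey, O> outputs = new ConcurrentHashMap<>();
    private final Function<StreamKey, O> factory;

    PacketOutputRegistry(Function<StreamKey, O> factory) {
        this.factory = factory;
    }

    /** Returns the output for this stream, creating it on first use. */
    O outputFor(StreamKey key) {
        return outputs.computeIfAbsent(key, factory);
    }

    int size() {
        return outputs.size();
    }
}
```

Because StreamKey is a record, two keys with the same four values are equal, so packets from the same stream reuse one output while a change in any attribute (for example a new packetSubType) gets its own.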
Packet Translations
So, through all this, there are basically six representations of data packets in the SSDS ecosystem:
- SIAM Device Packet (and its subclasses MetadataPacket, SensorDataPacket, DeviceMessagePacket)
- SIAM Byte array (from Exportable class)
- SSDSDevicePacket
- SSDSGeoLocatedDevicePacket
- SSDS Byte array
- SSDS Protocol Buffers Format
Here is a diagram of these various forms of data

And here are the translation rules (some of them may seem very strange; they are the way they are for legacy reasons).
DevicePacket to SSDSDevicePacket
This translation is done in the constructor of SSDSDevicePacket which takes in a DevicePacket
| DevicePacket | Translation Rule | SSDSDevicePacket |
|---|---|---|
| sourceID | direct copy | sourceID |
| systemTime | direct copy | systemTime |
| sequenceNo | direct copy | sequenceNo |
| metadataRef | direct copy: metadataRef->metadataRef, metadataRef->metadataSequenceNumber, metadataRef->dataDescriptionID | metadataRef, metadataSequenceNumber, dataDescriptionID |
| parentId | direct copy: parentId->parentId, parentId->platformID | parentId, platformID |
| recordType | MetadataPacket: recordType to 0, Other: direct copy | recordType |
| (packet subclass) | If MetadataPacket: packetType = 0; if SensorDataPacket: packetType = 1; if DeviceMessagePacket: packetType = 2 | packetType |
| firstBufferLength | ignored | |
| firstBuffer | First buffer depends on which type of packet: If MetadataPacket: copy "bytes" buffer, If SensorDataPacket: copy "dataBuffer", If DeviceMessagePacket, copy "message" | firstBuffer |
| secondBufferLength | ignored | |
| secondBuffer | Only exists if MetadataPacket and will copy over "cause" buffer | secondBuffer |
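Because the same three SIAM packet kinds are numbered differently in each representation (packetType 0/1/2 on SSDSDevicePacket, SecondStreamID 0x0101 to 0x0103 in the SIAM byte array, and packetType 1/0/4 in the SSDS byte array), a small enum can keep the mappings in one place. The PacketKind enum below is a hypothetical summary of the values listed in these tables, not a class that exists in SSDS.

```java
/** Hypothetical summary of how each SIAM packet kind is numbered in each representation. */
enum PacketKind {
    METADATA(0, (short) 0x0101, 1),
    SENSOR_DATA(1, (short) 0x0102, 0),   // SummaryPackets also arrive as 0x0102
    DEVICE_MESSAGE(2, (short) 0x0103, 4);

    final int ssdsDevicePacketType;  // packetType on SSDSDevicePacket
    final short secondStreamId;      // EX_XXXXXXPACKET value in the SIAM byte array
    final int ssdsByteArrayType;     // packetType in the SSDS byte array

    PacketKind(int ssdsDevicePacketType, short secondStreamId, int ssdsByteArrayType) {
        this.ssdsDevicePacketType = ssdsDevicePacketType;
        this.secondStreamId = secondStreamId;
        this.ssdsByteArrayType = ssdsByteArrayType;
    }

    /** Looks up the kind from the SIAM SecondStreamID. */
    static PacketKind fromSecondStreamId(short id) {
        for (PacketKind kind : values()) {
            if (kind.secondStreamId == id) {
                return kind;
            }
        }
        throw new IllegalArgumentException(String.format("Unknown SecondStreamID 0x%04x", id));
    }

    /** Looks up the kind from SSDSDevicePacket.packetType. */
    static PacketKind fromSsdsDevicePacketType(int type) {
        for (PacketKind kind : values()) {
            if (kind.ssdsDevicePacketType == type) {
                return kind;
            }
        }
        throw new IllegalArgumentException("Unknown SSDSDevicePacket packetType " + type);
    }
}
```

For example, `PacketKind.fromSecondStreamId((short) 0x0101).ssdsByteArrayType` yields 1, the SSDS byte array code for a metadata packet.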
DevicePacket to SIAM Byte Array
This is done by the SIAM Exportable Packet class
| DevicePacket | Translation Rule | SIAM Byte Array |
|---|---|---|
| (none) | Static value 0x0100 indicating that the byte array comes from a DevicePacket | EX_DEVICEPACKET |
| (none) | The serial version UID of the class, which is always 0 | serialVersionUID |
| sourceID | direct copy | sourceID |
| systemTime | direct copy | systemTime |
| sequenceNo | direct copy | sequenceNo |
| metadataRef | direct copy | metadataRef |
| parentId | direct copy | parentId |
| recordType | direct copy | recordType |
| (packet subclass) | Set based on the type of packet: MetadataPacket = 0x0101, SensorDataPacket = 0x0102, DeviceMessagePacket = 0x0103 | EX_XXXXXXPACKET |
| (none) | The serial version UID of the class, which is always 0 | serialVersionUID |
| firstBufferLength | direct copy (see note for first buffer) | firstBufferLength |
| firstBuffer | This depends on the type of packet: If MetadataPacket: "cause" bytes are copied over, if SensorDataPacket: "dataMessage" bytes are copied over, if DeviceMessagePacket, "message" bytes are copied over | firstBuffer |
| secondBufferLength | only valid with MetadataPacket, but is copied directly over | secondBufferLength |
| secondBuffer | only valid with MetadataPacket, and the "buffer" bytes are copied over | secondBuffer |
SSDSDevicePacket to SIAM Byte Array
Originally done in TransmogrifyMDB by calling SSDSDevicePacket.convertToPublishableVersion3ByteArray, before the resulting byte array was passed to the method that translates it to SSDS format and sends it on to Ingest
| SSDSDevicePacket | Translation Rule | SIAM Byte Array |
|---|---|---|
| (none) | Static value 0x0100 indicating that the byte array comes from a DevicePacket | EX_DEVICEPACKET |
| (none) | The serial version UID of the class, which is always 0 | serialVersionUID |
| sourceID | direct copy | sourceID |
| systemTime | direct copy | systemTime |
| sequenceNo | direct copy | sequenceNo |
| metadataSequenceNumber | direct copy | metadataRef |
| parentID | direct copy | parentID |
| recordType | If packetType = 0 then set recordType = 0, if packetType = 1 then set recordType = recordType, if packetType = 2 then set recordType = recordType | recordType |
| packetType | Set based on packetType: if packetType = 0, set to 0x0101; if packetType = 1, set to 0x0102; if packetType = 2, set to 0x0103 | EX_XXXXXXPACKET |
| (none) | The serial version UID of the class, which is always 0 | serialVersionUID |
| (derived from packetType) | If packetType = 0, set to the length of "otherBuffer"; if packetType = 1 or 2, set to the length of "dataBuffer" | firstBufferLength |
| other/dataBuffer | This depends on the type of packet: If packetType = 0 then set to bytes from "otherBuffer", if packetType = 1 then set to bytes from "dataBuffer", if packetType = 2 then set to bytes from "dataBuffer" | firstBuffer |
| (derived from packetType) | Only present when packetType = 0; set to the length of "dataBuffer" | secondBufferLength |
| dataBuffer | Only present when packetType = 0; set to the bytes from "dataBuffer" | secondBuffer |
SSDSDevicePacket to SSDS Byte Array
Originally done in SSDSDevicePacket.convertToVersion3ByteArray, now in PacketUtility
| SSDSDevicePacket | Translation Rule | SSDS Byte Array |
|---|---|---|
| sourceID | direct copy | sourceID |
| systemTime | Not copied directly, but used by the getter methods that return seconds and nanoseconds | |
| timestampSeconds | Effectively a direct copy: SSDSDevicePacket has getter methods that convert systemTime to seconds and nanoseconds when called | timestampSeconds |
| timestampNanoseconds | Effectively a direct copy: SSDSDevicePacket has getter methods that convert systemTime to seconds and nanoseconds when called | timestampNanoseconds |
| sequenceNo | direct copy | sequenceNumber |
| metadataRef | ignored | |
| parentID | ignored | |
| recordType | If packetType = 0, set packetSubType to 0, otherwise set to recordType | packetSubType |
| packetType | Depends on packetType: if packetType = 0 then set to 1, if packetType = 1 then set to 0, if packetType = 2 then set to 4 | packetType |
| metadataSequenceNumber | direct copy | metadataSequenceNumber |
| dataDescriptionVersion | direct copy | dataDescriptionVersion |
| platformID | direct copy | parentID |
| dataBuffer length | Copy the length of dataBuffer | firstBufferLength |
| dataBuffer | direct copy | firstBuffer |
| otherBuffer length | Copy the length of otherBuffer | secondBufferLength |
| otherBuffer | direct copy | secondBuffer |
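The seconds/nanoseconds getters mentioned for systemTime can be sketched as below. This assumes systemTime holds epoch milliseconds and that the nanoseconds field carries only the sub-second remainder; TimestampSplit and its methods are hypothetical names, not the SSDSDevicePacket getters themselves.

```java
/** Hypothetical helpers mirroring the systemTime-to-seconds/nanoseconds getters. */
final class TimestampSplit {
    private TimestampSplit() {}

    /** Whole seconds since 1970-01-01T00:00:00Z for an epoch-millisecond time. */
    static long seconds(long epochMillis) {
        return Math.floorDiv(epochMillis, 1000L);
    }

    /** Sub-second remainder in nanoseconds (millisecond resolution, never negative). */
    static long nanoseconds(long epochMillis) {
        return Math.floorMod(epochMillis, 1000L) * 1_000_000L;
    }
}
```

Floor division keeps the remainder non-negative even for pre-1970 times, which is the same split that java.time.Instant.ofEpochMilli performs; for 1297450200123 ms the result is 1297450200 seconds and 123000000 nanoseconds.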
SIAM Byte Array to SSDS Byte Array
Originally done in TransmogrifyMDB in the checkAndPublishBytes method, now in PacketUtility
| SIAM Byte Array | Translation Rules | SSDS Byte Array |
|---|---|---|
| EX_DEVICEPACKET | ignored | |
| serialVersionUID | ignored | |
| sourceID | direct copy | sourceID |
| EX_XXXXXXXPACKET | If MetadataPacket, set packetType to 1; if SensorDataPacket, set packetType to 0; if DeviceMessagePacket, set packetType to 4 | packetType |
| (none) | Formerly set using the SIAMMetadataTracker, which tried to keep track of real version numbers based on the XML in the payload; that tracker no longer exists, and this is now a direct copy | metadataSequenceNumber |
| systemTime | Split into timestampSeconds and timestampNanoseconds | timestampSeconds and timestampNanoseconds |
| sequenceNo | direct copy | sequenceNumber |
| metadataRef | direct copy | dataDescriptionVersion |
| parentId | direct copy | parentID |
| recordType | If MetadataPacket (determined from EX_XXXXXXXPACKET), recordType set to 0, otherwise set to recordType | packetSubType |
| EX_XXXXXXXPACKET | ignored in storage, but used in logic | |
| serialVersionUID | ignored | |
| first/secondBufferLength | If MetadataPacket (determined from EX_XXXXXXXPACKET), firstBufferLength is set to secondBufferLength so we can flip the "cause" and "buffer" bytes because it just made more sense since the cause was rarely populated. Otherwise set to firstBufferLength | firstBufferLength |
| first/secondBuffer | If MetadataPacket (determined from EX_XXXXXXXPACKET), firstBuffer is set to secondBuffer so we can flip the "cause" and "buffer" bytes because it just made more sense since the cause was rarely populated. Otherwise set to firstBuffer | firstBuffer |
| firstBufferLength | If MetadataPacket (determined from EX_XXXXXXXPACKET), secondBufferLength is set to firstBufferLength to flip "cause" and "buffer" bytes | secondBufferLength |
| firstBuffer | If MetadataPacket (determined from EX_XXXXXXXPACKET), secondBuffer is set to firstBuffer to flip "cause" and "buffer" bytes | secondBuffer |
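The rules in this table that go beyond direct copies (packetType derived from the stream ID, packetSubType forced to 0 for metadata, and the metadata buffer swap) can be condensed into a short sketch. SiamToSsds and Converted are hypothetical names; the direct copies and the timestamp split are left out.

```java
/** Hypothetical sketch of the non-trivial SIAM-to-SSDS rules (direct copies omitted). */
final class SiamToSsds {
    private SiamToSsds() {}

    static final short EX_METADATAPACKET = 0x0101;
    static final short EX_SENSORDATAPACKET = 0x0102;   // SummaryPackets use this value too
    static final short EX_DEVICEMESSAGEPACKET = 0x0103;

    /** The SSDS byte array fields decided by these rules. */
    record Converted(int packetType, long packetSubType, byte[] firstBuffer, byte[] secondBuffer) {}

    static Converted convert(short secondStreamId, long recordType,
                             byte[] siamFirstBuffer, byte[] siamSecondBuffer) {
        int packetType = switch (secondStreamId) {
            case EX_METADATAPACKET -> 1;
            case EX_SENSORDATAPACKET -> 0;
            case EX_DEVICEMESSAGEPACKET -> 4;
            default -> throw new IllegalArgumentException("Unknown SecondStreamID " + secondStreamId);
        };
        boolean metadata = secondStreamId == EX_METADATAPACKET;
        // Metadata packets always get sub type 0; other packets keep the SIAM recordType.
        long packetSubType = metadata ? 0L : recordType;
        // SIAM puts "cause" first for metadata; SSDS swaps so the metadata bytes come first.
        byte[] first = metadata ? siamSecondBuffer : siamFirstBuffer;
        byte[] second = metadata ? siamFirstBuffer : siamSecondBuffer;
        return new Converted(packetType, packetSubType, first, second);
    }
}
```

Keeping the swap in one place makes it easy to see that, after conversion, the first SSDS buffer always holds the "interesting" payload regardless of packet kind.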
Transmogrify and Ingest Servlets
To let clients send data into the SSDS over HTTP, there are two servlets that can be called to insert data. The Transmogrify servlet takes HTTP request parameters that should match the fields found in SIAM packets. It extracts the values from the query parameters, creates a byte array in SIAM form, and publishes it to the Transmogrify topic as if it had come in from a JMS client. This allows non-Java clients to send data through the normal messaging pipeline. The Ingest servlet works the same way, but takes parameters that match the SSDS data format. For the Transmogrify servlet, the URL is constructed from the following pieces:
http://your.host.com/transmogrify/Transmogrify?
- response=true (indicates whether you want the call to send back an HTTP response)
- &SourceID=101 (this is the ID of the device that is sending the packet)
- &Timestamp=2011-02-11T18:50:00 (this is the timestamp on the packet expressed in ISO 8601 form)
- &SequenceNumber=1 (this is the sequence number on the packet)
- &SecondStreamID=0x0102 (this is important because it defines the type of packet being sent. The options are: 0x0101 for a metadata packet, 0x0102 for a sensor or summary data packet, 0x0103 for a device message packet)
- &FirstBuffer=SGVsbG8gSW5nZXN0IQ== (this is the payload byte array in Base64-encoded form)
- &RecordType=1 (if the SecondStreamID indicates a sensor data packet, this field is a type identifier specifying which kind of data is being sent, since some devices can send multiple kinds of data)
- &ParentID=100 (the ID of the parent that the source device was attached to when it created the packet)
- &MetadataRef=0 (this is the sequence number of the metadata packet that describes what is in the payload of this packet)
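Assembling such a URL by hand is error-prone: Base64 payloads can contain "+", "/", and "=", and a literal "+" in a query string is typically decoded as a space by servlet containers, so every value should be URL-encoded. The following sketch uses java.net.URLEncoder and java.util.Base64; TransmogrifyUrl is a hypothetical helper name and your.host.com is the placeholder host from above.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical helper that assembles a Transmogrify servlet URL. */
final class TransmogrifyUrl {
    private TransmogrifyUrl() {}

    /** Appends URL-encoded name=value pairs, in insertion order, to a base URL ending in "?". */
    static String build(String baseUrl, Map<String, String> params) {
        return baseUrl + params.entrySet().stream()
                .map(e -> encode(e.getKey()) + "=" + encode(e.getValue()))
                .collect(Collectors.joining("&"));
    }

    private static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] payload = "Hello Ingest!".getBytes(StandardCharsets.UTF_8);
        Map<String, String> params = new LinkedHashMap<>();
        params.put("response", "true");
        params.put("SourceID", "101");
        params.put("Timestamp", "2011-02-11T18:50:00");
        params.put("SequenceNumber", "1");
        params.put("SecondStreamID", "0x0102"); // sensor data packet
        params.put("FirstBuffer", Base64.getEncoder().encodeToString(payload));
        params.put("RecordType", "1");
        params.put("ParentID", "100");
        params.put("MetadataRef", "0");
        System.out.println(build("http://your.host.com/transmogrify/Transmogrify?", params));
    }
}
```

With this encoding the timestamp travels as 2011-02-11T18%3A50%3A00 and the payload as SGVsbG8gSW5nZXN0IQ%3D%3D; the servlet container decodes them back before the servlet reads the parameters.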
For the Ingest servlet, the parameters are:
http://your.host.com/ingest/Ingest?
- response=true (indicates whether you want the call to send back an HTTP response)
- &SourceID=101 (this is the ID of the device that is sending the packet)
- &ParentID=100 (the ID of the parent that the source device was attached to when it created the packet)
- &PacketType=1 (this is the type of packet being sent: 0 = data packet, 1 = metadata packet, 4 = device message packet)
- &PacketSubType=1 (this is the type of payload if the packet is a data packet)
- &MetadataSequenceNumber=0 (this is the sequence number of the packet that contains the metadata describing the payload of this packet)
- &DataDescriptionVersion=0 (the dataDescriptionVersion described above, used to flag minor metadata changes)
- &Timestamp=2011-02-11T18:50:00 (this is the timestamp on the packet expressed in ISO 8601 form)
- &SequenceNumber=1 (this is the sequence number on the packet)
- &FirstBuffer=SGVsbG8gSW5nZXN0IQ== (this is the payload byte array in Base64-encoded form)
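From Java, the same request can be sent with java.net.http (Java 11+), as in the sketch below. It assumes the Ingest servlet accepts these parameters on a GET request, as the URL above suggests; IngestClient is a hypothetical helper name and your.host.com is the placeholder host.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical client for the Ingest servlet using java.net.http. */
final class IngestClient {
    private IngestClient() {}

    /** Builds a GET request whose query string carries the URL-encoded parameters. */
    static HttpRequest buildRequest(String endpoint, Map<String, String> params) {
        String query = params.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                        + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
        return HttpRequest.newBuilder(URI.create(endpoint + "?" + query)).GET().build();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("response", "true");
        params.put("SourceID", "101");
        params.put("ParentID", "100");
        params.put("PacketType", "0");        // 0 = data packet
        params.put("PacketSubType", "1");
        params.put("MetadataSequenceNumber", "0");
        params.put("DataDescriptionVersion", "0");
        params.put("Timestamp", "2011-02-11T18:50:00");
        params.put("SequenceNumber", "1");
        params.put("FirstBuffer", Base64.getEncoder()
                .encodeToString("Hello Ingest!".getBytes(StandardCharsets.UTF_8)));

        HttpRequest request = buildRequest("http://your.host.com/ingest/Ingest", params);
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + ": " + response.body());
    }
}
```

Building the request separately from sending it keeps the parameter encoding testable without a running SSDS server.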