Avro Usage

Avro Message Envelope and Content

An extensible message envelope treats all message elements uniformly and allows new message elements (content) to be defined without changing the envelope schema. Of course, to process these new message elements, the receiver must understand them. Keeping the interpretation and addition of envelope content separate from the envelope itself allows messages to evolve and adapt, and allows different users and processors of messages to evolve independently.

Envelope Schema

The (fixed) envelope schema is an array of <int key, binary value> pairs representing distinct content elements.

[
{
  "namespace": "com.tattsgroup.gaming.venueenvelope",
  "name": "GenericItem",
  "type": "record",
  "fields": [{ "name": "key", "doc": "Key identifying the type of field contained in value.", "type": "int"}, { "name": "value", "doc": "A value of a type/format identified by key.", "type": "bytes" }]
},
{
  "namespace": "com.tattsgroup.gaming.venueenvelope",
  "name": "GenericEnvelope",
  "type": "record",
  "fields": [ { "name": "content", "type": { "type": "array", "items": "com.tattsgroup.gaming.venueenvelope.GenericItem" } } ]
}
]

The actual schema file, including documentation on currently defined fields, is here

A GenericItem's key identifies the type/form of its value.

A reader should process items that it recognises based on the key used and decode/convert the value accordingly.

The key identifies both usage and format/interpretation of the associated binary value.

Multiple occurrences of the same key are allowed in the array but whether this is valid/meaningful for a given key is defined for that key. There is no requirement for a specific order of content elements in the array, although it may be more efficient to order them so that those content elements that must be examined in order to route or filter, as well as those required to interpret other content fields appear earlier in the array.
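
As an illustration of the envelope layout described above, the following is a minimal sketch of the Avro binary encoding of a GenericEnvelope (an array of records, each holding an int key and a bytes value), written with only the Python standard library. The function names (encode_envelope, decode_envelope) are hypothetical and not part of MaXaM; in practice an Avro library would normally do this work.

```python
import io


def _write_long(out, n):
    """Write an Avro int/long: zig-zag encoded, then base-128 varint."""
    n = (n << 1) ^ (n >> 63)
    while n & ~0x7F:
        out.write(bytes([(n & 0x7F) | 0x80]))
        n >>= 7
    out.write(bytes([n]))


def _read_long(buf):
    """Read an Avro int/long written by _write_long."""
    shift = 0
    acc = 0
    while True:
        b = buf.read(1)
        if not b:
            raise EOFError("truncated varint")
        acc |= (b[0] & 0x7F) << shift
        if not b[0] & 0x80:
            break
        shift += 7
    return (acc >> 1) ^ -(acc & 1)


def encode_envelope(items):
    """Encode a list of (key, value) pairs as a GenericEnvelope."""
    out = io.BytesIO()
    if items:
        _write_long(out, len(items))      # one array block holding all items
        for key, value in items:
            _write_long(out, key)         # GenericItem.key (int)
            _write_long(out, len(value))  # GenericItem.value (bytes): length
            out.write(value)              # ...followed by the raw bytes
    _write_long(out, 0)                   # a zero count terminates the array
    return out.getvalue()


def decode_envelope(data):
    """Decode a GenericEnvelope into a list of (key, value) pairs."""
    buf = io.BytesIO(data)
    items = []
    while True:
        count = _read_long(buf)
        if count == 0:
            break
        if count < 0:
            # A negative count is followed by the block size in bytes.
            count = -count
            _read_long(buf)
        for _ in range(count):
            key = _read_long(buf)
            length = _read_long(buf)
            items.append((key, buf.read(length)))
    return items
```

Because decode_envelope returns items in array order, a reader can scan for the keys it recognises and skip the rest without touching their values.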

Message processing steps within MaXaM that involve only certain subsets of content (typically, headers) can operate on the message envelope and read or modify individual content elements independently of other content (in particular, the message body).

As the envelope is extensible, the following should not be assumed to be a complete list of content types, but does define those content types required/used to implement the MaXaM messaging service itself, and in particular those visible/used at the interfaces to MaXaM.

The following GenericItem types are defined. Each entry gives the key value, followed by a description of that type of GenericItem:

AvroMessageBody : Key = 0. Value = a raw Avro encoded message body. The schema of the message is identified by another GenericItem. If an item with this key appears in a GenericEnvelope, there must be exactly one such item, and at least one of the schema identifying GenericItems below must be present, each occurring exactly once.
AvroSchemaFingerprint : Key = 3. Value = the bytes of a 128 bit MD5 schema fingerprint.
Message Keys:
- Subject : Key = 10. Value = UTF-8 publication topic, identical to the STOMP Destination header.
- Tombstone : Key = 12. Value = 64 bit little endian milliseconds since the Unix epoch (1970). Marks this as a Tombstone as of that date/time.

These types are described in more detail below.
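
The key assignments listed above, and the occurrence rule attached to AvroMessageBody, can be sketched as follows. The constant names and the validate_body_rule function are hypothetical; the check that a fingerprint is 16 bytes long follows from it being a 128 bit MD5 value. Items are assumed to be (key, value) pairs already decoded from the envelope.

```python
from collections import Counter

# Key values as listed above; the Python names are illustrative only.
AVRO_MESSAGE_BODY = 0
AVRO_SCHEMA_FINGERPRINT = 3
SUBJECT = 10
TOMBSTONE = 12

# Currently the only schema identifying item defined in the list.
SCHEMA_IDENTIFYING_KEYS = (AVRO_SCHEMA_FINGERPRINT,)


def validate_body_rule(items):
    """Return a list of problems with the AvroMessageBody rule (empty if ok)."""
    counts = Counter(key for key, _ in items)
    if counts[AVRO_MESSAGE_BODY] == 0:
        return []  # the rule only applies when a body is present
    problems = []
    if counts[AVRO_MESSAGE_BODY] != 1:
        problems.append("more than one AvroMessageBody item")
    present = [k for k in SCHEMA_IDENTIFYING_KEYS if counts[k] > 0]
    if not present:
        problems.append("no schema identifying item")
    for key in present:
        if counts[key] != 1:
            problems.append("schema identifying key %d occurs more than once" % key)
    for key, value in items:
        if key == AVRO_SCHEMA_FINGERPRINT and len(value) != 16:
            problems.append("fingerprint is not 128 bits")
    return problems
```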

Avro Schema Fingerprint (_AvroSchemaFingerprint_) Content Element

The fingerprint is a robust hash across a normalised form of the schema (note that robust does not mean cryptographic; no cryptographic properties are required). The normalised form is the Avro canonical form, extended with the inclusion of the revision attribute (to support versioning of nested types referenced by name).

Because crypto hashes are widely ported and available, the easiest route to a portable hash is to use MD5. Because the hashes are generally precomputed (the exception being a PUT of a new schema), outright performance is not critical. Because the cryptographic properties of the hash are not used or required, MD5's cryptographic strength (or lack of it) is not important.
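
A minimal sketch of the hashing step, using Python's standard hashlib. It assumes the caller has already produced the normalised schema text (Avro canonical form plus the revision attribute); that normalisation is not shown, and the function name schema_fingerprint is hypothetical. The UTF-8 encoding of the text is also an assumption.

```python
import hashlib


def schema_fingerprint(normalised_schema: str) -> bytes:
    """Return the 16 byte (128 bit) MD5 digest of the normalised schema text."""
    # Assumption: the normalised form is hashed as UTF-8 bytes.
    return hashlib.md5(normalised_schema.encode("utf-8")).digest()
```

The 16 bytes returned are what would be carried as the value of an AvroSchemaFingerprint item.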

Subject (_Subject_) Content Element

Messages that represent publications (in the publish/subscribe sense) include a Subject header that identifies what the message is about.

The Subject content element SHOULD be populated by the SENDer with the same value as the STOMP Destination header. MaXaM will validate this and reject inconsistent values. If the value is not set by the SENDer, MaXaM will populate it automatically from the STOMP Destination header.

The Destination header in a STOMP MESSAGE frame is always populated by MaXaM from the Subject content element in the message being sent.
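
The validate-or-populate behaviour described above for SEND can be sketched as below. This is an illustration only, not MaXaM's implementation; the function name reconcile_subject and the use of ValueError for rejection are assumptions.

```python
def reconcile_subject(destination, subject=None):
    """Return the Subject value (UTF-8 bytes) to carry in the envelope.

    destination: the STOMP Destination header of the SEND frame (str).
    subject: the Subject item value supplied by the SENDer, or None if absent.
    """
    expected = destination.encode("utf-8")
    if subject is None:
        return expected  # populated automatically from the header
    if subject != expected:
        raise ValueError("Subject does not match the STOMP Destination header")
    return subject
```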

Tombstone (_Tombstone_) Content Element

The Tombstone content element is populated by the SENDer to indicate that this is the final value it intends to publish for a topic. The value is a timestamp recording the time at which the SENDer made this decision. The timestamp is intended for use by housekeeping processes that dispose of, purge, or delete obsolete tombstoned data after some time. Tombstones should not be used to indicate a temporary change of status - effectively the publisher is relinquishing ownership of the topic and ceasing to update it until/unless that topic is recreated.
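
A minimal sketch of encoding and decoding the Tombstone value (64 bit little endian milliseconds since the Unix epoch) with Python's struct module. The function names are hypothetical, and treating the integer as signed is an assumption, since the signedness is not stated here.

```python
import struct
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def encode_tombstone(when):
    """Encode a timezone-aware datetime as 8 little endian bytes of milliseconds."""
    millis = (when - EPOCH) // timedelta(milliseconds=1)
    return struct.pack("<q", millis)  # "<q": little endian, signed 64 bit


def decode_tombstone(value):
    """Decode a Tombstone value back to a UTC datetime."""
    (millis,) = struct.unpack("<q", value)
    return EPOCH + timedelta(milliseconds=millis)
```

A housekeeping process could compare the decoded time with the current time to decide whether tombstoned data is old enough to purge.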

Avro Message Body (_AvroMessageBody_) Content Element

The message body is simply a binary object. To decode or process it, the schema is needed. Typically this is obtained by looking up, in the schema registry, the fingerprint carried in the AvroSchemaFingerprint element.

Avro Schema Support

Every Avro writer (STOMP SENDer) or reader (STOMP SUBSCRIBEr + receipt of STOMP MESSAGEs) that is built against a specific Avro type (referred to as a baked in schema) must be capable of correctly populating the AvroSchemaFingerprint in every message of that type that it SENDs. Similarly it must be capable of dispatching to a decoder based on the AvroSchemaFingerprint in MESSAGEs it receives.

In addition, readers must support schema evolution by recognising (by looking up in the schema registry) AvroSchemaFingerprints as different compatible revisions of the expected Avro schema (same type name, different _AvroSchemaFingerprint_). The reader must instantiate a resolving decoder and decode the message to its _baked in_ representation (revision).
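
The dispatch decision described above can be sketched as follows. Everything here is hypothetical: the schema registry is modelled as a plain dict from fingerprint to (type name, schema), the baked in schemas as a dict from type name to (fingerprint, schema), and the function name plan_decode is invented for illustration. The sketch only selects the writer and reader schemas; the actual decoding, and any compatibility checking, would be done by an Avro library's resolving decoder.

```python
def plan_decode(fingerprint, baked_in, registry):
    """Choose how to decode a message body with the given schema fingerprint.

    baked_in: {type_name: (fingerprint, schema)} for types the reader is built against.
    registry: {fingerprint: (type_name, schema)}, standing in for the schema registry.
    Returns (mode, type_name, writer_schema, reader_schema).
    """
    # Same revision the reader was built against: decode directly.
    for name, (baked_fp, schema) in baked_in.items():
        if baked_fp == fingerprint:
            return ("direct", name, schema, schema)

    # Otherwise the registry tells us which type and revision wrote the message.
    entry = registry.get(fingerprint)
    if entry is None:
        raise LookupError("unknown schema fingerprint")
    name, writer_schema = entry
    if name not in baked_in:
        raise LookupError("message type %r is not handled by this reader" % name)

    # Same type name, different fingerprint: resolve to the baked in revision.
    return ("resolve", name, writer_schema, baked_in[name][1])
```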

Dynamically typed readers (those written in a dynamically typed language or that decode the Avro to a dynamically typed representation for processing) may be able to directly decode then process a different revision rather than resolving to a baked in revision. Even in this case, the subscriber MUST still use the schema registry to obtain the exact schema revision to decode the Avro message - attempting to decode an Avro binary message with anything other than the exact schema it was encoded with will produce incorrect results, including undefined behaviour (e.g. crashes).

Message Type based Access Control

Publication Authorisation

A MaXaM STOMP client's role provides it with a set of publication authorisations (this may be empty). Each authorisation (possibly including wildcards) is a STOMP destination to which messages may be sent. An authorisation includes wildcards only for parts of the destination that contain identifiers that the client has permission to set, with the remainder of the destination and in particular the Avro type suffix being fixed. The type actually published to a destination is validated by MaXaM so that the overall effect is to limit publications to the correct Avro type for each destination.

Subscription Authorisation

A MaXaM client also has a set of subscription authorisations. These act as a filter applied to messages sent to the client so that it will only receive messages matching the filter. The subscription authorisations follow the same format as publication authorisations, but will typically contain broader wildcards. Subscription authorisations limit the destinations from which messages will be received as the result of a given subscription but do not prevent broad subscriptions being made. Any client can, for example, subscribe to ** (i.e. all destinations), but a client that has a subscription authorisation only for the destination "foo" (no wildcards) and makes such a subscription will only receive MESSAGEs that have that exact destination.
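
The filtering effect in the example above can be sketched as an intersection of the subscription and the authorisations. The full wildcard syntax is not defined in this section, so this sketch assumes only two pattern forms: ** (matches every destination) and an exact destination string. The function names are hypothetical.

```python
def matches(pattern, destination):
    """Assumed matching rule: '**' matches everything, otherwise exact match."""
    return pattern == "**" or pattern == destination


def should_deliver(destination, subscriptions, authorisations):
    """Deliver only if a subscription and an authorisation both match."""
    subscribed = any(matches(p, destination) for p in subscriptions)
    authorised = any(matches(p, destination) for p in authorisations)
    return subscribed and authorised
```

With a subscription to ** and a single authorisation for "foo", only messages whose destination is exactly "foo" pass the filter.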