Diving into ETL and CQRS — developing a secret message encoder with Serialized

This post will show the first steps towards building a "secret" message board using Golang and Serialized.

Joy Kaufman

4/4/2022

Today we are going to embark on Part I of building a “secret” message board. In the course of creating this app — which is built just for fun — we are going to touch on the concepts of ETL systems, CQRS, and also explore Golang, to which I am still a newcomer! With that said, let’s start with some background on those topics and get to coding later.

ETL (or Extract-Transform-Load)

One of my favorite jobs I ever had was working on an ETL system. Before that point I had heard the term thrown around a lot, but the concept never stuck in my brain past “shuffling data around” — and to be honest, that is the gist!

If it helps contextualize the use case where ETL systems are relevant, I’ll give you the tl;dr of a project where I was Dev Lead and we ran one. The company was paying something like $1,000,000 per year for some expensive software, so we were building out a replacement for it. But until my project was completely functional, we piped data from the old system into it and slowly migrated users over. The need to keep our new in-house learning software in lockstep with the third-party system was the reason we needed to extract, transform, and load data. Every minute, hour, or day, various ETL jobs were running to capture the data, put it into the format our systems wanted, and then let it work seamlessly in our app.

Today’s example is going to start off with a rough idea of these concepts to begin with, and then in later posts we will be exploring ETL functionality further.

One interesting note: As you can imagine, people have all kinds of data transformation needs! That is why there are ETL variants like ELT (extract, load, transform) and ETLT (extract, transform, load, transform). In this article, we are going to focus on ETL, but we can look at some other data pipeline processes later.

CQRS (or Command and Query Responsibility Segregation)

For anyone who has worked with a heftier traditional relational database, eventually you run into problems related to the fact that not all data is of the same nature. Some data is requested millions of times in a fraction of a second but rarely changed. Other data is write-heavy, frequently needing to be accessed and altered. As a result, usage patterns across tables can vary hugely, both in the number and in the type of requests your database must handle.

💡 Managing those different usage patterns in a database with a single set of tools and rules is a big ask.

However, command and query responsibility segregation (CQRS from now on to save me some typing) says, “These are different animals! Let’s let an eagle be an eagle and a cheetah be a cheetah”.
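To make the eagle/cheetah split a little more concrete, here is a minimal in-memory sketch of the idea in Go. The names (MessagePosted, CommandSide, QuerySide) are hypothetical and invented for this illustration; it is not how Serialized implements CQRS, just the shape of the pattern: one side only accepts writes, the other only answers reads.

```go
package main

import "fmt"

// MessagePosted is a hypothetical event recording that a message was written.
type MessagePosted struct {
	ID   string
	Body string
}

// CommandSide handles writes only: it appends events and never answers queries.
type CommandSide struct {
	log []MessagePosted
}

// PostMessage records a new event and returns it so read models can be updated.
func (c *CommandSide) PostMessage(id, body string) MessagePosted {
	e := MessagePosted{ID: id, Body: body}
	c.log = append(c.log, e)
	return e
}

// QuerySide handles reads only: it keeps a lookup table shaped for fast queries.
type QuerySide struct {
	byID map[string]string
}

// Apply updates the read model from an event produced by the command side.
func (q *QuerySide) Apply(e MessagePosted) {
	q.byID[e.ID] = e.Body
}

// Get answers a query without touching the write-side log.
func (q *QuerySide) Get(id string) (string, bool) {
	body, ok := q.byID[id]
	return body, ok
}

func main() {
	writes := &CommandSide{}
	reads := &QuerySide{byID: make(map[string]string)}

	// a write produces an event, and the read model is updated from it
	reads.Apply(writes.PostMessage("1", "test test test"))

	body, ok := reads.Get("1")
	fmt.Println(body, ok)
}
```

Because each side has its own data structure, the write log and the read table can be tuned (or scaled) independently, which is the whole point.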

In that spirit, I’ll be using an excellent tool in this post called Serialized, which offers a comprehensive data management approach to both event driven data and CQRS. You can read more about them here, or very easily get set up as a developer on their very handy free tier to play with all the cool functionality available.

Getting set up with Go

Last but not least, this will be my maiden voyage with Golang, or Go! Go was developed at Google 10+ years ago and has exploded in popularity due to its performance, concurrency model, and keen developer interest. I had vowed that next time I got to build an ETL system I would do so in Go, so today it’s on! If you are set up with Go already, feel free to skip ahead to the next subheading, but I’m going to first get all the gears turning on my machine.

Not being entirely sure this Mac was even set up with Go, I started with the basics before embarking on this voyage (for non-Mac users, various Go install options can be found here if you want to follow along, and package manager aside most of these instructions should be fine for *nix users).

brew install go 
mkdir go_serialized && cd go_serialized
touch main.go 

In my main.go file, I added a simple test just to get myself running:

package main 

import "fmt"

func main() {
    fmt.Println("Running!")
}

And then I confirmed my Go setup was at least taken care of before moving on.

➜  go_serialized go build main.go
➜  go_serialized ./main
Running!
➜  go_serialized

Getting set up with Serialized

The sign up for a Serialized account is here, and I’ll tell you two things I liked about the signup/auth process already. I used their authenticate-by-Google option (one less password to remember). Later on as I began looking for what URL I’d need to feed this app I noticed another nice touch, which is that account differentiation for Serialized is apparently done all by secrets/payload. No special URL to remember.

If you’re new, go through the standard sign-up though. Make an account and a project if you are following along. For clarity, my Serialized project name is “go_serialized”.

And with that, we’re off!

Project Structure

What we are going to eventually build is going to do the following:

  • Extract a message from a local file
  • Transform the message by encoding it with a passphrase
  • Load it into Serialized
  • Query it

Being a Go newcomer and this being more of a proof-of-concept, I organized this project pretty simply. On completion, our structure will look like this:

├── api
│   └── api.go
├── etl
│   └── etl.go
├── go.mod
├── go.sum
├── inputs
│   └── message_one.json
├── main
├── main.go
└── utils
    ├── errors.go
    └── passphrase.go

As you may expect:

  • The API code will wrap our network calls.
  • The ETL module will extract and transform our data.
  • The utils module contains logic to generate our passphrase and to give some generic error handling throughout our app.
  • main.go in the root directory ties all of our logic together.

Keeping secrets secret

Before we get too far, I’ll let you know that if you want to follow along you can head to this repo and pull down from the main branch to get our empty boilerplate structure. (A separate branch called walkthrough will track our commit-by-commit progress and ultimately have the final project, so I’m going to switch branches.)

git checkout -b walkthrough

Switched to a new branch 'walkthrough'

The first thing we will want to do is build out the directories we mentioned earlier, and create an environment file. This will be where we store our credentials for Serialized.

touch .env
mkdir api etl inputs utils
ls -a

.         ..        .env      .git      README.md api       etl       inputs    utils

We don’t want our .env file to make it up anywhere public, because it will contain secrets soon. So let's quickly add a .gitignore file and tell it to ignore .env before anything else. We can then verify this worked with the cat command, which should just return the contents of that file.

touch .gitignore
echo ".env" >> .gitignore
cat .gitignore

.env

Let’s commit our .gitignore and README.md before we move on.

git add .
git commit -m "Create README, gitignore"

[walkthrough aa1f032] Create README, gitignore
 2 files changed, 4 insertions(+), 1 deletion(-)
 create mode 100644 .gitignore

Now, let’s navigate to serialized.io and get both our access key and secret access key (my info blanked out for security).

Add to your .env file the values we retrieved here.

// .env

ACCESS_KEY=xxxxxxxx
SECRET_ACCESS_KEY=xxxxxxxx

Awesome! Now we are set up to authenticate with Serialized.
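Since an empty or misspelled key in .env tends to surface later as a confusing authentication failure, it can be worth checking the values up front. Here is a small sketch of that idea, assuming the environment values end up in a map of strings. The function name missingKeys is hypothetical and not part of the project.

```go
package main

import "fmt"

// missingKeys reports which of the required keys are absent or empty in env.
// It is a hypothetical helper for this sketch.
func missingKeys(env map[string]string, required ...string) []string {
	var missing []string
	for _, k := range required {
		if env[k] == "" {
			missing = append(missing, k)
		}
	}
	return missing
}

func main() {
	// stand-in for values parsed from .env; real keys stay out of source control
	env := map[string]string{"ACCESS_KEY": "xxxxxxxx"}

	// SECRET_ACCESS_KEY was left out on purpose to show the check firing
	fmt.Println(missingKeys(env, "ACCESS_KEY", "SECRET_ACCESS_KEY"))
}
```

If the returned slice is non-empty, the program can stop with a clear message before any network call is made.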

Jumping in with Go

As we mentioned in the directory structure section, the core elements of this app are: the ETL, the API, and the main module tying it together. To begin a Go project, we’re going to run an initialization step for the project, and then re-run our main.go from a previous step to confirm we are set up.

go mod init github.com/jtkaufman737/go_serialized
go: creating new go.mod: module github.com/jtkaufman737/go_serialized
go: to add module requirements and sums:
    go mod tidy

main.go can just be a simple print test for now, since we’re just making sure it’s running.

// main.go 
package main 

import "fmt"

func main() {
    fmt.Println("Running!")
}

Running it once more before we begin, we see we are all set.

go run main.go
Running!

There’s one additional thing I recommend that made this project really nice to work on. For VS Code users, you may have already been or soon will be prompted to install a standard set of Go plugins for the editor. They included some really nice tooltips that helped me debug quite a bit as someone new to Go, but the nicest feature was that on save it will auto-import necessary packages for you and also automatically remove imports you aren’t using. I found this very handy, so if you have the opportunity, I say add the plugins.

Error Handling

The first thing I’m going to do on this project is actually not at all exciting, but is more of a quality-of-life thing. You know I’m talking about error-handling. As I read through various Go tutorials, I kept seeing the following pattern repeated over and over:

/* 
doing stuff in go having a great time
*/ 
if err != nil {
  fmt.Print(err.Error())
}

I’m new, so maybe I’ll hear that people like repeating themselves that way for some reason, but it annoyed me to have those three lines of code some 20x repeated in this project by the end. For this reason my first order of business was creating a little utility function to deal with this and at least cut our three repeated lines of code down to one. Go ahead and make a utils/errors.go file and we’ll add the following code.

// utils/errors.go

package utils

import "fmt"

// LogErrors takes variadic, or multiple arguments for errors and formats/prints them
func LogErrors(errors ...error) {
    // will check if we ever receive no arguments
    if len(errors) == 0 {
        return
    }

    for _, err := range errors {
        // if we receive an argument list, for each that is not nil (falsey) we will 
        // format and print them
        if err != nil {
            fmt.Print(err.Error())
        }
    }
}

For Golang newcomers, a couple of interesting tidbits here:

  • Go has no truthy or falsey values, so nil is the garden-variety “nothing here” value that we explicitly compare errors against (err != nil).
  • We’ve seen a nice trick already which is that we can specify a varying number of function arguments with the use of ellipses .... You can read more about that here.
  • Like some other languages, we see a package declaration up top that allows this function to be accessed in the future as part of the utils package.
  • You may also notice the naming convention as our function is uppercase. This signifies it is public and can be imported by other files.
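As a small illustration of the variadic point above, the sketch below calls a variadic function three ways: with no arguments, with several arguments, and with an existing slice spread using the ellipsis. The helper countErrors and the two sample errors are hypothetical and exist only for this example.

```go
package main

import (
	"errors"
	"fmt"
)

// sample errors used only for this illustration
var (
	errA = errors.New("file not found")
	errB = errors.New("bad JSON")
)

// countErrors is a hypothetical helper that returns how many of its
// arguments are non-nil errors.
func countErrors(errs ...error) int {
	n := 0
	// ranging over an empty argument list simply runs zero times
	for _, err := range errs {
		if err != nil {
			n++
		}
	}
	return n
}

func main() {
	// no arguments: errs is an empty slice
	fmt.Println(countErrors())

	// several arguments: nil entries are skipped
	fmt.Println(countErrors(errA, nil, errB))

	// an existing slice can be spread with a trailing ellipsis
	collected := []error{errA, errB}
	fmt.Println(countErrors(collected...))
}
```

One takeaway: because a range loop over an empty slice does nothing, an explicit length check before the loop is optional.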

Information flow for our secret messages

For the next sections, we are going to get into some cryptography utilities that Go has. I will be the first to admit to you that I borrowed the public work of others to figure out how to encrypt and decrypt messages, which we will talk about more soon.

For this article though, I want to recap the application’s logical flow:

  1. We read JSON in from a local file containing a “secret message” (extract)
  2. We will encrypt that message using a random passphrase (transform)
  3. We will then post a “new message” event to Serialized with the message, passphrase, and some metadata like an ID field.
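Before building the real thing, here is a bare-bones sketch of those three steps as plain functions, just to show the order the data moves in. Everything here is a placeholder I made up for illustration: the record type, the hard-coded input, the upper-casing "transform", and the in-memory slice standing in for Serialized.

```go
package main

import (
	"fmt"
	"strings"
)

// record is a hypothetical stand-in for the message moving through the pipeline.
type record struct {
	ID   string
	Body string
}

// extract pulls raw input from a source; here the source is a hard-coded string.
func extract() string {
	return "test test test"
}

// transform reshapes the raw input. Upper-casing is only a placeholder
// for the encryption step.
func transform(raw string) record {
	return record{ID: "1", Body: strings.ToUpper(raw)}
}

// load hands the finished record to a destination; an in-memory slice
// stands in for the POST to Serialized.
func load(store []record, r record) []record {
	return append(store, r)
}

func main() {
	var store []record

	// extract, then transform, then load: the order gives ETL its name
	store = load(store, transform(extract()))

	fmt.Printf("%+v\n", store)
}
```

Swapping the last two calls (load the raw data first, transform it afterwards) is what would turn this into the ELT variant.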

In the future, the idea for this application is that when you post a new message, the response gives you your “secret code” back. You then choose who to share it with, and only they can access the message. The message recipient, armed with the secret code, will be able to get the message back in plain text. The concept came to me as a high-tech version of passing notes in class as a kid, through the lens of ETL systems. We’re not going to fully implement it now — this article is already going to be quite long — but that’s the big picture we’re working towards.

To start that process, let’s make a new file and set about reading it. Create inputs/message_one.json and add to it a body with some silly value. I went for test test test but be creative with it.

// inputs/message_one.json
{
    "body": "test test test"
}

E is for Extract

Let’s create a file called etl/etl.go where our extraction logic can live.

// etl/etl.go
package etl

import (
    "encoding/json"
    "io/ioutil"

    "github.com/jtkaufman737/go_serialized/utils"
)

func ReadData() map[string]string {
    /*
        going to create a default message structure
        as a map, which to me felt analogous to JSON-friendly structures
        in other languages. It is similar to a JSON object or Python dict.
        In the future it might be more semantic to do this
        with structs so I may revisit
    */
    message := make(map[string]string)

    // ioutil reads our JSON file into a byte slice; we log any error
    data, err := ioutil.ReadFile("./inputs/message_one.json")
    utils.LogErrors(err)

    // the json package translates this into the format we want
    err = json.Unmarshal(data, &message)
    utils.LogErrors(err)
    return message
} 

If you installed the VS Code plugins I mentioned earlier, you will notice that if you skipped the entire import section of the file above it will automagically generate ALL of the import statements for you, which is super nice.

I’ll next add to our main function in main.go to test out the file reading capabilities.

// main.go
package main

import (
    "fmt"

    "github.com/jtkaufman737/go_serialized/etl"
)

func main() {
    fmt.Println("Running!")

    // First we are going to read in a JSON message
    rawMessage := etl.ReadData()

    fmt.Println("Message from filereader is:")
    fmt.Println(rawMessage)
}

If we run another go run main.go we will see the following output:

go run main.go
Running!

Message from filereader is:
map[body:test test test]

Perfect! We have confirmed our main function is running and capable of extracting our silly message from the JSON file.
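The comment in ReadData mentions that a struct might be a more semantic home for this data than a map. Here is a rough sketch of what that could look like, assuming the JSON keeps its single body field. The names Message and parseMessage are my own inventions for this example, and it parses a byte slice directly rather than a file so it can run on its own.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Message mirrors the shape of inputs/message_one.json.
// The type name is an assumption made for this sketch.
type Message struct {
	Body string `json:"body"`
}

// parseMessage decodes raw JSON into a Message and returns any
// decoding error to the caller instead of discarding it.
func parseMessage(data []byte) (Message, error) {
	var m Message
	if err := json.Unmarshal(data, &m); err != nil {
		return Message{}, err
	}
	return m, nil
}

func main() {
	m, err := parseMessage([]byte(`{"body": "test test test"}`))
	if err != nil {
		fmt.Println("could not parse message:", err)
		return
	}
	fmt.Println(m.Body)
}
```

The trade-off is flexibility versus safety: a map accepts any keys, while the struct makes a typo like m.Bdoy a compile error instead of a silent empty string.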

T is for Transform

Our next task, transforming our message, is actually the most interesting and complicated part of the app, at least for now. To be honest with you, I thought about trying to dig into the nitty-gritty of this part, but I realized I had nothing special to contribute to warrant making an original version of it. Plus, the number one rule of encryption is “Thou shalt not roll thine own encryption”, and there’s good reason for that. I’d likely just end up creating vulnerabilities.

I should note that Serialized, the platform we’ll be using to store this information in the Load step, actually has some neat options for real encrypted data (unlike our pet project here, where we aren’t too worried about security). You can read more about the options available for encrypted data on Serialized by checking out Serialized’s docs.

During this step, we will also add a utility to generate a random passphrase to use as the encryption key (the code below calls it a salt, but AES uses it directly as the key). This will be what lets the message recipient eventually retrieve and decode the message. Since that part is simpler, let’s start there. Create a utils/passphrase.go file and add the following:

// utils/passphrase.go 
package utils

import (
    "math/rand"
)

var letters = []rune("abcdefghijklmnopqrstuvwxyz!@#$%^")

func MakePassphrase() []byte {
    /*
        Answer informed by this SO answer
        https://stackoverflow.com/questions/22892120/how-to-generate-a-random-string-of-a-fixed-length-in-go
    */

    // rune is a Go data type that aliases int32, you can read more about it here:
    // https://zetcode.com/golang/rune/
    b := make([]rune, 32)

    for i := range b {
        b[i] = letters[rand.Intn(len(letters))]
    }

    return []byte(string(b))
}

For me, coming from higher-level languages, this looks awfully fancy to solve the problem of “generate a random string of n length”, but it does the trick, so I’m happy. Let’s verify it works by adding a test to main.go.

// main.go

...
func main() {

    // ...previous code

    // Custom passphrase to use as the key for our encoded message
    passphrase := utils.MakePassphrase()
    fmt.Println("Random passphrase is:")
    fmt.Println(string(passphrase))
}

Let’s re-run main.go and see our new result.

go run main.go
Running!

Message from filereader is:
map[body:test test test]
Random passphrase is:
bph@bgzmiegpcry!lvfcp!is^lpywxvy
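One thing worth knowing about the output above: math/rand is not meant for secrets, and on Go versions before 1.20 its top-level functions start from a fixed seed unless you call rand.Seed, so every run can print the same passphrase. The sketch below is one possible alternative built on crypto/rand. The name makeSecurePassphrase is hypothetical, and this is not the code used in the rest of the post.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"math/big"
)

// same 32-character alphabet as the letters slice in utils/passphrase.go
const alphabet = "abcdefghijklmnopqrstuvwxyz!@#$%^"

// makeSecurePassphrase returns n bytes drawn from alphabet using the
// operating system's cryptographically secure random source.
func makeSecurePassphrase(n int) ([]byte, error) {
	out := make([]byte, n)
	limit := big.NewInt(int64(len(alphabet)))
	for i := range out {
		// rand.Int returns a uniform random value in [0, limit)
		idx, err := rand.Int(rand.Reader, limit)
		if err != nil {
			return nil, err
		}
		out[i] = alphabet[idx.Int64()]
	}
	return out, nil
}

func main() {
	p, err := makeSecurePassphrase(32)
	if err != nil {
		fmt.Println("could not generate passphrase:", err)
		return
	}
	// the passphrase differs on every run; the length is always 32
	fmt.Println(string(p), len(p))
}
```

Either version produces a 32-character passphrase, which is what matters for the next step.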

Perfect! Now into the actual transformation. In etl/etl.go under our ReadData function from earlier, we are going to add some encryption logic. It isn’t necessary for what we’re building to get every nuance of the below code, although it is an interesting area and you can read some more about encryption topics here if you wish. Ultimately, the important part is that we are using our randomly-generated passphrase in an encryption algorithm on the message text, and then we return both the result of the encryption and the passphrase.

// etl/etl.go
package etl 

import (
    "crypto/aes"
    "crypto/cipher"
    "crypto/rand"
    "encoding/json"
    "io"
    "io/ioutil"

    "github.com/jtkaufman737/go_serialized/utils"
)

// ... previous code 

func EncodeData(message string, salt []byte) map[string][]byte {
    // taken from https://tutorialedge.net/golang/go-encrypt-decrypt-aes-tutorial/
    text, key := []byte(message), salt

    c, err := aes.NewCipher(key)
    utils.LogErrors(err)

    // gcm or Galois/Counter Mode, is a mode of operation
    // for symmetric key cryptographic block ciphers
    // - https://en.wikipedia.org/wiki/Galois/Counter_Mode
    gcm, err := cipher.NewGCM(c)
    utils.LogErrors(err)

    // creates a new byte array the size of the nonce
    // which must be passed to Seal
    nonce := make([]byte, gcm.NonceSize())

    // populates our nonce with a cryptographically secure
    // random sequence
    _, err = io.ReadFull(rand.Reader, nonce)
    utils.LogErrors(err)

    // here we encrypt our text using the Seal function
    // Seal encrypts and authenticates plaintext, authenticates the
    // additional data and appends the result to dst, returning the updated
    // slice. The nonce must be NonceSize() bytes long and unique for all
    // time, for a given key.
    finalText := gcm.Seal(nonce, nonce, text, nil)

    return map[string][]byte{
        "passphrase": key,
        "message":    finalText,
    }
}

To test this encryption, let’s return to main.go and call our new function.

// main.go 
package main 

import (
    "fmt"

    "github.com/jtkaufman737/go_serialized/etl"
    "github.com/jtkaufman737/go_serialized/utils"
)

func main() {
    fmt.Println("Running!")

    // ... earlier code

    // Now we are going to encode it
    encryptedMessage := etl.EncodeData(rawMessage["body"], passphrase)
    fmt.Println("Encrypted message and passphrase are:")
    fmt.Println(encryptedMessage)

}

If we re-run main, we will now see our encoded message logged as well!

go run main.go
Running!
Message from filereader is:
map[body:test test test]
Random passphrase is:
bph@bgzmiegpcry!lvfcp!is^lpywxvy
Encrypted message and passphrase are:
bph@bgzmiegpcry!lvfcp!is^lpywxvy W?[??(/?"?S)c;????,?r?4$?4y??8o?w'

Well that looks like a lot of hot nonsense but signifies success!

Let’s now run a quick quality check to make sure we can demystify our data later on. In etl/etl.go, add the following below our other code:

// etl/etl.go 

func DecodeData(encryptedMessage map[string][]byte) string {
    // taken from https://tutorialedge.net/golang/go-encrypt-decrypt-aes-tutorial/
    
    key := encryptedMessage["passphrase"]
    cipherText := encryptedMessage["message"]

    c, err := aes.NewCipher(key)
    utils.LogErrors(err)

    gcm, err := cipher.NewGCM(c)
    utils.LogErrors(err)

    nonceSize := gcm.NonceSize()

    nonce, cipherText := cipherText[:nonceSize], cipherText[nonceSize:]
    plaintext, err := gcm.Open(nil, nonce, cipherText, nil)
    utils.LogErrors(err)

    return string(plaintext)
}

Add the following to main.go:

// main.go

func main() {
    // ... previous code

    // Let's check that we can decode it too
    decryptedMessage := etl.DecodeData(encryptedMessage)
    fmt.Println("Decrypted message is:")
    fmt.Println(decryptedMessage)
}

By re-running main.go, we confirm we’ll be able to decode this secret message when the time comes.

go run main.go
Running!
Message from filereader is:
map[body:test test test]
Random passphrase is:
bph@bgzmiegpcry!lvfcp!is^lpywxvy
Encrypted message and passphrase are:
bph@bgzmiegpcry!lvfcp!is^lpywxvy ?q?A_?L?w\??7<?ع4???SA??7X??O????
Decrypted message is:
test test test
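To convince myself of what Seal and Open are really doing, it helps to see the pieces in isolation. The sketch below is a self-contained round trip using the same Seal(nonce, nonce, ...) call shape as EncodeData, with two differences that are my own assumptions for illustration: the helpers (newGCM, seal, open) return errors instead of logging them, and the keys are hard-coded demo strings. It shows the layout of the sealed output and that decrypting with the wrong key fails rather than returning garbage.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// newGCM builds an AES-GCM cipher from a 16-, 24-, or 32-byte key.
func newGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// seal encrypts plaintext and returns nonce || ciphertext || tag.
func seal(key, plaintext []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	// passing nonce as the destination means the output starts with the nonce
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// open splits the nonce off the front and decrypts the rest.
func open(key, sealed []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	if len(sealed) < gcm.NonceSize() {
		return nil, errors.New("sealed message is shorter than the nonce")
	}
	nonce, ciphertext := sealed[:gcm.NonceSize()], sealed[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ciphertext, nil)
}

func main() {
	key := []byte("0123456789abcdef0123456789abcdef")      // 32 bytes, demo only
	wrongKey := []byte("fedcba9876543210fedcba9876543210") // also 32 bytes

	sealed, err := seal(key, []byte("test test test"))
	if err != nil {
		fmt.Println("seal failed:", err)
		return
	}
	// 12-byte nonce + 14 bytes of encrypted text + 16-byte tag = 42
	fmt.Println(len(sealed))

	plain, err := open(key, sealed)
	fmt.Println(string(plain), err)

	// GCM authenticates the message, so the wrong key is rejected
	_, err = open(wrongKey, sealed)
	fmt.Println(err != nil)
}
```

That last check is the property our message board will lean on later: without the right passphrase, the recipient gets an error, not a scrambled message.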

L is for Load

The load step is going to connect us to Serialized and open the door for a lot of fun exploring this event sourcing platform and playing with all the functionality it offers later.

Quick heads up: In the next sections you may get a message about missing packages for our UUID package or others. You can run a quick go get plus the link to the package repo to fix this. For me, it happened on two packages, so I ran go get github.com/google/uuid and go get github.com/joho/godotenv.

// api/api.go

package api

import (
    "bytes"
    "encoding/json"
    "io/ioutil"
    "net/http"

    "github.com/joho/godotenv"
    "github.com/jtkaufman737/go_serialized/utils"
)

// for each event, a `data` field will contain custom data, 
// or here the payload we care about 
type PostData struct {
    EncryptedMessage string `json:"encryptedMessage"`
    Passphrase       string `json:"passphrase"`
}

// this is the larger `event` body with a subfield for data
// defined in the previous struct
type PostBody struct {
    EventID   string   `json:"eventID"`
    EventType string   `json:"eventType"`
    Data      PostData `json:"data"`
}

func Call(method string, url string, data PostBody) map[string]string {
    // Step 1: Loading environment variables including our secret
    err := godotenv.Load()
    utils.LogErrors(err)

    myEnv, err := godotenv.Read()
    utils.LogErrors(err)
    /*
         Step 2: We need to build out a JSON body adhering to the following format
        {
            "events": [
                {
                    "eventID": UUID,
                    "eventType": string,
                    "data": {
                        "encryptedMessage": string
                        "passphrase": string
                    }
                }
            ]
        }
    */

    // A bit MORE data wrangling: need a list of PostBody (events)
    type formattedData struct {
        Events []PostBody `json:"events"`
    }
    var events []PostBody
    events = append(events, data)

    finalData := &formattedData{Events: events}

    // then can finally put the whole thing in JSON format
    jsonData, err := json.Marshal(finalData)
    utils.LogErrors(err)

    // set up our network client & error handling
    client := &http.Client{}
    req, err := http.NewRequest(
        method,
        url,
        bytes.NewReader(jsonData),
    )
    utils.LogErrors(err)

    // add our headers so we can authenticate to Serialized
    req.Header.Add("Accept", "application/json")
    req.Header.Add("Content-Type", "application/json")
    req.Header.Add("Serialized-Access-Key", myEnv["ACCESS_KEY"])
    req.Header.Add("Serialized-Secret-Access-Key", myEnv["SECRET_ACCESS_KEY"])

    resp, err := client.Do(req)
    if err != nil {
        // resp is nil when the request fails, so log and bail out early
        utils.LogErrors(err)
        return nil
    }
    defer resp.Body.Close()

    bodyBytes, err := ioutil.ReadAll(resp.Body)
    utils.LogErrors(err)

    // get it back into a format we like
    var responseObject map[string]string
    json.Unmarshal(bodyBytes, &responseObject)

    // and return it
    return responseObject
}

There’s a lot going on here, but essentially our Call method is a generic wrapper that we can feed a URL, HTTP method, and relevant data to communicate with Serialized and any other sites we plug into. We then use the HTTP package to make our calls, and do some data-wrangling to get things back into a manageable format.
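If you want to poke at this request pattern without hitting a real API, one option is to swap the HTTP client's transport for a fake one. The sketch below does that with only the standard library. Everything in it is an assumption for illustration: fakeTransport, newFakeClient, and postJSON are hypothetical names, the URL is a placeholder that is never contacted, and the canned responses are made up rather than taken from Serialized.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"strings"
)

// fakeTransport stands in for the network so the sketch runs offline.
// It rejects requests that lack an access key header.
type fakeTransport struct{}

func (fakeTransport) RoundTrip(r *http.Request) (*http.Response, error) {
	status, body := http.StatusOK, `{"result":"SUCCESS"}`
	if r.Header.Get("Serialized-Access-Key") == "" {
		status, body = http.StatusUnauthorized, ""
	}
	return &http.Response{
		StatusCode: status,
		Header:     make(http.Header),
		Body:       io.NopCloser(strings.NewReader(body)),
		Request:    r,
	}, nil
}

// newFakeClient returns an HTTP client wired to the fake transport.
func newFakeClient() *http.Client {
	return &http.Client{Transport: fakeTransport{}}
}

// postJSON sends body to url and returns the status code and response body.
func postJSON(client *http.Client, url, accessKey string, body []byte) (int, string, error) {
	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(body))
	if err != nil {
		return 0, "", err
	}
	req.Header.Set("Content-Type", "application/json")
	if accessKey != "" {
		req.Header.Set("Serialized-Access-Key", accessKey)
	}

	resp, err := client.Do(req)
	if err != nil {
		// resp is nil on error, so return before touching resp.Body
		return 0, "", err
	}
	defer resp.Body.Close()

	raw, err := io.ReadAll(resp.Body)
	if err != nil {
		return resp.StatusCode, "", err
	}
	return resp.StatusCode, string(raw), nil
}

func main() {
	client := newFakeClient()
	url := "https://example.invalid/events" // placeholder, never contacted

	// with a key the fake transport answers 200
	status, body, err := postJSON(client, url, "xxxxxxxx", []byte(`{"events":[]}`))
	fmt.Println(status, body, err)

	// without a key it answers 401
	status, body, err = postJSON(client, url, "", []byte(`{"events":[]}`))
	fmt.Println(status, body, err)
}
```

Returning the status code alongside the body makes it easy for the caller to tell "the server said no" apart from "the request never got there".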

In our main.go file, we need to add a few new things:

  • we will use a UUID package to generate IDs for our new “new_message” event, which will be part of the payload and the URL we hit on the Serialized API
  • we will call our new API wrapper function to dispatch this logic
  • we’ll then do one last print to confirm Serialized got our info

// main.go

func main() {
    // ... previous code

    // We are going to generate a unique identifier for our event object
    // and embed that into our URL
    newUUID := uuid.New().String()
    url := "https://api.serialized.io/aggregates/new_message/" + newUUID + "/events"

    messageData := api.PostData{
        EncryptedMessage: string(encryptedMessage["message"]),
        Passphrase:       string(encryptedMessage["passphrase"]),
    }

    eventBody := api.PostBody{
        EventID:   newUUID,
        EventType: "new_message",
        Data:      messageData,
    }

    // Next we are going to post this to serialized
    responseObject := api.Call(
        "POST",
        url,
        eventBody,
    )

    fmt.Println("Response object from Serialized is:")
    fmt.Printf("%+v\n", responseObject)

}

And our final confirmation that the load step ran should look as follows:

go run main.go
Running!
Message from filereader is:
map[body:test test test]
Random passphrase is:
bph@bgzmiegpcry!lvfcp!is^lpywxvy
Encrypted message and passphrase are:
bph@bgzmiegpcry!lvfcp!is^lpywxvy ?f?̮?D/?I?'Ï?ܧ_9:O??yLC????c?
                                                             ?GBw5
Decrypted message is:
test test test
Response object from Serialized is:
map[aggregateVersion: result:SUCCESS taskId:51ccba05-33bf-4552-9539-9e9a547dadf9]
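One caveat about the payload we just sent: EncryptedMessage is built by casting the raw encrypted bytes to a string, and AES output is generally not valid UTF-8. Go's encoding/json replaces invalid UTF-8 bytes with the Unicode replacement character when marshalling, so the bytes that arrive on the other side may no longer decrypt. A common workaround is to base64-encode the bytes first. Here is a minimal sketch of the difference. The payload type and the helpers (encode, decode, roundTrip) are hypothetical names for this example, and the four sample bytes are made up to stand in for ciphertext.

```go
package main

import (
	"bytes"
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// payload is a hypothetical event body holding the encrypted bytes as text.
type payload struct {
	EncryptedMessage string `json:"encryptedMessage"`
}

// encode turns raw bytes into base64 text that is safe to put in JSON.
func encode(b []byte) string {
	return base64.StdEncoding.EncodeToString(b)
}

// decode reverses encode.
func decode(s string) ([]byte, error) {
	return base64.StdEncoding.DecodeString(s)
}

// roundTrip marshals msg to JSON, unmarshals it again, and returns
// the value that comes back out.
func roundTrip(msg string) (string, error) {
	raw, err := json.Marshal(payload{EncryptedMessage: msg})
	if err != nil {
		return "", err
	}
	var out payload
	if err := json.Unmarshal(raw, &out); err != nil {
		return "", err
	}
	return out.EncryptedMessage, nil
}

func main() {
	// made-up bytes that are not valid UTF-8, like real ciphertext
	cipherBytes := []byte{0xff, 0xfe, 0x00, 0x41}

	// casting straight to string: the invalid bytes are replaced in transit
	direct, _ := roundTrip(string(cipherBytes))
	fmt.Println(bytes.Equal([]byte(direct), cipherBytes))

	// base64 text survives the trip and decodes back to the same bytes
	viaText, _ := roundTrip(encode(cipherBytes))
	decoded, err := decode(viaText)
	fmt.Println(bytes.Equal(decoded, cipherBytes), err)
}
```

The first comparison prints false and the second prints true, which is a good argument for encoding the message before it goes into the event body once we start decrypting on the read side.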

Confirmation of our work

Naturally we want to see a better signifier of what we did than this logging, so let’s head back over to Serialized’s dashboard. It’s the first thing you land on once you log in.

In my case I was testing this a few times so from our main dashboard we see a bevy of events.

By clicking on the last event id in the chronologically descending list, we can check on the payload of what we sent over and confirm our data arrived.

Conclusion

This was a fun one for me personally! Here’s a quick recap of some of the things I learned during the process:

  • It’s nice to feel like a beginner again every once in a while. I just got promoted to Team Lead at PostScript.io, so honestly it seemed about right that I take a step back and take a crack at Go, which I’ve never used seriously before. Discovering how to work with it and get the first part of this project done was incredibly fun.
  • Serialized was by far the easiest part of this work for me. We’re going to use it more when we continue this project in a future article, but it worked exactly how I expected it to going in.
  • I also learned that being away from strongly typed languages for so long has really done a number on me. Particularly, I hadn’t realized how much I leaned on the loosey-goosey nature of high-level languages to deal with things like varying fields and unstructured data with JSON. I still don’t feel like I’ve fully hit my stride in elegantly dealing with JSON inside Go, so I’m going to go chat with some folks I know who use it regularly and see what I really should have been doing with all this logic 😅
  • Despite the learning curve, I generally found Golang pleasant enough to use, and the docs were fairly good considering its relatively young age. I also was impressed by its relative brevity compared to Java, C#, or any of the other lower-level languages I’ve used in any significant way.

This article was intended to be the first part of a short series, so I’m going to leave it here and get started on building out the other half of this application for Part 2! I am looking forward to exploring both more Go and more of Serialized’s capabilities in the next post, where the latter will feature more prominently. Thanks for reading!