Skip to main content

Podcast 069 - Correl Roush on Elm

February 12, 2018 mgroves 0 Comments
Tags: podcast elm

Correl Roush is back to talk about Elm.

Show Notes:

Correl Roush is on Twitter.

Want to be on the next episode? You can! All you need is the willingness to talk about something technical.

Music is by Joe Ferg, check out more music on JoeFerg.com!

This is a repost that originally appeared on the Couchbase Blog: Proof of Concept: Making a case to move from relational.

Proof of concept may be just what you need to start when you’re evaluating Couchbase.

We’ve been blogging a lot about the technical side of moving from a relational database like Oracle or SQL Server to Couchbase. Here are some of the resources and posts we’ve published:

But for this post, we’re going to talk more about the overall process instead of the technical details. You’ll see five steps to creating a successful proof of concept. And if you ever need help getting started, you can talk to a Couchbase Solutions Engineer.

Proof of Concept steps

These steps are not just for migrating an existing application to Couchbase, they also work just as well for creating a brand new "greenfield" application with Couchbase, or even augmenting an existing database (as opposed to replacing it completely).

When creating a Proof of Concept, it’s a good idea to keep the scope as small and simple as possible. Some questions to ask:

  • Will it prove/disprove what you need it to, and help you move to the next step?

  • Can this be accomplished fairly quickly? If it takes too long or isn’t a priority, it might fizzle out.

  • Ask a Couchbase technical team member: is this a good fit for Couchbase? You can draw on their experience to save yourself some heartburn.

Select a use case and application

When I talk to people about Couchbase and NoSQL, I tell them the only thing worse than not using Couchbase is using Couchbase for the wrong thing and becoming soured on document databases.

The benefits of a distributed database like Couchbase are:

  • Better performance

  • Better scalability

  • Higher availability

  • Greater data agility/flexibility

  • Improved operational management

If your application can benefit from one of those characteristics, it’s worth checking out Couchbase. Couchbase may not be the best fit if you need multi-document transactions. But as I showed in my post on data modeling, if you can nest data instead of scattering it in pieces, you may not need multi-document transactions as much as you think.

Further, conversations with Couchbase customers have lead us to identify the need beyond a traditional database to power interactions. Marriott calls this the "look-to-book" ratio.

Think about the interaction to transaction ratio in your proof of concept

If you’re in a situation where you need to record transactions in your traditional database, but you want a low-latency, flexible, scalable database to power all the interactions leading up to it, Couchbase might be the right fit for you.

Some use cases that Couchbase has been a great fit for include:

Define the success criteria

Once you’ve decided that you have a use case that would be good for Couchbase, you need to define what it means for a proof of concept to be successful.

Examples of criteria:

  • Performance/latency improvements - This might boil down to a number, like "5ms latency in the 95th percentile".

  • Ease of scaling - How easy is it to scale now? How much time does it take a person? How many 2am Saturdays do you need to work to do upgrades?

  • Faster development cycles - Does schema management eat up a lot of time in your sprints? A proof of concept with Couchbase can help to demonstrate if a flexible model is going to save you time.

  • Maintenence and costs

Whatever the criteria, it’s good to define it at the beginning, so you can work towards trying to achieve that. A vague goal like "I just want to play around with NoSQL" is fine for an individual developer, but a well-defined success criteria is going to be critical for convincing decision makers.

Understand your data

As I covered in the JSON data modeling post, it’s important for you to understand your data before you even start writing any code. You need to understand what you are going to model and how your application needs to function.

Migrating from a relational to a document database is not going to be a purely mechanical exercise. If you plan to migrate data, it’s better to start by thinking about how it would look independent of how it’s currently stored. Draw out a concept of it on a whiteboard without using "tables" or "documents".

Identify the access patterns

I also covered this in my JSON data modeling post. Couchbase is very flexible in the way that it can store data. But, data access is also flexible. The design of your model should take that into account.

In that blog post, I layed out some rules of thumb for nested/seperate documents. At a higher level, you can start with thinking about data access like this:

  • Key/value - The ability to get/change a document based on its key. This is the fastest, lowest latency method available in Couchbase.

  • N1QL query - N1QL is SQL for JSON data, available in Couchbase. It can query data just about any way you can imagine. Most importantly, you can query data based on something other than its key.

  • Full Text Search - When you need to query based on text in a language aware way. Great for user driven searches, for instance.

  • Map/Reduce - Writing a pure function to calculate query results ahead of time. N1QL is taking a lot of the workload away from M/R, but it’s still good for some specialized types of aggregation.

  • Geospatial - Querying of documents based on some geographical/location based information.

  • Analytics/reporting - Couchbase Analytics (currently in preview) can give you heavily indexed non-operational access to your data. You can run complex reports without impacting day-to-day users.

Review the architecure

At the end of your proof of concept, you can measure your results against the criteria that you created at the very beginning.

It might be a good idea to iterate on this proof on concept: you can apply what you’ve leaned in each subsequent iteration. If you keep the iterations short, you can learn what you’ve applied faster. This isn’t just true of Couchbase, by the way, but anything!

Finally, if your proof of concept is a success (and I know it will be), then it’s time to prepare for production. Take the time to review the architecture, the decisions you’ve made, what worked well, what didn’t work well, and so on. The more you document, the better off the rest of your team and organization will be on the next project.

Summary

Creating a proof of concept with these five steps will help make you successful! All that’s left to do is get started:

Rachel Andrew is a member of the CSS Working Group and is working with CSS Grids.

Show Notes:

Rachel Andrew is on Twitter.

Want to be on the next episode? You can! All you need is the willingness to talk about something technical.

Music is by Joe Ferg, check out more music on JoeFerg.com!

That's right, Cross Cutting Concerns is back for season 3! I know I always say this, but I've got a month full of amazing guests!

I've also got: new original music by JoeFerg (you've gotta hear this!). A new gameshow segment! And much more!

Subscribe now!

Here's what's coming in February:

  • Rachel Andrew(!) on CSS Web Grid
  • Correl Roush returning to talk Elm
  • Tim Wingfield on API design
  • Bill Sempf with a very special, jumbo episode discussing information security through the lens of one of my favorite films: Sneakers

Subscribe now with your podcatcher of choice!

Want to be on the next episode? You can! All you need is the willingness to talk about something technical.

This is a repost that originally appeared on the Couchbase Blog: JSON Data Modeling for RDBMS Users.

JSON data modeling is a vital part of using a document database like Couchbase. Beyond understanding the basics of JSON, there are two key approaches to modeling relationships between data that will be covered in this blog post.

The examples in this post will build on the invoices example that I showed in CSV tooling for migrating to Couchbase from Relational.

Imported Data Refresher

In the previous example, I started with two tables from a relational database: Invoices and InvoicesItems. Each invoice item belongs to an invoice, which is done with a foreign key in a relational database.

I did a very straightforward (naive) import of this data into Couchbase. Each row became a document in a "staging" bucket.

Data imported from CSV

Next, we must decide if that JSON data modeling design is appropriate or not (I don’t think it is, as if the bucket being called "staging" didn’t already give that away).

Two Approaches to JSON data modeling of relationships

With a relational database, there is really only one approach: normalize your data. This means separate tables with foreign keys linking the data together.

With a document database, there are two approaches. You can keep the data normalized or you can denormalize data by nesting it into its parent document.

Normalized (separate documents)

An example of the end state of the normalized approach represents a single invoice spread over multiple documents:

key - invoice::1
{ "BillTo": "Lynn Hess", "InvoiceDate": "2018-01-15 00:00:00.000", "InvoiceNum": "ABC123", "ShipTo": "Herman Trisler, 4189 Oak Drive" }

key - invoiceitem::1811cfcc-05b6-4ace-a52a-be3aad24dc52
{ "InvoiceId": "1", "Price": "1000.00", "Product": "Brake Pad", "Quantity": "24" }

key - invoiceitem::29109f4a-761f-49a6-9b0d-f448627d7148
{ "InvoiceId": "1", "Price": "10.00", "Product": "Steering Wheel", "Quantity": "5" }

key - invoiceitem::bf9d3256-9c8a-4378-877d-2a563b163d45
{ "InvoiceId": "1", "Price": "20.00", "Product": "Tire", "Quantity": "2" }

This lines up with the direct CSV import. The InvoiceId field in each invoiceitem document is similar to the idea of a foreign key, but note that Couchbase (and distributed document databases in general) do not enforce this relationship in the same way that relational databases do. This is a trade-off made to satisfy the flexibility, scalability, and performance needs of a distributed system.

Note that in this example, the "child" documents point to the parent via InvoiceId. But it could also be the other way around: the "parent" document could contain an array of the keys of each "child" document.

Denormalized (nested)

The end state of the nested approach would involve just a single document to represent an invoice.

key - invoice::1
{
  "BillTo": "Lynn Hess",
  "InvoiceDate": "2018-01-15 00:00:00.000",
  "InvoiceNum": "ABC123",
  "ShipTo": "Herman Trisler, 4189 Oak Drive",
  "Items": [
    { "Price": "1000.00", "Product": "Brake Pad", "Quantity": "24" },
    { "Price": "10.00", "Product": "Steering Wheel", "Quantity": "5" },
    { "Price": "20.00", "Product": "Tire", "Quantity": "2" }
  ]
}

Note that "InvoiceId" is no longer present in the objects in the Items array. This data is no longer foreign—​it’s now domestic—​so that field is not necessary anymore.

JSON Data Modeling Rules of Thumb

You may already be thinking that the second option is a natural fit in this case. An invoice in this system is a natural aggregate-root. However, it is not always straightforward and obvious when and how to choose between these two approaches in your application.

Here are some rules of thumb for when to choose each model:

Table 1. Modeling Data Cheat Sheet
If …​Then consider…​

Relationship is 1-to-1 or 1-to-many

Nested objects

Relationship is many-to-1 or many-to-many

Separate documents

Data reads are mostly parent fields

Separate document

Data reads are mostly parent + child fields

Nested objects

Data reads are mostly parent or child (not both)

Separate documents

Data writes are mostly parent and child (both)

Nested objects

Modeling example

To explore this deeper, let’s make some assumptions about the invoice system we’re building.

  • A user usually views the entire invoice (including the invoice items)

  • When a user creates an invoice (or makes changes), they are updating both the "root" fields and the "items" together

  • There are some queries (but not many) in the system that only care about the invoice root data and ignore the "items" fields

Then, based on that knowledge, we know that:

  1. The relationship is 1-to-many (a single invoice has many items)

  2. Data reads are mostly parent + child fields together

Therefore, "nested objects" seems like the right design.

Please remember that these are not hard and fast rules that will always apply. They are simply guidelines to help you get started. The only "best practice" is to use your own knowledge and experience.

Transforming staging data with N1QL

Now that we’ve done some JSON Data Modeling exercises, it’s time to transform the data in the staging bucket from separate documents that came directly from the relational database to the nested object design.

There are many approaches to this, but I’m going to keep it very simple and use Couchbase’s powerful N1QL language to run SQL queries on JSON data.

Preparing the data

First, create a "operation" bucket. I’m going to transform data and move it to from the "staging" bucket (containing the direct CSV import) to the "operation" bucket.

Next, I’m going to mark the 'root' documents with a "type" field. This is a way to mark documents as being of a certain type, and will come in handy later.

UPDATE staging
SET type = 'invoice'
WHERE InvoiceNum IS NOT MISSING;

I know that the root documents have a field called "InvoiceNum" and that the items do not have this field. So this is a safe way to differentiate.

Next, I need to modify the items. They previously had a foreign key that was just a number. Now those values should be updated to point to the new document key.

UPDATE staging s
SET s.InvoiceId = 'invoice::' || s.InvoiceId;

This is just prepending "invoice::" to the value. Note that the root documents don’t have an InvoiceId field, so they will be unaffected by this query.

After this, I need to create an index on that field.

Preparing an index

CREATE INDEX ix_invoiceid ON staging(InvoiceId);

This index will be necessary for the transformational join coming up next.

Now, before making this data operational, let’s run a SELECT to get a preview and make sure the data is going to join together how we expect. Use N1QL’s NEST operation:

SELECT i.*, t AS Items
FROM staging AS i
NEST staging AS t ON KEY t.InvoiceId FOR i
WHERE i.type = 'invoice';

The result of this query should be three total root invoice documents.

Results of transformation with N1QL

The invoice items should now be nested into an "Items" array within their parent invoice (I collapsed them in the above screenshot for the sake of brevity).

Moving the data out of staging

Once you’ve verified this looks correct, the data can be moved over to the "operation" bucket using an INSERT command, which will just be a slight variation on the above SELECT command.

INSERT INTO operation (KEY k, VALUE v)
SELECT META(i).id AS k, { i.BillTo, i.InvoiceDate, i.InvoiceNum, "Items": t } AS v
FROM staging i
NEST staging t ON KEY t.InvoiceId FOR i
where i.type = 'invoice';

If you’re new to N1QL, there’s a couple things to point out here:

  • INSERT will always use KEY and VALUE. You don’t list all the fields in this clause, like you would in a relational database.

  • META(i).id is a way of accessing a document’s key

  • The literal JSON syntax being SELECTed AS v is a way to specify which fields you want to move over. Wildcards could be used here.

  • NEST is a type of join that will nest the data into an array instead of at the root level.

  • FOR i specifies the left hand side of the ON KEY join. This syntax is probably the most non-standard portion of N1QL, but the next major release of Couchbase Server will include "ANSI JOIN" functionality that will be a lot more natural to read and write.

After running this query, you should have 3 total documents in your 'operation' bucket representing 3 invoices.

Result from JSON data modeling transformation

You can delete/flush the staging bucket since it now contains stale data. Or you can keep it around for more experimentation.

Summary

Migrating data straight over to Couchbase Server can be as easy as importing via CSV and transforming with a few lines of N1QL. Doing the actual modeling and making decisions requires the most time and thought. Once you decide how to model, N1QL gives you the flexibility to transform from flat, scattered relational data into an aggregate-oriented document model.

More resources:

Feel free to contact me if you have any questions or need help. I’m @mgroves on Twitter. You can also ask questions on the Couchbase Forums. There are N1QL experts there who are very responsive and can help you write the N1QL to accommodate your JSON data modeling.

Matthew D. Groves

About the Author

Matthew D. Groves lives in Central Ohio. He works remotely, loves to code, and is a Microsoft MVP.

Latest Comments

Twitter