Skip to main content

This is a repost that originally appeared on the Couchbase Blog: New Querying Features in Couchbase Server 5.5.

New querying features figure prominently in the latest release of Couchbase Server 5.5. Check out the announcement and download the developer build for free right now.

In this post, I want to highlight a few of the new features and show you how to get started using them:

  • ANSI JOINs - N1QL in Couchbase already has JOIN, but now JOIN is more standards compliant and more flexible.

  • HASH joins - Performance on certain types of joins can be improved with a HASH join (in Enterprise Edition only)

  • Aggregate pushdowns - GROUP BY can be pushed down to the indexer, improving aggregation performance (in Enterprise Edition only)

All the examples in this post use the "travel-sample" bucket that comes with Couchbase.


Until Couchbase Server 5.5, JOINs were possible, with two caveats:

  1. One side of the JOIN has to be document key(s)

  2. You must use the ON KEYS syntax

In Couchbase Server 5.5, it is no longer necessary to use ON KEYS, and so writing joins becomes much more natural and more in line with other SQL dialects.

Previous JOIN syntax

For example, here’s the old syntax:

SELECT r.destinationairport, r.sourceairport, r.distance, r.airlineid,
FROM `travel-sample` r
JOIN `travel-sample` a ON KEYS r.airlineid
WHERE r.type = 'route'
AND r.sourceairport = 'CMH'
ORDER BY r.distance DESC

This will get 10 routes that start at CMH airport, joined with their corresponding airline documents. The result are below (I’m showing them in table view, but it’s still JSON):

ANSI join results

New JOIN syntax

And here’s the new syntax doing the same thing:

SELECT r.destinationairport, r.sourceairport, r.distance, r.airlineid,
FROM `travel-sample` r
JOIN `travel-sample` a ON META(a).id = r.airlineid
WHERE r.type = 'route'
AND r.sourceairport = 'CMH'
ORDER BY r.distance DESC

The only difference is the ON. Instead of ON KEYS, it’s now ON <field1> = <field2>. It’s more natural for those coming from a relational background (like myself).

But that’s not all. Now you are no longer limited to joining just on document keys. Here’s an example of a JOIN on a city field.

SELECT a.airportname, AS airportCity, AS hotelName, AS hotelCity, h.address AS hotelAddress
FROM `travel-sample` a
INNER JOIN `travel-sample` h ON =
WHERE a.type = 'airport'
AND h.type = 'hotel'

This query will show hotels that match airports based on their city.

ANSI join on fields

Note that for this to work, you must have an index created on the field that’s on the inner side of the JOIN. The "travel-sample" bucket already contains a predefined index on the city field. If I were to attempt it with other fields, I’d get an error message like "No index available for ANSI join term…‚Äč".

For more information on ANSI JOIN, check out the full N1QL JOIN documentation.

Note: The old JOIN, ON KEYS syntax will still work, so don’t worry about having to update your old code.

Hash Joins

Under the covers, there are different ways that joins can be carried out. If you run the query above, Couchbase will use a Nested Loop (NL) approach to execute the join. However, you can also instruct Couchbase to use a hash join instead. A hash join can sometimes be more performant than a nested loop. Additionally, a hash join isn’t dependent on an index. It is, however, dependent on the join being an equality join only.

For instance, in "travel-sample", I could join landmarks to hotels on their email fields. This may not be the best find to find out if a hotel is a landmark, but since email is not indexed by default, it illustrates the point.

SELECT AS landmarkName, AS hotelName, AS landmarkEmail, AS hotelEmail
FROM `travel-sample` l
INNER JOIN `travel-sample` h ON =
WHERE l.type = 'landmark'
AND h.type = 'hotel';

The above query will take a very long time to run, and probably time out.


Next I’ll try a hash join, which must be explicitly invoked with a USE HASH hint.

SELECT AS landmarkName, AS hotelName, AS landmarkEmail, AS hotelEmail
FROM `travel-sample` l
INNER JOIN `travel-sample` h USE HASH(BUILD) ON =
WHERE l.type = 'landmark'
AND h.type = 'hotel';

A hash join has two sides: a BUILD and a PROBE. The BUILD side of the join will be used to create an in-memory hash table. The PROBE side will use that table to find matches and perform the join. Typically, this means you want the BUILD side to be used on the smaller of the two sets. However, you can only supply one hash hint, and only to the right side of the join. So if you specify BUILD on the right side, then you are implicitly using PROBE on the left side (and vice versa).


So why did I use HASH(BUILD)?

I know from using INFER and/or Bucket Insights that landmarks make up roughly 10% of the data, and hotels make up about 3%. Also, I know from just trying it out that HASH(BUILD) was slightly slower. But in either case, the query execution time was milliseconds. Turns out there are two hotel-landmark pairs with the same email address.

Hash join results

USE HASH will tell Couchbase to attempt a hash join. If it cannot do so (or if you are using Couchbase Server Community Edition), it will fall back to a nested-loop. (By the way, you can explicitly specify nested-loop with the USE NL syntax, but currently there is no reason to do so).

For more information, check out the HASH join areas of the documentation.

Aggregate pushdowns

Aggregations in the past have been tricky when it comes to performance. With Couchbase Server 5.5, aggregate pushdowns are now supported for SUM, COUNT, MIN, MAX, and AVG.

In earlier versions of Couchbase, indexing was not used for statements involving GROUP BY. This could severely impact performance, because there is an extra "grouping" step that has to take place. In Couchbase Server 5.5, the index service can do the grouping/aggregation.


Here’s an example query that finds the total number of hotels, and groups them by country, state, and city.

SELECT country, state, city, COUNT(1) AS total
FROM `travel-sample`
WHERE type = 'hotel' and country is not null
GROUP BY country, state, city

The query will execute, and it will return as a result:

Aggregation result

Let’s take a look at the visual query plan (only available in Enterprise Edition, but you can view the raw Plan Text in Community Edition).

Query plan with no pushdown

Note that the only index being used is for the type field. The grouping step is doing the aggregation work. With the relatively small travel-sample data set, this query is taking around ~90ms on my single node desktop. But let’s see what happens if I add an index on the fields that I’m grouping by:


CREATE INDEX ix_hotelregions ON `travel-sample` (country, state, city) WHERE type='hotel';

Now, execute the above SELECT query again. It should return the same results. But:

  • It’s now taking ~7ms on my single node desktop. We’re taking ms, but with a large, more realistic data set, that is a huge difference in magnitude.

  • The query plan is different.

Query plan with pushdown

Note that this time, there is no 'group' step. All the work is being pushed down to the index service, which can use the ix_hotelregions index. It can use this index because my query is exactly matching the fields in the index.

Index push down does not always happen: your query has to meet specific conditions. For more information, check out the GROUP BY and Aggregate Performance areas of the documentation.


With Couchbase Server 5.5, N1QL includes even more standards-compliant syntax and becomes more performant than ever.

Have a question for me? I’m on Twitter @mgroves. You can also check out @N1QL on Twitter. The N1QL Forum is a good place to go if you have in-depth questions about N1QL.

Scott Drake is hiring programmers. This episode is sponsored by Smartsheet.

Show Notes:

Want to be on the next episode? You can! All you need is the willingness to talk about something technical.

Music is by Joe Ferg, check out more music on!

Stephen Cleary is writing Azure Functions. This episode is sponsored by Smartsheet.

Show Notes:

Want to be on the next episode? You can! All you need is the willingness to talk about something technical.

Music is by Joe Ferg, check out more music on!

How are you liking Season 3 so far? I'm having a blast!

Some reminders:

The new original music is by JoeFerg. Check out his stuff!

Have you checked out my sponsor yet? Smartsheet!

If you're enjoying the new JavaScript game show, please start sending in your suggestions. I'm running out of ideas, and you and your ideas could be featured on this show. I need real and made-up suggestions!

Subscribe now!

Here's what's coming in February:

  • Stephen Clearly on Azure Functions
  • Scott Drake on hiring programmers
  • Jeffrey Miller on Neo4j
  • Andrea Cremes on SaaS

Subscribe now with your podcatcher of choice!

Want to be on the next episode? You can! All you need is the willingness to talk about something technical.

I’ve been using Crowdfire social media management tool on Android for a few months now, and I’ve been a paying customer for a while. So, I think it’s time to share my thoughts on it, and why you might want to give it a shot.

Disclaimer: a small promotional consideration was paid by Crowdfire for this post. But I probably would have written it anyway.

I work as a developer advocate, and a significant portion of that is using social media to promote, engage, build a follower base, etc. However, I have some problems:

  • I’m forgetful: I need to remember to help promote posts of my friends and coworkers

  • Scheduling: I’ve used Twuffer in the past (and I still occasionally do), but I’d like an easier way to schedule tweets on my phone.

  • Tracking: I’d like to track how my follower count is doing, and how well I’m sending out promotional tweets.

  • I don’t want my Twitter feed to only be a shill-broadcast of self-promotion. I use IFTTT for certain things, but I want to have some degree of control over what I post, the hashtags I use, the phrasing, etc.

There are a lot of tools out there for doing this, but many of them don’t solve all these problems (especially the last one).

I don’t remember where or how I found Crowdfire, but it’s now something that I use every day. Literally it’s one of the first things I do when I wake up in the morning.

It has a lot of capabilities, I’m only going to cover a few in this post.

What Crowdfire is not

It’s not a replacement for my Twitter client of choice. I still use the Twitter app to do all my reading and a bunch of my tweeting. If I turned over complete control of my Twitter/Facebook/etc account to an app like Crowdfire, IFTTT, etc, then I think I’d be missing the point of social media. I want to be a real person that people can talk to and not just an automated feed of links.

Daily Notification and "prescriptions"

Whenever a new post hits the Couchbase blog, I get a notification and prompt to share the post (whether I wrote the post or not). I also get a report every morning showing the number of followers and unfollowers since yesterday. This provides me daily feedback on whether I’m getting too annoying on social media or not. Also, when I share a post on Crowdfire, it will automatically schedule to tweet them up to 4 times over a 30 day period at the "best time" of day. I’m not sure how they calculate this, but it spaces out my tweets nicely, and I’ve found that repetition of tweets leads to more people seeing them and tweeting with me than before I used Crowdfire.

Crowdfire notifications


As I mentioned above, not only will Crowdfire remind me to tweet, it will also schedule tweets. Everyone uses Twitter at different times, so it’s helpful to send out multiple tweets (but not too many) to reach everyone that I can. In addition, I can manually schedule tweets. I find this helpful when promoting new podcast episodes.

Crowdfire scheduling

RSS Feeds

One feature that Crowdfire has introduced after I started using it, is the ability to connect to any RSS feed. Previously I was limited to just WordPress or other well-known engines. But now I can use Crowdfire with my own site’s RSS feed. This way I can track which posts I’ve promoted on Twitter and which ones I haven’t.

Crowdfire RSS

I’ve given Crowdfire some keywords that I find most relevant. It will then recommend people to follow or articles to tweet that are relevant to me. To me, Twitter isn’t just about getting a bunch of followers, but it’s about doing a lot of following.

Crowdfire recommendations

I usually find this helpful, but this is probably the area where I think Crowdfire could most improve. I get a few suggestions of people who don’t tweet in English, and articles that I’m not interested in, for instance.


Give Crowdfire a shot. They have a free version, which has some limits, but it’s absolutely enough to evaluate the tool and see if you find it useful. Crowdfire is in the Google Play store.

Matthew D. Groves

About the Author

Matthew D. Groves lives in Central Ohio. He works remotely, loves to code, and is a Microsoft MVP.

Latest Comments