Posts tagged with 'XML'

This is a repost that originally appeared on the Couchbase Blog: XML to JSON conversion with Json.NET.

XML data can be converted to JSON, which can be loaded into Couchbase Server (Couchbase Server 5.0 beta now available). Depending on the source of the data, you might be able to use a tool like Talend. But you may also want to write a simple C# .NET application with Newtonsoft’s Json.NET to do it.

XML data

For the purposes of this tutorial, I’m going to use a very simple XML example. If your XML is more complex (multiple attributes, for instance), then your approach will also have to be more complex. (Json.NET can handle all XML to Json conversions, but it follows a specific set of conversion rules.) Here’s a sample piece of data:

var xml = @"
<Invoice>
    <Timestamp>1/1/2017 00:01</Timestamp>
    <CustNumber>12345</CustNumber>
    <AcctNumber>54321</AcctNumber>
</Invoice>";

Notice that I’ve got this XML as a hardcoded string in C#. In a real-life situation, you would likely be pulling XML from a database, a REST API, XML files, etc.

Once you have the raw XML, you can create an XmlDocument object (XmlDocument lives in the System.Xml namespace).

XmlDocument doc = new XmlDocument();
doc.LoadXml(xml);

Conversion with Json.NET

Once you have an XmlDocument object, you can use Json.NET to convert that object into a Json representation.

var json = JsonConvert.SerializeXmlNode(doc, Formatting.None, true);

In this example, I’m asking Json.NET to serialize an XML node:

  • I used Formatting.None. If I wanted to display the actual Json, it might be better to use Formatting.Indented.

  • The final argument, true, tells Json.NET to omit the root object. In the XML above, you can think of <Invoice></Invoice> as the root object. I just want the values of the Invoice object. If I didn’t omit the root node, the resultant Json would look like: {"Invoice":{"Timestamp":"1/1/2017 00:01","CustNumber":"12345","AcctNumber":"54321"}}
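
To make the effect of those two arguments concrete, here is a minimal sketch that runs the same conversion with and without the root object. The class and method names (XmlToJsonSketch, ToJson) are made up for illustration; the only Json.NET call is JsonConvert.SerializeXmlNode, and it assumes the Newtonsoft.Json package is referenced.

```csharp
using System;
using System.Xml;
using Newtonsoft.Json;

// Hypothetical helper class, for illustration only.
public static class XmlToJsonSketch
{
    // Loads the XML string into an XmlDocument and serializes it to JSON.
    public static string ToJson(string xml, Formatting formatting, bool omitRootObject)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        return JsonConvert.SerializeXmlNode(doc, formatting, omitRootObject);
    }

    public static void Main()
    {
        var xml = @"
<Invoice>
    <Timestamp>1/1/2017 00:01</Timestamp>
    <CustNumber>12345</CustNumber>
    <AcctNumber>54321</AcctNumber>
</Invoice>";

        // Root kept: the values are nested under an "Invoice" property.
        Console.WriteLine(ToJson(xml, Formatting.None, false));

        // Root omitted, indented for easier reading: only the three values remain.
        Console.WriteLine(ToJson(xml, Formatting.Indented, true));
    }
}
```

Note that every value comes out as a Json string at this stage, since the XML itself carries no type information.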

Saving the Json result

Finally, let’s put the Json into Couchbase. The easiest way to do this is to call on JsonConvert again to deserialize the Json into a C# object. That object can then be passed to Couchbase’s bucket.Insert(…) method.

object transactObject1 = JsonConvert.DeserializeObject(json);
bucket.Insert(Guid.NewGuid().ToString(), transactObject1);

With this method, the Json would be stored in Couchbase like so:

[Image: XML serialized to object]

That might be fine, but often you’re going to want more control of the format. With Json.NET, we can deserialize to a given class, instead of just object. Let’s create an Invoice class like so:

public class Invoice
{
    public DateTime Timestamp { get; set; }
    public string CustNumber { get; set; }
    public int AcctNumber { get; set; }
}

Notice that there is some type information now. The Timestamp is a DateTime and the AcctNumber is an int. The conversion will still work, but the result will be different, according to Json.NET’s conversion rules. (Also check out the full Json.NET documentation if you aren’t familiar with it already.)

Invoice transactObject2 = JsonConvert.DeserializeObject<Invoice>(json);
bucket.Insert(Guid.NewGuid().ToString(), transactObject2);

The result of that insert will look like:

[Image: XML serialized to new class object]

  • Notice that the timestamp field is different: it’s stored in a more standardized way.

  • The acctNumber field value is not in quotes, indicating that it’s being stored as a number.

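  • Finally, notice that the field names are different: they are camelCase rather than the PascalCase of the class properties. This comes from how the object is serialized when it is inserted (as far as I know, the Couchbase .NET SDK configures Json.NET with camelCase naming by default; Json.NET on its own keeps the property names as written). You can specify different names by using the JsonProperty attribute.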
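
As a rough illustration of the JsonProperty attribute, the sketch below uses a variant of the Invoice class with explicit field names and round-trips the XML-derived Json through it using plain Json.NET (no Couchbase involved). The names NamedInvoice and JsonPropertySketch are hypothetical. The chosen field names differ from the XML element names only by case, on the assumption that Json.NET's case-insensitive fallback matching is what lets the deserialize step still populate them.

```csharp
using System;
using Newtonsoft.Json;

// Hypothetical variant of the Invoice class with explicit Json field names.
public class NamedInvoice
{
    [JsonProperty("timestamp")]
    public DateTime Timestamp { get; set; }

    [JsonProperty("custNumber")]
    public string CustNumber { get; set; }

    [JsonProperty("acctNumber")]
    public int AcctNumber { get; set; }
}

public static class JsonPropertySketch
{
    // Deserializes the XML-derived Json into the typed class,
    // then serializes it again so the chosen field names are visible.
    public static string RoundTrip(string json)
    {
        NamedInvoice invoice = JsonConvert.DeserializeObject<NamedInvoice>(json);
        return JsonConvert.SerializeObject(invoice);
    }

    public static void Main()
    {
        var json = "{\"Timestamp\":\"1/1/2017 00:01\",\"CustNumber\":\"12345\",\"AcctNumber\":\"54321\"}";
        Console.WriteLine(RoundTrip(json));
    }
}
```

If you rename a field to something that differs from the XML element name by more than case (say, accountNumber for AcctNumber), the deserialize step would no longer find a match and the property would be left at its default value. In that case you would need separate classes for reading and writing, or a rename step in between.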

That’s it

One more minor thing to point out: I used Guid.NewGuid().ToString() to create arbitrary keys for the documents. If you have value(s) in the XML data that you want to use for a key, you could/should use those value(s) instead.
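
Here is one way that could look, as a sketch only. It builds a key from the CustNumber and AcctNumber elements of the sample invoice and falls back to a GUID when either is missing. The "invoice::" key format and the DocumentKeySketch and BuildKey names are assumptions of mine, not a Couchbase convention; whatever you choose has to be unique per document.

```csharp
using System;
using System.Xml;

// Hypothetical helper, for illustration only.
public static class DocumentKeySketch
{
    // Builds a document key from values in the invoice XML.
    // Falls back to a random GUID if either value is missing or empty.
    public static string BuildKey(XmlDocument doc)
    {
        XmlNode cust = doc.SelectSingleNode("/Invoice/CustNumber");
        XmlNode acct = doc.SelectSingleNode("/Invoice/AcctNumber");

        if (cust == null || acct == null ||
            string.IsNullOrWhiteSpace(cust.InnerText) ||
            string.IsNullOrWhiteSpace(acct.InnerText))
        {
            return Guid.NewGuid().ToString();
        }

        // Assumed key format: invoice::<customer number>::<account number>
        return "invoice::" + cust.InnerText.Trim() + "::" + acct.InnerText.Trim();
    }

    public static void Main()
    {
        var doc = new XmlDocument();
        doc.LoadXml("<Invoice><CustNumber>12345</CustNumber><AcctNumber>54321</AcctNumber></Invoice>");
        Console.WriteLine(BuildKey(doc));
    }
}
```

The resulting string would then take the place of Guid.NewGuid().ToString() in the bucket.Insert call.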

This blog post was inspired by an email conversation with a Couchbase user. If you have any suggestions on tools, tips, or tricks to make this process easier, please let me know. Or, contact me if there’s something you’d like to see me blog about! You can email me or contact me @mgroves on Twitter.

Because Akismet has been letting me down recently on this site, I've decided to 'outsource' the commenting and spam checking to Disqus.

Installing Disqus is a piece of cake. You sign up and drop some JavaScript onto your site. Even the "# comments" that you see near the top of this post is taken care of by just pasting some JS.

But there are two things that I would potentially lose that I was concerned about:

  1. My "legacy" comments
  2. My Latest Comments widget (which I'll cover in part 2)

If I were migrating from WordPress or some other well-known blog software, it would have been really easy. WordPress->export, Disqus->import.

However, this site runs on my own home-grown blog engine (for better or for worse). I could have just left my legacy comments alone, and let them exist side-by-side on older blog posts. I called that "plan B".

But once I knew that I could import from other engines, well, I assumed there must be a way for me to "fake" it. It was a little bit of work, and it still might not be perfect, but it worked.

I researched the formats that Disqus can handle. One of them is the WordPress WXR format, which is based on RSS, which is based on XML. So, all I had to do was figure out how to generate the right XML. There isn't really a "spec" on the WXR, at least not one that I could find. Luckily, Disqus publishes specs for their own Custom XML Import Format, which is a version of WXR. Once I had that, it was a piece of cake to create an "Export to WXR" button on this very site. Here's roughly what it looks like in an MVC Razor View:

Some notes:

  • I used generated Guids for the comment_id. I'm not sure if it makes any difference from an import perspective, other than they should probably be unique.
  • I put both comment_date and comment_date_gmt. I don't believe that comment_date is actually used, but I put them both in there just in case.
  • Notice the <![CDATA[ ... ]]> within the comment_content. The CDATA is a way to encode data within XML tags that an XML parser might otherwise attempt to interpret.
  • The BlogPostName is the friendly English version (e.g. "I like balloons"). The BlogPostSlug is the URL-friendly version (e.g. "I-like-balloons").
  • I used Html.Raw because otherwise Razor seemed to have trouble parsing what I was generating. I'm not worried about any sort of DOM injection, since this export utility is behind auth. But generally, you should be wary of using Html.Raw in Razor.
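
As an alternative illustration of the comment-level notes above (generated Guid ids, a GMT date, CDATA-wrapped content), here is a small sketch that builds a single comment element with LINQ to XML instead of a Razor view. It is not the view described in this post. The namespace URI is the one I understand WXR 1.0 files to use for the wp prefix, so treat it and the exact set of child elements as assumptions to verify against the Disqus spec. WxrCommentSketch and BuildComment are hypothetical names.

```csharp
using System;
using System.Globalization;
using System.Xml.Linq;

// Hypothetical helper, for illustration only.
public static class WxrCommentSketch
{
    // Assumed namespace for the "wp" prefix; check it against the Disqus import spec.
    static readonly XNamespace Wp = "http://wordpress.org/export/1.0/";

    public static XElement BuildComment(string author, DateTime dateUtc, string content)
    {
        return new XElement(Wp + "comment",
            new XAttribute(XNamespace.Xmlns + "wp", Wp.NamespaceName),
            // A generated Guid, so the id is unique per comment.
            new XElement(Wp + "comment_id", Guid.NewGuid().ToString()),
            new XElement(Wp + "comment_author", author),
            new XElement(Wp + "comment_date_gmt",
                dateUtc.ToString("yyyy-MM-dd HH:mm:ss", CultureInfo.InvariantCulture)),
            // XCData writes the text inside <![CDATA[ ... ]]>, so markup in the
            // comment body is not interpreted by the XML parser.
            new XElement(Wp + "comment_content", new XCData(content)));
    }

    public static void Main()
    {
        XElement comment = BuildComment("Matt", DateTime.UtcNow, "<b>Nice post</b> & thanks!");
        Console.WriteLine(comment);
    }
}
```

Building the XML with an XML API rather than a text template sidesteps the Html.Raw question, since escaping and CDATA are handled by the writer.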

To import, I just saved the output of this view, went to Disqus Import, and uploaded the file. It goes into a processing queue, and the time it takes will vary depending on how many comments you are importing and (I assume) how many other workloads Disqus is trying to process. I imported 86 comments multiple times, and it generally took about 5 minutes at most.

Of all the blog post titles I'd like to write, "Parsing XML in ASP classic" is definitely not at the top of my list. But sometimes you just have to suck it up. So here we go...

Given a string that contains XML (maybe the result of an Ajax request or the contents of some config file), let's get some values out of it.

Here's some sample XML that I'll be using:

Create an MSXML2.DOMDocument.6.0 object and call its LoadXML method. You can then use the selectSingleNode method with an XPath expression to get values out. For instance, if I wanted the value for ShoeSize in the above XML, I could use an XPath of //@ShoeSize to get a node. Then use the text property of that node to get the value.
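
For comparison, the same load, select, and read-text sequence looks like this with .NET's System.Xml. This is the C# analogue, not ASP classic code, and the sample XML in it is a hypothetical stand-in that I made up to carry a ShoeSize attribute. XPathSketch and ReadValue are illustrative names.

```csharp
using System;
using System.Xml;

// Hypothetical helper, for illustration only.
public static class XPathSketch
{
    // Loads the XML, selects the first node matching the XPath,
    // and returns its text (or null when nothing matches).
    public static string ReadValue(string xml, string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        XmlNode node = doc.SelectSingleNode(xpath);
        return node == null ? null : node.InnerText;
    }

    public static void Main()
    {
        // Hypothetical sample XML with a ShoeSize attribute.
        var xml = "<People><Person Name=\"Matt\" ShoeSize=\"11\" /></People>";
        Console.WriteLine(ReadValue(xml, "//@ShoeSize"));
    }
}
```

The null check matters in both environments: when the XPath matches nothing, selectSingleNode returns no node, and reading its text would fail.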

There ya go. If you aren't an XPath whiz, you can use this XPath tester to help you through it.

Matthew D. Groves

About the Author

Matthew D. Groves lives in Central Ohio. He works remotely, loves to code, and is a Microsoft MVP.
