How do I write my own parser? (for JSON)

Friday 05 December 2008 09:00

If no parser is available for the file format you need, writing one yourself may be easier than you think. Which file structures are manageable? What would the design of such a parser look like? How do you make sure it is complete? Here we describe the process of building a JSON parser in C#, and release the source code.

By Patrick van Bergen

[Download the JSON parser / generator for C#]

The software is subject to the MIT license: you are free to use it in any way you like, but it must keep its license.

For our synchronization module (which we use to synchronize data between various business applications) we chose JSON for data exchange. JSON is slightly better suited to a PHP web environment than XML, because:

  • The PHP functions json_encode() and json_decode() allow you to convert data structures from and to JSON strings
  • JSON can be sent directly to the browser in an Ajax request
  • It takes up less space than XML, which is important in server-to-browser traffic.
  • A JSON string can be composed of only ASCII characters while still being able to express all Unicode characters (using the \u notation), which avoids character set conversion problems during transport.
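The last point can be illustrated with a minimal sketch. This is not the code from the download; the class name AsciiJson and the method Quote are hypothetical names chosen for this example. It writes a C# string as a JSON string literal that contains only ASCII characters:

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration; not part of the downloadable JSON class.
public static class AsciiJson
{
    // Returns a JSON string literal for `text` that uses only ASCII characters.
    public static string Quote(string text)
    {
        StringBuilder sb = new StringBuilder("\"");
        foreach (char c in text)
        {
            switch (c)
            {
                case '"':  sb.Append("\\\""); break;
                case '\\': sb.Append("\\\\"); break;
                case '\b': sb.Append("\\b"); break;
                case '\f': sb.Append("\\f"); break;
                case '\n': sb.Append("\\n"); break;
                case '\r': sb.Append("\\r"); break;
                case '\t': sb.Append("\\t"); break;
                default:
                    if (c < 32 || c > 126)
                    {
                        // Control and non-ASCII characters become \uXXXX.
                        // A C# char is a UTF-16 code unit, so a character
                        // outside the basic plane yields two escapes.
                        sb.Append("\\u").Append(((int)c).ToString("x4"));
                    }
                    else
                    {
                        sb.Append(c);
                    }
                    break;
            }
        }
        return sb.Append('"').ToString();
    }
}
```

Because the result contains nothing outside the ASCII range, a wrong or missing character set conversion somewhere along the way has nothing to corrupt.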

So JSON is very convenient for PHP. But of course we wanted to be able to synchronize with Windows applications as well, and because C# is better suited to this environment, this part of the module was written in that language. The .NET Framework did not have its own JSON parser / encoder, and the open-source software written for this task often came as a whole package of classes and constraints; sometimes the JSON implementation wasn't even complete.

We just wanted a single class that could be imported and that used the most basic building blocks of our application: the ArrayList and the Hashtable. Also, all aspects of JSON would have to be implemented, there should be a JSON generator, and of course it should be fast.

More reasons to write our own parser weren't necessary. Writing a parser happens to be a very satisfying thing to do. It is the best way to learn a new programming language thoroughly, especially if you use unit tests to guarantee that the parser / generator matches the language specification exactly. JSON's specification is easy to find: the website http://www.json.org/ is as clear as one could wish for.

You start by writing the unit tests. You should really write all tests before starting the implementation, but such patience is seldom found in a programmer. You can at least start by writing some obvious tests that help you to create a consistent API. This is an example of a simple object test:

string json;
Hashtable o;
bool success = true;

json = "{\"name\":123,\"name2\":-456e8}";
o = (Hashtable)JSON.JsonDecode(json);
success = success && ((double)o["name"] == 123);
success = success && ((double)o["name2"] == -456e8);

Eventually you should write all tests needed to check all aspects of the language, because your users (other programmers) will assume that the parser just works.

OK. Parsers. Parsers are associated with specialized software: so-called compiler-compilers (of which Yacc is the best known). Using this software will make sure that the parser is fast, but it does not do all the work for you. What's more, it can be even easier to write the entire parser yourself than to do all the preparatory work for the compiler-compiler.

A compiler-compiler is needed for languages with a high level of ambiguity. A language expression is parsed from left to right. If a language contains many structures that cannot be identified at the start of the parse, it is advisable to use a tool that is able to manage the emerging complexity.

Unambiguous languages are better suited to a hand-written parser, which uses recursive functions to process the recursive nature of the language. The parser looks ahead one or more tokens to identify the next construct. For JSON it is even sufficient to look ahead a single token. This classifies it as an LL(1) language (see also http://en.wikipedia.org/wiki/LL_parser).

A parser takes as input a string of tokens. Tokens are the most elementary building blocks of a language, like "+", "{", "[", but also complete numbers like "-1.345e5" and strings like "'The Scottish highlander looked around.'". The parse phase is usually preceded by a tokenization phase. In our JSON parser this step is integrated in the parser, because to determine the next token, in almost all cases, it is enough to just read the next character in the string. This saves the allocation of a token table in memory.
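To make the idea of integrated tokenization concrete, here is a minimal sketch. It is not the code from the download: the class name TokenSketch is hypothetical, and of the token constants only TOKEN_NONE, TOKEN_COMMA, TOKEN_COLON and TOKEN_CURLY_CLOSE appear in this article; the others are assumed names. The signatures of LookAhead and NextToken follow the way they are called in ParseObject further down.

```csharp
using System;

// Illustrative sketch only; names not used in the article are assumptions.
public static class TokenSketch
{
    public const int TOKEN_NONE = 0;
    public const int TOKEN_CURLY_OPEN = 1;
    public const int TOKEN_CURLY_CLOSE = 2;
    public const int TOKEN_SQUARED_OPEN = 3;
    public const int TOKEN_SQUARED_CLOSE = 4;
    public const int TOKEN_COLON = 5;
    public const int TOKEN_COMMA = 6;
    public const int TOKEN_STRING = 7;
    public const int TOKEN_NUMBER = 8;
    public const int TOKEN_TRUE = 9;
    public const int TOKEN_FALSE = 10;
    public const int TOKEN_NULL = 11;

    // Classifies the next token without moving the caller's index:
    // index is passed by value, so NextToken advances only a copy.
    public static int LookAhead(char[] json, int index)
    {
        return NextToken(json, ref index);
    }

    // Classifies the next token and moves the index past punctuation and
    // literals. Strings and numbers are only recognized here; consuming
    // them is left to the functions that parse those values.
    public static int NextToken(char[] json, ref int index)
    {
        while (index < json.Length && " \t\n\r".IndexOf(json[index]) >= 0)
        {
            index++;
        }
        if (index >= json.Length)
        {
            return TOKEN_NONE;
        }

        char c = json[index];
        switch (c)
        {
            case '{': index++; return TOKEN_CURLY_OPEN;
            case '}': index++; return TOKEN_CURLY_CLOSE;
            case '[': index++; return TOKEN_SQUARED_OPEN;
            case ']': index++; return TOKEN_SQUARED_CLOSE;
            case ':': index++; return TOKEN_COLON;
            case ',': index++; return TOKEN_COMMA;
            case '"': return TOKEN_STRING;
        }
        if (c == '-' || (c >= '0' && c <= '9'))
        {
            return TOKEN_NUMBER;
        }
        if (Matches(json, index, "true")) { index += 4; return TOKEN_TRUE; }
        if (Matches(json, index, "false")) { index += 5; return TOKEN_FALSE; }
        if (Matches(json, index, "null")) { index += 4; return TOKEN_NULL; }
        return TOKEN_NONE;
    }

    // True if the characters starting at index spell out the given word.
    private static bool Matches(char[] json, int index, string word)
    {
        if (index + word.Length > json.Length)
        {
            return false;
        }
        for (int i = 0; i < word.Length; i++)
        {
            if (json[index + i] != word[i])
            {
                return false;
            }
        }
        return true;
    }
}
```

Only the three literals true, false and null need more than one character to be recognized, which is why no separate token table is required.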

The parser takes a string as input and returns a C# data structure, consisting of ArrayLists, Hashtables, a number of scalar value types and null. The string is processed from left to right. An index (pointer) keeps track of the current position in the string at any moment. At each level of the parse process the parser performs these steps:

  • Look ahead 1 token to determine the type of the next construct
  • Choose the function to parse the construct
  • Call this function and integrate the returned value in the construct that is currently built.
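The three steps above can be sketched for a deliberately tiny subset of JSON: nested arrays of non-negative integers. This is an illustration under stated assumptions, not the parser from the download; MiniParser, ParseArray and ParseInteger are hypothetical names, and only the ParseValue signature follows the call shown in ParseObject below.

```csharp
using System;
using System.Collections;

// Illustrative subset parser: nested arrays of non-negative integers only.
public static class MiniParser
{
    public static object ParseValue(char[] json, ref int index, ref bool success)
    {
        SkipWhitespace(json, ref index);
        if (index >= json.Length)
        {
            success = false;
            return null;
        }

        // Step 1: look ahead one character to determine the construct.
        char c = json[index];

        // Steps 2 and 3: choose the matching function and call it.
        if (c == '[')
        {
            return ParseArray(json, ref index, ref success);
        }
        if (c >= '0' && c <= '9')
        {
            return ParseInteger(json, ref index);
        }

        success = false; // not part of this subset
        return null;
    }

    private static ArrayList ParseArray(char[] json, ref int index, ref bool success)
    {
        ArrayList list = new ArrayList();
        index++; // skip '['

        while (true)
        {
            SkipWhitespace(json, ref index);
            if (index >= json.Length)
            {
                success = false; // input ended before ']'
                return null;
            }
            if (json[index] == ']')
            {
                index++;
                return list;
            }
            if (json[index] == ',')
            {
                index++; // separators are skipped leniently
                continue;
            }

            // Recursion: an element may itself be an array.
            object value = ParseValue(json, ref index, ref success);
            if (!success)
            {
                return null;
            }
            list.Add(value); // integrate the result in the current construct
        }
    }

    // Reads a run of digits; overflow is not checked in this sketch.
    private static object ParseInteger(char[] json, ref int index)
    {
        long n = 0;
        while (index < json.Length && json[index] >= '0' && json[index] <= '9')
        {
            n = n * 10 + (json[index] - '0');
            index++;
        }
        return n;
    }

    private static void SkipWhitespace(char[] json, ref int index)
    {
        while (index < json.Length && " \t\n\r".IndexOf(json[index]) >= 0)
        {
            index++;
        }
    }
}
```

Parsing "[1, [20, 3], []]" with this sketch gives an ArrayList with three elements, the second and third of which are ArrayLists themselves; the nesting of the data mirrors the nesting of the function calls.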

A nice example is the recursive function "ParseObject" that parses an object:

protected Hashtable ParseObject(char[] json, ref int index)
{
    Hashtable table = new Hashtable();
    int token;

    // {
    NextToken(json, ref index);

    bool done = false;
    while (!done) {
        token = LookAhead(json, index);
        if (token == JSON.TOKEN_NONE) {
            return null;
        } else if (token == JSON.TOKEN_COMMA) {
            NextToken(json, ref index);
        } else if (token == JSON.TOKEN_CURLY_CLOSE) {
            NextToken(json, ref index);
            return table;
        } else {

            // name
            string name = ParseString(json, ref index);
            if (name == null) {
                return null;
            }

            // :
            token = NextToken(json, ref index);
            if (token != JSON.TOKEN_COLON) {
                return null;
            }

            // value
            bool success = true;
            object value = ParseValue(json, ref index, ref success);
            if (!success) {
                return null;
            }

            table[name] = value;
        }
    }

    return table;
}

The function is only called if a look-ahead has determined that a construct starts with an opening curly brace, so this token may be skipped. Next, the string is parsed until the closing brace is found, or until the end of the string is reached (a syntax error, but one that needs to be caught). Between the braces there are a number of name: value pairs, separated by commas. This algorithm can be found literally in the function, which makes it easy to follow and thus easy to debug. The function builds a Hashtable and returns it to the calling function. The parser mainly consists of these types of functions.

If you create your own parser, you will always need to take into account that the incoming string may be grammatically incorrect. Users expect the parser to be able to tell on which line the error occurred. Our parser only remembers the index, but it also contains an extra function that returns the immediate context of the position of the error, comparable to the error messages that MySQL generates.
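As a rough sketch of how an index can be turned into a readable error message (this is not the function from the download; ErrorContext and Describe are hypothetical names, and the message format is an assumption):

```csharp
using System;

// Illustrative helper: turns a character index into line, column and context.
public static class ErrorContext
{
    public static string Describe(string json, int index)
    {
        // Clamp the index so that callers can pass a position at end of input.
        if (index < 0) index = 0;
        if (index > json.Length) index = json.Length;

        // Count newlines before the index to find the line and column.
        int line = 1;
        int column = 1;
        for (int i = 0; i < index; i++)
        {
            if (json[i] == '\n')
            {
                line++;
                column = 1;
            }
            else
            {
                column++;
            }
        }

        // Show up to ten characters on either side of the error position.
        int start = Math.Max(0, index - 10);
        int end = Math.Min(json.Length, index + 10);
        string snippet = json.Substring(start, end - start).Replace('\n', ' ');

        return "Syntax error at line " + line + ", column " + column
            + " near '" + snippet + "'";
    }
}
```

Since the parser only has to remember the index, the cost of counting lines is paid only when an error actually occurs.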

If you want to know more about parsers, it is good to know that there is a standard work on this subject, which recently (2006) saw its second edition:

Aho, A.V., Sethi, R. and Ullman, J.D. (1986). Compilers: Principles, Techniques, and Tools.

Reactions on "How do I write my own parser? (for JSON)"

Friturist
Placed on: 12-26-2008 22:57
Has this implementation been open sourced? It sounds like the .NET world could use a working JSON parser.
garfix
Placed on: 12-26-2008 23:14
Patrick van Bergen
Yes, the link to the source code is at the top of the article ;) The code may be used without cost or obligation.
Andrey
Placed on: 12-30-2008 14:04
Wow, so simple :) Thanks!
Doug
Placed on: 01-15-2009 16:20
OK, I translated this code to VB.NET and have it parsing perfectly, but what is the best way to then enumerate the object you have which is a non-standard grouping of DictionaryEntry's and HashTables in order to save the data parsed to a sql database?

TIA,

Doug
garfix
Placed on: 01-16-2009 15:23
Patrick van Bergen
Hi Doug,

From what you're asking I take it you have parsed a string into a Hashtable. The hashtable can then be enumerated by using a DictionaryEnumerator. An example of this can be found in the function SerializeObject of the JSON class.
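A minimal sketch of such an enumeration, not taken from the JSON class itself: DecodedWalker and Flatten are hypothetical names. It walks a decoded structure of Hashtables and ArrayLists recursively and collects one "path = value" line per scalar, which could then be mapped to database rows.

```csharp
using System;
using System.Collections;

// Illustrative walker over a decoded structure of Hashtables and ArrayLists.
public static class DecodedWalker
{
    // Adds one "path = value" string to output for every scalar under node.
    public static void Flatten(object node, string path, ArrayList output)
    {
        Hashtable table = node as Hashtable;
        ArrayList list = node as ArrayList;

        if (table != null)
        {
            // Hashtable.GetEnumerator() returns an IDictionaryEnumerator.
            // The order of the entries is not specified.
            IDictionaryEnumerator e = table.GetEnumerator();
            while (e.MoveNext())
            {
                Flatten(e.Value, path + "/" + e.Key, output);
            }
        }
        else if (list != null)
        {
            for (int i = 0; i < list.Count; i++)
            {
                Flatten(list[i], path + "[" + i + "]", output);
            }
        }
        else
        {
            // Scalar (string, number, boolean) or null.
            output.Add(path + " = " + (node == null ? "null" : node.ToString()));
        }
    }
}
```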
BA
Placed on: 02-24-2009 17:03
Hi Doug,

Can you post VB.NET version of this parser? I need to add vb.net parser to our project.

Quote
Doug wrote:
OK, I translated this code to VB.NET and have it parsing perfectly, but what is the best way to then enumerate the object you have which is a non-standard grouping of DictionaryEntry's and HashTables in order to save the data parsed to a sql database?

TIA,

Doug
Dennis
Placed on: 03-03-2009 22:51
Is there a parser for trees?

I'm trying to get a .net collection of folders (which holds a collection of 'children folders', which would then hold a collection of 'children folders', etc.) serialized in the following format:

[
  {
    id: 1,
    text: 'A leaf Node',
    leaf: true
  },
  {
    id: 2,
    text: 'A folder Node',
    children: [
      {
        id: 3,
        text: 'A child Node',
        children: [
          {
            id: 4,
            text: 'A child Node',
            leaf: true
          }
        ]
      }
    ]
  }
]

Anyone have any ideas?
David J. Smith
Placed on: 04-08-2009 19:15
"A JSON string can be composed of only ASCII characters ... "

AFAIK, that is NOT true. JSON strings contain Unicode characters, in whatever encoding you want, so it could be UTF-8, UTF-16, or any other Unicode encoding you want. Thus, a Unicode character that is not ASCII does NOT need to be escaped, but you can escape those characters if you want.
garfix
Placed on: 04-22-2009 15:43
Patrick van Bergen
You are right, of course. What I meant to say here is that all characters _can_ be expressed using only ASCII characters, using the \u notation for Unicode characters. Using only ASCII characters has the advantage that it is less susceptible to corruption by wrong / missing character set conversions.
Paul
Placed on: 05-19-2009 20:39
nice
Rob
Placed on: 05-27-2009 16:19
Perfect, that was what I was looking for, since .NET does not contain any useful stuff regarding JSON. Thanks a lot.
Sam
Placed on: 08-06-2009 18:30
Very helpful, thanks
NeverCast
Placed on: 08-13-2009 08:30
GG man :)
Great work. Found it a little late, as I just finished writing my own, although mine does not yet support arrays.

Well done :)
StockId
Placed on: 09-10-2009 01:42
I got a compilation error when compiling with Compact Framework for CE devices,

'char' does not contain a definition for 'ConvertFromUtf32'

What is the best way to work around this?
Abhijit
Placed on: 10-10-2009 12:55
Great ! Simply great !
Philipp Schumann
Placed on: 10-27-2009 12:36
Lovin this --- JSON.NET is such a pain to use for typical, web-driven "JSON interop" scenarios (where you wanna Keep It Simple, Stupid), I'm glad I could throw it out entirely and replace its serialization overkill with that parser of yours. It does all I need, nothing more, and works. Kudos!

The one thing I changed was the method signatures for encoding ArrayLists (changed this to IList which is even implemented by all arrays, and all List classes too) and for encoding Hashtables (changed this to IDictionary). Changing only the signatures kept the whole thing working of course, and the advantage: I can use my strong-typed arrays, generic collections and dictionaries. Of course, this is later decoded to ArrayLists and Hashtables but converting those myself is a one-liner if I really need to, and often I don't...

Thanks again for this!
Hemant Kularia
Placed on: 11-23-2009 13:10
Great article! JSON Parser saved a lot of time :) Thanks
Hal Rottenberg
Placed on: 12-04-2009 05:25
This is an awesome piece of code! I was able to get it to compile in memory in PowerShell with one tiny modification (it didn't like line 2 for some reason). I can now parse JSON with PowerShell thanks to you!

I'll post the steps required in a blog post shortly and will share it here.
journey Liu
Placed on: 01-03-2010 09:15
thank you for this great json parser.
garfix
Placed on: 01-03-2010 10:10
Patrick van Bergen
Thanks to you all for your kind response.
weiseditor
Placed on: 01-05-2010 13:32
good!
thank you for your json tool in c#
Ales
Placed on: 01-16-2010 22:02
Thanks! It's really cool.
I've used it in my mobile app, I wrote a blog about it:
http://shelastyle.net/bl...bile-phone-without-gps/
Guy
Placed on: 01-22-2010 01:04
Thanks a lot. Very helpful.