
How do I write my own parser? (for JSON)

Friday 05 December 2008 09:00

If no parser is available for the file you need, writing one yourself may be easier than you think. What file structures are manageable? What would be the design of such a parser? How do you make sure it is complete? Here we describe the process for building a JSON parser in C#, and release the source code.

By Patrick van Bergen

[Download the JSON parser / generator for C#]

The software is subject to the MIT license: you are free to use it in any way you like, but it must keep its license.

For our synchronisation-module (which we use to synchronize data between diverse business applications) we chose JSON for data exchange. JSON is just a little better suited for a PHP web-environment than XML, because:

  • The PHP functions json_encode() and json_decode() allow you to convert data structures from and to JSON strings
  • JSON can be sent directly to the browser in an Ajax request
  • It takes up less space than XML, which is important in server > browser traffic.
  • A JSON string can be composed of only ASCII characters, while still being able to express all UNICODE characters, thus avoiding all possible conversion issues a transport may carry.

So JSON is very convenient for PHP. But of course we wanted to be able to synchronize with Windows applications as well, and because C# is better suited to this environment, this part of the module was written in that language. The .NET framework just didn't have its own JSON parser / encoder. The open-source software written for this task often came as a whole package of classes and constraints, and sometimes the JSON implementation wasn't even complete.

We just wanted a single class that could be imported and that used the most basic building blocks of our application: the ArrayList and the Hashtable. Also, all aspects of JSON would have to be implemented, there should be a JSON generator, and of course it should be fast.

More reasons to write our own parser weren't necessary. Writing a parser happens to be a very satisfying thing to do. It is the best way to learn a new programming language thoroughly, especially if you use unit testing to guarantee that the parser / generator matches the language specification exactly. JSON's specification is easy to find. The website http://www.json.org/ is as clear as one could wish for.

You start by writing the unit tests. You should really write all tests before starting the implementation, but such patience is seldom found in a programmer. You can at least start by writing some obvious tests that help you to create a consistent API. This is an example of a simple object test:

string json;
Hashtable o;
bool success = true;

json = "{\"name\":123,\"name2\":-456e8}";
o = (Hashtable)JSON.JsonDecode(json);
success = success && ((double)o["name"] == 123);
success = success && ((double)o["name2"] == -456e8);

Eventually you should write all tests needed to check all aspects of the language, because your users (other programmers) will assume that the parser just works.

OK. Parsers. Parsers are associated with specialized software: so-called compiler-compilers (of which Yacc is the best known). Using this software will make sure that the parser will be fast, but it does not do all the work for you. What's more, it can be even easier to write the entire parser yourself than to do all the preparatory work for the compiler-compiler.

The compiler-compiler is needed for languages with a high level of ambiguity. A language expression is parsed from left to right. If a language contains many structures that cannot be identified at the start of the parse, it is advisable to use a tool that is able to manage the emerging complexity.

Unambiguous languages are better suited to building the parser manually, using recursive functions to process the recursive nature of the language. The parser looks ahead one or more tokens to identify the next construct. For JSON it is even sufficient to look ahead a single token. This classifies it as an LL(1) language (see also http://en.wikipedia.org/wiki/LL_parser).

A parser takes as input a string of tokens. Tokens are the most elementary building blocks of a language, like "+", "{", "[", but also complete numbers like "-1.345e5" and strings like "The Scottish highlander looked around.". The parse phase is usually preceded by a tokenization phase. In our JSON parser this step is integrated in the parser, because in almost all cases reading the next character in the string is enough to determine the next token. This saves the allocation of a token table in memory.
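To make the single-character look-ahead concrete, here is a minimal sketch of how such an integrated tokenizer could look. It is not the code from the download. The names LookAhead, NextToken, TOKEN_NONE, TOKEN_COMMA, TOKEN_COLON and TOKEN_CURLY_CLOSE are used by the parser discussed in this article, but the class name TokenSketch, the other constants, all numeric values and the function bodies are assumptions made for illustration. The literals true, false and null are left out to keep it short.

```csharp
using System;

// Hypothetical sketch: token recognition by reading a single character.
public static class TokenSketch
{
    // Numeric values are arbitrary; only the names matter here.
    public const int TOKEN_NONE = 0;
    public const int TOKEN_CURLY_OPEN = 1;
    public const int TOKEN_CURLY_CLOSE = 2;
    public const int TOKEN_SQUARED_OPEN = 3;
    public const int TOKEN_SQUARED_CLOSE = 4;
    public const int TOKEN_COLON = 5;
    public const int TOKEN_COMMA = 6;
    public const int TOKEN_STRING = 7;
    public const int TOKEN_NUMBER = 8;

    // Peek at the next token. The index is passed by value, so the
    // caller's position in the string does not change.
    public static int LookAhead(char[] json, int index)
    {
        int copy = index;
        return NextToken(json, ref copy);
    }

    // Skip whitespace, then classify the next token by its first character.
    // Punctuation is consumed. For strings and numbers the index is left on
    // the first character, so a value parser can read the whole lexeme.
    public static int NextToken(char[] json, ref int index)
    {
        while (index < json.Length && char.IsWhiteSpace(json[index]))
        {
            index++;
        }
        if (index >= json.Length)
        {
            return TOKEN_NONE; // end of input
        }

        char c = json[index];
        switch (c)
        {
            case '{': index++; return TOKEN_CURLY_OPEN;
            case '}': index++; return TOKEN_CURLY_CLOSE;
            case '[': index++; return TOKEN_SQUARED_OPEN;
            case ']': index++; return TOKEN_SQUARED_CLOSE;
            case ':': index++; return TOKEN_COLON;
            case ',': index++; return TOKEN_COMMA;
            case '"': return TOKEN_STRING;
        }
        if (c == '-' || (c >= '0' && c <= '9'))
        {
            return TOKEN_NUMBER;
        }
        return TOKEN_NONE; // unknown character
    }
}
```

Calling TokenSketch.LookAhead(json, index) reports what comes next without moving the index, while TokenSketch.NextToken(json, ref index) also steps over punctuation. No token table is built at any point.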

The parser takes a string as input and returns a C# datastructure, consisting of ArrayLists, Hashtables, a number of scalar value types and null. The string is processed from left-to-right. An index (pointer) keeps track of the current position in the string at any moment. At each level of the parse process the parser performs these steps:

  • Look ahead 1 token to determine the type of the next construct
  • Choose the function to parse the construct
  • Call this function and integrate the returned value in the construct that is currently built.
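The three steps above can be sketched for a small subset of JSON: numbers and (nested) arrays only. This is an illustration under stated assumptions, not the parser from the download. The class MiniParser, its Decode wrapper and the helper functions are hypothetical names, and strings, objects, true, false and null are deliberately left out.

```csharp
using System;
using System.Collections;
using System.Globalization;

// Hypothetical mini-parser for a JSON subset: numbers and nested arrays.
public static class MiniParser
{
    // Returns an ArrayList or a double, or null on a syntax error.
    public static object Decode(string json)
    {
        char[] chars = json.ToCharArray();
        int index = 0;
        bool success = true;
        object value = ParseValue(chars, ref index, ref success);
        SkipWhitespace(chars, ref index);
        // Anything left after the value counts as a syntax error.
        return (success && index == chars.Length) ? value : null;
    }

    static object ParseValue(char[] json, ref int index, ref bool success)
    {
        SkipWhitespace(json, ref index);
        if (index >= json.Length) { success = false; return null; }

        // Step 1: look ahead one character to determine the construct.
        char c = json[index];

        // Steps 2 and 3: choose the matching function and call it.
        if (c == '[') return ParseArray(json, ref index, ref success);
        if (c == '-' || (c >= '0' && c <= '9')) return ParseNumber(json, ref index, ref success);

        success = false;
        return null;
    }

    static ArrayList ParseArray(char[] json, ref int index, ref bool success)
    {
        ArrayList list = new ArrayList();
        index++; // skip '['
        SkipWhitespace(json, ref index);
        if (index < json.Length && json[index] == ']') { index++; return list; }

        while (success)
        {
            // Recursion: an element may itself be an array.
            object value = ParseValue(json, ref index, ref success);
            if (!success) break;
            list.Add(value); // integrate the result in the construct being built

            SkipWhitespace(json, ref index);
            if (index >= json.Length) break; // unterminated array
            char c = json[index++];
            if (c == ']') return list;
            if (c != ',') break;             // expected ',' or ']'
        }
        success = false;
        return null;
    }

    static object ParseNumber(char[] json, ref int index, ref bool success)
    {
        int start = index;
        while (index < json.Length && "0123456789+-.eE".IndexOf(json[index]) != -1) index++;
        string text = new string(json, start, index - start);
        double number;
        success = double.TryParse(text, NumberStyles.Float, CultureInfo.InvariantCulture, out number);
        return success ? (object)number : null;
    }

    static void SkipWhitespace(char[] json, ref int index)
    {
        while (index < json.Length && char.IsWhiteSpace(json[index])) index++;
    }
}
```

For example, MiniParser.Decode("[1, [2.5, -3e2], []]") yields an ArrayList with three elements, the second and third of which are ArrayLists themselves, while MiniParser.Decode("[1,") yields null.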

A nice example is the recursive function "ParseObject" that parses an object:

protected Hashtable ParseObject(char[] json, ref int index)
{
    Hashtable table = new Hashtable();
    int token;

    // {
    NextToken(json, ref index);

    bool done = false;
    while (!done) {
        token = LookAhead(json, index);
        if (token == JSON.TOKEN_NONE) {
            return null;
        } else if (token == JSON.TOKEN_COMMA) {
            NextToken(json, ref index);
        } else if (token == JSON.TOKEN_CURLY_CLOSE) {
            NextToken(json, ref index);
            return table;
        } else {

            // name
            string name = ParseString(json, ref index);
            if (name == null) {
                return null;
            }

            // :
            token = NextToken(json, ref index);
            if (token != JSON.TOKEN_COLON) {
                return null;
            }

            // value
            bool success = true;
            object value = ParseValue(json, ref index, ref success);
            if (!success) {
                return null;
            }

            table[name] = value;
        }
    }

    return table;
}

The function is only called if a look-ahead has determined that a construct starts with an opening curly brace, so this token may be skipped. Next, the string is parsed until the closing brace is found, or until the end of the string is reached (a syntax error, but one that needs to be caught). Between the braces there are a number of "name": value pairs, separated by commas. This algorithm can be found literally in the function, which makes it very readable and thus easy to debug. The function builds a Hashtable and returns it to the calling function. The parser mainly consists of these types of functions.

If you create your own parser, you will always need to take into account that the incoming string may be grammatically incorrect. Users expect the parser to be able to tell on which line the error occurred. Our parser only remembers the index, but it also contains an extra function that returns the immediate context of the position of the error, comparable to the error messages that MySQL generates.
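As an illustration of error reporting based on an index, the sketch below derives a line number and a short context snippet from the position where parsing stopped. The class ErrorContext and both function names are hypothetical. They are not taken from the download, whose context function may work differently.

```csharp
using System;

// Hypothetical helpers that turn an error index into something readable.
public static class ErrorContext
{
    // 1-based line number of the character at the given index.
    public static int LineNumber(string json, int index)
    {
        int line = 1;
        int end = Math.Min(index, json.Length);
        for (int i = 0; i < end; i++)
        {
            if (json[i] == '\n')
            {
                line++;
            }
        }
        return line;
    }

    // Up to 'radius' characters on either side of the error position.
    public static string Snippet(string json, int index, int radius)
    {
        index = Math.Max(0, Math.Min(index, json.Length));
        int start = Math.Max(0, index - radius);
        int end = Math.Min(json.Length, index + radius);
        return json.Substring(start, end - start);
    }
}
```

A caller that remembers the failing index could then build a message such as "syntax error on line " + ErrorContext.LineNumber(json, index) + " near '" + ErrorContext.Snippet(json, index, 10) + "'".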

If you want to know more about parsers, it is good to know that there is a standard work on this subject, which recently (2006) saw its second edition:

Compilers: Principles, Techniques, and Tools, Aho, A.V., Sethi, R. and Ullman, J.D. (1986)


Reactions on "How do I write my own parser? (for JSON)"

Prabir Shrestha
Placed on: 04-19-2011 20:48
We forked your Json parser and named it SimpleJson which can be found at http://simplejson.codeplex.com/
We are planning to incorporate the json library in Facebook C# SDK (http://facebooksdk.codeplex.com) for the next release.
the main reason for the fork was to support deserializing to strongly typed objects and support for DataContracts(I'm not much of fan of this but there are still some users who use it). We also added support for dynamic.
garfix
Placed on: 04-19-2011 21:37
Patrick van Bergen
Thanks for letting me know, and good luck with your project, Prabir!
Lainon
Placed on: 04-20-2011 19:21
Simply great, and beautiful. Thank you.
Prabir Shrestha
Placed on: 06-11-2011 10:35
seems like there is a bug in IsNumeric method.

Json encoding the following code throws exception.

var x = new X { Y = "z" };
Console.WriteLine(JSON.JsonEncode(x));

class X
{
    public string Y { get; set; }

    public override string ToString()
    {
        return Procurios.Public.JSON.JsonEncode(this);
    }
}

So, basically any class that overrides the ToString() method and encodes itself throws a stack overflow exception.

I fixed this in our simplejson repository which was based on your code. I think this bug fix will be useful to you too. http://simplejson.codepl...et/changes/3c2d893ec8d5

thanks.
prabir
garfix
Placed on: 06-12-2011 17:31
Thanks for your comment, Prabir. It may be useful for others using JsonEncode in a ToString method. JsonEncode wasn't meant to process random objects, however. Therefore I hope you agree that I will not need to fix this behaviour in the code.

greetings,
Patrick
ebru
Placed on: 07-06-2011 09:25
I need a help..:(
we can sen json from webservice to extjs and i have to send json not in a <string></string> tag .. for example:
<string xmlns="http://tempuri.org/">
[["1","pınar","hoşyumruk","phosyumruk","0033 "],["2","Emre","Işın","emrei","nywmzs "],["5","Ercan","Eren","eeren","7920471 "]]
</string>
there are string tag but i want to send only the [["1","pınar","hoşyumruk","phosyumruk","0033 "],["2","Emre","Işın","emrei","nywmzs "],["5","Ercan","Eren","eeren","7920471 "]] json ,,How can do this please help me it is very important for me please!!!!
garfix
Placed on: 07-06-2011 09:33
Ebru,

I assume your request is not really related to the library, but what I can see from your request is that some agency placed a <string> tag around your JSON string.

What can you do? 1) Make the agency stop adding the tag. or 2) Remove the tag yourself by using a string replace or preg replace function. Hope this helps..
ebru
Placed on: 07-06-2011 10:40
[WebMethod(Description = "Gets all informations about users with JSON")]
[ScriptMethod(ResponseFormat = ResponseFormat.Json, UseHttpGet = true, XmlSerializeString = false)]
public string GetUsersInfoJSON() {
    try{
        SqlCommand sqlCommand = new SqlCommand("SELECT * FROM Kullanici", cnn);
        cnn.Open();

        SqlDataReader r = sqlCommand.ExecuteReader();
        int x = 0;
        while (r.Read())
        {
            x++;
        }
        r.Close();
        // Create a multidimensional jagged array
        string[][] JaggedArray = new string[x][];
        int i = 0;

        SqlDataReader reader = sqlCommand.ExecuteReader();
        // Call Read before accessing data.
        while (reader.Read())
        {
            JaggedArray[i] = new string[] { reader["U_id"].ToString(), reader["U_ad"].ToString(), reader["U_soyad"].ToString(), reader["U_Kullaniciad"].ToString() , reader["U_pass"].ToString() };
            i = i + 1;
        }

        // Call Close when done reading.
        cnn.Close();
        reader.Close();

        // Return JSON data
        JavaScriptSerializer js = new JavaScriptSerializer();
        string strJSON = js.Serialize(JaggedArray);
        return strJSON;
    }
    catch(Exception ex){
        return "hata";
    }
}

hi garfix, this is my web methods to return json ,but i say before my code return json with string tag ,i do not want to <string> tag to get json string with ExtJs.
I want to apply your advice but which one is effecient and appropriate i don't know,, u can help me to write this function that is remove string tag ..
And first way ( 1) Make the agency stop adding the tag ) is possible or not i not nhave any information:(

Thanks garfix....
garfix
Placed on: 07-06-2011 10:59
Hi Ebru,

I am afraid I cannot help you any further, since you are not using our library(!)

From this example http://atsung.wordpress....riptserializer-example/ I take it that the JavaScriptSerializer.Serialize function returns a normal JSON string. Please check if this is so right after it is called ("string strJSON = js.Serialize(JaggedArray);"). I think the <string> tag is added only _after_ the function you showed has returned.
ebru
Placed on: 07-06-2011 15:49
Thank u garfix ,i am working on these solutions...
Reddeppa
Placed on: 07-14-2011 14:37
I am sending serialized objects via a NetWorkStream to another computer, on the receiving end I would like to deserialize these objects.

I will be sending many consecutive objects, when I am receiving data via the NetworkStream, how do I know when the first JSON Document ended in order to have JSON.NET Parse the document from the received string?

Or better yet, is there any way to have Json.NET read directly from the NetworkStream and Deserialize/Parse the resulting JSON document?
Adem
Placed on: 09-10-2011 00:54
Thanks.. It's easier than I thought. :)
Hassan
Placed on: 11-16-2011 09:50
How can I compile it for WP7? I am getting errors calling the Hashtable, ArrayList is inaccessible etc.

Can you please provide one or guide, thank you.
garfix
Placed on: 11-16-2011 10:01
@Hassan I am not able to help you. I'm not familiar with this environment.
ritchie
Placed on: 12-21-2011 03:16
Thanks for this, Patrick. I had to chop this up a bit to get it to work on NETMF, and in the process discovered that you can refine the SerializeValue, IsNumeric, and SerializeNumber a bit.

In SerializeValue, try:

if ( value is System.ValueType && !( value is Boolean ) ) {
    success = SerializeNumber( value, builder );
}

Consider also changing SerializeNumber to SerializeValueType, and just handle boolean with the rest of the ValueTypes instead of special casing them. Checking for ValueType should be more efficient than converting to a string and back with IsNumeric().

Kind regards,

r.
garfix
Placed on: 02-20-2012 12:33
Thanks for your improvements, Ritchie. I changed the number type checking.

I left the boolean values unchanged, because Convert.ToString creates "True" and "False" for the boolean values instead of the JSON constants "true" and "false" :)
insert data via Json Insert,delete,select,update
Placed on: 09-11-2012 08:09
Hello Friends and Hello Json Team i am in trouble. i want insert data into Sql server through the using Asp.net and json. i dont want show post back of page. i have hear that json is very faster. i have visited many sites from 3 days but not get any example. please suggest me or give me a example. please Thanks in advance...

Json(insert,update,delete,select),Sql, Asp.Net
Frederic Torres
Placed on: 01-26-2013 23:12
I re used your code and add strict and relax validation in the library JSON.SyntaxValidator

https://github.com/frede...s/JSON.SyntaxValidator.

The library is used in the Visual Studio Extension TextHighlighterExtension to support on the fly JSON validation

http://visualstudiogalle...-4fd1-8e14-75840f855569

Thanks.
garfix
Placed on: 01-27-2013 10:40
Hello Frederic,

Nice to hear that you could use my code :) Good luck with your extension, I can see that it is needed very much and that you take good care of it. Keep it up!