I am creating a JSON deserializer. I am deserializing a pretty big JSON file (25 MB), which contains a lot of information. It is an array of words, with a lot of duplicates. With Newtonsoft.Json, I can deserialize the input as a stream:
using (var fs = new FileStream(@"myfile.json", FileMode.Open, FileAccess.Read))
using (var sr = new StreamReader(fs))
using (var reader = new JsonTextReader(sr))
{
    while (reader.Read())
    {
        // Read until I find the narrow subset I need, then parse and analyze it directly
        var obj = JObject.Load(reader); // Analyze this object
    }
}
This allows me to keep reading small parts of the JSON, analyze them, check for duplicates, and so on.
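For reference, the duplicate check in that loop is essentially just a HashSet lookup before anything is analyzed. This is only a sketch of what I do; skipping the root object via reader.Depth and keying on the W property (see the edit below) are my own choices:

var seen = new HashSet<string>();

while (reader.Read())
{
    // Only care about object starts, and skip the root object itself;
    // every nested object in this file is a word
    if (reader.TokenType != JsonToken.StartObject || reader.Depth == 0)
        continue;

    var obj = JObject.Load(reader);
    var word = (string)obj["W"]; // assuming W holds the word text

    if (seen.Add(word))
    {
        // First occurrence: analyze it
    }
    // Duplicates are dropped right away, so they never stay in memory
}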
If I want to do the same with ServiceStack.Text, I am doing something like:
using (var fs = new FileStream(@"myfile.json", FileMode.Open, FileAccess.Read))
using (var sr = new StreamReader(fs))
{
    var result = ServiceStack.Text.JsonSerializer.DeserializeFromReader<MyObject>(sr);
}
MyObject only contains the subset of the JSON I am interested in, but this still creates a massive overhead, as I get one big array that contains a lot of duplicates. With the first method I can filter those away immediately and thus avoid keeping them in memory.
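To make the overhead concrete: with the full deserialization I can only strip duplicates after the whole graph has been allocated, roughly like this (just a sketch inside the using block above; deduplicating on the W property is my own choice):

// Runs only after the complete Words graph is already in memory
var uniqueWords = result.Words
    .SelectMany(inner => inner)   // flatten the arrays of arrays
    .GroupBy(w => w.W)            // treat W as the identity of a word
    .Select(g => g.First())
    .ToList();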
The memory footprint of each (including the console program overhead) is:
- Newtonsoft: 30 MB
- ServiceStack.Text: 215 MB
And the times are:
- Newtonsoft: 2.5 s
- ServiceStack.Text: 1.5 s
The memory footprint is quite important, as I will be processing a lot of these files.
I understand that the ServiceStack method gives me type safety, but the memory footprint matters more to me.
Since ServiceStack.Text is a lot faster, I would like to know whether I can recreate the Newtonsoft example, but with ServiceStack.Text.
Edit (added the object I am trying to parse):
public class MyObject
{
    public List<List<Word>> Words { get; set; }
}

public class Word
{
    public string B { get; set; }
    public string W { get; set; }
    public string E { get; set; }
    public string P { get; set; }
}
My test file (which is representative of the use case) contains 29,000 words, but only around 8,500 unique ones. I am only analyzing this data, so I cannot change its structure. It is a file containing arrays of arrays of words.
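For illustration, the file is shaped roughly like this (shortened, and the values are made up):

{
  "Words": [
    [
      { "B": "0", "W": "hello", "E": "5", "P": "x" },
      { "B": "6", "W": "world", "E": "11", "P": "y" }
    ],
    [
      { "B": "0", "W": "hello", "E": "5", "P": "x" }
    ]
  ]
}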