RavenDB Map/Reduce/Transform on nested, variable-length arrays

956 views Asked by At

I'm new to RavenDB, and am loving it so far. I have one remaining index to create for my project.

The Problem

I have thousands of responses to surveys (i.e. "Submissions"), and each submission has an array of answers to specific questions (i.e. "Answers"), and each answer has an array of options that were selected (i.e. "Values").

Here is what a single Submission basically looks like:

{
  "SurveyId": 1,
  "LocationId": 1,
  "Answers": [
    {
      "QuestionId": 1,
      "Values": [2,8,32],
      "Comment": null
    },
    {
      "QuestionId": 2,
      "Values": [4],
      "Comment": "Lorem ipsum"
    },
    ...more answers...
  ]
}

More Problem: I have to able to filter by SurveyId, LocationId, QuestionId, Creation Date. As I understand it, that's done at query time... I just need to make sure that these properties are present in the transformation result (or is it the reduce result? or both?). If I'm right, then this is less of an issue.

The Required Result

We need one object per question per survey that gives the sum of each option. Hopefully it's self explanatory:

[
    {
        SurveyId: 1,
        QuestionId: 1,
        NumResponses: 976,
        NumComments: 273,
        Values: {
            "1": 452, // option 1 selected 452 times
            "2": 392, // option 2 selected 392 times
            "4": 785  // option 4 selected 785 times
        }
    },
    {
        SurveyId: 1,
        QuestionId: 2,
        NumResponses: 921,
        NumComments: 46,
        Values: {
            "1": 325,
            "2": 843,
            "4": 119,
            "8": 346,
            "32": 524
        }
    },
    ...
]

My Attempt

I didn't get very far, and I think this post is heading me down the right path, but it doesn't help me with the list of Values. I've searched and searched but can't find any direction for what do with a nested array like that. Here's that I have so far:

MAP:

from submission in docs.Submissions
from answer in submission.Answers
where answer.WasSkipped != true && answer.Value != null
select new {
    SubmissionDate = submission["@metadata"]["Last-Modified"],
    SurveyId = submission.SurveyId,
    LocationId = submission.LocationId,
    QuestionId = answer.QuestionId,
    Value = answer.Value
}

REDUCE:

??

TRANSFORM:

from result in results
from answer in result.Answers
where answer.WasSkipped != true && answer.Value != null
select new {
    SubmissionDate = result["@metadata"]["Last-Modified"],
    SurveyId = result.SurveyId,
    LocationId = result.LocationId,
    QuestionId = answer.QuestionId,
    Value = answer.Value
}

For what it's worth, this is hosted on RavenHQ.

It's been so long that I've been working on this and can't get it right. Any help in getting me to the required result is very appreciated!

1

There are 1 answers

4
Matt Johnson-Pint On BEST ANSWER

Assuming your C# classes look like this:

public class Submission
{
    public int SurveyId { get; set; }
    public int LocationId { get; set; }
    public IList<Answer> Answers { get; set; }
}

public class Answer
{
    public int QuestionId { get; set; }
    public int[] Values { get; set; }
    public string Comment { get; set; }
}

If you are running RavenDB 2.5.2637 or higher, you can now use a dictionary result type:

public class Result
{
    public int SurveyId { get; set; }
    public int QuestionId { get; set; }
    public int NumResponses { get; set; }
    public int NumComments { get; set; }
    public Dictionary<int, int> Values { get; set; }
}

If you are running anything earlier (including 2.0 releases), then you won't be able to use a dictionary, but you can use an IList<KeyValuePair<int,int>> instead.

Here is the index:

public class TestIndex : AbstractIndexCreationTask<Submission, Result>
{
    public TestIndex()
    {
        Map = submissions =>
              from submission in submissions
              from answer in submission.Answers
              select new
              {
                  submission.SurveyId,
                  answer.QuestionId,
                  NumResponses = 1,
                  NumComments = answer.Comment == null ? 0 : 1,
                  Values = answer.Values.ToDictionary(x => x, x => 1)
                  //Values = answer.Values.Select(x => new KeyValuePair<int, int>(x, 1))
              };

        Reduce = results =>
                 from result in results
                 group result by new { result.SurveyId, result.QuestionId }
                 into g
                 select new
                 {
                     g.Key.SurveyId,
                     g.Key.QuestionId,
                     NumResponses = g.Sum(x => x.NumResponses),
                     NumComments = g.Sum(x => x.NumComments),
                     Values = g.SelectMany(x => x.Values)
                               .GroupBy(x => x.Key)
                               .ToDictionary(x => x.Key, x => x.Sum(y => y.Value))
                               //.Select(x => new KeyValuePair<int, int>(x.Key, x.Sum(y => y.Value)))
                 };
    }
}

(No transform step is needed.)

If you can't use 2.5.2637 or higher, then replace the .ToDictionary lines with the commented lines just below them, and use an IList<KeyValuePair<int,int>> in the result class.

The fix to allow dictionaries in the map/reduce was based on this issue which your post helped to identify. Thank you!