Code based, dynamic CAML query composition

03 Jul 2008

Today I faced an interesting challenge while querying list items in SharePoint. On a page I have a control that displays list items based on the terms used to tag the page, i.e., the content type on which the page is based contains a multi-valued field that holds a subset of predefined tags. Similarly, for items in a SharePoint list, each item is tagged using a subset of the same predefined tags. The responsibility of the control is then to (1) read the tags associated with the page and (2) query the tags of the list items in an OR-wise fashion. The (3) outcome is then displayed by the control as a set of context specific items.

To decide on the items to display, an initial approach might be to go through the list one item at a time, looking for matching tags. But working with large lists in SharePoint this approach is not recommended. Alternatively, the predicate against which each item in the list is evaluated could take on the form of a CAML query. Looking for items matching a single tag is then easily expressed:

<Where>
    <Eq>
        <FieldRef Name='MyField' />
        <Value Type='LookupMulti'>tag1</Value>
    </Eq>
</Where>

Next, consider a query with two tags OR’ed together. Although more verbose, the query is still fairly straightforward to compose on the fly:

<Where>
    <Or>
        <Eq>
            <FieldRef Name='MyField' />
            <Value Type='LookupMulti'>tag1</Value>
        </Eq>
        <Eq>
            <FieldRef Name='MyField' />
            <Value Type='LookupMulti'>tag2</Value>
        </Eq>
    </Or>
</Where>

However, for CAML queries where more than two tags are OR’ed together, the binary nature of the OR operator works against us. Where real programming languages have their parsers automatically transform an expression like (a || b || c) into ((a || b) || c), the CAML query parser applies no such transform — In a sense CAML is more like an XML serialized, abstract syntax tree than a query language intended for direct use:

<Where>
    <Or>
        <Or>
            <Eq>
                <FieldRef Name='MyField' />
                <Value Type='LookupMulti'>tag1</Value>
            </Eq>
            <Eq>
                <FieldRef Name='MyField' />
                <Value Type='LookupMulti'>tag2</Value>
            </Eq>
        </Or>
        <Eq>
            <FieldRef Name='MyField' />
            <Value Type='LookupMulti'>tag3</Value>
        </Eq>
    </Or>
</Where>

So how do we go about composing a query that OR together n tags (not to mention a mix of ORs and ANDs)? One approach might be using the U2U CAML Query Builder. The tool delivers a UI for formulating queries, but also exposes its CAML query builder API as a .Net assembly. Unfortunately, documentation on how to use the builder is sparse. Nonetheless, I went for the case of composing a query that OR together three tags:

string ComposeCamlQuery() {
    Builder b = new Builder(CamlTypes.GetListItems);
    b.AddViewField("Title");
    b.AddViewField("MyField");

    bool addCombinerNode;
    foreach (string tag in new string[] { "tag1", "tag2", "tag3" })
        b.AddWhereField("MyField", tag, "LookupMulti", 
                        "Or", out addCombinerNode);
    return b.CamlDocument.InnerXml;
}

The outcome is a rather strange looking CAML query with extra <And></And> and missing <Eq></Eq> tags. Funny thing is that this type of the query can be correctly composed through the UI, so most likely I’m not using the API correctly:

<Where>
    <And>
        <And>
            <Or>
                <FieldRef Name="MyField" /> 
                <Value Type="LookupMulti">tag1</Value> 
            </Or>
            <Or>
                <FieldRef Name="MyField" /> 
                <Value Type="LookupMulti">tag2</Value> 
            </Or>
        </And>
        <Or>
            <FieldRef Name="MyField" /> 
            <Value Type="LookupMulti">tag3</Value> 
        </Or>
    </And>
</Where>

I eventually abandoned the idea of using the CAML Query Builder API. Instead, a colleague pointed me to Waldek Mastykarz’s post on generating dynamic CAML queries. As long as all tags are to be either AND’ed or OR’ed together, Mastykarz provides an elegant solution, which I modified to be recursive, more generic, and read like an induction proof (at a negligible performance cost):

ComposeCamlQuery(new[] { "tag1", "tag2", "tag3" }, 
    "Or", 
    "<Where>{0}</Where>",
    @"<Eq>
        <FieldRef Name='MyField' />
        <Value Type='LookupMulti'>{0}</Value>
     </Eq>");

string ComposeCamlQuery(IList<string> ops, string relOp, 
                        string query, string leaf) {
    return ops.Count == 1
        ? string.Format(query, string.Format(leaf, ops[0]))
        : ComposeCamlQuery(ops.Skip(1).ToList(),
            relOp, 
            string.Format(query, 
                string.Format("<{0}>{1}0</{0}>", 
                    relOp, 
                    string.Format(leaf, ops[0]))), 
                    leaf);
}

Obviously, more complex code could be written for a mix of AND’s and OR’s. Before doing so, however, you probably want to evaluate other approaches for filtering list items:

  1. Formulate a simpler, too general query and post-process the result
  2. Formulate smaller queries and optionally merge the results and do post-processing
  3. Use the SharePoint Search API

I also considered such alternatives as Linq to SharePoint or Caml.net, but although the first holds a lot of promise, it isn’t ready for prime time. As for the latter, its focus is more on composing type-safe queries than dynamic ones, so it’s more of a supplement to the first two alternatives.

As for SharePoint Search, the downside is that search has to be configured and that the list must have been indexed for the outcome to be accurate. Depending on the frequency with which the list is modified, the time between incremental crawls, and the context in which the results are displayed, the search approach may or may not be the solution you’re looking for.

Update, July 19: Added recursive code solution.