Querying SharePoint 2007 lists with WSDL, XML, and Regex F# type providers

22 Aug 2014

In this post I'll show how to expose product information from a legacy SharePoint 2007 installation using the WSDL, XML, and Regex F# type providers. With F# type providers, little actual code needs to be written and most of it can be verified at compile time. Contrast this with using SharePoint from C# where, because SharePoint is heavy on (dynamically generated) XML, code generation and runtime exceptions are a common occurrence.

To follow along with the example, little SharePoint knowledge is required. The key point is that products are stored as items within a SharePoint Pages document library. In SharePoint terms, a document library with items is like a table with rows in a relational database. These items may then be queried through a SOAP web service using an XML query language.

Helpers

The XML query that goes to SharePoint as well as the result coming back are of type XmlElement. I much prefer working with the never XElement type and thus need a couple of routines to convert back and forth. Also, the username and password to authenticate with are read from the configuration file using a convenience function. I tried out the AppSettings type provider, but kept getting NullReferenceExceptions.

// requires the following NuGet packages:
//   FSharp.Data
//   RegexProvider

[<AutoOpen>]
module Utils =
    open System.Configuration
    open System.Xml.Linq
    open System.Xml

    let toXmlElement (e: XElement) =
        let d = XmlDocument()
        d.LoadXml(e.ToString())
        d.DocumentElement

    let toXElement (e: XmlElement) = XElement.Parse(e.OuterXml)

    // with "new XElement("foo", ...)", the constructor being called actually
    // has type XElement(XName, Object[]). The C# compiler will insert a call
    // to the implicit operator XName defined on the XName class to implicitly
    // convert from String to XName. The F# compiler doesn't trigger this 
    // implicit convertion. Instead, we need to convert ourselves.
    let xn s = XName.Get s
    let getAppSetting (key: string) = ConfigurationManager.AppSettings.[key]

Domain model

For this example, I'll keep the domain model simple. It consists of a Product type with optional references to the Document type, representing a web resource such as a PDF file:

module Domain =
    open System

    type Document = { Title: string; Url: Uri }
    type Product = 
        { Title: string
          Url: Uri  
          Manual: Document option
          Datasheet: Document option
          Summary: string option
          Image: Uri option
          Description: string }

Authenticating

Before issuing requests against the SharePoint 2007 APIs, you must first call the Authentication service with a username and password. The authentication service then returns an authentication cookie to pass with subsequent calls. Here's where the WSDL type provider comes in handy. Otherwise, if you wanted to call the service using strong typing, you'd have to add a separate C# project to the solution and from there add a web reference to the Authentication service before referencing the C# project from F#.

module ProductRepository =
    open System
    open System.Text.RegularExpressions
    open System.ServiceModel
    open System.ServiceModel.Channels
    open System.Xml.Linq
    open FSharp.Data
    open Microsoft.FSharp.Data.TypeProviders
    open FSharp.RegexProvider
    open Domain

    type AuthProxy = WsdlService<"http://www.acme.com/_vti_bin/Authentication.asmx">
    type LoginErrorCode = AuthProxy.ServiceTypes.schemas.microsoft.com.sharepoint.soap.LoginErrorCode

    let getAuthCookie username password =
        // to gain access to the underlying transport specific response message
        // property, i.e., the cookie. Otherwise, we receive a runtime exception
        // when accessing the HTTP response.
        new OperationContextScope(AuthProxy.GetAuthenticationSoap().DataContext.InnerChannel) |> ignore
        let loginResult = AuthProxy.GetAuthenticationSoap().Login(username, password)
        
        if loginResult.ErrorCode <> LoginErrorCode.NoError then
            failwith "Unable to authenticate"

        let response = 
            OperationContext
                .Current
                .IncomingMessageProperties
                .[HttpResponseMessageProperty.Name] :?> HttpResponseMessageProperty
        response.Headers.["Set-Cookie"]

A call to getAuthCookie with a valid username and passwords results in the following cookie being returned. The aspxauth part is unique for every authentication request:

.ASPXAUTH=2FD298088C07B99B5710F2F85A2599682321A0238156B36E4A5A70A986FD510A0D191
5756B55AFC95B53FBA6B6D3B3B68E5655DA63D8FCBF0775607C13E1E7E2D76B5F80DED397604EFB9
55B40FF83721C679146C7F4AF9E7F97B62C504170B27F666B58; path=/; HttpOnly

Querying

To query SharePoint 2007, I call the Lists SOAP service using the general execute method below. The three arguments to this method are all XML snippets, instructing SharePoint which predicate to filter products on and which fields of each product to return (actual XML constructed later). Again I call upon the WSDL type provider to generate a proxy for the Lists service. Notice how I add the authCookie to the header before calling GetListItems on the service.

    type ListProxy = WsdlService<"http://www.acme.com/Products/_vti_bin/Lists.asmx">
    let execute query viewFields queryOptions =
        let username = getAppSetting "SharePointUsername"
        let password = getAppSetting "SharePointPassword"  
        let authCookie = getAuthCookie username password
        let client = ListProxy.GetListsSoap()
        let ctx = client.DataContext
               
        let binding = BasicHttpBinding(MaxReceivedMessageSize = 1024L * 1024L * 10L)
        ctx.Endpoint.Binding <- binding
        ctx.Endpoint.Address <- EndpointAddress("http://www.acme.com/Products/_vti_bin/Lists.asmx")

        new OperationContextScope(ctx.InnerChannel) |> ignore
        let request = HttpRequestMessageProperty()
        request.Headers.["Cookie"] <- authCookie
        OperationContext
            .Current
            .OutgoingMessageProperties
            .[HttpRequestMessageProperty.Name] <- request :> obj

        // see getProducts for query construction
        client.GetListItems(
            listName = "Pages", 
            viewName = null, 
            query = (query |> toXmlElement), 
            viewFields = (viewFields |> toXmlElement), 
            rowLimit = Int32.MaxValue.ToString(), 
            queryOptions = (queryOptions |> toXmlElement), 
            webID = null) |> toXElement

Parsing result

When I execute the query against SharePoint's Pages document library, I get back a complex XML document whose specific shape isn't that important. What is important is that I use the XML type provider to get strongly-typed access to its elements. This is possible by running the query beforehand and storing a subset of the result in ProductsTraining.xml. By feeding this sample to the XML type provider, it generates a type including element access logic.

For instance, each product contains an OwsPageX0020ContentX00203 element inferred to be of type string which I access in a strongly-typed way as p.OwsPageX0020ContentX00203 below.

Now that I got all the products the final step is extracting the pieces of information required to construct the Product instances. For legacy reasons, this information is embedded within the (not well-formed) HTML generated by the SharePoint editor control. For instance, the product's manual and datasheet are anchor links (in that order) within the OwsPageX0020ContentX00203 element.

Thus, I use regular expressions to extract relevant pieces of information from the result. To ensure I don't misspell the group names I make use of the simple Regex type provider. It inspects the match result of a Regex and provides strongly-typed access to its named groups.

    type Products = XmlProvider<"ProductsTraining.xml">

    type DocumentsRegex = Regex< @"<a(.*?)href=""(?<href>(.*?))""(.*?)>(?<title>(.*?))</a>">
    let parseDocuments (p: Products.Row) =
        let documents = p.OwsPageX0020ContentX00203
        match documents with
        | None -> (None, None)
        | Some ds -> 
            let matches = DocumentsRegex().Matches(ds)
            let result =
                matches
                |> Seq.map(fun m ->
                    { Document.Title = m.title.Value
                      Url = Uri("http://www.acme.com" + m.href.Value) })
            (result |> Seq.tryFind (fun m -> m.Url.ToString().Contains("/Documents/Manuals/")),
             result |> Seq.tryFind (fun m -> m.Url.ToString().Contains("/Documents/datasheets/")))

    type ImageRegex = Regex< @"<img(.*?)src=""(?<src>(.*?))""">
    let parseImage (p: Products.Row) =
        let image = p.OwsPublishingPageImage
        match image with
        | None -> None
        | Some i ->            
            let m = ImageRegex().Match(i)
            match m.Success with
            | true -> Some (Uri("http://www.acme.com" + m.src.Value))
            | false -> None

    let parseProduct (p: Products.Row) =
        let manual, datasheet = parseDocuments p
        { Title = p.OwsTitle
          Url = Uri(p.OwsEncodedAbsUrl)
          Manual = manual
          Datasheet = datasheet
          Summary = p.OwsPublishingPageContent
          Image = parseImage p
          Description = p.OwsPageX0020ContentX00202 }

Putting it all together

Finally, I need to construct the specific query to get the products and call parseProduct on each item in the result:

    let getProducts() =
        let query = 
            XElement(xn "Query",
                XElement(xn "Where",
                    XElement(xn "Contains",
                        XElement(xn "FieldRef", XAttribute(xn "Name", "PublishingPageLayout")),
                        XElement(xn "Value", XAttribute(xn "Type", "URL"), "Acme_Product.aspx"))))
        let fields = 
            ["Title"; "PublishingPageLayout"; "PublishingPageImage"; "PublishingPageContent"; 
              "Page_x0020_Content_x0020_2"; "Page_x0020_Content_x0020_3"; "EncodedAbsUrl"] 
            |> List.fold (fun acc fld -> 
                acc + XElement(xn "FieldRef", XAttribute(xn "Name", xn fld)).ToString()) ""
        
        let viewFields = XElement.Parse("<ViewFields>" + fields + "</ViewFields>")
        let queryOptions = XElement(xn "QueryOptions", "")
        let result = execute query viewFields queryOptions
        let products = Products.Parse(result.ToString())       
        products.Data.Rows |> Seq.map parseProduct |> Seq.toList

The three parts to the query generated above and passed to execute end up looking like so:

<Query>
  <Where>
    <Contains>
      <FieldRef Name="PublishingPageLayout" />
      <Value Type="URL">Acme_Product.aspx</Value>
    </Contains>
  </Where>
</Query>

<FieldRef Name="Title" />
<FieldRef Name="PublishingPageLayout" />
<FieldRef Name="PublishingPageImage" />
<FieldRef Name="PublishingPageContent" />
<FieldRef Name="Page_x0020_Content_x0020_2" />
<FieldRef Name="Page_x0020_Content_x0020_3" />
<FieldRef Name="EncodedAbsUrl" />

<QueryOptions></QueryOptions>

To kick off everything from a console and display the first product, here's the code required. In the actual production code, I call getProducts from a C# WCF service:

[<EntryPoint>]
let main _ = 
    ProductRepository.getProducts()
    |> Seq.truncate 1
    |> printfn "%A"
    0