Tech in the 603, The Granite State Hacker

The Cognitive Content Crisis

As many folks in my community may already be aware, I’ve been building chatbots with my team, using the Microsoft Bot Framework, a lot lately. In doing so, we’ve encountered a common issue across multiple clients.

Cognitive Content Crisis

While many peolpe are worrying about lofty issues around artificial intelligence like security, privacy, and ethics (all worthy to be sure), I’m considering something more pragmatic here. Folks go into a cognitive agent build without considering content, how it relates to AI and AI development, and how to manage it. While some of my clients with more mature projects have taken a crack at resolving this issue with custom solutions, these custom solutions are often resource intensive, fail to consider all the business requriements, and end up becoming an unnecessary bottleneck to further development. Worse, waiting till a project phase-2 or phase-3 to address it compounds the trouble.

Sadly, there’s often an enterprise content management system (ECMS) in place that could be used, instead, right from pre phase-1. With a reasonable effort, a well-featured existing ECMS can be customized along side your build out, saving a massive effort later.

The Backstory

If you check out that Microsoft Bot Framework website, one of the first things you’ll notice is that building conversational agents is a process that cuts across a number of development disciplines… and the first one that typically gets highlighted is Artificial Intelligence.

Artificial Intelligence around conversational agents could include anything from visual identification & classification to moderation, sentiment analysis, and advanced search, but it predominantly revolves around language tools… especially LUIS, QnA Maker, Azure Search, and others.

At this point, it helps to think about what Artificial Intelligence is. Artificial Intelligince is about experience. In a conversational artificial intelligence, that content is human readable, social, and web like. Experience is content…. Conversational content.

In fact, it’s almost web like. A user typically opens a chat window (which correlates a bit to a browser) and types an utterance (query). The bot catches such utterances, and depending on a number of factors of origination, data state / context, identity, and authorizations, generally produces a text based response.

Getting More Specific

In the case of a bot designed to coach folks with a chronic disease, for example, a user might ask a bot “Can I eat chocolate cake?”. The bot gets this query, and parses it into language elements… which looks something like “can I eat” as an ‘intent’, and “chocolate cake” as an entity. The bot then brings in a rules-set described by conditions it knows about the user (what disease(s) are being managed), and what the bot knows about the users current state (perhaps the blood glucose level if they’re diabetic). Based on the conditions against the rules, the response must be produced. If you have a sophistocated bot, you might have a per-entity response… Take a response like “OH, chocolate cake is wonderful, but your blood sugar level is a bit high right now. Unless you can find something in a low-carb, sugar free variety, I wouldn’t recommend it, but here’s a recipe you might try instead.” That content (including the suggested recipe) must be authored by subject matter experts, moderated by peers, approved (potentially by regulatory and maybe even legal teams), tagged to match the rules engine expectations… much like web content.

Also note, the rules engine itself is also content in a sense. In order to let subject matter experts have a say in tweaking and tuning responses, (what’s a high blood glucose level? What’s too much sugar for a high blood glucose level, et al.) These rules should be expressed as content a subject matter expert could understand and update.

Another common scenario we’re seeing is HR content. Imagine you’ve got a company that produced an HR handbook every year. Well, actually, you’re a conglomerate that has a couple dozen handbooks, and each employee needs answers specific to the one for their division… Not only do you have to tag the contet by year, but by division, and even problem domain. (Imagine trying to answer the question “what is my deductible?” It’s easy enough for a bot to understand that this relates to insurance provided through benefits. The answer might be different depending on whether you mean the PPO or the PMO medical plan… or is that a dental plan question? What about vision? They probably mean this year. Depending on the division they’re a part of (probably indicated by a claim in their authorization token), they might have different providers, as well.

Back to Development

In the development world, not only do you have the problem domain complexities present, but you also have different environments to push content to… the Dev environment is the sandbox a coder works in actively, and it’s only as stable as the developer’s last compile. Then there might be environments named things like DIT, SIT, UAT, QA, and PROD. To do things right, you should update content in each of these environments discretely… updating content in QA should not affect content in SIT, UAT, or PROD.

Information Architecture


Information architecture (IA) is the structural design of shared information environments; the art and science of organizing and labelling websites, intranets, online communities and software to support usability and findability; and an emerging community of practice focused on bringing principles of design, architecture and information science to the digital landscape.[1] Typically, it involves a model or concept of information that is used and applied to activities which require explicit details of complex information systems. These activities include library systems and database development.

Wikipedia, Information Architecture

We’ll add artificial intelligence cognitive models and knowledge bases, especially for conversational AI, to that definition. Note that some AI applications need big data solutions. Most ECMS products are not big data solutions.

Enterprise Content Management Systems

There’s a lot of Enterprise Content Management Systems out there, many of which would be suitable for the task of handling the needs of most conversational AI content management systems.

My career path and community involvement causes me to lean toward SharePoint. If you break down the feature set, it makes sense.

  • Ability for SMEs to manage experience data easily without lots of training to understand create/read/update/delete (CRUD) operations
  • Ability to customize content type structures
  • Ability to concurrently manage individual experience data items
  • Ability to globalize the content (to support multiple languages)
  • Ability to customize workflows (think SME review approval, regulatory, even legal approval) on a per-experience item basis
  • Ability to mark up each experience item with additional metadata both for cognitive processing purposes and for deployment purposes
  • All of this content is then exposed to REST services, so you get the ability to integrate automation to bridge the content into the cognitive models

It’s often said that if you design your data structures properly, the rest of your application will practically build itself. This is no exception. While you will have to build your own automation to bridge the gap between your CMS and your cognitive model environments, you’ll be able to do this easily using REST services.

While you may need to come up with your own granularity, you’ll probably find some clear hits, especially in the area of QnA Maker… every Question / Answer experience pair probably fits nicely as a single content entity. You’ll probably have to add metadata to support QnA maker’s filtering, and the like.

Likewise with LUIS, you may find that each Intent and the related utterances is a single content entity. LUIS, being more sophistocated, will also need related entities and synonyms modeled in content data.

I’ve seen other CMS system used. Most notably CosmosDB and Contentful. Another choice might be some kind of data mart. All of these cases require a heavy investment in building out a UI layer for your SMEs. SharePoint takes care of the bulk of that part for you.

Got a project you want to start working on? Don’t forget to account for content management early on. As always, reach out to me if you need advice on this or any other aspect of building out a solution involving technologies like these… Connect on Twitter, Linkedin or the like…

[office src=”https://onedrive.live.com/embed?cid=90A564D76FC99F8F&resid=90A564D76FC99F8F%211333283&authkey=AE7AR-g9axFJm-4&em=2″ width=”402″ height=”327″]

Tech in the 603, The Granite State Hacker

Getting at Office 365 SharePoint Lists In Windows Phone 7 Silverlight Apps

[Edit 7/8/2013:  I’ve tweaked this to note that this was for WP7.  WP8 has a whole new set of SDKs for SP integration, and it’s a different, and much easier story to use them.]

As promised from my presentation at SharePoint Saturday New Hampshire, I owe a code listing on the meaty part of the chat…  the Office 365 authentication component, especially.  It allows a Windows Phone Silverlight app to access things the lists.asmx service behind the Windows Live ID authentication.  (Frankly, the technique is the same, no matter what kind of client you’re using, but the demo I was doing was using Silverlight 4 for Windows Phone 7.

I also owe slides:
[office src=”https://r.office.microsoft.com/r/rlidPowerPointEmbed?p1=1&p2=1&p3=SD90A564D76FC99F8F!274&p4=&ak=!ABO7SzieOkx6gtY&kip=1″ width=”402″ height=”327″]

Here’s the activity rundown:

1)  The client app (“Windows Phone App”) makes a SAML SOAP request to https://login.microsoftonline.com/extSTS.srf
2)  The SAML response comes back, allowing the app to parse the SAML token.
3)  Make another call, this time to  {your Office365 team site}/_forms/default.aspx?wa=wsignin1, posting the token.
4) The response that comes back need only be checked for errors, the magic is in the cookie container.  It contains an HTTPOnly token (which the development tools do a terribly good job of hiding.)
5)  Assign your cookie container from your previous result to the ListSoapClient that you’re using to make your service calls from.
6)  Profit!

I broke up the “Activation” line on the client side to point out that the calls are Async.

In any case, I have a very rough SPAuthenticationHelper class that I also promised to post.

Here’s an example of how you can use it:

    class SPTasksList

    {
 
        SPAuthenticationHelper _authenticationHelper;
        ListsSoapClient _listsClient;
        bool isBusy = false;

        TaskItem currentUpdate = null;

        string _taskListUri = “http://spsnh.sharepoint.com/TeamSite/Lists/Tasks/AllItems.aspx”;

        public SPTasksList()

        {
            _authenticationHelper = new SPAuthenticationHelper(_taskListUri);
            _listsClient = new ListsSoapClient();
            _listsClient.GetListItemsCompleted += newEventHandler<GetListItemsCompletedEventArgs>(_listsClient_GetTasksListCompleted);
            _listsClient.UpdateListItemsCompleted += newEventHandler<UpdateListItemsCompletedEventArgs>(_listsClient_UpdateListItemsCompleted);
        }
 

        public voidBeginGetTasksList()

        {
            if (!_authenticationHelper.IsAuthenticated)
            {
                _authenticationHelper.OnAuthenticated += newEventHandler<EventArgs>(_authenticationHelper_OnAuthenticated_GetTasks);
                _authenticationHelper.SigninAsync(Configuration.UserName, Configuration.Password);
            }
            else if (!isBusy)
            {
                isBusy = true;
                XElement query = XElement.Parse(“Completed”);
                string ListName = “Tasks”;
                string ViewId = “{f717e507-7c6e-4ece-abf2-8e38e0204e45}”;
                _listsClient.GetListItemsAsync(ListName, ViewId, query, null, null, null, null);
            }
        }

        void_authenticationHelper_OnAuthenticated_UpdateTask(objectsender, EventArgs e)

        {
            _listsClient.CookieContainer = _authenticationHelper.Cookies;
            BeginUpdateTask(currentUpdate);
        }

……
} 

I ported this from a few other examples I found online to Silverlight for Windows Phone.  I apologize,  I haven’t had time to polish it, and I’m having a hard time with the embedded SOAP litteral, but here’s the SPAuthenticationHelper class:





using System;

using System.Net;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;
 
namespace SPSNH_SPConnector.Implementation
{
    public class SPAuthenticationHelper
    {
        public CookieContainerCookies { get; set; }
        public boolIsAuthenticated { get; privateset; }
        public event EventHandler<EventArgs> OnAuthenticated;
 
        private bool_isAuthenticationInProgress = false;
 
        const string_authUrl=“https://login.microsoftonline.com/extSTS.srf”;
        const string _login=“/_forms/default.aspx?wa=wsignin1.0”;
       
        //namespaces in the SAML response
        const string _nsS = “http://www.w3.org/2003/05/soap-envelope”;
        const string _nswst = “http://schemas.xmlsoap.org/ws/2005/02/trust”;
        const string _nswsse = “http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-secext-1.0.xsd”;
        const string _nswsu = “http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-utility-1.0.xsd”;
        const string _nswsa = “http://www.w3.org/2005/08/addressing”;
        const string _nssaml = “urn:oasis:names:tc:SAML:1.0:assertion”;
        const string _nswsp = “http://schemas.xmlsoap.org/ws/2004/09/policy”;
        const string _nspsf = “http://schemas.microsoft.com/Passport/SoapServices/SOAPFault”;
        const string _samlXml =@” http://schemas.xmlsoap.org/ws/2005/02/trust/RST/Issue http://www.w3.org/2005/08/addressing/anonymous https://login.microsoftonline.com/extSTS.srf {0} {1} {2} http://schemas.xmlsoap.org/ws/2005/05/identity/NoProofKey http://schemas.xmlsoap.org/ws/2005/02/trust/Issue urn:oasis:names:tc:SAML:1.0:assertion “;

        Uri _uri;  
        HttpWebRequest _getTokenRequest = HttpWebRequest.CreateHttp(_authUrl);
        HttpWebRequest _submitTokenRequest = null;
        string _token;
 
        public SPAuthenticationHelper(string uri)
        {
            _uri = new Uri(uri);
            Cookies = new CookieContainer();
        }
 
        public voidSigninAsync(string userName, string password)
        {
            if (!_isAuthenticationInProgress)
            {
                _isAuthenticationInProgress = true;
                getTokenAsync(userName, password);
            }
        }
 
       
        private  void getTokenAsync(stringuserName, string password)
        {
            string tokenRequestXml = string.Format(_samlXml, userName, password, _uri.Host);
 
            _getTokenRequest.Method = “POST”;
            _getTokenRequest.BeginGetRequestStream(newAsyncCallback(Get_GetToken_RequestStreamCallback), tokenRequestXml);
        }
 
        private voidGet_GetToken_RequestStreamCallback(IAsyncResultresult)
        {
            string tokenRequestXml = (string)result.AsyncState;
            var reqstream = _getTokenRequest.EndGetRequestStream(result);
            using (StreamWriterw = new StreamWriter(reqstream))
            {
                w.Write(tokenRequestXml);
                w.Flush();
            }
            _getTokenRequest.BeginGetResponse(new AsyncCallback(Get_GetToken_ResponseStreamCallback), null);
        }
 
 
 
        private voidGet_GetToken_ResponseStreamCallback(IAsyncResultresult)
        {
            _token = null;
 
            varresponse = _getTokenRequest.EndGetResponse(result);
 
            var xDoc = XDocument.Load(response.GetResponseStream());
 
            var body=xDoc.Descendants(XName.Get(“Body”, _nsS)).FirstOrDefault();
            if (body != null)
            {
                var fault = body.Descendants(XName.Get(“Fault”, _nsS)).FirstOrDefault();
                if (fault != null)
                {
                    var error=fault.Descendants(XName.Get(“text”, _nspsf)).FirstOrDefault();
                    if (error != null)
                        throw new Exception(error.Value);
                }
                else
                {
                    var token = body.Descendants(XName.Get(“BinarySecurityToken”, _nswsse)).FirstOrDefault();
                    if (token != null)
                    {
                        _token = token.Value;
                        SubmitTokenAsync();
                    }
                }
            }           
        }
 

        private  void SubmitTokenAsync()
        {
 
            UriBuilder bldr = newUriBuilder(_uri.Scheme, _uri.Host, _uri.Port);
            _submitTokenRequest = HttpWebRequest.CreateHttp(bldr.Uri + _login);
            _submitTokenRequest.CookieContainer = Cookies;
            _submitTokenRequest.Method = “POST”;
            _submitTokenRequest.BeginGetRequestStream(newAsyncCallback(Get_SubmitToken_RequestStreamCallback), null);
        }
 
        private voidGet_SubmitToken_RequestStreamCallback(IAsyncResultresult)
        {
            var requestStream = _submitTokenRequest.EndGetRequestStream(result);
            using (StreamWriterw = new StreamWriter(requestStream))
            {
                w.Write(_token);
                w.Flush();
            }
            _submitTokenRequest.BeginGetResponse(newAsyncCallback(Get_SubmitToken_ResponseCallback), null);
        }
 
        private voidGet_SubmitToken_ResponseCallback(IAsyncResultresult)
        {
            UriBuilder bldr = newUriBuilder(_uri.Scheme, _uri.Host, _uri.Port);
 
            varresponse = _submitTokenRequest.EndGetResponse(result);
            string responseString = (newStreamReader(response.GetResponseStream())).ReadToEnd();
           
            bldr.Path = null;
            Cookies = _submitTokenRequest.CookieContainer;//.GetCookies(bldr.Uri);
            _isAuthenticationInProgress = false;
            IsAuthenticated = true;
            if (OnAuthenticated != null)
            {
                EventArgs args = new EventArgs();
                OnAuthenticated(this, args);
            }
        }
    }
}
 
 

Tech in the 603, The Granite State Hacker

WORKAROUND: Misconfigured Windows-Integrated Authentication for Web Services

In trying to drive a process from a SharePoint list, I ran across a problem…

I couldn’t create a web reference in my C# project due to some really weird problem… In the “Add web reference” wizard, I entered my URL, and was surprised by a pop-up titled “Discovery Credential”, asking me for credentials for the site.

Since I was on the local domain and had “owner” permissions to the site, I thought I would just waltz in and get the WSDL.

Ok, so it wants creds… I gave it my own.

Negative…!?!?

After a few attempts and access denied errors, I hit Cancel, and was rewarded by, of all things, the WSDL display… but I still couldn’t add the reference.

After quite a bit of wrestling, it turns out there was an authentication provider configuration problem. The site was configured to use Kerberos authentication, but the active directory configuration was not set up correctly. (I believe it needed someone to use SetSPN to update the Service Principal Name (SPN) for the service.)

One way to resolve the problem was to set the authentication provider to NTLM, but in my case, I didn’t have, (and wasn’t likely to get) that configuration changed in the site (a SharePoint Web Application) I really needed access to.

In order to make it work, I had to initially create my reference to a similar, accessible site.

(e.g. http://host/sites/myaccessiblesite/_vti_bin/lists.asmx )

Then, I had to initialize the service as such, in code:


private void InitWebService()
{
System.Net.AuthenticationManager.Unregister("Basic");

System.Net.AuthenticationManager.Unregister("Kerberos");

//System.Net.AuthenticationManager.Unregister("Ntlm");

System.Net.AuthenticationManager.Unregister("Negotiate");

System.Net.AuthenticationManager.Unregister("Digest");


SmokeTestSite.Lists workingLists = new SmokeTest.SmokeTestSite.Lists();

workingLists.Url = "http://host/sites/mybrokensite/_vti_bin/lists.asmx";

workingLists.UseDefaultCredentials = true;

workingLists.Proxy = null;

lists = workingLists;
}

What this accomplishes is it unregisters all authentication managers in your application domain. (This can only be done once in the same app domain. Attempts to unregister the same manager more than once while the program’s running will throw an exception.)

So by having all the other authentication managers disabled in the client, the server would negotiate and agree on Ntlm authentication, which succeeds.

Tech in the 603, The Granite State Hacker

“ETL”ing Source Code

The past couple weeks, I’ve been between projects, which has gotten me involved in a number of “odd jobs”. An interesting pattern that I’m seeing in them is querying and joining, and updating data from very traditionally “unlikely” sources… especially code.

SQL databases are very involved, but I find myself querying system views of the schema itself, rather than its contents. In fact, I’m doing so much of this, that I’m finding myself building skeleton databases… no data, just schema, stored procs, and supporting structures.

I’m also pulling and updating metadata from the likes of SharePoint sites, SSRS RDL files, SSIS packages… and most recently, CLR objects that were serialized and persisted to a file. Rather than outputting in the form of reports, in some cases, I’m outputting in the form of more source code.

I’ve already blogged a bit about pulling SharePoint lists into ADO.NET DataSet’s. I’ll post about some of the other fun stuff I’ve been hacking at soon.

I think the interesting part is how relatively easy it’s becoming to write code to “ETL” source code.

Tech in the 603, The Granite State Hacker

Reading SharePoint Lists into an ADO.Net DataTable

[Feb 18, 2009: I’ve posted an update to show the newer technique suggested below by Kirk Evans, also compensating for some column naming issues.]

The other day, I needed to write some code that processed data from a SharePoint list. The list was hosted on a remote MOSS 2007 server.

Given more time, I’d have gone digging for an ADO.NET adapter, but I found some code that helped. Unfortunately, the code I found didn’t quite seem to work for my needs. Out of the box, the code missed several columns for no apparent reason.

Here’s my tweak to the solution:

(The ListWebService points to a web service like http://SiteHost/SiteParent/Site/_vti_bin/lists.asmx?WSDL )

private data.DataTable GetDataTableFromWSS(string listName)

{

ListWebService.Lists lists = new ListWebService.Lists();

lists.UseDefaultCredentials = true;

lists.Proxy = null;

//you have to pass the List Name here

XmlNode ListCollectionNode = lists.GetListCollection();

XmlElement List = (XmlElement)ListCollectionNode.SelectSingleNode(String.Format(“wss:List[@Title='{0}’]”, listName), NameSpaceMgr);

if (List == null)

{

throw new ArgumentException(String.Format(“The list ‘{0}’ could not be found in the site ‘{1}'”, listName, lists.Url));

}

string TechListName = List.GetAttribute(“Name”);

data.DataTable result = new data.DataTable(“list”);

XmlNode ListInfoNode = lists.GetList(TechListName);

System.Text.StringBuilder fieldRefs = new System.Text.StringBuilder();

System.Collections.Hashtable DisplayNames = new System.Collections.Hashtable();

foreach (XmlElement Field in ListInfoNode.SelectNodes(“wss:Fields/wss:Field”, NameSpaceMgr))

{

string FieldName = Field.GetAttribute(“Name”);

string FieldDisplayName = Field.GetAttribute(“DisplayName”);

if (result.Columns.Contains(FieldDisplayName))

{

FieldDisplayName = FieldDisplayName + ” (“ + FieldName + “)”;

}

result.Columns.Add(FieldDisplayName, TypeFromField(Field));

fieldRefs.AppendFormat(“”, FieldName);

DisplayNames.Add(FieldDisplayName, FieldName);

}

result.Columns.Add(“XmlElement”, typeof(XmlElement));

XmlElement fields = ListInfoNode.OwnerDocument.CreateElement(“ViewFields”);

fields.InnerXml = fieldRefs.ToString();

XmlNode ItemsNode = lists.GetListItems(TechListName, null, null, fields, “10000”, null, null);

// Lookup fields always start with the numeric ID, then ;# and then the string representation.

// We are normally only interested in the name, so we strip the ID.

System.Text.RegularExpressions.Regex CheckLookup = new System.Text.RegularExpressions.Regex(“^\\d+;#”);

foreach (XmlElement Item in ItemsNode.SelectNodes(“rs:data/z:row”, NameSpaceMgr))

{

data.DataRow newRow = result.NewRow();

foreach (data.DataColumn col in result.Columns)

{

if (Item.HasAttribute(“ows_” + (string)DisplayNames[col.ColumnName]))

{

string val = Item.GetAttribute(“ows_” + (string)DisplayNames[col.ColumnName]);

if (CheckLookup.IsMatch((string)val))

{

string valString = val as String;

val = valString.Substring(valString.IndexOf(“#”) + 1);

}

// Assigning a string to a field that expects numbers or

// datetime values will implicitly convert them

newRow[col] = val;

}

}

newRow[“XmlElement”] = Item;

result.Rows.Add(newRow);

}

return result;

}

// The following Function is used to Get Namespaces

private static XmlNamespaceManager _nsmgr;

private static XmlNamespaceManager NameSpaceMgr

{

get

{

if (_nsmgr == null)

{

_nsmgr = new XmlNamespaceManager(new NameTable());

_nsmgr.AddNamespace(“wss”, “http://schemas.microsoft.com/sharepoint/soap/”);

_nsmgr.AddNamespace(“s”, “uuid:BDC6E3F0-6DA3-11d1-A2A3-00AA00C14882”);

_nsmgr.AddNamespace(“dt”, “uuid:C2F41010-65B3-11d1-A29F-00AA00C14882”);

_nsmgr.AddNamespace(“rs”, “urn:schemas-microsoft-com:rowset”);

_nsmgr.AddNamespace(“z”, “#RowsetSchema”);

}

return _nsmgr;

}

}

private Type TypeFromField(XmlElement field)

{

switch (field.GetAttribute(“Type”))

{

case “DateTime”:

return typeof(DateTime);

case “Integer”:

return typeof(int);

case “Number”:

return typeof(float);

default:

return typeof(string);

}

}