I’ve been contemplating the whole “unstructured” thing for a while now, and I’ve developed some new hypotheses about it. The discussion so far has centered on the fact that Web 2.0 / Enterprise 2.0 generates a lot of “unstructured” data.
I’m not sure “unstructured” is really the most technically fitting word, though. It’s the word that works if you’re a technical person talking to a non-technical person.
I think the information we’re seeing in these settings is typically better structured than what we’ve seen in the past. The structures are being defined by the provider, however, sometimes on an ad-hoc basis, and can change without notice.
If you’re in the geek domain, I think “undefined” fits better. Maybe “unknowable structure”. It’s Null Schema.
I think we’ve all seen tons of this… it’s a trend towards increasing structure with a less-defined schema. It seems to fit with the “agile” trend.
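Here’s a rough sketch of what living with “null schema” might look like in practice, assuming the data shows up as JSON. Instead of validating against a contract agreed in advance, the consumer works out the shape of each payload as it reads it. The `describe_shape` function and both sample payloads are made up for illustration; they aren’t from any real provider.

```python
import json


def describe_shape(value):
    """Return a nested description of the structure found in a parsed JSON value."""
    if isinstance(value, dict):
        return {key: describe_shape(item) for key, item in value.items()}
    if isinstance(value, list):
        # Summarize a list by the distinct shapes of its elements.
        shapes = []
        for item in value:
            shape = describe_shape(item)
            if shape not in shapes:
                shapes.append(shape)
        return shapes
    # Leaf values are described by their Python type name.
    return type(value).__name__


# Hypothetical payloads from the same provider, before and after
# an unannounced change to the structure.
before = json.loads('{"author": "pat", "tags": ["soa", "web2"]}')
after = json.loads('{"author": {"name": "pat", "id": 7}, "tags": ["soa"], "rating": 4.5}')

shape_before = describe_shape(before)
shape_after = describe_shape(after)
```

Both payloads are perfectly well structured. What’s missing is any promise that the second will look like the first, so the consumer has to discover the shape rather than assume it.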
So the other aspect of this Web 2.0 thing is that the data doesn’t just come in an unknowable format. It can also arrive over any number of communication channels, at the provider’s discretion. People define conventions to ease this. Interestingly, the agreed-upon channels end up providing context for the content, and that context in turn adds to the content’s structure… more null schema.
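One way to picture the channel adding structure, again only as a sketch: the receiver wraps whatever arrives in an envelope and attaches what the channel’s conventions imply about it. The `Envelope` type, the `receive` function, the channel names, and the convention table are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Envelope:
    channel: str                      # which channel the content arrived on
    payload: Any                      # the content itself, in whatever shape it has
    context: dict = field(default_factory=dict)  # what the channel implies


# Hypothetical conventions: what each channel is taken to say about its content.
CHANNEL_CONVENTIONS = {
    "rss": {"kind": "publication", "ordered_by": "time"},
    "email": {"kind": "conversation", "addressed": True},
}


def receive(channel, payload):
    """Wrap a payload with the context implied by its channel, if any is known."""
    context = dict(CHANNEL_CONVENTIONS.get(channel, {}))
    return Envelope(channel=channel, payload=payload, context=context)


item = receive("rss", {"title": "Null Schema"})
unknown = receive("carrier-pigeon", "coo")
```

The payload never declares that it’s a publication. The channel it came in on says so, by convention, and an unrecognized channel simply contributes nothing.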
It flies in the face of our tightly defined, versioned SOA end-point contracts. XSOA? 🙂
It’s been said that SOA lives in a different problem space, but going forward, that may turn out to be only a matter of convention.