Management

Enabling the Interaction Publisher

New sites with dynamic, interactive functionality using data from different sources and allowing the user to interact with the data are exciting to see (examples: geo.worldbank.org and carma.org). But how do we unleash this functionality so that non-programmers can create interaction like this? We have content management systems that allow more people to easily add content to sites. But I think we should be driving toward an environment where users can a) take data from a variety of sources and b) create interactive sites based on this data. Maps are the most prominent example, but interactive tables are also important. Let's review where we are now:

  • We have sites already applying Google maps and other interactive functionality to various data sources (examples above).
  • Programmers have resources/examples/documentation for creating these types of sites (see Programmable Web for example).
  • Various APIs have been exposed for interacting and using data (examples).
  • We have tools like Yahoo Pipes that allow advanced users (probably not needing full-blown programmer skills) to create mashups. That said, Yahoo Pipes is now focused on consuming/dealing with RSS feeds (the Fetch Data Module is supposed to handle more general XML, but I had problems getting it to do so -- if you look at examples using DC crime data, you see it's RSS with some customization). In addition, this is a hosted solution, so you're at the mercy of Yahoo if you host a mashup with them (I noted Yahoo Pipes having problems accessing feeds intermittently even in my brief testing).
  • There are probably other similar examples of specialized tools, but I know of Swivel, which allows you to create your own graphs of data.

Here are the types of interactive functionality that I think we should be allowing non-programmers (let's call these folks "Interaction Publishers", riffing off the role of "Content Publisher") to create:

  • Interactive data tables. The Interaction Publisher should be able to point at one or more data sources, and indicate which columns/attributes to display in a table. The Interaction Publisher should also indicate which attributes should be selectable (in pulldowns for example) by the end user. Of course some theming / design and annotation should be possible.
  • Interactive maps. Interaction Publisher should be able to point at a data source, the attributes containing the locations, and what data to show for each location (along with the extent of the default map and formatting). Also, please can we get rid of the points / waypoints / circles that indicate arbitrary points that are used to indicate data for a large area (for example, a pointer to the capital for a country), and instead highlight the whole area (for example, the whole country). Ideally the Interaction Publisher will be able to indicate further interaction with the map (for example, displaying different layers of a map -- if not full-blown layers, then at least indicating different sets of waypoints to display).
  • Custom data. The Interaction Publisher should also be able to easily publish their own data/content, and pull their data into an interactive feature (for instance, this could even be a simple search on a little database / resource center the user has). An extension of this would be including some mechanism for overriding other data sources' data points (of course this should somehow be indicated on the map/table so it isn't misleading).
  • Wizard-like functionality. The Interaction Publisher should not have to resort to XPATH, XSL, or programming in PHP / Perl / whatever.
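
As a rough illustration of what a wizard for interactive data tables would need to generate behind the scenes, here is a minimal Python sketch. The function names (`build_table`, `pulldown_options`) and the sample rows are hypothetical, invented only for this example; the idea is simply that the Interaction Publisher picks columns and selectable attributes, and the end user's pulldown choice becomes a filter.

```python
from typing import Dict, List, Optional

Row = Dict[str, str]

# Hypothetical sample rows standing in for a real data source.
ROWS: List[Row] = [
    {"country": "Kenya", "topic": "agriculture", "title": "Maize yields"},
    {"country": "Kenya", "topic": "health", "title": "Clinic access"},
    {"country": "Peru", "topic": "agriculture", "title": "Potato exports"},
]


def pulldown_options(rows: List[Row], attribute: str) -> List[str]:
    """Distinct values of one attribute, sorted, to populate a pulldown."""
    return sorted({row[attribute] for row in rows if attribute in row})


def build_table(
    rows: List[Row],
    columns: List[str],
    filters: Optional[Dict[str, str]] = None,
) -> List[Row]:
    """Keep rows matching every filter, showing only the chosen columns."""
    filters = filters or {}
    return [
        {col: row.get(col, "") for col in columns}
        for row in rows
        if all(row.get(attr) == value for attr, value in filters.items())
    ]


if __name__ == "__main__":
    # The publisher chose two columns; the end user picked "agriculture".
    print(pulldown_options(ROWS, "topic"))
    print(build_table(ROWS, ["country", "title"], {"topic": "agriculture"}))
```

The publisher would never see this code; a wizard would collect the column list and the selectable attributes through a form and produce the equivalent configuration.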

Sounds nice -- but how would this be possible? One possible step is for institutions to expose their data in a consistent manner (at least each institution exposing its own data consistently). This would involve something of a meta-API, where you are consistent about:

  • Attributes that can be queried. Perhaps the list would be just topics and countries, for example. The topics lists should be something that the outside world will understand rather than an organization-centric list. If you have multiple topics lists, then it would be preferable if all systems were moved to a single topics list (even if that meant two topics lists per system).
  • Simplicity and consistency in APIs. Perhaps all your XML APIs are at http://xml.example-domain.com/apis/ (with an html page just listing all the APIs there) and then APIs to different systems like http://xml.example-domain.com/api/documents and http://xml.example-domain.com/api/web with example calls like http://xml.example-domain.com/api/web?api-version=1&topic=agriculture.
  • Consistent exposure of non-standard attributes. The issue of consistent query parameters was covered above -- this means that all systems are queried on the same parameters. But of course some systems will need to provide other attributes (such as, say, "Population"). This could be done in a custom namespace in RSS as the DC crime data (see xml) does in its Atom feed (which Yahoo Pipes, for example, can consume). This could be documented, and the consumer of the data could handle this.
  • Custom databases would also preferably comply. Perhaps there could be an http://xml.example-domain.com/api/core/ for institutionally, centrally supported repositories and http://xml.example-domain.com/api/special/ for one-off databases. This would still allow easy access of data by Interaction Publishers.
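
To make the meta-API idea a bit more concrete, here is a minimal Python sketch of the two halves: building calls with the same query parameters against every system, and reading a non-standard attribute from a custom namespace in an Atom feed. Everything specific here is an assumption for illustration: the `api_url` and `custom_attributes` helpers, the custom namespace URI, the `ex:population` element, the sample feed and its made-up value, and the use of a conventional `?` query string on the example domain.

```python
from typing import Dict
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

BASE = "http://xml.example-domain.com/api"
ATOM = "http://www.w3.org/2005/Atom"
# Hypothetical namespace an institution might document for its extra attributes.
CUSTOM = "http://xml.example-domain.com/ns/custom"


def api_url(system: str, **params: str) -> str:
    """Build a call to one system using the shared query parameters."""
    query = urlencode({"api-version": "1", **params})
    return f"{BASE}/{system}?{query}"


# Made-up feed: one entry carrying a custom "population" attribute.
SAMPLE_FEED = f"""<feed xmlns="{ATOM}" xmlns:ex="{CUSTOM}">
  <entry>
    <title>Exampleland</title>
    <ex:population>12345</ex:population>
  </entry>
</feed>"""


def custom_attributes(feed_xml: str) -> Dict[str, Dict[str, str]]:
    """Map each entry title to its attributes from the custom namespace."""
    root = ET.fromstring(feed_xml)
    prefix = f"{{{CUSTOM}}}"  # ElementTree writes qualified tags as {uri}name
    result: Dict[str, Dict[str, str]] = {}
    for entry in root.findall(f"{{{ATOM}}}entry"):
        title = entry.findtext(f"{{{ATOM}}}title", default="")
        result[title] = {
            child.tag[len(prefix):]: (child.text or "")
            for child in entry
            if child.tag.startswith(prefix)
        }
    return result


if __name__ == "__main__":
    print(api_url("web", topic="agriculture"))
    print(api_url("documents", topic="agriculture"))
    print(custom_attributes(SAMPLE_FEED))
```

The point of the sketch is that a consumer only has to learn the parameter names once, and only has to know the documented namespace to pick up whatever extra attributes a given system exposes.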

Some potential ways of inching toward the goal of the non-developer Interaction Publisher easily being able to publish dynamic, interactive features would be:

  • Start by using JavaScript libraries. There are several JavaScript libraries out there (examples: Dojo, MooTools, Prototype / script.aculo.us), but most seem to be too low-level (concentrating on opening/closing panels, transitions, and the like) to be useful for interactive data features. Possibly a library that has higher-level features, including interactive tables, such as Ext JS could be used as a first step. It would require touching some code, but perhaps a CMS, for example, could include code snippets in its documentation indicating what needs to be replaced (for example, where to put in the url to the source XML).
  • Create some simple wizards in CMSes. So that we aren't relying on, for example, Yahoo Pipes for hosting our interaction, we may wish to start including simple wizards in our CMSes. For example, one could be for interactive tables that just had one data source and three columns.
  • Push for stronger hosted interactive feature builders. For example, Yahoo Pipes perhaps could include some of the features mentioned in this post (for example, a tool for creating interactive maps, or a tool for creating a pulldown of options to drive a Google map).

Here's a little chart displaying some of the ideas in this post (also see pdf version):

I'd really like your comments on this post. Specifically:

  • Is the role of Interaction Publisher important?
  • How could we enable this role?
  • What ideas above do you think would work and which would not work?
  • Is there a need for a separate, generic standard XML format apart from RSS feeds, or should an institution's RSS just be extended to include custom portions?

 

Selecting a Content Management System (CMS)

There are already various sites comparing features of content management systems (for example the CMS matrix), so this post aims to help set a framework for selecting a Content Management System (CMS). Aside from standard things to keep in mind when selecting a technology, there are some particularly important items for setting the tone of your CMS selection:

  • Standardization / Governance. Is one of your objectives to standardize the look and feel of your site, or to try to ensure there's a consistent quality across your site? If so, then before you start moving into the new system it is important to decide who will make the call on what goes up and how those decisions will get made. Sure, an advantage of a CMS is that anyone can publish, but this can lead to inconsistent quality. I'm not just talking about workflow: for instance, who makes the call about adding a whole new site section?
  • Stakeholder buy-in of objectives. This one is of course part of any technology decision, but some key factors in deciding about a CMS are: a) if you've decided to standardize aspects of your site, make sure everyone is bought in (otherwise people will try whatever they can to get out of the standard), b) if people's jobs are going to change (for instance, people that are doing hands-on HTML coding may not be doing that anymore), then is everyone clear on this?
  • Envision key use cases. After you're in the middle of migrating your content/systems, you may lose sight of why you undertook this in the first place. Laying out key use cases in advance helps you keep the goals in sight and also lets you more easily claim victory. Key use cases might be something like "Will be able to allow any staff member to publish a piece of content, resulting in it automatically appearing on the home page as well as the relevant country page, and also in the country's RSS feed and email alerts". Of course, you also should list key use cases that you don't want to go away like "Compare statistics across different areas of the site in a consistent manner."
  • Make sure everyone understands the complexity of a move to a new system. See this post that lists some of the complexity.

These are some of the particular factors to consider when selecting a CMS:

  • Tagging. For a large institution, you may have issues keeping consistent quality in your tagging (and you may wish to consider an automated concept extraction tool to help in the tagging). At any rate, you will want to think about a method of tagging that will work for everyone (and ensure that your system will support this).
  • Multilingual/internationalization support. See this page that describes different levels of multilingual support. Some more advanced types of features to consider are Administrative Title and Interleaving Languages.
  • Distributed or centralized content entry. This relates to the issue of standardization above.
  • Community/support.
  • Multiple site support. If you need to have multiple sites, what kind of functionality do you need? For instance, does content need to flow between sites? Do the different sites need to enforce a consistent look/brand?
  • Integration with other systems or all-in-one. A key decision will be how you are going to integrate with other systems, and, if integration is not as important (for instance for a smaller organization), then ensure that your solution supports the different functionalities you need.

With everything on the web moving so fast now (who knows when Web 3.0 will be the next thing we're all moving to), consider moving to a CMS environment that will allow quick innovation and new functionality. Some specific approaches to this:

  • Try to pick a CMS that is innovating quickly. Of course, what you really want is to pick what CMS will be a winner in the future, but the best we can do now is pick a CMS that is quickly adding new features. Looking at lists like Joomla's extensions page for any CMS that you're interested in should help with this. Of course, it needs to be easy to add any new modules/extensions when they are released.
  • Ease of upgrading to new versions of the core CMS. Obviously hosted, SaaS solutions have an advantage here.
  • Ease of writing your own new functionality. Would the CMS allow your team to program (in some lightweight language like PHP for example) their own new functionality? If you don't have the skillset, is there a pool of developers outside your organization who could help? Is there useful documentation on how to write your own new functionality?
  • Support to expose/share data. We have RSS as a mainstream feed now, but what about richer XML exposed for more structured data? More and more, we'll need to support people outside our organizations utilizing our content/data to write functionality on their own sites, combining your data with other organizations' data.
  • Integration with outside systems. If a CMS already has integration with other types of systems (for instance, stats, newsletters, email alerts, membership databases, etc), then it may be easier to move to future leaders in these different spaces.

Why It's Hard to Migrate Content

You know when it's time to move into a new house or apartment, when you look at the stuff you need to move and think "Why in the world do I have this bread machine? I haven't used this in years and I forgot I even had it." Or you dread moving your old clunker of a TV, thinking of the new fancy flat-panel TVs? Well, it's the same thing with migrating to a new system, for instance into a new content management system. Only it's harder. When you're moving and you're pressed for time, you may just start tossing stuff into boxes to be moved, even when you know you don't totally want all the stuff (one reason: you'll need to negotiate with a spouse about getting rid of something, and there's no time for that). This isn't that big a deal, since it's just moving more of the same stuff. Or, if you have a huge sectional couch that won't fit in your new place, then perhaps you can just sell it to the next homeowner. When you're moving content, you have all sorts of extra things to think about including:

  1. It's not just content. Content on a site doesn't just live in some abstract ether, but it is linked into a larger site context. This includes left navigation, headers, footers, and special site behaviors. Of course the site context of a simple site like hobbsontech.com would be relatively easy to move (re-creating the menus, configuring the overall style, etc), but the more sites you have, the more there would be to do. This is especially relevant for sites with a lot of custom dynamic functionality. For instance, if you have comments on your current site's content, then you'd have to figure out how to embed them in the new framework (or just leave them behind). Chances are you have a lot of functionality distributed throughout your site that may even be hard to inventory.
  2. Metadata and taxonomies. You may have to re-create taxonomies in another system, and there may be incompatibilities you have to work through.
  3. Internal references to other pieces of content. Your content probably refers to itself (for instance, a press release may refer to your product description page). This somehow has to be reflected in a new system.
  4. Structured content. You may have structured content (for instance, a document that has multiple chapters), which you'll need to figure out how to handle in the new system.
  5. Outside references to your content. Other sites, as well as search engines, will have links to your content. You'll need to have some strategy to deal with the links from external sites.
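
Items 3 and 5 above both come down to keeping a map from old addresses to new ones. Here is a minimal Python sketch of that idea, under stated assumptions: the paths in `URL_MAP` are invented, the helper names are hypothetical, links are assumed to be plain double-quoted `href` attributes, and answering old addresses with a permanent (301) redirect is one common strategy rather than the only one.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical old-path -> new-path map produced during the migration.
URL_MAP: Dict[str, str] = {
    "/press/2007/launch.html": "/news/launch",
    "/products/widget.html": "/products/widget",
}

HREF = re.compile(r'href="([^"]+)"')


def rewrite_internal_links(html: str, url_map: Dict[str, str]) -> str:
    """Point internal references at the new paths; leave other links alone."""
    def swap(match: "re.Match[str]") -> str:
        old = match.group(1)
        return f'href="{url_map.get(old, old)}"'
    return HREF.sub(swap, html)


def redirect_for(path: str, url_map: Dict[str, str]) -> Tuple[int, Optional[str]]:
    """Answer a request for an old path: 301 to the new home, else 404."""
    if path in url_map:
        return 301, url_map[path]
    return 404, None


if __name__ == "__main__":
    page = '<a href="/products/widget.html">Widget</a> <a href="http://other.example/">x</a>'
    print(rewrite_internal_links(page, URL_MAP))
    print(redirect_for("/press/2007/launch.html", URL_MAP))
```

The same map serves both purposes: it is applied once to the migrated content for internal references, and kept around on the server for the links from other sites and search engines that you cannot edit.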

In the end, a lot of this has to do with the web of information that's involved in the content of a web site. And this isn't counting the types of technical issues that would come up with any technical migration (differences in size limits for fields, encoding differences, etc.). Of course there's the issue of why you even have all this stuff to move in the first place (and the more stuff you have the more hassle it is to move). This blog entry has focused on why it's difficult to move all this content, but of course one of the morals of the story is to have less stuff in the first place. In the case of the web this would involve better governance of what goes on the web, and clearly defining what the focus of your web site should be. Hopefully, just like when moving houses, any discussion of moving content would also include discussing what stuff you need in the first place. Unlike houses, having extra or duplicate stuff doesn't just inconvenience you but it is a disservice to your users. I'll leave the issue of the old TV and desiring the new flat panel to a future post (on survivorship bias).

Standardization and Large Web Sites

Very large sites supporting a large number of units/stakeholders can easily turn into a hodge-podge of styles, user interface elements, and quality. One of the toughest discussions with clients, however, is why they can't do more customization (even if one of the core requirements of the system is to help enforce standardization). Here are some of the reasons *not* to standardize:

  • specific business needs of different groups (not to be confused with a group just wanting to differentiate itself somehow, for instance with a different look, that does not help the web visitor at all)
  • professional development (for instance a developer might be interested in doing a mashup)
  • personal expression (liking particular colors for example)
  • experimentation (don't know in advance what's going to "stick," so try a variety of things)

In my opinion, the first and last reasons are the most compelling (and the third not being a good reason at all for an enterprise-wide system), although one of the problems with experimentation is the frequent expectation that an experiment could quickly be rolled into the normal standardized platform (that's probably a post on its own!). Here are some reasons *to* standardize:

  • consistent brand for the user ("am I still on the same site? Is this high quality content?")
  • consistent UI for the user ("do I know how to use the site?")
  • better support for new site admins or transition of support between sites
  • single sign-on. It's confusing for a user to have various accounts with the same institution.
  • standard statistics. Different statistics packages can have entirely different ways of counting something as basic as a page view. Standardizing on a statistics package can help ensure you're comparing apples to apples in your web analysis.
  • better search. If everyone does their own thing, then there may be more fragmented information which would mean search results aren't as good.
  • stability / support. As anyone who works with software/systems knows, the more functionality or special customization you put into a system, the more effort it takes to maintain it. Also, the system will probably be less stable. This one is also very tough to discuss with a client (and another probable future blog post) since they tend to only see their particular need.

Some possible methods of standardization:

  • Governance. There needs to be a group with the power and influence to say "no" to requests that undermine the quality of the user experience of the site at large. This ideally is not the technology group since there would appear to be a conflict of interest.
  • Clearly define exactly what is inside the standard and what is outside.
  • Technology. The content management system used to manage the site can be set up such that users can only make changes that comply with the standard.
  • The right level of customization. Standardization shouldn't be an excuse to totally control every aspect of everyone's sites or to not allow any innovation.
  • Hooks into core shared functionality. You may decide that a single sign-on for users of your site is desirable. If so, then perhaps the system could be set up with an API such that tools developed and commissioned by other groups could work with the core functionality.
  • Standardized access to data. Ideally, you could define a standard method of each system exposing its core data, that even people outside the institution could utilize for mashups, etc. By providing the data in a simple XML API, this could facilitate both internal and external usage of data.
  • Another potential approach is to have separate branding for the official, blessed content and for the organization-centric content. For instance, you may have multiple units in your institution all looking at the topic of taxes. Ideally you would have one official web site that makes sense of your institution's view of taxes overall, and preferably this would pull information from all the units. The various units still may want their own site, but this is less useful for the end user -- so perhaps these units could have their own sites branded differently (and perhaps all requiring a standard link back to the official site) to clearly indicate it is the view of a particular unit within your institution.

Of course, all of these are easier said than done when trying to get a large number of units into the same system, but perhaps some of these could be initiated even after a large suite of sites have been implemented in a central content management system.

Search Engine Optimization Basics

This post doesn't attempt to cover more obscure aspects of search engine optimization (SEO), but covers the basics that are really easy to overlook when you work on your site. Also, since Google is the major search player, I just refer to "Google" rather than trying to be more generic.

Step 0: Has Google indexed your site at all?

Go to google.com and do a search on site:your-site-name-here, like "site:http://bhphotovideo.com" to see if bhphotovideo.com is indexed by Google. If there are no results, you're not indexed. Some ideas to get indexed: a) put in links from sites / pages you already have (for example, your profile on linkedin.com), b) get other sites to link to you (for example, you can comment on other people's blogs linking to your site), c) for blogs, use pingomatic to automatically notify other services of updates to your site, and d) submit your site to Google for indexing (not sure that actually does anything though?).

Step 1: What are you trying to accomplish?

This one sounds so obvious and silly, but it's very easy to overlook. It's useful to just write down the search phrases you'd like people to use to find your site. Of course the more specific the better, since generic terms will be very difficult to get high rankings on. For example, I knew I wanted people to find this site if they typed my name and a little about me (for example, "David Hobbs CMS").

Step 2: Make sure your keywords are in the title and header tags, as well as in the text users will see (and preferably in the domain and url)

You may not have control over the domain and url (if you are in some content management systems), but you should at least make sure to have the title, header, and main text contain your terms.
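
If you'd rather check this on a saved page than by eye, a small script can do it. The following is a minimal Python sketch using only the standard library's `html.parser`; the `KeywordAudit` class, the `keyword_report` function, and the sample page are hypothetical, and the matching is deliberately naive (case-insensitive substring checks on each word of the phrase).

```python
from html.parser import HTMLParser
from typing import Dict, List


class KeywordAudit(HTMLParser):
    """Collect the text of the title, the header tags, and the visible body."""

    HEADERS = {"h1", "h2", "h3", "h4", "h5", "h6"}
    SKIPPED = {"script", "style"}  # not text the user will see

    def __init__(self) -> None:
        super().__init__()
        self.text: Dict[str, str] = {"title": "", "headers": "", "body": ""}
        self._open: List[str] = []  # tracked tags that are currently open

    def handle_starttag(self, tag, attrs):
        if tag == "title" or tag in self.HEADERS or tag in self.SKIPPED:
            self._open.append(tag)

    def handle_endtag(self, tag):
        if tag in self._open:
            self._open.remove(tag)

    def handle_data(self, data):
        if any(tag in self.SKIPPED for tag in self._open):
            return
        if "title" in self._open:
            self.text["title"] += data + " "
            return
        if any(tag in self.HEADERS for tag in self._open):
            self.text["headers"] += data + " "
        self.text["body"] += data + " "  # header text is also visible text


def keyword_report(html: str, phrase: str) -> Dict[str, bool]:
    """For each area, report whether every word of the phrase appears in it."""
    audit = KeywordAudit()
    audit.feed(html)
    audit.close()
    words = phrase.lower().split()
    return {
        area: all(word in text.lower() for word in words)
        for area, text in audit.text.items()
    }


# Made-up page for illustration.
PAGE = """<html><head><title>David Hobbs on CMS selection</title></head>
<body><h1>Selecting a CMS</h1>
<p>David Hobbs writes about content management.</p></body></html>"""

if __name__ == "__main__":
    print(keyword_report(PAGE, "David Hobbs CMS"))
```

On the sample page, the title and body contain every word of the phrase but the header does not, which is exactly the kind of gap this step is meant to catch.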

Step 3: Track your progress.

Type your search terms into Google and see how high in the rankings you appear. If you have already gotten good results (first page of results?), it may be time to set your goals higher. For instance, for this site I'm now interested in shooting for more topic-based search phrases such as "multilingual CMS" (currently the 14th page of results). Also, you will want to look for dips in the performance of your search phrases. This is especially relevant to test before and after any changes you make to your site/system. If you're working with a client on their site, by having the metrics (and search goals) before you start you'll be able to more objectively discuss the search performance of their site. Another angle is to look at the terms that people are using to actually find your site. You may find interest in your site from unexpected angles that you may wish to further enhance (for instance, people are finding my site with phrases such as "annotate excel graph", so I may put a more generic introduction to that blog entry).
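
Watching for dips is easier if you write the rankings down each time you check. A minimal Python sketch of the comparison, assuming you have recorded the position of each phrase by hand; the `find_dips` name and the numbers in the usage example are made up for illustration.

```python
from typing import Dict, Optional, Tuple


def find_dips(
    before: Dict[str, int], after: Dict[str, int]
) -> Dict[str, Tuple[int, Optional[int]]]:
    """Return phrases whose position got worse, or that no longer rank at all.

    Positions count from 1 at the top of the results, so a larger number
    is a worse ranking. A missing phrase in `after` is reported as None.
    """
    dips: Dict[str, Tuple[int, Optional[int]]] = {}
    for phrase, old_rank in before.items():
        new_rank = after.get(phrase)
        if new_rank is None or new_rank > old_rank:
            dips[phrase] = (old_rank, new_rank)
    return dips


if __name__ == "__main__":
    # Hypothetical positions recorded before and after a site change.
    before = {"David Hobbs CMS": 3, "multilingual CMS": 140}
    after = {"David Hobbs CMS": 8, "multilingual CMS": 95}
    print(find_dips(before, after))
```

Keeping the "before" numbers is what lets you have the more objective conversation with a client after a change goes live.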

Repeat.

The first step, to get into the Google index at all, involved getting links to your site. As you proceed, of course you also want to have higher and higher quality sites link to you. As mentioned in the previous step, your search goals will also probably change, and you'll want to add/reword/reconfigure portions of your site (per Step 2 above) to optimize for those new goals.