Saturday, December 20, 2014

Serial – a successful content strategy

The final installment of Serial has just been published. Serial is a series of 12 podcasts by Sarah Koenig, produced by This American Life from WBEZ. It follows Sarah's investigation into a 1999 murder case. Over 12 episodes it tells the story of the murder of Hae Min Lee in Baltimore, the conviction of Adnan Syed for this murder, and the examination of the evidence – 15 years later.


After episode 1 I was hooked immediately. The plot is intriguing: a suspect who maintains his innocence, convicted on extremely thin evidence that contradicts itself in many places. On top of that, Sarah Koenig and her team take the listener along on their search, full of doubt, amazement and sometimes even humour.

What are the aspects that make Serial special in my eyes, and what is this story doing on a blog that mainly covers the technical and business aspects of content strategy?

Serial and content strategy


By now every content strategist knows the importance of a good story, and this is a good story. Nobody understands how the gentle 17-year-old Adnan could have murdered his equally 17-year-old ex-girlfriend, yet he was convicted of it, despite evidence that is at times wafer-thin. The compelling story, plus the personal – almost intimate – way in which it is told, are for me the first step in this successful content strategy.

A second step is the care that has gone into this production. Fifteen months of research by a complete team of journalists has resulted in 12 podcasts of roughly an hour of non-fiction radio each. The closest thing I know is Argos radio – but in size XXL. There is a dedicated website with supporting material, and a dedicated soundtrack was even composed by Mark Henry Phillips and Nick Thorburn.

The third step is the – for me – ideal choice of channel. I spend two hours a day in the car, and one hour of Radio 1 per day is really enough. Podcasts are ideal for me because I can listen to the things that interest me: Argos, the Content Matters Podcast and many others, whenever it suits me.

The fourth step in this successful content strategy is the commercial aspect. The first podcasts were paid for by the WBEZ radio station, sponsored by MailChimp. In subsequent podcasts you hear additional sponsors joining in, attracted by the quality and/or the success of the podcast. In one episode Sarah Koenig makes an appeal to fund a second season. That funding was secured within a week.

I too made a small donation to Serial. It remains odd that I will pay 20 euros for a good new (e-)book by my favourite authors without blinking, yet hesitate about donating a few euros after getting a fantastic 12-hour audiobook for free.

Conclusion


My goal with this blog post was to tell you how much I enjoyed this fantastic podcast – and isn't that the proof of a fantastic content strategy? Free word-of-mouth marketing. Goal achieved with flying colours! I am patiently waiting for season 2 of Serial, or comparable initiatives.

Tell a good story, make sure it is well produced and delivered in a convenient format. And let the credits (in whatever form) come rolling in. That goes for every good content strategy.

Friday, November 21, 2014

Big improvements for Big Content


At HintTech we are among the first to be able to test MarkLogic 8, our favorite Big Data & Big Content platform.



At MarkLogic World earlier this year, MarkLogic offered insight into the new features, such as JavaScript & JSON support, SPARQL 1.1 & inferencing, and bitemporal support.

In the coming days we will post blogs with revealing updates & inside information about MarkLogic 8 and our test drive of these features.

Read the Blogs through our corporate website, or subscribe there and we will keep you posted.

Be the first to know about MarkLogic 8!

Thursday, November 6, 2014

Creating your own teaching materials

'Teachers should be able to create their own teaching materials,' the major national media headlined today.



The idea is that teachers should not depend on the three big schoolbook publishers, but should be able to decide for themselves which teaching materials they use. Moreover, they should also be able to create and exchange this material themselves.

The Stichting Leermiddelen Keuze* wants to use students from the teacher training programmes of Noordelijke Hogeschool Leeuwarden and the Rijksuniversiteit Groningen to assess digital and paper teaching materials for quality and then make them available to everyone.

The minister has already shown interest in the initiative, and D66 will propose it during the debate on the budget of the Ministry of Education.

Caveats

Of course this sounds like an excellent plan, in which teachers can use their knowledge and skills to make education better and cheaper. Precisely because it is such a good idea, I would like to add the following caveats, partly based on my experiences in the Book2Fit project:

  • Teaching materials are not just about the individual lesson, but also about a complete method that has to guide a pupil from "zero to hero". Many teachers will be perfectly capable of producing excellent teaching materials, but who makes sure they also fit into a method?
  • In line with this, it is important that teaching materials are properly categorised, so they can be found again for re-use. Before you, as a diligent teacher, create your own lesson about Ebola, for example, it is useful to check whether a colleague has not already made an excellent lesson on the subject available. To allow teaching materials to be properly tagged with metadata, making them exchangeable and assessable, Edustandaard has drawn up the Onderwijsbegrippenkader. I would hope that all new initiatives will make use of the OBK.
  • Earlier initiatives such as Wikiwijs show that creating teaching materials is no small task. It is good that the foundation wants to guarantee the quality. Will they also check whether the material is authentic (a copyright check)?
  • For initiatives like this, the 1% rule for communities will very likely apply as well: there are far more consumers than contributors. Perhaps the 1% of active contributors will have to be given dedicated time to create teaching materials, but am I then proposing big schoolbook publisher number 4? The minister has already indicated that the government should not get involved.


Book2Fit

I had the pleasure of contributing to Book2Fit myself: a tool on which Daidalos (now HintTech), Ricoh and IT-workz collaborated back in 2008, and which enables teachers to compose their own teaching materials.

Within Book2Fit we deliberately chose to work together with the three big schoolbook publishers. They made material available for teachers to arrange with.

As a result, teachers could keep using their familiar teaching method while adapting its content to their own wishes. In this approach it does not matter whether a teacher starts cautiously or immediately creates or arranges a large part of the material themselves.

*I have not been able to find a Stichting Leermiddelen Keuze myself – does anyone know them?

Monday, October 27, 2014

Content is an investment in customer experience

Now and again you run into a quote that ties things together. Some time ago I saw a presentation by Schneider & Nichols on content strategy that included the following quote:

“Content is an investment in customer experience with measurable return”

I have worked for commercial publishers for years, and am also active in content marketing. I always felt the connection between both worlds from a technological perspective, but had problems combining both business models.

This quote nicely ties both worlds together: commercial publishers have always invested in creating content to get a measurable return. The customer experience is so good that customers are willing to pay for the content. Commercial publishers measure content success by turnover and profit.

Content marketers want to create memorable customer experiences. This can only be achieved by investing in memorable content. Creating a meaningful way to measure your return is a challenge any content marketer will face. Although the return is not as easy to measure as it is for commercial publishers, a clear definition can and must be made to justify your content marketing investments.


Content writer by Ritesh Nayak via Flickr (CC BY-SA 2.0)

Here is an earlier nice quote

Monday, October 13, 2014

Alfresco Summit 2014

Update: Alfresco One 5.0 is now publicly available. Read what's new here.

October 8th and 9th I had the opportunity to visit the Alfresco Summit 2014 in London. Alfresco is an Enterprise Content Management System, so the focus is not (just) on web content, but on all the content an organisation would manage: documents, assets and metadata, managed in specified processes. Alfresco differs from other ECMs through its open architecture and extensive support of standards.



Highlights

The Summit offered an eclectic mix of business, solution and technical sessions. For me as a content strategist, the highlights were:

  • Search: The newest version 5.0 of Alfresco will support the newest version of the SOLR search engine (4.10). This will boost the retrieval of documents, for instance by offering facets (or better: filters – I learned managers don't like the word "facets"; it sounds too complicated) and other improvements.
  • Media Management Module: In the first quarter of 2015 an improved Media Management Module will become available, offering improved Asset Management like Transcoding Services, improved Metadata management and Sharable collections.
  • WCM: Alfresco Web Content Management has spun-off in a separate product: Crafter. Crafter offers full WCM functionality, keeping all useful Alfresco features such as events, workflows etc.
  • Open source SharePoint integration: Alfresco announced it is contributing its Microsoft SharePoint open source integration to the Apache Software Foundation. This integration connects Microsoft SharePoint to virtually any enterprise content management (ECM) system, including Alfresco, using the open standard CMIS (Content Management Interoperability Services) from OASIS (see the small query sketch after this list).
  • Semantic Enrichment: A case study was shown where open source tools were used to semantically enrich documents (read my earlier blog about this subject) in Alfresco. Semantic enrichments can dramatically improve use and retrieval of documents.
  • Activiti integration: Alfresco integrates Activiti, a light-weight workflow and Business Process Management (BPM) Platform targeted at business people, developers and administrators. Its core is a BPMN 2 process engine for Java. 
  • Fred: Our friends at Xenit offer Fred. Fred provides an (extra) intuitive user-interface on top of Alfresco, making the use of Alfresco even easier.
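As a rough idea of what CMIS-based access looks like in practice, here is a minimal sketch using the Apache Chemistry cmislib client for Python. The endpoint URL, credentials and query are placeholders, not a description of a specific installation:

from cmislib import CmisClient

# Placeholder endpoint and credentials - point this at your own repository.
client = CmisClient(
    'http://localhost:8080/alfresco/api/-default-/public/cmis/versions/1.1/atom',
    'admin', 'admin')
repo = client.defaultRepository

# The CMIS Query Language works the same against any CMIS-compliant ECM.
results = repo.query(
    "SELECT cmis:name, cmis:lastModificationDate "
    "FROM cmis:document WHERE cmis:name LIKE '%report%'")
for doc in results:
    print(doc.properties['cmis:name'])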


Retrospective

Alfresco already offered a fine palette of functionality. But ECM software often has to justify its existence (and its licence and support fees) against shared drives and other doubts about generic ECM functionality. The new functions and integrations improve Alfresco's position in the ECM market.

Friday, October 3, 2014

The e-book market is on the move

These are interesting times in the e-book market. The market seemed to have reached a plateau, even though neither the business nor the technology is anywhere near fully developed. The figure below shows e-book sales per quarter according to Centraal Boekhuis, with a substantial dip in sales (red).

E-book sales according to Centraal Boekhuis

Here are a number of developments from recent months:

  1. Second-hand e-books
    Tom Kabinet offers the possibility to list and sell second-hand e-books, much to the delight of book readers and to the annoyance of the Nederlands Uitgevers Verbond. The court has ruled that, for the time being, Tom Kabinet may continue its shop for second-hand e-books.
  2. All-you-can-read e-books
    I wrote earlier about flat-fee reading of articles; this model is now entering the e-book market as well. Scribd, Amazon, Oyster and Entitle, among others, have introduced it, with Entitle at least also supporting reading on an e-reader.
    In the Netherlands Elly's choice has launched: 10 e-books per month for €2.99 per month. Sounds attractive, but unfortunately (for now) only books from VBK and Dutch Media, so not yet a Spotify for e-books.
  3. Borrowing e-books
    The collaborating public libraries recently started lending out e-books as well.
    By now 125,000 e-book readers have registered, and they have already downloaded more than 500,000 e-books from an assortment of roughly 7,000 titles. This has considerably widened the e-book market (see the CB figures).
  4. A central e-bookshelf
    The CPNB has introduced LeesID, a central e-bookshelf for e-books. Through LeesID readers have a single point of access to all e-books from the various participating providers.
  5. Foreign e-books
    Bol.com and Kobo are going to work together. For Bol.com this means access to an enormous collection of foreign e-books and an advanced technology platform. For Kobo it is a big leap into the Dutch market.



On the platform, reading by Mo Riza via Flickr (CC BY 2.0)

Why so much commotion?

Experts expect a major entrance by Amazon into the Dutch market. When a player of that size enters the market, a lot will change. The market will (hopefully) grow, and the existing parties will (probably) lose market share. That is why it is wise to claim as many strategic positions as possible now.

What are your experiences with the innovations above? And which change are you, as an e-reader, waiting for?

Want to read more?



Friday, September 26, 2014

Similar needs, different results

I recently had the pleasure of concluding two consulting assignments. Though both clients have a lot in common, the outcome was quite different. Why?

Situation and solutions

Both clients are publishers that serve national and international markets. Both offer multilingual products to professional users in PDF and XML-derived formats (such as HTML or ePub). Both need to update their publishing processes to keep up with changing stakeholder demands and are faced with outdated publishing infrastructures.


Image by Sean MacEntee from Flickr

But at one client we advised an Alfresco-based solution, whereas at the other we advised a MarkLogic-based solution. Being so similar, why two different outcomes?

XML as design choice

With content needing to be edited by countless authors and editors worldwide, client A implicitly chose an open and well-maintainable storage format as its standard. Generally, (X)HTML will not suffice for publishers, as it fails at standardising content and at delivering the quality needed for printed products. So XML it is.

Some excellent commercial off-the-shelf XML products are available (including SDL's LiveContent and the Alfresco-based Componize), but the off-the-shelf part usually means complying with their internal standard: DITA. Although DITA is a great XML format, it is not suitable in all cases, and it was not suitable in this case.

Wanting more specialised workflows and interfaces, we concluded that a custom XML based CMS combined with an online XML editor would fit this client’s needs best, with MarkLogic being the best platform to implement this. Binary content (like PDF) will be supported in the system through a layer of XML Metadata.

XML as an option

Client B’s content is mainly created in separate and formalised processes, with lots of content being available only in PDF. Editing being out of this equation, re-use and workflow were of much greater importance.

This client also sees the opportunities for XML and more dynamic publishing, but the hurdles are substantially bigger, and the rewards in editing would not be felt. Rather than going all-in on XML, client B will transform selected publications to XML.

For this client, out-of-the-box functionalities like workflow and flexible content storage tip the scale towards an Enterprise Content Management System. With Alfresco as the ECMS of choice, this strategic platform will provide the room to manage content and the changing customer needs that will characterize enterprise IT for years to come.

Conclusion

It is often said that the devil is in the details. But these details often drive big design choices. Decisive product-ownership provides the roadmap for such strategic choices.

Thursday, September 25, 2014

Thesis students

Being close to Delft University of Technology, my employer has tightened its relations with universities. I have the pleasure of mentoring two students doing their thesis research at HintTech.

Krishna Akundi from TU Delft has just completed his thesis on visualizing large news archives from a temporal perspective. What started as more fundamental research on the subject has turned into a quest for the most usable interface to visualize trends in news topics over time.



Marian Szabo from TU Delft has just started on the subject of finding and fixing errors in ontologies. All ontologies have the potential for errors, and Marian's quest is to help fix them by finding and visualizing errors and suggesting possible repairs.

Both offer interesting challenges in the exploration, understanding and visualization of Big Content sets, especially sets of RDF.

We are grateful to Newz for use of their data and providing substantive direction.

Wednesday, July 2, 2014

Content Matters Podcast on Semantic Technologies

I am very happy and proud to be featured on this month's Content Matters Podcast. Iain Griffin and I talk about Semantic Technologies. We discuss the general concepts and the use of these technologies in recent projects such as Newz and Kennisnet.



If you are active in Content Strategy: Subscribe to the excellent Content Matters Podcast on iTunes or Soundcloud.

Friday, June 27, 2014

Every page is Page One


In my last post I referred to the Lessons from the New York Times innovation report.
One of the conclusions was that they were spending too much time and energy on Page One. The NY Times' number of home page visitors is in steady decline, whereas page views are fairly constant:


In episode 29 of the excellent Content Matters Podcast Mark Baker also talked about this idea and his book: Every Page is Page One.

Mark's focus is topic-based writing for technical communication and the Web, but I think his message can be translated to almost any context (it certainly applies to the NY Times!).

Remember the days when accessing a website meant typing in the URL or using your bookmarks? (Who still uses bookmarks? As a Chrome user: Chrome knows my bookmarks better than I do.)

But now, much of the traffic to websites is either through search-engines answering specific queries – so dropping off users deep into your website – or by shared deeplinks to specific pages in your website.

Both will ignore your homepage. And while the page itself will probably look OK (you will have taken care of that one), what impression does this One Page give the visitor – the visitor's Page One?

Will the visitor understand you and your story from this page? Will they leave your website at this page because they don't understand you? Or will they visit some of your other pages, each one as important as Page One?

Monday, May 26, 2014

An inspiring editor

At HintTech we regularly organise One Day Product sessions. During these One Day Product sessions we conceive and build a working product within a single day.

In the One Day Product session of Thursday 22 May we worked on image sourcing for articles. Classic image sourcing is often an expensive and uncreative process with a very static end result. Wouldn't it be fantastic if your word processor inspired you while writing by showing interesting imagery?


With this idea, the team from Dayon, HintTech and Tripitch got to work. Irina designed a beautiful interface, which Martin and Sebastiaan built. Meanwhile Jorg, Fedor and Fabian worked on text recognition and on tapping into image banks.


After a long day of thinking, toiling, discussing and building, the result speaks for itself: MOOD THIS!

Try it out! Type some text, and when you stop typing for a few seconds MOOD THIS! goes looking for a mood board of images and colours based on words from your text. The result will often be surprising (it depends on social media tagging), hopefully inspiring, and often simply great fun.
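I have not seen the MOOD THIS! code itself, but the core loop behind such a tool could look roughly like this minimal Python sketch; extract_keywords is deliberately naive and search_image_bank stands in for whatever (hypothetical) image bank API is called:

import time
from collections import Counter

STOP_WORDS = {"de", "het", "een", "en", "the", "a", "an", "and", "of", "in"}

def extract_keywords(text, limit=5):
    # Very naive keyword extraction: the most frequent non-stop words.
    words = [w.strip(".,!?").lower() for w in text.split()]
    words = [w for w in words if w and w not in STOP_WORDS]
    return [word for word, _ in Counter(words).most_common(limit)]

def search_image_bank(keywords):
    # Hypothetical call to an image bank or social media tag search.
    raise NotImplementedError

def mood_this(get_current_text, idle_seconds=3):
    # Poll the editor text; after a typing pause, build a mood board.
    last_text, last_change = "", time.time()
    while True:
        text = get_current_text()
        if text != last_text:
            last_text, last_change = text, time.time()
        elif text and time.time() - last_change > idle_seconds:
            return search_image_bank(extract_keywords(text))  # images + colours
        time.sleep(0.5)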


MOOD THIS! was inspired by Plastr, a project by Sebastiaan Hoejenbos and Michiel Peters.

Wednesday, May 21, 2014

Lessons from the New York Times innovation report

A few days ago, the full NY Times innovation report was leaked to the press. The NY Times did a rigorous self-examination, and came to some incredibly important observations on the why and how of true transformation to the digital era.



Reading this document sparked a WOW moment in me. I think it is an absolute must-read for anyone dealing with content strategy. Not so much for profound new thoughts, but more for the insight into where the NY Times' digital performance was good and where it was poor, affirming that changing a content strategy is hard.
The Nieman Journalism Lab wrote an excellent report on it.

I would like to share a few of the observations and proposals that I also run into in my daily practice:

Create a strategy team

The newsroom is too focused on the daily task to produce the daily newspaper, which leaves insufficient time and focus to assess and implement strategic changes.

I think we all can relate to this problem. It is OK to have a team focused on the daily chores, but somewhere change has to happen. Creating a strategy team is a good idea – as long as it is well-anchored within the standing organisation and has the power to implement change.

Put the Reader first & Grow an audience

The NY Times offers content through all the obvious channels. But competitors with better headlines, better search and better social media skills outperformed the NY Times, even on its own content, through simple changes such as posting at a time your audience is actually reading, repurposing, repeating, repackaging and personalizing.

The battle for an audience isn’t fought anymore on selling a paper or a website by its front-page: every article has to fight its own battle on the saturated content market.

The report also suggests branching off and seeking audiences in different forms. Hosting events and talking to real customers should be on anybody’s agenda.

Structure to repurpose

The report tells a compelling story on why content needs to be enriched through tagging and structured data.

This reaffirms the choices we made in the Newz project: structuring content by location, time (and events), story type and topics (persons and organisations). The NY Times adds to this: timeliness, story threads, story tone and the use of imagery.

I feel that any content-organisation should consider structuring content on some of these base functions.

Overcome the huge cultural change to become ‘Digital First’

The report points out that changing a newsroom from a century-old habit of paper & front page to 'Digital First' is a huge challenge. The cultural change will take new, technology-savvy people and new managers encouraging that change.

The Times created many beautiful digital showcases like “Snow Fall”. But the NY Times built “Snow Fall” and not a “Snow Fall building tool”.



Anyone working in difficult markets should be willing to experiment more with presentation formats, accept imperfection, accept failure, measure so you know what succeeds, and have the tools ready to expand on success.

Read this!

If you are in any way active in content strategy or news: Please read this report. See that you are not alone in your struggle and gorge on the valuable insights.

Thanks to a Dutch analysis from the Bladendokter that pointed me to the report.


Friday, May 16, 2014

MarkLogic World Amsterdam

On May 15th, the MarkLogic World Tour touched down in Amsterdam. Approximately 300 attendees flocked to the Amsterdam Arena to be immersed in MarkLogic and its possibilities. As a longstanding MarkLogic partner and implementer, my employer Dayon and HintTech sponsored this event. MarkLogic presented interesting new developments, case studies and technical sessions.

Big Content challenges 

Keynote speaker and MarkLogic CEO Gary Bloom and Joe Pasqua, Senior Vice President of Product Strategy, explained MarkLogic's choices for addressing Big Data challenges, but also the Big Content challenges (see my earlier blog) such as heterogeneously structured data: supporting multi-schema XML and changing schemas.

MarkLogic 8

MarkLogic 7 already introduced a lot of interesting new features; MarkLogic 8 raises the bar some more:

  • JavaScript & JSON support
    Building Applications on MarkLogic will be available to lots more developers: front-end oriented developers and other developers who prefer {} over <>
  • SPARQL 1.1 & Inferencing
    The Triple Store in MarkLogic 7 is a good starting point, but becomes much more complete with the addition of inferencing and SPARQL 1.1 support (see the small inferencing illustration after this list)
  • Bitemporal support
    A very interesting feature, which I could use immediately: What did we know at a certain point in time?
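MarkLogic's inferencing runs inside the database itself, which I have not test-driven yet; purely as a vendor-neutral illustration of what inferencing adds, here is a tiny RDFS example using the rdflib and owlrl Python libraries (the example.org vocabulary is made up):

from rdflib import Graph, Namespace
from rdflib.namespace import RDF, RDFS
import owlrl

EX = Namespace("http://example.org/")
g = Graph()
g.add((EX.Podcast, RDFS.subClassOf, EX.AudioContent))  # every Podcast is AudioContent
g.add((EX.Serial, RDF.type, EX.Podcast))               # Serial is a Podcast

# Expand the graph with everything that logically follows (RDFS semantics).
owlrl.DeductiveClosure(owlrl.RDFS_Semantics).expand(g)

# The triple (Serial, rdf:type, AudioContent) was never stated, only inferred.
print((EX.Serial, RDF.type, EX.AudioContent) in g)     # True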

Cases

I visited the more business-oriented tracks, with cases from Standards Norway, Intel, and the Healthcare.gov case, known to us as Obamacare. MarkLogic explained that their parts of the system kept performing throughout the toughest times. If only all components had been this reliable…
Some of our customers presented as well: Ted van Dongen of Swets told how a great tool needs good people to implement it. Thanks for the compliments, Ted!
Our former colleague Michel de Ru contributed on behalf of Newz to a panel session on Sharing Data despite the Silos.

Can you have too much of a good thing?

I will admit to anyone that I am biased: I think MarkLogic is a great tool that solves many challenges my customers face. But I had an interesting discussion: with all these new features, will MarkLogic become too monolithic a solution? Can you have too much of a good thing?
My 2 eurocents are: For now, keep piling on the good stuff, MarkLogic!

Monday, May 5, 2014

Big Content challenges

At Dayon, we are used to working with Big Data. Coming from a publishing background, we have provided content solutions to publishers since 1997.

I read some stories about Big Content, and was intrigued that Gartner saw Big Content as the unstructured part of Big Data. To me, Big Content is the structured version of Big Data.

Let me explain this and address some challenges and Big Content technologies.

Planned Variety

In terms of the three Big Data V's (Volume, Velocity and Variety), publishers' content is odd. Since the goal of publishers is to make a profit from providing content, content must be publishable to a vast range of channels. To enable this, content must be structured (preferably in XML) and enriched with metadata. Any Variety is planned, because unplanned Variety leads to unplanned structures and/or unplanned publications.

Data is generated, whereas Content is handcrafted. Tweets and Facebook posts are only lightly structured, but blog posts are already quite structured. Some numbers by Chartbeat can be found here, as well as a useful insight by Fast Company on the rise of "Big Content" as a marketing tool.

Publishers Content is usually completely structured: XML + Meta Data, sometimes already as RDF Triples (read my earlier Blog post on Semantic Technologies).

So to me, Content is structured Data. Big Content problems differ from other Big Data problems, where handling the Variety to understand your data is a big issue. Therefore, I would like to label the publisher's challenge a Big Content challenge.

So how big is Big Content?

A quick scan of some of our publishing clients provided these numbers (XML only!):

  1. Publisher 1: 10 million files, 25 GB
  2. Publisher 2: 750,000 files, 15 GB
  3. Publisher 3: 150 million files, 15 GB
  4. Publisher 4: 1 million files, 15,000 new files per day (max)
  5. Publisher 5: 45 million files, 20,000 new files per day (max)
  6. Publisher 6: 500,000 files

Challenges of Big Content

With these numbers in mind, what are the challenges for Big Content?
  1. Volume - XML: Are 30 million XML files a challenge? Or 25 GB of XML? It really should not be, but in reality I have met quite a few technologies struggling with these amounts. An XML system should be truly XML-native to handle this amount of data. XML isn't hard; doing XML right is hard. If you don't do XML right, 100,000 files or 1 GB of XML can give you plenty of headaches.
  2. Volume - Other file types: Alas, not all Content is XML. Many publishers still manage huge amounts of HTML, PDF or other file formats. With PDF, huge numbers often also turn into huge volumes, because multi-channel and hence print-quality PDF is stored. If you have to index lots of other file types, do a proper intake process per file and weed out the corrupt and the largest files.
  3. Volume - Subscriptions: At various clients I encountered the problem that Big Content is offered in a large number of different Subscriptions. Whereas a large number of different Subscriptions is not a problem in itself, the combination of Big Content and a Big number of Subscriptions often is. So if you offer lots of data, be smart about the number of Subscriptions.
  4. Volume - Triples: Nearly all publishers storing Big Content are looking into Triples as a way to store and link the Meta Data from their XML files. Storing your Meta Data in a Triple Store and linking it to the Linked Open Data cloud can be a very good idea, but it calls for a big Triple Store. A set of 1 billion Triples isn't exceptional, and it too requires Big Content Technology.
  5. Velocity - Real Time Indexing: Failing at real-time indexing is usually the first sign that you are becoming a Big Content publisher. Many technologies struggle with incremental updates and need complete re-indexing, which in turn leads to strange solutions such as overnight indexing, flip-flopping, or indexes that are out of sync with the rest of the front-end.
  6. Velocity - Real Time Alerting: The value of Content depends on its relevance, and timeliness is a huge factor in relevance. Real-time alerting will offer a competitive edge to content users. To provide real-time alerting, the XML store needs to handle alerting efficiently (using minimal resources) at load time.
  7. Variety - Presentation: A Big Content challenge can be how to present all of this Content. If a simple "What's New" view results in 20,000 hits, what are you going to show the customer? The most used solutions are:
    1. Provide a search-only interface
    2. Provide as much structure from Meta Data as possible to assist the user in drilling down to the most useful Content (see the faceting sketch after this list)
  8. Variety - Enrichment: If the Meta Data you need to provide useful segmentation of your Big Content to your end users just isn't there, additional Enrichment is needed. Big Content will (due to costs) call for automated enrichment using Natural Language Processing.
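To make the drill-down option in point 7 a bit more concrete, here is a minimal sketch of a faceted query against a SOLR index using the pysolr client. The core URL and the field names (topic, doctype, publication_date) are assumptions about the schema, not an existing setup:

import pysolr

# Assumed SOLR core and field names - adjust to your own schema.
solr = pysolr.Solr('http://localhost:8983/solr/bigcontent', timeout=10)

results = solr.search('*:*', **{
    'fq': 'publication_date:[NOW-7DAYS TO NOW]',   # only the "What's New" window
    'facet': 'true',
    'facet.field': ['topic', 'doctype'],           # metadata fields used for drill-down
    'rows': 20,                                    # show 20 hits, not 20,000
})

print(results.hits, 'documents in the last week')
print(results.facets['facet_fields']['topic'])     # e.g. ['Ebola', 42, 'Elections', 17, ...]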

Big Content Technology

At Dayon / HintTech we strongly believe that Big Content challenges require specialized Big Content Technology. Here are some of the Big Content Technologies we have implemented:
  1. MarkLogic
    Several of our Big Content clients have selected MarkLogic as their content platform. I believe that MarkLogic is the best XML store and indexer available at this moment.
    As a big bonus, MarkLogic comes with all kinds of useful features such as XQuery, an Application Server, and now even a Triple Store.
    Find out more about MarkLogic at MarkLogic World Amsterdam and meet us there!
  2. OWLIM
    In our project at Newz we needed a Big Content Triple Store. We found OWLIM by Ontotext to provide an excellent Big Triple Store, as did BBC and Press Association.
    W3C maintains a list of Big Triple Stores, with BigOWLIM as one of the top products.
    We also selected Ontotext as our partner for their Semantic Tagging capabilities.
  3. SOLR
    We have also implemented SOLR for Big Content collections. SOLR will not address all of the Big Content challenges, but it is a great open source search engine.

PS: After writing this blog, I feel like renaming Meta Data to Meta Content. Probably better if I don’t…

Thursday, April 24, 2014

Using Semantic Technologies to crunch Big Data

The interest in Big Data (nice post by a colleague on this subject) has sparked a new interest in Semantic Technologies. It is clear that the Volume and Variety of Big Data require technologies that can structure and segment Big Data into useful and usable structures. For this, Semantic Technologies are used.

However, there are different kinds of Semantic Technologies around, so I will start off with an introduction to Semantics and the Semantic Web. Next I will cover two key Semantic Technologies to arrive at the goal of this introduction: how Semantic Technologies help us to crunch Big Data.

Semantics and Semantic Web

In 2001 Tim Berners-Lee and others published an article in Scientific American: "The Semantic Web: A new form of Web content that is meaningful to computers will unleash a revolution of new possibilities."

This title provides a first definition of Semantic Data:
"Content that is meaningful to computers"
Tim Berners-Lee understood that HTML web pages were useful to humans, but since they were (and often still are) encoded to store visual information rather than the meaning of the information, they were of little use to automated systems trying to understand them.

To be meaningful for computers, content has to be encoded in such a way that the meaning is clear, and can be processed automatically. XML is the first step in this. The progress so far is:

HTML: <p>Tim Berners-Lee</p>
XML: <author>Tim Berners-Lee</author>

The computer can now apply a formatting style to all authors, and can index them separately, but it still cannot use the meaning of the concept "Author" or distinguish this Tim Berners-Lee from any other Tim Berners-Lee (if you think this is a silly example, please visit the Michael Jackson disambiguation page on Wikipedia: http://en.wikipedia.org/wiki/Michael_Jackson_(disambiguation)).

Wikipedia defines Semantic Technology:
"Using Semantic Technologies, meanings is stored separately from data and content files, and separately from application code"
So in our example, the author role is matched to a central definition for the creation of documents, preferably using a standard such as the Dublin Core standard “DC.creator”.

XML: <author>Tim Berners-Lee</author>
RDF expressed in XML: 
<rdf:Description dc:title="The Semantic Web">
    <dc:creator>Tim Berners-Lee</dc:creator>
</rdf:Description>

In the next step we can replace "The Semantic Web" and "Tim Berners-Lee" with Uniform Resource Identifiers (URIs). For ease of understanding, the URI for Tim Berners-Lee could be: http://en.wikipedia.org/wiki/Tim_Berners-Lee and the article could be referenced as: http://dx.doi.org/10.1038/scientificamerican0501-34.

So from a formatted piece of text we arrive at a well-defined relation between two specific URIs. A computer can now apply logic, based on understandable definitions and relationships.



Such a relationship is called a “Triple”: consisting of three pieces of information – from left to right: a Subject, a Predicate and an Object – together describing a piece of knowledge.

The de facto standard for expressing Semantic information is the W3C's Resource Description Framework (RDF).
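As a small illustration, the triple above can be written down in a few lines of Python with the open source rdflib library, using the two URIs mentioned earlier and the Dublin Core creator relation:

from rdflib import Graph, URIRef, Namespace

DC = Namespace("http://purl.org/dc/elements/1.1/")
article = URIRef("http://dx.doi.org/10.1038/scientificamerican0501-34")
author = URIRef("http://en.wikipedia.org/wiki/Tim_Berners-Lee")

g = Graph()
# Subject, predicate, object: "this article was created by Tim Berners-Lee".
g.add((article, DC.creator, author))

print(g.serialize(format="turtle"))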

So what do we need to make the Semantic Web work?

  1. Well-defined relations – like the Dublin Core relations, expressed for instance in RDF Schemas
  2. A way to store a multitude of triples: A Triple Store
  3. Vocabularies: the concepts and relationships between them that describe a certain domain.
  4. Semantic Enrichment to create triples from unstructured data

This article is about Semantic Technologies, so let's look at how Triple Stores and Semantic Enrichment will help us get to our goal: Linked Big Data.

Triple Stores

Triples are a specific way to store information. To use Triples in an effective way – querying using SPARQL and reasoning – a special database is needed to store these graph structures. These databases are called Triple Stores.

From the given example, it is easy to understand that a vocabulary plus a dataset can expand into millions or billions of triples. Performance – of both ingestion and querying – is an important consideration.

Some of the better known Triple Stores are Sesame and Jena for smaller implementations and OWLIM, MarkLogic and Virtuoso for large implementations.
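Whichever store you choose, the query side looks much the same, because SPARQL is the standard query language. A minimal sketch, here using rdflib's small in-memory store (a real Triple Store would expose the same kind of SPARQL endpoint at much larger scale):

from rdflib import Graph, URIRef, Namespace

DC = Namespace("http://purl.org/dc/elements/1.1/")
g = Graph()
g.add((URIRef("http://dx.doi.org/10.1038/scientificamerican0501-34"),
       DC.creator,
       URIRef("http://en.wikipedia.org/wiki/Tim_Berners-Lee")))

# Find every work created by Tim Berners-Lee.
query = """
    PREFIX dc: <http://purl.org/dc/elements/1.1/>
    SELECT ?work
    WHERE { ?work dc:creator <http://en.wikipedia.org/wiki/Tim_Berners-Lee> . }
"""
for row in g.query(query):
    print(row.work)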

Semantic Enrichment Technologies

To use the Big, we have to understand the Data. In an ideal world, data is created according to a well organised ontology.

Alas, in most cases Big Data is created with no ontology present. To create structure from unstructured data (or structured with a different goal in mind) we need automatic recognition of meaning from our data.

This usually starts with recognising types of information using Semantic Enrichment Technologies. Semantic Enrichment Technologies are a collection of linguistic tools and techniques, such as Natural Language Processing (NLP) and artificial intelligence (AI), that analyse unstructured natural language or data and try to classify and relate it.

By identifying the parts of speech (subject, predicate, etc.), algorithms can recognise categories, concepts (people, places, organisations, events, etc.), and topics. Once analysed, text can be further enriched with vocabularies, dictionaries, taxonomies, and ontologies (so regardless which literal is used, concepts are matched, for example: KLM = Koninklijke Luchtvaart Maatschappij = Royal Dutch Airlines).
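As a rough sketch of what that first recognition step looks like, here is a tiny named entity recognition example using the open source spaCy library; the model name is spaCy's standard small English model, and the mapping of entities to a vocabulary is only hinted at in the final comment:

import spacy

# Requires the small English model: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

text = ("Tim Berners-Lee published an article in Scientific American "
        "while working in Cambridge, Massachusetts.")

doc = nlp(text)
for ent in doc.ents:
    # ent.label_ is the recognised category: PERSON, ORG, GPE (place), etc.
    print(ent.text, ent.label_)

# A next step would match each entity to a concept in a vocabulary or ontology,
# e.g. "Tim Berners-Lee" -> http://en.wikipedia.org/wiki/Tim_Berners-Lee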

This layer of linked metadata over our data creates Linked Data.

The quality of enrichment will range from (nearly) 100% for literally translated content down to 90% or less, depending on the amount of training data available.

Linked Big Data

So Semantic Enrichment Technologies give us the opportunity to turn Big Data into Linked Big Data.
Tim Berners-Lee defined Linked Open Data to comply with the following 5 rules:

  1. Available on the web (whatever format) but with an open licence, to be Open Data
  2. Available as machine-readable structured data (e.g. Excel instead of an image scan of a table)
  3. As (2) plus a non-proprietary format (e.g. CSV instead of Excel)
  4. All the above, plus: use open standards from W3C (RDF and SPARQL) to identify things, so that people can point at your stuff
  5. All the above, plus: Link your data to other people’s data to provide context

Governments and other public organisations are putting much effort in providing Linked Open Data for citizens and organisations to use.

Commercial organisations will likely not openly publish their data, but they will use the same standards as Linked Open Data (such as HTTP, URIs and RDF) and therefore have similar implementations for Linked Big Data.



Some examples of Big Linked Data and Big Open Data initiatives:

  1. Linked Open Data in the Netherlands, UK and USA 
  2. Linked Open Data sources such as DBpedia, which essentially makes the content of Wikipedia available in RDF and also links to GeoNames for geographical locations, and Freebase, a community-curated database of well-known people, places, and things
  3. A browse interface for triple stores 
  4. Enriched Dutch Newspaper articles via Newz 
  5. Dutch Laws in RDF
  6. Europeana opens up European cultural history collections 

So what’s in it for me?

Does your organisation create or own lots of unstructured data? Hidden in there probably is a wealth of knowledge, which you can access:

  1. Find out what structure (ontology) fits your needs
  2. Use Semantic Enrichment Technologies to create structure from your unstructured data
  3. Store your data in a Triple Store
  4. Start exploring, learn & earn

I will post more on Triple Stores and Semantic Enrichments in future blogs.

Tuesday, April 15, 2014

Accessibility

A nice quote passed by on Twitter, attributed by Stephanie Rieger to the gov.uk project:
"Making content more understandable has done more for accessibility than almost anything we could have done with code"
As a content strategist working for an IT company, I think this is a wonderful quote. Of course the technology is important, and everything has to work properly.

But many websites fail to reach their full potential by wrapping their useful message in incomprehensible jargon or unclear assumptions.

Thursday, April 3, 2014

Article platforms: less is more

A few months ago I wrote a blog post when a number of new article platforms launched. Blendle and eLinea have since been joined by MyJour and Artikelgemist. Now that the dust of the launches is settling, it is time to look again at what these initiatives have actually brought the user.

Long live choice!

Just as in the previous review, it is only fair to start with some words of praise for the platforms:

  1. There are different payment models: both pay-per-use and all-you-can-read are on offer
  2. A large number of newspapers and magazines is available. Where Blendle mainly offers the big daily newspapers, eLinea and MyJour offer many of the big magazines, and Artikelgemist offers the smaller names
  3. Blendle, eLinea and MyJour have worked hard on usability. Blendle shows the full pages of the newspaper, eLinea has topics, and all of them take a fresh new approach to accessing articles
  4. There are different price points per article, and Blendle even has a money-back button if an article disappoints

These platforms have achieved a lot. They genuinely meet a need that the publishers could not fulfil. Kudos for that!

The next problem

Solving a big problem has the side effect of making the next problem visible, and in my opinion the next problem is that of curation.

Every news source pre-selects the news it offers its customers. With the arrival of free curation (radio, TV, Spits and Metro, nu.nl, etc.) the value of this function has declined. That is one of the reasons fewer and fewer people pay for a newspaper subscription.

Yet curation is becoming ever more important. There is a barrage of news. Just click through the start pages of MyeBlinia. My Feedly page easily collects 500 potentially interesting articles per week. And I am surely still missing a lot.

Where Blendle has a built-in curation formula thanks to its choice for complete editions, Artikelgemist only has a search function. And where MyJour is perfectly able to sell a single article and even makes useful suggestions, its front page makes it clear that MyJour is not a starting point but an end point: a point of sale for the article I already know I want.

A good paid or crowd-sourced curation tool is the next step towards a fully open usage model for articles.

Paying again

We will have to get used to paying for a nice article again. € 0.19 for a nice article really is not much, but it is a small threshold. I once put € 20 of credit on an iTunes-like service and still enjoy it regularly. Putting € 20 on MyeBlinia is probably a good idea too. But preferably from a single balance, please.

PS: My colleague wrote a nice Blendle review

Thursday, March 20, 2014

Top 4 Web based XML Editors

Why web based XML editing?

Publishing natively from XML is still the best solution for most publishing demands, and not only if you want to publish to print. Products will be more functional if they are formatted on the basis of their meaning, not their HTML appearance.

For years, XML editing was cumbersome. Costly, non-intuitive XML editors drove all but the most technical editors crazy, and induced Tag-Terror in others. Plus, they needed to be installed on your local system.
Many developers have tried to use Word as an XML editor, but only the bravest have succeeded. Applying schemas after a submit, or translating schemas to templates, always resulted in a lot of manual correction or unsatisfied editors.

The dream remained of a Web based XML Editor with the following qualifications:

  • WYSIWYG editing
  • No local installation
  • Can be used by many authors
  • Minimal training necessary
  • Guaranteed schema compliance (see the validation sketch below)
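That last point, guaranteed schema compliance, is also easy to double-check server-side once a document comes back from a web editor. A minimal sketch using Python's lxml library, with placeholder file names for the XSD schema and the submitted document:

from lxml import etree

# Placeholder file names - use your own schema and submitted document.
schema = etree.XMLSchema(etree.parse("article.xsd"))
doc = etree.parse("submitted-article.xml")

if schema.validate(doc):
    print("Document complies with the schema")
else:
    # error_log lists every violation, with line numbers.
    for error in schema.error_log:
        print(error.line, error.message)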

Xopus was one of the first Bracket-free online XML Editors, but there are more contenders now. A concise overview of the top web based XML Editors (and not just DITA!):


Xopus

Xopus was one of the first real web-based XML Editors and received huge attention. Eventually early adopters noticed that it still enforced XML rules: it did not provide MS Word-like editing – as can be expected – but it was clearly a big step.
Xopus was acquired by SDL last year, probably to support the new LiveContent product. Version 4.3 of Xopus was launched in February 2014.


FontoXML

In the second half of 2013, FontoXML was launched. FontoXML offers a very intuitive interface and some really nice real-time features.


Xeditor

In 2013, Appsoft launched Xeditor, also a very user-friendly web-based XML editor.




<oXygen/>

<oXygen/> is known for its XML products, but it has also seen the need for online editing. An Author Demo Applet for editing DITA can be found on the <oXygen/> website.


You choose!

Which XML editor suits you best depends on your specific editing needs, but I am happy that there are now multiple vendors competing to meet them.

What is the cure for the content shock?

In January, Mark Schaefer published an article that received lots of attention: "Content Shock: Why content marketing is not a sustainable strategy". When most companies are just getting used to the concept that telling convincing and compelling stories is a great way to market products, Schaefer already predicts its demise.

Is the content shock real?

Are we inundated with content? Sure! Just look at the amounts of online content produced / uploaded each day. More and more different channels are trying to reach us, through more and more types of content.
But this is not a new phenomenon. Consumers have been battling information overload since the rise of mass media. This is why the born-digital generation is getting better and better at curation.

Is the end of content marketing in sight?

Schaefer certainly has a point that mass adoption of content marketing will dilute the effects of writing content. It will be harder for companies to reach consumers with just any blurb of content, and eventually the costs of this way of marketing will have to be weighed against the expected income (just like any other type of marketing). On the consumer side it will be harder and harder to digest all this content and to filter the useful content out of the increasing background noise.
But will this end most content marketing? Schaefer himself looks at the economics and predicts that the costs of being relevant to your consumers will rise to levels that are unacceptable to most companies.

He certainly makes a good point. With the barrage of information, entering markets will get harder. And interesting marketing channels like video, audio and infographics are expensive compared to text.
The effort will be to create great content: relevant content. And where great content is no guarantee of success, creating mediocre content will increasingly be pointless. And yes: great content comes at a price, but distributing your ad on TV or in print was expensive as well. So pay less for the channel and more for the content. Convince yourself that your message is relevant to your customers before investing in creating it.

As Robert Rose said “Great content wins. End of story.” Ask any commercial publisher!

It is hard on the consumers as well

With the barrage of content, it is hard on the customer as well. But everyone will still search their personal net before their Zero Moment of Truth. What is the best digital camera? To which school should I send my kids?
So the only way for the consumer to react to the content shock and keep finding the true gems that are out there somewhere is by curation. Customers are increasingly improving their personal filters to battle lame advertisements.
Curation tools – like spam and virus filters – will probably always lag behind whatever they are trying to conquer. So what can you use to curate your information streams?

  • Niche sites that focus on quality over quantity. I myself am a fan of a few sites on Content Strategy; their hit-over-miss ratio is so good that I read anything posted there.
  • Human curated groups. For instance LinkedIn Groups offers some great possibilities.
  • Feedly tells me what's posted on the important sites, but also what the impact of this news is
  • There are some authoritative sites, which I will have to trust, just because they are the biggest
  • Follow influencers on Twitter, use other tools like Flipboard etc.

So there is the answer for you content marketers: throw your best content at us! The relevant pieces will always stick, but don't expect to be a monopolist.

The original blog-thread:

  1. Mark Schaefer: Content Shock
  2. Reaction from the Content Marketing Institute
  3. Reaction from Businesses Grow
  4. Reaction from Copy Blogger
  5. Tips from the Content Marketing Institute