JSON-LD

Date: Wed 06 Mar 2019

Introduction

At the beginning of the semester I discussed that we were going to learn how to add semantic data (that is, machine readable data) to our websites in two ways:

  1. By using HTML5 semantic elements
  2. By using JSON-LD

We've learned and used many of the HTML5 semantic elements already, such as nav, header, footer, and article. Although these semantic elements only provide broad, general semantic information about the structure and parts of an HTML page, they do work. For example, consider that my use of the nav element has resulted in Google providing navigational links in the snippet for my website:

Google search result for homepage of cseanburns.net

However, in order to provide a detailed description of a web page's content, we need to use an additional technology. Therefore, this week we start to learn and use JSON-LD.

JSON-LD is used to add semantic data to web pages; that is, machine-readable data that may be used to add meaning about the content on a page in such a way that will allow machines (software) to interpret and use that content appropriately. It is based on the JSON format, or the JavaScript Object Notation format. This is a format that, despite the name, is language agnostic and "is built on two structures: a collection of name/value pairs" and "an ordered list of values".

Upon first glance, there does not seem to be much of a difference between a JSON and a JSON-LD record. What makes the latter special is that it uses a vocabulary that establishes shared definitions of terms (i.e., it links data). This means that if we were to use JSON-LD to describe a Movie, we could provide a definition of a movie, via a link to a vocabulary, and then add data about that definition based on the special properties that movies have. For example, movies generally have the following special properties:

  • actors
  • directors
  • duration
  • trailers
  • music

And so forth.

To apply this technology, we therefore have to learn a few tools. Specifically, we have to learn:

  1. How to understand and create JSON
  2. How to use a vocabulary. We'll use schema.org.
  3. How to create a JSON-LD file using schema.org.
  4. How to add this to our website.

JSON

Per the JSON website, JSON is "a lightweight data-interchange format," which simply means it's a data format used to send, receive, and store data across systems.

One of the benefits of JSON is that it is not only a machine-readable format, but it's also a fairly easy format for us to write and read. We'll start with an example of a JSON object. All objects begin with a left curly brace and end with a right curly brace:

{
}

Then, within that object, we place a series of name and value pairs. If it helps, think of the names as variables and the values as the data assigned to those variables. Names are always strings in double quotes. For our values, we can use a number of different data types, including objects, arrays (lists), strings (text), numbers, Booleans (true or false), and null. Strings must be placed within double quotes; numbers, true, false, and null are left unquoted.
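To see these data types in action, here is a small sketch in Python. Python is only my choice for illustration; JSON does not require it. The record and all of its names are made up. The sketch uses Python's standard json module to read a JSON object and report how each value was interpreted.

```python
import json

# A made-up JSON object that uses each of the basic JSON data types.
text = """
{
  "title": "Example Show",
  "seasons": 3,
  "rating": 8.5,
  "finished": true,
  "spinoff": null,
  "genres": ["drama", "history"],
  "network": {"name": "Example TV"}
}
"""

# json.loads turns JSON text into Python values.
record = json.loads(text)

# Print each name with the Python type its value was converted to.
for name, value in record.items():
    print(name, "->", type(value).__name__)
```

Note that true becomes Python's True, null becomes None, an array becomes a list, and a nested object becomes a dict.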

Since we're describing objects, we can think of this process as creating metadata for an object. If you remember from ICT 201, I used the TV show Downton Abbey in my lecture on metadata (lecturers for other sections may have used a different example). Let's use that show here to describe it in JSON (at least a little bit of it):

{
"title": "Downton Abbey",
"creator": "Julian Fellowes",
"writer": [
  "Julian Fellowes",
  "Shelagh Stephenson",
  "Tina Pepler"
  ],
"actor": [
  "Hugh Bonneville",
  "Jessica Brown Findlay",
  "Laura Carmichael",
  "Jim Carter",
  "Brendan Coyle"
  ],
"country": "United Kingdom",
"language": "en-gb",
"seriesLength": 6,
"noOfEpisodes": 52
}

So, how do we parse that? What details do we need to know about that format? Let's take a look at the components of that JSON object:

JSON object describing parts of Downton Abbey
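Software can read the same components. As a minimal sketch, assuming we use Python and its standard json module (my choice for illustration only), we can load the Downton Abbey object and pull out individual values by name:

```python
import json

# The Downton Abbey JSON object from above, stored as text.
text = """
{
"title": "Downton Abbey",
"creator": "Julian Fellowes",
"writer": ["Julian Fellowes", "Shelagh Stephenson", "Tina Pepler"],
"actor": ["Hugh Bonneville", "Jessica Brown Findlay", "Laura Carmichael",
          "Jim Carter", "Brendan Coyle"],
"country": "United Kingdom",
"language": "en-gb",
"seriesLength": 6,
"noOfEpisodes": 52
}
"""

downton = json.loads(text)

print(downton["title"])        # a string value, looked up by its name
print(downton["actor"][0])     # the first item in an array
print(len(downton["writer"]))  # how many items an array holds
# Numbers are unquoted, so we can do arithmetic with them directly.
print(downton["noOfEpisodes"] / downton["seriesLength"])
```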

schema.org: A vocabulary

In the above example, we used JSON's name and value pairs to describe a little bit of the show Downton Abbey. Although the names we used were pretty straightforward (e.g., title, creator, writer, etc.), I used them mainly because they were conventional and not because they were linked to a specific vocabulary. This is fine under some scenarios, but in order to create linked data, we do need to use terms that have been properly defined. To do that, we'll use schema.org. Although there are multiple vocabularies to choose from, schema.org is a good one for creating linked data for search engines, since it was, in part, the result of a collaboration of search engines, including Google, Microsoft, Yahoo, and Yandex.

schema.org is a vocabulary that is organized into "two hierarchies: one for textual property values, and one for the things they describe"; roughly, one for the kinds of values we can write and one for the kinds of things we can name and describe. In our previous example, we used terms such as title as a name and Downton Abbey as the textual property value for that name. I chose those names because they were fairly conventional, but schema.org provides a vocabulary of defined terms that create a common meaning for machines to understand.

The schema.org hierarchy is pretty large and covers many different types of objects. We can use the vocabulary to name actions, creative works, organizations, persons, places, products, and more. When doing so, we then use specific data types (such as Text, Number, or Date) for the values that describe those names.

Since schema.org is a hierarchy, terms in the hierarchy exist at broader and narrower levels. For example, Place is a Thing, but it also has more specific types, such as Landform, which in turn has more specific types, such as Continent or BodyOfWater, and the latter has still more specific types, such as Canal, Pond, or Waterfall. Viewing the schema.org page for any of these types shows the properties it can have, plus examples of how to use and describe them.
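One way to picture the broader and narrower levels is as a table that records each term's parent. The Python sketch below is only an illustration: the table is a small subset that I copied by hand from the terms mentioned above, not something downloaded from schema.org, and the ancestors function is a helper I made up.

```python
# Hand-copied subset of the schema.org hierarchy: each term maps to its parent.
PARENT = {
    "Place": "Thing",
    "Landform": "Place",
    "Continent": "Landform",
    "BodyOfWater": "Landform",
    "Canal": "BodyOfWater",
    "Pond": "BodyOfWater",
    "Waterfall": "BodyOfWater",
}


def ancestors(type_name):
    """Return the chain from a term up to Thing, narrowest term first."""
    chain = [type_name]
    # Keep climbing until we reach a term with no recorded parent (Thing).
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


# Show the path from the broadest term down to Canal.
print(" > ".join(reversed(ancestors("Canal"))))
# prints: Thing > Place > Landform > BodyOfWater > Canal
```

Because Canal sits below BodyOfWater, Landform, Place, and Thing, a record about a canal can borrow properties defined at any of those levels.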

JSON-LD

When describing something using schema.org, like the Downton Abbey TV show, it's best to search schema.org to see if it contains a vocabulary item that best matches the object to be described. As it turns out, schema.org has a type called TVSeries, which is a more specific kind of CreativeWork, which is in turn a more specific kind of Thing. We can borrow properties from any of these to describe the show in JSON-LD. Examining the page for TVSeries, we see some helpful terms. I'll use those terms to replace the terms I had made up in the JSON-only record, and I'll add two special names: @context, which points to the vocabulary, and @type, which states the kind of thing being described:

{
"@context": "https://schema.org/",
"@type": "TVSeries",
"name": "Downton Abbey",
"creator": "Julian Fellowes",
"author": 
  [
  "Julian Fellowes",
  "Shelagh Stephenson",
  "Tina Pepler"
  ],
"actor": [
  "Hugh Bonneville",
  "Jessica Brown Findlay",
  "Laura Carmichael",
  "Jim Carter",
  "Brendan Coyle"
  ],
"countryOfOrigin": "United Kingdom",
"inLanguage": "en-gb",
"numberOfEpisodes": 52,
"numberOfSeasons": 6
}

In the end, that's all there really is to it.
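If it helps to see the change from plain JSON to JSON-LD as a process, here is a rough Python sketch. It assumes a shortened version of the earlier record. The RENAME table and the to_json_ld function are names I made up for this illustration; the table simply lists which of my made-up names were swapped for which schema.org terms.

```python
import json

# A shortened version of the plain JSON record, using my made-up names.
plain = {
    "title": "Downton Abbey",
    "creator": "Julian Fellowes",
    "country": "United Kingdom",
    "language": "en-gb",
    "seriesLength": 6,
    "noOfEpisodes": 52,
}

# Made-up name -> schema.org term used in the TVSeries record.
RENAME = {
    "title": "name",
    "writer": "author",
    "country": "countryOfOrigin",
    "language": "inLanguage",
    "seriesLength": "numberOfSeasons",
    "noOfEpisodes": "numberOfEpisodes",
}


def to_json_ld(record, schema_type):
    """Add @context and @type, then rename any name found in RENAME."""
    linked = {"@context": "https://schema.org/", "@type": schema_type}
    for key, value in record.items():
        # Names without an entry in RENAME (e.g., creator) are kept as-is.
        linked[RENAME.get(key, key)] = value
    return linked


print(json.dumps(to_json_ld(plain, "TVSeries"), indent=2))
```

In practice you would choose the terms by reading the schema.org page for the type, as we did above. The table is just a record of those choices.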

Adding JSON-LD to a webpage

The last step is to add the JSON-LD to a web page. It's important to note that you should only add JSON-LD content that matches the content found in the HTML on the page. If you add content that is missing from the page or doesn't mirror what is on the page, then search engines might demote the page.

To add the JSON-LD, place it in a script element within the head of the HTML page, perhaps just before the closing head tag, like so:

<script type="application/ld+json">
{
"@context": "https://schema.org/",
"@type": "TVSeries",
"name": "Downton Abbey",
"creator": "Julian Fellowes",
"author": 
  [
  "Julian Fellowes",
  "Shelagh Stephenson",
  "Tina Pepler"
  ],
"actor": [
  "Hugh Bonneville",
  "Jessica Brown Findlay",
  "Laura Carmichael",
  "Jim Carter",
  "Brendan Coyle"
  ],
"countryOfOrigin": "United Kingdom",
"inLanguage": "en-gb",
"numberOfEpisodes": 52,
"numberOfSeasons": 6
}
</script>
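
We will add the script element by hand, but the same step can be automated. The Python sketch below is only an illustration under a few assumptions: the page is a made-up one-line HTML string, add_json_ld is a function name I invented, and the page is assumed to contain a single lowercase closing head tag.

```python
import json


def add_json_ld(html, data):
    """Insert a JSON-LD script element just before the closing head tag."""
    # Writing "<" as \u003c keeps a stray "</script>" inside a value
    # from ending the script element early; it is still valid JSON.
    body = json.dumps(data, indent=2).replace("<", "\\u003c")
    script = '<script type="application/ld+json">\n' + body + "\n</script>\n"
    # Assumes the page has exactly one lowercase </head> tag.
    return html.replace("</head>", script + "</head>", 1)


# A made-up page and a shortened JSON-LD record.
page = "<html><head><title>Downton Abbey</title></head><body></body></html>"
data = {
    "@context": "https://schema.org/",
    "@type": "TVSeries",
    "name": "Downton Abbey",
}

print(add_json_ld(page, data))
```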
teaching/json-ld.txt · Last modified: 2019/02/25 15:07 by seanburns