Introduction to XML

Learn about XML

XML stands for Extensible Markup Language. It defines a way of structuring documents so that both humans and computers can read them. If that sounds super general, that’s because it is.

Unlike most programming languages (and even other markup languages like HTML), XML has no opinions about the purpose or type of data it encodes. Rather, it is a set of rules for how data should be organized.

Let’s look at an example of XML in action:

<drivers-license>
  <name>Mike Dane</name>
  <sex>Male</sex>
  <eye-color>Brown</eye-color>
  <organ-donor />
</drivers-license>

In the code above, we’re using XML to encode driver’s license information.

To represent our driver’s license information meaningfully, we want to indicate both the type of information we’re representing (e.g., name) and the specific value for this driver (Mike Dane).

XML allows us to do this easily by using tags.

Container Tags

Tags are special containers in XML where we can store specific pieces of information. For the name, sex and eye color fields above, we’re using container tags and storing the driver’s information inside of them.

<eye-color>Brown</eye-color>

Container tags consist of two parts: a start tag (<eye-color>) and an end tag (</eye-color>). Both contain the element name (eye-color), and the end tag adds a forward slash before it.

The element name in the code above, eye-color, is placed inside angle brackets (the less-than and greater-than symbols) and defines the type of information contained within the tags.

With the information encoded in this way, a computer program could parse through the XML document and figure out that the driver’s eye color is Brown by looking inside the eye-color container tags.
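As a rough sketch of what that parsing might look like, the snippet below reads the eye color out of the driver’s license document using Python’s standard-library xml.etree.ElementTree module. The choice of Python and the variable names are assumptions made for illustration; nothing about XML requires a particular language or parser.

```python
import xml.etree.ElementTree as ET

# The driver's license document from the example above, as a string.
LICENSE_XML = """
<drivers-license>
  <name>Mike Dane</name>
  <sex>Male</sex>
  <eye-color>Brown</eye-color>
  <organ-donor />
</drivers-license>
""".strip()

# Parse the text into a tree; `root` is the <drivers-license> element.
root = ET.fromstring(LICENSE_XML)

# findtext returns the text inside the first child with the given element name.
eye_color = root.findtext("eye-color")
print(eye_color)  # prints: Brown
```

The program never needs to know in advance where the eye color sits in the document. It simply asks for the contents of the eye-color element by name.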

Tags used for grouping

We can also use these container tags to group related data together.

<drivers-license>
  <name>Mike Dane</name>
  <sex>Male</sex>
  ...
</drivers-license>

Notice in our driver’s license document that we’re also defining a <drivers-license>...</drivers-license> container tag, which has other, more specific elements inside of it. This is a common practice, and is often used to wrap related data into groups.

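In theory, a single XML document might store hundreds of driver’s licenses, each contained in its own <drivers-license> element. Because a well-formed XML document has exactly one root element, those licenses would all sit inside one outer grouping tag.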
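To make the idea of many licenses in one document concrete, here is a small Python sketch using the standard-library xml.etree.ElementTree module. The outer <licenses> element is a hypothetical name chosen for this example (a well-formed XML document needs a single root element to hold the group), and the second driver is made-up sample data.

```python
import xml.etree.ElementTree as ET

# <licenses> is a hypothetical root element; "Jane Roe" is made-up sample data.
LICENSES_XML = """
<licenses>
  <drivers-license>
    <name>Mike Dane</name>
    <eye-color>Brown</eye-color>
  </drivers-license>
  <drivers-license>
    <name>Jane Roe</name>
    <eye-color>Green</eye-color>
  </drivers-license>
</licenses>
""".strip()

root = ET.fromstring(LICENSES_XML)

# findall returns every direct child with the given element name, in document order.
licenses = root.findall("drivers-license")
names = [lic.findtext("name") for lic in licenses]
print(len(licenses), names)
```

Because each license is wrapped in its own container tag, the program can treat every driver as a separate unit and never mixes up one driver’s name with another driver’s eye color.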

We could do something similar with the driver’s name field:

<drivers-license>
  <name>Mike Dane</name>
  ...
</drivers-license>

Instead of storing their first and last name in the same <name></name> tag, we’ll use the name tag as a grouping container and create separate tags for first and last:

<drivers-license>
  <name>
    <first>Mike</first>
    <last>Dane</last>
  </name>
  ...
</drivers-license>

In the example above we now have three levels of nesting: drivers-license -> name -> first/last. This is one of the great things about XML: it’s useful not only for storing data, but also for preserving the hierarchical relationships within that data.
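The hierarchy is something a program can follow directly. The Python sketch below (standard-library xml.etree.ElementTree; the depth helper is a hypothetical function written for this example) walks from drivers-license down to first and last, and counts the levels of nesting.

```python
import xml.etree.ElementTree as ET

NESTED_XML = """
<drivers-license>
  <name>
    <first>Mike</first>
    <last>Dane</last>
  </name>
</drivers-license>
""".strip()

root = ET.fromstring(NESTED_XML)

# A slash-separated path walks down the hierarchy one level at a time.
first = root.findtext("name/first")
last = root.findtext("name/last")


def depth(element):
    """Count nesting levels from this element down to its deepest descendant."""
    return 1 + max((depth(child) for child in element), default=0)


print(first, last, depth(root))  # prints: Mike Dane 3
```

The path "name/first" mirrors the nesting in the document, which is why grouping tags make the data easier to navigate as well as easier to read.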

Single Tags

Container tags are used when we want to encode specific data values like name and eye color, or to group more specific elements, as with drivers-license or name. But you’ll also notice another type of tag in the driver’s license XML: the one used for the <organ-donor /> element.

<drivers-license>
  ...
  <organ-donor />
</drivers-license>

This is an example of a single tag (the XML specification calls it an empty-element tag). Single tags are self-closing, meaning we need only one tag rather than separate opening and closing tags.

The driver’s organ donor status lends itself to a single tag because it doesn’t need to enclose a specific value. A driver is either an organ donor or they’re not. So in the case of the driver’s license XML, if the <organ-donor /> element is there, we know they’re an organ donor, and if it’s not, we know they’re not. We can get all the information we need from the existence of the tag.

We indicate a single tag by putting a forward slash / after the element name, just before the closing greater-than symbol.
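Checking for the presence of a single tag is straightforward in code. The following Python sketch (standard-library xml.etree.ElementTree) tests whether an organ-donor element exists; the function name is_organ_donor and the two sample strings are assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def is_organ_donor(license_xml: str) -> bool:
    """Return True if the license contains an organ-donor element."""
    root = ET.fromstring(license_xml)
    # find returns None when no matching child exists. Compare against None
    # explicitly rather than relying on the element's truthiness, because an
    # element with no children has historically tested as false.
    return root.find("organ-donor") is not None


donor = "<drivers-license><name>Mike Dane</name><organ-donor /></drivers-license>"
non_donor = "<drivers-license><name>Mike Dane</name></drivers-license>"

print(is_organ_donor(donor), is_organ_donor(non_donor))  # prints: True False
```

A parser treats <organ-donor /> and <organ-donor></organ-donor> as the same empty element, so the check works for either spelling.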

Attributes

We’ve talked about the different types of tags and how we can use them to define the various elements, or pieces of information, in our XML document. The final tool XML gives us is the attribute.

Attributes are special pieces of information we can define about an element within its tag.

<drivers-license state="NY"> </drivers-license>

In the code above we’re using an attribute to define information about the drivers-license element: in this case, the state where the license was issued.

Attributes are placed inside the start tag, after the element name. Each one consists of the attribute name, an equal sign, and the attribute’s value inside quotation marks (single or double).

Attributes can be named almost anything, as long as the name follows the same rules as element names (for example, no spaces) and is not repeated within one tag. It’s common practice for every element of the same type to have the same attributes, so every drivers-license element in our document could have a state attribute.
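Attributes are read separately from an element’s contents. As a minimal sketch, again assuming Python’s standard-library xml.etree.ElementTree, the snippet below pulls the state attribute off the drivers-license element. The country attribute is hypothetical and is deliberately absent from the document, to show what happens when an attribute is missing.

```python
import xml.etree.ElementTree as ET

LICENSE_XML = '<drivers-license state="NY"><name>Mike Dane</name></drivers-license>'

root = ET.fromstring(LICENSE_XML)

# get returns the attribute's value, or the given default if it is absent.
state = root.get("state")
country = root.get("country", "unknown")  # hypothetical attribute, not in the document

# attrib exposes all of the element's attributes as a dictionary.
print(state, country, root.attrib)  # prints: NY unknown {'state': 'NY'}
```

Note that the attribute value must use straight quotation marks. Curly quotes pasted in from a word processor would make the document fail to parse.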

Review

  • XML stands for Extensible Markup Language.
  • An XML document is composed of a series of elements, which are organized in a hierarchical fashion through the use of tags.
  • Container tags consist of two separate tags: a start tag and an end tag.
  • Container tags can have other tags nested inside of them.
  • Single tags are self-closing, meaning they only have one tag.
  • Attributes store extra information about an element inside its tag, as a name followed by an equal sign and a quoted value.