Everything Is Data!

Christian Luebbe @ EPFL Extension School · 13 minutes

Everything! 'Just' transform it into numbers so AI can learn from it.

In “Data comes in many forms”, we started taking a closer look at different data types and their applications. In particular, we introduced you to the following common data formats:

    1. Tabular data
    2. Text
    3. Audio data
    4. Visual data: images and videos

But there is much more out there! After all, every piece of information that we record creates data. In this article, we will introduce you to five more data types and highlight domains in which they are used to develop AI systems:

    5. Temporal data and time series
    6. Networks
    7. Geospatial and location data
    8. Emotions
    9. The internet of things

In fact, by the end of this article, you will hopefully agree that, as the title suggests: Everything is data!

Every object around us contains countless pieces of information. If you can record and capture this information, you can turn it into data.

So let’s jump back in!

5. Temporal Sequences: Time Series

An important thing to realize about information is that it is rarely static. Rather, it tends to develop and evolve over time. To capture these changes in information we also record the temporal aspects of our data. The audio and video data that we discussed in the previous article are good examples of this.

But there are many more applications of temporal data. For example, we might record the time and date of a particular weather pattern, or how long it rains for compared with how long it is sunny. By making these recordings at regular intervals we can track changes, for example the fluctuation in temperature over the course of a fixed period of time, such as a week.
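To make the idea of regular recordings concrete, here is a minimal Python sketch. The temperature values and dates are invented purely for illustration; the point is only to show how readings taken once a day become a sequence whose fluctuations we can measure.

```python
from datetime import date, timedelta

# Hypothetical daily temperature readings in °C for one week (made-up values).
start = date(2024, 1, 1)
temperatures = [3.5, 4.0, 2.5, 1.0, 1.5, 5.0, 6.5]

# Pair every reading with the day it was recorded.
series = [(start + timedelta(days=i), t) for i, t in enumerate(temperatures)]

# Day-to-day change: each reading minus the previous one.
changes = [curr - prev for (_, prev), (_, curr) in zip(series, series[1:])]

# Overall fluctuation across the week.
weekly_range = max(temperatures) - min(temperatures)

for day, temp in series:
    print(day.isoformat(), temp)
print("Day-to-day changes:", changes)
print("Weekly range:", weekly_range)
```

Because the readings are evenly spaced, the list of changes can be read directly as "how much warmer or colder than yesterday".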

Temporal data allows us to order data into chronological sequences. These sequences are called time series and they help us gain insights into historical events. We can use time series to record any of the data types discussed in this and the previous article over time. For example, our posts on social media form a time series of text data.

Similarly, we might measure how much time passes between individual events. We can record recurring events, like births, and ask how often they happen. We can look at changes in quantities over time by counting how many times an activity occurs within a set time period. There are also long-term trends and seasonal patterns to consider – a geologist might consider a time series of millions of years.
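Both questions, how much time passes between events and how often they occur in a set period, can be sketched in a few lines of Python. The timestamps below are hypothetical and stand in for any kind of recorded event.

```python
from collections import Counter
from datetime import datetime

# Hypothetical event timestamps (made up for illustration), in chronological order.
events = [
    datetime(2024, 3, 1, 9, 15),
    datetime(2024, 3, 1, 9, 40),
    datetime(2024, 3, 1, 11, 5),
    datetime(2024, 3, 2, 9, 30),
]

# How many events happened on each calendar day?
events_per_day = Counter(e.date() for e in events)

# How much time passed between consecutive events?
gaps = [later - earlier for earlier, later in zip(events, events[1:])]

print(events_per_day)
print([str(gap) for gap in gaps])
```

Choosing a different time period, such as hours or months, only changes the key used for counting.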

Recording this data gives businesses the opportunity to make plans. When do people tend to book flights? When are people buying the most ice cream? Learning from the past helps us to plan for the future, and if you know the peak demand times for certain goods and services you can capitalize on that information.

Detailed analysis of temporal data can help us to identify relationships between cause and effect, which further increases the accuracy of our forecasts. However, in many real-life situations, we find a vast number of different influences and quantities interacting with each other in highly complex ways.

An airplane uses millions of individual moving components and electronic systems that are all interacting with each other to keep the plane in the air and on course. Identifying structures and relationships within such convoluted temporal data might seem impossible, but it’s vital that we do so for the safety of the passengers and crew. Fortunately, the huge computational capacity of AI technologies means they can observe patterns in complex systems that a human brain could never detect.

Like humans, AI systems learn from past events and extract patterns and insights that enable them to predict the future. Time series data plays an important role in lots of different domains, so the potential for this AI technology is very broad. Businesses use time series data to plan budgets and inventories and to forecast expected sales and costs. It also helps them to identify their most loyal customers, as well as those who might be about to shop elsewhere.

Humans only capture a very small portion of time series data. Most of it is recorded automatically by sensors and machines for the simple reason that they’re much better than we are at reliably recording a wide range of signals over extended periods of time.

Take fitness trackers and smartwatches. They contain accelerometers which measure changes in motion, and with a little help from AI they are able to tell whether you are walking, running or cycling. Car insurance companies take a similar approach when it comes to monitoring and characterizing driving styles so they can adjust their premiums accordingly.

Time series data can also help identify fraudulent activity. Banks and credit card companies already make use of AI technologies to spot suspicious patterns, but the potential for fraud also exists in online gaming platforms where users can often make multiple purchases within a relatively short time. AI uses time series data to tell which of these transactions fall within the bounds of normal behavior and which look suspicious.
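As a rough illustration of what "suspicious" could mean for purchase timestamps, the Python sketch below flags a purchase when too many purchases fall within a short time window. This is a fixed hand-written rule, not a learned model, and it is not how any particular bank or platform works; the function name `flag_bursts`, the 10-minute window and the limit of 3 purchases are all assumptions chosen for the example.

```python
from datetime import datetime, timedelta

def flag_bursts(timestamps, window=timedelta(minutes=10), max_purchases=3):
    """Return the timestamps at which more than `max_purchases`
    purchases fall within the trailing `window` (hypothetical rule)."""
    timestamps = sorted(timestamps)
    flagged = []
    start = 0  # index of the oldest purchase still inside the window
    for end, current in enumerate(timestamps):
        # Drop purchases that are older than the window.
        while current - timestamps[start] > window:
            start += 1
        if end - start + 1 > max_purchases:
            flagged.append(current)
    return flagged

# Hypothetical purchase times for one user (made up for illustration).
purchases = [
    datetime(2024, 5, 1, 12, 0),
    datetime(2024, 5, 1, 12, 2),
    datetime(2024, 5, 1, 12, 3),
    datetime(2024, 5, 1, 12, 5),
    datetime(2024, 5, 1, 14, 0),
]

suspicious = flag_bursts(purchases)
print(suspicious)
```

An AI system would typically learn what counts as normal from past data instead of relying on thresholds picked by hand, but the input is the same: a sequence of events ordered in time.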

6. Networks

A network describes a series of connected points, and also the varying importance of those connections. Points in a network, also known as nodes, can represent anything from people and places to more abstract things like words.

A public transport network consists of different bus and tram stops – these are the nodes. The connections between those nodes are the bus and tram routes. The importance of each connection could be measured in terms of how many buses or trams pass through every hour, or how far it is to the next stop along the line.
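One simple way to store such a network on a computer is a dictionary that maps each stop to its neighboring stops, with a number expressing the importance of each connection. The stop names and the buses-per-hour values in this Python sketch are invented for illustration.

```python
# Hypothetical transport network: each stop (node) maps to its directly
# connected stops, weighted by buses per hour (made-up values).
network = {
    "Station": {"Market": 6, "University": 4},
    "Market": {"Station": 6, "Park": 2},
    "University": {"Station": 4},
    "Park": {"Market": 2},
}

# Number of direct connections per stop.
connections_per_stop = {stop: len(neighbors) for stop, neighbors in network.items()}

# The connection with the most buses per hour.
busiest = max(
    ((a, b, weight) for a, neighbors in network.items() for b, weight in neighbors.items()),
    key=lambda edge: edge[2],
)

print(connections_per_stop)
print(busiest)
```

Replacing buses per hour with the distance between stops would describe the same network with a different measure of importance.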

It’s easy to represent such networks in a single diagram: