Everything Is Data!

Christian Luebbe @ EPFL Extension School · 13 minutes

Everything! 'Just' transform it into numbers so AI can learn from it.

In “Data comes in many forms”, we started taking a closer look at different data types and their applications. In particular, we introduced you to the following common data formats:

    1. Tabular data
    2. Text
    3. Audio data
    4. Visual data: images and videos

But there is much more out there! After all, every piece of information that we record creates data. In this article, we will introduce you to five more data types and highlight domains in which they are used to develop AI systems:

    5. Temporal data and time series
    6. Networks
    7. Geospatial and location data
    8. Emotions
    9. The internet of things

In fact, by the end of this article, you will hopefully agree that, as the title suggests: Everything is data!

Every object around us contains countless pieces of information. If you can record and capture this information, you can turn it into data.

So let’s jump back in!

5. Temporal Sequences: Time Series

An important thing to realize about information is that it is rarely static. Rather, it tends to develop and evolve over time. To capture these changes in information we also record the temporal aspects of our data. The audio and video data that we discussed in the previous article are good examples of this.

But there are many more applications of temporal data. For example, we might record the time and date of a particular weather pattern, or how long it rains for compared with how long it is sunny. By making these recordings at regular intervals we can track changes, for example the fluctuation in temperature over the course of a fixed period of time, such as a week.

Temporal data allows us to order data into chronological sequences. These sequences are called time series and they help us gain insights into historical events. We can use time series to record any of the data types discussed in this and the previous article over time. For example, our posts on social media form a time series of text data.

Similarly, we might measure how much time passes between individual events. We can record recurring events, like births, and ask how often they happen. We can look at changes in quantities over time by counting how many times an activity occurs within a set time period. There are also long-term trends and seasonal patterns to consider – a geologist might consider a time series of millions of years.
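To make these ideas a little more concrete, here is a minimal sketch of how we might count events per day and measure the time that passes between them. The event log and the function names `counts_per_day` and `gaps_in_hours` are invented for illustration; they are assumptions, not part of any particular tool.

```python
from collections import Counter
from datetime import datetime

# Hypothetical event log: the moments at which some event was recorded.
events = [
    datetime(2024, 1, 1, 9, 15),
    datetime(2024, 1, 1, 17, 40),
    datetime(2024, 1, 2, 8, 5),
    datetime(2024, 1, 4, 12, 0),
]

def counts_per_day(timestamps):
    """Count how many events fall on each calendar day."""
    return Counter(ts.date() for ts in timestamps)

def gaps_in_hours(timestamps):
    """Return the time between consecutive events, in hours."""
    ordered = sorted(timestamps)
    return [(later - earlier).total_seconds() / 3600
            for earlier, later in zip(ordered, ordered[1:])]

daily = counts_per_day(events)
gaps = gaps_in_hours(events)
```

Here `daily` tells us how often the event happened on each day, and `gaps` tells us how long we waited between one event and the next. The same two questions apply whether the period is a day or a million years.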

Recording this data gives businesses the opportunity to make plans. When do people tend to book flights? When are people buying the most ice cream? Learning from the past helps us to plan for the future, and if you know the peak demand times for certain goods and services you can capitalize on that information.

Detailed analysis of temporal data can help us to identify relationships between cause and effect, which further increases the accuracy of our forecasts. However, in many real-life situations, we find a vast number of different influences and quantities interacting with each other in highly complex ways.

An airplane uses millions of individual moving components and electronic systems that are all interacting with each other to keep the plane in the air and on course. Identifying structures and relationships within such convoluted temporal data might seem impossible, but it’s vital that we do so for the safety of the passengers and crew. Fortunately, the huge computational capacity of AI technologies means they can observe patterns in complex systems that a human brain could never detect.

Like humans, AI systems learn from past events and extract patterns and insights that enable them to predict the future. Time series data plays an important role in lots of different domains, so the potential for this AI technology is very broad. Time series data helps businesses plan budgets and inventories and forecast expected sales and costs. It also helps them to identify their most loyal customers, as well as those who might be about to shop elsewhere.

Humans capture only a very small portion of time series data by hand. Most of it is recorded automatically by sensors and machines, for the simple reason that they’re much better than we are at reliably recording a wide range of signals over extended periods of time.

Take fitness trackers and smartwatches. They contain accelerometers which measure changes in motion, and with a little help from AI they are able to tell whether you are walking, running or cycling. Car insurance companies take a similar approach when it comes to monitoring and characterizing driving styles so they can adjust their premiums accordingly.

Time series data can also help identify fraudulent activity. Banks and credit card companies already make use of AI technologies to spot suspicious patterns, but the potential for fraud also exists in online gaming platforms where users can often make multiple purchases within a relatively short time. AI uses time series data to tell which of these transactions fall within the bounds of normal behavior and which look suspicious.
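A very simple way to picture "the bounds of normal behavior" is a statistical rule of thumb rather than a trained AI system. The sketch below flags any hour in which the number of purchases lies far from a user's historical average. The purchase counts, the function name `flag_unusual_hours`, and the threshold of three standard deviations are all assumptions chosen for illustration; real fraud detection is considerably more sophisticated.

```python
from statistics import mean, stdev

def flag_unusual_hours(history, new_counts, threshold=3.0):
    """Flag hourly purchase counts that lie far outside past behavior.

    history: past purchases per hour for one user (at least two values).
    new_counts: the most recent purchases per hour to check.
    Returns (position, count) pairs for the suspicious hours.
    """
    average = mean(history)
    spread = stdev(history)
    if spread == 0:
        # With no variation in the past, anything different stands out.
        return [(i, c) for i, c in enumerate(new_counts) if c != average]
    return [(i, c) for i, c in enumerate(new_counts)
            if abs(c - average) / spread > threshold]

# Hypothetical data: purchases per hour in the past and in the last three hours.
past_purchases = [1, 0, 2, 1, 1, 0, 2, 1]
recent_purchases = [1, 14, 2]
suspicious = flag_unusual_hours(past_purchases, recent_purchases)
```

With these made-up numbers, only the hour with 14 purchases is flagged, because it is many standard deviations away from the user's usual one purchase per hour.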

6. Networks

A network describes a series of connected points, and also the varying importance of those connections. Points in a network, also known as nodes, can represent anything from people and places to more abstract things like words.

A public transport network consists of different bus and tram stops – these are the nodes. The connections between those nodes are the bus and tram routes. The importance of each connection could be measured in terms of how many buses or trams pass through every hour, or how far it is to the next stop along the line.
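In code, such a network can be written down as nodes with weighted connections. The sketch below uses made-up stop names and travel times in minutes as the weights, and then applies Dijkstra's algorithm, a classic shortest-path method, to find the quickest journey between two stops. The stops, the weights, and the function name `shortest_travel_time` are assumptions for illustration only.

```python
import heapq

# Hypothetical stops (nodes) and travel times in minutes (connection weights).
network = {
    "Station": {"Market": 4, "Museum": 7},
    "Market": {"Station": 4, "Museum": 2, "Harbor": 9},
    "Museum": {"Station": 7, "Market": 2, "Harbor": 5},
    "Harbor": {"Market": 9, "Museum": 5},
}

def shortest_travel_time(graph, start, goal):
    """Dijkstra's algorithm: smallest total weight from start to goal."""
    best = {start: 0}
    queue = [(0, start)]
    while queue:
        minutes, stop = heapq.heappop(queue)
        if stop == goal:
            return minutes
        if minutes > best.get(stop, float("inf")):
            continue  # outdated queue entry, a faster route was already found
        for neighbor, weight in graph[stop].items():
            candidate = minutes + weight
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return None  # the goal cannot be reached from the start

fastest = shortest_travel_time(network, "Station", "Harbor")
```

In this toy network the direct-looking routes are not the fastest: going from Station to Harbor via Market and then Museum takes 11 minutes, which beats both alternatives.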

It’s easy to represent such networks in a single diagram:

Public transportation network of the local railway traffic in the Léman region around the lake of Geneva.

While it’s easy to create a diagram for a simple transport network, it would seem impossible to do the same for something like the World Wide Web [1], the network of every website worldwide, which is often loosely referred to as the internet. You couldn’t do it for social media platforms, either. These networks have up to 2.7 billion users (nodes) and each of those users has an average of 300 friends (connections).

But we can identify groups and commonalities in these larger networks when we analyze them with AI technology. Among these groups we can predict which types of social media posts will achieve high levels of engagement. This sort of information is extremely valuable for marketing and advertising companies, and it informs the sort of content they create and deploy on their social media channels. However, the same information can also be used to spread rumors and fake news.

Highly complex networks exist in the real world too. Physical networks like multinational road and rail networks, electrical grids, and other pieces of critical infrastructure require a lot of clever scheduling and resource allocation. AI applications are able to do this much more efficiently and accurately than humans. Retailers use AI to arrange the logistics in their supply chain and deploy their resources where they will be most effective. Back online, streaming services use AI to manage networks of servers according to changing consumer demands throughout the day.

7. Geospatial and Location Data


Depiction of all public transport stops in Switzerland.

Not so long ago, when we needed to find a particular location we’d have to look at a paper map or ask a stranger for directions. These days we can open an app on our phones or use a navigation system in our vehicles to find the way. This is all thanks to the Global Positioning System (GPS) that knows our location relative to our desired destination.

Mobile apps use GPS to give us all sorts of information about our surroundings. On our screens we can see where all the shops and restaurants are, where we can find a taxi and how to best avoid a traffic jam. And even when a reliable GPS signal is unavailable, we can still access geospatial information from a local Wi-Fi hotspot.

There are two ways of capturing this location-based information. We can use a stationary geographical reference system to give us what we refer to as geospatial data, or geodata for short. Examples of geodata include GPS coordinates, postal addresses, or more locally defined grids like floor plans in shopping malls or factories.
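Once locations are stored as GPS coordinates, we can calculate with them. The sketch below uses the haversine formula to estimate the straight-line distance between two points on the Earth's surface. The coordinates for Lausanne and Geneva are rounded approximations, and the formula treats the Earth as a perfect sphere, so the result is an estimate rather than a surveyed distance.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; assumes a spherical Earth

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (latitude, longitude) points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    delta_phi = radians(lat2 - lat1)
    delta_lambda = radians(lon2 - lon1)
    a = (sin(delta_phi / 2) ** 2
         + cos(phi1) * cos(phi2) * sin(delta_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

# Approximate, rounded coordinates used purely for illustration.
lausanne = (46.52, 6.63)
geneva = (46.20, 6.14)
distance = haversine_km(*lausanne, *geneva)
```

For these two cities the function returns roughly 50 km. Apps that suggest nearby shops or taxis rely on this kind of calculation, repeated for many candidate locations.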

Alternatively, we can use a local, potentially moving, reference system. This local system is what we humans use to perceive and assess our immediate environment relative to our own position. Autonomous robots and self-driving cars use a local system involving infrared sensors, radar and lidar [2] to locate objects in their vicinity and track them relative to themselves.

We use location-based data in these navigational apps every day, but there are many other use cases that are perhaps less obvious. Once again, we find ridesharing apps making the most of these new technologies. These look at the data of common pick-up and drop-off points to predict future demand and despatch drivers to those areas.

Insurance companies use location data to calculate your home insurance premium: Is your neighborhood high in crime? Does it have a history of flooding? Geodata also plays an important role in the energy industry, where AI helps decision makers to find reservoirs of crude oil and optimal locations for wind farms.

Location-based information is also of vital importance when it comes to organizing a timely response to a natural disaster or an unfolding humanitarian crisis. AI systems automatically survey satellite images to assess the scale of the damage, the number of displaced people, and the accessibility of the affected area.

Geodata is also used in public health to see connections between groups of people suffering from ailments like respiratory diseases and environments with high rates of pollution. One of the earliest known uses of geospatial analysis in epidemiology happened during the 1854 cholera outbreak in London. By mapping the locations of people infected with the disease, the physician John Snow was able to identify a single water pump as the likely source. Today, AI is using geodata to track the spread of viruses like Ebola and Covid-19 in a similar, but much more efficient, way.

8. Emotions

Nothing defines humans more than our emotions and feelings, so it’s perhaps unsurprising to find that they’re incredibly difficult to identify and record with sensors.

However, we do leave clues to our emotional state in our digital activities: the words and expressions we use in messages, the reviews and social posts we write, and every emoji, “like”, and share all say something about the way we feel. Social media networks use algorithms to capture all of these subtle displays of emotion to build up a personal profile of each user.

😃😆😅🤣😊😇😉😍😘😋😜🤪🤨🧐😢😏😒😔😟😖😫😭😤😡🤬🤯😳😱🤗😰🤔🤭🤫😬😧😴🤤😵🤐🤧🤒🤕
Emoji are a very effective and compact way to communicate emotions. An increasingly diverse set of emoji enables us to describe more and more finely differentiated feelings.
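How can words and emoji be turned into numbers? One of the simplest approaches is a lookup table that assigns a score to each emotional clue and adds the scores up. The sketch below does exactly that. The scores, the word list, and the function name `emotion_score` are invented for illustration; the systems used by social networks are far more elaborate and are not described in this article.

```python
# Hypothetical, hand-picked scores: positive feelings > 0, negative < 0.
EMOJI_SCORES = {"😍": 2, "😊": 1, "😃": 1, "😡": -2, "😢": -1, "😭": -2}
WORD_SCORES = {"love": 2, "great": 1, "hate": -2, "awful": -1}

def emotion_score(message):
    """Add up the scores of all known emoji and words in a message."""
    text = message.lower()
    score = 0
    # Emoji are often glued to words, so we scan character by character.
    for character in text:
        score += EMOJI_SCORES.get(character, 0)
    # Words are separated by spaces; trim common punctuation at the edges.
    for token in text.split():
        score += WORD_SCORES.get(token.strip(".,!?"), 0)
    return score

happy = emotion_score("I love this 😍")
angry = emotion_score("Awful service 😡😡")
neutral = emotion_score("The bus arrived.")
```

A message with a positive total leans toward a happy emotional state, a negative total toward an unhappy one, and a total of zero gives no clue either way. Collected over many messages, such scores become one more time series about a user.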

As few as 250 “likes” provide enough information for AI technologies to determine your demographic and psychological traits. And from there they can predict your opinion on different topics with more accuracy than your parents or your partner. With this information, social networks know which posts will result in your engagement.

These psychological profiling techniques can be used to target your newsfeed with content that will actively influence your emotions and subsequent opinions. A data company called Cambridge Analytica used such techniques several times around the world to try to influence how people vote in elections.

9. The Internet of Things

Soon, everything will be connected and sharing data. Increasing numbers of machines and devices are collecting data through built-in sensors and sharing it over the internet. These devices can interact with each other, and we are able to monitor and control them remotely.

They'll be home from work in 20 minutes! - Perfect, the coffee will be ready in a minute! - Oh, no! I forgot to order milk!

Machines can stream data live from distant locations, allowing engineers to run diagnostics, deploy software updates, and carry out other forms of maintenance without having to visit the actual site.

This network of interconnected devices is referred to as the Internet of Things (IoT), and it produces a vast amount of every kind of data imaginable. In the near future, household appliances like fridges will be part of this network and have connectivity that will allow them to add items to your shopping list, or even place the grocery order all on their own.

There are already many such devices in the healthcare sector which help care for the elderly and the sick by monitoring their vital signs. These devices produce a steady stream of data that can be analyzed for patterns and anomalies. When they detect something like an irregular heartbeat or heavy breathing, they can send an alert to nearby doctors and carers, saving vital time for diagnoses and interventions.
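The idea of watching a steady stream of data for anomalies can be sketched in a few lines. The example below compares each new heart-rate reading with the average of the last few normal readings and raises an alert when the difference is too large. The readings, the window of five values, the limit of 30 beats per minute, and the function name `heart_rate_alerts` are arbitrary values chosen for illustration, not clinical thresholds.

```python
from collections import deque

def heart_rate_alerts(readings, window=5, max_jump=30):
    """Return (position, bpm) for readings far from the recent baseline."""
    recent = deque(maxlen=window)  # the last few normal readings
    alerts = []
    for position, bpm in enumerate(readings):
        if len(recent) == window:
            baseline = sum(recent) / window
            if abs(bpm - baseline) > max_jump:
                alerts.append((position, bpm))
                continue  # keep unusual readings out of the baseline
        recent.append(bpm)
    return alerts

# Hypothetical stream of heart-rate readings in beats per minute.
stream = [72, 74, 71, 73, 75, 74, 130, 72, 73]
alerts = heart_rate_alerts(stream)
```

In this made-up stream, only the sudden reading of 130 triggers an alert. A connected device could run a check like this continuously and notify a carer the moment something unusual appears.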

The whole is bigger than the sum of its parts.

In this article, we have introduced you to a range of different types of data and discussed a variety of applications for them. We have also seen how all of these different data formats help us and AI systems to generate fresh insights and make data-driven decisions.

Our discussions have focused on individual types of data, but we can also combine different data types and data sources to go even further. For example, we have already seen that video combines visual, audio and temporal data to create a format that gives us more insight than any one data type can do on its own.

When we look at a combination of images and sounds, we might be able to identify the speaker as well as the context of what is being said. When multiple data formats are combined, the ability of AI systems to create value can grow dramatically. The whole is bigger than the sum of its parts, and we can still only guess at the innovative ways AI will combine and exploit these different data types.

Energy companies provide us with another example, one where AI systems are involved at every step of business operations. They use AI trained on user behavior to predict future demand, and AI that looks at regional weather patterns to predict the supply of renewable energy. Both of these predictions are then used by another AI system to plan electricity production and energy storage. All of this is coordinated with the national grid to cope with peak times and distribute energy as and when it is needed. And unlike standard algorithms, AI has the flexibility to respond to constantly changing requirements quickly and efficiently.

Everything is data – and valuable to someone.

Our DNA encodes everything that makes us who we are, from the color of our eyes and the shape of our nose to our ability to run fast or our likelihood of developing Alzheimer’s. Sequencing entire genomes has become significantly faster, easier, and cheaper over the last decade. Research institutes and pharmaceutical companies work on identifying the genes associated with different diseases and study how patients will react to pharmaceutical drugs based on their DNA profile.

AI technology is ideal in this field because it is equipped to detect the sort of hidden patterns you’d expect to find in large and complex DNA data sets. Yes, even the data in nature’s blueprint is now subject to interpretation by AI.

It should be clear by now that if something can be recorded, it can become data. From there the data can either be used to train AI systems or be fed to AI applications to derive insights and outcomes. This dual usage is why data itself has become such a valuable, sought-after commodity – something that can be harvested, mined, refined, and traded.

  1. In 2003, Barrett Lyon’s Opte Project visualized at least a small part of the internet.

  2. Lidar is an abbreviation and stands for “laser imaging, detection, and ranging”. 
