Big Data: The 6 Vs You Need to Look at for Important Insights
Trump is not very tech-savvy: there is no computer at his desk. His assistant once revealed that he does not use email. Yet, a big data company, Cambridge Analytica, ensured that he won the election. The company developed a model that can predict the personality of every adult in the United States using big data. Personalized ads were created. "We can target villages or apartment blocks. Even individuals," explained CEO Alexander Nix in an interview with VICE. Trump behaved like a perfect opportunistic algorithm that follows the reactions of the public. And we know what this led to.
"Virtually every message that Trump broadcast was driven by big data." - Alexander Nix, CEO of Cambridge Analytica
Make visible what was previously hidden
The above is an example of what you can do with big data. It works according to the principle that the more you know about something or a situation, the more you can make reliable predictions about what will happen in the future. Comparing multiple kinds of data reveals relationships which were previously hidden. This offers you insights that make it easier for you to reach your target audience.
The various Vs of big data
Big data is best described with the six Vs: volume, variety, velocity, value, veracity and variability.
Volume is an obvious feature of big data and is mainly about the relationship between size and processing capacity. This aspect changes rapidly: data collection continues to increase, and so does the IT capacity for storage and processing.
Walmart, a company with an incredible amount of data, is building the largest private cloud in the world to handle large amounts of data per hour. With the Data Café program, they model, manipulate and visualize this information to gain insight into their shoppers. A practical example: during Halloween, sales analysts could see that, although a special new cookie was very popular in most stores, there were two stores where it was not selling at all. This was quickly picked up on and it turned out that the cookies were accidentally not placed on the shelves. It was resolved immediately.
When you talk about big data, people often think only of volume, but there are five other Vs that can help you make data valuable. These Vs are also important in enriching smaller databases.
In addition, volume in big data can also be "high-dimensional": you can ask big questions about small data.
The V of variety describes the wide variety of data that is being stored and still needs to be processed and analyzed. New types of data from social networks and mobile devices, among others, complement existing types of structured information. For example: audio and video files, photos, GPS data, medical files, instrument measurements, graphics, web documents, bonus cards and internet search behavior. Unstructured data such as voice and social media make processing and categorizing data extra complicated. How do you ensure you are only taking the data that helps target your audience?
An example from my own practice: a charity has a database of households. It includes features such as car ownership, property value under the Valuation of Immovable Property Act (WOZ) and whether people are donors or not. I linked this data to the Mentality segmentation tool. I then searched that database for the features that can predict donorship. So, I calculated which households had a high chance of becoming a donor, and the charity undertook targeted fundraising actions.
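The idea of calculating which households are likely to become donors can be sketched in a few lines. This is only a minimal illustration, not the method used for the charity: the field names (`owns_car`, `segment`, `is_donor`), the segment labels and all records are invented, and the score is a simple average of donor rates per feature value rather than a proper statistical model.

```python
from collections import defaultdict

FEATURES = ["owns_car", "segment"]  # hypothetical column names

def donor_rates(known, feature):
    """Return {feature value: share of donors} for labelled households."""
    tally = defaultdict(lambda: [0, 0])  # value -> [donors, households]
    for row in known:
        tally[row[feature]][0] += int(row["is_donor"])
        tally[row[feature]][1] += 1
    return {value: donors / total for value, (donors, total) in tally.items()}

def propensity(row, rates, base_rate):
    """Average the donor rates that match this household's feature values.

    Feature values never seen among known households fall back to the base rate.
    """
    parts = [rates[f].get(row[f], base_rate) for f in FEATURES]
    return sum(parts) / len(parts)

# Invented example data: households whose donor status is already known.
known = [
    {"owns_car": True,  "segment": "segment_a", "is_donor": True},
    {"owns_car": True,  "segment": "segment_a", "is_donor": True},
    {"owns_car": True,  "segment": "segment_b", "is_donor": False},
    {"owns_car": False, "segment": "segment_a", "is_donor": True},
    {"owns_car": False, "segment": "segment_b", "is_donor": False},
    {"owns_car": False, "segment": "segment_b", "is_donor": False},
]

# Invented prospects whose donor status is unknown.
prospects = [
    {"id": "p1", "owns_car": True,  "segment": "segment_a"},
    {"id": "p2", "owns_car": False, "segment": "segment_b"},
    {"id": "p3", "owns_car": True,  "segment": "segment_b"},
]

base_rate = sum(r["is_donor"] for r in known) / len(known)
rates = {f: donor_rates(known, f) for f in FEATURES}

# Households with the highest estimated chance of donating come first.
ranked = sorted(prospects, key=lambda r: propensity(r, rates, base_rate), reverse=True)
ranked_ids = [r["id"] for r in ranked]
```

In practice you would validate such a score on households it has not seen before trusting it for a fundraising campaign.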
I also enriched the customer base of a media company with data on social interests. This allows the company to approach potential customers (potentials) who resemble existing customers (lookalikes). The potentials then receive specific offers, which gives conversion a substantial boost.
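One common way to find lookalikes, shown here purely as a sketch, is to describe each person by a vector of interests and compare prospects with the average existing customer using cosine similarity. The interest categories, the data and the 0.6 threshold are all assumptions made for this illustration; they do not come from the media company project.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equally long numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Hypothetical interest order: [sport, culture, technology]; 1 = interested.
customers = [
    [1, 0, 1],
    [1, 0, 1],
    [1, 1, 1],
]

# Profile of the "average" existing customer (column-wise mean).
centroid = [sum(column) / len(customers) for column in zip(*customers)]

prospects = {
    "a": [1, 0, 1],
    "b": [0, 1, 0],
    "c": [1, 1, 0],
}

THRESHOLD = 0.6  # assumed cut-off; tune it against real conversion data

lookalikes = sorted(
    name for name, interests in prospects.items()
    if cosine(interests, centroid) >= THRESHOLD
)
```

Prospects "a" and "c" clear the threshold here, while "b" shares too little with the existing customer base.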
Predicting political views
In the case of Trump (above), his recruiters had an app that identified the political views and personalities of all the residents of a household. They only visited homes where the app predicted that their message would be listened to. Trump's people were prepared with guidelines for conversations tailored to the personalities of the residents. The street team entered all responses into the app, allowing all this data to be fed to the Trump campaign team headquarters.
Velocity is a measure of how quickly data loses its value. Big data changes rapidly. Therefore, we need to process structured and unstructured data streams quickly to take advantage of geolocation data, emerging hypes and trends, and market and customer information that is available in real time. Velocity means that you need to process your data within minutes or seconds to get the results you're looking for.
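To make the "within minutes or seconds" requirement concrete, here is a small sketch of a sliding-window counter: it only keeps events from the last N seconds, so older data drops out automatically. The class name and the example timestamps are hypothetical, and the sketch assumes that events arrive in chronological order.

```python
from collections import deque

class SlidingWindowCounter:
    """Count events that happened within the last `window_seconds`."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.events = deque()  # timestamps, oldest on the left

    def _expire(self, now):
        # Drop every event that has fallen out of the time window.
        while self.events and self.events[0] <= now - self.window:
            self.events.popleft()

    def add(self, timestamp):
        """Register one event, e.g. a store visit or a mention of a trend."""
        self.events.append(timestamp)
        self._expire(timestamp)

    def count(self, now):
        """Return how many events are still inside the window at time `now`."""
        self._expire(now)
        return len(self.events)

counter = SlidingWindowCounter(window_seconds=60)
for t in (0, 10, 50):  # invented timestamps in seconds
    counter.add(t)

recent = counter.count(now=55)  # all three events are less than 60 s old
later = counter.count(now=65)   # the event at t=0 has expired
stale = counter.count(now=120)  # nothing from the last minute remains
```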
Value describes what you can get out of which data, and how big data helps you get better results from stored data.
For example, I enriched the database of a Dutch retailer by postal code area. Based on the specific customer information, the retailer decided which location for a new store would best fit the target group. Enrichment allows you to make predictions. My customer also chose the layout of the store and its assortment to suit the specific wishes of (potential) shoppers.
Another good way to get value from your big data is to work with personas. They give a name and face to different customer groups and are a very powerful way of making organizations more customer-oriented. Personas were devised because there was a need to profile the many visitors of websites and so make these sites more user-friendly.
You can create personas based on available customer behavior data. For the Van Gogh Museum, for example, personas have been created to bring the different visitor types to life.
Veracity concerns the quality and origin of data. It lets you mark data as questionable, conflicting or impure, and it tells you which parts you are not yet sure how to deal with. In short: how truthful and authentic is the data, and what can you do with it? In a sense, it is a hygiene factor. By showing the veracity of your data, you show that you have taken a critical look at it.
As a rule of thumb, everything belonging to a company's core process is reliable; the rest should be treated as contaminated. You must take this pollution into account. You must be convinced that the data you have selected will work properly and will be sufficient. It is a lot of monotonous but necessary work.
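Part of that monotonous work can be automated. The sketch below flags records with missing required fields and keys that occur with conflicting values. The function name `quality_report`, the field names and the records are all invented for illustration; real checks depend on your own data.

```python
def quality_report(records, key, required):
    """List keys of records with missing fields and keys with conflicting rows."""
    # A field counts as missing when it is absent, None or an empty string.
    missing = [
        r[key] for r in records
        if any(r.get(field) in (None, "") for field in required)
    ]

    first_seen = {}
    conflicts = set()
    for r in records:
        k = r[key]
        # Same key, different content: the sources disagree.
        if k in first_seen and first_seen[k] != r:
            conflicts.add(k)
        first_seen.setdefault(k, r)

    return {"missing": missing, "conflicting": sorted(conflicts)}

# Invented household records.
records = [
    {"id": 1, "postcode": "1011AB", "donor": True},
    {"id": 2, "postcode": "",       "donor": False},  # postcode missing
    {"id": 1, "postcode": "1011AB", "donor": False},  # contradicts the first row
    {"id": 3, "postcode": "3512JK", "donor": None},   # donor status unknown
]

report = quality_report(records, key="id", required=["postcode", "donor"])
```

A report like this does not clean the data for you, but it shows where the pollution is so you can decide how to handle it.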
Finally, variability: to what extent, and how fast, is the structure of your data changing? And how often does the meaning or shape of your data change?
For example, take the pricing of a newspaper subscription: an internet subscription costs 50 euros, a paper subscription 100 euros, and a combined paper and internet subscription also 100 euros. One option is illogical. If you offer these three options, most people choose the combined paper and internet subscription, which seems more advantageous. But if you take away the illogical choice and offer only an internet subscription for 50 euros or a paper and internet subscription for 100 euros, many people will choose the internet subscription.
In this way, the composition of a questionnaire or, for example, unsubscribe buttons changes how things appear to people and thus the outcome. In purely technical terms this means: if you change variables, your model will also change.
Use big data and check out the Vs that apply to you
There are several ways of working with big data that give you interesting insights. For example, you can use it to target potential voters, to directly track changes in your stores, to make personas and lookalikes, and to predict donorship. So, if you have a database, it would be a pity to do nothing with it. Use the Vs that apply to you, and you cannot go wrong.