At the end of the day, the most important property of information is whether or not you can trust it. There are plenty of reasons to be skeptical of information when you receive it, even when it is delivered by a company that specializes in collecting it.
What makes information trustworthy is your ability to check it at any given time. How can you trust what you can’t verify?
“Trust, but verify.” ~ Ironically enough, this proverb is thought to paraphrase Vladimir Lenin and Joseph Stalin
The Internet
The Internet has become a gigantic space. It started off with the adoption of the TCP/IP communication protocol back in 1983, exploded in the early 2000s, and has only been growing since.
It is estimated that 67% of the world’s population uses the Internet today. This translates into an enormous amount of information being created and shared: roughly 328.77 million terabytes every day.
If we assume an HD movie takes up around 4 GB of disk space, 328.77 million terabytes of data would be like storing roughly 82.19 billion movies. And that’s only a daily volume.
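Here’s a quick back-of-the-envelope check of that figure (assuming decimal units, where 1 TB = 1,000 GB):

```python
# Back-of-the-envelope: how many 4 GB movies fit in one day's worth of Internet data?
daily_volume_tb = 328.77e6        # 328.77 million terabytes per day
movie_size_gb = 4                 # assumed size of one HD movie, in GB

daily_volume_gb = daily_volume_tb * 1_000   # 1 TB = 1,000 GB (decimal units)
movies_per_day = daily_volume_gb / movie_size_gb

print(f"{movies_per_day:,.0f} movies per day")   # 82,192,500,000, i.e. ~82.19 billion
```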
There are two major obstacles in the way of capturing the full scope of the Internet’s data today: reach of collection and frequency of collection.
1. Reach
The Internet’s architecture is distributed by default. This means that information can’t be accessed from any single endpoint. The information present on the Internet is, quite literally, everywhere, both physically and virtually.
In other words, if you want to access as much of the Internet’s data as you possibly can, you need to crawl through every single place where it could possibly be.
And this is where it gets tricky because, as mentioned earlier, the Internet’s data is quite literally everywhere. In other words, you won’t be able to access it all if your Internet Protocol (IP) address stays in the same place.
For example, some social networks, forums, services, and more are geo-blocked. Geo-blocking is a technique used to restrict access to a service based on the user’s location. This can happen for legal compliance reasons, as it did for Threads and Vkontakte, but also for ideological reasons, as in the case of Truth Social.
On top of that, several companies have started monetizing access to their data, even when that data is publicly accessible. This is the case for X (previously Twitter). We talked about the dangers of making data harder to access in a previous story here, if you want to learn more about it.
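To make the reach problem more concrete, here’s a minimal sketch of how one might probe the same URL from several regions to spot geo-blocking. The proxy endpoints below are hypothetical placeholders, and this illustrates the idea rather than our actual crawler:

```python
import requests

# Hypothetical proxy endpoints in different regions (placeholders, not real servers).
REGION_PROXIES = {
    "us": "http://us-proxy.example.com:8080",
    "de": "http://de-proxy.example.com:8080",
    "ru": "http://ru-proxy.example.com:8080",
}

def probe_from_regions(url: str) -> dict:
    """Fetch the same URL through proxies in several regions to spot geo-blocking."""
    results = {}
    for region, proxy in REGION_PROXIES.items():
        try:
            resp = requests.get(url, proxies={"http": proxy, "https": proxy}, timeout=10)
            results[region] = resp.status_code   # e.g. 200 in one region, 403 in another
        except requests.RequestException as exc:
            results[region] = f"error: {exc}"
    return results

# A 403 or 451 from one region alongside a 200 from another hints at a geo-block.
print(probe_from_regions("https://example.com/some-post"))
```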
2. Frequency
Frequency of collection here refers to the ability to capture data posted on the Internet frequently enough so that you don’t miss out on important bits of information.
On top of the very large volumes of information that are being created and uploaded to the Internet, there is also the question of information being edited, removed or otherwise altered once it has been initially uploaded.
Collecting such information every 5 minutes, every day, or every week can have a very significant impact on the quality of the information collected, especially when dealing with social networks.
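As a rough illustration of why frequency matters, here’s a minimal polling sketch that re-fetches a post at a fixed interval and flags edits or deletions by comparing content hashes. The fetch callback is a stand-in for whatever collector you use, not our actual pipeline:

```python
import hashlib
import time
from typing import Callable, Optional

def watch_post(fetch: Callable[[], Optional[str]],
               interval_s: float = 300, rounds: int = 12) -> None:
    """Poll `fetch` every `interval_s` seconds; report edits and deletions.

    `fetch` returns the post's current text, or None once it is deleted.
    """
    last_hash = None
    for _ in range(rounds):
        text = fetch()
        if text is None:
            print("post deleted")
            return
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if last_hash is not None and digest != last_hash:
            print("post edited")
        last_hash = digest
        time.sleep(interval_s)

# Toy demo: a post that gets edited on the second poll, then deleted.
versions = iter(["original text", "edited text", None])
watch_post(lambda: next(versions), interval_s=0, rounds=3)
```

Poll too rarely and both the edit and the deletion happen between two visits; the record you keep is silently wrong.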
One of the use cases we are working on is using Exorde’s data to monitor unusual or ill-intentioned activity from social network users attempting to propagate false or altered narratives.
We actually published a complete report on this, after carefully studying our data over a month-long period covering the Ukraine/Russia conflict. Amongst other things, we realized that a good many posts written in French to promote Vladimir Putin were getting deleted very quickly after being posted… but not fast enough for our protocol to miss them *wink*. You can read the full report here.
Capturing the data was, quite literally, only ever half of the problem. As astounding as that may sound, processing the data to make something useful out of it is very much a whole different subset of problems.
Think of raw data like wood. Having wood is nice, but you can’t really do anything with it if you don’t transform it one way or another. You can set wood on fire to shed some light and heat, you can sculpt it to build structures and tools, you can even refine it and use it as decoration nowadays. Just as you can transform wood in many ways, so can you transform data.
Transforming data is like forging steel. To make steel you essentially need two base components: iron and carbon. However, you also need a precise amount of each and just the right temperature. If you get any of the proportions, the timing, or the temperature wrong, it all goes to waste. The same applies to data. Without the right processing techniques, your data is useless. Except, unlike forging steel, it won’t be as obvious that the transformation you applied was useless.
This is essentially why Data Scientists and Data Analysts exist, and why the sector is gaining momentum by the day.
Every company processes the data it collects differently. In our case, we look at specific metrics that allow us to quickly classify how people are talking about a given topic: an emotion-based analysis.
In other words, we look at the way people talk about topics online, and from there, we can reconstruct a narrative of the direction global conversations are taking, how fast they are spreading, and where.
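As a toy illustration of what an emotion-based reading of posts can look like (not our actual classifier, and with a made-up lexicon), consider the following sketch:

```python
from collections import Counter

# Tiny illustrative emotion lexicon (a real system would use a trained model).
EMOTION_LEXICON = {
    "anger": {"outrage", "furious", "disgusting", "scandal"},
    "fear":  {"threat", "danger", "collapse", "crisis"},
    "joy":   {"great", "love", "win", "amazing"},
}

def emotion_profile(posts: list) -> Counter:
    """Count which emotions the vocabulary of a batch of posts leans toward."""
    counts = Counter()
    for post in posts:
        words = {w.strip(".,!?") for w in post.lower().split()}
        for emotion, cues in EMOTION_LEXICON.items():
            counts[emotion] += len(words & cues)
    return counts

posts = [
    "This scandal is disgusting, pure outrage",
    "Amazing win tonight, love it",
]
print(emotion_profile(posts))  # Counter({'anger': 3, 'joy': 3, 'fear': 0})
```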
Imagine you were trying to push a false narrative on social networks today. You write a clickbaity headline, some catchy text, maybe an AI-generated image to capture your audience’s attention. Once the post is out, your only objective is to get people to share it, so it can propagate far and fast.
This is exactly what Exorde was built to measure: how fast your narrative is spreading, and where. As everything is timestamped, it’s easy to retrace a narrative to its origin, even if the original content was deleted.
This means that Exorde is capable of analyzing information online using the same KPIs (Key Performance Indicators) that anyone attempting to spread misinformation would use to measure the efficacy of their work.
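For intuition, here’s a minimal sketch of how timestamped matches for a narrative could be traced back to a likely origin and turned into a spread rate. The Post structure and the sample data are assumptions made for the example:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Post:
    platform: str
    url: str
    ts: datetime  # timestamp of the post at collection time

def trace_narrative(posts: list) -> None:
    """Estimate a narrative's origin and spread rate from timestamped matches."""
    ordered = sorted(posts, key=lambda p: p.ts)
    origin = ordered[0]
    span_h = (ordered[-1].ts - origin.ts).total_seconds() / 3600 or 1.0
    print(f"likely origin: {origin.url} on {origin.platform} at {origin.ts}")
    print(f"spread rate: {len(posts) / span_h:.1f} posts/hour")

posts = [
    Post("forum",  "https://example.com/a", datetime(2023, 9, 1, 8, 0)),
    Post("social", "https://example.com/b", datetime(2023, 9, 1, 9, 30)),
    Post("social", "https://example.com/c", datetime(2023, 9, 1, 10, 0)),
]
trace_narrative(posts)  # origin: https://example.com/a, 1.5 posts/hour
```

Because the earliest timestamped copy survives in the dataset, deleting the original post doesn’t erase the trail.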
Our Solution
Exorde Labs’ team has been working relentlessly for over 4 years now to develop a technical solution to all the issues stated earlier. Complex problems require complex solutions, especially when it comes to monitoring information on the Internet today.
Here’s what makes our approach unique:
This approach guarantees the best and most reliable coverage of the Internet, in the most transparent and neutral way technically conceivable today.