Big data. It’s the buzzword on everyone’s lips, and it’s often promoted as the answer to nearly any problem in society today. The ability to collect, collate, and analyze the vast amount of data now at the fingertips of global organizations offers huge benefits. Today’s internet companies, from Amazon to Google, make a livelihood out of amassing large amounts of data about everything from warehouses to personal online behavior. Solutions to problems and the promise of economic power come with harnessing big data and extracting trends and insights from it.

There is no denying that using data to extract insight is a very powerful modern-day force.

But in the cybersecurity industry, is it really the be-all and end-all that some tout it to be?

The Truth about Big Data

You would think that analyzing big data in the cybersecurity industry would provide insights into threat actor behavior, make it possible to detect incidents, and help resolve them in real time. In theory, it should be providing a massive performance boost in cyberspace.

There have been some notable successes. One great example is denial-of-service attacks, in which criminals bombard a website with so many requests that the server can’t determine which are real and which are fake. Big data analysis works well against this type of attack because it looks for common attributes among the fake requests and weeds them out.

However, in most security operations, looking for threat actor behavior and activity in such a large amount of data makes finding any positive results extremely difficult. Often, there are no common traits. Terabytes of all sorts of data are generated by simple everyday activities; merely browsing a shared folder creates hundreds and hundreds of logs. And here is the problem: who’s to know whether that activity represents legitimate access to data or the work of a threat actor?

Big data attempts to solve this issue by using machine learning to cluster activity types and use them as a baseline for normal system activity. Known templates of malicious behavior can help in this training process. Anything that falls outside those norms is treated as bad. The downside is that it’s nearly impossible to spot new techniques when you approach the problem from a big data perspective. Zero days are novel techniques for defeating computer systems, so called because they give system operators zero days to react to the bug and fix it. Because zero days are, by definition, something new, big data has a real problem spotting them. Templates fail because the technique hasn’t been seen before, and anomaly detection fails because there is so much noise from standard behavior that the malicious behavior just doesn’t stand out.
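
To make the baseline idea concrete, here is a minimal sketch in Python. It is not how any particular product works: it swaps the machine-learning clustering described above for a simple z-score test, and the activity names, counts, and threshold are all invented for illustration. It shows a small amount of attacker activity hiding inside normal variation while only a very loud event gets flagged.

```python
from statistics import mean, stdev

def build_baseline(history):
    """Summarize normal activity as (mean, standard deviation) per activity type."""
    return {activity: (mean(counts), stdev(counts))
            for activity, counts in history.items()}

def find_anomalies(baseline, observed, threshold=3.0):
    """Return activity types that were never seen in training or whose count
    is more than `threshold` standard deviations from the mean."""
    anomalies = []
    for activity, count in observed.items():
        if activity not in baseline:
            anomalies.append(activity)  # no baseline exists for this activity
            continue
        mu, sigma = baseline[activity]
        if sigma == 0:
            if count != mu:
                anomalies.append(activity)
        elif abs(count - mu) / sigma > threshold:
            anomalies.append(activity)
    return anomalies

# Hypothetical daily log counts for one user (illustrative numbers only).
history = {
    "folder_browse": [480, 510, 495, 530, 470],
    "login": [20, 22, 19, 21, 23],
}
baseline = build_baseline(history)

# A handful of extra folder reads by an intruder stays inside normal noise.
quiet_day = {"folder_browse": 515, "login": 21}
# Only a very noisy event crosses the threshold.
loud_day = {"folder_browse": 5000, "login": 21}

print(find_anomalies(baseline, quiet_day))
print(find_anomalies(baseline, loud_day))
```

Lowering the threshold would catch quieter activity, but at the cost of flagging ordinary days as well, which is the noise problem in miniature.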

In general, systems using AI, algorithms, and big data have not proved to be the way to finally solve everything in cybersecurity.

So what is the way forward, then?

The Missing Piece: Right Data

We need to pay more attention not just to big data, but to right data. A small amount of valuable data is often better than a vast amount of valueless data.

In the realm of cybersecurity, cyber deception can allow the collection of very valuable right data, which can then even be used to inform and train big data AI and machines to be more effective, in real time.

This is the premise of our cyber deception platform. We wanted to get away from the big data-only camp, so we developed a tool that creates environments that allow you to attract and then examine and measure threat actor behavior. The result is a small amount of data, yes. But that data is:

  • Specific to your organization
  • Delivered in real time
  • About a known source of malicious behavior
  • Detailed and complete

The data that comes out of our system identifies particular characteristics, a sequence of commands, particular uses of files and directory structures, and other markers that are the result of the bad behavior of threat actors.

The Value of Right Data

Using this tool allows you to gather information on the threat actor’s modus operandi and to deflect them. Not only that, you now have data that can defend the rest of the organization, which has incredible value:

1) You can feed the info from the cyber deception platform into other systems. A typical example is a SIEM (security information and event management system), which acts as a data lake of all security events. If you have found a unique identifier of adversary behavior, or even multiple identifiers such as a certain file and a certain directory structure, you can then direct the system to find anything that happened over a desired time period that corresponds to the identifiers, across every computer in the organization. Without right data, you wouldn’t have known what to look for.

2) You can use it in real time. The attacks discovered by deception are typically happening in real time, so extra vigilance is key. With right data, you can raise an alert that tells people in the security operations center the attack is under way.

3) You can use the MO to defend yourself. For example, say the right data provided by the cyber deception platform shows the attacker always uses a particular Windows subsystem. You can reconfigure that subsystem to defend itself against this specific attack.
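
As a rough illustration of the first two points, the Python sketch below sweeps a set of security events for records that match every indicator within a chosen time window, then formats an alert for each hit. Everything in it is hypothetical: the event fields, host names, file name, and directory are invented, and a real SIEM would be queried through its own search language or API rather than an in-memory list.

```python
from datetime import datetime

def hunt(events, indicators, start, end):
    """Return events inside [start, end] whose fields match every indicator."""
    return [
        event for event in events
        if start <= event["time"] <= end
        and all(event.get(field) == value for field, value in indicators.items())
    ]

# Hypothetical events, standing in for what a SIEM would hold.
events = [
    {"host": "ws-01", "time": datetime(2024, 3, 1, 9, 0),
     "file": "report.docx", "directory": "C:/Users/alice/Documents"},
    {"host": "ws-07", "time": datetime(2024, 3, 2, 14, 30),
     "file": "svc_update.exe", "directory": "C:/Temp/stage"},
    {"host": "db-02", "time": datetime(2024, 3, 3, 2, 15),
     "file": "svc_update.exe", "directory": "C:/Temp/stage"},
    {"host": "ws-03", "time": datetime(2024, 3, 3, 11, 0),
     "file": "svc_update.exe", "directory": "C:/Program Files/Vendor"},
    {"host": "ws-09", "time": datetime(2024, 2, 10, 8, 0),
     "file": "svc_update.exe", "directory": "C:/Temp/stage"},
]

# Identifiers learned from the deception environment (illustrative values).
indicators = {"file": "svc_update.exe", "directory": "C:/Temp/stage"}

hits = hunt(events, indicators,
            start=datetime(2024, 3, 1), end=datetime(2024, 3, 7))
hosts = sorted({event["host"] for event in hits})

# One alert line per hit, ready to hand to the security operations center.
alerts = [f"ALERT {event['host']} {event['time'].isoformat()} matched {event['file']}"
          for event in hits]

print(hosts)
for line in alerts:
    print(line)
```

Requiring all indicators to match keeps the result set small: the event on ws-03 shares the file name but not the directory, so it is left out, as is the matching event that falls outside the time window.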

The best way to build a strong cybersecurity posture is to use right data to guide machine learning and AI, filtering out threat actor behavior from the big data created by all your organization’s security systems.

There will always be a need for curated, high-value data to guide big data systems and shape their view of what is bad. The data you can get from our system is the input data you need to teach your algorithms and machines. Despite the flood of marketing around machine learning and AI, they are clearly not the answer to all our collective cybersecurity ills. Just look at the headlines about breaches nowadays: the companies involved are all deploying systems that use AI, which is proof enough that big data alone cannot build a sound cybersecurity strategy. The SolarWinds megabreach interacted with a number of systems that used AI and didn’t get picked up until a technicality set off a manual alarm.

Use Right Data for a Baseline

Changing data, changing techniques, and changing circumstances mean right data will always be an essential piece of the puzzle.

Just recently, the majority of the world’s workforce became remote workers. Many security systems became useless, as people’s behavior completely changed overnight. Changing user behavior is a prime example of why you can’t always rely on AI to predict criminal activity.

When planning a cybersecurity strategy, make sure you have a sensible mix of systems that use big data and systems that provide right data. Right data can come from threat intel inputs as well as from deception tactics used across all attack surfaces.

In the case of deception, right data can come from:

  • Deploying things that attract threat actors.
  • Choosing threat intel collection campaigns that are specific to the kind of threat your organization faces.
  • Collecting telemetry from external networks and cloud systems that generally fall outside of the big data lake realm.
  • The alerts and notifications from these campaigns. They are infrequent, but when you do get them they are not false positives—your team should stop and pay attention.
  • Concrete campaigns designed to deliver IoCs and TTPs, with context. You’ll be presented with concrete digital observables (logs, hashes, activity trails), all tagged against the MITRE ATT&CK framework to show what kind of data you have.
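
One way to picture tagged observables is the Python sketch below. It assumes the tagging uses MITRE ATT&CK technique IDs; the record layout, field names, and sample values are hypothetical and do not describe any product’s actual output format. The hash entry is left as a placeholder rather than an invented value.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Observable:
    kind: str       # e.g. "command", "path", "file_hash"
    value: str      # the observed artifact itself
    technique: str  # MITRE ATT&CK technique ID
    context: str    # where and how it was seen

def group_by_technique(observables):
    """Index observables by ATT&CK technique so analysts or tools can pivot on them."""
    grouped = defaultdict(list)
    for obs in observables:
        grouped[obs.technique].append(obs)
    return dict(grouped)

# Illustrative records from a hypothetical decoy file server.
observables = [
    Observable("command", "dir /s C:\\Shares", "T1083",
               "File and Directory Discovery on decoy server"),
    Observable("path", "C:/Shares/Finance", "T1083",
               "Directory enumerated on decoy server"),
    Observable("command", "powershell -enc <payload>", "T1059",
               "Command and Scripting Interpreter on decoy server"),
    Observable("file_hash", "<sha256 of dropped file>", "T1105",
               "Ingress Tool Transfer to decoy server"),
]

grouped = group_by_technique(observables)
for technique in sorted(grouped):
    print(technique, [obs.kind for obs in grouped[technique]])
```

Grouping by technique keeps the context attached to each observable, so the same records can be read by a human operator or passed to another system without losing what they mean.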

You can send this data to human operators or even feed it automatically into AI. That’s why a combination of right data and big data is the way to go. No more looking for a needle in a haystack. Instead, you have in hand what you need to put modern data analysis tools to work for you and your organization’s cybersecurity.