Machine learning is cybersecurity’s latest pipe dream – Part 1

Simon Crosby, CTO and co-founder, Bromium

A recurring claim at security conferences is that “security is a big data / machine learning (ML) / artificial intelligence (AI) problem”. This is unfortunately wildly optimistic, and wrong in general. While certain security problems can be addressed by ML/AI algorithms, the problem of detecting a malicious actor amid the vast trove of information collected by most organisations is, in general, not one of them.

Our faith in AI is rooted in personal experience (“everything cloud is big data and good”) and the memes of the consumerisation era. It is tempting to project this optimism into an enterprise context: the idea that it ought to be possible to sift through large amounts of data to find signs of an attack or breach is intuitively reasonable. Moreover, every IT pro managing systems at scale is aware of the value of sophisticated tools that help them pick through large volumes of data to find information relevant to troubleshooting and even to security investigations.

First-generation tools such as SIEM (Security Information and Event Management) systems gave security teams a new way to correlate events and triage large volumes of noisy data. Solutions then emerged that borrowed Google’s approach of indexing logs and alerts, adding powerful search and manipulation capabilities so that teams could become ever more effective at finding faults and security violations. These tools have helped security teams enormously, but they still leave two challenges: vast amounts of data of unknown value that we don’t know when to discard, and the nagging worry that the security team may have missed a needle somewhere in the haystack, that is, a concern that the algorithms may be imperfect and miss the bad guy anyway.

Is machine learning the answer to the security problem? Again, this is an imprecise statement of both the problem and the potential set of solutions. In this piece I want to focus on attack detection, where we need to ask two questions. First, can an algorithm reliably find the needle in the haystack (the tiny differences from “normal” behaviour that might be indicative of an attack)? Second, can such an algorithm increase our confidence in the absence of an attack, effectively enabling us to be sure that nothing would be lost if we discarded the haystack of data representing the organisation’s normal activity?

AI and ML are broadly viewed as magical technologies that will transform human experience. It’s an enormously seductive idea. We’ve all experienced the power of machine learning systems in Google search, the recommendation engines of Amazon and Netflix, and the powerful spam filtering of web-mail providers such as Gmail and Outlook. Former Symantec CTO Amit Mital once said that machine learning offers one of the “few beacons of hope in this mess.”

But it’s important not to succumb to hubris. Google’s fabled ability to identify flu epidemics turned out to be woefully inaccurate. Cyber security is a particularly difficult domain: it is characterised by weak signals and a huge number of variables to track, and intelligent adversaries have a large attack surface to exploit. There is no guarantee that using ML/AI will leave you much better off than before, when you relied on skilled experts to do the hard work. Unfortunately, that has yet to stop the marketing spin.

The author of this blog is Simon Crosby, CTO and co-founder, Bromium.

Comment on this article below or via Twitter: @IoTNow_ OR @jcIoTnow