With the increasing sophistication of network breaches and malware attacks, coupled with greater integration between IT and OT networks, there is a growing consensus that a move to a Zero Trust network architecture is warranted to mitigate threats, particularly against critical infrastructure.
Almost every major organization, including the most sensitive government agencies, is considering adopting a zero trust strategy, at least in some network domains, within the next few years. But zero trust remains an elusive goal and has proven challenging for existing infrastructure and cyber-physical systems.
It has often been said that companies cannot buy zero trust out of the box, that it is not a product or a technology. Think of zero trust as a security strategy that assumes attackers have already compromised devices, so even internal systems cannot be trusted by default. Organizations whose networks used to trust every device on the internal network must change their mindset and block all but the minimum required internal network communications, which can be both risky and complex to implement.
On a traditional network, if malware or ransomware infects a computer, it can likely scan the rest of the network to find other vulnerable applications, services, or ports. Ransomware could also use external servers to complete the breach, identifying and encrypting the most sensitive assets across the network. Network segmentation, such as the Purdue model in OT, might hamper this to some extent, but catastrophic damage is often still possible.
On the other hand, a Zero Trust architecture can greatly limit how far malware or perimeter network breaches can spread and how much damage they cause. The challenge for security teams: they have to allow devices to connect to at least some other devices, otherwise there is no network at all and everything collapses. The policy must grant trust only when it is necessary and subject to specific conditions or context. But how can we determine that efficiently?
Challenges for Zero Trust
There are a number of technical approaches to restricting network connectivity and implementing zero trust policies by default, including more granular segmentation or micro-segmentation of the network. These may require endpoint agents or edge firewalls that make deployment in a mission-critical OT process almost impossible, especially when the sudden blocking of an internal connection could disrupt an important process or recovery scenario.
A Zero Trust approach also relies on explicit a priori authorization to connect, which is rarely feasible: in complex, mission-critical environments, legitimate connections are impossible to enumerate fully, and some end up failing.
In order for zero trust to work in cyber-physical environments and industrial processes/devices, we need to incorporate context and intelligence into the policy decision. To make better quality connectivity decisions, security teams need to understand what they are trying to protect and what processes and applications need to be running.
These Zero Trust policies aren't just about MAC and IP addresses or ports. We want to know what type of devices are being used, their expected behavior, and what hardware and software they run. It's about knowing how the whole OT environment behaves. Which machines talk to which other machines? Over what protocol? What payload will be exchanged? At what frequency? Can we allow minimal connectivity, using the protocols we expect, to known machines at certain points in the process based on previously observed behavior, while restricting all other unexpected communications?
We need to make these contextual decisions in real-time, with a deeper understanding of the industrial process, the devices involved, unique OT protocols, established communication patterns, and more. This type of understanding implies an AI/ML-based system that can understand or learn some of these contextual patterns and apply decisions accordingly.
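As a minimal sketch of this idea, the contextual checks above can be expressed as a per-flow policy that encodes a learned baseline: which device pairs may talk, over which protocols, and at what rate. The device names, protocol labels, and thresholds here are hypothetical illustrations, not a real product API.

```python
from dataclasses import dataclass, field

@dataclass
class FlowPolicy:
    """Learned baseline for one device pair (hypothetical model)."""
    src: str                    # source device, e.g. "plc-07"
    dst: str                    # destination device, e.g. "historian-01"
    allowed_protocols: set = field(default_factory=set)  # e.g. {"modbus/tcp"}
    max_msgs_per_min: int = 60  # expected polling frequency

def evaluate(policy: FlowPolicy, src: str, dst: str,
             protocol: str, msgs_per_min: int) -> str:
    """Return 'allow', 'alert', or 'deny' for an observed flow."""
    if (src, dst) != (policy.src, policy.dst):
        return "deny"    # unknown device pair: block by default
    if protocol not in policy.allowed_protocols:
        return "deny"    # known pair, but unexpected protocol
    if msgs_per_min > policy.max_msgs_per_min:
        return "alert"   # known flow, anomalous rate: flag for review
    return "allow"

baseline = FlowPolicy("plc-07", "historian-01", {"modbus/tcp"}, 60)
print(evaluate(baseline, "plc-07", "historian-01", "modbus/tcp", 30))  # allow
print(evaluate(baseline, "plc-07", "historian-01", "http", 30))        # deny
```

In a real deployment the baseline would be learned by an AI/ML system from observed traffic rather than hand-written, but the decision shape is the same: default deny, with context deciding the exceptions.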
Building on this concept, we propose the following three steps to achieve the goals of Zero Trust policy without affecting industrial processes:
- Leverage knowledge of an asset’s cyber hygiene to enable connectivity: Grant a resource access based on its security hygiene. If an endpoint is unpatched, has no antivirus installed, or has outdated signatures, prompt the user to update before proceeding. In OT environments, it is quite common for automation vendors to validate Windows updates before allowing them to be installed on HMIs, so a pragmatic zero trust deployment should take this into account and tolerate some patching gap.
- Data diodes can help enforce the Zero Trust principle: Use data diodes as a convenient way to implement one-way trust and protect the most critical assets. They allow sensitive resources to initiate communication and send data in only one direction, for example to preprocessors or IT applications. This can limit the exposure of critical assets and data and achieve security goals without disrupting processes.
- Practice limited enforcement with proactive monitoring and alerts: There is no doubt that zero trust can disrupt established networks and processes. Security teams don’t want to enforce new policies without understanding the implications, and they may not have the time or resources to test policies or take the network offline for full research. Instead, passively observe network traffic, compare it to defined policies, and alert on or log exceptions for a specific period of time to reduce interruptions.
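The first and third steps above can be sketched together: a policy decision that consults an asset's hygiene posture and, during a monitor-only observation window, logs violations instead of blocking them. All names, records, and dates below are hypothetical illustrations.

```python
import logging
from datetime import datetime

logging.basicConfig(level=logging.WARNING, format="%(levelname)s %(message)s")

# Hypothetical hygiene records per endpoint (step 1).
HYGIENE = {
    "hmi-02": {"patched": True, "av_signatures_current": True},
    "eng-ws-04": {"patched": False, "av_signatures_current": True},
}

# Allowed (src, dst, protocol) flows; in practice learned from observed behavior.
POLICY = {("hmi-02", "plc-07", "modbus/tcp")}

# Hypothetical end of the monitor-only phase (step 3).
OBSERVATION_ENDS = datetime(2025, 6, 1)

def decide(src: str, dst: str, protocol: str, now: datetime) -> str:
    """Return 'allow', 'alert', or 'deny' for an observed flow."""
    hygiene = HYGIENE.get(src, {})
    if not (hygiene.get("patched") and hygiene.get("av_signatures_current")):
        # Step 1: poor hygiene is flagged; a pragmatic deployment may
        # tolerate a patching gap rather than deny outright.
        logging.warning("hygiene gap on %s", src)
        return "alert"
    if (src, dst, protocol) in POLICY:
        return "allow"
    if now < OBSERVATION_ENDS:
        # Step 3: monitor mode -- record the exception, don't disrupt.
        logging.warning("policy exception (not enforced): %s -> %s [%s]",
                        src, dst, protocol)
        return "alert"
    return "deny"   # enforcement phase: unexpected flows are blocked

print(decide("hmi-02", "plc-07", "modbus/tcp", datetime(2025, 1, 1)))  # allow
print(decide("hmi-02", "plc-07", "http", datetime(2025, 7, 1)))        # deny
```

The design choice here is that monitor mode turns would-be denials into logged alerts, giving teams a review period before flipping the switch to enforcement.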
Think of zero trust as a journey. These implementations must follow a path of increasing sophistication, definition, and enforcement, particularly in OT and industrial process environments. Creating connectivity and security policies in these environments requires more context than for typical IT applications in virtual data centers, and building in that context is the best way to achieve the desired security goals with minimal disruption and reasonable project costs.
Moreno Carullo, Co-Founder and CTO, Nozomi Networks