It’s tough living on the edge

Red Hat

The past couple of years have seen computing pushed out to the edge of the network at an ever-faster pace. The details vary as there are many different “edges” depending upon what problem is being addressed. But the overall trend is clear. By 2023, over 50% of new enterprise IT infrastructure deployed will be at the edge rather than corporate datacenters, up from less than 10% today, according to IDC.

High volume data streaming in from IoT devices, which often must be quickly processed, filtered and acted upon, is one driver of edge computing. But there are an increasing number of other application areas, such as telco network functions, which are best optimized by placing service provisioning closer to users and devices.

None of this is a repudiation of cloud computing, but it does illustrate how assumptions that computing was on a path to wholesale centralization were simplistic at best. In practice, enterprise computing is highly heterogeneous, and organizations are mostly pursuing hybrid cloud approaches.

Although pushing compute out to where data, users and devices live has advantages, it also introduces challenges relative to centralized computing. These fall into three general categories: architecture and technology, ongoing operations and security.

Architecture and technology

The scale of some edge deployments is such that they can use much of the same software used in datacenters. For example, OpenStack is popular among telcos for creating private clouds at the edge, just as it is for creating private clouds in a more traditional on-premises environment.

However, even if some of the software stack is common, edge installations must take several unique considerations into account. For example, you can’t just flip a switch and add more servers from a central pool if more capacity is needed at the edge.

It’s important to plan up front for the needed compute, as well as storage, networking and any other hardware. Upgrading hundreds or even thousands of edge sites is an expensive undertaking. At the same time, the cost of over-provisioning those hundreds or thousands of sites adds up quickly too. The lesson here is that you must design deliberately.

As noted earlier, the edge can look very different depending upon the application. An appropriate architecture for hundreds of clusters with tens of servers each differs significantly from one for thousands of smaller endpoints, much less one made up of millions of individual edge computing devices.

Operations within large distributed systems

There are also practical issues related to operating a large distributed system. All those edge clusters might be installed in locations that don’t have an IT staff and might even be in places with no permanent human presence at all.

We need to account for the fact that this is a distributed system connected by potentially unreliable and throughput-constrained networks. How do we want an edge cluster to behave if it loses its connection to the datacenter? If disconnected operation makes sense, the system needs to be designed with that in mind.
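One common way to design for disconnected operation is store-and-forward: the edge node keeps accepting and buffering data while its uplink is down, then drains the buffer once the link returns. The sketch below illustrates that idea only; the `StoreAndForward` class, its `send` callback and the `max_items` limit are hypothetical names, and the "drop the oldest item when full" rule is an assumption chosen for brevity.

```python
from collections import deque
from typing import Callable, Deque


class StoreAndForward:
    """Hypothetical edge-side buffer: queue items while disconnected, flush when reconnected."""

    def __init__(self, send: Callable[[object], bool], max_items: int = 1000) -> None:
        self.send = send  # returns True if the datacenter accepted the item
        # Bounded buffer: when full, deque(maxlen=...) silently evicts the oldest item.
        self.buffer: Deque[object] = deque(maxlen=max_items)

    def submit(self, item: object) -> None:
        self.buffer.append(item)
        self.flush()

    def flush(self) -> bool:
        """Send buffered items in order; stop at the first failure and keep the rest."""
        while self.buffer:
            if not self.send(self.buffer[0]):
                return False  # still disconnected
            self.buffer.popleft()
        return True


# Simulated uplink, for illustration only.
delivered = []
link = {"up": False}


def send(item: object) -> bool:
    if link["up"]:
        delivered.append(item)
        return True
    return False


node = StoreAndForward(send, max_items=2)
for reading in (1, 2, 3):
    node.submit(reading)  # link is down, so readings are buffered; 1 is evicted when 3 arrives
link["up"] = True
node.flush()
print(delivered)  # [2, 3]
```

Whether to drop the oldest data, drop the newest, or spill to local disk when the buffer fills is exactly the kind of disconnected-operation decision that has to be made at design time rather than discovered in the field.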

We also need to deal with failures within the edge cluster itself. At scale, failures are normal, expected events. We must provide redundancy while also weighing cost tradeoffs. Is it cheaper to install some extra hardware so that repairs can mostly be made on a slower-paced, scheduled basis? Or are we better off running leaner and treating failures as urgent events?
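The spare-hardware question above can be framed as simple expected-cost arithmetic per site. The sketch below is a deliberately crude model under stated assumptions: every input value is hypothetical, and it assumes an installed spare fully absorbs a failure (no outage) so the repair can wait for a scheduled visit.

```python
from dataclasses import dataclass


@dataclass
class SiteAssumptions:
    """Hypothetical per-site inputs; replace every value with real data."""
    failures_per_year: float         # expected hardware failures per site per year
    urgent_visit_cost: float         # cost of an unscheduled site visit
    scheduled_visit_cost: float      # cost of a repair folded into a planned visit
    outage_hours_per_failure: float  # downtime when there is no spare
    outage_cost_per_hour: float
    spare_cost_per_year: float       # annualized cost of one redundant node


def lean_cost(a: SiteAssumptions) -> float:
    """Run lean: every failure means an urgent visit plus an outage until it is fixed."""
    per_failure = a.urgent_visit_cost + a.outage_hours_per_failure * a.outage_cost_per_hour
    return a.failures_per_year * per_failure


def spare_cost(a: SiteAssumptions) -> float:
    """Install a spare: assume no outage, and repairs wait for a scheduled visit."""
    return a.spare_cost_per_year + a.failures_per_year * a.scheduled_visit_cost


def break_even_failure_rate(a: SiteAssumptions) -> float:
    """Failure rate above which the spare becomes cheaper than running lean."""
    saving_per_failure = (a.urgent_visit_cost
                          + a.outage_hours_per_failure * a.outage_cost_per_hour
                          - a.scheduled_visit_cost)
    if saving_per_failure <= 0:
        return float("inf")  # under these inputs the spare never pays for itself
    return a.spare_cost_per_year / saving_per_failure


site = SiteAssumptions(failures_per_year=0.5, urgent_visit_cost=1200,
                       scheduled_visit_cost=300, outage_hours_per_failure=8,
                       outage_cost_per_hour=200, spare_cost_per_year=900)
print(lean_cost(site), spare_cost(site), break_even_failure_rate(site))  # 1400.0 1050.0 0.36
```

With these made-up inputs the per-site difference is modest, but it is multiplied across every site, and small changes in the failure rate or outage cost can flip the answer. That sensitivity is why the tradeoff deserves explicit analysis rather than a default.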

Site management operations, such as deployments and upgrades, must be handled remotely and be fast, reliable and automated. Good monitoring and logging are required for centralized management to work at all. Effective analytics can also help to predict failures and thereby head off some problems before they occur.

Edge computing security

In some ways, security is a subset of operations in the context of edge computing, but it’s important enough that it’s worth calling out separately. I’ve written previously about IoT device security specifically, but edge computing as a whole also has some specific security challenges.

The scale of many edge computing installations means that the automation mentioned above must apply to security as well. Automating patching and security scanning is a good practice. But using automated tooling that enforces security policies and minimizes potential vulnerabilities at distributed sites is essential.
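A minimal way to picture automated policy enforcement is "policy as code": each site reports its configuration, and a central job evaluates every report against the same set of rules. The sketch below assumes hypothetical report fields (`os_patch_level`, `disk_encrypted`, `ssh_password_auth`), a hypothetical `MIN_PATCH_LEVEL` and made-up rule names; real tooling would gather this data and remediate automatically rather than just list violations.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass
class SiteReport:
    """Hypothetical configuration snapshot reported by one edge site."""
    site: str
    os_patch_level: int
    disk_encrypted: bool
    ssh_password_auth: bool


MIN_PATCH_LEVEL = 42  # hypothetical baseline

# Each rule name maps to a predicate that a compliant site must satisfy.
POLICY: Dict[str, Callable[[SiteReport], bool]] = {
    "patched": lambda r: r.os_patch_level >= MIN_PATCH_LEVEL,
    "disk_encrypted": lambda r: r.disk_encrypted,
    "no_ssh_passwords": lambda r: not r.ssh_password_auth,
}


def violations(reports: Iterable[SiteReport]) -> Dict[str, List[str]]:
    """Return {site: [failed rule names]} for every non-compliant site."""
    result: Dict[str, List[str]] = {}
    for report in reports:
        failed = [name for name, rule in POLICY.items() if not rule(report)]
        if failed:
            result[report.site] = failed
    return result


reports = [
    SiteReport("edge-a", 42, True, False),
    SiteReport("edge-b", 40, True, True),
    SiteReport("edge-c", 43, False, False),
]
print(violations(reports))
# {'edge-b': ['patched', 'no_ssh_passwords'], 'edge-c': ['disk_encrypted']}
```

Because the same rules run against every site, adding a thousandth site costs no more review effort than adding the tenth.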

The edge has other unique considerations. In general, datacenters have established robust physical security practices around controlling access to the hardware, properly disposing of assets, such as disk drives that may contain sensitive information, and generally providing a highly engineered and controlled environment.

This is often not the case with edge clusters. Branch office and other remote systems have had to take these factors into account for a long time. However, edge locations might not even have the level of controls that a bank branch or satellite company office does. And the scale can be much greater.

Plan and plan again

One could argue that there are relatively few challenges that we see in edge computing that we don’t also see — to greater or lesser degrees — elsewhere. But that’s the rub: We see requirements for failure resiliency and automation everywhere. But dealing with them at the edge, where both distribution and scale are so great, can be especially challenging. For example, once a fix is identified, rolling it out to every edge cluster is probably a much more significant task with more failure modes than in the case of centralized infrastructure.
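One common way to limit the failure modes of a fleet-wide rollout is to deploy in progressively larger waves, starting with a small canary set, and to halt if a wave's failure rate exceeds a threshold. The sketch below is illustrative only: `staged_rollout`, `apply_fix`, the wave sizing (`first`, `factor`) and the 5% threshold are all hypothetical choices, not a specific product's behavior.

```python
from typing import Callable, Iterator, List, Sequence, Tuple


def waves(items: Sequence[str], first: int = 1, factor: int = 4) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of size first, first*factor, first*factor**2, ..."""
    if first < 1 or factor < 1:
        raise ValueError("first and factor must be >= 1")
    start, size = 0, first
    while start < len(items):
        yield items[start:start + size]
        start += size
        size *= factor


def staged_rollout(clusters: Sequence[str], apply_fix: Callable[[str], bool],
                   first: int = 1, factor: int = 4,
                   max_failure_ratio: float = 0.05) -> Tuple[List[str], List[str], List[str]]:
    """Apply a fix wave by wave; return (updated, failed, not_attempted)."""
    updated: List[str] = []
    failed: List[str] = []
    attempted = 0
    for wave in waves(clusters, first, factor):
        attempted += len(wave)
        wave_failed = []
        for cluster in wave:
            if apply_fix(cluster):
                updated.append(cluster)
            else:
                wave_failed.append(cluster)
        failed.extend(wave_failed)
        if len(wave_failed) / len(wave) > max_failure_ratio:
            break  # halt and investigate before touching more sites
    return updated, failed, list(clusters[attempted:])


clusters = [f"edge-{i:03d}" for i in range(20)]
updated, failed, pending = staged_rollout(clusters, lambda c: c != "edge-003")
print(len(updated), failed, len(pending))  # 4 ['edge-003'] 15
```

Here a single failure in the second wave stops the rollout with 15 clusters untouched, so a bad fix reaches a handful of sites instead of all of them.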

The example above further highlights that edge architectures need to be carefully planned. This includes up-front design work that considers the practical realities of a highly distributed system that exists largely outside of controlled datacenter environments. But it also includes deliberate planning for the ongoing operation of the entire system, including provisioning, failure recovery, upgrades and security.

All IoT Agenda network contributors are responsible for the content and accuracy of their posts. Opinions are of the writers and do not necessarily convey the thoughts of IoT Agenda.