How to measure IoT scalability in the real world

February 15, 2022

Everyone loves a scalable product. When you purchase a new laptop, car, or movie-streaming service, scalability might be one of your criteria. For example, does your new laptop have the storage and RAM to handle your future work-related requirements? Does your automobile have enough seats to hold your growing family and all of its trappings? Does your movie-streaming service allow you to add accounts to your plan for your soon-to-be teenagers?

Scalability is specific to the use case you’re trying to address. This is certainly true in IoT, says Steve Hilton, co-founder and president at IoT analysts MachNation. For example, the scale of a typical smart parking meter solution is markedly different from the scale of a smart factory robotics solution. And the technology and implementation used to support these end-to-end IoT use cases must be built and tested to handle their relative scales.

As a company that does end-to-end IoT performance testing, MachNation has to accurately define KPIs and conduct tests to measure scalability so enterprises with concerns about their IoT solutions’ growth can find bottlenecks in their use cases before their customers do.

Let’s look at two areas, though there are many more, where enterprises test for and validate IoT scalability. Along the way, we’ll discuss five IoT scalability KPIs and view some actual test results from end-to-end IoT performance testing using MachNation Tempest.

Metric #1: device scalability

Device scalability refers to the number of devices that an IoT solution is capable of supporting at some level of acceptable performance. Acceptable device scalability for an IoT platform depends on the types of use cases on that platform. For example, some platforms might claim to be able to support connected factory solutions that have 1,500 IoT gateways, each connected to a number of PLCs. Other platforms might claim to be able to support smart vending-machine solutions that have 10,000 connected devices. Still other platforms might claim to be able to support LPWAN solutions that have 500,000 or more connected remote assets.

It’s essential that an enterprise’s end-to-end IoT solution can support its required level of device scalability. To ensure this, enterprises must test at least two major IoT workflows, although there are clearly more.

Provisioning workflow. There has to be a way to securely and efficiently connect IoT devices to and register devices on an IoT platform. This involves the bi-directional sharing of metadata, data, and security credentials between devices and an IoT platform. Some high-quality IoT solutions are capable of provisioning 500 devices or more per second.

Over-the-air (OTA) update workflow. Keeping device firmware and software updated is required for virtually all IoT solutions. Having an automated and secure way to accomplish this workflow at scale is critical. Some high-quality IoT device management platforms are required by their customers to sustain 100 OTA updates per second, the equivalent of roughly 8.6 million updated devices in a 24-hour period. This might be necessary when patching an urgent security vulnerability.

So, let’s look at an example of scalability testing of the provisioning workflow. What’s an acceptable length of time to provision 500,000 LPWAN devices? A platform might be built to securely provision one device every 2 seconds. While this sounds fast, it would take 1,000,000 seconds, or over 11 days, to provision all 500,000 devices, and this assumes no failures or errors in the process. A similar OTA update for 500,000 devices might run for months on a poorly designed IoT platform, because software updates run at a much slower rate and have much higher computational requirements than device provisioning.
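The arithmetic behind these figures can be checked with a short Python sketch. It assumes a constant processing rate with no failures or retries, which is the same simplification the example above makes; the function names are illustrative and not part of any MachNation tool.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def days_to_process(device_count: int, devices_per_second: float) -> float:
    """Days needed to handle every device once at a constant rate.

    Assumes no failures, retries, or pauses (an idealised lower bound).
    """
    if devices_per_second <= 0:
        raise ValueError("devices_per_second must be positive")
    return device_count / devices_per_second / SECONDS_PER_DAY


def devices_per_day(devices_per_second: float) -> float:
    """Devices handled in 24 hours at a constant rate."""
    return devices_per_second * SECONDS_PER_DAY


if __name__ == "__main__":
    fleet = 500_000
    # One device every 2 seconds is a rate of 0.5 devices per second.
    print(f"1 device every 2 s: {days_to_process(fleet, 0.5):.1f} days")
    # 500 devices per second, the rate quoted for high-quality solutions.
    minutes = days_to_process(fleet, 500) * SECONDS_PER_DAY / 60
    print(f"500 devices/s: {minutes:.1f} minutes")
    # 100 OTA updates per second sustained for a full day.
    print(f"100 OTA updates/s: {devices_per_day(100):,.0f} devices per day")
```

At one device every 2 seconds the fleet takes about 11.6 days, while at 500 devices per second the same fleet would take under 17 minutes.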

Enterprises must collect data about their IoT solution’s scalability in order to improve platform performance. And to be fair, it has been very difficult to collect this kind of granular data. However, using appropriate IoT performance testing tools, an enterprise can simulate 500,000 LPWAN devices and have them try to connect at varying rates from 1 per second to 1000 per second or more. During the test the enterprise would capture, analyse, and visualise the provisioning failure rates (i.e., the number of devices that failed to provision) and actual provisioning speed (i.e., how long it really took each device to connect). These metrics would allow the enterprise to find and quantify provisioning bottlenecks and help engineering teams improve device scalability for their IoT solutions.
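As a rough illustration of how the two provisioning metrics might be computed from test output, here is a minimal Python sketch. The `ProvisioningAttempt` record and its fields are hypothetical; a real testing tool will have its own result format.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional


@dataclass
class ProvisioningAttempt:
    """Hypothetical record of one simulated device trying to provision."""
    device_id: str
    started_at: float             # seconds since the test began
    finished_at: Optional[float]  # None if the device never provisioned


def summarize_provisioning(attempts: List[ProvisioningAttempt]) -> dict:
    """Compute the failure rate and the actual provisioning speed."""
    if not attempts:
        raise ValueError("no provisioning attempts recorded")
    # Durations exist only for devices that finished provisioning.
    durations = [
        a.finished_at - a.started_at
        for a in attempts
        if a.finished_at is not None
    ]
    failed = len(attempts) - len(durations)
    return {
        "attempted": len(attempts),
        "failed": failed,
        "failure_rate_pct": 100.0 * failed / len(attempts),
        "mean_seconds": mean(durations) if durations else None,
        "max_seconds": max(durations) if durations else None,
    }
```

Running the same summary at each connection rate (1 per second, 10 per second, and so on) would show where failures or provisioning times start to climb.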

Metric #2: message scalability

Message scalability refers to the number of messages traveling from IoT devices to an IoT platform at some level of performance. Acceptable message scalability, also called the message ingestion rate, can vary greatly based on the IoT use case. For example, message scalability for a connected street light solution will likely be much lower than message scalability for a connected wind turbine solution.

There are three measurements of message scalability for enterprise IoT solutions.

Messaging rate. The messaging rate is defined as the number of messages flowing from IoT devices to their IoT platform per unit of time. Typical IoT platforms might be built to handle upwards of 2000 messages per second. That said, some platforms can burst to rates far exceeding 10,000 messages per second. To put that in perspective, if you had an IoT solution generating 10,000 messages per second, each an 8 KB IoT message, you’d generate roughly 6.9 terabytes of data per day, or 2.5 petabytes of data per year.

Message failure rate. The message failure rate is defined as the percentage of messages that leave IoT devices but never make it to the IoT platform. A message failure rate of 0.5% is often considered acceptable, although clearly there are IoT use cases where enterprises require a much lower failure rate.

Message latency. Message latency is defined as the number of milliseconds it takes a message to get from an IoT device to its IoT platform. A high-quality IoT solution should have an average message latency of less than 100 milliseconds, with similarly small variance in latencies.
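The three measurements above can be sketched in Python as follows. This is an illustration under stated assumptions: a message is taken to be 8,000 bytes, decimal units are used, and the function names and inputs are hypothetical. Under those assumptions, 10,000 messages per second comes to about 6.9 × 10^12 bytes per day.

```python
from statistics import mean, pstdev
from typing import List

SECONDS_PER_DAY = 86_400


def daily_volume_bytes(messages_per_second: float, bytes_per_message: int) -> float:
    """Bytes generated in 24 hours at a constant messaging rate."""
    return messages_per_second * bytes_per_message * SECONDS_PER_DAY


def summarize_messages(sent_count: int, latencies_ms: List[float]) -> dict:
    """Failure rate and latency statistics for one test interval.

    latencies_ms holds one entry per message that reached the platform,
    so messages that were sent but have no latency count as failures.
    """
    if sent_count <= 0:
        raise ValueError("sent_count must be positive")
    delivered = len(latencies_ms)
    if delivered > sent_count:
        raise ValueError("more messages delivered than sent")
    return {
        "failure_rate_pct": 100.0 * (sent_count - delivered) / sent_count,
        "mean_latency_ms": mean(latencies_ms) if latencies_ms else None,
        # Population standard deviation as a simple measure of latency spread.
        "latency_stdev_ms": pstdev(latencies_ms) if latencies_ms else None,
    }
```

For example, `summarize_messages(1000, latencies)` with 995 recorded latencies reports a 0.5% failure rate, the level the article describes as often acceptable.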

To improve message scalability, an enterprise or solution provider should conduct IoT performance tests in which large numbers of simulated IoT devices send typical IoT data to an IoT platform, starting at 500 messages per second and linearly increasing the messaging rate to 2500 messages per second or more. MachNation recommends running the simulation for several hours or days, capturing all data on message failure rates and latencies for every second of the test, and visualising the test results on a set of dashboards. This allows an enterprise’s engineering team to find, quantify, and fix IoT solution bottlenecks and failure points, thereby increasing message scalability and solution quality.
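A linear ramp of this kind, and a simple check for where a platform starts to struggle, might look like the following Python sketch. The function names and the shape of the results are assumptions for illustration; the default thresholds reuse the 0.5% failure rate and 100-millisecond latency figures mentioned earlier.

```python
from typing import List, Optional, Tuple


def linear_ramp(start_rate: float, end_rate: float, steps: int) -> List[float]:
    """Evenly spaced target rates from start_rate to end_rate, inclusive."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    increment = (end_rate - start_rate) / (steps - 1)
    return [start_rate + i * increment for i in range(steps)]


def find_breaking_rate(
    results: List[Tuple[float, float, float]],
    max_failure_pct: float = 0.5,
    max_latency_ms: float = 100.0,
) -> Optional[float]:
    """Return the first rate at which either threshold is exceeded.

    results is a list of (messages_per_second, failure_pct, mean_latency_ms)
    tuples ordered by increasing rate. Returns None if no step breaches
    a threshold.
    """
    for rate, failure_pct, latency_ms in results:
        if failure_pct > max_failure_pct or latency_ms > max_latency_ms:
            return rate
    return None


if __name__ == "__main__":
    # Five steps from 500 to 2500 messages per second.
    print(linear_ramp(500, 2500, 5))
```

In practice each step would be held long enough to collect stable per-second failure and latency data before moving to the next rate.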

Conclusion

Scalability is an intrinsic part of every IoT solution. And the only way to guarantee or improve scalability is to test and measure the various performance KPIs for your particular IoT use case. Keep in mind that there are many more aspects to scalability, reliability, and performance that are specific to each IoT use case. We’ll cover more of these aspects soon.

The author is Steve Hilton, a co-founder and president at MachNation.
