Recently I reflected on the realities of modern software development and their influence on performance engineering.
Modern software development makes heavy use of CI/CD (continuous integration and delivery) and feature flags. CI/CD is oriented towards deploying changes to production as fast as possible, minimizing the delay between merging a change and seeing its impact in production. Obviously, the only way to implement CI/CD reliably is to automate unit tests and any other verification one can think of (integration tests, etc.). Performance testing of any kind, however, introduces a large issue: assuming a characteristic Web UI response time of around 1 second, measuring latency at the 99.9th percentile requires at least 1,000 samples, so at one request per second a developer needs to wait about 1,000 seconds, an unacceptably long time in the CD world. Of course, it's possible to drive the test over several threads, but that creates its own issues in a service-oriented environment. Since your testing environment is probably not scaled to the level of your production environment, tests run by different teams have a high chance of impacting each other, as merges are happening all the time. Queuing the tests and executing them sequentially slows you down and so defeats the purpose of CI/CD. The saving grace here is that changes deployed directly through CI/CD are typically smaller and less complex.
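To make the arithmetic behind that wait concrete, here is a minimal sketch (the function names are mine, not from any load-testing tool) of how many samples a given percentile demands, and how long a closed-loop test needs to collect them:

```python
import math

def min_samples_for_percentile(q: float) -> int:
    """Smallest sample count at which the q-th percentile is even
    observable: at least one sample must land beyond the cut-off."""
    return math.ceil(1.0 / (1.0 - q))

def min_test_duration_s(q: float, response_time_s: float, threads: int = 1) -> float:
    """Wall-clock time to gather that many samples with closed-loop load:
    each thread issues one request per response time."""
    return min_samples_for_percentile(q) * response_time_s / threads

# One thread at ~1 s per response: 1000 samples, so ~1000 s of waiting.
print(min_test_duration_s(0.999, 1.0))      # 1000.0
# Ten threads cut the wall-clock time, but add concurrency of their own.
print(min_test_duration_s(0.999, 1.0, 10))  # 100.0
```

This is a lower bound; in practice you'd want several samples past the cut-off before trusting the percentile, which only lengthens the wait.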
Really impactful changes are deployed behind a Feature Flag (FF). These also go through CI/CD, of course, but since the FF is typically off (or set to 0% of traffic) until all the services/teams deploy their pieces, there is initially no impact. Then the experiment begins: the FF is set to some (typically small) percentage and the traffic begins to flow. This would be a great time to start assessing not only the business impact but also the performance impact of the change. The issue, however, is that code-generated metrics aren't FF-aware out of the box: metrics from FF-affected traffic and from regular traffic are summarized together. This means that, until the traffic flowing through FF-protected features reaches a relatively high percentage, there is very little chance of discovering a performance issue.
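A quick back-of-the-envelope sketch (the numbers here are illustrative assumptions, not measurements) shows how badly a blended metric dilutes a regression: a feature that makes its own traffic 50% slower barely moves the aggregate at a 5% rollout.

```python
def blended_mean(base_ms: float, ff_ms: float, ff_share: float) -> float:
    """Mean latency when FF-affected and regular traffic share one metric."""
    return (1.0 - ff_share) * base_ms + ff_share * ff_ms

base = 1000.0       # regular traffic: ~1 s page load
regressed = 1500.0  # FF-affected traffic is 50% slower

# At a 5% rollout the aggregate moves only 2.5%, easily lost in noise.
print(blended_mean(base, regressed, 0.05))  # 1025.0
# At a 50% rollout the regression is finally obvious in the blend.
print(blended_mean(base, regressed, 0.50))  # 1250.0
```

By the time the blended metric moves visibly, the flag is already serving a large share of users.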
Unless the issue is really glaring and there is a huge performance impact, of course.
For example, I’ve seen many page load metrics degrade in the space of a single month. The degradation is particularly visible when the page aggregates the output of several components and back-end services. The nominal page owner isn’t aware of any changes or degradations because the metrics aren’t FF-sensitive and don’t show the change until it’s too late to roll it back.
This, in turn, makes preventing performance degradation genuinely difficult: if the feature is successful business-wise, there is very little chance it will be rolled back over slightly decreased performance. At best, we’ll be playing a catch-up game, with resources allocated to performance improvement after the fact. At worst, the application will live with degraded performance until the day this code has to be touched again for some other reason.
That’s why it’s really helpful to make your metrics instrumentation aware of FF values. Ideally, your instrumentation (such as annotations in Java) will give a developer an easy way to specify the Feature Flags for which separate metrics should be generated. There is a temptation to generate a separate metric for every active FF, but that might not be beneficial: you’d likely get a “metric explosion” with a really large number of metrics, some of which might not even have enough data in them to provide any value. It’s best to stick to the FFs developers are really interested in. As a safety feature, if a FF isn’t defined or is off, metric generation should ignore it. Unfortunately, this means you can’t use any of the existing instrumentation SDKs out of the box, as that code has no link to your FF system. You’ll have to develop your own FF-sensitive wrappers and support them. However, developing this way will help your code evolve: even if the company decides to change how the FF system works or to change the metrics toolkit, the user code across the company won’t need to change; just deploy a new version of the wrapper and you are good to go.
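As a sketch of what such a wrapper could look like: the paragraph above imagines Java annotations, but the same idea fits a Python decorator. Everything here is hypothetical; `is_flag_on` and the in-memory `METRICS` store stand in for your real FF system and metrics toolkit.

```python
import time
from collections import defaultdict
from typing import Callable

# Hypothetical stand-ins for the company's FF system and metrics backend.
ACTIVE_FLAGS: set = {"new_checkout"}   # flags currently defined and on
METRICS: dict = defaultdict(list)      # metric name -> latency samples

def is_flag_on(name: str) -> bool:
    return name in ACTIVE_FLAGS

def timed(metric: str, flags: tuple = ()) -> Callable:
    """Record latency under `metric`, plus a per-flag variant for each
    listed flag that is defined and on. Unknown or off flags are silently
    ignored (the safety feature from the text)."""
    def decorator(fn):
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                elapsed = time.perf_counter() - start
                METRICS[metric].append(elapsed)
                for flag in flags:
                    if is_flag_on(flag):
                        METRICS[f"{metric}.ff.{flag}"].append(elapsed)
        return wrapper
    return decorator

@timed("checkout.render", flags=("new_checkout", "dark_mode"))
def render_checkout():
    pass  # the instrumented page/endpoint

render_checkout()
# "checkout.render" and "checkout.render.ff.new_checkout" each get a
# sample; "dark_mode" is off, so no separate metric is emitted for it.
```

Because user code only touches the `timed` wrapper, swapping the FF system or the metrics toolkit later means shipping a new wrapper version, not editing every call site.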
Performance engineering used to be a regular part of the Software Development Lifecycle, but its position has changed dramatically. In the modern world of service-oriented architecture, CI/CD, feature flags, and blue/green deployments, it is either done barely (in production) or not at all, or requires rather lengthy preparation. This is not the way it has to be! My next post will be about bringing performance engineering back to the pre-production stages while keeping CI/CD.