CraftCMS and PHP Tuning for Heavy Traffic

Nevin Morgan, Senior DevOps Engineer

Article Categories: #Code, #Back-end Engineering, #Performance

Tuning your CraftCMS/PHP application can be tough. This guide will help you ensure that your platform can handle even the heaviest of loads.

With the recent launch of this year’s NFLPA team report card, we had the opportunity to help their team prep and tune their platform for what we expected to be a deluge of visitors. With over 10.8 million requests in the first 24 hours and a peak of 140K+ a minute, all that preparation was well worth it. We wanted to share what we learned from that experience to help you make sure that your platform can perform at its absolute best — and maybe even survive your largest traffic spikes. If you want to learn more about our work with NFLPA, you can read all about it in the case study.

It’s important to note that if you are using a managed hosting platform, you won’t have access to all the controls we talk about here. We typically configure infrastructure directly on one of the major cloud providers (we prefer AWS) so that we have the flexibility to tune the platform holistically.

Testing Your Platform

When you begin tuning a system for a large amount of traffic, it helps to have an idea of what sorts of traffic levels the site normally sees. Our recommendation is to collect some analytics over time through either a tool like Google Analytics or plain old web server logs. This traffic data will give you a baseline for building your tests, which will be critical for validating that your changes actually led to improvements.
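
If all you have is raw web server logs, even a quick command-line pass can surface a requests-per-minute baseline. The one-liner below is a rough sketch that assumes an Nginx access log in the default combined format at a typical path; adjust both for your own server:

# busiest minutes in an Nginx access log (path and log format are assumptions)
awk '{print substr($4, 2, 17)}' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head

The counts it prints give you a realistic starting point for the traffic targets discussed below.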

Our tool of choice for load testing a platform is k6 by Grafana Labs because it gives us a performant and elegant way to stress test a setup. One important note: never run a stress test against your production environment. Always run those tests on a replica, because under a heavy enough load you may well crash the platform. We recommend targeting 2 - 3x your average traffic as your baseline, or, in an ideal world, 2x your highest traffic spike of the past two years. This target gives you a decent margin of safety to ensure that your platform will survive an unexpected spike in traffic and gives you time to react.

Once you have your baseline and a test written out, you will want to run that test a few times and record your average results so that you have a set of metrics to compare your changes to. If you don’t have some metrics for comparison, you can never really know if your changes made things better or worse.

A testing script might look something like this:

import { sleep, group } from 'k6'
import http from 'k6/http'
import { randomIntBetween } from 'https://jslib.k6.io/k6-utils/1.2.0/index.js';

export const options = {
  // Key configuration for an average-load test
  stages: [
    // traffic ramp-up from 1 to 1000 users over 5 minutes
    { duration: '5m', target: 1000 },
    // stay at 1000 users for 30 minutes
    { duration: '30m', target: 1000 },
    // ramp-down to 0 users
    { duration: '5m', target: 0 },
  ],
};

export default function main() {
  let response
  group('visit homepage', function() {
    response = http.get('https://example.com')
  })

  // emulate the average time a user might stay on a page
  sleep(randomIntBetween(10, 52))

  group('visit page 1', function() {
    response = http.get('https://example.com/page-1')
  })

  sleep(randomIntBetween(10, 52))

  group('visit page 2', function() {
    response = http.get('https://example.com/page-2')
  })

  sleep(randomIntBetween(10, 52))

  group('visit page 3', function() {
    response = http.get('https://example.com/page-3')
  })

  sleep(randomIntBetween(10, 52))
  // ... other pages in a typical user flow
}
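
Once the script is saved (the filename here is just a placeholder), you can kick off a run locally with:

k6 run loadtest.js

and move to a distributed setup or k6’s cloud offering when you need more load than a single machine can generate.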

Tuning Your Platform

Now that you have metrics for comparison, you can start looking to tune the platform and evaluate what constraints you are operating within.

Caching

Whenever possible, we recommend utilizing something like Varnish for full-page caching, or Redis/Valkey to provide a caching layer for your application. A well-configured cache will drastically reduce the load on your server and allow you to serve more users with the same resources. You will also want to make sure that assets are served from a CDN like CloudFront or Cloudflare so that your web server isn’t handling all of that additional traffic as well. While not directly related to tuning the servers themselves, caching is a critical piece of the performance puzzle.
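
As a concrete example, a minimal sketch of pointing Craft’s cache component at Redis might look like this in config/app.php. It assumes the yiisoft/yii2-redis extension is installed, and the hostname, port, and duration are placeholders to adjust for your environment:

<?php
// config/app.php -- minimal sketch; assumes the yiisoft/yii2-redis package is installed
return [
    'components' => [
        'redis' => [
            'class' => yii\redis\Connection::class,
            'hostname' => getenv('REDIS_HOSTNAME') ?: 'localhost', // placeholder hostname
            'port' => 6379,
        ],
        'cache' => [
            'class' => yii\redis\Cache::class,
            'defaultDuration' => 86400, // cache entries for a day unless a duration is given
        ],
    ],
];

Varnish and the CDN sit in front of this layer and are configured outside of Craft itself.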

PHP-FPM Tuning

If you aren’t using PHP-FPM, we recommend starting there and reconfiguring the platform to run on it.

When it comes to CraftCMS, WordPress, and Laravel, a sneaky thing that can surprise you is that simply throwing more CPU and RAM at a server won’t improve your ability to handle more traffic if you never tune PHP-FPM. The pool’s process limits still cap how many requests can be served at once, so the extra resources sit idle.
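
For context, the stock pool configuration that ships with most distributions looks roughly like the following (values taken from a default www.conf; check your own file, as your distribution may differ). No matter how large the server is, it will never run more than a handful of PHP workers at once:

; default pool settings in a stock www.conf (approximate)
pm = dynamic
pm.max_children = 5
pm.start_servers = 2
pm.min_spare_servers = 1
pm.max_spare_servers = 3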

To start tuning PHP-FPM, you first need to know how much memory an average process consumes, or the maximum memory limit you have configured for the platform. For most CraftCMS-based platforms, 512 MB is a good number.

You can use the ps (Process Status) command in Linux to get an idea of actual usage (though this is only reliable when the workers settle into fairly stable, repeating memory amounts). Sort by RSS (Resident Set Size) in descending order so the processes with the highest memory usage appear first: ps -elyC php-fpm --sort=-rss. The name “php-fpm” should match your process name and may need to be changed to the actual name, such as “php8.3-fpm”; the RSS column contains each process’s memory usage in kilobytes. To find the PHP process name, you can run ps -ely | grep php.
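
If the per-worker numbers are reasonably stable, you can also let the shell do the averaging for you. This one-liner is a sketch that assumes GNU ps and a process named php-fpm; swap in your actual process name as noted above:

# average resident memory per php-fpm worker, reported in MB ("php-fpm" is an assumption)
ps --no-headers -o rss -C php-fpm | awk '{ sum += $1; n++ } END { if (n) printf "%.0f MB\n", sum / n / 1024 }'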

Once you have your number for memory usage per process, you can do the math and figure out what your settings should be. Make sure you leave enough RAM out of your calculations so that the operating system and other services have enough to work with (generally 1 - 2 GB for Ubuntu).

{Total Available RAM} - {Reserved RAM} - {10% Buffer} = {RAM Available to PHP}
{RAM Available to PHP} / {Average Process Size} = {max_children}

Resulting PHP-FPM settings:
pm.max_children={max_children}
pm.start_servers={25% of max_children}
pm.min_spare_servers={25% of max_children}
pm.max_spare_servers={75% of max_children}
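
To make the math concrete, here is a rough worked example with assumed numbers: an 8 GB server, 2 GB reserved for the OS and other services, a 10% buffer, and 512 MB per PHP process:

; worked example -- all numbers are assumptions for illustration
; 8 GB total - 2 GB reserved - 0.8 GB buffer = ~5.2 GB available to PHP
; 5.2 GB / 512 MB per process = ~10 workers
pm = dynamic
pm.max_children = 10
pm.start_servers = 3
pm.min_spare_servers = 3
pm.max_spare_servers = 8

These values live in your pool configuration file (for example, /etc/php/8.3/fpm/pool.d/www.conf on Ubuntu; the exact path depends on your PHP version and distribution). Reload PHP-FPM after changing them.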

Resource Tuning

Once you have PHP-FPM tuned to make full use of your currently available resources, it is time to start stress-testing the environment. While the tests run, keep an eye on your system monitoring to see what levels of CPU and RAM usage are recorded. If you need some help with monitoring, see our article on the subject in our Maintenance Matters series. Generally, you want no more than 70% resource usage on your servers, services, or databases under that peak load. If any one of them hits 100% CPU or RAM usage, increase the available resources for that component or adjust its scaling policies until it peaks at around 70%. As you do this, update the PHP-FPM settings to make the best use of the newly available resources.

While you tune the platform, keep an eye on the number of failed requests. Depending on your business SLA/KPIs, it might be an acceptable tradeoff to have a small portion of requests fail while your scaling policies add more resources during an aggressive spike in traffic, rather than carry the constant operational cost of that extra headroom. Oftentimes, it is better for the business to let a few users experience a momentary blip than to waste money maintaining that overhead. A good way to strike the balance is to use containerization, which lets you add capacity far faster than a traditional auto-scaling group would otherwise manage.
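
k6 lets you encode that tolerance directly in the test so a run fails loudly when the budget is exceeded. The thresholds below are an illustrative sketch; the 1% error budget and one-second p95 are assumptions to replace with numbers from your own SLA:

export const options = {
  stages: [
    // ... same ramp-up/steady/ramp-down stages as the earlier script
  ],
  thresholds: {
    // mark the run as failed if more than 1% of requests error out
    http_req_failed: ['rate<0.01'],
    // or if the 95th percentile response time climbs above 1 second
    http_req_duration: ['p(95)<1000'],
  },
};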

Ongoing Maintenance

Once you have your platform tuned up and humming along, you will want to test regularly that things are still working well. Regressions can appear at any time from application changes or security updates. It is also important to keep the peak load test itself up to date with your traffic levels as usage patterns change.

In general, it is best to run a smaller-scale test (10 - 20 concurrent users) monthly to catch any performance regressions, and the full peak load test yearly to make sure that the platform can always handle a hefty spike without an outage.
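
With k6, the smaller monthly check can be a separate, pared-down script (the filename below is a placeholder) driven entirely by command-line flags:

# roughly 20 virtual users for 10 minutes; adjust to taste
k6 run --vus 20 --duration 10m smoke-test.js

The yearly peak test can keep using the full staged script from earlier in this article.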

Nevin Morgan

Nevin is a senior devops engineer working remotely from Ohio. He specializes in automating and codifying the infrastructure and platforms our teams and clients depend upon.
