We’re not talking about the usual email authentication today, folks. The dream of the 90’s (web hosting) is alive at Fraudmarc. Enjoy a moment of Portlandia and then we’ll dive into our roots as a server company.

Moving bits through the net has been our jam for decades. Fraudmarc was spun out from a high-uptime, geographically redundant server and hosting company. It gives us an edge to operate 24×7 global infrastructure that processes significant messaging data.

This website is built on WordPress and, until recently, we happily relied on a top-of-the-line, top-dollar managed WP company. Our wp-admin interface has been plagued by slow performance for months despite the best efforts of their friendly tech support team. If I had to guess, they’ve fallen into the classic trap of oversubscribing their infrastructure to squeak out higher margins in the increasingly competitive managed WordPress hosting space. I get it, but no thanks.

Can’t handle slow, can’t get distracted

Hundreds of businesses delegate their email policy management to Fraudmarc so they can focus on doing what only they can do. They create their biggest value and leave the email to us.

In the same way, Fraudmarc is awesome at email authentication (SPF, DKIM, & DMARC) so we stay laser-focused on delivering that to our clients. We cannot be distracted by running our own web hosting servers. We do operate tons of other web infrastructure so the temptation is always there, especially when our old managed WP host started dropping the ball. Enter SpinupWP, a newer managed WP company that lets us provide the underlying servers while they do all of the WP heavy lifting. I’m happy to promote other startups that make the net better so there’s a $50 credit for you to try them at the bottom of this post.

We’re loving ARM processors these days and use them to power a lot of Fraudmarc’s API, DB, and DNS infrastructure, but you’ve got to be careful because they’re sometimes much slower than trusty old x86 CPUs.

WordPress is the only place Fraudmarc uses php so I set off to find existing arm64 php benchmarks. PHP on Arm64 from Amazon AWS was a good start but missed several important points. Their post was written before the release of php 8 and accidentally(?) skipped single-threaded benchmarks that show arm64 is 50% slower than x86. You’re reading this post via php on ARM and I’ll show you the reason.

Benchmark environment

We’ll benchmark with two official php scripts: bench.php and micro_bench.php. Our testing will be performed on Ubuntu 20.04 on the x86_64 c5.large and the arm64 c6g.large instances. Each instance has two vCPUs and 4 GB of memory. Php 7 is installed from the default Ubuntu repo and php 8 is from the ondrej/php PPA. Here are the exact versions we used:

PHP 7.4.3 (cli) (built: Oct  6 2020 15:47:56) ( NTS )
Copyright (c) The PHP Group
Zend Engine v3.4.0, Copyright (c) Zend Technologies
    with Zend OPcache v7.4.3, Copyright (c), by Zend Technologies
PHP 8.0.3 (cli) (built: Mar  5 2021 07:54:13) ( NTS )
Copyright (c) The PHP Group
Zend Engine v4.0.3, Copyright (c) Zend Technologies
    with Zend OPcache v8.0.3, Copyright (c), by Zend Technologies

Much like SPF Compression, this WP site makes heavy use of caching at hundreds of network edge locations. This means requests are served very quickly from locations close to visitors (or mail receiving servers in the case of SPF Compression) and continue to be served even if the origin server suffers an outage. The servers we’re testing have 2 vCPUs, which is plenty for our needs since the vast majority of requests are handled by the edge nodes without requiring any php to be executed on the origin server.

We’ll test the servers with 1-4 parallel benchmarks, which represents a requested hardware utilization of 50%-200%. Clearly we cannot use more than 100% of the hardware (2 vCPUs) but the bursty nature of internet traffic does regularly queue up more work than a server can handle. The AWS post linked above only tested at 100% requested hardware usage, a scenario that’s unlikely to ever occur in real life. I’m more interested in observing the underloaded and overloaded cases where the number of tasks doesn’t line up perfectly with the number of vCPUs. In particular, I want to see the multi-tasking penalty where the OS is juggling more tasks than the number of vCPUs.
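For anyone who wants to repeat this kind of concurrency sweep, here’s a minimal sketch in Python. It is not the script we used: the helper names (`run_once`, `time_concurrent`, `requested_utilization`, `sweep`) are made up for illustration, and the command you pass in (for example `["php", "bench.php"]`) is an assumption about how you’d invoke the benchmark. It launches N copies of a command at the same time and averages the wall-clock time of each copy.

```python
from concurrent.futures import ThreadPoolExecutor
import subprocess
import time


def run_once(cmd):
    """Run cmd to completion and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, stdout=subprocess.DEVNULL,
                   stderr=subprocess.DEVNULL, check=True)
    return time.perf_counter() - start


def time_concurrent(cmd, concurrency):
    """Launch `concurrency` copies of cmd together; return mean seconds per copy."""
    # One thread per copy so every process starts at (roughly) the same moment.
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        durations = list(pool.map(run_once, [cmd] * concurrency))
    return sum(durations) / len(durations)


def requested_utilization(concurrency, vcpus=2):
    """Requested hardware utilization in percent (can exceed 100)."""
    return 100 * concurrency / vcpus


def sweep(cmd, levels=(1, 2, 3, 4), vcpus=2):
    """Return {concurrency: (requested utilization %, mean seconds)}."""
    return {n: (requested_utilization(n, vcpus), time_concurrent(cmd, n))
            for n in levels}
```

Calling `sweep(["php", "bench.php"])` on a 2 vCPU instance would cover the 50%, 100%, 150%, and 200% cases described above. Repeat the sweep several times and average the results, since a single run is noisy.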

Initial results

Average execution time (in seconds) over multiple runs. 2x, 3x, and 4x represent the number of concurrently launched benchmarks.

The green column reveals what AWS omitted: single-threaded php is 50% slower on arm64 “Graviton2” instances. Ouch. But that’s not the whole story.

The white column shows what happened when we executed two benchmarks at the same time. When fully utilizing an instance’s 2 vCPUs, arm was 20% faster. Arm continued to pull ahead with more concurrent benchmarks.

Micro_bench.php told a similar story:

Let’s visualize these results with the same column colors.

Our testing revealed that php 8 was faster than php 7 on the same hardware. The php team has delivered progressively better releases for decades. Super impressive! At hardware utilization levels above 50% we observed that arm64 was 17%-35% faster than x86_64. That matches the accidentally(?) naive claim from AWS. We also uncovered what they forgot(?) to mention: arm64 was 50% slower than x86_64 when system utilization was 50% or lower.
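A quick note on how percentages like these can be computed from execution times. The sketch below shows one common convention, which is an assumption on my part since there is more than one way to define “faster”: “slower” is the extra time relative to the reference, and “faster” is the time saved relative to the reference. The function names and the sample timings are hypothetical, not measurements from these benchmarks.

```python
def percent_slower(seconds, reference_seconds):
    """Extra execution time relative to the reference, in percent."""
    return 100 * (seconds - reference_seconds) / reference_seconds


def percent_faster(seconds, reference_seconds):
    """Execution time saved relative to the reference, in percent."""
    return 100 * (reference_seconds - seconds) / reference_seconds


# Hypothetical timings: 3.0 s versus a 2.0 s reference is "50% slower";
# 4.0 s versus a 5.0 s reference is "20% faster".
slower = percent_slower(3.0, 2.0)
faster = percent_faster(4.0, 5.0)
```

Whichever convention you pick, use the same one for every comparison so the numbers stay comparable.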

Analysis

I believe what we’re really observing here is the difference between simultaneous multithreading (SMT) and full CPU cores. Desktop suppliers and server companies tend to conflate threads and cores. The typical cloud vCPU is only a thread, not a core. For the sake of this comparison, think of two threads sharing a single CPU core. That’s approximately how it works in x86 environments. Arm servers are different. C6g is powered by the AWS Graviton2 arm64 processor, which has no SMT, so a vCPU represents an actual CPU core.
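If you want to check whether your own vCPUs are threads or full cores, Linux exposes the sibling threads of each CPU in sysfs. Here’s a small sketch, assuming a Linux guest that populates `/sys/devices/system/cpu/cpuN/topology/thread_siblings_list`; the function names are mine. Based on the reasoning above I’d expect 2 on an SMT-enabled x86 instance and 1 on Graviton2, but verify on your own instances.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Parse a Linux cpulist string such as "0-1" or "0,2" into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def threads_per_core(cpu=0):
    """Count the hardware threads sharing a core with the given CPU (Linux only)."""
    path = Path(f"/sys/devices/system/cpu/cpu{cpu}/topology/thread_siblings_list")
    return len(parse_cpu_list(path.read_text()))
```

A result of 1 means each vCPU has a core to itself; 2 means two vCPUs share one physical core.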

Php on arm64 was 50% slower than php on x86 when viewed core-for-core. I expect this to improve over time as AWS and other developers submit arm-related patches to php. Php on arm was already fast enough to add value. vCPU-for-vCPU, arm64 was faster and considerably cheaper than x86. AWS charges about 20% less per vCPU on arm64 compared to x86. Their arm64 vCPU represents a full core whereas their x86 vCPUs are threads sharing a processor core. Let me repeat that visually:
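The pricing argument can be worked through in a few lines. This sketch uses only the rounded figures from this post (arm64 about 20% cheaper per vCPU, one vCPU per arm core, two vCPUs per x86 core), not actual AWS price-list values, and the function name is hypothetical.

```python
def relative_cost_per_core(arm_price_ratio_per_vcpu,
                           arm_vcpus_per_core=1,
                           x86_vcpus_per_core=2):
    """Cost of one arm physical core relative to one x86 physical core.

    arm_price_ratio_per_vcpu: arm vCPU price divided by x86 vCPU price
    (0.8 means "about 20% less").
    """
    arm_core_cost = arm_price_ratio_per_vcpu * arm_vcpus_per_core
    x86_core_cost = 1.0 * x86_vcpus_per_core
    return arm_core_cost / x86_core_cost


# With the post's rounded numbers, one arm core costs 0.8 / 2 = 0.4 of an x86 core.
ratio = relative_cost_per_core(0.8)
```

So even with php running 50% slower core-for-core, paying roughly 40% of the price per physical core leaves arm ahead on cost under these assumptions.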

x86 (green arrow) was faster than arm (red arrow) core-for-core. Our test instances each had 2 vCPUs, which meant only a single true CPU core on x86 and two cores on arm. At concurrency level 1, the single x86 core runs php fastest.

Look to the right of the red arrow and you’ll notice where arm started to shine. Thanks to arm vCPUs representing full arm cpu cores, the two vCPU arm instance executed two simultaneous tests at full speed.

arm64 is faster vCPU-for-vCPU

Remember what happened when we threw 2 or more tasks at these 2 vCPU instances?

The green arrows show how the true cpu cores on arm outperform the shared (threaded) cores on x86.

Conclusion

In a sentence: this site now runs on arm 🙂

On the notoriously pricey AWS, arm vCPUs were 20% cheaper than x86 vCPUs, and an x86 instance needs twice as many vCPUs to match the physical core count of the equivalent arm-based instance. Something feels very right about separating server selection from WP management, especially after a rough few months of low performance from a high-dollar managed WP platform. The excellent Query Monitor plugin shows exactly how fast our admin pages load.

Our origin server is now executing WordPress (php) on an arm processor of our choosing but managed by true WP experts so we remain focused on email innovations like Universal SPF, a layer 2 solution that overcomes the DNS lookup limits and other nuances of SPF records. By adding the universal SPF string to the beginning of any SPF record, you can freely and instantly improve your email delivery rates.

If you’re in the ~40% of the net that’s powered by WordPress you might want to consider moving to arm because it’s not often one can improve performance while lowering costs. We didn’t find any managed WP hosts doing this but it’s only a matter of time. Whether they’ll pass the cost savings on to us end customers is anyone’s guess. We seem to be SpinupWP’s first customer to use arm64 instead of x86_64 but Ubuntu has excellent arm support so our only problem was that backups initially failed. Hopefully someone from SpinupWP will see this and update their setup script to install the architecture-appropriate version of rclone 😉 Here’s a referral link for $50 in SpinupWP credit in case you want to support another small startup that’s doing big things.
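P.S. For the curious, picking an architecture-appropriate download mostly comes down to mapping the machine type the OS reports to the label used in package names. Here’s a hedged sketch of that idea. I have no visibility into SpinupWP’s actual setup script, so the `package_arch` helper and the label table are assumptions for illustration; the `amd64`/`arm64` labels follow the naming Debian and Go use.

```python
import platform

# Machine names reported by the OS mapped to common package architecture labels.
ARCH_LABELS = {
    "x86_64": "amd64",
    "amd64": "amd64",
    "aarch64": "arm64",
    "arm64": "arm64",
}


def package_arch(machine=None):
    """Return the package architecture label for this (or the given) machine."""
    machine = (machine or platform.machine()).lower()
    if machine not in ARCH_LABELS:
        raise ValueError(f"unsupported architecture: {machine}")
    return ARCH_LABELS[machine]
```

An installer would then use the returned label to choose which build to fetch instead of hard-coding the x86_64 one.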