Optimize NGFW Performance with Intel® Xeon® Processors on Public Cloud
Authors: Xiang Wang, Jayprakash Patidar, Declan Doherty, Eric Jones, Subhiksha Ravisundar, Heqing Zhu
1 Introduction
Next-generation firewalls (NGFWs) are central to network security solutions. Unlike traditional firewalls that rely on port and protocol inspection, NGFWs offer advanced deep packet inspection capabilities, including intrusion detection/prevention systems (IDS/IPS), malware detection, and application identification and control, to defend against modern threats.
NGFWs are compute-intensive, handling tasks like cryptographic operations for traffic encryption/decryption and complex rule matching. Intel processors provide key technologies to optimize NGFW performance, such as Intel® Advanced Encryption Standard New Instructions (Intel® AES-NI) and Intel® QuickAssist Technology (Intel® QAT) for accelerated crypto performance. Intel also invests in software optimizations, notably Hyperscan, a high-performance string and regular expression (regex) matching library that leverages Single Instruction Multiple Data (SIMD) technology to accelerate pattern matching. Integrating Hyperscan into NGFW IPS engines such as Snort can yield performance improvements of up to 3x on Intel processors.
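Hyperscan's actual block- and stream-mode C API is not shown here; as a minimal illustration of the multi-pattern model it vectorizes (compile many signatures once, scan each payload once), the Python sketch below uses the standard re module with hypothetical signatures:

```python
import re

# Hypothetical IPS signatures for illustration only; real rulesets such as
# Snort's Lightspd contain tens of thousands of literals and regexes.
signatures = {
    "exploit-uri": r"/cgi-bin/.*\.(?:sh|pl)",
    "sql-injection": r"(?i:union\s+select)",
}

# Compile every pattern into one alternation of named groups so a single
# pass over the payload reports each matching signature -- the
# "compile many patterns once, scan once" model that Hyperscan accelerates.
combined = re.compile("|".join(
    f"(?P<{name.replace('-', '_')}>{pattern})"
    for name, pattern in signatures.items()))

def scan(payload: str) -> list[str]:
    """Return the names of all signatures matching the payload."""
    return [m.lastgroup.replace("_", "-") for m in combined.finditer(payload)]

print(scan("GET /cgi-bin/test.sh HTTP/1.1"))  # -> ['exploit-uri']
print(scan("id=1 UNION SELECT password"))     # -> ['sql-injection']
```

Unlike this backtracking sketch, Hyperscan compiles the whole pattern set into SIMD-friendly automata, which is where the cited gains come from.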
NGFWs are commonly deployed as security appliances in enterprise data centers. However, there is a growing demand for virtual NGFW appliances or software packages deployable on public clouds, enterprise data centers, or network edges. This software-centric approach reduces the operational overhead of physical appliances, enhances system scalability, and offers flexible procurement options.
The adoption of public cloud NGFW deployments is increasing due to cost advantages. However, selecting the optimal cloud instance type for NGFW, considering the wide variety of CPU, memory, and I/O characteristics, as well as pricing, can be challenging. This paper introduces an NGFW reference implementation from Intel, optimized with Intel technologies including Hyperscan, to provide a reliable benchmark for NGFW performance on Intel platforms. This implementation is part of Intel's NetSec Reference Software package, which also includes the Multi-Cloud Networking Automation Tool (MCNAT) to automate deployment on select public cloud providers and simplify TCO analysis for instance selection.
For more information on the NetSec Reference Software package, please contact the authors.
2 Background and Motivation
Most NGFW vendors are extending their offerings from physical appliances to virtual solutions deployable on public clouds. Public cloud NGFW deployments are gaining traction due to several benefits:
- Scalability: Enables easy scaling of compute resources across geographies to meet performance demands.
- Cost effectiveness: Offers flexible pay-per-use subscriptions, eliminating capital expenditure and reducing operational costs associated with physical appliances.
- Native integration with cloud services: Provides seamless integration with public cloud services like networking, access controls, and AI/ML tools.
- Cloud workload protection: Facilitates local traffic filtering for enterprise workloads hosted on public clouds.
The reduced cost of running NGFW workloads in the public cloud is attractive for enterprises. However, selecting the instance with the best performance and Total Cost of Ownership (TCO) for NGFW is complex due to the vast array of cloud instance options with varying compute characteristics, pricing, and I/O bandwidth. Intel has developed the NGFW Reference Implementation to aid in performance and TCO analysis for different public cloud instances based on Intel processors. This guide demonstrates performance and performance-per-dollar metrics to assist in choosing the right Intel-based instances for NGFW solutions on public cloud services such as AWS and GCP.
3 NGFW Reference Implementation
Intel's NetSec Reference Software package (latest release 25.05) provides optimized reference solutions leveraging Intel CPU instruction set architectures (ISAs) and accelerators to showcase performance on both on-premise and cloud environments. The reference software is available under the Intel Proprietary License (IPL).
Key highlights of the NetSec Reference Software package include:
- A comprehensive suite of reference solutions for networking, security, and AI frameworks for cloud, enterprise data centers, and edge locations.
- Facilitates rapid adoption of Intel technologies and reduces time to market.
- Provides source code for replicating deployment scenarios and testing environments on Intel platforms.
The NGFW reference implementation, a core component of the NetSec Reference Software package, drives NGFW performance characterization and TCO analysis on Intel platforms. It integrates Intel technologies like Hyperscan, forming a robust foundation for NGFW analysis. Given that different Intel hardware platforms offer varying compute and I/O capabilities, the NGFW reference implementation offers a clear view of platform strengths and facilitates performance comparisons across Intel processor generations. It provides insights into compute performance, memory bandwidth, I/O bandwidth, and power consumption, enabling TCO analysis based on performance-per-dollar metrics.
The latest release (25.05) of the NGFW reference implementation includes:
- Basic stateful firewall functionality.
- Intrusion Prevention System (IPS) capabilities.
- Support for advanced Intel processors, including Intel® Xeon® 6 processors and the Intel® Xeon® 6 SoC.
Future releases are planned to include:
- VPN inspection: IPsec decryption for content inspection.
- TLS inspection: A TLS Proxy to terminate client-server connections for plaintext traffic inspection.
3.1 System Architecture
The system architecture leverages open-source software. VPP (Vector Packet Processing) provides a high-performance data plane with stateful firewall functions, including Access Control Lists (ACLs). Multiple VPP threads are spawned with configured core affinity, with each VPP worker thread pinned to a dedicated CPU core or execution thread.
Snort 3 is selected for IPS, supporting multi-threading with worker threads pinned to dedicated CPU cores or execution threads. Snort and VPP are integrated via a Snort plugin, utilizing queue pairs for packet exchange. Packets are stored in shared memory. A new Data Acquisition (DAQ) component for Snort, the VPP Zero Copy (ZC) DAQ, implements Snort's DAQ API functions to receive and transmit packets by reading from and writing to these queues, effectively achieving zero-copy data transfer.
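A minimal sketch of the queue-pair idea (illustrative Python, not the actual VPP ZC DAQ code): VPP writes each packet into shared memory once, and only small descriptors cross the queues, so Snort inspects the payload in place:

```python
from collections import deque
from dataclasses import dataclass

# Stand-in for the shared packet pool mapped by both VPP and Snort.
shared_memory = bytearray(64 * 1024)

@dataclass
class Desc:
    offset: int   # where the packet starts in shared memory
    length: int   # packet length in bytes

rx_queue = deque()  # VPP -> Snort direction of the queue pair
tx_queue = deque()  # Snort -> VPP direction of the queue pair

def vpp_enqueue(packet: bytes, offset: int) -> None:
    """VPP writes the packet into shared memory once and enqueues a descriptor."""
    shared_memory[offset:offset + len(packet)] = packet
    rx_queue.append(Desc(offset, len(packet)))

def snort_process() -> None:
    """Snort inspects the packet in place, then hands the descriptor back."""
    desc = rx_queue.popleft()
    payload = memoryview(shared_memory)[desc.offset:desc.offset + desc.length]
    assert len(payload) == desc.length  # detection would run on `payload` here
    tx_queue.append(desc)

pkt = b"example packet bytes"
vpp_enqueue(pkt, offset=0)
snort_process()
```

The real DAQ uses lock-free rings and polling rather than Python deques, but the zero-copy property is the same: payload bytes are written once and never duplicated between the two processes.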
Since Snort 3 is compute-intensive, careful allocation of processor cores and balancing of VPP and Snort3 threads are crucial for achieving optimal system-level performance. The architecture includes new VPP graph nodes: snort-enq for load-balancing packets to Snort threads, and snort-deq for polling packets from queues, one per Snort worker thread.
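The load-balancing step in snort-enq can be sketched as a hash over the flow 5-tuple, so every packet of a flow reaches the same Snort worker and its stream state; the hash function and worker count below are assumptions for illustration, not VPP's actual implementation:

```python
import zlib

NUM_SNORT_WORKERS = 4  # assumption for illustration

def pick_worker(src_ip: str, dst_ip: str,
                src_port: int, dst_port: int, proto: str) -> int:
    """Map a flow 5-tuple to a Snort worker queue index."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    return zlib.crc32(key) % NUM_SNORT_WORKERS

# Every packet of a given flow lands on the same worker, keeping that
# worker's TCP stream-reassembly state consistent:
w1 = pick_worker("10.0.0.1", "10.0.1.1", 40000, 80, "tcp")
w2 = pick_worker("10.0.0.1", "10.0.1.1", 40000, 80, "tcp")
assert w1 == w2 and 0 <= w1 < NUM_SNORT_WORKERS
```

Note that a production design would use a symmetric key (e.g., sorting the two endpoints) so both directions of a connection hash to the same worker.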
3.2 Intel Optimizations
The NGFW reference implementation utilizes several Intel optimizations:
- Hyperscan: This high-performance multiple regex matching library significantly boosts Snort's performance compared to its default search engine. Figure 3 illustrates Hyperscan's integration with Snort for accelerated literal and regex matching. Snort 3 offers native Hyperscan integration, which can be enabled via configuration files or command-line options.
- Receive Side Scaling (RSS): VPP leverages RSS in Intel® Ethernet Network Adapters to distribute traffic across multiple VPP worker threads.
- Intel® QAT and Intel® AVX-512 instructions: Future releases supporting IPsec and TLS will incorporate Intel's crypto acceleration technologies. Intel QAT accelerates crypto performance, particularly the public key cryptography used in network connection establishment. Intel AVX-512 instructions enhance cryptographic performance, including VPMADD52 (multiply and accumulate, used in modular arithmetic), vector AES (VAES, an extension of Intel AES-NI), VPCLMULQDQ (vectorized carry-less multiply, used in AES-GCM), and Intel® Secure Hash Algorithm - New Instructions (Intel® SHA-NI).
4 Cloud Deployment of NGFW Reference Implementation
4.1 System Configuration
The focus is on cleartext inspection scenarios, aligning with use cases and KPIs defined in RFC9411. The traffic generator simulates 64KB HTTP transactions with one GET request per connection. ACLs are configured to permit traffic within specified subnets. The benchmarking utilizes the Snort Lightspd ruleset and Cisco's security policy. A dedicated server is included to handle requests from traffic generators.
| Metric | Value |
|---|---|
| Use Case | Cleartext Inspection (FW + IPS) |
| Traffic Profile | HTTP 64KB GET (1 GET per connection) |
| VPP ACLs | Yes (2 stateful ACLs) |
| Snort Rules | Lightspd (~49k rules) |
| Snort Policy | Security (~21k rules enabled) |
The system topology for public cloud deployment includes three primary instance nodes: a client, a server, and a proxy. A bastion node is also present for user connections. Both the client (running WRK) and server (running Nginx) feature a single dedicated data-plane network interface. The proxy (running NGFW) has two data-plane interfaces for testing. These interfaces connect to dedicated subnets (subnet A for client-proxy and subnet B for proxy-server) to isolate them from management traffic. Dedicated IP address ranges, routing, and ACL rules are configured to manage traffic flow.
4.2 System Deployment
Intel's Multi-Cloud Networking Automation Tool (MCNAT) is a software tool designed to automate networking workload deployments on public clouds and assist in selecting the best cloud instance based on performance and cost. MCNAT uses profiles to define instance-specific variables and settings. These profiles can be passed to the MCNAT CLI tool for deploying specific instance types on a chosen Cloud Service Provider (CSP).
An example command line usage is provided:
# ./mcnat.py --deploy -u user -c aws.oregon -s ngfw-intel -p c7i-xlarge
| Option | Description |
|---|---|
| --deploy | Instructs the tool to create a new deployment |
| -u | Defines which user credentials to use |
| -c | CSP to create the deployment on (AWS, GCP, etc.) |
| -s | Scenario to deploy |
| -p | Profile to use |
The MCNAT command line tool enables single-step instance building and deployment. Post-deployment, necessary SSH configurations are created to allow instance access.
4.3 System Benchmarking
After MCNAT deploys the instances, performance tests can be executed using the MCNAT application toolkit. Test cases are configured in tools/mcn/applications/configurations/ngfw-intel/ngfw-intel.json, specifying parameters such as the throughput test, packet size, and Hyperscan enablement.
The following command can be used to launch a test:
# python -m mcn_application_toolkit run --deployment-path DEPLOYMENT_PATH -f configurations/ngfw-intel/ngfw-intel.json ngfw-intel
This command runs the NGFW with a defined set of rules against HTTP traffic generated by WRK on the client, pinning specific CPU cores to gather comprehensive performance data for the instance under test. Upon completion, test data is formatted as a CSV file and returned to the user.
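Post-processing of the returned CSV can be sketched as follows; the column names and values below are hypothetical placeholders, not the toolkit's actual schema or measured results:

```python
import csv
import io

# Hypothetical result rows: the real header and values depend on the
# toolkit release -- these are placeholders, not measurements.
sample = """instance,normalized_throughput
c6i-xlarge,1.0
c7i-xlarge,1.5
"""

rows = list(csv.DictReader(io.StringIO(sample)))
best = max(rows, key=lambda row: float(row["normalized_throughput"]))
print(best["instance"])  # -> c7i-xlarge
```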
5 Performance and Cost Evaluation
This section compares NGFW deployments on various cloud instances powered by Intel Xeon processors across AWS and GCP, providing guidance for selecting the most suitable cloud instance type for NGFW based on performance and cost. Instances with 4 vCPUs are chosen, as recommended by most NGFW vendors. The results for AWS and GCP include:
- NGFW performance on small instance types featuring 4 vCPUs with Intel® Hyper-Threading Technology (Intel® HT Technology) and Hyperscan enabled.
- Generation-to-generation performance gains observed from 1st Gen Intel Xeon Scalable processors to 5th Gen Intel Xeon Scalable processors.
- Generation-to-generation performance-per-dollar gains from 1st Gen Intel Xeon Scalable processors to 5th Gen Intel Xeon Scalable processors.
5.1 AWS Deployment
5.1.1 Instance Type List
Table 5. AWS instance types evaluated.

| Instance Type | CPU Model | vCPUs | Memory (GB) | Network performance (Gbps) | On-demand hourly rate ($) |
|---|---|---|---|---|---|
| c5.xlarge | 2nd Gen Intel® Xeon® Scalable processors | 4 | 8 | 10 | 0.17 |
| c5n.xlarge | 1st Gen Intel® Xeon® Scalable processors | 4 | 10.5 | 25 | 0.216 |
| c6i.xlarge | 3rd Gen Intel® Xeon® Scalable processors | 4 | 8 | 12.5 | 0.17 |
| c6in.xlarge | 3rd Gen Intel® Xeon® Scalable processors | 4 | 8 | 30 | 0.2268 |
| c7i.xlarge | 4th Gen Intel® Xeon® Scalable processors | 4 | 8 | 12.5 | 0.1785 |
Table 5 provides an overview of the AWS instances used; for more platform details, refer to Platform Configuration (Appendix A). It also lists the on-demand hourly rate (https://aws.amazon.com/ec2/pricing/on-demand/). The rates shown were captured at the time of publication for the US West Coast region and may vary by region, availability, corporate accounts, and other factors.
5.1.2 Results
Figure 6 compares performance and performance per hour rate across the listed instance types. Key observations include:
- Performance improves with instances based on newer generations of Intel Xeon processors. For example, upgrading from c5.xlarge (2nd Gen Intel Xeon Scalable processor) to c7i.xlarge (4th Gen Intel Xeon Scalable processor) shows a 1.97x performance improvement.
- Performance per dollar also improves with newer generations of Intel Xeon processors. An upgrade from c5n.xlarge (1st Gen Intel Xeon Scalable processor) to c7i.xlarge (4th Gen Intel Xeon Scalable processor) demonstrates a 1.88x improvement in performance per hour rate.
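The performance-per-hour-rate metric behind the 1.88x figure divides throughput by the on-demand hourly rate. A minimal sketch of the arithmetic, using the real Table 5 rates; the 1.55x relative throughput is back-derived from the reported gain, not a measured result:

```python
# Performance per dollar = throughput / hourly rate. Hourly rates are the
# real Table 5 values; the relative throughput of c7i vs. c5n (1.55x) is a
# placeholder back-derived from the reported 1.88x gain, not a measurement.
hourly_rate = {"c5n.xlarge": 0.216, "c7i.xlarge": 0.1785}  # $ per hour (Table 5)
rel_perf    = {"c5n.xlarge": 1.00,  "c7i.xlarge": 1.55}    # placeholder ratio

perf_per_dollar = {k: rel_perf[k] / hourly_rate[k] for k in hourly_rate}
gain = perf_per_dollar["c7i.xlarge"] / perf_per_dollar["c5n.xlarge"]
print(f"{gain:.2f}x")  # -> 1.88x
```

The same arithmetic explains why a modest price premium on a newer generation can still improve performance per dollar when the throughput gain is larger.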
5.2 GCP Deployment
5.2.1 Instance Type List
Table 6. GCP instance types evaluated.

| Instance Type | CPU Model | vCPUs | Memory (GB) | Default egress bandwidth (Gbps) | On-demand hourly rate ($) |
|---|---|---|---|---|---|
| n1-std-4 | 1st Gen Intel® Xeon® Scalable processors | 4 | 15 | 10 | 0.189999 |
| n2-std-4 | 3rd Gen Intel® Xeon® Scalable processors | 4 | 16 | 10 | 0.194236 |
| c3-std-4 | 4th Gen Intel® Xeon® Scalable processors | 4 | 16 | 23 | 0.201608 |
| n4-std-4 | 5th Gen Intel® Xeon® Scalable processors | 4 | 16 | 10 | 0.189544 |
| c4-std-4 | 5th Gen Intel® Xeon® Scalable processors | 4 | 15 | 23 | 0.23761913 |
Table 6 provides an overview of the GCP instances used; for more platform details, refer to Platform Configuration (Appendix A). It also lists the on-demand hourly rate (https://cloud.google.com/compute/vm-instance-pricing?hl=en). The rates shown were captured at the time of publication for the US West Coast region and may vary by region, availability, corporate accounts, and other factors.
5.2.2 Results
Figure 7 compares performance and performance per hour rate across the listed instance types. Key observations include:
- Performance improves with instances based on newer generations of Intel Xeon processors. Upgrading from n1-std-4 (1st Gen Intel Xeon Scalable processor) to c4-std-4 (5th Gen Intel Xeon Scalable processor) shows a 2.68x performance improvement.
- Performance per dollar also improves with newer generations of Intel Xeon processors. An upgrade from n1-std-4 (1st Gen Intel Xeon Scalable processor) to c4-std-4 (5th Gen Intel Xeon Scalable processor) demonstrates a 2.15x improvement in performance per hour rate.
6 Summary
With the increasing adoption of multi- and hybrid-cloud deployment models, delivering NGFW solutions on public clouds offers consistent protection across environments, scalability for security requirements, and simplicity with minimal maintenance. Network security vendors provide NGFW solutions utilizing various cloud instance types. Minimizing Total Cost of Ownership (TCO) and maximizing Return on Investment (ROI) are critical when selecting the right cloud instance, considering compute resources, network bandwidth, and price.
This evaluation used the NGFW reference implementation as a representative workload and leveraged MCNAT to automate deployment and testing across different public cloud instance types. Benchmarking results indicate that instances with the latest generation Intel Xeon Scalable processors on AWS (powered by 4th Gen Intel Xeon Scalable processors) and GCP (powered by 5th Gen Intel Xeon Scalable processors) deliver significant performance and TCO improvements. These instances show performance increases of up to 2.68x and performance-per-hour rate improvements of up to 2.15x over previous generations. This evaluation provides valuable references for selecting Intel-based public cloud instances for NGFW deployments.
Appendix A Platform Configuration
Detailed platform configurations for the tested instances:
c5-xlarge - "Test by Intel as of 03/17/25. 1-node, 1x Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz, 2 cores, HT On, Turbo On, Total Memory 8GB (1x8GB DDR4 2933 MT/s [Unknown]), BIOS 1.0, microcode 0x5003801, 1x Elastic Network Adapter (ENA), 1x 32G Amazon Elastic Block Store, Ubuntu 22.04.5 LTS, 6.8.0-1024-aws, gcc 11.4, NGFW 24.12, Hyperscan 5.6.1"
c5n-xlarge - "Test by Intel as of 03/17/25. 1-node, 1x Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz, 2 cores, HT On, Turbo On, Total Memory 10.5GB (1x10.5GB DDR4 2933 MT/s [Unknown]), BIOS 1.0, microcode 0x2007006, 1x Elastic Network Adapter (ENA), 1x 32G Amazon Elastic Block Store, Ubuntu 22.04.5 LTS, 6.8.0-1024-aws, gcc 11.4, NGFW 24.12, Hyperscan 5.6.1"
c6i-xlarge - "Test by Intel as of 03/17/25. 1-node, 1x Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz, 2 cores, HT On, Turbo On, Total Memory 8GB (1x8GB DDR4 3200 MT/s [Unknown]), BIOS 1.0, microcode 0xd0003f6, 1x Elastic Network Adapter (ENA), 1x 32G Amazon Elastic Block Store, Ubuntu 22.04.5 LTS, 6.8.0-1024-aws, gcc 11.4, NGFW 24.12, Hyperscan 5.6.1"
c6in-xlarge - "Test by Intel as of 03/17/25. 1-node, 1x Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz, 2 cores, HT On, Turbo On, Total Memory 8GB (1x8GB DDR4 3200 MT/s [Unknown]), BIOS 1.0, microcode 0xd0003f6, 1x Elastic Network Adapter (ENA), 1x 32G Amazon Elastic Block Store, Ubuntu 22.04.5 LTS, 6.8.0-1024-aws, gcc 11.4, NGFW 24.12, Hyperscan 5.6.1"
c7i-xlarge - "Test by Intel as of 03/17/25. 1-node, 1x Intel(R) Xeon(R) Platinum 8488C CPU @ 2.40GHz, 2 cores, HT On, Turbo On, Total Memory 8GB (1x8GB DDR4 4800 MT/s [Unknown]), BIOS 1.0, microcode 0x2b000620, 1x Elastic Network Adapter (ENA), 1x 32G Amazon Elastic Block Store, Ubuntu 22.04.5 LTS, 6.8.0-1024-aws, gcc 11.4, NGFW 24.12, Hyperscan 5.6.1"
n1-std-4 - "Test by Intel as of 03/17/25. 1-node, 1x Intel(R) Xeon(R) CPU @ 2.00GHz, 2 cores, HT On, Turbo On, Total Memory 15GB (1x15GB RAM []), BIOS Google, microcode 0xffffffff, 1x device, 1x 32G PersistentDisk, Ubuntu 22.04.5 LTS, 6.8.0-1025-gcp, gcc 11.4, NGFW 24.12, Hyperscan 5.6.1"
n2-std-4 - "Test by Intel as of 03/17/25. 1-node, 1x Intel(R) Xeon(R) CPU @ 2.60GHz, 2 cores, HT On, Turbo On, Total Memory 16GB (1x16GB RAM []), BIOS Google, microcode 0xffffffff, 1x device, 1x 32G PersistentDisk, Ubuntu 22.04.5 LTS, 6.8.0-1025-gcp, gcc 11.4, NGFW 24.12, Hyperscan 5.6.1"
c3-std-4 - "Test by Intel as of 03/14/25. 1-node, 1x Intel(R) Xeon(R) Platinum 8481C CPU @ 2.70GHz, 2 cores, HT On, Turbo On, Total Memory 16GB (1x16GB RAM []), BIOS Google, microcode 0xffffffff, 1x Compute Engine Virtual Ethernet [gVNIC], 1x 32G nvme_card-pd, Ubuntu 22.04.5 LTS, 6.8.0-1025-gcp, gcc 11.4, NGFW 24.12, Hyperscan 5.6.1"
n4-std-4 - "Test by Intel as of 03/18/25. 1-node, 1x Intel(R) Xeon(R) Platinum 8581C CPU @ 2.10GHz, 2 cores, HT On, Turbo On, Total Memory 16GB (1x16GB RAM []), BIOS Google, microcode 0xffffffff, 1x Compute Engine Virtual Ethernet [gVNIC], 1x 32G nvme_card-pd, Ubuntu 22.04.5 LTS, 6.8.0-1025-gcp, gcc 11.4, NGFW 24.12, Hyperscan 5.6.1"
c4-std-4 - "Test by Intel as of 03/18/25. 1-node, 1x Intel(R) Xeon(R) Platinum 8581C CPU @ 2.30GHz, 2 cores, HT On, Turbo On, Total Memory 15GB (1x15GB RAM []), BIOS Google, microcode 0xffffffff, 1x Compute Engine Virtual Ethernet [gVNIC], 1x 32G nvme_card-pd, Ubuntu 22.04.5 LTS, 6.8.0-1025-gcp, gcc 11.4, NGFW 24.12, Hyperscan 5.6.1"
Appendix B Intel NGFW Reference Software Configuration
| Software Configuration | Software Version |
|---|---|
| Host OS | Ubuntu 22.04 LTS |
| Kernel | 6.8.0-1025 |
| Compiler | GCC 11.4.0 |
| WRK | 74eb9437 |
| WRK2 | 44a94c17 |
| VPP | 24.02 |
| Snort | 3.1.36.0 |
| DAQ | 3.0.9 |
| LuaJIT | 2.1.0-beta3 |
| Libpcap | 1.10.1 |
| PCRE | 8.45 |
| ZLIB | 1.2.11 |
| Hyperscan | 5.6.1 |
| LZMA | 5.2.5 |
| NGINX | 1.22.1 |
| DPDK | 23.11 |