Optimize NGFW Performance with Intel® Xeon® Processors on Public Cloud

Authors: Xiang Wang, Jayprakash Patidar, Declan Doherty, Eric Jones, Subhiksha Ravisundar, Heqing Zhu

1 Introduction

Next-generation firewalls (NGFWs) are central to network security solutions. Unlike traditional firewalls that rely on port and protocol inspection, NGFWs offer advanced deep packet inspection capabilities, including intrusion detection/prevention systems (IDS/IPS), malware detection, and application identification and control, to defend against modern threats.

NGFWs are compute-intensive, handling tasks such as cryptographic operations for traffic encryption/decryption and complex rule matching. Intel processors provide key technologies to optimize NGFW performance, such as Intel® Advanced Encryption Standard New Instructions (Intel® AES-NI) and Intel® QuickAssist Technology (Intel® QAT) for accelerated cryptographic performance. Intel also invests in software optimizations, notably Hyperscan, a high-performance string and regular expression (regex) matching library that uses Single Instruction Multiple Data (SIMD) instructions to speed up pattern matching. Integrating Hyperscan into an NGFW IPS such as Snort can yield performance improvements of up to 3x on Intel processors.

NGFWs are commonly deployed as security appliances in enterprise data centers. However, there is a growing demand for virtual NGFW appliances or software packages deployable on public clouds, enterprise data centers, or network edges. This software-centric approach reduces the operational overhead of physical appliances, enhances system scalability, and offers flexible procurement options.

The adoption of public cloud NGFW deployments is increasing due to cost advantages. However, selecting the optimal cloud instance type for NGFW, considering the wide variety of CPU, memory, and I/O characteristics, as well as pricing, can be challenging. This paper introduces an NGFW reference implementation from Intel, optimized with Intel technologies including Hyperscan, to provide a reliable benchmark for NGFW performance on Intel platforms. This implementation is part of Intel's NetSec Reference Software package, which also includes the Multi-Cloud Networking Automation Tool (MCNAT) to automate deployment on select public cloud providers and simplify TCO analysis for instance selection.

For more information on the NetSec Reference Software package, please contact the authors.

2 Background and Motivation

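Most NGFW vendors are extending their offerings from physical appliances to virtual solutions deployable on public clouds. Public cloud NGFW deployments are gaining traction because they offer consistent protection across environments, scalability to match security requirements, and simpler operation with minimal maintenance.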

The reduced cost of running NGFW workloads in the public cloud is attractive for enterprises. However, selecting the instance with the best performance and Total Cost of Ownership (TCO) for NGFW is complex because cloud instance options vary widely in compute characteristics, pricing, and I/O bandwidth. Intel has developed the NGFW Reference Implementation to aid in performance and TCO analysis for different public cloud instances based on Intel processors. This paper presents performance and performance-per-dollar metrics to assist in choosing the right Intel-based instances for NGFW solutions on public cloud services such as AWS and GCP.

3 NGFW Reference Implementation

Intel's NetSec Reference Software package (latest release 25.05) provides optimized reference solutions that use Intel CPU instruction set architectures (ISAs) and accelerators to showcase performance in both on-premises and cloud environments. The reference software is available under the Intel Proprietary License (IPL).

Key highlights of the NetSec Reference Software package include:

The NGFW reference implementation, a core component of the NetSec Reference Software package, drives NGFW performance characteristics and TCO analysis on Intel platforms. It integrates Intel technologies like Hyperscan, forming a robust foundation for NGFW analysis. Given that different Intel hardware platforms offer varying compute and I/O capabilities, the NGFW reference implementation offers a clear view of platform strengths and facilitates performance comparisons across Intel processor generations. It provides insights into compute performance, memory bandwidth, I/O bandwidth, and power consumption, enabling TCO analysis based on performance-per-dollar metrics.

The latest release (25.05) of the NGFW reference implementation includes:

Future releases are planned to include:

3.1 System Architecture

The system architecture leverages open-source software. VPP (Vector Packet Processing) provides a high-performance data plane with stateful firewall functions, including Access Control Lists (ACLs). Multiple VPP threads are spawned with configured core affinity, with each VPP worker thread pinned to a dedicated CPU core or execution thread.
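The worker pinning described above is normally expressed in the `cpu` section of VPP's startup configuration. The following Python sketch renders such a section. The core numbers are hypothetical and the helper name `vpp_cpu_stanza` is an illustration only; the source does not show the reference implementation's actual configuration.

```python
def vpp_cpu_stanza(main_core: int, worker_cores: list[int]) -> str:
    """Render a VPP startup.conf `cpu` section with one worker per listed core."""
    if main_core in worker_cores:
        raise ValueError("main core must not double as a worker core")
    if len(set(worker_cores)) != len(worker_cores):
        raise ValueError("each worker thread needs its own core")
    corelist = ",".join(str(core) for core in worker_cores)
    return (
        "cpu {\n"
        f"  main-core {main_core}\n"
        f"  corelist-workers {corelist}\n"
        "}\n"
    )


# Hypothetical layout: main thread on CPU 0, a single VPP worker on CPU 1.
print(vpp_cpu_stanza(0, [1]))
```

On a 4-vCPU instance, a layout like this would leave the remaining logical CPUs free for other threads.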

Snort 3 is selected for IPS, supporting multi-threading with worker threads pinned to dedicated CPU cores or execution threads. Snort and VPP are integrated via a Snort plugin, utilizing queue pairs for packet exchange. Packets are stored in shared memory. A new Data Acquisition (DAQ) component for Snort, the VPP Zero Copy (ZC) DAQ, implements Snort's DAQ API functions to receive and transmit packets by reading from and writing to these queues, effectively achieving zero-copy data transfer.
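The zero-copy idea can be illustrated with a toy, single-process model: packet bytes are written once into a shared buffer region, and only small descriptors (buffer index and length) travel through the enqueue and dequeue rings. This is a conceptual sketch, not the VPP ZC DAQ itself; the buffer sizes, the class and function names, and the toy "BAD" rule are all assumptions.

```python
from collections import deque

BUF_SIZE = 2048   # bytes per packet buffer (illustrative value)
NUM_BUFS = 8

# One shared region holds every packet buffer; both sides index into it.
shared = bytearray(BUF_SIZE * NUM_BUFS)


def buffer_view(index: int, length: int) -> memoryview:
    """Return a view onto buffer `index` without copying its bytes."""
    start = index * BUF_SIZE
    return memoryview(shared)[start:start + length]


class QueuePair:
    """Enqueue ring (data plane to inspector) and dequeue ring (back again)."""

    def __init__(self) -> None:
        self.enq = deque()   # descriptors waiting for inspection
        self.deq = deque()   # descriptors returned with a verdict


def dataplane_send(qp: QueuePair, index: int, payload: bytes) -> None:
    buffer_view(index, len(payload))[:] = payload   # packet is stored once
    qp.enq.append((index, len(payload)))            # only the descriptor moves


def inspector_poll(qp: QueuePair) -> None:
    while qp.enq:
        index, length = qp.enq.popleft()
        pkt = buffer_view(index, length)            # same bytes, no copy
        verdict = "drop" if pkt[:3] == b"BAD" else "pass"   # toy rule
        qp.deq.append((index, length, verdict))


qp = QueuePair()
dataplane_send(qp, 0, b"GET /")
dataplane_send(qp, 1, b"BAD!")
inspector_poll(qp)
print(list(qp.deq))
```

In the real system the two sides are separate processes sharing memory, so the rings themselves also live in shared memory; the sketch keeps everything in one process to stay short.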

Since Snort 3 is compute-intensive, careful allocation of processor cores and balancing of VPP and Snort 3 threads are crucial for achieving optimal system-level performance. The architecture adds two VPP graph nodes: snort-enq, which load-balances packets across Snort threads, and snort-deq, which polls packets from the queues (one per Snort worker thread).
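The text does not say how snort-enq chooses a Snort thread. One common approach, assumed here purely for illustration, is a symmetric flow hash, so that both directions of a connection reach the same worker. The function name and the use of CRC32 are hypothetical choices, not the reference implementation's method.

```python
import zlib


def pick_snort_queue(src_ip: str, dst_ip: str, src_port: int, dst_port: int,
                     proto: int, num_workers: int) -> int:
    """Map a flow to a worker queue index in the range [0, num_workers)."""
    # Sort the two endpoints so that client-to-server and server-to-client
    # packets of the same connection produce an identical hash key.
    first, second = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    key = f"{first[0]}:{first[1]}|{second[0]}:{second[1]}|{proto}".encode()
    return zlib.crc32(key) % num_workers


# Both directions of one TCP connection (protocol 6) map to the same queue.
forward = pick_snort_queue("10.0.1.10", "10.0.2.20", 40000, 80, 6, 2)
reverse = pick_snort_queue("10.0.2.20", "10.0.1.10", 80, 40000, 6, 2)
print(forward == reverse)
```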

Figure 1: NGFW Reference Architecture. Depicts a system architecture with VPP, Snort, and network interfaces for Home and External networks.

3.2 Intel Optimizations

The NGFW reference implementation utilizes several Intel optimizations:

Figure 2: NGFW Reference Architecture with VPP Graph Nodes. Illustrates the VPP graph nodes including snort-enq and snort-deq.
Figure 3: Snort with Hyperscan Integration. Shows how Hyperscan integrates with Snort for enhanced regex matching.

4 Cloud Deployment of NGFW Reference Implementation

4.1 System Configuration

The focus is on cleartext inspection scenarios, aligning with the use cases and KPIs defined in RFC 9411. The traffic generator simulates 64KB HTTP transactions with one GET request per connection. ACLs are configured to permit traffic within specified subnets. The benchmark uses the Snort Lightspd ruleset and Cisco's security policy. A dedicated server is included to handle requests from the traffic generator.

Table 3. Test configurations

| Metric | Value |
|---|---|
| Use Case | Cleartext Inspection (FW + IPS) |
| Traffic Profile | HTTP 64KB GET (1 GET per Connection) |
| VPP ACLs | Yes (2 stateful ACLs) |
| Snort Rules | Lightspd (~49k rules) |
| Snort Policy | Security (~21k rules enabled) |

Figure 4: AWS System Topology. A diagram showing the network setup for NGFW deployment on AWS.
Figure 5: GCP System Topology. A diagram showing the network setup for NGFW deployment on GCP.
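Because the traffic profile in Table 3 uses one 64KB GET per connection, a throughput figure translates roughly into a connection rate. The sketch below shows that arithmetic. It assumes 64KB means 65,536 payload bytes and ignores HTTP header and TCP/IP overhead, and the 1 Gbps input is a placeholder, not a measured result.

```python
def transactions_per_second(throughput_gbps: float,
                            object_bytes: int = 64 * 1024) -> float:
    """Approximate HTTP transactions (and connections) per second."""
    bits_per_transaction = object_bytes * 8
    return throughput_gbps * 1e9 / bits_per_transaction


# Placeholder throughput of 1 Gbps, chosen only to show the calculation.
print(round(transactions_per_second(1.0), 1))
```

With one GET per connection, the same number approximates the rate at which new connections are set up, which stresses connection handling in the firewall and IPS as well as raw payload inspection.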

The system topology for public cloud deployment includes three primary instance nodes: a client, a server, and a proxy. A bastion node is also present for user connections. Both the client (running WRK) and server (running Nginx) feature a single dedicated data-plane network interface. The proxy (running NGFW) has two data-plane interfaces for testing. These interfaces connect to dedicated subnets (subnet A for client-proxy and subnet B for proxy-server) to isolate them from management traffic. Dedicated IP address ranges, routing, and ACL rules are configured to manage traffic flow.
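The topology above can be captured as data and sanity-checked before deployment. In this sketch the CIDR ranges are hypothetical (the source does not list the real address ranges), as are the dictionary and function names; only the node roles and subnet attachments come from the text.

```python
import ipaddress

# Hypothetical CIDR blocks chosen for illustration only.
SUBNETS = {
    "management": ipaddress.ip_network("10.0.0.0/24"),
    "subnet_a": ipaddress.ip_network("10.0.1.0/24"),   # client <-> proxy
    "subnet_b": ipaddress.ip_network("10.0.2.0/24"),   # proxy <-> server
}

# Data-plane attachments per node, as described in the text.
NODES = {
    "client": ["subnet_a"],               # runs WRK
    "proxy": ["subnet_a", "subnet_b"],    # runs the NGFW
    "server": ["subnet_b"],               # runs Nginx
}


def overlapping_subnets(subnets):
    """Return every pair of subnet names whose address ranges overlap."""
    names = sorted(subnets)
    return [(a, b)
            for i, a in enumerate(names)
            for b in names[i + 1:]
            if subnets[a].overlaps(subnets[b])]


def forwarding_nodes(nodes):
    """Return nodes attached to more than one data-plane subnet."""
    return [name for name, attached in nodes.items() if len(attached) > 1]


print(overlapping_subnets(SUBNETS))   # an empty list means traffic is isolated
print(forwarding_nodes(NODES))
```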

4.2 System Deployment

Intel's Multi-Cloud Networking Automation Tool (MCNAT) is a software tool designed to automate networking workload deployments on public clouds and assist in selecting the best cloud instance based on performance and cost. MCNAT uses profiles to define instance-specific variables and settings. These profiles can be passed to the MCNAT CLI tool for deploying specific instance types on a chosen Cloud Service Provider (CSP).

An example command line usage is provided:

# ./mcnat.py --deploy -u user -c aws.oregon -s ngfw-intel -p c7i-xlarge

Table 4. MCNAT Command Line Usage

| Option | Description |
|---|---|
| --deploy | Instructs the tool to create a new deployment |
| -u | Defines which user credentials to use |
| -c | CSP to create deployment on (AWS, GCP, etc.) |
| -s | Scenario to deploy |
| -p | Profile to use |

The MCNAT command line tool enables single-step instance building and deployment. Post-deployment, necessary SSH configurations are created to allow instance access.
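When several instance types are compared, the deployment command can be generated per profile. The sketch below only assembles the argument list from the options in Table 4 and prints it. The helper name is hypothetical, and `c6i-xlarge` as a profile name is an assumption inferred from the instance naming; only `c7i-xlarge` appears in the example command.

```python
import shlex


def mcnat_deploy_command(user: str, csp: str, scenario: str,
                         profile: str) -> list[str]:
    """Build the argument list for one MCNAT deployment (options per Table 4)."""
    return ["./mcnat.py", "--deploy",
            "-u", user, "-c", csp, "-s", scenario, "-p", profile]


# "c6i-xlarge" is an assumed profile name; "c7i-xlarge" is from the example.
PROFILES = ["c6i-xlarge", "c7i-xlarge"]

for profile in PROFILES:
    cmd = mcnat_deploy_command("user", "aws.oregon", "ngfw-intel", profile)
    # To execute for real, pass `cmd` to subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Building the command as a list rather than a string avoids shell quoting problems if it is later handed to `subprocess.run`.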

4.3 System Benchmarking

After MCNAT deploys the instances, performance tests can be executed using the MCNAT application toolkit. Test cases are configured in tools/mcn/applications/configurations/ngfw-intel/ngfw-intel.json, specifying parameters such as throughput test, packet size, and Hyperscan enablement.

The following command can be used to launch a test:

# python -m mcn_application_toolkit run --deployment-path DEPLOYMENT_PATH -f configurations/ngfw-intel/ngfw-intel.json ngfw-intel

This command runs the NGFW with a defined set of rules against HTTP traffic generated by WRK on the client, with workloads pinned to specific CPU cores, to gather comprehensive performance data for the instance under test. Upon completion, the test data is formatted as a CSV file and returned to the user.

5 Performance and Cost Evaluation

This section compares NGFW deployments on various cloud instances powered by Intel Xeon processors across AWS and GCP, providing guidance for selecting the most suitable cloud instance type for NGFW based on performance and cost. Instances with 4 vCPUs are chosen, as recommended by most NGFW vendors. The results for AWS and GCP include:

5.1 AWS Deployment

5.1.1 Instance Type List

Table 5. AWS Instances and On-demand Hourly Rates

| Instance Type | CPU Model | vCPU | Memory (GB) | Network performance (Gbps) | On-demand hourly rate ($) |
|---|---|---|---|---|---|
| c5-xlarge | 2nd Gen Intel® Xeon® Scalable processors | 4 | 8 | 10 | 0.17 |
| c5n-xlarge | 1st Gen Intel® Xeon® Scalable processors | 4 | 10.5 | 25 | 0.216 |
| c6i-xlarge | 3rd Gen Intel® Xeon® Scalable processors | 4 | 8 | 12.5 | 0.17 |
| c6in-xlarge | 3rd Gen Intel® Xeon® Scalable processors | 4 | 8 | 30 | 0.2268 |
| c7i-xlarge | 4th Gen Intel® Xeon® Scalable processors | 4 | 8 | 12.5 | 0.1785 |

Table 5 lists the AWS instances used and their on-demand hourly rates (https://aws.amazon.com/ec2/pricing/on-demand/). For more platform details, refer to Appendix A, Platform Configuration. The rates shown were current at the time of publication for the US West Coast and may vary by region, availability, corporate accounts, and other factors.
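The performance-per-dollar metric used in this evaluation can be read as throughput divided by the on-demand hourly rate. The sketch below applies that reading to the AWS rates in Table 5. The function names are illustrative, and the throughput values in the usage lines are placeholders, not measured results.

```python
# On-demand hourly rates in USD, copied from Table 5.
AWS_HOURLY_RATE_USD = {
    "c5-xlarge": 0.17,
    "c5n-xlarge": 0.216,
    "c6i-xlarge": 0.17,
    "c6in-xlarge": 0.2268,
    "c7i-xlarge": 0.1785,
}


def perf_per_dollar(throughput_gbps: float, hourly_rate_usd: float) -> float:
    """Throughput delivered per dollar of hourly instance cost."""
    return throughput_gbps / hourly_rate_usd


def rank_by_value(throughput_by_instance: dict[str, float]) -> list[str]:
    """Order instance names from best to worst performance per dollar."""
    scores = {name: perf_per_dollar(gbps, AWS_HOURLY_RATE_USD[name])
              for name, gbps in throughput_by_instance.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Placeholder throughputs (Gbps), chosen only to demonstrate the calculation.
print(rank_by_value({"c5-xlarge": 1.0, "c7i-xlarge": 2.0}))
```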

5.1.2 Results

Figure 6: NGFW Reference Software Performance and Performance per Dollar on AWS. Bar chart showing throughput and performance per dollar for various AWS instance types.

Figure 6 compares performance and performance per hour rate across the listed instance types. Key observations include:

5.2 GCP Deployment

5.2.1 Instance Type List

Table 6. GCP Instances and On-demand Hourly Rates

| Instance Type | CPU Model | vCPU | Memory (GB) | Default egress bandwidth (Gbps) | On-demand hourly rate ($) |
|---|---|---|---|---|---|
| n1-std-4 | 1st Gen Intel® Xeon® Scalable processors | 4 | 15 | 10 | 0.189999 |
| n2-std-4 | 3rd Gen Intel® Xeon® Scalable processors | 4 | 16 | 10 | 0.194236 |
| c3-std-4 | 4th Gen Intel® Xeon® Scalable processors | 4 | 16 | 23 | 0.201608 |
| n4-std-4 | 5th Gen Intel® Xeon® Scalable processors | 4 | 16 | 10 | 0.189544 |
| c4-std-4 | 5th Gen Intel® Xeon® Scalable processors | 4 | 15 | 23 | 0.23761913 |

Table 6 lists the GCP instances used and their on-demand hourly rates (https://cloud.google.com/compute/vm-instance-pricing?hl=en). For more platform details, refer to Appendix A, Platform Configuration. The rates shown were current at the time of publication for the US West Coast and may vary by region, availability, corporate accounts, and other factors.
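One way to read the price column is as a break-even threshold: an instance that costs r times as much per hour must deliver more than r times the throughput to improve performance per dollar. The sketch below computes that ratio from the GCP rates in Table 6. The function name is hypothetical and no measured throughput is used.

```python
# On-demand hourly rates in USD, copied from Table 6.
GCP_HOURLY_RATE_USD = {
    "n1-std-4": 0.189999,
    "n2-std-4": 0.194236,
    "c3-std-4": 0.201608,
    "n4-std-4": 0.189544,
    "c4-std-4": 0.23761913,
}


def break_even_speedup(candidate: str, baseline: str) -> float:
    """Speedup over `baseline` at which `candidate` matches it per dollar."""
    return GCP_HOURLY_RATE_USD[candidate] / GCP_HOURLY_RATE_USD[baseline]


for name in GCP_HOURLY_RATE_USD:
    print(f"{name}: {break_even_speedup(name, 'n1-std-4'):.3f}x")
```

A ratio below 1.0 means the candidate is cheaper than the baseline, so any throughput gain at all improves performance per dollar.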

5.2.2 Results

Figure 7: NGFW Reference Software Performance and Performance per Dollar on GCP. Bar chart showing throughput and performance per dollar for various GCP instance types.

Figure 7 compares performance and performance per hour rate across the listed instance types. Key observations include:

6 Summary

With the increasing adoption of multi- and hybrid-cloud deployment models, delivering NGFW solutions on public clouds offers consistent protection across environments, scalability for security requirements, and simplicity with minimal maintenance. Network security vendors provide NGFW solutions utilizing various cloud instance types. Minimizing Total Cost of Ownership (TCO) and maximizing Return on Investment (ROI) are critical when selecting the right cloud instance, considering compute resources, network bandwidth, and price.

This evaluation used the NGFW reference implementation as a representative workload and leveraged MCNAT to automate deployment and testing across different public cloud instance types. Benchmarking results indicate that instances with the latest generation Intel Xeon Scalable processors on AWS (powered by 4th Gen Intel Xeon Scalable processors) and GCP (powered by 5th Gen Intel Xeon Scalable processors) deliver significant performance and TCO improvements. These instances show performance increases of up to 2.68x and performance-per-hour rate improvements of up to 2.15x over previous generations. This evaluation provides valuable references for selecting Intel-based public cloud instances for NGFW deployments.

Appendix A Platform Configuration

Detailed platform configurations for the tested instances:

c5-xlarge - "Test by Intel as of 03/17/25. 1-node, 1x Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz, 2 cores, HT On, Turbo On, Total Memory 8GB (1x8GB DDR4 2933 MT/s [Unknown]), BIOS 1.0, microcode 0x5003801, 1x Elastic Network Adapter (ENA), 1x 32G Amazon Elastic Block Store, Ubuntu 22.04.5 LTS, 6.8.0-1024-aws, gcc 11.4, NGFW 24.12, Hyperscan 5.6.1"

c5n-xlarge - "Test by Intel as of 03/17/25. 1-node, 1x Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz, 2 cores, HT On, Turbo On, Total Memory 10.5GB (1x10.5GB DDR4 2933 MT/s [Unknown]), BIOS 1.0, microcode 0x2007006, 1x Elastic Network Adapter (ENA), 1x 32G Amazon Elastic Block Store, Ubuntu 22.04.5 LTS, 6.8.0-1024-aws, gcc 11.4, NGFW 24.12, Hyperscan 5.6.1"

c6i-xlarge - "Test by Intel as of 03/17/25. 1-node, 1x Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz, 2 cores, HT On, Turbo On, Total Memory 8GB (1x8GB DDR4 3200 MT/s [Unknown]), BIOS 1.0, microcode 0xd0003f6, 1x Elastic Network Adapter (ENA), 1x 32G Amazon Elastic Block Store, Ubuntu 22.04.5 LTS, 6.8.0-1024-aws, gcc 11.4, NGFW 24.12, Hyperscan 5.6.1"

c6in-xlarge - "Test by Intel as of 03/17/25. 1-node, 1x Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz, 2 cores, HT On, Turbo On, Total Memory 8GB (1x8GB DDR4 3200 MT/s [Unknown]), BIOS 1.0, microcode 0xd0003f6, 1x Elastic Network Adapter (ENA), 1x 32G Amazon Elastic Block Store, Ubuntu 22.04.5 LTS, 6.8.0-1024-aws, gcc 11.4, NGFW 24.12, Hyperscan 5.6.1"

c7i-xlarge - "Test by Intel as of 03/17/25. 1-node, 1x Intel(R) Xeon(R) Platinum 8488C CPU @ 2.40GHz, 2 cores, HT On, Turbo On, Total Memory 8GB (1x8GB DDR4 4800 MT/s [Unknown]), BIOS 1.0, microcode 0x2b000620, 1x Elastic Network Adapter (ENA), 1x 32G Amazon Elastic Block Store, Ubuntu 22.04.5 LTS, 6.8.0-1024-aws, gcc 11.4, NGFW 24.12, Hyperscan 5.6.1"

n1-std-4 - "Test by Intel as of 03/17/25. 1-node, 1x Intel(R) Xeon(R) CPU @ 2.00GHz, 2 cores, HT On, Turbo On, Total Memory 15GB (1x15GB RAM []), BIOS Google, microcode 0xffffffff, 1x device, 1x 32G PersistentDisk, Ubuntu 22.04.5 LTS, 6.8.0-1025-gcp, gcc 11.4, NGFW 24.12, Hyperscan 5.6.1"

n2-std-4 - "Test by Intel as of 03/17/25. 1-node, 1x Intel(R) Xeon(R) CPU @ 2.60GHz, 2 cores, HT On, Turbo On, Total Memory 16GB (1x16GB RAM []), BIOS Google, microcode 0xffffffff, 1x device, 1x 32G PersistentDisk, Ubuntu 22.04.5 LTS, 6.8.0-1025-gcp, gcc 11.4, NGFW 24.12, Hyperscan 5.6.1"

c3-std-4 - "Test by Intel as of 03/14/25. 1-node, 1x Intel(R) Xeon(R) Platinum 8481C CPU @ 2.70GHz @ 2.60GHz, 2 cores, HT On, Turbo On, Total Memory 16GB (1x16GB RAM []), BIOS Google, microcode 0xffffffff, 1x Compute Engine Virtual Ethernet [gVNIC], 1x 32G nvme_card-pd, Ubuntu 22.04.5 LTS, 6.8.0-1025-gcp, gcc 11.4, NGFW 24.12, Hyperscan 5.6.1"

n4-std-4 - "Test by Intel as of 03/18/25. 1-node, 1x Intel(R) Xeon(R) PLATINUM 8581C CPU @ 2.10GHz, 2 cores, HT On, Turbo On, Total Memory 16GB (1x16GB RAM []), BIOS Google, microcode 0xffffffff, 1x Compute Engine Virtual Ethernet [gVNIC], 1x 32G nvme_card-pd, Ubuntu 22.04.5 LTS, 6.8.0-1025-gcp, gcc 11.4, NGFW 24.12, Hyperscan 5.6.1"

c4-std-4 - "Test by Intel as of 03/18/25. 1-node, 1x Intel(R) Xeon(R) PLATINUM 8581C CPU @ 2.30GHz, 2 cores, HT On, Turbo On, Total Memory 15GB (1x15GB RAM []), BIOS Google, microcode 0xffffffff, 1x Compute Engine Virtual Ethernet [gVNIC], 1x 32G nvme_card-pd, Ubuntu 22.04.5 LTS, 6.8.0-1025-gcp, gcc 11.4, NGFW 24.12, Hyperscan 5.6.1"

Appendix B Intel NGFW Reference Software Configuration

Software Configuration and Version

| Software Configuration | Software Version |
|---|---|
| Host OS | Ubuntu 22.04 LTS |
| Kernel | 6.8.0-1025 |
| Compiler | GCC 11.4.0 |
| WRK | 74eb9437 |
| WRK2 | 44a94c17 |
| VPP | 24.02 |
| Snort | 3.1.36.0 |
| DAQ | 3.0.9 |
| LuaJIT | 2.1.0-beta3 |
| Libpcap | 1.10.1 |
| PCRE | 8.45 |
| ZLIB | 1.2.11 |
| Hyperscan | 5.6.1 |
| LZMA | 5.2.5 |
| NGINX | 1.22.1 |
| DPDK | 23.11 |