Optimize NGFW Performance with Intel® Xeon® Processors on Public Cloud
Authors: Xiang Wang, Jayprakash Patidar, Declan Doherty, Eric Jones, Subhiksha Ravisundar, Heqing Zhu
1 Introduction
Next-generation firewalls (NGFWs) are central to network security solutions. Unlike traditional firewalls that rely on port and protocol inspection, NGFWs offer advanced deep packet inspection capabilities, including intrusion detection/prevention systems (IDS/IPS), malware detection, and application identification and control, to defend against modern threats.
NGFWs are compute-intensive, handling tasks like cryptographic operations for traffic encryption/decryption and complex rule matching. Intel processors provide key technologies to optimize NGFW performance, such as Intel® Advanced Encryption Standard New Instructions (Intel® AES-NI) and Intel® QuickAssist Technology (Intel® QAT) for accelerated crypto performance. Intel also invests in software optimizations, notably Hyperscan, a high-performance string and regular expression (regex) matching library that leverages Single Instruction Multiple Data (SIMD) technology to accelerate pattern matching. Integrating Hyperscan into NGFW IPS engines such as Snort can yield performance improvements of up to 3x on Intel processors.
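Hyperscan's actual block- and stream-mode C API is not shown here; as a minimal illustration of the multi-pattern model it vectorizes (compile many signatures once, scan each payload once), the Python sketch below uses the standard re module with hypothetical signatures:

```python
import re

# Hypothetical IPS signatures for illustration only; real rulesets such as
# Snort's Lightspd contain tens of thousands of literals and regexes.
signatures = {
    "exploit-uri": r"/cgi-bin/.*\.(?:sh|pl)",
    "sql-injection": r"(?i:union\s+select)",
}

# Compile every pattern into one alternation of named groups so a single
# pass over the payload reports each matching signature -- the
# "compile many patterns once, scan once" model that Hyperscan accelerates.
combined = re.compile("|".join(
    f"(?P<{name.replace('-', '_')}>{pattern})"
    for name, pattern in signatures.items()))

def scan(payload: str) -> list[str]:
    """Return the names of all signatures matching the payload."""
    return [m.lastgroup.replace("_", "-") for m in combined.finditer(payload)]

print(scan("GET /cgi-bin/test.sh HTTP/1.1"))  # -> ['exploit-uri']
print(scan("id=1 UNION SELECT password"))     # -> ['sql-injection']
```

Unlike this backtracking sketch, Hyperscan compiles the whole pattern set into SIMD-friendly automata, which is where the cited gains come from.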
NGFWs are commonly deployed as security appliances in enterprise data centers. However, there is a growing demand for virtual NGFW appliances or software packages deployable on public clouds, enterprise data centers, or network edges. This software-centric approach reduces the operational overhead of physical appliances, enhances system scalability, and offers flexible procurement options.
The adoption of public cloud NGFW deployments is increasing due to cost advantages. However, selecting the optimal cloud instance type for NGFW, considering the wide variety of CPU, memory, and I/O characteristics, as well as pricing, can be challenging. This paper introduces an NGFW reference implementation from Intel, optimized with Intel technologies including Hyperscan, to provide a reliable benchmark for NGFW performance on Intel platforms. This implementation is part of Intel's NetSec Reference Software package, which also includes the Multi-Cloud Networking Automation Tool (MCNAT) to automate deployment on select public cloud providers and simplify TCO analysis for instance selection.
For more information on the NetSec Reference Software package, please contact the authors.
2 Background and Motivation
Most NGFW vendors are extending their offerings from physical appliances to virtual solutions deployable on public clouds. Public cloud NGFW deployments are gaining traction due to several benefits:
- Scalability: Enables easy scaling of compute resources across geographies to meet performance demands.
- Cost effectiveness: Offers flexible pay-per-use subscriptions, eliminating capital expenditure and reducing operational costs associated with physical appliances.
- Native integration with cloud services: Provides seamless integration with public cloud services like networking, access controls, and AI/ML tools.
- Cloud workload protection: Facilitates local traffic filtering for enterprise workloads hosted on public clouds.
The reduced cost of running NGFW workloads in the public cloud is attractive for enterprises. However, selecting the instance with the best performance and Total Cost of Ownership (TCO) for NGFW is complex due to the vast array of cloud instance options with varying compute characteristics, pricing, and I/O bandwidth. Intel has developed the NGFW Reference Implementation to aid in performance and TCO analysis for different public cloud instances based on Intel processors. This guide demonstrates performance and performance-per-dollar metrics to assist in choosing the right Intel-based instances for NGFW solutions on public cloud services such as AWS and GCP.
3 NGFW Reference Implementation
Intel's NetSec Reference Software package (latest release 25.05) provides optimized reference solutions leveraging Intel CPU instruction set architectures (ISAs) and accelerators to showcase performance on both on-premise and cloud environments. The reference software is available under the Intel Proprietary License (IPL).
Key highlights of the NetSec Reference Software package include:
- A comprehensive suite of reference solutions for networking, security, and AI frameworks for cloud, enterprise data centers, and edge locations.
- Facilitates rapid adoption of Intel technologies and reduces time to market.
- Provides source code for replicating deployment scenarios and testing environments on Intel platforms.
The NGFW reference implementation, a core component of the NetSec Reference Software package, drives NGFW performance characterization and TCO analysis on Intel platforms. It integrates Intel technologies like Hyperscan, forming a robust foundation for NGFW analysis. Given that different Intel hardware platforms offer varying compute and I/O capabilities, the NGFW reference implementation offers a clear view of platform strengths and facilitates performance comparisons across Intel processor generations. It provides insights into compute performance, memory bandwidth, I/O bandwidth, and power consumption, enabling TCO analysis based on performance-per-dollar metrics.
The latest release (25.05) of the NGFW reference implementation includes:
- Basic stateful firewall functionality.
- Intrusion Prevention System (IPS) capabilities.
- Support for advanced Intel processors, including Intel® Xeon® 6 processors and the Intel® Xeon® 6 SoC.
Future releases are planned to include:
- VPN inspection: IPsec decryption for content inspection.
- TLS inspection: A TLS Proxy to terminate client-server connections for plaintext traffic inspection.
3.1 System Architecture
The system architecture leverages open-source software. VPP (Vector Packet Processing) provides a high-performance data plane with stateful firewall functions, including Access Control Lists (ACLs). Multiple VPP threads are spawned with configured core affinity, with each VPP worker thread pinned to a dedicated CPU core or execution thread.
Snort 3 is selected for IPS, supporting multi-threading with worker threads pinned to dedicated CPU cores or execution threads. Snort and VPP are integrated via a Snort plugin, utilizing queue pairs for packet exchange. Packets are stored in shared memory. A new Data Acquisition (DAQ) component for Snort, the VPP Zero Copy (ZC) DAQ, implements Snort's DAQ API functions to receive and transmit packets by reading from and writing to these queues, effectively achieving zero-copy data transfer.
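A minimal sketch of the queue-pair idea (illustrative Python, not the actual VPP ZC DAQ code): VPP writes each packet into shared memory once, and only small descriptors cross the queues, so Snort inspects the payload in place:

```python
from collections import deque
from dataclasses import dataclass

# Stand-in for the shared packet pool mapped by both VPP and Snort.
shared_memory = bytearray(64 * 1024)

@dataclass
class Desc:
    offset: int   # where the packet starts in shared memory
    length: int   # packet length in bytes

rx_queue = deque()  # VPP -> Snort direction of the queue pair
tx_queue = deque()  # Snort -> VPP direction of the queue pair

def vpp_enqueue(packet: bytes, offset: int) -> None:
    """VPP writes the packet into shared memory once and enqueues a descriptor."""
    shared_memory[offset:offset + len(packet)] = packet
    rx_queue.append(Desc(offset, len(packet)))

def snort_process() -> None:
    """Snort inspects the packet in place, then hands the descriptor back."""
    desc = rx_queue.popleft()
    payload = memoryview(shared_memory)[desc.offset:desc.offset + desc.length]
    assert len(payload) == desc.length  # detection would run on `payload` here
    tx_queue.append(desc)

pkt = b"example packet bytes"
vpp_enqueue(pkt, offset=0)
snort_process()
```

The real DAQ uses lock-free rings and polling rather than Python deques, but the zero-copy property is the same: payload bytes are written once and never duplicated between the two processes.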
Since Snort 3 is compute-intensive, careful allocation of processor cores and balancing of VPP and Snort3 threads are crucial for achieving optimal system-level performance. The architecture includes new VPP graph nodes: snort-enq for load-balancing packets to Snort threads, and snort-deq for polling packets from queues, one per Snort worker thread.
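The load-balancing step in snort-enq can be sketched as a hash over the flow 5-tuple, so every packet of a flow reaches the same Snort worker and its stream state; the hash function and worker count below are assumptions for illustration, not VPP's actual implementation:

```python
import zlib

NUM_SNORT_WORKERS = 4  # assumption for illustration

def pick_worker(src_ip: str, dst_ip: str,
                src_port: int, dst_port: int, proto: str) -> int:
    """Map a flow 5-tuple to a Snort worker queue index."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    return zlib.crc32(key) % NUM_SNORT_WORKERS

# Every packet of a given flow lands on the same worker, keeping that
# worker's TCP stream-reassembly state consistent:
w1 = pick_worker("10.0.0.1", "10.0.1.1", 40000, 80, "tcp")
w2 = pick_worker("10.0.0.1", "10.0.1.1", 40000, 80, "tcp")
assert w1 == w2 and 0 <= w1 < NUM_SNORT_WORKERS
```

Note that a production design would use a symmetric key (e.g., sorting the two endpoints) so both directions of a connection hash to the same worker.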
3.2 Intel Optimizations
The NGFW reference implementation utilizes several Intel optimizations:
- Hyperscan: This high-performance multiple regex matching library significantly boosts Snort's performance compared to its default search engine. Figure 3 illustrates Hyperscan's integration with Snort for accelerated literal and regex matching. Snort 3 offers native Hyperscan integration, which can be enabled via configuration files or command-line options.
- Receive Side Scaling (RSS): VPP leverages RSS in Intel® Ethernet Network Adapters to distribute traffic across multiple VPP worker threads.
- Intel® QAT and Intel® AVX-512 instructions: Future releases supporting IPsec and TLS will incorporate Intel's crypto acceleration technologies. Intel QAT accelerates crypto performance, particularly the public key cryptography used in network connection establishment. Intel AVX-512 instructions enhance cryptographic performance, including VPMADD52 (multiply and accumulate, used in modular arithmetic), vector AES (VAES, an extension of Intel AES-NI), VPCLMULQDQ (vectorized carry-less multiply, used in AES-GCM), and Intel® Secure Hash Algorithm - New Instructions (Intel® SHA-NI).
4 Cloud Deployment of NGFW Reference Implementation
4.1 System Configuration
The focus is on cleartext inspection scenarios, aligning with use cases and KPIs defined in RFC9411. The traffic generator simulates 64KB HTTP transactions with one GET request per connection. ACLs are configured to permit traffic within specified subnets. The benchmarking utilizes the Snort Lightspd ruleset and Cisco's security policy. A dedicated server is included to handle requests from traffic generators.
| Metric | Value |
|---|---|
| Use Case | Cleartext Inspection (FW + IPS) |
| Traffic Profile | HTTP 64KB GET (1 GET per connection) |
| VPP ACLs | Yes (2 stateful ACLs) |
| Snort Rules | Lightspd (~49k rules) |
| Snort Policy | Security (~21k rules enabled) |
The system topology for public cloud deployment includes three primary instance nodes: a client, a server, and a proxy. A bastion node is also present for user connections. Both the client (running WRK) and server (running Nginx) feature a single dedicated data-plane network interface. The proxy (running NGFW) has two data-plane interfaces for testing. These interfaces connect to dedicated subnets (subnet A for client-proxy and subnet B for proxy-server) to isolate them from management traffic. Dedicated IP address ranges, routing, and ACL rules are configured to manage traffic flow.
4.2 System Deployment
Intel's Multi-Cloud Networking Automation Tool (MCNAT) is a software tool designed to automate networking workload deployments on public clouds and assist in selecting the best cloud instance based on performance and cost. MCNAT uses profiles to define instance-specific variables and settings. These profiles can be passed to the MCNAT CLI tool for deploying specific instance types on a chosen Cloud Service Provider (CSP).
An example command line usage is provided:
# ./mcnat.py --deploy -u user -c aws.oregon -s ngfw-intel -p c7i-xlarge
| Option | Description |
|---|---|
| --deploy | Instructs the tool to create a new deployment |
| -u | Defines which user credentials to use |
| -c | CSP to create the deployment on (AWS, GCP, etc.) |
| -s | Scenario to deploy |
| -p | Profile to use |
The MCNAT command line tool enables single-step instance building and deployment. Post-deployment, necessary SSH configurations are created to allow instance access.
4.3 System Benchmarking
After MCNAT deploys the instances, performance tests can be executed using the MCNAT application toolkit. Test cases are configured in tools/mcn/applications/configurations/ngfw-intel/ngfw-intel.json, specifying parameters such as the throughput test, packet size, and Hyperscan enablement.
The following command can be used to launch a test:
# python -m mcn_application_toolkit run --deployment-path DEPLOYMENT_PATH -f configurations/ngfw-intel/ngfw-intel.json ngfw-intel
This command runs the NGFW with a defined set of rules against HTTP traffic generated by WRK on the client, pinning specific CPU cores to gather comprehensive performance data for the instance under test. Upon completion, test data is formatted as a CSV file and returned to the user.
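Post-processing of the returned CSV can be sketched as follows; the column names and values below are hypothetical placeholders, not the toolkit's actual schema or measured results:

```python
import csv
import io

# Hypothetical result rows: the real header and values depend on the
# toolkit release -- these are placeholders, not measurements.
sample = """instance,normalized_throughput
c6i-xlarge,1.0
c7i-xlarge,1.5
"""

rows = list(csv.DictReader(io.StringIO(sample)))
best = max(rows, key=lambda row: float(row["normalized_throughput"]))
print(best["instance"])  # -> c7i-xlarge
```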
5 Performance and Cost Evaluation
This section compares NGFW deployments on various cloud instances powered by Intel Xeon processors across AWS and GCP, providing guidance for selecting the most suitable cloud instance type for NGFW based on performance and cost. Instances with 4 vCPUs are chosen, as recommended by most NGFW vendors. The results for AWS and GCP include:
- NGFW performance on small instance types featuring 4 vCPUs with Intel® Hyper-Threading Technology (Intel® HT Technology) and Hyperscan enabled.
- Generation-to-generation performance gains observed from 1st Gen Intel Xeon Scalable processors to 5th Gen Intel Xeon Scalable processors.
- Generation-to-generation performance-per-dollar gains from 1st Gen Intel Xeon Scalable processors to 5th Gen Intel Xeon Scalable processors.
5.1 AWS Deployment
5.1.1 Instance Type List
Table 5. AWS instance types evaluated.

| Instance Type | CPU Model | vCPUs | Memory (GB) | Network performance (Gbps) | On-demand hourly rate ($) |
|---|---|---|---|---|---|
| c5.xlarge | 2nd Gen Intel® Xeon® Scalable processors | 4 | 8 | 10 | 0.17 |
| c5n.xlarge | 1st Gen Intel® Xeon® Scalable processors | 4 | 10.5 | 25 | 0.216 |
| c6i.xlarge | 3rd Gen Intel® Xeon® Scalable processors | 4 | 8 | 12.5 | 0.17 |
| c6in.xlarge | 3rd Gen Intel® Xeon® Scalable processors | 4 | 8 | 30 | 0.2268 |
| c7i.xlarge | 4th Gen Intel® Xeon® Scalable processors | 4 | 8 | 12.5 | 0.1785 |
Table 5 provides an overview of the AWS instances used; for more platform details, refer to Platform Configuration (Appendix A). It also lists the on-demand hourly rate (https://aws.amazon.com/ec2/pricing/on-demand/). The rates shown were captured at the time of publication for the US West Coast region and may vary by region, availability, corporate accounts, and other factors.
5.1.2 Results
Figure 6 compares performance and performance per hour rate across the listed instance types. Key observations include:
- Performance improves with instances based on newer generations of Intel Xeon processors. For example, upgrading from c5.xlarge (2nd Gen Intel Xeon Scalable processor) to c7i.xlarge (4th Gen Intel Xeon Scalable processor) shows a 1.97x performance improvement.
- Performance per dollar also improves with newer generations of Intel Xeon processors. An upgrade from c5n.xlarge (1st Gen Intel Xeon Scalable processor) to c7i.xlarge (4th Gen Intel Xeon Scalable processor) demonstrates a 1.88x improvement in performance per hour rate.
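The performance-per-hour-rate metric behind the 1.88x figure divides throughput by the on-demand hourly rate. A minimal sketch of the arithmetic, using the real Table 5 rates; the 1.55x relative throughput is back-derived from the reported gain, not a measured result:

```python
# Performance per dollar = throughput / hourly rate. Hourly rates are the
# real Table 5 values; the relative throughput of c7i vs. c5n (1.55x) is a
# placeholder back-derived from the reported 1.88x gain, not a measurement.
hourly_rate = {"c5n.xlarge": 0.216, "c7i.xlarge": 0.1785}  # $ per hour (Table 5)
rel_perf    = {"c5n.xlarge": 1.00,  "c7i.xlarge": 1.55}    # placeholder ratio

perf_per_dollar = {k: rel_perf[k] / hourly_rate[k] for k in hourly_rate}
gain = perf_per_dollar["c7i.xlarge"] / perf_per_dollar["c5n.xlarge"]
print(f"{gain:.2f}x")  # -> 1.88x
```

The same arithmetic explains why a modest price premium on a newer generation can still improve performance per dollar when the throughput gain is larger.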
5.2 GCP Deployment
5.2.1 Instance Type List
Table 6. GCP instance types evaluated.

| Instance Type | CPU Model | vCPUs | Memory (GB) | Default egress bandwidth (Gbps) | On-demand hourly rate ($) |
|---|---|---|---|---|---|
| n1-std-4 | 1st Gen Intel® Xeon® Scalable processors | 4 | 15 | 10 | 0.189999 |
| n2-std-4 | 3rd Gen Intel® Xeon® Scalable processors | 4 | 16 | 10 | 0.194236 |
| c3-std-4 | 4th Gen Intel® Xeon® Scalable processors | 4 | 16 | 23 | 0.201608 |
| n4-std-4 | 5th Gen Intel® Xeon® Scalable processors | 4 | 16 | 10 | 0.189544 |
| c4-std-4 | 5th Gen Intel® Xeon® Scalable processors | 4 | 15 | 23 | 0.23761913 |
Table 6 provides an overview of the GCP instances used; for more platform details, refer to Platform Configuration (Appendix A). It also lists the on-demand hourly rate (https://cloud.google.com/compute/vm-instance-pricing?hl=en). The rates shown were captured at the time of publication for the US West Coast region and may vary by region, availability, corporate accounts, and other factors.
5.2.2 Results
Figure 7 compares performance and performance per hour rate across the listed instance types. Key observations include:
- Performance improves with instances based on newer generations of Intel Xeon processors. Upgrading from n1-std-4 (1st Gen Intel Xeon Scalable processor) to c4-std-4 (5th Gen Intel Xeon Scalable processor) shows a 2.68x performance improvement.
- Performance per dollar also improves with newer generations of Intel Xeon processors. An upgrade from n1-std-4 (1st Gen Intel Xeon Scalable processor) to c4-std-4 (5th Gen Intel Xeon Scalable processor) demonstrates a 2.15x improvement in performance per hour rate.
6 Summary
With the increasing adoption of multi- and hybrid-cloud deployment models, delivering NGFW solutions on public clouds offers consistent protection across environments, scalability for security requirements, and simplicity with minimal maintenance. Network security vendors provide NGFW solutions utilizing various cloud instance types. Minimizing Total Cost of Ownership (TCO) and maximizing Return on Investment (ROI) are critical when selecting the right cloud instance, considering compute resources, network bandwidth, and price.
This evaluation used the NGFW reference implementation as a representative workload and leveraged MCNAT to automate deployment and testing across different public cloud instance types. Benchmarking results indicate that instances with the latest generation Intel Xeon Scalable processors on AWS (powered by 4th Gen Intel Xeon Scalable processors) and GCP (powered by 5th Gen Intel Xeon Scalable processors) deliver significant performance and TCO improvements. These instances show performance increases of up to 2.68x and performance-per-hour rate improvements of up to 2.15x over previous generations. This evaluation provides valuable references for selecting Intel-based public cloud instances for NGFW deployments.
Appendix A Platform Configuration
Detailed platform configurations for the tested instances:
c5-xlarge - "Test by Intel as of 03/17/25. 1-node, 1x Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz, 2 cores, HT On, Turbo On, Total Memory 8GB (1x8GB DDR4 2933 MT/s [Unknown]), BIOS 1.0, microcode 0x5003801, 1x Elastic Network Adapter (ENA), 1x 32G Amazon Elastic Block Store, Ubuntu 22.04.5 LTS, 6.8.0-1024-aws, gcc 11.4, NGFW 24.12, Hyperscan 5.6.1"
c5n-xlarge - "Test by Intel as of 03/17/25. 1-node, 1x Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz, 2 cores, HT On, Turbo On, Total Memory 10.5GB (1x10.5GB DDR4 2933 MT/s [Unknown]), BIOS 1.0, microcode 0x2007006, 1x Elastic Network Adapter (ENA), 1x 32G Amazon Elastic Block Store, Ubuntu 22.04.5 LTS, 6.8.0-1024-aws, gcc 11.4, NGFW 24.12, Hyperscan 5.6.1"
c6i-xlarge - "Test by Intel as of 03/17/25. 1-node, 1x Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz, 2 cores, HT On, Turbo On, Total Memory 8GB (1x8GB DDR4 3200 MT/s [Unknown]), BIOS 1.0, microcode 0xd0003f6, 1x Elastic Network Adapter (ENA), 1x 32G Amazon Elastic Block Store, Ubuntu 22.04.5 LTS, 6.8.0-1024-aws, gcc 11.4, NGFW 24.12, Hyperscan 5.6.1"
c6in-xlarge - "Test by Intel as of 03/17/25. 1-node, 1x Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz, 2 cores, HT On, Turbo On, Total Memory 8GB (1x8GB DDR4 3200 MT/s [Unknown]), BIOS 1.0, microcode 0xd0003f6, 1x Elastic Network Adapter (ENA), 1x 32G Amazon Elastic Block Store, Ubuntu 22.04.5 LTS, 6.8.0-1024-aws, gcc 11.4, NGFW 24.12, Hyperscan 5.6.1"
c7i-xlarge - "Test by Intel as of 03/17/25. 1-node, 1x Intel(R) Xeon(R) Platinum 8488C CPU @ 2.40GHz, 2 cores, HT On, Turbo On, Total Memory 8GB (1x8GB DDR4 4800 MT/s [Unknown]), BIOS 1.0, microcode 0x2b000620, 1x Elastic Network Adapter (ENA), 1x 32G Amazon Elastic Block Store, Ubuntu 22.04.5 LTS, 6.8.0-1024-aws, gcc 11.4, NGFW 24.12, Hyperscan 5.6.1"
n1-std-4 - "Test by Intel as of 03/17/25. 1-node, 1x Intel(R) Xeon(R) CPU @ 2.00GHz, 2 cores, HT On, Turbo On, Total Memory 15GB (1x15GB RAM []), BIOS Google, microcode 0xffffffff, 1x device, 1x 32G PersistentDisk, Ubuntu 22.04.5 LTS, 6.8.0-1025-gcp, gcc 11.4, NGFW 24.12, Hyperscan 5.6.1"
n2-std-4 - "Test by Intel as of 03/17/25. 1-node, 1x Intel(R) Xeon(R) CPU @ 2.60GHz, 2 cores, HT On, Turbo On, Total Memory 16GB (1x16GB RAM []), BIOS Google, microcode 0xffffffff, 1x device, 1x 32G PersistentDisk, Ubuntu 22.04.5 LTS, 6.8.0-1025-gcp, gcc 11.4, NGFW 24.12, Hyperscan 5.6.1"
c3-std-4 - "Test by Intel as of 03/14/25. 1-node, 1x Intel(R) Xeon(R) Platinum 8481C CPU @ 2.70GHz, 2 cores, HT On, Turbo On, Total Memory 16GB (1x16GB RAM []), BIOS Google, microcode 0xffffffff, 1x Compute Engine Virtual Ethernet [gVNIC], 1x 32G nvme_card-pd, Ubuntu 22.04.5 LTS, 6.8.0-1025-gcp, gcc 11.4, NGFW 24.12, Hyperscan 5.6.1"
n4-std-4 - "Test by Intel as of 03/18/25. 1-node, 1x Intel(R) Xeon(R) Platinum 8581C CPU @ 2.10GHz, 2 cores, HT On, Turbo On, Total Memory 16GB (1x16GB RAM []), BIOS Google, microcode 0xffffffff, 1x Compute Engine Virtual Ethernet [gVNIC], 1x 32G nvme_card-pd, Ubuntu 22.04.5 LTS, 6.8.0-1025-gcp, gcc 11.4, NGFW 24.12, Hyperscan 5.6.1"
c4-std-4 - "Test by Intel as of 03/18/25. 1-node, 1x Intel(R) Xeon(R) Platinum 8581C CPU @ 2.30GHz, 2 cores, HT On, Turbo On, Total Memory 15GB (1x15GB RAM []), BIOS Google, microcode 0xffffffff, 1x Compute Engine Virtual Ethernet [gVNIC], 1x 32G nvme_card-pd, Ubuntu 22.04.5 LTS, 6.8.0-1025-gcp, gcc 11.4, NGFW 24.12, Hyperscan 5.6.1"
Appendix B Intel NGFW Reference Software Configuration
| Software Configuration | Software Version |
|---|---|
| Host OS | Ubuntu 22.04 LTS |
| Kernel | 6.8.0-1025 |
| Compiler | GCC 11.4.0 |
| WRK | 74eb9437 |
| WRK2 | 44a94c17 |
| VPP | 24.02 |
| Snort | 3.1.36.0 |
| DAQ | 3.0.9 |
| LuaJIT | 2.1.0-beta3 |
| Libpcap | 1.10.1 |
| PCRE | 8.45 |
| ZLIB | 1.2.11 |
| Hyperscan | 5.6.1 |
| LZMA | 5.2.5 |
| NGINX | 1.22.1 |
| DPDK | 23.11 |