The Role of FPGAs in the AI Data Path

2026-01-20   EE Times
FPGAs offer programmable, flexible hardware but require longer design cycles than CPU- or GPU-based systems. As AI workloads scale and push for higher compute density, faster time to deployment, and lower energy per operation in latency- and control-bound workloads, do FPGAs still make sense in the AI stack?
In an interview with EE Times, Esam Elashmawi, chief strategy and marketing officer at Lattice Semiconductor, said FPGAs do not compete with GPUs for AI compute but instead operate as companion devices in the data path, particularly at the edge. “If you need very high performance and you are willing to live with high power, then you can use a GPU or a CPU,” he said. “FPGAs are a good companion to it.”
Elashmawi positioned FPGAs directly in the data path—at the edge, the far edge, and alongside high-performance processors—where power, latency, and determinism shape system behavior more than peak tera operations per second (TOPS).
AI does not exist as a single workload but rather as a system property. Sensors generate data, connectivity transports it, intelligence interprets it, and security protects it.
At the VLSI Design Conference 2026, Pravin Desale, senior VP of engineering/R&D at Lattice Semiconductor, described AI as the next major innovation cycle, following the internet, mobile computing, and cloud infrastructure. “Each cycle reaches maturity faster than the one before it, placing increasing pressure on hardware platforms to adapt,” he said.
In such an environment, reprogrammable hardware increasingly acts as connective tissue between subsystems, spanning the AI system chain rather than serving as primary compute. “Change is another name for innovation,” Desale said. “FPGAs absorb change differently from fixed silicon. Teams can demonstrate ideas quickly, deploy them early, and refine them in the field while a business model matures. Low-power and small-form-factor devices allow products to remain in production for long periods without locking functionality on day one.”
Desale continued: “At 2 nm, tapeout costs now run into millions of dollars before a product ever reaches volume production. That cost curve pushes startups and small teams out of the innovation equation long before software or in-field upgrades are even considered.”
GPUs use fixed-function cores optimized for parallel processing. They have become the default accelerator for AI training and large-scale simulations because they handle massive matrix operations efficiently. Mature software ecosystems, such as CUDA, TensorFlow, and PyTorch, also reduce the barrier to entry for developers.
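To give a sense of how low that barrier is, the toy PyTorch snippet below evaluates a single dense layer on a GPU; the layer and batch sizes are arbitrary and purely illustrative.

```python
# Toy illustration only: one dense layer evaluated through PyTorch.
# Layer and batch sizes are arbitrary, not modeled on any real workload.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

layer = torch.nn.Linear(1024, 1024).to(device)  # weights resident in GPU memory
x = torch.randn(64, 1024, device=device)        # a batch of 64 activation vectors

with torch.no_grad():
    y = layer(x)  # one matrix-matrix multiply, dispatched to the GPU's math units

print(y.shape)  # torch.Size([64, 1024])
```

An equivalent FPGA datapath would require RTL or high-level synthesis work, which is the development-effort tradeoff described below.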
FPGAs, on the other hand, are well-suited for applications that require deterministic behavior, custom data paths, or tight latency control, such as signal processing, telecom infrastructure, and real-time AI inference. The tradeoff shifts onto development effort, which demands hardware description languages and detailed knowledge of synthesis and routing.
Academic comparisons suggest that deployed performance matters more than advertised TOPS figures.
A paper presented at the 2020 IEEE International Conference on Field Programmable Technology compared an AI-optimized Intel Stratix 10 NX FPGA with Nvidia’s T4 and V100 GPUs. While the comparison involved a high-end FPGA, the findings highlight broader differences between GPU and FPGA execution models under real-time constraints.
Rather than chasing peak numbers, the study looked at real-time inference and asked how much tensor hardware is used by applications, after factoring in data movement and system overheads.
The results showed that on small-batch inference—common in real-time and edge deployments—the FPGA delivered higher effective tensor utilization than the GPUs. GPU tensor cores excel at large matrix–matrix operations, but utilization drops sharply on the smaller matrix–vector workloads typical of inference. Once PCIe transfer overheads enter the picture, the gap widens further.
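A back-of-the-envelope model, sketched below, makes that utilization gap concrete. It is not taken from the paper; the peak-throughput and bandwidth figures are loosely T4-class assumptions. But it captures the mechanism: a matrix-vector call at batch size 1 reads the full weight matrix for very few operations, so memory traffic, not tensor throughput, sets the pace.

```python
# Back-of-the-envelope model, not taken from the cited paper. Peak-throughput
# and bandwidth figures below are illustrative, loosely T4-class assumptions.
def effective_utilization(batch, in_dim, out_dim,
                          peak_tops=65.0,     # assumed peak fp16 tensor TOPS
                          mem_gbps=320.0,     # assumed device memory bandwidth
                          bytes_per_elem=2):  # fp16 operands
    ops = 2.0 * batch * in_dim * out_dim      # each MAC counted as two ops
    compute_s = ops / (peak_tops * 1e12)      # ideal tensor-core time
    # Weights are read once per call; activations once per sample in the batch.
    traffic = (in_dim * out_dim + batch * (in_dim + out_dim)) * bytes_per_elem
    memory_s = traffic / (mem_gbps * 1e9)
    achieved_tops = ops / max(compute_s, memory_s) / 1e12
    return achieved_tops / peak_tops

for b in (1, 8, 256):
    print(f"batch {b:4d}: {effective_utilization(b, 4096, 4096):6.1%}")
```

Under these assumptions, the batch-1 matrix-vector call lands below one percent of peak, while the batch-256 case becomes compute-bound, mirroring the paper's qualitative finding.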
“If your application requires very low power, and you do not necessarily need more than one tera operation per second of performance, then an FPGA is a more cost-effective, lower power solution,” Elashmawi said.
At low batch sizes, latency constraints prevent developers from batching requests into large blocks. In such cases, the FPGA keeps trained model parameters in on-chip memory and streams data through custom pipelines, allowing it to deliver higher end-to-end performance than GPUs despite similar peak TOPS ratings.
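A rough single-request latency comparison shows why the on-chip-weights pattern pays off. Every figure below is an assumption, not a measurement of any specific device; the 0.32-TOPS memory-bound rate carries over from the sketch above, and the 1-TOPS pipeline echoes Elashmawi's figure.

```python
# Illustrative single-request latency model; every figure is an assumption,
# not a measurement of any specific device.
IN_DIM, OUT_DIM, BYTES = 4096, 4096, 2
OPS = 2 * IN_DIM * OUT_DIM                 # one matrix-vector multiply

# Hypothetical GPU path: PCIe copy in, kernel launch, memory-bound compute,
# PCIe copy out. The 0.32-TOPS rate is the batch-1 result from the model above.
pcie_s   = 2 * (IN_DIM * BYTES) / 12e9     # assumed ~12 GB/s usable PCIe
launch_s = 10e-6                           # assumed kernel-launch overhead
gpu_s    = OPS / 0.32e12
print(f"GPU  path: {(pcie_s + launch_s + gpu_s) * 1e6:7.1f} us")

# Hypothetical FPGA path: weights resident on-chip, activations streamed
# straight into a pipelined datapath; 1 TOPS echoes Elashmawi's figure.
fpga_s = OPS / 1e12
print(f"FPGA path: {fpga_s * 1e6:7.1f} us")
```

The absolute numbers are invented, but the structure is the point: the fixed per-request overheads never enter the FPGA path.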
In applications such as automotive ADAS, industrial automation, and robotics, data streams continuously from cameras and sensors, and systems must make decisions quickly, not in large batches. In such systems, the FPGA aggregates and preprocesses sensor data, handles co-processing, and feeds the results to a GPU or CPU for higher-level decision making.
The FPGA’s value comes from low latency and parallelism, not from replacing the AI accelerator.
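One way to picture that division of labor is as a staged pipeline. The sketch below is purely schematic, with hypothetical stage names: the first two stages stand in for FPGA-side aggregation and preprocessing, the last for the GPU or CPU that consumes single-sample requests.

```python
# Purely schematic sketch of the edge data path; stage names are hypothetical.
# An FPGA-resident front end would implement the first two stages in hardware.
def aggregate(frames):
    """Merge per-sensor samples into time-aligned records (FPGA-side role)."""
    for frame in frames:
        yield {"camera": frame, "timestamp": frame["t"]}

def preprocess(records):
    """Normalize and pack each record for the accelerator (FPGA-side role)."""
    for rec in records:
        rec["tensor"] = [p / 255.0 for p in rec["camera"]["pixels"]]
        yield rec

def infer(records):
    """Stand-in for the GPU/CPU stage: one low-latency request per record."""
    for rec in records:
        yield sum(rec["tensor"])   # placeholder for a real model's output

frames = ({"t": t, "pixels": [t % 256] * 8} for t in range(3))
for out in infer(preprocess(aggregate(frames))):
    print(out)
```

In a real system, the first two stages would be fixed-latency hardware pipelines rather than Python generators; the generator structure only mirrors the streaming, sample-at-a-time flow.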
As AI spreads beyond data centers into edge systems, robotics, vehicles, and infrastructure, not every function justifies a custom accelerator. Some workloads demand fast boot times, microamp-level standby currents, flexible I/O, and deterministic response. These attributes do not scale neatly with advanced process nodes.
Large FPGAs deliver massive connectivity and density at high power. Mid-range FPGAs strike a balance between performance and efficiency. Small FPGAs, often millimeters in size, sit closest to sensors, motors, and interfaces.
“Today, the latter [small and mid-range FPGAs] support gigabit-class connectivity, fast boot times, and in-field programmability, making them suitable for battery-operated systems with long-term, frequent power cycling,” Elashmawi said. “In such environments, small power savings accumulate into system-level gains.”
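A quick duty-cycle budget illustrates how those savings accumulate; every number in the sketch below is an assumption chosen for illustration, not a datasheet value.

```python
# Rough duty-cycle energy budget; every figure below is an assumption chosen
# to illustrate the accumulation effect, not a datasheet value.
active_mw   = 50.0        # assumed active power during a wake burst
standby_ua  = 30.0        # assumed microamp-level standby current
volts       = 3.3
wakes_per_h = 60          # one brief wake per minute
wake_ms     = 20          # assumed work per wake, including fast boot

active_mwh_per_day  = active_mw * (wakes_per_h * 24 * wake_ms / 3600e3)
standby_mwh_per_day = (standby_ua * 1e-3 * volts) * 24
total_mwh_per_day   = active_mwh_per_day + standby_mwh_per_day

battery_mwh = 1000 * 3.7  # assumed ~1000 mAh cell at 3.7 V
print(f"active  : {active_mwh_per_day:6.2f} mWh/day")
print(f"standby : {standby_mwh_per_day:6.2f} mWh/day")
print(f"life    : {battery_mwh / total_mwh_per_day:6.0f} days")
```

With assumptions like these, standby draw, not the active bursts, dominates the daily budget, which is why microamp-level standby currents matter in long-lived deployments.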
The same pattern appears in robotics. Modern warehouse robots and emerging humanoid platforms require precise, low-latency control of dozens of motors under tight power constraints. FPGAs placed close to actuators handle control and co-processing, while higher-level AI runs elsewhere.
At the far edge, FPGAs can also run inference. Elashmawi pointed to deployments in industrial displays, laptops, and factory equipment, where the device operates in milliwatts and remains always on. Presence detection, safety checks, and defect sorting do not require large models or high throughput, but they depend on continuous operation and predictable response times.
Training and large-scale inference remain GPU-led and power-hungry by design. However, the AI boom has expanded the system around the accelerator. As AI moves from racks into vehicles, factories, robots, and infrastructure, latency, power, and system overheads matter as much as raw compute. In those spaces, the FPGA’s role narrows relative to the GPU’s, but it becomes harder to replace.
GPUs will continue to dominate training and large-scale inference. ASICs will continue to chase efficiency at scale. The debate does not hinge on whether FPGAs can match GPUs at peak performance. It centers on whether AI systems can function efficiently without the low-power, low-latency logic that operates quietly in between.
Even at the high end, developers increasingly deploy GPUs as part of tightly coupled systems rather than as standalone accelerators. The industry now prioritizes integration over brute-force scaling. Nvidia’s decision to partner with Intel on tightly coupled x86–GPU systems linked by NVLink reflects that same architectural shift, reducing data movement overheads and integrating compute more closely with the rest of the system. Even though the announcement did not mention reconfigurable logic, it shows that AI performance depends on how components connect, not just on peak accelerator throughput.
As AI drives semiconductor demand higher while tightening the economics of fixed-function design, the FPGA’s role looks less glamorous but more foundational.