Implementing a PID Controller on FPGA: Fixed-Point Arithmetic, Anti-Windup, and Loop Bandwidth
Why Fixed-Point?
In the world of FPGAs, latency is the enemy of bandwidth. While modern FPGAs can handle floating-point arithmetic (IEEE 754), doing so requires significant logic resources and, more importantly, introduces deep pipelines. A floating-point multiplier might take 10–20 clock cycles to yield a result. At 125 MHz, that's over 100ns of latency just for one multiplication.
For a precision feedback loop—like locking a laser to a high-finesse cavity—we need the entire process (ADC → Filter → PID → DAC) to happen in as few cycles as possible. Fixed-point arithmetic allows us to perform additions and multiplications in 1–3 clock cycles with deterministic timing and minimal resource usage.
Fixed-Point Representation: The Q16.16 Format
We use a signed Q16.16 format for our gain parameters (Kp, Ki, Kd). This means:
- Total width: 32 bits.
- Integer part: 16 bits (including sign).
- Fractional part: 16 bits.
This format provides a gain range of approximately ±32,768 with a resolution of 2^-16 (about 1.5 × 10^-5). For most optical and microwave control loops, this dynamic range is the "sweet spot" between precision and headroom.
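As a sanity check, the Q16.16 encoding can be modeled in a few lines of Python (a bit-accurate software sketch; the function names are ours, not part of the HDL):

```python
def float_to_q16_16(x: float) -> int:
    """Encode a float as signed Q16.16 in a 32-bit two's-complement container."""
    raw = int(round(x * 65536))          # scale by 2^16
    return raw & 0xFFFFFFFF              # wrap into 32 bits

def q16_16_to_float(raw: int) -> float:
    """Decode a 32-bit two's-complement Q16.16 value back to a float."""
    if raw & 0x80000000:                 # sign bit set -> negative value
        raw -= 1 << 32
    return raw / 65536

# One LSB is the resolution, 2^-16
assert q16_16_to_float(1) == 2**-16
# A gain of exactly 1.0 is 0x00010000
assert float_to_q16_16(1.0) == 0x00010000
# Negative gains round-trip through two's complement
assert q16_16_to_float(float_to_q16_16(-1.0)) == -1.0
```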
Proportional Term (Kp)
The proportional path is the simplest but requires careful bit-growth management. When you multiply a 14-bit error signal (from the ADC) by a 32-bit gain (Kp), the result is 46 bits.
// Proportional path logic: 14-bit error × Q16.16 gain → 46-bit product
assign p_term_full = signed_error * reg_kp;
assign p_term = p_term_full >>> 16; // Arithmetic shift right to remove the fractional scaling

We immediately truncate or round the lower 16 bits to keep the signal aligned for the final summation.
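The bit growth and shift can be checked numerically with a small software model (Python ints standing in for the signed HDL vectors; the function name is ours):

```python
def p_term(signed_error: int, kp_q16_16: int) -> int:
    """Model of the proportional path: 14-bit error times a Q16.16 gain."""
    assert -8192 <= signed_error <= 8191     # 14-bit signed ADC input
    full = signed_error * kp_q16_16          # up to 46 significant bits
    return full >> 16                        # Python's >> is arithmetic, like SV's >>>

# Kp = 2.0 (raw value 2 << 16) doubles the error
assert p_term(1000, 2 << 16) == 2000
# Kp = 0.5 (raw value 1 << 15) halves it
assert p_term(1000, 1 << 15) == 500
```

Note that Python's `>>` on a negative integer floors toward negative infinity, which matches the behavior of SystemVerilog's signed `>>>`.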
Integral Term and Anti-Windup
The integrator is the "memory" of the controller, but it is also a liability. If the system is far from the setpoint (e.g., the laser is blocked), the error signal remains large, and the integrator will continue to grow until it saturates its internal registers. This is Integrator Windup. When the laser is unblocked, the controller will be "stuck" at the rail for a long time while the integrator unwinds, causing a massive overshoot.
We implement Conditional Integration (Clamping) to prevent this:
- Check for saturation: Is the total output already at the DAC limit?
- Check for direction: Is the error signal trying to push the integrator further into saturation?
- Inhibit: If both are true, we stop the integration for that clock cycle.
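The effect of this decision can be illustrated with a short behavioral simulation (a simplified sketch, not the HDL: here saturation is checked on the integrator state itself, and the gains and limits are made up for illustration):

```python
def integrate(errors, ki, i_limit, clamp=True):
    """Accumulate error; optionally inhibit growth once saturated."""
    i_state = 0
    for e in errors:
        saturated = abs(i_state) >= i_limit
        pushing_further = (e > 0) == (i_state > 0)
        if clamp and saturated and pushing_further:
            continue                  # inhibit integration this "cycle"
        i_state += ki * e
    return i_state

# A long stretch of large positive error (e.g. a blocked beam):
stuck = [100] * 50
assert integrate(stuck, ki=1, i_limit=1000, clamp=False) == 5000  # wound up
assert integrate(stuck, ki=1, i_limit=1000, clamp=True) == 1000   # held at the rail
```

Without clamping, the integrator winds up to five times the limit and would take dozens of cycles of negative error to unwind; with clamping it is ready to respond the moment the error reverses.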
// Anti-windup logic snippet (conditional integration)
always_ff @(posedge clk) begin
    if (!output_saturated || (error_sign != output_sign)) begin
        i_state <= i_state + (signed_error * reg_ki);
    end
end

Output Clamping and DAC Mapping
The final stage of the PID is the summation: pid_sum = p_term + i_term + d_term. The result is a high-width internal signal (typically 48 or 64 bits to prevent intermediate overflows). Before sending this to the DAC, we must clamp it to the DAC's native resolution.
For the Red Pitaya's AD9767, this is a 14-bit signed range: -8192 to +8191.
Failure to clamp properly will lead to integer wrap-around, where a maximum positive output suddenly becomes a maximum negative output—a catastrophic failure mode for any feedback loop.
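The difference between clamping and wrap-around is easy to demonstrate in a software model (the 14-bit limits match the AD9767 range quoted above; the function names are ours):

```python
DAC_MIN, DAC_MAX = -8192, 8191   # 14-bit signed range

def clamp_to_dac(value: int) -> int:
    """Saturate the wide internal sum to the DAC's native range."""
    return max(DAC_MIN, min(DAC_MAX, value))

def wraparound_to_dac(value: int) -> int:
    """What happens WITHOUT clamping: naive truncation to 14 bits."""
    v = value & 0x3FFF                      # keep only the low 14 bits
    return v - 0x4000 if v & 0x2000 else v  # reinterpret as signed

overflow = 8192                             # one LSB past positive full scale
assert clamp_to_dac(overflow) == 8191       # safe: pins at the rail
assert wraparound_to_dac(overflow) == -8192 # catastrophic sign flip
```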
Pipeline Latency: Digital vs. Physical
In our SystemVerilog implementation, the total digital "latency budget" is strictly monitored:
- Multiplier stage: 2 cycles.
- Accumulator stage: 1 cycle.
- Saturation and Clamping: 1 cycle.
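Summing the stages above at the 125 MHz clock (an 8 ns period) confirms the budget:

```python
CLOCK_PERIOD_NS = 8                  # 1 / 125 MHz
STAGES = {"multiplier": 2, "accumulator": 1, "saturation_clamp": 1}

total_cycles = sum(STAGES.values())
latency_ns = total_cycles * CLOCK_PERIOD_NS
assert (total_cycles, latency_ns) == (4, 32)
```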
At 125 MHz, the digital processing logic contributes only 32ns of delay. However, it is important to distinguish between this digital latency and the physical latency of the entire DSP chain.
The dominant delays in any real-world feedback loop are often governed by physics:
- ADC/DAC Conversion: The time required for the actual silicon to sample and reconstruct the analog voltage.
- Group Delay: The inherent delay introduced by anti-aliasing filters and the decimation process (e.g., our 6-stage CIC filter).
While these physical delays are constrained by hardware specifications and the laws of signal processing, minimizing the digital overhead to the nanosecond level is critical. By solving the digital latency problem on the FPGA, we ensure that the controller logic itself is no longer the "weak link" that drags down the performance of the physical system.
Measuring Loop Bandwidth
The gold standard for verifying your PID implementation is the Swept Sine Injection. We use the FPGA's built-in DDS to inject a small perturbation into the feedback loop and measure the response at various frequencies.
By plotting the magnitude and phase, we can identify the Unity Gain Frequency (UGF) and the Phase Margin. A typical loop running on our architecture on a Red Pitaya can achieve stable UGFs in the 500 kHz to 2 MHz range, depending on the plant physics.
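One way to extract the magnitude and phase at each injection frequency is coherent (I/Q) demodulation of the captured response against the injected reference. A minimal offline sketch, assuming the raw time series has already been captured and contains a whole number of cycles of the tone (all names here are illustrative, not part of our gateware):

```python
import math

def demodulate(samples, freq_hz, fs_hz):
    """Recover amplitude and phase of a tone at freq_hz from a sampled record."""
    n = len(samples)
    i_sum = q_sum = 0.0
    for k, x in enumerate(samples):
        phase = 2 * math.pi * freq_hz * k / fs_hz
        i_sum += x * math.cos(phase)     # in-phase component
        q_sum += x * math.sin(phase)     # quadrature component
    i_avg, q_avg = 2 * i_sum / n, 2 * q_sum / n
    return math.hypot(i_avg, q_avg), math.atan2(-q_avg, i_avg)

# Self-test on a synthetic tone: amplitude 0.5, phase -30 degrees
fs, f = 1.0e6, 10.0e3                    # 1 MSa/s, 10 kHz tone, 100 full cycles
sig = [0.5 * math.cos(2 * math.pi * f * k / fs - math.pi / 6) for k in range(10000)]
mag, ph = demodulate(sig, f, fs)
assert abs(mag - 0.5) < 1e-6
assert abs(ph - (-math.pi / 6)) < 1e-6
```

Repeating this at each swept frequency, for both the injection point and the loop response, yields the transfer-function points from which the UGF and phase margin are read off.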
Summary
Implementing a PID on an FPGA is an exercise in managing bits and time. By choosing a fixed-point architecture and implementing robust anti-windup logic, you can build a controller that pushes the digital processing delay to the theoretical minimum.
While we cannot rewrite the laws of physics that govern analog conversion and filter group delay, an FPGA-based approach ensures that your digital infrastructure is fast enough to let the physical system reach its true potential.