A pipelined VHDL implementation of posit division and square root operations with SIMD support, developed as part of a Master's thesis at Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU).
This repository contains the RTL design of a fully pipelined posit arithmetic division and square root unit. The design uses a non-restoring digit-recurrence algorithm to compute the quotient (or square root) one bit per clock cycle, producing results with guard/round/sticky (GRS) bits for correct rounding. An 8-lane SIMD wrapper instantiates parallel pipelines for vectorized throughput.
The accompanying thesis (104 pages) covers the conceptual design, algorithm selection rationale, microarchitecture, and FPGA evaluation in detail.
```
posit_div_root
  ┌──────────────────────┐
  │ Input registers      │  raw posits, i_op_type (1 cycle)
  └──────────┬───────────┘
             │
       ┌─────┴──────┐
  ┌────▼────┐  ┌────▼────┐
  │Decoder 1│  │Decoder 2│  posit_decoder (×2): sign, regime,
  │(unpack) │  │(unpack) │  exponent, fraction, zero/NaR flags
  └────┬────┘  └────┬────┘
       │            │
       └─────┬──────┘
  ┌──────────▼───────────┐
  │ Sign & scale         │  posit_sign_scale: XOR signs (div)
  │ (result sign,        │  or pass sign (sqrt); combine
  │  scaling factor)     │  regime and exponent
  └──────────┬───────────┘
             │
  ┌──────────▼───────────┐
  │ Division/sqrt core   │  posit_div_root_calc: N-ES
  │ (non-restoring       │  pipeline stages, 1 quotient
  │  digit recurrence)   │  bit per cycle
  └──────────┬───────────┘
             │
  ┌──────────▼───────────┐
  │ Fraction normalize   │  posit_frac_normalize: shift
  │ (leading-1 adjust)   │  fraction, adjust scaling factor
  └──────────┬───────────┘
             │
  ┌──────────▼───────────┐
  │ Reconstruct          │  posit_reconstruct: rebuild posit
  │ (regime/exp/frac     │  encoding, GRS rounding, zero/NaR
  │  packing + GRS       │  special-case handling
  │  rounding)           │
  └──────────┬───────────┘
             │
  ┌──────────▼───────────┐
  │ Output register      │  final result (1 cycle)
  └──────────────────────┘
```
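As a software cross-check of what the decoder and sign-and-scale stages compute, here is a minimal Python reference sketch. It assumes the standard posit encoding (negative values stored in two's complement, the regime as a run of identical bits ended by an opposite bit). The names `decode_posit`, `to_float`, and `div_sign_scale` are hypothetical and do not correspond to anything in `rtl/`. Concatenating the regime value k with the ES exponent bits gives the scaling factor k * 2^ES + e, which the sketch returns as `scale`.

```python
from typing import NamedTuple, Tuple, Union


class Decoded(NamedTuple):
    sign: int       # 0 = positive, 1 = negative
    scale: int      # k * 2**es + e (regime value concatenated with exponent)
    frac: int       # fraction bits without the hidden leading one
    frac_bits: int  # number of valid fraction bits


def decode_posit(bits: int, n: int, es: int) -> Union[str, Decoded]:
    """Unpack an n-bit posit with es exponent bits (reference model, not the RTL)."""
    mask = (1 << n) - 1
    bits &= mask
    if bits == 0:
        return "zero"
    if bits == 1 << (n - 1):
        return "NaR"
    sign = bits >> (n - 1)
    if sign:
        bits = (-bits) & mask            # two's complement -> magnitude
    body = bits & ((1 << (n - 1)) - 1)   # the n-1 bits after the sign
    first = (body >> (n - 2)) & 1
    # Regime: length of the run of identical leading bits (leading-one/zero detection)
    run, pos = 0, n - 2
    while pos >= 0 and ((body >> pos) & 1) == first:
        run += 1
        pos -= 1
    k = run - 1 if first else -run
    remaining = max(pos, 0)              # bits left after the terminating bit
    rest = body & ((1 << remaining) - 1)
    if remaining >= es:
        exp = rest >> (remaining - es)
        frac_bits = remaining - es
    else:
        exp = rest << (es - remaining)   # exponent bits cut off by the regime read as 0
        frac_bits = 0
    frac = rest & ((1 << frac_bits) - 1)
    return Decoded(sign, k * (1 << es) + exp, frac, frac_bits)


def to_float(d: Decoded) -> float:
    """Value = (-1)^sign * 2^scale * (1 + frac / 2^frac_bits)."""
    mag = 2.0 ** d.scale * (1 + d.frac / (1 << d.frac_bits))
    return -mag if d.sign else mag


def div_sign_scale(a: Decoded, b: Decoded) -> Tuple[int, int]:
    """Division: sign is the XOR of the signs; scales subtract (before normalization)."""
    return a.sign ^ b.sign, a.scale - b.scale


if __name__ == "__main__":
    for word in (0x40, 0x48, 0x30, 0xC0, 0x7F, 0x01):
        print(hex(word), to_float(decode_posit(word, 8, 1)))
```

For posit(8,1) the loop decodes 1.0, 1.5, 0.5, -1.0, maxpos (4096) and minpos (2^-12).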
| Stage | Module | Function | Latency |
|---|---|---|---|
| Input registration | `posit_div_root` | Registers raw posit inputs and `i_op_type` | 1 cycle |
| Decode | `posit_decoder` (×2) | Extracts sign, regime (via `priority_encoder`), exponent, fraction; detects zero/NaR | Combinational |
| Sign & scale | `posit_sign_scale` | Computes result sign (XOR for division, passthrough for sqrt) and combined scaling factor (regime ∥ exp) | Combinational |
| Division/sqrt core | `posit_div_root_calc` | Non-restoring digit recurrence producing one quotient bit per cycle, plus GRS bits | 1 + (N-1-ES-2) + 2 = N - ES cycles |
| Normalize | `posit_frac_normalize` | Corrects leading-one position, adjusts scaling factor | Combinational |
| Reconstruct | `posit_reconstruct` | Packs regime/exponent/fraction into posit encoding with GRS rounding; handles zero, NaR, and sqrt of a negative operand | Combinational |
| Output registration | `posit_div_root` | Registers final result | 1 cycle |
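The packing and GRS rounding done by the reconstruct stage can be approximated in Python as below. This is a sketch under assumptions: `encode_posit` is a hypothetical name, the fraction is assumed already normalized (hidden one removed), zero and NaR are assumed to be handled elsewhere, and rounding is round-to-nearest-even applied to the encoded bit string. The sketch folds the round bit into the sticky bit, which gives the same nearest-even decision as separate R and S bits. Out-of-range magnitudes saturate to maxpos or minpos, since posit rounding never produces NaR or zero from a nonzero real value.

```python
def encode_posit(sign: int, scale: int, frac: int, frac_bits: int,
                 n: int, es: int) -> int:
    """Pack (-1)^sign * 2^scale * (1 + frac/2^frac_bits) into an n-bit posit."""
    k, e = divmod(scale, 1 << es)        # floor division also works for scale < 0
    if k >= 0:
        regime, rlen = ((1 << (k + 1)) - 1) << 1, k + 2   # k+1 ones, then a zero
    else:
        regime, rlen = 1, -k + 1                          # -k zeros, then a one
    body = (((regime << es) | e) << frac_bits) | frac     # regime | exp | fraction
    body_len = rlen + es + frac_bits
    avail = n - 1                        # bits available after the sign bit
    if body_len > avail:
        drop = body_len - avail
        kept = body >> drop
        guard = (body >> (drop - 1)) & 1
        sticky = (body & ((1 << (drop - 1)) - 1)) != 0    # OR of all lower bits
        if guard and (sticky or (kept & 1)):              # nearest, ties to even
            kept += 1
    else:
        kept = body << (avail - body_len)                 # pad with zero fraction bits
    # Saturate: a nonzero value never rounds to zero or to NaR.
    kept = min(max(kept, 1), (1 << avail) - 1)
    return ((1 << n) - kept) & ((1 << n) - 1) if sign else kept
```

For example, in posit(8,1) `encode_posit(0, 0, 3, 5, 8, 1)` (1.09375, exactly halfway between two posits) rounds to the even neighbor 0x42 (1.125).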
Total pipeline latency: N - ES + 2 clock cycles, where N = G_DATA_WIDTH and ES = G_EXP_WIDTH.
Examples: 9 cycles for posit(8,1), 17 cycles for posit(16,1).
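The total is simply the sum of the registered stages in the table; the hypothetical helper below makes that arithmetic explicit.

```python
def pipeline_latency(n: int, es: int) -> int:
    """Clock cycles from input registration to the registered result."""
    input_reg = 1
    core = 1 + (n - 1 - es - 2) + 2       # posit_div_root_calc, simplifies to n - es
    output_reg = 1
    return input_reg + core + output_reg  # combinational stages add no cycles


for n, es in ((8, 1), (16, 1)):
    print(f"posit({n},{es}): {pipeline_latency(n, es)} cycles")
```

It prints 9 and 17 cycles, matching the examples above.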
- Division: `i_op_type = '1'` computes `i_posit1_raw / i_posit2_raw`.
- Square root: `i_op_type = '0'` computes `sqrt(i_posit1_raw)`; the second operand is ignored.
Both operations share the same pipeline; the core adapts the divisor update logic based on i_op_type.
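To illustrate the digit recurrence, here is a minimal integer-level Python sketch of textbook non-restoring division and non-restoring square root. It is not a transcription of `posit_div_root_calc`: the function names are hypothetical, operand widths are free parameters, and posit-specific details (mantissa alignment, odd-scale handling for sqrt, pipelining) are omitted. Each loop iteration yields one result bit, mirroring one pipeline stage, and a nonzero final remainder is what would feed the sticky bit.

```python
import math
from typing import Tuple


def nonrestoring_div(x: int, d: int, m: int) -> Tuple[int, int]:
    """Return (q, rem) with q = floor(x * 2**m / d), assuming 0 <= x < 2*d."""
    p = x - d                          # first trial subtraction (integer bit)
    q = 1 if p >= 0 else 0
    for _ in range(m):                 # one fraction bit per iteration
        # Non-restoring step: subtract after a non-negative remainder, add after
        # a negative one, instead of restoring the previous remainder.
        p = 2 * p - d if p >= 0 else 2 * p + d
        q = (q << 1) | (1 if p >= 0 else 0)
    rem = p if p >= 0 else p + d       # single correction at the end
    return q, rem


def nonrestoring_sqrt(v: int, n: int) -> Tuple[int, int]:
    """Return (root, rem) with root = isqrt(v) and rem = v - root**2, for v < 4**n."""
    root, r = 0, 0
    for i in range(n - 1, -1, -1):     # one root bit per iteration
        pair = (v >> (2 * i)) & 3      # next two radicand bits
        if r >= 0:
            r = ((r << 2) | pair) - ((root << 2) | 1)   # subtract 4*root + 1
        else:
            r = ((r << 2) | pair) + ((root << 2) | 3)   # add 4*root + 3
        root = (root << 1) | (1 if r >= 0 else 0)
    if r < 0:
        r += (root << 1) | 1           # final correction
    return root, r


if __name__ == "__main__":
    # 1.5 / 1.25 with 4 fraction bits per operand and 6 quotient fraction bits
    q, rem = nonrestoring_div(0b11000, 0b10100, 6)
    print(q / 64, "sticky =", rem != 0)
    print(nonrestoring_sqrt(200, 4), math.isqrt(200))
```

In the square root, the value added or subtracted (4·root + 1 or 4·root + 3) is built from the partial root instead of a fixed divisor; this illustrates, in software, the kind of divisor-update difference a shared division/sqrt datapath has to accommodate.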
The top-level entity posit_div_root accepts two generics:
```vhdl
generic (
    G_DATA_WIDTH : integer := 8;  -- Posit bit width (tested with 8 and 16)
    G_EXP_WIDTH  : integer := 1   -- Exponent field width
);
```

Special-case inputs produce the following results:

| Condition | Division result | Sqrt result |
|---|---|---|
| Operand is zero | 0 / x = 0 | sqrt(0) = 0 |
| Divisor is zero | x / 0 = NaR | n/a |
| Operand is NaR | NaR | NaR |
| Negative operand | Result sign is the XOR of the operand signs | sqrt(neg) = NaR |
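The rules in this table amount to a small check that overrides the normal datapath result. The Python sketch below encodes them under assumptions: `special_case` is a hypothetical name, zero and NaR use the standard posit encodings (all zeros, and a one followed by all zeros), and the check order for simultaneous conditions (for example 0 / 0 giving NaR) is assumed, since the table does not specify it.

```python
from typing import Optional


def special_case(is_div: bool, a: int, b: int, n: int) -> Optional[int]:
    """Return the forced result word, or None if the normal datapath applies."""
    zero, nar = 0, 1 << (n - 1)           # posit encodings of 0 and NaR
    negative = ((a >> (n - 1)) & 1) == 1  # sign bit of a (NaR also has it set)
    if is_div:
        # Assumed priority: any NaR input or a zero divisor gives NaR
        if a == nar or b == nar or b == zero:
            return nar
        if a == zero:
            return zero
    else:                                 # sqrt ignores the second operand
        if negative:                      # covers NaR and sqrt(negative)
            return nar
        if a == zero:
            return zero
    return None                           # e.g. negative division: sign via XOR
```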
posit_div_root_simd instantiates G_N_OPERANDS (default: 8) parallel posit_div_root pipelines sharing a single clock, reset, and operation type. Input and output use the posit_array_t type defined in posit_pkg.vhd:
```vhdl
type posit_array_t is array (0 to PKG_N_OPERANDS - 1)
    of std_logic_vector(PKG_DATA_WIDTH - 1 downto 0);
```

Package constants `PKG_DATA_WIDTH`, `PKG_EXP_WIDTH`, and `PKG_N_OPERANDS` in `posit_pkg.vhd` configure the SIMD width and posit format globally.
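Because every lane is an independent `posit_div_root` pipeline driven by the same operation type, the wrapper behaves like a lane-wise map. The Python sketch below models that and the resulting throughput. The names `simd_apply` and `cycles_for_batches` are hypothetical, and the assumption that a new operand vector can be accepted every clock cycle is inferred from the pipeline being fully pipelined, not taken from the RTL.

```python
from typing import Callable, List, Sequence

ScalarOp = Callable[[bool, int, int], int]   # (is_div, a, b) -> result word


def simd_apply(is_div: bool, a: Sequence[int], b: Sequence[int],
               scalar_op: ScalarOp, lanes: int = 8) -> List[int]:
    """Apply one shared operation type to every lane independently."""
    if len(a) != lanes or len(b) != lanes:
        raise ValueError(f"expected {lanes} operands per vector")
    return [scalar_op(is_div, x, y) for x, y in zip(a, b)]


def cycles_for_batches(batches: int, n: int, es: int) -> int:
    """Cycles until the last of `batches` vectors reaches the output register,
    assuming one vector is accepted per clock cycle."""
    latency = n - es + 2
    return latency + batches - 1
```

Under that assumption, 100 vectors of posit(8,1) operands (800 results) need 9 + 99 = 108 cycles; after the pipeline fills, eight results emerge per cycle.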
```
posit-division-unit/
├── README.md
├── LICENSE                         Apache 2.0 (code)
├── LICENSE-THESIS                  CC BY 4.0 (thesis PDF)
├── CITATION.cff                    Machine-readable citation
│
├── rtl/
│   ├── posit_pkg.vhd               Package: constants, types, clogb2()
│   ├── posit_div_root.vhd          Top-level division/sqrt pipeline
│   ├── posit_div_root_calc.vhd     Non-restoring digit-recurrence core
│   ├── posit_div_root_simd.vhd     8-lane SIMD wrapper
│   ├── posit_decoder.vhd           Posit unpacking (sign, regime, exp, frac)
│   ├── posit_sign_scale.vhd        Result sign and scaling factor computation
│   ├── posit_frac_normalize.vhd    Fraction normalization
│   ├── posit_reconstruct.vhd       Posit reconstruction with GRS rounding
│   ├── priority_encoder.vhd        Leading-one detection for regime decoding
│   └── barrel_shifter.vhd          Logarithmic barrel shifter
│
├── tb/
│   ├── posit_decoder_tb.vhd        Decoder unit tests
│   ├── priority_decoder_tb.vhd     Priority encoder unit tests
│   ├── posit_sign_scale_tb.vhd     Sign/scale unit tests
│   ├── posit_div_root_calc_tb.vhd  Division core unit tests
│   ├── posit_system_tb.vhd         Full pipeline tests (8-bit, 8 test vectors)
│   └── posit_system_16_2_tb.vhd    Full pipeline tests (16-bit, 14 test vectors)
│
├── constraints/
│   └── clk_constraint.xdc          125 MHz clock constraint (Xilinx)
│
├── thesis/
│   └── thesis.pdf                  Full 104-page thesis
│
└── docs/
    ├── architecture.md             Pipeline architecture overview
    └── modules.md                  Per-module interface documentation
```
Import the source files from rtl/ and testbenches from tb/ into a Vivado project. The constraint file targets a 125 MHz clock (8 ns period) on the i_clk port:
```tcl
create_clock -name clk_100 -period 8 [get_ports i_clk]
```

For simulation with GHDL (VHDL-2008):

```sh
# Analyze all source files (order matters for dependencies)
ghdl -a --std=08 rtl/posit_pkg.vhd
ghdl -a --std=08 rtl/priority_encoder.vhd
ghdl -a --std=08 rtl/barrel_shifter.vhd
ghdl -a --std=08 rtl/posit_decoder.vhd
ghdl -a --std=08 rtl/posit_sign_scale.vhd
ghdl -a --std=08 rtl/posit_div_root_calc.vhd
ghdl -a --std=08 rtl/posit_frac_normalize.vhd
ghdl -a --std=08 rtl/posit_reconstruct.vhd
ghdl -a --std=08 rtl/posit_div_root.vhd
ghdl -a --std=08 rtl/posit_div_root_simd.vhd

# Analyze, elaborate, and run a testbench
ghdl -a --std=08 tb/posit_system_tb.vhd
ghdl -e --std=08 posit_system_tb
ghdl -r --std=08 posit_system_tb --wave=posit_system_tb.ghw
```

The testbenches use VHDL `assert` statements and `report` messages; a passing run produces no assertion errors.
| Testbench | Configuration | Test vectors | Coverage |
|---|---|---|---|
| `posit_decoder_tb` | posit(8,1) | Decoder field extraction | Sign, regime, exponent, fraction unpacking |
| `priority_decoder_tb` | Generic | Leading-one detection | Priority encoder correctness |
| `posit_sign_scale_tb` | posit(8,1) | Sign/scale computation | Division sign XOR, sqrt passthrough, scaling factor |
| `posit_div_root_calc_tb` | posit(8,1) | Core division iterations | Quotient bits, GRS bits |
| `posit_system_tb` | posit(8,1) | 8 vectors | Division, sqrt, negative operands, fractional values |
| `posit_system_16_2_tb` | posit(16,1) | 14 vectors | Division, sqrt, large values, zero/NaR edge cases, sqrt(negative) |
The thesis provides detailed evaluation results including:
- Algorithm selection rationale comparing Goldschmidt, Newton-Raphson, and digit-recurrence methods (Chapter 5)
- FPGA resource utilization and timing analysis on Xilinx targets (Chapter 7)
- Accuracy analysis across the posit dynamic range (Chapter 7)
- Single-lane vs. SIMD throughput comparison (Chapter 7)
See thesis/thesis.pdf for the complete evaluation.
```bibtex
@mastersthesis{enescu2025posit,
  author = {David Enescu},
  title  = {Conceptual Design, Implementation \& Evaluation of a Vectorized Posit Division Unit},
  school = {Friedrich-Alexander-Universit{\"a}t Erlangen-N{\"u}rnberg},
  year   = {2025},
  type   = {Master's Thesis}
}
```

- Code (`rtl/`, `tb/`, `constraints/`): Apache License 2.0
- Thesis (`thesis/`): Creative Commons Attribution 4.0 International