`its-hub`: A Python library for inference-time scaling

its_hub is a Python library for inference-time scaling of LLMs, focusing on mathematical reasoning tasks.


📚 Documentation

For comprehensive documentation, including installation guides, tutorials, and API reference, visit:

https://ai-innovation.team/its_hub

Installation

its_hub provides a minimal core focused on algorithms, with optional language model implementations.

Core Installation (Algorithms Only)

For gateway integration: algorithms and interfaces only, with minimal dependencies.

pip install its_hub

This includes:

✓ Self-Consistency and Best-of-N algorithms
✓ Abstract base classes (AbstractLanguageModel, AbstractOutcomeRewardModel)
✓ Only 2 dependencies: numpy, typing-extensions

With Language Model Support

For standalone use: includes an OpenAI-compatible language model implementation.

pip install "its_hub[lm]"

Adds: OpenAICompatibleLanguageModel, LLMJudge, StepGeneration (requires openai, aiohttp, backoff)

With Experimental Algorithms

For experimental features: includes beam search and particle filtering.

pip install "its_hub[experimental]"

Adds: Process reward models, beam search, particle filtering algorithms

Development Installation

git clone https://github.com/Red-Hat-AI-Innovation-Team/its_hub.git
cd its_hub
pip install -e ".[dev]"
# or using uv:
uv sync --extra dev

Quick Start

Example 1: Gateway Integration (Core Installation)

Installation required: pip install its_hub (core only, minimal dependencies)

Gateway integration requires implementing two interfaces: AbstractLanguageModel for LM calls and AbstractOrchestrator for managing parallel execution with concurrency control and rate limiting.

import asyncio

from its_hub import AbstractLanguageModel, AbstractOrchestrator, SelfConsistency

# Step 1: Implement AbstractLanguageModel with your gateway's LM client
class MyGatewayLM(AbstractLanguageModel):
    def __init__(self, gateway_client):
        self.client = gateway_client

    async def agenerate_single(self, messages, stop=None, **kwargs):
        response = await self.client.generate(messages, stop=stop, **kwargs)
        return {"role": "assistant", "content": response}

# Step 2: Implement AbstractOrchestrator for concurrency control
# (or use the built-in LMOrchestrator from its_hub[lm])
class MyGatewayOrchestrator(AbstractOrchestrator):
    async def agenerate(self, lm, messages_lst, **kwargs):
        # Manage parallel calls with your gateway's rate limits
        ...

async def main():
    # `your_gateway_client` is a placeholder for your gateway's client object
    lm = MyGatewayLM(your_gateway_client)
    orchestrator = MyGatewayOrchestrator()
    algorithm = SelfConsistency(orchestrator=orchestrator)
    result = await algorithm.ainfer(lm, "What is 2+2?", budget=5)
    print(result)  # {"role": "assistant", "content": "4", ...}

asyncio.run(main())

The AbstractOrchestrator is the central coordination point — it controls how algorithms fan out parallel LM calls, enforces rate limits, and provides structured error handling. See Orchestration for details.
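
The fan-out and rate-limiting role described above can be sketched with a semaphore. This is a minimal illustration, not the library's LMOrchestrator: `SemaphoreOrchestrator`, `EchoLM`, and `max_concurrency` are hypothetical names, and the class deliberately does not subclass AbstractOrchestrator so that the snippet runs without its_hub installed. Only the `agenerate(lm, messages_lst, **kwargs)` and `agenerate_single(messages, stop=None, **kwargs)` signatures are taken from Example 1.

```python
import asyncio


class SemaphoreOrchestrator:
    """Hypothetical orchestrator: caps the number of in-flight LM calls."""

    def __init__(self, max_concurrency: int = 4):
        self.max_concurrency = max_concurrency

    async def agenerate(self, lm, messages_lst, **kwargs):
        # Create the semaphore inside the running event loop.
        semaphore = asyncio.Semaphore(self.max_concurrency)

        async def call_one(messages):
            async with semaphore:  # at most max_concurrency calls at once
                return await lm.agenerate_single(messages, **kwargs)

        # gather preserves input order; the first exception propagates.
        return await asyncio.gather(*(call_one(m) for m in messages_lst))


class EchoLM:
    """Stand-in LM for the demo (assumption); records peak concurrency."""

    def __init__(self):
        self.in_flight = 0
        self.peak = 0

    async def agenerate_single(self, messages, stop=None, **kwargs):
        self.in_flight += 1
        self.peak = max(self.peak, self.in_flight)
        await asyncio.sleep(0.01)  # simulate network latency
        self.in_flight -= 1
        return {"role": "assistant", "content": messages[-1]["content"].upper()}


async def demo():
    lm = EchoLM()
    orchestrator = SemaphoreOrchestrator(max_concurrency=2)
    prompts = [[{"role": "user", "content": f"q{i}"}] for i in range(5)]
    responses = await orchestrator.agenerate(lm, prompts)
    return responses, lm.peak


responses, peak = asyncio.run(demo())
print([r["content"] for r in responses], "peak concurrency:", peak)
```

A production orchestrator would likely also need retries and per-call error reporting, which this sketch leaves out.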

Example 2: Standalone Use with OpenAI-Compatible LM

Installation required: pip install "its_hub[lm]"

import asyncio

from its_hub import OpenAICompatibleLanguageModel, SelfConsistency

lm = OpenAICompatibleLanguageModel(
    endpoint="https://api.openai.com/v1",
    api_key="your-api-key",
    model_name="gpt-4o-mini",
)

algorithm = SelfConsistency()
result = algorithm.infer(lm, "What is the capital of France?", budget=3)
print(result)  # Most common answer from 3 generations

# Close lm for resource cleanup
asyncio.run(lm.close())
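
Self-Consistency returns the most common answer among the sampled generations. The voting step can be sketched as follows. This is an illustration under simplifying assumptions (answers are compared as exact strings, and `majority_vote` is a hypothetical helper), not the library's implementation.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most frequent answer; ties go to the answer seen first."""
    counts = Counter(answers)
    # most_common orders equal counts by first insertion (Python 3.7+).
    winner, _ = counts.most_common(1)[0]
    return winner


# Three sampled generations for the same prompt (made-up values).
samples = ["Paris", "Paris", "Lyon"]
selected = majority_vote(samples)
print(selected)
```

In practice the raw generations usually need to be normalized (for example, by extracting the final answer) before voting, since free-form responses rarely match character for character.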

Example 3: Best-of-N with LLM Judge

Installation required: pip install "its_hub[lm]"

import asyncio

from its_hub import BestOfN, LLMJudge, OpenAICompatibleLanguageModel

lm = OpenAICompatibleLanguageModel(
    endpoint="https://api.openai.com/v1",
    api_key="your-api-key",
    model_name="gpt-4o-mini",
)

judge = LLMJudge(lm=lm, fallback_score=5.0)
algorithm = BestOfN(orm=judge)
result = algorithm.infer(lm, "Write a sorting function", budget=5)
print(result)  # Best response as judged by LLM

# Close lm for resource cleanup
asyncio.run(lm.close())
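
Conceptually, Best-of-N generates several candidate responses, scores each one, and returns the highest-scoring candidate. A minimal sketch of the selection step, assuming a plain scoring function in place of the LLM judge (`best_of_n` and `toy_score` are hypothetical names, not part of its_hub):

```python
def best_of_n(candidates, score_fn):
    """Return (best_candidate, best_score) under score_fn."""
    scores = [score_fn(c) for c in candidates]
    # Index of the highest score; ties go to the earliest candidate.
    best_index = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best_index], scores[best_index]


def toy_score(response: str) -> float:
    # Stand-in for a judge (assumption): count mentions of "sorted".
    return float(response.count("sorted"))


candidates = [
    "def f(xs): return xs",
    "def f(xs): return sorted(xs)",
    "def f(xs): xs.sort()",
]
best, best_score = best_of_n(candidates, toy_score)
print(best, best_score)
```

With an LLM judge, the scoring call is itself a model request, so a budget of N costs N generations plus the judging calls.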

Key Features

🔬 Multiple Algorithms: Self-Consistency, Best-of-N, Beam Search (experimental), Particle Filtering (experimental)
🚀 Gateway Integration: Clean abstractions (AbstractLanguageModel, AbstractOrchestrator) for easy integration with AI gateways
🔄 Orchestration: AbstractOrchestrator provides structured concurrency, rate limiting, and error propagation for parallel LM calls — essential for production gateway deployments
🧮 Math-Optimized: Built for mathematical reasoning tasks
⚡ Async-First: ainfer() is the primary method and infer() is a synchronous wrapper around it. Generations run concurrently, with concurrency limits and error handling
🎯 Minimal Core: Only 2 dependencies (numpy, typing-extensions) for core install
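
The async-first pattern above, where a synchronous method wraps an async one, can be sketched as follows. `ToyAlgorithm` is a hypothetical class with a simplified signature (the real ainfer() also takes a language model, as shown in the Quick Start), and how its_hub implements its own wrapper is not shown here.

```python
import asyncio


class ToyAlgorithm:
    """Hypothetical algorithm illustrating an async core with a sync wrapper."""

    async def ainfer(self, prompt: str, budget: int) -> str:
        async def generate(i: int) -> str:
            await asyncio.sleep(0)  # placeholder for an awaited LM call
            return f"{prompt} #{i}"

        # Run `budget` generations concurrently and keep the first result.
        results = await asyncio.gather(*(generate(i) for i in range(budget)))
        return results[0]

    def infer(self, prompt: str, budget: int) -> str:
        # Sync wrapper: start an event loop and run ainfer to completion.
        return asyncio.run(self.ainfer(prompt, budget))


result = ToyAlgorithm().infer("hello", budget=3)
print(result)
```

Note that asyncio.run() raises a RuntimeError when called from inside an already running event loop, so code that is already async should await ainfer() directly.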

Coding Agent Plugin

its-hub is available as a plugin for two coding agents, Claude Code and Codex CLI, bringing inference-time scaling directly into your coding workflow.

Claude Code

Via org marketplace (recommended — includes all Red Hat AI plugins):

/plugin marketplace add Red-Hat-AI-Innovation-Team/plugins
/plugin install its-hub@Red-Hat-AI-Innovation-Team/plugins

Via this repo directly:

/plugin marketplace add Red-Hat-AI-Innovation-Team/its_hub
/plugin install its-hub@Red-Hat-AI-Innovation-Team/its_hub

From a local clone:

git clone https://github.com/Red-Hat-AI-Innovation-Team/its_hub.git
/plugin marketplace add /path/to/its_hub

Codex CLI

codex plugin marketplace add Red-Hat-AI-Innovation-Team/plugins

Then install the plugin from the marketplace. See .codex-plugin/INSTALL.md for manual installation.

After Installing

Invoke the setup-guide skill to configure your model endpoint and algorithm.

| Skill | Description |
| --- | --- |
| `setup-guide` | Guided first-time configuration |
| `inference-scaling` | Run inference-time scaling on a single prompt |
| `batch-scaling` | Batch scaling from a JSONL/CSV/TXT file |

For detailed documentation, visit: https://ai-innovation.team/its_hub
