Red-Hat-AI-Innovation-Team/its_hub


its-hub: A Python library for inference-time scaling


its_hub is a Python library for inference-time scaling of LLMs, focusing on mathematical reasoning tasks.


📚 Documentation

For comprehensive documentation, including installation guides, tutorials, and API reference, visit:

https://ai-innovation.team/its_hub

Installation

its_hub provides a minimal core focused on algorithms, with optional language model implementations.

Core Installation (Algorithms Only)

For gateway integration: only the algorithms and interfaces, with minimal dependencies:

pip install its_hub

This includes:

  • ✓ Self-Consistency and Best-of-N algorithms
  • ✓ Abstract base classes (AbstractLanguageModel, AbstractOutcomeRewardModel)
  • ✓ Only 2 dependencies: numpy, typing-extensions

With Language Model Support

For standalone use, including an OpenAI-compatible language model implementation:

pip install its_hub[lm]

Adds: OpenAICompatibleLanguageModel, LLMJudge, StepGeneration (requires openai, aiohttp, backoff)

With Experimental Algorithms

For experimental features, including beam search and particle filtering:

pip install its_hub[experimental]

Adds: Process reward models, beam search, particle filtering algorithms
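Beam search with a process reward model (PRM) scores partial solutions step by step and keeps only the highest-scoring ones. The sketch below is library-independent and only illustrates that idea; `expand` (proposes next reasoning steps) and `score` (stands in for a PRM) are hypothetical callables, and this is not its_hub's implementation or API.

```python
from typing import Callable, List

def beam_search(
    prompt: str,
    expand: Callable[[str, str], List[str]],  # hypothetical: proposes next steps for a partial solution
    score: Callable[[str, str], float],       # hypothetical PRM: scores a partial solution
    beam_width: int = 2,
    max_steps: int = 3,
) -> str:
    beams = [""]  # start from an empty partial solution
    for _ in range(max_steps):
        # Extend every beam with every proposed next step.
        candidates = [beam + step for beam in beams for step in expand(prompt, beam)]
        if not candidates:
            break
        # Keep the top `beam_width` partial solutions by PRM score (stable on ties).
        candidates.sort(key=lambda c: score(prompt, c), reverse=True)
        beams = candidates[:beam_width]
    return beams[0]

def toy_expand(prompt: str, partial: str) -> List[str]:
    return ["a", "b"]  # hypothetical step proposer

def toy_prm(prompt: str, partial: str) -> float:
    return float(partial.count("b"))  # hypothetical PRM that prefers "b" steps

print(beam_search("q", toy_expand, toy_prm, beam_width=2, max_steps=3))  # bbb
```

A real PRM usually scores the newest step in context rather than the raw string, but the prune-and-extend loop has the same shape.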

Development Installation

git clone https://github.com/Red-Hat-AI-Innovation-Team/its_hub.git
cd its_hub
pip install -e ".[dev]"
# or using uv:
uv sync --extra dev

Quick Start

Example 1: Gateway Integration (Core Installation)

Installation required: pip install its_hub (core only, minimal dependencies)

Gateway integration requires implementing two interfaces: AbstractLanguageModel for LM calls and AbstractOrchestrator for managing parallel execution with concurrency control and rate limiting.

import asyncio

from its_hub import AbstractLanguageModel, AbstractOrchestrator, SelfConsistency

# Step 1: Implement AbstractLanguageModel with your gateway's LM client
class MyGatewayLM(AbstractLanguageModel):
    def __init__(self, gateway_client):
        self.client = gateway_client

    async def agenerate_single(self, messages, stop=None, **kwargs):
        response = await self.client.generate(messages, stop=stop, **kwargs)
        return {"role": "assistant", "content": response}

# Step 2: Implement AbstractOrchestrator for concurrency control
# (or use the built-in LMOrchestrator from its_hub[lm])
class MyGatewayOrchestrator(AbstractOrchestrator):
    async def agenerate(self, lm, messages_lst, **kwargs):
        # Manage parallel calls with your gateway's rate limits
        ...

async def main():
    # your_gateway_client is a placeholder: pass your gateway's async LM client here
    lm = MyGatewayLM(your_gateway_client)
    orchestrator = MyGatewayOrchestrator()
    algorithm = SelfConsistency(orchestrator=orchestrator)
    result = await algorithm.ainfer(lm, "What is 2+2?", budget=5)
    print(result)  # {"role": "assistant", "content": "4", ...}

asyncio.run(main())

The AbstractOrchestrator is the central coordination point: it controls how algorithms fan out parallel LM calls, enforces rate limits, and provides structured error handling. See Orchestration for details.
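The core of what an orchestrator does, bounding how many LM calls are in flight at once, can be sketched with plain asyncio. This is a hypothetical helper, not the AbstractOrchestrator interface (whose method signatures may differ); `bounded_gather` and `fake_lm_call` are illustrative names only.

```python
import asyncio
from typing import Awaitable, Callable, List, TypeVar

T = TypeVar("T")

async def bounded_gather(
    make_call: Callable[[int], Awaitable[T]],
    n: int,
    max_concurrency: int = 4,
) -> List[T]:
    """Run n calls with at most max_concurrency in flight; results keep call order."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(i: int) -> T:
        # Each call waits for a free slot before hitting the (hypothetical) LM backend.
        async with semaphore:
            return await make_call(i)

    # gather() re-raises the first exception; other calls are not cancelled automatically.
    return await asyncio.gather(*(run_one(i) for i in range(n)))

async def fake_lm_call(i: int) -> str:
    # Hypothetical stand-in for one LM request.
    await asyncio.sleep(0.01)
    return f"response {i}"

if __name__ == "__main__":
    print(asyncio.run(bounded_gather(fake_lm_call, n=5, max_concurrency=2)))
```

A production orchestrator would add per-gateway rate limiting and cancel sibling calls on failure; the semaphore only caps concurrency.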

Example 2: Standalone Use with OpenAI-Compatible LM

Installation required: pip install its_hub[lm]

import asyncio

from its_hub import OpenAICompatibleLanguageModel, SelfConsistency

lm = OpenAICompatibleLanguageModel(
    endpoint="https://api.openai.com/v1",
    api_key="your-api-key",
    model_name="gpt-4o-mini",
)

algorithm = SelfConsistency()
result = algorithm.infer(lm, "What is the capital of France?", budget=3)
print(result)  # Most common answer from 3 generations

# Close lm for resource cleanup
asyncio.run(lm.close())
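Self-Consistency returns the most common answer across several generations. A minimal sketch of the voting step, assuming final answers are written as LaTeX `\boxed{...}` (a common convention for math reasoning; its_hub's own answer extraction may differ). `extract_boxed` and `majority_vote` are hypothetical helpers, not library functions.

```python
import re
from collections import Counter
from typing import List, Optional

def extract_boxed(text: str) -> Optional[str]:
    # Hypothetical helper: take the last \boxed{...} (no nested braces) as the final answer.
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return matches[-1].strip() if matches else None

def majority_vote(responses: List[str]) -> Optional[str]:
    # Keep only responses with an extractable answer, then count identical answers.
    answers = [a for a in (extract_boxed(r) for r in responses) if a is not None]
    if not answers:
        return None
    # Counter.most_common breaks ties by first occurrence.
    return Counter(answers).most_common(1)[0][0]

responses = [
    "2 + 2 = 4, so the answer is \\boxed{4}.",
    "I think it is \\boxed{5}.",
    "Adding gives \\boxed{ 4 }.",
]
print(majority_vote(responses))  # 4
```

Voting on extracted answers rather than full texts is what lets differently worded solutions agree.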

Example 3: Best-of-N with LLM Judge

Installation required: pip install its_hub[lm]

import asyncio

from its_hub import BestOfN, LLMJudge, OpenAICompatibleLanguageModel

lm = OpenAICompatibleLanguageModel(
    endpoint="https://api.openai.com/v1",
    api_key="your-api-key",
    model_name="gpt-4o-mini",
)

judge = LLMJudge(lm=lm, fallback_score=5.0)
algorithm = BestOfN(orm=judge)
result = algorithm.infer(lm, "Write a sorting function", budget=5)
print(result)  # Best response as judged by LLM

# Close lm for resource cleanup
asyncio.run(lm.close())
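Best-of-N generates N candidates and returns the one the reward model (here, an LLM judge) scores highest. The sketch below assumes that `fallback_score` is substituted when a judge score cannot be obtained; that reading of the parameter is an assumption, so check the API reference. `best_of_n` and `toy_judge` are hypothetical.

```python
from typing import Callable, List, Optional

def best_of_n(
    responses: List[str],
    score: Callable[[str], Optional[float]],  # hypothetical judge; None means "no usable score"
    fallback_score: float = 5.0,
) -> str:
    if not responses:
        raise ValueError("best_of_n needs at least one response")
    scores = []
    for response in responses:
        s = score(response)
        # Assumption: an unusable judgment (None) is replaced by fallback_score.
        scores.append(fallback_score if s is None else s)
    # max() returns the first index among equal top scores.
    best_index = max(range(len(responses)), key=lambda i: scores[i])
    return responses[best_index]

# Hypothetical judge: a lookup table, with None for a judgment that failed to parse.
toy_judge = {"draft A": 3.0, "draft B": None, "draft C": 8.5}.get
print(best_of_n(["draft A", "draft B", "draft C"], toy_judge))  # draft C
```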

Key Features

  • 🔬 Multiple Algorithms: Self-Consistency, Best-of-N, Beam Search (experimental), Particle Filtering (experimental)
  • 🚀 Gateway Integration: Clean abstractions (AbstractLanguageModel, AbstractOrchestrator) for easy integration with AI gateways
  • 🔄 Orchestration: AbstractOrchestrator provides structured concurrency, rate limiting, and error propagation for parallel LM calls, which is essential for production gateway deployments
  • 🧮 Math-Optimized: Built for mathematical reasoning tasks
  • ⚡ Async-First: ainfer() is the primary method; infer() is a synchronous wrapper around it. Concurrent generation supports concurrency limits and error handling
  • 🎯 Minimal Core: Only 2 dependencies (numpy, typing-extensions) for the core install
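The async-first pattern (an `ainfer()` coroutine plus a thin synchronous `infer()` wrapper) can be illustrated with a hypothetical toy class; its_hub's actual wrapper may be implemented differently.

```python
import asyncio
from typing import List

class ToyAlgorithm:
    """Hypothetical class: an async-first API with a thin synchronous wrapper."""

    async def _generate(self, prompt: str, i: int) -> str:
        await asyncio.sleep(0)  # stand-in for a network call to an LM
        return f"{prompt} -> candidate {i}"

    async def ainfer(self, prompt: str, budget: int) -> List[str]:
        # Fan out `budget` generations concurrently on the running event loop.
        return list(await asyncio.gather(*(self._generate(prompt, i) for i in range(budget))))

    def infer(self, prompt: str, budget: int) -> List[str]:
        # Sync wrapper: asyncio.run() creates and closes a fresh event loop.
        # It raises RuntimeError inside an already-running loop (e.g. Jupyter),
        # where callers should `await ainfer(...)` instead.
        return asyncio.run(self.ainfer(prompt, budget))

print(ToyAlgorithm().infer("q", 2))  # ['q -> candidate 0', 'q -> candidate 1']
```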

Coding Agent Plugin

its-hub is available as a plugin for two coding agents, bringing inference-time scaling directly into your coding workflow.

Claude Code

Via org marketplace (recommended; includes all Red Hat AI plugins):

/plugin marketplace add Red-Hat-AI-Innovation-Team/plugins
/plugin install its-hub@Red-Hat-AI-Innovation-Team/plugins

Via this repo directly:

/plugin marketplace add Red-Hat-AI-Innovation-Team/its_hub
/plugin install its-hub@Red-Hat-AI-Innovation-Team/its_hub

From a local clone:

git clone https://github.com/Red-Hat-AI-Innovation-Team/its_hub.git
/plugin marketplace add /path/to/its_hub

Codex CLI

codex plugin marketplace add Red-Hat-AI-Innovation-Team/plugins

Then install the plugin from the marketplace. See .codex-plugin/INSTALL.md for manual installation.

After Installing

Invoke the setup-guide skill to configure your model endpoint and algorithm.

Available skills:

  • setup-guide: Guided first-time configuration
  • inference-scaling: Run inference-time scaling on a single prompt
  • batch-scaling: Batch scaling from a JSONL, CSV, or TXT file
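For batch scaling, prompts come from a JSONL, CSV, or TXT file. The loader below is a hypothetical sketch that assumes a field named `prompt` in JSONL and CSV files and one prompt per line in TXT files; the skill's actual expected schema may differ, so check its documentation.

```python
import csv
import json
from pathlib import Path
from typing import List

def load_prompts(path: str, field: str = "prompt") -> List[str]:
    """Hypothetical loader: read prompts from .jsonl, .csv, or .txt."""
    p = Path(path)
    suffix = p.suffix.lower()
    if suffix == ".jsonl":
        # One JSON object per non-empty line; `field` holds the prompt (assumed name).
        with p.open(encoding="utf-8") as f:
            return [json.loads(line)[field] for line in f if line.strip()]
    if suffix == ".csv":
        # Header row required; `field` names the prompt column (assumed name).
        with p.open(newline="", encoding="utf-8") as f:
            return [row[field] for row in csv.DictReader(f)]
    if suffix == ".txt":
        # One prompt per non-empty line, surrounding whitespace stripped.
        with p.open(encoding="utf-8") as f:
            return [line.strip() for line in f if line.strip()]
    raise ValueError(f"unsupported file type: {suffix}")
```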

For detailed documentation, visit: https://ai-innovation.team/its_hub
