AI Composer is a tool for generating verified implementations from documentation and CVL specifications.
You will need at least Python 3.12, Docker (with compose), a Claude API key, the ability to build
the documentation (see here), and a working, local installation of the prover (see here). The Claude
API key should be in your environment as `ANTHROPIC_API_KEY`. You will also need a suitably recent version of `uv`.
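For example, the key can be exported in your shell profile (the key value below is a placeholder):

```shell
# Placeholder value; substitute your actual Anthropic API key.
export ANTHROPIC_API_KEY="sk-ant-your-key-here"
```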
You will need to provision the various Postgres databases used by AI Composer. Do this as follows:
- `cd` into `scripts/`
- run `docker compose create && docker compose start`. This will initialize a local Postgres database. NB: no attempt has been made to ensure this database is secure; caveat emptor.
NB: You will need to restart this Docker container each time your host computer restarts, unless you adjust the restart policy.
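If you would rather not restart it manually, one option is to change the container's restart policy after it has been created. The container name below is a guess; check `docker ps` for the actual name on your machine:

```shell
# Hypothetical container name -- find yours with `docker ps`.
docker update --restart unless-stopped scripts-postgres-1
```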
You will need to build the local RAG database used for CVL manual searches by the LLM:
- Run `./gen_docs.sh` to build the HTML documentation into `prover-docs/`
- Run `./populate_rag.sh` to populate the standard `rag_db`
The sanity analyzer requires additional prover documentation beyond the CVL manual:
- Run `./gen_docs.sh` (if not already done for the base RAG setup)
- Run `./populate_extended_rag.sh` to populate the `extended_rag_db`
Note: The cex-analyzer and AI Composer use the standard `rag_db` (CVL only), while the sanity-analyzer defaults to `extended_rag_db` (CVL + prover docs). You can override this with the `--rag-db` flag if needed.
From the root of the Certora Prover repo, run `./gradlew copy-assets`. Ensure that your `CERTORA` environment
variable points to the output of this build (`CertoraProver/target`).
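For example, assuming the prover checkout lives under your home directory (the path below is illustrative):

```shell
# Illustrative path; point this at your actual CertoraProver checkout.
export CERTORA="$HOME/CertoraProver/target"
```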
Install the requirements for AI Composer via `uv sync --group ml`. You may do this in
a virtual environment; in that case you also need to install the dependencies for the certora-cli by running
`uv pip install -r certora_cli_requirements.txt` from the `CertoraProver/scripts` folder, and optionally the Solidity compiler, if none is
available system-wide. Also be sure to activate this virtual environment each time you want to run AI Composer.
AI Composer assumes that the Solidity compiler is available on your `$PATH` and follows the naming convention `solcX.Y`, where X and Y
are taken from the Solidity version number `0.X.Y`. For example, to make solc version 0.8.29 available to AI Composer, you must ensure
that an executable named `solc8.29` is somewhere on your path. Currently the LLM is prompted to use Solidity version 0.8.29, but you can adjust
the prompts as needed.
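The naming convention above can be sketched as a small shell snippet that derives the expected binary name from a version string and (optionally) symlinks an installed `solc` under it. The install location and the symlink step are assumptions; adapt them to your setup:

```shell
# Derive the name AI Composer expects from a full Solidity version string.
SOLC_VERSION="0.8.29"
SOLC_NAME="solc${SOLC_VERSION#0.}"   # strips the leading "0." -> solc8.29
echo "$SOLC_NAME"

# Hypothetical: expose an installed solc under that name somewhere on $PATH.
# ln -s "$(command -v solc)" "$HOME/.local/bin/$SOLC_NAME"
```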
AI Composer is primarily a command-line tool, though some more graphical debugging utilities are also available.
Once you have completed the above setup, you can run AI Composer via:
python3 ./main.py cvl_input.spec interface_file.sol system_doc.txt
Where `cvl_input.spec` is the CVL specification the implementation must conform to, `interface_file.sol`
contains an interface definition which the generated contract must implement, and `system_doc.txt`
is a text file containing a description of the overall system (defining key concepts, etc.)
AI Composer will iterate some number of times while it attempts to generate code. This process is semi-automatic; AI Composer may ask for help via the human-in-the-loop tool, propose spec changes, or ask for requirements to be relaxed. It is recommended that you "babysit" the process as it runs.
A basic trace of what the tool is doing is displayed on stdout. You can enable `--debug` to see very verbose output, but
friendlier debugging options are described below.
Once generation is complete, the generated sources and the LLM commentary are dumped to stdout.
A few options can help tweak your experience:
- `--prover-capture-output false` will have the prover runs invoked by AI Composer print their output to stdout/stderr instead of capturing it
- `--prover-keep-folders` will print the temporary directories used for the prover runs, and not clean them up
- `--debug-prompt-override PROMPT` will append whatever text you provide in `PROMPT` to the initial prompt. Useful for instructing the LLM to do different things
- `--tokens T` how many tokens to sample from the LLM. This needs to be relatively high due to the amount of code that needs to be generated
- `--thinking-tokens T` how many tokens of the overall token budget should be used for thinking
- `--model` the name of the Anthropic model to use for the task. Defaults to sonnet
- `--thread-id` and `--checkpoint-id` are used for resuming workflows that crash or need tweaking (see below)
- `--summarization-threshold` enables the summarization of older messages after a certain threshold
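For instance, a debugging-oriented run might combine several of these flags (the file names and token budget are illustrative):

```shell
# Illustrative invocation: uncaptured prover output, kept temp folders,
# and an explicit sampling budget. Adjust file names to your project.
python3 ./main.py \
  --prover-capture-output false \
  --prover-keep-folders \
  --tokens 64000 \
  cvl_input.spec interface_file.sol system_doc.txt
```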
The --thread-id and --checkpoint-id options allow you to resume AI Composer execution from a specific point in time. Together, these identifiers describe a checkpoint in the execution history where the workflow can be resumed.
Thread ID: Identifies a specific execution session of AI Composer. This is displayed early in the output when starting a workflow:
Selected thread id: crypto_session_6511ace2-cfbf-11f0-aeb6-e8cf83d12a2d
Checkpoint ID: Identifies a specific point within that session. This is displayed throughout execution as the workflow progresses:
current checkpoint: 1f0cfbf9-bbd9-6365-8001-90d0fca3dbdf
To resume from a specific checkpoint, provide both identifiers:
python3 ./main.py --thread-id crypto_session_6511ace2-cfbf-11f0-aeb6-e8cf83d12a2d --checkpoint-id 1f0cfbf9-bbd9-6365-8001-90d0fca3dbdf cvl_input.spec interface_file.sol system_doc.txt
This will restart execution from exactly that point in the workflow. NB: the checkpoint ID does not need to be the most recent one; you can "time travel" if you decide you dislike a decision you made previously.
During execution, you can pause the current workflow by sending SIGINT (usually by hitting Ctrl+C). Once the workflow reaches a point of quiescence, you will be dropped into the "Debug Console". This console allows you to explore the current state of the implementation, and review the entire message history. You can also use this console to provide explicit guidance; this guidance is echoed to the LLM verbatim.
The message history does NOT preserve messages across summarization boundaries.
After completion of a session, if you wish to see a visualization of the entire process you can use the traceDump.py script.
The basic usage is:
python3 scripts/traceDump.py thread-id conn-string out-file
Where `thread-id` is the thread ID for the session you wish to visualize. `conn-string` is the PostgreSQL connection string for the audit database; this should be
`postgresql://audit_db_user:audit_db_password@localhost:5432/audit_db` unless you have changed where audit data is stored. `out-file` is the name of an HTML file into
which the visualization will be dumped.
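If you have relocated the audit database, only the host, port, or database name in that string should need to change. A small shell sketch pulling the host:port out of a connection string, as a quick sanity check:

```shell
# Default connection string from the setup above.
CONN="postgresql://audit_db_user:audit_db_password@localhost:5432/audit_db"
# Strip the credentials (everything up to '@'), then the database path.
HOSTPORT="${CONN#*@}"
HOSTPORT="${HOSTPORT%%/*}"
echo "$HOSTPORT"   # localhost:5432
```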
To get the final deliverable from AI Composer, use the VFS materializer like so:
python3 ./resume.py materialize thread-id path
where `thread-id` is the thread ID of the session whose output you wish to view, and `path` is the path to a directory into which the resulting VFS is dumped.
Once AI Composer finishes generation, you can refine/adjust the specification and resume generation, seeding the process with the output of a prior session. This is referred to as "meta-iteration".
Meta-iteration can be done in one of two ways:
- use the `materialize` command of `resume.py` (described above) to materialize the result of a prior run into a folder, arbitrarily change the contents of that folder, and then use the `resume-dir` command of `resume.py`, OR
- use `resume-id` with the thread ID of a completed run and pass in an updated specification file
In the former case, the invocation looks like this:
python3 resume.py resume-dir thread-id path
Here `thread-id` is the thread ID of the workflow whose contents were materialized into `path`, the directory containing the changed
project files.
In the latter case, the invocation is:
python3 resume.py resume-id thread-id new-spec
where `thread-id` is the thread ID of the workflow on which you want to iterate, and `new-spec` is the path
to the updated/refined spec file to use for the next iteration.
AI Composer is a research prototype released by Certora Labs. The code generated by AI Composer should not be placed into production without thorough vetting/testing/auditing.