gepa-viz turns prompt optimization into something builders can actually inspect

GitHub README capture for modaic-ai/gepa-viz

Most prompt-optimization tooling still behaves like a black box. You run an experiment, wait for scores to settle, and eventually get a "better" prompt back, but you do not really see how the search unfolded. Which branches were promising? Which reflections actually improved the result? Which candidates died early, and why? gepa-viz is interesting because it treats that hidden loop as a product surface. Instead of collapsing everything into one final metric, it lets you watch prompt evolution happen in real time.

What the project actually does

According to the README, gepa-viz is a live visualization layer for GEPA prompt-optimization runs. It renders the candidate tree as a force-directed graph, then streams updates while the optimization loop is still running. Accepted candidates appear as larger donut-shaped nodes with per-example validation outcomes around the ring. Rejected proposals show up as small gray nodes. Clicking into a node opens a detail view with the candidate prompt, a diff against its parent, reflection minibatch results, and a clickable Pareto frontier.
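To make those node types concrete, here is a minimal Python sketch of the kind of data such a viewer has to render: a tree of candidates linked by parent pointers, an accepted/rejected flag, and a per-example validation score vector. The Candidate class and its fields are hypothetical, not gepa-viz's actual schema. The frontier below uses the textbook non-dominance definition over per-example scores; GEPA and gepa-viz may compute their Pareto frontier differently.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Candidate:
    """Hypothetical node in a prompt-optimization candidate tree."""
    cid: int
    prompt: str
    parent: Optional[int] = None  # None for the seed prompt
    accepted: bool = True         # rejected proposals would render as small gray nodes
    val_scores: list[float] = field(default_factory=list)  # one score per validation example


def dominates(a: list[float], b: list[float]) -> bool:
    """True if score vector a is >= b on every example and > b on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_frontier(cands: list[Candidate]) -> list[int]:
    """IDs of accepted candidates that no other accepted candidate dominates."""
    pool = [c for c in cands if c.accepted]
    return [
        c.cid
        for c in pool
        if not any(dominates(o.val_scores, c.val_scores) for o in pool if o.cid != c.cid)
    ]


def lineage(cands: list[Candidate], cid: int) -> list[int]:
    """Follow parent pointers from a node back to the seed, returned seed-first."""
    by_id = {c.cid: c for c in cands}
    path: list[int] = []
    current: Optional[int] = cid
    while current is not None:
        path.append(current)
        current = by_id[current].parent
    return path[::-1]


tree = [
    Candidate(0, "Answer the question.", val_scores=[1, 0, 0, 1]),
    Candidate(1, "Answer step by step.", parent=0, val_scores=[1, 1, 0, 1]),
    Candidate(2, "Answer briefly.", parent=0, val_scores=[0, 0, 1, 1]),
    Candidate(3, "Answer in French.", parent=1, accepted=False),
]
print(pareto_frontier(tree))  # [1, 2]
print(lineage(tree, 3))       # [0, 1, 3]
```

Candidate 0 drops off the frontier because candidate 1 matches or beats it on every example, while candidates 1 and 2 each win on a slice the other misses. The lineage walk is what a "diff against its parent" view needs to locate the right predecessor.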

That framing matters. A lot of optimization tooling tells you what won, but not how the system explored the search space. gepa-viz makes the exploration itself inspectable. For anyone tuning prompts, reward signals, or evaluation sets, that is usually the more valuable view.

Why this feels product-minded

The best part of the repo is that it does not assume every optimization run happens in the same environment. The project supports three distinct modes: embedded, remote, and static.

In embedded mode, a Python context manager starts the viewer for you and streams the run locally while the experiment executes. In remote mode, the callback posts snapshots to a standalone server so the optimizer can run somewhere else and the browser UI can stay on your own machine. In static mode, the tool just writes a run.json artifact that you can reopen later.
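The static and remote modes differ mainly in where a snapshot goes: to a file on disk or to a server over HTTP. The sketch below illustrates both with the Python standard library. The run.json filename and the /ingest path come from the project as described in this post; the snapshot layout, the function names, and the server address are hypothetical and not gepa-viz's actual format or API.

```python
import json
import tempfile
import urllib.request
from pathlib import Path


def write_run_artifact(snapshot: dict, out_dir: Path) -> Path:
    """Static mode (sketch): persist a snapshot as run.json for later review."""
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / "run.json"
    # Write to a sibling temp file, then rename, so readers never see half-written JSON.
    tmp = path.parent / (path.name + ".tmp")
    tmp.write_text(json.dumps(snapshot, indent=2), encoding="utf-8")
    tmp.replace(path)
    return path


def load_run_artifact(path: Path) -> dict:
    return json.loads(path.read_text(encoding="utf-8"))


def build_ingest_request(server_url: str, snapshot: dict) -> urllib.request.Request:
    """Remote mode (sketch): a POST of the snapshot to a viewer server's /ingest endpoint."""
    return urllib.request.Request(
        server_url.rstrip("/") + "/ingest",
        data=json.dumps(snapshot).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical snapshot layout: one entry per candidate node.
snapshot = {
    "candidates": [
        {"id": 0, "parent": None, "accepted": True, "scores": [1, 0, 1]},
        {"id": 1, "parent": 0, "accepted": False, "scores": []},
    ]
}

with tempfile.TemporaryDirectory() as d:
    saved = write_run_artifact(snapshot, Path(d) / "my-run")
    restored = load_run_artifact(saved)
print(restored == snapshot)  # True

req = build_ingest_request("http://localhost:8000", snapshot)  # hypothetical address
print(req.get_method(), req.full_url)  # POST http://localhost:8000/ingest
# urllib.request.urlopen(req) would send it; skipped here because no server is running.
```

Keeping one snapshot shape for both paths is the design point: the same payload can be saved as an artifact or streamed live, and the viewer only has to understand one format.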

That is strong product thinking. Instead of forcing users into one workflow, the repo acknowledges the real situations builders run into: local iteration, remote compute, and offline artifact review. The result is not just a visualization demo. It is a tool that can fit into actual research and engineering loops.

The implementation choices are more thoughtful than they look

I like that the end-user path stays small. The README's install step is a single pip install gepa-viz, and the packaged wheel ships the prebuilt single-page app (SPA) inside the Python package. That means users do not need to install Node just to inspect a run. The browser UI is still built on a modern stack (Vite, React Router, D3, Tailwind), but the runtime story for users stays lightweight.

That split is exactly the kind of decision more developer tools should make. Fancy frontend, boring install. The repo keeps the complexity where it belongs: in the build pipeline, not in the everyday user experience.

There is also a nice systems detail in the transport model. For live mode, the callback can POST run state to /ingest, and the server fans updates out to browsers over Server-Sent Events (SSE). That is a very sensible choice for this kind of one-way streaming UI. It avoids the extra machinery of bidirectional WebSockets, which this UI does not need, and it matches the product need well: the run emits updates, the dashboard watches them, and the user clicks around the evolving graph.
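The ingest-then-fan-out pattern can be sketched without any web framework. In the minimal asyncio version below, each connected browser gets its own queue, an ingested snapshot is encoded once as an SSE frame (an event line, a data line, and a terminating blank line) and pushed to every queue, and the latest snapshot is replayed to late joiners. The Broadcaster class and its methods are hypothetical illustrations of the pattern, not gepa-viz's server code; a real server would call ingest from its POST /ingest handler and stream each queue on a text/event-stream response.

```python
from __future__ import annotations

import asyncio
import json


def sse_frame(data: dict, event: str = "snapshot") -> str:
    """Encode one Server-Sent Events message: event line, data line, blank line."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


class Broadcaster:
    """Hypothetical fan-out hub: one ingest side, many browser subscribers."""

    def __init__(self) -> None:
        self._subscribers: set[asyncio.Queue[str]] = set()
        self.latest: dict | None = None  # replayed so late-joining browsers catch up

    def subscribe(self) -> asyncio.Queue[str]:
        q: asyncio.Queue[str] = asyncio.Queue()
        if self.latest is not None:
            q.put_nowait(sse_frame(self.latest))
        self._subscribers.add(q)
        return q

    def unsubscribe(self, q: asyncio.Queue[str]) -> None:
        self._subscribers.discard(q)

    def ingest(self, snapshot: dict) -> None:
        """What a POST /ingest handler would call with the decoded request body."""
        self.latest = snapshot
        frame = sse_frame(snapshot)
        for q in self._subscribers:
            q.put_nowait(frame)


async def demo() -> list[str]:
    hub = Broadcaster()
    a, b = hub.subscribe(), hub.subscribe()
    hub.ingest({"iteration": 1, "num_candidates": 3})
    return [await a.get(), await b.get()]


frames = asyncio.run(demo())
print(frames[0] == frames[1])  # True
print(frames[0], end="")
```

Because data only flows from the run to the browsers, plain HTTP plus per-subscriber queues is enough; nothing here needs a bidirectional channel.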

Why the visualization itself matters

What makes gepa-viz stand out is not just that it draws a graph. It is that the graph encodes the right questions.

The large accepted nodes show per-example outcomes, so two prompts with the same aggregate score can still look meaningfully different. One candidate might succeed on a different slice of the validation set than another. That is exactly the kind of information that gets lost when people optimize toward a single number and stop there.
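A toy example shows why the per-example ring matters. The pass/fail outcomes below are made up for illustration: both prompts score 0.5 on aggregate, yet they solve completely disjoint halves of the validation set, which a single number hides and a per-example view makes obvious.

```python
# Made-up per-example validation outcomes (1 = pass, 0 = fail) for two prompts.
outcomes = {
    "prompt_a": [1, 1, 1, 0, 0, 0],
    "prompt_b": [0, 0, 0, 1, 1, 1],
}


def aggregate(scores: list[int]) -> float:
    """Mean score: the single number most optimizers report."""
    return sum(scores) / len(scores)


def solved(scores: list[int]) -> set[int]:
    """Indices of validation examples this prompt passes."""
    return {i for i, s in enumerate(scores) if s > 0}


a, b = outcomes["prompt_a"], outcomes["prompt_b"]
print(aggregate(a), aggregate(b))     # 0.5 0.5
print(solved(a) & solved(b))          # set()
print(sorted(solved(a) | solved(b)))  # [0, 1, 2, 3, 4, 5]
```

Together the two prompts cover every example, which is exactly the kind of complementary signal that is worth seeing before picking a single winner.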

The rejected gray nodes are smart too. A lot of tools only show the winning path, which creates the illusion that optimization was cleaner than it really was. gepa-viz keeps the failed branches visible, along with the feedback that produced them. That is useful because failure paths often teach you more about evaluation quality, prompt brittleness, and search behavior than the final winner does.

The prompt-diff view is probably my favorite detail. When a system evolves prompts iteratively, the important question is not just "what is the current prompt?" It is "what changed between this step and the previous one, and did that change actually help?" gepa-viz brings that into the interface directly instead of making users compare blobs of text by hand.
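A parent-versus-child prompt diff is easy to approximate with Python's standard difflib module. This is only an illustration of the idea; nothing in the post says gepa-viz computes its diff this way, and the prompt_diff helper and example prompts are hypothetical. Prompts written as one long paragraph would benefit from a word-level diff instead of this line-level one.

```python
import difflib


def prompt_diff(parent: str, child: str) -> str:
    """Line-level unified diff between a parent prompt and its child."""
    return "".join(
        difflib.unified_diff(
            parent.splitlines(keepends=True),
            child.splitlines(keepends=True),
            fromfile="parent",
            tofile="child",
        )
    )


parent = "You are a helpful assistant.\nAnswer the question.\n"
child = (
    "You are a helpful assistant.\n"
    "Answer the question step by step.\n"
    "Cite the passage you used.\n"
)
print(prompt_diff(parent, child))
```

The output marks the rewritten instruction with a leading minus and plus pair and the new citation line with a plus, which is the "what changed between this step and the previous one" question answered directly.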

Why builders should care even if they do not use GEPA

Even if someone never touches GEPA itself, this repo points at a broader lesson for AI tooling: optimization and agent behavior should be inspectable by default. Too many systems still ask users to trust a benchmark, a leaderboard, or a single final prompt without surfacing the path that produced it.

That is not great product design. When people are tuning prompts or search loops, they need more than an output. They need to understand where the system explored, where it got stuck, and which feedback signals were actually productive. gepa-viz gets that. It treats observability as part of the core workflow rather than an afterthought for power users.

I think that instinct will age well. As more AI products move from one-shot prompting into iterative search, self-reflection, ranking, and evaluation loops, the tools that win will not just generate better outputs. They will make the search process easier for humans to understand.

The limits are pretty clear too

This is still a focused tool, not a general observability platform for every prompt workflow. It is tied to the GEPA data model, and the value depends on the quality of the evaluation examples and feedback flowing through the run. If the benchmark is weak, the visualization will still make the run easier to inspect, but it cannot rescue a bad evaluation setup.

That said, I actually like that the scope is narrow. The repo is not trying to solve every AI tracing problem at once. It is solving one concrete job well: helping people see how prompt optimization evolves over time.

Why this repo stood out to me

The deeper idea here is simple: prompt search should feel inspectable, not mystical. gepa-viz takes a workflow that could have stayed buried in logs and turns it into a live interface with enough structure to support real judgment. That is much more useful than another optimization paper demo that only shows the final graph in a static screenshot.

For builders working on agent systems, prompt tuning, or evaluation-heavy AI products, that is a valuable direction. The more optimization becomes part of the product loop, the more important it is to show the work behind the score.

GitHub: https://github.com/modaic-ai/gepa-viz

Nguyen Duc Tuan Minh
