At Software Planet Group, we continue to invest in the development of bespoke tools aimed at streamlining the work of IT professionals. Our recent advances in integrating large language models (LLMs) with custom toolchains may be of particular interest to our colleagues and clients.
Moving Away from Ollama
Initially, we placed high hopes on Ollama’s open model platform, especially its support for tool use. Unfortunately, every tool-capable model we tested proved too unstable at generating well-structured, JSON-based requests to external tools.
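To illustrate the kind of failure we kept running into, the sketch below contrasts the structured call a tool-capable model is expected to emit with the sort of almost-JSON we frequently received. The tool name and fields are hypothetical examples rather than part of our actual toolchain:

```python
import json

# Hypothetical illustration: the JSON structure a tool-capable model is
# expected to emit for a weather lookup (tool name and fields are examples only).
expected_call = {
    "name": "get_weather",
    "arguments": {"city": "London", "units": "celsius"},
}

# The kind of output we repeatedly saw instead: prose mixed with
# almost-JSON, unquoted values, or arguments serialised inconsistently.
unstable_output = 'Sure! Calling get_weather: {"city": London, units: "celsius"}'

def parse_tool_call(raw: str) -> dict:
    """Reject anything that is not a clean, self-contained JSON object."""
    call = json.loads(raw)  # raises a ValueError subclass on malformed JSON
    if not isinstance(call, dict) or "name" not in call or "arguments" not in call:
        raise ValueError("missing 'name' or 'arguments'")
    return call

parse_tool_call(json.dumps(expected_call))   # parses cleanly
# parse_tool_call(unstable_output)           # raises ValueError
```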
After a series of experiments, our engineers decided to switch to the most affordable option in the GPT-4.1 family, GPT-4.1 nano, currently priced at $0.40 per million tokens. This doesn’t mean Ollama is off the table permanently. Rather, we believe open models still require substantial fine-tuning before they can handle production-grade tool integration reliably.
Using GPT-4.1 nano for Tool Invocation
The shift to GPT-4.1 nano yielded significant improvements in interacting with tools, though it still doesn’t guarantee flawless performance. Common pain points include prompt misinterpretation, confusion around parameter naming, and incorrect data formatting.
To address this, we’ve employed lightweight intermediary layers between the LLM and the tools. These help detect missing or malformed arguments and can either auto-correct the inputs or re-invoke the model when necessary. This kind of error-handling strategy is becoming increasingly standard in systems built around LLM-based automation.
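The sketch below gives a rough idea of how such an intermediary layer can work. The tool schema, alias table and retry budget are illustrative assumptions rather than our exact implementation, but they capture the pattern: repair what can be repaired, and feed the remaining problems back to the model:

```python
from typing import Any, Callable

# Illustrative schema for one hypothetical tool; a real system would load
# these definitions from the same source that describes tools to the LLM.
TOOL_SCHEMA = {
    "create_ticket": {
        "required": {"title": str, "priority": int},
        "aliases": {"prio": "priority", "name": "title"},  # common model slips
    }
}

def validate_and_repair(tool: str, args: dict[str, Any]) -> tuple[dict[str, Any], list[str]]:
    """Return repaired arguments plus a list of problems we could not fix."""
    spec = TOOL_SCHEMA[tool]
    repaired, problems = {}, []

    # 1. Map commonly confused parameter names back to the canonical ones.
    for key, value in args.items():
        repaired[spec["aliases"].get(key, key)] = value

    # 2. Coerce simple type mistakes (e.g. "2" instead of 2); flag the rest.
    for name, expected_type in spec["required"].items():
        if name not in repaired:
            problems.append(f"missing argument: {name}")
            continue
        try:
            repaired[name] = expected_type(repaired[name])
        except (TypeError, ValueError):
            problems.append(f"wrong type for {name}")

    return repaired, problems

def call_tool(tool: str, args: dict, reinvoke_model: Callable[[list[str]], dict],
              max_retries: int = 2) -> dict:
    """Auto-correct what we can; otherwise ask the model to try again."""
    for _ in range(max_retries + 1):
        repaired, problems = validate_and_repair(tool, args)
        if not problems:
            return repaired  # hand the clean arguments to the real tool here
        args = reinvoke_model(problems)  # feed the errors back to the LLM
    raise RuntimeError(f"tool call still invalid after retries: {problems}")
```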
Choosing the Right File Edit Format
As tool integration matures, model developers are placing greater emphasis on file-editing capabilities. OpenAI, for instance, has proposed a standard patch format for this purpose and is actively training its models to use it:
OpenAI Prompt Cookbook – GPT-4.1 Patch Format
We tested this standard extensively, but our internal benchmarks showed no substantial improvement in diff-generation quality. Consequently, we made the strategic decision to define our own patch format that better aligns with the real-world structure of our development pipelines.
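We won’t reproduce our internal format here, but a hypothetical, much-simplified search/replace format illustrates the general idea: the model emits explicit “find this block, replace it with that block” hunks rather than diffs anchored to fragile line numbers:

```python
# Hypothetical, simplified patch format and applier; our internal format
# differs, but the principle is the same: content-anchored hunks that can
# be verified against the file before anything is written.
SAMPLE_PATCH = """\
@@ file: src/app/config.py
<<< SEARCH
TIMEOUT = 30
>>> REPLACE
TIMEOUT = 60
@@ end
"""

def apply_patch(source: str, patch: str) -> str:
    """Apply one search/replace hunk; raise if the search block is absent."""
    lines = patch.splitlines()
    search_start = lines.index("<<< SEARCH") + 1
    replace_start = lines.index(">>> REPLACE")
    end = lines.index("@@ end")
    search = "\n".join(lines[search_start:replace_start])
    replace = "\n".join(lines[replace_start + 1:end])
    if search not in source:
        raise ValueError("search block not found; rejecting patch")
    return source.replace(search, replace, 1)

original = "DEBUG = False\nTIMEOUT = 30\n"
print(apply_patch(original, SAMPLE_PATCH))  # TIMEOUT is now 60
```

The key design choice, in our experience, is that a patch which no longer matches the file fails loudly instead of being applied in the wrong place.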
Breaking Out of Isolation: The Data Freshness Problem
A recurring challenge for any production-grade LLM is data staleness. This becomes especially problematic in fast-moving domains like software development, where even minor changes—such as a renamed CLI parameter—can cause the model to revert to outdated references, wasting both time and budget.
To combat this, we’re actively integrating internal tooling with up-to-date external information sources. This approach does raise potential concerns around data privacy and security. To mitigate those risks, we’ve begun building anonymising gateways that act as specialised privacy filters between our internal systems and the wider internet.
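As a rough sketch of the concept (the patterns and domain names below are purely illustrative), such a gateway scrubs obvious internal identifiers from an outgoing query before it ever leaves our network:

```python
import re

# Minimal sketch of the anonymising-gateway idea. The patterns are
# illustrative; a production gateway also needs allow-lists, audit logging
# and reversible placeholders for mapping answers back to real identifiers.
REDACTIONS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "<email>"),           # e-mail addresses
    (re.compile(r"\b[\w-]+\.internal\.example\.com\b"), "<host>"),  # internal hostnames (example domain)
    (re.compile(r"\b(?:sk|tok)_[A-Za-z0-9]{16,}\b"), "<secret>"),   # API-key-like tokens
]

def anonymise(query: str) -> str:
    """Replace sensitive fragments with neutral placeholders."""
    for pattern, placeholder in REDACTIONS:
        query = pattern.sub(placeholder, query)
    return query

outbound = "Why does deploy@spg.example fail to reach build01.internal.example.com?"
print(anonymise(outbound))
# -> "Why does <email> fail to reach <host>?"
```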
Looking Ahead: Keeping Pace with AI Innovation
It’s difficult to overstate the pace at which AI is evolving. Every day brings new possibilities for practical applications, from smarter automation to deeper integration with existing workflows. At Software Planet Group, we firmly believe that those who innovate early gain a crucial edge.
That’s why we remain committed to evaluating and adopting cutting-edge AI solutions that enhance the efficiency and quality of our project work. If you share our enthusiasm for innovation, we invite you to join the conversation.