Devstral 2507 with Goose

2025-07-29

Introduction

Last weekend, I experimented with Devstral Small 1.1, aka devstral-small-2507, to test out its capabilities with tool calling and code generation. I also wanted to test its performance through Goose, which I've been using recently at work.

While researching how to get everything working, I stumbled across Angie Jones's analysis of devstral-small-2505, "Is devstral really agent friendly?", and her "3 Prompts to Test for Agent Readiness". These set me up to run the experiments myself on my own machine.

(Disclaimer: I am currently employed by Block; this article is my personal investigation and opinion.)

A diagram showing a User interacting with Goose, acting as the agent, delegating to LM Studio as the LLM host, and using local and remote tools. — Goose, LM Studio, Devstral Small 2507 and local tools can operate fully offline.

Setup

I got started with the following configuration:

Goose Version 1.1.4
- Configured to use a local "Ollama" provider (actually LM Studio)
- Developer extension enabled
Devstral Small 1.1 (aka devstral-small-2507), 4-bit quant
- Loaded with 12294-token Context Length
- Loaded with 0.2 temperature
LM Studio Version 0.3.20 (devstral-small-2507 recommends LM Studio 0.3.18 or later for tool calling)
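For readers who want to poke at the model without Goose in the loop, here is a minimal sketch of the kind of request body an OpenAI-compatible chat completions endpoint accepts. Everything in it is an assumption for illustration: the base URL (LM Studio's usual local server address), the model identifier, the `build_chat_request` helper, and the tool schema, which is only guessed from the arguments that appear in the transcripts in this post. It is not the payload Goose actually sends. Context length is set when the model is loaded, so it does not appear in the request.

```python
import json

# Assumptions for illustration only; adjust to whatever your host reports.
BASE_URL = "http://localhost:1234/v1"  # assumed local server address
MODEL_ID = "devstral-small-2507"       # assumed model identifier


def build_chat_request(prompt: str) -> dict:
    """Build a chat completions request body that offers one tool to the model."""
    return {
        "model": MODEL_ID,
        "temperature": 0.2,  # same temperature as in the setup above
        "messages": [{"role": "user", "content": prompt}],
        "tools": [{
            "type": "function",
            "function": {
                # Hypothetical schema, inferred from the transcript arguments.
                "name": "developer__text_editor",
                "description": "View or edit a text file.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "command": {"type": "string"},
                        "path": {"type": "string"},
                        "file_text": {"type": "string"},
                    },
                    "required": ["command", "path"],
                },
            },
        }],
    }


request_body = build_chat_request('Create tool-test.txt containing "Hello World".')
print(f"POST {BASE_URL}/chat/completions")
print(json.dumps(request_body, indent=2))
```

The body can be sent with any HTTP client; the sketch only prints it so that it runs without a server.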

Tool Call parsing issue

During these experiments, I encountered an issue affecting devstral-small-2507 where the LLM host does not parse the tool call tokens and instead returns them as part of the message content. This issue appears to affect some Ollama, LM Studio and llama.cpp versions. It was supposedly fixed in LM Studio 0.3.20, but it still appears occasionally when the inference response contains both message content and a tool call. For example:

"message": {
  "role": "assistant",
  "content": "Let me try that again using the correct approach:[TOOL_CALLS]developer__text_editor[ARGS]{\"command\": \"write\", \"path\": \"/Users/tass/workspace/loose-goose/tool-test.txt\", \"file_text\": \"Hello World\"}"
}

as opposed to the expected:

"message": {
  "role": "assistant",
  "tool_calls": [{
    "type": "function",
    "function": {
      "name": "developer__text_editor",
      "arguments": "{\"command\": \"write\", \"path\": \"/Users/tass/workspace/loose-goose/tool-test.txt\", \"file_text\": \"Hello World\"}"
    }
  }]
}

To work around this issue, I started a new conversation with Goose whenever it appeared, instead of letting devstral respond and try to course-correct.
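Another possible mitigation, which was not used in these experiments, would be to repair the message on the client side. The sketch below is a hypothetical helper (`recover_tool_call` is a made-up name, not part of Goose or LM Studio) that looks for the `[TOOL_CALLS]name[ARGS]{...}` pattern from the example above and moves it into a `tool_calls` entry. It assumes a single leaked call at the end of the content and leaves anything else untouched.

```python
import json
import re

# Marker layout copied from the leaked response shown above:
#   <text>[TOOL_CALLS]<tool name>[ARGS]<JSON arguments>
LEAKED_CALL = re.compile(
    r"\[TOOL_CALLS\](?P<name>[\w.-]+)\[ARGS\](?P<args>\{.*\})\s*$",
    re.DOTALL,
)


def recover_tool_call(message: dict) -> dict:
    """Return a copy of `message` with a leaked tool call moved into `tool_calls`.

    Messages that already have `tool_calls`, or whose content does not end
    with a single well-formed leaked call, are returned unchanged.
    """
    if message.get("tool_calls"):
        return message
    content = message.get("content") or ""
    match = LEAKED_CALL.search(content)
    if match is None:
        return message
    try:
        json.loads(match.group("args"))  # validate only; keep the raw string
    except json.JSONDecodeError:
        return message
    repaired = dict(message)
    repaired["content"] = content[: match.start()].rstrip()
    repaired["tool_calls"] = [{
        "type": "function",
        "function": {
            "name": match.group("name"),
            "arguments": match.group("args"),
        },
    }]
    return repaired


leaked = {
    "role": "assistant",
    "content": 'Let me try that again:[TOOL_CALLS]developer__text_editor'
               '[ARGS]{"command": "write", "path": "/tmp/tool-test.txt", "file_text": "Hello World"}',
}
print(json.dumps(recover_tool_call(leaked), indent=2))
```

Patching responses like this only hides the symptom; the proper fix belongs in the LLM host's chat template or parser.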

Prompt 1: Tool Calling

The Tool Calling prompt invites Goose to use the text editor tool in the developer extension (aka developer__text_editor) to write to a file. Due to the tool call parsing issue, I had to try this a few times before getting it to work properly.

Attempt #1

user:

Create a file at ~/workspace/loose-goose/tool-test.txt with the contents "Hello World". Use the write tool. Do not ask for confirmation. Just do it.

goose tool call:

developer__text_editor: { "command": "write", "file_text": "Hello World", "path": "/Users/tass/workspace/loose-goose/tool-test.txt" } {"error":"Execution failed: Failed to write file: No such file or directory (os error 2)"}

goose:

Let me create the directory first and then write the file:[TOOL_CALLS]developer__shell[ARGS]{"command": "mkdir -p /Users/tass/workspace/loose-goose"}

Conclusion: Failure ❌

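The tool call itself was well-formed, but the write failed because the parent directory did not exist. Devstral's follow-up tool call to mkdir then leaked into the message content (the parsing issue described above), so it never ran. At this point, I created the ~/workspace/loose-goose/ directory manually to simplify the test.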
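The "os error 2" in the tool output is the usual "no such file or directory" condition raised when a file is written into a directory that does not exist. As a small illustration of that failure and of the `mkdir -p` step devstral tried to run, here is a sketch using a temporary directory instead of my real workspace. The `write_with_parents` helper is hypothetical and is not how the Goose text editor tool is implemented.

```python
from pathlib import Path
import tempfile


def write_with_parents(path: Path, text: str) -> None:
    """Write `text` to `path`, creating missing parent directories first."""
    path.parent.mkdir(parents=True, exist_ok=True)  # same effect as `mkdir -p`
    path.write_text(text)


with tempfile.TemporaryDirectory() as root:
    target = Path(root) / "workspace" / "loose-goose" / "tool-test.txt"
    try:
        target.write_text("Hello World")  # parent directory does not exist yet
    except FileNotFoundError as err:
        # Same ENOENT condition that the tool reported as "os error 2".
        print("plain write failed:", err.strerror)
    write_with_parents(target, "Hello World")
    content_after = target.read_text()
    print("after creating parents:", content_after)
```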

Attempt #2

user:

Create a file at ~/workspace/loose-goose/tool-test.txt with the contents "Hello World". Use the write tool. Do not ask for confirmation. Just do it.

goose tool call:

text_editor: { "command": "write", "file_text": "Hello World", "path": "/Users/tass/workspace/loose-goose/tool-test.txt" } {"error":"Execution failed: Execution failed: Execution failed: Tool not found: text_editor"}

Conclusion: Failure ❌

The tool call failed because devstral used an incorrect tool name (text_editor instead of developer__text_editor).
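To make the naming mismatch concrete, here is a sketch of how an agent could, in principle, map an unprefixed tool name back to its namespaced form. This is purely hypothetical: `resolve_tool_name` is a made-up helper, the `extension__tool` convention is inferred from the tool names in these transcripts, and in my test Goose simply reported the tool as not found.

```python
from typing import Optional, Sequence


def resolve_tool_name(requested: str, available: Sequence[str]) -> Optional[str]:
    """Map a possibly unprefixed tool name to a registered `extension__tool` name.

    Returns the exact name if it is registered, the single registered name whose
    part after the `__` prefix equals `requested`, or None when there is no match
    or the match is ambiguous.
    """
    if requested in available:
        return requested
    candidates = [name for name in available if name.split("__", 1)[-1] == requested]
    return candidates[0] if len(candidates) == 1 else None


tools = ["developer__text_editor", "developer__shell"]
print(resolve_tool_name("text_editor", tools))
print(resolve_tool_name("no_such_tool", tools))
```

Refusing to guess when two extensions expose the same short name is deliberate; silently picking one could run the wrong tool.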

Attempt #3

user:

Create a file at ~/workspace/loose-goose/tool-test.txt with the contents "Hello World". Use the write tool. Do not ask for confirmation. Just do it.

goose tool call:

developer__text_editor: { "command": "write", "file_text": "Hello World", "path": "/Users/tass/workspace/loose-goose/tool-test.txt" } {"type":"text","text":"Successfully wrote to /Users/tass/workspace/loose-goose/tool-test.txt" ...}

goose:

The file has been created at `/Users/tass/workspace/loose-goose/tool-test.txt` with the contents "Hello World".

Conclusion: Success ✅

I confirmed that the file was present at that location and contained the text "Hello World".

Prompt 2: Memory Awareness

The Memory Awareness prompt invites Goose to use the ongoing conversation context to keep track of the current activity and run follow-up tool calls.

Attempt #1

user:

Now append a new line that says: "I know what I'm doing"

goose tool call:

developer__text_editor: { "command": "str_replace", "new_str":"Hello World\nI know what I'm doing", "old_str":"Hello World", "path": "/Users/tass/workspace/loose-goose/tool-test.txt" } {"type":"text","text":"The file /Users/tass/workspace/loose-goose/tool-test.txt has been edited, (...)" ...}

goose:

The file has been updated to:

Hello World
I know what I'm doing

Would you like to make any other changes or perform additional actions?

Conclusion: Success ✅

I wasn't expecting it to do a full replacement of the text file, but it worked.
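The reason the full replacement works as an append is that the new string starts with the old one. A minimal sketch of that logic follows, assuming the tool replaces exactly one occurrence of `old_str`. The exactly-one-match rule is my assumption, and this `str_replace` function is an illustration rather than the tool's real implementation.

```python
def str_replace(text: str, old_str: str, new_str: str) -> str:
    """Replace the single occurrence of `old_str` in `text` with `new_str`.

    Raises ValueError when `old_str` is missing or appears more than once
    (assumed rule, so that an edit never lands in an unintended place).
    """
    occurrences = text.count(old_str)
    if occurrences != 1:
        raise ValueError(f"expected exactly one match for {old_str!r}, found {occurrences}")
    return text.replace(old_str, new_str, 1)


before = "Hello World"
after = str_replace(before, "Hello World", "Hello World\nI know what I'm doing")
print(after)
```

Because `old_str` here is the entire file, the edit is effectively a rewrite; on a larger file the same call would only touch the matching span.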

Prompt 3: File System Reasoning

Combining the two previous tests, the File System Reasoning prompt asks Goose to read back the file we created, even though the prompt gives only the file name and not the path.

Attempt #1

user:

What is the current content of tool-test.txt?

goose tool call:

developer__text_editor: {"command":"view","path":"/Users/tass/workspace/loose-goose/tool-test.txt"} {"type":"resource","resource":{"uri":"file:///Users/tass/workspace/loose-goose/tool-test.txt","mime_type":"text","text":"Hello World\nI know what I'm doing\n"} ...}

goose:

The current content of `tool-test.txt` is:

Hello World
I know what I'm doing

Conclusion: Success ✅

Perfect, no notes.

Observations

- Goose prefixes interactions with a lot of system context. Even when no extensions / MCPs are loaded, the LLM is briefed on which extensions are available. This didn't seem to affect the tests, but I'll need to set a larger Context Length to compensate.
- I'll continue using devstral-small-2507 on my machine, but I'll keep experimenting with different LLM hosts. The "Local Inference" section on the devstral-small-2507 Hugging Face card has some recommendations I'll try out next.
- I'll keep investigating how to improve the consistency of tool calls, especially in cases where Goose has to recover from a failure.

Feedback? Questions? Let's discuss on Mastodon, or email me