Automating automations

2026-04-04

Modern LLMs are effective at translating natural language into structured specifications. Node-RED flows are structured specifications. Put those together and you get an automation platform that can edit itself: describe what you want, and the platform writes and deploys the flow.

In this article, we build a Node-RED flow that turns plain-text requirements, emailed to an LLM, into deployed automations. The LLM can read and write Node-RED flows via the Admin API, so the automation platform can modify itself. The process breaks down into three stages:

  • Building a Node-RED flow that can exchange emails with an LLM
  • Extending the flow to handle multi-turn tool calls, including the Node-RED Admin API itself
  • Hardening it with concurrency gating and system prompt guidance

This article includes real Node-RED flows. Use the export controls to copy or download the "flows" JSON, and then import it into your Node-RED instance.

Bonus: NotebookLM conversation

Just for fun, I loaded this article into NotebookLM to generate a "podcast" conversation. It's amusing, but don't read too much into it.

Communicating requirements via email.
A new automation that matches my requirements. The bulk of the work was handled by the LLM.

Technology choices

  • Email as the interface: it's asynchronous (so the agent can take as long as it needs to iterate on tool calls), threaded (so each subject line becomes its own conversation), and universally accessible without joining a specific chat platform.
    • with Gmail as the email server, for convenience. A self-hosted mail server would be even better for preserving privacy.
  • Node-RED as the automation platform: its flow-based model lets the agent compose automations from existing building blocks rather than writing everything from scratch, and it runs entirely locally.
  • LM Studio as the LLM harness: it exposes OpenAI-compatible endpoints and runs at reasonable speed on my M1 MacBook, keeping the whole system offline-friendly.
    • with openai/gpt-oss-20b (4-bit quants) as the LLM. In my tests it performed best at looped tool calls and at successfully building Node-RED flows.

Communicating with the LLM via email

The flow polls an email inbox, sends the message body to the LLM, and emails back the response.

 
Gmail as the email server, Node-RED as the automation platform, LM Studio as the LLM harness.

The main workflow

An inject node triggers an email inbox check every 20 seconds. If there's an unseen email from an approved sender, an email-reply-parser strips the reply chain, and the visible text is sent to the LLM. The LLM's response is converted from markdown to HTML and sent back as a reply.

This flow uses the OpenAI-compatible /v1/responses endpoint (see the LM Studio docs for a comparison with the other endpoints). It gives us:

  • Stateful chat: The harness persists the conversation server-side and returns a response identifier. Subsequent requests pass that identifier to continue the conversation without resending the whole context. The flow maps it to the email subject, so each thread tracks its own conversation.
  • Caller-side tool calling: The harness captures tool call requests and returns the function name and arguments for the caller to execute.
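The subject-to-conversation mapping can be sketched in plain JavaScript. This is illustrative, not the flow's actual code: `threadIds`, `buildPayload`, and `rememberResponse` are hypothetical names, and in Node-RED this state would live in flow context rather than a module-level Map.

```javascript
// Sketch: per-thread conversation state, keyed by email subject.
const threadIds = new Map(); // subject -> last response id

function buildPayload(subject, text) {
  const payload = {
    model: "openai/gpt-oss-20b",
    input: text,
  };
  const prev = threadIds.get(subject);
  if (prev) payload.previous_response_id = prev; // continue this thread's conversation
  return payload;
}

function rememberResponse(subject, response) {
  threadIds.set(subject, response.id); // the next email in this thread resumes here
}
```

Because the harness persists the conversation server-side, each request only needs the new email text plus the last response id, not the whole history.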

Node-RED workflow: Communicating with the LLM via email

The "LM studio POST" node handles LLM interactions as HTTP requests and responses.

Now we can communicate with our LLM via email:

A successful round-trip.

Adding tool calls

The main workflow only produces text responses. To let the LLM take actions, we need to declare tools, handle invocations, and feed the results back into the conversation.

"Tool call" is an overloaded term, so it's worth naming the two flavours:

  • Caller-side tools. The caller tells the LLM what tools it can execute on the LLM's behalf.
  • Harness-side tools. The LLM harness runs the tools itself and iterates on them before responding.

This workflow uses caller-side tools, hosted by the Node-RED flow.

Declaring the tools

Node-RED admin tools

The tools wrap the Node-RED admin methods, called via HTTP Request nodes.

Tool        Purpose
get_flows   Get all Node-RED flows
get_flow    Get a single flow by its ID
post_flow   Create a new flow
put_flow    Update an existing flow
get_nodes   List installed node modules and their types

Each tool's JSONSchema-formatted type signature goes into the LLM's system prompt. The LLM can only request a call that matches a declared signature.

A trimmed example from the LM Studio POST /v1/responses call body:

{
  "model": "openai/gpt-oss-20b",
  "input": input, // e.g. "What is the Node-RED flow abc123 doing?"
  "tools": [{
    "type": "function",
    "name": "get_flow",
    "description": "Get an individual Node-RED flow configuration by its ID. ...",
    "parameters": {
      "type": "object",
      "properties": {
        "id": {
          "type": "string",
          "description": "The ID of the flow to retrieve."
        }
      },
      "required": ["id"],
      "additionalProperties": false
    }
  }]
}
  1. user What's happening on flow abc123?
  2. assistant I have no idea, what is a flow?
Without any tools declared, the LLM has no way to inspect the flow.
  1. system Tool available: get_flow(id: string)
  2. user What's happening on flow abc123?
  3. assistant 🛠️ get_flow("abc123")
With the tool declared in its context, the LLM requests a tool call instead of guessing.

Calling requested tools

Node-RED nodes pick tool call requests out of the LLM response, switch on the function name, and invoke the tool with its arguments.
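That dispatch step can be sketched as plain JavaScript. The tool names match the table above, but everything else here is illustrative: the real flow uses a switch node and HTTP Request nodes, and the handler bodies below just describe the Admin API request each tool would make.

```javascript
// Sketch: map each declared tool name to the Admin API request it performs.
const tools = {
  get_flows: () => ({ method: "GET", url: "/flows" }),
  get_flow: ({ id }) => ({ method: "GET", url: `/flow/${id}` }),
  post_flow: (args) => ({ method: "POST", url: "/flow", body: args }),
  put_flow: ({ id, ...rest }) => ({ method: "PUT", url: `/flow/${id}`, body: rest }),
  get_nodes: () => ({ method: "GET", url: "/nodes" }),
};

// Pick function_call items out of the LLM response and route them by name.
function dispatch(response) {
  return (response.output ?? [])
    .filter((item) => item.type === "function_call")
    .map((call) => ({
      call_id: call.call_id, // kept so the result can be tied back to this call
      request: tools[call.name](JSON.parse(call.arguments)),
    }));
}
```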

The tool runs, but the response is lost. We need to feed it back to the LLM.

Continuing the conversation

The tool output becomes the next input. The LLM recognises this as a response to its own tool call and continues from there, possibly with more tool calls, so the flow loops around until the LLM returns a plain response.
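A sketch of that follow-up request, assuming the OpenAI-style shape where the tool result is sent as a function_call_output input item (`toolFollowUp` is an illustrative name, not part of the flow):

```javascript
// Sketch: feed the tool result back as the next input, tied to the
// original call via call_id, continuing the same server-side conversation.
function toolFollowUp(previousResponseId, callId, result) {
  return {
    model: "openai/gpt-oss-20b",
    previous_response_id: previousResponseId,
    input: [{
      type: "function_call_output",
      call_id: callId,
      output: JSON.stringify(result), // tool output is passed as a string
    }],
  };
}
```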

A couple of notes on the flow:

  • This time, I've provided a full suite of "Node-RED Admin" tools to the LLM.
  • The response id is extracted from each LM Studio post, and provided as the previous_response_id in subsequent calls.
  • The "link" arrows below form a loop, from after "format tool response" back to "prepare payload".
The email thread only shows the assistant's final response.
  1. system Tools available: get_flow(id: string), ...
  2. user What's happening on flow abc123?
  3. assistant 🛠️ get_flow("abc123")
  4. system { nodes: [...] }
  5. assistant This flow loads the ABC RSS feed every 6 hours and emails Tass the top 5 headlines.
Internally, the workflow loops around tool calls and assistant responses.

Tweaking safety and performance

Concurrency gating

The trigger runs every 20 seconds, but the LLM can take longer than that to respond, especially when it loops around tool calls, so a new poll may fire while a previous request is still in flight. A flow-context gate prevents overlapping runs: the gate closes when an email is picked up, and re-opens after a response is sent. A catch node also re-opens the gate on error, so the flow doesn't deadlock.
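A minimal sketch of the gate logic. In Node-RED this boolean would live in flow context (`flow.get`/`flow.set`); here it's modelled as a plain object, and `busy`, `tryAcquire`, and `release` are illustrative names.

```javascript
// Sketch: a single-slot gate. Acquire when an email is picked up,
// release after the reply is sent (or from a catch node on error).
const gate = {
  busy: false,
  tryAcquire() {
    if (this.busy) return false; // a request is in flight: drop this poll cycle
    this.busy = true;
    return true;
  },
  release() {
    this.busy = false; // allow the next poll to proceed
  },
};
```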

 
The gate closes when a request starts processing, and re-opens on completion or error.

Keeping the model focused

The system prompt tells the LLM to check Node-RED's current state before making changes:

  • Call get_nodes before creating flows, to verify which node types are installed.
  • Call get_flow before updating one.
  • Prefer built-in nodes over writing logic in function nodes.
  • Never modify the "Automate automations" group.
  • If a tool call fails, inspect the error and retry up to 3 times.

The get_nodes rule exists because models tend to assume node types are available when they are not, or fall back to "function" nodes full of boilerplate JavaScript instead of using a higher-level node.

The finished flow

Here's the finished flow, with concurrency gating, error handling, and a stronger system prompt that includes example conversations.

Challenges and future work

This works well as a proof of concept, but there's a lot more work to do before it's ready for building and iterating on more complicated workflows.

Poor failure feedback

The Node-RED Admin API accepts a flow deployment as long as the JSON structure is valid. A misconfigured node, a reference to a nonexistent config node, or bad wiring all pass validation. The flow then fails at runtime, and Node-RED propagates no error back to the caller, so the LLM has no way to detect or iterate on structural mistakes after deploying.

Without actionable error feedback, the LLM has to one-shot every flow. A validate-and-retry loop would help, but it needs something to retry on. Swapping LM Studio for a frontier cloud model, or delegating to an MCP server with richer tooling, would improve results at the cost of diluting the point of this exercise.
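One cheap caller-side source of feedback would be a structural sanity check before deploying. As a sketch (with `danglingWires` as a hypothetical helper, not part of the flow), checking that every wire points at a node that exists would catch one class of bad wiring:

```javascript
// Sketch: find wires that reference nonexistent node ids in a flow's
// nodes array. A non-empty result could be fed back to the LLM as an
// error to retry on, instead of deploying silently broken wiring.
function danglingWires(flow) {
  const ids = new Set((flow.nodes ?? []).map((n) => n.id));
  const problems = [];
  for (const node of flow.nodes ?? []) {
    for (const targets of node.wires ?? []) {
      for (const target of targets) {
        if (!ids.has(target)) problems.push(`${node.id} -> ${target}`);
      }
    }
  }
  return problems;
}
```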

Local model limitations

Most local models I tested were not capable of iterating on a multi-step tool-calling task without losing track of the conversation. I experimented with openai/gpt-oss-20b, google/gemma-4-26b-a4b, zai-org/glm-4.7-flash, and mistralai/devstral-small-2-2512 (with 4-bit quants for all) but only achieved tolerable results with openai/gpt-oss-20b. Hopefully future models will improve on this.

Rough edges

  • The "get unseen emails" node retrieves multiple emails at once (with all but the first halted by the concurrency gate) and marks them as read immediately, rather than once they've been acted on.
  • Concurrency gating by halting mid-flight is blunt; it would be cleaner to disable the schedule while a flow is in progress.
  • Despite system prompt instructions to prefer built-in nodes, the model gravitates toward function nodes with hand-written JavaScript, often assuming modules like xml2js are available when they're not.
  • Even with the get_nodes rule, models sometimes reference node types they've seen in training data (node-red-contrib-mongodb, node-red-node-sqlite) without checking whether they're installed.

Wrapping up

Three Node-RED flows, an IMAP poll, an LM Studio endpoint, and a handful of admin tools are enough to build an agent that reads requests from email, builds flows, and responds back. Concurrency gating keeps the inbox poll from racing itself, and the system prompt nudges the model toward checking the live state of Node-RED before guessing at it. Within those guardrails, requests like "check a sensor every 10 minutes and email me if it's above a threshold" produce a working flow with minimal intervention.

We're limited by the error feedback Node-RED provides: it currently deploys any JSON that is structurally valid. Until the platform can hand back a useful error, the model is stuck one-shotting every request, and a human has to step in to correct mistakes. Next, we'll need to evaluate how to add deeper validation to Node-RED, or compare this experience with other automation platforms like n8n.


If you have feedback or questions about this article, let's catch up via Mastodon, LinkedIn, or email.
