Function Calling

Function calling (a.k.a. tool use) is the mechanism that lets a large language model invoke external functions, APIs, or code by emitting a structured JSON request. The host application runs the function and returns the result to the model, turning a text generator into an agent that can read systems and take real-world actions.

Key Takeaways

  • Function calling lets an LLM choose which external function to run and with what arguments, by emitting structured JSON rather than free text.
  • It is the same concept as Anthropic's tool use and Google's function declarations. Different vendors, identical pattern.
  • It is the bridge between a chatbot and an agent. Without it, an LLM can only talk; with it, it can post a journal entry, send a dunning email, or query an AR ledger.
  • In AR and O2C, function calling powers cash application agents, dispute resolution agents, collections agents, and forecasting agents that act across the GL, ERP, and CRM.
  • Production safety requires schema clarity, idempotent write tools, scoped permissions, full audit logs, and human approval for any irreversible financial action.

What function calling is and why it matters

Function calling, also known as tool use, is a capability that lets a large language model invoke external functions, APIs, or pieces of code instead of only producing text. Rather than answering a question with prose, the model emits a structured JSON object that names the function it wants to call and specifies the arguments. The host application executes that function, returns the result to the model, and the model uses the result to compose its next response or to decide on its next action.

This is the capability that separates a chatbot from an AI-native agent. A standalone LLM can summarise an email, draft a dunning letter, or explain a deduction. Add function calling, and the same model can look up the customer in the ERP, pull the aging report, draft the dunning letter, and schedule a follow-up call with the account manager. For finance leaders evaluating agentic AI for AR and O2C, function calling is the architectural primitive that makes everything else possible.

The mechanics: prompt to tool_call to execution to model

The pattern is consistent across providers. First, the host application registers a set of available tools with the model. Each tool has a name, a natural-language description of what it does, and a JSON schema describing its parameters. Second, the user makes a request. Third, the model decides whether to respond directly or to call a tool. If it chooses a tool, it returns a structured tool_call object containing the function name and arguments that conform to the schema.

Fourth, the host application, not the model, actually executes the function. The model never touches a database or sends an email itself. The host runs the code, captures the result, and feeds it back into the conversation as a tool result message. Fifth, the model reads the result and either composes a final answer to the user or emits another tool call. This loop can chain many calls together, which is how multi-step agentic workflows are built.

Major model providers and emerging standards

OpenAI introduced function calling in 2023 and later generalised the concept under the umbrella term tools. Anthropic offers the same capability under the name tool use across the Claude 3 and later families. Google calls the equivalent feature function declarations in Gemini. The vocabulary differs; the underlying contract is the same.

What is changing fast is the standardisation layer above individual provider APIs. The Model Context Protocol, or MCP, is an open standard for exposing tools, resources, and prompts to any compatible AI agent, regardless of which model is driving the workflow. Vendor-specific agent frameworks such as the OpenAI Agents SDK are also maturing. For enterprise IT leaders, the practical implication is that tools built today against MCP are increasingly portable across model providers, which reduces lock-in risk in a fast-moving market.

Use cases in AR and finance

Function calling is what makes an AR agent practically useful. A cash application agent might be given tools such as lookup_open_invoices(customer), match_payment(invoice_ids, amount), and post_to_gl(allocation). The model receives a remittance email, calls the lookup function, reasons over the open invoices, calls the matcher, and posts the allocation, all without a human touching the screen for clean matches.

A dispute resolution agent uses tools like get_invoice_history(customer), draft_response(dispute_id), and create_credit_memo(amount, reason) to triage and respond to deductions. A collections agent calls get_aging(customer), send_dunning(template_id), and schedule_call(account_mgr) to drive the dunning ladder. A forecasting agent uses query_ar_ledger(filters), pull_payment_history(customer), and build_forecast(horizon) to refresh a thirteen-week cash forecast on demand. In each case the model is the brain; the tools are its hands.

Production considerations and safety patterns

Moving from a demo to a production AR agent surfaces a predictable set of engineering problems. Schema clarity comes first. If a tool description is ambiguous, the model will pick the wrong tool or pass the wrong arguments. Treat tool descriptions as serious technical writing.

Idempotency matters for any tool that changes state. Network blips and retries are inevitable, so a post_to_gl call should be safe to invoke twice without double-booking the entry. Permissions and authorisation should be scoped per user: the agent must only see tools the operating user is allowed to use, or the agent becomes a privilege escalation vector. Latency and cost are operational concerns. Every tool call is an extra round-trip to the model and an extra slug of tokens. Designs that minimise unnecessary calls pay for themselves quickly.

Safety patterns that work in finance include: auto-execute read-only tools but require human approval for write tools above a threshold; rate-limit financial actions such as credit memos or refunds; log every tool call, its inputs, its outputs, and the model's reasoning trace; and add an explicit confirmation step for destructive or irreversible operations. Common pitfalls are giving the model too many tools at once, vague descriptions, no error handling when tools fail, and no guard against infinite loops where the agent calls itself forever.

Function calling versus RAG: action versus retrieval

Function calling is often confused with retrieval-augmented generation, or RAG. They are complementary, not interchangeable. RAG is a read-only pattern for injecting fresh, domain-specific knowledge into the model's context window. It answers the question what does the model know? Function calling answers a different question: what can the model do?

A practical AR agent uses both. RAG grounds the model in your dispute policy, your dunning playbook, and your customer master data. Function calling lets the same model act on that knowledge by querying live balances, posting allocations, and sending emails. Treat retrieval as the library and function calling as the workshop. Together they turn a generic LLM into an agentic system that can actually move work through the cash application, deductions, and collections processes.

Frequently asked questions

Is function calling the same thing as tool use?

Yes. Function calling is OpenAI's term, tool use is Anthropic's term for Claude, and function declarations is Google's term for Gemini. The underlying pattern, where the model emits structured JSON describing a function to invoke and the host executes it, is identical across all three providers.

Does the LLM actually run the function itself?

No. The model only produces a structured request. The host application runs the function against your databases, APIs, or services and returns the result to the model. This separation is what lets you enforce permissions, audit logs, and approval workflows.

How is function calling different from RAG?

RAG is read-only retrieval that injects relevant documents into the model's context so it can answer with current knowledge. Function calling is for taking action: querying a live system, writing a record, sending a message. Production AR agents typically use both, with RAG providing policy context and function calling executing the work.

What are realistic AR use cases for function calling?

Cash application agents that look up open invoices and post allocations, dispute resolution agents that retrieve invoice history and create credit memos, collections agents that pull aging and trigger dunning sequences, and forecasting agents that refresh thirteen-week cash projections from live ledger data.

What are the biggest production risks?

Ambiguous tool descriptions that cause wrong selections, non-idempotent write tools that double-post on retry, over-broad permissions that turn the agent into a privilege escalation path, missing audit logs, and no human approval gate for irreversible financial actions such as large credit memos or refunds.

Should I worry about vendor lock-in if I build on one provider's function calling?

Less than you might think. The Model Context Protocol, or MCP, is emerging as an open standard for exposing tools to any compatible agent, and most enterprise AR platforms abstract the provider layer. Build your tools as cleanly described, idempotent services and they will port across OpenAI, Anthropic, and Google with limited rework.

Continue learning