Anthropic Launches Prompt Caching for Claude Models

Anthropic has introduced prompt caching on its API, letting developers reuse long prompt prefixes across API calls instead of resending them each time. The feature is available in public beta on Claude 3.5 Sonnet and Claude 3 Haiku, with support for the largest model, Claude 3 Opus, coming soon.

Prompt caching lets users keep frequently used context available between API calls, reducing costs and latency for long instructions and uploaded documents. It also lets developers refine model responses, since detailed instructions and example outputs can be cached rather than resent with every request. Early users have reported substantial speed and cost improvements.
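As a rough sketch, a cached request against the beta API might look like the following. The beta header value, the cache_control block, and the model string follow Anthropic's launch documentation; the document, file name, and prompts are invented for illustration.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

long_document = open("contract.txt").read()  # hypothetical large document

response = client.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=1024,
    # Opt in to the public beta feature.
    extra_headers={"anthropic-beta": "prompt-caching-2024-07-31"},
    system=[
        {"type": "text", "text": "You are a contract-analysis assistant."},
        {
            "type": "text",
            "text": long_document,
            # Everything up to and including this block is written to the
            # cache; later calls with the same prefix read it back cheaply.
            "cache_control": {"type": "ephemeral"},
        },
    ],
    messages=[{"role": "user", "content": "Summarize the termination clauses."}],
)

# The usage object reports cache activity: cache_creation_input_tokens on
# the first call, cache_read_input_tokens on subsequent calls.
print(response.usage)
```

Repeated calls that share the cached prefix and arrive within the cache's lifetime are billed at the discounted read rate rather than the full input price.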

Anthropic highlighted potential use cases, including reducing costs and latency for conversational agents, faster code autocompletion, providing multiple instructions to search tools, and embedding entire documents in a prompt.

One advantage of caching prompts is lower per-token pricing. For Claude 3.5 Sonnet, writing a prompt to the cache will cost $3.75 per 1 million tokens, while reading a cached prompt will cost $0.30 per 1 million tokens, a 10x reduction from the model's $3 base input price. Proportionally similar pricing applies to Claude 3 Haiku and Opus.
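To make the economics concrete, here is a back-of-the-envelope comparison using the launch prices for Claude 3.5 Sonnet ($3 per million base input tokens, $3.75 to write the cache, $0.30 to read it). The document size and call count are assumed workload figures, not numbers from the article.

```python
# Launch prices for Claude 3.5 Sonnet, in dollars per million tokens.
BASE_INPUT = 3.00    # regular, uncached input tokens
CACHE_WRITE = 3.75   # first call, which writes the prefix to the cache
CACHE_READ = 0.30    # later calls that read the cached prefix

MTOK = 1_000_000
prompt_tokens = 100_000  # assumed: a large document kept in context
calls = 50               # assumed: repeated calls within the cache lifetime

uncached = calls * prompt_tokens / MTOK * BASE_INPUT
cached = (prompt_tokens / MTOK * CACHE_WRITE
          + (calls - 1) * prompt_tokens / MTOK * CACHE_READ)

print(f"uncached: ${uncached:.2f}  cached: ${cached:.2f}")
# uncached ≈ $15.00, cached ≈ $1.85 for this assumed workload
```

The cache write carries a 25% premium over the base input price, so caching only pays off once the same prefix is reused, which it is here 49 times over.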

However, Anthropic’s cache has a lifetime of just five minutes, refreshed each time the cached content is used. This is not the first time Anthropic has competed with other AI platforms on price; it previously slashed its token prices ahead of the Claude 3 family’s release.

Prompt caching is a highly requested feature: similar approaches are already available on platforms like Lamina, and developers frequently ask for it on OpenAI’s developer forums. While it’s distinct from persistent large language model memory, prompt caching can still greatly benefit developers building atop Anthropic’s platform.
Source: https://venturebeat.com/ai/anthropics-new-claude-prompt-caching-will-save-developers-a-fortune/