[{"content":"","date":"June 2, 2026","externalUrl":null,"permalink":"/en/tags/binary-tree/","section":"Tags","summary":"","title":"Binary Tree","type":"tags"},{"content":"","date":"June 2, 2026","externalUrl":null,"permalink":"/en/","section":"Hassenfeld","summary":"","title":"Hassenfeld","type":"page"},{"content":"","date":"June 2, 2026","externalUrl":null,"permalink":"/en/series/leetcode/","section":"Series","summary":"","title":"LeetCode","type":"series"},{"content":"","date":"June 2, 2026","externalUrl":null,"permalink":"/en/tags/leetcode/","section":"Tags","summary":"","title":"LeetCode","type":"tags"},{"content":"A binary tree is made of nodes, each holding:\nThe node value; A pointer to the left child (left); A pointer to the right child (right). LeetCode tree representation:\nstruct TreeNode { int val; TreeNode *left; TreeNode *right; TreeNode(int x) : val(x), left(NULL), right(NULL) {} }; leetcode104. Maximum Depth of Binary Tree # int maxDepth(TreeNode* root) { if(root==nullptr){ return 0; } return max(maxDepth(root-\u0026gt;left),maxDepth(root-\u0026gt;right))+1; } Recursively compute the depths of the left and right subtrees first, then take the larger one with max() and add 1 for the current node. #Recursion leetcode226. Invert Binary Tree # TreeNode* invertTree(TreeNode* root) { if(root==nullptr){ return nullptr; } TreeNode* temp; temp=root-\u0026gt;left; root-\u0026gt;left=root-\u0026gt;right; root-\u0026gt;right=temp; invertTree(root-\u0026gt;left); invertTree(root-\u0026gt;right); return root; } Traverse the binary tree recursively and swap the left and right children of every node. The result is the mirror image of the original tree.\nTime Complexity: O(n) Space Complexity: O(h) for the recursion stack, where h is the tree height (O(n) in the worst case) leetcode101. 
Symmetric Tree # bool isSymmetric(TreeNode* root) { if(root==nullptr){ return true; } return isMirror(root-\u0026gt;left,root-\u0026gt;right); } bool isMirror(TreeNode* root1,TreeNode* root2){ if(root1==nullptr\u0026amp;\u0026amp;root2==nullptr){ return true; } if(root1==nullptr||root2==nullptr||root1-\u0026gt;val!=root2-\u0026gt;val){ return false; } return(isMirror(root1-\u0026gt;left,root2-\u0026gt;right)\u0026amp;\u0026amp;isMirror(root1-\u0026gt;right,root2-\u0026gt;left)); } Time Complexity: O(n) Space Complexity: O(h), the recursion depth (O(n) in the worst case) Function isSymmetric():\nSpecial case handling: If the root node is null, return true directly. Return value: The result of isMirror(root-\u0026gt;left, root-\u0026gt;right). Function isMirror():\nFirst, check whether the current pair of nodes satisfies the symmetry condition: both are null, or both are non-null with equal values. Then recursively check that L.left mirrors R.right and that L.right mirrors R.left. #Recursion ","date":"June 2, 2026","externalUrl":null,"permalink":"/en/posts/20260602-leetcode.binarytree/","section":"Posts","summary":"","title":"LeetCode.Binary Tree","type":"posts"},{"content":"","date":"June 2, 2026","externalUrl":null,"permalink":"/en/series/","section":"Series","summary":"","title":"Series","type":"series"},{"content":"","date":"June 2, 2026","externalUrl":null,"permalink":"/en/tags/","section":"Tags","summary":"","title":"Tags","type":"tags"},{"content":"","date":"2026年6月2日","externalUrl":null,"permalink":"/ja/tags/%E4%BA%8C%E5%88%86%E6%9C%A8/","section":"Tags","summary":"","title":"二分木","type":"tags"},{"content":"","date":"2026年06月02日","externalUrl":null,"permalink":"/tags/%E4%BA%8C%E5%8F%89%E6%A0%91/","section":"Tags","summary":"","title":"二叉树","type":"tags"},{"content":"Each node in a singly linked list is a struct (accessed through a pointer) that contains:\nThe node value; A pointer to the next node. 
LeetCode linked list representation:\nstruct ListNode { int val; ListNode *next; ListNode(int x) : val(x), next(nullptr) {} }; leetcode206. Reverse Linked List # Iterative Method # ListNode* reverseList(ListNode* head) { ListNode *head_prev = nullptr, *head_next; while (head) { head_next = head-\u0026gt;next; head-\u0026gt;next = head_prev; head_prev = head; head = head_next; } return head_prev; } Time Complexity: O(n) Space Complexity: O(1) Recursive Method # ListNode* reverseList(ListNode* head, ListNode* head_prev = nullptr) { if (head == nullptr) { return head_prev; } ListNode* head_next = head-\u0026gt;next; head-\u0026gt;next = head_prev; return reverseList(head_next, head); } Time Complexity: O(n) Space Complexity: O(n) return reverseList(head_next, head);\nThis call is in tail position (tail recursion). However, C++ compilers are not required to eliminate tail calls, so the call stack is still counted as O(n) space. #TailRecursion #Recursion Pass the \u0026ldquo;next node to process\u0026rdquo; (head_next) as the new head. Pass the \u0026ldquo;current node\u0026rdquo; (head) as head_prev for the next recursive call. Note In the iterative method, ListNode *head_prev = nullptr is a local variable declared inside the function body. In the recursive method, ListNode *head_prev = nullptr is a default function parameter. · Iterative: head_prev exists only within this single function call, which runs once. · Recursive: Each recursion level (each function call) has its own copy of head_prev.\nleetcode234. Palindrome Linked List # Fast and Slow Pointers # Having just solved the linked list reversal problem, I immediately thought of reversing the second half of the list and then comparing it node by node with the first half. 
Copying the list into an array and checking the array would take less code, but the fast and slow pointer approach avoids the O(n) extra array. Note that the recursive reverseList used below still consumes O(n) call-stack space. Switching to the iterative version keeps the extra space at O(1).\nbool isPalindrome(ListNode* head) { if(head==nullptr||head-\u0026gt;next==nullptr){ return true; } ListNode* head1=head; // head1 slow pointer ListNode* head2=head; // head2 fast pointer while(head2!=nullptr\u0026amp;\u0026amp;head2-\u0026gt;next!=nullptr){ head1=head1-\u0026gt;next; head2=head2-\u0026gt;next-\u0026gt;next; } ListNode* secondHalf=reverseList(head1,nullptr); ListNode* p1=head; ListNode* p2=secondHalf; bool flag=true; while(p2!=nullptr){ if(p1-\u0026gt;val!=p2-\u0026gt;val){ flag=false; break; } p1=p1-\u0026gt;next; p2=p2-\u0026gt;next; } return flag; } ListNode* reverseList(ListNode* head,ListNode *head_prev = nullptr) { ListNode *head_next; if(head==nullptr){ return head_prev; } head_next=head-\u0026gt;next; head-\u0026gt;next=head_prev; return reverseList(head_next,head); } ","date":"June 1, 2026","externalUrl":null,"permalink":"/en/posts/20260601-leetcode.list/","section":"Posts","summary":"","title":"LeetCode.List","type":"posts"},{"content":"","date":"June 1, 2026","externalUrl":null,"permalink":"/en/tags/list/","section":"Tags","summary":"","title":"List","type":"tags"},{"content":"","date":"2026年6月1日","externalUrl":null,"permalink":"/ja/tags/%E3%83%AA%E3%82%B9%E3%83%88/","section":"Tags","summary":"","title":"リスト","type":"tags"},{"content":"","date":"2026年06月01日","externalUrl":null,"permalink":"/tags/%E9%93%BE%E8%A1%A8/","section":"Tags","summary":"","title":"链表","type":"tags"},{"content":"","date":"May 3, 2026","externalUrl":null,"permalink":"/en/tags/agent/","section":"Tags","summary":"","title":"Agent","type":"tags"},{"content":" Introduction to GrantGo # GrantGo is an always-online subsidy application assistant designed to help you understand complex policies and navigate application processes smoothly. 
Here, you can complete registration, profile personalization, subsidy consultation, and document pre-screening with ease, helping you avoid detours and quickly find the subsidies that suit you best. The system integrates your personal profile, policy interpretation, application steps, and document checklists into a single workspace, transforming the experience from \u0026ldquo;confused and lost\u0026rdquo; to \u0026ldquo;informed and efficient.\u0026rdquo;\nVisit GrantGo Hassenfeld-hub/GrantGo GrantGo Features # Login \u0026amp; Registration # On the homepage, click \u0026ldquo;Login\u0026rdquo; or \u0026ldquo;Free Trial\u0026rdquo; to enter the login page. New users can register and log in using Email + Password. Profile Completion # Users need to complete their Personal Profile: Highest education, graduation year, current work/residence city, months of social security contribution, employment status, local residency, marital/parental status, entrepreneurial intent, and industry direction. City selection uses a three-level cascade (Province/City/District), with the option to stop at the provincial or municipal level. Click \u0026ldquo;Save and Start Matching\u0026rdquo; to enter the workspace, where the system automatically triggers the first profile matching. Workspace # Left Sidebar History: View, switch, or delete conversations; supports \u0026ldquo;New Conversation\u0026rdquo;. Middle AI Chat: Interact with the AI to get subsidy matching suggestions, policy explanations, and answers to procedural questions. Right Sidebar PDF Parsing: Upload documents for intelligent pre-screening, displaying matching score, document checklists, and process progress. Top Bar: Collapsible sidebars to maximize reading space. Intelligent Q\u0026amp;A # High-Match Subsidies The system returns high-match subsidy cards based on your personal profile. Each card displays: Subsidy name, applicable region, matching score, amount, and eligibility criteria. 
Click View Application Guide to generate detailed guidance for that specific subsidy. Application Guide Includes a summary, application URL, official government links, consultation phone numbers, required documents, and step-by-step procedures. Click Generate Flowchart for a visual representation of the application path. Click Show Checklist to track your preparation progress. Free Chat\nAsk any question at any time, and GrantGo will provide answers. PDF Pre-screening\nSupports uploading PDF files (e.g., graduation certificates, social security records). After uploading, the system parses the file and shows real-time status: Uploading, Parsing, Completed, or Failed. Post-parsing results include: Matching percentage, document checklist status, and process timeline suggestions. Technical Architecture # Frontend Application Core Framework: Vue 3 + Vite + TypeScript State Management: Pinia (Multi-session Chat management, user states) Routing: Vue Router UI \u0026amp; Styling: Tailwind CSS v4 + Custom geek-style design (Inter font) Key Libraries: markdown-it, mermaid, dompurify (Safe rendering) Business Backend (Go) Core Framework: Gin (Go 1.25) Database \u0026amp; ORM: PostgreSQL / SQLite + GORM Real-time Communication: Gorilla WebSocket (Streamed chat handling) Authentication: JWT-based Auth + CSRF protection Key Responsibilities: User authentication, profile data, region tools, backend data aggregation, WebSocket forwarding, security middleware, and rate limiting. AI Intelligent Service (Python) Core Framework: FastAPI + Uvicorn AI Core Libraries: LangChain, OpenAI API (Compatible with DeepSeek, etc.) Document \u0026amp; Vector Storage: PyMuPDF, ChromaDB, Redis Key Responsibilities: Pure LLM chat mode, System Prompt strategy control, formatted JSON card generation, RAG retrieval (Backup/Safety link), and PDF parsing. 
","date":"May 3, 2026","externalUrl":null,"permalink":"/en/posts/20260503-grantgo/","section":"Posts","summary":"GrantGo AI Technical Overview","title":"GrantGo AI","type":"posts"},{"content":"","date":"May 3, 2026","externalUrl":null,"permalink":"/en/tags/grantgo-ai/","section":"Tags","summary":"","title":"GrantGo AI","type":"tags"},{"content":" Info The following content is reposted from a technical article on Anthropic\u0026rsquo;s official website. For the full original text, please visit: https://www.anthropic.com/engineering/building-effective-agents\nWe\u0026rsquo;ve worked with dozens of teams building LLM agents across industries. Consistently, the most successful implementations use simple, composable patterns rather than complex frameworks.\nOver the past year, we\u0026rsquo;ve worked with dozens of teams building large language model (LLM) agents across industries. Consistently, the most successful implementations weren\u0026rsquo;t using complex frameworks or specialized libraries. Instead, they were building with simple, composable patterns.\nIn this post, we share what we’ve learned from working with our customers and building agents ourselves, and give practical advice for developers on building effective agents.\nWhat are agents? # \u0026ldquo;Agent\u0026rdquo; can be defined in several ways. Some customers define agents as fully autonomous systems that operate independently over extended periods, using various tools to accomplish complex tasks. Others use the term to describe more prescriptive implementations that follow predefined workflows. At Anthropic, we categorize all these variations as agentic systems, but draw an important architectural distinction between workflows and agents:\nWorkflows are systems where LLMs and tools are orchestrated through predefined code paths. Agents, on the other hand, are systems where LLMs dynamically direct their own processes and tool usage, maintaining control over how they accomplish tasks. 
Below, we will explore both types of agentic systems in detail. In Appendix 1 (“Agents in Practice”), we describe two domains where customers have found particular value in using these kinds of systems.\nWhen (and when not) to use agents # When building applications with LLMs, we recommend finding the simplest solution possible, and only increasing complexity when needed. This might mean not building agentic systems at all. Agentic systems often trade latency and cost for better task performance, and you should consider when this tradeoff makes sense.\nWhen more complexity is warranted, workflows offer predictability and consistency for well-defined tasks, whereas agents are the better option when flexibility and model-driven decision-making are needed at scale. For many applications, however, optimizing single LLM calls with retrieval and in-context examples is usually enough.\nWhen and how to use frameworks # There are many frameworks that make agentic systems easier to implement, including:\nThe Claude Agent SDK; Strands Agents SDK by AWS; Rivet, a drag and drop GUI LLM workflow builder; and Vellum, another GUI tool for building and testing complex workflows. These frameworks make it easy to get started by simplifying standard low-level tasks like calling LLMs, defining and parsing tools, and chaining calls together. However, they often create extra layers of abstraction that can obscure the underlying prompts and responses, making them harder to debug. They can also make it tempting to add complexity when a simpler setup would suffice.\nWe suggest that developers start by using LLM APIs directly: many patterns can be implemented in a few lines of code. If you do use a framework, ensure you understand the underlying code. 
Incorrect assumptions about what\u0026rsquo;s under the hood are a common source of customer error.\nSee our cookbook for some sample implementations.\nBuilding blocks, workflows, and agents # In this section, we’ll explore the common patterns for agentic systems we’ve seen in production. We\u0026rsquo;ll start with our foundational building block—the augmented LLM—and progressively increase complexity, from simple compositional workflows to autonomous agents.\nBuilding block: The augmented LLM # The basic building block of agentic systems is an LLM enhanced with augmentations such as retrieval, tools, and memory. Our current models can actively use these capabilities—generating their own search queries, selecting appropriate tools, and determining what information to retain.\nThe augmented LLM\nWe recommend focusing on two key aspects of the implementation: tailoring these capabilities to your specific use case and ensuring they provide an easy, well-documented interface for your LLM. While there are many ways to implement these augmentations, one approach is through our recently released Model Context Protocol, which allows developers to integrate with a growing ecosystem of third-party tools with a simple client implementation.\nFor the remainder of this post, we\u0026rsquo;ll assume each LLM call has access to these augmented capabilities.\nWorkflow: Prompt chaining # Prompt chaining decomposes a task into a sequence of steps, where each LLM call processes the output of the previous one. You can add programmatic checks (see \u0026ldquo;gate” in the diagram below) on any intermediate steps to ensure that the process is still on track.\nThe prompt chaining workflow\nWhen to use this workflow: This workflow is ideal for situations where the task can be easily and cleanly decomposed into fixed subtasks. 
The main goal is to trade off latency for higher accuracy, by making each LLM call an easier task.\nExamples where prompt chaining is useful:\nGenerating marketing copy, then translating it into a different language. Writing an outline of a document, checking that the outline meets certain criteria, then writing the document based on the outline. Workflow: Routing # Routing classifies an input and directs it to a specialized follow-up task. This workflow allows for separation of concerns, and for building more specialized prompts. Without this workflow, optimizing for one kind of input can hurt performance on other inputs.\nThe routing workflow\nWhen to use this workflow: Routing works well for complex tasks where there are distinct categories that are better handled separately, and where classification can be handled accurately, either by an LLM or a more traditional classification model/algorithm.\nExamples where routing is useful:\nDirecting different types of customer service queries (general questions, refund requests, technical support) into different downstream processes, prompts, and tools. Routing easy/common questions to smaller, cost-efficient models like Claude Haiku 4.5 and hard/unusual questions to more capable models like Claude Sonnet 4.5 to optimize for best performance. Workflow: Parallelization # LLMs can sometimes work simultaneously on a task and have their outputs aggregated programmatically. This workflow, parallelization, manifests in two key variations:\nSectioning: Breaking a task into independent subtasks run in parallel. Voting: Running the same task multiple times to get diverse outputs. The parallelization workflow\nWhen to use this workflow: Parallelization is effective when the divided subtasks can be parallelized for speed, or when multiple perspectives or attempts are needed for higher-confidence results. 
For complex tasks with multiple considerations, LLMs generally perform better when each consideration is handled by a separate LLM call, allowing focused attention on each specific aspect.\nExamples where parallelization is useful:\nSectioning: Implementing guardrails where one model instance processes user queries while another screens them for inappropriate content or requests. This tends to perform better than having the same LLM call handle both guardrails and the core response. Automating evals for evaluating LLM performance, where each LLM call evaluates a different aspect of the model’s performance on a given prompt. Voting: Reviewing a piece of code for vulnerabilities, where several different prompts review and flag the code if they find a problem. Evaluating whether a given piece of content is inappropriate, with multiple prompts evaluating different aspects or requiring different vote thresholds to balance false positives and negatives. Workflow: Orchestrator-workers # In the orchestrator-workers workflow, a central LLM dynamically breaks down tasks, delegates them to worker LLMs, and synthesizes their results.\nThe orchestrator-workers workflow\nWhen to use this workflow: This workflow is well-suited for complex tasks where you can’t predict the subtasks needed (in coding, for example, the number of files that need to be changed and the nature of the change in each file likely depend on the task). Whereas it’s topologically similar, the key difference from parallelization is its flexibility: subtasks aren\u0026rsquo;t pre-defined, but determined by the orchestrator based on the specific input.\nExamples where orchestrator-workers is useful:\nCoding products that make complex changes to multiple files each time. Search tasks that involve gathering and analyzing information from multiple sources for possibly relevant information. 
Workflow: Evaluator-optimizer # In the evaluator-optimizer workflow, one LLM call generates a response while another provides evaluation and feedback in a loop.\nThe evaluator-optimizer workflow\nWhen to use this workflow: This workflow is particularly effective when we have clear evaluation criteria, and when iterative refinement provides measurable value. The two signs of good fit are, first, that LLM responses can be demonstrably improved when a human articulates their feedback; and second, that the LLM can provide such feedback. This is analogous to the iterative writing process a human writer might go through when producing a polished document.\nExamples where evaluator-optimizer is useful:\nLiterary translation where there are nuances that the translator LLM might not capture initially, but where an evaluator LLM can provide useful critiques. Complex search tasks that require multiple rounds of searching and analysis to gather comprehensive information, where the evaluator decides whether further searches are warranted. Agents # Agents are emerging in production as LLMs mature in key capabilities: understanding complex inputs, engaging in reasoning and planning, using tools reliably, and recovering from errors. Agents begin their work with either a command from, or interactive discussion with, the human user. Once the task is clear, agents plan and operate independently, potentially returning to the human for further information or judgement. During execution, it\u0026rsquo;s crucial for the agents to gain “ground truth” from the environment at each step (such as tool call results or code execution) to assess their progress. Agents can then pause for human feedback at checkpoints or when encountering blockers. The task often terminates upon completion, but it’s also common to include stopping conditions (such as a maximum number of iterations) to maintain control.\nAgents can handle sophisticated tasks, but their implementation is often straightforward. 
They are typically just LLMs using tools based on environmental feedback in a loop. It is therefore crucial to design toolsets and their documentation clearly and thoughtfully. We expand on best practices for tool development in Appendix 2 (\u0026ldquo;Prompt Engineering your Tools\u0026rdquo;).\nAutonomous agent\nWhen to use agents: Agents can be used for open-ended problems where it’s difficult or impossible to predict the required number of steps, and where you can’t hardcode a fixed path. The LLM will potentially operate for many turns, and you must have some level of trust in its decision-making. Agents\u0026rsquo; autonomy makes them ideal for scaling tasks in trusted environments.\nThe autonomous nature of agents means higher costs, and the potential for compounding errors. We recommend extensive testing in sandboxed environments, along with the appropriate guardrails.\nExamples where agents are useful:\nThe following examples are from our own implementations:\nA coding agent to resolve SWE-bench tasks, which involve edits to many files based on a task description; Our “computer use” reference implementation, where Claude uses a computer to accomplish tasks. High-level flow of a coding agent\nCombining and customizing these patterns # These building blocks aren\u0026rsquo;t prescriptive. They\u0026rsquo;re common patterns that developers can shape and combine to fit different use cases. The key to success, as with any LLM feature, is measuring performance and iterating on implementations. To repeat: you should consider adding complexity only when it demonstrably improves outcomes.\nSummary # Success in the LLM space isn\u0026rsquo;t about building the most sophisticated system. It\u0026rsquo;s about building the right system for your needs. 
Start with simple prompts, optimize them with comprehensive evaluation, and add multi-step agentic systems only when simpler solutions fall short.\nWhen implementing agents, we try to follow three core principles:\nMaintain simplicity in your agent\u0026rsquo;s design. Prioritize transparency by explicitly showing the agent’s planning steps. Carefully craft your agent-computer interface (ACI) through thorough tool documentation and testing. Frameworks can help you get started quickly, but don\u0026rsquo;t hesitate to reduce abstraction layers and build with basic components as you move to production. By following these principles, you can create agents that are not only powerful but also reliable, maintainable, and trusted by their users.\nAcknowledgements # Written by Erik Schluntz and Barry Zhang. This work draws upon our experiences building agents at Anthropic and the valuable insights shared by our customers, for which we\u0026rsquo;re deeply grateful.\nAppendix 1: Agents in practice # Our work with customers has revealed two particularly promising applications for AI agents that demonstrate the practical value of the patterns discussed above. Both applications illustrate how agents add the most value for tasks that require both conversation and action, have clear success criteria, enable feedback loops, and integrate meaningful human oversight.\nA. Customer support # Customer support combines familiar chatbot interfaces with enhanced capabilities through tool integration. This is a natural fit for more open-ended agents because:\nSupport interactions naturally follow a conversation flow while requiring access to external information and actions; Tools can be integrated to pull customer data, order history, and knowledge base articles; Actions such as issuing refunds or updating tickets can be handled programmatically; and Success can be clearly measured through user-defined resolutions. 
Several companies have demonstrated the viability of this approach through usage-based pricing models that charge only for successful resolutions, showing confidence in their agents\u0026rsquo; effectiveness.\nB. Coding agents # The software development space has shown remarkable potential for LLM features, with capabilities evolving from code completion to autonomous problem-solving. Agents are particularly effective because:\nCode solutions are verifiable through automated tests; Agents can iterate on solutions using test results as feedback; The problem space is well-defined and structured; and Output quality can be measured objectively. In our own implementation, agents can now solve real GitHub issues in the SWE-bench Verified benchmark based on the pull request description alone. However, whereas automated testing helps verify functionality, human review remains crucial for ensuring solutions align with broader system requirements.\nAppendix 2: Prompt engineering your tools # No matter which agentic system you\u0026rsquo;re building, tools will likely be an important part of your agent. Tools enable Claude to interact with external services and APIs by specifying their exact structure and definition in our API. When Claude responds, it will include a tool use block in the API response if it plans to invoke a tool. Tool definitions and specifications should be given just as much prompt engineering attention as your overall prompts. In this brief appendix, we describe how to prompt engineer your tools.\nThere are often several ways to specify the same action. For instance, you can specify a file edit by writing a diff, or by rewriting the entire file. For structured output, you can return code inside markdown or inside JSON. In software engineering, differences like these are cosmetic and can be converted losslessly from one to the other. However, some formats are much more difficult for an LLM to write than others. 
Writing a diff requires knowing how many lines are changing in the chunk header before the new code is written. Writing code inside JSON (compared to markdown) requires extra escaping of newlines and quotes.\nOur suggestions for deciding on tool formats are the following:\nGive the model enough tokens to \u0026ldquo;think\u0026rdquo; before it writes itself into a corner. Keep the format close to what the model has seen naturally occurring in text on the internet. Make sure there\u0026rsquo;s no formatting \u0026ldquo;overhead\u0026rdquo; such as having to keep an accurate count of thousands of lines of code, or string-escaping any code it writes. One rule of thumb is to think about how much effort goes into human-computer interfaces (HCI), and plan to invest just as much effort in creating good agent-computer interfaces (ACI). Here are some thoughts on how to do so:\nPut yourself in the model\u0026rsquo;s shoes. Is it obvious how to use this tool, based on the description and parameters, or would you need to think carefully about it? If so, then it’s probably also true for the model. A good tool definition often includes example usage, edge cases, input format requirements, and clear boundaries from other tools. How can you change parameter names or descriptions to make things more obvious? Think of this as writing a great docstring for a junior developer on your team. This is especially important when using many similar tools. Test how the model uses your tools: Run many example inputs in our workbench to see what mistakes the model makes, and iterate. Poka-yoke your tools. Change the arguments so that it is harder to make mistakes. While building our agent for SWE-bench, we actually spent more time optimizing our tools than the overall prompt. For example, we found that the model would make mistakes with tools using relative filepaths after the agent had moved out of the root directory. 
To fix this, we changed the tool to always require absolute filepaths—and we found that the model used this method flawlessly.\n","date":"April 28, 2026","externalUrl":null,"permalink":"/en/posts/20260428-02.buildingeffectiveagents/","section":"Posts","summary":"","title":"02.Building Effective Agents","type":"posts"},{"content":"","date":"April 28, 2026","externalUrl":null,"permalink":"/en/series/anthropic/","section":"Series","summary":"","title":"Anthropic","type":"series"},{"content":"","date":"April 28, 2026","externalUrl":null,"permalink":"/en/tags/anthropic/","section":"Tags","summary":"","title":"Anthropic","type":"tags"},{"content":"","date":"April 28, 2026","externalUrl":null,"permalink":"/en/tags/workflow/","section":"Tags","summary":"","title":"Workflow","type":"tags"},{"content":"","date":"April 27, 2026","externalUrl":"https://grantgo.hassenfeld.org/","permalink":"/en/projects/grantgo/","section":"Projects","summary":"GrantGo: Your AI-powered grant assistant. From precise profiling to document pre-review, we provide full-cycle support. Simplify registration, enhance consultation efficiency, and help you skip the guesswork to quickly secure the most suitable grant opportunities.","title":"GrantGo","type":"projects"},{"content":"","date":"April 27, 2026","externalUrl":null,"permalink":"/en/projects/","section":"Projects","summary":"","title":"Projects","type":"projects"},{"content":" Info The following content is reposted from a technical article on Anthropic\u0026rsquo;s official website. For the full original text, please visit: https://www.anthropic.com/engineering/contextual-retrieval\nFor an AI model to be useful in specific contexts, it often needs access to background knowledge. 
For example, customer support chatbots need knowledge about the specific business they\u0026rsquo;re being used for, and legal analyst bots need to know about a vast array of past cases.\nDevelopers typically enhance an AI model\u0026rsquo;s knowledge using Retrieval-Augmented Generation (RAG). RAG is a method that retrieves relevant information from a knowledge base and appends it to the user\u0026rsquo;s prompt, significantly enhancing the model\u0026rsquo;s response. The problem is that traditional RAG solutions remove context when encoding information, which often results in the system failing to retrieve the relevant information from the knowledge base.\nIn this post, we outline a method that dramatically improves the retrieval step in RAG. The method is called “Contextual Retrieval” and uses two sub-techniques: Contextual Embeddings and Contextual BM25. This method can reduce the number of failed retrievals by 49% and, when combined with reranking, by 67%. These represent significant improvements in retrieval accuracy, which directly translates to better performance in downstream tasks.\nYou can easily deploy your own Contextual Retrieval solution with Claude with our cookbook.\nA note on simply using a longer prompt # Sometimes the simplest solution is the best. If your knowledge base is smaller than 200,000 tokens (about 500 pages of material), you can just include the entire knowledge base in the prompt that you give the model, with no need for RAG or similar methods.\nA few weeks ago, we released prompt caching for Claude, which makes this approach significantly faster and more cost-effective. Developers can now cache frequently used prompts between API calls, reducing latency by \u0026gt; 2x and costs by up to 90% (you can see how it works by reading our prompt caching cookbook).\nHowever, as your knowledge base grows, you\u0026rsquo;ll need a more scalable solution. 
That’s where Contextual Retrieval comes in.\nA primer on RAG: scaling to larger knowledge bases # For larger knowledge bases that don\u0026rsquo;t fit within the context window, RAG is the typical solution. RAG works by preprocessing a knowledge base using the following steps:\nBreak down the knowledge base (the “corpus” of documents) into smaller chunks of text, usually no more than a few hundred tokens; Use an embedding model to convert these chunks into vector embeddings that encode meaning; Store these embeddings in a vector database that allows for searching by semantic similarity. At runtime, when a user inputs a query to the model, the vector database is used to find the most relevant chunks based on semantic similarity to the query. Then, the most relevant chunks are added to the prompt sent to the generative model.\nWhile embedding models excel at capturing semantic relationships, they can miss crucial exact matches. Fortunately, there’s an older technique that can assist in these situations. BM25 (Best Matching 25) is a ranking function that uses lexical matching to find precise word or phrase matches. It\u0026rsquo;s particularly effective for queries that include unique identifiers or technical terms.\nBM25 works by building upon the TF-IDF (Term Frequency-Inverse Document Frequency) concept. TF-IDF measures how important a word is to a document in a collection. BM25 refines this by considering document length and applying a saturation function to term frequency, which helps prevent common words from dominating the results.\nHere’s how BM25 can succeed where semantic embeddings fail: Suppose a user queries \u0026ldquo;Error code TS-999\u0026rdquo; in a technical support database. An embedding model might find content about error codes in general, but could miss the exact \u0026ldquo;TS-999\u0026rdquo; match. 
BM25 looks for this specific text string to identify the relevant documentation.\nRAG solutions can more accurately retrieve the most applicable chunks by combining the embeddings and BM25 techniques using the following steps:\nBreak down the knowledge base (the \u0026ldquo;corpus\u0026rdquo; of documents) into smaller chunks of text, usually no more than a few hundred tokens; Create TF-IDF encodings and semantic embeddings for these chunks; Use BM25 to find top chunks based on exact matches; Use embeddings to find top chunks based on semantic similarity; Combine and deduplicate the results of the BM25 and embedding searches using rank fusion techniques; Add the top-K chunks to the prompt to generate the response. By leveraging both BM25 and embedding models, traditional RAG systems can provide more comprehensive and accurate results, balancing precise term matching with broader semantic understanding.\nA standard Retrieval-Augmented Generation (RAG) system that uses both embeddings and Best Matching 25 (BM25) to retrieve information. TF-IDF (term frequency-inverse document frequency) measures word importance and forms the basis for BM25.\nThis approach allows you to cost-effectively scale to enormous knowledge bases, far beyond what could fit in a single prompt. But these traditional RAG systems have a significant limitation: they often destroy context.\nThe context conundrum in traditional RAG # In traditional RAG, documents are typically split into smaller chunks for efficient retrieval. While this approach works well for many applications, it can lead to problems when individual chunks lack sufficient context.\nFor example, imagine you had a collection of financial information (say, U.S.
SEC filings) embedded in your knowledge base, and you received the following question: \u0026ldquo;What was the revenue growth for ACME Corp in Q2 2023?\u0026rdquo;\nA relevant chunk might contain the text: \u0026ldquo;The company\u0026rsquo;s revenue grew by 3% over the previous quarter.\u0026rdquo; However, this chunk on its own doesn\u0026rsquo;t specify which company it\u0026rsquo;s referring to or the relevant time period, making it difficult to retrieve the right information or use the information effectively.\nIntroducing Contextual Retrieval # Contextual Retrieval solves this problem by prepending chunk-specific explanatory context to each chunk before embedding (“Contextual Embeddings”) and creating the BM25 index (“Contextual BM25”).\nLet’s return to our SEC filings collection example. Here\u0026rsquo;s an example of how a chunk might be transformed:\noriginal_chunk = \u0026#34;The company\u0026#39;s revenue grew by 3% over the previous quarter.\u0026#34; contextualized_chunk = \u0026#34;This chunk is from an SEC filing on ACME corp\u0026#39;s performance in Q2 2023; the previous quarter\u0026#39;s revenue was $314 million. The company\u0026#39;s revenue grew by 3% over the previous quarter.\u0026#34;\nIt is worth noting that other approaches to using context to improve retrieval have been proposed in the past. Other proposals include: adding generic document summaries to chunks (we experimented and saw very limited gains), hypothetical document embedding, and summary-based indexing (we evaluated and saw low performance). These methods differ from what is proposed in this post.\nImplementing Contextual Retrieval # Of course, it would be far too much work to manually annotate the thousands or even millions of chunks in a knowledge base. To implement Contextual Retrieval, we turn to Claude. We’ve written a prompt that instructs the model to provide concise, chunk-specific context that explains the chunk using the context of the overall document.
We used the following Claude 3 Haiku prompt to generate context for each chunk:\n\u0026lt;document\u0026gt; {{WHOLE_DOCUMENT}} \u0026lt;/document\u0026gt; Here is the chunk we want to situate within the whole document \u0026lt;chunk\u0026gt; {{CHUNK_CONTENT}} \u0026lt;/chunk\u0026gt; Please give a short succinct context to situate this chunk within the overall document for the purposes of improving search retrieval of the chunk. Answer only with the succinct context and nothing else.\nThe resulting contextual text, usually 50-100 tokens, is prepended to the chunk before embedding it and before creating the BM25 index.\nHere’s what the preprocessing flow looks like in practice:\nContextual Retrieval is a preprocessing technique that improves retrieval accuracy.\nIf you’re interested in using Contextual Retrieval, you can get started with our cookbook.\nUsing Prompt Caching to reduce the costs of Contextual Retrieval # Contextual Retrieval is uniquely possible at low cost with Claude, thanks to the special prompt caching feature we mentioned above. With prompt caching, you don’t need to pass in the reference document for every chunk. You simply load the document into the cache once and then reference the previously cached content. Assuming 800 token chunks, 8k token documents, 50 token context instructions, and 100 tokens of context per chunk, the one-time cost to generate contextualized chunks is $1.02 per million document tokens.\nMethodology # We experimented across various knowledge domains (codebases, fiction, ArXiv papers, Science Papers), embedding models, retrieval strategies, and evaluation metrics. We’ve included a few examples of the questions and answers we used for each domain in Appendix II.\nThe graphs below show the average performance across all knowledge domains with the top-performing embedding configuration (Gemini Text 004) and retrieving the top-20-chunks.
We use 1 minus recall@20 as our evaluation metric, which measures the percentage of relevant documents that fail to be retrieved within the top 20 chunks. You can see the full results in the appendix: contextualizing improves performance in every embedding-source combination we evaluated.\nPerformance improvements # Our experiments showed that:\nContextual Embeddings reduced the top-20-chunk retrieval failure rate by 35% (5.7% → 3.7%). Combining Contextual Embeddings and Contextual BM25 reduced the top-20-chunk retrieval failure rate by 49% (5.7% → 2.9%). Combining Contextual Embeddings and Contextual BM25 reduces the top-20-chunk retrieval failure rate by 49%.\nImplementation considerations # When implementing Contextual Retrieval, there are a few considerations to keep in mind:\nChunk boundaries: Consider how you split your documents into chunks. The choice of chunk size, chunk boundary, and chunk overlap can affect retrieval performance1. Embedding model: While Contextual Retrieval improves performance across all embedding models we tested, some models may benefit more than others. We found Gemini and Voyage embeddings to be particularly effective. Custom contextualizer prompts: While the generic prompt we provided works well, you may be able to achieve even better results with prompts tailored to your specific domain or use case (for example, including a glossary of key terms that might only be defined in other documents in the knowledge base). Number of chunks: Adding more chunks into the context window increases the chances that you include the relevant information. However, more information can be distracting for models, so there\u0026rsquo;s a limit to this. We tried delivering 5, 10, and 20 chunks, and found using 20 to be the most performant of these options (see appendix for comparisons), but it’s worth experimenting on your use case.
Always run evals: Response generation may be improved by passing it the contextualized chunk and distinguishing between what is context and what is the chunk.\nFurther boosting performance with Reranking # In a final step, we can combine Contextual Retrieval with another technique to give even more performance improvements. In traditional RAG, the AI system searches its knowledge base to find the potentially relevant information chunks. With large knowledge bases, this initial retrieval often returns a lot of chunks (sometimes hundreds) of varying relevance and importance.\nReranking is a commonly used filtering technique to ensure that only the most relevant chunks are passed to the model. Reranking provides better responses and reduces cost and latency because the model is processing less information. The key steps are:\nPerform initial retrieval to get the top potentially relevant chunks (we used the top 150); Pass the top-N chunks, along with the user\u0026rsquo;s query, through the reranking model; Use the reranking model to give each chunk a score based on its relevance and importance to the prompt, then select the top-K chunks (we used the top 20); Pass the top-K chunks into the model as context to generate the final result. Combine Contextual Retrieval and Reranking to maximize retrieval accuracy.\nPerformance improvements # There are several reranking models on the market. We ran our tests with the Cohere reranker. Voyage also offers a reranker, though we did not have time to test it.
Our experiments showed that, across various domains, adding a reranking step further optimizes retrieval.\nSpecifically, we found that adding reranking to Contextual Embeddings and Contextual BM25 reduced the top-20-chunk retrieval failure rate by 67% (5.7% → 1.9%).\nCombining reranking with Contextual Embeddings and Contextual BM25 reduces the top-20-chunk retrieval failure rate by 67%.\nCost and latency considerations # One important consideration with reranking is the impact on latency and cost, especially when reranking a large number of chunks. Because reranking adds an extra step at runtime, it inevitably adds a small amount of latency, even though the reranker scores all the chunks in parallel. There is an inherent trade-off between reranking more chunks for better performance vs. reranking fewer for lower latency and cost. We recommend experimenting with different settings on your specific use case to find the right balance.\nConclusion # We ran a large number of tests, comparing different combinations of all the techniques described above (embedding model, use of BM25, use of contextual retrieval, use of a reranker, and total number of top-K results retrieved), all across a variety of different dataset types. Here’s a summary of what we found:\nEmbeddings+BM25 is better than embeddings on their own; Voyage and Gemini have the best embeddings of the ones we tested; Passing the top-20 chunks to the model is more effective than just the top-10 or top-5; Adding context to chunks improves retrieval accuracy a lot; Reranking is better than no reranking; All these benefits stack: to maximize performance improvements, we can combine contextual embeddings (from Voyage or Gemini) with contextual BM25, plus a reranking step, and add the top 20 chunks to the prompt.
We encourage all developers working with knowledge bases to use our cookbook to experiment with these approaches to unlock new levels of performance.\nAppendix I # Below is a breakdown of results across datasets, embedding providers, use of BM25 in addition to embeddings, use of contextual retrieval, and use of reranking for Retrievals @ 20.\nSee Appendix II for the breakdowns for Retrievals @ 10 and @ 5 as well as example questions and answers for each dataset.\n1 minus recall @ 20 results across datasets and embedding providers.\n","date":"April 25, 2026","externalUrl":null,"permalink":"/en/posts/20260425-01.introducingcontextualretrieval/","section":"Posts","summary":"","title":"01.Introducing Contextual Retrieval","type":"posts"},{"content":"","date":"April 25, 2026","externalUrl":null,"permalink":"/en/tags/rag/","section":"Tags","summary":"","title":"RAG","type":"tags"},{"content":"","date":"April 20, 2026","externalUrl":null,"permalink":"/en/posts/20260420-00.introduction/","section":"Posts","summary":"","title":"00.Introduction","type":"posts"},{"content":"","date":"April 20, 2026","externalUrl":null,"permalink":"/en/tags/ai/","section":"Tags","summary":"","title":"AI","type":"tags"},{"content":"","date":"April 20, 2026","externalUrl":null,"permalink":"/en/tags/llm/","section":"Tags","summary":"","title":"LLM","type":"tags"},{"content":" About Me # I am Hassenfeld, an undergraduate student majoring in Software Engineering. Currently, I am continuously learning and exploring new knowledge and techniques\u0026hellip;\nTech Stack # Languages # Frameworks \u0026amp; Libraries # Education # Undergraduate: Dalian University of Technology, Ritsumeikan University High School: Hangzhou No. 2 High School of Zhejiang Province Hobbies # Basketball, volleyball, cycling, scale modeling, photography, series and movies addict\u0026hellip;\nAbout This Site # This site is built with the Hugo static site generator, featuring the Blowfish theme.
The source code is hosted on GitHub, with deployment automated via GitHub Actions CI/CD pipelines to GitHub Pages. Domain resolution is managed by Cloudflare.\n","date":"March 25, 2026","externalUrl":null,"permalink":"/en/about/","section":"Hassenfeld","summary":"","title":"About","type":"page"},{"content":"","date":"March 25, 2026","externalUrl":null,"permalink":"/en/posts/","section":"Posts","summary":"","title":"Posts","type":"posts"},{"content":" Website Built # I built my first website using Hugo \u0026amp; Blowfish.\nThis site is built with the Hugo static site generator, featuring the Blowfish theme. The source code is hosted on GitHub, with deployment automated via GitHub Actions CI/CD pipelines to GitHub Pages. Domain resolution is managed by Cloudflare.\n\u0026mdash;26.3.22\n","date":"March 22, 2026","externalUrl":null,"permalink":"/en/posts/20260325-stationbuilt/","section":"Posts","summary":"","title":"Website Built","type":"posts"},{"content":"","date":"February 16, 2026","externalUrl":"/projects/happySpringFestival2026/index.html","permalink":"/en/projects/happyspringfestival2026/","section":"Projects","summary":"Light up the wish wall and embrace the new year with fresh hopes and dreams together!","title":"Happy Spring Festival 2026","type":"projects"},{"content":"","date":"December 31, 2025","externalUrl":"/works/happyNewYear2026/index.html","permalink":"/en/projects/happynewyear2026/","section":"Projects","summary":"Tap to sparkle! Fireworks bloom to welcome the New Year!","title":"Happy New Year 2026","type":"projects"},{"content":"","externalUrl":null,"permalink":"/en/authors/","section":"Authors","summary":"","title":"Authors","type":"authors"},{"content":"","externalUrl":null,"permalink":"/en/categories/","section":"Categories","summary":"","title":"Categories","type":"categories"}]