Multi-Agent Collaboration
Multi-Agent Collaboration
While a monolithic agent architecture can be effective for well-defined problems, its capabilities are often constrained when faced with complex, multi-domain tasks. The Multi-Agent Collaboration pattern addresses these limitations by structuring a system as a cooperative ensemble of distinct, specialized agents. This approach is predicated on the principle of task decomposition, where a high-level objective is broken down into discrete sub-problems. Each sub-problem is then assigned to an agent possessing the specific tools, data access, or reasoning capabilities best suited for that task.
Collaboration can take various forms:
- Sequential Handoffs: One agent completes a task and passes its output to another agent for the next step in a pipeline (similar to the Planning pattern, but explicitly involving different agents).
- Parallel Processing: Multiple agents work on different parts of a problem simultaneously, and their results are later combined.
- Debate and Consensus: Multi-Agent Collaboration where Agents with varied perspectives and information sources engage in discussions to evaluate options, ultimately reaching a consensus or a more informed decision.
- Hierarchical Structures: A manager agent might delegate tasks to worker agents dynamically based on their tool access or plugin capabilities and synthesize their results. Each agent can also handle relevant groups of tools, rather than a single agent handling all the tools.
- Expert Teams: Agents with specialized knowledge in different domains (e.g., a researcher, a writer, an editor) collaborate to produce a complex output.
- Critic-Reviewer: Agents create initial outputs such as plans, drafts, or answers. A second group of agents then critically assesses this output for adherence to policies, security, compliance, correctness, quality, and alignment with organizational objectives. The original creator or a final agent revises the output based on this feedback. This pattern is particularly effective for code generation, research writing, logic checking, and ensuring ethical alignment. The advantages of this approach include increased robustness, improved quality, and a reduced likelihood of hallucinations or errors.
Practical Applications & Use Cases
- Complex Research and Analysis
- Software Development
- Creative Content Generation
- Supply Chain Optimization
- Network Analysis & Remediation
Summary
- What: Complex problems often exceed the capabilities of a single, monolithic LLM-based agent. A solitary agent may lack the diverse, specialized skills or access to the specific tools needed to address all parts of a multifaceted task. This limitation creates a bottleneck, reducing the system's overall effectiveness and scalability. As a result, tackling sophisticated, multi-domain objectives becomes inefficient and can lead to incomplete or suboptimal outcomes.
- Why: The Multi-Agent Collaboration pattern offers a standardized solution by creating a system of multiple, cooperating agents. A complex problem is broken down into smaller, more manageable sub-problems. Each sub-problem is then assigned to a specialized agent with the precise tools and capabilities required to solve it. These agents work together through defined communication protocols and interaction models like sequential handoffs, parallel workstreams, or hierarchical delegation. This agentic, distributed approach creates a synergistic effect, allowing the group to achieve outcomes that would be impossible for any single agent.
- Rule of thumb: Use this pattern when a task is too complex for a single agent and can be decomposed into distinct sub-tasks requiring specialized skills or tools. It is ideal for problems that benefit from diverse expertise, parallel processing, or a structured workflow with multiple stages, such as complex research and analysis, software development, or creative content generation.
Key Takeaways
- Multi-agent collaboration involves multiple agents working together to achieve a common goal.
- This pattern leverages specialized roles, distributed tasks, and inter-agent communication.
- Collaboration can take forms like sequential handoffs, parallel processing, debate, or hierarchical structures.
- This pattern is ideal for complex problems requiring diverse expertise or multiple distinct stages.
Code Examples
CrewAI Implementation
multi_agent_sequential_crewai.py
import osfrom dotenv import load_dotenvfrom crewai import Agent, Task, Crew, Processfrom langchain_google_genai import ChatGoogleGenerativeAIdef setup_environment() -> None:"""Loads environment variables and checks for the required API key."""load_dotenv()if not os.getenv("GOOGLE_API_KEY"):raise ValueError("GOOGLE_API_KEY not found. Please set it in your .env file.")def main() -> None:"""Initializes and runs the AI crew for content creation using Gemini."""setup_environment()# Define the language model to use.# Updated to a model from the Gemini 2.0 series for better performance and features.# For cutting-edge (preview) capabilities, you could use "gemini-2.5-flash".llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash")# Define Agents with specific roles and goalsresearcher = Agent(role="Senior Research Analyst",goal="Find and summarize the latest trends in AI.",backstory=("You are an experienced research analyst with a knack for identifying ""key trends and synthesizing information."),verbose=True,allow_delegation=False,)writer = Agent(role="Technical Content Writer",goal="Write a clear and engaging blog post based on research findings.",backstory=("You are a skilled writer who can translate complex technical topics ""into accessible content."),verbose=True,allow_delegation=False,)# Define Tasks for the agentsresearch_task = Task(description=("Research the top 3 emerging trends in Artificial Intelligence in 2024-2025. ""Focus on practical applications and potential impact."),expected_output="A detailed summary of the top 3 AI trends, including key points and sources.",agent=researcher,)writing_task = Task(description=("Write a 500-word blog post based on the research findings. ""The post should be engaging and easy for a general audience to understand."),expected_output="A complete 500-word blog post about the latest AI trends.",agent=writer,context=[research_task],)# Create the Crewblog_creation_crew = Crew(agents=[researcher, writer],tasks=[research_task, writing_task],process=Process.sequential,llm=llm,verbose=True,)# Execute the Crewprint("## Running the blog creation crew with Gemini 2.0 Flash... ##")try:result = blog_creation_crew.kickoff()print("\n-------------------\n")print("## Crew Final Output ##")print(result)except Exception as e:print(f"\nAn unexpected error occurred: {e}")if __name__ == "__main__":main()
Google ADK Implementation
multi_agent_hierarchical_adk.py
import osfrom google.adk.agents import LlmAgent, BaseAgentfrom google.adk.agents.invocation_context import InvocationContextfrom google.adk.events import Eventfrom typing import AsyncGeneratorfrom dotenv import load_dotenvload_dotenv() # Load environment variables from .env fileGOOGLE_MODEL_ID = os.getenv("GOOGLE_MODEL_ID")# Correctly implement a custom agent by extending BaseAgentclass TaskExecutor(BaseAgent):"""A specialized agent with custom, non-LLM behavior."""name: str = "TaskExecutor"description: str = "Executes a predefined task."async def _run_async_impl(self, context: InvocationContext) -> AsyncGenerator[Event, None]:"""Custom implementation logic for the task."""# This is where your custom logic would go.# For this example, we'll just yield a simple event.yield Event(author=self.name, content="Task finished successfully.")# Define individual agents with proper initialization# LlmAgent requires a model to be specified.greeter = LlmAgent(name="Greeter",model=GOOGLE_MODEL_ID,instruction="You are a friendly greeter.",)task_doer = TaskExecutor() # Instantiate our concrete custom agent# Create a parent agent and assign its sub-agents# The parent agent's description and instructions should guide its delegation logic.coordinator = LlmAgent(name="Coordinator",model=GOOGLE_MODEL_ID,description="A coordinator that can greet users and execute tasks.",instruction=("When asked to greet, delegate to the Greeter. ""When asked to perform a task, delegate to the TaskExecutor."),sub_agents=[greeter,task_doer,],)# The ADK framework automatically establishes the parent-child relationships.# These assertions will pass if checked after initialization.assert greeter.parent_agent == coordinatorassert task_doer.parent_agent == coordinatorprint("Agent hierarchy created successfully.")
multi_agent_loopagent_adk.py
import asynciofrom typing import AsyncGeneratorfrom google.adk.agents import LoopAgent, LlmAgent, BaseAgentfrom google.adk.events import Event, EventActionsfrom google.adk.agents.invocation_context import InvocationContext# Best Practice: Define custom agents as complete, self-describing classes.class ConditionChecker(BaseAgent):"""A custom agent that checks for a 'completed' status in the session state."""name: str = "ConditionChecker"description: str = "Checks if a process is complete and signals the loop to stop."async def _run_async_impl(self, context: InvocationContext) -> AsyncGenerator[Event, None]:"""Checks state and yields an event to either continue or stop the loop."""status = context.session.state.get("status", "pending")is_done = status == "completed"if is_done:# Escalate to terminate the loop when the condition is met.yield Event(author=self.name, actions=EventActions(escalate=True))else:# Yield a simple event to continue the loop.yield Event(author=self.name,content="Condition not met, continuing loop.",)# Correction: The LlmAgent must have a model and clear instructions.process_step = LlmAgent(name="ProcessingStep",model="gemini-2.0-flash-exp",instruction=("You are a step in a longer process. Perform your task. ""If you are the final step, update session state by setting ""'status' to 'completed'."),)# The LoopAgent orchestrates the workflow.poller = LoopAgent(name="StatusPoller",max_iterations=10,sub_agents=[process_step,ConditionChecker(), # Instantiating the well-defined custom agent.],)# This poller will now execute 'process_step' and then 'ConditionChecker'# repeatedly until the status is 'completed' or 10 iterations have passed.
multi_agent_parallel_adk.py
from google.adk.agents import Agent, ParallelAgent# It's better to define the fetching logic as tools for the agents# For simplicity in this example, we'll embed the logic in the agent's instruction.# In a real-world scenario, you would use tools.# Define the individual agents that will run in parallelweather_fetcher = Agent(name="weather_fetcher",model="gemini-2.0-flash-exp",instruction="Fetch the weather for the given location and return only the weather report.",output_key="weather_data", # The result will be stored in session.state["weather_data"])news_fetcher = Agent(name="news_fetcher",model="gemini-2.0-flash-exp",instruction="Fetch the top news story for the given topic and return only that story.",output_key="news_data", # The result will be stored in session.state["news_data"])# Create the ParallelAgent to orchestrate the sub-agentsdata_gatherer = ParallelAgent(name="data_gatherer",sub_agents=[weather_fetcher,news_fetcher,],)
multi_agent_sequential_adk.py
from google.adk.agents import SequentialAgent, Agent# This agent's output will be saved to session.state["data"]step1 = Agent(name="Step1_Fetch", output_key="data")# This agent will use the data from the previous step.# We instruct it on how to find and use this data.step2 = Agent(name="Step2_Process",instruction="Analyze the information found in state['data'] and provide a summary.",)pipeline = SequentialAgent(name="MyPipeline",sub_agents=[step1, step2],)# When the pipeline is run with an initial input, Step1 will execute,# its response will be stored in session.state["data"], and then# Step2 will execute, using the information from the state as instructed.
multi_agent_tool_adk.py
from google.adk.agents import LlmAgentfrom google.adk.tools import agent_toolfrom google.genai import types# 1. A simple function tool for the core capability.# This follows the best practice of separating actions from reasoning.def generate_image(prompt: str) -> dict:"""Generates an image based on a textual prompt.Args:prompt: A detailed description of the image to generate.Returns:A dictionary with the status and the generated image bytes."""print(f"TOOL: Generating image for prompt: '{prompt}'")# In a real implementation, this would call an image generation API.# For this example, we return mock image data.mock_image_bytes = b"mock_image_data_for_a_cat_wearing_a_hat"return {"status": "success",# The tool returns the raw bytes, the agent will handle the Part creation."image_bytes": mock_image_bytes,"mime_type": "image/png",}# 2. Refactor the ImageGeneratorAgent into an LlmAgent.# It now correctly uses the input passed to it.image_generator_agent = LlmAgent(name="ImageGen",model="gemini-2.0-flash",description="Generates an image based on a detailed text prompt.",instruction=("You are an image generation specialist. Your task is to take ""the user's request ""and use the `generate_image` tool to create the image. ""The user's entire request should be used as the 'prompt' ""argument for the tool. ""After the tool returns the image bytes, you MUST output the ""image."),tools=[generate_image],)# 3. Wrap the corrected agent in an AgentTool.# The description here is what the parent agent sees.image_tool = agent_tool.AgentTool(agent=image_generator_agent,description="Use this tool to generate an image. The input should be a descriptive prompt of the desired image.",)# 4. The parent agent remains unchanged. Its logic was correct.artist_agent = LlmAgent(name="Artist",model="gemini-2.0-flash",instruction=("You are a creative artist. First, invent a creative and ""descriptive prompt for an image. ""Then, use the `ImageGen` tool to generate the image using ""your prompt."),tools=[image_tool],)