12 min read · Mehdi Hadeli

Build a Research Agent with Web Search in .NET: Structured Briefs Before Writing

Runnable sample: samples/research-agent-web-search-dotnet.

I liked the core idea in Raghuveer's post on a research agent with web search: do the thinking in stages, not all at once. The agent should not jump straight from a topic string to a polished article. It should first produce a structured research brief that a person or a downstream writer agent can inspect.

That pattern translates well to .NET.

It also gets more useful in .NET codebases because the ecosystem already pushes you toward explicit contracts, typed models, and bounded workflows. A research agent that returns a real schema instead of a blob of markdown fits naturally with that style.

This post adapts the research-then-write idea from "Building a Research Agent with Web Search and Synthesis" for a .NET implementation. I also used Microsoft's Agent Framework guidance and the official .NET web search tool documentation to keep the .NET side grounded in the actual platform surface.

The companion sample stays deliberately runnable without cloud credentials. It uses a local search corpus and a deterministic synthesis step so you can inspect the architecture and run the workflow immediately. In a production version, the same shape can sit in front of a real web-search tool.

One change I do think is worth making even in the local sample is source quality ranking. If a research agent pulls from online articles and blogs, it should not only match on topic. It should prefer sources that are both relevant and stronger signals overall. In the sample, that means each source carries a popularity score and a community rating, and the ranking step blends those with textual relevance.

The Job of the Research Agent

The research agent is not the writer.

Its job is to take a topic such as "research agent web search in .NET" and return a brief with fields like these:

  • the proposed angle
  • likely overlap with existing content
  • the search queries it ran
  • the most relevant sources it found
  • an outline for the writer to expand
  • cross-link candidates
  • risks or review notes

That sounds modest, but it fixes a real quality problem.

If the first agent in the chain writes prose immediately, it has to decide the angle, gather current information, detect duplication, choose structure, and draft copy in one pass. That is a lot of responsibility in one step. When the result is wrong, you do not know whether the failure came from weak research, weak differentiation, or weak writing.

If the research agent produces a typed brief first, you get a checkpoint.

That checkpoint matters for both humans and systems. A human can reject the angle in thirty seconds. A downstream writer agent can trust that the topic has already been scoped. A reviewer agent can verify whether the final article stayed faithful to the brief.

Why .NET Is a Good Fit for This Pattern

The .NET version benefits from three things that teams already value.

First, typed outputs are normal. Returning a ResearchBrief record is more natural in C# than hoping a free-form response always preserves the right headings.

Second, the orchestration boundary is clear. Microsoft's Agent Framework guidance makes a useful distinction between an agent and a workflow: use an agent when the task is open-ended, and use a workflow when the process has explicit steps. Research for content usually wants both. The search step can be agentic, but the pipeline around it is usually fixed: research, review, write, review again.

Third, the tool boundary is explicit. The web search tool in Agent Framework is a tool, not the application itself. That is the right mental model. Search gathers evidence. Your application still decides what counts as overlap, which sources are trustworthy, and when a human has to approve the brief.

That separation is the part I would preserve even if the model, provider, or tool changed later.

The Shape I Would Actually Build

For a .NET implementation, I would keep the design boring on purpose.

The sample in this repo uses these pieces:

  • ResearchAgent as the orchestration core
  • IWebSearchClient as the search boundary
  • FakeWebSearchClient as the runnable local implementation
  • SampleDataLoader for the existing-post catalog and search corpus
  • ResearchBrief and related records as the output contract

The search corpus now stores not just topic tags and summaries, but also quality hints for ranking. That makes the local sample closer to the real problem: an agent often has several plausible articles to choose from, and some of them are obviously better references than others.

That is enough to demonstrate the important behavior without turning the sample into a provider-specific tutorial.
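The sample's actual interface is not reproduced in this post, so here is a minimal sketch of what the search boundary could look like. The SearchAsync signature, the SearchDocument shape (modeled on the seed JSON shown later), and the InMemorySearchClient class are all assumptions for illustration, not the sample's real code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical document shape; field names mirror the seed JSON in this post.
public sealed record SearchDocument(
    string Title,
    string Url,
    string Summary,
    IReadOnlyList<string> Tags,
    int PopularityScore,
    double CommunityRating);

// Assumed signature for the search boundary: a query in, matching documents out.
public interface IWebSearchClient
{
    Task<IReadOnlyList<SearchDocument>> SearchAsync(
        string query,
        int maxResults,
        CancellationToken cancellationToken = default);
}

// Illustrative local implementation: case-insensitive substring match over an in-memory corpus.
public sealed class InMemorySearchClient : IWebSearchClient
{
    private readonly IReadOnlyList<SearchDocument> _corpus;

    public InMemorySearchClient(IReadOnlyList<SearchDocument> corpus) => _corpus = corpus;

    public Task<IReadOnlyList<SearchDocument>> SearchAsync(
        string query,
        int maxResults,
        CancellationToken cancellationToken = default)
    {
        cancellationToken.ThrowIfCancellationRequested();

        IReadOnlyList<SearchDocument> hits = _corpus
            .Where(d => d.Title.Contains(query, StringComparison.OrdinalIgnoreCase)
                     || d.Summary.Contains(query, StringComparison.OrdinalIgnoreCase))
            .Take(maxResults)
            .ToList();

        return Task.FromResult(hits);
    }
}
```

Because the research layer would only depend on the interface, a provider-backed client could replace the in-memory one without touching the brief logic.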

The flow is straightforward:

flowchart TD
    A[Topic] --> B[Load existing posts]
    B --> C[Plan search queries]
    C --> D[Search loop with max iterations]
    D --> E[Deduplicate and rank sources]
    E --> F[Detect overlap with existing coverage]
    F --> G[Synthesize typed ResearchBrief]
    G --> H[Human review or writer agent]

That is the main design choice from Raghuveer's post that I would keep. The brief is the real output. The searches are just a way to earn it.

What the Sample Demonstrates

The sample is intentionally local-first.

Instead of requiring Azure credentials or a hosted search provider, it ships with:

  • a small JSON catalog of existing posts
  • a small JSON corpus of search documents
  • a fake search client that ranks documents by token overlap, popularity, and community rating
  • a synthesis step that emits a deterministic ResearchBrief

That means you can run the sample, inspect the JSON output, and understand the control flow without burning time on setup.

The command looks like this:

dotnet run --project ./samples/research-agent-web-search-dotnet/src/ResearchAgent.Sample -- --topic "research agent web search in .NET"

The output is a structured brief, not generated prose. It includes the proposed article angle, identified overlap, source citations, outline sections, and risk notes.

That is the right level for the first stage of a content pipeline.

The Code That Defines the Brief

The output contract is the first piece worth making explicit in the article, because this is what keeps the rest of the pipeline honest:

public sealed record ResearchSource(
    string Title,
    string Url,
    string WhyItMatters,
    int PopularityScore,
    double CommunityRating
);

public sealed record ResearchBrief(
    string Topic,
    string Angle,
    string Summary,
    IReadOnlyList<string> ExistingCoverage,
    IReadOnlyList<string> SearchQueries,
    IReadOnlyList<ResearchSource> Sources,
    IReadOnlyList<string> Outline,
    IReadOnlyList<string> CrossLinkCandidates,
    IReadOnlyList<string> Risks
);

Two details matter here.

First, the brief stores the source list as structured records, not raw citations glued into a paragraph. Second, the source entries retain PopularityScore and CommunityRating, which means downstream steps can see why one article was ranked above another.

The Important Design Decisions

1. Typed brief first, prose later

This is the biggest carryover from the source article, and it is still the right call in .NET.

The sample returns the ResearchBrief record shown above, not free-form text.

That forces the research stage to answer concrete questions. If the angle is empty or the overlap detection is weak, you see it immediately.
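One way to make "you see it immediately" concrete is a small check that runs before the brief reaches a reviewer. This is a sketch under my own assumptions: BriefDraft is a trimmed-down stand-in for ResearchBrief, and BriefChecks, its rules, and the minSources default are hypothetical rather than part of the sample.

```csharp
using System;
using System.Collections.Generic;

// Trimmed-down stand-in for the sample's ResearchBrief; only the fields these checks need.
public sealed record BriefDraft(
    string Topic,
    string Angle,
    IReadOnlyList<string> Sources,
    IReadOnlyList<string> Outline);

public static class BriefChecks
{
    // Returns human-readable problems; an empty list means the brief can go to review.
    public static IReadOnlyList<string> Validate(BriefDraft brief, int minSources = 2)
    {
        var problems = new List<string>();

        if (string.IsNullOrWhiteSpace(brief.Angle))
        {
            problems.Add("Angle is empty.");
        }
        else if (string.Equals(brief.Angle.Trim(), brief.Topic.Trim(), StringComparison.OrdinalIgnoreCase))
        {
            problems.Add("Angle only restates the topic.");
        }

        if (brief.Sources.Count < minSources)
        {
            problems.Add($"Expected at least {minSources} sources, found {brief.Sources.Count}.");
        }

        if (brief.Outline.Count == 0)
        {
            problems.Add("Outline has no sections.");
        }

        return problems;
    }
}
```

The point is not these specific rules. A typed brief gives you somewhere to put them, which a blob of markdown does not.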

2. Bounded search loop

The source article calls out the need for an iteration cap, and that applies just as much here.

The sample plans a short query list and respects --max-iterations. In practice, you usually want only a few rounds:

  • the raw topic
  • the same topic with .NET emphasis
  • one query that pushes for differentiation or implementation detail

More searching is not always better. Past a certain point, you are just paying to restate the same context.
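The capped loop can be sketched in a few lines. The names here (BoundedSearch, PlanQueries, CollectUrlsAsync) and the query wording are hypothetical, and the search delegate stands in for whatever client the application actually uses; the sample's own planner may look different.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class BoundedSearch
{
    // Query plan following the three rounds listed above; the exact wording is illustrative.
    public static IReadOnlyList<string> PlanQueries(string topic) => new[]
    {
        topic,
        topic.Contains(".NET", StringComparison.OrdinalIgnoreCase)
            ? $"{topic} C# sample"
            : $"{topic} .NET",
        $"{topic} design tradeoffs",
    };

    // Runs at most maxIterations queries and de-duplicates results by URL,
    // keeping the order in which URLs were first seen.
    public static async Task<IReadOnlyList<string>> CollectUrlsAsync(
        string topic,
        int maxIterations,
        Func<string, CancellationToken, Task<IReadOnlyList<string>>> search,
        CancellationToken cancellationToken = default)
    {
        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        var ordered = new List<string>();

        foreach (var query in PlanQueries(topic).Take(Math.Max(0, maxIterations)))
        {
            cancellationToken.ThrowIfCancellationRequested();

            foreach (var url in await search(query, cancellationToken))
            {
                if (seen.Add(url))
                {
                    ordered.Add(url);
                }
            }
        }

        return ordered;
    }
}
```

The cap is enforced by truncating the plan rather than by a counter inside the loop, so the number of search calls can never exceed maxIterations.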

2.5. Source ranking should reward quality, not just keyword overlap

This is a deliberate addition to the original pattern, and I think it improves the sample.

If the agent is meant to fetch online articles and blogs, the ranker should not stop at “does this text contain the right tokens.” You usually want a blend of:

  • topical match
  • source popularity or reach
  • rating or other quality signal

The local sample now does exactly that. The search index stores popularity and rating metadata for each article, and the score combines those signals with query relevance:

// Scores one document against the query: text relevance plus capped quality bonuses.
private static double Score(SearchDocument document, IReadOnlyCollection<string> queryTokens)
{
    // Treat the title, summary, and tags as one case-insensitive token set.
    var searchable = string.Join(
        ' ',
        new[] { document.Title, document.Summary }.Concat(document.Tags)
    );
    var docTokens = Tokenize(searchable).ToHashSet(StringComparer.OrdinalIgnoreCase);
    double textRelevance = 0;

    // One point for every query token that appears in the document.
    foreach (var token in queryTokens)
    {
        if (docTokens.Contains(token))
        {
            textRelevance += 1;
        }
    }

    // Small boost when a .NET-tagged document meets a query that mentions "net".
    if (
        document.Tags.Any(tag => tag.Contains(".net", StringComparison.OrdinalIgnoreCase))
        && queryTokens.Contains("net", StringComparer.OrdinalIgnoreCase)
    )
    {
        textRelevance += 0.5;
    }

    // Quality signals are scaled and capped at 2.5 each so their contribution stays bounded.
    var popularityWeight = Math.Clamp(document.PopularityScore / 500.0, 0, 2.5);
    var ratingWeight = Math.Clamp(document.CommunityRating / 2.0, 0, 2.5);

    return textRelevance + popularityWeight + ratingWeight;
}

This is still a simple heuristic, but it is a better heuristic. It lets the sample prefer stronger online references when two sources are similarly relevant.
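The Score method calls a Tokenize helper that is not shown in this post. A plausible minimal version, assuming it simply splits on non-alphanumeric characters and lower-cases the pieces, might look like the sketch below. The TextTokens class name and the exact splitting rule are my assumptions, not the sample's code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class TextTokens
{
    // Splits on any run of characters that are not letters or digits, drops empty
    // pieces, and lower-cases the rest, so ".NET" becomes "net" and "web-search"
    // becomes "web" and "search".
    public static IEnumerable<string> Tokenize(string text) =>
        Regex.Split(text ?? string.Empty, @"[^\p{L}\p{Nd}]+")
            .Where(token => token.Length > 0)
            .Select(token => token.ToLowerInvariant());
}
```

A splitting rule like this is what would let the ".net" tag check in Score line up with a "net" query token.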

The seed JSON now reflects that shape too:

{
  "title": "Microsoft Agent Framework overview",
  "url": "https://learn.microsoft.com/agent-framework/overview/",
  "summary": "Use agents for open-ended tasks and workflows for explicit multi-step orchestration with checkpointing and human-in-the-loop support.",
  "tags": ["microsoft agent framework", ".net", "workflow", "checkpointing"],
  "popularityScore": 980,
  "communityRating": 4.8
}

In a real system, those values would come from whatever trust signals you have available: internal editorial scores, engagement metrics, feed metadata, or an external ranking service. The sample keeps them in JSON so the behavior is visible and testable.
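To see how a seed entry turns into a ranking contribution, here is a small sketch that parses the JSON with System.Text.Json and isolates the two quality terms from Score. It assumes the corpus file is a JSON array of entries like the one above; SeedCorpus, Parse, and QualityWeight are hypothetical names, and the SearchDocument record is my guess at the shape.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Assumed record shape, matching the property names in the seed JSON.
public sealed record SearchDocument(
    string Title,
    string Url,
    string Summary,
    IReadOnlyList<string> Tags,
    int PopularityScore,
    double CommunityRating);

public static class SeedCorpus
{
    // JsonSerializerDefaults.Web uses camelCase names and case-insensitive matching,
    // which lines up "popularityScore" in the JSON with PopularityScore on the record.
    private static readonly JsonSerializerOptions Options = new(JsonSerializerDefaults.Web);

    // Assumes the file content is a JSON array of documents.
    public static IReadOnlyList<SearchDocument> Parse(string json) =>
        JsonSerializer.Deserialize<List<SearchDocument>>(json, Options)
            ?? new List<SearchDocument>();

    // Same scaling and caps as the Score method, isolated so they can be unit tested.
    public static double QualityWeight(SearchDocument document) =>
        Math.Clamp(document.PopularityScore / 500.0, 0, 2.5)
        + Math.Clamp(document.CommunityRating / 2.0, 0, 2.5);
}
```

For the seed entry above, the popularity term is 980 / 500 = 1.96 and the rating term is 4.8 / 2 = 2.4, so the document starts with roughly 4.36 points before any text relevance is added.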

3. Overlap detection is a first-class concern

This is the part that makes the sample useful for a real content workflow.

The agent does not just ask, "what should I write?" It also asks, "what have I already written that is close enough to matter?"

That overlap detection affects the angle. If an existing post already covers basic web-search tool wiring, the new brief should shift upward into orchestration, checkpoints, or evaluation instead of rewriting the same beginner tutorial.
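The post does not show how the sample measures overlap, so treat the following as one possible approach rather than the sample's method: token-level Jaccard similarity between the topic and each existing post's slug or title. OverlapDetector, its separator list, and the 0.3 threshold are all assumptions chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OverlapDetector
{
    private static readonly char[] Separators = { ' ', '-', '_', '.', ',', ':', '/' };

    private static HashSet<string> Tokens(string text) =>
        text.Split(Separators, StringSplitOptions.RemoveEmptyEntries)
            .Select(token => token.ToLowerInvariant())
            .ToHashSet();

    // Jaccard similarity: shared tokens divided by all distinct tokens across both texts.
    public static double Similarity(string a, string b)
    {
        var left = Tokens(a);
        var right = Tokens(b);
        if (left.Count == 0 || right.Count == 0)
        {
            return 0;
        }

        var shared = left.Intersect(right).Count();
        var union = left.Union(right).Count();
        return (double)shared / union;
    }

    // Returns the existing slugs or titles close enough to the topic to affect the angle.
    public static IReadOnlyList<string> FindOverlaps(
        string topic,
        IEnumerable<string> existingPosts,
        double threshold = 0.3) =>
        existingPosts
            .Where(post => Similarity(topic, post) >= threshold)
            .ToList();
}
```

With the topic "research agent web search in .NET" and the slug agent-framework-web-search-tool-dotnet, three tokens are shared (agent, web, search) out of nine distinct tokens, giving a similarity of about 0.33, which is enough to flag the post at this threshold. A production version would probably want stemming or embeddings, but the contract stays the same: a list of overlapping posts that the angle has to account for.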

4. Human review belongs at the brief boundary

This is where I agree strongly with the original post.

The review gate should happen after research and before writing. That is the cheapest moment to catch a bad direction. The sample makes this easy because the JSON brief is readable enough to approve manually and structured enough to feed into another stage.

How This Maps to Real Agent Framework Usage

The sample uses a fake search client because that keeps the project runnable. The production mapping is still simple.

In a hosted setup, the local IWebSearchClient boundary is where you would swap in a real provider-backed implementation. Microsoft documents two relevant paths:

  • Agent Framework web search for agent-side tool use
  • Azure AI Foundry hosted web search when you want provider-managed tool execution

I would still keep the application boundary the same:

  • tool layer gathers fresh information
  • research layer decides what matters
  • brief model defines what downstream steps can trust

That way, you are not coupling your content workflow to one search provider or one SDK sample.

The CLI entrypoint stays small for the same reason. It loads the corpus, runs the agent, and prints the typed brief:

public static async Task<int> RunAsync(
    string[] args,
    TextWriter standardOutput,
    TextWriter standardError,
    CancellationToken cancellationToken = default
)
{
    try
    {
        var options = CommandLineOptions.Parse(args);
        var loader = new SampleDataLoader(AppContext.BaseDirectory);
        var existingPosts = await loader.LoadExistingPostsAsync(
            options.ExistingPostsPath,
            cancellationToken
        );
        var searchDocuments = await loader.LoadSearchDocumentsAsync(
            options.SearchIndexPath,
            cancellationToken
        );

        var agent = new ResearchAgent(existingPosts, new FakeWebSearchClient(searchDocuments));
        var brief = await agent.CreateBriefAsync(
            options.Topic,
            options.MaxIterations,
            cancellationToken
        );

        await standardOutput.WriteLineAsync(JsonSerializer.Serialize(
            brief,
            new JsonSerializerOptions { WriteIndented = true }));

        return 0;
    }
    catch (ArgumentException ex)
    {
        await standardError.WriteLineAsync(ex.Message);
        return 1;
    }
}

That snippet is worth showing because it makes the boundary clear: query execution and ranking happen inside the research layer, while the CLI is only responsible for input, output, and errors.

A Concrete Example of the Differentiation Step

Suppose your site already has a post called agent-framework-web-search-tool-dotnet.

If you ask the research agent for "research agent web search in .NET", a weak pipeline might produce a brief that simply restates how to enable web search in an agent.

The sample deliberately pushes away from that.

It detects the overlap and proposes a narrower angle: use web search as an evidence-gathering step that feeds a typed research brief, with review before writing. That is a better article because it adds a workflow idea instead of retelling tool setup.

That is exactly the kind of behavior a research agent should exhibit.

Why I Prefer This to "Just Ask the Model for an Outline"

You can absolutely ask a model for an outline directly. Sometimes that is fine.

But if the content matters, a research brief is a better contract.

It gives you:

  • better separation between discovery and writing
  • a natural human approval step
  • clearer failure modes
  • easier automation for later pipeline stages
  • a reusable artifact for audits and reviews

That is true whether the writer is another agent or a person.

The Sample Files Worth Looking At

If you want the shortest path through the sample, start here:

Those four files show the whole idea without much ceremony.

If you specifically want to inspect the popularity-aware ranking path, I would start with:

Final Takeaway

The useful part of a research agent is not that it can search the web. Plenty of tools can do that.

The useful part is that it turns web search into a structured decision artifact.

That is what makes the rest of the pipeline easier to trust.

In .NET, that pattern fits especially well because typed contracts, explicit orchestration, and replaceable service boundaries are already normal practice. So if you want to build a research agent for a content or documentation pipeline, I would not start with "how do I make the model write a better draft?" I would start with "what brief should the writer be forced to honor?"

That question leads to a better system.