Developers increasingly build with AI coding agents, and one of the first things those agents do when working on an identity integration is try to pull in your docs. If your documentation isn’t structured for that, the agent either gets noisy HTML it has to parse, or it gives up and hallucinates the API surface instead.
There are ways to make your docs more agent-friendly, and we recently implemented a number of these solutions for Ping Identity’s documentation. The result: our docs are now much easier for agents to discover and consume, which means better accuracy and a smoother experience for developers building with AI.
The problem: great docs, invisible to agents
Ping Identity has extensive technical documentation — thousands of pages across docs.pingidentity.com and developer.pingidentity.com. But “extensive” doesn’t automatically mean “agent-friendly.”
Before this work, an AI agent trying to use our docs would hit a few walls:
- HTML noise: Documentation pages are built for browsers. Navigation chrome, scripts, and hundreds of lines of HTML overhead are irrelevant for an agent, and they’re expensive in terms of tokens. Stripping HTML to get to the actual content is possible, but imperfect, since the agent has to guess which parts are content and which parts are noise.
  Well-structured, semantic HTML does carry useful context, but in practice most HTML pages aren’t very well-structured or semantically rich, and even when they are, the signal-to-noise ratio rarely justifies the token cost.
- No discovery mechanism: Even if an agent knew our docs existed, there was no structured way for it to find the right page. It would have to either crawl the entire site or guess at URLs.
- No machine-readable product metadata: Pages described products in natural language, but nothing told a search engine or agent in structured terms which product a page was about.
On top of all this, AI-driven traffic already accounts for a significant percentage of site visits. We were already being crawled; we just weren’t giving those crawlers what they needed.
To make matters more complex, there isn’t really a single “right” way to solve these problems. Best practices and specifications are still evolving, and it may be a while before an approach that works across most agents solidifies into a standard. Any solution we implemented had to be flexible enough to evolve as the ecosystem does, while still providing immediate value to developers building with AI today.
What we built
Because of the evolving landscape, we took a multipronged approach, implementing several complementary features to make our docs more agent-friendly. Our goal was to make it easy for agents to discover the docs they need, give them a clean, efficient way to fetch the content, and provide structured metadata to help them understand what they’re looking at, all without diminishing the experience for human readers.
Markdown alternates for every page
The most significant single change: every documentation page now has a Markdown alternate that lives alongside the HTML version, at the same URL with .html swapped for .md. For example:
https://docs.pingidentity.com/pingam/8/setup/am-admin-interface-tools.html
→
https://docs.pingidentity.com/pingam/8/setup/am-admin-interface-tools.md
Markdown alternates strip out navigation, scripts, and all the browser-facing overhead, leaving just the content. In a random sampling of pages across our docsets, the Markdown version is typically between 85% and 95% smaller than the HTML version, while still retaining the content an AI agent actually cares about. That’s a dramatic drop in token cost and a meaningful improvement in accuracy, because the agent no longer has to process all that markup as noise.
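As a minimal sketch of the URL swap described above, the following Python helper derives the Markdown URL from a page URL. The function name `markdown_alternate` is ours, and dropping the `#fragment` is an assumption on our part (fragments point at anchors in the HTML rendering), not documented behavior.

```python
from urllib.parse import urlsplit, urlunsplit


def markdown_alternate(html_url: str) -> str:
    """Return the .md URL for a docs page URL whose path ends in .html."""
    parts = urlsplit(html_url)
    if not parts.path.endswith(".html"):
        raise ValueError(f"not an .html page URL: {html_url}")
    md_path = parts.path[: -len(".html")] + ".md"
    # Assumption: the fragment only makes sense for the HTML rendering, so drop it.
    return urlunsplit((parts.scheme, parts.netloc, md_path, parts.query, ""))


print(markdown_alternate(
    "https://docs.pingidentity.com/pingam/8/setup/am-admin-interface-tools.html"
))
# prints https://docs.pingidentity.com/pingam/8/setup/am-admin-interface-tools.md
```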
To improve discovery of the Markdown alternates, we also added a <link rel="alternate" type="text/markdown"> element in the <head> of every HTML page, pointing to the corresponding Markdown URL. This way, an agent that starts with the HTML page can easily find the Markdown version without having to guess the URL structure. To help humans discover it too, we also added a link to the Markdown version at the top of each page.
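An agent that starts from the HTML can read that link element rather than guessing the URL. The sketch below uses only the Python standard library. The sample markup is illustrative, and the relative `href` in it is an assumption: the real pages may use absolute URLs, which `urljoin` also handles.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class MarkdownLinkFinder(HTMLParser):
    """Records the href of the first <link rel="alternate" type="text/markdown">."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.href is not None:
            return
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if "alternate" in rels and (attr.get("type") or "").lower() == "text/markdown":
            self.href = attr.get("href")


def find_markdown_alternate(html: str, page_url: str):
    """Return the absolute Markdown URL advertised by the page, or None."""
    finder = MarkdownLinkFinder()
    finder.feed(html)
    return urljoin(page_url, finder.href) if finder.href else None


# Illustrative markup only; not copied from a real page.
SAMPLE_PAGE = """<html><head>
<title>Example</title>
<link rel="alternate" type="text/markdown" href="am-admin-interface-tools.md">
</head><body><p>Content</p></body></html>"""

PAGE_URL = "https://docs.pingidentity.com/pingam/8/setup/am-admin-interface-tools.html"
print(find_markdown_alternate(SAMPLE_PAGE, PAGE_URL))
```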
We generate the Markdown versions from the published HTML rather than exposing our raw source files. We write our documentation in AsciiDoc, and while most agents are perfectly happy to handle AsciiDoc, we do enough post-processing during our build pipeline that exposing the raw AsciiDoc wasn’t feasible. If we were going to have to process the files anyway, we might as well align with what has become an unofficial standard for AI-oriented docs. The Markdown generation happens at build time via an extension to our Antora build pipeline. Building the Markdown and the HTML from the same source content keeps them in sync: you can’t accidentally publish docs that have one without the other.
llms.txt indexes
Markdown alternates solve fetching a known page. But how does an agent know which pages exist?
The answer is llms.txt, an emerging standard for giving AI models a curated, structured index of a site — similar in spirit to robots.txt, but designed for context consumption rather than crawling control.
We generate llms.txt files as part of the same Markdown build pipeline, at two levels:
- Per-docset: https://docs.pingidentity.com/pingam/llms.txt lists every page in the PingAM docset with its title and Markdown URL. An agent working on a PingAM integration can load this file to get a full inventory of available content, then selectively fetch the pages relevant to its task.
- Site-wide: https://docs.pingidentity.com/llms.txt and https://developer.pingidentity.com/llms.txt aggregate the per-docset files into a single index organized by product.
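To show how an agent might turn such an index into a page inventory, here is a rough parser for the link-list layout that llms.txt files generally follow: `##` section headings with `- [title](url)` items, optionally followed by `: description`. The sample text is hypothetical (the section name and page titles are made up for illustration) and is not a copy of our actual files.

```python
import re
from typing import NamedTuple


class Entry(NamedTuple):
    section: str
    title: str
    url: str
    description: str


# Matches "- [Title](https://...)" with an optional ": description" suffix.
LINK = re.compile(
    r"^\s*[-*]\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?:\s*:\s*(?P<desc>.*))?$"
)


def parse_llms_txt(text: str) -> list[Entry]:
    """Collect link entries, tagging each with the H2 section it appears under."""
    entries = []
    section = ""
    for line in text.splitlines():
        if line.startswith("## "):
            section = line[3:].strip()
            continue
        match = LINK.match(line)
        if match:
            description = (match["desc"] or "").strip()
            entries.append(Entry(section, match["title"], match["url"], description))
    return entries


# Hypothetical index content, for illustration only.
SAMPLE_INDEX = """# PingAM

## Setup

- [AM admin interface tools](https://docs.pingidentity.com/pingam/8/setup/am-admin-interface-tools.md)
- [Example page](https://example.com/docs/example-page.md): A made-up entry with a description
"""

for entry in parse_llms_txt(SAMPLE_INDEX):
    print(entry.section, "|", entry.title, "|", entry.url)
```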
JSON-LD structured data
In parallel with the Markdown and llms.txt work, we added JSON-LD structured data to every documentation page. JSON-LD is a machine-readable format for expressing things like “this page is a TechArticle about PingAM, published on this date, by Ping Identity.”
Search engines and AI knowledge graphs already know how to consume this. Adding it to our pages helps both SEO and agent context — an agent parsing a docs page can read the JSON-LD block and immediately understand the product, product version, and content type rather than having to infer it from the natural language text. For a first pass, establishing the product/content-type relationship cleanly was the highest-value thing to ship, with room to iterate as needs evolve.
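For a sense of what reading that block looks like from the agent side, the sketch below pulls `application/ld+json` scripts out of a page and reports the content type, product, and version. The sample JSON-LD uses real schema.org terms (`TechArticle`, `about`, `softwareVersion`, `publisher`), but the exact properties on our pages are an assumption here and may differ.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects every parseable <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.blocks.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # Skip malformed blocks rather than failing the whole page.


def describe(html: str):
    """Return (content type, product name, product version) or None."""
    extractor = JsonLdExtractor()
    extractor.feed(html)
    for block in extractor.blocks:
        if isinstance(block, dict) and block.get("@type") == "TechArticle":
            about = block.get("about")
            about = about if isinstance(about, dict) else {}
            return block["@type"], about.get("name"), about.get("softwareVersion")
    return None


# Hypothetical markup; the property layout is assumed, not copied from a real page.
SAMPLE_JSONLD_PAGE = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org",
 "@type": "TechArticle",
 "headline": "Example page",
 "about": {"@type": "SoftwareApplication", "name": "PingAM", "softwareVersion": "8"},
 "publisher": {"@type": "Organization", "name": "Ping Identity"}}
</script>
</head><body></body></html>"""

print(describe(SAMPLE_JSONLD_PAGE))
```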
A “Docs for Agents” landing page
To tie all of this together for humans and agents alike, we published a Docs for Agents page explaining how everything fits together: where the Markdown alternates are, how the llms.txt files are structured, and recommended patterns for loading focused context on a specific product vs. discovering what’s available across the full site.
This isn’t a formal standard, but it’s a useful resource, and one that others in the industry also publish (Cloudflare has a similar page, for instance). It’s a way to signal to developers that “yes, we know you’re using AI agents, and yes, we have features in place to support that.”
Accept: text/markdown content negotiation
Some agents send an Accept: text/markdown header with their requests, expecting the server to return Markdown if it’s available. We’ve recently added support for this, including a fallback to the HTML version if the Markdown version isn’t available for some reason.
This approach is gaining a lot of traction as a more elegant way to handle Markdown alternates, since it doesn’t require the agent to manage two separate URLs for each page. The agent can just request the normal URL and let content negotiation handle the rest. It’s still early days for this pattern, and not all agents support it yet, but we wanted to be ready for it as it becomes more common. You can learn more about this approach at Accept: text/markdown.
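From the client side, content negotiation might look like the following sketch. It assumes the server labels a Markdown response with a `text/markdown` Content-Type, which is our assumption about the response rather than something stated above. The helper names are ours, and `fetch_docs` is defined but not called here because it needs network access.

```python
from urllib.request import Request, urlopen


def markdown_request(url: str) -> Request:
    """Build a request that prefers Markdown but still accepts HTML."""
    return Request(url, headers={"Accept": "text/markdown, text/html;q=0.9"})


def body_format(content_type: str) -> str:
    """Classify a Content-Type header value as 'markdown', 'html', or 'other'."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "text/markdown":
        return "markdown"
    if media_type == "text/html":
        return "html"
    return "other"


def fetch_docs(url: str):
    """Fetch a page and report which representation the server returned."""
    with urlopen(markdown_request(url)) as response:
        kind = body_format(response.headers.get("Content-Type", ""))
        return kind, response.read().decode("utf-8", errors="replace")
```

Checking the response type matters because of the HTML fallback: an agent that assumes it always receives Markdown would feed raw HTML into its context whenever the fallback applies.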
Try it out
If you’re building an integration and want to give your AI coding agent focused context on a Ping product:
- Point the agent at the product’s llms.txt: For example, https://docs.pingidentity.com/pingam/llms.txt for PingAM. The agent can scan the titles and selectively fetch the pages most relevant to what you’re building.
- Fetch Markdown instead of HTML: Swap .html for .md on any page URL, or extract the <link rel="alternate" type="text/markdown"> URL from the page <head>. If your agent supports it, you can also just send an Accept: text/markdown header and let content negotiation do the work.
- Not sure which product docset you need? Start at https://docs.pingidentity.com/llms.txt or https://developer.pingidentity.com/llms.txt for an index of everything available.
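The "scan the titles and selectively fetch" step can be as simple as keyword matching. The sketch below ranks index entries by how many task keywords appear in each title. It is a deliberately naive illustration, the titles and example.com URLs are made up, and a real agent would more likely let the model choose from the list.

```python
def pick_relevant(entries, keywords, limit=5):
    """Rank (title, url) pairs by the number of keywords found in the title."""
    wanted = [keyword.lower() for keyword in keywords]
    scored = []
    for title, url in entries:
        lowered = title.lower()
        score = sum(1 for keyword in wanted if keyword in lowered)
        if score:
            scored.append((score, title, url))
    # sort() is stable, so entries with equal scores keep their index order.
    scored.sort(key=lambda item: -item[0])
    return [(title, url) for _, title, url in scored[:limit]]


# Hypothetical index entries, for illustration only.
SAMPLE_ENTRIES = [
    ("Install the product", "https://example.com/docs/install.md"),
    ("Configure OAuth 2.0 clients", "https://example.com/docs/oauth-clients.md"),
    ("OAuth 2.0 token endpoint", "https://example.com/docs/token-endpoint.md"),
]

for title, url in pick_relevant(SAMPLE_ENTRIES, ["oauth", "token"]):
    print(title, "->", url)
```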
What’s next
Enriched page descriptions for llms.txt entries: Right now each entry in an llms.txt file is just the page title and URL. The spec recommends a short description (≤150 characters) alongside each link. We’re updating all of our docs pages to include these descriptions, which should help make the llms.txt indexes even more useful.
Docs MCP server: The longer-term vision is a Docs MCP server that exposes our documentation as callable tools rather than just static files. An agent using MCP would be able to search across products, retrieve specific sections, and get structured answers without having to manage the llms.txt/Markdown workflow manually. This is in active development and should be launching soon.
Monitoring and iteration: We’re actively monitoring how agents are using these features and gathering feedback from developers. The AI ecosystem is evolving rapidly, so we’re committed to iterating on our approach as best practices and standards emerge.
Do you have thoughts or questions on this article? Join the discussion on the Ping Identity developer community!
