Quick Answer
llms.txt is a proposed web standard file placed at a site's root directory that provides AI crawlers and large language models with a structured, markdown-formatted guide to the site's most important content for AI consumption.
Key Takeaways
llms.txt explicitly guides AI crawlers to your highest-value content—the AI equivalent of an XML sitemap
Include your 20–50 most authoritative, answer-dense pages with descriptive summaries
Update llms.txt whenever major new content is published to keep AI crawlers current
How llms.txt Works
Proposed by Answer.AI's Jeremy Howard in 2024, llms.txt is a markdown file placed at yourdomain.com/llms.txt that provides AI crawlers with a curated, human-readable guide to a site's content hierarchy. Where robots.txt tells crawlers what NOT to access, llms.txt tells AI systems what IS most valuable, listing key pages, their purpose, and the topics they cover in clean markdown. A companion file, llms-full.txt, can contain complete page content in a single AI-digestible document. Perplexity and several LLM providers have expressed interest in supporting the standard.
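To make the format concrete, here is a minimal Python sketch that renders an llms.txt file in the structure the proposal describes: an H1 with the site name, a blockquote summary, and H2 sections whose entries are markdown links optionally followed by `: notes`. The `Page` class, the `build_llms_txt` function, and every site name and URL below are hypothetical placeholders, not part of any official tooling.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """One curated llms.txt entry (hypothetical field names)."""
    title: str
    url: str
    summary: str  # one-sentence description written after the link


def build_llms_txt(site_name: str, site_summary: str,
                   sections: dict[str, list[Page]]) -> str:
    """Render an H1 title, a blockquote summary, then H2 link-list sections."""
    lines = [f"# {site_name}", "", f"> {site_summary}", ""]
    for heading, pages in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for page in pages:
            # Each entry is a markdown link, optionally followed by ": notes".
            lines.append(f"- [{page.title}]({page.url}): {page.summary}")
        lines.append("")
    # Normalize to exactly one trailing newline.
    return "\n".join(lines).rstrip() + "\n"


# Usage with placeholder content; the result would be served at /llms.txt.
doc = build_llms_txt(
    "Example Co",
    "Example Co makes hypothetical B2B analytics software.",
    {
        "Guides": [Page("What is llms.txt", "https://example.com/llms-txt",
                        "Definition and best practices.")],
        "Optional": [Page("Blog archive", "https://example.com/blog",
                          "Older posts.")],
    },
)
print(doc)
```

The proposal gives an H2 section titled "Optional" a special meaning: links listed there may be skipped when an AI system needs a shorter context, which makes it a reasonable home for secondary pages.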
Why llms.txt Matters for B2B Marketing
For B2B brands, llms.txt serves a strategic content curation function: rather than relying on AI crawlers to independently discover and prioritize your most important pages, you explicitly guide them to your pillar content, product pages, and authoritative resources. This is particularly valuable for large sites where AI crawlers may spend crawl budget on low-value pages and miss high-value content. It also allows brands to present their content in the clean, structured format that LLMs process most effectively.
llms.txt: Best Practices & Strategic Application
Best practices for llms.txt implementation include listing your top 20–50 most authoritative and AI-citable pages with descriptive titles and one-sentence summaries, organizing entries by topic cluster rather than by navigation hierarchy, prioritizing your most data-rich and answer-dense content, and updating the file whenever major new content is published. Keep the file to clean markdown: no JavaScript, no tracking scripts, and no HTML.
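Several of these practices can be checked mechanically. The sketch below, a hypothetical `check_llms_txt` helper, flags a missing H1 title, link entries without a summary, raw HTML tags, and a page count outside the 20–50 range recommended here. Its regular expressions are simplified assumptions that will miss edge cases, so treat it as a lint pass rather than a validator for the proposal.

```python
import re

# Assumed entry shape: "- [Title](URL)" optionally followed by ": summary".
ENTRY_RE = re.compile(
    r"^- \[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<summary>.*))?$"
)
# Crude detector for raw HTML tags such as <script> or <div>.
HTML_TAG_RE = re.compile(r"<\s*/?\s*[a-zA-Z][^>]*>")


def check_llms_txt(text: str, min_pages: int = 20, max_pages: int = 50) -> list[str]:
    """Return human-readable problems found in an llms.txt document."""
    problems = []
    lines = text.splitlines()
    if not lines or not lines[0].startswith("# "):
        problems.append("first line should be an H1 title (# Site name)")
    entry_count = 0
    for number, line in enumerate(lines, start=1):
        if HTML_TAG_RE.search(line):
            problems.append(f"line {number}: contains an HTML tag")
        match = ENTRY_RE.match(line.strip())
        if match:
            entry_count += 1
            if not (match.group("summary") or "").strip():
                problems.append(f"line {number}: link has no summary")
    if not min_pages <= entry_count <= max_pages:
        problems.append(f"{entry_count} linked pages; aim for {min_pages}-{max_pages}")
    return problems


# Usage on a deliberately flawed placeholder file.
sample = (
    "# Example Co\n\n> Summary.\n\n## Guides\n\n"
    "- [Guide](https://example.com/guide): A guide.\n"
    "- [Bare](https://example.com/bare)\n"
    "<script>track()</script>\n"
)
problems = check_llms_txt(sample)
for problem in problems:
    print(problem)
```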
Agency Perspective: llms.txt in Practice
MV3 implements llms.txt as part of our AI search optimization engagements, treating it as the AI equivalent of an XML sitemap. We curate the file to highlight the pages with the highest citation potential—definition content, original research, and comprehensive guides—and monitor AI citation rates before and after implementation to measure impact.
Frequently Asked Questions: llms.txt
What is llms.txt?
llms.txt is a proposed web standard file placed at a site's root directory that provides AI crawlers and large language models with a structured, markdown-formatted guide to the site's most important content for AI consumption.
Is llms.txt an official Google standard?
No. llms.txt is a community-proposed standard, not an official Google specification, and Google had not confirmed native support as of mid-2025. However, Perplexity and several AI providers have expressed interest in supporting it, and implementing it carries little downside risk while offering potential upside as AI crawler support expands.
How is llms.txt different from robots.txt?
robots.txt is a directive file that tells crawlers what NOT to access, and all major search engines support it. llms.txt is a guidance file that tells AI systems what IS most valuable; it is additive, not restrictive. The two serve complementary purposes, and an AI-search-optimized site should have both.
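Because the two files work together, one practical check is that nothing recommended in llms.txt is blocked by robots.txt. This sketch uses Python's standard `urllib.robotparser` to list such conflicts for one crawler user agent. `GPTBot` (OpenAI's crawler token) is only an example, the `blocked_llms_urls` name is hypothetical, and `urllib.robotparser` applies the first matching rule in a group, which may differ from how a given crawler resolves overlapping Allow and Disallow rules.

```python
import re
from urllib.robotparser import RobotFileParser

# Pull absolute URLs out of markdown links like "[Title](https://...)".
LINK_RE = re.compile(r"\]\((https?://[^)\s]+)\)")


def blocked_llms_urls(robots_txt: str, llms_txt: str, user_agent: str) -> list[str]:
    """Return URLs listed in llms.txt that robots.txt disallows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    urls = LINK_RE.findall(llms_txt)
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Placeholder files: the research report is recommended but blocked for GPTBot.
robots = "User-agent: GPTBot\nDisallow: /research/\n\nUser-agent: *\nAllow: /\n"
llms = (
    "# Example Co\n\n## Key pages\n\n"
    "- [Guide](https://example.com/guides/llms-txt): A guide.\n"
    "- [Report](https://example.com/research/2025-report): Original data.\n"
)
blocked = blocked_llms_urls(robots, llms, "GPTBot")
print(blocked)
```

Any URL this reports is a mixed signal: the page is promoted as high-value in llms.txt yet closed to that crawler in robots.txt, so one of the two files should change.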
Which pages should be included in llms.txt?
Prioritize: pillar content pages (comprehensive guides), product/service definition pages, original research and data reports, FAQ pages, case study index pages, and author bio pages (for E-E-A-T signals). Exclude: thin category pages, pagination, archive pages, and any content you wouldn't want AI systems synthesizing into search responses.
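Part of the exclusion list can be applied mechanically before human review. The sketch below drops candidate URLs that look like pagination, category or tag pages, or date archives. The `EXCLUDE_PATTERNS` regular expressions and the `select_pages` name are hypothetical and assume a common CMS URL scheme; URL patterns cannot judge whether a page is actually thin, so the remaining list still needs manual curation.

```python
import re
from urllib.parse import urlparse

# Hypothetical patterns for page types to exclude; adapt to your own URL scheme.
EXCLUDE_PATTERNS = [
    re.compile(r"/page/\d+/?$"),        # path pagination, e.g. /blog/page/2/
    re.compile(r"[?&]page=\d+"),        # query-string pagination, e.g. ?page=3
    re.compile(r"/(category|tag)/"),    # thin taxonomy pages
    re.compile(r"/\d{4}(/\d{2})?/?$"),  # date archives, e.g. /2024/05/
]


def select_pages(urls: list[str]) -> list[str]:
    """Keep candidate URLs that match none of the exclusion patterns."""
    keep = []
    for url in urls:
        parsed = urlparse(url)
        # Check the path plus query string so ?page=N pagination is caught.
        target = parsed.path + (f"?{parsed.query}" if parsed.query else "")
        if not any(pattern.search(target) for pattern in EXCLUDE_PATTERNS):
            keep.append(url)
    return keep


candidates = [
    "https://example.com/guides/llms-txt",
    "https://example.com/blog/page/2/",
    "https://example.com/blog?page=3",
    "https://example.com/category/seo/",
    "https://example.com/2024/05/",
    "https://example.com/research/2025-benchmark-report",
    "https://example.com/case-studies/",
]
kept = select_pages(candidates)
print(kept)
```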
MV3 Marketing helps B2B companies apply these strategies to drive measurable pipeline growth. Our team executes AI marketing for technology, SaaS, and professional services companies.