10 tips for writing non-commodity content to dominate LLM optimisation

Date: 21st May 2026

Most of what gets written for the web is commodity content. It paraphrases what already exists, restates the median view of a topic, and sits on a page that ranked well enough in 2019. Google rewarded volume and structure, so volume and structure are what people produced.

That content still gets cited by LLMs. Domain authority, backlinks, freshness, and traditional search signals continue to drive a large share of which pages get surfaced inside ChatGPT, Perplexity, Claude, and Google's AI Overviews. Many of these systems sit on top of conventional search indexes, so the old signals still feed the new pipeline.

What commodity content cannot do is defend itself against substitution. Two identical paraphrases of the same Wikipedia article are interchangeable, and a retrieval system will pick whichever has the stronger ranking signals on the day. Non-commodity content is content with original information, a specific point of view, and a structure that allows passages to be lifted cleanly, so that the retrieval system has fewer substitutes to choose from in the first place. Over time, that defensibility is what compounds.

The strongest research base for this is the Princeton, Georgia Tech, Allen AI, and IIT Delhi study published at ACM SIGKDD 2024 (Aggarwal et al., "GEO: Generative Engine Optimization", KDD '24). The team tested nine content modification tactics across 10,000 queries on a Bing-style generative engine and validated on Perplexity. Three tactics performed consistently well across their reported metrics: Cite Sources, Quotation Addition, and Statistics Addition. The reported improvements are in the order of 25 to 40 per cent depending on which of the paper's metrics you read (the main two being Position-Adjusted Word Count and Subjective Impression), so the specific number for each tactic varies by table. Keyword Stuffing, the tactic closest to traditional SEO, either did nothing or slightly hurt.

A note on evidence before the tips. Tips 2, 3, and 4 below map directly onto findings in that paper. The other seven are mainly practitioner reasoning that extends the same logic into related areas, and two of them (tips 7 and 8) lean partly on how retrieval systems work. I have flagged the difference within each tip rather than asking you to remember the distinction.

1. Answer the question in the first paragraph

This is practitioner reasoning about how retrieval works rather than a measured finding. Retrieval systems chunk pages into passages and score them. A passage that contains the direct answer near the top is easier to score as relevant than a passage that builds towards the answer over several paragraphs. Lead with the direct answer. Add nuance afterwards.
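As a rough illustration of the chunk-and-score step, the sketch below splits two versions of a page into passages and scores each passage by the share of query terms it contains. Everything in it is an assumption made for illustration: real retrieval systems use embeddings or ranking functions rather than raw word overlap, and the Acme widget pages, the query, and the function names are hypothetical.

```python
import re

def split_passages(page: str) -> list[str]:
    """Split a page into passages on blank lines (a stand-in for real chunking)."""
    return [p.strip() for p in re.split(r"\n\s*\n", page) if p.strip()]

def tokens(text: str) -> set[str]:
    """Lowercased word set; a crude substitute for embeddings or BM25."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def overlap_score(query: str, passage: str) -> float:
    """Fraction of query terms that appear in the passage."""
    q = tokens(query)
    return len(q & tokens(passage)) / len(q) if q else 0.0

def best_passage(query: str, page: str) -> tuple[float, str]:
    """Return (score, passage) for the highest-scoring passage on the page."""
    scored = [(overlap_score(query, p), p) for p in split_passages(page)]
    return max(scored, key=lambda item: item[0]) if scored else (0.0, "")

# Hypothetical query and pages, invented for this example.
QUERY = "how long is the acme widget warranty"

ANSWER_FIRST = """The Acme widget warranty is two years long from the date of purchase.

Registration is optional, and claims go through the retailer."""

ANSWER_SPREAD = """Buying decisions often depend on what happens after the sale.

Acme has sold its widget since launch with that in mind.

Cover runs for two years, which is long for this category."""

if __name__ == "__main__":
    for name, page in [("answer first", ANSWER_FIRST), ("answer spread", ANSWER_SPREAD)]:
        score, passage = best_passage(QUERY, page)
        print(f"{name}: {score:.2f} -> {passage}")
```

Under this toy scoring, the answer-first page has one passage that matches nearly the whole query, while the page that builds towards the answer spreads the same terms across three passages and none of them scores well on its own.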

It is also good practice for human readers, who do the same thing in reverse. They scan the first paragraph, decide whether you know what you are talking about, and either keep reading or close the tab.

2. Bring original data nobody else has

This is one of the research-backed tips. Statistics Addition was among the top-performing tactics in the GEO study, with reported improvements in the order of 30 to 40 per cent depending on the metric (Aggarwal et al., KDD 2024).

The intuition is about substitutability. If you publish a number that exists nowhere else, the retrieval system has fewer competing sources for that specific claim, which raises the probability that your page is the one selected. Grounding-focused systems are more likely to attribute correctly when they have a single clear source. The mechanism is probability and source scarcity, not the model "deciding" to cite you.
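The source-scarcity argument can be put into toy arithmetic. The sketch below assumes, purely for illustration, that a retrieval system picks one source for a claim with probability proportional to a ranking weight. That selection rule, the page names, and the weights are all hypothetical and are not taken from the GEO paper.

```python
def selection_probability(weights: dict[str, float], page: str) -> float:
    """Chance that `page` is picked if selection is proportional to ranking weight."""
    total = sum(weights.values())
    return weights[page] / total if total else 0.0

# Hypothetical weights for four pages that all carry the same commodity claim.
commodity_claim = {"your-page": 1.0, "rival-a": 3.0, "rival-b": 2.0, "rival-c": 4.0}

# Hypothetical case where only your page carries the original number.
original_claim = {"your-page": 1.0}

if __name__ == "__main__":
    print(selection_probability(commodity_claim, "your-page"))
    print(selection_probability(original_claim, "your-page"))
```

With these made-up weights, the weakest page in a field of four is selected one time in ten for the commodity claim, and every time for the claim that nobody else carries. The page's ranking weight is the same in both cases; only the number of substitutes has changed.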

Original data does not have to mean a 2,000-person survey. It can be:

Numbers from your own operations, anonymised where needed
Results from a controlled test you ran on your own stack
A comparison you measured yourself with documented methodology
An audit of a public dataset that nobody has audited that way before

If the number is yours, the citation is yours.

3. Quote named experts with attribution

Also research-backed. Quotation Addition was among the top-performing tactics in the GEO study, with reported improvements in a similar range to Statistics Addition depending on the metric and domain (Aggarwal et al., KDD 2024). The paper notes the lift is strongest in domains like People & Society, Explanation, and History, which often involve direct quotation in the source material.

Quotes are extractable as discrete passages, easy to attribute to a named speaker, and raise the credibility signal of the page. The quotes can come from interviews you conducted, from yourself if you have relevant standing, or from named figures in source documents you cite directly. What they cannot do is float in the page without a name and a context. "Industry experts agree" is not a quote. It is filler. Replace it with a person, a role, a date, and a sentence they actually said.

4. Cite primary sources, not aggregators

The third research-backed tip. Cite Sources was particularly effective for factual questions in the GEO study, and the paper reports the lift was disproportionately large for lower-ranked pages (Aggarwal et al., KDD 2024).

The intuition again is about substitutability and trust. A page that links to the original government report, the original peer-reviewed paper, or the original court ruling looks more like a node of authority than a page that links to a blog post that links to the report. When you find a stat in a Forbes article, do not cite Forbes. Find the study Forbes was citing and cite that. Read the abstract. Confirm the number matches. If it does not, you have a small scoop and a better article.

5. Write with a voice nobody else has

Practitioner reasoning, not a measured finding. The argument is that generic prose tends to blend into the training set and is easier for a retrieval system to substitute, while distinctive prose, with opinions, idioms, sentence rhythm, and arguments the author is willing to defend, is harder to substitute.

A passage that frames a question in a way the reader has not seen before may also carry information the model could not have generated by averaging the rest of the index, which gives the retrieval system a reason to favour it. This is a plausible mechanism, not measured fact, but it is consistent with how non-commodity content defends itself.

Take a position. Defend it. Be willing to be wrong on the record.

6. Find the gap and write the article that does not exist yet

Practitioner reasoning, not a measured finding. The logic follows from substitutability again. Before you write anything, ask the models the question you are about to answer. See which sources they cite. Read the cited sources. Look for what they all miss.

Often it is a sub-question that everyone gestures at and nobody actually answers. Sometimes it is a comparison nobody has run. Sometimes it is a basic definition that every article assumes the reader already knows. Write the piece that fills that gap precisely. You will not need to outrank anyone to be cited, because there are no competing sources for that specific point.
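One way to keep the gap hunt honest is to record it as data. The sketch below assumes you have read the cited sources by hand and noted which sub-questions each one answers; the code only finds the sub-questions that none of them covers. The sub-questions, source names, and the function name are hypothetical.

```python
def find_gaps(sub_questions: list[str], coverage: dict[str, set[str]]) -> list[str]:
    """Return the sub-questions that no cited source answers, in original order."""
    answered = set().union(*coverage.values())
    return [q for q in sub_questions if q not in answered]

# Hypothetical sub-questions behind one topic.
sub_questions = [
    "What is GEO?",
    "Which tactics were tested?",
    "How do I measure citations?",
]

# Hypothetical notes: which sub-questions each cited source answers.
coverage = {
    "source-a": {"What is GEO?"},
    "source-b": {"What is GEO?", "Which tactics were tested?"},
}

if __name__ == "__main__":
    for gap in find_gaps(sub_questions, coverage):
        print("Nobody answers:", gap)
```

The judgement about whether a source really answers a sub-question stays with you. The code only makes the uncovered ones hard to overlook.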

7. Structure each section to stand alone

Partly grounded in how retrieval-augmented systems work, partly practitioner reasoning about the specific tactics. Retrieval systems chunk pages into passages before the generating model ever sees them. A passage that depends on three earlier sections to make sense is more likely to be scored as low relevance. A passage that opens with a clear claim, supports it, and closes cleanly is more likely to be retained.

Practical rules:

One main idea per section under a descriptive H2 or H3
Restate the subject at the start of each section rather than using "it" or "this"
Keep paragraphs short, typically under five sentences
Put definitions and key numbers near the section heading, not buried at the end

Treat every H2 as a mini-article. If a reader landed on that section with no context, would they get the answer they needed?
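Two of the practical rules above are mechanical enough to check with a script. The sketch below flags a section whose first paragraph opens with a pronoun and any paragraph that runs to five sentences or more. It is a rough heuristic, not a standard tool: the pronoun list, the sentence splitter, and the function names are assumptions, and the splitter will miscount around abbreviations.

```python
import re

# Assumed list of openers that lean on earlier context.
AMBIGUOUS_OPENERS = ("it ", "this ", "that ", "these ", "those ", "they ")
SENTENCE_LIMIT = 5  # "typically under five sentences"

def count_sentences(paragraph: str) -> int:
    """Rough count: split after ., !, or ? when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    return len([p for p in parts if p])

def lint_section(heading: str, body: str) -> list[str]:
    """Return warnings for one section; paragraphs are separated by blank lines."""
    warnings = []
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", body) if p.strip()]
    if paragraphs and paragraphs[0].lower().startswith(AMBIGUOUS_OPENERS):
        warnings.append(f"{heading}: opens with a pronoun instead of restating the subject")
    for number, paragraph in enumerate(paragraphs, start=1):
        sentences = count_sentences(paragraph)
        if sentences >= SENTENCE_LIMIT:
            warnings.append(f"{heading}: paragraph {number} has {sentences} sentences")
    return warnings

if __name__ == "__main__":
    body = "It works by chunking. One. Two. Three. Four."
    for warning in lint_section("How chunking works", body):
        print(warning)
```

A clean result does not mean the section stands alone. It only means two of the easier failures are absent, so the "no context" read-through is still worth doing by hand.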

8. Name specific entities, versions, and dates

Practitioner reasoning, with partial grounding in how entity disambiguation works inside retrieval. "A leading e-commerce platform" is harder for a retriever to anchor to a specific entity than "Shopify, which expanded its B2B features to merchants outside the Plus tier in 2025". The latter contains a named entity, an event, and a year, all of which can be matched against the model's prior knowledge of that entity.

The same applies to people, products, regulations, theme versions, and frameworks. Use the full name on first reference. Use the version number if there is one. Name the date the thing happened or shipped. Whether this directly improves citation rates has not been measured as far as I know, but it makes the passage more specific and less interchangeable.
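A crude self-check for this tip is to scan a draft for vague references and missing dates. The sketch below is a heuristic under stated assumptions: the list of vague adjectives and the year pattern are my own choices, it does no real entity recognition, and the function name is hypothetical.

```python
import re

# Assumed pattern for phrases such as "a leading ..." or "one popular ...".
VAGUE = re.compile(
    r"\b(?:a|an|one)\s+(?:leading|major|popular|well-known|top)\b",
    re.IGNORECASE,
)
# Four-digit years from 1900 to 2099.
YEAR = re.compile(r"\b(?:19|20)\d{2}\b")

def specificity_report(sentence: str) -> dict[str, bool]:
    """Flag a vague reference and note whether the sentence names a year."""
    return {
        "vague_reference": bool(VAGUE.search(sentence)),
        "has_year": bool(YEAR.search(sentence)),
    }

if __name__ == "__main__":
    examples = [
        "A leading e-commerce platform",
        "Shopify, which expanded its B2B features to merchants outside the Plus tier in 2025",
    ]
    for sentence in examples:
        print(specificity_report(sentence), "<-", sentence)
```

On the two phrasings from this section, the first is flagged as vague with no year and the second passes both checks. Whether the named entity is the right one is still something only a reader can judge.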

9. Show your working with examples and case studies

Practitioner reasoning. Abstract advice sounds the same as every other piece of abstract advice and is therefore easy to substitute. A worked example with a real client, a real number, and a real before-and-after is harder to commoditise because it contains specific data points that exist nowhere else.

If you are bound by NDA, anonymise. "A Devon-based homeware retailer running Shopify Impulse 8.1.0" is still specific enough to be useful and still hard to substitute. The shape of the case matters more than the brand name.

10. Build entity authority off your own site

Practitioner reasoning rather than a measured finding from the GEO paper. The argument, widely held in GEO circles but not cleanly proven, is that a model's likelihood of citing your brand depends partly on how often your brand appears across independent sources. Pages that earn citations tend to belong to entities mentioned consistently across Wikipedia, Reddit threads, podcast transcripts, industry directories, and the press, which strengthens entity recognition during retrieval.

This is slower to pay off than the on-page tips above, and it has to be earned rather than bought. Write guest pieces with primary sources. Get quoted by name in trade press. Maintain consistent author bios with the same credentials across every byline. The aim is for the model's retrieval and grounding layers to recognise your name as a stable, well-attested entity in your field rather than a string of text that appears on one website.

What this article does not cover

Traditional SEO signals still drive a large share of LLM citation surface area. Domain authority, backlink profile, freshness, schema markup, and crawlability all continue to matter, often more than any on-page content tactic. Non-commodity content sits on top of those signals rather than replacing them. If your domain has no authority and no backlinks, original data and named quotes alone will not get you cited at meaningful volume.

How to measure this

Measurement for LLM visibility is still primitive. The honest answer is that nobody has a clean equivalent of Search Console for ChatGPT yet. Practical proxies:

Prompt the major models with questions in your niche and log which sources they cite
Track brand mentions across AI answers using one of the emerging tools, then sense-check the output by hand
Watch referrer traffic from chatgpt.com (formerly chat.openai.com), perplexity.ai, and similar
Monitor whether your traditional Google rankings hold as AI Overviews grow

Treat all of these as directional. The space is moving fast enough that any specific tactic will need to be retested in six months.
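The first proxy, logging which sources the models cite, is easy to tally once the log exists. The sketch below assumes you have already collected the answers by hand or through a tool and stored the cited URLs for each prompt. The log format, the model labels, the example.com and example.org URLs, and the function names are all hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse

def cited_domains(log: list[dict]) -> Counter:
    """Count the number of logged answers in which each domain is cited."""
    counts: Counter = Counter()
    for entry in log:
        # Count a domain once per answer so one link-heavy answer does not dominate.
        domains = {
            urlparse(url).netloc.lower().removeprefix("www.")
            for url in entry["cited_urls"]
        }
        counts.update(domains)
    return counts

def share_of_answers(log: list[dict], domain: str) -> float:
    """Fraction of logged answers that cite the given domain at least once."""
    return cited_domains(log)[domain] / len(log) if log else 0.0

# Hypothetical log entries; replace with your own prompts and observed citations.
log = [
    {
        "model": "model-a",
        "prompt": "best way to structure a product page",
        "cited_urls": ["https://www.example.com/guide", "https://example.org/report"],
    },
    {
        "model": "model-b",
        "prompt": "best way to structure a product page",
        "cited_urls": ["https://example.com/guide", "https://example.com/other"],
    },
]

if __name__ == "__main__":
    for domain, answers in cited_domains(log).most_common():
        print(domain, answers)
```

Rerunning the same prompts on a schedule and comparing the tallies gives a directional trend. Answers vary from run to run, so small changes in the counts should not be read as signal.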

A closing note on the trade-off

Non-commodity content is more expensive to produce than commodity content. It needs research, opinions, evidence, and a willingness to be specific in ways that can be checked.

That is also why it works. Most competitors will not pay that cost. The pages that do pay it become the citation sources for everybody else. Over time, those pages compound. The article you write this quarter with original data and named sources is still extractable next year, while the paraphrase-of-a-paraphrase has already been replaced by a fresher paraphrase.

That is the bet. Write fewer pieces. Make each one impossible to replace.

Source

Aggarwal, P., Murahari, V., Rajpurohit, T., Kalyan, A., Narasimhan, K., and Deshpande, A. (2024). "GEO: Generative Engine Optimization." Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '24), pp. 5 to 16. https://doi.org/10.1145/3637528.3671900. Preprint: arXiv:2311.09735.

Note on figures from the paper: the paper reports results across multiple metrics (Position-Adjusted Word Count and Subjective Impression being the headline two) and across multiple engine configurations. Specific percentage uplifts for individual tactics vary by table and metric. The directional finding (Cite Sources, Quotation Addition, and Statistics Addition outperformed baseline and outperformed Keyword Stuffing) is consistent across the paper.

About the Author

Billy Lindon - E-commerce, Web Design, and Digital Marketing Specialist

Billy brings over three decades of experience in technology, sales, and marketing to the fields of e-commerce and product page optimisation. His expertise stems from a diverse background, including extensive formal sales training and close collaboration with sales teams during his time at Nokia. This experience provided him with crucial insights into customer needs and effective selling strategies.

As a web designer, Billy is intimately familiar with UI design and UX principles, which he applies to create user-friendly and conversion-optimised e-commerce sites. His years of hands-on work building and optimising Shopify stores for a variety of businesses have given him deep insights into creating compelling product pages that not only look appealing but also deliver an intuitive user experience, maximising conversion rates in real-world e-commerce environments.

Currently, Billy applies his blend of sales expertise and design knowledge to help small businesses establish and improve their online presence through Shopify and Squarespace platforms. His approach combines data-driven marketing strategies with a thorough understanding of e-commerce best practices and web design principles. This article draws from his years of direct experience in sales, digital marketing, web design, and e-commerce strategy, offering practical insights for businesses looking to enhance their product pages and compete effectively in the global online marketplace.
