Experiment VRP Research ● Running

EXP-11 · HTTP Header Canonical & X-Robots-Tag on Non-HTML Resources

Google documents support for Link: <...>; rel="canonical" and X-Robots-Tag in HTTP response headers for non-HTML resources. This experiment measures the actual behavior.

Hypothesis

Googlebot parses the Link and X-Robots-Tag HTTP response headers on non-HTML resources (PDFs, images, JSON). Edge cases where headers conflict with HTML meta tags, or where headers point cross-origin, are under-tested publicly and may expose logic bugs — e.g. unintended deindexing of referring pages, or cross-origin canonical acceptance that transfers ranking signals in ways the spec does not intend.

Probes

Probe                 Resource  HTTP Header                                   What it tests
pdf-link-canonical    PDF       Link: <target-a>; rel="canonical"             Does Google canonicalise a PDF via HTTP header?
png-xrobots-noindex   PNG       X-Robots-Tag: noindex                         Does noindex on image bleed to the referring page?
html-header-vs-meta   HTML      Link header → target-c; HTML meta → target-a  Which wins: HTTP header or HTML meta?
json-link-canonical   JSON      Link: <target-a>; rel="canonical"             Is JSON even indexed? If so, is the Link header honored?
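
As a rough illustration of how the four probes could be served, the sketch below maps each probe path to its content type and extra response header using Python's standard http.server. The base URL, the paths, the placeholder bodies, and the names PROBES, headers_for, and serve are all assumptions made for this example; the site's real routes and implementation are not given here. The in-page declaration of the conflict probe is assumed to be a link rel="canonical" element.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical base URL and paths; the real routes are not stated in the write-up.
BASE = "https://example.test"
TARGET_A = f"{BASE}/target-a"
TARGET_C = f"{BASE}/target-c"

# probe path -> (Content-Type, extra response headers, body)
# Bodies are placeholder bytes, not valid PDF/PNG files.
PROBES = {
    "/pdf-link-canonical": (
        "application/pdf",
        {"Link": f'<{TARGET_A}>; rel="canonical"'},
        b"%PDF-1.4\n",
    ),
    "/png-xrobots-noindex": (
        "image/png",
        {"X-Robots-Tag": "noindex"},
        b"\x89PNG\r\n\x1a\n",
    ),
    "/html-header-vs-meta": (
        "text/html; charset=utf-8",
        # The HTTP header points at target-c, the in-page tag at target-a.
        {"Link": f'<{TARGET_C}>; rel="canonical"'},
        f'<!doctype html><link rel="canonical" href="{TARGET_A}">'
        "<title>probe</title>".encode(),
    ),
    "/json-link-canonical": (
        "application/json",
        {"Link": f'<{TARGET_A}>; rel="canonical"'},
        b'{"probe": "json-link-canonical"}',
    ),
}


def headers_for(path):
    """Return the response headers a probe path would be served with."""
    content_type, extra, _body = PROBES[path]
    return {"Content-Type": content_type, **extra}


class ProbeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        probe = PROBES.get(self.path)
        if probe is None:
            self.send_error(404)
            return
        _content_type, _extra, body = probe
        self.send_response(200)
        for name, value in headers_for(self.path).items():
            self.send_header(name, value)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Serve the probes on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), ProbeHandler).serve_forever()
```

Calling serve() and then requesting a probe path with curl -i would show the headers a crawler receives for that probe.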

Canonical Targets

  • Target A — canonical destination for 3 probes (pdf-link-canonical, json-link-canonical, and the HTML meta of html-header-vs-meta)
  • Target C — canonical destination for the conflict probe's HTTP header
  • Baseline — control page, should remain independently indexed

Cross-Origin Variant

A mirrored receiver on genbox.cloud serves the same four probes with canonical headers pointing to this site (genbox.app). The question: does Googlebot accept cross-origin canonical directives on non-HTML resources? RFC 6596 permits a canonical target on a different hostname, so the header is valid as written; how Google actually treats it on non-HTML resources is the research target.
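
To make "cross-origin" concrete, the sketch below extracts the canonical target from a Link header value and compares its origin (scheme, host, port) with that of the resource that served it. The parser is deliberately simplified and does not cover all of RFC 8288 (for example, commas inside a URL would confuse it). The function names and the probe paths in the usage note are hypothetical.

```python
import re
from urllib.parse import urljoin, urlsplit


def canonical_from_link(header_value, base_url):
    """Return the absolute rel="canonical" target of a Link header, or None."""
    # Each link-value looks like: <target>; param=value; ...
    for match in re.finditer(r"<([^>]*)>([^,]*)", header_value):
        target, params = match.group(1), match.group(2)
        rel = re.search(r';\s*rel\s*=\s*"?([^";]*)"?', params, re.IGNORECASE)
        if rel and "canonical" in rel.group(1).lower().split():
            # Relative targets are resolved against the resource URL.
            return urljoin(base_url, target)
    return None


def origin(url):
    """Return (scheme, host, port), filling in the default port."""
    parts = urlsplit(url)
    default_port = {"http": 80, "https": 443}.get(parts.scheme)
    return (parts.scheme, parts.hostname, parts.port or default_port)


def is_cross_origin(resource_url, link_header):
    """True if the header declares a canonical on a different origin."""
    target = canonical_from_link(link_header, resource_url)
    return target is not None and origin(target) != origin(resource_url)
```

For example, is_cross_origin("https://genbox.cloud/probe.pdf", '<https://genbox.app/target-a>; rel="canonical"') evaluates to True, while the same header served from a genbox.app URL evaluates to False.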

Embedded PNG (triggers X-Robots bleed-through test)

The following image is the probe with X-Robots-Tag: noindex. If this page becomes deindexed after Googlebot fetches the image, that's a bug.
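
The bleed-through test rests on the directive being scoped to the response that carries it. As a minimal sketch of that scoping, the function below decides whether a set of X-Robots-Tag values declares noindex for a given crawler, treating an optional leading "name:" as a user-agent scope and "none" as implying noindex. The function name and the simplified parsing are assumptions for illustration, not Google's parser.

```python
# Directives that take a value after a colon, so a leading "name:" is
# not mistaken for a user-agent scope.
VALUED_DIRECTIVES = {
    "unavailable_after",
    "max-snippet",
    "max-image-preview",
    "max-video-preview",
}


def applies_noindex(header_values, user_agent="googlebot"):
    """Return True if any X-Robots-Tag value declares noindex for user_agent."""
    for raw in header_values:
        head, sep, rest = raw.partition(":")
        scope = head.strip().lower()
        if sep and "," not in head and scope not in VALUED_DIRECTIVES:
            # "googlebot: noindex" style: the rule is scoped to one crawler.
            if scope != user_agent:
                continue
            raw = rest
        directives = {d.strip().lower() for d in raw.split(",")}
        if directives & {"noindex", "none"}:
            return True
    return False


# Headers as they would be recorded per fetched URL (hypothetical paths).
recorded = {
    "/exp-11": [],                       # the page embedding the image
    "/png-xrobots-noindex": ["noindex"], # the image probe
}
expected_noindex = {url: applies_noindex(v) for url, v in recorded.items()}
```

Under this reading only the image is expected to be noindexed; the embedding page carries no directive of its own.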

How to Read the Results

  1. Request indexing for each probe URL in GSC URL Inspection
  2. Wait for Googlebot crawl (visible in /api/hp/results)
  3. Re-inspect each probe URL → note "User-declared canonical" and "Google-selected canonical"
  4. Inspect this page + target-a + target-c → compare
  5. Any divergence from Google's documented behavior is a research finding
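
The comparison in steps 3 to 5 can be sketched as a small helper that takes canonical values copied by hand from URL Inspection and lists the probes where the Google-selected canonical differs from the user-declared one. The Observation record, the diverging function, and the URLs in the usage note are hypothetical; the format of /api/hp/results is not described here, so this sketch does not read from it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Observation:
    probe_url: str
    user_declared: Optional[str]    # "User-declared canonical" from GSC
    google_selected: Optional[str]  # "Google-selected canonical" from GSC


def normalise(url: Optional[str]) -> Optional[str]:
    """Ignore a trailing slash so /target-a and /target-a/ compare equal."""
    return url.rstrip("/") if url else url


def diverging(observations: List[Observation]) -> List[Observation]:
    """Return observations where Google chose a different canonical."""
    return [
        o for o in observations
        if normalise(o.google_selected) != normalise(o.user_declared)
    ]
```

As a usage illustration with made-up placeholder values (not experiment results): an Observation whose user_declared is "https://example.test/target-a" and whose google_selected is "https://example.test/pdf-link-canonical" would be returned by diverging, while one where both fields name the same target would not.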

Scope

Every probe serves benign content (blank PDF, 1px PNG, innocuous JSON). Canonical targets point only at pages on this site. The cross-origin variant uses genbox.cloud, which is owned by the same author. No third-party resources are involved. Findings will be documented and, if a logic bug is confirmed, reported to Google VRP.
