EXP-11 · HTTP Header Canonical & X-Robots-Tag on Non-HTML Resources
Google documents support for Link: <...>; rel="canonical" and X-Robots-Tag in HTTP response headers for non-HTML resources. This experiment measures the actual behavior.
Hypothesis
Googlebot parses the Link and X-Robots-Tag HTTP response headers on
non-HTML resources (PDFs, images, JSON). Edge cases where headers conflict with HTML meta tags,
or where headers point cross-origin, are under-tested publicly and may expose logic bugs —
e.g. unintended deindexing of referring pages, or cross-origin canonical acceptance that
transfers ranking signals in ways the spec does not intend.
Probes
| Probe | Resource | HTTP Header | What it tests |
|---|---|---|---|
| pdf-link-canonical | PDF | Link: <target-a>; rel="canonical" | Does Google canonicalise a PDF via HTTP header? |
| png-xrobots-noindex | PNG | X-Robots-Tag: noindex | Does noindex on an image bleed to the referring page? |
| html-header-vs-meta | HTML | Link header → target-c; HTML meta → target-a | Which wins: HTTP header or HTML meta? |
| json-link-canonical | JSON | Link: <target-a>; rel="canonical" | Is JSON even indexed? If so, is the Link header honored? |
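To make the probe table concrete, the following is a minimal sketch of how the four responses could be served with Python's standard library. The base URL paths (`/target-a`, `/target-c`, and the probe paths), the empty placeholder bodies, and the use of a `<link rel="canonical">` element for the HTML "meta" canonical are assumptions for illustration; the actual server behind these probes is not described here.

```python
from http.server import BaseHTTPRequestHandler

# Hypothetical URLs: the real probe and target paths are not given in the text.
BASE = "https://genbox.app"
TARGET_A = BASE + "/target-a"
TARGET_C = BASE + "/target-c"


def canonical_link(url: str) -> str:
    """Format a Link header value that declares `url` as canonical."""
    return f'<{url}>; rel="canonical"'


# path -> (Content-Type, extra response headers), mirroring the probe table.
PROBES = {
    "/pdf-link-canonical": ("application/pdf", {"Link": canonical_link(TARGET_A)}),
    "/png-xrobots-noindex": ("image/png", {"X-Robots-Tag": "noindex"}),
    "/html-header-vs-meta": ("text/html; charset=utf-8", {"Link": canonical_link(TARGET_C)}),
    "/json-link-canonical": ("application/json", {"Link": canonical_link(TARGET_A)}),
}

# Only the conflict probe needs a meaningful body: its in-page canonical
# (assumed to be a <link> element) disagrees with its Link header.
BODIES = {
    "/html-header-vs-meta": (
        f'<!doctype html><html><head><link rel="canonical" href="{TARGET_A}">'
        "</head><body>probe</body></html>"
    ).encode("utf-8"),
}


def probe_headers(path: str) -> dict:
    """Return the response headers for a probe path."""
    content_type, extra = PROBES[path]
    return {"Content-Type": content_type, **extra}


class ProbeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path not in PROBES:
            self.send_error(404)
            return
        # Placeholder: a real probe would serve a blank PDF, 1px PNG, etc.
        body = BODIES.get(self.path, b"")
        self.send_response(200)
        for name, value in probe_headers(self.path).items():
            self.send_header(name, value)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

The handler can be mounted with `http.server.HTTPServer(("", 8000), ProbeHandler).serve_forever()` for local inspection of the headers with a tool such as `curl -I`.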
Canonical Targets
- Target A — canonical destination for three probes (the PDF and JSON Link headers, and the conflict probe's HTML meta)
- Target C — canonical destination for the conflict probe's HTTP header
- Baseline — control page, should remain independently indexed
Cross-Origin Variant
A mirrored receiver on genbox.cloud serves the same four probes with canonical
headers pointing to this site (genbox.app). The question: does Googlebot accept
cross-origin canonical directives on non-HTML resources? Spec says origins should match. Real
behavior is the research target.
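For logging which probe responses carry a cross-origin canonical, a small helper along these lines may be useful. It is a sketch only: the Link header parser is deliberately simplified (it does not handle commas or semicolons inside quoted parameter values), and the example URLs are hypothetical.

```python
import re
from urllib.parse import urlsplit

# Matches one link-value: <target> followed by zero or more ;param segments.
_LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[^;,]+)*)')
_DEFAULT_PORTS = {"http": 80, "https": 443}


def canonical_from_link(header: str):
    """Return the first target whose rel includes "canonical", else None."""
    for match in _LINK_RE.finditer(header):
        url, params = match.group(1), match.group(2)
        for param in params.split(";"):
            name, _, value = param.strip().partition("=")
            rels = value.strip().strip('"').lower().split()
            if name.strip().lower() == "rel" and "canonical" in rels:
                return url
    return None


def origin(url: str):
    """Return (scheme, host, port) with default ports filled in."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    port = parts.port or _DEFAULT_PORTS.get(scheme)
    return (scheme, parts.hostname or "", port)


def is_cross_origin(resource_url: str, canonical_url: str) -> bool:
    """True when the canonical target lives on a different origin."""
    return origin(resource_url) != origin(canonical_url)


# Hypothetical probe URL on the mirror, pointing back at this site.
header = '<https://genbox.app/target-a>; rel="canonical"'
target = canonical_from_link(header)
print(target, is_cross_origin("https://genbox.cloud/pdf-link-canonical", target))
```

Comparing the full (scheme, host, port) tuple rather than only the hostname means an `http` to `https` canonical is also reported as cross-origin, which seems the safer default for this experiment.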
Embedded PNG (triggers X-Robots bleed-through test)
The following image is the probe with X-Robots-Tag: noindex. If this page
becomes deindexed after Googlebot fetches the image, that's a bug.
How to Read the Results
- Request indexing for each probe URL in GSC URL Inspection
- Wait for Googlebot crawl (visible in /api/hp/results)
- Re-inspect each probe URL → note "User-declared canonical" and "Google-selected canonical"
- Inspect this page + target-a + target-c → compare
- Any divergence from Google's documented behavior is a research finding
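The comparison step can be sketched as a small script over hand-transcribed inspection results. The `Inspection` record and its field names are hypothetical, not a GSC export format, and the URLs are placeholders. Because a canonical declaration is treated by Google as a hint, a mismatch flagged here is only a candidate for closer review, not a confirmed finding.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Inspection:
    """One URL Inspection result, transcribed by hand (hypothetical shape)."""
    url: str
    user_canonical: Optional[str]    # "User-declared canonical"
    google_canonical: Optional[str]  # "Google-selected canonical"


def mismatched(records: List[Inspection]) -> List[str]:
    """Return URLs where a declared canonical exists but Google chose another."""
    return [
        r.url
        for r in records
        if r.user_canonical is not None and r.user_canonical != r.google_canonical
    ]


# Illustrative values only; these are not measured results.
records = [
    Inspection(
        "https://genbox.app/pdf-link-canonical",
        "https://genbox.app/target-a",
        "https://genbox.app/target-a",
    ),
    Inspection(
        "https://genbox.app/html-header-vs-meta",
        "https://genbox.app/target-c",
        "https://genbox.app/target-a",
    ),
]
print(mismatched(records))
```

The comparison is a strict string match, so trailing-slash or case differences would need to be normalised before the records are passed in.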
Scope
Every probe serves benign content (blank PDF, 1px PNG, innocuous JSON). Canonical targets
point only at pages on this site. The cross-origin variant uses genbox.cloud,
which is owned by the same author. No third-party resources are involved. Findings will be
documented and, if a logic bug is confirmed, reported to
Google VRP.