Canvassing the Fingerprinters | Undetect Research

1. Executive Summary

The paper measures modern canvas fingerprinting at web scale and shows that the test canvas itself can identify the fingerprinter. Identical rendered canvases group deployments across sites, making vendor footprints visible even when script origins are masked, bundled, routed through subdomains, or served from common infrastructure.

The authors found canvas fingerprinting on 12.7% of successfully crawled popular sites and 9.9% of successfully crawled tail sites. They also found that the ecosystem is concentrated: popular sites generated 504 unique fingerprinting canvases, tail sites generated 288, and twelve attributed services covered most sites that generated test canvases.

Operational takeaway: classifying fingerprinting by script domain alone is fragile. The paper supports reviewing rendered canvas output, first-party serving context, blocklist behavior, and repeated-render checks as separate signals.

2. Research Question

The paper asks how common canvas fingerprinting is in 2025, whether identical test canvases can reveal the service or vendor responsible, and what deployment context suggests about tracking, security, advertising, and defense evasion.

Prevalence

How many popular and less-popular homepages extract canvases that look fingerprintable under the paper's heuristics?

Clustering

Can identical toDataURL outputs group sites using the same deterministic test canvas?

Attribution

Can demos, known customers, script URL patterns, and script contents tie canvas groups to named fingerprinting services?

Context

Do blocklists, ad blockers, first-party serving, subdomain routing, and repeated-render checks change how these deployments should be interpreted?

3. Method

The crawl used the May 2025 Tranco ranking and visited homepages from two site sets: the top 20,000 sites and a random sample of 20,000 sites ranked 20,001 through 1,000,000. The core prevalence analysis used 16,276 successfully crawled popular sites and 17,260 successfully crawled tail sites.

Step	Paper method	Review significance
Corpus	Top-level pages from popular and tail Tranco site sets in May 2025.	Results are homepage crawl measurements, not whole-site or whole-web rates.
Instrumentation	Modified DuckDuckGo Tracker Radar Collector, a Puppeteer-based crawler, to intercept Canvas API calls and property accesses.	Records script source URLs, arguments, return values, and timestamps around canvas behavior.
Rendering control	Main crawl ran on one Intel Ubuntu 22.04.2 LTS machine; a second Apple M1 crawl checked grouping behavior.	Canvas bytes differed across machines, but identical-canvas site grouping stayed consistent in validation.
Consent and behavior	The crawler handled some anti-bot checks, scrolled pages, waited five seconds, and used autoconsent to opt in to common consent banners.	Automated measurement still may miss behavior behind login, checkout, or specific user actions.
Attribution	Vendor attribution combined public demos, known customer crawls, script URL patterns, and script-content patterns.	Attribution is evidence-backed but should be treated as a review classification, not complete ground truth.
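
The tail sample described above can be illustrated with a short sketch. This is not the authors' sampling code: the `mulberry32` generator, the seed value, and the `sampleTailRanks` helper are assumptions introduced only to show one reproducible way to draw 20,000 distinct ranks from 20,001 through 1,000,000.

```javascript
// Hypothetical sketch: the paper does not state which generator or seed was used.
// mulberry32 is a small seeded PRNG, chosen here only for reproducibility.
function mulberry32(seed) {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Draw `sampleSize` distinct ranks uniformly from [minRank, maxRank].
function sampleTailRanks({ minRank, maxRank, sampleSize, seed }) {
  const span = maxRank - minRank + 1;
  if (sampleSize > span) {
    throw new RangeError("sampleSize exceeds the number of available ranks");
  }

  const random = mulberry32(seed);
  const chosen = new Set();

  // Rejection sampling: duplicates are simply ignored by the Set.
  while (chosen.size < sampleSize) {
    chosen.add(minRank + Math.floor(random() * span));
  }

  return [...chosen].sort((a, b) => a - b);
}

const tailRanks = sampleTailRanks({
  minRank: 20001,
  maxRank: 1000000,
  sampleSize: 20000,
  seed: 2025 // arbitrary illustrative seed
});

console.log({ sampled: tailRanks.length, lowest: tailRanks[0], highest: tailRanks[tailRanks.length - 1] });
```

Rejection sampling is adequate here because the sample covers only about 2% of the rank range, so collisions are rare.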

4. Fingerprintable Canvas Detection

The paper focuses on extracted canvases because a toDataURL value captures the rendered result of previous Canvas API calls. It then removes likely benign cases with three high-level filters: lossy JPEG/WebP outputs, canvases smaller than 16 by 16 pixels, and canvases generated by scripts that also invoked animation-associated methods such as save and restore.

Classifier result. After filtering, the paper classified 83% of extracted canvases across the popular and tail crawls as fingerprintable.
Manual checks. The authors found all 200 sampled excluded canvases benign and found two false positives in a sample of 300 unique fingerprintable canvases; both false positives appeared on only one domain.
Scope. The method is tuned for extracted deterministic test canvases, not every possible use of the Canvas API.

Figure 2 crop: examples of small canvases excluded from the paper's fingerprintable-canvas analysis as likely benign.

5. Prevalence and Clustering

The central measurement result is moderate but widespread deployment. The crawler found 2,067 popular sites and 1,715 tail sites extracting at least one fingerprintable canvas. Among sites that generated a test canvas, the six most frequent canvases accounted for 70.1% of popular sites and 47.1% of tail sites.

2,067 popular sites: 12.7% of 16,276 successfully crawled top-20k sites extracted at least one fingerprintable canvas.

1,715 tail sites: 9.9% of 17,260 successfully crawled tail sites extracted at least one fingerprintable canvas.

3.31 canvases per site: average number of fingerprintable canvases per site, with a median of 2 and a maximum of 60.

483 popular sites: the most common popular-site canvas appeared on about 3% of successfully crawled popular sites.
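
The percentages above follow directly from the reported counts. The sketch below only recomputes them; the `crawlCounts` object and the `pct` helper are illustrative names, and the denominators are the successfully crawled homepages rather than the full site sets.

```javascript
// Counts as reported in this summary; object and helper names are illustrative.
const crawlCounts = {
  popular: { crawled: 16276, fingerprinting: 2067, mostCommonCanvasSites: 483 },
  tail: { crawled: 17260, fingerprinting: 1715 }
};

// Percentage rounded to one decimal place.
const pct = (part, whole) => Number(((part / whole) * 100).toFixed(1));

const prevalencePercent = {
  popular: pct(crawlCounts.popular.fingerprinting, crawlCounts.popular.crawled),
  tail: pct(crawlCounts.tail.fingerprinting, crawlCounts.tail.crawled),
  mostCommonPopularCanvas: pct(
    crawlCounts.popular.mostCommonCanvasSites,
    crawlCounts.popular.crawled
  )
};

console.log(prevalencePercent);
```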

Figure 1 crop: top-50 test canvas frequencies for popular and tail sites, showing a concentrated head and a long tail.

The paper treats identical test-canvas groups as an upper bound on a single organization's cross-site reach. The caveat cuts both ways: two organizations could independently choose the same canvas, so a group can overstate one organization's reach, while some vendors deliberately use per-site canvases, so grouping can also miss part of a vendor's footprint.
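
Grouping by identical output can be sketched in a few lines. The example below is a minimal illustration under stated assumptions, not the paper's pipeline: the `observations` rows, site names, and data URL strings are made up, and `groupSitesByCanvas` and `topKSiteCoverage` are hypothetical helpers. It keys groups on the exact `toDataURL` string and then measures what share of sites the k largest groups cover.

```javascript
// Hypothetical observations: one row per (site, extracted data URL).
const observations = [
  { site: "a.example", dataUrl: "data:image/png;base64,AAAA" },
  { site: "b.example", dataUrl: "data:image/png;base64,AAAA" },
  { site: "c.example", dataUrl: "data:image/png;base64,AAAA" },
  { site: "c.example", dataUrl: "data:image/png;base64,BBBB" },
  { site: "d.example", dataUrl: "data:image/png;base64,BBBB" },
  { site: "e.example", dataUrl: "data:image/png;base64,CCCC" }
];

// Group sites by byte-identical canvas output, largest group first.
function groupSitesByCanvas(rows) {
  const groups = new Map();

  for (const { site, dataUrl } of rows) {
    if (!groups.has(dataUrl)) {
      groups.set(dataUrl, new Set());
    }
    groups.get(dataUrl).add(site);
  }

  return [...groups.entries()]
    .map(([dataUrl, sites]) => ({ dataUrlLength: dataUrl.length, sites: [...sites].sort() }))
    .sort((a, b) => b.sites.length - a.sites.length);
}

// Share of all fingerprinting sites that appear in at least one of the top-k groups.
function topKSiteCoverage(groups, k) {
  const allSites = new Set(groups.flatMap((group) => group.sites));
  const covered = new Set(groups.slice(0, k).flatMap((group) => group.sites));

  return covered.size / allSites.size;
}

const canvasGroups = groupSitesByCanvas(observations);

console.log(canvasGroups);
console.log({ top1: topKSiteCoverage(canvasGroups, 1), top2: topKSiteCoverage(canvasGroups, 2) });
```

Because a site can extract several canvases, coverage is counted over distinct sites rather than summed group sizes.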

6. Vendor Attribution

The authors attribute canvases for twelve prominent services, covering 73% of popular and 71% of tail sites that generated fingerprinting test canvases. The attributed set includes security-oriented services, mixed-use services, advertising or analytics-adjacent services, and bot-detection vendors.

Vendor or service	Popular sites	Tail sites	Review note
Akamai	485	205	Largest attributed popular-site footprint; paper describes this as bot-detection context.
FingerprintJS	462	298	Mixed-use risk because commercial customers can use browser identifiers beyond security use cases.
mail.ru	242	173	Broad reach across sampled Russian-domain sites.
Shopify	32	457	Tail-site outlier linked to storefront performance monitoring.
Total attributed set	1,513	1,222	Totals across the services attributed in Table 1.

Table 1 crop: fingerprinting service attribution counts for popular and tail sites, with public service categories.

Table 3 crop: vendor attribution sources and script-pattern examples used as evidence.

7. Tracking, Blocklists, and Blockers

The paper uses blocklists as a proxy for advertising or tracking context, while explicitly treating them as imperfect evidence of intent. Scripts generating 2,696 top-20k test canvases and 1,635 tail test canvases appeared in at least one of EasyList, EasyPrivacy, or Disconnect. Scripts generating 942 top-20k canvases and 670 tail canvases matched all three lists.

Blocklist coverage

Table 4 shows substantial list coverage: 45% of top-20k test canvases and 37% of tail test canvases were generated by scripts included in at least one checked list.

Practical blocker impact

Installed ad blockers reduced observed canvas fingerprinting by only about 5%, much less than raw blocklist inclusion might suggest.

First-party serving

The paper reports that 49% of top-20k and 52% of tail canvas-fingerprinting sites had at least one test canvas rendered by a first-party-served script.

Origin masking

The gap is attributed to rule context, first-party exceptions, bundled JavaScript, CNAME cloaking, subdomain routing, and common CDN hosting.
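
The gap between list membership and live blocking can be illustrated with a deliberately simplified rule model. Everything in this sketch is an assumption made for illustration: the two rules, the host names, and the `offlineListed` and `blockedInPage` helpers are hypothetical, and real filter lists have far richer syntax than a host suffix plus a third-party flag.

```javascript
// Simplified, hypothetical rule model; not a parser for real filter-list syntax.
const rules = [
  { hostSuffix: "tracker.example", thirdPartyOnly: true },
  { hostSuffix: "fp-vendor.example", thirdPartyOnly: false }
];

const hostMatches = (host, suffix) => host === suffix || host.endsWith("." + suffix);

// Crude same-site check on the last two labels; it ignores the public suffix list.
const siteOf = (host) => host.split(".").slice(-2).join(".");

// Offline check: does the script URL match any rule, ignoring page context?
function offlineListed(scriptUrl) {
  const host = new URL(scriptUrl).hostname;
  return rules.some((rule) => hostMatches(host, rule.hostSuffix));
}

// Live check: a third-party-only rule fires only when the script is cross-site.
function blockedInPage(scriptUrl, pageUrl) {
  const scriptHost = new URL(scriptUrl).hostname;
  const pageHost = new URL(pageUrl).hostname;
  const isThirdParty = siteOf(scriptHost) !== siteOf(pageHost);

  return rules.some(
    (rule) => hostMatches(scriptHost, rule.hostSuffix) && (!rule.thirdPartyOnly || isThirdParty)
  );
}

const cases = [
  // Listed host loaded cross-site: both checks agree.
  { script: "https://cdn.tracker.example/fp.js", page: "https://shop.example/" },
  // Listed host loaded on its own site: listed offline, but the rule does not fire.
  { script: "https://cdn.tracker.example/fp.js", page: "https://www.tracker.example/" },
  // Vendor script routed through a customer subdomain: the URL never matches the list.
  { script: "https://metrics.shop.example/fp.js", page: "https://shop.example/" }
];

const blocklistOutcomes = cases.map((entry) => ({
  ...entry,
  listed: offlineListed(entry.script),
  blocked: blockedInPage(entry.script, entry.page)
}));

console.log(blocklistOutcomes);
```

The second case shows how an offline URL match can overstate blocking, and the third shows how subdomain routing can keep a script off the lists entirely.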

Table 4 crop: counts of test canvases generated by scripts present in EasyList, EasyPrivacy, and Disconnect.

Table 2 crop: control, Adblock Plus, and uBlock Origin counts for test canvases and sites; ad blocker recrawls only modestly reduced canvas-fingerprinting observations.

8. Browser Defense Evasion

The paper highlights a practical weakness in canvas defenses that randomize output on every render. A fingerprinter can render the same deterministic canvas twice and compare outputs; if they differ, the script can disregard the canvas component or adapt its fingerprinting strategy.

Observed check. Nearly half of sites performing canvas fingerprinting, 45%, had at least one test canvas generated and extracted twice.
Open-source library behavior. The paper notes that a popular open-source fingerprinting library performs this kind of inconsistency check and disregards canvas when repeated renders differ.
Scoped claim. The discussion applies to defenses that add different noise across repeated renders. The authors separately note that persistent per-session noise behaves differently.

Algorithm 1 crop: pseudocode for detecting canvas randomization by rendering the same canvas twice and comparing the results.
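
The scoped claim can be made concrete with a small simulation that runs outside the browser. This is a sketch under stated assumptions rather than a model of any real defense: `basePixels` stands in for a deterministic render, the noise is a deterministic bit flip so the example is reproducible, and `makeRenderer` and `looksRandomized` are hypothetical names.

```javascript
// Stand-in for a deterministic canvas render: a fixed list of pixel values.
const basePixels = Array.from({ length: 64 }, (_, index) => (index * 37) % 256);

// Build a renderer for one of three modes: "none", "per-render", or "session".
function makeRenderer(mode) {
  let renderCount = 0;
  const sessionOffset = 1; // fixed for the lifetime of this renderer

  return () => {
    renderCount += 1;
    if (mode === "none") {
      return basePixels.join(",");
    }

    // Per-render noise changes with every call; session noise stays the same.
    const offset = mode === "per-render" ? renderCount : sessionOffset;

    // Flip the low bit of some pixels, standing in for random noise.
    return basePixels.map((value, index) => value ^ ((index + offset) & 1)).join(",");
  };
}

// The repeated-render check: render twice and compare the outputs.
function looksRandomized(render) {
  return render() !== render();
}

const detection = {
  none: looksRandomized(makeRenderer("none")),
  perRender: looksRandomized(makeRenderer("per-render")),
  session: looksRandomized(makeRenderer("session"))
};

// Session noise still alters the output relative to the unprotected render.
const sessionAltersOutput = makeRenderer("session")() !== makeRenderer("none")();

console.log({ detection, sessionAltersOutput });
```

Only the per-render mode is flagged by the double-render comparison; session-persistent noise passes it while still changing the extracted value.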

9. Code Artifacts

These browser-console helpers are reconstructed from the paper's algorithm, measurement rules, and appendix tables. They are not author-provided source code.

Canvas randomization stability probe (reconstructed from Algorithm 1)

Purpose: Render the same deterministic canvas several times and compare hashes to detect per-render canvas noise.
Paper basis: Section 5.3 and Appendix A.7 describe repeated rendering as a way to detect randomization defenses.
Caveat: This uses a synthetic canvas, not a vendor's exact test canvas, and does not evaluate session-persistent noise.

(async () => {
  const hashText = async (value) => {
    const bytes = new TextEncoder().encode(value);
    const digest = await crypto.subtle.digest("SHA-256", bytes);

    return [...new Uint8Array(digest)]
      .map((byte) => byte.toString(16).padStart(2, "0"))
      .join("");
  };

  const renderTestCanvas = () => {
    const canvas = document.createElement("canvas");
    canvas.width = 300;
    canvas.height = 120;

    const context = canvas.getContext("2d", { willReadFrequently: true });
    if (!context) {
      throw new Error("2D canvas is unavailable");
    }

    context.textBaseline = "top";
    context.fillStyle = "#f60";
    context.fillRect(4, 4, 292, 112);
    context.fillStyle = "#069";
    context.font = "16px Arial";
    context.fillText("Cwm fjordbank glyphs vext quiz", 12, 16);
    context.font = "18px serif";
    context.fillText("Canvas fingerprint stability probe", 12, 44);
    context.strokeStyle = "rgba(0, 0, 0, 0.55)";
    context.beginPath();
    context.arc(238, 62, 28, 0, Math.PI * 2);
    context.stroke();

    return canvas.toDataURL("image/png");
  };

  const dataUrls = Array.from({ length: 5 }, renderTestCanvas);
  const hashes = await Promise.all(dataUrls.map(hashText));
  const uniqueHashes = [...new Set(hashes)];
  const result = {
    artifact: "canvas-randomization-stability-probe",
    stableAcrossRepeatedRenders: uniqueHashes.length === 1,
    renderCount: dataUrls.length,
    uniqueHashCount: uniqueHashes.length,
    hashes,
    dataUrlLengths: dataUrls.map((value) => value.length),
    interpretation:
      uniqueHashes.length === 1
        ? "Repeated renders were stable for this synthetic canvas."
        : "Repeated renders differed; per-render canvas noise or another nondeterministic factor may be present."
  };

  console.log(result);
  return result;
})()

Canvas extraction observer (inferred from Sections 3.1-3.2)

Purpose: Patch Canvas APIs on the current page, log toDataURL extractions, and classify likely fingerprintable canvases using the paper's high-level filters.
Paper basis: Section 3.1 records Canvas API activity; Section 3.2 excludes lossy output, small canvases, and animation-associated methods.
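Caveat: Patches installed from the console do not survive a reload, so the observer only sees extractions made after installation; capturing load-time fingerprinting requires injecting the script before page scripts run. The save/restore filter is tracked per canvas here, whereas the paper applies it to the generating script. Stack-derived script URLs are best-effort and browser-dependent.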

(async () => {
  const globalName = "__canvasExtractionObserver";

  if (window[globalName] && window[globalName].restore) {
    window[globalName].restore();
  }

  const records = [];
  const canvasState = new WeakMap();
  const originals = {
    getContext: HTMLCanvasElement.prototype.getContext,
    toDataURL: HTMLCanvasElement.prototype.toDataURL,
    "2d.save": CanvasRenderingContext2D.prototype.save,
    "2d.restore": CanvasRenderingContext2D.prototype.restore
  };

  const ensureState = (canvas) => {
    if (!canvasState.has(canvas)) {
      canvasState.set(canvas, {
        methods: Object.create(null)
      });
    }

    return canvasState.get(canvas);
  };

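  const stackUrls = () => {
    const stack = String(new Error().stack || "");
    const matches = stack.match(/https?:\/\/[^\s)]+/g) || [];

    // Stack frames end in ":line:column"; strip that suffix so each script URL is listed once.
    return [...new Set(matches.map((url) => url.replace(/(?::\d+){1,2}$/, "")))];
  };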

  const classify = ({ canvas, mimeType, state }) => {
    const excludedReasons = [];

    if (mimeType === "image/jpeg" || mimeType === "image/webp") {
      excludedReasons.push("lossy-or-compatibility-format");
    }

    if (canvas.width < 16 || canvas.height < 16) {
      excludedReasons.push("small-canvas-under-16x16");
    }

    if (state.methods.save || state.methods.restore) {
      excludedReasons.push("animation-associated-save-restore");
    }

    return {
      excludedReasons,
      likelyFingerprintable: excludedReasons.length === 0
    };
  };

  const hashText = async (value) => {
    const bytes = new TextEncoder().encode(value);
    const digest = await crypto.subtle.digest("SHA-256", bytes);

    return [...new Uint8Array(digest)]
      .map((byte) => byte.toString(16).padStart(2, "0"))
      .join("");
  };

  HTMLCanvasElement.prototype.getContext = function patchedGetContext(...args) {
    const context = originals.getContext.apply(this, args);

    if (args[0] === "2d" && context) {
      ensureState(this);
    }

    return context;
  };

  for (const method of ["save", "restore"]) {
    CanvasRenderingContext2D.prototype[method] = function patched2dMethod(...args) {
      const state = ensureState(this.canvas);
      state.methods[method] = (state.methods[method] || 0) + 1;

      return originals["2d." + method].apply(this, args);
    };
  }

  HTMLCanvasElement.prototype.toDataURL = function patchedToDataURL(type, ...args) {
    const startedAt = performance.now();
    const dataUrl = originals.toDataURL.call(this, type, ...args);
    const mimeType = String(type || "image/png").toLowerCase();
    const state = ensureState(this);
    const classification = classify({ canvas: this, mimeType, state });

    records.push({
      at: new Date().toISOString(),
      args: [type, ...args],
      dataUrl,
      dataUrlLength: dataUrl.length,
      dataUrlPrefix: dataUrl.slice(0, 80),
      durationMs: performance.now() - startedAt,
      height: this.height,
      mimeType,
      observed2dMethods: { ...state.methods },
      stackUrls: stackUrls(),
      width: this.width,
      ...classification
    });

    return dataUrl;
  };

  window[globalName] = {
    records,
    async summary() {
      const enrichedRecords = await Promise.all(
        records.map(async ({ dataUrl, ...record }) => ({
          ...record,
          sha256: await hashText(dataUrl)
        }))
      );
      const groups = new Map();

      for (const record of enrichedRecords) {
        const group = groups.get(record.sha256) || {
          count: 0,
          example: record,
          sha256: record.sha256
        };
        group.count += 1;
        groups.set(record.sha256, group);
      }

      const result = {
        artifact: "canvas-extraction-observer",
        capturedExtractionCount: records.length,
        likelyFingerprintableCount: enrichedRecords.filter((record) => record.likelyFingerprintable).length,
        groupedByCanvasHash: [...groups.values()].sort((a, b) => b.count - a.count),
        records: enrichedRecords
      };

      console.log(result);
      return result;
    },
    restore() {
      HTMLCanvasElement.prototype.getContext = originals.getContext;
      HTMLCanvasElement.prototype.toDataURL = originals.toDataURL;
      CanvasRenderingContext2D.prototype.save = originals["2d.save"];
      CanvasRenderingContext2D.prototype.restore = originals["2d.restore"];
      delete window[globalName];
    }
  };

  const result = {
    artifact: "canvas-extraction-observer",
    installed: true,
    nextStep: "Interact with the page (a reload removes these patches), then run await window.__canvasExtractionObserver.summary()",
    trackedInterfaces: [
      "HTMLCanvasElement.toDataURL",
      "CanvasRenderingContext2D.save",
      "CanvasRenderingContext2D.restore"
    ]
  };

  console.log(result);
  return result;
})()

Script pattern and serving-context classifier (inferred from Table 3 and Section 5.2)

Purpose: Classify observed script URLs by vendor-pattern leads and serving context: first-party, same-site, subdomain, or popular CDN.
Paper basis: Table 3 lists attribution patterns; Section 5.2 discusses first-party exceptions, CNAME cloaking, subdomain routing, and CDN hosting.
Caveat: Vendor matches are review leads, not ground truth. The same-site check is an approximation and does not implement the public suffix list.

(async () => {
  const records = window.__canvasExtractionObserver ? window.__canvasExtractionObserver.records : [];
  const candidateUrls = [...new Set(records.flatMap((record) => record.stackUrls || []))];
  const pageHost = location.hostname.toLowerCase();
  const vendorPatterns = [
    { vendor: "Akamai", pattern: /\/akam\//i },
    { vendor: "FingerprintJS or FingerprintJS legacy", pattern: /fpnpmcdn\.net/i },
    { vendor: "mail.ru", pattern: /privacy-cs\.mail\.ru/i },
    { vendor: "AWS Firewall", pattern: /awswaf\.com/i },
    { vendor: "InsurAds", pattern: /insurads\.com/i },
    { vendor: "Signifyd", pattern: /signifyd\.com/i },
    { vendor: "PerimeterX", pattern: /px-cloud\.net/i },
    { vendor: "Sift Science", pattern: /sift\.com/i },
    { vendor: "Shopify", pattern: /shopifycloud/i },
    { vendor: "Adscore", pattern: /adsco\.re/i },
    { vendor: "GeeTest", pattern: /geetest\.com/i }
  ];
  const popularCdns = [
    "akamai.net",
    "azureedge.net",
    "b-cdn.net",
    "bootstrapcdn.com",
    "cdn.jsdelivr.net",
    "cdnjs.cloudflare.com",
    "cloudflare.com",
    "cloudfront.net",
    "fastly.net",
    "googleusercontent.com",
    "gstatic.com",
    "googleapis.com"
  ];

  const approximateSite = (host) => host.split(".").slice(-2).join(".");

  const classifyContext = (url) => {
    const parsed = new URL(url, location.href);
    const host = parsed.hostname.toLowerCase();
    const cdnMatch = popularCdns.find((domain) => host === domain || host.endsWith("." + domain));

    return {
      host,
      firstPartyHost: host === pageHost,
      sameSiteApproximation: approximateSite(host) === approximateSite(pageHost),
      subdomainOfPageHost: host.endsWith("." + pageHost),
      popularCdn: cdnMatch || null
    };
  };

  const classified = candidateUrls.map((url) => ({
    url,
    context: classifyContext(url),
    vendorMatches: vendorPatterns
      .filter(({ pattern }) => pattern.test(url))
      .map(({ vendor }) => vendor)
  }));
  const result = {
    artifact: "canvas-script-pattern-and-context-classifier",
    candidateUrlCount: candidateUrls.length,
    pageHost,
    classified,
    note: "Use vendor matches as review leads, not ground truth. First-party and same-site context can change blocklist behavior."
  };

  console.log(result);
  return result;
})()

10. Limitations

Homepage scope. The crawl visited homepages and did not follow inner links, so it may miss fingerprinting on login, checkout, account, or interaction-heavy pages.
Automation effects. Sites may detect automation and change behavior, even though the crawler handled some common anti-bot mechanisms.
User-action triggers. Fingerprinting that appears only after specific gestures, forms, purchases, or authenticated flows may be absent from the crawl.
Grouping is an upper bound. Identical canvases can indicate a common service footprint, but two organizations could independently use the same canvas.
Intent remains qualified. Blocklist membership, public vendor positioning, and script context are proxies. They do not prove the operator's purpose on every site.
Defense result is scoped. Repeated-render detection targets defenses with per-render noise and should not be generalized to every canvas anti-fingerprinting strategy.

11. Reviewer Notes

The PDF metadata and extracted analysis identify the paper as Elisa Luo, Tom Ritter, Stefan Savage, and Geoffrey M. Voelker, Canvassing the Fingerprinters: Characterizing Canvas Fingerprinting Use Across the Web, Proceedings of IMC '25, DOI 10.1145/3730567.3764500.

Do not present 12.7% and 9.9% as whole-web rates; they are rates over successfully crawled homepages from the authors' selected site sets.
Keep site counts, canvas counts, and attributed-service counts distinct. Tables 1, 2, and 4 measure different quantities.
Use the vendor table as an attribution aid, not a complete fingerprinting-service taxonomy.
The extracted metadata does not include a stable R2 PDF URL, so this page intentionally renders without a PDF download link.

Appendix

Updated 2026-06-20

Glossary

Linked terms for the canvas fingerprinting measurement: Canvas API mechanics, first-party serving context, blocklists, crawler tooling, and measurement caveats.

DOM and Browser APIs

<canvas> / HTMLCanvasElement

Context: The paper treats the HTML canvas element as the rendering surface whose output can expose stable machine and browser differences.

Meaning: `HTMLCanvasElement` is the DOM interface for `<canvas>` elements and exposes methods for selecting drawing contexts and exporting rendered output.

HTMLCanvasElement.toDataURL()

Context: The crawler records `toDataURL` return values because they capture the rendered result of previous Canvas API calls and can be grouped across sites.

Meaning: `toDataURL()` serializes canvas contents into a data URL, defaulting to PNG unless another supported image type is requested.

HTMLCanvasElement.getContext()

Context: The report and reconstructed observer use `getContext("2d")` to obtain the drawing interface before rendering a test canvas.

Meaning: `getContext()` returns a drawing context for a canvas, such as a 2D canvas context or a WebGL context, when supported.

CanvasRenderingContext2D

Context: The crawler intercepts calls and property accesses on `CanvasRenderingContext2D` to reconstruct the drawing activity that produced extracted canvases.

Meaning: `CanvasRenderingContext2D` is the 2D drawing API for canvas shapes, text, images, paths, styles, transforms, and pixel operations.

CanvasRenderingContext2D.save() and restore()

Context: The paper excludes canvases produced by scripts that invoke animation-associated methods such as `save` and `restore`.

Meaning: `save()` pushes the current drawing state onto a stack, and `restore()` pops the previous drawing state back onto the context.

WebGL

Context: The background section notes that canvas fingerprinting can exploit subtle differences in rendering text or WebGL scenes.

Meaning: WebGL is a browser API for rendering interactive 2D and 3D graphics in a canvas through the GPU-backed graphics pipeline.

MDN WebGL API

data: URLs

Context: The method focuses on extracted canvas data URLs because the encoded value can be compared across sites using identical rendering conditions.

Meaning: A `data:` URL embeds resource bytes directly in a URL, often with Base64 encoding for binary data such as images.

MDN data URLs

WebP

Context: The fingerprintable-canvas filter excludes WebP canvas extractions, partly to avoid counting browser image-format compatibility checks as fingerprinting.

Meaning: WebP is an image format supported by modern browsers; a canvas export as WebP can be a feature-support test rather than a fingerprinting signal.

MDN image file type guide

crypto.subtle.digest()

Context: The reconstructed code artifact hashes canvas data URLs to compare repeated renders and group extracted outputs without displaying full data URLs.

Meaning: `digest()` computes a fixed-length cryptographic hash, such as SHA-256, over supplied bytes.

Browser, Chrome, and V8 Internals

Browser fingerprinting

Context: Canvas fingerprinting is presented as one browser-fingerprinting technique among many explicit and implicit browser/device signals.

Meaning: Browser fingerprinting identifies or re-identifies a browser by combining observable characteristics of the browser, operating system, device, and environment.

Canvas fingerprinting

Context: The whole paper measures canvas fingerprinting prevalence and uses the chosen test canvas as a way to identify the fingerprinter.

Meaning: Canvas fingerprinting renders a deliberately chosen image or scene, exports the result, and uses rendering differences as part of a browser or device fingerprint.

Fingerprinting surface

Context: The report treats canvas output, script serving context, and repeated-render behavior as separate surfaces or signals to review.

Meaning: A fingerprinting surface is any observable browser, device, network, or environment characteristic that can contribute to identification or correlation.

W3C identifying fingerprinting surface

Canvas randomization

Context: Section 5.3 analyzes defenses that add random noise to canvas output and the paper's repeated-render check for detecting them.

Meaning: Canvas randomization changes returned canvas pixels to reduce stable fingerprinting, but per-render noise can reveal itself when the same canvas is rendered twice and differs.

First-party and third-party script context

Context: The paper reports that about half of canvas-fingerprinting sites had at least one first-party-served fingerprinting script, weakening blocklist assumptions.

Meaning: First-party context generally refers to resources served from the same site the user is visiting, while third-party context refers to resources served from another site.

CNAME cloaking

Context: The paper lists CNAME cloaking as an origin-masking technique that can make third-party fingerprinting scripts appear first-party.

Meaning: CNAME cloaking uses DNS aliases so a subdomain of the visited site resolves to infrastructure controlled by another service, complicating hostname-based blocking.

Subdomain routing

Context: Section 5.2 reports subdomain routing on 9.5% of top-20k and 2.1% of tail canvas-fingerprinting sites.

Meaning: Subdomain routing serves a third-party service through a customer-controlled subdomain so the script URL appears closer to the site's own origin.

Content delivery network (CDN) hosting

Context: The paper checks whether fingerprinting scripts are served through popular CDNs because blockers avoid broad CDN blocking to prevent site breakage.

Meaning: A CDN is a distributed network used to serve web content from locations near users; shared CDN hostnames can obscure the operator of a script.

MDN CDN glossary

Blocklist rule context

Context: The authors distinguish offline URL matches in EasyList, EasyPrivacy, and Disconnect from actual ad blocker behavior during recrawls.

Meaning: Filter-list rules can depend on resource type, document context, exceptions, and modifiers, so a listed URL does not necessarily mean a script will be blocked in a live page.

Statistics and Measurement Concepts

Entropy

Context: The background explains that canvas rendering can provide high entropy for distinguishing browsers or devices.

Meaning: In this context, entropy is the amount of distinguishing information a signal contributes; higher entropy means a value can separate users or devices into smaller groups.
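
As a worked illustration of this definition, the sketch below computes Shannon entropy in bits over a list of observed values. The sample inputs are invented, and `shannonEntropyBits` is a hypothetical helper; the paper's own entropy figures are not reproduced here.

```javascript
// Shannon entropy, in bits, of the empirical distribution of `values`.
function shannonEntropyBits(values) {
  const counts = new Map();
  for (const value of values) {
    counts.set(value, (counts.get(value) || 0) + 1);
  }

  let bits = 0;
  for (const count of counts.values()) {
    const probability = count / values.length;
    bits -= probability * Math.log2(probability);
  }

  return bits;
}

// Four devices with four different canvas hashes: every device is distinguishable.
const uniformEntropy = shannonEntropyBits(["h1", "h2", "h3", "h4"]);

// Three of four devices share one hash: the signal separates them less well.
const skewedEntropy = shannonEntropyBits(["h1", "h1", "h1", "h2"]);

console.log({ uniformEntropy, skewedEntropy });
```

The uniform case yields 2 bits, enough to tell four devices apart, while the skewed case yields under 1 bit.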

False positive

Context: The authors manually inspected 300 unique fingerprintable canvases and found two false positives, both appearing on only one domain.

Meaning: A false positive is a benign canvas classified as fingerprintable by the paper's heuristic.

Wikipedia false positives and false negatives

Long-tailed distribution

Context: The report describes canvas groups as concentrated at the head, with many less-common unique canvases in the tail.

Meaning: A long-tailed distribution has a small number of frequent values and many rare values, which matters when interpreting the reach of the most common test canvases.

Wikipedia long tail

Upper and lower bounds

Context: The page treats identical-canvas groups as an upper bound on one organization's reach and homepage-only crawling as a lower bound on prevalence.

Meaning: An upper bound is a ceiling implied by the method, while a lower bound is a floor; neither should be read as the exact real-world value.

Wikipedia upper and lower bounds

Sample design

Context: The crawl uses May 2025 Tranco top-20k sites plus a random tail sample from ranks 20,001 through 1,000,000.

Meaning: Sample design determines which sites can support the paper's claims; these measurements apply to the selected and successfully crawled homepages, not to the whole web.

Research Tools and Datasets

Tranco

Context: The corpus is built from the May 2025 Tranco ranking, split into popular and tail site sets.

Meaning: Tranco is a research-oriented top-sites ranking designed to be more stable and reproducible than older popularity lists.

DuckDuckGo Tracker Radar Collector

Context: The authors modified Tracker Radar Collector to intercept Canvas API calls and property accesses while crawling pages.

Meaning: Tracker Radar Collector is DuckDuckGo's Puppeteer-based crawler for collecting third-party request data.

Tracker Radar Collector repository

Puppeteer

Context: The crawler is Puppeteer-based, which matters because automated browser behavior can affect site responses and anti-bot handling.

Meaning: Puppeteer is a browser automation library that controls Chrome or Chromium through high-level JavaScript APIs.

Puppeteer documentation

autoconsent

Context: The crawl used autoconsent to opt in to common consent banners before measuring canvas behavior.

Meaning: autoconsent is a rules-based project for automatically responding to consent management interfaces.

autoconsent repository

EasyList and EasyPrivacy

Context: The paper checks whether scripts generating test canvases match EasyList or EasyPrivacy entries, then recrawls with blockers that use EasyList rules.

Meaning: EasyList is a widely used ad-blocking filter list, while EasyPrivacy focuses on privacy and tracking filters.

Disconnect Tracker Protection List

Context: The paper uses Disconnect alongside EasyList and EasyPrivacy as a third blocklist signal for tracker or advertising context.

Meaning: Disconnect's tracker protection list is a canonical services file used for identifying tracking domains.

Disconnect tracker protection repository

Adblock Plus and uBlock Origin

Context: Section 5.2 recrawls the same site sets with Adblock Plus and uBlock Origin installed to measure practical blocker impact.

Meaning: Both are browser extensions that apply filter-list rules in live browsing contexts, which can differ from offline URL matching.

FingerprintJS

Context: The paper attributes a large canvas group to FingerprintJS and notes that the open-source library performs a repeated-render consistency check.

Meaning: FingerprintJS is an open-source browser fingerprinting library; the commercial service and legacy versions complicate attribution and use-case interpretation.

FingerprintJS repository

adblockparser

Context: The authors use adblockparser to check whether EasyList and EasyPrivacy rules apply to script URLs in an offline analysis.

Meaning: adblockparser is a Python parser for Adblock Plus filter rules.

adblockparser on PyPI