Archive

Flag	Description	Tier
`--archive-body-size-limit`	Maximum per-resource body size captured into an archive	STD
`--archive-cdx`	Path to write a CDXJ index of the WARC's records	STD
`--archive-spill-max-bytes`	Cap on the total bytes a crawl or sitemap walk spills to disk	PRO
`--crawl-allow-url`	Re-permit a URL that a `--crawl-deny-url` glob blocked	PRO
`--crawl-delay-cap`	Ceiling in seconds for a robots.txt Crawl-delay during a sitemap walk	PRO
`--crawl-deny-url`	Skip URLs matching this glob during a sitemap walk or link crawl	PRO
`--crawl-link-depth`	Number of hops to follow from the seed when crawling links	PRO
`--crawl-link-selector`	Additional CSS selector for crawlable links	PRO
`--crawl-links`	Crawl links discovered on each captured page into the same WARC	PRO
`--crawl-max-links`	Cap on total pages fetched by a crawl or sitemap walk	PRO
`--crawl-media-sources`	Fetch `<audio>`/`<video>` source URLs on each walked page so deferred media is archived	PRO
`--crawl-page-timeout`	Per-page capture budget in seconds for WARC output	PRO
`--crawl-sitemap-max-depth`	Levels of `<sitemapindex>` nesting to follow when capturing a sitemap	PRO
`--crawl-url-is-sitemap`	Treat the target as a manifest of URLs to capture into one WARC	PRO
`--har`	Path to write a diagnostic HAR (HTTP Archive) file	STD
`--har-capture-bodies`	Include response bodies in the HAR	STD
`--har-captures-navigation`	Include the navigator session in the HAR	PRO
`--warc`	Path to write a WARC for this request	STD
`--warc-captures-navigation`	Include the navigator session in the WARC	STD
`--warc-no-gzip`	Disable gzip compression of the output WARC	STD
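
The index written by `--archive-cdx` can be read back with a few lines of Python. The following is a minimal sketch, assuming the file follows the common CDXJ line layout (a sort key, a 14-digit timestamp, then a JSON object, separated by single spaces). The `parse_cdxj` helper, the `CdxjEntry` type, and the sample line are hypothetical and for illustration only; the JSON field names this tool actually emits may differ.

```python
import json
from typing import Iterable, Iterator, NamedTuple


class CdxjEntry(NamedTuple):
    key: str        # sort key, usually the URL in SURT form
    timestamp: str  # capture time, typically YYYYMMDDhhmmss
    fields: dict    # JSON block; field names vary by writer (assumption)


def parse_cdxj(lines: Iterable[str]) -> Iterator[CdxjEntry]:
    """Yield one entry per CDXJ line, skipping blanks and '!' header lines."""
    for raw in lines:
        line = raw.strip()
        # Some CDXJ writers emit header lines that start with '!'.
        if not line or line.startswith("!"):
            continue
        # Split only twice so spaces inside the JSON block are preserved.
        key, timestamp, payload = line.split(" ", 2)
        yield CdxjEntry(key, timestamp, json.loads(payload))


# Hypothetical sample line, not actual tool output.
sample = [
    'com,example)/ 20240101000000 {"url": "https://example.com/", "status": "200"}\n',
]
for entry in parse_cdxj(sample):
    print(entry.key, entry.timestamp, entry.fields.get("url"))
```

To read a real index, pass an open file object instead of `sample`, for example `parse_cdxj(open(path, encoding="utf-8"))`, where `path` is whatever was given to `--archive-cdx`.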