zshot/cli

Archive

Flag                          Tier  Description
--archive-body-size-limit     STD   Maximum per-resource body size captured into an archive
--archive-cdx                 STD   Path to write a CDXJ index of the WARC's records
--archive-spill-max-bytes     PRO   Cap the total bytes a crawl or sitemap walk spills to disk
--crawl-allow-url             PRO   Re-permit a URL that a --crawl-deny-url glob blocked
--crawl-delay-cap             PRO   Ceiling in seconds for a robots.txt Crawl-delay during a sitemap walk
--crawl-deny-url              PRO   Skip URLs matching this glob during a sitemap walk or link crawl
--crawl-link-depth            PRO   Hops to follow from the seed when crawling links
--crawl-link-selector         PRO   Additional CSS selector for crawlable links
--crawl-links                 PRO   Crawl links discovered on each captured page into the same WARC
--crawl-max-links             PRO   Cap on total pages fetched by a crawl or sitemap walk
--crawl-media-sources         PRO   Fetch <audio>/<video> source URLs on each walked page so deferred media is archived
--crawl-page-timeout          PRO   Per-page capture budget in seconds for WARC output
--crawl-sitemap-max-depth     PRO   Levels of <sitemapindex> to follow when capturing a sitemap
--crawl-url-is-sitemap        PRO   Treat the target as a manifest of URLs to capture into one WARC
--har                         STD   Path to write a diagnostic HAR (HTTP Archive) file
--har-capture-bodies          STD   HAR will contain response bodies
--har-captures-navigation     PRO   HAR will contain navigator session
--warc                        STD   Path to write a WARC for this request
--warc-captures-navigation    STD   WARC will contain navigator session
--warc-no-gzip                STD   Disable gzip compression for output WARC
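
The flags listed above can be combined in a single invocation. The following Python sketch assembles such a command line without running it. Several details are assumptions that this page does not confirm: that the executable is named `zshot`, that the target URL is a positional argument, that values follow their flag as a separate token, that `--crawl-links` is a boolean switch, and that `--crawl-deny-url` may be repeated. The helper `build_archive_command` and the file names are hypothetical.

```python
import shlex


def build_archive_command(url, warc_path, cdx_path=None, link_depth=None, deny_globs=()):
    """Assemble an argv list for a hypothetical archive run.

    ASSUMPTIONS (not confirmed by the flag table): the binary is called
    `zshot`, the URL is positional, and each value follows its flag as a
    separate token.
    """
    argv = ["zshot", url, "--warc", warc_path]

    # Optionally write a CDXJ index of the WARC's records.
    if cdx_path is not None:
        argv += ["--archive-cdx", cdx_path]

    # Optionally crawl links found on each page (assumed boolean switch)
    # and limit how many hops from the seed are followed.
    if link_depth is not None:
        argv += ["--crawl-links", "--crawl-link-depth", str(link_depth)]

    # Assumed repeatable: one --crawl-deny-url per glob to skip.
    for glob in deny_globs:
        argv += ["--crawl-deny-url", glob]

    return argv


argv = build_archive_command(
    "https://example.com/",
    "site.warc.gz",
    cdx_path="site.cdxj",
    link_depth=2,
    deny_globs=["*/login*"],
)

# shlex.join quotes the glob so a shell would not expand it.
print(shlex.join(argv))
```

Building the command as a list rather than a string keeps the glob pattern intact when it is later passed to `subprocess.run`, since no shell expansion is involved.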