Full-time Web Crawl Engineer, Archive-It. Published 03.05.2018. Internet Archive, San Francisco, United States
Internet Archive | Web Crawl Engineer, Archive-It - San Francisco, CA or remote - Full Time
- Run large-scale web harvests at the global and national domain level, as well as focused and specialized crawls, using Heritrix, our open-source crawler, and other open-source technologies developed internally, including Umbra, Brozzler, warcprox, and others.
- Configure, monitor, and improve large-scale web crawls to ensure their quality and timely completion.
- Process, analyze, and quality-assure archived web content to ensure it is complete and of the highest quality.
- Contribute to the development of tools for automated analysis and reporting of crawl material, and to development projects focused on crawling, processing, and access.
- Manage large ingests and exports of web data, derivatives, logs, and reports.

Demonstrated experience delivering on commitments within deadlines and project timelines, and working in a collaborative team of engineers and project/product managers.
Skills & Requirements
To Apply: Please email your cover letter, salary expectations, and résumé to firstname.lastname@example.org with the subject line "Web Crawl Engineer."