About the organization
ACLED (Armed Conflict Location & Event Data) is a disaggregated data collection, analysis, and crisis mapping institution. The ACLED team collects conflict information, conducts analysis to describe, explore, and test conflict scenarios, and makes both data and analysis open for free use by the public. ACLED’s work is regularly used to inform journalism, academic research, and public discourse on conflict, and to support practitioners and policymakers. ACLED is the highest-quality and most widely used real-time data and analysis source on political violence and protest around the world.
The role
ACLED is recruiting a Data Extraction Specialist to assist with the extraction and processing of news articles used to code ACLED events. The Data Extraction Specialist will work within ACLED’s Data Science team.
This position is fully remote and can be done from any location with reliable internet service. Candidates based in time zones between GMT-8 (Pacific Time) and GMT+1 (CET) are preferred.
This position is open to nationals of any country. For more information, please review the Applicant FAQs. Please submit your salary range for consideration.
Specific tasks and responsibilities
The Data Extraction Specialist will be responsible for the following tasks:
- Develop bespoke web scrapers and general web crawlers to extract data from online news sources.
- Deploy and monitor the performance of scrapers and crawlers to ensure reliable and continuous data acquisition.
- Implement logging and alerting systems for scraper performance and failures.
- Maintain and troubleshoot existing scrapers and crawlers, adapting them to changes in website structures and data formats.
- Deploy and monitor ML/NLP models for filtering and classifying extracted news articles.
- Develop and deploy scalable data engineering pipelines using cloud platforms (preferably AWS), Docker, or serverless technologies.
- Ensure all developed scrapers and crawlers adhere to best practices for web scraping, including ethical considerations and respect for website terms of service.
- Document all configurations, assumptions, and limitations to ensure consistency and transparency.
- Collaborate with other teams at ACLED to understand data needs and align scraper strategy with broader organizational goals.
Skills and competencies
Required
- Proven experience in developing and deploying web scrapers and crawlers for production use.
- Proven experience developing language models for practical applications.
- Strong proficiency in Python and JavaScript.
- Experience with major web scraping frameworks and libraries (Scrapy, Beautiful Soup, Selenium, Playwright, Puppeteer, etc.).
- Experience deploying machine learning models, including transformer-based NLP models, for processing source texts.
- Experience with data engineering tools and frameworks (e.g., Docker, Kubernetes, Airflow, Databricks).
- Understanding of ethical and legal considerations in web scraping, including compliance with robots.txt and terms of service.
- Experience working to deadlines with limited supervision.
- An extremely high level of attention to detail.
- Flexible team player, especially across a remote, global team.
- Fluency in English.
- Bachelor’s degree required.
Preferred
- Familiarity with ACLED’s data and methodology.
- Experience and/or interest in conflict or international development.
- Familiarity with version control using Git and GitHub.
- Familiarity with cloud-based computing.
- Experience with a remote work environment.
Applications
To apply, please submit a CV and cover letter detailing qualifications, experience, and salary requirements.
Further information is available online at acleddata.com. Applications will be reviewed on a rolling basis.
Interested candidates are advised to apply early.