Data Engineer – Web Crawling Team


Sayari is looking for a Data Engineer specializing in web crawling to join its Data Engineering team! Sayari has developed a robust web crawling project that collects hundreds of millions of documents every year from a diverse set of sources around the world. These documents serve as source records for Sayari’s flagship graph product, a global network of corporate and trade entities and relationships. As a member of Sayari’s data team, your primary objective will be to maintain and improve Sayari’s web crawling framework, with an emphasis on scalability and reliability. You will work with our Product and Software Engineering teams to ensure our crawling deployment meets product requirements and integrates efficiently with our ETL pipeline.


Sayari is a venture-backed and founder-led global corporate data provider and commercial intelligence platform, serving financial institutions, legal & advisory service providers, multinationals, journalists, and governments. We are building world-class SaaS products that help our clients glean insights from vast datasets that we collect, extract, enrich, match and analyze using a highly scalable data pipeline. From financial intelligence to anti-counterfeiting, and from free trade zones to war zones, Sayari powers cross-border and cross-lingual insight into customers, counterparties, and competitors. Thousands of analysts and investigators in over 30 countries rely on our products to safely conduct cross-border trade, research front-page news stories, confidently enter new markets, and prevent financial crimes such as corruption and money laundering.

Our company culture is defined by a dedication to our mission of using open data to prevent illicit commercial and financial activity, a passion for finding novel approaches to complex problems, and an understanding that diverse perspectives create optimal outcomes. We embrace cross-team collaboration, encourage training and learning opportunities, and reward initiative and innovation. If you enjoy working with supportive, high-performing, and curious teams, Sayari is the place for you.


Job Responsibilities:
  • Investigate and implement web scrapers for new sources
  • Maintain and improve existing crawling infrastructure
  • Improve metrics and reporting for web crawling
  • Help improve and maintain ETL processes
  • Contribute to development and design of Sayari’s data product


Skills and Experience:

Need to Have:

  • Experience with Python
  • Experience managing web crawling at scale with any framework (Scrapy is a plus)
  • Experience working with Kubernetes
  • Experience working collaboratively with git

Nice to Have:

  • Experience with Apache projects such as Spark, Avro, Nifi, and Airflow
  • Experience with datastores such as Postgres and/or RocksDB
  • Experience working on a cloud platform like GCP, AWS, or Azure
  • Working knowledge of API frameworks, primarily REST
  • Understanding of or interest in knowledge graphs


What We Offer:

  • A collaborative and positive culture – your team will be as smart and driven as you
  • Limitless growth and learning opportunities
  • A strong commitment to diversity, equity, and inclusion
  • Team building events & opportunities

Sayari is an equal opportunity employer and strongly encourages diverse candidates to apply. We believe diversity and inclusion mean our team members should reflect the diversity of the United States. No employee or applicant will face discrimination or harassment based on race, color, ethnicity, religion, age, gender, gender identity or expression, sexual orientation, disability status, veteran status, genetics, or political affiliation. We strongly encourage applicants of all backgrounds to apply.

Apply now
To help us track our recruitment effort, please indicate in your cover/motivation letter where you saw this job posting.