Data Automation Engineer

Cialfo


Software Engineering

Delhi, India

Posted on May 7, 2026

About Manifest Global

Manifest Global is building the infrastructure for global human capital mobility — connecting students, schools, universities, and employers across 50+ countries. Our portfolio spans Cialfo (AI-powered college counseling, 2,000+ schools), BridgeU (university guidance for international schools globally), Kaaiser (trusted study abroad counseling across India and Southeast Asia), and Explore (AI-powered university outreach, 1,000+ university partners). Together, we move talent across borders at scale. $80M raised. Still early.

About the Role

Cialfo's University Data Engineering team is the data backbone of everything students see on the platform — university profiles, course listings, entry requirements, fees, deadlines, rankings, and scholarship information across 544 partner universities and thousands more. Every piece of that data has to be collected, validated, and kept current.

We are hiring a Data Automation Engineer to own the automation function end-to-end — building the scrapers, AI-powered workflows, and data pipelines that replace manual data collection with reliable, production-grade automation. You will report to Engineering and work alongside the University Data Engineering team as the sole owner of the technical stack.

What makes this role different from a standard data engineering role: You will not be maintaining someone else's pipelines. You are building the function from scratch. The team has deep domain knowledge — they know what correct university data looks like, and they will QC your output. You bring the technical capability they do not have. Together, you replace hours of manual work per week. Your work ships directly to a product used by hundreds of thousands of students making university decisions.

What You Will Build

Your first 90 days have a defined backlog:

  • First priority: a notification classification pipeline handling 450 alerts per week, replacing 6 hours of daily signal-vs-noise triage across the team.
  • Next: a signal resolution workflow covering 150 signals per week, replacing 6 hours of daily core updates (research, format, verify, push).
  • An automated quality audit agent that runs nightly across all recent updates, replacing 6–7 hours of daily manual data accuracy checks.
  • Rankings and key stats ingestion across 4,441 universities, replacing the full manual collection cycle for QS, THE, and US News rankings.
  • Entry requirements extraction from dynamic JavaScript-rendered pages across 150+ universities.
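To make the notification classification work concrete, here is a minimal sketch of signal-vs-noise triage. The `Alert` shape, the label set, the keyword list, and the `call_llm` hook are all illustrative assumptions, not Cialfo's actual schema; in production the hook would wrap a Claude or OpenAI call behind a tested prompt, with the rule-based check as a safety fallback.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical alert shape -- the real notification schema is not public.
@dataclass
class Alert:
    source: str
    text: str

# Illustrative keyword list for the rule-based fallback.
NOISE_HINTS = ("newsletter", "event recap", "social media", "webinar")

def keyword_fallback(alert: Alert) -> str:
    """Cheap rule-based triage used when no LLM hook is supplied."""
    lowered = alert.text.lower()
    return "noise" if any(hint in lowered for hint in NOISE_HINTS) else "signal"

def classify(alert: Alert,
             call_llm: Optional[Callable[[str], str]] = None) -> str:
    """Return 'signal' (needs a data update) or 'noise' (ignorable).

    call_llm, if given, receives a prompt and must return one of the two
    labels; any other response falls through to the rule-based check so a
    malformed model reply never silently mislabels an alert.
    """
    if call_llm is not None:
        label = call_llm(
            f"Classify this university alert as 'signal' or 'noise':\n{alert.text}"
        ).strip().lower()
        if label in ("signal", "noise"):
            return label
    return keyword_fallback(alert)
```

The key design point for an accuracy-gated pipeline like this is the fallback path: model output is validated against the allowed label set before it is trusted, which is also what makes the pipeline testable offline.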

Beyond the initial backlog, you own the full 25-task automation portfolio — maintaining what is already built, extending scrapers when source sites change structure, and designing new automations as the team's data commitments grow.
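When a source site changes structure, the parsing layer is usually what gets rebuilt. As an illustration of the extraction side of that work, the sketch below pulls label-value pairs from a simple requirements table using only the standard library. It assumes the page has already been rendered and fetched (for JavaScript-heavy sites, via a headless browser or a service like Firecrawl) and that the table follows a plain `<tr><th><td>` layout; real university pages are far less consistent.

```python
from html.parser import HTMLParser

class RequirementTable(HTMLParser):
    """Collect <th>/<td> label-value pairs from a simple requirements table."""

    def __init__(self):
        super().__init__()
        self._cell = None   # "th" or "td" while inside a cell
        self._label = None  # pending row label awaiting its value
        self.pairs = {}

    def handle_starttag(self, tag, attrs):
        if tag in ("th", "td"):
            self._cell = tag

    def handle_data(self, data):
        text = data.strip()
        if not text or self._cell is None:
            return
        if self._cell == "th":
            self._label = text
        elif self._label:
            self.pairs[self._label] = text
            self._label = None

    def handle_endtag(self, tag):
        if tag in ("th", "td"):
            self._cell = None

def parse_requirements(html: str) -> dict:
    """Parse rendered HTML into a {requirement: value} dict."""
    parser = RequirementTable()
    parser.feed(html)
    return parser.pairs
```

In practice each of the 500+ source layouts needs its own selector logic or an LLM-extraction pass; the value of a thin parser like this is that it is trivial to re-test when a site restructures.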

What You Own

  • You own the complete automation stack — N8N workflows, Python scripts, LLM-powered extraction pipelines, and web scrapers. You build it, you maintain it, you fix it when it breaks.
  • Production reliability is yours. When a scraper fails silently because a university website restructured, you diagnose and fix it without waiting to be told. You have monitoring in place so you know before the team does.
  • AI output accuracy is your responsibility. The notification classifier, the quality audit agent, and any LLM-powered pipeline you build must meet defined accuracy gates. You own the test sets, the iteration, and the decision to go live.
  • You make the data pipeline architecture decisions — N8N vs. Python, Firecrawl vs. Playwright, Claude API vs. rule-based extraction. You explain your reasoning; you do not wait for someone to decide for you.
  • Working within Engineering, you will own direct API write access to Core, Contentful, and Explore for approved data types, removing the manual ticket loop for routine data pushes.
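On the monitoring point above: a silently broken scraper typically still exits cleanly but returns far fewer rows than usual, so one cheap detector is a record-count drift check against recent healthy runs. The sketch below is a minimal version of that idea; the 0.5 threshold is an illustrative default, not a figure from this posting.

```python
from statistics import mean

def record_count_alarm(history: list[int], current: int,
                       min_ratio: float = 0.5) -> bool:
    """Flag a scraper run whose record count collapsed versus its baseline.

    history holds record counts from recent healthy runs; the run is
    flagged when the current count falls below min_ratio of their mean.
    """
    if not history:
        return False  # no baseline yet, nothing to compare against
    baseline = mean(history)
    return current < baseline * min_ratio
```

A check like this runs after every scrape and pages the owner before the team notices stale data, which is the "you know before the team does" property the bullet describes.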

What You Bring

Non-Negotiable

  • Python. Production-grade scripts you have owned — not scripts you contributed to. You wrote them, you ran them, you fixed them when they broke in production.
  • REST API integration. You have built and maintained API clients against real production systems with rate limits, auth, pagination, and error handling. Not tutorial projects.
  • Web scraping. Dynamic pages, JavaScript-rendered content, sites that block scrapers. You have handled all of these. You have rebuilt scrapers when source sites changed structure.
  • Independent ownership. You have owned an automation or data pipeline without a senior engineer making the architecture decisions above you. You have been the person others came to when something broke.
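The REST API bar above (rate limits, auth, pagination, error handling) can be made concrete with a small sketch. The response shape (`status`, `items`, `next`) is an assumption for illustration, and the HTTP transport is injected as a callable so the pagination and backoff logic can be exercised without a network; a real client would wrap an HTTP library and add auth headers.

```python
import time
from typing import Callable, Iterator, Optional

def fetch_all(fetch: Callable[[Optional[str]], dict],
              sleep: Callable[[float], None] = time.sleep,
              max_retries: int = 3) -> Iterator[dict]:
    """Walk a cursor-paginated endpoint, backing off on rate limits.

    fetch(cursor) stands in for a real HTTP GET and must return a dict
    like {"status": int, "items": [...], "next": cursor-or-None}.
    """
    cursor = None
    while True:
        for attempt in range(max_retries):
            page = fetch(cursor)
            if page["status"] == 429:   # rate limited: exponential backoff
                sleep(2 ** attempt)
                continue
            if page["status"] != 200:
                raise RuntimeError(f"unexpected status {page['status']}")
            break
        else:
            raise RuntimeError("still rate limited after retries")
        yield from page["items"]
        cursor = page["next"]
        if cursor is None:
            return
```

Separating the paging loop from the transport is what makes error handling testable, which is the difference between a tutorial project and a client that survives a production rate limiter.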

Strong Preference

We have a strong preference for candidates with:

  • LLM API usage in production: you have shipped something where Claude or OpenAI was doing real work (classification, extraction, structured output), and you have dealt with the accuracy and reliability problems that come with it.
  • 4–6 years of relevant experience, since independent ownership in messy production environments takes time to develop.
  • Familiarity with N8N or equivalent workflow automation: you should be able to read, edit, and build N8N workflows without a tutorial, though it does not need to be your primary tool.
  • Experience with unstructured, inconsistent source data: PDFs, scraped HTML, university websites with no consistency across 500 sources.

Nice to Have

EdTech or university data domain knowledge is a plus, though the team teaches this faster than any candidate will self-learn it. SQL is useful for validation queries but is not required on day one.

What Good Looks Like at 90 Days

The notification classification pipeline is live and saving the team 20+ hours per week. You shipped it, it is in production, it has monitoring. You have diagnosed and fixed at least one automation that broke in production without asking Engineering for help. The team comes to you with data collection problems, and you come back with working solutions — not questions about how to approach them. You have made at least one tooling decision that changed how the team operates, and you can explain clearly why you made it. University Data Engineering Leads trust your QC gates without checking every output. Your accuracy track record has earned that.

Why This Role, Why Now

Cialfo serves hundreds of thousands of students making one of the most important decisions of their lives. The quality of the data on the platform directly affects what universities they see, whether application deadlines are accurate, and whether the fees they are planning around are correct.

The team has deep domain knowledge and operational discipline. What it does not have is the technical capability to automate the work that should not be manual. You are that capability — not a support function, but the reason the team can operate at a scale it currently cannot.

The work is real production automation problems: scrapers covering 4,441 universities, AI classifiers handling 450 alerts per week, quality agents running nightly across every recent data update. The team knows the domain deeply and will QC your work honestly. When something is wrong, you will hear it — that is a feature, not a bug. Every automation you ship converts manual hours into expanded data coverage — more universities, more countries, more students.