Data Engineer - MTS/SMTS

Own Company

Software Engineering, Data Science

Seattle, WA, USA

Posted on Apr 17, 2026

Description

About the Role

We are seeking a highly skilled and motivated Data Engineer to join our Growth & Retention Intelligence team. In this role, you will design, build, and scale the data infrastructure that powers our machine learning ecosystem – enabling consistent, reliable, and real-time access to features across development, training, and production environments. You'll collaborate closely with data scientists, ML engineers, and data platform teams to streamline feature engineering workflows and ensure seamless integration between offline and online data sources.

You'll be expected to work across multiple domains, including data architecture, distributed systems, software engineering, and MLOps. You will help define and implement best practices for feature registration, drift detection, governance, lineage tracking, and versioning, all while contributing to the CI/CD automation that supports feature deployment across environments.

What You’ll Do

Key Responsibilities:

  • Feature Store Development: Implement and maintain a scalable feature store serving offline (batch), online (real-time), and streaming ML use cases.

  • Streaming & Real-Time Data Processing: Design and manage streaming pipelines using technologies like Kafka, Kinesis, or Flink to enable low-latency feature generation and real-time inference.

  • Feature Governance & Lineage: Define and enforce governance standards for feature registration, metadata management, lineage tracking, and versioning to ensure data consistency and reusability.

  • Collaboration with ML Teams: Partner with data scientists and ML engineers to streamline feature discovery, definition, and deployment workflows, ensuring reproducibility and efficient model experimentation.

  • Data Pipeline Engineering: Build and optimize ingestion and transformation pipelines that handle large-scale data while maintaining accuracy, reliability, and freshness.

  • CI/CD Automation: Implement CI/CD workflows and infrastructure-as-code to automate feature store provisioning and feature promotion across environments (Dev → QA → Prod).

  • Monitoring & Observability: Develop monitoring and alerting frameworks to track feature data quality, latency, and freshness across offline, online, and streaming systems.

What We’re Looking For

  • Bachelor’s or Master’s degree in Computer Science, Data Engineering, or related field.

  • 5+ years of experience in data engineering roles.

  • Strong proficiency in Python and familiarity with orchestration and distributed data frameworks such as Airflow, Spark, or Flink.

  • Hands-on experience with feature store technologies (e.g., Feast, SageMaker Feature Store, Tecton, Databricks Feature Store, or custom implementations).

  • Experience with cloud data warehouses (e.g., Snowflake) and transformation frameworks (e.g., dbt) for data modeling, transformation, and feature computation in batch environments.

  • Expertise in streaming data platforms (e.g., Kafka, Kinesis, Flink) and real-time data processing architectures.

  • Experience with cloud environments (AWS preferred) and infrastructure-as-code tools (Terraform, CloudFormation).

  • Strong understanding of CI/CD automation, containerization (Docker, Kubernetes), and API-driven integration patterns.

  • Knowledge of data governance, lineage tracking, and feature lifecycle management best practices.

  • Experience with unstructured data stores (e.g., vector or graph databases) and RAG pipelines.

  • Excellent communication skills, a collaborative mindset, and a strong sense of ownership.

Preferred Qualifications (Bonus Points):

  • Experience with the Salesforce ecosystem.

  • Experience with context engineering, including structuring data, prompts, and logic for AI systems, and managing memory and external knowledge.