About The Position
Vi is the market-leading Enterprise-AI platform for health, serving the world’s largest health organizations, from Fortune 500 health providers to pharma and consumer brands, and helping them maximize acquisition, enrollment, engagement, retention, and health outcomes. Vi offers three main product lines: Activate, Engage, and Transform.
Backed by $125M+ in R&D, our powerful platform serves over 175 million members daily, and growing. We are based in New York, Austin, Nashville, and Tel Aviv.
We are seeking a talented Data Engineer to join our R&D team and work directly with our VP of Data Science. In this unique role, you’ll be responsible for shaping our data infrastructure, empowering data science initiatives, and ensuring the reliability of our data ecosystem. You will build scalable data pipelines, ETLs, and ELTs to unify, normalize, and aggregate data in our data warehouse and feature store. You will also develop complex preprocessing pipelines for machine learning, along with advanced quality checks that identify and mitigate anomalies, drift, and inconsistencies that could impact downstream analyses and models. Beyond technical proficiency, strong business intelligence (BI) acumen is essential for understanding the context and implications of each data point and for driving data-informed decision-making across the organization.
Responsibilities
- Pipeline Development: Design, build, and optimize complex, scalable, and robust data pipelines to ingest data from multiple sources into our data warehouse and feature store.
- Data Transformation & Normalization: Work with both structured and unstructured data to unify, normalize, and aggregate datasets, ensuring accuracy, consistency, and readiness for analysis and model development.
- ML Preprocessing Pipelines: Collaborate with data scientists to create sophisticated preprocessing pipelines, handling the nuances of feature engineering and preparing data for machine learning models.
- Data Quality & Integrity: Implement advanced data validation techniques beyond simple checks, detecting and addressing anomalies, concept drift, and data drift, and ensuring data integrity across various sources.
- Strong Business Understanding: Bring a business intelligence mindset to understand the context and significance of each data point, working closely with stakeholders to translate business needs into technical solutions.
- Collaboration: Collaborate effectively with data scientists, analysts, and business stakeholders to ensure alignment on data requirements, quality standards, and project goals.
Requirements
- B.Sc. in Computer Science or a related technical field, or equivalent practical experience
- 2+ years of experience in a data engineering role
- Coding experience with Python
- Experience with SQL and NoSQL databases
- Experience with data modeling, data warehousing, and building ELT/ETL pipelines
- Experience with cloud technologies (AWS preferred)
- Experience building data pipelines with big data frameworks such as Spark or Hadoop
- Experience working with Linux, Git, CI/CD, Docker, and Kubernetes
- Technologically diverse background and ability/willingness to learn new things quickly
- Experience working closely with data scientists or on data science projects