Hitachi Digital Services - Data Platform Engineer

Database Specialist | Full Time | Senior | ~$120,000 - $170,000/yr
Dallas, Texas, United States

Job Description

Data platform engineers architect and optimize ETL/ELT pipelines in PySpark and Databricks, integrating Delta Lake and Unity Catalog for large-scale data processing. Hitachi Digital Services clients rely on these secure, scalable platforms for insights and innovation. The role involves frequent collaboration with analytics and business teams, with AWS used for infrastructure management.

Hitachi Digital Services, part of Hitachi, Ltd., delivers digital transformation for urbanization, resource conservation, and safety. Operating in 150 countries with over 250,000 employees globally, the Hitachi group applies deep expertise in technology and innovation to accelerate customer progress through cloud and AI solutions.

Engineers build pipelines across data sources such as S3 and other connectors, implement workflows using Delta Live Tables, manage IAM roles and policies for access control, establish CI/CD pipelines via GitHub Actions, and drive quality through testing frameworks. They also optimize cluster performance, enforce Medallion Architecture principles, create diagrams and specifications, and ensure ACID compliance in distributed systems. Mentorship and documentation support ongoing development. The pace involves handling massive datasets, resolving performance bottlenecks, and adapting to evolving technologies such as real-time streaming.
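The Medallion Architecture named above can be sketched as a bronze/silver/gold flow. The following is a minimal, framework-agnostic illustration in plain Python; in a Databricks deployment each layer would be a Delta Lake table populated by PySpark or Delta Live Tables, and all field names and the S3 path here are hypothetical.

```python
# Minimal sketch of the Medallion pattern: bronze (raw) -> silver (cleansed)
# -> gold (business aggregate). Plain Python lists stand in for Delta tables.

def to_bronze(raw_records):
    """Bronze: land raw records as-is, tagging each with its (hypothetical) source."""
    return [{"source": "s3://bucket/events", **r} for r in raw_records]

def to_silver(bronze):
    """Silver: cleanse and conform -- drop malformed rows, normalize types."""
    silver = []
    for r in bronze:
        if r.get("user_id") is None or r.get("amount") is None:
            continue  # a real pipeline would quarantine these rows instead
        silver.append({"user_id": str(r["user_id"]), "amount": float(r["amount"])})
    return silver

def to_gold(silver):
    """Gold: business-level aggregate -- total amount per user."""
    totals = {}
    for r in silver:
        totals[r["user_id"]] = totals.get(r["user_id"], 0.0) + r["amount"]
    return totals

raw = [
    {"user_id": 1, "amount": "9.50"},
    {"user_id": 1, "amount": "0.50"},
    {"user_id": None, "amount": "3.00"},  # malformed: dropped at silver
]
gold = to_gold(to_silver(to_bronze(raw)))
print(gold)  # {'1': 10.0}
```

The layering keeps raw data replayable (bronze), isolates cleansing logic (silver), and serves analytics-ready aggregates (gold), which is the separation the role enforces at Delta Lake scale.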

Beyond the listed salary range, compensation details are not explicitly stated. Roles typically offer remote or hybrid options depending on location. Professionals receive equity awards, health benefits, and support for holistic wellbeing, along with flexible work arrangements and training in new methodologies.

Responsibilities

  • Build high-scale ETL/ELT pipelines for diverse sources
  • Implement Databricks workflows with PySpark, DLT, and Unity Catalog
  • Configure VPC, IAM, S3 for secure AWS environments
  • Establish CI/CD pipelines using GitHub Actions
  • Drive quality via unit, integration, and performance testing
  • Optimize Spark clusters for performance and cost
  • Apply Medallion Architecture and ACID principles
  • Create technical documentation, diagrams, and specs
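The quality-driving responsibility above can be illustrated with a small unit test over a transform function. This is a hedged sketch: the function name `clean_amounts` and its rules are hypothetical, but the pattern (pure transform plus edge-case assertions) is what scales to pytest suites run against PySpark DataFrames in CI, for example in a GitHub Actions job.

```python
# Sketch of unit-testing a data-quality transform. The transform keeps
# only rows whose 'amount' field is present, parseable, and non-negative;
# the function and field names are illustrative, not from the posting.

def clean_amounts(rows):
    """Keep only rows with a parseable, non-negative 'amount' field."""
    cleaned = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except (KeyError, TypeError, ValueError):
            continue  # missing, None, or unparseable value: reject the row
        if amount >= 0:
            cleaned.append({**row, "amount": amount})
    return cleaned

def test_clean_amounts():
    rows = [
        {"id": 1, "amount": "10.0"},   # valid string number: kept
        {"id": 2, "amount": -5},       # negative: rejected
        {"id": 3},                     # missing field: rejected
        {"id": 4, "amount": "oops"},   # unparseable: rejected
    ]
    assert clean_amounts(rows) == [{"id": 1, "amount": 10.0}]

test_clean_amounts()
print("all checks passed")  # all checks passed
```

Keeping transforms as pure functions like this makes them testable without a running cluster, so the same checks can gate merges in CI before a job ever touches production data.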

Requirements

  • 10+ years in scalable data engineering platforms and pipelines
  • 3+ years in Databricks including Delta Lake, Unity Catalog, and Delta Live Tables
  • Hands-on experience with Workspaces, Repos, Jobs, and Databricks SQL
  • 3+ years in AWS with VPC, subnets, endpoints, routing, IAM, S3
  • Expertise in Python, PySpark, and advanced SQL
  • Deep knowledge of ETL/ELT, data lake, warehouse, and distributed computing
  • Git and Agile/Scrum proficiency