* Develop and implement data pipelines for ingesting data from various sources into a centralized data platform
* Develop and maintain ETL jobs using AWS Glue to process and transform data at scale
* Optimize and troubleshoot AWS Glue jobs for performance and reliability
* Utilize Python and PySpark to efficiently handle large volumes of data during the ingestion process
* Design and implement scalable data processing solutions using PySpark to transform raw data into a structured and usable format
* Apply data cleansing, enrichment, and validation techniques to ensure data quality and accuracy
* Create and maintain ETL processes using Python and PySpark to move and transform data between different systems
* Optimize ETL workflows for performance and efficiency
* Collaborate with data architects to design and implement data models that support business requirements
* Ensure data structures are optimized for analytics and reporting
* Work with distributed computing frameworks, such as Apache Spark, to process and analyze large-scale datasets
* Manage and optimize SQL and NoSQL databases to support data storage and retrieval needs
* Implement indexing, partitioning, and other database optimization techniques
* Collaborate with cross-functional teams, including data scientists, analysts, and business stakeholders, to understand data requirements and deliver effective solutions
* Work closely with software engineers to integrate data solutions into larger applications
* Implement monitoring solutions to track data pipeline performance and proactively identify and address issues
* Ensure compliance with data privacy regulations and company policies
* Stay abreast of industry trends and advancements in Data Engineering, Python, and PySpark
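The cleansing, validation, and deduplication duties above map to a common pattern. As a minimal sketch of that kind of logic, the example below uses plain Python dicts in place of a PySpark DataFrame (field names such as `customer_id`, `email`, and `amount` are illustrative assumptions); in a real Glue job the same steps would typically use `DataFrame.dropna`, `dropDuplicates`, and `withColumn`.

```python
# Simplified, hypothetical sketch of a cleansing/validation step.
# Plain Python stands in for PySpark; field names are assumptions.

def cleanse(records):
    """Drop rows missing required fields, deduplicate, normalize types."""
    seen = set()
    clean = []
    for row in records:
        cid = row.get("customer_id")
        if cid is None or row.get("email") is None:
            continue  # validation: required fields must be present
        if cid in seen:
            continue  # deduplicate on customer_id
        seen.add(cid)
        clean.append({
            "customer_id": cid,
            "email": row["email"].strip().lower(),  # normalize email
            "amount": float(row.get("amount", 0)),  # type coercion
        })
    return clean

raw = [
    {"customer_id": 1, "email": " A@X.COM ", "amount": "10.5"},
    {"customer_id": 1, "email": "a@x.com", "amount": "10.5"},  # duplicate
    {"customer_id": 2, "email": None},                          # invalid row
]
print(cleanse(raw))
```

The same keep/drop rules scale to Spark by expressing them as DataFrame transformations rather than a Python loop, which lets the cluster parallelize the work.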
Requirements
* Proficiency in Python and PySpark
* Strong knowledge of Data Engineering concepts and best practices
* Hands-on experience with AWS Glue and other AWS services
* Experience with big data technologies and distributed computing
* Familiarity with database management systems (SQL and NoSQL)
* Understanding of ETL processes and data modeling
* Excellent problem-solving and analytical skills
* Strong communication and collaboration skills
* Bachelor's degree in Computer Science, Information Technology, or a related field