Qualifications & experience
- Data Engineering and Data Modeling skills
- Experience with Extract, Transform, Load (ETL) or Extract, Load, Transform (ELT) processes and data warehousing tools (Talend, Pentaho, IBM DataStage)
- Strong programming skills, particularly in at least one scripting language (Shell, Bash)
- Proficiency in relational databases (SQL Server, MySQL, PostgreSQL, Oracle) and NoSQL databases (MongoDB, Cassandra, HBase)
- Experience with distributed systems and big data technologies (Apache Hadoop, Spark, Hive, Impala, Kafka)
- Experience with cloud platforms such as AWS, Azure, Google Cloud Platform, Cloudera, or Alibaba Cloud
- Knowledge of programming languages (Java, Python, PHP)
- Experience with operating systems (Windows, Linux)
- Knowledge of visualization tools (Tableau, Power BI, Looker Studio, Grafana)
Tasks & responsibilities
- Perform data exploration, data cleaning, data imputation, and feature engineering on unstructured and structured data.
- Build the infrastructure for optimal extraction, transformation, and loading (ETL) of data from a wide variety of data sources.
- Develop and maintain optimal data pipeline architecture for training statistical and machine learning models such as regression and classification.
- Collaborate with data scientists and machine learning engineers to develop a comprehensive data science/machine learning solution pipeline.
- Develop and maintain evaluations to measure the effectiveness of training data. This includes measuring the capabilities of models on a variety of tasks and domains.
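As a hedged illustration of the cleaning, imputation, and feature-engineering work described above, the sketch below uses only the Python standard library; the column names (`age`, `age_bucket`) and the mean-imputation strategy are illustrative assumptions, not part of the role description.

```python
from statistics import mean

def clean_rows(rows):
    """Illustrative cleaning step: impute missing 'age' values with the
    column mean, then derive a simple 'age_bucket' feature.
    (Field names and strategy are assumptions for this sketch.)"""
    # Compute the imputation value from the observed (non-missing) entries.
    observed = [r["age"] for r in rows if r["age"] is not None]
    fill = mean(observed)

    cleaned = []
    for r in rows:
        age = r["age"] if r["age"] is not None else fill
        # Feature engineering: bucket the numeric value into a category.
        bucket = "adult" if age >= 18 else "minor"
        cleaned.append({**r, "age": age, "age_bucket": bucket})
    return cleaned
```

In a production pipeline this logic would typically live in an ETL framework or a Spark job rather than plain Python, but the shape of the step (impute, then derive features) is the same.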