Enterprise Big Data solution in the medical/pharma domain.
We are looking for an experienced engineer to work with a challenging stack of Big Data technologies for an international customer.
Requirements:
Hands-on experience with data ingestion and ETL tools (Apache NiFi, Airflow).
Understanding of and experience with at least one of the following platforms/tools from the Apache stack: HDFS & YARN, ZooKeeper, Kafka, Storm, Spark, Hive.
English level: Intermediate+.
Self-organized team player, ideally able to build and develop the team.
Nice to have: experience with cloud data platforms (AWS, Azure).
Responsibilities:
- Participate in all phases of the end-to-end software development product lifecycle
- Develop, maintain, and enhance data ingestion solutions across various data sources such as DBMSs, file systems (structured and unstructured data), APIs, and streaming, on both on-prem and cloud infrastructure
- Optimize and re-engineer model code to be modular, efficient, and scalable, and deploy models to production
- Identify, design, and implement internal process improvements: automate manual processes, debug long-running and inefficient pipelines, redesign infrastructure for greater scalability, and monitor, capture, and analyze pipeline metadata and usage
- Develop end-to-end solutions for enterprise strategic initiatives and performance improvement
- Build, test, and enhance BI solutions drawing on a wide variety of sources such as Hive, HBase, and file systems; develop solutions with optimized data performance and data security