Data Engineer – Spark/Big Data

Pune, Maharashtra

Job Description


Green Arrow Career Services


Job Title: Big Data – Data Engineer

Total Experience: 4–10 years

Job Location: Pune (Kharadi)

Notice Period: Immediate to 30 days

Salary Budget: 10–15 LPA

Below are the job details:

Ideal Candidate:

Experience: 4–10 years

– In-depth knowledge of Big Data technologies – Spark, HDFS, Hive, Kudu, Impala

– Solid programming experience in Python, Java, Scala, or another mainstream programming language

– Production experience in core Hadoop technologies, including HDFS, Hive, and YARN

– Strong working knowledge of SQL and the ability to write, debug, and optimize distributed SQL queries

– Excellent communication skills; previous experience working with internal or external customers

– Strong analytical abilities; ability to translate business requirements and use cases into a Hadoop solution, including ingestion of many data sources, ETL processing, data access, and consumption, as well as custom analytics

– Experience working with workflow managers like Airflow, Prefect, Luigi, or Oozie

– Experience working with data governance and security tools like Apache Sentry, Kerberos, Apache Atlas, and Apache Ranger

– Experience working with streaming data using technologies like Kafka and Spark Streaming

– Strong understanding of big data performance tuning

– Experience handling different kinds of structured and unstructured data formats (Parquet, Delta Lake, Avro, XML, JSON, YAML, CSV, ZIP, XLSX, text, etc.)

– Experience working with distributed NoSQL/search stores like Elasticsearch and Apache Solr

– Experience deploying big data pipelines in the cloud, preferably on GCP or AWS

– Well-versed in Software Development Life Cycle (SDLC) methodologies and practices

– Spark Certification is a huge plus

– Cloud experience is a must-have, preferably with GCP

– Contributions to the open-source community or Apache committer status will be a big plus

Responsibilities:

– Integrate data from a variety of sources (data warehouses, data marts) using on-prem or cloud-based data structures (GCP/AWS); identify new and existing data sources

– Develop, implement and optimize streaming, data lake, and analytics big data solutions

– Create and execute testing strategies including unit, integration, and full end-to-end tests of data pipelines

– Recommend Kudu, HBase, HDFS, or relational databases based on their respective strengths

– Utilize ETL processes to build data repositories; integrate data into the Hadoop data lake using Sqoop (batch ingest), Kafka (streaming), and Spark, Hive, or Impala (transformation)

– Adapt and learn new technologies in a quickly changing field

– Be creative; evaluate and recommend big data technologies to solve problems and create solutions

– Recommend and implement the best tools to ensure optimized data performance; perform data analysis utilizing Spark, Hive, and Impala

– Work on a variety of internal and open-source projects and tools