Senior Big Data DevOps Engineer
K.T. Automation (India)
• Manage large-scale Hadoop cluster environments, including capacity planning, cluster setup, performance tuning, monitoring and alerting.
• Perform proof of concepts on scaling, reliability, performance and manageability.
• Work with core production support personnel in IT and Engineering to automate deployment and operation of the infrastructure. Manage, deploy, and configure infrastructure with Ansible or other automation tool sets.
• Monitor Hadoop jobs and recommend optimizations
• Job Monitoring
• Rerun jobs
• Job Tuning
• Spark Optimizations
• Data Monitoring and Pruning
• Creation of metrics and measures of utilization and performance.
• Capacity planning and implementation of new/upgraded hardware and software releases as well as for storage infrastructure.
• Ability to work well with a global team of highly motivated and skilled personnel.
• Research and recommend innovative and, where possible, automated approaches for system administration tasks.
• Integrating ML libraries
• Hardware accelerations (SQream / Kinetica / Wallaroo monitoring and maintenance)
• Should be able to develop and apply patches
• Debugging infrastructure issues (such as underlying network issues or issues with the nodes)
• Addition/replacement of Kafka clusters/consumers (not sure if this is covered under hardware acceleration)
• Testing/support of infrastructure component changes (such as changing the load balancer to F5).
• Deployment during the release.
• Help the QA team with production parallel testing and performance testing.
• Help the Dev team with POC/ad hoc execution of jobs for debugging/cost analysis
• 5 to 10 years of professional experience in Java, Scala and Python.
• 3 years of experience with Spark/MapReduce in a production environment
• A deep understanding of Hadoop design principles, cluster connectivity, security and the factors that affect distributed system performance.
• Experience with Kafka, HBase and Hortonworks is mandatory.
• Prior experience with remote monitoring and event handling using Nagios and ELK.
• Good collaboration & communication skills, the ability to participate in an interdisciplinary team.
• Strong written communications and documentation experience.
• Knowledge of best practices related to security, performance, and disaster recovery.
• BE/BTech/BS/BCS/MCS/MCA in Computers or equivalent
• Excellent interpersonal, written, and verbal communication skills
This job is provided by Shine.com