Sr. Site Reliability Engineer - Hp - Andover

Job description

Sr. Site Reliability Engineer



Job Description:



InfoSight is HPE’s powerful monitoring and predictive analytics solution, built on big data analytics. It collects about 70 million pieces of information from each Nimble Storage array, as well as from vSphere deployments, every day. It analyzes this data across the entire install base to show customers (and resellers) deep insights in simple, intuitive visuals. It has been recognized by customers and the analyst community as one of the core differentiators for HPE Nimble Storage. InfoSight collects and analyzes sensor data from all HPE products, and predictive analytics are then used to correlate vast amounts of information to find the needle in the haystack and solve customers’ most complex infrastructure issues. As impressive as they are, InfoSight’s current capabilities are only the beginning: the possibilities for analytics-based monitoring and management are endless, and so is the infrastructure that keeps this running like a well-oiled machine.

That’s where you come in. As an SRE, you own the customer experience by making sure that all of our production infrastructure works flawlessly so that InfoSight stays up and running. This is not about how reactive you are to issues but about how proactively you plan and set up systems to prevent issues from happening and, if they do happen, ensure that automation is in place to handle them. Are you passionate about automation, measurement, monitoring, security, frequent production releases, and building tools to automate everything? Are you all about ensuring that no manual or mundane tasks are done by humans? Can you imagine doing things at scale, yet being part of every stage leading up to large-scale compute and storage? You will build, scale, automate, and maintain a number of infrastructure projects, including but not limited to monitoring and alerting, automated deployments (cloud and on-premises), software-defined networking, resilient storage, automated containerization using Docker, container management using Kubernetes (K8s) and virtualization, catastrophic system failure recovery, and high availability. You will have the freedom to explore and visualize the overall framework that allows engineers to execute their tasks (e.g., the ability to execute a single command to bring up, bring down, or upgrade microservices hosted in Docker containers, manage multiple Docker instances using Docker Swarm, or upgrade a library or OS).

In addition, you will:

·     Write code that is maintainable, well documented, extensible and scalable
·     Write detailed design documents
·     Work closely with our Architects, Engineers, Product Managers and other clients and partners of the SRE team to meet the needs of the organization to stay competitive - from the infrastructure up to the highest level of applications
·     Build automation for improving the build system by adding parallelized builds, quicker build failure indication, automated notification of results and packaging
·     Analyze, architect, and develop new deployments, mechanisms, and procedures for high-availability environments, both on-premises and in public/private cloud
·     Build CI/CD solutions to improve developer productivity and rapid deployments.
·     Refine/improve security, resiliency and performance.
·     Continuously improve our infrastructure to be easy to deploy, scalable, secure and fault-tolerant
·     Share in on-call responsibilities, writing deployment scripts, debugging applications, evaluating new technologies, and delivering 24x7 operations through a customer-focused approach, including pager duty and responding to production alerts

What you need to bring with you:

·     1+ years of experience in SRE
·     Strong Unix background (preferably RHEL/CentOS)
·     1+ years of recent development expertise with Python, shell scripting, and databases in a production, customer-facing context
·     Proficiency with configuration management technologies like Ansible, Chef, Puppet, and Salt
·     Nice to have: experience working with large data sets and distributed computing tools (MapReduce, Hadoop, Hive, or Spark)
·     Excited about learning languages such as Go, Scala, and Rust
·     Experience with continuous integration/build automation tools (Jenkins, BuildBot, Hudson, Bamboo, CruiseControl), Maven, Ant
·     Experience with Jira, SVN, Git, Mercurial, and virtual machine infrastructure (VMware, Hyper-V, Xen, KVM, Amazon)
·     Experience in Java, JavaScript, and Django is desirable
·     Experience with MySQL, Vertica, and NoSQL databases such as Cassandra and Redis
·     Experience with monitoring and logging tools like New Relic, Nagios, Splunk, Graphite, Grafana, etc.
·     Experience with application disaster recovery, migration, roll-back plans, expansion, routine deployments, and system upgrades
·     Desirable to have experience using JavaScript, React, AngularJS, RESTful services, Yum, RPM, and JSON
·     Familiarity with web application development and object-oriented programming in Python; Django a plus
·     Strong tendency to automate and monitor everything
·     Strong aptitude in troubleshooting / problem-solving
·     Self-motivated and comfortable working in an Agile methodology (rapid iterations, rapid releases, etc.)
·     Experience with managing scalable, distributed systems
·     Exemplary communication skills and ability to work with cross-functional teams.
·     Flexible, adaptable, and able to multi-task
·     BS or Master’s degree in Computer Science, Engineering, or a related discipline
·     Ensure SLA requirements are continuously improved through technology innovation and end-to-end monitoring and support capabilities.
·     Build and foster a high-performance engineering culture, mentor team members, and provide your team with the tools and motivation to make things happen
·     Knowledgeable in security requirements
·     Strong verbal and written communication skills, demonstrated influencing skills, and a high level of technical and team leadership skills

Job:
Engineering

Job Level:
Expert



Hewlett Packard Enterprise is EEO F/M/Protected Veteran/ Individual with Disabilities.



HPE will comply with all applicable laws related to the use of arrest and conviction records, including the San Francisco Fair Chance Ordinance and similar laws and will consider for employment qualified applicants with criminal histories.
