Site Reliability Engineer 2-IT - Oracle - Austin

Job description

Define, design, and implement network communications and solutions within a fast-paced, leading-edge database/applications company.

Analyze performance trends and manage server and network capacity. React to potential problems using automation, scheduling, and monitoring tools, escalating to management where appropriate. Participate in configuration and implement technical solutions to enhance and/or troubleshoot the system. You are also responsible for support documentation.

Duties and tasks are standard with some variation. You complete your own role largely independently within defined policies and procedures. Requires 2-4 years of related experience in a medium to large networked and distributed computing environment, and a BS in Computer Science or related field.

Oracle is an Equal Employment Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability and protected veteran status or any other characteristic protected by law.

Desired profile

Qualifications:

Our Site Reliability Engineering (SRE) team fills a vital role in operations: they serve as the first point of contact for all issues that may impact our customer-facing production environment. The team's foremost responsibility is to keep the site live and functioning using the resources available to identify, resolve, or escalate issues to the appropriate person or team. Collectively, they strive to create and maintain effective monitoring that evolves with the product by working closely with the development engineers.

SRE is global with members in Austin, TX; Brno, Czech Republic; and Sydney, Australia. Despite the distance, the team is a close-knit and collaborative group. Each member brings a unique skillset to create a robust and knowledgeable team. What will you bring to the table?

Responsibilities:

Ensure the customer-facing site is functioning
Own all alerts and escalations in the customer-facing production environment
Automate manual tasks
Use SRE toolset to identify, resolve or escalate issues in production
Build effective monitoring that evolves with the product
Work closely with development engineers who build the product
Build, test and run Disaster Recovery procedures
Gain familiarity with NetSuite solutions and customer needs
Interface with Customer Support in the event of customer-facing problems
Work to constantly increase the number of issues resolved directly by SRE

Qualifications:

Experience with Unix or Linux
BS in Computer Science or related field
Scripting experience in Bash, Perl, Python, or similar
Excellent troubleshooting skills
Motivated to work quickly and accurately under pressure in time-critical situations
A self-starter who takes pride in job ownership and consistently creates innovative ways to improve efficiency and effectiveness

Preferred Qualifications:

Networking experience
Database knowledge (Oracle, NoSQL such as Cassandra and Redis)
3-4 years' experience working in a large-scale production operations environment providing mission-critical services to customers