Site Reliability Engineer

The Role:

The SRE will partner with both business and technical teams to build scalable, stable solutions for our platform. This position focuses on building a strong SRE/DevOps function and supporting an online, customer-facing eCommerce platform.

This position will work cross-functionally with Product Management, Project Management, Release Engineering, Quality Assurance, Development, and Infrastructure to develop stable and scalable solutions that meet market needs with respect to functionality, performance, scalability, reliability, realistic implementation schedules, and adherence to development goals and principles.

Primary Responsibilities:

  • Partner with other Site Reliability Engineers
  • Execute Reliability Engineering activities as well as projects
  • Serve as point person for Level 3 priority incident troubleshooting activities
  • Monitor production applications, including resolving incidents and alerts
  • Directly influence CI/CD activities to improve the efficiency of software delivery
  • Establish strong technical criteria around release acceptance
  • Work with Development teams to help evolve our platform into a highly available and operable system
  • Write and maintain scripts and tools to improve operational automation

Primary Skills Required:

  • 3+ years of hands-on experience with large-scale distributed systems (hardware configurations, OS, platform, network, and data modeling)
  • 3+ years of direct experience with cloud implementations (Azure, VMware, Rackspace, AWS, etc.), including provisioning, monitoring, and scaling
  • End-to-end platform and application implementation experience
  • Architecture Assessment for performance, availability and scalability
  • Architecture and design standards definition and enforcement focused on non-functional properties
  • IT Change Management
  • Continuous Integration and Continuous Delivery
  • Performance monitoring and tuning
  • Establishing Operational Level Agreements (OLAs) and SLAs
  • Automated incident analysis and feedback
  • Proactive application monitoring and alerts
  • Operations dashboard and reporting (KPIs)
  • BCP design and implementation analysis

Required Skills:

  • 3+ years of object-oriented application development in a Microsoft .NET environment
  • PowerShell Scripting
  • Web technologies such as HTML5, JavaScript, CSS, and AJAX
  • CI/CD pipeline management using Visual Studio Team Services or similar
  • Agile software development methodologies
  • Successful candidates will be required to consent to a criminal background check