Job Description
Position Purpose:
The incumbent will be a subject matter expert on monitoring tools and processes used by the Commonwealth and is responsible for collaborating with technical specialists, agency teams, and vendors to implement actionable monitoring and reporting. The position’s responsibilities also include coordinating efforts to transform person-centric processes into structured, repeatable, and highly documented automated workflows. Additionally, this position is responsible for the management and continuous improvement of key enterprise monitoring processes, including changes, incident reporting, and problem resolution. The incumbent will also be responsible for evaluating, preparing, and implementing technical solutions for on-prem and cloud-based applications and technology resources. The position will develop and maintain standard operating procedures (SOPs) and ensure consistent communication strategies to enhance operational efficiency and service delivery.
Description Of Duties:
- Responsible for functioning as the Technical SME on enterprise-wide systems.
- Responsible for implementations of products/services that involve significant Commonwealth oversight.
- Interpret, process, and report data to create meaningful business and operational dashboards.
- Maintain (patch, troubleshoot) existing and future monitoring tools including System Center Operations Manager, SolarWinds, SightLine, and SquaredUp.
- Identifies improvements to existing processes and tools to achieve high quality services/products.
- Create Azure Monitor resources and Log Analytics queries.
- Create, document, and maintain on-prem and cloud automations.
- Create, document, and maintain SOAP/REST/JSON/API calls using PowerShell or other compatible languages.
- Maintain and troubleshoot monitoring tool connectivity to endpoints.
- Creates documentation for new processes.
- Updates documentation for existing processes.
- Documents incidents and problems impacting monitoring services.
- Collaborate with the enterprise change manager to ensure processes are standardized and documented workflows are followed.
- Collaborate with the Enterprise Incident Manager to ensure that standardized SOPs and processes are consistently applied across incident and problem management.
- Monitor incident and problem resolution processes to ensure timely and effective service restoration and root cause analysis.
- Manage and document the operational procedures and responses of NOC teams to service delivery and incident management.
- Ensure all processes and workflows are documented in an accessible, organized, and secure manner for future reference.
- Establish and maintain Standard Operating Procedures (SOPs) for all relevant operational processes.
- Emphasize the transition from informal, person-dependent workflows to formal, role-driven processes.
- Develop and document a process documentation workflow that ensures all operational procedures are captured and updated regularly.
- Ensure that consistent and clear communication processes are in place for changes, incidents, and problem management across the NOC.
- Create and manage distribution lists for technical and non-technical stakeholders to ensure relevant parties are informed of NOC updates.
- Enable self-management of distribution lists via subscription options to streamline communication across the organization.
- Work closely with NOC staff to ensure effective communication regarding change, incident, and problem management on behalf of NUTSO.
- Ensure collaboration between different departments to harmonize efforts in incident, problem, and change communication.
- Complies with and develops recommendations for executive public and enterprise policy objectives as they relate to the delivery of Commonwealth IT services.
- Utilizes the ServiceNow change management tool to submit requests for change.
- Directs the development of policies and procedures consistent with Commonwealth standards and direction.
- Participates in enterprise change management meetings to ensure that enterprise-level service configuration and access for all supported locations are not impacted by changes.
- Provides ongoing data submissions regarding network availability, problem resolution, and infrastructure enhancements for use in compilation of the monthly/quarterly customer Service Level Agreement (SLA) reports.
- Designs agency disaster recovery plans for the network infrastructure and participates in periodic plan updates and testing exercises.
- Reviews technical manuals and other literature, attends seminars, conferences, and training classes to maintain currency with new information services, products, and information technology developments in network technology.
- Performs other related duties as assigned, to include those outlined in the CoG Plan when the Plan is activated. Responds to the designated alternate or secondary location when directed in response to a catastrophic incident.
This position is expected to adhere to established organizational service management processes and procedures.
Qualifications:
- 5+ Years of SolarWinds admin/deployment experience
- 5+ Years of Ansible admin/deployment experience
- 3+ Years of Azure Log Analytics experience
- 8+ Years of MS Windows Server admin/deployment experience
- 3+ Years of Linux Server admin/deployment experience
- 5+ Years of PowerShell scripting experience
- 3+ Years of Incident Management Experience
Skills:
- Experience with MS Windows Server administration/deployment: Required, 8 Years
- Experience with SolarWinds administration/deployment: Required, 5 Years
- Experience with Ansible administration/deployment: Required, 5 Years
- Experience with Log Analytics in Azure: Required, 3 Years
- Experience with Linux Server administration/deployment: Required, 3 Years
- PowerShell scripting experience: Required, 5 Years
- Incident Management experience: Required, 3 Years