Reporting to the Security Manager, the Storage Systems Specialist (the Specialist) is responsible for capacity planning, allocation, scheduling, maintenance and optimization of storage systems to meet the university’s research data storage requirements, and for ensuring efficient, high-performance, available, scalable and highly reliable delivery of ongoing storage system operations at the Centre for Advanced Computing (CAC). Working as a key member of the Security Operations Team, the incumbent plans, implements, maintains, manages, troubleshoots and actively monitors storage and backup solutions, system configurations, hardware, software and related infrastructure. This position provides storage and backup systems infrastructure design and solution support within a complex enterprise backup and recovery environment, including the ongoing critical evaluation, modeling, simulation, testing and selection of the storage system software and hardware components of the infrastructure, ensuring uninterrupted support to the CAC and its clients. The Storage Systems Specialist maintains a high degree of technical knowledge in order to develop and maintain backup and recovery procedures, provide technical support and services to the research platform, and create technical and user documentation for training purposes.
The schedule for this position requires the incumbent to work occasional early mornings, evenings and/or weekends, and may also require participation in an on-call rotation to provide system support outside regular work hours.
- Install, configure, maintain, monitor, test and upgrade hardware, operating systems, and storage solutions, in accordance with security standards and project/operational requirements.
- Work closely with CAC’s analytics developers, analysts, system engineers, and client users to understand research storage requirements and translate them into refined technical solutions. This involves defining and developing product architecture by providing designs, performing needs assessments, and evaluating and advising on the selection of storage products that best meet research needs.
- Liaise with various external software and hardware providers to ensure the security, maintenance and support of CAC’s storage systems infrastructure. This includes maintaining relationships with providers, troubleshooting network problems, and assessing vendor products.
- Provide technical support, guidance and direction on the storage products used by CAC clients, including the OHDP-Q platform. Troubleshoot to isolate and diagnose common system problems, resolve problems identified by CAC technical staff or through monitoring software, and document events to ensure continuous operation. Refer more complicated hardware and software problems to vendors for repair.
- Create storage partitions, file sets and shares based on client needs.
- Assign file and share permissions and ACLs based on client needs, following security best practices.
- Develop solutions in the storage and backup/recovery environments, and implement and manage storage sharing protocols such as NFS and SMB.
- Manage, schedule and monitor backups, replication and snapshots; deploy backup configurations; and provide all levels of support for storage and/or backup infrastructure and services.
- Install, configure and manage backup clients on servers, workstations and endpoints. Monitor server and storage infrastructure and any processes related to these systems.
- Ensure that appropriate backup levels are maintained by reviewing daily backup operations and confirming that all required file systems and system data are successfully backed up to the appropriate media, that recovery tapes or disks are created, and that media is recycled, destroyed or sent off-site as necessary.
- Ensure that all systems conform to security-control and disaster-recovery standards by documenting downtime, completing technical and systems reviews, and performing regular restoration and disaster recovery testing of storage-based systems. Investigate, identify and document proposals that will improve application recoverability.
- Continuously evaluate system efficiency, connectivity and communication by performing daily system monitoring and verifying the integrity and availability of hardware, logging and event management, server resource allocation, systems and key processes.
- Document and maintain installation and storage system configurations and procedures as part of the internal knowledge base.
- Collaborate with team members on the development and analysis of system standards, thresholds and recommendations to maximize system performance. Analyze and monitor system performance as baseline changes and infrastructure upgrades are implemented, and prepare storage usage reports.
- Assist in developing proposals, reports and analyses related to the department’s IT strategies by recommending and implementing innovative, and where possible automated, approaches to system administration tasks.
- Identify approaches that leverage CAC resources and provide economies of scale.
- Evaluate changes and ensure they are made in accordance with appropriate operating procedures; recommend revisions based on the results.
- Maintain a high degree of knowledge and expertise in computer operating systems, storage and backup, security, and other aspects of information technology.
- Provide maintenance and technical support for production systems. Ensure that system security, backup and recovery mechanisms are operational.
- Willingness to promote equity, diversity and inclusion in the workplace.
- Undertake other duties as required in support of the CAC.
- University degree with a technical concentration, such as Engineering, Computer Science or a related field.
- Minimum of five years of relevant experience in the information technology field.
- Demonstrated experience with direct-attached and shared storage infrastructure, and competency with Ethernet and Fibre Channel storage networks.
- Experience managing, maintaining, tuning and updating storage solutions such as IBM Spectrum Scale (GPFS) or the Lustre file system.
- Proven experience maintaining backup and recovery technologies; experience with IBM Spectrum Protect is preferred.
- Experience in HPC environments using Linux (CentOS/Red Hat/Ubuntu), with Linux certifications such as RHCSA/RHCE.
- Criminal background check (CPIC) required.
- Consideration may be given to an equivalent combination of education and experience.
- High degree of knowledge of backup concepts relating to backup type, iteration and retention in accordance with compliance requirements and legal statutes. Experience with backup automation, scheduling, verification and data migration.
- Knowledge and application of Python and other scripting languages such as Bash.
- Demonstrable technical proficiency in managing a grid environment (SLURM Workload Manager) is an asset.
- Familiarity with Windows Active Directory and/or OpenLDAP, and the ability to integrate systems with them.
- Expertise in Linux and Windows server operating systems.
- Ability to successfully and effectively deliver and support high availability systems.
- Ability to support and maintain VMware and tape backup appliances.
- Well-developed analytical and problem-solving/troubleshooting skills to understand problems across a variety of technologies, assist in systems design and development, and research solutions to problems that may have no readily available support.
- Must be able to work in a client-service, team-based environment.
- Effective time management skills to handle a broad range of responsibilities under time pressure and with frequent interruptions.
- Demonstrated capability to manage all forms of stakeholder, client and vendor communication in a professional and timely manner.
- Strong written and verbal communication and interpersonal skills in order to advise, teach, consult and exchange information with individuals at various levels of technical familiarity and proficiency.
- Ability to integrate technology into the work environment to maximize efficiency and accuracy.
- Ability to adapt to a changing work environment and to acquire new technical skills as they become necessary.
- Ability to work from a “client service-centred” perspective in a team-based, project-focused, collaborative and innovative environment.
- Ability to work independently with limited supervision, as well as to collaborate with team members as required by the assigned task.
- Determine appropriate problem-solving procedures and decide how best to rectify each problem, including whether it should be referred to others.
- Determine how best to meet the department's storage system needs.
- Allocate time, prioritize tasks and determine workflow. Continually assess and adjust priorities and manage tasks in a fast-paced and demanding environment.
- Evaluate and make recommendations on the purchase or acquisition of new storage systems, peripherals and software.
- Assess the nature of a request and assist the customer as appropriate.
- Determine which reports and analyses are required, who to distribute information to, and how to present the information clearly and effectively.
- Make recommendations toward the allocation of resources.
- Confidentiality is paramount; the incumbent must be able to judge what information can be shared, when, and with whom.