SR HPC System Engineer (100% REMOTE)
Job description
Clearance Level: None
Category: Systems Engineering
Location: Remote, Based in the USA
Requisition Type: Pipeline
Your Impact
Own your opportunity to work alongside federal civilian agencies. Make an impact by providing services that help the government ensure the well-being of U.S. citizens.
Job Description
At GDIT, people are our differentiator. Our work depends on a Senior HPC Systems Engineer joining our team to support the National Oceanic and Atmospheric Administration (NOAA), Weather and Climate Operational Supercomputer System (WCOSS). This position is remote with some travel required.
WCOSS provides NOAA the operational High Performance Computing (HPC) resources essential to process sophisticated numerical models used to predict and understand atmospheric and oceanic phenomena for weather and climate operational use. Operating 24/7, the next 10-year WCOSS program will deliver significant computational capability that will evolve over time to keep pace with NOAA’s growing environmental modeling needs.
We think. We act. We deliver. There is no challenge we can’t turn into opportunity.
In this role, a typical day will include:
Applying current HPC systems administration skills, with a desire to learn and deploy new technologies.
Developing and deploying monitoring capabilities.
Developing and implementing tools for cluster administration.
Providing technical support with a team of HPC System & Storage Administrators to resolve operational issues.
Providing off-hour on-call support on a rotating basis.
REQUIRED QUALIFICATIONS
Bachelor’s degree or equivalent and 10+ years of related experience (5+ years of experience with HPC systems operations highly preferred).
Parallel filesystem configuration and monitoring experience (e.g., Lustre, NFS).
Batch management/scheduling experience, PBS Pro preferred.
Experience working in a 24/7 operational environment.
U.S. Citizenship is required.
DESIRED QUALIFICATIONS
Demonstrated experience deploying and managing large-scale HPC systems using OS provisioning tools (e.g., xCAT).
Demonstrated experience using configuration management tools (e.g., Ansible, Puppet).
Linux system administration experience (e.g., SLES, Red Hat or CentOS).
Network interconnect configuration and monitoring experience (e.g., InfiniBand, Ethernet).
Programming or scripting in at least two languages (e.g., Bash, Perl, Python, C).
Strong writing skills for technical documents, system procedures, user wikis and FAQs.
Ability to work both independently and as part of a team.
GDIT CAREERS
Opportunity Owned
Discover more at www.gdit.com/careers