CASY-MSCCN Jobs


Job Information

Texas A&M University High-Performance Computing Engineer in United States

Job Title

High-Performance Computing Engineer

Agency

Texas A&M University - Kingsville

Department

I Tech

Proposed Minimum Salary

Commensurate

Job Location

Kingsville, Texas

Job Type

Staff

Job Description

Job Summary

The High-Performance Computing (HPC) Engineer is a unique role that combines the design, development, and operational management of the institution's high-performance computing resources. This position offers the opportunity to work closely with faculty, researchers, and students, supporting their computational research projects and ensuring the HPC infrastructure meets their needs. The engineer will play a crucial role in optimizing computational methods and facilitating groundbreaking research across disciplines. The HPC Engineer is responsible for HPC cluster administration, unit coordination, maintenance of HPC systems, strategic planning for the University’s HPC infrastructure, and advanced technical support for users of HPC systems.

Essential Duties and Responsibilities

System Architecture and Design

  • Develop and design HPC infrastructure: Select, design, and implement HPC systems, including compute clusters, storage networks, and high-speed interconnects, to meet the varied computational needs of the research community.

  • Evaluate and integrate new technologies: Stay abreast of advancements in HPC, cloud computing, and storage technologies. Evaluate the potential benefits of new technologies and integrate them into the existing HPC ecosystem to enhance performance and capabilities.

System Administration and Maintenance

  • Manage HPC resources: Perform system administration tasks on HPC clusters, including configuration, maintenance, and troubleshooting of hardware, software, and networking components.

  • Optimize system performance: Monitor system performance, identify bottlenecks, and implement optimizations to ensure efficient utilization of HPC resources.

  • Ensure system security: Implement and maintain security measures to protect sensitive data and computing resources from unauthorized access and cyber threats.

User Support and Collaboration

  • Provide technical support and training: Assist researchers with technical issues related to HPC usage. Organize training sessions and workshops on HPC best practices, programming, and optimization techniques.

  • Collaborate on research projects: Work closely with research groups to understand their computational requirements and assist in developing efficient computational strategies, code optimization, and parallelization.

Software and Application Management

  • Install and manage scientific software: Deploy and maintain a wide range of scientific applications, libraries, and development tools on HPC systems to support research activities.

  • Develop custom tools and scripts: Write custom scripts and tools to automate common tasks, improve system management, and facilitate complex computational workflows.

Research and Development

  • Stay informed about the latest research in HPC and related fields.

Data Management and Storage

  • Implement data management policies: Develop and enforce data storage, backup, and archiving policies to ensure data integrity and availability.

  • Optimize data storage and access: Design and maintain scalable storage solutions that provide fast and efficient access to large datasets, integrating with HPC compute resources.

Networking and Collaboration

  • Build partnerships: Establish and maintain collaborations with industry, other academic institutions, and HPC networks to share knowledge, resources, and best practices.

The above represents the major duties, responsibilities, and authorities of this job, and is not intended to be a complete list of all tasks and functions. Other duties may be assigned.

Minimum Requirements

Education – Bachelor’s degree in information technology, computer science, or a related field.

Experience – 12 years of related experience in research computing or administering HPC systems.

Knowledge of – Word processing and spreadsheet applications; advanced systems theory, project portfolio management, strategic change management, risk management, strategic disaster recovery and business continuity, return on investment (ROI) analysis, solutions integration, leveraging strategic resources, and strategic partnering.

Ability to – Multitask and work cooperatively with others. Excellent written communication, analytical, interpersonal, and organizational skills.

Preferred Requirements

Education – Master's degree in computer or computational science, statistics, or engineering.

Experience – Twelve years of HPC experience, including hands-on system administration and management of large-scale supercomputing clusters at all levels; parallelization techniques; programming languages, tools, and techniques such as Fortran, C/C++, Java, or POSIX threads; and mass storage architecture and planning.

  • Five years of management and leadership experience in HPC or research computing centers.

  • Experience with computing clusters in Windows and Linux and virtualized environments.

  • Ability to evaluate and benchmark cluster architectures and their key subsystems (e.g., mass storage, interconnect, processor technology).

  • Knowledge of scripting languages such as Bash, Python, and Perl for maintaining HPC systems and supporting scientific computing.

  • Knowledge of C/C++, Fortran, CUDA, OpenCL, OpenMP, and MPI for scientific computing.

  • Knowledge of configuration management tools such as Puppet, Chef, Ansible, and Salt.

  • Knowledge of container technologies such as Docker, Singularity, and Kubernetes.

  • Excellent troubleshooting skills, including quickly recognizing failure modes and their corresponding symptoms.

  • Excellent communication skills.

  • Higher Education Experience

Licensing / Professional Certification – Linux/UNIX certifications related to systems administration.

  • Certifications related to managing high-performance storage systems.

Supervision of Others

This position generally does not supervise employees.

Other Requirements

This position is on-site only.

All positions are security-sensitive. Applicants are subject to a criminal history investigation, and employment is contingent upon the institution’s verification of credentials and/or other information required by the institution’s procedures, including the completion of the criminal history check.

Equal Opportunity/Affirmative Action/Veterans/Disability Employer.

Thank you for your interest in a career with Texas A&M University-Kingsville (TAMUK)! We are a public research university located in Kingsville, Texas. TAMUK is the southernmost campus of the Texas A&M University System. It is located 40 miles southwest of Corpus Christi and is within a four-hour drive of multiple metro areas, including San Antonio, Austin, Houston, and Brownsville-McAllen-Harlingen.

The mission of Texas A&M University-Kingsville is to enrich lives through education, discovery, and service in South Texas and beyond. The university's enrollment of approximately 7,000 students reflects the ethnic diversity of South Texas, with approximately 65% of the students of Hispanic origin. The university consists of five academic colleges: Arts & Sciences, Business Administration, Engineering, Agriculture & Natural Resources, and Education & Human Performance. The University has the Carnegie classification of Doctoral University: High Research Activity.

According to Forbes Media, ratings from Javelina alumni rank Texas A&M University-Kingsville among the top 25 universities in the nation. The rankings are based on alumni feedback about their experience at the institution.

Kingsville, a friendly and safe community, is home to nearly 30,000 people and represents a vital part of the Texas ranching industry with more than 60,000 cattle and 300 quarter horses. In addition to being home to the first institution of higher learning in South Texas, Kingsville is also home to Naval Air Station Kingsville and several Fortune 500 industrial companies.

From its quaint, historical buildings downtown to the longhorn and Santa Gertrudis cattle that graze on the legendary King Ranch, Kingsville retains much of its unique historical charm while continuing to grow steadily.

Visit www.tamuk.edu or www.kingsvilletexas.com for more information.

If you need assistance in applying for any position, please contact a recruiting partner at (361) 593-3705.

  • TAMUK Employee Services Home Page (http://www.tamuk.edu/finance/hr/)

  • Prospective Employees

  • TAMU System Job Opportunities (http://apps.system.tamus.edu/jobsearch/)

  • Employee Benefits

  • Campus Crime Statistics (http://www.tamuk.edu/dean/cleryreport.html)
