Morgan Stanley is a leading global financial services firm providing a wide range of investment banking, securities, investment management and wealth management services.
The Firm's employees serve clients worldwide, including corporations, governments and individuals, from more than 747 offices in 42 countries.
At Morgan Stanley, Technology works as a strategic partner with the Firm's business units and the world's leading technology companies to redefine how we do business in ever more global, complex, and dynamic financial markets.
Morgan Stanley's sizeable investment in technology results in quantitative trading systems, cutting-edge modelling and simulation software, comprehensive risk and security systems, and robust client-relationship capabilities, plus the worldwide infrastructure that forms the backbone of these systems and tools.
Our insights, our applications and infrastructure give a competitive edge to clients' businesses and to our own.
Technology & Reliability and Production Engineering
The mission of Technology is to provide a highly reliable and commercial technology platform, which supports the Firm's strategy, delivered by an innovative, world-class team of professionals.
Reliability and Production Engineering (RPE), a super-department within Technology, provides global services for Institutional Securities and Support Services applications.
Consolidated support functions include Plant Management / Engineering, Capacity Management, and Grid Management.
RPE includes a horizontal Plant Management (PLM), Tools and Engineering practice area that complements its direct production activities.
This covers operational plant management, grid computing, platform engineering, capacity management and production tooling functions.
PLM is a global practice area operating out of New York, London, Montreal, Toronto, Shanghai, Tokyo and Bengaluru.
Job Description:
The QAPM Plant Management Group is looking for an experienced High Performance Computing (HPC) Grid Reliability Engineer.
Key expectations for this role are:
Innovative and proactive technology professional who can wear multiple hats
Primary environment is Unix server (Linux)
Act as an operational support professional and assume technical ownership of automating and optimizing various Grid engineering functions
Demonstrate technical and operational acumen in handling escalations from application Support teams, troubleshooting and resolving incidents as an HPC Grid subject matter expert in a global coverage role
Effectively communicate and coordinate with relevant Development / Production Management teams, work stream leads & stakeholders
Assume design, development & test accountability of assigned automation projects
Drive automated RFB (Ready for Business) checks for multi-tenant and silo Grids so that potential issues are escalated and addressed in a timely manner
Support relevant work stream Leads within Grid Engineering in identification and resolution of performance and capacity bottlenecks, risk and control gaps and infrastructural upgrades, as and when required
Demonstrate a good understanding of Production Management methodologies, e.g., Incident Management, Problem Management, and Event Management
Manage incidents on behalf of the global team
Develop strong working relationships with Global Plant Management staff, Application Technology teams & RPE Leads
The successful candidate must demonstrate:
Strong technical skills along with strong communication and the ability to organize and manage their body of work.
Ability to communicate across all levels of the organization at the appropriate level of detail.
Experience with Linux, command-line tools, inter-process interaction, and memory and storage management, with demonstrable performance and troubleshooting skills.
Experience in scripting languages (Unix shell scripting, Perl, Python etc.)
Excellent communication, interpersonal, and writing skills along with an organized approach to manage a high volume of work.
Highly motivated problem solver who can multi-task, work under time pressure and be self-sufficient where required.
Takes ownership and holds themselves and others accountable while also being receptive to constructive feedback.
Works well under pressure.
Bachelor's degree in Computer Science or Computer Engineering from a 4-year program
Experience working in an Equities and / or Fixed Income eTrading environment
4 years of professional workplace experience
Experience in similar roles, developing or supporting infrastructure systems
IBM Spectrum Symphony grid software.
Cloud technologies - Amazon Web Services, IBM Cloud, Azure.
Ansible Infrastructure Automation and Configuration Management.
Source code control with git.
Advanced Unix knowledge of kernels, file systems, and memory management.
Experience of Agile methodologies using Atlassian Jira.
Demonstrated proactive approach to scaling application infrastructure (e.g., servers, storage) up or down, using moving-average trends, peaks, and business forecasts to act before an application runs out of room to grow and an incident occurs.
Ability to gain consensus in formal settings by preparing agendas, presentations, and meeting minutes.
Understanding of the SDLC (Software Development Lifecycle) process with the ability to work closely with development teams to ensure properly designed infrastructure for the application needs, following proper change management process.
Understanding of, and experience operating within, an ITIL framework.