Must Have Skills:
5+ years Cloud (AWS, Azure or other)
5+ years development (Java, Ruby, Python or C++)
5+ years Linux and Windows OS
TCP/IP and UDP expertise
This is a full-time (FTE) position for US citizens and/or Green Card applicants; no H-1Bs or contractors.
Our client is one of the main-stage players in the global cloud, which is growing in every dimension: features, customers, dependent services, and usage.
We are looking for highly motivated Software Engineers who share our passion for the complex technical and process engineering challenges that are part of delivering cloud solutions for enterprise customers. You will partner across teams to deliver technical results that improve service performance and reduce the impact of infrastructure failures for key components and pipelines. In addition to technical challenges, you will also optimize key processes and technologies to ensure secure, efficient and effective service delivery. You will work upstream with partners to ensure common engineering challenges are solved. As an expert in services, you will play a critical role in shaping, defining, implementing, and sustaining cloud service platforms in new geographical regions and markets. An ideal candidate is hands-on with code, loves to analyze data obtained through instrumentation from thousands of server nodes, and drives recommendations and improvements through ambiguity amidst evolving requirements, environments, topologies, and workloads.
- Drive new technology solutions through evaluation, design, development, integration, and certification.
- Analyze real-time metrics and provide insights to influence demand shaping for large customers and global capacity planning/buildout efforts.
- Champion customer experience and drive cross-team monitoring efforts to improve availability, reliability, and performance.
- Participate in on-call rotation.
- Design, write and deliver software to improve deployment, service health monitoring, availability, and efficiency, with a focus on optimization, tuning, and, when appropriate, architectural changes.
- Drive efforts to improve and streamline continuous deployment and build-out infrastructure.
- Build and operate systems to manage hardware health and capacity.
- Debug and troubleshoot mission-critical performance and functional issues and service and architecture bottlenecks, and come up with solutions to prevent future recurrences.
Day to day activities:
- In the initial part of this permanent role, you will migrate existing bare-metal infrastructure to Azure in the form of IaaS and migrate the instrumentation and monitoring tools from Linux to the Azure environment.
- Going forward, you will participate as a DevOps engineer to design, develop and operate the service, with the main focus on making the services “Operationally Ready” as the team starts to rebuild Linux-based applications into Azure-hosted applications, which will be based on PaaS v1 and v2.
- Develop fundamental strategic models for scaling service infrastructure from physical topology through application traffic management.
- Design and automate complex multi-environment infrastructure configurations, QC, and provisioning.
- Integrate large-scale solutions based on off-the-shelf technologies, building custom components where appropriate. Respond to emergency, scale-driven system strain through solutions focused on optimization, tuning, and, when appropriate, architectural changes.
- Debug mission-critical performance and functional issues and service and architecture bottlenecks, and come up with solutions to prevent future recurrences.
- Design and implement solutions based on machine learning and anomaly detection techniques to improve incident response, identification of issues related to system changes, and correlation of system data related to customer experiences.
- Solid understanding of Data Center Platform Engineering
- Apply service management solutions to bring consistency and transparency across hybrid bare metal/cloud architectures.
- Identify opportunities to optimize across connected services based on end-to-end customer workflows rather than specific subservices or components.
The ideal candidate:
- Expertise in developing for and operating Linux and Azure
- Understanding of the networking stack (TCP/IP and UDP)
- Must have DevOps/Development experience
- You must currently be working, and have been working over the past five years, within AWS, Azure, SoftLayer, or another of the core cloud environments
- You are a full stack Engineer from the network layer on up.
- You have expertise in developing code workflows that manage the entire hardware lifecycle so that hardware can be swapped out as it ages. You have the skills to interact with the multiple teams that own the servers and can work toward centralized management of it all
- You’re a hands-on technical leader and contributor able to coordinate across multiple engineering teams and disciplines to deliver complex projects
- You’re a technically focused, exceptional engineer who enjoys solving the challenges of scale and integration through code
- You continuously evaluate and apply new and/or emerging technologies to meet appropriate business goals
- You’re the engineer who sees the technical pathway through late-breaking changes and emergent priorities and can bring thoughtful, critical decision-making to any business and technical discussion
- You embrace conflicting ideas and leverage data and experimentation to help make key decisions
- You have dev skills, but more around dev automation and designing infrastructure than developing an application
- You are less Networking or IT, more Systems and Development
- You may have expertise in Microsoft Stack, Open Source, Docker, etc.
Must-have core skills required for this role include:
- 5+ years Java, Ruby (you must currently be doing development daily or weekly)
- 5+ years Linux, Windows OS
- 5+ years PowerShell
- 3+ years Data Center
- 3+ years Cloud: AWS, Azure, SoftLayer (any of the core clouds; you must be working within one now and for at least the past 3 years)
- 3+ years of service engineering experience
- 3+ years of solid background in and understanding of TCP/IP and/or UDP
- 3+ years of demonstrated Cloud/Network troubleshooting expertise
Nice to have:
- C++, Python (5+ years)
- Layer 2/3/4 & 7
- Understanding of user identity concepts and implementations: Active Directory and Federation scenarios
- Experience in running and maintaining a 24×7 internet-oriented production environment with on-call responsibilities
- Understand and troubleshoot networking problems, configuration and application workflow changes.
- Ability to influence partners and technical teams across the organization
- MS or PhD in Computer Science, Mathematics, or Computer Engineering, or relevant work experience
Core questions: a candidate must be able to answer at least 3 of these questions; if you can correctly answer most of them, then you are a strong candidate:
- Can you subnet without a computer or calculator (on a whiteboard, cold turkey)?
- Have you ever used a debugger and would know how to walk a call stack?
- Do you know the difference between user mode and kernel mode, and can you explain it at some level of detail?
- Can you write code (in C#, PowerShell, Python) on my whiteboard to produce a hash table merging two sets of data? How about ordering a list or doing a simple sort? Or, perhaps, simply filtering key values out of an XML file?
- Can you process a web server log file or CSV to aggregate data based on any field that I select, perhaps quantized by 15-minute intervals? How would you approach this for 10 TB of information?
- Can you figure out a way to scale out the collection of performance data from applications remotely?
- Can you explain DNS, HTTP, SSL, and underlying protocols (TCP/UDP) accurately (packet by packet)?
- Could you explain BGP or AnyCast in any level of detail?
- Do you know how to scale these protocols out at internet scale? (Hardware/software load balancing, global traffic management – both DNS and other protocols).
- Can you explain how large-scale software distribution works (modern engineering practices like slicing, experimentation, partitioning, upgrade domains, etc.)?
- Are you familiar with the concept of Fault Injection and how it fits into the modern/agile software worlds (Chaos Monkey)?
- Can you come up with a BC/HA/DR plan for a simple stateful web application? (Business continuity, High Availability, disaster recovery…). And do you understand where the complicated challenges are in HA applications?
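To give a feel for the level expected on the whiteboard coding questions above (merging two sets of data into a hash table, and aggregating a log by a chosen field in 15-minute intervals), here is a minimal Python sketch. It is only an illustration, not a model answer: the column names (timestamp, status, path), the sample rows, and the function names are all hypothetical, and it assumes ISO-8601 timestamps.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Hypothetical sample log; real column names and formats will differ.
SAMPLE = """timestamp,status,path
2024-01-01T10:07:30,200,/index
2024-01-01T10:14:59,200,/about
2024-01-01T10:15:00,500,/index
"""


def merge_tables(left, right):
    """Merge two dicts into a new one; on a key collision, `right` wins."""
    merged = dict(left)
    merged.update(right)
    return merged


def bucket_start(ts, minutes=15):
    """Round a datetime down to the start of its interval.

    Assumes `minutes` divides 60 evenly (15, 30, ...).
    """
    return ts.replace(minute=ts.minute - ts.minute % minutes,
                      second=0, microsecond=0)


def aggregate(lines, field, minutes=15):
    """Count rows per (interval start, field value), one row at a time."""
    counts = Counter()
    for row in csv.DictReader(lines):
        ts = datetime.fromisoformat(row["timestamp"])
        counts[(bucket_start(ts, minutes), row[field])] += 1
    return counts


if __name__ == "__main__":
    for key, n in sorted(aggregate(io.StringIO(SAMPLE), "status").items()):
        print(key, n)
```

Because the aggregation streams row by row and only keeps per-bucket counts in memory, one plausible way to approach the 10 TB follow-up is to run the same function over each file shard in parallel and then add the resulting Counter objects together. A strong candidate should be able to discuss where that approach breaks down.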
Compensation: Comp based on experience, skill set match, etc.
- Total comp packages range from $140k up to $210k +/- (includes base, bonus and stock), depending on qualifications and your performance during interviews; a potential offer could be less.
- The client will pay for relocation if accepted (this is negotiated)
Only US citizens or Green Card applicants need apply.