As a Lead DevOps Engineer, you will be in charge of managing the
DevOps team. Your remit will include the following:
- Private Cloud - Design and maintain a high-performance, reliable
network architecture to support HPC applications.
- Scheduling Tool - Implement and maintain an HPC scheduling
technology such as Kubernetes, Hadoop YARN, Mesos, HTCondor or Nomad
for processing and scheduling analytical jobs. Implement controls that
allow analytical jobs to seamlessly use idle capacity on the private
cloud.
- Security - Implement security best practices and enforce a data
isolation policy between internal divisions.
- Capacity Sizing - Monitor private cloud usage and share the details
with different teams. Plan capacity enhancements on a quarterly basis.
- Storage Solution - Optimize storage solutions such as NetApp, EMC
and Quobyte for analytical jobs. Monitor their performance daily to
identify issues early.
- NFS - Implement and optimize the latest version of NFS for our use
case.
- Public Cloud - Drive AWS/Google Cloud adoption in the firm to
increase efficiency, improve collaboration and reduce cost. Maintain
the environment for our existing use cases and explore further areas
where the firm could use public cloud.
- Backups - Identify all crucial data, binaries, code, etc. and
automate their backup in a secure manner, at intervals warranted by
the use case. Ensure that recovery from backup is tested and seamless.
- Access Control - Maintain passwordless access control and improve
security over time. Minimize failures of automated jobs caused by
unsuccessful logins.
- Operating System - Plan, test and roll out new operating system
versions for all production, simulation and desktop environments. Work
closely with developers to highlight the performance enhancements and
capabilities of new versions.
- Configuration Management - Work closely with the DevOps and
development teams to freeze configurations and playbooks for various
teams and internal applications. Deploy and maintain standard tools
such as Ansible, Puppet and Chef for this purpose.
- Data Storage & Security Planning - Maintain tight control of root
access on all devices. Ensure root access is revoked as soon as the
desired objective is achieved.
- Audit access logs on devices. Use third-party tools to put a
monitoring mechanism in place for early detection of any suspicious
activity.
- Maintain all third-party tools used for development and
collaboration. This includes maintaining a fault-tolerant environment
for Git/Perforce, productivity tools such as Slack/Microsoft Teams,
and build tools such as Jenkins/Bamboo.
Qualifications:
- Bachelor's or Master's degree, preferably in CSE/IT.
- 10+ years of relevant experience in a system administration role.
- Must have strong knowledge of IT infrastructure, Linux, networking
and grid computing.
- Must have a strong grasp of automation and data management tools.
- Proficient in scripting languages and Python.
Desirables:
- Professional attitude and a co-operative, mature approach to work.
Must be focused, structured and well considered, with strong
troubleshooting skills.
- Exhibits a high level of individual initiative and ownership, and
collaborates effectively with other team members.