implementing, and maintaining our cloud-based infrastructure and CI/CD
pipelines. You will collaborate closely with software developers and other
teams to automate and streamline our development and deployment processes,
while also being responsible for the stability and reliability of our systems.
* Design, deploy, and maintain environments hosted on cloud platforms such as
AliCloud, Azure, AWS, and GCP, including cloud infrastructure, security, and
the cloud components commonly used by applications.
* Use Infrastructure as Code (IaC) methodology to manage cloud
infrastructure. Automate provisioning and scaling of resources to ensure high
availability and reliability.
* Develop and maintain CI/CD pipelines using tools such as AliCloud DevOps,
Jenkins, and ArgoCD. Enable automated testing, deployment, and rollback
mechanisms to accelerate software releases.
* Implement monitoring and alerting solutions to ensure the health and
performance of systems.
* Monitor and analyze system performance and availability.
* Participate in incident response and troubleshooting efforts to minimize
service disruptions and ensure the operational stability of production systems.
* Identify and resolve performance bottlenecks, optimize resource
utilization, and implement cost-saving strategies for cloud resources.
* Maintain detailed documentation of system configurations, processes, and
* Work closely with the security team to implement best practices for
securing cloud resources and ensure compliance with industry standards and
local cybersecurity laws and regulations.
* Bachelor’s degree in computer science or information technology related
* At least 5 years of working experience as an SRE or DevOps engineer.
* Strong cloud operations background preferred and proficiency with
* Strong knowledge of Infrastructure as Code (IaC) tools like Terraform, and
automation tools like Ansible.
* Strong knowledge of CI/CD processes and hands-on experience with CI/CD tool
management and pipeline development.
* Experience with containerization and orchestration technologies (e.g.,
* Experience with monitoring and logging solutions.
* Knowledge of scripting languages (e.g., Python, Bash).
* Understanding of security best practices in cloud environments.
* Excellent problem-solving skills and a proactive attitude toward addressing
* Good verbal and written English communication, excellent communication, and
* Relevant cloud certifications (e.g., AliCloud ACP, AWS SAA) are a plus.
* Knowledge of IoT platforms is a plus.