We seek a Senior DevOps Engineer / Head of Infrastructure to lead and continuously improve our infrastructure and deployment practices. This is a critical role for a growing fintech/crypto company operating on AWS with a fully containerized microservice architecture.
You’ll ensure uptime, scalability, and security across multiple environments. You’ll work closely with backend teams, security experts, and compliance to deliver a resilient infrastructure that meets the standards of regulated financial services.
This role includes participation in an on-call rotation, where you will be expected to respond quickly to incidents, mitigate issues, and improve system resilience through automation and root cause analysis.
Key Responsibilities
* Lead the design and implementation of cloud infrastructure (AWS) for scalable and secure deployments
* Own and manage our GitLab CI/CD pipelines, automating everything from builds to production releases
* Build and maintain Infrastructure-as-Code using Terraform
* Ensure system observability: set up and manage logging (e.g., ELK, CloudWatch), metrics (e.g., Prometheus), and alerting (e.g., Grafana, PagerDuty)
* Maintain containerized application environments using Docker and orchestrators (preferably ECS or EKS)
* Design and implement secure and compliant infrastructure practices (networking, IAM, secrets management)
* Define and enforce SLAs and SLOs, and participate in the incident management lifecycle
* Be part of the on-call rotation to respond to production incidents and ensure rapid recovery
* Collaborate with engineering teams to improve development, release, and rollback workflows
Requirements
* 5+ years of DevOps, SRE, or Infrastructure Engineering experience
* Strong expertise in AWS core services: EC2, RDS, ALB/NLB, S3, IAM, Route53, CloudFront, Secrets Manager
* Advanced knowledge of Terraform and/or similar IaC tools
* Deep understanding of CI/CD pipelines, particularly GitLab CI
* Hands-on experience with monitoring/alerting stacks (Prometheus, Grafana, ELK, etc.)
* Proficient with Docker and container orchestration (ECS, EKS, or Kubernetes)
* Excellent troubleshooting skills and a proactive approach to reliability and scalability
* Strong knowledge of Linux-based systems and scripting (Bash, Python, or similar)
* Prior experience with incident response and on-call duties
Nice to Have
* Experience in fintech or crypto platforms, or other regulated environments
* Understanding of compliance standards like DORA, PCI DSS, SOC2, GDPR, etc.
* Experience working with active-passive failover setups and disaster recovery planning