Manager/Cloud Operations at Ecobank
Ecobank
- Accra
- Permanent
- Full-time
- As part of the Technology Infrastructure Operations unit, the Cloud Operations function plays a mission-critical role in delivering secure, stable, scalable, and cost-efficient cloud services across the Bank's multi-cloud landscape (AWS, Azure, Oracle Cloud Infrastructure, and GCP). The unit is responsible for operating cloud-native and hybrid infrastructure supporting digital banking platforms, internal enterprise systems, and innovation programs involving AI/ML and advanced analytics.
- The Manager, Cloud Operations will provide strategic and operational leadership across all cloud environments, ensuring that platform performance, reliability, security, compliance, and cost-efficiency targets are consistently met. The role acts as a technical bridge between Cloud Engineering, Application Development, Cybersecurity, and Business stakeholders to ensure end-to-end service continuity and resilience across production systems.
- To lead and manage the end-to-end operations of the Bank's multi-cloud infrastructure, ensuring high availability, security, cost efficiency, and compliance of cloud services that power business-critical workloads. The role is responsible for defining and implementing operational standards, monitoring and incident management, disaster recovery, service reporting, and continuous improvement of platform uptime and customer experience.
- Oversee daily operations of enterprise cloud environments to ensure optimal performance, availability, and resilience.
- Lead the CloudOps team, providing direction, mentorship, and capacity planning support.
- Manage and continuously improve operational processes including incident, change, and problem management aligned with ITIL practices.
- Define and own the Cloud Operations roadmap, aligning with the overall cloud strategy and enterprise goals.
- Lead operational readiness for new cloud-native deployments across AWS, Azure, OCI, and GCP.
- Participate in IT governance bodies such as the Change Advisory Board and Architecture Design Forums to represent CloudOps interests and influence standards.
- Own the operational health of critical cloud workloads by implementing and monitoring SLAs, SLOs, and error budgets.
- Coordinate 24/7 on-call schedules and escalation processes for critical services.
- Drive site reliability engineering (SRE) principles such as automation, self-healing, and chaos engineering where applicable.
- Oversee capacity planning and performance tuning, ensuring cloud environments can scale to meet evolving workload demands.
- Implement and maintain disaster recovery and backup strategies, aligned with defined RTO/RPO objectives, including cross-region replication where necessary.
- Implement and manage observability tools (e.g., CloudWatch, Datadog, Prometheus, Grafana, Splunk).
- Lead root cause analysis and post-incident reviews to ensure continuous learning and issue prevention.
- Ensure that telemetry, logging, metrics, tracing, and alerting are fully implemented across all environments.
- Establish centralized observability across all supported platforms using tools such as Azure Monitor, OCI Logging, GCP Operations Suite, and OpenSearch.
- Coordinate escalation workflows across L1-L3 support tiers and ensure major incident reviews are conducted with timely RCA documentation.
- Partner with FinOps and Architecture teams to track cloud costs, identify waste, and recommend optimization strategies (e.g., rightsizing, Graviton/AMD migrations, auto-scaling).
- Support tagging and chargeback models for cloud resource usage transparency.
- Monitor budgets and cost trends, enforce lifecycle management policies, and assist in Reserved Instance/Savings Plan adoption.
- Promote cost optimization playbooks, including shutdown schedules for idle resources and policy enforcement for cost guardrails.
- Track SLA/OLA adherence for cloud services and align FinOps targets with service availability metrics.
- Enforce security best practices across cloud environments, including IAM hygiene, encryption, patching, and secrets management.
- Ensure audit readiness, participate in compliance reviews, and respond to regulatory inquiries.
- Oversee vulnerability remediation efforts, including OS patching, container scanning, and image compliance enforcement.
- Collaborate with security and compliance teams to align operations with frameworks such as ISO 27001, SOC 2, PCI-DSS, and internal audit requirements.
- Maintain evidence packs and logs to support internal/external audit reviews.
- Lead initiatives to automate cloud operations tasks such as patching, backups, scaling, and DR drills.
- Champion infrastructure-as-code (IaC) and GitOps practices to reduce manual intervention.
- Collaborate with DevSecOps teams to streamline CI/CD processes and shift-left monitoring.
- Promote immutable infrastructure and self-healing mechanisms as default deployment strategies.
- Drive platform standardization efforts by implementing SOPs, runbooks, and automated recovery workflows.
- Leverage tools like Terraform, GitLab CI/CD, and Ansible to build consistent, repeatable environments across clouds.
- Interface with product teams, cybersecurity, governance, application support, and leadership to align on priorities and communicate status.
- Provide executive dashboards and regular operational reports on KPIs, SLAs, incidents, and cost performance.
- Represent CloudOps in cross-functional steering committees, strategy reviews, and executive updates.
- Advocate for operations-centric input in early project phases to ensure readiness and avoid handover gaps.
- Ensure knowledge management through the upkeep of documentation, SOPs, and collaborative portals.
- Bachelor's or Master's degree in Computer Science, Engineering, Information Technology, or a related discipline.
- Cloud Certifications: AWS SysOps Administrator/DevOps Engineer Pro, Azure Administrator/Expert, OCI Operations Associate, GCP Cloud DevOps Engineer.
- ITIL v4 Foundation or Practitioner (required).
- Terraform Associate, FinOps Certified Practitioner, or equivalent (preferred).
- Minimum 10 years of experience in IT Infrastructure, with at least 5 years in cloud operations leadership roles.
- Proven experience managing multi-cloud environments and hybrid deployments in a high-availability enterprise context.
- Strong background in cloud networking, container operations (EKS, AKS, OKE), serverless architectures, and service mesh frameworks.
- Experience in building and operating observability platforms and SRE/DevOps practices.
- Demonstrated ability to lead through influence and deliver under pressure in a 24/7 production environment.
- Proactive and ownership-driven.
- Strong collaboration and communication skills.
- Ability to troubleshoot complex, distributed systems.
- Security- and cost-conscious mindset.
- Willingness to be on-call and participate in incident response rotations.