This session covers the migration of a large, mission-critical production Elasticsearch cluster to Amazon OpenSearch. The migration aimed to improve the cluster’s stability and to increase the velocity of both the development and DevOps teams by eliminating manual configuration changes.
We will begin with the motivation behind the migration: the complexity of self-managing an Elasticsearch cluster and the many issues we faced. We will then explain why a thorough POC was essential to ensure that the new cluster met all of our requirements, both for ingesting hundreds of millions of writes per day and for serving tens of millions of client-facing, low-latency reads per day. The POC phases included feasibility analysis, cost estimation, support for our production cluster settings, management and maintenance sanity checks, scalability testing, compatibility assessment, system test verification, and more.
One of the major challenges during the migration was the absence of an out-of-the-box tool for migrating from our self-managed cluster to Amazon OpenSearch. We also encountered compatibility limitations that required us to align our cluster’s version, which complicated the process because loading snapshots alone was not enough to migrate the data. We will explore the solutions we devised to overcome these obstacles, including data migration strategies, validation techniques, and stress testing to ensure the new cluster’s resilience and performance.
Attendees will learn about our end-to-end approach to migrating the cluster: exploring data migration tools, integration and stress testing, assessing cluster sizing, and testing management, monitoring, and scalability.
By the end of this session, you will understand the challenges and solutions involved in migrating a large Elasticsearch cluster to a managed OpenSearch environment, and how such a migration can improve stability and resilience while increasing team velocity.