OpenSearch has become a popular open-source platform for log analytics, search, security analytics, and observability. However, as with any technology, there are good and bad ways to use it. This talk explores the common ways OpenSearch clusters fail and the best practices that improve their reliability and resiliency. From avoiding common pitfalls to building clusters that scale and keep operating during failures, you will learn how to apply practices such as client-side timeouts and retries, health checks, back-pressure mechanisms, handling zonal failures, tuning garbage collection (GC) and queues, and troubleshooting failures, among others. The talk offers practical advice for anyone looking to implement OpenSearch in their production environment.