Lightning-fast troubleshooting for AWS: How to find the root cause fast with Sumo Logic

It’s time to stop firefighting. With Sumo Logic’s AWS Observability, companies like Snoop have been able to simplify data collection, achieve unified visibility across AWS accounts and regions and leverage machine learning to troubleshoot — fast.

This re:Invent, we’re excited to showcase how our capabilities for AWS have evolved. Offering a unified approach to monitoring and troubleshooting for AWS, Sumo Logic lets DevOps and SRE teams improve the reliability of their services and cut troubleshooting toil in just a few clicks.

Looking for lightning-speed troubleshooting? Here’s how Sumo Logic can help you find the root cause and reclaim your time.

Your starting point: a unified view of your AWS environment

In the fast-paced world of e-commerce, timely order processing and inventory updates are crucial for maintaining customer satisfaction. But what happens when an efficient, serverless architecture starts showing intermittent delays?

Here, the order processing and inventory update system for our e-commerce site uses Amazon SQS to queue orders, AWS Lambda for the core business logic, and Amazon RDS as the persistent data store. Customers are reporting intermittent delays when placing orders and during checkout.

To understand what might be going wrong, you first need a centralized view of your AWS environment that brings together your relevant logs and metrics. With AWS Observability, you unlock a comprehensive view across your AWS accounts, regions and individual namespaces. This content is available out of the box once you deploy the solution via the CloudFormation template or Terraform.

Detecting issues with pre-built alerts

AWS Observability comes with pre-built alerts for different AWS services, including Amazon SQS, AWS Lambda, and Amazon RDS. These alerts can notify you about the issue with the e-commerce site. In our example, the “Amazon SQS - Message processing not fast enough” alert was triggered.

From the alert, you can determine the characteristics of the issue: how often it triggers, how long it has been unresolved, and other relevant details. You can also see how long messages wait in the queue before they are processed.

High-speed troubleshooting in action

Now, with this knowledge, the troubleshooting begins.

You start your investigation by diving into SQS, where messages from the Order Processing Service are queued. CloudWatch metrics for SQS provide the first clues.

You observe that NumberOfMessagesSent is much higher than NumberOfMessagesReceived, indicating that messages are being queued faster than they are being consumed. The ApproximateAgeOfOldestMessage metric shows that some messages have been in the queue for a long time, which points to a bottleneck downstream of the queue.
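The reasoning above can be expressed as a small check. This is a minimal sketch, assuming the three CloudWatch metrics have already been fetched into plain Python values; the `QueueSample` type, the `is_backlogged` function, the thresholds and the sample numbers are all hypothetical and only illustrate the logic.

```python
from dataclasses import dataclass


@dataclass
class QueueSample:
    """One CloudWatch period for a single SQS queue (hypothetical container)."""
    sent: int            # NumberOfMessagesSent (Sum)
    received: int        # NumberOfMessagesReceived (Sum)
    oldest_age_s: float  # ApproximateAgeOfOldestMessage (Maximum), in seconds


def is_backlogged(samples, max_age_s=300.0, min_ratio=1.5):
    """Return True when producers outpace consumers and old messages pile up.

    The thresholds are illustrative assumptions, not recommended values.
    """
    if not samples:
        return False
    total_sent = sum(s.sent for s in samples)
    total_received = sum(s.received for s in samples)
    worst_age = max(s.oldest_age_s for s in samples)
    if total_received == 0:
        # Nothing consumed: treat any sent message as an unbounded ratio.
        ratio = float("inf") if total_sent > 0 else 0.0
    else:
        ratio = total_sent / total_received
    return ratio >= min_ratio and worst_age >= max_age_s


# Made-up sample values, for illustration only.
samples = [
    QueueSample(sent=1200, received=400, oldest_age_s=180.0),
    QueueSample(sent=1500, received=450, oldest_age_s=420.0),
    QueueSample(sent=1400, received=500, oldest_age_s=900.0),
]
print(is_backlogged(samples))
```

Requiring both conditions avoids flagging a short burst of traffic that consumers catch up on quickly.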

Next, you turn your attention to AWS Lambda, responsible for processing SQS messages to update your inventory. Log entries give evidence of prolonged function execution and timeouts, suggesting potential issues with the Lambda function's efficiency or resource allocation.

Here, Sumo Logic’s out-of-the-box dashboards for AWS Lambda error analysis surface the log entries that record these timeouts.

Because the Lambda function interacts with an Amazon RDS instance, checking RDS would be your next step.

The RDS performance metrics show high CPU utilization and errors related to database locks.

Again, Sumo Logic’s out-of-the-box dashboards for Amazon RDS error log analysis help to locate particular log error messages confirming the database issue.

2023-11-09T01:45:00Z [ERROR] Deadlock found when trying to get lock; try restarting transaction
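The error message itself suggests restarting the transaction. As a hedged sketch of what that could look like in the function that writes to the database, the helper below retries a unit of work when it fails with MySQL's deadlock error number (1213, ER_LOCK_DEADLOCK). The `DatabaseError` class, `run_with_deadlock_retry` and `update_inventory` are hypothetical names; a real database driver raises its own exception type.

```python
import random
import time

MYSQL_DEADLOCK_ERRNO = 1213  # MySQL ER_LOCK_DEADLOCK


class DatabaseError(Exception):
    """Hypothetical stand-in for a driver exception that carries an errno."""

    def __init__(self, errno, message):
        super().__init__(message)
        self.errno = errno


def run_with_deadlock_retry(transaction, max_attempts=3, base_delay_s=0.05,
                            sleep=time.sleep):
    """Call transaction(); retry with jittered backoff if it hits a deadlock."""
    for attempt in range(1, max_attempts + 1):
        try:
            return transaction()
        except DatabaseError as exc:
            # Re-raise anything that is not a deadlock, or the final failure.
            if exc.errno != MYSQL_DEADLOCK_ERRNO or attempt == max_attempts:
                raise
            # Exponential backoff with jitter so workers do not retry in lockstep.
            delay = base_delay_s * (2 ** (attempt - 1)) * random.uniform(0.5, 1.5)
            sleep(delay)


# Simulated unit of work: deadlocks twice, then succeeds.
attempts = []


def update_inventory():
    attempts.append(1)
    if len(attempts) < 3:
        raise DatabaseError(
            MYSQL_DEADLOCK_ERRNO,
            "Deadlock found when trying to get lock; try restarting transaction",
        )
    return "committed"


result = run_with_deadlock_retry(update_inventory, sleep=lambda _: None)
print(result, len(attempts))
```

Retrying only hides the symptom. The slow query identified in the next step is what keeps locks held long enough for deadlocks to occur.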

A closer look at the out-of-the-box RDS slow query log analysis dashboard reveals sub-optimal queries that significantly drag down performance.

# Query_time: 899.00 Lock_time: 0.594385 Rows_sent: 45 Rows_examined: 54392
SELECT * FROM inventory;

You can see that the culprit is a full table scan caused by a missing index.
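To make the signal in the slow query log header explicit, the sketch below parses that header line and computes how many rows were examined for each row returned. It assumes the header layout shown in the log excerpt; the function name `parse_slow_query_header` and the `examined_per_sent` field are illustrative and are not part of any Sumo Logic or MySQL API.

```python
import re

# Matches the statistics header shown in the slow query log excerpt.
HEADER_RE = re.compile(
    r"# Query_time: (?P<query_time>[\d.]+)\s+Lock_time: (?P<lock_time>[\d.]+)\s+"
    r"Rows_sent: (?P<rows_sent>\d+)\s+Rows_examined: (?P<rows_examined>\d+)"
)


def parse_slow_query_header(line):
    """Return the header statistics as a dict, or None if the line does not match."""
    m = HEADER_RE.search(line)
    if m is None:
        return None
    stats = {
        "query_time_s": float(m["query_time"]),
        "lock_time_s": float(m["lock_time"]),
        "rows_sent": int(m["rows_sent"]),
        "rows_examined": int(m["rows_examined"]),
    }
    # Many rows examined per row returned hints at a scan an index could avoid.
    stats["examined_per_sent"] = stats["rows_examined"] / max(stats["rows_sent"], 1)
    return stats


header = "# Query_time: 899.00 Lock_time: 0.594385 Rows_sent: 45 Rows_examined: 54392"
stats = parse_slow_query_header(header)
print(stats)
```

A high ratio of rows examined to rows sent, combined with a long query time, is a common sign that the database is scanning far more data than the query needs.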

By thoroughly examining each component of the serverless architecture, you can now address the delays. As next steps, you can adjust the Lambda function's timeout setting and increase its memory allocation. Additionally, you can add an index to the inventory table in the RDS database to speed up the problematic query.
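The effect of adding an index can be illustrated locally. The sketch below uses Python's built-in sqlite3 module as a stand-in for the RDS database, so the plan output differs from what MySQL's EXPLAIN would show. The `sku` and `quantity` columns, the index name and the lookup query are assumptions made for the example; the original query and table schema are not shown in full.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the plan detail strings SQLite reports for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column of each plan row.
    return [row[-1] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inventory (id INTEGER PRIMARY KEY, sku TEXT, quantity INTEGER)"
)
conn.executemany(
    "INSERT INTO inventory (sku, quantity) VALUES (?, ?)",
    [(f"SKU-{i}", i % 50) for i in range(1000)],
)

# Hypothetical lookup by SKU; without an index every row must be read.
lookup = "SELECT * FROM inventory WHERE sku = ?"
plan_before = query_plan(conn, lookup, ("SKU-42",))

# Add the index, then ask for the plan again.
conn.execute("CREATE INDEX idx_inventory_sku ON inventory (sku)")
plan_after = query_plan(conn, lookup, ("SKU-42",))

print(plan_before)
print(plan_after)
```

Before the index exists, the plan reports a scan of the whole table. Afterwards it reports a search that uses `idx_inventory_sku`. On MySQL, running EXPLAIN on the real query before and after the change serves the same purpose.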

It’s time to reclaim your time

Without a unified view of your AWS environment, centralized logging, and the ability to pivot between services, getting to the root cause of this issue might have been extremely difficult, if not impossible.

Looking to reclaim your time? Get started today with AWS Observability, which you can deploy in minutes via the CloudFormation template or Terraform. Learn more and start your trial here.

Have questions? We’ll be hosting a special webinar on Dec. 11, where attendees can hear tips and tricks for implementing Sumo Logic for AWS troubleshooting from our product lead, Greg Ziemiecki. Register now to attend the webinar.

And last but not least, if you’re at re:Invent, swing by booth #789 to see our powerful monitoring and troubleshooting capabilities firsthand.


Michael Riordan and Greg Ziemiecki
