Istio Upgrades at Scale 📈, Grafana Alerting 🚨, Amazon EVS General Availability 🖥️

TLDR DevOps <dan@tldrnewsletter.com>

August 11, 11:09 am

TLDR DevOps

Together With AWS

TLDR DevOps 2025-08-11

AWS observability patterns and tools built for the cloud (Sponsor)

Cloud engineers implementing a comprehensive observability strategy often need to dedicate significant time to evaluating and integrating multiple specialized tools. To make it easier, you can get on-demand access to observability solutions made to work with AWS, along with technical guidance for common observability patterns, in AWS Marketplace.

Try a broad selection of observability tools with streamlined AWS integrations for free, and pay as you go through your AWS account when you're ready to scale.

Explore top AWS observability patterns and tools >

📱

News & Trends

What the transition to IBM means for HashiCorp customers: Greater value, same commitment (4 minute read)

Beginning September 1, all HashiCorp business operations will transition to IBM, with product naming updates, billing, and customer support handled by IBM and aligned with IBM systems while maintaining current capabilities and pricing. HashiCorp products will be integrated into IBM's Automation portfolio to deliver enhanced hybrid cloud infrastructure, security management, and AI adoption support.
AWS announces general availability of Amazon Elastic VMware Service (Amazon EVS) (2 minute read)

Amazon Elastic VMware Service is now generally available, enabling users to run VMware Cloud Foundation directly within their Amazon VPC without re-platforming applications. The service offers full administrative control, flexible pricing options, and integration with over 200 AWS services while supporting license portability and VCF version 5.2.1 on i4i.metal instances.
Announcing k0rdent v1.0.0: Manage Distributed Infrastructure at Massive Scale Leveraging Kubernetes (6 minute read)

K0rdent v1.0.0, a Kubernetes-native distributed container management environment, was released with production-grade stability and new features for platform engineering teams. The release includes improvements to service upgrades, IP address allocation automation, and the k0rdent Observability & FinOps (kOF) module for metrics, logging, and cost visibility. Users can try out k0rdent v1.0.0 by following the QuickStart guide and deployment instructions.
🚀

Opinions & Tutorials

Azure Support Slack Bot on Azure Container Apps: Production-ready guide (16 minute read)

A fully automated, secure Slack bot for managing Azure support tickets can be deployed using Azure Container Apps with zero secrets in code, autoscaling, and native integration with managed identity, Key Vault, and monitoring tools. The solution offers a production-grade, serverless architecture ideal for DevOps-light teams and startups, and includes a one-command deployment script to streamline setup from local development to production.
Redesigning Workers KV for increased availability and faster performance (15 minute read)

Cloudflare made improvements to its Workers KV service after an outage on June 12 that affected various critical services. The company now stores all data on its own infrastructure and serves requests from it, in addition to any third-party cloud providers used for redundancy. The aim is to ensure high availability and eliminate any reliance on third-party providers as redundant backups. The modified system now races writes to both backends simultaneously, and the hybrid architecture has led to internal latency improvements.
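To make "races writes to both backends simultaneously" concrete, here is a minimal Python sketch of one way to read it: start the same write on every backend at once and acknowledge as soon as the first one finishes. The `InMemoryBackend` class, the backend names, and the delays are hypothetical stand-ins for illustration, not Cloudflare's implementation, and error handling is left out.

```python
import asyncio


class InMemoryBackend:
    """Hypothetical stand-in for a storage backend with a fixed write latency."""

    def __init__(self, name: str, delay: float) -> None:
        self.name = name
        self.delay = delay
        self.data: dict[str, str] = {}

    async def put(self, key: str, value: str) -> str:
        await asyncio.sleep(self.delay)  # simulate network and storage latency
        self.data[key] = value
        return self.name


async def race_write(backends, key, value):
    """Start the write on every backend at once.

    Returns the name of the first backend to finish, plus the set of
    still-running write tasks so the caller can let them complete.
    """
    tasks = [asyncio.create_task(b.put(key, value)) for b in backends]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    winner = next(iter(done)).result()
    return winner, pending


async def main():
    own = InMemoryBackend("own-infrastructure", delay=0.0)
    third_party = InMemoryBackend("third-party", delay=0.05)
    winner, pending = await race_write([own, third_party], "greeting", "hello")
    await asyncio.gather(*pending)  # let the slower write land too
    return winner, own.data, third_party.data
```

Running `asyncio.run(main())` returns as soon as the faster backend acknowledges, while the slower write still completes, so both stores end up holding the value.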
πŸ§‘β€πŸ’»

Resources & Tools

Honeycomb Hosted MCP: Brings observability data into your IDE of choice, no local servers required (Sponsor)

Ever had Claude Code confidently "fix" a performance issue that didn't exist? AI agents hallucinate without production context. Honeycomb MCP connects your AI coding agents directly to distributed traces and queries, so they have the same observability data that you do. Hosted MCP makes it simpler and more secure. See it first.
Turborepo (GitHub Repo)

Turborepo is a build system for JavaScript and TypeScript codebases. It is written in Rust for performance.
Tensorlake (Tool)

Tensorlake is a document parsing and data orchestration platform that extracts structured data from real-world documents for Python-based workflows.
🎁

Miscellaneous

Seamless Istio Upgrades at Scale (7 minute read)

Since 2019, Airbnb has upgraded Istio 14 times while maintaining high availability through a canary upgrade model. Its Service Mesh team uses a file called rollouts.yml to specify workload namespaces and the percentage distribution of Istio versions, allowing them to selectively upgrade workloads on both Kubernetes and virtual machines with minimal risk. The team has achieved zero downtime by deploying the istio-proxy version alongside the configuration of which Istiod to connect to.
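The percentage split described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Airbnb's tooling: the summary does not give the schema of rollouts.yml, so the `ROLLOUTS` dictionary, the namespace names, the version strings, and the `pick_istio_version` helper are all hypothetical. The idea is to hash each workload into a stable bucket so it keeps the same Istio version until the percentages change.

```python
import hashlib

# Hypothetical stand-in for the contents of rollouts.yml:
# namespace -> {Istio version: percentage of workloads}.
ROLLOUTS = {
    "payments": {"1.24": 90, "1.25": 10},
    "search": {"1.25": 100},
}


def pick_istio_version(namespace: str, workload: str, rollouts=ROLLOUTS) -> str:
    """Deterministically map a workload to an Istio version by percentage."""
    distribution = rollouts[namespace]
    if sum(distribution.values()) != 100:
        raise ValueError(f"percentages for {namespace!r} must add up to 100")

    # Hash the workload identity into a stable bucket in [0, 100).
    digest = hashlib.sha256(f"{namespace}/{workload}".encode()).hexdigest()
    bucket = int(digest, 16) % 100

    # Walk the versions in a fixed order and return the one whose
    # cumulative percentage range contains the bucket.
    cumulative = 0
    for version, percent in sorted(distribution.items()):
        cumulative += percent
        if bucket < cumulative:
            return version
    raise AssertionError("unreachable when percentages add up to 100")
```

Because the bucket depends only on the namespace and workload name, raising a version's share from 10 to 50 moves additional workloads onto it without reshuffling the ones already upgraded.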
Why is GitHub UI getting so much slower? (2 minute read)

GitHub's UI has become significantly slower, with client-side routing using Turbo making some page transitions, such as switching PR tabs, take over 5 seconds, which is slower than a full page reload. Inefficiencies like rendering massive numbers of unnecessary DOM elements cause freezes, and there is no clear indication of performance improvements on GitHub's roadmap.
⚡

Quick Links

Amazon ECR now supports 100,000 images per repository (1 minute read)

Amazon Elastic Container Registry now supports up to 100,000 images per repository, an increase from the previous 20,000 limit.
New in Grafana Alerting: a faster, more scalable way to manage your alerts in Grafana (9 minute read)

Grafana has introduced a redesigned alert rules list page featuring a faster, paginated backend and a streamlined UI with two distinct views for managing alerts efficiently.
10 key questions about designing a secure cloud environment (7 minute read)

Business and technical leaders are urged to assess their cloud security and compliance by asking their cloud and platform teams 10 key questions to reduce risks and ensure audit readiness.

If you have any comments or feedback, just respond to this email!

Thanks for reading,
Kunal Desai & Martin Hauskrecht