Modern VMs 🧱, System Observability, Kubernetes GPU Management

TLDR DevOps <dan@tldrnewsletter.com>

December 17, 12:09 pm

Together With Chronosphere

TLDR DevOps 2025-12-17

Devs are confident about release stability... but 1 in 4 ends in downtime (Sponsor)

According to a recent Google Cloud + Chronosphere report:

  • 59% of orgs feel “very confident” in production readiness, but almost 25% encounter downtime during releases.
  • More than 92% monitor apps frequently or continuously, but many still report limited visibility.

Find out what's going wrong (and right) by grabbing a copy of Unveiling the Landscape of Day 1: Release Readiness in Cloud-Native CI/CD. This report draws on recent survey data to show how orgs are preparing for Day 1, with practical advice on how to prepare your apps, infrastructure, and processes for Day 1 success.

⬇️ Download the eBook

📱

News & Trends

CoreDNS-1.13.2 Release (2 minute read)

The new CoreDNS release introduces initial DoH3 support, core performance and stability fixes, and plugin improvements across forwarding, GeoIP, cache, and orchestration integrations. The GeoIP plugin now deprecates returning 0,0 for missing coordinates and will switch to empty values in the next release.
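For consumers of GeoIP coordinates, the change above means that both conventions may need handling during the transition. The following is a minimal sketch, not CoreDNS code: `parse_coordinates` is a hypothetical helper, and it assumes the coordinates arrive as strings. Note that treating 0,0 as missing also discards a genuine point at latitude 0, longitude 0.

```python
from typing import Optional, Tuple


def parse_coordinates(lat: str, lon: str) -> Optional[Tuple[float, float]]:
    """Return (lat, lon), or None when the coordinates are missing.

    Hypothetical helper: accepts both the legacy "0,0" sentinel and
    the newer empty-value convention described in the summary above.
    """
    lat, lon = lat.strip(), lon.strip()
    if not lat or not lon:
        return None  # newer convention: empty values mean "unknown"
    point = (float(lat), float(lon))
    if point == (0.0, 0.0):
        return None  # legacy convention: 0,0 used as a "missing" sentinel
    return point


print(parse_coordinates("", ""))
print(parse_coordinates("0", "0"))
print(parse_coordinates("48.85", "2.35"))
```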
Meet exe.dev, modern VMs (2 minute read)

exe.dev is launching a public developer preview of a VM hosting service that lets users spin up many real, fast-starting Ubuntu VMs via an SSH-based API, sharing a fixed CPU/RAM budget across as many VMs as they want. The platform focuses on private-by-default, persistent, agent-friendly VMs with sub-second startup and no per-VM marginal cost, aiming to make running lots of small tools and agents frictionless.
🚀

Opinions & Tutorials

Building platforms using kro for composition (6 minute read)

The Kube Resource Orchestrator (kro) has been integrated into Amazon's new EKS capabilities, signaling major cloud provider recognition of its potential to simplify Kubernetes-native resource grouping for platform engineers. kro abstracts complex Kubernetes objects through ResourceGraphDefinitions, enhancing composition while integrating with broader orchestration frameworks.
Machine Learning Engineering: Complete Guide to Building Production ML Systems (13 minute read)

Machine Learning Engineering (MLE) has emerged as a critical and highly lucrative field, serving as the essential bridge between data science research and the deployment of production-grade artificial intelligence systems at scale. This in-demand role requires a diverse skill set encompassing programming, ML algorithms, and data engineering, all guided by a proven methodology to ensure successful and maintainable solutions.
System Observability: Metrics, Sampling, and Tracing (6 minute read)

Whole-system observability using metrics and sampling is cheap and effective for capacity planning and fault detection, but it cannot explain what happens inside individual operations. Per-process tracing provides exact timelines needed to identify and prioritize the highest-ROI optimizations, and can be implemented efficiently using techniques like leveraging early-exit syscall paths to correlate application events with system traces.
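To make the contrast concrete, here is a minimal sketch of per-operation tracing inside a single process. It is an illustration only: `Tracer` and `Span` are hypothetical names, and the sketch does not implement the syscall-based correlation with system traces that the article mentions. It only shows how exact start and end timestamps per operation expose which step dominates, something an aggregate metric cannot show.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Callable, Iterator, List


@dataclass
class Span:
    """One traced operation with exact start and end timestamps."""
    name: str
    start_ns: int
    end_ns: int

    @property
    def duration_ns(self) -> int:
        return self.end_ns - self.start_ns


@dataclass
class Tracer:
    # The clock is injectable so the sketch can be exercised deterministically.
    clock: Callable[[], int] = time.perf_counter_ns
    spans: List[Span] = field(default_factory=list)

    @contextmanager
    def span(self, name: str) -> Iterator[None]:
        start = self.clock()
        try:
            yield
        finally:
            # Record the span even if the traced operation raises.
            self.spans.append(Span(name, start, self.clock()))

    def slowest(self) -> Span:
        """Return the span with the longest duration."""
        return max(self.spans, key=lambda s: s.duration_ns)


if __name__ == "__main__":
    tracer = Tracer()
    with tracer.span("parse"):
        sum(range(1_000))
    with tracer.span("render"):
        sum(range(100_000))
    for s in tracer.spans:
        print(f"{s.name}: {s.duration_ns} ns")
```

Sorting the recorded spans by duration gives a simple priority list of candidates for optimization.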
πŸ§‘β€πŸ’»

Resources & Tools

Don't let testing be the bottleneck in an AI world (Sponsor)

As AI speeds up development, brittle test suites slow teams down. Momentic turns real user flows into reliable automation in minutes, so your team can focus on shipping. See how Notion, Webflow, and Quora use it in production, and get a custom demo.
Cognee (GitHub Repo)

Cognee is an open-source platform designed to create persistent and dynamic AI memory for agents, replacing RAG with scalable ECL (Extract, Cognify, and Load) pipelines. It achieves this by combining vector search with graph databases, making data both searchable by meaning and connected by relationships.
Kimi CLI (GitHub Repo)

Kimi CLI, a new command-line interface agent for software development and terminal operations, has been released in technical preview. This Python package, installable via `uv`, functions as both a coding agent and a shell, offering integrations with ACP-compatible editors like Zed and JetBrains, as well as Zsh.
🎁

Miscellaneous

Kubernetes GPU Management Just Got a Major Upgrade (5 minute read)

Kubernetes 1.34 adds dynamic resource allocation, enabling precise GPU and accelerator requests beyond simple counts. An upcoming workload abstraction aims to coordinate multinode AI pods with all-or-nothing scheduling, reshaping how complex AI workloads are deployed.
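The all-or-nothing idea can be sketched in a few lines. This is a toy illustration under stated assumptions, not the Kubernetes scheduler or its API: `schedule_gang` is a hypothetical function, each pod requests a whole number of GPUs, and placement uses a simple first-fit heuristic that can reject a group even when a cleverer packing exists.

```python
from typing import Dict, List, Optional


def schedule_gang(
    pod_gpu_requests: List[int],
    free_gpus: Dict[str, int],
) -> Optional[Dict[int, str]]:
    """Place every pod in the group or none of them.

    Returns a mapping of pod index to node name, or None if any pod
    cannot be placed. free_gpus is updated only on success.
    """
    remaining = dict(free_gpus)  # work on a copy so a failed attempt changes nothing
    placement: Dict[int, str] = {}
    # Try the largest requests first (first-fit decreasing heuristic).
    order = sorted(range(len(pod_gpu_requests)), key=lambda i: -pod_gpu_requests[i])
    for i in order:
        need = pod_gpu_requests[i]
        node = next((n for n, free in remaining.items() if free >= need), None)
        if node is None:
            return None  # one pod does not fit, so the whole group is rejected
        remaining[node] -= need
        placement[i] = node
    free_gpus.update(remaining)  # commit capacity only after every pod has a node
    return placement


if __name__ == "__main__":
    nodes = {"node-a": 4, "node-b": 2}
    print(schedule_gang([2, 2, 2], nodes))  # whole group fits
    print(schedule_gang([1], nodes))        # no capacity left, group rejected
```

Working on a copy of the capacity map is what keeps a rejected group from leaving some pods holding GPUs while the rest wait.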
Microsoft's commitment to supporting cloud infrastructure demand in the United States (4 minute read)

Microsoft has announced major US cloud infrastructure expansions to meet growing cloud and AI demand, including a new East US 3 Azure region in the Atlanta area opening in early 2027, expanded Availability Zones across multiple existing regions through 2026–2027, and increased capacity for commercial and government customers with a focus on resiliency, sustainability, and secure, compliant architectures.
AI ETL: How Artificial Intelligence Automates Data Pipelines (14 minute read)

AI ETL platforms are automating data integration, leveraging machine learning to dynamically map schemas, transform data, and detect quality issues. By automatically adapting to changing conditions and optimizing performance, they address traditional ETL's struggles with complex unstructured data and schema evolution.
⚡

Quick Links

Get 100% automatable coverage in weeks from human engineers with QualityLogic (Sponsor)

TestNitro is a fully managed service that delivers 100% automatable coverage in weeks, with only human-verified bugs reported. Get 15x faster scripting without AI-washing. Start here
GKE: From containers to agents, the unified platform for every modern workload (6 minute read)

Google is celebrating GKE's 10th anniversary by advancing Kubernetes for AI and agentic workloads with Agent Sandbox, massive scale up to 130,000 nodes, faster autoscaling, and optimized inference.
Highlights from AWS re:Invent: Supercharging Kiro with Docker Sandboxes and MCP Catalog (5 minute read)

Docker Sandboxes and the MCP Toolkit can run AI agents like Kiro in isolated containers, preventing access to host files and credentials while enabling safe code changes, testing, and tool use.

Love TLDR? Tell your friends and get rewards!

Share your referral link below with friends to get free TLDR swag!
Track your referrals here.

Want to advertise in TLDR? 📰

If your company is interested in reaching an audience of DevOps professionals and decision makers, you may want to advertise with us.

Want to work at TLDR? 💼

Apply here, create your own role or send a friend's resume to jobs@tldr.tech and get $1k if we hire them! TLDR is one of Inc.'s Best Bootstrapped businesses of 2025.

If you have any comments or feedback, just respond to this email!

Thanks for reading,
Kunal Desai & Martin Hauskrecht


Manage your subscriptions to our other newsletters on tech, startups, and programming. Or if TLDR DevOps isn't for you, please unsubscribe.