The PagerDuty Vision for AI-First Operations (9 minute read)
AI-powered systems require new strategies focused on model accuracy, data freshness, governance, and human-AI collaboration. This post outlines PagerDuty's vision for AI-first operations. It introduces a framework that details seven guiding principles and three AI types that together reduce manual toil, improve incident resolution, and free humans for higher-value work.
|
Tuning Linux Swap for Kubernetes: A Deep Dive (6 minute read)
Kubernetes v1.34 is expected to introduce a stable NodeSwap feature, enabling swap usage on Linux nodes to improve resource utilization and reduce out-of-memory (OOM) kills. Performance and stability depend on tuning Linux kernel parameters such as swappiness and min_free_kbytes, which influence memory management, I/O latency, and the kubelet's eviction mechanism.
|
|
Solving secret zero with Vault and OpenShift Virtualization (12 minute read)
Red Hat OpenShift Virtualization can use Kubernetes service accounts and Vault Agent to securely authenticate virtual machines to HashiCorp Vault without relying on secret zero. By templating these configurations, organizations can standardize and automate secure VM provisioning while enabling access to Vault-managed secrets across workloads.
|
How to monitor Claude usage and costs: introducing the Anthropic integration for Grafana Cloud (7 minute read)
Grafana Cloud now integrates with Anthropic, allowing users to monitor the costs and performance of Claude LLMs. The integration pulls usage data directly from the Anthropic Usage and Cost API, converts it into Prometheus-format metrics, and stores it in Grafana Cloud, providing a pre-built dashboard and customizable alerts for tracking token consumption, costs, and model usage. To get started, users enter their Anthropic Admin API key in the integration settings; the dashboard and alerts are then added automatically.
|
Monitor Claude usage and cost data with Datadog Cloud Cost Management (5 minute read)
Datadog's Cloud Cost Management (CCM) now integrates with the Anthropic Usage and Cost Admin API, enabling users to ingest Claude usage and cost data directly into existing dashboards, reports, and monitors. With this integration, organizations can monitor costs by model, workspace, API key, and service tier, and then set alerts for cost and usage data with prebuilt monitor templates. The integration normalizes Claude usage and cost data using the FinOps Foundation's FOCUS format, providing a unified schema for analysis and cost governance.
|
|
Rotel (GitHub Repo)
Rotel, a high-performance and resource-efficient solution for OpenTelemetry data collection, processing, and exporting, is now available under the Apache 2.0 license. Configured via command-line flags or environment variables, Rotel supports multiple exporters (including OTLP, Datadog, and Kafka) and includes a Python processor SDK for custom data modification.
|
|
Patterns for safe and efficient cache purging in CI/CD pipelines (17 minute read)
Caching in CI/CD pipelines improves build speed, reduces compute costs, and enhances runtime performance by storing dependencies, build artifacts, and static assets across various stages. Effective cache purging strategies like content hashing, TTL, and automated invalidation help prevent stale data and ensure fresh user content.
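Content hashing and TTL can be sketched in a few lines. This is a minimal illustration, not the approach from the linked article: the `cache_key` and `is_expired` helpers and the `deps` prefix are hypothetical names chosen for the example.

```python
import hashlib


def cache_key(prefix: str, contents: list[bytes]) -> str:
    """Derive a cache key from the bytes of the files that define the cache."""
    digest = hashlib.sha256()
    for blob in contents:
        # Length-prefix each blob so [b"ab", b"c"] and [b"a", b"bc"] hash differently.
        digest.update(len(blob).to_bytes(8, "big"))
        digest.update(blob)
    return f"{prefix}-{digest.hexdigest()[:16]}"


def is_expired(created_at: float, ttl_seconds: float, now: float) -> bool:
    """TTL check: an entry is stale once it is at least ttl_seconds old."""
    return now - created_at >= ttl_seconds


# The key changes only when the lockfile content changes,
# so a dependency bump invalidates the cache automatically.
old_key = cache_key("deps", [b"requests==2.32.0\n"])
new_key = cache_key("deps", [b"requests==2.32.1\n"])
print(old_key != new_key)
```

Keying on content means no explicit purge step is needed for dependency caches; the TTL check covers entries whose inputs are not captured by the hash.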
|
AI Agents Transform Platform Engineering at Microsoft (5 minute read)
Microsoft's platform engineering team uses AI coding agents to autonomously implement security updates, modernize pipelines, manage dependencies, and enforce code quality across thousands of repositories, reducing months of work to weeks with greater consistency and less developer disruption. Amanda Silver envisions smaller, more strategic teams focused on designing systems and standards while AI handles repetitive, large-scale implementation tasks.
|
|
Want to advertise in TLDR? 💰
If your company is interested in reaching an audience of DevOps professionals and decision makers, you may want to advertise with us.
Want to work at TLDR? 💼
Apply here or send a friend's resume to jobs@tldr.tech and get $1k if we hire them!
If you have any comments or feedback, just respond to this email!
Thanks for reading,
Kunal Desai & Martin Hauskrecht