Kubernetes v1.34: Moving Volume Group Snapshots to v1beta2 (2 minute read)
Kubernetes v1.34 introduces a second beta (v1beta2) of volume group snapshots, a feature that went alpha in v1.27 and beta in v1.32 and enables crash-consistent snapshots across a group of volumes provisioned by CSI drivers. v1beta2 adds a VolumeSnapshotInfo struct, replacing VolumeSnapshotHandlePairList, to fix an issue where the restoreSize field was left unset on VolumeSnapshotContents and VolumeSnapshots when the CSI driver didn't implement the ListSnapshots RPC call. Depending on feedback and adoption, the Kubernetes project plans to move the volume group snapshot implementation to general availability (GA) in a future release.
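
As a rough illustration, here is a minimal sketch of checking that restoreSize is now populated, using the Kubernetes Python client to read VolumeGroupSnapshotContent objects. The group/version and field names (groupsnapshot.storage.k8s.io/v1beta2, status.volumeSnapshotInfoList) are assumptions inferred from the change described above; verify them against the external-snapshotter CRDs in your cluster.

```python
# Hedged sketch: inspect per-snapshot restoreSize in v1beta2 group snapshot contents.
# The group/version and field names below are assumptions based on the summary above
# and may differ from the CRDs installed in your cluster.
from kubernetes import client, config

config.load_kube_config()
api = client.CustomObjectsApi()

contents = api.list_cluster_custom_object(
    group="groupsnapshot.storage.k8s.io",
    version="v1beta2",
    plural="volumegroupsnapshotcontents",
)

for item in contents.get("items", []):
    name = item["metadata"]["name"]
    for info in item.get("status", {}).get("volumeSnapshotInfoList", []):
        # restoreSize should now be populated even when the CSI driver
        # does not implement the ListSnapshots RPC.
        print(name, info.get("snapshotHandle"), info.get("restoreSize"))
```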
|
Distributed performance testing for Kubernetes environments: Grafana k6 Operator 1.0 is here (9 minute read)
Grafana's k6 Operator, a Kubernetes operator for running distributed k6 tests, has reached its 1.0 release. The release brings bug fixes, improved Helm chart configuration, and a more predictable release process: a new minor version every eight weeks, following Semantic Versioning 2.0. The operator simplifies spreading k6 tests across multiple machines, enables synchronized testing inside private networks, and integrates with Grafana Cloud k6 (see the sketch below for launching a test run).
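
For a sense of how a distributed run is kicked off, here is a hedged sketch that creates a TestRun custom resource with the Kubernetes Python client. The apiVersion/kind and field names (k6.io/v1alpha1, TestRun, spec.parallelism, spec.script.configMap) reflect the operator's documented CRD but should be treated as assumptions to check against the 1.0 release.

```python
# Hedged sketch: launch a distributed k6 test via the operator's TestRun CRD.
# The apiVersion/kind and field names are assumptions; adjust to match the
# CRDs shipped with k6 Operator 1.0. Assumes a "checkout-script" ConfigMap
# containing test.js already exists in the "k6" namespace.
from kubernetes import client, config

config.load_kube_config()
api = client.CustomObjectsApi()

test_run = {
    "apiVersion": "k6.io/v1alpha1",
    "kind": "TestRun",
    "metadata": {"name": "checkout-load-test", "namespace": "k6"},
    "spec": {
        "parallelism": 4,  # split the test across 4 runner pods
        "script": {"configMap": {"name": "checkout-script", "file": "test.js"}},
    },
}

api.create_namespaced_custom_object(
    group="k6.io",
    version="v1alpha1",
    namespace="k6",
    plural="testruns",
    body=test_run,
)
```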
|
|
Enterprise AKS Multi-Instance GPU (MIG) vLLM Deployment Guide (10 minute read)
This guide walks through deploying vLLM on Azure Kubernetes Service with NVIDIA H100 GPUs using Multi-Instance GPU (MIG) technology, which partitions a single GPU so multiple AI models can run simultaneously with hardware isolation. The author reports roughly 50% cost savings versus dedicating whole GPUs per model, and the setup covers production-grade management, security controls, and integration with Azure API Management for hybrid AI infrastructure.
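
Since vLLM exposes an OpenAI-compatible API, a minimal smoke test against a deployed MIG-backed service might look like the sketch below. The base_url and model name are placeholders, not values from the guide.

```python
# Hedged sketch: smoke-test a vLLM deployment behind a Kubernetes Service.
# vLLM serves an OpenAI-compatible API; the base_url and model ID below are
# placeholders for whatever the guide actually deploys on AKS.
from openai import OpenAI

client = OpenAI(
    base_url="http://vllm-mig.example.internal/v1",  # placeholder service endpoint
    api_key="not-needed",                            # vLLM ignores the key by default
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",        # placeholder model ID
    messages=[{"role": "user", "content": "Say hello from a MIG slice."}],
    max_tokens=32,
)
print(response.choices[0].message.content)
```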
|
|
Cachey (GitHub Repo)
Cachey is a high-performance read-through cache for object storage, mapping requests to 16 MiB page-aligned ranges and using standard HTTP semantics. Throughput stats are provided as JSON via GET /stats, while a comprehensive set of metrics in Prometheus text format is available via GET /metrics.
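
A quick way to poke the two documented endpoints is sketched below; only GET /stats (JSON) and GET /metrics (Prometheus text format) come from the project description, and the host/port is a placeholder for wherever your Cachey instance listens.

```python
# Hedged sketch: read Cachey's documented endpoints. Only /stats and /metrics
# are taken from the project description; the address is a placeholder.
import requests

BASE = "http://localhost:8080"  # placeholder address for a running Cachey instance

stats = requests.get(f"{BASE}/stats", timeout=5).json()
print("throughput stats:", stats)

metrics = requests.get(f"{BASE}/metrics", timeout=5).text
print("\n".join(metrics.splitlines()[:10]))  # first few Prometheus metric lines
```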
|
Task (GitHub Repo)
Task is a task runner/build tool that aims to be simpler and easier to use than alternatives like make.
|
|
P50 vs P95 vs P99 Latency: What These Percentiles Actually Mean (And How to Use Them) (4 minute read)
Use latency percentiles like P50, P95, and P99, rather than averages, to understand user experience and set SLOs. Collect them with histograms, which preserve the shape of the latency distribution and expose the systemic friction hiding in the tail. Taming that tail usually requires architectural changes such as pre-warming, partitioning, caching layers, concurrency isolation, and adaptive retries.
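
To make the percentile idea concrete, here is a small sketch that estimates P50/P95/P99 from a cumulative latency histogram (the same shape Prometheus-style histograms use). The bucket bounds and counts are illustrative, not from the article.

```python
# Sketch: estimate P50/P95/P99 from a cumulative latency histogram.
# Bucket bounds (ms) and counts are made up for illustration.
import bisect

bounds = [5, 10, 25, 50, 100, 250, 500, 1000]          # bucket upper bounds in ms
cumulative = [120, 480, 1900, 3800, 4700, 4950, 4990, 5000]  # requests at or below each bound
total = cumulative[-1]

def percentile(p):
    """Return the bucket upper bound that covers the p-th percentile."""
    target = total * p / 100
    idx = bisect.bisect_left(cumulative, target)
    return bounds[idx]

for p in (50, 95, 99):
    print(f"P{p} <= {percentile(p)} ms")
```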
|
Monitor your LiteLLM AI proxy with Datadog (7 minute read)
Datadog has released an Agent integration and LLM Observability SDK support for LiteLLM, letting teams monitor, troubleshoot, and optimize LLM-powered applications. The LLM Observability SDK traces every request end to end for insight into model and provider performance, while the Agent integration monitors the LiteLLM proxy service itself, tracking metrics like request volume and error rates. Together they provide full-stack observability across LLM workflows.
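
A hedged sketch of wiring LiteLLM calls into Datadog's LLM Observability SDK is shown below. LLMObs.enable() and litellm.completion() are real entry points, but the exact parameters, required environment variables (e.g. DD_API_KEY), and the LiteLLM callback name are assumptions to verify against the current Datadog and LiteLLM docs.

```python
# Hedged sketch: trace LiteLLM calls with Datadog's LLM Observability SDK.
# Parameters and the callback name below are assumptions; check current docs.
import litellm
from ddtrace.llmobs import LLMObs

LLMObs.enable(ml_app="litellm-demo")  # assumes DD_API_KEY is set in the environment

# Optionally have LiteLLM emit success callbacks to Datadog as well (assumed name).
litellm.success_callback = ["datadog"]

response = litellm.completion(
    model="gpt-4o-mini",  # any provider/model that LiteLLM routes for you
    messages=[{"role": "user", "content": "Summarize today's error-rate spike."}],
)
print(response.choices[0].message.content)
```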
|
|
Love TLDR? Tell your friends and get rewards!
|
Share your referral link below with friends to get free TLDR swag!
|
|
Track your referrals here.
|
Want to advertise in TLDR? 📰
If your company is interested in reaching an audience of devops professionals and decision makers, you may want to advertise with us.
Want to work at TLDR? 💼
Apply here or send a friend's resume to jobs@tldr.tech and get $1k if we hire them!
If you have any comments or feedback, just respond to this email!
Thanks for reading,
Kunal Desai & Martin Hauskrecht