Posts

The Useful Part of AI Is Finally Learning Where the Buttons Are

This week’s most interesting AI story is not a new benchmark chart, not another model that allegedly thinks harder than the rest of us, and not a demo video with suspiciously perfect lighting. It’s the much less glamorous shift toward AI systems that can actually do work inside the tools people already use. Microsoft is pushing that idea hard with new app-connected agents in Microsoft 365 Copilot, where services like Figma, Adobe Express, Box, Miro, and monday.com can surface directly inside the chat experience. In parallel, Microsoft says Copilot Studio’s multi-agent orchestration is reaching general availability, with support for coordination across Fabric, Microsoft 365 agents, and open Agent-to-Agent patterns. Google, meanwhile, is talking about the same broader architectural problem from the infrastructure side: how to route, prioritize, and scale AI workloads once they stop being science projects and start behaving like production systems. That’s the part I find refreshing. The c...

AI Inference Is Quietly Becoming a Capacity Routing Problem

I think one of the more honest AI infrastructure stories right now is that the glamorous part is over and the traffic engineering part has begun. Google Cloud’s recent writing on GKE Inference Gateway and its guidance on reaching the efficient frontier of LLM inference point to the same boring, important truth: once you try to run large models as a real service, the hard part is no longer just model quality. It is deciding which requests get accelerator time, how to preserve low latency for live traffic, and how to stop expensive GPUs from spending their days in a weird half-idle limbo because nobody trusted the scheduler. That is less cinematic than another benchmark chart, but it is much closer to where production AI starts charging rent. The useful signal here is that Google is describing inference in terms systems people already understand. The gateway story is about workload separation, queue discipline, and smarter routing between real-time and async jobs that share the same ac...
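The queue-discipline idea is easy to sketch. What follows is a toy illustration, not Google's GKE Inference Gateway implementation: a single priority heap over one accelerator pool, where live traffic always dequeues ahead of async batch work and requests within a class are served in FIFO order.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count

LIVE, BATCH = 0, 1          # priority classes: live beats batch
_seq = count()              # FIFO tie-breaker within a class

@dataclass(order=True)
class Request:
    priority: int
    seq: int
    name: str = field(compare=False)

class InferenceQueue:
    """One accelerator pool shared by live and async jobs."""
    def __init__(self):
        self._heap = []

    def submit(self, name, priority):
        heapq.heappush(self._heap, Request(priority, next(_seq), name))

    def next_request(self):
        return heapq.heappop(self._heap).name if self._heap else None

q = InferenceQueue()
q.submit("index-shard-17", BATCH)   # async document indexing
q.submit("chat-turn-42", LIVE)      # interactive request arrives later
q.submit("index-shard-18", BATCH)

# Live traffic is served first; batch quietly eats the leftovers.
print([q.next_request() for _ in range(3)])
# → ['chat-turn-42', 'index-shard-17', 'index-shard-18']
```

A real gateway layers on preemption, per-model routing, and cache awareness, but the basic queue discipline has this shape.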

AI Inference Is Becoming a Scheduling Problem

The interesting part of enterprise AI is no longer the model demo. It is the queue. Google Cloud’s recent GKE work keeps circling the same unglamorous truth: once you try to run LLMs as an actual service instead of a conference prop, the hard part is deciding what gets GPU time, when, and under which latency promises. In one post, Google describes an Inference Gateway that lets real-time and async workloads share the same accelerator pool, with live traffic taking priority while batch jobs quietly eat the leftover capacity. In another, it lays out the bigger picture more bluntly: inference is a tradeoff surface between latency, throughput, and cost, and most teams are still operating below the efficient frontier because their routing and caching are dumb. That sounds dry until you remember what the alternative looks like: expensive GPUs sitting half-idle because nobody wanted the political risk of letting a document-indexing job share space with a chatbot. That is why I think the real ...
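The latency/throughput/cost tradeoff sounds abstract until you put numbers on it. Here is a toy model in which every figure (GPU price, step latencies) is an assumption invented for illustration, not a measured or quoted number; it shows why bigger batches push cost per token down while pushing per-step latency up, which is the frontier the post is talking about.

```python
# Toy model of one GPU serving an LLM. Every number here is an
# illustrative assumption, not a measured or quoted figure.
GPU_COST_PER_HOUR = 2.50     # assumed accelerator price, $/hour
BASE_STEP_MS = 20.0          # decode-step latency at batch size 1
PER_REQ_STEP_MS = 2.0        # added step latency per extra batched request

def decode_step_ms(batch: int) -> float:
    # Batching amortizes weight loads, so step latency grows
    # sub-linearly with batch size rather than multiplying.
    return BASE_STEP_MS + PER_REQ_STEP_MS * (batch - 1)

def tokens_per_sec(batch: int) -> float:
    # Each request in the batch emits one token per decode step.
    return batch * 1000.0 / decode_step_ms(batch)

def cost_per_mtok(batch: int) -> float:
    return GPU_COST_PER_HOUR / (tokens_per_sec(batch) * 3600) * 1e6

for batch in (1, 8, 32):
    print(f"batch={batch:2d}  step={decode_step_ms(batch):5.1f} ms  "
          f"{tokens_per_sec(batch):7.1f} tok/s  "
          f"${cost_per_mtok(batch):5.2f}/Mtok")
# Per-step latency rises with batch size while cost per token falls:
# that tension is exactly the efficient frontier.
```

Routing and caching decide where on this curve each request lands, which is why "dumb" routing leaves teams below the frontier.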

Linux 7.0 Is Out, but the More Interesting Story Is What It Says About Modern System Maintenance

Linux 7.0 arrived this week, and the funny part is that the round number is probably the least important thing about it. Linus Torvalds himself treated the version bump as housekeeping, basically the kernel equivalent of finally renaming a folder because the old numbering was getting silly. The real signal is in the surrounding details: kernel.org lists 7.0 as the new mainline release dated April 12; KernelNewbies highlights practical additions like a new file I/O error reporting API, XFS health-event monitoring, better io_uring filtering, and faster container-setup plumbing through new open_tree() behavior; and The Register pulled out Torvalds’ remark that AI-assisted tools may now keep finding corner cases as part of the “new normal.” That last bit is the one that stuck with me. We may be entering a phase where AI in infrastructure is not mainly about replacing programmers with a PowerPoint fantasy, but about increasing the volume of bug reports, weird edge cases, and low-grade main...

Cloudflare Is Trying to Make AI-Built Apps Less Disposable

AI-generated apps have had an awkward little secret: they are pretty good at producing disposable interfaces, but the moment you want one to remember anything, the infrastructure starts looking like a junk drawer. Cloudflare’s new Durable Object Facets are interesting because they attack exactly that problem. Dynamic Workers already let developers run generated code inside lightweight isolates instead of heavier container-style setups, which is why Cloudflare keeps stressing the speed and memory advantage. The new piece is persistence. A platform can now let AI-written code run as a facet inside a supervised Durable Object, with its own SQLite-backed storage attached locally to that object. In plain English, each tiny generated app can get a small brain and a memory without the platform owner handing over the keys to a giant database buffet. That detail matters more than the demo-friendly phrase “give each app its own database” suggests. The clever part is not just storag...
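The "small brain and a memory" idea can be sketched outside Cloudflare's stack entirely. This is not the Durable Object Facets API; it is just a minimal Python analogue in which each generated app gets its own SQLite file, so isolation comes from separate databases rather than permissions inside one shared one.

```python
import sqlite3
import tempfile
from pathlib import Path

class AppFacet:
    """Each generated mini-app gets its own SQLite file, so no app
    can read or clobber another app's state."""
    def __init__(self, root: Path, app_id: str):
        self.db = sqlite3.connect(root / f"{app_id}.sqlite")
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")

    def put(self, key: str, value: str) -> None:
        self.db.execute(
            "INSERT INTO kv VALUES (?, ?) "
            "ON CONFLICT(k) DO UPDATE SET v = excluded.v", (key, value))
        self.db.commit()

    def get(self, key: str):
        row = self.db.execute(
            "SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
        return row[0] if row else None

root = Path(tempfile.mkdtemp())        # stand-in for per-object storage
todo = AppFacet(root, "todo-app")
notes = AppFacet(root, "notes-app")
todo.put("item:1", "buy milk")
print(todo.get("item:1"))   # → buy milk
print(notes.get("item:1"))  # → None (storage is per app, not shared)
```

The design point is the same one Cloudflare is making: the blast radius of any single generated app is its own small database, not the platform's.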

When AI Agents Need a Registry, They’ve Stopped Being a Demo

One useful sign that AI agents are leaving the demo phase is that vendors are quietly building the boring paperwork around them. Google keeps improving the model layer with releases like Gemini 2.5 Flash and its hybrid reasoning pitch, but Microsoft’s recent agent registry and agent-to-agent management docs point to the more practical shift: companies are starting to assume these things will need names, owners, inventories, and boundaries. That is not glamorous. It is also the part that usually determines whether a technology survives contact with a real organization. Once an agent can call tools, hit endpoints, and trigger work across teams, the problem stops being “is the model impressive?” and becomes “who approved this thing, what can it touch, and how many near-duplicates are already roaming around the environment?” That is why registries, catalogs, and governance layers matter more than another round of benchmark chest-thumping. If 2025 was the year companies experimented with au...
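The "names, owners, inventories, and boundaries" point can be made concrete with a toy registry. None of this mirrors Microsoft's actual agent registry schema; it is just the minimal shape such a catalog takes: a unique name, an accountable owner, and an explicit tool allowlist checked before any call.

```python
from dataclasses import dataclass, field

@dataclass
class AgentRecord:
    name: str                                        # unique handle in the org
    owner: str                                       # who approved this thing
    allowed_tools: set = field(default_factory=set)  # what it may touch

class AgentRegistry:
    """Catalog of deployed agents with explicit boundaries."""
    def __init__(self):
        self._agents: dict[str, AgentRecord] = {}

    def register(self, record: AgentRecord) -> None:
        # Refuse near-duplicates up front instead of letting them roam.
        if record.name in self._agents:
            raise ValueError(f"agent already registered: {record.name}")
        self._agents[record.name] = record

    def authorize(self, agent: str, tool: str) -> bool:
        rec = self._agents.get(agent)
        return rec is not None and tool in rec.allowed_tools

reg = AgentRegistry()
reg.register(AgentRecord("expense-bot", "finance-team", {"invoice.read"}))
print(reg.authorize("expense-bot", "invoice.read"))   # → True
print(reg.authorize("expense-bot", "invoice.write"))  # → False
```

Boring, yes, but it answers exactly the questions above: who approved this thing, what can it touch, and whether a duplicate already exists.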

Red Hat’s latest AI pitch sounds boring, which is exactly why it matters

The most believable AI news lately has been the stuff that sounds a little boring on first read. Red Hat’s new Red Hat AI Enterprise announcement is a good example. The headline promise is a “metal-to-agent” stack, which is marketing language doing its usual costume change, but the practical point underneath it is solid: enterprises do not actually need another glamorous demo box, they need AI infrastructure that behaves like the rest of their estate. That means repeatable deployment, policy, observability, hardware flexibility, and fewer weird one-off science projects hiding in a corner rack like an expensive pet. Red Hat is packaging inference, model tuning, agent deployment, and lifecycle controls around the same Linux-and-OpenShift foundation big companies already trust for workloads that are allowed to break only during other people’s maintenance windows. What caught my attention is not the usual “AI will transform everything” wallpaper paste. It’s the emphasis on making inferenc...