
Claude Just Got Way Faster. Here’s What It Means For Your Agency.

Mike Kwal
· 11 min read


🚀 Plug this into Claude Code or Claude Desktop

This spec prompts Claude to audit your agency’s current AI workflows. It helps you identify where old rate limits were a bottleneck and maps out new, more ambitious services you can now offer reliably.

Want help turning this audit into a real service offering for your clients? We build these systems every day in the Talk-to-Build community. Or, book a working session and we’ll scope your first production-grade AI service together.

If you’ve ever used Claude to run a big job, you know the feeling. You hit “run” on a complex agent workflow, and you get the dreaded “rate limit exceeded” error. Or you try to work during peak hours and everything just… slows… down. That friction was the single biggest barrier to building mission-critical systems on top of AI.

That barrier is now gone. Anthropic just signed a deal with SpaceX to bring 220,000 new NVIDIA GPUs online at the Colossus 1 data center. The immediate result for builders is simple: rate limits have doubled for all Pro, Max, Team, and Enterprise users. Peak-hour slowdowns are a thing of the past.

This isn’t just a technical update for engineers. It’s a green light for every designer, agency, and business owner who has been waiting for AI to be reliable enough to bet on. This is about the new class of services you can now build and sell with confidence.


What actually shipped

Anthropic, the company behind Claude, needed more raw computing power. They found it by partnering with SpaceX. They are now the first tenant of the new Colossus 1 data center, a facility with 300 megawatts of power capacity. To put that in perspective, that’s enough electricity to power a small city.

Inside that data center, Anthropic has secured 220,000 of the latest NVIDIA GPUs. These are the specialized chips that run large language models. More chips mean more capacity. More capacity means Claude can handle more requests from more users at the same time, without breaking a sweat.

The outcome for anyone using the Claude API is concrete. As of this month, rate limits are officially doubled across all paid tiers. The unofficial, frustrating limits we all felt during busy afternoons — those are gone too. The system now has the headroom to handle the load.

BEFORE: The Peak-Hour Bottleneck

[Your Request] ─→ [Is it 2 PM?] ─→ YES ─→ [Rate Limit Queue] ─→ [SLOW]
                                    ↓
                                    NO ─→ [Immediate Response] ─→ [FAST]

AFTER: The New GPU Headroom

[Your Request] ─→ [220,000 GPUs] ─→ [Massive Capacity] ─→ [FAST]
    (any time of day)

This isn’t about making Claude incrementally faster for a single chat message. It’s about making it possible to run hundreds or thousands of tasks in parallel, reliably, for client work. It’s the shift from a tool that’s good for brainstorming to an infrastructure layer you can build a business on.

This deal isn’t about speed; it’s about reliability. Claude just went from being a brilliant-but-flaky creative partner to a production-ready assembly line.


Why this matters for agency work

For any agency owner, reliability is everything. You can’t sell a client a service that only works on weekends or when web traffic is low. The old rate limits made it risky to sell ambitious, Claude-powered workflows. You never knew if the demo you perfected would fall over when you tried to run it for a real client project.

This infrastructure upgrade de-risks building with AI. It means you can now confidently scope, price, and deliver services that were previously just experiments. Think about a workflow that generates 500 unique social media posts for a client’s campaign. Before, you’d have to run that in small batches, carefully managing your API calls. Now, you can run it all at once.

This change moves AI from a cool feature you can add to a website to the core engine of the website itself. It allows for always-on agents that monitor analytics, AI-powered content generation pipelines that run on a schedule, and conversational interfaces that can handle thousands of concurrent users. These are no longer just possibilities; they are services you can sell today.


Here’s how I’d actually use this

More capacity is only useful if you have a plan to use it. This isn’t about working faster; it’s about working bigger. Here’s a simple, four-step process I’d use to take advantage of the new headroom.

  1. Audit your old failures. Make a list of every ambitious AI workflow that failed or timed out in the past: a script that tried to analyze 100 competitor websites, an agent that choked on a 50-page PDF, a content generator that hit a rate limit halfway through. Run them again. My guess is most of them will work now.
  2. Double your batch sizes. Look at your existing, working scripts. If you have a process that generates 10 product descriptions at a time, try running it with 20, then 50, then 100. Find the new ceiling. For my own content system on mikekwal.com, I used to generate video scripts one by one. Now I can batch-process the entire month’s content calendar in a single run.
  3. Introduce concurrency. Instead of running tasks one after another, run them at the same time. For example, have one Claude agent writing blog post drafts while another agent generates a unique hero image for each post using a tool like Midjourney. With the higher limits, the Claude side of that pipeline should handle the parallel requests without stalling; the image tool has its own limits, so check those separately.
  4. Build an “always-on” monitor. This is where it gets interesting. I’d build a simple agent that checks a client’s Google Analytics every hour, looks for unusual traffic spikes or dips, and sends a plain-English summary to Slack. This was impractical before because it would eat up your rate limit. Now, it’s a simple, high-value service you can add to any retainer.
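
To make steps 2 and 3 concrete, here is a minimal Python sketch of how I might run a batch concurrently while still respecting a ceiling. Everything named here is an assumption for illustration: `call_model` is a hypothetical placeholder for your real API call, `RateLimitError` stands in for whatever your client raises on a rate-limit response, and the concurrency and retry numbers are starting points, not recommendations.

```python
import asyncio
import random


class RateLimitError(Exception):
    """Stand-in for the error your API client raises on a rate-limit response."""


async def call_model(prompt: str) -> str:
    """Hypothetical placeholder: swap in your real API call here."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"draft for: {prompt}"


async def run_one(prompt, semaphore, call=call_model, max_retries=4, base_delay=0.05):
    """Run a single prompt, retrying with exponential backoff on rate limits."""
    async with semaphore:  # at most `max_concurrency` prompts are in flight
        for attempt in range(max_retries + 1):
            try:
                return await call(prompt)
            except RateLimitError:
                if attempt == max_retries:
                    raise  # give up and surface the error to the caller
                # Wait longer after each failure, plus a little random jitter.
                delay = base_delay * 2 ** attempt + random.uniform(0, base_delay)
                await asyncio.sleep(delay)


async def run_batch(prompts, max_concurrency=10, call=call_model):
    """Run all prompts in parallel, capped at `max_concurrency` at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)
    tasks = [run_one(p, semaphore, call=call) for p in prompts]
    # gather returns results in the same order as the input prompts.
    return await asyncio.gather(*tasks)


if __name__ == "__main__":
    prompts = [f"Product description #{i}" for i in range(1, 51)]
    results = asyncio.run(run_batch(prompts, max_concurrency=10))
    print(len(results), "drafts generated")
```

The semaphore is the dial to turn when you "double your batch sizes": raise `max_concurrency` gradually and watch how often the retry path fires. If it fires a lot, you have found your current ceiling.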

This is about changing your mindset from scarcity to abundance. Stop thinking about how to conserve your API calls and start thinking about what you could build if you had ten times the capacity. Because now, you do.


What this changes for designer-run agency work

This infrastructure upgrade creates three immediate shifts for how you can package and sell your services. It’s a chance to move upmarket and increase the value of your client relationships.

You can now sell Service Level Agreements (SLAs) for AI work. Before, you had to hedge. You’d say, “We’ll use AI to help generate content,” but you couldn’t guarantee it would be done by a specific time. Now, with reliable infrastructure, you can sell a retainer that includes “24-hour turnaround on all AI-generated content requests.” That’s a premium service clients will pay for.
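
Before I'd put a turnaround number in a contract, I'd do the back-of-envelope math first. The sketch below is one way to estimate it in Python. All the inputs are made-up example numbers (your actual rate limit, latency, and concurrency will differ), and the function names and the 2x safety factor are my own assumptions, not anything Anthropic publishes.

```python
def batch_minutes(n_requests, requests_per_minute, avg_seconds_per_request, max_concurrency):
    """Estimate how long a batch takes, in minutes.

    Throughput is capped by whichever is lower: the rate limit, or how many
    requests your own concurrency setting can finish per minute.
    """
    concurrency_throughput = max_concurrency * 60 / avg_seconds_per_request
    throughput = min(requests_per_minute, concurrency_throughput)
    return n_requests / throughput


def fits_sla(n_requests, requests_per_minute, avg_seconds_per_request,
             max_concurrency, sla_hours, safety_factor=2.0):
    """Return True if the padded estimate fits inside the promised turnaround."""
    estimate = batch_minutes(n_requests, requests_per_minute,
                             avg_seconds_per_request, max_concurrency)
    return estimate * safety_factor <= sla_hours * 60


# Example with hypothetical numbers: 500 posts, a 50 requests/minute limit,
# 20 seconds per request, 10 requests in flight at once.
minutes = batch_minutes(500, 50, 20.0, 10)
print(f"{minutes:.1f} minutes")  # prints "16.7 minutes"
print(fits_sla(500, 50, 20.0, 10, sla_hours=24))  # prints "True"
```

The point of the safety factor is simple: promise the client a number you can hit on a bad day, not the number you hit in the demo.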

New, data-intensive services become possible. You can offer things like weekly competitive analysis reports, where an AI agent scrapes the top 20 competitors, summarizes their marketing changes, and delivers a report. This was technically possible before, but practically a nightmare. Now it’s a sellable, scalable service.

It justifies a move to value-based pricing. Stop billing for hours or API tokens. With fewer retries and stalled runs, your underlying API spend becomes a far more predictable cost. This allows you to price your services based on the outcome for the client. An “AI-powered SEO content engine” that generates 30 articles a month isn’t worth the hours it takes; it’s worth the traffic and leads it produces. This change makes that conversation much easier to have.

The bottom line is that AI is no longer the experimental part of your stack. It’s the most reliable part. Price it that way.


My $0.02 — How I’d roll this out for a design business

This is a real opportunity to upgrade your agency’s offerings. Here’s the exact three-day plan I’d follow to turn this news into revenue.

Day 1 — Internal stress test. Before you promise anything to clients, prove it to yourself. Take your most demanding internal AI workflow — for me, it’s the pipeline that turns a transcript into a YouTube script, blog post, and social media content. Run it with five times the normal volume. Push it until it breaks. Document the new limits. This becomes your internal benchmark for what you can confidently sell.
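
A stress test like this is easier to repeat if you script the ramp-up. Here is a minimal sketch, assuming you can wrap your pipeline in a function that takes a batch size and returns True when the run succeeds. `find_ceiling` and `fake_pipeline` are hypothetical names of mine, and the fake pipeline only exists so the example runs on its own.

```python
from typing import Callable, List, Tuple


def find_ceiling(run_at_size: Callable[[int], bool], start: int = 10,
                 factor: int = 2, max_size: int = 1000) -> Tuple[int, List[Tuple[int, bool]]]:
    """Grow the batch size geometrically until a run fails or max_size is passed.

    Returns the largest size that succeeded, plus a log of every attempt
    so you can document the new limits.
    """
    log = []
    largest_ok = 0
    size = start
    while size <= max_size:
        ok = run_at_size(size)
        log.append((size, ok))
        if not ok:
            break  # first failure: stop pushing
        largest_ok = size
        size *= factor
    return largest_ok, log


def fake_pipeline(size: int) -> bool:
    """Hypothetical stand-in for your real workflow: pretend it breaks above 80."""
    return size <= 80


largest, log = find_ceiling(fake_pipeline, start=10, factor=2, max_size=1000)
print("Largest batch that passed:", largest)  # prints 80
for size, ok in log:
    print(size, "ok" if ok else "FAILED")
```

Save the log somewhere your team can see it. The largest passing size, minus some margin, is the number I'd use when scoping client work.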

Day 2 — Create the “Plus” package. Go through your current proposals and service offerings. For each item, create a new, premium tier called “Plus” or “Accelerated.” The base tier is what you offer today. The Plus tier uses the new capacity to deliver more volume, faster turnarounds, or run more complex jobs. For a website build, the Plus package might include generating all the site’s copy and imagery with AI in 48 hours.

Day 3 — Proactive client outreach. Pick your three best clients. Don’t send a generic email. Record a short Loom video for each one. Show them a demo of a new, high-intensity workflow that could benefit their business — like the competitive analysis agent. Say, “The tech to do this just got a major upgrade, and I think we could use it to achieve X. Can we chat for 15 minutes next week?”

This is how I roll out every new service at MK-Way. I build it for myself, I productize it, and then I offer it to a handful of trusted partners. It turns abstract industry news into a real conversation about value.


FAQ

What is a GPU and why should I care?
A GPU is a Graphics Processing Unit. Think of it as a specialized brain for doing the massive math required to run AI models like Claude. More GPUs mean the service can handle more requests at once without queueing. You care because it makes the tools you use more reliable.

Do I need to change my code or prompts to get these benefits?
No. The changes are all on Anthropic’s side. Your existing API calls and prompts will just run faster and fail less often. You don’t need to do anything to activate this.

Does this make Claude more expensive?
No. Your pricing per token or per user remains the same. You can just get more work done in the same amount of time, which might actually lower your overall cost for a given project if you were previously paying for retries or dealing with timeouts.

Is this only for big enterprise customers?
No. Anthropic has confirmed the rate limit increases apply to Pro, Max, and Team plans, in addition to Enterprise. This benefits everyone from solo designers to large agencies.

How does this compare to what OpenAI or Google has?
All major AI labs are in a race for more computing power. OpenAI has a deep partnership with Microsoft Azure, and Google builds its own custom chips (TPUs). This deal with SpaceX, built on NVIDIA hardware, brings Anthropic’s infrastructure up to a similar, competitive level.

Will this let me build real-time applications on Claude?
It’s a huge step in that direction. While you still have to account for model inference time, the removal of queueing and peak-hour slowdowns makes near-real-time use cases much more feasible, especially for conversational agents on websites.


Want help applying this?

Four ways to go deeper:

  • Build with Builders. Join the Talk-to-Build community to learn how to earn money with AI, download our AI skills, advance your business, and build real assets (AI-native websites, cinematic AI video, agent-driven workflows) that you can sell to SMBs who want the outcomes but don’t have time to learn the skills.
  • 1-on-1 working session. Skip the friction. Book a screen-share with me — bring a real problem, leave with a working piece of it.
  • Done-for-you. MK-Way builds AEO-ready websites, apps, and AI agent workflows for design agencies and founders who want it shipped fast.
  • Quick question. DM me on Instagram or connect on LinkedIn. I read every message.

This post is part of the AI Pulse atomic series. If you commented “INFRA” on one of my videos — this is the breakdown. Sources: Anthropic News.

Last updated: 2026-06-02.