The Mac mini is having a moment. Apple stores are reporting weeks-long wait times for higher-memory configurations. The reason is not video editing or music production. It is AI agents.

First OpenClaw turned the Mac mini into the go-to hardware for running always-on AI agents, triggering a surge in purchases that caught even Apple off guard. Now Perplexity has joined the party. On March 11, 2026, at their inaugural Ask developer conference in San Francisco, they announced Personal Computer: software that turns a Mac mini into a 24/7 AI worker with access to your local files, apps, and sessions.

It builds on Perplexity Computer, the cloud-based AI agent they launched two weeks earlier that orchestrates 19 different AI models. Personal Computer takes that cloud engine and gives it a persistent local home on your Mac. The AI processing still happens on Perplexity’s cloud servers. But the Mac mini becomes the always-on interface, the local file access layer, and the bridge between your physical machine and a fleet of cloud AI models.

As CEO Aravind Srinivas put it: “A traditional operating system takes instructions; an AI operating system takes objectives.”

This post breaks down how it all works, what is happening under the hood, and what developers can learn from the architecture. If you are building AI agent systems of your own, the design patterns here are worth studying.

TL;DR: Perplexity Personal Computer runs on a Mac mini as an always-on AI agent. The Mac handles local file access. Perplexity’s cloud handles AI processing using 19 models (Claude Opus 4.6 for orchestration, Gemini for research, GPT-5.2 for long-context, Grok for speed). Each session runs in an isolated Firecracker VM. A separate cloud browser handles web automation. Sub-agents communicate via filesystem. Safety is architectural (mandatory approval for risky actions, kill switch, audit logs). The local-cloud hybrid architecture, task-semantic model routing, and filesystem-based IPC are patterns every developer building agents should understand.


Why a Mac Mini?

Before getting into the AI architecture, it is worth understanding why the Mac mini keeps showing up as the hardware of choice for AI agents. This is not marketing. It is engineering.

Factor Mac mini M4 Pro Typical GPU Server
Idle power 15W 100W+
Load power 30W 500W+
Annual electricity (24/7) ~$15 ~$400+
Noise Silent Fan noise
Form factor 5 x 5 inches Rack-mounted
Unified memory Up to 64GB shared CPU/GPU Separate VRAM
One-time cost $599 to $2,000 $5,000+

Three things make the Mac mini stand out for always-on AI workloads:

Unified memory architecture. CPU and GPU share one memory pool. There is no data copying penalty between them. For AI workloads that move data between CPU processing and GPU inference, this is a meaningful advantage over discrete GPU setups. The M4 Pro with 64GB can run 30-32B parameter models locally at 10-15 tokens per second.

Power efficiency. 15 watts idle, 30 watts under load. That is roughly $15 per year in electricity for 24/7 operation. Compare that to a GPU server that costs $400+ per year just in power. For an agent that needs to be always on, this is the difference between practical and impractical.

Silent, compact, and headless-friendly. A 5-by-5 inch box with no fan noise that can sit on a shelf and run indefinitely. No dedicated cooling. No server room. Just plug it in and forget it is there.

The OpenClaw craze already proved this. When that open-source AI agent went viral in January 2026, Mac mini purchases surged so hard that Business Insider reported Apple stores had weeks-long wait times for the higher-memory models. Perplexity is riding the same wave, but with a managed cloud backend instead of a DIY setup.

For more on running AI models on your own hardware, see How to Run LLMs on Your Own Computer.


The Local-Cloud Hybrid Architecture

Personal Computer is not just a local app. It is not just a cloud service. It is a hybrid that splits responsibilities between your Mac mini and Perplexity’s cloud infrastructure.


flowchart TB
    User["fa:fa-user You\n(Any device, anywhere)"] -->|"Remote control"| Mac

    subgraph Local["fa:fa-desktop Mac mini (Always On)"]
        Mac["Personal Computer App"]
        Files["fa:fa-folder Local Files"]
        Apps["fa:fa-th-large Local Apps"]
        Mac --- Files
        Mac --- Apps
    end

    subgraph Cloud["fa:fa-cloud Perplexity Cloud"]
        Orch["fa:fa-brain Orchestrator\n(Claude Opus 4.6)"]
        Router["fa:fa-random Model Router\n(19 Models)"]
        Orch --> Router
    end

    subgraph Exec["fa:fa-server Isolated Execution"]
        VM["fa:fa-shield Firecracker VM\n(Per Session)"]
        Sand["fa:fa-terminal Linux Sandbox\n2 vCPU, 8GB RAM"]
        VM --- Sand
    end

    subgraph Browse["fa:fa-globe Cloud Browser"]
        BrowserAuto["fa:fa-chrome Web Automation"]
        Screenshot["fa:fa-camera Screenshots"]
        BrowserAuto --- Screenshot
    end

    Mac <-->|"Secure connection"| Orch
    Router --> VM
    Router --> BrowserAuto
    Sand -->|"Results"| Mac

    style Local fill:#e8f5e9,stroke:#2e7d32
    style Cloud fill:#e3f2fd,stroke:#1565c0
    style Exec fill:#fff3e0,stroke:#e65100
    style Browse fill:#fce4ec,stroke:#c62828
    style User fill:#f5f5f5,stroke:#616161

Here is what each layer does:

The Mac Mini: Interface and File Access

The Personal Computer app runs continuously on your Mac mini. It is the always-on entry point. You can control it remotely from any device over the internet. Srinivas calls it “It never sleeps.”

This layer handles:

  • Local file access: Reading, modifying, organizing files on your Mac
  • App integration: Working with local applications
  • Session persistence: Maintaining context across conversations
  • Remote accessibility: Controllable from anywhere

The Mac mini is not doing the AI processing. It is the hands and eyes on the ground. The brain lives in the cloud.

The Cloud Backend: Orchestration Engine

This is where the AI reasoning happens. Perplexity’s cloud infrastructure runs Claude Opus 4.6 as the central orchestrator. When you give Personal Computer a task, this is the layer that:

  1. Decomposes your goal into discrete subtasks
  2. Decides which AI model handles each subtask
  3. Spawns sub-agents to run tasks in parallel
  4. Manages persistent memory about you across sessions
  5. Coordinates 400+ external service connectors (Gmail, Slack, GitHub, Notion, Snowflake, and more)

The internal API uses protobuf definitions with a service called LlmApi that handles both streaming and non-streaming completions. This clean abstraction separates the model routing logic from the LLM layer, meaning they can swap models without redesigning the system.

Firecracker VMs: Isolated Execution

Every session runs inside its own isolated Firecracker virtual machine. Firecracker is the same microVM technology AWS built for Lambda. It boots in under 125 milliseconds and provides hardware-level isolation between sessions.

The sandbox specs: 2 vCPUs, 8GB RAM, ~20GB disk. Pre-installed with Python, Node.js, ffmpeg, and standard Unix tools. A Go binary called envd manages the lifecycle via gRPC.

The sandbox is ephemeral. When the session ends, it is destroyed. No user data accumulates between sessions. This is the security model, not a limitation.

Security researcher Yang Fan confirmed this architecture by reverse-engineering the system from within its own sandbox. He verified that no API keys, tokens, or secrets are visible inside the execution environment. OAuth tokens for connectors are stored server-side on Perplexity’s backend. The sandbox never sees a credential.

Cloud Browser: Separate Web Layer

Perplexity runs a completely separate cloud browser instance for web automation. Yang Fan confirmed this by fetching httpbin.org/headers from both the sandbox and the browser tool. Different IP addresses, different User-Agent strings, different network fingerprints. Two separate machines.

This means browser-based vulnerabilities (malicious JavaScript, fingerprinting, session hijacking) cannot propagate back into the code execution environment. The two do not share memory, filesystem, or network stack.

Developer takeaway: This four-layer split (local device, cloud orchestrator, isolated execution, separate browser) is the pattern that matters. Each layer can be swapped independently. New model? Update the routing. Better sandbox provider? Swap the execution layer. Neither change affects the others. This is separation of concerns applied to AI infrastructure. For more on how attackers exploit weak boundaries between these layers, see Prompt Injection Explained.


How 19-Model Routing Works

Most AI products pick one model and use it for everything. Perplexity’s approach is fundamentally different. It runs roughly 20 models and routes each task to whichever model handles it best.

The Model Fleet

Model Role Why This Model
Claude Opus 4.6 Core reasoning, orchestration Best at multi-step logic and complex decisions
Claude Sonnet 4.6 General-purpose sub-agents Good balance of capability and cost
Claude Haiku 4.5 Lightweight browser tasks Fast and cheap for repetitive operations
GPT-5.2 Long-context recall, web search Strong at maintaining state across large documents
GPT-5.3 Codex Coding tasks Specialized for code generation and debugging
Gemini 3 Flash Browser automation Fast for repetitive browser interactions
Gemini 3.1 Pro Deep research Good at multi-step investigations
Grok Speed-sensitive tasks Fastest response time for lightweight operations
Nano Banana 2 Image generation Internal/custom image model
Veo 3.1 Video generation Google’s video model
ElevenLabs TTS v3 Voice synthesis High-quality text-to-speech

Task-Semantic Routing

The routing is not random. A meta-router analyzes each task for intent, complexity, and required capabilities, then matches it to the optimal model.


flowchart LR
    Task["fa:fa-tasks Incoming Task"] --> Analyzer["fa:fa-search-plus Task Analyzer"]
    
    Analyzer -->|"Research query"| Gemini["fa:fa-book Gemini\nDeep Research"]
    Analyzer -->|"Write code"| Codex["fa:fa-code GPT Codex\nCode Gen"]
    Analyzer -->|"Complex reasoning"| Opus["fa:fa-brain Claude Opus\nOrchestration"]
    Analyzer -->|"Quick lookup"| Grok["fa:fa-bolt Grok\nFast Response"]
    Analyzer -->|"Browse web"| Flash["fa:fa-globe Gemini Flash\nBrowser"]
    Analyzer -->|"Generate image"| Banana["fa:fa-image Nano Banana\nImage Gen"]

    style Task fill:#e3f2fd,stroke:#1565c0
    style Analyzer fill:#fff3e0,stroke:#e65100
    style Gemini fill:#e8f5e9,stroke:#2e7d32
    style Codex fill:#e8f5e9,stroke:#2e7d32
    style Opus fill:#e8f5e9,stroke:#2e7d32
    style Grok fill:#e8f5e9,stroke:#2e7d32
    style Flash fill:#e8f5e9,stroke:#2e7d32
    style Banana fill:#e8f5e9,stroke:#2e7d32

Perplexity’s internal data backs this up. In January 2025, 90% of their enterprise queries routed to just two models. By December 2025, no single model commanded more than 25% of usage. Users were already switching between models based on task type. Computer automates that behavior.

Dmitry Shevelenko, Perplexity’s head of business, explained the timing: “I don’t think Computer could have been as powerful as it is if we had launched it even three months ago. You have to have the right harness at the right time. Every technology we’ve built laddered up to this moment, where you finally had a smart enough agent model to orchestrate through that harness.”

Developer takeaway: If you are building any AI application, consider routing to different models based on task type. Browser automation does not need Opus-level reasoning. Complex analysis does not need Grok-level speed. Match the model to the job. The cost savings are significant. For a foundational understanding of how models process tasks differently, see How LLMs Generate Text.


The Sub-Agent System

When Personal Computer encounters a task too complex for a single pass, it spawns sub-agents. This is where the system starts looking like a distributed system rather than a chatbot.

How Sub-Agents Work

The parent agent (the orchestrator you talk to) creates sub-agents via a run_subagent call. Each sub-agent type gets different default tools and potentially a different underlying model.

Sub-Agent Type Purpose Typical Model
research Web research and synthesis Gemini 3.1 Pro
coding Code writing and debugging Claude Sonnet 4.6
codex_coding Specialized coding GPT-5.3 Codex
asset Document, image, media creation Varies
website_building Frontend/backend development Claude Sonnet 4.6
general_purpose Flexible task execution Claude Sonnet 4.6

The parent can dispatch multiple sub-agents simultaneously. Need to research ten competitors? It spawns ten research agents in parallel and synthesizes the results.

This is not theoretical. Perplexity’s own teams use it this way. Their finance team automated accounts receivable. Their sales team automated proposal creation. An enterprise user asked Computer to identify their most loyal B2B media customers and the account reps who manage those relationships. The agent delivered three recommendations in minutes, collapsing what Shevelenko described as a “15-step process” into two or three.

Filesystem as Inter-Process Communication

Sub-agents do not return large results via return values. Those get truncated. Instead, sub-agents write outputs to files in a shared workspace, and the parent reads those files.

1
2
3
4
5
# Sub-agent writes its results
write("/home/user/workspace/research_results.json", data)

# Parent agent reads the results
content = read("/home/user/workspace/research_results.json")

This is a classic message-passing architecture using the filesystem instead of pipes or sockets. It has real advantages:

  • Inspectable: All inter-agent communication exists as files you can read
  • Logged: Everything persists for the session duration
  • Debuggable: If something goes wrong, there is an artifact trail
  • Decoupled: Agents do not need to know about each other’s APIs

flowchart TB
    Parent["fa:fa-crown Parent Agent\n(Claude Opus 4.6)"]
    
    Parent -->|"Spawns"| Sub1["fa:fa-search Research Agent\n(Gemini Pro)"]
    Parent -->|"Spawns"| Sub2["fa:fa-code Coding Agent\n(GPT Codex)"]
    Parent -->|"Spawns"| Sub3["fa:fa-paint-brush Asset Agent\n(Nano Banana)"]
    
    Sub1 -->|"Writes"| FS["fa:fa-folder Shared Filesystem\n/home/user/workspace/"]
    Sub2 -->|"Writes"| FS
    Sub3 -->|"Writes"| FS
    
    FS -->|"Reads"| Parent

    style Parent fill:#e3f2fd,stroke:#1565c0
    style Sub1 fill:#e8f5e9,stroke:#2e7d32
    style Sub2 fill:#e8f5e9,stroke:#2e7d32
    style Sub3 fill:#e8f5e9,stroke:#2e7d32
    style FS fill:#fff3e0,stroke:#e65100

As Yang Fan noted in his reverse engineering analysis, this is what Unix knew in 1972. The filesystem is a surprisingly robust substrate for agent coordination.

Permission Caps

Three deliberate restrictions prevent sub-agents from going rogue:

  1. No spawning children: Hierarchy is exactly two levels deep. Parent and children. No grandchildren. This prevents cascading agent creation.
  2. No memory access: Only the parent holds persistent user memory. Sub-agents are stateless unless the parent explicitly passes information.
  3. Truncated return values: Forces filesystem-based communication rather than large payload returns.

Developer takeaway: If you are building multi-agent systems, cap the hierarchy at two levels. Every additional level adds debugging complexity and cascading failure potential. Use the filesystem or a message queue for inter-agent communication. It is boring, old, and it works.


What Personal Computer Can Actually Do

Here are the use cases that show the gap between a cloud-only agent and one running on your Mac mini with local file access.

Always-On Development Workflows

Monitor a GitHub repository overnight and drop a formatted Slack summary into the team channel before standup. No cron scripts, no custom integrations, no webhook configuration. Just tell it what you want. Personal Computer watches, processes, and delivers while you sleep.

Email and Communication Triage

Point it at Gmail, tell it what matters, and it watches for client inquiries. It drafts replies based on how you have written before and only bothers you when something actually needs a human. One of the demo use cases showed it condensing email threads before you even open them in the morning.

Research That Runs While You Sleep

Throw a half-formed research brief at it before heading to bed. Wake up to a structured report pulled from live sources, sitting in your inbox. During the viral launch, users demonstrated building Bloomberg Terminal-style financial dashboards, and one user claimed to have replaced $225,000 per year in marketing tools over a single weekend.

Local File Management

This is what separates Personal Computer from cloud-only Computer. If you have a folder of photographs on your desktop, Personal Computer can analyze, rename, resize, and prepare them for web use without you specifying each step. It reads local files, processes them via cloud AI, and writes the results back locally.

Enterprise Workflows

The enterprise version launched alongside Personal Computer at the Ask 2026 conference. Enterprise customers can query @computer directly inside Slack channels. The Snowflake and Datadog connectors let non-technical employees query data warehouses in plain English. Shevelenko described asking what percentage of users who ran a Computer query also purchased additional credits. “I’m not technical, so I don’t know how to write SQL. Within a minute, I got a full analysis.”


Safety and Security Architecture

This is the section that matters most when you are giving an AI agent persistent access to your local machine.

Architectural Safety, Not Prompt-Based

Perplexity Computer has a tool called confirm_action. Before the agent sends an email, posts a message, makes a purchase, or deletes data, it must call confirm_action and wait for user approval.

This is not a prompt instruction. It is a mandatory tool call baked into the execution model.

Approach How It Works Failure Mode
Prompt-based safety System prompt says “ask before deleting” Can be overridden by prompt injection
Architectural safety Tool requires async user approval Cannot be bypassed without modifying the tool

Credential Isolation

No API keys, tokens, or secrets are visible inside the sandbox. Connectors use OAuth with tokens stored server-side. When the agent calls send_email, the backend handles the OAuth exchange. The sandbox never touches a credential.

1
2
3
4
# Environment variables inside the sandbox:
PATH, HOME, SHELL, TERM, LANG    # Standard Unix vars
E2B_SANDBOX_ID                    # Sandbox identifier
# No API keys. No tokens. No secrets.

VM-Level Isolation

Each session runs in its own Firecracker VM, the same technology AWS built for Lambda serverless. This provides hardware-level isolation between user sessions. One user’s session cannot access another’s data or infrastructure.

Kill Switch and Audit Trail

Every action Personal Computer takes is logged. If something goes wrong, there is a kill switch to stop all operations immediately. For the enterprise version, this extends to SOC 2 Type II certification, SSO/SAML authentication, SCIM provisioning, granular admin controls, and optional zero data retention.

Developer takeaway: If you are building AI agents that take real-world actions, do not rely on prompts for safety. Build it into the architecture. Make dangerous actions require explicit tool calls with user approval. Keep credentials out of the execution environment. Log everything. This is the same principle of least privilege from traditional security, applied to AI agents. For more on this topic, see Context Engineering Guide for how to structure what your AI sees.


The Tool and Connector Ecosystem

Personal Computer exposes 40+ built-in tools and 400+ external connectors via MCP (Model Context Protocol). Here is the full picture:

Built-in Tools

Category Tools
Execution bash, write, read, edit, grep, glob
Web Research search_web, search_vertical, search_social, fetch_url, screenshot_page, browser_task, wide_browse, wide_research
Media generate_image, generate_video, text_to_speech, transcribe_audio
Memory memory_search, memory_update, schedule_cron, pause_and_wait
Safety confirm_action, ask_user_question

The wide_research and wide_browse tools fan out across multiple targets simultaneously. Need to research twenty companies? wide_research dispatches twenty parallel operations and aggregates the results.

External Connectors

Consumer tier: Gmail, Slack, GitHub, Notion, HubSpot, Salesforce, Linear, Jira, Airtable, Shopify, Stripe, and hundreds more.

Enterprise tier adds: Snowflake, Datadog, SharePoint, plus the ability to install custom connectors via MCP. Administrators control which connectors specific employees can access.

The Skill System

Rather than cramming every domain’s expertise into one system prompt, Personal Computer loads domain-specific skill files on demand. There are 50+ playbooks covering office documents, data analysis, marketing, legal, product management, customer support, and accounting.

When you ask it to “write a PRD,” it loads the pm/feature-spec skill. This keeps the base context lean and token costs down. It is the same plugin architecture pattern from traditional software, applied to context engineering for LLMs.


Personal Computer vs OpenClaw vs Claude Cowork

  Personal Computer OpenClaw Claude Cowork
Architecture Mac mini + cloud hybrid Fully local Local sandboxed VM
Models ~20, auto-routed Your choice (bring API keys) Claude only
Setup Install app, join waitlist Heavy (configs, API keys, plugins) Minimal
Always-on Yes (designed for 24/7) Yes (if you keep machine running) No
Local file access Yes Yes (full access) Yes (folder-scoped)
Web browsing Excellent (dedicated browser) Via plugins Limited
Security Cloud VM isolation + audit logs Your responsibility Sandboxed VM
Cost $200/month + $2K hardware Free + $30-60/month API $20-200/month
Open source No Yes (247K+ GitHub stars) No

The core trade-off is control vs. convenience.

OpenClaw gives you full control. You pick the model, configure the plugins, manage the security. But you are responsible for everything. Peter Steinberger, the creator, has publicly asked users to stop buying expensive Mac minis for it, since OpenClaw only needs 2 vCPUs and 4GB RAM for cloud API routing.

Personal Computer gives you managed orchestration. No setup, no model selection, no connector wiring. But you give up control, and you pay $200/month for it.

Claude Cowork sits between them. Local execution, enterprise-grade security, but locked to Claude models only and not designed for 24/7 always-on operation.

For software developers specifically, none of these are great for writing code yet. For coding, AI-powered IDEs like Cursor remain the better choice. For a guide on that, see Getting the Most Out of AI Coding Assistants.


What Does Not Work Well (Yet)

Based on real-world testing from multiple reviewers:

The Black Box Problem

Everything happens inside a cloud sandbox with no window in. No live preview, no hot reloading, no way to see what the agent is building in real time. Verifying visual output requires pushing to a hosting service and waiting for a preview build. The feedback loop is dramatically slower than local development.

Connector Bugs

Perplexity advertises 400+ integrations. In practice, many are unreliable. Builder.io’s Alice Moore reported that the Vercel connector’s OAuth token expired every session, the Ahrefs connector only surfaced backlink data, and she ended up bypassing the official GitHub connector entirely with a manual Personal Access Token.

Cost Spirals

While building a basic website, npm install silently failed in the sandbox. The agent did not report this. Instead, it kept pushing broken builds, burning through 10,000 credits. Two days. One page. $200 in compute credits on top of the subscription. A developer could debug this. A non-developer would have burned credits indefinitely.

Unusual Agent Decisions

When asked to work on a codebase, Computer sometimes uses the GitHub API to directly add and remove files in the repo, skipping clone, branch, dev, test, push entirely. Files appear and disappear in the remote repo with no local development step.


Eight Lessons for Developers Building AI Agents

Whether you are building your own AI agent platform or studying where the industry is heading, these are the practical takeaways:

1. The Local-Cloud Hybrid is the Right Architecture

Personal Computer splits local file access from cloud AI processing. The Mac handles what needs to be local (files, apps, persistence). The cloud handles what needs scale (model inference, parallel execution, connector management). This separation gives you the best of both worlds without the security nightmares of giving cloud AI full local access.

2. Separate Orchestration From Execution

Perplexity’s layered architecture (orchestrator, execution VM, browser) lets them swap any layer independently. New model? Update routing. Better sandbox? Switch providers. Neither change affects the others. Apply the same separation of concerns in your own systems. For more principles like this, see the System Design Cheat Sheet.

3. Route Tasks to Specialized Models

Using Claude Opus for lightweight browser automation is like hiring a senior architect to move furniture. Match the model to the task:

  • Cheap and fast models for repetitive, structured tasks
  • Mid-tier models for general-purpose work
  • Premium models only for complex reasoning and planning

4. Use the Filesystem for Inter-Agent Communication

Files are inspectable, persistent, and every language can read and write them. Do not over-engineer the communication layer between agents. This is the same philosophy behind Unix pipes.

5. Cap Your Agent Hierarchy at Two Levels

Parent and children. No grandchildren. Every additional level adds debugging complexity and cascading failure potential. If you think you need deeper hierarchy, redesign your task decomposition instead.

6. Build Safety Into Architecture, Not Prompts

Prompts can be overridden by injection attacks. Tool-level checkpoints that require user approval cannot be bypassed without modifying code. For any action with real-world consequences, require an explicit confirmation step.

7. Load Context Dynamically

Do not bloat your system prompt. Build a skill or plugin system that loads relevant domain knowledge on demand. This mirrors how context engineering works in any well-designed LLM application, and it is how RAG applications retrieve only what is relevant.

8. Make Everything Observable

Filesystem-based inter-agent communication creates an artifact trail. Every piece of agent work exists as a file you can inspect. Audit logs, kill switches, and action confirmations are not nice-to-haves. They are requirements for any agent with real-world permissions.


The Bigger Picture

The Mac mini as an AI server is not a gimmick. It is a signal of where personal computing is heading.

The trajectory is clear: Perplexity Computer launched as a cloud-only agent in February 2026. Three weeks later, Personal Computer brought it to local hardware. The direction is toward AI agents that live persistently on your devices, bridge local and cloud resources, and work autonomously for days or weeks at a time.

Perplexity’s bet is that “multi-model is the future.” Models are specializing, not commoditizing. No single model dominates every capability. Their internal data proves it: usage shifted from 90% on two models to evenly distributed across a dozen models in just one year. The orchestrator that routes to the right specialist wins.

The engineering patterns are not new inventions. Separation of concerns. Least privilege. Plugin architectures. Message passing. Capability-based security. Local-cloud hybrid architecture. Every design decision has an analogy in conventional software systems.

We are watching software engineering rediscover good principles under new constraints. The systems that define the next decade of AI will be built by people who understand both AI capabilities and engineering fundamentals. Not just one or the other.

If you are getting started with building AI applications, check out How to Build Your First LLM Application for a practical starting point. And if multi-agent systems feel like overkill for your use case, they probably are. Start with a single well-designed agent. You can always add orchestration later.