Claude Code 2.0: Loops, Scheduled Tasks, Google Workspace, and Skills with Evals

Claude Code just shipped four features that move it from “smart assistant you talk to” into something closer to an autonomous operating layer. Here is what each one does, when it matters, and where the edges are.

1. Loops: Short-Term Recurring Tasks

The /loop command runs a prompt or slash command on a recurring interval inside your current session. No manual re-prompting.

/loop every 10 minutes check my inbox for important emails

That creates a cron job. It fires every 10 minutes without interaction. You can reference skills in loops:

/loop every day check my YouTube for new videos and run my content repurposing skill

One-off reminders work the same way under the hood – they are just cron jobs that fire once.

The constraints matter. Loops expire after 3 days. They live only in the current session. Close the terminal, they are gone. Missed runs are skipped permanently – no catch-up. This is intentional. Loops are for active monitoring during a project sprint, watching an inbox for a few hours, tracking changes across a couple of days. Not for long-term automation.

2. Scheduled Tasks: Persistent Workflows

Where loops are session-bound and short-lived, scheduled tasks are persistent. Daily, weekly, hourly – they survive across sessions.

Each scheduled run starts a fresh Claude Code instance. It reads project files, runs necessary skills, executes the command, and ends. Think of them as lightweight n8n workflows that live inside Claude Code.

You create scheduled tasks in the desktop app (not terminal or VS Code). The UI takes a name, a prompt, a schedule, a model, and a project folder.

Key difference from loops: if your machine is off when a scheduled task should fire, it catches up when you reopen the app. Loops do not catch up. Missed loop runs are lost forever.
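The catch-up distinction can be sketched as a tiny model. This is a conceptual illustration only, not Claude Code's actual scheduler; the function name and semantics are assumptions that just encode the behavior described above (loops drop missed intervals, scheduled tasks run once to catch up).

```python
from datetime import datetime, timedelta

def runs_after_wakeup(last_fire: datetime, now: datetime,
                      interval: timedelta, catches_up: bool) -> int:
    """How many catch-up runs fire when the app comes back.

    Loops (catches_up=False) drop every interval missed while the
    session was closed; scheduled tasks (catches_up=True) run once
    to make up for the missed window.
    """
    missed = int((now - last_fire) / interval)
    if missed <= 0:
        return 0          # nothing was missed
    return 1 if catches_up else 0

start = datetime(2025, 1, 1, 9, 0)
later = start + timedelta(hours=3)   # machine was off for 3 hours
every_10m = timedelta(minutes=10)

print(runs_after_wakeup(start, later, every_10m, catches_up=False))  # loop: 0
print(runs_after_wakeup(start, later, every_10m, catches_up=True))   # scheduled task: 1
```

Both then resume their normal cadence going forward; the difference is only whether the missed window produces any work.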

The mental model: loops for “right now,” scheduled tasks for “every day at 9 AM.”

3. Google Workspace Access via CLI

Google released an open-source Workspace CLI that gives Claude Code access to the full Google ecosystem – Drive, Gmail, Calendar, Docs, Sheets, Slides – with over 100 built-in “recipes” for common operations.

The real improvement is document quality. Previously, pushing content to Google Docs through automation meant raw markdown or fighting formatting APIs. The new CLI produces properly formatted documents – headers, images, links, styling – through bash commands that talk directly to Google’s services.

Setup is straightforward. You can install the CLI directly or ask Claude Code to walk you through it.

It is labeled “not an officially supported Google product” because it is in beta, not because it is unreliable.

4. Skills 2.0: Built-In Evaluation and Testing

This is the most architecturally significant update. Before 2.0, improving skills was manual iteration – write, run, eyeball the output, tweak, repeat. No structured feedback on what helped.

Skills 2.0 adds built-in evals. You define criteria, run tests against them, and get graded results per criterion rather than a single pass/fail.

The Workflow

Step 1: Build the initial skill using the skill creator meta-skill. Give it a clear name, trigger description, goal, required tools, reference files, step-by-step process, and human-in-the-loop checkpoints.
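The pieces Step 1 asks for map onto a skill file roughly like this. The YAML frontmatter (name, description) follows Anthropic's published SKILL.md convention; the body sections and their wording here are illustrative, not a prescribed schema.

```markdown
---
name: content-repurposing
description: Use when the user asks to turn a long-form video or post
  into short-form social copy. Triggers on "repurpose", "clip", "thread".
---

## Goal
Turn one long-form source into 3–5 platform-ready short posts.

## Required tools and references
- persuasion-toolkit.md (reference file for copy techniques)

## Process
1. Read the source and extract the 3 strongest claims.
2. Draft one post per claim using the persuasion toolkit.
3. Checkpoint: show drafts to the user before publishing anywhere.
```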

Step 2: Define evaluation criteria. Specificity matters. “Run some tests” gives generic results. Instead:

“Run a new test optimized for ensuring my copy follows the persuasive techniques listed in my persuasion toolkit reference file.”

You define criteria like: Does it use the reference file? Does it employ curiosity gaps? How often does it include proof or founder-led stories?

Step 3: Run parallel test instances. The skill creator launches multiple sub-agents, runs 5 test instances in parallel, and returns a structured HTML report.

The Reports

The evaluation report shows each run’s output, the original prompt, whether the skill was used, and formal grades with commentary per criterion. An example from the video scored roughly 50% – 6 passes, 6 failures out of 12 criteria – with specific findings like “did not sustain information gaps over multiple sentences.”

Step 4: Iterate. Adjust skill.md, re-run evals, compare scores. You can A/B test with vs. without a skill, or full vs. lean versions. Reports include benchmarking – per-run duration and token counts – so you can assess whether a skill is actually improving quality or just burning tokens.
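The report's arithmetic is simple to mirror: each criterion passes or fails, and the run's score is the pass fraction (the article's example report was 6 of 12, roughly 50%). The criterion names below are hypothetical, taken from the Step 2 examples, and the dictionaries stand in for two eval runs before and after a skill.md tweak.

```python
def score(results: dict[str, bool]) -> float:
    """Fraction of evaluation criteria passed in one eval run."""
    return sum(results.values()) / len(results)

# Hypothetical per-criterion grades from two eval runs.
before = {
    "uses persuasion toolkit reference file": False,
    "employs curiosity gaps": False,
    "includes proof or founder-led stories": True,
}
# Same run after adjusting skill.md, with one criterion now passing.
after = dict(before, **{"employs curiosity gaps": True})

print(f"before: {score(before):.0%}, after: {score(after):.0%}")
```

Comparing scores like this across iterations, alongside the per-run duration and token counts in the report, is what tells you whether a change improved quality or just added cost.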

The target: get from “kind of works” to 9 out of 10. Once a skill consistently hits that, stop iterating.

What This Means

Each feature addresses a different time horizon: loops for the next few hours or days, scheduled tasks for recurring work across sessions, and skill evals for quality that compounds across iterations.

The combination is what matters. A scheduled task can fire daily, invoke a skill that has been refined through 20 rounds of evals, push the output to a properly formatted Google Doc, and loop on monitoring the response. That is not “assistant” behavior. That is an automated workflow with quality controls.

The gap between this and a full agentic operating system is narrowing fast.

Source: Claude Code 2.0 Has Arrived (It’s Insane) by Simon Scrapes