Anthropic Launches Claude Science: An AI Workbench for Scientists, with 60+ Built-in Research Skills
- Anthropic has launched Claude Science, an AI workbench app for scientists, now in beta for Pro, Max, Team, and Enterprise users, usable locally on macOS/Linux or remotely via SSH/HPC login nodes.
- The app ships with 60+ pre-configured skills and connectors spanning genomics, single-cell, proteomics, structural biology, and cheminformatics, wired into hundreds of specialized data sources (UniProt, PDB, Ensembl, and more) plus journals and preprint resources.
- It can autonomously draft compute jobs and, with the user's consent, submit them to the user's own HPC cluster or Modal cloud GPUs, scaling analyses from a single GPU to hundreds — while raw data always stays inside the user's own systems.
- A built-in reviewer agent continuously checks whether citations in the generated output are real, whether the numbers trace back to the computation, and whether charts match the code that produced them, fixing problems automatically when it finds them.
- Real cases already exist: an Allen Institute researcher produced about ten reviews (several over 100 pages) that used to take two years each; a UCSF team cut its end-to-end germline variant analysis to one-tenth of the original time, independently verified by the lab.
Scientists Now Have Their Own AI Workbench
Anthropic recently launched Claude Science, an AI workbench app that pulls scientists' everyday tools, databases, and compute resources into a single environment, now in beta for Pro, Max, Team, and Enterprise users.
It's an app that runs on your own computer or server: you ask an AI a scientific question in plain language, and it marshals dozens of specialized tools to pull data, run analyses, draw charts, and write manuscripts — with every output traceable back to how it was made. You can use it locally (macOS/Linux) much like Jupyter Notebook, or on a remote machine via SSH or an HPC login node.
Just How Much of a Headache Science Really Is
Everyday research is full of tedious chores. Researchers hop between dozens of databases, each with its own data structure (schema); the file formats they run into often need dedicated processing pipelines and viewers; and the tools are a long list — PubMed, Jupyter, R, cluster terminals — switched one after another.
Just wiring these tools together and getting data to flow between them eats up a huge amount of a researcher's energy. What Claude Science aims to do is gather these scattered pieces into one environment, where you can go from searching the literature all the way to producing a manuscript.
One Lead Coordinating Agent, a Team of Specialists Behind It
The one you talk to is a general-purpose coordinating agent. It holds those 60+ pre-configured skills and connectors, can spin up domain-specialist sub-agents, and can also call the custom agents you've created yourself. They fan out to pull data, run analyses, and produce results.
Think of it as an app store plus shortcuts for your phone: whichever database or software you need to work with, you install the matching "skill pack" on the coordinating agent and it knows how to drive that tool. Connectors, meanwhile, bring in the tools your lab already uses.
The key piece is that at the end of the chain stands a role dedicated to catching errors — the reviewer agent. It watches the other agents' output, checks it item by item, and fixes problems on its own when it finds them.
ask in plain language
fixes it automatically
This reviewer agent is like a peer reviewer on call the whole way through: it fixates on whether citations really have a source, whether the numbers given can be traced back to the original computation, and whether charts line up with the code that generated them. Find an error, and it fixes it itself rather than leaving the problem to you. This takes direct aim at that old flaw where AI-generated content loves to make things up with a straight face.
One agent generates content — the "actor"; the other is dedicated to checking accuracy and the credibility of citations — the "critic," and the two divide the labor and keep each other in check. It's like one reporter writing the story and a dedicated fact-checking editor going over it line by line, with neither vouching for the other.
Every Chart It Generates Traces Back to Its Code
Research leans heavily on visuals, so when Claude Science produces charts and manuscripts, it hands over the code that generated them alongside. It can also natively render research-specific visualization formats — 3D protein structures, genome browser tracks, chemical structures, and more — without opening a separate dedicated viewer.
When it generates a chart, it attaches, all together: the exact code and runtime environment that produced it, a one-line note on how it came to be, and the full conversation record. That means when you look back months later, you can still figure out what was fed in, how the result was verified, and how to reproduce it.
- The code that produced it
- Runtime environment
- A one-line note on its origin
- The full conversation record
- See every input
- Verify anytime
- Reproduce it months later
Changing a chart doesn't mean touching code yourself either. Tell it in plain language to "remove the gridlines" or "switch the y-axis to a log scale," and it goes and edits the code it wrote and re-renders the chart.
The AI Drives Your Supercomputer Itself — Yet the Data Never Moves
Big analyses are a hassle: folding a protein, running a genomics pipeline over a massive dataset — researchers often have to drop the scientific question at hand to configure the compute job, wait for it to queue onto the cluster, watch whether it succeeds, and then pull the results back. Claude Science takes this whole routine off your hands.
It first drafts a plan and asks you before drawing on new resources — you can review or even revoke any decision. Only with your consent does it write the job and submit it to the compute your lab already uses: your own HPC cluster over SSH, or on-demand cloud GPUs through your Modal account. The scale can flex from a single GPU to hundreds.
review/revoke
/ Modal GPU
needed context
The whole process runs on your lab's own infrastructure — your laptop, a Linux machine, or an HPC login node. So large, sensitive datasets never have to leave the systems they already live on; each step passes Claude only the sliver of context that step's analysis needs. Compute can be outsourced to the AI to schedule, while the raw data stays put.
Because these agents work inside a session that holds context in memory, even a massive dataset only has to be loaded once. As jobs run, that reviewer agent checks the output in step, catching wrong citations, numbers that can't be traced, and charts that don't match the code — self-correcting as it goes.
Halfway through a job, you can spin off a parallel branch and run each with a different method, the two sides not affecting each other, and the original conversation thread isn't lost either. It's like saving one document as two versions and editing each separately — botch one and the original stays untouched.
Expert Out of the Box: Databases and Domain Models Already Wired In
Scientific knowledge is scattered across hundreds of specialized sources. In biology alone, the relevant data may be spread across UniProt, PDB, Ensembl, Reactome, ClinVar, ChEMBL, and GEO — each with its own structure and query language — plus journals, preprint servers, and domain-specific open-source models. You ask one question in plain language, and specialist agents query and synthesize across these sources, sparing you from poking at them one by one.
It also plugs into NVIDIA's BioNeMo Agent Toolkit, natively connecting to the life-science models and libraries in BioNeMo, including Evo 2, Boltz-2, and OpenFold3. And the models, datasets, and pipelines scientists already trust can be brought in too: any pipeline can be saved as a reusable skill, any go-to tool hooked up with a connector, and later sessions inherit them automatically. You don't have to ditch the toolchain you already trust just to use AI.
What Three Labs Have Already Gotten Out of It
Over the past few months, researchers have used the beta for single-cell RNA sequencing analysis, CRISPR screen design, protein structure prediction, cheminformatics, and more. Three cases best show what it does in practice.
| Lab | What they did with it | Quantified result |
|---|---|---|
| Manifold Bio | End-to-end screening of targets for tissue-targeted drugs, assessing surface expression, in vivo transport, and safety one by one, ranking by criteria learned from their own proprietary data | Ran the whole pipeline in one go; the key difference from a general-purpose coding assistant is that it finds the right data itself and makes judgments carrying experience from past projects |
| Allen Institute neuroscientist Jérôme Lecoq | Built a multi-agent "computational review template" of about 20 custom skills: sub-agents read thousands of papers, extract core arguments and key quantitative findings into an evidence store, then write the review section by section, each section handed to a dedicated sub-agent that writes and checks in tandem using an actor-critic pairing | A single review used to take up to two years; he has now produced about 10, several over 100 pages, with citations all checked by the reviewer agent |
| UCSF Brain Tumor Center epidemiology associate professor Stephen Francis | Studying the molecular epidemiology of glioma: how thousands of small-effect germline variants add up to shape individual susceptibility, running a comprehensive germline analysis across multiple methods | Cut the time to roughly one-tenth; his team independently double-checked the results, confirming they were both fast and solid |
Who Can Use It Now, and How
The Claude Science app is now available in beta on macOS and Linux for Pro, Max, Team, and Enterprise users. Team and Enterprise users need an admin to enable it. Anthropic says it released early so scientists can get hands-on with real problems first and then feed back on how to refine it.
For active labs at academic and nonprofit research institutions, there's also a Team plan with discounted seats.
There's also funding for science projects (open for details)
Anthropic will support up to 50 Claude Science "AI for Science" projects, each with up to $30,000 in credits; Modal separately provides up to $2,000 in compute for selected projects. Priority goes to biology and biomedical research. Applications are open until July 15, 2026, with notifications by July 31, and a project period running September 1 to December 1, 2026.
Every output carries an auditable record of how it was generated, so you can verify and reproduce the results. Anthropic, "Claude Science, an AI workbench for scientists"