Why AI Needs a Tissue Foundry to Solve Biology: The Physical Test Unit Problem
By Polyphron
March 30, 2026

There are really only two questions that matter in medicine.

  1. If we do X to the human body, what happens?
  2. If we want Y to happen, what is X?

X is any intervention: a drug, a biologic, a gene therapy, a cell therapy, a tissue graft, at any dose and schedule. 

Y is any physiological outcome: a restored function, a halted degenerative cascade, a normalized metabolic state. 

Medicine has been trying to answer these questions, in fits and starts and at extraordinary expense, for the entire history of the field, and no existing approach has come close to answering them reliably.

Despite great progress, aided by ever more advanced tools and technologies, testing whether an intervention actually does what we intend still takes months to years and a myriad of experiments. Clinical trials, our gold-standard approach for interrogating this causally, have become more sophisticated but remain limited by throughput and sparsity. Logistical practicalities and design dictate that they answer questions one compound at a time, in populations too heterogeneous to tell you why anything worked or failed, at a cost that guarantees most questions never get asked at all. The new wave of computational approaches is heralded as the solution. We can now design molecules in silico in days, but predicting whether they do what we want remains the bottleneck, no matter how impressive the architecture or how large the pretraining corpus.

We need to be specific about why, because the failure mode matters.

A generative model that proposes novel therapeutic molecules is only as useful as the throughput and fidelity of the system that tests them. Without a rapid, scalable, biologically faithful assay platform, generative chemistry and virtual screening remain elaborate hypothesis generators, not engines of validated therapeutic design. This is the physical test unit problem, and it is the central constraint on every computational approach to drug discovery currently in existence.

People have noticed this, of course. The response has been to import solutions that have worked extremely well in other domains: train larger and larger models on more data, much of it publicly available. As some before us have noted, these approaches have an upper bound that is visible from the outset.

We believe the bound will not be breached by settling which transformer variant, which tokenization scheme, or which pretraining objective to use, but by a serious debate about the ideal abstraction layer at which to model biology so that it translates to successful medicines. We think the time is ripe for that debate. The answer shapes everything downstream: what data you need, what architecture you use, and what predictions you can actually make.

Polyphron’s answer is the tissue. Not the molecule. Not the single cell. The tissue.

This is our company’s foundational bet. We built a platform that guides induced pluripotent stem cells (iPSCs) through the same developmental sequences the body uses, producing tissue units that increasingly resemble native human tissue in structure, composition, and function. These tissues can be generated from hundreds of genetically diverse donors, connected in arrays, and continuously monitored in a fully autonomous setup. We call it a tissue foundry.

But we did not build a tissue foundry only to make tissue. We built it because the foundry, as a byproduct of its normal operation, generates the only dataset from which a real biological simulator can be trained: causal, time-resolved, genotype-specific measurements of what actually happens when you intervene on human tissue. We can generate a cohort of cardiac tissues from different genetic backgrounds and dose them with a drug. We can then record the electrophysiology, the secretome, the structural remodeling, and what happens to their contractile dynamics over 72 hours. We can increase the complexity of the perturbation regimen, designing and testing combinations or trying multiple modalities at once. We can do this over and over. Each experiment is a biological movie: a continuous, multimodal recording of how a living human tissue responds to an insult, captured at a resolution that lets you toggle back and forth along the arrow of time. That is the training signal a simulator needs, and no other data source on Earth produces it.
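To make the shape of that training signal concrete, here is a minimal sketch of what a single record from one of these biological movies might look like. Every name below is an illustrative assumption for exposition, not our actual schema.

```python
# Hypothetical record layout for one perturbation experiment ("movie").
# Field names are illustrative, not a real schema.
from dataclasses import dataclass
import numpy as np

@dataclass
class Intervention:
    compound: str        # drug, biologic, or gene therapy identifier
    dose_uM: float       # concentration
    t_applied_h: float   # hour at which the perturbation is applied

@dataclass
class TissueMovie:
    donor_id: str                     # which iPSC donor line (genotype)
    tissue_type: str                  # e.g. "cardiac"
    intervention: Intervention
    t_h: np.ndarray                   # shared time axis in hours (0..72)
    field_potentials: np.ndarray      # (T, n_electrodes) electrophysiology
    contractility: np.ndarray         # (T,) contraction amplitude over time
    secretome: dict[str, np.ndarray]  # analyte name -> sampled concentrations
    imaging: np.ndarray               # (frames, H, W) structural snapshots

# A simulator trains on (state_before, intervention, trajectory_after)
# slices cut from many such movies across donors, doses, and tissues.
```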

Run forward, the simulator answers question (1). 

Run in reverse, planning over the action space to reach a target state, it answers question (2). 
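In code terms, the two directions look roughly like this. A minimal sketch, assuming a trained simulator function already exists; none of these names are a real API.

```python
# Illustrative only: `simulate(state, intervention)` is assumed to return
# a predicted trajectory of tissue states; `distance` compares phenotypes.

def answer_q1(simulate, tissue_state, intervention):
    """Question (1): if we do X, what happens? A forward rollout."""
    return simulate(tissue_state, intervention)

def answer_q2(simulate, tissue_state, target, candidates, distance):
    """Question (2): if we want Y, what is X? Plan over the action space
    for the intervention whose predicted endpoint lands closest to Y."""
    return min(
        candidates,
        key=lambda x: distance(simulate(tissue_state, x)[-1], target),
    )
```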

There is of course a gap that no ex vivo system can close on its own: engineered tissues are not inside a living human body. We close that gap by implanting them. Polyphron's therapeutic program, the actual transplantation of our tissues into patients, is at once a business line designed to throw off blockbusters, the in vivo calibration layer of the simulator, and the only mechanism by which ex vivo predictions can be validated against human ground truth.

The foundry generates the data the simulator needs to learn. The simulator tells the foundry which experiments to run next. And neither is trustworthy without the therapeutic program, which validates both against what actually happens inside a living human body. Those three programs are really the same program, and the rest of this document is the argument for why. 

Where to situate a model of biology

There is a physics-first family of reductionist approaches that assumes complex dynamics can be modeled directly from biomolecular interactions. However, a drug binding the hERG potassium channel with a given IC50 tells you almost nothing about what happens to the cardiac action potential. The answer depends on the relative expression of every repolarizing current, the resting membrane potential, intracellular ion concentrations, state-dependent channel kinetics, cell-to-cell coupling, and tissue geometry. The physics is not wrong, but the system is too complex and too underdetermined for bottom-up prediction to work. You have to observe what happens at a higher level.
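A toy calculation makes this concrete. The sketch below is a deliberately crude two-current caricature, not a physiological model: two cells see the identical fractional hERG block, but because their other repolarizing conductance differs, the resulting action potential prolongation diverges sharply.

```python
# Toy model: a plateau voltage decays toward E_K through two outward
# currents. `block` is the fraction of I_Kr remaining after drug binding
# (0.5 at a dose equal to the IC50). Units are arbitrary.

def apd(g_kr, g_other, block, v0=20.0, e_k=-85.0, c_m=1.0, v_end=-70.0, dt=1e-3):
    """Time for the membrane voltage to fall from v0 to v_end."""
    v, t = v0, 0.0
    while v > v_end:
        i_out = (g_kr * block + g_other) * (v - e_k)  # total repolarizing current
        v -= dt * i_out / c_m
        t += dt
    return t

block = 0.5  # same drug, same IC50, same channel occupancy for both cells
for label, g_other in [("high repolarization reserve", 1.0),
                       ("low repolarization reserve", 0.1)]:
    baseline, drugged = apd(1.0, g_other, 1.0), apd(1.0, g_other, block)
    print(f"{label}: APD prolonged by {100 * (drugged / baseline - 1):.0f}%")
# -> roughly 33% vs 83%: identical binding data, divergent emergent outcome.
```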

More recently, a new wave of computational approaches has begun exploiting the ability to measure, and now to perturb and measure, phenotypes at the single-cell level, learning the complex interactions underlying changes in cell state. These models are positioned as the right foundation for tackling the complexity of human biology. While this is certainly a step forward with respect to mechanistic models, we believe these approaches are still missing something important.

The way cells respond to perturbations in vivo cannot be captured by assaying isolated cells. Cells do not behave the same in isolation as they do embedded in a multicellular architecture under mechanical, chemical, and electrical constraints. The emergent properties observed at the tissue level do not exist at the single-cell level; they are collective phenomena, and they are precisely where molecular interactions resolve into function.

Taking all of this into account, we posit that the narrowest point in the information bottleneck is at the tissue level. Below it, you can measure precisely but cannot predict emergence. Above it, at the clinical level, you can observe outcomes but cannot disentangle mechanisms. This is the argument that virtual cell approaches get fundamentally wrong. They attempt to model biology at a level of abstraction where the properties that determine therapeutic response do not yet exist. Emergent tissue behavior is not derivable from a sum of single-cell models any more than turbulence is derivable by tracking fluid molecules one at a time.

Closing the ex vivo/in vivo gap

We believe the tissue arrays that Polyphron can manufacture sit at that bottleneck. As we improve our ability to generate iPSC-derived human tissue constructs from population-level cohorts of genetically diverse donors, we can close the loop between computational prediction and biological ground truth at a throughput and scale that has not previously been possible.

The outputs of our tissue foundry, as discussed so far, still present some of the limitations that other ex vivo approaches have suffered from. 

First, ex vivo grown tissues, however similar to their natural counterparts, exist in a controlled setting that provides neither the same degree of stimulation nor the same microenvironment as native human tissue. This puts a hard limit on their ability to replicate natural tissue function, and hence to serve as the substrate for a biological simulator. Every other approach in the field accepts this gap, and so hits a ceiling on predictive validity it cannot move past.

Our therapeutic program, the actual implantation of Polyphron's engineered tissues into patients as a tissue replacement therapy, is the necessary step to overcome this limitation. By proving that our tissues can restore lost tissue function, we calibrate the simulator against the highest ground truth available: what our tissues actually do inside a living human being. 

Importantly, the therapeutic program is not a separate business line that happens to share infrastructure with the simulator. The in vivo outcome data from each implantation, fed back into the simulator's clinical calibration layer, continuously narrows the ex vivo/in vivo gap for every subsequent prediction.
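As a sketch of what such a calibration layer could be, in its simplest conceivable form: learn a correction from paired (ex vivo prediction, in vivo outcome) observations and apply it to future predictions. This is an illustrative recalibration scheme under our own assumptions, not a description of the production system, which would be far richer than an affine map.

```python
# Illustrative calibration layer: an affine correction fitted on pairs of
# ex vivo predictions and measured in vivo outcomes. Each implantation
# cohort adds pairs; re-fitting narrows the ex vivo/in vivo gap.
import numpy as np

class ClinicalCalibration:
    def __init__(self):
        self.coef = None

    def fit(self, ex_vivo_pred, in_vivo_obs):
        X = np.column_stack([ex_vivo_pred, np.ones(len(ex_vivo_pred))])
        self.coef, *_ = np.linalg.lstsq(X, np.asarray(in_vivo_obs), rcond=None)
        return self

    def correct(self, ex_vivo_pred):
        X = np.column_stack([ex_vivo_pred, np.ones(len(ex_vivo_pred))])
        return X @ self.coef
```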

This is what distinguishes Polyphron from every other approach attempting to build a biological simulator. Purely computational approaches have no physical validation at all. Experimental platforms that generate data but do not implant their tissues accumulate ex vivo measurements without ever confirming that those measurements predict in vivo outcomes. We are uniquely positioned to do so because our tissues are designed to be transplantable from the outset.

Second, we cannot ignore that many critical biological events and outcomes are mediated by the immune system and by local or distal interactions between and within organs, and that these can hardly be modeled using isolated tissue units. Solving this bottleneck will require further technical novelty, which we expect to combine new experimental devices with computational approaches that model inter-organ connections and immune interactions. These models will likely be built on the readouts we can extract after each implantation: how the construct integrates with host vasculature, how the immune system responds over weeks and months, how the tissue remodels under physiological mechanical stress, and whether the functional readouts we measured ex vivo (contractile dynamics, barrier integrity, electrophysiological signatures) translate to the predicted in vivo phenotype.

There is a compounding effect here that no competitor can replicate without following the same path. A platform that has implanted tissues in humans and monitored their performance in vivo for years develops a calibration advantage that scales with each cohort. As that advantage compounds, the gap between what the tissue array predicts and what happens in a patient shrinks in a way that is inaccessible to any dataset that does not include therapeutic implantation.

Why data alone is not enough

There is a tempting version of this idea that goes: collect enough high-quality human biological data, train a large enough model, and the rest will follow. Recent work in spatial transcriptomics has shown that models trained on human tumor samples do exhibit meaningful scaling behavior and appear to learn something real about how tissues spatially organize.

But the ceiling is visible from the outset. A model trained purely on observational human data can learn to describe the distribution of biological states it has seen. What it cannot do is predict the consequences of interventions it has never observed. Observational data conflates correlation with causation in ways that matter enormously for therapeutic design. Drug repurposing and patient stratification are tractable in this regime. Answering question (2) (“if we want Y, what is X?”) is not. You need predictions about states the system has never been in, under interventions that have never been tested, in genotypes that were not in the training set. That requires causal, interventional data: the kind that comes from actually doing things to controlled human biological systems and measuring what happens.

Equally important is the validation gap. A model that generates predictions but cannot rapidly test them against physical biology will accumulate undetected errors. Its representations drift from ground truth in proportion to how far its predictions venture from the training distribution, which is precisely when accurate predictions matter most. Without a physical test unit capable of generating causal, interventional data at scale, the model cannot be corrected. It becomes progressively less trustworthy exactly where it is most needed.

Why in vivo approaches don't solve the problem

An intuitive response to the validation gap is to look in vivo: use animal models, or in vivo perturbation screens conducted directly in diseased tissue, to generate large-scale causal intervention data in a native biological context. Recent platforms have demonstrated modality-agnostic delivery of gene perturbations across hundreds of targets directly into living tissue. This is technically impressive and useful for target identification. It does not solve the biological simulator problem, for three reasons.

First, animal tissue is not human tissue. The divergence is not merely sequence-level; it is architectural, cellular, and physiological. Drug candidates that succeed in mouse models fail in human clinical trials at rates that have barely moved in decades. A model trained on animal in vivo data learns the wrong emergence function, and no amount of genetic humanization fully bridges that gap.

Second, in vivo systems are opaque. You cannot decouple a drug's direct tissue effect from its systemic PK, immune interactions, neuroendocrine feedback, and behavioral confounds. The signal you observe is the integrated output of a whole organism, which is exactly the gap the tissue layer needs to bridge, not bypass.

Third, in vivo approaches are expensive, slow, and ethically constrained at the scale needed for simulator training. Building a system capable of answering our two questions requires combinatorial coverage across thousands of interventions, hundreds of genotypes, and multiple tissue types. That experiment cannot be run in animals. It can only be run in controlled human tissue systems, at industrial throughput, with continuous automated measurement.

Why general intelligence is not enough

There is a stronger version of the computational argument that the preceding sections do not address. It goes something like this: Surely a sufficiently powerful general-purpose reasoning system, something like the next generation of frontier language models, can simply read all of biology, design experiments, interpret results, and iteratively close the gap between hypothesis and ground truth?

This is probably the strongest steelman counterargument to what we are building. It is also wrong.

A general-purpose reasoning system absolutely can (and will) accelerate biological research. It can blaze through papers faster than any human, generate hypotheses across disparate domains no single researcher would connect, design experimental protocols, and interpret complex results. The problem is that all of that operates on the same side of the bottleneck. 

As a thought experiment, posit a general or superintelligent AI system tasked with predicting what a novel drug combination will do to the cardiac tissue of a patient with a specific, previously unseen genotype. The system can retrieve every relevant paper, map every known interaction and mechanistic pathway, and reason about ion channel kinetics, drug-drug interactions, and pharmacogenomics, yet its answer will almost certainly be wrong, because the outcome depends on emergent tissue-level dynamics that have never been measured in that genotype under that combination. The information exists neither in the literature nor in any database. It exists only in the tissue, and you need the substrate to run that experiment.

To be clear, this is not a temporary gap that closes as models get smarter; it is a structural feature of biology. The relevant dynamics are nonlinear, path-dependent, and context-specific to a degree that no amount of reasoning over prior observations can resolve. You cannot deduce what a drug combination does to a tissue from first principles, any more than you can deduce the weather in Paris next Tuesday from the laws of thermodynamics. You have to run the simulation, and in biology, running the simulation means running the experiment.

To put it bluntly, you are not excused from running the experiments by superintelligence. And if the experiments you need to run are causal perturbations of human tissue at combinatorial scale across diverse genotypes, then you need the infrastructure to run those experiments at the throughput and fidelity the model requires. This means you’ll need something that looks very much like a tissue foundry.

There is a subtler (but no less pernicious) problem. A superintelligent system that reasons over biological literature inherits every bias, gap, and error in that literature if it has no independent access to biological ground truth. The literature's skew toward results that worked, or that were cherry-picked, will hang around such a system like a millstone. It cannot check its own predictions against human biological reality unless someone builds the physical system that lets it do so. Without that closed loop, the system's confidence in its biological predictions will be calibrated to the coherence of its reasoning, not to the correspondence between its predictions and what actually happens. That is the very gap where most, if not all, clinical failures live.

The right way to think about the relationship between general intelligence and what we are building is not as substitutes but as complements with a strict dependency. A frontier reasoning system would essentially use our tissue arrays as a physical actuator to test the results of its proposed interventions on human biology. Intelligence without data is speculation. Without intelligence, data is noise. The combination is what we are building toward, and the binding constraint today and tomorrow will not be intelligence. It will be the data.
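The shape of that dependency is a closed loop, sketched below under loose assumptions: the reasoning system proposes, the foundry actuates, and the prediction error becomes the training signal. The hooks named here are hypothetical placeholders, not an existing interface.

```python
# Hypothetical closed loop between a reasoning system and the foundry.
def closed_loop(simulator, foundry, n_rounds):
    for _ in range(n_rounds):
        intervention = simulator.propose()                    # hypothesis
        predicted = simulator.predict(intervention)           # in silico answer
        measured = foundry.run_on_tissue_array(intervention)  # physical ground truth
        simulator.update(intervention, predicted, measured)   # learn from the error
    return simulator
```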

A report from the future

It is 2036. The tissue foundry operates continuously, and over a decade of operation it has generated the largest causal interventional dataset in the history of biology: millions of experiments, conducted in hundreds of iPSC donor lines, across tissue constructs representing nearly the entire human body, covering thousands of drugs, biologics, gene therapies, and combination regimens, all with time-resolved multimodal readouts. That dataset trained a general-purpose biological simulator. A patient presenting with heart failure, an unusual genotype, and a poor response profile to standard regimens no longer waits for a clinical trial that may never be run in their subpopulation. The simulator runs that trial. When no existing compound achieves the target phenotype, it identifies the intervention action vector that would, and generative design produces a candidate. The candidate enters the tissue array before it enters a human. If the tissue response matches the prediction, it advances. If not, the discrepancy becomes a training signal, the design is revised, and the cycle repeats. Every patient who passes through the system makes the next patient's design cheaper and faster.

The two questions that organized the entire effort of drug development for a century now have reliable answers.