(3.2k words, 12 minutes reading time)
There are really only two questions that matter in medicine.
1. If we do X to the human body, what happens?
2. If we want Y to happen, what is X?
X is any intervention: a drug, a biologic, a gene therapy, a cell therapy, a tissue graft, at any dose and schedule. Y is any physiological outcome you'd want to achieve: a restored ejection fraction, a halted fibrotic cascade, a normalized metabolic state. These questions sound simple. They are not. Medicine has been trying to answer them, somewhat badly and at extraordinary expense, for the entire history of the field, and no existing approach has come close to answering them reliably.
Clinical trials answer them one compound at a time, in populations too heterogeneous to tell you why anything worked or failed, at a cost that guarantees most questions never get asked at all. Molecular biology answers them at the wrong level of abstraction: it can tell you what a drug binds, not what the tissue does in response. And the new wave of computational approaches, however impressive the architectures and however large the pretraining corpora, has been trained on data that is from the wrong species, at the wrong level of resolution, or causally uninterpretable. You can design a molecule in silico in days. Testing whether it does what you think it does in relevant human biology still takes months to years and a long series of experiments. This is the bottleneck. It has always been the bottleneck. And no amount of scaling a foundation model on observational data is going to make it go away.
We want to be specific about why, because the failure mode matters.
A generative model that proposes novel therapeutic molecules is only as useful as the throughput and fidelity of the system that tests them. Without a rapid, scalable, biologically faithful assay platform, generative chemistry and virtual screening remain elaborate hypothesis generators, not engines of validated therapeutic design. The field has gotten quite good at generating hypotheses. It remains terrible at testing them. This is the physical test unit problem, and it is the central constraint on every computational approach to drug discovery currently in existence.
People have noticed this, of course. The response has been to look for shortcuts: train on more data, use bigger models, add spatial transcriptomics, run animal screens to generate causal data. Each of these has real merit, and each of them hits a ceiling that is visible from the outset, if you know where to look.
But first, the claim we are actually making. We think the time is now ripe for a serious debate about where to situate a model of biology. Not which transformer variant to use, not which tokenization scheme, not which pretraining objective. Where. At what level of biological abstraction. The answer to that question shapes everything downstream: what data you need, what architecture you use, and what predictions you can actually make.
Our answer is the tissue. Not the molecule. Not the single cell. The tissue.
This is Polyphron's foundational bet. We built a platform that guides induced pluripotent stem cells (iPSCs) through the same developmental sequences the body uses, producing tissue units that increasingly resemble native human tissue in structure, cellularity, and function.
We call it a tissue foundry.
These tissues can be generated from hundreds of genetically diverse donors, connected in arrays, and continuously monitored in microfluidic and mesofluidic devices under automated multimodal measurement. But we did not build a tissue foundry only to make tissue.
We built it because the foundry, as a byproduct of its normal operation, generates the only dataset from which a real biological simulator can be trained: causal, time-resolved, genotype-specific measurements of what actually happens when you do X to human tissue. Dose a cardiac construct with a drug. Watch what happens to its contractile dynamics over 72 hours. Record the electrophysiology, the secretome, the structural remodeling. Do it again in a different genetic background. Do it with a combination regimen. Do it a thousand times. Each experiment is a biological movie: a continuous, multimodal recording of how a living human tissue responds to a perturbation, captured at a resolution that lets you scrub back and forth along the arrow of time. That is the training signal a simulator needs, and no other data source on Earth produces it.
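To make the shape of that signal concrete, here is a minimal sketch of what one such recording might look like as a data structure. Every field name and shape below is our illustration, not a description of the foundry's actual schema.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class BiologicalMovie:
    """One perturbation experiment: a single tissue unit recorded over time.

    Illustrative only; field names and shapes are assumptions, not a real
    platform schema.
    """
    donor_id: str                  # iPSC line; carries the genotype
    tissue_type: str               # e.g. "cardiac"
    intervention: dict             # compound(s), dose, schedule
    t_hours: np.ndarray            # (T,) sample times, e.g. 0..72 h
    contractility: np.ndarray      # (T, k) beat force and kinetics features
    field_potentials: np.ndarray   # (T, n_electrodes) electrophysiology traces
    secretome: np.ndarray          # (T, n_analytes) secreted factor levels
    imaging: np.ndarray            # (T, H, W, C) structural remodeling frames
```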
When we say biological simulator, we mean something specific. We mean an action-conditioned world model trained on causal, time-resolved tissue data that can predict, for a given genotype and tissue state, the trajectory that system will follow under a proposed intervention. Run forward, it answers question (1). Run in reverse, planning over the action space to reach a target state, it answers question (2).
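To pin down what "run forward" and "run in reverse" mean operationally, here is a toy sketch. The class, the per-action linear dynamics, and the exhaustive planner are all illustrative stand-ins: a real simulator would be a learned sequence model, and a real planner would search the action space far more cleverly.

```python
import numpy as np

class TissueWorldModel:
    """Action-conditioned world model over tissue state.

    Toy illustration: the per-action linear dynamics stand in for a learned
    sequence model and are not how a real simulator would work.
    """

    def __init__(self, state_dim: int, seed: int = 0):
        self.rng = np.random.default_rng(seed)
        self.state_dim = state_dim
        self._dynamics = {}  # action name -> toy transition matrix

    def _transition(self, action: str) -> np.ndarray:
        if action not in self._dynamics:
            self._dynamics[action] = np.eye(self.state_dim) + \
                0.05 * self.rng.standard_normal((self.state_dim, self.state_dim))
        return self._dynamics[action]

    def rollout(self, state, genotype, actions):
        """Question 1, run forward: predict the trajectory under a regimen."""
        traj = [np.asarray(state, dtype=float)]
        for action in actions:
            # Genotype shifts the dynamics (crudely, as an additive term here).
            traj.append(self._transition(action) @ traj[-1] + 0.01 * np.asarray(genotype))
        return np.stack(traj)


def plan(model, state, genotype, target, candidate_regimens):
    """Question 2, run in reverse: naive search over the action space for the
    regimen whose predicted endpoint lands closest to the target state."""
    return min(candidate_regimens,
               key=lambda acts: float(np.linalg.norm(
                   model.rollout(state, genotype, acts)[-1] - target)))


# Usage: choose among candidate three-step regimens for a toy 8-dim state.
model = TissueWorldModel(state_dim=8)
state, genotype, target = np.ones(8), np.zeros(8), 1.1 * np.ones(8)
best = plan(model, state, genotype, target,
            [["drug_a"] * 3, ["drug_b", "drug_a"], ["vehicle"] * 3])
```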
There is of course a gap that no ex vivo system can close on its own: engineered tissues are not inside a living human body. We close that gap by implanting them. Polyphron's therapeutic program is the actual transplantation of our tissues into patients: at once a business line designed to throw off blockbusters, the in vivo calibration layer of the simulator, and the only mechanism by which ex vivo predictions can be validated against human ground truth.
The foundry generates the data the simulator needs to learn. The simulator tells the foundry which experiments to run next. And neither is trustworthy without the therapeutic program, which validates both against what actually happens inside a living human body. Those three programs are the same program, and the rest of this document is the argument for why.
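Stated as control flow, the loop looks something like the sketch below; every object and method name (`most_informative_experiments`, `implant_outcomes`, and so on) is hypothetical, chosen only to make the dependency structure explicit.

```python
def run_program_loop(simulator, foundry, clinic, rounds: int) -> None:
    """The three programs as one loop (illustrative control flow only;
    all interfaces are hypothetical)."""
    for _ in range(rounds):
        # Simulator -> foundry: propose the experiments whose outcomes
        # the model is least certain about.
        batch = simulator.most_informative_experiments(limit=foundry.capacity())
        # Foundry -> simulator: causal, time-resolved recordings.
        simulator.update(foundry.run(batch))
        # Clinic -> simulator: in vivo ground truth from implanted tissues,
        # used to calibrate the ex vivo/in vivo translation layer.
        simulator.calibrate(clinic.implant_outcomes())
```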
Where to situate a model of biology
The physics-first argument holds that if you model biomolecular interactions with sufficient fidelity, everything else follows. This is almost certainly wrong. A drug binding a hERG potassium channel with a given IC50 tells you almost nothing about what happens to the cardiac action potential. The answer depends on the relative expression of every repolarizing current, the resting membrane potential, intracellular ion concentrations, state-dependent channel kinetics, cell-to-cell coupling, and tissue geometry. The physics is not wrong, but the system is too complex and too underdetermined for bottom-up prediction to work. You have to observe what happens at a higher level.
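A standard way to see the underdetermination: under the usual single-site model, the fraction of hERG current blocked at free drug concentration $[D]$ follows the Hill equation, but the action potential is shaped by the net membrane current, in which the blocked current is only one term:

$$
f_{\mathrm{block}} = \frac{[D]^{n}}{[D]^{n} + \mathrm{IC}_{50}^{n}},
\qquad
\frac{dV}{dt} = -\frac{1}{C_m}\Big((1 - f_{\mathrm{block}})\,I_{Kr} + I_{Ks} + I_{K1} + I_{CaL} + I_{Na} + \cdots\Big)
$$

Every current on the right varies with cell type, genotype, expression state, and tissue context, so the same IC50 can produce anything from no measurable change to dangerous action potential prolongation depending on the system it acts in.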
The single-cell argument holds that virtual cells, models trained on large-scale transcriptomics or spatial data at cellular resolution, are the right foundation. This is closer, but it misses something fundamental. Cells do not behave the same in isolation as they do embedded in a multicellular architecture under mechanical, chemical, and electrical constraints. A cardiomyocyte in suspension is not a cardiomyocyte in a beating syncytium. The emergent properties of tissue (coordinated contraction, action potential propagation, barrier function, mechanoresponse, paracrine crosstalk) do not exist at the single-cell level but are collective phenomena.
The tissue level is the narrowest point in the informational bottleneck. Below it, you can measure precisely but cannot predict emergence. Above it, at the clinical level, you can observe outcomes but cannot disentangle mechanisms. The tissue is where the biological computation happens: where molecular interactions resolve into functional physiology. A drug's effect on a patient is, to first approximation, the sum of its effects on that patient's tissues, translated through pharmacokinetics. If you have high-fidelity tissue-level pharmacology for every relevant tissue type and genotype, the remaining unknowns (PK translation, multi-organ integration, temporal dynamics) are tractable. If you don't, no amount of molecular data or clinical data fills the gap.
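In the notation of the two questions, that first-approximation claim reads (our notation, stated loosely):

$$
\mathrm{Outcome}(\text{patient}, X) \;\approx\; \Phi\Big(\big\{\, R_t\big(g,\, C_t(\cdot\,; X)\big) \,\big\}_{t \,\in\, \text{tissues}}\Big)
$$

where $g$ is the patient's genotype, $C_t(\cdot\,; X)$ is the concentration-time profile that pharmacokinetics delivers to tissue $t$ under intervention $X$, $R_t$ is the tissue-level response function, and $\Phi$ is the comparatively simple integration across organs. The foundry measures $R_t$ directly; PK models supply $C_t$; $\Phi$ and the residual are exactly the unknowns called tractable above.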
This is what virtual cell approaches get fundamentally wrong. They attempt to model biology at a level of abstraction where the properties that determine therapeutic response do not yet exist. Emergent tissue behavior is not derivable from a sum of single-cell models, for the same reason you cannot recover the weather from a catalog of atmospheric molecules: the phenomena live in the interactions, not the components.
Closing the ex vivo/in vivo gap
We are now confident that Polyphron's tissue arrays are a solution to the physical test unit problem. With iPSC-derived human tissue constructs from hundreds of genetically diverse donors, continuously monitored under automated multimodal measurement, we can close the loop between computational prediction and biological ground truth at a throughput and scale that has not previously been possible.
But we want to be honest about what this does not solve on its own.
Engineered tissues, however well-characterized ex vivo, exist in a controlled microenvironment. They are not perfused by a living circulation. They do not experience immune surveillance, mechanical loading, or the systemic metabolic context of a real human body. The translation from ex vivo measurement to in vivo performance involves unknowns that can be modeled and approximated, but ultimately must be observed.
Every other approach in the field accepts this gap and works around it. We intend to close it.
The therapeutic program, the actual implantation of Polyphron's engineered tissues into patients, is not a separate business line that happens to share infrastructure with the simulator. It is the only mechanism by which the simulator can be calibrated against the highest ground truth available: what our tissues actually do inside a living human being. Each implantation generates data that no microfluidic platform can produce: how the construct integrates with host vasculature, how the immune system responds over weeks and months, how the tissue remodels under physiological mechanical stress, and whether the functional readouts we measured ex vivo (contractile dynamics, barrier integrity, electrophysiological signatures) translate to the predicted in vivo phenotype.
That data, fed back into the simulator's clinical calibration layer, continuously narrows the ex vivo/in vivo gap for every subsequent prediction.
This is what distinguishes Polyphron from every other approach attempting to build a biological simulator. Purely computational approaches have no physical validation at all. Experimental platforms that generate data but do not implant their tissues accumulate ex vivo measurements without ever confirming that those measurements predict in vivo outcomes. In vivo animal screens generate causal data in the wrong species. None of them close the loop in human. We do, because our tissues are designed from the outset to be transplantable: not as a concession to clinical necessity, but as a deliberate epistemic strategy.
There is a compounding effect here that no competitor can replicate without following the same path. A platform that has implanted tissues in humans and monitored their performance in vivo for years develops a calibration advantage that scales with each cohort. The ex vivo/in vivo translation model improves. Confidence intervals narrow. The gap between what the tissue array predicts and what happens in a patient shrinks, and that calibration is not available from any dataset that does not include therapeutic implantation. It cannot be licensed, approximated, or synthesized: only earned.
Why data alone is not enough
There is a tempting version of this idea that goes: collect enough high-quality human biological data, train a large enough model, and the rest will follow. Recent work in spatial transcriptomics has shown that models trained on human tumor samples do exhibit meaningful scaling behavior, and that they appear to learn something real about how tissues spatially organize. Genuine progress.
But the ceiling is visible from the outset. A model trained purely on observational human data can learn to describe the distribution of biological states it has seen. What it cannot do is predict the consequences of interventions it has never observed. Observational data cannot separate correlation from causation, and that difference matters enormously for therapeutic design. Drug repurposing and patient stratification are tractable in this regime. Answering question (2) (“if we want Y, what is X?”) is not. You need predictions about states the system has never been in, under interventions that have never been tested, in genotypes that were not in the training set. That requires causal, interventional data: the kind that comes from actually doing things to controlled human biological systems and measuring what happens.
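In standard causal notation, the two regimes estimate different quantities:

$$
P(Y \mid X = x) \;\neq\; P\big(Y \mid \mathrm{do}(X = x)\big)
$$

The left-hand side, which observational corpora estimate, describes outcomes in states where $X$ happened to hold; the right-hand side, which therapeutic design requires, describes outcomes when $X$ is imposed on the system. The two coincide only under no-unmeasured-confounding assumptions that human biology does not satisfy, and interventional data estimates the right-hand side directly.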
Equally important is the validation gap. A model that generates predictions but cannot rapidly test them against physical biology will accumulate undetected errors. Its representations drift from ground truth in proportion to how far its predictions venture from the training distribution, which is precisely when accurate predictions matter most. Without a physical test unit capable of generating causal, interventional data at scale, the model cannot be corrected. It becomes progressively less trustworthy exactly where it is most needed.
Why in vivo approaches don't solve the problem
An intuitive response to the validation gap is to look in vivo: use animal models, or in vivo perturbation screens conducted directly in diseased tissue, to generate large-scale causal intervention data in a native biological context. Recent platforms have demonstrated modality-agnostic delivery of gene perturbations across hundreds of targets directly into living tissue. This is technically impressive and useful for target identification. It does not solve the biological simulator problem, for three reasons.
First, animal tissue is not human tissue. The divergence is not merely sequence-level; it is architectural, cellular, and physiological. Drug candidates that succeed in mouse models fail in human clinical trials at rates that have barely moved in decades. A model trained on animal in vivo data learns the wrong emergence function. No amount of genetic humanization fully bridges that gap.
Second, in vivo systems are opaque. You cannot decouple a drug's direct tissue effect from its systemic PK, immune interactions, neuroendocrine feedback, and behavioral confounds. The signal you observe is the integrated output of a whole organism, which is exactly the gap the tissue layer needs to bridge, not bypass.
Third, in vivo approaches are expensive, slow, and ethically constrained at the scale needed for simulator training. Building a system capable of answering our two questions requires combinatorial coverage across thousands of interventions, hundreds of genotypes, and multiple tissue types. That experiment cannot be run in animals. It can only be run in controlled human tissue systems, at industrial throughput, with continuous automated measurement.
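To put rough numbers on the combinatorics (purely illustrative figures, not a statement of actual platform throughput):

$$
2{,}000 \text{ interventions} \times 300 \text{ genotypes} \times 10 \text{ tissue types} = 6{,}000{,}000 \text{ experiments}
$$

before dose levels, schedules, time points, and combinations multiply the count further. No animal facility runs experiments at that scale; a parallelized, continuously monitored tissue array can at least treat it as an engineering target.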
Why general intelligence is not enough
There is a stronger version of the computational argument that the preceding sections do not address, and it deserves to be taken seriously. It goes like this: forget biology-specific foundation models. Forget tokenized transcriptomics and masked prediction objectives. What if a sufficiently powerful general-purpose reasoning system, something like the next generation of frontier language models, can simply read all of biology, synthesize across it, design experiments, interpret results, and iteratively close the gap between hypothesis and ground truth?
This is a real position held by serious people. It is probably the most formidable version of the counterargument to what we are building. And it is wrong, but not for the reason one might expect.
A general-purpose reasoning system can absolutely accelerate biological research. It can read papers faster than any human, generate hypotheses across domains no single researcher would connect, design experimental protocols, and interpret complex results. None of that is in question. The problem is that all of it operates on the same side of the bottleneck: it makes the cognitive work faster, but it does not make the physical work go away.
Consider what happens when you give a brilliant AI system the task of predicting what a novel drug combination will do to cardiac tissue in a patient with a specific genotype. The system can retrieve every relevant paper, every known interaction, every mechanistic pathway. It can reason about ion channel kinetics, drug-drug interactions, and pharmacogenomics. It will produce an answer that is impressively well-reasoned and almost certainly wrong, because the outcome depends on emergent tissue-level dynamics that have never been measured in that genotype under that combination. The information does not exist in the literature or in any database. It exists only in the tissue, and only after you do the experiment.
This is not a temporary gap that closes as models get smarter. It is a structural feature of biology. The relevant dynamics are nonlinear, path-dependent, and context-specific to a degree that no amount of reasoning over prior observations can resolve. You cannot deduce what a drug combination does to a tissue from first principles, any more than you can deduce the weather in Paris next Tuesday from the laws of thermodynamics. The physics is correct. The system is too high-dimensional and too sensitive to initial conditions for deduction to work. You have to run the simulation, and in biology, running the simulation means running the experiment.
A general-purpose AI makes you faster at deciding which experiments to run. It does not excuse you from running them. And if the experiments you need to run are causal perturbations of human tissue at combinatorial scale across diverse genotypes, then you need the infrastructure to run those experiments at the throughput and fidelity the model requires. You need a tissue foundry.
There is a second, subtler problem. A general-purpose system that reasons over biological literature inherits every bias, gap, and error in that literature. It has no independent access to biological ground truth. It cannot check its own predictions against reality unless someone builds the physical system that lets it do so. Without that closed loop, the system's confidence in its biological predictions is calibrated to the coherence of its reasoning, not to the correspondence between its predictions and what actually happens. Those are very different things, and the gap between them is where clinical failures live.
The right way to think about general intelligence and what we are building is not as substitutes but as complements with a strict dependency. A frontier reasoning system makes the simulator more powerful: it can propose interventions the simulator should test, interpret unexpected results, and reason about mechanisms that the world model captures only implicitly. But the reasoning system cannot replace the simulator, because the simulator is grounded in physical measurements the reasoning system has no way to generate on its own. Intelligence without data is speculation. Data without intelligence is noise. The combination is what we are building toward, and the binding constraint, today and tomorrow, will not be intelligence. It will be the data.
A report from the future
It is 2036. The tissue foundry operates continuously, and over a decade of operation it has generated the largest causal interventional dataset in the history of biology: millions of experiments, conducted in hundreds of iPSC donor lines, across tissue constructs representing nearly the entire human body, covering thousands of drugs, biologics, gene therapies, and combination regimens, all with time-resolved multimodal readouts. That dataset trained a general-purpose biological simulator. A patient presenting with heart failure, an unusual genotype, and a poor response profile to standard regimens no longer waits for a clinical trial that may never be run in their subpopulation. The simulator runs that trial. When no existing compound achieves the target phenotype, it identifies the intervention action vector that would, and generative design produces a candidate. The candidate enters the tissue array before it enters a human. If the tissue response matches the prediction, it advances. If not, the discrepancy becomes a training signal, the design is revised, and the cycle repeats. Every patient who passes through the system makes the next patient's design cheaper and faster.
The two questions that organized the entire effort of drug development for a century now have reliable answers.
