One theme I keep returning to throughout my molecular biology experiments is that when we run an assay (some molecular measurement technique), we end up with data that we want to represent in a particular way in a computer. I think of this as almost a mathematical root, or root principle: a point beyond which we can no longer subdivide the problem, at least within the bounds of existing knowledge.
This naturally leads to data mining and formal representation of patterns in that data. We care less about the labels and descriptors that humans gave it than about the patterns within the numbers themselves.
I recently went to an ONT event where scientists (primarily in medicine/biology) were presenting their findings after using ONT devices. One thing I noticed is that the presenters consistently described diseases and other phenotypes via complex words. These words are basically labels that we, as humans, have assigned to these phenomena.
One could argue that there are patterns in the language itself, but which would be cheaper: collecting all the data, having humans create words to mean different phenomena and conditions in the data, then doing pattern matching on those words? Or collecting the numbers from the experiments and using models to derive patterns from that data? To be determined, but it is looking like you start with the former and evolve to the latter to grow exponentially (or logarithmically until we hit a plateau).
Back to the presentation: I kept asking: “Why is nothing being presented purely as numbers?” Meaning “we noticed this pattern of signal happening under these circumstances, represented this way numerically.” A “truth” purely interpreted through the quantitative information produced from the measurement devices.
After all, that’s what my Tesla does when making decisions via Autopilot. It has sensors and a model: what is sensed gets converted to numbers and matched up with the decision engine (a neural network) to decide what to do. And it is both learning from and teaching the neural network in real time (or at least per training cycle, though we are quickly converging on real-time learning as of June 2026).
Same with quantitative finance: we have this plethora of price movements and events that have happened historically. What useful pattern recognition can we derive from this information to make successful predictions about something happening in similar (or even different) market conditions? Those conditions are really just numbers that are similar enough to the ones the model/formal representation is built on.
Why is this attitude/perspective not ubiquitous in biology, I wonder? It is an excellent domain for this type of reasoning, I believe.
It is:
- tractable
- observable
- extremely information dense
- becoming faster
I think the constraint is that we just don’t have enough people in this area doing this, meaning collecting data, representing it mathematically, then extracting patterns from it. The engineering interface is lacking too: we need more people building the sensors and technologies that capture this information so it can be represented mathematically.
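To make "collect, represent, extract patterns" a little more concrete, here is a minimal sketch of what I mean. Everything in it is an assumption for illustration: the `readings` values are made up (not from any real assay), the function names are my own, and grouping by Pearson correlation with a `threshold` of 0.9 is just one simple stand-in for whatever model you would actually use. The point is only that the signals get grouped by their numeric shape, with no human labels involved.

```python
from statistics import mean, pstdev


def zscore(signal):
    """Rescale a signal to zero mean and unit variance."""
    m = mean(signal)
    s = pstdev(signal)
    if s == 0:
        # A flat signal has no shape to compare; map it to all zeros.
        return [0.0] * len(signal)
    return [(x - m) / s for x in signal]


def correlation(a, b):
    """Pearson correlation between two equal-length signals."""
    za, zb = zscore(a), zscore(b)
    return sum(x * y for x, y in zip(za, zb)) / len(za)


def group_by_similarity(signals, threshold=0.9):
    """Greedily group signal indices whose shape correlates with a group's first member."""
    groups = []
    for idx, sig in enumerate(signals):
        for group in groups:
            if correlation(signals[group[0]], sig) >= threshold:
                group.append(idx)
                break
        else:
            # No existing group matched, so this signal starts a new one.
            groups.append([idx])
    return groups


# Hypothetical readings: each row is one measurement trace, values invented.
readings = [
    [1.0, 2.0, 3.0, 4.0],
    [2.1, 4.0, 6.2, 8.0],
    [4.0, 3.0, 2.0, 1.0],
    [8.0, 6.1, 4.0, 2.2],
]

groups = group_by_similarity(readings)
print(groups)
```

The two rising traces end up in one group and the two falling traces in another, even though nobody told the code what "rising" or "falling" means. Real assay data would need far more care (noise, alignment, scale), but the shape of the workflow is the same.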
I am probably harping on the same theme; you can find similar sentiments in my older posts from last year. I think, subconsciously, this is what I am most focused on. It will be interesting to see if this manifests into something new, as it will be something you, as the reader, and I, as the “experiencer”, discover at the same time.
Going back to the title of this- Sid’s Osteosarcoma data is what I keep returning to. I think this dataset will become very important for this new industry.