Emerging artificial intelligence tools such as large language models (LLMs) have already changed, and will continue to change, how we build new companies at Ferment. AlphaFold 2 brought deep learning models into the common toolkit of molecular biology with its convincing answer to the protein folding challenge in 2020. Since then, we have seen new LLMs & computational tools expand to address many of the common challenges of metabolic engineering & synthetic biology product development workflows, from enzyme engineering, to expanding the aperture of metabolomics readouts (Enveda Biosciences), to better understanding genetic regulation (DeepMind). Just last week, Ginkgo Bioworks announced the largest AI-focused commitment to date by an industrial biotech platform via its new partnership with Google Cloud.
As company-builders and product-developers building on top of biotech platforms, we are energized by the development of new LLMs for synthetic biology applications. We know the story of these tools’ impact on industrial biotech is still being written, but we believe we can start to forecast, and perhaps modestly influence, how they will affect our field.
At Ferment, we believe that the consumer and industrial biotech companies built today should look different from those created in the past, as a result of the existence of powerful ‘open access’ synthetic biology platforms. As exemplified by the six companies Ferment has built to date, we believe today’s industrial biotechnology product companies should embrace an organizational structure based on the ‘specialization of labor’. Product companies can focus on becoming the best application developers of biotechnology in their end-industry, while strategically outsourcing the biotechnology and cell engineering to best-in-class partner technology platforms with scale.
Ferment pays close attention to the latest evolutions and new capabilities of synbio platform technology companies so that we can make full use of those capabilities in our existing portfolio companies and in the companies we are currently ideating and building.
This familiarity with our partner platforms allows us to quickly incorporate new elements of their technology stacks into Ferment company ideation, incubation, and building processes, such as: using the high-throughput sequencing & metagenomics bioinformatics pipelines that Ginkgo Bioworks developed during COVID for massive-scale discovery & characterization of new microbiome products by companies like Verb and BiomEdit; leveraging novel protein and lipid production chassis, gained via Ginkgo’s Dutch DNA and Novogy acquisitions, across our companies; and incorporating cutting-edge automated adaptive evolution methods to evolve new probiotics for optimal efficacy or stability.
For this reason, we are thrilled to see Ginkgo’s announcement this past week introducing its collaboration with Google Cloud to double down on its development and use of new AI and computational tools. Ginkgo has amassed a tremendous amount of biological data over its history, and the new partnership with Google Cloud underscores its commitment to growing the value of that data repository via development of new LLMs and data querying tools.
While the application of LLMs has become relatively routine in certain workflows, such as protein engineering for enzyme optimization, these tools are now permeating each unit step of the ‘Design-Build-Test-Learn’ cycle of synthetic biology. For these workflows, the combination of Foundry-scale wet lab data generation and task-specific machine learning models has been powerful.
We are enormously excited about the potential value to be created from the combination of Foundry-scale data with world-class foundation models. We optimistically anticipate (as do many in our field) that LLMs trained on the right datasets will be sufficiently powerful to model cellular states with accuracy within the next several years. The chief barrier to that future will be producing sufficient quantities of rich training data, and scaled Codebase repositories like those Ginkgo and others have amassed are the natural leverage points. We see these platforms and their emerging toolkits of synthetic biology-focused LLMs as enabling company creation at Ferment across three vectors:
Over time, tools such as these will make possible synthetic biology product development programs that look similar in objective to those undertaken today, but with far greater efficiency. This potential reduction in cost and turnaround time will be driven by the ability to remove even more manual workflows, to design-build-test smaller, smarter libraries (e.g., of enzymes or microbial hosts), and to reduce the number of ‘iteration cycles’ needed to achieve the desired results. While we don’t foresee a day in the near future where the hard work of cell programming can be done entirely in silico, we do see the potential for new computational tools to reduce the manual input and library sizes needed for a given project. Human labor and synthetic DNA libraries are two of the leading line items on any metabolic engineering project, so program budgets should continue to fall as a result.
We expect this to be true across the wide range of project types that Ferment companies have undertaken, from novel enzyme discovery, to full metabolic pathway engineering, to microbial strain or bioactive metabolite discovery for health and nutrition applications. In time, this will allow our companies to take on even more ambitious R&D targets as the required cost and time inputs fall and the chance of success rises, yielding greater ROI for a given product development effort.
Thinking beyond molecular biology, we also see potential for this marriage between application-specific wet lab datasets and foundation models to enable wholly new types of product development loops. We have written before about the ‘Molecule Selection Gap’ in industrial biotechnology, where an inability to functionally characterize bio-molecules constrains the discovery and commercialization of new bio-based ingredients and products. This bottleneck arises from the inability to synthesize and assess bio-molecules for functional traits at a large enough scale and with ‘real-world’ predictive power; for instance, consider these needs and use cases:
Stringing together foundation models across metabolic engineering and end-market application sciences (e.g., food science, material science, nutrition) will support better design and selection of novel, fit-for-purpose molecules. Next-generation models can better predict a bio-molecule’s likely functionality and performance, enabling our companies to identify new bio-molecules with applications in materials, food, and personal care.
An interesting question arises as a result: Who will do this work? Will these application-specific models be developed by new, end-market-focused product companies, or are they a natural extension of the existing biology technology platforms? Who is best equipped to build out the wet lab platforms & functional datasets to fuel these models? Companies like Citrine (materials and polymers) and Shiru (plant protein function) provide interesting templates here.
When building our companies, we are highly focused on smart R&D program construction and the de-risking value of every dollar spent. This includes a robust feasibility study for a given program and granular estimates of the costs and timelines required to achieve the desired result (whether that result is a high titer on a production strain, an efficiency-stability spec on an enzyme, or an efficacy readout on a bioactive). The ability to use historical metabolic engineering datasets to forecast costs, timelines, and likely outcomes more accurately will be a boon for company creators and product developers looking to bring new biomanufactured products to market. Generative models that predict best-fit metabolic pathways and expected strain performance parameters, similar to a supercharged version of the Lila program developed by Amyris, would be especially valuable for aspiring industrial biotech product developers.
We are excited about how new foundation models & computational techniques will both advance core synthetic biology and metabolic engineering workflows and improve our ability to evaluate and design new products based on the full palette of accessible bio-molecules. The above are a few of our initial thoughts, which we’re certain will need revision and evolution as LLMs are deployed into industrial biotechnology contexts in the years to come.
If you are a founder, industry executive, investor or corporate partner wondering how the new tools of synthetic biology & foundation models will impact your industry, please get in touch at www.ferment.co/engage.