AI
Harnessing AI for alternative protein development: promising advances with realistic challenges
Dean Sherry
Senior Scientist, Bioinformatics, De Novo Foodlabs, Cape Town, South Africa
KEYWORDS
Alternative proteins
Artificial intelligence
Strain engineering
Precision fermentation
Technological challenges
Abstract
This article explores the potential of artificial intelligence in revolutionising alternative protein development, highlighting both the opportunities and challenges within the rapidly evolving foodtech landscape. Artificial intelligence offers unprecedented capabilities to accelerate protein discovery, streamline strain development, and enhance precision fermentation. However, significant hurdles remain, such as data scarcity, lack of standardisation, regulatory complexities, and the high energy consumption of AI models. We examine how interpretability, transparency, and data-sharing frameworks can help address these challenges, drawing parallels from industries with established benchmarks and collaborative platforms. The goal of this piece is to provide a holistic view of how AI can drive innovation while ensuring sustainability, safety, and industry-wide growth. By fostering collaboration and investing in efficient, ethical AI practices, the foodtech industry can realise the full potential of AI-enhanced alternative proteins.
Introduction
The global food industry is undergoing a continuous shift towards alternative proteins, driven by the urgent need to address environmental sustainability, ethical concerns, and the growing demand for nutritious food to feed an increasing population. However, developing alternative proteins without relying on animals, through precision fermentation, presents its own significant challenges. Achieving the desired nutritional profiles, scaling up production, winning consumer acceptance, and ensuring consistency across batches are just some of the major hurdles that the industry faces. As such, conventional approaches alone may not be sufficient to meet the demands of this rapidly evolving market.
Recent advances in artificial intelligence (AI), particularly in machine learning (ML), have already provided revolutionary developments in several sectors, including medical imaging and diagnostics, precision agriculture, drug discovery, and protein biochemistry. Similarly, AI holds tremendous potential for revolutionising alternative protein development and precision fermentation by enabling the rapid analysis of vast datasets, optimising processes, and even predicting outcomes that would be difficult or impossible to achieve through traditional methods. The potential for AI to accelerate innovation, reduce costs, and create sustainable, high-quality products is encouraging. While this potential is undeniable, it is nonetheless important to temper our expectations with a dose of realism. In its current state, AI is not a panacea that will solve all of the challenges in alternative protein development overnight. Its successful application requires careful consideration, expertise, and, perhaps, concerted industry-wide cooperation.
An introduction to AI in alternative protein production
Ingredient Discovery
AI is playing an increasingly vital role in the discovery of novel protein sources. Traditional methods of protein discovery often involve labour-intensive and time-consuming experimental processes. AI holds the potential to accelerate the discovery process by predicting which proteins might possess desirable nutritional properties or techno-functionality. Alongside traditional computational techniques, predictive modelling can narrow down a practically unlimited number of molecules to a reasonably sized subset of promising candidates. Indeed, ML models have been used to predict protein structure (1), functionality (2), digestibility (3), and potential allergenicity (4).
High Throughput Screening of Robust Strains
The successful development of alternative proteins relies heavily on the selection of robust microbial strains that are capable of efficiently producing high-quality proteins under various environmental conditions. Traditional methods of strain screening involve intensive experimental work, testing thousands of strains to identify those with the desired traits. Moreover, it can be very difficult to predict which strains will fail in large-scale production environments. The power of ML models comes from their ability to predict the performance of different strains based on their genetic and phenotypic profiles, using large volumes of existing data (Figure 1). Predictive modelling can significantly whittle down the search space of viable genetic variants from a seemingly infinite number to a manageable subset of only the most promising candidates, which can then be validated through targeted experiments.
Figure 1. AI-guided microbial strain screening reduces the number of potentially useful strains from tens of thousands to fewer than 100. Models trained on genetic, proteomic, metabolic, phenotypic, experimental and environmental data filter the strains based on predicted criteria including stress resistance, yield, and growth rate.
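The filtering step described in Figure 1 can be illustrated with a minimal sketch. It assumes that upstream models have already produced per-strain predictions; the field names, units, thresholds, and the choice to rank by predicted yield are all hypothetical and are not taken from any specific screening pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StrainPrediction:
    """Model-predicted traits for one strain (hypothetical fields and units)."""
    strain_id: str
    stress_resistance: float   # assumed score between 0 and 1
    yield_g_per_l: float       # assumed predicted titre in g/L
    growth_rate_per_h: float   # assumed specific growth rate in 1/h


def shortlist(predictions, min_stress=0.7, min_yield=5.0,
              min_growth=0.3, max_candidates=100):
    """Keep strains that clear every predicted threshold, ranked by yield."""
    passing = [
        p for p in predictions
        if p.stress_resistance >= min_stress
        and p.yield_g_per_l >= min_yield
        and p.growth_rate_per_h >= min_growth
    ]
    # Rank the survivors so the most productive candidates are validated first.
    passing.sort(key=lambda p: p.yield_g_per_l, reverse=True)
    return passing[:max_candidates]
```

In practice the thresholds would be set from experimental experience, and the shortlisted strains would still need to be confirmed through targeted experiments.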
Artificial metabolic networks offer a novel solution to the challenge of predicting and optimising microbial phenotypes. Such models have demonstrated superior performance in predicting growth rates and gene knockout effects in species such as E. coli and P. putida (5). In addition, large language models (LLMs) are increasingly being adapted to biological tasks. LLMs process and interpret vast amounts of text data relating to protein-protein interaction networks, gene expression profiles, and metabolic pathways (6,7). By integrating these diverse data sources, LLMs can enable high-throughput screening of strains by predicting which genetic modifications or culture conditions are most likely to yield robust, high-performing strains. Similarly, graph neural networks (GNNs) can interpret protein-protein interaction networks or metabolic pathways, providing insights into how changes in one part of the system might affect overall strain robustness (8).
Leveraging AI in Precision Fermentation
ML models are increasingly being applied to precision fermentation to optimise several aspects, from monitoring and control to scaling up production, with the goal of significantly improving efficiency, yield, and consistency. The integration of AI and digital twins has recently led to renewed interest in using data-driven modelling to support bioprocess design. Digital twins are virtual replicas of physical fermentation processes that use real-time data to simulate and optimise conditions in a controlled environment. Using digital twins, fermentation processes can be continuously modelled, monitored, and improved in real time to provide recommendations or automatically adjust parameters to maintain optimal fermentation conditions (9,10). Moreover, ML models can detect anomalies early, preventing batch failures by predicting when a process might deviate from the ideal pathway. This proactive approach enhances process consistency, reduces waste, and improves overall production efficiency. Precision fermentation is well positioned as an early adopter of ML methods, since it can leverage sequencing data, high-throughput library screening, and multi-omics data to optimise metabolic pathways, strains, yields and scale-up. Scaling precision fermentation from lab to industrial scale is often challenging. Here, AI-guided process scaling can simulate scale-up conditions and anticipate issues by predicting microorganism responses to changes in bioreactor size and environment, thereby reducing the risk of failure (11).
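To make the idea of early anomaly detection concrete, the sketch below flags sensor readings that deviate sharply from the recent baseline. It deliberately uses a simple rolling z-score rather than a trained ML model, and the window size, threshold, and the example of a pH trace are illustrative assumptions only.

```python
from collections import deque
from statistics import mean, stdev


def flag_anomalies(readings, window=5, threshold=3.0):
    """Return indices of readings far from the rolling baseline.

    A reading is flagged when it lies more than `threshold` standard
    deviations from the mean of the previous `window` accepted readings.
    """
    history = deque(maxlen=window)
    flagged = []
    for index, value in enumerate(readings):
        if len(history) == window:
            baseline = mean(history)
            spread = stdev(history)
            if spread > 0 and abs(value - baseline) / spread > threshold:
                flagged.append(index)
                continue  # keep the outlier out of the baseline
        history.append(value)
    return flagged
```

For a hypothetical pH trace that holds near 6.5 and then drops abruptly, the index of the drop would be returned, which could in turn trigger an operator alert or an automatic adjustment. A production system would more likely combine several signals and a learned model of the expected process trajectory.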
Challenges in leveraging AI for alternative protein development
Despite its transformative potential, the application of AI in alternative protein development is accompanied by several significant challenges that must be addressed to fully realise its potential. While AI and ML have generated significant hype, the current state of the technology presents notable limitations. As the industry continues to evolve, it is essential to temper expectations and approach AI integration with a measured, cautious strategy that acknowledges both its opportunities and its present constraints.
Disparate and Inconsistent Data
One notable issue lies in the quality and availability of appropriate data. AI models depend on large, diverse datasets that are often difficult to obtain, prone to biases, or inconsistently formatted. For example, the data used in AI models for strain engineering and alternative protein development can come from multiple sources, including academic publications, public and proprietary databases, and experimental results. The quality of data can also vary significantly; high-quality, experimentally validated data may be mixed with less reliable, predicted, or inferred data. These data-related challenges can lead to issues such as model biases and even hallucinations (confabulations) (12), where models generate outputs that are plausible but incorrect. These errors often occur because the model attempts to provide coherent outputs based on incomplete or imprecise data, which can make it difficult to derive meaningful, reliable insights. Data sharing may provide an avenue to overcome such limitations. However, many datasets are locked behind institutional or corporate firewalls, restricting access and preventing the broader scientific community from leveraging this information. Even when data is shared, it often comes with restrictions, such as usage rights or licensing agreements, that limit its applicability. The lack of open, shared datasets hinders the development of more robust and generalisable models.
Environmental Concerns
As the capabilities of AI continue to advance, the growing concern about the power needed for data storage and computational processes becomes increasingly pronounced. The demand for processing large-scale datasets and training complex models requires substantial computational resources, which in turn drives up energy consumption (13). Somewhat ironically, this escalating energy requirement has significant implications for the sustainability and environmental impact of AI applications. The environmental footprint of these technologies is not only a matter of direct energy use but also includes the carbon emissions associated with the production of computing hardware. As AI models become more sophisticated and the volume of data increases, the challenge of balancing technological progress with environmental responsibility increases.
Model interpretability
A common criticism of deep learning models is their lack of interpretability. Even when models produce accurate predictions, understanding how these results were derived can be difficult, making it challenging to validate and trust AI-driven insights. Model interpretability is not only essential for trust and validation, but also for fostering collaboration between AI practitioners and domain experts. When models are transparent and explainable, experts can refine them based on domain knowledge, while AI-driven insights can guide the design of experiments, leading to faster iteration cycles. Suppose an AI model is developed to predict whether a novel protein will be allergenic, but the model itself is a "black box", meaning that we cannot explain why it reached a given conclusion. Later, during development and commercialisation, the protein causes allergic reactions in a small subset of the population, and investigations reveal that the model overlooked certain molecular structures that are rare but highly allergenic. When the reasons behind a model's predictions are not interpretable, such oversights can go unnoticed, leading to health risks for consumers and a costly product recall. In a field like foodtech, where safety, functionality, and nutritional value are paramount, the ability to explain and trust model predictions is crucial for ensuring that AI-assisted discoveries align with regulatory standards and consumer expectations.
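One widely used, model-agnostic way to probe a black-box predictor is permutation importance: shuffle one input feature at a time and measure how much the accuracy drops. The sketch below is a minimal illustration of that technique and is not a method described in the cited work; the toy classifier and its two features (a motif score and a molecular weight) are hypothetical stand-ins for a real allergenicity model.

```python
import random


def toy_allergen_model(row):
    """Hypothetical black box: looks only at feature 0 (a motif score)."""
    return 1 if row[0] > 0.5 else 0


def accuracy(predict, rows, labels):
    """Fraction of rows for which the prediction matches the label."""
    hits = sum(1 for row, label in zip(rows, labels) if predict(row) == label)
    return hits / len(rows)


def permutation_importance(predict, rows, labels, n_repeats=20, seed=0):
    """Mean accuracy drop per feature when that feature is shuffled.

    `rows` is a list of equal-length tuples of feature values.
    """
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in rows]
            rng.shuffle(column)
            # Rebuild each row with only feature j replaced by a shuffled value.
            permuted = [row[:j] + (value,) + row[j + 1:]
                        for row, value in zip(rows, column)]
            drops.append(baseline - accuracy(predict, permuted, labels))
        importances.append(sum(drops) / n_repeats)
    return importances
```

A feature whose shuffling leaves accuracy unchanged, such as the molecular weight in this toy case, is one the model effectively ignores. If domain experts know that such a feature matters for allergenicity, the result is an early warning that the model may be overlooking relevant biology.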
Regulation and Safety Testing
As the alternative protein industry evolves, robust regulation and safety testing are critical. While AI accelerates the discovery of novel protein sources, it introduces challenges in ensuring these products are safe and compliant with standards from agencies like the FDA and EFSA. AI can help companies navigate these regulatory landscapes by predicting safety issues, such as allergenic or toxicological risks or other adverse reactions, early in the development process. Regulatory bodies may need to establish new guidelines for the ethical use of AI in food development, which could involve setting standards for data privacy, algorithmic transparency, and the responsible use of AI in making decisions that impact food safety and quality. Meeting these requirements will also rely heavily on the aforementioned interpretability of the models used in foodtech applications.
Rethinking AI and alternative protein innovation in foodtech
The challenges facing AI underscore the need for a balanced approach that recognises both the potential and the limitations in this rapidly evolving field. As models become more sophisticated and datasets more comprehensive, the precision and efficiency of alternative protein development will continue to improve. However, while AI holds immense promise, its successful application requires a balance between innovation and practicality. Data privacy and scarcity are major barriers to AI advancements in foodtech. One potential solution is federated learning, which is gaining traction in the healthcare, energy, and pharmaceutical sectors. Federated learning, or collaborative learning, is an approach that enables AI models to be built using data from several organisations without the need to directly share sensitive or proprietary data (14). As a result, over several training iterations, models built using federated learning are exposed to a significantly wider range of data than any single organisation possesses in-house.
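The mechanics of federated learning can be shown with a small sketch of federated averaging, one common variant of the approach. Each organisation trains on its own data and sends only model weights to a coordinator, which averages them. The one-variable linear model, learning rate, and round counts below are illustrative assumptions, and real deployments typically add safeguards such as secure aggregation that are omitted here.

```python
def local_update(weights, data, lr=0.02, epochs=5):
    """Run a few gradient-descent steps on one organisation's private data.

    `data` is a list of (x, y) pairs; the model is y = w * x + b.
    """
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * ((w * x + b) - y) * x for x, y in data) / n
        grad_b = sum(2 * ((w * x + b) - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


def federated_average(updates, sizes):
    """Average the returned weights, weighted by each dataset's size."""
    total = sum(sizes)
    w = sum(u[0] * n for u, n in zip(updates, sizes)) / total
    b = sum(u[1] * n for u, n in zip(updates, sizes)) / total
    return (w, b)


def train_federated(org_datasets, rounds=300):
    """Coordinate training: only weights leave each organisation, never data."""
    global_weights = (0.0, 0.0)
    sizes = [len(data) for data in org_datasets]
    for _ in range(rounds):
        updates = [local_update(global_weights, data) for data in org_datasets]
        global_weights = federated_average(updates, sizes)
    return global_weights
```

In this toy setting, two organisations that each observe a different range of the same underlying relationship would jointly recover it, even though neither sees the other's records.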
Data standardisation and benchmarking are critical for advancing AI-driven innovation in foodtech. However, these concepts remain underdeveloped in this industry. In fields like structural biology, initiatives such as CASP (15) provide a standardised framework for evaluating model accuracy. Similarly, platforms like Kaggle (16) have fostered data challenges across various industries, promoting collaboration and benchmarking. For foodtech, developing similar concepts would drive AI innovation by offering a competitive environment for solving industry-specific challenges, promoting transparency and accelerating progress. Moreover, developing explainable AI systems through industry-wide research could address the challenges of regulatory compliance and model interpretability, making it easier for regulatory agencies to understand, trust, and approve AI-driven processes. To address the environmental impact of AI’s high energy consumption, research into so-called “Green AI” (17) could promote the development of energy-efficient algorithms and computational infrastructures. Indeed, such initiatives will be accelerated by industry-wide collaboration and innovation, as collective efforts to develop eco-friendly AI tools are critical. Collaborative platforms and partnerships can drive the adoption of sustainable practices, fostering an ecosystem where AI's environmental footprint is minimised while its benefits are maximised.
The adoption of AI in the alternative protein and foodtech industries is still in its early stages, but it holds immense promise for driving rapid and transformative advancements. By fostering collaboration and promoting data sharing, the industry can unlock the full potential of AI-driven innovation. If companies, researchers, and stakeholders pool their resources and collective expertise, we can accelerate progress while ensuring that AI-enhanced alternative protein development is sustainable, safe, and widely accepted. Such a concerted effort will be key to revolutionising food production and meeting the growing global demand for sustainable nutrition.
References and notes
1. Jumper J, Evans R, Pritzel A, Green T, Figurnov M, Ronneberger O, et al. Highly accurate protein structure prediction with AlphaFold. Nature. 2021 Aug;596(7873):583–9.
2. Bonetta R, Valentino G. Machine learning techniques for protein function prediction. Proteins: Structure, Function, and Bioinformatics. 2020;88(3):397–413.
3. Malvar S, Bhagavathula A, Balaguer MA de L, Sharma S, Chandra R. Machine learning can guide experimental approaches for protein digestibility estimations [Internet]. arXiv; 2022 [cited 2024 Sep 3]. Available from: http://arxiv.org/abs/2211.00625
4. Wang L, Niu D, Zhao X, Wang X, Hao M, Che H. A Comparative Analysis of Novel Deep Learning and Ensemble Learning Models to Predict the Allergenicity of Food Proteins. Foods. 2021 Apr;10(4):809.
5. Faure L, Mollet B, Liebermeister W, Faulon JL. A neural-mechanistic hybrid approach improving the predictive power of genome-scale metabolic models. Nat Commun. 2023 Aug 3;14(1):4669.
6. Liu J, Yang M, Yu Y, Xu H, Li K, Zhou X. Large language models in bioinformatics: applications and perspectives [Internet]. arXiv; 2024 [cited 2024 Aug 31]. Available from: http://arxiv.org/abs/2401.04155
7. Lee J, Yoon W, Kim S, Kim D, Kim S, So CH, et al. BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics. 2020 Feb 15;36(4):1234–40.
8. Hasibi R, Michoel T, Oyarzún DA. Integration of graph neural networks and genome-scale metabolic models for predicting gene essentiality. npj Syst Biol Appl. 2024 Mar 6;10(1):1–10.
9. Ting TY, Li Y, Bunawan H, Ramzi AB, Goh HH. Current advancements in systems and synthetic biology studies of Saccharomyces cerevisiae. Journal of Bioscience and Bioengineering. 2023 Apr 1;135(4):259–65.
10. Patra P, B.r. D, Kundu P, Das M, Ghosh A. Recent advances in machine learning applications in metabolic engineering. Biotechnology Advances. 2023 Jan 1;62:108069.
11. Du YH, Wang MY, Yang LH, Tong LL, Guo DS, Ji XJ. Optimization and Scale-Up of Fermentation Processes Driven by Models. Bioengineering. 2022 Sep;9(9):473.
12. Lin Z, Guan S, Zhang W, Zhang H, Li Y, Zhang H. Towards trustworthy LLMs: a review on debiasing and dehallucinating in large language models. Artif Intell Rev. 2024 Aug 10;57(9):243.
13. Chow AR. TIME. 2024 [cited 2024 Sep 11]. How AI Is Fueling a Boom in Data Centers and Energy Demand. Available from: https://time.com/6987773/ai-data-centers-energy-usage-climate-change/
14. Rieke N. NVIDIA Blog. 2019 [cited 2024 Sep 11]. What Is Federated Learning? Available from: https://blogs.nvidia.com/blog/what-is-federated-learning/
15. Moult J, Pedersen JT, Judson R, Fidelis K. A large-scale experiment to assess protein structure prediction methods. Proteins: Structure, Function, and Bioinformatics. 1995;23(3):ii–iv.
16. Kaggle: Your Machine Learning and Data Science Community [Internet]. [cited 2024 Sep 11]. Available from: https://www.kaggle.com/
17. Bolón-Canedo V, Morán-Fernández L, Cancela B, Alonso-Betanzos A. A review of green artificial intelligence: Towards a more sustainable future. Neurocomputing. 2024 Sep 28;599:128096.