@phdthesis{82fb9928f06d4cf9959a2a339d9c83e5, title = "AI in antibody design", abstract = "Proteins are essential molecules for life, performing virtually all biological processes in our cells. Understanding their function is key to many scientific advances, including the development of new protein drugs, such as antibodies used in immunotherapy. Recent developments in artificial intelligence have significantly enhanced our ability to synthetically design new proteins. This thesis develops three new deep learning tools in three areas of protein design: predicting protein secondary structures, identifying antibody binding sites (epitopes), and designing novel antibodies. These tools are freely available online, making them accessible to researchers worldwide. The thesis first describes the crucial role of antibodies in our immune system, the basis for their highly specific binding, and how their binding sites may be predicted. Next, I describe how deep learning models can train on large, unlabelled datasets of biological sequences or structures, learning highly informative mathematical representations called features. Using such features has led to drastic improvements in AI models predicting protein 3D structures, the effects of genetic mutations, and designing never-before-seen proteins. The first developed tool in this thesis, NetSurfP-3.0, uses such features to rapidly speed up the way we predict protein secondary structures. Traditionally, this process relied on screening very large sequence databases for relevant information. Instead, NetSurfP-3.0 extracts this information using features from the ESM-1b protein language model. This change results in a 700-fold increase in speed without sacrificing accuracy. The tool can accurately predict various protein properties, such as its secondary structure, backbone angles, surface-accessible, and disordered regions.
NetSurfP-3.0 is freely available at: https://services.healthtech.dtu.dk/services/NetSurfP-3.0/ The second developed tool, DiscoTope-3.0, focuses on predicting epitopes, which are the parts of proteins that antibodies recognize and bind to. This tool extracts features from ESM-IF1, a deep learning model pre-trained on protein structures, achieving significant improvement in the prediction of antibody binding sites. Unlike many previous tools, DiscoTope-3.0 remains highly effective for both experimental and predicted protein structures. DiscoTope-3.0 is freely available at: https://services.healthtech.dtu.dk/services/DiscoTope-3.0/ The third developed tool, AntiFold, improves the process of designing antibodies. AntiFold fine-tunes the ESM-IF1 model to work particularly well on antibody structures. The tool improves accuracy in designing antibody binding regions while maintaining their 3D structure and suggesting mutations with high binding affinity. It can also be used to design nanobody and antibody-antigen structures, commonly used in therapeutic antibody design. AntiFold is freely available at: https://opig.stats.ox.ac.uk/webapps/antifold/ Lastly, the thesis discusses the growing commercialization of AI models in drug discovery. It proposes a protocol applying AntiFold for designing less immunogenic antibodies, a common problem where the immune system rejects the therapeutic drug. It also suggests how to closely mimic the setup of AntiFold to create a T cell-based design tool and improvements to several of the open-source tools developed in the thesis. The open-sourcing of these advancements could represent a significant leap forward in antibody design, providing powerful, accessible tools for researchers and opening new possibilities in medicine and biotechnology.", author = "H{\o}ie, {Magnus Haraldson}", year = "2024", language = "English", publisher = "DTU Health Technology", }