A look at the drugs that have made it into the clinic reveals, however, that the pharmaceutical industry has explored only a fraction of that universe, both in the number of approved drugs and in the chemical diversity those molecules represent. Small molecular changes can make a big difference in activity and toxicity as medicinal chemists tweak compounds along the way to a final drug. But scientists must first decide which direction to pursue. “You have to start with something in the drug design process,” says Matthias Rarey of the University of Hamburg. Rarey is a cheminformatician, which means he describes molecules computationally. He says drug hunters do not want to waste time and money developing the wrong compounds from the start.

Many diseases are caused by malfunctioning proteins, and drugs frequently target these to treat the disease by modifying the protein or how it works. The drug hunters’ job is to find the right molecule to do that. To begin, they must identify a “hit” upon which to build and improve through rounds of experimentation.

Traditionally, one way drug hunters have looked for a starting point is high-throughput screening, which tests arrays of small quantities of compounds, usually stored in organic solvents, against a target. Wendy Warr, a pharmaceutical consultant at Wendy Warr & Associates, recalls how, in the mid-1990s, chemists were focused solely on creating or acquiring more and more compounds to feed into these assays.

“And then they realized,” she says, “you don’t just make everything. You have to use some common sense. You must create a library of diverse, drug-like compounds.”

However, thanks to both physical and virtual libraries, the number of screening compounds available is now larger than ever. Companies such as Ukraine’s Enamine and OTAVA chemicals, as well as China’s WuXi AppTec, provide catalogues containing billions of synthetically available compounds. The in-house virtual libraries, or spaces, owned by large pharmaceutical companies are even larger. Merck KGaA, for example, has a virtual space of 10^20 molecules called the Merck Accessible Inventory (MASSIV). That is comparable to the number of stars in the universe.

These libraries effectively store compounds that labs can synthesize from building-block molecules on hand. When chemists place an order for these compounds, they are not looking through a catalogue but searching computationally, and the compounds that are returned are created dynamically from data about the constituent building blocks and the reactions they can undergo. To navigate this growing complexity, researchers must make wise decisions about how they design new compounds, construct those compounds, and combine screening methods. They can’t screen everything, so they must focus their efforts where they believe they will have the most success. As a result, the importance of computation has grown. “I think we really see a shift now because of these large, make-on-demand compound catalogs,” Rarey says. The number of possibilities is just too large, he says. “So there is always a computational element in early-phase drug discovery now, and I think this will remain also in the future.”


Since the early 1990s, Enamine, for example, has been supplying screening compounds to drug hunters. However, what began as a library of a few thousand physical compounds from which customers could order has now expanded to a catalogue of 23 billion possibilities. The company doesn’t have them all, but it has the pieces and expertise to build what customers want.

According to Yurii Moroz of screening-chemical supplier Chemspace, in 2021 Chemspace customers ordered more than 200,000 Enamine compounds that did not physically exist before they were ordered. These libraries of make-on-demand compounds have become possible only because chemical manufacturers encoded both their building blocks and experimental data on reactions in computer-accessible form. The information in these libraries has also enabled growth in fields such as automated synthesis planning and screening.

“For initial hit discovery, you want to make compounds financially accessible,” Moroz says. Large-scale high-throughput screening for early-stage drug discovery can cost up to $1 million. He cites this as one reason for lowering the cost of screening chemicals.

Snapping compounds together on demand can help cut costs, lowering the price per compound to $100-$150 rather than $1,000. Following that initial hit, chemists can confidently return to the building blocks to expand from the initial molecule and design more compounds in new areas of chemical space.

Rarey compares these virtual catalogues to an ice cream shop. If you have 10 different ice cream flavors and 10 different toppings, you can quickly make a variety of ice cream sundaes. The ice cream shop does not keep premade sundaes on hand. It keeps the components separate until an order is received, at which point the various scoops and components are assembled.

Where the analogy falls apart is that in an ice cream parlor, you can choose to combine every scoop and topping, even if they don’t taste good together. Not all building blocks will react to form new compounds, though, and not all compounds will make sense for a particular drug target. The trick is to develop ways to ensure the potential chemical space reflects the chemical reality. For example, scientists can encode reaction data so that chemists look in the right place.
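The difference between the sundae menu and a make-on-demand chemical space can be sketched in a few lines of Python. This is only a toy illustration: the building-block names, reactive groups, and reaction rules below are hypothetical, not taken from any supplier's catalogue. It shows how encoded reaction rules restrict which pairings count as orderable products, and how the size of the space can be computed, and products generated lazily, without storing anything in advance.

```python
from itertools import combinations, product

# Hypothetical building blocks, each tagged with one reactive group.
BLOCKS = {
    "acid_A": "COOH", "acid_B": "COOH", "acid_C": "COOH",
    "amine_X": "NH2", "amine_Y": "NH2",
    "bromide_P": "Br",
}

# Hypothetical reaction rules: the pair of groups each reaction can couple.
REACTIONS = {
    "amide_coupling": ("COOH", "NH2"),
    "n_arylation": ("Br", "NH2"),
}

def blocks_with(group):
    """Return the names of all building blocks carrying the given group."""
    return [name for name, g in BLOCKS.items() if g == group]

def space_size():
    """Count the products the rules allow without building any of them."""
    return sum(len(blocks_with(a)) * len(blocks_with(b))
               for a, b in REACTIONS.values())

def enumerate_on_demand():
    """Lazily yield products; nothing exists until it is requested."""
    for rxn, (a, b) in REACTIONS.items():
        for left, right in product(blocks_with(a), blocks_with(b)):
            yield f"{left}+{right} via {rxn}"

# The "ice cream parlor" count: every unordered pairing of two blocks.
naive_pairs = len(list(combinations(BLOCKS, 2)))

print(space_size(), "allowed products out of", naive_pairs, "naive pairings")
print(next(enumerate_on_demand()))
```

In this toy space, only some of the possible pairings correspond to a reaction the rules know about, which is the sense in which the encoded space is meant to reflect chemical reality.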

While these database approaches have expanded the library of compounds that customers can order, other virtual tools can work as a first pass for predicting how molecules could dock or bind to a target protein, allowing researchers to focus their first wet-chemistry screens only on molecules that they think have a good chance of working on the target.

“Those are the programs that are increasing,” says Petro Borysko, Enamine’s director of biology, who oversees the screening experiments that the firm runs on behalf of clients. He says he’s seen a change in high-throughput screens toward more “pointed actions” that are “usually the results of some kind of virtual screen.”


Key to the development of these virtual libraries is the drug-hunting approach called fragment-based drug discovery, which Rarey says “paved the way to where we are now.”

In fragment-based drug design, researchers test much smaller chemical groups against a target than in high-throughput screening. Once they find one that binds, they build a more complicated chemical scaffold around the initial binding fragment with the goal of filling in the binding site and engineering a strong bond to the target.

The initial fragment needs a moiety that will bind to the protein, but it’s best to keep the initial fragment small so you can expand the fragment in as many ways as possible, according to Gianni Chessari, head of chemistry at Astex Pharmaceuticals, a fragment-based screening specialist. Starting from weakly binding first fragments helps drug hunters anchor at a starting point for exploring chemical space.

Another efficient way that researchers have explored the vastness of chemical space is with DNA-encoded libraries (DELs). These libraries are made up of screening molecules attached to unique DNA sequences that identify them. The approach allows all the screening molecules to be mixed in a single tube along with the target protein. DELs have traditionally focused on large, flexible molecules like peptides, which have more DNA-compatible chemistry.
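The bookkeeping behind a DEL can be illustrated with a short Python sketch. Everything here is hypothetical: the four-letter tags, the block names, and the list of reads are made up, and real libraries use longer tags and more synthesis cycles. The sketch assumes each building block is assigned a DNA tag, that a compound's barcode is the concatenation of its blocks' tags, and that the barcodes of molecules that stuck to the target are read out and tallied.

```python
from collections import Counter

# Hypothetical DNA tags for the building blocks used in two synthesis cycles.
CYCLE1 = {"ACGT": "block_A", "TGCA": "block_B"}
CYCLE2 = {"GGAA": "block_X", "CCTT": "block_Y"}
TAG_LEN = 4  # toy tag length; an assumption for this sketch

def decode(read):
    """Map a barcode back to its pair of building blocks, or None if unknown."""
    tag1 = read[:TAG_LEN]
    tag2 = read[TAG_LEN:2 * TAG_LEN]
    if tag1 in CYCLE1 and tag2 in CYCLE2:
        return (CYCLE1[tag1], CYCLE2[tag2])
    return None

def count_hits(reads):
    """Tally how often each decodable compound appears among the reads."""
    decoded = (decode(r) for r in reads)
    return Counter(c for c in decoded if c is not None)

# Made-up barcodes recovered from molecules that bound the target.
reads = ["ACGTGGAA", "ACGTGGAA", "TGCACCTT", "ACGTGGAA", "NNNNGGAA"]
hits = count_hits(reads)
print(hits.most_common(1))
```

Because identity is carried by the barcode rather than by a position in a plate, all the compounds can share one tube, and the most frequently recovered barcodes point to the candidate binders.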

At the DEL firm X-Chem, Ying Zhang, vice president of chemistry, has been assembling a library of smaller molecules by accumulating as many building blocks as she can but running only two or three coupling reactions between them. The result has significantly broadened the chemical space explored by the firm’s DELs (Bioorg. Med. Chem. 2021, DOI: 10.1016/j.bmc.2021.116189). Today, X-Chem has billions of compounds in multiple targeted libraries.

But with DELs too, researchers have to be smart to narrow the vast space they cover. They are being strategic about which compounds to put in the tube to begin with, often using machine learning to help guide their decisions (J. Med. Chem. 2020, DOI: 10.1021/acs.jmedchem.0c00452). Combining screening technologies is also increasingly common. A firm may begin with a hit from a DEL, for instance, and then use a fragment-based search, perhaps one guided by computation, to refine the hunt.


BioSolveIT is a company cofounded by Rarey that offers multiple pieces of software to help chemists navigate these new chemical spaces. Director of Application Science Marcus Gastreich says it would be a mistake to think that this computational expansion of chemical space is due to only an increase in computing power. The work chemists have done to encode chemistry in computers has also massively enabled the field.

In fact, Gastreich says, just as chemists would take too long to synthesize every potential compound, it would require far too much computing power to compute every possible compound from all the building blocks available.

Instead, he says, recently developed algorithmic technologies based on the field of computational chemistry called cheminformatics are needed to search the huge chemical spaces without huge computational cost. These tools can quickly search the building blocks and compounds that are described using strings of characters or databases. For example, he says, if you have a small fragment sitting in a protein pocket, computers can quickly search for other building blocks that could be added or related fragments that might also fit. And that search can help a creative chemist trying to solve a problem.
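Why searching building blocks can be so much cheaper than searching finished compounds can be seen in a deliberately simplified Python sketch. It is not the algorithm used by any particular software: it assumes, unrealistically, that a product's score is just the sum of hypothetical per-block scores, so the best product can be found by picking the best block in each position. The block names and scores are invented for illustration.

```python
from itertools import product
from math import prod

# Hypothetical per-block scores for three positions in a product.
SLOTS = [
    {"a1": 0.2, "a2": 0.9, "a3": 0.5},
    {"b1": 0.7, "b2": 0.1},
    {"c1": 0.3, "c2": 0.4, "c3": 0.8},
]

def score(combo):
    """Additive toy score: the sum of each chosen block's own score."""
    return sum(slot[block] for slot, block in zip(SLOTS, combo))

def best_by_enumeration():
    """Build every product and keep the best; work grows as a product."""
    return max(product(*SLOTS), key=score)

def best_by_slot():
    """Pick the best block per position; work grows as a sum."""
    return tuple(max(slot, key=slot.get) for slot in SLOTS)

products_scored = prod(len(slot) for slot in SLOTS)  # full enumeration
blocks_scored = sum(len(slot) for slot in SLOTS)     # per-slot search

print(best_by_enumeration(), best_by_slot())
print(products_scored, "products versus", blocks_scored, "building blocks")
```

With a handful of blocks the gap is small, but the first number grows multiplicatively with every block added while the second grows only additively, which is the basic reason working at the level of building blocks keeps very large spaces searchable. Real scoring is not additive, so practical tools need more elaborate strategies than this.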

Because chemists have built ways to encode the properties of these building blocks and how they can react, those well-validated data can also feed into artificial intelligence and machine learning (AI/ML) applications. The drug industry is investing heavily in this new breed of computer-powered applications, which can offer services such as chemical design, synthesis planning, and automated synthesis. AI/ML services may also help chemists navigate the ever-growing constellation of available compounds.

“The main driving force here is the convergence of what we call traditional computational chemistry and AI/ML,” says Ashwini Ghogare, head of AI-enabled drug discovery at Millipore Sigma, part of the life sciences business of Merck KGaA. “This combination is the winning formula.”

Ghogare and her team have been part of that investment at Millipore Sigma. For example, they have been building AI-powered drug discovery software that she says will allow users to search ultralarge chemical space and then design new screening compounds. Most large drug companies are trying to build these sorts of systems internally, she says. The new effort aims to make the same tools available to small and midsize companies as well.