Richard Goldman: [00:00:05] This session's going to be a fireside chat, basically a discussion between myself and Stefano Pacifico. Stefano is the co-founder and CEO of Epistemic AI. And I'm going to describe it with a line I saw on their website that I thought just captured it: it's a knowledge discovery platform that leverages artificial intelligence and natural language processing to reveal the hidden connections in biomedical knowledge. I'm really excited about this discussion, because I think it's important to show different perspectives on how to address challenges, and a lot has been done in the health care industry. We'll see why I think it's relevant; there are a lot of similarities. So, thanks again, Stefano. Maybe you can start off with some context on how you came about this effort and your background.
Stefano Pacifico: [00:00:55] Thank you. First, thanks, Rich, for inviting me here, and thanks to LSEG for sponsoring and organising this event. There have got to be some disclaimers. First of all, I'm Italian, so you might see me gesticulating, and that will not transfer well on the podcast, I believe. But bear with me. The second is I'm going to have some opinions. I'm known to be a little bit opinionated about technology, so I hope I will not poke too many bears tonight. Before I even talk about how I got here, I just want to make a comment about the previous panel. It was very interesting, but I want to share for a second the fact that we keep looking at LLMs and what's happening with generative AI with a healthy dose of scepticism, and I think that scepticism is keeping us from actually realising the tremendous revolution that is happening. I've spent my entire career trying to solve problems that were unsolved and that have since been obliterated by these technologies. So I think that there is a revolution happening, and the reduction in cost is happening. So, everything is great? No. In fact, we'll talk about what is not great, what is not working and why, and I think that will also help us understand how we can use these technologies better, in the life sciences as well as in the world of finance. As for my background, I've been involved in applied artificial intelligence research for the past 20 years. I've seen the good, the bad, and the ugly of it. I worked for many years at Bloomberg LP out of grad school, and started really getting interested in how you get information of many different types to come together and be used intelligently by its consumers. After that, I've done a lot of other stuff you can look up on LinkedIn; it's not very interesting to talk about. But I then met David Heeger, my co-founder, a renowned neuroscientist at NYU, and together we figured out that in the world of the life sciences, much like the world of finance, there are thousands of sources of biomedical knowledge, most of which are unknown to most people, most of which are disconnected from one another, and for which, unlike in finance, there was no technology available for users to consume them as a whole, the way financial analytics companies have made possible in finance throughout the years. So that's me, in a nutshell. And it's a pleasure to be here tonight. Maybe I can talk a little bit about Epistemic AI. Epistemic AI is a start-up: a group of 15 very motivated people. We punch above our weight, and we sell into the life sciences and biotech industry. We work with organisations that range from Fortune 500 companies to small start-ups and government agencies, and we help them understand how not to miss important information when developing a drug. And I want to share a couple of pieces of information that may not be immediate to people working in finance. Developing a drug takes about ten years, costs an average of $2.5 billion, and 90% of the time it fails. So that's a tremendous amount of inefficiency. Part of this inefficiency is due to a lot of different things that have to do with regulation and social aspects of life.
You would be surprised how many times drugs fail because the developers didn't take into account information they already had. This is something I guess we'll dissect a lot more and talk about, but it certainly speaks to the world of finance too. So great to be here. Thank you, Rich.
Richard Goldman: [00:05:10] You're welcome. Let's start with the similarities. What are the similarities between the challenges of using LLMs in healthcare and the ones we're experiencing in finance?
Stefano Pacifico: [00:05:21] Yeah. So, let's first start with what's similar between finance and the life sciences, which is kind of weird, right? Well, first of all, there's a lot of money at play. The top 20 companies in the life sciences sell half a trillion dollars of products every year. So that's lots of money, and a lot of that money is actually invisible in the market because it's spent on R&D: you don't see it, it often fails, or it's subsidised by the government. So, there's a lot of money at play. And, like in finance, it's a very complex decision-making process. Across this entire life cycle you need to come up with an idea of what the drug could be, which protein it acts on, which system of biology that affects, which disease that interacts with, and then all sorts of different things, including clinical trials, regulatory matters, patents, etc. Well, the world of finance is very similar. When you start thinking about, well, what's the price of soybeans, because I'm sure all of you have watched the Christmas movies, right, how do you determine that? How do you predict that? You can really go down the rabbit hole and never emerge. And so that's very similar: high stakes, big decisions, and they require very, very in-depth knowledge. And if you miss that knowledge, you can make a catastrophic mistake. So in that way, they're very similar. The other way in which they're very similar is that that knowledge is humongous. There is a lot of data generated every day in finance, and there's a lot of data generated in the life sciences. Just to give you a metric, a few years ago, and it has probably increased thanks to ChatGPT, we had about 4,000 public research publications being published in biomedicine every day. These are publications of 15 to 50 pages plus supplementary data. And that's just academic publications, right? So that gives you a little bit of a sense of the scale. The last way in which they're similar is that they are both industries that are very susceptible to being helped by technology, because of the scientific, quantitative approach that both disciplines embed. In finance, you have quants, mathematicians, statisticians. In the life sciences you have chemists, obviously, but then you have biostatisticians, biochemists, physical chemists, all sorts of quantitatively oriented scientists. And so both industries, I think, look at technology to be a differentiating factor and an advantage. So now, how does that work? Obviously everybody's talking about LLMs and AI today, and they've been a big, important part of the life sciences. Everybody was a bit of a Debbie Downer earlier, and I said, well, you should actually be so amazed about the potential this technology has for humanity. And it is true. But let's first think about what it is and how it breaks down. So, what is it? For those of you who are not familiar with the technology, I'm going to give a very, very crude analogy, but it's like a magician. They have a hat. They have all the words of the universe in that hat. They can mix it, pull out a word, show it to you, put it back in, mix it again, pull out another word.
But the magic is not that the hat contains all the words in the universe. The magic is that the words are pulled out in a way that kind of makes sense, and you're like, wow, that sounds really intelligent sometimes. And then some other times, actually, it doesn't. And so, if you think about this for a moment, it kind of reveals what it is not. Well, it's not a database. It's not a reference tool. It's not a tool that embeds logic. It's a process of putting words one after the other. I'm obviously simplifying in many, many ways that would be horrendous to anyone in the artificial intelligence community, but that should give you an idea. And the reason I'm saying this is because, when you think about that, you can now understand the ways in which it breaks down. Number one, it hallucinates. Everybody understands that, right? You ask who's married to whom, and it makes up a name. Or you ask how many digits are in a number, and it gives the wrong count. So, they give incorrect information, but they sound very confident. The second thing they will do, and this is less discussed, is give you incomplete information. I have a child who's a cancer survivor. He had a very rare cancer at age one, and I've been lucky enough to have seen him in remission for the past four years, and we hope it stays that way. But that means I do a lot of clinical research on my own; I go and learn and educate myself. And so, if you go to ChatGPT and ask the question, what are all the drugs that target this protein, in my case a protein called A-Trust that regulates cell proliferation, ChatGPT will give you four or five drugs. Well, guess what? There are 40 drugs and more than 600 chemicals that are known to interact with that protein, but ChatGPT has no idea of that gap.
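To make the magician's-hat analogy concrete, here is a minimal sketch of the sampling loop it describes. Every word, probability, and transition below is invented for illustration; a real model conditions on the entire context with a transformer rather than on one previous word, but the core mechanic, pull a word from the hat, show it, repeat, is the same.

```python
import random

# Toy next-word distributions: given only the previous word, the
# probability of each candidate next word. A real LLM conditions on
# the whole context over a vocabulary of tens of thousands of tokens.
NEXT = {
    "<start>":   {"the": 1.0},
    "the":       {"drug": 0.5, "protein": 0.5},
    "drug":      {"targets": 0.7, "fails": 0.3},
    "protein":   {"regulates": 0.6, "mutates": 0.4},
    "targets":   {"the": 1.0},
    "regulates": {"the": 0.5, "<end>": 0.5},
    "mutates":   {"<end>": 1.0},
    "fails":     {"<end>": 1.0},
}

def generate(max_words: int = 12) -> str:
    word, out = "<start>", []
    while len(out) < max_words:
        dist = NEXT[word]
        word = random.choices(list(dist), weights=list(dist.values()))[0]
        if word == "<end>":
            break
        out.append(word)
    return " ".join(out)

# Sometimes the output sounds sensible ("the drug targets the protein"),
# sometimes it does not -- which is the point of the analogy.
print(generate())
```

Nothing in this loop consults a database, checks a fact, or knows what it has left out, which is exactly why the hallucination and incompleteness failures described above arise.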
Richard Goldman: [00:11:16] And your point is that inaccurate or incomplete information is just as bad as a wrong answer, in some cases.
Stefano Pacifico: [00:11:21] Well, in many scenarios where you need to be that sure about the answer? Absolutely. Like, how far are you from complete? This is epistemic knowledge that these models don't have. Another way in which they break down is that they cannot, as they are, really perform any kind of articulate reasoning. And this is demonstrable in two very simple ways. One, everybody talks about summarisation. But what does it even mean to summarise something? Well, if I ask Rich, what does it mean for you to summarise something, probably it depends. It depends on what it is, on what you're doing, on what your job is. So, when you ask the LLM, can you summarise this, the LLM should come back and tell you, F- you, what do you want me to summarise it for? Right. But it doesn't. It doesn't come back to you. It doesn't tell you that. It just tells you something, because it's not capable of that kind of reasoning, of asking, well, what do you want? The second thing you can see is that if you ask them even easy logical questions, they often break down and give silly answers: you tell them two people are sitting on chairs, ask how many people are sitting on chairs, and they insist there are three. Like, what? So, they break down in ways that are not useful for rigorous analysis. And last but not least, they have no referenceability. You cannot know where the information is coming from. When you talk to ChatGPT, you have no idea why it's saying something. It may be true; it may not be true. In fact, I'll tell you something funny. We recently ran an experiment on a specific type of drug, and ChatGPT mentioned a company, and when we looked at why ChatGPT explained that company was mentioned, it didn't make any sense. It made no sense. Except that when we went and did some research, there was some other information about it that actually fit our criteria and was not mentioned at all. Now, was it chance? Was there some actual probabilistic phenomenon by which it produced that text because of that other data that we didn't see? Who knows? But that's a breakdown, a trust-breakdown problem. And trust is very important in environments that are highly regulated.
Richard Goldman: [00:13:54] And so how do you deal with that in health care, where obviously the risks are very high and you want a high percentage of accuracy?
Stefano Pacifico: [00:14:02] Well, yeah. And again, in the life sciences, before approval you're not directly dealing with a patient, let alone a doctor making a diagnosis. But even earlier, when it's more research oriented, if you will, the problem is exactly that. And the way you deal with it is by understanding what technology you're using, and understanding that the technology by itself is insufficient. So, what we did at Epistemic AI is build a proprietary architecture that we called, with a lot of creativity, EpistemicGPT. EpistemicGPT is an architecture that integrates many different large language models, many different machine learning models, and many different artefacts of software and data engineering. Together, in a multi-agent, reasoning-based framework, they put guardrails in place; they steer the answers into helpful, detailed, referenceable and, for the mathematicians in the room, "mostly complete" answers. So ultimately, what's required is a lot more sophistication for the more sophisticated users.
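The EpistemicGPT architecture itself is proprietary and not detailed here, so the following is only a toy sketch of the general pattern described: generation gated by a verifying "guardrail" agent so that no unreferenced claim reaches the user. All agent names, documents, and stub behaviours are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Claim:
    text: str
    references: List[str]  # provenance; an empty list means unsupported

# Hypothetical mini-corpus; a real system would sit on curated
# biomedical sources behind a retrieval index.
CORPUS = {
    "PMID:0001": "Drug X is known to inhibit protein Y.",
    "PMID:0002": "Protein Y regulates cell proliferation.",
}

def retriever_agent(question: str) -> List[str]:
    """Stub retrieval agent: document IDs sharing a word with the question."""
    q = set(question.lower().split())
    return [pid for pid, text in CORPUS.items() if q & set(text.lower().split())]

def generator_agent(doc_ids: List[str]) -> List[Claim]:
    """Stub generation agent (an LLM in reality): drafts claims from evidence.
    Each retrieved document becomes one referenced claim, and we deliberately
    add one unsupported, hallucinated claim to exercise the guardrail."""
    claims = [Claim(CORPUS[d], [d]) for d in doc_ids]
    claims.append(Claim("Drug X cures disease Z.", []))
    return claims

def guardrail_agent(claims: List[Claim]) -> Tuple[List[Claim], List[Claim]]:
    """Stub verifier agent: quarantine anything that cannot cite a source."""
    kept = [c for c in claims if c.references]
    dropped = [c for c in claims if not c.references]
    return kept, dropped

def answer(question: str) -> List[Claim]:
    kept, dropped = guardrail_agent(generator_agent(retriever_agent(question)))
    return kept  # every surviving claim carries its provenance

for claim in answer("What inhibits protein Y?"):
    print(claim.text, claim.references)
```

The design point is that trust is enforced outside the language model: a claim either carries provenance or is quarantined, regardless of how confident the generator sounded.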
Richard Goldman: [00:15:22] Do you use RAG, or retrieval-augmented generation, at all as part of that solution?
Stefano Pacifico: [00:15:28] I love that question. So, everybody is being swept away by RAG, and it's a phenomenal tool. Again, we're witnessing a revolution, because now mom-and-pop websites can put an actually useful bot in front of their website that can answer using their own pages and give reasonable answers. For the people in the room, RAG is retrieval-augmented generation. It basically means I ask a question to the LLM, your software somehow produces a set of documents that may contain the answer, and then the LLM uses some magic to use those documents as context, or in other ways, to provide an answer. So, it's great for daily uses that two years ago were unthinkable. But when you're looking at 273 reference papers, RAG as it's commonly thought about still breaks down completely: when you have a billion-dollar programme of research, you need to make sure you're not going to get fired for distractedly not considering a piece of evidence that was right there in front of you.
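For anyone who wants the mechanics spelled out, here is a minimal RAG sketch under stated assumptions: the corpus is three invented strings, the retriever is naive word overlap rather than an embedding index, and `llm` is a hypothetical callable standing in for any chat-completion client.

```python
from typing import Callable, List

# Invented three-document corpus standing in for thousands of sources.
DOCS: List[str] = [
    "Protein Y is targeted by roughly forty known drugs.",
    "Soybean futures prices react to weather and trade policy.",
    "Retrieval-augmented generation grounds answers in retrieved text.",
]

def retrieve(question: str, k: int = 2) -> List[str]:
    # Toy retriever: rank documents by word overlap with the question.
    # Production systems use embeddings and an approximate-NN index.
    q = set(question.lower().split())
    ranked = sorted(DOCS, key=lambda d: len(q & set(d.lower().split())), reverse=True)
    return ranked[:k]

def rag_answer(question: str, llm: Callable[[str], str]) -> str:
    context = "\n".join(f"- {d}" for d in retrieve(question))
    prompt = (
        "Answer using ONLY the context below; say 'unknown' if the "
        "answer is not there.\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return llm(prompt)

# Stub LLM so the sketch runs end to end; swap in a real model client.
print(rag_answer("How many drugs target protein Y?", llm=lambda p: "[stub] " + p))
```

The breakdown described above lives in `retrieve`: if the one decisive paper is not among the top-k documents, the model never sees it, and nothing in the pipeline knows it is missing.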
Richard Goldman: [00:16:42] Have you seen anything in applying LLMs in the life sciences that surprised you, and that may be applicable in finance?
Stefano Pacifico: [00:16:51] Transformer models are sequence models. So, everything that is an ordered sequence of things is susceptible to being, "disrupted" is silly, but, you know, to benefiting from this technology, right? So, in the world of the life sciences, one of the things that became surprising is: oh, wow, we can actually use the amino acids that compose a protein, in sequence, to model proteins. And maybe we can hallucinate new proteins, and wow, that's pretty cool. And then it went on: wow, maybe we can even hallucinate probable ways in which these proteins can mutate, or can be mutated, such that something happens. For example, with Covid, you can look at the spike protein and say, well, based on all the ways it's evolved in the past, what should we generate that may be the next mutation? And then there are many other cases in which you can look at sequence problems, for example in a patient population. I have a patient; their history is that they do this, they do that, up and down; they get the blood work, they get their X-rays, and so on. Well, that's a sequence. Can you build sequence models of this? Obviously, there's a lot that can be done. So, I think that going beyond text is absolutely a phenomenal application for pure large language models. But, again, the drum I keep beating, for anybody who didn't hear it yet, is that large language models alone are insufficient for very complex domains like finance or the life sciences.
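As a concrete illustration of the "everything that is a sequence" point: a protein is a string over a 20-letter amino-acid alphabet, so the next-token setup used for text carries over verbatim. The fragment below is the opening of the SARS-CoV-2 spike protein sequence; the pair construction is a toy illustration, not any production protein language model.

```python
# The 20 standard amino acids, one letter each: the "vocabulary".
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")

# Opening residues of the SARS-CoV-2 spike protein.
spike_fragment = "MFVFLVLLPLVSSQCV"
assert set(spike_fragment) <= AMINO_ACIDS

# Tokenise exactly like text (one token per residue) and build the
# next-token prediction pairs a sequence model trains on: predict
# residue i from residues 0..i-1.
tokens = list(spike_fragment)
training_pairs = [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

print(training_pairs[3])  # (['M', 'F', 'V', 'F'], 'L')
```

Sampling from a model trained on such pairs "hallucinates" plausible continuations, which for proteins is a feature: candidate variants to screen, rather than facts to trust.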
Richard Goldman: [00:18:44] Well, we're running out of time. Anything that we forgot to touch on?
Stefano Pacifico: [00:18:49] No. Maybe, you know, get a drink at the bar.
Richard Goldman: [00:18:52] Yeah! So, first of all, thank you.
Stefano Pacifico: No. Thank you.
Richard Goldman: Stefano, that was really interesting. Thank you, everybody, for joining. This will be part of our podcast series; it's sort of a live recorded event. The regular series is great, and I strongly recommend everybody look at it. We actually have its moderator right here, so you can talk to James McDonald at your leisure. Thank you very much for joining us today. I hope this was useful.
Stefano Pacifico: Thank you.