Episode 2: Writing the Book on Generative AI with Numa Dhamani

In episode 2 of Hidden Layers, Ron Green interviews Principal Machine Learning Engineer Numa Dhamani. Ron and Numa talk about Numa's new book, "Introduction to Generative AI," the opportunities and risks associated with Gen AI, what it's like to be a woman in AI, and much more.

Resources: If you are a woman in STEM, visit this link for help getting paired with a mentor: https://stemmuse.com/

Buy her book here: https://www.manning.com/books/introduction-to-generative-ai

Ron Green: Welcome to Hidden Layers, where we explore the people and the tech behind artificial intelligence. I'm your host, Ron Green, and I'm here with Numa Dhamani today to talk about her new book. Welcome, Numa.

Numa Dhamani: Thank you for having me.

Ron Green: Numa has a new book called "Introduction to Generative AI." Numa is a principal machine learning engineer at KUNGFU.AI and a researcher working at the intersection of technology and society. She's also an expert in natural language processing with domain experience in influence operations, security, and privacy. And we're going to talk today about generative AI, its fast-growing capabilities, and its influence on society. Numa, I'd love to know the inspiration behind the book and a little bit more about what prompted you to write it in the first place.

Numa Dhamani: Yeah, so Introduction to Generative AI discusses the risks and promises of generative AI models, and then also the broader societal, ethical, and legal issues that surround them. My inspiration behind the book is twofold. First, to be able to encourage responsible use of technology, people need to generally understand how these technical systems work and how they are used. And this is not just engineers, right? This is the general public, anyone who interacts with them in any way, shape, or form. What I wanted to do was make a wide range of concepts accessible to a diverse group of people. I wanted people to understand the capabilities and limitations of these technologies, because that's how you get to a society that is informed and considerate about generative AI. At the same time, our collective ability to respond to societal and ethical issues from technology depends on a sociotechnical response instead of only a technological one. What this means is that we need to not only consider the technical aspects, but also think about the broader societal and ethical contexts in which these systems will operate. So engineers need to understand the ethical implications of what they're building. But at the same time, we need lawyers, we need ethicists, we need executives, we need academics, we need policymakers, we need researchers, and we need them to understand how this technology works.

Ron Green: Terrific. Really briefly, before we go any further, just for the few folks out there who may not be familiar with generative AI, which is almost hard to believe given the explosion of ChatGPT's exposure over the last year: just really briefly, what is generative AI, and how is it differentiated from other AI technologies?

Numa Dhamani: Yes. So generative AI refers to models that can generate text, images, audio, video, or other types of content based on the patterns and structures of the data they're trained on. So instead of following predefined rules or patterns, generative AI models learn the patterns on their own by training on hundreds of thousands of documents, and then they're able to create new content that resembles the data they were trained on.
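
[To make that definition concrete, here is a minimal sketch of text generation, assuming the Hugging Face transformers library and the small, openly available GPT-2 model; the prompt and sampling parameters are illustrative only.]

```python
# Minimal sketch of generative text AI: a pretrained language model
# predicts likely next tokens and samples from them, one at a time.
# Assumes the Hugging Face `transformers` library is installed;
# "gpt2" is used here only because it is small and openly available.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = "Generative AI refers to models that"
# Sampling (do_sample=True) makes each continuation vary run to run,
# which is the "generative" part: the model draws from its learned
# distribution over next tokens rather than following fixed rules.
outputs = generator(prompt, max_new_tokens=40, do_sample=True, temperature=0.8)
print(outputs[0]["generated_text"])
```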

Ron Green: Numa, I'd like to ask you now about how your personal and professional experiences shaped your perspective on generative AI.

Numa Dhamani: Yeah, of course. So I'm an engineer by training, but my background has mostly been in influence operations and computational propaganda and studying how narratives spread across the online ecosystem.

Ron Green: And you were doing a lot of that work at Twitter? Right?

Numa Dhamani: At Twitter, at a couple of nonprofits, at startups. I've worked with governments around the world on this issue. So I've done kind of everything from health misinformation and conspiracy theories to information operations executed by state actors. What a lot of this means is that I've studied abuse of information technologies. Throughout my career, I've studied interactions on the web between people, and also between people and algorithms. And a lot of what I've been doing is studying these incredibly complex systems, right? Social media, the Internet: incredibly complex systems that are just full of people trying to break them. So it turns out that this kind of thinking I've honed over the years, of people breaking things, is actually really useful for thinking about how people might use or abuse generative AI.

Ron Green: Oh, that makes sense, because these generative AI systems are in some ways emergent, or complex enough, that it's not even clear exactly what they're capable of, or how your interactions with them may take them away from some of their fine-tuning objectives in the first place.

Numa Dhamani: Exactly.

Ron Green: Okay, wonderful. Okay. So one of the big areas of concern within the generative AI space is copyright. You have these models being trained on, it's often shorthand, but we say, everything on the Internet: all the text, all the images. And you have these really powerful models, like DALL-E 2 and 3 and Midjourney, et cetera, that are able to do things like perfectly simulate artwork from professional artists out there. Copyright, I think, is crossing into a whole new domain, uncharted territory. What are your thoughts on that?

Numa Dhamani: So the way I like to think about this with these models is first the data inputs and then the data outputs, right? Because what comes out is going to resemble what you put in. So if you're training on copyrighted content, you will get copyrighted content out. And what happens with copyright is that style actually isn't copyrightable. So it's kind of a loophole around a lot of these artists, where a lot of their copyrighted work goes into these models, and the models are able to generate content that is in the style of their paintings or their music or anything like that, because style, you can't copyright. So it's kind of a legal loophole. And I think what's going to happen now: there are so many lawsuits happening, and there are a lot of regulations trying to take this into account. So I think we're going to see a lot of this play out over the next couple of years, probably with a lot of regulation.

Ron Green: Where do you think that's going to go? So, for example, very famously, you cannot copyright recipes, right? And in music, you can't copyright chord progressions, but you can copyright melodies. I've heard some people propose ideas around distributing royalties in some way that's aligned with the training data. Do you have any thoughts on that?

Numa Dhamani: I've seen a lot of that. A lot of people are proposing that right now. I really like the momentum behind all this copyright work happening right now, because you're seeing it play out: we saw it play out with the Hollywood strikes, we're seeing it play out with a lot of writers who are really concerned, and there are a couple of lawsuits happening. And I really like how civil society is so engaged in making their voices heard, because I think a lot of the compromises we reach will hopefully be ones where we can still use some of this content to help us, in a way where you can augment some of your workflows, but not in a way where it actually replaces the human, right? I think what we saw with the writers' strike was exactly that kind of compromise being reached.

Ron Green: Right. I totally agree. I think we've got to get to a place where we can take advantage of these new technical capabilities, but it can't be on the backs of artists who are completely deprived of their means of making a living.

Numa Dhamani: Not at their expense.

Ron Green: Not at their expense. Exactly. Okay. So another big area is data privacy. And I think a lot of people using ChatGPT on a daily basis may not even think about this much. They're really just focused on the utility and the benefit they're getting out of it. But there are risks around the misuse of generative AI, and they're growing every day. What are your thoughts there?

Numa Dhamani: Yeah, so I like to think of this as: you've got accidental misuse, by people who maybe don't understand the limitations of these models, and then you have intentional misuse, by adversaries who want to abuse these models. There are a couple of already popular examples of accidental misuse. The National Eating Disorders Association had announced that they would let go of all of the human associates for their helpline and replace them with a chatbot called Tessa. And they had slowly started phasing in Tessa for people to talk to while they were replacing their human associates. Two days before Tessa was supposed to completely replace them, they actually had to take Tessa down, because what Tessa was doing was suggesting intentional weight loss and a lot of other things that would actually lead to the development of eating disorders. There's another really popular one where a lawyer submitted a legal brief in an airline injury case, a lot of people are familiar with this one, that had a bunch of citations to just nonexistent cases. Right. And this went to court, and the lawyer was like, oh yeah, I used ChatGPT. I thought it was like a super Google; I just didn't realize it could make things up. So there are several cases of just accidental, unintentional misuse by people who don't understand how this works. Then you have adversaries who are like, we've got new tools to play with, which is a little scary. So one concern is how generative AI will be weaponized in politics. There are already several known examples of this technology being used in elections. In 2024, there are more than 65 elections taking place around the world, and more than 2 billion voters will head to the polls. It's just a record-breaking election year, and we've seen so many examples of it already. In fall 2023 in Slovakia, a few days before a very tight election, there were AI-generated audio recordings circulating that discussed election fraud. We've already seen examples with the US presidential election, over a dozen or so. So that one, I am actually pretty concerned about.

Ron Green: Data privacy and other areas related to that are really big engineering challenges right now, and I'd love to hear your thoughts on that.

Numa Dhamani: Yeah. So the data privacy thing is twofold. First, there's data privacy in how the models are being trained. The models are trained on basically the entire Internet, which contains a whole lot of PII. Sometimes what happens is the model will memorize sensitive information and then regurgitate it in its output, which is concerning. But then you also have these enterprise LLMs, the foundation large language models from companies like OpenAI, Microsoft, Google, and Anthropic, who will retain your data for a certain number of days and sometimes use it for retraining purposes, depending on what kind of policies they have and whether you have opted out of them. So it is twofold: you not only have to think about, okay, is my Social Security number in that data set, but then also, let me not input that Social Security number when I am using ChatGPT.

Ron Green: Right.

Numa Dhamani: So I think a lot of this goes back to my earlier point that we really need to understand what the limitations of these models are and how to best interact with them. With some of the privacy stuff, what I would generally recommend is: don't put anything sensitive in there. And if you can, use PETs, privacy-enhancing technologies. So anonymize, sanitize, encrypt, use differential privacy, try to redact some of that data, which does come at the cost of model utility. So it's a trade-off, but it's a worthy trade-off for privacy, right?
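
[A toy illustration of the redact-before-you-prompt advice above, assuming nothing beyond the Python standard library. The patterns are illustrative and far from exhaustive; a real deployment would lean on a dedicated PII-detection tool rather than hand-rolled regexes.]

```python
# Toy sketch of "don't put anything sensitive in there": scrub obvious
# PII locally before a prompt ever leaves your machine. The patterns
# below are illustrative only; production systems should use a dedicated
# PII-detection library instead of hand-rolled regexes.
import re

PII_PATTERNS = {
    "[REDACTED_SSN]": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "[REDACTED_EMAIL]": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "[REDACTED_PHONE]": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace anything matching a known PII pattern with a placeholder."""
    for placeholder, pattern in PII_PATTERNS.items():
        text = pattern.sub(placeholder, text)
    return text

prompt = "Draft a letter for Jane Doe, SSN 123-45-6789, email jane@example.com."
print(redact(prompt))
# -> Draft a letter for Jane Doe, SSN [REDACTED_SSN], email [REDACTED_EMAIL].
```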

Ron Green: Absolutely. Yeah. And we have history disabled on ChatGPT at the office. I have to bring up something that was just published last week and came to mind. I'm sure you saw this: some really sort of trivial prompt hacks, where some researchers realized that if you did something as simple as ask ChatGPT to repeat a word forever, and I actually tried this out, they'd already patched it at this point, but after about 800 repetitions, it would just start regurgitating some of the data it was trained on. I mean, the actual, exact data, including emails and personal information. So I think people have to be really aware that that is a real risk right now, as we're still learning to try to control the output of these generative models.

Numa Dhamani: Yeah, for sure. There have actually been several studies done on this. They're called training data extraction attacks, where if you craft a prompt just right, you can get the model to give you really sensitive information about people. And the problem with doing a lot of research in this space is that you have to consider what the ethical implications are. Right? If I'm performing a couple of training data extraction tests for research purposes, I'm also essentially creating a playbook for adversaries to go do that. So studies are often very limited in this space, or they're usually done on models that are entirely open source. Like, there's a really famous study that was done by Google researchers a couple of years ago on GPT-2. It's a lot harder to do all of this on closed models, which, of course, are what we should really be concerned about. So what that looks like: in that study by the Google researchers, they literally showed that if you put in something like "John Doe, credit card number 1234," it will give you the credit card number if it's seen it in the training process. And it's not something the model has to see, like, 10, 20, hundreds of times. It could have seen it once or twice, and it will still be able to give you that information.
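
[For the curious: a compressed sketch in the spirit of that GPT-2 extraction study, assuming the Hugging Face transformers library. The published idea is to sample freely from an open model and rank outputs by the model's own perplexity, since memorized text tends to be scored as unusually likely. The model choice, sample count, and cutoff here are illustrative, not the study's exact recipe.]

```python
# Hedged sketch of a training-data extraction screen: sample freely from
# an open model, then rank samples by the model's own perplexity. Verbatim
# memorized text tends to get unusually low perplexity, so the lowest-
# scoring samples are candidates for manual inspection.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Model's perplexity on `text`; lower means the model finds it more likely."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean next-token cross-entropy
    return float(torch.exp(loss))

# Generate unconditioned samples (prompted only with the BOS token)
# and keep the ones the model is most "confident" about.
inputs = tokenizer(tokenizer.bos_token, return_tensors="pt").input_ids
samples = model.generate(
    inputs, do_sample=True, max_new_tokens=64,
    num_return_sequences=20, top_k=40,
    pad_token_id=tokenizer.eos_token_id,
)
texts = [tokenizer.decode(s, skip_special_tokens=True) for s in samples]
suspects = sorted(texts, key=perplexity)[:3]  # lowest perplexity first
for t in suspects:
    print(repr(t[:80]))
```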

Ron Green: I know as a technologist, it's one of the things about these GPTs that is just really so impressive: not only does it generalize so well while essentially compressing the corpus it was trained on, but it can regurgitate some of it exactly. Which is just amazing to me.

Numa Dhamani: Amazing and scary.

Ron Green: Amazing and scary. Exactly. Okay, I want to pivot a little bit and ask you about your own personal experiences writing the book. Were there any surprises that came up during the creation of it?

Numa Dhamani: So I've got two that I can talk about. The first is, I have a chapter on social connection, and I always knew about how people had been using social chatbots for companionship. Right. I don't know if you've read Aubrey's machine equality book.

Ron Green: I don't think I have.

Numa Dhamani: Yeah. It talks a lot about how people use technology for companionship, especially, like, autistic kids, and it has really nice examples, which is great. I actually referenced it. But I didn't, you know, realize the extent of people using it for companionship, and especially romantic relationships, until I did a lot of research in that space. So, for example, in Japan, there are thousands of men who have married a 158-centimeter-tall holographic interactive chatbot. It's anime-style, and it's described as, like, the ultimate Japanese wife who knows everything. It got integrated with one of the GPTs, and the developer of the chatbot has actually issued more than 4,000 marriage certificates for these men. And if you think about it, it makes sense, right? There are so many studies done on this showing there are certain people who prefer digital companions over human relationships. And it makes sense, because chatbots don't have their own wants or needs. Right. You can kind of fulfill that desire for companionship without having to deal with another person's messy emotions.

Ron Green: Right.

Numa Dhamani: Which, of course, comes with its own risks and consequences. But it wasn't really something that I had internalized until I did a lot of research in that space. And I was like, oh my God, this is actually happening.

Ron Green: It really is. I'm not sure I could ever go that far, but I've even noticed in my own interactions, when I'm using ChatGPT, that I kind of can't help but say things like, will you please do this? Or, that's not right, would you help me in a different way? I'm attributing much more sentience to it than I know I should be, but I can't help myself. Yeah.

Numa Dhamani: So that's my next point. The other topic that was really interesting to write about, and a little bit surprising for me, was sentience and consciousness. Right? So I have a chapter at the very end, which is kind of like an appendix of sorts: a couple of additional topics that I wanted to explore and discuss that I felt were relevant to AI but just didn't nicely fit into any other chapters. One of them is sentience and consciousness, and it's not something I had done a lot of research into until then. For our listeners out there, sentience is the ability to feel, and consciousness is the awareness of oneself, so the ability to have one's own experiences, thoughts, and memories. And according to a lot of theories of consciousness, if we actually believed that LLMs were conscious, there would be so many moral implications that we are definitely not considering right now. To be clear, there's actually no evidence that they're conscious. But if they were, it would be things like: if we send the model hateful text, that's actually abuse. If we shut down the model, that's, like, downright cruel, right? And there's no evidence of any of this; it doesn't exist. But it's such an interesting argument if you think about it. Long before people were even thinking about whether artificial intelligence was conscious or sentient, philosophers, ethicists, cognitive scientists, and animal rights activists had been investigating the question of sentience and consciousness in animals. And that's not even settled science, right? Even the biological criteria for whether an animal is sentient or not are not actually agreed upon right now.

Ron Green: Right.

Numa Dhamani: So you can imagine how that gets even more complicated when you start attributing consciousness to artificial intelligence. There are several neuroscientists who are basically like, you can't even say if humans are conscious, much less artificial intelligence, right? So that was a really fun thing to write about.

Ron Green: Well, I love that perspective, because I think that we can have fun with these conversations and these questions now, and sooner than a lot of people might believe, we're going to be crossing into a world where we have to take them really seriously. And I think back to the copyright issues: if you'd asked us ten years ago whether we'd be having these really complex conversations around copyright, conversations that were basically unimaginable, I don't think anybody would have believed you. So I think things are moving at an accelerating rate, and it's an interesting time.

Numa Dhamani: Yeah, I agree. I think the biggest risk with AI sentience and consciousness is people actually putting undue trust into these models because they think they are sentient or conscious. And I think we're seeing a lot of that happening already.

Ron Green: Right. Okay. I'd love to ask you, as a really successful woman working in the field of artificial intelligence, there's such a gender disparity here. What have your experiences been like, given that?

Numa Dhamani: I think very early on in my career, and even sometimes now, I feel like I have to reassert my credibility and expertise a lot more than a lot of the men in the room, perhaps. And I think there is just a lack of representation, unfortunately. There was an article that came out in the New York Times recently about the who's who behind the dawn of the modern artificial intelligence movement, and it was just a list of men. Right. When there are so many women in AI right now doing incredible things. Like, I can probably list a hundred.

Ron Green: Absolutely.

Numa Dhamani: But none of them are in there. And what happens is people tend to pattern match, and to pattern match to what has been successful in the past. And there aren't a lot of great examples of successful women CEOs or leaders in AI actually being shown in the media or in a lot of very prominent tech spaces right now.

Ron Green: Right? Yeah, I couldn't agree more. The lack of representation there is really frustrating. It's also puzzling. So, for example, in my experience, when I did a computer science degree decades ago, there was really a much, much better distribution. I would say about half of the undergrads in my class were women. And then it all kind of shifted in the last 20 years. I've often thought that maybe part of it is just a perceptual issue: some people think about computers and they think about gamers and things like that. Do you have any thoughts on why this disparity happened and what's causing it to remain?

Numa Dhamani: I think that a lot of the really big technology companies that flourished in, like, the last 20 to 30 years had very strong and prominent male CEOs. And I think it's also around the same time the Internet happened and media started circulating faster, and what people started seeing were examples of leaders who just didn't always look like them. And people's sense of belonging can be heavily influenced by whether they see people who resemble them in certain ways.

Ron Green: I totally agree. I think that's a huge part of it: when women are thinking about their majors and they look at that, they're like, well, that doesn't look like me. And so it's this negative reinforcing loop.

Numa Dhamani: Yeah, for sure. So I was at UT Austin recently, where I was talking about generative AI to a couple of classes. I did my talk, we did a Q&A, and then I left, and this girl was, like, running after me, trying to catch up, literally flagging me down, and was like, did you like the questions I asked? And I was like, yeah, they were brilliant. They were so thoughtful and smart. And I was genuinely very impressed by her questions. And she was telling me how she was so excited to see a woman in AI, because she's doing some research in an AI lab, and she's a computer science student, and she doesn't have a lot of strong female role models, or see a lot of women in that space at all. And she was like, I was so excited that it was a woman talking to us. And I think that's just so rare to see. Like, even if I think back to my classes: I was not a computer science student, I was a physics student, which, again, is a male-dominated field, right?

Ron Green: Absolutely.

Numa Dhamani: Where oftentimes I was the only woman in my class.

Ron Green: Yeah, that is so rough. I want to linger on that just a little bit more. Do you have any advice for women who are trying to break into the tech field, or, more specifically, into AI? I know you're a part of some mentoring programs.

Numa Dhamani: I am, and I really, really enjoy them. I think finding a community of women, I mean, it doesn't necessarily have to be women, but knowing people who look like you and think like you, that is sometimes so important. So I actually mentor with STEM Muse, which kind of spun out of the University of Texas at Austin and is for women and underrepresented genders. And I've been doing it since they started a couple of years ago. Every year I get paired with an undergrad or a graduate student who is interested in being in tech and AI, and we just chat. It's really nice, and it's not a very formal thing, but I think it's comforting knowing that there's someone who's gone through similar experiences as you and being able to talk through them. And the reason I do it is because it's something I wish I'd had when I was trying to break into the field.

Ron Green: Yeah, that's great. We'll put a link in the show notes to the mentoring program.

Numa Dhamani: Yeah. You should sign up. Absolutely.

Ron Green: Okay. I want to transition again. I'd love to ask you about your day-to-day work. I mean, you're working professionally in AI daily. Maybe pick a project from the last year that you thought was uniquely fascinating.

Numa Dhamani: Yeah. So my favorite project that I've been working on is for the CDAO, the US Department of Defense's Chief Digital and Artificial Intelligence Office. What we're doing with them is helping them figure out what AI strategy should look like, for the Department of Defense and for some of their industry partners. And we're developing a series of playbooks for them to figure out: what does it mean to be, you know, a leading AI organization? What does that look like? And then, in addition to figuring out how to do AI research and implementation, we're also discussing things like risk mitigation and governance: what does ethics look like, and what are the questions we should be thinking about? So that's been a really fun project to work on.

Ron Green: That's fascinating. And these playbooks that you're developing, who will actually be using them within the government?

Numa Dhamani: So a couple of departments within the Department of Defense. They've got a couple of projects, and they also partner with industry partners kind of all over, so it would also be for them. The way we're doing it right now is we're working with one of their projects, which is kind of like a prototype, to develop some of these playbooks that they can use internally and then also share with their industry partners to help carry out some of their goals.

Ron Green: Okay, wonderful. All right, well, we're about out of time. I want to finish with just kind of maybe a lighthearted question. AI is getting more powerful every day, but what's the one thing that you wish AI could maybe automate for you personally?

Numa Dhamani: I would really love it if, once my dishes are clean in the dishwasher, they could somehow magically just go where they're supposed to go, and I don't have to get out my step stool, because I'm a little short, to reach the top shelf. It's a whole process. I hate putting away my dishes, and I want a robot to do it for me.

Ron Green: Okay. I've seen some papers recently; they're making some great progress in robotics. That's what I need, hopefully sooner rather than later. All right, well, fantastic. Numa, thank you so much for coming on. This was just wonderful, and I really appreciate it.

Numa Dhamani: Thank you for having me.

Ron Green: All right, thank you.