Episode 8 Transcript

Sandy Pentland: The area of privacy and data ownership is the main thing that I’m trying to sort of push on. So we have things like GDPR, the California privacy protocols, things like that. But having rights over your data doesn’t really do much for you. It’s just like, bunches of bits, right? What do you do? And we have this problem that some small number of organizations have huge amounts of data, very unequal. And I think that the solution to that is the area that web science and people ought to think about, which is how can people take control of their data to get the medical service, the government, etc, that they want? 

Noshir Contractor: Welcome to this episode of Untangling The Web, a podcast of the Web Science Trust. I am Noshir Contractor and I will be your host today. On this podcast we bring thought leaders to explore how the web is shaping society and how society in turn is shaping the web.

Just a moment ago, you heard Professor Alex “Sandy” Pentland talk about the intersection of data and privacy, which is just one of the areas he studies. Sandy is one of the most cited web scholars at the crossroads of web science, network science, and computational social science. He’s a professor of Media Arts and Sciences at MIT, and directs the MIT Connection Science Research Initiative. He also helped create and direct the MIT Media Lab and Media Lab Asia in Mumbai, India. He heads MIT’s Human Dynamics Group, which is one of two groups at MIT that is a member of the Web Science Trust global network of laboratories. And he co-leads the World Economic Forum Big Data and Personal Data initiatives. In 2011, Forbes named Sandy as one of the seven most powerful data scientists in the world, putting him in company with the then-CEO of Google, Larry Page. His work has pioneered organizational engineering, wearable computing, and modern biometrics, among other things. Welcome, Sandy.

Sandy Pentland: Hey, thank you for inviting me.

Noshir Contractor: Delighted that you’re able to join us here today! I want to start by asking you to tell us a little bit about how this incredible work that you’ve been doing on the web and how the web is shaping society? To what extent and how did you get interested in looking at the web? Because I do know, your scholarship, even before you were looking at the web, you had already made a name for yourself in areas such as image recognition, etc. What got you interested in looking at the web? And how did you get started in that?

Sandy Pentland: Well, I’ve always been interested in human interaction, human perception. And, you know, sort of around the end of the 90s, I was setting up laboratories in India, we were living in India, trying to set up sort of things like the (MIT) Media Lab, but in India, and I noticed that the Board of Directors we had was terrible. And it wasn’t that they weren’t smart. It’s just that they had too much charisma, too much personal force. And I got interested in how did that sort of nonlinguistic, the sort of style of speaking, change decision-making. And then that was this sort of honest signals work, that that, you know, people are aware of, and turns out that you can do things like get early warnings of depression and things like that using this. But when mobile phones came along in the sort of mid 2000s, we started using those. And, of course, that’s part of the mobile web, that is the mobile web. And so suddenly, we were looking at not just two or three people talking, but hundreds of people talking and even more. And I did experiments like looking at how communities make decisions by looking at their face-to-face interaction, as well as their digital interaction. And of course, as the web exploded, and you got lots more video conferencing, and things like that, it just became web science, right?

Noshir Contractor: You were one of the early people who looked at the web as an opportunity to be able to study signals, your book, Honest Signals was very influential in making that point. And more generally, social signals that you looked at, etc. One of the things that strikes me about your work is that you are amongst the first that looked at the web as an opportunity to be able to get data about human interaction and perceptions.

And at the same time, your work has also been equally influential in recognizing the web as a source of concern in terms of privacy, etc. Can you share with us a little bit about how you straddle and how you reconcile these issues in your own scholarship, and what you advise policymakers in this context.

Sandy Pentland: Now, the core attitude I bring is one of science. I’d like to understand, particularly human nature, and how it is that we learn, make decisions, form society. And what you see very early on, when you begin looking at this, is that people form into cliques around topics. So there’s your buddies, the people you talk to, and then people go off and explore to find other sorts of opportunities. And that, that exploration is critical for development of a community but the interaction of the community, the separateness from the rest of society, is critical for people developing modes of operation or norms of operation. 

As you begin to look at this, you realize, “Oh, my God, I can tell who this guy’s friends are. And I can tell who the boss is. And I can tell that he broke up with this person over there.” And so this brings all of the sort of classic privacy concerns. And it’s actually a lot more concerning than, say, Twitter or something like that, or Facebook, because records from cell phones in particular, right, are where you actually were. So it’s who you actually spend time with, not what you say about it. And it’s extremely illuminating for people’s, not just personality, but their social structure, what they believe, where they’re going, independent of what they say. So that led me to start the discussion at Davos that led to GDPR, and I have been developing technologies to be able to preserve the good parts, which is the communication, the community building, the exploration for new ideas, but without having the downside of privacy and security risk.

Noshir Contractor: Well, one of the things that you must have thought about is also the variation in norms around the world on what constitutes privacy. Clearly, you know, some of the things that we see in GDPR, for example, may not be very appealing to even a US audience, and certainly would be different from what would appeal to the Chinese or the Russians, to just name two. What are your thoughts about the extent to which these kinds of policies need to be responsive to specific countries and cultures? Or do you have the belief that there are certain human rights, basic fundamental principles of privacy, that should be true for everyone on the planet?

Sandy Pentland: There is a certain way of thinking about it, which is universal, it has to do with human nature. But there are variations. And so what I see is that privacy fundamentally, is about individual freedom: can you make, learn things, work with people, do things without any interference, and also without other people knowing, so that you feel more free to try things that you might not want to talk about in public, and, you know, this doesn’t have to be dark. This is like, you know, I’m gonna date this person, but I don’t want that written on my record for the rest of my life, because it may not work out, right. 

But there’s a big axis in societies that has to do with individualism versus the social fabric. So in the United States, we’re extremely towards the individualistic side. In Eastern culture, it’s much more for the good of the group. But the issues are always the same. It’s just there’s a control knob that has to do with the value of the collective versus the value of the individual. And one of the things that people get wrong about this, I’m talking about the law, as well as the large discussions that people have, is that a certain amount of clannishness is key for social support, for mental health. You’ve got to have your buds that support you, you got to have people you bounce ideas off, if you don’t have that, people go haywire. They really do. It’s not a tiny thing. The biggest predictor of mental health is social interaction.

That’s why solitary confinement is this horrible, horrible sort of punishment. But the question is, what do you mean by social? Is it just your gang, a small group of people? Is it the people in your neighborhood? Is it, you know, your government? Different societies, different cultures have very different answers to that. But we shouldn’t forget that there needs to be a circle of trust for a human to be healthy.

Noshir Contractor: That’s really a very interesting way of addressing these variations that we see around the world. We’ve talked a lot about phones and mobile phones and how that is protected. That became very influential in getting you started looking at these kinds of signals. You’ve also been one of the pioneers of these intelligent wearable devices, and sociometric badges comes to mind. Can you talk a little bit about what led you to that, and what you see moving forward about the future of those kinds of devices as a way of tapping into social signals, but also maybe providing you some feedback on the basis of that?

Sandy Pentland: Yeah. So I started the wearables group, which was really sort of the pioneering wearables in society type of group. So we had people running around with displays on their head and computers in their backpack, and stuff like that. And it was a response to the realization that we were going to have wireless communication (at that point, we didn’t have WiFi, or even cell phones) and that computation was going to get very small. And we wanted to experiment with what happens if your glasses had displays in them, what happens if things could whisper in your ear, etc, etc, etc. What quickly became clear is, first of all, things on your body have a social dimension that computer screens don’t. You present yourself, you want to look attractive, you want to look credible. So what you wear is really important. And also, the second thing was, the main things you do in your physical world are social things, not information tasks. And so connecting with people better, being more responsive, being more of who you want to be, those are the things that really take off. And people keep forgetting this. So what we saw, what I’ve seen is very slowly, people are figuring out ways to incorporate this in social interactions, in the real texture of your life. One of the main barriers has been batteries, for God’s sakes, yes, it turns out the battery technology is critical, because you can’t be like, you know, recharging things all the time. There were mistakes like Google Glass, which actually is a brilliant idea. But then by putting the camera on it and making it look all space age-y and George Jetson, people revolted. On the other hand, something that reminded me of your name, when I met you, again, would be pretty awesome for most of us, right? Or directions so that you don’t get lost. So what’s happening with that is that it’s being driven by health concerns now, of course, monitoring yourself for COVID, or just staying healthy.

And then also, people being at home more, they’re interested in things that maybe aren’t exactly wearable, but are very different formats, from what we use now. And so we’re going to see a lot of creativity in these wearable things, driven probably principally by health concerns.

Noshir Contractor: Wearable devices have now become an extended part of the web, of the mobile web, if you may.

Sandy Pentland: Wearable devices, payment devices, things for managing traffic, and so forth. And of course, the downside of that is privacy, because now there’s a lot more information about you. And cyberattack, because the surface for attack is growing exponentially. Which means, you know, taking down all the payments or using the payment machine to get at the core bank or all this sort of cyber attack stuff is going to increase by an order of magnitude at least.

Noshir Contractor: You published recently, some research looking at blockchain transactions. Can you talk a little bit about how that research should be really important and salient for those interested in web science?

Sandy Pentland: Yeah, so we published something, a book, actually — MIT Press — called Trusted Data. And what it does is it lays out the architecture that you need to have to survive in this coming era, much greater cyber attacks, IoT, and other sorts of problems, echo chambers, etc. And the core thing is that the web was designed as a communication medium, getting bits from here to there, not 100% reliable, but cheap, good, fast. But it was not designed as a transaction medium, the sort of thing where you know, I pay you a little bit of money, you do a service, and if you screw up, I can sue you, and, you know, I mean that sort of legally binding real transactions, it’s terrible at it, because you don’t know if the messages get through there. They don’t have standing in legal courts. And so what’s happening now with IoT, Internet of Things, blockchain — don’t think Bitcoin, think just ledgers that record stuff in a, in an immutable way, a very serious way — and AI also is you’re getting the evolution of the web, from a communication medium to a transaction medium. 

And you can get a picture of this on Amazon, right? You know, you like, ask for things, it shows you, you click, you bought it. Right? One click —  all that. And that’s because within that tiny walled garden, they can take care of all the security and who you are, and payments and contractual things. Imagine that was on the web as a whole. So you know, you could say, design and build a house with one click, because it would go off and find the architect and the architect would put the plans in the mix, and the computers would merge it and find a build. I mean, you can imagine a world that is almost magical, because things would happen reliably auditable, traceable, legally-binding, fair, you know, you can build that in there. 

And to help do that, we’ve set up these sort of protocols that have blockchain and stuff in them. And I’ve gotten the European Union to adopt this as their core architecture for data. And also a number of large organizations like Fidelity now uses this architecture. Intuit uses an architecture like this, other companies that handle a lot of our life. 

One of the aspects of it that is really interesting is that law is turning out to be a network of web science. Because as you get more things on the web, more of it has to do with legal rights and complaints and resolutions and so forth. So I just launched a thing called law.mit.edu, which is an alliance of law schools around the world, to think about how law can make the transition to this sort of digital age, because it’s not obvious. I mean, there’s so much that has to be human-centric or human-centered in law, human judgment. And that’s under threat as things become more computerized. But you have to become more computerized to deal with this much more extensive digital environment. And so resolving that conflict is something that I wanted lawyers and computational people to think about together, so that we don’t end up in some horrible place.

Noshir Contractor: That shows again, why web science is so fundamentally an interdisciplinary initiative. And that requires us to think systemically across all disciplines to address and understand and enable the web as we know it now. You’ve already touched on many things that need more progress. But if I were to ask you, amongst the many things that you’ve been thinking about, what are some of the areas where you believe we have seen the least progress in web science? And that you see we need to be able to put much more emphasis in the near future?

Sandy Pentland: Well, the area of privacy and data ownership is the main thing that I’m trying to sort of push on. So we have things like GDPR, the California privacy protocols, things like that. But having rights over your data doesn’t really do much for you. It’s just like, bunches of bits, right? What do you do? And we have this problem that some small number of organizations have huge amounts of data, very unequal. And I think that the solution to that is the area that web science and people ought to think about, which is how can people take control of their data to get the medical service, the government, etc, that they want? It’s not a matter of money. Money is a distraction. It’s really, are you getting the care, the government and the opportunities that you ought to? All of this sort of anti-racism stuff revolves around this because this community wants to be treated fairly. Well, what does that even mean? How is it being treated? They don’t have the data. What is beginning to happen, now, is you see data cooperatives, data unions forming, or a community or a neighborhood. Everybody doesn’t give away their data, they set up something like a credit union that holds their data for them. So they still own their data. But now the data is in a place in this credit union, this data union, where they can analyze it and ask, are we getting the same medical services as those guys over there, because they have data from lots of people?

Data is the new resource, like labor or capital. We have credit unions, those came about as agricultural credit unions in the 1870s, 1860s. We have labor unions, those came around the 1900s. And now we need data unions, where groups of people pool their assets, which they have the legal right to do under GDPR and CCPA, to be able to stand up for themselves. And I think that sort of forming of community, rebuilding trust around facts, around data will have a transformative effect.

Because as people work together, to have freedom, fairness, power to thrive, they build trust with each other, it reconstructs a lot of what is damaged today. Community should have the right, more, of self determination. And that will help solve a lot of the problems out there. That’s fundamentally a web science type of issue.

Noshir Contractor: Yeah, you raise a really good point that we have spent a lot of time and energy focusing on being able to have access to our own data. But what you’re pointing out, Sandy, is that that is a necessary but not sufficient condition for us to reap the benefits, collectively, of owning our data.

Sandy Pentland: Exactly. So at the beginning, I talked about this axis of individualism to social fabric. Even in places like the US, which are very individualistic, the community that you’re part of, and you get to choose what community, the community that you’re part of is necessary for your support, for you to thrive. And we tend not to recognize that. And I think the time is now where people are waking up to the fact that we need to pool assets locally among our community, to be able to reinvent ourselves and get what we need.

Noshir Contractor: I think that’s really on point. In closing, Sandy, I wonder if you could reflect a bit on our current situation, whether you’re thinking of COVID-19, or about the race issues that we’ve been confronting as a society and globally, in fact. How would this have been different if it were not for the web?

Sandy Pentland: An interesting observation is that our large scale, non-web administrative structures (and I’m not talking about federal government only, all the way down) were unprepared and uncoordinated. And part of that is that different places, they have different culture. Texas is different than Massachusetts, I’ll tell you, oh, Montana is different than New York City. It really is. Not just a culture. But the physical. And what’s ended up happening is the more effective things have been things that were adaptive to local communities. And you see the same thing in the science that’s going on.

So there’s an explosion of science to find vaccines and treatments for COVID. But none of that is NIH or CDC or WHO project. That’s all local research groups working over the web, to find new solutions that they didn’t write any proposals for. Now the NIH furnished those labs. So they’ve provided the ground and the tools, but the actual research direction, that’s people grassroots pulling together, and the web is the thing that enables it. Alright. So what they’re doing is a dynamic learning network. So the web as learning network, and that is the way we’re making lightning speed progress on this problem, not big government programs. The big government programs set the infrastructure, just like the big government program helped invent the web. But it was these grassroots things that allow this sort of agile, locally adaptive exploration that’s gonna save us.

Noshir Contractor: It again, highlights the fact that even though the web was not originally invented with a particular set of activities in mind, certainly, perhaps not the activities that we’re experiencing right now, it has served us well in being able to, as you said, at the grassroots level, dynamically be able to coordinate large groups of people and large communities to come together and mobilize in helping us address these challenges.

Sandy Pentland: Yeah, I think maybe two parting bits. One is, you know, the birth of the World Wide Web was research projects, distributed research projects that weren’t directed from the top, they came from the bottom. And the second thing is, is what we’re doing right now, would have been impossible five years ago. We have constructed a web that supports all this sort of stuff, just in time for the pandemic, it’s, it’s, it’s crazy. If this had happened 10 years ago, none of that zooming stuff would have been plausible. Lots of these other things just wouldn’t have happened. And so by luck, or whatever, we find ourselves in a world that is, is webified, for better and for worse, but a lot of it is for good because now we can learn faster, we can adapt. We can do things together in ways that just simply weren’t possible until just very recently.

Noshir Contractor: Thank you again, Sandy, for taking time to talk with us today and sharing your insights. You’ve been one of the pioneers of a lot of what has been happening across disciplines and helping understand the web, and we greatly value you taking time to talk to us today.

Sandy Pentland: Thank you for thinking of me, and I enjoyed talking to you. Take care. Thank you.

Noshir Contractor: Untangling the Web is a production of the Web Science Trust. This episode was edited by Molly Lubbers. I am Noshir Contractor. You can find out more about our conversation today in the show notes. Thanks for listening.

Episode 9 Transcript

Deen Freelon: Identity factors, which include, you know, not only race, gender, in some cases, sexual identity, national origin, also in some cases religion, really help to get a fuller picture of what’s going on on the web and in various digital domains. So that’s something I’d encourage every web science practitioner to do, first of all, to read up on it, to figure out how to integrate that into the work they’re already doing, and then secondly of course, to implement that knowledge.

Noshir Contractor: Welcome to this episode of Untangling The Web, a podcast of the Web Science Trust. I am Noshir Contractor and I will be your host today. On this podcast we bring thought leaders to explore how the web is shaping society and how society in turn is shaping the web.

You just heard our guest today, Deen Freelon, talking about why identity is key to understanding the complex interplay between the Web and society. Deen is an associate professor in the School of Media and Journalism at the University of North Carolina in Chapel Hill. His research covers two major areas of scholarship: political expression through digital media, as well as data science and computational methods for analyzing large digital datasets. He has authored or co-authored more than 30 journal articles, book chapters and public reports, in addition to editing a scholarly book. He has also served as principal investigator on grants from the Knight Foundation, the Spencer Foundation and the US Institute of Peace. Professor Freelon has been at the forefront of research into misinformation, disinformation, hyperpartisan content, ideological asymmetry, identity politics, and personalized information environments. And as a member of the web science community, Deen writes lots of software to analyze data, some of which he releases in open source spaces. Welcome, Deen.

Deen Freelon: Thanks. 

Noshir Contractor: I’m so glad that you’re able to join us here today, I have been a huge fan of your work for a long time. Let me first begin by asking you, how did you first get interested in studying the web?

Deen Freelon: I’ve always been a bit of a nerd. My dad was an early adopter of computers, and I learned how to do web pages when I was in high school, this is mid-90s. I went to college thinking that I was going to be a computer science major, but I was at Stanford at the time, and I found that the way they taught it wasn’t quite my speed. So I sort of pulled back and I majored in psychology. Later, I taught myself how to do PHP in my first job, which was as a technology trainer at Duke University, which is in my hometown. And at the same time I was teaching myself how to code, I was also becoming more politically aware, right. So this is around the time, 2000, 2003, start of the Iraq war, and all that. So the code piece and political piece were happening right around the same time. And so it was only later that I realized, wow, I kind of had these two pieces of my eventual scholarly identity that were percolating and evolving at the same time. And this is actually before the field of Communication Studies, and probably web science as well, starts to become aware of computational methods and data science as a key component of both of those. And so really, it was kind of serendipity that I ended up having those skills and those interests at a time when those fields were starting to value those and starting to promote them.

Noshir Contractor: Well, I think we’re all very lucky for that serendipity, because you really were the right person at the right time. And one of the things that I really admire about your work, Deen, over the years is that you’ve taken issues and been able to capture it in a way that advances intellectual insights, but also speaks to a larger public. And you’ve done this in an amazing way in your scholarship, as well as your public engagement. Talk a little bit about how you began to think about these issues. I’ll throw a couple of recent papers that you’ve written. You have a paper called “False equivalencies: Online activism from left to right.” Tell us a little bit about what this false equivalence is, and why it might be going against the grain of some conventional wisdom that we might be listening to in this area. 

Deen Freelon: That paper is really the culmination of a lot of thoughts that I’ve had over the past, I don’t know, probably half a decade at least. And the false equivalency is between the left and right. So you have a lot of work on the left that has really come from, and we talked about this in the paper, kind of the hashtag activism school, right. So, you know, there’s a lot of work on Black Lives Matter, there’s a lot of work on the climate change movement in terms of their use of hashtags. And so there is one view of the left, and actually, that connects to prior work that’s not computational or web science in nature, primarily in sociology and communication, in which the left has overwhelmingly been the focus when you’re talking about social movements, social activism.

And you’ve got work on the right, that really comes out of the tradition of sort of the right wing media ecosystem, which of course, long predates the web, right, going all the way back to the 30s. But you know, it really intensifies in the 1980s, and the sort of mistrust of the mainstream media, that dates back decades as well. And so those sort of very divergent research traditions, I thought were really interesting and important to look at in contrast in that piece. And so that’s really what it does, it tries to figure out, you know, how the left does business as far as activism goes, how the right does business. What similarities are there? They’re both online, they both use many of the same social media platforms. What differences are there? The literature tells us that, for example, disinformation is a much bigger problem on the right than it is on the left. The issue that we identify in the piece, or one of the issues we identify, is that there hasn’t been that much research on disinformation on the left. So there’s a couple possibilities. One possibility is the research record reflects reality, right? Disinformation is a bigger problem on the right than it is on the left. Another possibility is that, because there hasn’t been quite as much research done on disinformation on the left, we simply don’t know.

What we call for in that piece is to try to figure out exactly what is going on as far as disinformation on the left goes. Searching through the literature, we didn’t really find that there were that many attempts to even answer the question. So what we’re advocating for is an affirmative answer to this question of how much disinformation there really is that is left wing, left leaning or left oriented, so that we can characterize it against the disinformation that we know is rampant on the right.

Noshir Contractor: Deen, why do you think there haven’t been more studies that have tried to examine disinformation on the left?

Deen Freelon: That’s a good question. I think some of the disinformation may not be quite as out there. I think, as we saw in terms of the events of January 6, there is a very strong argument to be made that the disinformation on the right, apart from how much of it there is, I think that the character of it is a lot more virulent and more likely to result in injury and harm to bodies, specifically, as well as to democratic norms. And so I think there’s a greater urgency there simply because of that. However, I do think it’s more than a mere scholarly curiosity in terms of characterizing the nature of disinformation that may appeal to the left as compared to that which appeals to the right. We simply haven’t done that work. I think it’s analytically important. I think it has public importance as well.

Some of it may have to do with the political commitments of the people who do the research. I’m certainly not going to cast aspersions on anyone who does that kind of work, and I certainly don’t know enough about their political commitments to be able to say definitively. That’s just how, you know, confirmation bias and sort of, you know, motivated reasoning tend to work.

This is something that, again, extends from research tradition that extends, at least until the 60s, you know, the studies of the civil rights movement, being kind of the paradigmatic social movement. And even if you look at some of the definitions of social movement, some of it actually has, almost seems to have left wing politics built into it. And so I don’t think that’s a great idea. But I do think that some of the analytical pieces of this also play a role in determining what gets categorized as a quote unquote social movement, and what is studied as, you know, reactionary politics or, or mainstream politics, because they’re practiced by people of different ideological commitments.

Noshir Contractor: So you’re not making a conspiracy argument, you’re just saying that this is a scientific curiosity that needs to be balanced across the left and the right.

Deen Freelon: Yeah, I really try not, I really try to stay away from any and all conspiracies. I do think that, you know, in that review, I think we’re doing what good reviews do, which is to point out, you know, gaps in the literature to say, we’ve done a really good job over here, we haven’t done quite as much work over here. So let’s, you know, balance the scales a little bit.

Noshir Contractor: One of the things that, obviously, is front and center on many of our minds these days, especially in the United States, is the Black Lives Matter movement. And I want you to talk a little bit about your piece that was titled “Black Trolls Matter: Racial and Ideological Asymmetries in Social Media Disinformation.”

Deen Freelon: Sure. Well, I want to give credit for that title to the wonderful Jeff Hancock of Stanford University. That piece really grew out of my work on Black Lives Matter. I did a report, a public report that came out in 2016, and a follow-up empirical article a couple of years after that. And so that actually was one of my big entrees into the world of online disinformation, because I had this big Black Lives Matter dataset. And when the Internet Research Agency Russian troll list of handles came out at the end of 2017, I basically just looked into my Black Lives Matter dataset and said, Wow, there’s like 300, you know, some names from this list represented in my Black Lives Matter dataset. So I said, Okay, well, this is definitely something I have to study, because they seem to have some interest in activism, specifically Black activism. And that piece of research that you mentioned is really the culmination of that investigation.

What we found was that Black-presenting Russian trolls were actually more likely than any other of the categories that we looked at, which included right wing trolls, non-Black left wing trolls, and a couple of other ones, to pull in retweets, replies and likes on a per-tweet basis. And we thought that was quite remarkable, especially because the study design allowed us to disaggregate the influence of ideology from race.

Noshir Contractor: Can you talk a little bit more about that? What does it mean to be able to disambiguate race from ideology? And also, if you could just recap again, what exactly was the asymmetry in the social media disinformation that you found?

Deen Freelon: So we rely in that study on categories that came from a couple of researchers out of Clemson University. They came up with a really great initial typology, but they lumped together Black left wingers and non-Black left wingers, and so based on some theory that we detail in the piece, we made the theoretical argument for disaggregating those. We found out that a substantial amount of the effect for likes, retweets and replies that was attributed initially to left-leaning was actually explained by Black-presenting, right. What we found was a very, very strong indicator that the Black presentation was actually driving a significant portion of the effect. That’s where the asymmetry comes from, the asymmetry between left and right being more effectively explained by race than by ideology. And also the asymmetry between being sort of non-Black left wing and being Black left-leaning.

Noshir Contractor: That is incredibly interesting, because it’s so easy for us to conflate some of these in our stereotypes. I’m going to ask you a more general question, do you make a distinction between disinformation and misinformation?

Deen Freelon: If you look at our piece that ran in Political Communication last year, “Disinformation as Political Communication,” we talk about disinformation as being false or misleading content that is intentionally spread to damage a third party. So that is where the person spreading it is aware of the deceptive nature of what they’re spreading. And they’re doing it with a specific goal of damaging some enemy. Misinformation is where content is spread without knowledge on the part of the spreader that it’s false, or that there is some deceptive element to it. And so what that actually implies is that dis- and misinformation are not necessarily inherent qualities of the content itself, but rather, they are relations between the people who spread them and the content.

Noshir Contractor: And so by that definition, then, the two pieces that you wrote about Russia, one titled “The Russian Disinformation Campaign on Twitter” and the other about Russia’s Internet Research Agency appearing in U.S. news as vox populi: tell us a little bit about how you got interested in this particular issue. And what were some of the key takeaways for you?

Deen Freelon: I feel that my interest in disinformation is sort of, you know, charitably achieved through my interest in social movements, and in the way that a lot of the most prominent disinformation, including the IRA and others, has really tried to glom on to existing social movements to be able to spread its falsehoods. And so I think that is something that is a logical outgrowth of the work that I’ve done. A lot of the work that I have done in this has really stuck close to the relationship between disinformation and social movements, because that’s something I’ve been interested in since I was a grad student.

Noshir Contractor: And you find that in the case of the “Russian Disinformation Campaign,” one of the things that you argue, which again, is counter to the conventional wisdom, is that the disinformation campaign on Twitter targeted political communities from across the spectrum, not just from the left, as some in the media would have us believe.

Deen Freelon: The Internet Research Agency, which was a very specific group of Russian trolls that were paid by the Russian government, targeted not only, you know, folks in the Black community or on the left; they also targeted folks on the right. And in one of the studies, the study that was published in the Misinformation Review, my colleague Tetyana Lokot and I point out that the specific identity that the IRA agents took on was the same identity as that of the people they actually wanted to reach. So conservative-presenting trolls wanted to reach conservatives, Black-presenting trolls mostly reached Black individuals, and left-presenting trolls reached out to and actually ultimately reached left-leaning individuals. So in some ways, that’s actually helpful analytically to understand exactly what they’re doing. They’re playing on this idea that most of us who study social media and web science understand, which is like follows like, right, you know, birds of a feather flock together. And so they’re really taking advantage of that specific tendency on the internet and social media to be able to reach out to folks and have the real individuals who share those political identities carry forth their disinformation for them. And that’s one of the main ways that they’re able to get traction: having real people sharing, retweeting and engaging with their content, which gives it that imprimatur of reality.

Noshir Contractor: And what was interesting is that you suggest that the best way to counter that or at least one way to counter the Russian disinformation campaign, would be for people across the political spectrum to collaborate against it? Tell us more about that. 

Deen Freelon: Now, that was, in all honesty, a bit of a pipe dream, right. I mean, we’re pretty, we’re pretty polarized, I think in our country right now. But I think if there are, I think if there are opportunities to do that, I think it would be a great thing. I don’t know anybody who openly proclaims that having, you know, foreign agents, infiltrating our political conversations is a good thing. So it seems to be at least in principle, to be something where people from differing sections of the political spectrum could come together and agree at least, that this is a bad thing, and we should find ways to, to combat it. So now, in terms of how likely that is, I don’t really know. 

Noshir Contractor: Yes, we are living in rather, hyperpolarized times, as you might put it. You did a project that has been going on for a period of time called the filter map. Tell us more about where that project started and where it is now. 

Deen Freelon: “The Filter Map” is the name of a piece that I was commissioned to write by the Knight Foundation, and it came out in 2018. And in that piece, I sort of take issue with some of the conversations that were occurring around ideas of the echo chamber and the filter bubble. The idea at the time was, Oh, well, people really need to engage with content that lies outside of their own bubble, right. So it’s content that is produced by people who disagree with them; they need to engage across ideological lines. And my contribution to the conversation is, there are certain ideas that it is not fruitful for us to engage with, right. So if you’re talking about, you know, open racism, open sexism, you know, Nazism, things of this nature, these aren’t ideas that we should give the time of day to, so to speak. And so what I tried to do in the piece is articulate the kinds of ideas we disagree with that we may want to give the time of day to, and those kinds of ideas that we may not want to, right.

So the idea behind the filter map is to say, whether you agree with something is sort of one aspect of your relationship to an idea. A second aspect is, if you’ve decided you disagree with something, whether it lies beyond the pale of things that you would at least consider. And so that general set of ideas kind of sat on the shelf for a little bit, until I was lucky enough with three of my colleagues to be able to receive one of the big Knight Foundation center-endowing grants in 2019. And at that time, I realized that I had an opportunity to put the ideas in this filter map into practice.

So I’ve collapsed it into two dimensions. And so one dimension, if you think about this horizontally, is left versus right. There’s been a lot of progress in the past few years in terms of ideologically scaling media personalities, media outlets, and Twitter handles, things like that. So you can think about that as being scaled on a horizontal axis, as well as on a vertical axis that would look at things like the total number of, you know, ratings that you get on PolitiFact, right. So if you’re high truth, you’re up here; if you’re low truth, you’re down here, right? So now you’ve got two axes: one shows left and right, the other shows high truth and low truth, up and down. And you can actually look at your own social media feed and see how much of each quadrant you actually get. So if you think about above board, where you’re high truth, that’s where you’re seeing the kind of content that you want to engage with: oh, here’s the high truth, right wing stuff, okay? Let’s think about that. Let’s engage with that. And if it’s low truth, well, it’s low truth, and it’s on my side, that’s maybe disinformation that’s trying to target me, that’s where I’m most vulnerable. And that’s what I want to keep out of my information stream. My hope is that that will help people understand their social media feeds better. And it’ll help put some of this heady, you know, theoretical stuff into practice in a way that ideally makes people’s lives a little bit better.
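[Editor’s illustration] The two-axis idea described above can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not the project’s actual implementation: the score ranges, the 0.5 truth cutoff, and the names `quadrant` and `feed_shares` are all hypothetical.

```python
from collections import Counter

def quadrant(ideology: float, truth: float) -> str:
    """Place one source on the two axes.

    Assumed (hypothetical) scales: ideology < 0 means left, >= 0 means right;
    truth runs from 0 (low) to 1 (high), with 0.5 as an arbitrary cutoff.
    """
    side = "left" if ideology < 0 else "right"
    level = "high-truth" if truth >= 0.5 else "low-truth"
    return f"{level} {side}"

def feed_shares(feed):
    """Return the share of a feed that falls into each quadrant."""
    counts = Counter(quadrant(ideology, truth) for ideology, truth in feed)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: n / total for name, n in counts.items()}

# Made-up (ideology, truth) scores for four items in a feed.
sample_feed = [(-0.8, 0.9), (-0.6, 0.2), (0.7, 0.8), (-0.3, 0.7)]
shares = feed_shares(sample_feed)
```

For a left-leaning reader, the "low-truth left" share would be the part of the feed described above as the most vulnerable spot.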

Noshir Contractor: This is an example of how you make your scholarship very actionable, or potentially actionable, by individuals in terms of giving them something to look at. You’ve also contributed by way of sharing code and the software tools that you’ve developed, etc. Tell us about why you chose to do that. And what do you see as the challenges and opportunities for people in the web science community to be sharing their code?

Deen Freelon: Sure. Well, I first started sharing my code when I was a grad student. And actually, the very first thing I shared is by far the most popular thing I’ve ever shared. And that is ReCal, which is an online intercoder reliability calculator for content analysis. So that’s kind of fun for me, and I think, useful for many people.

That project was really just an offshoot of my master’s thesis. I mean, the short version of the story was that before ReCal, the primary intercoder reliability program was something called PRAM, and it only ran on Windows, and I had a Mac. And you know, I did my grad work in Seattle. And, if you know Seattle, it’s very rainy. And so when I was doing my intercoder reliability tests for my master’s thesis, I didn’t want to walk from my apartment all the way to the lab in the University of Washington Communication Department. So I said, well, I’ll just make one myself and program this thing, literally do the math myself to do this. I sort of prettied it up and made it usable for others when I put it on my website. So the success of that really led to other, you know, sort of forays into writing software for the research community, which I think is incredibly important. I think of how I personally have benefited from other people’s software that they’ve created on an open source basis. And I just want to give back a little bit to that. One issue that I think hampers people from doing this is that, especially outside of computer science, and perhaps information science, the production of open source software for the research community is often not seen as as much of a contribution as it should be.
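[Editor’s illustration] For readers curious what “doing the math” for an intercoder reliability test can involve, here is a minimal Python sketch of two common two-coder measures, percent agreement and Cohen’s kappa. It is not the code of the calculator discussed above; the function names and sample labels are hypothetical.

```python
from collections import Counter

def percent_agreement(coder_a, coder_b):
    """Share of units on which two coders assigned the same category."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("need two non-empty label lists of equal length")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)

def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(coder_a)
    observed = percent_agreement(coder_a, coder_b)
    counts_a, counts_b = Counter(coder_a), Counter(coder_b)
    # Chance agreement: product of each coder's marginal proportions,
    # summed over categories.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        # Degenerate case (both coders used one identical category);
        # returning 1.0 is a convention chosen for this sketch.
        return 1.0
    return (observed - expected) / (1 - expected)

# Made-up labels from two hypothetical coders for six units.
coder_1 = ["pos", "pos", "neg", "neg", "pos", "neg"]
coder_2 = ["pos", "neg", "neg", "neg", "pos", "pos"]
agreement = percent_agreement(coder_1, coder_2)
kappa = cohens_kappa(coder_1, coder_2)
```

Kappa comes out lower than raw agreement here because half of the matches would be expected by chance given how often each coder used each label.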

Noshir Contractor: We’ve talked about sharing code. What about sharing data? You were involved as part of the beta test that Twitter has offered to make all of its data available for free for researchers who apply for it. So tell us about moving from sharing code to sharing data in the web science community.

Deen Freelon: This is a really big topic. Our access to data, especially that which is owned by or stewarded by for-profit corporations, is fundamentally tenuous. We’ve seen, you know, the rise of Social Science One, which provides application-based access and also money for Facebook data. We’ve seen this more recent initiative by Twitter, which allows access all the way back to the first tweet in 2006 to researchers who apply. But again, even though I applaud that particular move by Twitter, ultimately, they have a say over who they accept into this program. That still puts a lot of power in their hands in terms of deciding who gets to access this kind of data, and who gets to do this kind of research. I think that any researcher in the web science area should really have what I consider to be a diversified portfolio in terms of the data streams that they’re working with. So don’t become over-reliant on one type of data to be able to get your work done. A lot has been written and said about our field’s over-reliance on Twitter data. And so you know, if Twitter data is your only game in town, well, if Twitter decides, you know, that giving this kind of data access is not in their best interest, or if they decide to reject your application for access to this wonderful, you know, time-unlimited stream, then you’re not going to be in a very good position. So having a number of different data sets that can speak to the kind of questions that you’re interested in, whatever they may be, I think is critical for being a web science researcher in 2021.

Noshir Contractor: We talked earlier about polarization, and I’m going to use that as a pivot to a very polarizing concept that I would love to get your take on. And that is, the notion of being able to infer individual level characteristics from digital trace data. You get people on one end of it, who think that that’s the most incredibly powerful way of being able to get to things and others who think that is the scariest idea on the web.

Deen Freelon: Well, this is something I’ve been thinking about for a very long time, and it’s something that I feel like I wish more people paid attention to. Because there are certain norms in certain fields, that don’t really think or are not thoughtful enough about what those traces really mean. For some researchers, it seems that simply studying the trace itself is enough. And there’s not really a whole lot of discussion about what theories this may apply to, and what those traces actually mean. 

So I think that under certain circumstances, certain digital traces are really, really great proxies for things that we really care about. In other cases, the fit may not be so great. But what I really want the scholarly community to do, web science and other social sciences, is to really consider carefully the fit between the theoretical concepts and research questions of interest and the data to which they have access.

Noshir Contractor: What do you see today, based, either on your own work, or more generally, what do you see as important issues that web science should be addressing moving forward?

Deen Freelon: Again, another really big question. I’ll just sort of beat a drum that I’ve been talking about for a while now. I think that, you know, the web science community is in many ways not unique among social sciences in underestimating the importance of identity more broadly, and race specifically. So, when you’re thinking about any topic that you deal with, whether it’s virality or some of the more policy-oriented aspects of this, keeping an identity-focused aspect of this firmly in mind is really important. Identity factors, which include, you know, not only race and gender but in some cases sexual identity, national origin, and also in some cases religion, really help to get a fuller picture of what’s going on on the web and in various digital domains. So that’s something I’d encourage every web science practitioner to do: first of all, to read up on it, to figure out how to integrate that into the work they’re already doing, and then secondly, of course, to implement that knowledge.

Noshir Contractor: Now that’s extremely important, especially because in some ways, one can argue that the web conceals some of the normal visual surface-level characteristics that we would look at closely for many of these identity issues, not all, but some of these identity issues.

Deen Freelon: Yeah, and that actually ties back into the trace data issue. So one of the examples has to do with the underlying concept of gender versus race. So you’ve got a situation in which gender, generally, at least for anglicized names, can be inferred with high levels of accuracy from someone’s first name. Then the question is, to what extent does the system support the use of a quote unquote real first name. Facebook has it in its terms of service that you use your real name, so that’s a terms-of-service-level issue; you can use something else but you risk your account being kicked off. Twitter does not require you to do that in its Terms of Service, and lots of people don’t. So, you would assume that any study that had a bunch of names of individuals on Facebook would have a lot easier time determining gender than would an equivalent study on Twitter. Now shift to the idea of race. Race is a lot harder, especially in the United States, to infer on the basis of someone’s first name. In some cases you might be able to do it; in other cases, you may not be able to do it.

And so that becomes a lot harder to be able to get. Actually, a better example is that Facebook allows you to indicate your gender. So the difference in terms of the identity characteristics that you’re able to get out of those systems is baked into the design of the system. That means that some identity characteristics are easier to integrate into a research study than others. But I think that the effort is well worth it when you’re trying to figure out how, for example, different sociotechnical systems are used by different people, how they impact different kinds of people, and how different kinds of people see them.

Noshir Contractor: In closing, in 2020, we spent almost a year in isolated, confined environments, dealing with all kinds of reckonings: cultural, racial, health-related, etc. Can you talk a little bit about how this entire experience might have been different, for better and/or for worse, if we didn’t have the web?

Deen Freelon: The image that popped into my mind was, how would a skyscraper be different if you removed the second floor? The second floor goes away, and what happens is, floors three through n crash down, and they crush floor one. So, I think that, you know, taking away the web, which is so, you know, deeply enmeshed in everything we do, would render our society completely unrecognizable. So it’s not like, okay, you take the web out and you go back to the 1980s. It’s everything that relied on that, everything from banking to getting your takeout with a couple of clicks of an app, to your health, to how you relate to others, the fact we can have this conversation remotely. I just don’t think that would really be something that we could imagine. We can’t really put the genie back in the bottle. We have to live with this as it is. I think there are certainly ways that people can use the web better; there are choices that I wish people hadn’t made. I think it’s extremely difficult to imagine our society without the web.

Noshir Contractor: I love the metaphor of the second floor of a skyscraper falling apart, I think that is an extremely evocative way of capturing our dependence, if you may, in a very foundational way. Deen, thank you again so much for taking time to talk with us today. I think that your work is extremely important in part because it challenges some conventional wisdoms and does so in a way that really is provocative and advances our understanding and sensibility about many issues related to web science. And I look forward to seeing continued research and insights from you in the years and decades ahead so thank you again.

Deen Freelon: It was really great to be here.

Noshir Contractor: Untangling the Web is a production of the Web Science Trust. This episode was edited by Molly Lubbers. I am Noshir Contractor. You can find out more about our conversation today in the show notes. Thanks for listening.

Episode 7 Transcript

Gina Neff: We see a moment that we’re in right now of being somewhat trapped, I think, between the necessity of contributing to a public good, but also needing to understand where we fit personally in these. So I’ve spoken out quite publicly about back-to-work solutions that don’t protect workers’ privacy, right? What we know about organizations and workplaces, is that we absolutely have an imbalance of power between employees who need work and employers who might have other interests or demands in the workplace. 

Noshir Contractor: Welcome to this episode of Untangling The Web, a podcast of the Web Science Trust. I am Noshir Contractor and I will be your host today. On this podcast we bring thought leaders to explore how the web is shaping society and how society in turn is shaping the web.

That was Gina Neff earlier, talking about how the Web and the workplace are influencing one another. Gina is a professor of Technology & Society at the Oxford Internet Institute and the Department of Sociology at the University of Oxford. She’s a sociologist who studies how web-based technologies are shaping the future of work, and has published three well-acclaimed books and over four dozen research articles on innovation and the impact of digital transformation. Her writing for the general public has also appeared in Wired, Slate and The Atlantic, among other outlets. Given her thought leadership in this space, Gina was invited to deliver a Keynote at the 2020 ACM Web Science Conference. Welcome, Gina. 

Gina Neff: It’s really good to be here.

Noshir Contractor: Well, I thank you very much for taking time to talk with us today. And I’m very excited to hear your opinions and your insights about this very important new discipline that has been emerging over the last decade, called web science. So first I want to talk about what that term, web science, means to you.

Gina Neff: As a social scientist who studies work and technology, I really can’t do what I do without thinking about web science, so when I first think of the term, I think of mapping the web. What I do in my work is to really dive deep into those ties and think about what people are doing in those connections. How are they linking at work, what do those links mean for them, and how are those collaborations playing out, both at workplaces where people are trying to work on tasks together, but also in terms of making social structure, in terms of making the new rules of society that come out of the links that they’ve built.

Noshir Contractor: And I wonder if you can give us some key insights and contributions that have been made by web science to better understand not just the way we do work now, but the changing nature of work and what some would argue is the future of work?

Gina Neff: Without understanding how our networks and relationships are changing with technology, we simply can’t understand how people accomplish the tasks and goals they have in the workplace. So this is going to seem like a tangent, but bear with me.

I’ve been studying large-scale construction projects, skyscrapers, and, you know, on the one hand, it’s one of the last industries that we would think of as high tech, and yet, a decade ago they were trying to figure out how to do remote work meaningfully.

The first web meetings, the first WebEx meetings I was ever in, were on construction sites. Why? Because people who come together working on a construction site often have to travel from within a two-hour radius to get to the construction site. If you bring all of those different companies, people from all of those different jobs, to the job site, it’s costly. It takes time. And much of the work that they were trying to do is coordinating in a digital space. And so they tried. They’re like, okay, let’s just have video calls. And yet, even though we have a very tight closed project group, where everybody understands their roles and tasks, really highly structured, they struggled with figuring out how to come up with collaborative decision-making in these online virtual meetings.

That’s a neat problem, right, and it’s a problem that we’re all facing right now. So I think when I think about web science, I think about the ways in which we’re mapping new kinds of communication ties that end up structuring our everyday life. Whether that’s from social media, whether that’s from our news environment, whether that’s from our workplace. And so when we look at how this becomes part of our daily way of working — part of our daily rules and ways of being — what I think as a social scientist, what we start to see is some really exciting things about how the fundamental rules of social life are formed. That’s what I think web science can do.

Noshir Contractor: That is incredibly important and significant, and I was wondering also how this would tie into some of the earlier work that you did, the book that you wrote titled “Venture Labor: Work and the Burden of Risk in Innovative Industries,” in the Acting with Technology series. You were one of the early scholars looking at this issue clearly from a web science point of view.

Gina Neff: Yeah, so that project really asked the question, why on earth would an internet industry form in New York City? If we have the capacity with the new commercial World Wide Web to have these links that allow us to work remotely, why would a thriving industry form in some of the most expensive real estate in North America, right in the center of Manhattan?

And the answer is kind of twofold. One is there was a supply of creative individuals who worked in adjacent industries, in advertising and in film and writing and in magazines and that they could come together and basically create new kinds of content to fuel the first wave of commercial web activity.

Now, that’s part of the story. But the other is that we know that innovative industries really thrive and prosper on these close links that people have, and that that kind of information becomes the way that industries can understand the event horizon that they face, right? They have all of these people fairly closely together who share information, share new technologies, share ways of doing things, they share new kinds of companies. You know, this is the hope that everyone has of the new Silicon Valley, right, what makes it an innovative industry.

And so you put these two pieces together, you have a bunch of creative people and they’re all trying to figure out how to make a new industry.

It becomes, I argued in the book, a way that new kinds of risk get shared and dispersed and spread across a new industry. And that I think is really interesting for us to think about in this particular moment, because people learn to adapt and people learn to take on that risk in these new kinds of environments, and they welcomed it in the first wave of the web.

Noshir Contractor: That’s a good point in terms of the first wave and the second wave. I think initially, a lot of people in the first wave were focused on broadening our networks to the point where we could be anywhere in the world and have virtual organizations and virtual teams. But what you’re pointing out is that the web has at least an equally important role in helping people who may be co-located augment their interactions and communication and collaboration by leveraging other aspects of the web that we might not have looked at initially.

Gina Neff: And I think that, you know, in this moment right now we’re having this conversation. You’re sitting in Evanston, I’m sitting in Oxford. Most of England and much of the United States are sitting at home because we’re fighting the coronavirus pandemic.

So what we don’t know yet is how these initial stores of our social networks, our social capital, get translated in this incredibly highly stressed moment, so that people can become useful and do their jobs and make the connections in a moment when we can’t travel, when we can’t see each other face to face, right? It is incredible to me. Can you imagine doing this 15 years ago, right? If I go back 20 years ago, if I go back to this moment in Silicon Alley, in New York City, where content creators were so excited about the possibility of what the web could be, they were doing serial small video segments. You know, there was a company, Pseudo.com, that had the hubris to turn to 60 Minutes and say, I will put you out of business. No one’s heard of Pseudo, right? Pseudo is gone, it’s long gone. But the idea of a web company doing streaming video was so laughable in the year 2000, right, it was so inconceivable that we could have the bandwidth to do this thing that we’re doing right now. The high quality video, high quality streaming, high quality interaction. And so, we’re still in many ways in the early days of figuring out how these kinds of links and ties are going to be intensified in our face-to-face interaction, and then nourished through the other kinds of digitally mediated ways that we interact.

Noshir Contractor: And this is a question and a challenge that web scientists like yourself have taken on. I was also struck by how, with all the technology that we now have around us, they have the ability to be instruments tracking us digitally. And you talked about that in the book that you co-authored with Dawn Nafus, titled “Self-Tracking.” And while that book was focusing more on the quantified self and instrumenting yourself in personal contexts, I would love to get your take on what might happen as technologies in the workplace are being instrumented to capture our actions, interactions and transactions. Where do you think self-tracking is headed in the workplace? And what do you see as the promises and perils of that?

Gina Neff: One of the things that I really learned in the self-tracking project is that for many people, their personal data is something they’re very willing to share in an altruistic way. There are all these wonderful communities that I studied in the self-tracking project, patient communities where people share really intimate data, genetic data, medical histories, things that you and I might see as much too risky to our own sense of personal privacy and protection. And yet these incredible people were driven to do this because they saw that their data held the possibility for the cures to their illnesses. And their data held the possibility of these incredible connections and ties to other people who were going through the same thing they were.

In the moment that we’re in right now, we have been somewhat trapped, I think, between the necessity of contributing to a public good and needing to understand where we fit personally in this. So I’ve spoken out quite publicly about back-to-work solutions that don’t protect workers’ privacy, right? What we know about organizations and workplaces is that we absolutely have an imbalance of power between employees who need work and employers who might have other interests or demands in the workplace in terms of HR, management or legal exposure.

And so any kind of app that gets us safely back to work absolutely has to keep these two kinds of tensions in mind, right? It has to be designed from the ground up, first, to tap into people’s altruism. People want to solve problems, and they certainly want to solve the global pandemic that we’re in right now. But they don’t want to do it at the risk of their own livelihood or their own ability to continue working. So you’ve got to think about that kind of data as control, and data as power. You know, we kind of went off from your basic question. Now, suddenly everything we do at work is somehow also traceable and trackable. It’s a huge opportunity for those of us who study workplaces to think about how those networks at work might be changing. How might the networks change for people who are, say, women at work? Are we seeing how remote working changes women’s ability to navigate networks? That’s an open question and one that we’re going to have trace data to study. But at the same time, we absolutely need to be thinking about how we create safe workplaces and how we create better and more stable workplaces, given the fact that now everyone’s exposed in these new ways with their data.

Noshir Contractor: You hit the nail on the head. It’s really a dilemma in some ways, but also an opportunity to be able to understand how networks are changing today when they go virtual. It’s one thing when we are connecting with people that we already knew and we may be in a position where we are deepening those ties, but it’s an open question how this environment will work when we have to deal with people we had not met previously. How will the web environment accommodate the levels of social presence that we are used to in which we have a prior face-to-face interaction?

Now, you already made reference to the pandemic, but I wanted to just give you another opportunity and invite you to talk about what you think are the one or two most significant things that would have been different for us if we had to navigate the pandemic and global cultural reckonings without the web.

Gina Neff: Can you imagine doing this without the web? Seriously. I mean, evidently there have been global pandemics before the web, but I can’t think of them. Between how we have organized our shopping, how we have organized our home life, how we have organized our schooling, how quickly, within just a matter of weeks, we transitioned from face-to-face to being at home around the world, I find it literally inconceivable. And I can remember a world before the commercial internet. So what does that mean?

I think that one of the challenges that we need to remember about the web is that the beautiful, wonderful decentralized structure that is stable enough to permit this thing to keep on going and perpetuating without centers of control is the exact same protocol that allows us individually to navigate in very different ways. And so while my ability to reach out and regenerate my networks may not be damaged as much, I as a manager, as a supervisor, as a leader, really need to remember that others I work with might not be able to do that as well. And so I think that’s the kind of catch-22, right? There was a wonderful example, a terrible example really, here in the UK when shops opened up and there were very long lines at one of the discount retail stores, and there were the chattering classes in the media talking about, you know, how dare these people wait in line going to these discount shops in order to buy clothes, that seems so risky, why are they doing that, and forgetting the number of people who don’t navigate the world through Amazon, who don’t have credit cards at their disposal, who, you know, navigate the web from their smartphone and therefore have a very different kind of experience of how you might buy and shop and deal. And by the way, this retailer has no online presence, right? So the retailer that offers the best discount on clothing in the entire country is not one selling online. And yet there is this very large disconnect. So I think that one of the things that has been made visible is that the ways in which we navigate communities through the web are quite distinctive, and we need to remember that others are following their own distinctive paths as well.

Noshir Contractor: I think these are really, really important points because you mentioned that you have memories going back before the commercial internet and so do I. And I don’t remember an overwhelming amount of discussion about designing the web to help deal with a pandemic. And yet, for some reason, it seems that many aspects of the web were designed perfectly to deal with it. On the other hand, as you have also been mentioning, there has been the downside of the web in terms of the ways in which at this particular point in time, it might be influencing certain communities. And I’ve heard you talk about the term infodemic and I wonder if you want to talk a little bit more and share your thoughts about that in the present situation.

Gina Neff: So when the head of the World Health Organization says very early on in the COVID-19 crisis that we have an infodemic, right, that is, as they say on Twitter, you know, not swimming in the lane, right? Come, swim in our lane, right, the lane of people who understand online communication.

And I want to use the term meaningfully, because many of the ills that we talk about in terms of disinformation, misinformation, the infodemic, are reflections of a moment where there is decreasing trust in our social institutions. This is not coming from the web and the way in which people connect; this is really about how people feel a part of society. And that’s the challenge we’re having right now. We’re having an enormous challenge in Western democracies with this relationship between individuals and the state, and individuals and their communities, and something will shift and change. We just don’t quite yet, I think, know what. When we bring that back to the infodemic, right, when we bring that back to this idea that high-quality, good scientific information is hard to come by, I’m sure you’ve seen it in your social media feeds, and in mine I’ve had to deal over this crisis with people who don’t believe in science. They don’t believe in vaccines. They don’t believe the same thing I do. And I can take the approach that says, “If I only convince them, you know, if I only argue hard enough,” or I can begin to say that part of what we need to do is not simply about getting better information and better-quality information. We need to do that, but we also need to get out the kinds of stories that we know have always connected us and made us feel a part of something bigger than what we are individually. And so that’s what I think, you know, we see in this moment, right? It’s just a simple amplification of a social trend, a very large social trend that predates the web and predates COVID-19, and this is coming together at this particular moment. It’s interesting times.

Noshir Contractor: Well, thank you again, Gina, for taking time to join us today to share your insights with us, and more importantly for your thought leadership in web science. Thank you.

Gina Neff: Thank you. It’s been a real pleasure.

Episode 6 Transcript

Brooke Foucault Welles: The really interesting thing about hashtag activism in particular is that it becomes this kind of shorthand organizing principle for people who have experiences that don’t normally get covered in mainstream media or by mainstream press to come together and share those experiences, and the collection of those experiences does two interesting things.

So first it validates them, right? So you’ve experienced something, I’ve experienced the same thing. If we can connect those experiences, then suddenly our experiences collectively feel more real. And when people can collect these things together, they become newsworthy in themselves. So we’ve seen, over time, the mainstream press maybe not covering individual incidents, but covering the hashtag and the collection of those incidents as a newsworthy event.

Noshir Contractor: Welcome to this episode of Untangling The Web, a podcast of the Web Science Trust. I am Noshir Contractor and I will be your host today. On this podcast we bring thought leaders to explore how the web is shaping society and how society in turn is shaping the web.

Our guest today is Brooke Foucault Welles. You just heard her talk about #HashtagActivism, the title of an award-winning book she recently co-authored. She’s a professor of Communication Studies and a core faculty member of the Network Science Institute at Northeastern University. She’s also the director of the Communication Media and Marginalization Lab at Northeastern. That’s CoMM for short. She studies how online communication networks enable and constrain behavior, with particular emphasis on how these networks both enhance and mitigate marginalization. And in 2019, she was a general co-chair for the 11th International ACM Conference on Web Science. Welcome to the podcast, Brooke.

Brooke Foucault Welles: Thanks Nosh. It’s great to be here.

Noshir Contractor: First, Brooke, I want to start by congratulating you on the publication of your book #HashtagActivism: Networks of Race and Gender Justice, which you co-authored with Sarah Jackson and Moya Bailey, and which was published by MIT Press earlier this year. I also want to congratulate you on being recognized by the International Communication Association with a 2020 Applied Public Policy Research Award that was associated with the publication of this book.

I want to start with the title of the book, #HashtagActivism. What does that mean to you?

Brooke Foucault Welles: Thanks, Nosh. So it’s an honor obviously to both publish the book and receive an award for our work on hashtag activism. You know, it was a term that was coined by the press to kind of malign this form of activism that emerged around the so-called Arab Spring and the Occupy movement, where folks used the internet as a way of organizing and proliferating messages of resistance and solidarity with marginalized communities.

We kind of co-opted or hijacked that term to really interrogate the role the web can play in advancing progressive social justice movements. And our argument in the book, and more broadly in our body of work on hashtag activism, is that hashtag activism is a logical and sensible extension of the use of media by resistance movements and social justice activists, dating back to the historical Black press and the civil rights movement and everything in between. Hashtags in particular have become associated with social justice movements in a way that’s meaningful and powerful in effecting social change.

Noshir Contractor: And in your research you’ve talked about several specific hashtags that you have looked at over the years, things like “Girls Like Us,” “Ferguson,” “my NYPD.” What did you find when you were looking at these specific hashtags?

Brooke Foucault Welles: Yeah. So the really interesting thing about hashtag activism in particular is that it becomes this kind of shorthand organizing principle for people who have experiences that don’t normally get covered in mainstream media or by mainstream press to come together and share those experiences, and the collection of those experiences does two interesting things. So first it validates them, right? So you’ve experienced something, I’ve experienced the same thing.

If we can connect those experiences, then suddenly our experiences collectively feel more real. And when people can collect these things together, they become newsworthy in themselves. So we see over time the mainstream press maybe not covering individual incidents, but covering the hashtag and the collection of those incidents as a newsworthy event. As you know, in communication, mainstream media coverage is still the gold standard for setting an agenda and creating policy change.

Noshir Contractor: And from the point of view of web science, what insights have we gained by looking at the role that hashtag activism is playing in changing society and transforming the public discourse?

Brooke Foucault Welles: That’s a great question. So, one of the things that’s really striking about web science, as a set of tools and also just a sort of logic of thinking about the world, is that these conversations have always been happening, but they were happening in private, and in a way that was really hard to track, especially at scale. And now not only can activists and regular people find each other online, but we as researchers can find those spaces and reflect on them more fully and completely. Obviously we don’t have access to everything that’s happening, but we have access to so much more, and so much more of the kind of routine everyday organizing efforts that are going on. And so web science gives us the tools and the access to study that and understand how it works.

Noshir Contractor: Indeed, your book covers this topic quite extensively, but it ends in about 2017. A lot of people call the summer of 2020 a global reckoning on social justice. Given everything that we’ve seen recently, what new insights or what generalizations do you think about and reflect upon in light of these social justice movements?

Brooke Foucault Welles: So one thing people will often ask, when they ask about this book, is why these particular events, or why did this happen in the way that it did. And of course, we don’t have the counterfactual world where Ferguson didn’t happen or the Me Too movement didn’t happen or some other thing didn’t happen.

But one of the things that I think gets lost when we talk about hashtag activism is that there are, of course, these sort of spectacular events. So, high-profile murders draw a lot of attention, where people rally around and there are massive spikes in certain hashtags. But these networks get built. So, the web is sort of inherently networked and these networks persist, right? So these conversations are still going on, maybe in quiet ways. And so it’s not as if people stopped talking about Black lives in 2017 and didn’t talk about it again. In fact, they’ve been talking about it the whole time. And these networks kind of lay dormant. They weren’t covered in the media. And then another horrific incident happens. Lots of people are paying attention, and we suddenly see not only the activation of the folks who were talking about these things in 2017 and 2018, but this whole new swath of folks who are suddenly in tune with what oppression and marginalization look like, because they’re seeing it and they’re experiencing it in their everyday lives. So I think part of the reason we have that sort of massive surge of attention and the sustained protest, both online and offline, is that the networks are there and the networks are building, feeding each other and sustaining each other to keep this movement going.

Noshir Contractor: You raised two really interesting points. One is that these networks, if you will, go through periods of dormancy or latency when they are not much in the public eye, but they’re all there, and then they occasionally will surge in visibility. And the second thing I also heard you mention was that in many cases these networks are not exclusive of each other, that there’s a lot of overlap amongst these various networks and they sort of build on a symbiotic relationship amongst them.

Brooke Foucault Welles: Mmhm. I think that’s right. So, you know, in the book we focus on race and gender justice, and we do have examples where it’s one or the other. But almost all of them involve both race and gender justice because, of course, those things are intertwined. And I’ll add, you know, in this sort of COVID pandemic era, environmental and health justice are also intertwined with all of those things. So we see the kind of multiplicity of oppression and marginalization coming to bear and really being discussed in these networks and grappled with in real ways.

Noshir Contractor: And another major contribution that you’ve made to web science is co-editing a volume called the Oxford Handbook of Network Communication. Along with your own work in that area, you’ve talked a lot about the transformative power of networked counterpublics. How does the term networked counterpublics relate to hashtag activism?

Brooke Foucault Welles: Yeah, so this is thanks in huge part to my co-authors. We coined this term to get here, but it’s an extension of public sphere theory. So, very briefly, you know, publics are groups of people who engage in democratic deliberation. Certain folks, historically and presently, are excluded from those kinds of deliberations, so people of color, LGBTQ folks, women and so on aren’t full participants, or aren’t always full participants. They form counterpublics. And, you know, fast forward to the networked and online era. Of course, these things are playing out online as well. So “networked counterpublic” really captures this idea that there are groups of marginalized folks who are coming together online to discuss issues and then also advance counter-narratives in the mainstream. So it’s kind of a heady theoretical term, but it also has these really applied consequences and implications, in that we can see it happening in these coalitions of folks on the internet bridging across web pages, blogs, different social media platforms and so on, to really create and advance an agenda for social change.

Noshir Contractor: Now this might sound as though the web is just absolutely superb and fantastic and utopian for celebrating networked counterpublics, hashtag activism, etc. And yet there are stories on the web where, for example, in the gaming community, there were a lot of attacks against female members of the community, etc. Can you talk about how the web might be having a dual-edged effect, both positive and negative, in the context of some of these issues?

Brooke Foucault Welles: Yeah, of course, that’s right. The very same systems that enable progressive racial and gender justice activism, among other things, also enable regressive radicalization and harassment. And I’d also note that although I think the architecture of the web is set up to be open and available for everyone, the corporatization and capitalization of the web has created structures that are actually hard for marginalized folks to engage in. So the fact that we have Black Lives Matter or Me Too is not entirely because of the web, but sort of in spite of some of these corporate structures in place.

So one of the things I think web science in general, you know, and society in general, frankly, really needs to grapple with right now is what an open web actually looks like, and how a web might better serve the cause of justice, more than the flow of information sort of unfettered, in ways that can be harmful. And, you know, there are no right answers to that. But I’m confident the web science community can figure them out.

Noshir Contractor: So are you suggesting that we need to have a hashtag activism that focuses on the design of the web and to open the web, #OpenTheWeb?

Brooke Foucault Welles: I love that idea. Sign me up.

Noshir Contractor: Great. Well, I think there are many who would agree with you on that. So, based on your perspectives and all the scholarship and activism practitioner work that you have focused on in your own research and scholarship, what do you consider some of the most significant issues that need to be further addressed by web science?

Brooke Foucault Welles: For sure, #OpenTheWeb is one of them, so how can we think about not just ethics in web science but justice in web science, and how we optimize a web for justice in order to correct the current and historical harms. You know, I would also love to see a tighter integration of, you know, not just the social sciences and engineering and STEM sciences, which I think web science does really well, but bridging into the arts and humanities as well. So I think there is space to come up with interesting collaborations and interesting futures for the web when we bring in kind of the full spectrum of folks working on these spaces.

Noshir Contractor: I think that’s a very inspiring idea. Just off the top, to put you on the spot, can you think of a couple of strategies that you would offer if #OpenTheWeb became a thing? And if someone approached you and said, “What’s one thing we can do to step in that direction?”, what would you come up with?

Brooke Foucault Welles: That’s a great question. So, you know, immediately one thing that comes to mind is creating activist-centered tools. So activists are now working within the sets of tools that are provided by corporations, which, you know, comes with things like surveillance of their data and corporate control over what they can and can’t say. So more open-access tools, things that folks can use on their own terms, where they can retract their data if they so choose, or maybe end-to-end encryption in such a way that they can’t be surveilled, come to mind easily. You know, I would also love to see, obviously, more Black, Indigenous and minority ownership over some of these systems. So, promoting not only the, you know, kind of educational pipelines, which I think we’re increasingly sensitive to, but also the corporate pipeline. So how do we get folks who are not just developers, but CEOs of companies and organizations working in this space, and obviously then collaborating with folks like that to make sure that their businesses are sustainable, you know, well researched and accessible to everyone.

Noshir Contractor: That sounds really exciting. You mentioned that one of the things that web science does well is cultivate collaboration between the social sciences and the STEM fields (science, technology, engineering, mathematics), and you also advocated for bringing in more of the arts and the humanities. Can you give an example of a web science project that has done that well, or a hypothetical project that could do that really well?

Brooke Foucault Welles: There’s a book called Data Feminism, which is just a lovely example of how embracing the arts, humanities, social sciences and technical sciences yields new insights on how to observe and subvert power on the web. So, I totally recommend folks read it. And they did a lovely job of integrating across all of these things, showing how, you know, systems can oppress people but also help people kind of subvert those oppressions through clever hacks and understanding both the art and the science of the web.

Noshir Contractor: And so, in what way did the arts and humanities contribute to this effort?

Brooke Foucault Welles: So in some ways, studying sort of arts and artist collectives gives inspirational ideas about how to think differently about power and organizational structure, right? So folks often have very different ways of organizing over there. So that’s one concrete example. I think another way is, you know, sort of applying web science tools in order to create things that don’t have sort of capitalist monetary value, if that makes sense. So things that are lovely to engage with and are beautiful, but aren’t necessarily efficient or profitable, I think help scientists think differently about the value of their work. So engaging in these exercises as a student or practitioner might inspire new ways of thinking about what it is we’re doing here.

Noshir Contractor: That’s really good. Well, in closing here, I wanted to ask you a question that is relevant to our current times. What are the one or two most significant things, in your opinion, that would have been significantly different, for better or worse, during the COVID-19 crisis and the other crises that have now come along with it? To what extent would this have been different without the web? Can you conceive of what today would have been without the web?

Brooke Foucault Welles: So I’m going to give this response from my own sort of unique perspective as, you know, an American living in Massachusetts, raising kids. At a federal level, we had a pretty complete failure of communication, right? So that’s an interesting example of when centralized communication really broke down in terms of what to do and how to do it, but we saw lots of people rising up on the web and taking that spot, disseminating good information, science-backed information, on how to handle ourselves during a pandemic here in the US. My friend’s an epidemiologist, and suddenly got zillions of followers on Twitter, you know, because folks are looking to scientists directly. I don’t think that would be possible without social media or the web.

The other thing I want to lift up is just the incredible work of teachers and educational technology-makers to create opportunities for children to stay connected and sustain learning. You know, as a parent of kids in that age category, it’s been incredibly helpful, not only, you know, for their benefit, but for my benefit as someone who then needs to find a way to educate children and work at the same time. The fact that those tools exist and that they can be disseminated locally, you know, regionally and even globally, and kids can continue to have learning experiences and engage with one another and with educators all over the world, is pretty incredible. So I’m grateful for that.

Noshir Contractor: And I’m glad that you prefaced it by saying that you were making these observations as an American based in Boston, because we know that these kinds of privileges that we might have here are not necessarily universal, and that while the web is the World Wide Web, the benefits of the web are not necessarily worldwide. And so your points are really well taken there.

I want to again thank you so much for talking with us about the work you do. You are uniquely positioned as one of the rising stars in the area of web science, both in terms of the research you do and in terms of the activism you do, translating it into applied areas and working across a variety of disciplines. And I also want to take a minute to thank you for helping build a community of web science. You’ve been one of the organizers of web science conferences in previous years, and you’ve been a very active and engaged member of that community. So thank you again very much for taking time to talk with us today.

Brooke Foucault Welles: It’s my pleasure.

Noshir Contractor: Untangling the Web is a production of the Web Science Trust. This episode was edited by Molly Lubbers. I am Noshir Contractor. You can find out more about our conversation today in the show notes. Thanks for listening.

Episode 5 Transcript

Fil Menczer: Astroturf is alive and well, unfortunately, and it’s getting more sophisticated and harder to detect. And so in some sense, it’s job security. There’s no shortage of research challenges, you know, even 10 years later to try to identify this kind of manipulation.

Noshir Contractor: A decade ago, Fil Menczer was studying digital astroturfing right as it was ramping up online, and he’s continued with that work. But that’s not his entire breadth of research. Fil is a distinguished professor of Informatics and Computer Science at the Indiana University School of Informatics, Computing and Engineering. He’s also the Director of OSoMe — not just the word awesome, but the Observatory on Social Media. Shortened, it becomes OSoMe, pronounced “awesome.” 

His research spans web science, computational social science, network science, and data science. He focuses on analyzing and modeling the spread of information and misinformation in social networks, and detecting and countering the manipulation of social media. Besides all his professional activities and accomplishments, Fil has been an early fan of the web science movement, and in fact organized the Web Science Conference in 2014. Welcome, Fil.

Fil Menczer: Thank you very much for having me, Nosh.

Noshir Contractor: Let me start with something that I know you spend a lot of time thinking about and are uniquely positioned to help kick us off with here. How do you think social media can be manipulated for the spread of information?

Fil Menczer: Essentially, you know, social media are platforms that let people communicate and share their opinions and their thoughts. And also everybody has a responsibility in spreading other people’s opinions that they agree with. So in some sense, we’re all editors, but we don’t all have, you know, the ethics and the experience and the skills of journalists. We’re vulnerable to being misinformed, and we’re vulnerable to spreading misinformation ourselves.

On top of that, platforms have all kinds of mechanisms that they use, for various reasons, often very good reasons. For example, trying to figure out what’s interesting and making recommendations about who to follow or who to friend, or what to pay attention to. And our research shows that all of these mechanisms have some unintended consequences. So for example, showing people how many people have liked a video makes them more likely to look at it. And that’s something that can be gamed. Or recommending a friend of a friend might accelerate the formation of echo chambers, where you are exposed to less diverse points of view, and are perhaps even more vulnerable, you know, to being manipulated.

And then on top of all of that, platforms generally have APIs (application programming interfaces), which are ways in which one can write code and programs to interface with these platforms. On the one hand, this is wonderful because it allows us to collect data and do research. It also allows different people to come up with new applications, new ways to use this data. And those are good applications. But at the same time, it also allows bad actors to manipulate the platform by creating fake personas, by impersonating people, by creating the appearance that many, many people are sharing your opinion, or are angry or happy or supporting an idea or attacking a candidate, where in fact this is all the work of maybe one single entity. And so people can be tricked, because our natural, you know, cognitive and social biases lead us to trust things that come from our friends, or pay attention to things that look like they’re getting a lot of attention. And those things can be gamed.

So it is really easy actually to create social bots; that’s the term that we came up with several years ago to identify these inauthentic accounts. And then those accounts can be used to game and manipulate and also to amplify the spread of misinformation. We’ve shown that in our work as well. So it’s not a simple answer. There are very complex interactions between different algorithmic biases and social and cognitive biases that play together in creating this ecosystem of information, which unfortunately is vulnerable in many ways.

Noshir Contractor: You were one of the first people if I remember talking about astroturfing on social media. Can you tell us a little bit more about how far we’ve come in both the growth of astroturfing and how we can combat astroturfing today?

Fil Menczer: Yeah, in fact, it was 2010 when we started collecting data from Twitter on a large scale, and actually there is a connection to web science there. I don’t know if you know the story. But at the Web Science Conference in 2010, there was an article by (Panagiotis) Takis Metaxas and Eni Mustafaraj on a bunch of fake accounts that had attempted to manipulate a special election that was happening in Massachusetts at that time to replace Kennedy, who had died. And they found that the night before the election, a bunch of fake accounts pushed some misinformation about the Democratic candidate. And that generated a lot of traffic, even though Twitter took down those accounts very, very quickly, because they were doing typical things that spammers do. And despite this, on the day of the election, if you searched the name of the candidate on Google, you would find this fake news, because those social bots had been successful in creating a viral cascade. And then Google picked up that signal in their search engine.

So that was a fascinating paper; it actually got the best paper award at Web Science. And so as I was watching it, and talking with, you know, with Takis and Eni afterwards, I was thinking, you know, is this an isolated incident? We need to get more data and see if this is, in fact, just the tip of the iceberg. And that’s where we started this whole study of manipulation of social media, and astroturf, which is like fake grassroots campaigns. And what we found, in fact, is that it was very widespread. When you looked systematically at everything that was being shared on Twitter about the elections (that was a midterm election year), there were thousands and thousands of memes and links to fake news. And that’s the year that we found the first instances of fake news websites, and we found bots that were coordinating to support a candidate, to amplify and make it trend, and bots that were spreading fake news: real fake news, like completely manufactured, made up attacks against candidates, and then targeting journalists, trying to get it to go viral. And that’s when we realized this was a system that was extremely vulnerable.

And our first tools to detect this were based on looking at the structure of the network, of the diffusion networks. And that gave us some good signals, so we could build very simple machine learning algorithms to try and detect these kinds of astroturf. And over the next 10 years, you know, that has continued, and now we’re, you know, we’re looking at individual accounts that may be inauthentic as well as coordination that happens even without automation. They may not be bots, but they may be a bunch of accounts that are run by people, but that impersonate other people. So even though it looks like it’s 1000 independent voices that are pushing a particular message or conspiracy theory, it’s really one, you know, entity that’s really controlling all those accounts, whether maybe they are using software or maybe they’re not using software. So astroturf is alive and well, unfortunately, and it’s getting more sophisticated and harder to detect. And so in some sense, it’s job security. There’s no shortage of research challenges, you know, even 10 years later, to try to identify this kind of manipulation.

Noshir Contractor: Could you talk a little bit about the network by which these messages spread? Is there a way to tell whether a message was astroturfed? In other words, artificially made to spread as compared to one that was truly organic and truly grassroots, rather than artificially grassroots?

Fil Menczer: Those were the early days, when a lot of these kinds of manipulation were easier to spot than they are today. It was easy to detect some of this manipulation, but of course there might be other astroturf and social bots and malware and manipulation that we did not catch, you know, so we only know the things that we did find. But among those at that time, our intuition was that, like I said, the structure of the network could provide useful cues. So what we did is we built this diffusion network, where a node is an account, and a link between two accounts identifies either a retweet (at that time quoted tweets didn’t exist yet), but it could also be a mention or reply.

And so now we have this network with different kinds of edges. And we can look at things like, you know, how influential a node is by looking at how many times it is retweeted. So we could look at the distribution of hubs or popularity or influence among the nodes and extract statistical features. For example, you know, the skewness of the distribution of the degree or the strength, which is the weighted degree of these nodes. We could also look at the community structure. Was the network fragmented into many different groups, or was there one big connected component in the network? And also, you know, was the idea, whether it was a link to a fake news site or a hashtag or whatever, injected by many independent people?
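As a rough illustration of the diffusion network and the statistical features described above, the following Python sketch builds a weighted edge list from retweet, mention, and reply events, then computes node strength, the skewness of a distribution, and the number of connected components. The function names and the input format (pairs of account names) are assumptions made for this example; the transcript does not describe the group's actual code.

```python
from collections import defaultdict
from statistics import mean, pstdev


def build_diffusion_network(interactions):
    """Map each (source, target) pair to the number of interactions observed.

    `interactions` is an iterable of (source_account, target_account) pairs,
    one per retweet, mention, or reply (a hypothetical input format).
    """
    weights = defaultdict(int)
    for source, target in interactions:
        weights[(source, target)] += 1
    return dict(weights)


def node_strength(weights):
    """Strength (weighted degree): total weight of the edges touching a node."""
    strength = defaultdict(int)
    for (source, target), weight in weights.items():
        strength[source] += weight
        strength[target] += weight
    return dict(strength)


def skewness(values):
    """Population skewness; 0.0 when every value is identical."""
    values = list(values)
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return 0.0
    return mean(((v - mu) / sigma) ** 3 for v in values)


def count_components(weights):
    """Number of connected components, ignoring edge direction."""
    neighbors = defaultdict(set)
    for source, target in weights:
        neighbors[source].add(target)
        neighbors[target].add(source)
    seen = set()
    components = 0
    for start in neighbors:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:
            node = stack.pop()
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
    return components
```

A strongly positive skewness of the strength values would suggest a few dominant hubs, and a component count well above one would suggest a fragmented network. Features like these could then be fed to a classifier.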

And also, we could look at the distribution of the weights on the edges, right. So for example, if you have two accounts that retweet each other thousands of times, you know, that would be demonstrated by a very heterogeneous distribution of weighted degree. And that was a very strong signal. So the very, very first two bots that we discovered were found in that way. We found that there was this edge between two accounts that had a weight of 10,000. And we thought there was a bug in the code. And eventually we realized, no, no, this is no mistake, these two accounts were retweeting each other 10,000 times in the last week. And so we looked at them. And then we realized, “Oh, my gosh, these are obviously bots.” They were two accounts that were just automatically posting and reposting things at very, very high volume. And today, two accounts that did that would be immediately detected and suspended by Twitter. So you have to be a little bit more sophisticated in order to evade detection. But at that time, simple signals like those were sufficient.
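The edge-weight signal from this anecdote can be sketched in a few lines of Python. The rule used here (flag any edge at least `ratio` times heavier than the median edge) and the default threshold are assumptions chosen for illustration, not the method the group actually used.

```python
from statistics import median


def suspicious_edges(weights, ratio=100):
    """Return (edge, weight) pairs far heavier than the typical edge.

    `weights` maps (source, target) account pairs to interaction counts.
    `ratio` is a hypothetical threshold: an edge is flagged when its weight
    is at least `ratio` times the median edge weight.
    """
    if not weights:
        return []
    typical = median(weights.values())
    flagged = [(edge, w) for edge, w in weights.items() if w >= ratio * typical]
    # Heaviest edges first, so the most extreme pairs are reviewed first.
    return sorted(flagged, key=lambda item: item[1], reverse=True)
```

With a handful of ordinary edges and one pair of accounts retweeting each other 10,000 times, only that pair would be returned. A fixed ratio like this is easy to evade, which matches the point that today's detection needs more sophisticated signals.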

And these days, our bot detection algorithms use much more sophisticated algorithms to look at over 1000 different features that characterize not only the structure of the network of diffusion, but also characteristics of the accounts, of the profiles of their friends, the content that they generate. We do speech analysis for the content, we look at sentiment analysis, we look at temporal patterns, for example, not just how frequently they tweet, but also do they do it in a bursty way, like humans, or in a regular way that looks more automated. So there’s lots and lots of different signals that we try to pick up to try and infer whether there is some, you know, automation.
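One way to quantify the bursty-versus-regular distinction is a burstiness coefficient computed from the gaps between consecutive posts: B = (sigma - mu) / (sigma + mu), where mu and sigma are the mean and standard deviation of the gaps. This is a standard measure swapped in here for illustration; the transcript does not say which temporal features the group's detection tools actually compute.

```python
from statistics import mean, pstdev


def burstiness(timestamps):
    """Burstiness coefficient of a posting timeline.

    `timestamps` are posting times in any consistent unit (e.g. seconds).
    Returns a value in [-1, 1]: close to -1 for perfectly regular posting,
    around 0 for random (Poisson-like) posting, and positive values for
    bursty activity.
    """
    times = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    if len(gaps) < 2:
        raise ValueError("need at least three timestamps")
    mu = mean(gaps)
    sigma = pstdev(gaps)
    if mu + sigma == 0:
        return 0.0  # every post shares one timestamp; no pattern to measure
    return (sigma - mu) / (sigma + mu)
```

An account posting exactly every 60 seconds scores -1.0, while an account with long silences broken by rapid bursts scores above zero. On its own this is weak evidence, which is why it would be only one feature among many.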

Noshir Contractor: Well, one of the things that you’ve described is that there’s a constant cat and mouse game between your ability to detect structural signatures and signals, and the efforts of those who are trying to evade your detection to keep improving those signals. In the context of bots, are we at a stage where you find that bots are being created to create new bots?

Fil Menczer: (Laughs) That’s a very interesting question. Meta bots. We haven’t seen evidence of that. However, what we do find evidence of is some sets of accounts that are all very, very similar to each other. So for example, they all have, you know, a pattern in their name, like a common first name, followed by an underscore, by a common last name, followed by a sequence of digits. Also, they might all have the same description. So a lot of times what we find suspicious is not the behavior of a single account. So you have to look at not the pattern of an individual account, but the pattern of a group of accounts. And then you might say, each one of these accounts looks perfectly reasonable, it looks like maybe, you know, a person who posts about politics, maybe supporting this candidate or that candidate. But now if you look at 10 of them, and you see that they are tweeting at the same time, or they’re tweeting exactly the same sequences of hashtags, or they’re all retweeting one account that they’re trying to support and amplify, then that’s where you say, “Well, what is the probability that by chance, you have this kind of behavior by many independent accounts?” And if that probability is very low, you know, let’s say 10 to the minus four, then you say, “Okay, this is suspiciously similar behavior, probably there is coordination.”
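The group-level signals mentioned above (patterned handles, identical hashtag sequences, and a low probability of matching by chance) can be sketched in Python as follows. The regular expression, the thresholds, and the independence model in `chance_of_match` are all assumptions made for this example, not the group's actual detection method.

```python
import re
from collections import defaultdict

# Hypothetical handle pattern: letters, underscore, letters, then digits,
# e.g. "amy_smith101".
HANDLE_PATTERN = re.compile(r"^[A-Za-z]+_[A-Za-z]+\d+$")


def coordinated_groups(hashtags_by_account, min_accounts=3, min_length=3):
    """Group accounts that posted exactly the same sequence of hashtags.

    `hashtags_by_account` maps an account name to its ordered hashtag list.
    Sequences shorter than `min_length` are ignored because short sequences
    match by chance too easily.
    """
    groups = defaultdict(list)
    for account, tags in hashtags_by_account.items():
        sequence = tuple(tag.lower() for tag in tags)
        if len(sequence) >= min_length:
            groups[sequence].append(account)
    return {
        sequence: sorted(accounts)
        for sequence, accounts in groups.items()
        if len(accounts) >= min_accounts
    }


def chance_of_match(p, k):
    """Probability that k - 1 other accounts repeat the first account's
    sequence, under the crude assumption that each account independently
    produces that sequence with probability p."""
    return p ** (k - 1)
```

Under that independence assumption, five accounts sharing a sequence that any one account would produce 10 percent of the time gives a chance probability of about 10 to the minus four, the kind of value described above as suspicious.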

Other examples are accounts that post the same images in sequence, or very similar images. So there’s lots of ways that we’re looking at to identify this kind of coordination. And that coordination in some sense comes because under the surface, there is probably an entity that is using software to automatically control all these different accounts. Even if the messages are coming from a person, like there is a human that says, you know, go red or vote blue, that human now is doing that on 100 accounts or 1000 accounts. And so we can spot the pattern of similarity that gives it away in some sense. And so those are ways in which, you know, the arms race that you were talking about is really happening. Not only are individual, you know, bots becoming more sophisticated, but also humans are mixing with software to create accounts that are more difficult to detect by looking at individual accounts. And of course, looking at large groups of accounts is computationally much, much more challenging. And so it requires a lot more work and sophistication and also more, you know, computational power. So it is tough to catch all of this abuse.

Noshir Contractor: It almost seems like we need a Turing test to detect whether it’s a human being or a bot that you’re dealing with on social media.

Fil Menczer: That’s a very interesting observation. In fact, the key of the Turing test is that, you know, you were talking through an interface with either a human or the computer. And so the only thing that you could see was, you know, whatever they were saying, and how. And in some sense, social media have made that easy for anyone, because all you see is the presence on social media; you have no way of knowing who’s behind that identity. Even platforms, you know, they have access to some additional signals, like, you know, maybe phone numbers, IP addresses, but even platforms cannot really know sometimes for sure who’s behind an account. Typically, they can see whether they are violating some terms of service, and whether there is coordination, but nobody knows who’s behind it.

And so this means that there is plausible deniability. If this campaign is trying to promote a particular candidate, that candidate can claim, perhaps correctly, that they had nothing to do with it. And there is no way of proving who’s behind it. Very, very rarely, and only through extensive work by intelligence services, can we say, “Oh, for sure, there was that particular state actor behind this activity.” In the majority of cases, at best we can detect them and maybe alert people or perhaps remove them, if they are manipulating the public or abusing, you know, the rules, but we cannot really say, “Oh, that actor is behind it.” And so that actor is free to just start over, maybe tweak the algorithm and do it again. So yeah, it is as hard as the Turing test.

Noshir Contractor: One of the things that you are really well known for is being the director of the Observatory on Social Media, which is abbreviated to OSoMe, and it’s pronounced “awesome.” As someone who appreciates the creation of clever acronyms, I’m truly impressed with “OSoMe.” And I wanted to ask you a little bit about what went behind that. There are many people who, for decades, have talked about creating some kind of an observatory to study the web. You have done it, and you’ve done it successfully. And you now host several tools that you’ve created that go beyond your own research, and that are used not only by others within the research community, but routinely by journalists, and so on and so forth. Talk a little bit about how you made this happen. And what were the lessons you’ve learned from it?

Fil Menczer: Yeah, so OSoMe. It is a cute acronym. I can’t take credit for it. I think it was our Associate Director for Technology who came up with this idea. The idea of using the word observatory in it actually came from web science, because the web science community, you leading it, among others, was really sort of pushing this idea of collecting data on a large scale from the web, to get to a deeper understanding about some of the, you know, social impact and social phenomena, you know, of society, in some sense. Our behaviors, how are they affected by the information that we see, how data about our online behavior is telling us something about social action, about norms, about behaviors, and about vulnerabilities, which was the part that interested me in particular. And so that’s where the idea of observatory came from; it came from the web science community.

So, as you say, in addition to doing lots of research, we also like to develop tools. And that’s because, you know, in web science, we think that it is important to go beyond just research and to actually do work that can impact society in a good way. And for me, you know, based on our skills, one of the things that we could do to help a little bit was to take the tools that we build for research and push them a little bit further, to the point that they can be used by a broader audience, so that they’re not only useful to write a paper, although that’s important too. But they can also be used, like you said, by journalists, by investigative reporters, by civil society organizations, and also by common citizens to gain an appreciation for whether they are vulnerable, whether they are talking with another human being, whether they’re being manipulated.

So, for example, our most popular tool is called Botometer. And it started from our research on bot detection. And then we had a demo for a grant that we had, and as part of that demo, we thought, okay, let’s just, you know, put it on a little website so that you could, and then we realized, wow, this could be useful to other people. And so eventually, it became a public tool that is now called Botometer. And it is used a lot. We serve between 500,000 and 600,000 queries per day.

Noshir Contractor: Wow. 

Fil Menczer: It’s very, very popular. Now, obviously, it’s not perfect, because just like any machine learning algorithm, it makes mistakes. And there are a lot of research challenges that we’re pursuing. And some of my students are working in their dissertations on how to improve these tools, how to make them better able to recognize suspicious behaviors that are different from those in the training data. Also how to combine supervised learning and unsupervised learning, as I was saying earlier, to detect these coordinated manipulation campaigns that may not necessarily use automation. So there’s a lot of research challenges in building these tools. But we try to also bring the results of those out into things that can help other people.

So Botometer is our best example. But we have several others.

Our latest tool that we’re kind of excited about is called BotSlayer. And it is a way to let people, even without technical skills, set up an infrastructure in the cloud, where they can, with just a few clicks, track all of the tweets that match some query as they happen and look at them in real time on their screen, and also see all of the entities that we extract from these tweets. An entity can be a link to a news article, it could be a hashtag, a username, a phrase. And then for each of these entities, see how many people are sharing it. How many unique people are sharing it? Are bots more likely to share this particular entity than something else? And also, is there coordination among these accounts? Are they all retweeting the same, you know, the same set of users and so on. So, in some sense we’re taking a very complex tool that we’ve been developing for our research and putting it at the fingertips of other researchers, journalists, and nonprofit organizations. So we have hundreds of organizations around the world that are licensing this, and we hope soon to have the next version that will be a little bit better and more robust, and make it available so that people can use it to study COVID-19, to study, you know, the current protests around the Black Lives Matter movement, and so on.

So those are some of the things that we have out there, there’s a few more. We really think that creating tools, you know, and making them openly and freely available to the community is an important part of our mission of the observatory.

Noshir Contractor: And your group is so good at it. And it’s really making a major contribution to academia, but also to society at large. And so thank you, again, for all the work that you’re doing on that front. One might get the impression listening to this conversation that bots are always evil, especially when you have apps called BotSlayer, for example. Now, is that true? And if not, then how can you distinguish between a good bot and a bad bot?

Fil Menczer: That’s a very good question. And absolutely, it’s not true that all bots are bad. You’re absolutely right. In fact, many, many bots are very useful. And we all use them, right? For example, if you follow the feed of, I don’t know, The Wall Street Journal, or the New York Times, or your favorite news source, that’s a bot, right? It’s an account that automatically posts things that you can extract from an RSS feed or, you know, some other source. And then there are some bots that are funny and interesting and entertaining, and others that are kind of trivial. So there is a huge range of behaviors, but many of them are perfectly innocuous or even helpful. And our research is focusing on detecting automation, because when that automation is not revealed, then the bots can be used to manipulate. Now, if a bot says, “I’m a bot that tells the time every hour, like Big Ben,” there is nothing wrong with that; it’s not trying to mislead anyone, right? And if it says, “I am, you know, the New York Times, and I post the news every five minutes,” there’s nothing misleading about that. People know who they are following. But if a bot says, “I am Nosh Contractor, and I’m a professor at Northwestern, and here’s why you should really believe that if you want to, you know, be cured from COVID, you should drink Clorox,” I mean, that’s an inauthentic account that is impersonating a person, and making it look like that person is saying something which, in this example, is false, and in fact dangerous, very dangerous. And this is done a lot.

So we hope that the focus on detection of automated accounts and not only automated accounts, also coordinated accounts, like I was saying earlier, can be useful in spotting this kind of abuse. That’s the one that we are worried about. Obviously, we’re not worried about benign bots. 

But sometimes the same technology that lets you detect one also lets you detect the other. So we train our machine learning algorithms with whatever bots we can find out there, either because they tell us themselves that they are automated, or because some human experts have looked at them carefully and concluded that they are automated, or perhaps because, you know, Twitter has taken them down, so we know that they were inauthentic. And so we use those data sets of labeled accounts to train our algorithm. And the hope is that, obviously, they’re not used to do anything against, you know, benign bots, but they could be used hopefully to alert people about the malicious ones.

Noshir Contractor: Thank you for clarifying that difference, because I think it’s an important distinction to recognize and appreciate that bots are not intrinsically nefarious and that we interact with them all the time.

Fil Menczer: But there is also, like, everything in between, right? There are accounts that for a while are doing good things, and then they are turned, either because, you know, they are hacked or because people let applications post on their behalf. And so you’ll have accounts that are partly automated, partly manually controlled. So it’s a very complex ecosystem where you find all sorts of complex behaviors, and it’s really hard to make sense of it.

Noshir Contractor: In closing, I want to ask you about something you have already referenced. We live right now in an age of reckoning when it comes to social upheaval, in addition to the pandemic. And I want to know, if you could share some of your opinions about how things would have been different, for better or for worse, if we were experiencing this without the web?

Fil Menczer: Oh, without the web? Oh, my gosh, that’s a really, really interesting question, and tough as well. (Sighs). Well, I’m an optimist and a technologist. So I would say that overall, probably the better outweighs the worse. But certainly, we have plenty of examples of both, right? In some sense, the web has, you know, enabled some amazing advances, whether it’s in, you know, sustaining social movements for the advancement of humankind, or creating public awareness about huge planetary issues and challenges with global warming, and, you know, pandemics and racism. Imagine the economic harm of the current pandemic, as bad as it is, and it is terrible. Imagine how much worse it would be if we didn’t have communication technology, so that we could still teach remotely, as badly as we do that. But at least, you know, it’s something that we could do remotely, let alone conferences and teleconferencing and so on. But just the capability of being able to connect with each other, even at a distance, you know, the world would be much worse off if the web didn’t exist to enable those kinds of interactions.

So there is a lot of good in it. There is good in it in allowing, you know, minorities or groups that have less power to put their message out there. So to some extent, the democratization of information, that was this utopia of the early days of the web that we all bought into, certainly I did. To some extent, it has happened. And so the world is all the better for it. For the same reasons why the web can be used for all these good things, it can also be used for all sorts of bad things. This has been true of every technology in history. And it’s also true of the web. And it’s true of social media. And it’s true of Zoom even, the latest thing that we now realize can be abused. So, of course, technology can be abused, in all the ways that we have been talking about.

And our research is really focused on those kinds of abuses and manipulation, whether it is spreading, you know, misinformation, or suppressing the vote, which for me is one of the huge challenges ahead of us. That’s one of the reasons that motivates our desire to detect manipulation, because we see that that’s one of the main applications. You’re probably not going to change people’s opinions: if you were going to vote for one candidate, you’re not going to vote for the other candidate. But you might get somebody to decide not to vote, if you convince them that that candidate is really not that much better than the other. And I think that this probably has happened in the past and will continue to happen. And then we haven’t even yet seen large scale consequences of new technologies that are just now becoming mature, like deep fakes. And I think that those possibly could pose big challenges in the next few months. So as for anything, there are good things and bad things, and certainly the world would be very, very different without the web, for better or for worse, I would say more for worse. Overall, I’m still an optimist, and I think that we can make things better.

Noshir Contractor: Well, thank you again, Fil, for talking with us and giving us some really interesting insights about the role of bots and the ways in which you and your team have helped contribute to detecting the nefarious bots and helping make the world a better place as a result of that, especially with not only your own research, but the tools that you’ve made available. So thank you again very much.

Fil Menczer: Thank you so much for having me. 

Noshir Contractor: Untangling the Web is a production of the Web Science Trust. This episode was edited by Molly Lubbers. I am Noshir Contractor. You can find out more about our conversation today in the show notes. Thanks for listening.

 

Episode 4 Transcript

 

Jen Golbeck: You know, I kind of jokingly said to someone at some point that I want to be the world’s expert on dogs on the internet. And I might be at this point, or at least up there with kind of pets on social networks. 

Noshir Contractor: That was Jen Golbeck. Jen is not only an expert on internet pets, but a leading voice in web science. She’s a professor in the College of Information Studies at the University of Maryland at College Park. You may know her from her TEDx talks or podcasts about web science and pets. She has been a research fellow of the Web Science Research Initiative, and gave a keynote address at the 2017 ACM Web Science Conference.

Jen is also known for her work on computational social network analysis. Her models for computing trust between people in social networks were amongst the first in the field. And Jen’s also received a lot of attention for her work on computing personality traits and political preferences of individuals based upon their activity on online social networks.

Welcome, Jen.

Jen Golbeck: Thanks. I’m glad to be here.

Noshir Contractor: Jen, let’s start by learning about how you got interested in studying what you do now.

Jen Golbeck: I’m really lucky that the time of what was going on in technology and the time of my life intersected in a kind of fortuitous way. So the web came about, I think I was probably in middle school. And then when I was in high school, the early to mid 90s, I started designing web pages professionally, which you could do as a 15-year-old at that point. I did that throughout undergrad, and you know, through my master’s degree. Sometimes it was my entire income, sometimes it was a side income. But I was also on the path to get a PhD the whole time. And so when I came to the University of Maryland to get my PhD in 2001 in computer science, I met with Jim Hendler, who was my advisor. And I had actually started as an economics major at the University of Chicago, changed to computer science. But Chicago has the guys who did Freakonomics, you know, this behavioral stuff that wasn’t just, you know, markets and finance. And I loved that. 

And so I was like, “Jim, how can I take this sort of stuff about how people behave, and things emerge out of that, and then cross it with the web, which is something that I’ve just been immersed in, you know, since I was kind of a thinking pseudo-adult?” I said, “Can we maybe do like a social network and put that on the web?” And he was like, “I mean, that sounds interesting. Go ahead and try it and see what happens.”

That was 2001. So pre-Facebook, Myspace was just kind of getting started. And I was like, alright, I’m gonna study social networks on the web. I built some, you know, I studied some of the early ones that were out there. And so it got me into doing research, right as the entire universe of the web shifted into this place where humans were creating tons of content, people were spending a lot of time. And so it was just kind of natural, then, to flow into web science, you know, working in a lab that was looking at knowledge representation and putting information online, and then pulling my own interest of people online and what they’re doing and how to merge that with AI.

Noshir Contractor: It looks like you had the right time, and the right people to be working with, in addition to having the right skills for doing all of that stuff. One of the things that I remember reading early on about your work was this work in the area of trust-based recommender systems, and you developed a platform called FilmTrust. Can you tell us a little bit about what you learned from that experience? And what do you think of the future of those kinds of recommender systems?

Jen Golbeck: That’s my dissertation work that you’re talking about. So I basically built my own social network because there was no data like this in existing social networks. And in there, you could go in and like, rate your favorite movies. Like you do now with Netflix or Amazon, whatever. And then you also could add friends like on any social network. But I added this system where you could rate how much do you trust this person to recommend a good movie to you, basically. 

And the question was, we had recommender systems at the time, like we have now with Amazon and Netflix, that say, “Here are some movies that you might want to watch.” Those generally worked by finding people with similar tastes to you and suggesting stuff they like, essentially. And so I was interested: Could we use trust that people express about their actual friends, and do a bunch of interesting AI with that and use that in place of similarity? So if I say I trust you about movies, even if it looks like we’re statistically different, can I maybe get some good information? And it turned out from that, that it does work. And it works really well in cases where I’m just very different than everyone else.
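The idea of using trust in place of similarity can be illustrated with a trust-weighted average of friends' ratings. This is a minimal Python sketch under stated assumptions: the function name, the data layout, and the choice of a plain weighted average are all hypothetical, and the transcript does not describe the algorithm the dissertation system actually used.

```python
def trust_weighted_rating(ratings, trust):
    """Predict a rating for one movie from friends' ratings.

    `ratings` maps friend -> that friend's rating of the movie.
    `trust` maps friend -> how much the user trusts that friend's movie
    recommendations (any non-negative scale). Friends with no trust value
    are ignored. Returns None when no trusted friend has rated the movie.
    """
    weighted_sum = 0.0
    total_trust = 0.0
    for friend, rating in ratings.items():
        weight = trust.get(friend, 0.0)
        weighted_sum += weight * rating
        total_trust += weight
    if total_trust == 0:
        return None
    return weighted_sum / total_trust
```

For example, if a highly trusted friend rated a film 1 star while a barely trusted friend and a stranger rated it 4, the prediction stays close to 1. That is how a scheme like this could respect an unusual preference that a similarity-based average would wash out.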

So an example I used to give all the time was, I’m a real film buff. I used to be a projectionist in a theater. I hated A Clockwork Orange, which is like a classic piece of cinema. I wish I had those hours back in my life. No one who is a film buff hates that movie, but I hated everything about it. And so recommender systems would see, okay, well, she loves all this classic cinema, of course, she’s gonna like that movie. And I’m like, any system that tells me to watch A Clockwork Orange is not one I want to use. Like, it doesn’t understand me. And trust is great at capturing those really extreme preferences on either end. 

And so it was this really interesting lesson that our social relationships and our understanding of how we relate to people have a power that statistics alone don’t capture. But we can put those things together with AI and some statistical analysis and all this data on the web, to kind of get the best of both worlds. And that’s now something that all these personalization algorithms, like you see on Facebook sorting your timeline, like you see in, you know, a lot of recommender systems, are doing: they’re incorporating those elements of social relationships. That was one of the things that I first investigated in that dissertation work.

Noshir Contractor: And that was very influential at the time. I remember looking at it, and people were beginning to understand whether trust-based recommender systems may be different, or augment purely algorithmic-based recommender systems. 

Netflix, for the most part, is making its recommendations based only on its internal algorithms, certainly improved by the Netflix challenge. Is there a difference in the kinds of recommendations? Do you see some day when algorithmic recommendations like the ones that Netflix is doing will just get so good that you don’t need to rely on trust-based recommender systems or social network-based recommender systems? And at the end of the day, I guess, do you trust your ability to report accurately who you trust?

Jen Golbeck: That’s a really good question. So I think it depends on the domain, you know, Netflix, I don’t think they really need to use a lot of social network data. Because for movies, you can get a lot about people’s preferences with the genre, the actors, like all this really detailed information we have. The same thing goes for like music recommenders, and all these kind of streaming music services that will make a channel for you. You don’t need a lot of social data for that; it may help in little instances.

But there are a lot of cases like, what do you want to look at on your social network feed that are much more social, and not just news stories, right? But like, whose friends’ kids’ updates do you want to see, you know, if it’s the person that you’ve been friends with, since elementary school, you may totally want to see that. If it’s, you know, some guy you met at a professional conference that you happen to friend on Facebook, you may not care at all about that. And your social network and your friends’ preferences can shape those sorts of personalizations in a way that I don’t think we’ll ever really capture with a purely statistical algorithmic recommender system. So I think depending on the context, the more social that context is, the more important it is to have social input to it.

Noshir Contractor: It also seems that in some situations, if you really trust someone, and they tell you something different than what you’ve seen previously, you might be more open to looking at it and considering it. While if a computer is basing its algorithm exactly on what you like, it may be less likely to provide you enough variety. One of the things recommender systems have been criticized for is that they make you live more and more in an echo chamber. While in a social network, somebody that you trust might actually tell you something a little different that you may or may not like. If you don’t like it, you may not trust that person anymore, but you might like it as well.

Jen Golbeck: Yeah, it’s, it’s an interesting combination. I’ve had PhD students, I have one just graduated a year ago who was in journalism, looking at, you know, how do we figure out the news that people trust? You know, are they more likely to believe conspiracy theories or fake news or kind of legitimate, real journalistic standard kind of news? And how does that relate to the people who are sharing it with them? And I think that’s really important. 

If I have a really trustworthy source, who’s coming to me with something, that’s not what I might normally believe, it may make me more likely to consider that and understand that information. And so that’s one of those interesting ways where you could merge something like, here’s Facebook or Twitter with a really good model of what I’m going to like or click on or comment on, very good at that. Here’s stuff that’s gonna keep me engaged. But let’s broaden that out with a more diverse perspective. 

And something that they can see is different than what I might normally click, but coming from someone I trust, that’s a way to sort of say, okay, like, let’s expand to this viewpoint, and maybe look at optimizing things like the social good, or how informed someone is or how much they’ve considered a breadth of perspectives. It’s something that we need to get more into in the research, but I think social connections will be really critical if we start expanding recommendations like that. 

Noshir Contractor: And you continued to work beyond predicting which movies you might like, and you spent a lot of time looking at data from the web to understand individuals’ activities, attitudes, and behaviors. One in particular was your work on trying to predict the extent to which a person might be able to stay within the Alcoholics Anonymous program or not. Can you tell us a little bit more about what you found there?

Jen Golbeck: Yeah, so that project we originally had started wanting to look at DUI recidivism, so someone gets a DUI, how likely are they to get another DUI, and we really dug for data on social media about that, but not surprisingly, people aren’t posting a lot about their DUIs. But what we did find in the process of looking for that is a lot of people talking about their problems with alcohol, going to Alcoholics Anonymous, if they were drinking. And so we did this study where we basically looked for everyone who announced on Twitter that they were going to their first AA meeting. And then we followed what they tweeted after that, you know, after filtering it out for jokes, or whatever, to get people who legitimately had drinking problems. And after they said that we looked at, did they stay sober for 90 days? Or did they go back to drinking? And we made sure they said so. It could be, you know, two weeks later, they complain they were hungover at work, so we knew they were drinking. It could be six months later, they were celebrating their six months of sobriety, so we knew they’d made it those 90 days.

And then we just took all the data that we could model from their Twitter feeds, to try to see if we could predict that, you know, on the day, you announced you’re going to AA, can we predict if you’ll be sober? And so we looked at things like: Who are the people that you follow on Twitter? And how much do they talk about booze? How much do they use words about alcohol? Are you over 21 or under 21? How do you cope with stress, kind of using other AI as input. And these are all things that addiction researchers might consider. And so we use that as an input to our model. And we can predict with astonishingly high accuracy, 80% accuracy, if someone is going to stay sober or not, on the day they decide to go into treatment.

Is it good or bad, I really struggle with it. We have not made that tool available to the public, because I can see a lot of dangerous ways for it to be used. But it is also explainable. So if you say I’m going to go to AA and my algorithm says I don’t think it’s going to work, it can tell you, this kind of therapy might be helpful or changing up your social circle might be helpful, which I think could be really useful. And it’s one of those things where I am impressed as a scientist with the computational power of what we can predict from this web data. I am also very concerned as someone who plays in the social science space about the implications of that algorithm. There are good and there are bad and I think we just need a lot more work in the kind of policy space, the regulation space before a tool like that is brought out to the world.

Noshir Contractor: And this is exactly why web science is trying to navigate this balance between what can be accomplished technologically and what should be accomplished, or how should it be accomplished, from a social standpoint or a policy standpoint. And your work is just a really excellent illustration of how one tries to navigate through that dilemma. One of the things that this work shows is that if it goes in the wrong hands, for example, I can imagine somebody who is being pulled over by a cop, for example, right? And then the cop could potentially be using this algorithm in different ways to determine what kind of response in that particular situation. 

Jen Golbeck: One thing that we’ve seen with AI is that it’s used in sentencing guidelines now. And it’s used in ways that we know are already profoundly unfair. But you can imagine this algorithm being included in the decision about whether to send someone to jail for a DUI or to send them into treatment, you know, if the algorithm says treatment will work, they can go to treatment, if not, they can go to jail. But the algorithm is wrong 20% of the time. It’s also pessimistic, so when it’s wrong, it tends to say you won’t recover when you will. Who knows what other biases are in there? We can’t really tell, but there certainly are some. So yeah, it’s really worrying to think about people who may very well mean well and want to make the right decision using this technology. Because AI has this veneer of objectivity, right, it’s math, it’s totally objective, it can’t be racist, or sexist or biased. But of course, it totally is, it just reflects human society. And people who don’t understand that and the errors and the pitfalls, may try to use it in ways that just echo all the problems that we already have, that we’re kind of trying to fix with technology. And that, you know, is worrying not just in this case, but in all these applications of AI and web data together that get out into the world.

Noshir Contractor: Well, one of the things that you highlighted in your own work, is that when you have these predictions, if it goes into the wrong hands, for all the reasons you’ve been describing, could be something that could have unintended negative consequences. Do you believe that these predictions should always be given to the person involved? 

Jen Golbeck: I mean, generally, I think they should always be made available. Right? If I ask, I should be told 100% of the time exactly what I’ve asked for, I think I have a right to know that. I may not have that legal right right now, especially in the US, but it’s something that I am working hard for us to get, I think it’s important. Will everyone want to know, you know, not necessarily. A lot of this is benign — personality traits, what are your political preferences, stuff we already know about ourselves. But this alcoholism example, as one, you may not want to know when you go into AA if an algorithm says it’s going to work, you know, if the algorithm says AA won’t work for you, when you’re going and you really want to solve your problem, it may be discouraging to the point that you are so fragile in that recovery that you decide not to continue. So you may say I don’t want to know what the algorithm says, because it may tell me something that won’t help.  So I think they should have a right to know if they want to, but it doesn’t necessarily need to be automatically shared.

Noshir Contractor: I want to move us from social networking to social petworking. And I wanted you to tell us a little bit about the research that you’ve done on looking at social network sites for dogs versus cats. And why, turns out, that they do different things on these websites.

Jen Golbeck: I kind of jokingly said to someone at some point that I want to be the world’s expert on dogs on the internet, and I might be at this point, or at least up there with kind of pets on social networks. So I’ve always been fascinated by how people put their pets on social networks, as I’ve followed the development of these. And the work you’re talking about is some work that was looking at some of these early pet social networks, Dogster and Catster, there is a Hamsterster, where you could create a profile for your pet, make them friends with other pets just like you would do on Facebook or any other network.

And what we saw when we studied this is that people use them quite differently. Cat people tended to participate in these kind of community forum discussions, they would do these role playing kind of games and exercises from their cats’ perspective. So there would be these like, cat weddings were very common. People would pick their cats to get married, everyone would come, like they’d send invitations, they’d talk through the reception and everything at a particular time. Dog people tended not to do that kind of thing. Now, this was kind of early days of social networking, you know, mid 2000s. And we’ve seen that kind of behavior shift on to things like Twitter and Instagram now, where I have very popular social media pages for my dogs. I don’t post in their voice, but some people do that.

It’s a really interesting way, where you can see like cat videos were the thing on social media for a long time. Dogs maybe have eclipsed that a little bit recently. But people interact in really different ways through that. One of the most popular cat social media accounts is, I think, “black metal cats.” And it’s like, death metal lyrics with pictures of cats. And they tend to kind of embrace that, that stereotype of cats, where dogs’ social media tends to be very wholesome and encouraging and supportive. It’s interesting that, you know, all the research bears out that dog and cat people tend to have different ways of approaching life. As that moves on to the internet, we’ve seen that consistently, that they tend to behave in different ways, which I think is kind of fun and wholesome, interesting kind of research in the space.

Noshir Contractor: One of the things that I found really interesting is your explanation as to why cat people are more likely to organize virtual playdates on the web, as compared to dog people. And you mentioned that one of the reasons that might be the case is that dog people take their dogs on walks. And that might make them more sociable than most cat people, who don’t take their cats on walks.

Jen Golbeck: I am doing some research on this topic right now, sort of the benefits of having a dog, and a lot of the research on the benefits of having a dog, not just online but offline, is that it absolutely makes you more social, because you’re out there walking the dog, even if you don’t go to a dog park, you tend to encounter other dog people, you can talk about your dog, if you’re a dog person, you know, if there’s people you see regularly on your walks, you may not even know the humans’ names, but you know the dogs’ names, you recognize those people. So you get to have these social interactions around your dogs, that you really don’t get to have as much as a cat person, because you’re not out in the world encountering it. So virtual spaces provide an opportunity to socialize around those pets. And I think that’s, that’s one of the like, really good things that we’ve seen in general on the web, is that it’s created these spaces for people to connect socially, whether it’s around a rare disease that they have, or you know, a life struggle they’re going through, or their pets or their hobbies, where it was hard to do that in-person, because there just wasn’t a lot of density or opportunities. The web has created those spaces, and so even though it can look a little weird looking in on these online cat communities, I think it’s great that it provides those opportunities to socialize.

Noshir Contractor: And unfortunately, in many cases now, virtual playdates have become much more prevalent in general today because of the pandemic, in addition, of course, to the social and cultural movements that we are experiencing right now as we have this conversation. I want to close here by asking you if you could reflect on one thing about what we are experiencing in either of these two areas, and to see how it would be different, better or worse, if we didn’t have the web today?

Jen Golbeck: So I tend to be the pessimist painting the picture of our dystopian future and warning people of the bad things that are gonna happen. I’m not going to do that here, I will give you a positive view of what’s going on right now. You know, you especially look at the Black Lives Matter movement, everything that’s going on with the protests against police brutality, and then also against the administration’s handling of COVID. Social media has been a really powerful place for that.

And I think an interesting way to think about these movements, not just right now, but going back to like Ferguson, looking at the Me Too movement, these, these social movements that have come is that if we look in pre-web times, our media was very much controlled and gatekept. We had a few major networks, they decided what was going to be shown, those were the voices that we saw. And they tended to be white voices, and male voices. I was born basically, you know, a little before 1980. So I remember as a kid, constantly being irritated in a way that I couldn’t describe, at the way women were portrayed in commercials and on TV. It was not how I was, and it’s not how I was raised, but it was so frustrating that the dominant view of women was like, “Oh, we need to be helped.” And you know, “I’m sort of ditzy and whatever.” And, you know, I can only imagine the experiences that people of color had with that.

Social media and the web have given voices to every community that cannot be suppressed in the way that they were when there were these gatekeepers. And we have seen these movements. We saw Ferguson, which is something that I think would not have been covered by the media in the same way, if everyone was not on Twitter, when that was happening. We have the Me Too movement that gives people a voice to sort of challenge these large voices in ways that they couldn’t before that. 

And I think if we look right now at what’s happening, especially with the introduction of mobile and everyone having access to these platforms, from their mobile devices, posting videos, posting pictures, challenging what we would always see the police say. You know, there was a video that came out of police arresting, taking a teenager off the streets of New York City, throwing her into an unmarked van and driving her away. There’s tons of videos of this, and the police say she was wanted on a bunch of other charges. And then the police were attacked with rocks and bottles. The video shows that they were not attacked with rocks and bottles. None of that was happening. And before social media, we would have gone “Well, the police said this. So that’s probably what actually happened.” And now everyone has the power to challenge these dominant voices.

I think that shift of power away from institutions and people that have traditionally had it is very uncomfortable for the people and institutions that have traditionally had it. But it’s incredibly powerful in shaping and pushing for the change that we have desperately needed for a long time. So I think the web has facilitated that shift of power in a way that is so good for society. Even though we’re very disrupted right now, for lots of reasons. I think the web is playing a net good role in that. And it’s, it’s one of the, you know, most powerful influences that we’ve seen in the last 20 years. And I think it’s going to continue playing that role.

Noshir Contractor: I’m glad to hear that you are so optimistic about these things. And frankly, I’m optimistic knowing that scholars like you are leading and pushing the frontier in the area of Web Science. As you mentioned, I have been following your work since your dissertation days and have been really impressed with the ways in which you’ve been doing high-quality work, socially responsible web science, and being able to translate it well. And I definitely recommend that our listeners follow you on Twitter, on one of the many Twitter accounts that you have, as well as listen to your talks on TEDx, because they are really, really compelling. Thank you so much again, Jen, for taking time to talk with us.

Jen Golbeck: Thank you. It was a real pleasure as always.

Noshir Contractor: Untangling the Web is a production of the Web Science Trust. This episode was edited by Molly Lubbers. I am Noshir Contractor. You can find out more about our conversation today in the show notes. Thanks for listening.

 

Episode 3 Transcript

 

Wendy Hall: I always say there’s two things I don’t like about web science. One is web and the other is science. But the idea there was that it wasn’t just about the technologies, HTML, HTTP. It was about the web of people actually, it was about interconnectivity, and science in the sense of study of it. And nowadays, I say it’s, we’re studying, you know, our lives online, basically finding ways to do that.

Noshir Contractor: Welcome to this episode of Untangling the Web, a podcast of the Web Science Trust. I am Noshir Contractor and I will be your host today. On this podcast, we bring in thought leaders to explore how the web is shaping society, and how society in turn is shaping the web. 

Our guest today, Dame Wendy Hall, was involved in naming the very field we’re talking about. In a different world, we could’ve sat down to chat about philosophical engineering movement, or psycho history. But Wendy and other co-founders decided to call it web science.  

Wendy was a Founding Director of the Web Science Research Initiative and is the Managing Director of the Web Science Trust. She became a Dame Commander of the British Empire in 2009, and was elected a Fellow of the Royal Society in the same year. She was elected President of the Association for Computing Machinery in July 2008, and was the first person from outside North America to hold that position. And now, Wendy is the Regius Professor of Computer Science at the University of Southampton, and the Executive Director of the Web Science Institute there. In 2020, Wendy was appointed as Chair of the Ada Lovelace Institute by the Nuffield Foundation. Welcome, Wendy.

Wendy Hall: Hi. It’s lovely to be here.

Noshir Contractor: Thank you so much for joining us, Wendy. It’s a special privilege to be able to talk with you today about web science, because in many contexts, I would consider you as the matriarch of web science. And I would like to begin by asking you to take us through what motivated you and your colleagues to come up with the idea of creating this entity called web science.

Wendy Hall: Well, thank you very much for asking me. We, being myself, Tim (Berners-Lee), Nigel (Shadbolt), and Danny (Weitzner), started meeting in Tim’s office at MIT, to talk about why the Semantic Web was not being taken up more than it had been: Why weren’t people interested in linking data? I’ve known Tim since before he launched the web, and had been around the evolution of the web. And he talked about Semantic Web in his very first keynote at the first Web Conference in 1994. But then everyone was focused on getting the web up and running. And that was thought of as a web of linked documents. And we didn’t have social networks yet. The Semantic Web was always part of Tim’s big vision. That machines could help you link data, and when you could link data, then you could infer knowledge from the documents that you were linking or from whatever you were linking, if you could describe it with data.

And people just didn’t get it. The web consortium, W3C, had developed the Resource Description Framework. He published his paper with Jim Hendler and others, all about what the Semantic Web would mean. And he just couldn’t get people to think about linking data. And so when we started talking about this, and this was 2004, 2005, so 15 years after the web was launched, it was clear that we had to look back in order to look forward.

So we started to look at how the web had evolved, what had been the tipping points for the web to take off? Why did people start adopting the standards that Tim eventually made completely universal? And we started drawing pictures, and we realized that it was actually to do with people and not just the technology. It was what people did with it, and how companies used it to create new businesses.

People started having computers at home, and then the smartphone appeared. And that was all happening as we were talking. So you can see it was taking off, but it was so interesting to think about what had happened. It was clearly a sociotechnical story. We had to study the web as an ecosystem. And it had to be an interdisciplinary study; it had to bring in people from social science and law and economics and geography and physics and maths and history and education and politics and business studies and anything.

We called it the Web Science Research Initiative, between Southampton and MIT because that’s where we were based. Jim came on later, and then you came on later, while we were still the Web Science Research Initiative. 

But we didn’t really know what to call it. Tim wanted to call it philosophical engineering. He studied physics at Oxford, and that was when it was called natural philosophy, so he wanted to call it philosophical engineering. We all wanted to call it psychohistory, from the Foundation series by (Isaac) Asimov. Because you can’t predict what a person is going to do. But can you by looking back at history of development, can you forecast, not predict, what the mass of people will do? What society will do? And that was the really founding idea, but we felt people wouldn’t understand psychohistory. I think when we make the film or write the book, they will, but we called it Web Science, for good or bad. And I always say there’s two things I don’t like about web science. One is web and the other is science. But the idea there was that it wasn’t just about the technologies, HTML, HTTP, it was about the web of people actually, it was about interconnectivity, and science in the sense of study of it. And nowadays, I say it’s, it’s we’re studying, you know, our lives online, basically finding ways to do that.

For web science, the other thing that was so important was the timing. 2005 was when we did our real thinking about this. And we thought about the name. And we actually launched it on the world in 2006. That’s when we did the press release from MIT, we had the piece in Science. And the amazing thing was that Facebook didn’t start till 2004, so. And Twitter hadn’t started then. So we were doing all this thinking before there were the social media networks, they didn’t exist, but we could see that they were coming and that the issues were the big issues for the future, were going to be issues like privacy and security and trust. I remember writing them, they were like the three term mantra we had, because we could see that they were going to be the big issues for the future, as this opened up to everybody. And the trouble is, everybody includes the good and the bad of us. 

When you think about Vint Cerf and Bob Kahn, when they founded the internet, it was a league of gentlemen who all trusted each other, they were all friends. If somebody did something wrong, you would just tell them to stop doing it. But once the web opened up to the planet effectively, then you can’t stop people. It’s very hard to actually stop people doing bad things on it. We think now about what would we do if we started again now, what would we do differently? Because that was what made it work, was this openness and its accessibility and the fact that anybody could set up a web page and a server. That’s what Tim gave to the world. But that meant anybody, including people that want to steal and harm and do all the nasty things that exist in the real world, do them online at scale. And that’s what we’re living with today.

Noshir Contractor: You’ve touched upon the issue that the web can be used for good and bad. And I want to ask you, to what extent do you see the mission of web science to be focused on the cautionary tale of things that could go wrong, as compared to the opportunities that it creates for novel ways of organizing, for example. How do you reconcile these two aspects of the vision of web science?

Wendy Hall: Well, when we started, certainly, in my mind, it was definitely the former idea. It was, how could we forecast what will happen if we do this with the web, if we create this, this ability, if we develop this standard, if we allow people to put videos on the web, right?

When the web first started, you couldn’t get a picture or a video on it. It was a dream. And as those standards emerge, so you could, you then of course, have to think about what will people do with that? We — now we know. Tim will often say he would have put more security protocols in the standards if he’d realized, you know, what people were going to do with his invention. So it was the idea that this is never going to be a predictable science, but forecast how people will behave and what bad things could happen. And I think of it like scenario planning, and you sort of like, well, how can we make it shift? How can we make it? What can we do to make sure it goes more in the good direction than in the harmful direction? How can we mitigate against harm? And the problem with that, of course, is you’ve got to then observe what’s going on, you’ve got to have a way of observing, and analyzing what has happened in order to look forward. And then if you are observing what people are doing, you potentially change what they’re doing, you know. So it’s quite a difficult science to evolve in that sense.

Noshir Contractor: You spoke about the invention of the Internet, and how it was different in some ways from when the web was invented. In your own work, you’ve spent a lot of time thinking about today’s fragmentation of the internet and of the web more generally. I’d like you to share a little bit about the concerns and issues that you see in terms of the fragmentation.

Wendy Hall: What has happened over the last 30 or so years is that the internet has evolved in different ways in different regions of the world. And the geopolitical nature of that is fragmenting the internet, and people often talk about the internet becoming bimodal between the US and China, the bi-ification of the internet. But actually, we think it’s more nuanced than that.

My colleague Kieron O’Hara and I, who have developed this idea, we’ve just written a book about it. Think of it as four internets. So the first internet is the open free universal one that we think of as coming from Silicon Valley, all the big companies are there, and then you’ve got their mirror images in China. But actually, the different regions of the world act culturally differently.

So in the US, the internet is very market driven. The big companies there, they lobby Washington for the regulations and tax reliefs, that will help them grow their companies and bring value to their shareholders, which is fair enough. 

Europe has taken a very different attitude and put data protection first. We don’t have any of the big social media platforms, they’re all based in the States. So the civil libertarian views from Europe have moved in a data protection way. So it’s culminated in GDPR, the General Data Protection Regulation. And if you want to be on the internet in Europe, you have to abide by those rules. If other countries want to sell their digital services in Europe, they have to abide by GDPR. So that’s the sort of regulation before innovation sort of idea.

And then of course, moving to the east, you’ve got China, 1.4 billion people in the world. From the very get-go, China realized the power of the communication medium that the internet was, is. And so the basic rule in China is the government can look at anything. And so if you’re a company, you want to have a digital business in China, you have to abide by Chinese laws. And this is really beginning to fragment the internet.

And I’ll say two other things. We talk in the book about Russia being the spoiler. It’s not trying to create a new type of internet, it’s just trying to use the internet to interfere with other countries in various different ways. There are other regions that do that as well.

And then the other big point is that last year, 2019, we reached the 50/50 point on the internet, which means that 50% of the planet has access to the internet. That’s awesome. In 30 years, that’s happened. But it also means there’s 50% still to come. And that 50% is largely in rural China, rural India, and rural Africa. The way India goes in terms of internet governance is really important for the future. And Africa will probably go the way of China because of the Chinese investment in Africa.

And if you look at the population numbers, we can end up with a very small part of the internet that is run by democratic governments unless India sticks with the open and free type of access that we have. And I can say a lot more, but that’s the sort of geopolitical analysis of what’s going on. It’s really important; I think our message of the book is keep the technical standards open. Because if that goes and people start to create alternative views of the internet, which means the web can’t run globally, freely, then, you know, all bets are off. And the key thing is nobody owns any of this. Right? The web or the internet are not owned, there’s no one company, no one government. It is us, they are ours. So we have, I think, as much duty to look after them as we do to look after the physical planet.

Noshir Contractor: Yes, I think what you point out is that having these common technical standards does provide a prerequisite for creating a public good that would be global. And yet, that may not be enough in this context, that in some cases, you can still see fragmentation based on geopolitical forces.

Wendy Hall: And it all comes down to how countries govern data.

Noshir Contractor: Given that 50% of the planet is still not on the internet, a lot of places that you referenced were what we might call the global south. Do you see that as creating a fifth internet? Or do you forecast, to use your term, that it is going to fold into one of the existing four internets?

Wendy Hall: Well, our forecast is it will, it will fall into one of the existing four. But we do talk about a fifth internet in the book. But I’m not going to tell you; you have to buy the book.

Noshir Contractor: That’s a wonderful teaser. Wendy, in addition to playing a prominent role in research, you’ve also helped shape science and engineering policy and education. And you co-chaired the UK government’s AI review, which was published in 2017. And the UK Government announced you as the first Skills Champion for AI in the UK.

One topic that is particularly near and dear to your heart is the role of women in computing, and more generally, in science and engineering. Can you talk a little bit about where you think that is headed?

Wendy Hall: (Sighs) It is a tale of two cities in a way. When I was young, you know, I was born in the 50s. And the world was very different. And no one in my family had been to university before. And the reason I would be expected to go to university would be to find a better husband, and get married and have kids. That was the expectation. My parents wanted more than that for me. But the expectation was, my future was marriage and kids. And we didn’t have the equality laws in the UK then. I was a mathematician originally, and I went for a job as a lecturer in maths at a university in the UK that I won’t name on here (it wasn’t Southampton). It was my very first job interview, and at the end of the interview, when they decided who would get the job, the head of department said to me, “I’m afraid, Wendy, you didn’t get the job because you’re a woman.” And he told me that on the day. I was young, they didn’t think I’d be able to control classes of engineers and computer scientists. And anyway, the very next week, I got a job doing the same thing. But that was my first sort of realization that things were different for women. Now, they couldn’t do that today, but they might still think it.

Then when I went to Southampton, we realized in 1987 that we had three years of computer science undergraduates with no women on them at all. Women had just turned away, and there had been women before. This was the time of the personal computers, the Spectrums and the BBC Bs and the Commodore PETs, and suddenly computers had become, overnight, toys for the boys. That really switched a whole generation of women off almost overnight. In the West, we have never really recovered from that situation. Countries like your home country, India, that came later to the world of computers didn’t have quite that issue. And so if I go to a computer science class in India, more than 50% of the students will be women, right? So it isn’t genetic, it is deeply cultural.

I’ve tried all my life to turn that round and try and get girls interested in computing and really failing quite miserably, because the stats are just so bad. But the world around us has changed dramatically, of course, and, you know, women now aspire to much, much more, I think they’re still under pressure, they still have this problem of you can do everything. And so you try and be a mother and career woman. But, you know, it is possible. 

The culture of computing in some parts of the industry is just toxic for women. In particular, Silicon Valley is well known for being really, really toxic for women. It’s so sad for me that we still have this problem. I meet more and more women. So in the world of AI, in some ways, this gets even worse, because you can’t really take a master’s degree in machine learning unless you have a computer science or maths undergraduate degree. So you’ve already got a much, much shorter pipeline for those. So you’re going to increase the stereotyping.

And we were so worried about this, when we wrote the AI review about how we would get more women coming into AI. But I do see lots of women are involved in AI in the areas of ethics, thinking about how AI is going to be used in society, that is attracting a much more diverse pool of talent. And so I think we need to capture that, that’s what we’ve been trying to do with the skills program in the UK, there will be lots of new jobs that are not to do with programming, to do with auditing AI, looking at bias in AI, design of AI to make sure it’s for the good, not going to be harmful to people or get the wrong results. You know, be biased. 

And I always make sure diversity is firmly in that ethical framework. My argument is if your workforce is not diverse, and I mean diversity in its broadest sense here, gender, race, culture, age, disability, and everything you can think of in that broadest range, then there’s more chance that your AI systems are going to be unethical or biased in some way. So my mantra is, if it’s not diverse, then it’s not ethical.

I still want to get more women feeding into the pipeline, I want to get more women interested in school, so they do the qualifications to study computer science, and want to, but at least we see more women in the workforce.

Noshir Contractor: I think that’s a very fair point. I think there was a temptation in the past to see a need to balance diversity against excellence. And what you’re pointing out is that, in fact, the two are not opposed to each other, that they actually are symbiotically related.

You have been such a wonderful role model in all of these respects, and so we salute you for that, you’ve given us a really good story about how the web got started, and web science in particular got started. Based on your perspective, what do you consider as some of the most significant issues that need to be addressed by web science moving forward?

Wendy Hall: Well, I will have to answer that in terms of data. As you know, I’ve been passionate about the idea of building observatories for web science, using the term the way the physicists observe the stars and the planets and use all that data to, you know, work out where we came from and where we’re going.

And part of that also is how you visualize, how you analyze the data that’s in there. But for me, the really difficult thing is getting the evidence we need. 

And it was, to me, it was all about how do we, how can we share the data that we collect, because it takes so much effort to collect that data. And then when the person who’s collected it retires, or leaves or moves to another job, it just all evaporates. 

And we need some way of being able to share data with other people in ways that are legal and ethical, and that, you know, people are not abusing: you get cited if your data is used, or you get some money for it if people make money out of it. This is important for companies, but it’s important for research scientists too. And we’re still struggling to find a way to do that. The hardest thing is actually curating the data and making it available. And then you’ve got the issue of, well, what if it’s, you know, data about people and data that’s confidential to companies. So I think that is our biggest challenge: how, as a community, we can crack that one.

Noshir Contractor: That’s a really important issue. You also spent some time working with the Library of Congress on some of these issues, haven’t you?

Wendy Hall: Can we tell the Twitter story? I mean, one of the reasons I went there was we all knew that the Library of Congress was getting the Twitter feed and all the data. There was a server down in the basement of the Library of Congress, which was getting a Twitter feed every day. When that deal was done, Twitter wasn’t the company that it is today. And so that data represents how the company is doing, and so it’s very confidential. And of course, even though Twitter is open, you know, you tweet to the world, Twitter allows people to delete. So, you know, there’s very confidential information in there. So they were collecting all this stuff, but had nobody using it. And so they turned it off as a project. And I understand why. But my worry is, who is the custodian of all this data? In 100, 200 years’ time, people will want to know: what were we saying on Twitter? What was on Facebook? Well, this is the record of our society. Right? And we are collecting snapshots of it in the libraries. They have, you know, the British Library, the Library of Congress, they have web archiving projects, but it’s snapshots. And the Internet Archive takes what it can, but they’re snapshots. And I don’t know if the companies are storing it for the future historians. I think that’s another challenge for us as a community: how do we retain our memories, digital memories?

Noshir Contractor: In fact in some cases, I believe that there’s regulation that does not allow the company to hold on to data beyond a certain number of years. I wanted to ask you, as we consider this moment of social reckoning that we are experiencing alongside the pandemic, what is the one or maybe two significant things that would have been different, for better or for worse, if we were going through this period without the web?

Wendy Hall: Well, can you imagine, I say this to people, if the pandemic had happened anytime before 2000, 2010? We would not have been able to deal with it as we have. Not only has it kept our communities going, it’s enabled us to see friends and family and talk to them, to actually have lockdowns in a way that saves lives, and to keep our spirits up and to communicate. Without it, life would have been really difficult.

And also the international work to share how to deal with the virus, right? Work about vaccines and antibodies, and what treatments to use and when to lock down and when to ease up. So we’ve rediscovered our love of the web and the internet in COVID: the TikTok videos and the Zoom cocktail evenings. It will change our lives, it will change our world. It has taught us we don’t have to travel halfway around the world to go to a conference to give a single paper. I want to get back on an airplane. I’m sure you do too. But, you know, we’re beginning to understand that there is a world other than jetting around all the time. And before the pandemic, in the West certainly, we were worried about the harmful things that were happening on the web and the internet. We were worried about how to deal with that. We still are, but, as I say, we have learned to love it again. We’ve remembered why it was invented in the first place. And I think that’s hugely important.

Noshir Contractor: Well, thank you very much again, Wendy, for taking time to take us through a journey of Web Science from where it started to where it’s headed. It was really a pleasure to speak with you. And again, thank you for all your efforts and leadership in terms of developing this field, but also in terms of the work you’ve done in related areas of policy and education. Thank you again.

Wendy Hall: Thank you Nosh, thank you for doing this series.

Noshir Contractor: Untangling the Web is a production of the Web Science Trust. This episode was edited by Molly Lubbers. I am Noshir Contractor. You can find out more about our conversation today in the show notes. Thanks for listening.

 

Episode 1 Transcript

Noshir Contractor: Welcome to this episode of Untangling the Web, a podcast of the Web Science Trust. I’m Noshir Contractor and I will be your host today. On this podcast we bring important leaders to explore how the web is shaping society and how society, in turn, is shaping the web.

Today I’m speaking with Professor James Hendler. Jim is the director of the Institute for Data Exploration and Applications, IDEA for short. He is also the Tetherless World Professor of Computer, Web, and Cognitive Sciences at Rensselaer Polytechnic Institute (RPI) in the United States. He’s acting director of the RPI-IBM Artificial Intelligence Research Collaboration and serves as a member of the board of the UK’s charitable Web Science Trust.

Hendler is a man of many accomplishments. He’s a fellow of the American Association for Artificial Intelligence, the British Computer Society, the Institute of Electrical and Electronics Engineers, the American Association for the Advancement of Science, the Association for Computing Machinery, and the National Academy of Public Administration.

Jim might well be the only individual who is a fellow of all of these professional associations. On a lighter note, in 2010, Jim Hendler was named one of the 20 most innovative professors in America by Playboy magazine.

Besides being involved in the start of the web, Jim was one of the pioneers of the interdisciplinary field we call web science. Today, I talked with him about the origins of web science, how it has evolved over the years, and its relevance during the COVID-19 era.

Jim, welcome to this podcast.

Jim Hendler: ​Thanks, Noshir.

Noshir Contractor:​ I wanted to start by giving you an opportunity to share with our listeners what is meant by web science. What does that term mean to you and how did it get started?

Jim Hendler: Sure, great question. So, you know, the web was invented, and there are different dates you could use. 89 is when Tim Berners-Lee actually wrote the proposal for what became known as the web. By 90 and 91 there was code that was being shared. Really around 95, 96 you started to see the takeoff of this, and people becoming more and more aware that the web was there and that things were happening.

But it really started to have a much bigger impact by, really, the late 90s, when you started to have search engines, you started to have monetization of things on the web. You started to have the social networks growing better now.

So, again, a lot of that was happening all around the same time, from the 90s to, you know, the early 2000s, and at some point, those of us who had been involved in the web and the web architecture were starting to feel like understanding this thing was breaking up into many different pieces. You would go to one kind of conference and hear a lot of discussion about the mathematical underpinnings of networks and network science, but some of that was really not about the web. The web was just one example of something much bigger.

And then you would go to another meeting and it might be, you know, the social impact of the web or legal aspects, because we were starting to see some of the early days of people beginning to worry about privacy and security on the web, things that are now much bigger issues. And then there was sort of the engineering of the web. How do we build it? How do we do a better job of knowing, if we deploy a certain technology, you know, how will it get us involved, what impacts it might have?

So, some of us started to believe that the web itself had become something we needed to understand. The web sits on top of a larger system known as the Internet, which has a lot of mathematical properties of its networking and, you know, a lot of the routing and things like that that gets talked about happened.

But the web thing was really sitting on top of that, and was its own entity that needed its own understanding.

And there were principles of how you design things, there were standards groups, but there really wasn’t a lot of research going into the interaction between all these different pieces.

So some of us started to feel like maybe there was kind of a systems science to really understand the web in all its forms.

So around 2005, a group of us met. Wendy Hall was one of the organizers, I was one of the organizers, Nigel Shadbolt was there, Danny Weitzner I believe was there, and some other people, a lot of other computer scientists and a few social scientists who were really trying.

It was an invite-only workshop, about 30 people, held in London, sponsored by the British Computer Society. And the goal was to say, you know, what did we really need to understand to understand the web? So a report was generated coming out of that workshop.

The report really got boiled down to a couple of pages that got accepted as a perspectives piece by Science magazine. So what most people call the start of web science was an article called “Creating a science of the world wide web.”

And it didn’t use the term web science in that article. It’s just that as people started to refer to the thing we had called for, which was something that would put the math, the engineering and the social on the same page, and get those different communities talking and working together. That’s when the term web science started to be used.

Noshir Contractor: ​And that article didn’t appear until August of 2006.

Jim Hendler:​ Yes.

Noshir Contractor:​ And so really, would you consider that as one of the dates that is most associated with, not the start of the web, but the start of web science?

Jim Hendler: Yeah, so 2006, with that article. You know, people tend to try to find some definitive thing that’s the start of something like this. So obviously, a lot of us were talking about stuff that now might be called web science, but the term itself really grew out of that 2006 article, and the first Web Science Conference, held in Athens, was in 2007, started in part by the community that had come together around then.

Noshir Contractor: You mentioned that as this got started, you were focused on a group of people who were invited to this invite-only event, and then subsequently, unlike some other scientific interdisciplines, web science decided to form a Web Science Trust. Can you tell us a little bit about what the thinking was behind the creation of the Web Science Trust, as opposed to, say, a learned society or some other division within another context?

Jim Hendler: Yeah, so it’s a good question, and, you know, it’s a combination of design and accident. But what actually happened was the two leading institutions that had really been trying to create web science, sort of institutionalized at that point, were MIT and Southampton University.

They created a joint statement to create something called the Web Science Research Institute, WSRI, and very quickly a few other organizations joined. The University of Maryland, where I was at that time, became the third school, and there were basically five of us who were kind of leading things at that time.

As it started to grow, we started to create a network of laboratories, including your lab and others, and realized we needed a more formal way for people to interact, because something that is inherently interdisciplinary has a tendency to coalesce around one of the disciplines. You know, that’s historically what’s happened.

And it becomes less and less interdisciplinary as it becomes its own discipline, and we really felt that was the wrong thing, that this needed to be, you know, I use the analogy sometimes of climate science.

Right? When you’re studying the climate, you need geologists and you need, you know, people who study the atmosphere and you need people who study the ocean and you need… But not everybody who studies the ocean is looking at climate, not everybody who studies the atmosphere is looking at climate, and so it was the same thing with the web. We had people who were studying networks, but only some of them really cared about the web. We had people who were looking at social impacts of growing communication networks, but some of them in particular were looking at the World Wide Web.

Now, in the past, you know, 15 years, as we’ve grown, the definitions have slid a little bit about what exactly is the web and where the boundaries are and things like that. But the Trust has really tried to be an entity that would help promote web science, help keep this network of labs going, help make sure there was a conference and eventually a journal. So some of the things a learned society does, but without really trying to create a learned society and the kind of disciplinarity that tends to come with it.

So, again, web science tries to be both an entity that brings people together and an entity that doesn’t pull people out of the other things they do. So, you know, someone who studies networks and network science can also be a web scientist without there being any tension between them.

Noshir Contractor:​ It’s fascinating because it takes on the role of both helping to build a community, but also to curate that community intellectually and one of the challenges that I imagine you might have faced is the difference, if any, between those who think that what they are doing is internet science versus web science. Do you have any thoughts on that distinction?

Jim Hendler: You know, there’s been a lot of different terms that have a lot of overlap. So information science in the US was taking off around the same time, and a lot of people were arguing that web science belonged in information science. Other people were saying no, because information science, at least in some schools, doesn’t really include the computational or mathematical side of things. So, you know, same thing with internet science. There was a feeling that web might be too limited of a term. And frankly, I would love to see so-called internet science and so-called web science come further together.

But by and large, the desire to keep the social science piece wedded to the math and the engineering side has tended to differentiate the web science approach from others. That’s not to say no one else wants to do it, but I think the dedication to that is a sort of core value, that we’re trying to bring people together across these different ways of looking at the web. And nowadays, you know, the mobile web, the big companies, the things that look at information and challenges that information, you know, again, they happen at a lot of different places, and web science really would like to be an integral place that brings these people together.

Noshir Contractor: And I imagine that some folks also will be questioning whether study that doesn’t happen specifically on the web, as things sort of migrate into apps and different kinds of ways in which we are navigating this new world, should or should not be included as part of web science?

Jim Hendler: You know, there’s always a tension in those kinds of directions for any interdisciplinary science, but I think that the goal has been to be open. The definition at a technical level of what the web is is actually very different than what a lot of people think of. So a lot of people, when they’re opening something on their browser, I’m sorry, opening something on their phone, not on their browser, and looking at the apps world, aren’t realizing that it’s sitting on top of the web architecturally. So, you know, some of us like to think of it as, you know, I hate the word ecosystem, but I don’t have a better word.

That’s evolving, and so you have parts of the web that move one way and parts that move the other. And again, part of what web science wants to do is not reject any of those parts and say, really, if we’re going to understand this thing, we have to understand it as a system, or a system of systems. We have to understand the interacting pieces, whether they are considered by a particular practitioner to be “web” or not.

You know, there’s a lot of overlap with work that’s done at the world wide web conference. But for example, some of the work there, really, is not particularly seen as web science per se because it’s technologies to enhance web products more than technologies that are really understanding the interactions happening on this huge network of information.

Noshir Contractor: Great. So in 2006, when you co-wrote that article and laid out some priorities, to what extent were those priorities focused more on seeing the web as an opportunity, or web science as something that explores new opportunities, versus focusing attention on potential concerns as the web became more and more prevalent? And while you think about that, are there some areas where you feel that web science has made the most progress, and others where you would like to see a lot more progress being made?

Jim Hendler: Sure, so, you know, in that document, and then not long after that, we had something we produced that we called a manifesto, which may or may not be the right name. It became a book on web science, or a publication, that, you know, later you joined for the second edition.

But we were really looking at trying to get thematically what this thing was, what was happening, how it worked, and we always wanted to express both the positive and the negative.

Again, as a science, you have to be looking at what’s happening. But again, part of what makes web science somewhat unique is the goal to bring together the people who are building it, and can put in mechanisms to try to solve some of these problems, with the people who may be studying the problems or the opportunities, trying to understand why some things on the web take off and “go viral”.

And here I don’t necessarily mean just like a video or something, but the whole use of the web for video. How does that change the world when, you know, people can video chat rather than talking, in person or not? You know, again, how does that change from the phone network to the web, when you have a web of information?

Not, you know, how does search work, but what does search do? What’s the impact of being able to find this kind of information?

As crowdsourcing, Wikipedia, things like that have grown, that’s become one of the things we’ve been trying to study and understand, and that includes misinformation as well as information, right? So if you look at the papers that have been presented at web science, in fact, some of the early best papers, one of them was on how trolling on the web, right, influenced an election. This was actually a Congressperson from Boston, from Massachusetts, and it was interesting that that was one of the earliest studies of trolling and misinformation in an election, long before it became part of the national election in the US and Brexit and things like that. So again, web scientists were really looking at these impacts in a very deep way.

Noshir Contractor:​ Yes because the idea of, for example, seeing the extent to which search became so important in the age of what was happening at the time.

I remember a remark made by someone who said that before search became a thing, the World Wide Web was like a library with all the books strewn all over the floor and no easy way to look for the right book and get to it. And I think search was an example of something that really helped address an early challenge that was faced by the growth of information on the World Wide Web.

Jim Hendler: Right, to give you the counterexample, because, of course, if it’s going to be a science, no matter what someone says, someone’s gonna disagree. But also, as search engines took over, it became harder and harder to find the opposing opinion. Right? It’s hard to go to some of the major search engines now and say, I’m searching for this, but show me something very different. Show me… you know, somebody who disagrees with this approach.

So you say this is what most people believe. And I actually think, you know, and you’ve studied some of this, that’s part of what creates information bubbles, because then the people who don’t believe some piece of that go off and create their own search structures and their own ways of doing things. So again, a lot of different ways of looking at how this plays out. And, you know, again, we used to say, you know, the metaphor of surfing the web had a connotation of a little bit of danger and serendipity. You might not end up where you were looking to get to. Search makes some of that different.

And so people have been looking at how do we reimpose creativity and new kinds of search, how do we look at argumentation. So, you know, some of the exciting stuff happening in web science nowadays looks at some of the impact of these technologies becoming centralized and says, can we re-decentralize? Can we find a way to put it back, you know, away from everything being owned by a few big companies and much more back into the hands of the users? That remains the tension we look at today, in things like privacy, privacy-preserving technologies, and things like that, which I’m sure will be topics for later podcasts.

Noshir Contractor: Absolutely. I have a couple of closing questions, and I’m going to wait and ask you a closing question on COVID, but fast forwarding to 2020 or 2019, what are the areas where you feel that web science has made the most progress, and what are areas where you don’t see as much progress or you would like to see more progress?

Jim Hendler: So I think where web science has made a lot of progress is helping to focus attention on, I’d say, really two different things. Several things have been really impacted by web science. One is transparency and the whole open data movement.

So that was coincident with the growth of web science, and that was in part because a lot of the leading people in web science, including Tim Berners-Lee himself, were very involved in helping to try to get governments to open their data, to make it more available, to develop some of those technologies.

I think also predicting some of the dangers. So early web science papers were already saying, you know, let’s look at privacy. Let’s look at security. As companies on the web grow, they’re going to be able to see our information, to share it, to track it, you know, as cookies came along.

So, you know, I find there are things I say at a Web Science conference that would really bother some people. You know, if I tell people in a normal setting that when you go on the web and you look at the price of something on a particular website, it may be different from what someone else sees when looking at it, because they’re using information about you to try to adjust the price, people are surprised. Web scientists aren’t surprised.

We’re exploring it. We want to understand both what the algorithms are that are being used, but also, for people who believe that that is problematic, how we might control it, things like that. So I think, you know, it’s more that we’re embracing exploring the problems. But of course, the web is a very fast-moving thing.

I think the whole mobile web and app space that you talked about, you know, many of us still view it as a web development platform, while other, newer people coming into web science are beginning to really look at those apps themselves. But then you start getting into these boundary issues, right? If somebody has studied a particular Twitter phenomenon, is it or isn’t it? And, you know, what we try to do is be very embracing, right? If the work is important and talks to it, that’s good.

On the other hand, if it’s just a pure mathematical analysis of something happening in some different network, then the paper that shows why that applies to the web is going to be much more interesting than the paper that just says all networks have some feature.

So again, the boundaries are very hard to see, but I think that, again, the challenges of the web were something we embraced very early and are still looking at.

I think the opportunities become more apparent to people just as more people, you know, it’s just part of our lives.

I think the social impacts are something. So we keep trying harder and harder to bring in more social scientists, particularly people who really can talk to where the qualitative meets the quantitative, to really try to, again, look at that triumvirate of the underlying math of what’s going on, the engineering and building of this thing, because the web’s not a natural phenomenon, and the social impacts and policy impacts of that.

Noshir Contractor: One last question that we plan to ask everyone who is appearing on the podcast during this pandemic time: what is one way that you personally believe the web is, was, or could have been a real help during covid, and/or one way you think the web has hurt society during the covid crisis?

Jim Hendler: Great question. So, you know, you and I are sitting here on opposite ends of the Zoom channel, and we could be on any of five or six of its competitors talking.

More people are working from home. Imagine the lockdowns that we had around covid, and entire countries, entire cities, without the communication infrastructure.

And the communication infrastructure that provides the bits to move between things is the internet. And so, you know, it’s sort of hard to pull that apart. But the thing that lets people really interact at the information level, that includes finding each other for these videos, that includes, you know, when I clicked on a link to open this Zoom chat, that clicking on the link and how that happens, and the protocols that made that happen, that’s all web. So the web itself has really been absolutely instrumental in allowing communication to happen.

You know, big international conferences that were canceled at the last minute were held online this year, again virtually, and that virtual conference is made possible by the very technologies we’ve studied. Where the negative comes in is that the negatives of the web get amplified, and, you know, cultural differences, things like that have an impact. But certainly in the States, we’re seeing the astounding growth of misinformation, the weaponization for politics of misinformation about covid. A significant amount of them are, you know, bots and trolls.

In fact, the largest networks of, call it pro-covid and anti-covid, whatever that means, you cover your face or not, those sorts of things, both have the same origin, from the same trolling point, which appears to be mostly trying to push divisiveness rather than push a particular point of view. So again, understanding how that works.

Understanding the math of that and being able to show people that will allow both people to understand what’s happening, I hope. But also, you know, engineers to understand what we might do to improve that situation. And what we can’t do.

Noshir Contractor: Thank you again, Jim, for taking time to talk with us about the history of web science and how it all got started, because you have a very privileged position from which to share those insights with us, since you were there at the time it actually happened and were also very involved in making it happen.

Jim Hendler:​ Thank you.

Noshir Contractor:​ Untangling the Web is a production of the Web Science Trust. Thanks to Carmen Chan for editing and technical assistance. I am Noshir Contractor. Thanks for listening.