Episode 19 Transcript

Sinan Aral: While there has been misinformation and disinformation throughout human history, we’ve never had a technology that has essentially rewired the central nervous system of humanity within one decade, that accelerates the spread of information, as much as social media does, in an algorithmically controlled fashion. 

Noshir Contractor: Welcome to this episode of Untangling The Web, a podcast of the Web Science Trust. I am Noshir Contractor and I will be your host today. On this podcast we bring thought leaders to explore how the web is shaping society and how society in turn is shaping the web.

My guest today is Sinan Aral. You just heard him talking about the impact of social media in spreading misinformation and disinformation. Sinan is a global authority on business analytics, award-winning risk researcher, entrepreneur and venture capitalist. He is the David Austin Professor of Management, Marketing, IT and Data Science at MIT, where he also directs MIT’s Initiative on the Digital Economy. He’s also a founding partner at Manifest Capital. Sinan has won numerous awards, including the Microsoft Faculty Fellowship, the National Science Foundation Career Award, and the Fulbright Fellowship. In 2018, his article on the spread of false news online was published in Science. It went on to become the second most influential scientific publication of the year in any discipline. And in 2020, Sinan published his first book, The Hype Machine, which received a best book on AI award from Wired Magazine. Welcome, Sinan.

Sinan Aral: How you doing Nosh? Great to see you.

Noshir Contractor: Thank you so much for joining us on this podcast. You’ve had quite a year. I want to first start by asking you what got you interested in looking at information on the web?

Sinan Aral: Well, I was a PhD student at MIT and I knew I wanted to understand technology, I didn’t know exactly how I wanted to get into it. As I was studying at MIT, I was taking statistics classes that assumed that all of our observations in the data were independent. And I was taking sociology classes with pictures of network diagrams in the research articles that uncovered the tremendous interdependence of our world. And I thought, a lot of the models of society could really be explained by the ebb and flow of information between us in our complex interdependence. The major thing that’s different today in that web of interconnections and the flow of information, even though the human species has been interdependent for a very long time, is the digital flow of information. And so I thought that’s where a lot of the answers to unexplained phenomena in society were going to come from. And I’ve been researching it and studying it ever since.

Noshir Contractor: You talked about this web of connection in the digital economy, which is exactly the focus of why your work is so central to web science. Can you give an example of what you mean by an unexplained phenomenon in society that can be well studied through these means?

Sinan Aral: So for instance, businesses try to understand demand patterns, for example, you know, it’s thought about in terms of the products that are sold, it’s thought about in terms of consumer preferences. And for many, many years, the thinking was that there’s some distribution of product characteristics: different products exhibit different characteristics. There’s some distribution of consumer tastes: different people have different tastes. And you know, there’s a match between these products’ characteristics and consumers’ tastes, and that can explain changing patterns of demand for different products over time. But one thing that has become clear over time is that people talk to each other. And they share their own opinions about products, and now we have digitized that communication in the form of social media likes, comments, and so on. And in the form of reviews and ratings where we are constantly describing to each other in digitized, recorded, mass-scaled format what we think about, and not just products: political candidates, you know, bills before Congress, you name it. And it turns out that the patterns of choice that human beings make, whether it’s voting, or whether to get a vaccination, or whether to buy a certain product, are heavily influenced by the patterns of communication and sharing of information about those political candidates or public health behaviors like vaccination or products. And so a significant fraction of the variance can be explained by the ebb and flow of information online, whether it’s through person to person communication on WhatsApp, or through microblogging services, like Twitter, or whether it is reviews and ratings from the crowd. The scaling of public opinion is changing the nature of the way we decide and act.

Noshir Contractor: I want to take you back to a decade ago, where you published one of the first articles that looked at doing natural experiments in the field, so to speak. I’m thinking back to your article in Management Science, titled Creating Social Contagion Through Viral Product Design. That article was one of the first attempts, in my opinion, that tried to look at how word of mouth campaigns could be digitally transmitted. And you did this experiment on Facebook that I would love for you to describe. And in particular, tell us about what got you to think about the question, but also the strategies that you used to answer those questions, which were quite novel at the time.

Sinan Aral: Our experiment was an experiment in the wild among 1.5 million Facebook users. So it was perhaps the first very large scale online experiment about the causal effects of information flow in networks on behavior. And the reason I became interested in it is to basically scratch an itch, which is how a lot of research starts. And that is to understand really the causal effects of networks on outcomes and behaviors in society. What we did was we developed an app for Facebook, along with a movie studio. And it was a movie app: you could friend celebrities, buy movie tickets, rate movies. We wanted to understand this concept of what we call viral design: can we design products that are more likely to be shared amongst friends? So we built several features: an “invite your friends” button, and then a passive awareness campaign feature. With the first feature, there were buttons throughout the app that said “invite your friends” and showed you a list of your Facebook friends; you could pick who you wanted to invite and invite them. With the passive awareness feature, whenever you took a key action in the app, like rating a movie, it would send a message to all of your Facebook friends that said, “Hey, Sinan just rated this movie four out of five stars, you might be interested in the app, here’s a link to download it.” We created three versions of the app: a control group with neither of these features, and two other experimental versions of the app with these features. And then as people downloaded the app, we randomly assigned them to one of these three versions of the app. And then we just simply observed the sharing and diffusion of the app through the Facebook network. And any differences in the speed, breadth or depth of the diffusion of any of these versions of the app have to be causally driven by the existence of these features, because we randomly varied just the toggling on and off of those features. And that was it. And we found very significant differences.
And then we were able to study, what is the power of a personal invitation? What’s the power of a passive awareness campaign? And as well, what are the actual differences in the rates at which apps spread with or without these features?
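
The design described above (three app versions, random assignment at download, then a comparison of how each version spreads) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the study's code: the arm names, the `run_experiment` helper, and the referral probabilities are all hypothetical, the conversation does not say which features went into which experimental version, and the probabilities are invented purely so the simulation has something to measure.

```python
import random
from collections import defaultdict

# Hypothetical arm labels; the transcript only says "a control and two experimental versions".
ARMS = ["control", "treatment_1", "treatment_2"]

# Invented probabilities that a new user brings in at least one friend.
# These are NOT results from the study; they exist only to drive the simulation.
ASSUMED_REFERRAL_RATE = {"control": 0.02, "treatment_1": 0.04, "treatment_2": 0.04}


def run_experiment(n_users, seed=0):
    """Randomly assign each simulated user to an arm and return the referral rate per arm."""
    rng = random.Random(seed)
    users = defaultdict(int)
    referrals = defaultdict(int)
    for _ in range(n_users):
        arm = rng.choice(ARMS)  # random assignment is the key step
        users[arm] += 1
        if rng.random() < ASSUMED_REFERRAL_RATE[arm]:
            referrals[arm] += 1
    # Guard against an empty arm when n_users is very small.
    return {arm: (referrals[arm] / users[arm] if users[arm] else 0.0) for arm in ARMS}


if __name__ == "__main__":
    for arm, rate in run_experiment(150_000).items():
        print(f"{arm}: {rate:.3f}")
```

Because assignment is random, the arms are alike on average in everything except the feature set, so a gap between the rates can be attributed to the features rather than to who happened to download which version.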

And we found evidence of network effects that when you use the app with your close friends, you were less likely to give the app up. If you use the app with acquaintances, you were more likely to give it up, which indicated that network effects varied depending on the closeness of the relationship. Now we’re doing massive scale experiments to measure that across all the platforms, Instagram, Snapchat, Facebook, Twitter, and so on. Because we think that’s such an important part of the economics of social media today.

Noshir Contractor: And one of the things that I think distinguishes that kind of work you do, this carefully controlled experiment where you put people in different conditions, helps address an issue that has plagued prior research in this area. I’ve heard you talk about this before — talk about that analogy you’ve used here.

Sinan Aral: When we, as network scientists or web scientists, study patterns of behavior in a population, sometimes we tend to foreground the explanations that we are investigating, and tend to ignore that there could be many other confounding effects. So when we study the diffusion of a behavior through the network, and we see that, wow, people who are connected to one another tend to exhibit this behavior in succession, one after the other rapidly, we say, “Wow, there must be some sort of network effect here, that this behavior is passing from person to person to person, because it only shows up in the network in succession between people who are linked together, rather than more randomly in the network of people.” And that seems like a logical conclusion. But it’s not always a logical conclusion. In fact, many times it’s not a logical conclusion.

So the analogy is to a crowd of people in a field watching a political rally or a concert, and you see the first umbrella go up in the bottom left hand corner of the field, and then immediately the umbrella next to it opens. And then the umbrella next to it opens. And then the umbrella next to it opens. And you see this crowd of umbrellas, opening one right after the other from the bottom left hand corner of the field to the top right hand corner of the field. And one explanation for that is that the first person opened their umbrella and then nudged the person next to them and said, “Hey, open your umbrella, you know, it’s the cool thing to do, everybody’s doing it.” That’s one explanation, the social influence explanation. But another explanation that’s much more likely is that there’s a passing shower that is moving dynamically from the bottom left hand corner of the crowd to the top right hand corner, dropping raindrops on the heads of the people. And that is what’s causing them to open their umbrellas. And so I use that analogy, because sometimes as web scientists and network scientists, we assume social influence, where it’s really probably some third factor that’s causing that pattern of behavior in the data. So we have to be careful about correlation and causation, if we intend to be rigorous about network effects.
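
The umbrella analogy can be made concrete with a toy simulation. In the sketch below, which is an illustration and not anything from the research discussed, people stand in a row and each one opens an umbrella only when the rain reaches them; nobody looks at a neighbor. The function names and the `front_speed` parameter are hypothetical.

```python
def rain_only_opening_times(n_people, front_speed=1.0):
    """Time at which each person opens an umbrella when only the rain matters.

    The shower front starts at position 0 and moves right at `front_speed`
    positions per time unit. No one is influenced by anyone else.
    """
    return [position / front_speed for position in range(n_people)]


def fraction_following_a_neighbor(opening_times):
    """Share of people (after the first) who opened after their left-hand neighbor."""
    followers = sum(
        1 for left, right in zip(opening_times, opening_times[1:]) if right > left
    )
    return followers / (len(opening_times) - 1)


times = rain_only_opening_times(50)
print(fraction_following_a_neighbor(times))  # 1.0
```

Every person appears to "follow" a neighbor even though the model contains no social influence at all, which is why a pattern of succession among connected people is not, on its own, evidence of a network effect.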

Noshir Contractor: Sinan, when you began working in this area, I got the sense that a lot of your interest was to see how products could be made viral — how they could be designed so that we could have social contagion, and do it in a way that was scientifically grounded and effective. Somewhere along the way, I got the sense that what you thought was going to be a positive set of strategies began to trouble you where information was being sent, or misinformation was being sent. How did that happen? 

Sinan Aral: I mean, I think that technology is agnostic. And you know, one of the themes of my book The Hype Machine is that if we intend to solve the social media crisis that we find ourselves in, we have to get past this debate about whether social media is good or evil. Because the answer is yes. And we need to understand how it can promote positive outcomes in society and how it can be dramatically negative for our democracies and our economies and our public health. And so I’m interested in both the good and the bad, in part to promote the good and in part to contain the bad or to work to build systems that reduce the negative effects of social media.

Noshir Contractor: Let’s talk about the book, The Hype Machine. I love the title. I’m going to first ask you, how did you decide on the title of the book, which I noticed is not only the title of the book, but also the title of chapter three in the book: The Hype Machine?

Sinan Aral: I considered a lot of things. I thought carefully about the chapter titles. And it’s super interesting to have that conversation with you. Because I’ve done many interviews over the last year. And nobody’s asked me that question, I’ve been waiting for someone to ask me. You know, I titled the book, the Hype Machine, because social media is built on an engagement model, a business model on engagement. And the way it works, obviously, is that social media companies sell attention as a precursor to persuasion. And they sell that as ad inventory. So the way that they maximize the opportunities to sell advertising, is to engage people, and to keep them engaged. And so the machine is designed to hype us up. And that’s where the title of the book comes from.

Noshir Contractor: You have a chapter that is titled “The End of Reality.” And that sounds very depressing, Sinan. Tell us about how that title came about, and what’s the thesis of that chapter.

Sinan Aral: In 2018, we published a 10 year study of the spread of fake news online. We worked directly in collaboration with Twitter and had access to the entire Twitter historical archives. And we studied the spread of all of the verified true and false news stories that ever spread on Twitter over 10 years. And what we found was that false news traveled farther, faster, deeper and more broadly than the truth in every category of information that we studied, sometimes by an order of magnitude. While there has been misinformation and disinformation throughout human history, we’ve never had a technology that has essentially rewired the central nervous system of humanity within one decade, that accelerates the spread of information, as much as social media does, in an algorithmically controlled fashion. 

And so we’re at a particular moment of risk. And in this chapter, I discussed the potential impact of the spread of falsity on democracies, economies and our public health. I talk about coronavirus misinformation, I talk about meme stocks and how misinformation can affect the stock market. And I talk, of course, about democracy and elections: the 2016 US presidential election, the 2020 presidential election. I talk a lot about deep fakes and the science of fake news. What is it about human cognition that makes us susceptible to falsity? And what some of the solutions might be? In this book, I try very hard to remain rigorous throughout the book and allow the science to lead. And you don’t need to exaggerate it, for it to be dramatic, because there is dramatic, rigorous science out there that should be compelling enough to motivate policymakers and platform designers and leaders to the potential peril that we face with social media, as well as the tremendous promise.

Noshir Contractor: I’m aware of several people, perhaps Mark Twain, and you can correct me if I’m wrong here, who, a long time ago, said that lies can travel twice around the world before truth could put on its boots. So why is it that today, the web might be changing that particular phenomenon? And is it simply a question of the lies traveling 100 times around the world before truth could even put on its socks, let alone its boots?

Sinan Aral: The phrase fake news was first mentioned in a Harper’s Magazine news article in the 1920s, so you’re right when you say that, you know, fake news is not new. And it’s interesting, the Mark Twain quote is actually not a Mark Twain quote. I’ve heard that quote attributed to so many different people incorrectly, which itself is ironic. To bring it back to current times, there are a couple of things that make the spread of falsity today particularly dangerous. And that is speed and the algorithmic amplification of falsity, as well as the targeting that happens. So we don’t know who is seeing which information. The speed with which information travels today is nearly instantaneous. And it is much faster than even just a decade ago in terms of the spread of false news, but also true news. This favors falsity, because we’re 70% more likely to share a verified false news tweet than a verified true news tweet. 70% more likely, over a 10 year period of all the tweets on Twitter, that’s a big number. And false news travels about six times as fast as true news. The entire planet could be mistaken about something, about a consequential choice that needs to be made, and if the truth doesn’t catch up quickly enough, then we can make very significant errors.

Elections are a dramatic example. The rise and fall of equity prices is another example. And then finally, whether or not you get vaccinated in time to stop the spread and whether you can achieve herd immunity against coronavirus. These are three dramatic examples. The Hype Machine was published in September 2020. And in the book I predicted all three of these things would happen. In the book I said that we were gonna see violence during the 2020 election. We had the Capitol Riot in January. I predicted the rise of meme stocks. We saw the GameStop stock price rise. I predicted that misinformation around vaccines would create protests around vaccines that would disrupt the vaccination process in the United States. In January 2021, we saw Dodger Stadium in Los Angeles shut down by anti-vaccine protests. And I don’t consider myself an oracle. These outcomes were entirely predictable. We’re at a particular moment in history, where the spread of falsity can have even more dramatic implications than it has in the past.

Noshir Contractor: Amazing. Now another chapter title: “A Network’s Gravity Is Proportional to Its Mass.”

Sinan Aral: I wanted to make a point about economics, but in a way that non-economists might understand. And you know, people who have taken grade school physics understand mass and gravity in a sense. And really the point I wanted to get across in this chapter is how important the economics of the social media economy are to the outcomes we see regarding democracy, public health, our economy, stock market, business outcomes, and so on. And the main economic force that shapes the social media economy is what economists call network externalities or network effects. And that is to say that the value of a platform or a product is a function of the number of people who use the platform or product. And so the size of Facebook’s network, the size of Twitter’s network, has implications for the attraction it has to new users and the stranglehold it has on current users. We’ve got a big conversation about whether we need to break up Facebook and whether we need to break up big tech, we’re worried about the rise of monopolies in the social economy. And what people I think are missing is that in order to really create competition in the social economy, we have to deal with the structural economic phenomena that are creating market concentration. And that really is driven by network effects. And in an economy driven by network effects, big platforms have big power, they have power to attract new users, they have a stranglehold on current users. If you want to speak with your friends and family, you can’t leave Facebook, Twitter, and the major platforms today, because that’s where everybody is to talk to. And it’s hard for new entrants to get new users because of the network economics of the social economy.
And so a lot of the outcomes we see, in terms of misinformation, in terms of effects on democracy, economy, public health, in terms of the very nature of competition in the social economy, stems from the simple fact that a network’s gravity is proportional to its mass, the amount of power that a social network platform like Facebook has, is proportional to its size. And if we want to address that, from a policymaking standpoint, or a business standpoint, we have to make structural reforms to the economy itself, that address network effects. And as I described in the last chapter of the book, that means instituting interoperability and social network and data portability, which is itself the main structural reform to the social media economy that will have the single biggest impact on the level of competition in that economy.

Noshir Contractor: Let’s go to that final chapter, you title it, “Building a Better Hype Machine.” What you just described was, the larger the network, the more sticky it is, the more likely it’s to keep you down on it and not let you go away from it. How does interoperability solve the problem? And why would the large networks today have any incentive to participate in that?

Sinan Aral: That’s the main question. And I think the answer is obvious. They have no incentive to change, because they’re making money hand over fist, consumers have no choice but to use the platforms that have the greatest network externalities, because that’s where everyone is to talk to. Now, it’s interesting that you started with the 2011 paper that we wrote, where we first began to measure network externalities in the social economy. That was a decade ago. Now we’re doing a very similar experiment across all the platforms to measure, well, how big is the network effect for a Facebook or a Twitter or Snapchat? And what can we do to break this network effect? So if I was to hypothesize to you, how would you feel if you couldn’t send a text message from Verizon to Sprint? You think I was crazy. You think I was absolutely insane — What do you mean, I can’t send a text message from one mobile carrier to another, of course, I should be able to do that. And then I say, Well, why can’t you send a message from Facebook to Twitter, or from Instagram to Snapchat? And suddenly you think to yourself? Hmm, that’s interesting. Why can’t I do that? Well, the reason is because they have made themselves incompatible in order to retain their network externality and their network value.

And because of that, consumers have no choice. And when consumers have no choice, and they can’t switch from one platform to another, without incurring significant cost, then there’s no incentive for that platform to give consumers what they want, which is a clean internet that protects their privacy, that reduces bullying, that reduces misinformation, that does something about social media manipulation during elections, that reduces the amount of hate speech on the platforms and so on. Because all of that stuff is engaging and profit maximizing, it’s allowed to continue. But there’s a bill in front of Congress now called the Access Act, which would require any social media platform greater than 100 million users to become interoperable with other social media platforms. If that were enacted, and if I came up with a social network that said, I would protect your privacy, I will eliminate phishing on my platform, I will have tight security, and you can speak to anyone on Facebook, Instagram, Snapchat, and Twitter, then people would be able to choose that social network. And as more people chose that social network, the larger platforms would have to make reforms to provide similar levels of privacy and security in their platforms, and so forth. That is an example of how interoperability creates competition in networked industries.
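
The argument about network effects and interoperability can be sketched numerically. The snippet below is a deliberately simple illustration, not a model from the book: it treats the number of people a member can reach as a stand-in for a platform's value to that member, assumes nobody belongs to two platforms, and uses made-up user counts. The `reachable_users` function and the platform names are hypothetical.

```python
def reachable_users(platform, user_counts, interoperable):
    """Number of people a member of `platform` can message.

    Simplifying assumption: no one belongs to two platforms, so totals can
    simply be added when platforms interoperate.
    """
    if interoperable:
        return sum(user_counts.values())
    return user_counts[platform]


# Hypothetical user counts, chosen only for illustration.
user_counts = {"incumbent": 1_000_000, "entrant": 1_000}

for interoperable in (False, True):
    for platform in user_counts:
        reach = reachable_users(platform, user_counts, interoperable)
        print(f"interoperable={interoperable} {platform}: {reach:,}")
```

Without interoperability, the entrant offers a tiny fraction of the incumbent's reach, however good its privacy or security. With interoperability, both offer the same reach, so the platforms have to compete on everything else.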

Noshir Contractor: Now, for a long time, even before we got to social media platforms, the notion of having some kind of interoperability goes back to the very start of the internet. The brilliance of the web was in large part based upon HTTP, the Hypertext Transfer Protocol, which allowed interoperability. Do you see that there is any reason to believe that through legislation or technological innovation, we will see a breakthrough in this interoperability dilemma that we face right now?

Sinan Aral: I’m sincerely hopeful that we do because I think this is just an obvious point — imagine a balkanized internet where you had not a global internet where you could share information, but you had many different privately controlled, intranets that people were part of and had to pay to be a part of, and to get access to information and so on. The level of innovation and collaboration and communication and life-saving health information, and everything that is shared, because the internet is interoperable and is free to build on is staggering. If you don’t create interoperability, you really stifle the innovation and creation of value tremendously.

Noshir Contractor: Well, thank you again so much for joining us and giving us a chapter-by-chapter tour, in some ways, of The Hype Machine. It’s a wonderful read. And I would definitely recommend that to anyone in web science. I look at your scholarship as being one of the poster children of really excellent cutting edge web science. Thank you again Sinan, and good luck for all the additional work that we will be looking forward to seeing coming out of your research in the years ahead.

Sinan Aral: It’s a true honor and a pleasure to join you. I’m looking forward to seeing you in person again very soon and to give you a hug and a high five, because it’s been too long. We’ve all been through a lot and it’s great to connect virtually but I’m also looking forward to connecting with you physically in person as well soon. 

Episode 18 Transcript

Matt Weber: The Internet Archive has about — last count — nine petabytes of archive data, I wouldn’t be able to begin to tell a student how to begin cracking open that repository, we don’t really have the tools for that yet. So in many ways, we’re still developing the technology to be able to look at some of these questions at scale.

Noshir Contractor: Welcome to this episode of Untangling The Web, a podcast of the Web Science Trust. I am Noshir Contractor and I will be your host today. On this podcast we bring thought leaders to explore how the web is shaping society and how society in turn is shaping the web.

My guest today is Matt Weber. You just heard him talking about some challenges and questions related to web archiving. Matt is a faculty member in the Department of Communication at the School of Communication and Information at Rutgers University. With more than a decade of experience researching information ecosystems, organizations and communities, Matt focuses on the use of large scale web data to study processes of change. Some of his current areas of focus include work on algorithms and knowledge, public policy processes, production of media and the science of communication within these information ecosystems. In addition, Matt has been an active member of the web science community. He’s the program co-chair for the ACM 2021 Web Science Conference, and delivered a keynote at this year’s conference just earlier this week. Welcome, Matt.

Matt Weber: Thank you, Noshir. It’s a pleasure to be here. 

Noshir Contractor: To get us started, take us back to when you first got interested in looking at the web as a changing process. What got you interested in this?

Matt Weber: I came into academia having been a marketing professional working in the media industry, and worked at an ad agency for a number of years, and then ended up working at the Chicago Tribune, Tribune Corp, and saw firsthand how badly companies were responding to shifts in communication technology and shifts in news media production. So come 2008, I found myself as a graduate student having the opportunity to head over to Oxford to take part in their summer doctoral program. And it happened that that year the Web Science Trust sponsored the summer doctoral program at the Oxford Internet Institute. And so that experience was my introduction to a lot of the core ideas of interdisciplinary research that are central to web science scholarship. And so from very early in my career, I started to see these intersections between the broader questions that I wanted to ask about the interplay between technology and information ecosystems, and some of the core questions that were being asked by scholars who are working in the area of web science.

Noshir Contractor: A lot of the work that you’ve done is based on the assumption that the web is not a fixed entity, and that in fact, it is not just growing, but also in many ways, losing a lot of what is on it. The web itself is extremely ephemeral. And you note in your work that in a study of 10 million web pages, researchers found that the average web page remains live for barely three years. And that a study of Twitter data, focusing on major social events found that 11% of relevant tweets were not available after one year and 27% were not available after two years. How does this ephemeral nature of the web and Twitter affect our ability to understand society?

Matt Weber: So many of us who study topics related to the web look at single snapshots of instances that we record when we engage in our scholarship. And it’s rare that we look backwards in our research or that we have the opportunity to look at larger periods of time when we engage in web based research. A lot of that has to do with the availability of data, a lot of that has to do with access to data.

This is a question that I confronted when I first started my research. I came into graduate school and I wanted to study how news media had been adapting to web based technology. By that point, most newspapers already had web pages. I didn’t have a time machine, I didn’t have a way that I could go back in time and record what I wanted to see. So, going back to that opportunity I had to be at the summer doctoral program: it happened that the web science group had brought in a speaker from the Internet Archive. Well, I found my time machine.

In that moment, I found this resource that had, for 12 years at that point, been archiving all of the available web data they could get their hands on. And so I spent a lot of time as a graduate student building the foundations for giving researchers the means to open up archived web data and gain access to these larger swaths of data that previously had been inaccessible.

Now to answer your question about the ephemerality of the web and web based technology, that’s really key. Having archives and having repositories allows us to examine change over time. And it allows us also to see what has been lost over time.

Noshir Contractor: What does it mean to create an Internet Archive? Is it taking a snapshot every day, every minute? How does that work?

Matt Weber: When we think of the term Internet Archive, many of us at first glance think, Oh, I’ll be able to go and look at the web page and just replay it across time. And the reality is far different from that. The Internet Archive itself, archive.org, the initial example of what an internet archive is, was started with the idea that this would become the library of the world. It would become an online home for all web content, digitized music, digitized books; this would be a go-to resource for anyone looking for free information, free knowledge on the web. In many ways, that’s what the Internet Archive has become. But the concept of internet archiving is much broader than that. The Internet Archive is an example. But then look at other organizations like the Library of Congress, the British Library, countless national libraries across the globe, that have all created their own separate repositories to archive either their national web domains or specific subdomains of content that are of interest. The question within each of those domains is what is actually being preserved? Each librarian or each archivist who is working within the library has to make a decision about how often do we record webpages? How detailed are those records going to be? How much of the content are we going to preserve? How accurate is the replay going to be? And what we find is an incredibly wide variance across these libraries.
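
One small, concrete way to see what an archive actually holds for a page is the Internet Archive's public Wayback Machine availability endpoint, which reports the capture closest to a requested date. The sketch below is an illustration rather than anything described in the conversation: the helper names are hypothetical, `example.com` is a placeholder, and the `lookup` function needs network access, so it is defined but not called here.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_query(page_url, timestamp):
    """Build an availability query; `timestamp` is a YYYYMMDD-style string."""
    return AVAILABILITY_ENDPOINT + "?" + urlencode({"url": page_url, "timestamp": timestamp})


def closest_snapshot(payload):
    """Return (timestamp, snapshot_url) from a decoded response, or None if nothing is archived."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest["timestamp"], closest["url"]


def lookup(page_url, timestamp):
    """Query the live service (requires network access)."""
    with urlopen(build_query(page_url, timestamp), timeout=30) as response:
        return closest_snapshot(json.load(response))


if __name__ == "__main__":
    print(build_query("example.com", "20060101"))
```

Asking for the same page at several dates and comparing the requested date with the capture date returned gives a rough sense of how often that page was recorded, which is exactly the kind of variation between collections described above.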

Noshir Contractor: I want to turn to this question: given that we have the data, and given that we are now able to roll back and essentially play, like a movie, how the web looked at different points in time, what are the kinds of questions that you think we can begin to ask now in ways that we weren’t able to before?

Matt Weber: Before I answer the question about the kinds of research that we’re able to address with this type of web archive data, I want to point out that one of the fundamental problems we still have is that, even with all of the advances in technology, we still are not very good at accessing this type of archival web data at scale. The Internet Archive has about nine petabytes of archived data at last count. I wouldn’t be able to begin to tell a student how to begin cracking open that repository; we don’t really have the tools for that yet. So in many ways, we’re still developing the technology to be able to look at some of these questions at scale. There are a number of researchers right now, including myself, who are working to tackle these challenges. There are obviously a whole host of questions that we can start addressing. For me, I think one of the most fascinating things is to be able to look at different aspects of our information ecosystems, the environments within which we live and operate today, to understand how those ecosystems are evolving over time.

Noshir Contractor: Can you share some insights about what are the kinds of things we are able to learn that we were not able to do before we began in web science to study the internet archives?

Matt Weber: One area where I’ve been looking for the past five, six years is at the growth and demise of various aspects of our local news ecosystem. And by looking at scale and leveraging web archive data, we’re able to unpack specific findings such as the connection the percent of minority residents within a community has to the overall health of the information ecosystem in a local community. For instance, we see that as there is a greater and growing Hispanic population, there is a pullback on the part of corporate news organizations in terms of the amount of content that’s being provided to that community. We see more niche newspaper outlets coming in to fill the gap. And that’s a story that without being able to look back through the repositories that we’ve built up, we would never have been able to detect and to pull out. 

Completely different topic, we have a repository built around social media data, and news media data tracking the events around both Superstorm Sandy and Hurricane Katrina. And you’re able to see how community partners were able to work together in very niche micro clusters to basically fill the gap of information in the early weeks, early days after each of these disasters, to create a nexus for information for the communities that were affected.

Noshir Contractor: As you look at these insights that we get from people studying past events on the web, are there ways in which the insights are helping us come up with actionable ways of doing things differently moving forward?

Matt Weber: When we talk about web archives, the term archive is maybe a misnomer in many ways, because it implies something that’s been archived, stored and put away. We’re talking about data that’s contemporary today, and then chronologically works backwards. And so much of the web archiving research that we’re talking about, is present to modern day, but allows us the ability to then look at the evolutionary path going backwards. And so again, I’ll come back to my own work, right now, looking at local news information ecosystems — we’re leveraging that work today to advocate for policy solutions in a number of states that are looking to create new models to support local media environments.

Noshir Contractor: You mentioned earlier some of the challenges that we face, that there are technological challenges associated with being able to navigate a study of the web. Yet the combination of new tools and metadata formats has demonstrated that some of this analysis can be conducted more affordably. Tell us how hopeful you are about this trend towards making these data more accessible.

Matt Weber: So let’s start first with advances in computing. We’ve seen significant progress in our ability to work at scale on the web. And web science, being the interdisciplinary home that it is, is a perfect venue to be talking about this type of scholarship and this type of education. With regard to research, increasingly, work that took a supercomputer, work that took a computing cluster, can today be run through Amazon Web Services on your laptop.

Add to that some of the more programmatic technological advancements. All of that goes to say that we can work with larger sets of data in a fashion that is much easier than it was even two or three years ago.

The challenge with web archive data is still in translating between, say, your library and the WARC file format. There are groups out there that are making a lot of advancements in this area. And I think in the next three or four years, we’re going to see even more gains that are going to make this research much more commonplace.

Noshir Contractor: That sounds exciting. We’ve been celebrating all the incredible insights we can get from looking at the Internet Archives. But do you also have examples of concerns about limitations, things that may be lost in a biased or systematic fashion, that might then in some ways limit the confidence we have in our inferences based on looking at the Internet Archives?

Matt Weber: That’s a fantastic question. I think it’s a fundamental problem with web archiving that hasn’t been fully addressed. The process of archiving works very much like a network: you start with a few central nodes, and you archive out from those central nodes. You pick your starting point and say, who’s linked to these nodes, and then who’s linked to those nodes, and you continue to crawl on. And so this sets a dominant hierarchy for what is going to be archived, what is going to be stored. If you are a niche community on the web (say, a group of minority-serving newspapers in Newark, New Jersey, that has a very small web presence but a very strong impact in your community) and you’re not connected to the main network of media organizations and the main network of information websites in the state, most web archiving platforms will miss you; they will simply skip over you as if you never existed. And so when researchers then go back to use this archive, to leverage this archive for their own work, if they’re not aware of these gaps that may exist in the archive, the presumption will be that those websites never existed, that those communities never had access to the information that was being provided, even though there was a very robust community there. And I use a Newark example because that’s exactly what happened in my own work.
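
The crawl process described in this answer can be sketched as a breadth-first traversal of a link graph. The following is a minimal illustration only, not the method of any actual archiving platform: the function name `crawl`, the `max_pages` cap, and every site name in the toy graph are hypothetical. It shows how a site that links out to the main network, but that nothing in the main network links to, is never discovered.

```python
from collections import deque


def crawl(seeds, links, max_pages=1000):
    """Breadth-first crawl over a link graph; returns the set of archived sites.

    `links` maps each site to the list of sites it links to.
    """
    archived = set()
    queue = deque(seeds)
    while queue and len(archived) < max_pages:
        site = queue.popleft()
        if site in archived:
            continue  # already stored, do not archive twice
        archived.add(site)
        # Follow outgoing links only: a site is found only if something
        # already in the crawl points to it.
        for target in links.get(site, []):
            if target not in archived:
                queue.append(target)
    return archived


# Hypothetical link graph; all site names are invented for illustration.
links = {
    "statewide-daily.example": ["regional-tv.example", "state-gov.example"],
    "regional-tv.example": ["statewide-daily.example"],
    "state-gov.example": [],
    # Links out to the main network, but nothing links back to it.
    "newark-community-paper.example": ["statewide-daily.example"],
}

archived = crawl(["statewide-daily.example"], links)
missed = sorted(set(links) - archived)
print(missed)
```

In this toy graph the community paper exists and even links to the statewide daily, yet it never enters the archive because the crawl only follows links outward from the seed.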

Noshir Contractor:  And I imagine that everything that you’ve just described, if it is a problem in New Jersey, I can only imagine how much more of an issue that is, in other parts of the world, in the global south, which have a much weaker digital footprint in some ways. To what extent do you see that as a limitation in terms of our ability to make inferences?

Matt Weber: You mentioned the global south. We can travel around the globe and pick your example; let’s go, say, into India and look at news provision in India. A lot of that is either A. still happening through printed paper or B. happening on technological platforms that skip what we know to be the mainstream web. So a lot of news dissemination happens via apps like WhatsApp, which are increasingly community-based platforms for spreading news and information through a community. And all of that is overlooked by this traditional web archiving type of technology.

As web scientists, as researchers studying in this domain, we have to be increasingly attentive to multi-method research that allows us to more accurately represent the gaps that may exist in dominant modes of data collection. Unless you engage with the communities and talk to people living in these communities to understand how they’re getting access to information, how they’re getting access to news, you wouldn’t understand where those gaps were. And so more and more today, when we have greater access to data at scale, we simultaneously need to be leveraging partnerships with other scholars, partnerships within communities, to better identify where the gaps exist in the data that we’re relying on for our research.

Noshir Contractor: How concerned are you that, as we move towards platforms like WhatsApp, for example, which are often in some way, shape, or form walled gardens, archives may not necessarily be tapping into what is happening within these sort of private spaces?

Matt Weber: We were deeply concerned about walled gardens a decade ago, when newspaper companies started putting up paywalls and limiting access to certain types of content unless you were a paying subscriber. At a very different scale, and a very different level, we’re having a similar conversation today: when we talk about walled gardens, we’re talking about information that we can’t access. Now, some of that today is happening because of increased concerns on the part of consumers around privacy. Part of the shift to messaging and information dissemination on platforms like WhatsApp comes from an increased desire on the part of consumers to have privacy in the information that we’re sharing. This creates a lot of challenges for us as scholars, in terms of the information that we study and the information that we hope to have access to in order to better understand the social lived world that we engage in day-to-day. There are no ready answers for this. But the lessons that we’ve learned from the past two decades of research, examining web data, living in the world of web science, have prepared us to better tackle these questions going forward. Hand in hand with that, we talked about Twitter as a great example of a company that’s opened up data; we’re seeing pressure on other companies like Facebook to make some of their data more available. And so we are seeing some forward progress in terms of opening up other platforms.

Noshir Contractor: You brought up the issue of privacy amongst individuals as being a major driver of these moves to platforms that provide walls within which people can discuss, which brings me to one of our closing points here, and that is Article 17 of the GDPR (the General Data Protection Regulation), which talks about the right to erasure or the right to be forgotten. To what extent do you see this right to be forgotten, and giving people the ability to go back in time and delete some of the information that has been aired about them, as a serious concern in terms of being able to study the archives?

Matt Weber: The joke has always been something along the lines of: Be careful what you say on the web, because once you say it, it’s impossible to delete. Web archives are the embodiment of that challenge. Once content has been stored into a web archive and preserved in some fashion or other, it’s technically very hard to go back in and scrub every mention from every archive. We as researchers have to be very careful about what we share in terms of personally identifying information when we go back and use web archives in our scholarship. 

And on the other hand, the archivists themselves also have an obligation to better understand how we can work across web archives to adhere to standards like what GDPR has set forward with regard to the right to be forgotten. I expect that that type of legislation will continue to grow over the next decade. And we will have to find other ways to make sure that even going back in time, you have the right to have your information removed from these types of repositories.

Noshir Contractor: Given how much you have been thinking deeply about web science and the web over the course of your academic career already, what do you see as some of the most challenging issues that you or others within the web science community need to be placing more of an emphasis on than we currently do?

Matt Weber: In this conversation alone, we’ve hit on a number of the key themes that are pressing issues for the web science community at large, but also for the group of scholars who are thinking about the data that we have available to address these questions. And I would include web archiving in that set of data. We need to do a better job of thinking about how we address privacy in the data, privacy rights in the data that we are accessing. We need to do a much better job of making sure that a diverse set of populations are accurately represented in the data that we’re using. I think on both of those fronts, there are decades’ worth of unanswered questions that I know many of us are working to address. Those are critical areas right now.

Noshir Contractor: Wonderful. Well, again, I want to thank you so much, Matt, for taking time to talk with us. You’ve done so much in helping us recognize the importance of looking at the web as an ephemeral, changing, dynamic process and in telling us about how we can learn so much about society by not just looking at a snapshot of the web at one point in time, but by essentially rewinding and playing back how the web has constantly been changing over the last few decades. And I thank you again for both your research and your engagement with the web science community. As I mentioned, you’re the program co-chair for the ACM 2021 Web Science Conference. And we all look forward to listening to your keynotes that you will be delivering on the 21st of June. So thanks again, Matt. And we look forward to learning more about your research in the years ahead.

Matt Weber: Thank you Noshir, I really appreciate the conversation.

Noshir Contractor: Untangling the Web is a production of the Web Science Trust. This episode was edited by Molly Lubbers. I am Noshir Contractor. You can find out more about our conversation today in the show notes. Thanks for listening. 

Episode 17 Transcript

Emilio Ferrara: I feel like in web science, jointly adopting theories and data science tools and computational tools allows you to come up with the right blend of theory and data. That allows us to understand this phenomenon beyond just simple characterization, or simple theoretical explanation without support from empirical evidence. This is really the fascinating power of web science, putting together these two things and balancing them together.

Noshir Contractor: You just heard from our guest today, Emilio Ferrara, who is an Associate Professor at the University of Southern California in Los Angeles. He has appointments in Communication at the Annenberg School for Communication & Journalism, in Computer Science at the Viterbi School of Engineering, and in Preventive Medicine at the Keck School of Medicine.

He’s also a Research Team Leader for AI at USC’s Information Sciences Institute and the Director of the Annenberg Networks Network, ANN for short. And earlier this year, Emilio became the Chair of the Web Science Trust Network of Laboratories (WSTNet for short), which makes him an especially valuable guest today to talk with us about what he envisions as the next generation of Web Science. Welcome, Emilio.

Emilio Ferrara: Thank you very much, Noshir. I appreciate being here today.

Noshir Contractor: I’m just delighted that we have an opportunity to talk with you. I want to start by a tagline that I see associated with you, and that is your interest in networks and societies plus humans and machines. Can you unpack what you mean?

Emilio Ferrara: These new technologies emerge within our society, and they have effects on our society; they have effects on the ways we connect with each other. They have effects on what we see on the web, and so on. So I feel like one of the most interesting opportunities in the context of web science that has been emerging over the last couple of years is certainly the ability to study how all these components interact: social media platforms, social networks, online and offline. And these emerging tools of artificial intelligence allow humans and machines to collaborate with each other. So the intersection of these different disciplines and areas has been the focus of my research for the last decade.

Noshir Contractor: You’ve done a lot of work, obviously, in the area of social bots, how would you consider or characterize the research that you’ve been doing, as an instantiation of this tagline that you have in terms of networks in society and humans and machines?

Emilio Ferrara: So we’re seeing the effect of social bots, which are accounts controlled in part or entirely by software rather than human users, on social media platforms in a number of application domains spanning from politics to public health, and so on. And these accounts are operating in public spaces in online networks, and they also interact with human users. So here you have the intersection between humans and machines affecting our social networks and our society.

Noshir Contractor: That’s terrific. Now, one of the first areas where I began to read about your work was the DARPA Twitter bot challenge that got a lot of visibility. DARPA has had an interesting history with technologies, having been involved with helping get the internet itself started. And this Twitter bot challenge was something that caught a lot of attention. Tell us what the challenge was about and what you learned from your experience with it.

Emilio Ferrara: Our team at Indiana University, led by Fil Menczer and Alex Flammini, was selected for this program. And I was lucky enough to be part of these efforts. And one of the goals of this program, as it developed, was to understand the possible effects of social bots on social media communication, especially in the context of public health.

So in 2014, DARPA organized this challenge for the detection of bots engaged in the vaccine debate online. The goal was to distinguish these anti-vaccine and pro-vaccine bots in the discussion. The challenge itself took place over several weeks, during which we would receive Twitter content, sort of a playback of the Twitter content, and we would deploy our technologies to detect such accounts.

So three teams did extraordinarily well; they detected all the bots: the team led by Subrahmanian at the University of Maryland, the team at Indiana University led by Fil Menczer and Alex Flammini that I was part of, and the team at USC led by Aram Galstyan and Kristina Lerman. And ultimately, this was just before I moved from Indiana University to USC.

Noshir Contractor: So you moved from one winning team to the other winning team, our competition? 

Emilio Ferrara: Yes.

Noshir Contractor: Back then, you were already looking at understanding the role of social bots and social media to deal with issues associated with vaccination, way before we all were experiencing what we have over the past year with the COVID crisis.

Emilio Ferrara: I feel like public health has been maybe the most salient area where the manipulation of social media can have an impact on the real world and the change of behavior in the real world. Of course, there has been a lot of emphasis on politics, but public health sometimes goes under the radar. And it’s really not well established yet, the extent to which the manipulation of public-health-related discussion can be detrimental and dangerous for our society, and social media definitely play a big role. So we looked at vaccine debates long before COVID-19; we started looking into that around 2013 or so.

And that was just around the time when a big measles outbreak occurred in California. And interestingly enough, the vast majority of anti-vaxxer and anti-vaccination groups were actually aligning with left-leaning or more liberal ideologies. Whereas today, when we look at the hesitancy around the COVID vaccine efforts, it emerges mostly from more conservative users. So this should tell you how much of a bipartisan issue vaccine hesitancy is, and how important it is to understand vaccine hesitancy through the lens of social media, because social media allow us to get a very diverse representation of political ideologies and how these ideologies interact with public health behavior.

Noshir Contractor: I want to go back to something you said, and that is that initially, the anti-vaccine groups were largely from the left, and now they’re from the right. I understand the second one. What is your explanation for the first?

Emilio Ferrara: That is an interesting phenomenon that we have seen, and you’re definitely right. Early on, you know, a decade ago or so, this anti-vaccination movement, especially opposing mandatory vaccine regulations for children, emerged mostly from liberal, progressive users. They were individuals with high education, typically from a good upbringing, in urban areas in rich states like California and, you know, states on the West Coast. And this was not necessarily for religious beliefs, but really for personal beliefs, and concerns about vaccine safety, vaccine side effects in children, and so on. Interestingly enough, we have seen this shift towards a more diverse population of users that are opposing vaccination campaigns, and a shift towards more conservative users opposing COVID vaccinations over this last year or so. So it’s really a bipartisan issue. And it’s a very complex issue that cannot be explained exclusively with political beliefs.

Noshir Contractor: You’ve talked about the sort of relationship between political polarization on the one hand and online conversations not just about politics, but also about the pandemic. And you published an article late last year titled “What types of COVID-19 conspiracies are populated by Twitter bots.” So Emilio, my question to you: what types of COVID-19 conspiracies are populated by Twitter bots?

Emilio Ferrara: Unfortunately, a lot. And unfortunately, some of the worst conspiracies that you can imagine. So this is a paper in which I took an early look at the landscape of COVID-related discussion on social media. We were lucky enough to have the foresight in our lab to start tracking COVID discussion early on, maybe before everyone else did, in January 2020. And we also published this data set. We made it openly available to the research community, and published the associated paper in the Journal of Medical Internet Research.

We made it available because we thought this would eliminate one of the barriers to allowing researchers to get large data sets and understand this phenomenon beyond what we could do in house in our lab.

In this study, I highlighted the role of tens of thousands of bots over the first couple of months of COVID. And it turned out the bots were active in the spread of political conspiracies, conspiracies of various types, conspiracies pertaining to the origin of the virus. Some of the bots suggested that the virus was a biowarfare agent that was deliberately created, for example, by China, and that it was deliberately spread to the United States and the rest of the world. And this created a lot of anti-Asian sentiment. So that was a very problematic kind of conspiracy. The article also highlights how other conspiracies focus on misinformation about treatments.

But I feel like the most concerning kind of conspiracies that the study highlights are related to bots that spread extreme political beliefs, beliefs that are mostly aligned with the alt-right movements and far-right ideology and so on. So what this study highlights is the attempts to hijack the COVID discussion and turn it into political extremism. Some of the most active bots that we uncovered, that I documented in this study, are bots that effectively spread QAnon. And some of the most prominent hashtags that we see spread by these bots are hashtags that are very popular among white supremacists.

There are very many troublesome ideas and ideologies that have been spread and injected into COVID. And many of them have been pushed by bots. One thing that fortunately we have observed is the fact that Twitter ultimately suspended a large fraction of these accounts. So there was a mitigation strategy in place. But this took place many, many months after these accounts started to spread these ideas. So it was already late; in some sense, they had already had a large effect on the network in terms of spreading these problematic ideas.

Noshir Contractor: You’re using web science to study this phenomenon that was created on the web. What are some of the ways in which web science is providing you unique tools and techniques to study it, for better and for worse?

Emilio Ferrara: That is an absolutely interesting question. And actually, I feel like web science was the catalyst that started this entire research direction. In fact, early on in 2014, we published one of the first studies that looked at how to use the tools of web science to study social movements online. Over the last several years, we have been using the same tools of web science to study other social movements. For example, we have some work coming out immediately where we studied Black Lives Matter through the same lens of social media and social media discourse. So web science has allowed us to focus on the behavior of individuals and the communities and groups on these platforms and understand how these collectives emerge and characterize their activities, not only from a computational standpoint, from a data-driven standpoint, but also from a theoretical standpoint, looking at these groups as organizations, looking at these groups as collectives.

And I feel like in web science, jointly adopting theories and data science tools and computational tools allows you to come up with the right blend of theory and data. That allows us to understand this phenomenon beyond just simple characterization, or simple theoretical explanation without support from empirical evidence. This is really the fascinating power of web science, putting together these two things and balancing them together. And we have been learning a lot about how to increase diversity, how to understand the biases, and so on, through the lens of the web.

Noshir Contractor: I want to touch on something you just mentioned: the extent to which web science should, in your opinion, be primarily concerned with identifying issues, identifying bias, recognizing things that might not be obvious, and then on the other hand, for lack of a better phrase, doing something about it. How do you see where web science is currently positioning itself? And how well is it doing on either of these? And how much should it be doing in each of these areas?

Emilio Ferrara: It’s dear to my own heart and research agenda to understand how we can use the web and the technologies that are enabled by artificial intelligence and so on to improve the web and to improve our society, you know, very directly, right?

I was delighted to see that this year’s Web Science Conference topic is actually revolving around making the web a more diverse, more equitable place using web science as a framework. And I feel like web science, in its transdisciplinary nature, provides the best tools to do that. We have the artifacts, we have the machines, we have the technologies and tools. And on the other hand, we have the collectives, the networks, the aggregations where people come together, and they are at the same time using these tools, but they’re also affected by these tools, right? And in my opinion, if you look at these dimensions in a disjointed manner, you’re only going to be able to grasp a partial view of the problems. You need to look at both sides if you really want to understand how you can improve society using these tools, and how you can mitigate the negative effects of these tools on human users. And I feel like web science offers the best lens to do that.

Noshir Contractor: And you have done more beyond your own research. When I introduced you for this episode, I mentioned that you have recently become the director of the Web Science Trust Network of Laboratories. So first of all, congratulations, and thank you for taking on that important role. I would like you to tell our listeners a little bit about what the Web Science Trust Network of Laboratories is, and what you see as a vision going forward for this network.

Emilio Ferrara: So the Web Science Trust operates a network of laboratories that are spread around the whole world; there are more than 20 labs affiliated in this network. And they are all very well-known groups of researchers whose work is often associated with web science and other sister disciplines, all revolving around the study of social networks, the web, human and machine behavior, and so on. These centers have been pushing the discipline collectively over the last two decades or so. And the network itself plays a role in shepherding, in some sense, the community and the official direction of this discipline.

I feel like as a director, my dream would be to enable all these labs belonging to this network to collectively operate and create new initiatives that can push forward with data, and can push the impact of web science in our world into even more evident, obvious avenues. So we are embarking on collective initiatives to pursue larger projects, collaborative projects that try to cross national boundaries and push for more diversity in this field, and more diversity even in the labs and in the discipline as well. That means coming up with our moonshot, which would be at least one major research project, bringing on board as many of these labs, as many of these countries, as many of these sub-communities as possible, and trying to pursue such a moonshot project. So there is a bright future ahead, in my mind, for the web science community, for the Web Science Trust, and for the network of laboratories.

Noshir Contractor: You did mention the word moonshot. If I had to ask you at this point, what would be an example of a moonshot study that would involve all the WSTNet labs, or at least a large number of them? What would that look like?

Emilio Ferrara: As for the moonshot, I feel like one of the major roles that web science and the web science community can have in the future is really operating as a glue to bring together people from different fields and encourage them to pay attention to maybe the web as a societal glue, as a system of systems, and to allow the study of the web as a system of systems with respect to some of the emerging problems that we collectively face as a society.

The pandemic, of course, may be the problem that is on everyone’s mind these days. But there are many other problems that emerge. And the web can provide a lens to study them: sustainability, climate change, and so on are definitely very important problems. I feel like that’s something where the Web Science Trust can contribute, because we can use the web as a monitor, as an observatory, to understand how people think about these problems, how people pay or do not pay attention to climate change, to sustainability issues, and so on.

Obviously, there is another big problem that revolves around artificial intelligence and automation. So, you know, as this AI revolution keeps emerging, there are going to be issues with job displacement. Web science, again, provides a lens to study human behavior in these new contexts and a way, maybe, to anticipate issues and problems that will exist in the future of work and society.

And then, of course, there is always the aspect of democracy that is very dear to my heart, right? So as we observe the world change, as a reflection of all these phenomena (public health, pandemic, automation, and so on), our countries, our democracies, are constantly in peril, in danger, right? Because we have seen the rise of these nationalist movements, and extremism of every kind and sort, that have been growing on the web. And we should study them through the web, right? Because that’s their natural environment. And these are the tools that we have at our disposal, and we should maximize them. So I think these are going to be part of the big moonshot that the Web Science Trust, and web science as a community, should hopefully contribute to in the near future.

Noshir Contractor: That is extremely exciting. I think the idea of taking web science and training its focus on some of these grand societal challenges would be incredibly powerful and compelling, if for no other reason, because so much of what is happening in all of these contexts is being coordinated via the web. And so as you said, using the web as an observatory and as a monitoring platform becomes important. And as you said, beyond just using it to monitor, you also have the ability to change some of these phenomena as a result of the tools that we have and the technologies that we have related to the web, etc.

I want to thank you again, Emilio, so much for all the excellent work that you’ve been doing in this area, for your leadership on the Web Science Trust Network of Laboratories, and for coming and sharing some of these ideas and exciting plans that you have with us today. So thank you again, very much, and we look forward to getting many more insights and much more research and leadership from you in the decade ahead.

Emilio Ferrara: Thank you very much. It was my privilege.

Episode 16 Transcript

Aleks Krotoski: I think that is the thing that has surprised me the most about where the web is now. The requirement, the necessity for people to present as their offline selves, whether that’s for commercial purposes, or for social and psychological purposes. The great playground that we had of identity, the idea of being shielded from full identity revelation that we experienced, even as late as 2009. You know, we don’t have that as much anymore. We aren’t able to play with our identities as much as we were anymore. And I think that that has very interesting consequences for not just how we study web science, but also for the actual experience of the people who are living in this digital world.

Noshir Contractor: Today, our guest is Dr. Aleks Krotoski. Earlier, you heard her talk about the web’s impact on people’s offline and online identities. She’s an award-winning international broadcaster, author and academic, and she studies and writes about technology and interactivity. In 2009, she earned a Ph.D. from the University of Surrey, with her thesis focusing on information flow and the spread of ideas across digital spaces. Her book, “Untangling the Web: What the Internet is Doing to You,” based on her hit columns in the Guardian and Observer, was published in 2012. Since then, she’s continued to break ground in academia and journalism, and she’s currently a Visiting Fellow in the Media and Communications Department at the London School of Economics and Political Science and a Research Associate at the Oxford Internet Institute. Welcome, Aleks.

Aleks Krotoski: Hi, thank you Nosh, it’s wonderful to see you.

Noshir Contractor: It’s really good to see you and hear from you as well today. It’s been a long time since we spoke, and when I first met with you, you were working on your dissertation research, which I believe was one of the first efforts at doing a social science research project, which then we came to know as web science research. Tell us about your dissertation and your research and what got you interested in it.

Aleks Krotoski: You’re sending me down memory lane. Let me go back to what got me interested in it. So way back in the day, I mean, this is 150 years ago, I was presenting a television program in the UK about computer games. And over time, I became one of the assistant producers and became sort of instrumental in identifying the things that we should look at and the topics that we should cover and who went into these spaces. One of my co-presenters, I assigned her to review the game Asheron’s Call; this is the era that we’re talking about. And I thought she was gonna come back and tell me about all of these things, these guys who sit around in their parents’ dark basements, being really geeky, but she came back talking to me about the most extraordinary social phenomena I had come across to date.

As a social psychologist, I had been interested in looking at how societal interactions and group dynamics were functioning. But when I sent her into the space, I didn’t realize that I was going to hear back from her about justice systems, I didn’t realize I was gonna hear back from her about identity and identity play and identity development, I didn’t realize that all of these dynamics were not only present in these spaces, but also, they were effectively recreating the systems that we already had offline. That blew my mind.

These were spaces which, in my mind, were completely separate from the physical environment in which we lived, a place where we would be able to reinvent ourselves entirely, completely come up with new systems. And yet, here was evidence again and again. I started to recognize that we weren’t reinventing anything, we weren’t coming up with brand new systems, we were simply bringing our existing ideas about who we are, who society is, into the online space, and I really wanted to understand and explore that. So when I was looking initially at the work that I wanted to do, coming from a social psychology background, I was interested in patterns of communication.

Now at that time, network science was really coming into its own and there were two distinct, I would argue, sort of brands of network science: there was the more mathematical science, and then you had the more sociological elements. And I was like, well, here we have a technology that actually describes those connections. And then you can ask the people about those connections. And you can track and trace and follow networks of information to see to what degree the online world, and those systems and those processes that we already knew offline, reflected, or were different from, the offline processes. So that was the nugget; it all came back from Asheron’s Call and me sending somebody into the virtual world.

Noshir Contractor: There was a definite assumption that the offline and the online were very separate worlds. And that what we might find in the online world would be very different, perhaps, than what happens in the offline world. And you were looking in particular at a particular phenomenon in these online games. Do you want to tell us a little bit more?

Aleks Krotoski: I was looking at influence. And I was looking at the adoption of an innovation. I was using the virtual world Second Life as my territory. And I remember, every time I logged in, I would write down the number of people who had accounts and the number of people who were active. The numbers that I was writing down were like 2500 people have accounts, which eventually turned into something like 15, 16 million accounts. Like I watched the explosion of this virtual world, just a phenomenal network, a profoundly enormous network.

And at that time, Voice Over Internet Protocol (VOIP) was a system that was being introduced into Second Life. The developers themselves thought that this would be a very natural technology that people would adopt, because it would allow for business transactions, it would allow for interpersonal transactions, whatever transactions went on. But what they found is that people were not adopting it. And I was curious as to why. I wanted to know why this was stalling. So I mapped a subset of 47,000 accounts, got the reciprocal relationships, and then really dug into what those meant to the people who were within that network. What we found is that online, there was a difference in the adoption of innovations.

It came back to a very, very interesting, but a very well known phenomenon offline, which is about who you believe is credible, who you believe is trustworthy, who you believe is like yourself, and who you believe is a very prototypical member of your interpersonal network. Now, the difference between online and off is that usually those networks in face-to-face experiences, they’re quite rich.

As we know through web science, the nature of interpersonal interaction offline and online is different because of the richness or the leanness of the medium, just the amount of stuff that we can read, without having to literally read the information. What I found in this research was that you had the initial adopters. And then it reached a point at which people were either gender playing, or they were age playing, or they were playing identities that were not identical to, or did not overlap with, their offline selves. So if somebody was presenting as female, they didn’t necessarily want to do VOIP because people didn’t realize that, in fact, offline, they were male, and their voice would give them away.

Noshir Contractor: And so what your research points out is that sometimes there is value and merit in not having all dimensions of our appearance and of ourselves be presented in a web environment, and that sometimes there is freedom in being able to conceal certain facets of your character in the online space.

Aleks Krotoski: 100%. That research came out, that was 2009, and then subsequently, in the decade since, I wrote the book that was sort of, in part, based upon that research, and then also on other research that I’ve done journalistically since. I think that is the thing that has surprised me the most about where the web is now. And the requirement, the necessity for people to present as their offline selves, whether that’s for commercial purposes, or for social and psychological purposes, the great playground that we had of identity, the idea of being shielded from full identity revelation that we experienced, even as late as 2009. We don’t have that as much anymore, we aren’t able to play with our identities as much as we were anymore. And I think that that has very interesting consequences for not just how we study web science, but also for the actual experience of the people who are living in this digital world.

Noshir Contractor: I think you’re right, there has been so much more emphasis on making sure that we authenticate people in various contexts on the web, and that predilection for authenticating people has come at the price of not enabling and empowering people to have alternate presences on the web, as they did back in 2009.

Aleks Krotoski: And indeed, it’s not just the presences on the web. This is something I find very important that I do feel that we’ve lost because we do live so much of our lives online. The internet, particularly over the last year has finally in many ways become mundane, for many people. 

Our existences are fixed, right, it’s as if we are living our entire lives right in the past to the present in the now. But the context of that information feels like it has been lost. 

I remember Eric Schmidt many years ago, when he was the CEO of Google, said that he wanted to create a search where stuff from the past would just disappear. And the reason for that is because, you know, I am very different from who I was when I was a teenager. Right? And I’m even different from who I was 10 years ago when I was writing about this. But you pull me up online, and probably one of the first things that still comes up is a TV show that I presented, right, the thing that I was talking about earlier, the TV show that I presented between 1999 and 2002. Sort of having to explain to random people that I am not that person that I was 20 plus years ago is exhausting. And it also means that people coming up in this space are not able to naturally reinvent themselves or have spaces in which you can discard that old self and move into another space, and everybody mutually accepts that. This idea of being unable to psychosocially develop, and to discard the self, and sort of to always have to have the consequence. It’s ironic, in web science, we used to talk all the time about how the online space had no consequence. And now, the consequence is forever there.

Noshir Contractor: One of the areas where Europe has perhaps been a little more advanced than the US is in efforts to establish the right to be forgotten, which has been brought up in the EU. And I think it speaks to some of the concerns that you’re just expressing there.

Aleks Krotoski: Often the right to be forgotten is, you know, it’s granted not on the basis of some kind of embarrassment that happened a few years previously, but usually to do with some kind of, you know, a bankruptcy or some kind of thing that you have served your time for.

I mean, I’m talking about like, you know that embarrassing thing that you did in front of your Aunt Martha back when you were four and every time you see Aunt Martha she reminds you of it, you know? Like, well, now I’m 44, can you please stop talking about this? Like, the internet is your Aunt Martha. And you’re not sort of allowed to move forward. I’m curious to what degree this has an impact on people’s development of self and their feeling of freedom to reinvent. I don’t know if anybody’s been doing research on this. It has been some time since I’ve thought about that.

Noshir Contractor: Yeah, well, speaking of research, you are, Aleks, one of the best examples of somebody who has straddled the academic and the sort of public space within web science. And you’ve had fellowships at the University of Oxford and at the London School of Economics. And at the same time, you made reference to those columns that you wrote in The Guardian, which were aptly titled Untangling the Web, the very namesake of this podcast series. And then the book that came out in 2013, with the same title.

And in that book, you unpack a few dimensions of untangling: you talk about untangling me, untangling us, untangling society, and then finally untangling the future. Tell us a little bit about why you called the book and column Untangling the Web. And which of these untangles have surprised you the most now, all these years, almost 10 years, since the book first came out?

Aleks Krotoski: Well, the reason I wanted to call it Untangling the Web was somebody suggested it, and I was like, that’s really clever. It’s a great description, because we are, of course, all wrapped up in this space, again, even more so now, over the last year. As those of us who have been studying web science for a long time and have been living it, it’s sort of like, oh, welcome, everybody, we’ve been waiting for you to come to the party. That in and of itself has been so interesting, to witness the degree to which all of our research, all of our findings, were actually relevant and valid in a space in which the entire world, if they have been lucky enough to have digital technology, you know, has sort of graduated to the space. But I have always been of the opinion that we are always entangled in whatever technologies are within our lives, whether it’s television, whether it’s the pen, even electronic light. And it’s exciting. All of these innovations and inventions have had an enormous impact on our lives. And we have developed alongside them. So there was that kind of seeking to untangle ourselves from these spaces.

But another thing that I really wanted to get across, my sort of main aim from this book was, I wanted to untangle what people’s expectations were of this technology, as something that was other. This is the thing that I think drives me more crazy than anything else, is the magical thinking around technology, because it devolves our responsibility as human beings for the decisions that we make, for the outcomes that happen, away from us to the technology. This book sought to look at the psychological research of each of these categories and the subcategories within them. Look at the psychological research, the findings, before the web, right? Things that we expected about how we thought about celebrity, how we thought about love, how we thought about death, how we performed these things, right, looked at that, and then looked at the meaning of those things after the web, to the degree that we had done any research in that space, as sort of before and after. And to this day, I’m pretty adamant about this, what I found is that almost nothing is different. Right? The idea, the meaning of privacy is exactly the same before and after. We still want privacy. It’s just we’re performing it in a different way.

But going back to your question about what is the thing that I think, you know, in some ways has changed, or has surprised me, in that time, I think part of it is the identity piece, is the fact that we are not as free to reinvent ourselves, or to develop our identities in this space.

Noshir Contractor: One of the things that you have lamented in your writings is that a lot of the research that you just described, and that you’ve been translating in this book, a lot of it is hidden behind the walls of the ivory tower. Why do you think that is Aleks? And do you think it has gotten better or worse in the past decade?

Aleks Krotoski: It’s a wonderful question. I just simply think it’s just the nature of the ivory tower. I think in the last decade, it has become profoundly better. And that is because of initiatives like web science. That’s because of initiatives like open data. That’s because of initiatives of people like Sci-Hub, getting that information out to the public so that the public can read it and can be informed, right? I remember there was sort of movements of people to release their content online ahead of time, as it was being developed, but I don’t think we really started to see that truly, like a sort of show your work kind of thing, until the mid-2010s. I’m grateful for that. Why should the academy have the stranglehold on this information? Because these are things that absolutely profoundly affect people’s lives, everyday lives. It just requires critical thinking skills, and an ability to be a critical consumer of content.

Noshir Contractor: One of the things that you wrote about in the columns, and in the book was the evolution of the web itself, the growing pains that it went through, the life stages of the web that you describe.  If you could summarize some of those ideas, but also then project out in terms of, what do you see as sort of the next steps in the evolution of the web?

Aleks Krotoski: Great question. And I’m going to answer it, not with reference to the book, but more with reference to what I have seen and some of the things that I have seen over the last year, now that we are kind of feeling the weight of the world in the web. One of the greatest and most profound moments of the web’s history, social history, was the eternal September in September 1993, when AOL opened. And it was the first time that people who were new, or newbies, arrived and outnumbered the number of people who had already populated the web, thus irrevocably shifting the culture, because suddenly, the majority was not interested in what the old guard had to say. The majority was simply forging ahead and doing its own thing and creating its own norms.

Well, I think we are about to witness a really interesting moment where people like us who’ve been studying and diligent and living it, and all that kind of thing, the web science community is either going to be embraced by or embrace. And I think that’s going to be kind of an interesting tactical way to do it. I don’t know how that’s going to happen. You know, the hundreds of millions of people who thought that the web was an interesting place to visit, but didn’t really ever imagine living there. Now that this enormous population has opinions, because they’ve lived it for a year, I think that’s when we’re going to start to see some really interesting innovations that are not going to come out of the small pockets around the world that have historically been the places where innovations and technology come from, but more people are going to become empowered, because they see the ways that technology does not fit them. And they are able to define how it can fit them. So I think that’s what we’re going to see in terms of the future.

Noshir Contractor: Listening to you, I’m reminded of the notion that there were some people who were there initially, who might have been the so-called digital natives, and that the majority of people were tourists who would visit from time to time and be charmed. But many of those tourists are now digital immigrants that have come to set foot for a long time here. And that’s the big change that you see and your sense is that they are going to change the web as much as the web might change what they’re doing, or perhaps change the web even more.

Aleks Krotoski: I think they’re gonna change the web even more. Because when you have a mob who comes on and has opinions, right, I watched it, it was so interesting, people kind of walked blindly into this space where they were like, I know the room, but I’m not really sure where to sit.

We have evolved norms that are perhaps different from the norms that existed before, perhaps completely uninformed by the norms that were before because that information wasn’t widely available, or even of interest to the masses, who suddenly had to go online and suddenly had to perform and be and do what it is that we have been doing as web scientists for decades and decades. 

So I think that there is going to be an enormous reinvention. I profoundly hope that one of the outcomes is that people will stop seeing the web as something that is magical, and something that is other, and something that does stuff to me or you or your dad or your mom or your kid or your dog, whomever, and actually is a tool and a technology that we operate in as much as the electrical system, you know, the water system and all of those other things that we operate within a society and that we use for our own purposes.

Noshir Contractor: That’s a fascinating vision. Aleks, thank you, again, so much for joining us today on this podcast. As I’ve said, you are one of the best exponents and champions of web science, both as a research scholar, as well as a public intellectual in the space. And we thank you for all your contributions. And we wish you the best and look forward to seeing your continuing insights evolve in the area of untangling the web. Thank you.

Aleks Krotoski: Thank you Nosh, I feel that your listeners are now witnessing my blush. They’re feeling it through their ears. Thank you so much. What a treat.

Episode 15 Transcript

Munmun De Choudhury: Mental health is one part of medical sciences where we have not seen as much progress.…And that’s where the research that I do finds its motivation. Can we find other ways of assessment that can improve the status quo in the way we both diagnose people with mental health risks, and also the way we treat them?

Noshir Contractor: Today, our guest is Munmun De Choudhury, who is a professor of interactive computing at the Georgia Institute of Technology, where she leads the Social Dynamics and Wellbeing Lab. You just heard her talking about what motivates her innovative research in Web Science, which uses social media in order to understand and improve mental health. She adopts a highly interdisciplinary approach, combining social computing, machine learning and natural language analysis with insights and theories from the social, behavioral and health sciences. She has been recognized with the 2021 ACM-W, or the Association for Computing Machinery’s Women, Rising Star Award, the 2019 Complex Systems Society Junior Scientific Award, and over a dozen best paper and honorable mention awards from the ACM and the Association for the Advancement of Artificial Intelligence. Her work has also received extensive coverage in popular press including the New York Times, NPR and the BBC. 

Welcome, Munmun. 

Munmun De Choudhury: Thank you very much for having me here.

Noshir Contractor:  This is a pleasure to get a chance to talk with you. Your work focuses on how web science and the web more generally can help us to detect mental well-being issues, to mitigate those mental well-being issues, as well as to facilitate the treatment of these issues. I’m curious what got you interested in looking and applying these computational approaches to study wellbeing?

Munmun De Choudhury: I loved math and science. But I also loved all of the other coursework that I did as a kid in social science and social studies and humanities. Until I was late in my college years, I didn’t know if there could be a possible way to connect and bridge the two, like, how do you do STEM work that is connected to people in some way or the other. But thankfully, you know, I found myself around people who have been thinking about intersections of different disciplines for many years. And I think that reignited my passion to connect what I do as a computer scientist with something about people.

I think the work that I do right now, that kind of started about a decade back when I joined Microsoft Research as a postdoc. It was also around the time when I had lost my father to cancer and that was a moment of introspection for me in my life about what does research mean for me? What is it that I can do? And how can I kind of take these ideas about connecting computer science with social science in a direction that would help me find meaning in that personal loss? That’s how I started to connect my work with the health field, with the wellbeing field. And over a course of time, I found my home in mental health, broadly speaking.

Noshir Contractor: It’s always curious how certain personal events in one’s life can explain where we pivot in terms of our professional goals and aspirations. One of the things, obviously, that has motivated your work is the very high prevalence of mental health issues. The National Institute of Mental Health has identified that one in four adults, or about 61 million Americans, report experiences of mental health challenges in a given year. What do you think about the ways in which we are currently handling these issues, and how the web and social media approaches that you’re taking might be able to address some of the obstacles we face?

Munmun De Choudhury: So you know, the last 100 years or so have been incredible for medical science. We have made a lot of progress when it comes to illnesses such as infectious diseases. I mean, we are living through a pandemic right now, and if you just look at the pace of progress that we have made around it, it’s incredible. But mental health is one part of medical sciences where we have not seen as much progress. The methods that we currently use are pretty much still the methods that were prominent about the time of the First World War, which is when some of the earliest recognition was given to mental illness as illness.

We saw some developments in the 60s and 70s, with the invention of antidepressants and other drugs. But in terms of assessment and diagnosis, we are kind of still about 100 years old: a primary paradigm is self-reports from individuals. And unlike other illnesses, where we have objective tests for diagnosis, or to treat people across the course of their journeys, we don’t have it for mental health. And that’s where the research that I do finds its motivation. Can we find other ways of assessment that can improve the status quo in the way we both diagnose people with mental health risks, and also the way we treat them?

Noshir Contractor: And then one of the things that has also I think, contributed to some of the challenges is a general stigmatization of mental health issues, at least until relatively recently in society. And that brings me to a bit of a conundrum. If people are not in general willing to talk about these issues in public because of fear of being stigmatized, how does looking at social media help?

Munmun De Choudhury: So the beauty of social media is that we can use it the way we want to. One of the interesting developments that we have seen, as social media platforms have become a part of our lives more and more, is that people are finding people with the same lived experience, who are probably going to understand the struggles that they’re going through, who are probably not going to be judgmental of the experiences that they have faced in life. And hopefully, it would be less stigmatizing.

So as much as there is the concern that social media platforms are performative, right, at the same time, we do see other uses where people are being candid. And this provides a window of opportunity to look at what struggles they’re going through in terms of their mental health.

Noshir Contractor: And I imagine that, with some irony, while you might be less inclined to talk about some of these issues with your close friends, you might actually be more comfortable talking to strangers.

Munmun De Choudhury: Sometimes we don’t feel comfortable talking about our deepest struggles with people we know in the offline world, because they might be our coworkers, they might be our bosses. And we don’t want to disclose something that we feel we could be judged on. 

Noshir Contractor: A lot of your work has also looked at how you could look at the passively shared data on social media to proactively detect one’s risk of mental wellbeing challenges. What you are using as a detection strategy has to be somewhat more nuanced than just literally filtering social media postings for those who say they are depressed. Tell us a little bit more about how you go about getting that kind of information about individuals. 

Munmun De Choudhury: You’re absolutely right that we are looking at more subtle signals, nuances in the writings of people: what type of words they’re using. So I’ll give you an example. When we use a lot of first person pronouns, such as I, and me and myself, literature in psychology says that it shows an inward focus in terms of our attention, I’m talking mostly about myself. Sometimes, experiences of mental health can be detached from the external surroundings, from the social contexts that people live in. And that can manifest in this inward focused attention.

But on the other hand, when we use words, such as we and us, it shows that we find ourselves as part of a larger collective, or when we use second person pronouns, it shows that we are interacting with another person. And these are really valuable cues when it comes to somebody’s mental health. 
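[Editor’s note: The pronoun cues described above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not the method used in Munmun De Choudhury’s research: the word lists, the function name pronoun_rates, and the simple letters-only tokenization are all hypothetical choices made for this example.]

```python
import re
from collections import Counter

# Hypothetical word lists chosen for illustration; a real study would
# likely rely on a validated psycholinguistic lexicon instead.
PRONOUN_GROUPS = {
    "first_singular": {"i", "me", "my", "mine", "myself"},
    "first_plural": {"we", "us", "our", "ours", "ourselves"},
    "second_person": {"you", "your", "yours", "yourself", "yourselves"},
}

def pronoun_rates(text):
    """Return the share of tokens in each pronoun group (0.0 for empty text)."""
    # Crude tokenization: runs of letters only, so "I'm" becomes "i" and "m".
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for token in tokens:
        for group, words in PRONOUN_GROUPS.items():
            if token in words:
                counts[group] += 1
    total = len(tokens)
    return {g: (counts[g] / total if total else 0.0) for g in PRONOUN_GROUPS}

if __name__ == "__main__":
    print(pronoun_rates("I keep telling myself that I am fine"))
    print(pronoun_rates("We got through it because you were there for us"))
```

[A higher first_singular share would correspond to the inward focus mentioned in the conversation, while first_plural and second_person shares relate to the collective and interactive cues. On their own, such rates are weak signals and say nothing clinical about any individual.]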

We also find that social interactions are very, very valuable signals. Am I having a lot of interactions with other people? Am I getting the support that I think I should be getting? So these kinds of signals that are less consciously regulated by people, those are the types of signals that we look for in our work. 

Noshir Contractor: Now, the signals that you’ve talked about in my mind fall into two categories. One is looking at the content and doing a sentiment analysis if you may, or parsing the words to decide what kinds of pronouns people are using. But the other sort of signals that you refer to, are things like just the amount of activity you have, the amount of friends or the amount of responses that either you give to others or others give to what you’re doing.

The latter gets sometimes referred to as metadata, which is not looking at the content, but data about the interactions. Do you see any differences in the utility of and the efficacy of looking at the words versus the data about the data? 

Munmun De Choudhury: Yeah, so that varies by the platform. So for instance, if you’re looking at Twitter, normally, the content words carry a lot more signal. And that might be because people are relatively more candid on Twitter, compared to, let’s say, a platform like Facebook. But on Facebook, we found that some of these metadata, or some of these social interaction attributes or these network aspects, are more valuable, because for a lot of people, their presence on Facebook is also closely tied with their presence in the offline world.

Noshir Contractor: How do you reconcile that the signal that you get about a person from any one platform may be incomplete, inadequate? And are there ways of being able to cut across different platforms to be able to get a richer picture of someone’s well being? 

Munmun De Choudhury: So I would answer that in two ways. The first part is whether any one platform or a couple of platforms, is that sufficient for us to get a more comprehensive understanding of their mental health?

The reality is that right now, if you look at the state of the art in mental health diagnosis, or treatment, none of those signals are being factored in. So now, even signal from one or two platforms can be additional knowledge to the person themselves, to their caregivers, to their family, or clinicians, whoever might be able to take actionable steps and use that information in helping the person. It’s some data, it’s not all data, but I think it’s still valuable data.

Still, there is the question of, we have our identities that are fragmented across different platforms. And that is more and more the case. So a lot of the work that I have been doing has been to go across platforms and think of these data in terms of their multimodal natures. So I absolutely agree with you that as much as information from a couple of platforms is valuable, nevertheless, there is still value to considering the fragmented nature of our identities.

Noshir Contractor: I know that this is initial work that you’re doing, but are there any examples of insights that were different or modified because you were looking at multiple digital service providers?

Munmun De Choudhury: What we have definitely seen is not maybe as much of a contrast, but having data from one platform giving us context about data that we see on another platform. Some of our work recently was looking at physical activity data that is collected through smartphone use. And then we also had, for these individuals, their Facebook data. So lining those two up was really insightful for us. So when we saw that the person’s, let’s say, heart rate increased at a certain point in time, we can go back and see what might have been going on on their Facebook, maybe they reported a major life event, they reported something difficult that they were going through. So I think those are definitely some of the strengths of an approach that cuts across different sources.

Noshir Contractor: Technically, how difficult is it today to be able to connect what someone said on Facebook with some of what they report, say, on a fitness device?

Munmun De Choudhury: From a technical perspective, it’s quite difficult, because you need appropriate infrastructures that can collect that data. Social media data is longitudinal, it is fairly sparse, it is largely text. Fitness data, we are talking about, you know, a very high sampling rate, dense data. And again, I mean, these are largely time series data, for instance, they’re not text. 

There is the question of feasibility, right, like finding enough people who are willing to share not just their data stream on one platform, but across multiple platforms, there is a question there as well.
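One way to picture the alignment problem described above, sparse posts on one side and densely sampled sensor readings on the other, is to summarize the dense series in a window around each post. The sketch below is only an illustration under stated assumptions: timestamps are plain numbers, samples are already sorted, and the function name `window_means` is hypothetical rather than part of any tool mentioned in the conversation.

```python
from bisect import bisect_left, bisect_right
from statistics import mean


def window_means(sample_times, sample_values, event_times, half_window):
    """Average the dense samples within +/- half_window of each sparse event.

    sample_times must be sorted ascending and parallel to sample_values.
    Returns None for an event that has no samples inside its window.
    """
    results = []
    for t in event_times:
        lo = bisect_left(sample_times, t - half_window)   # first sample in window
        hi = bisect_right(sample_times, t + half_window)  # one past the last sample
        window = sample_values[lo:hi]
        results.append(mean(window) if window else None)
    return results
```

For example, `event_times` could hold the times of a person's posts and `sample_values` their heart-rate readings, so each post is paired with the average reading around it.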

Noshir Contractor: In terms of ethical issues, are you able to get information about individuals if they consent to sharing that information across platforms? Or do you have to deal with some of the providers themselves?

Munmun De Choudhury: It is a question that is getting harder by the day. There is some good reason why it is getting harder, because I think discussions of privacy and ethics are finally getting the momentum that they deserve, in the field of web science, but at the same time, there are these questions of multiple stakeholders, who have an interest in a data stream and have different policies, have different value systems around protecting or sharing the data.

I think at the center of it are the creators of the data, they’re the people who believe they would benefit from, you know, this research. And so we have been doing a lot of work with mental health patients, where they have been voluntarily sharing that data with us. And I feel that is probably the path forward for this kind of research. 

Noshir Contractor: You mentioned that a lot of the data on the social media platforms is largely text, for example. But I’m also thinking of some of the more recent platforms that have got a lot of activity like TikTok, like Clubhouse. TikTok is, you know, video based for the most part and then Clubhouse is audio based for the most part. Have you had much success in being able to parse through video and audio as a way of detecting wellbeing?

Munmun De Choudhury: That’s I think a direction where there could be a lot of research that happens going forward. We have done work in the image space, particularly on Instagram, which is an image heavy platform, we have done some work on Tumblr, which is also kind of multimodal with images and videos and text. The next frontier are platforms like TikTok, like Snapchat, where a lot of the young people are going and hanging out.

Noshir Contractor: A lot of young people now today rely on their social media presence for their own self esteem and for their mental wellbeing, and there have been some efforts recently by some of the platforms to actually not publicize the number of likes a particular post gets or the number of shares a particular post gets, so that people are not overly focused on building their self esteem on the basis of that, do you think that those approaches will work?

Munmun De Choudhury: I think there is definitely something that needs to happen there. Social comparison theory has often been put as sort of the causes of the negative impacts of these platforms on people’s mental health. I think the jury is still out whether these platforms are good or bad for people.

At least the current understanding in the scholarly community is that it just depends upon how somebody uses the platform. Whether it’s for good or for bad, I think the platforms do have a responsibility to consider that their platforms are being used by people, sometimes for improving mental health, sometimes not for improving mental health, and how can we change the design of the platforms or the features.

Noshir Contractor: Have you looked at ways in which the research that you’re doing and the tools that you’re developing might be used not just by the person who is in need of help with mental wellbeing issues, but also tools that might be used by family members, or a clinician. Can you talk a little about how the work you’re doing, might have audiences beyond the person, him or herself? 

Munmun De Choudhury: My view of mental health is that it is not a solitary experience. It is an experience that is shaped and it impacts people around you. And therefore, if you’re thinking about solutions that are grounded in these approaches in web science, it’s important to also think about what would that technology look like for these other people in somebody’s social ecology. 

So there are two stakeholders that we have engaged with quite extensively in the past few years. The first is, like you mentioned, it’s the mental health practitioners. We have been working with Northwell Health, they’re a big health system based out of New York State. And we have been, you know, recruiting and working with mental health patients, but also their clinicians, they are part of our research teams. We are kind of adopting a participatory approach there, in building both the algorithms that use patients’ data, but also what kind of technologies could be built on top of those algorithms that could help these clinicians in the treatment that they provide. And so the clinicians form a very important stakeholder in there who can benefit from these algorithms, because they can get a fuller understanding of what might be happening to the individual.

The other stakeholders, kind of at a very different scale, are public health stakeholders. And for the last two years, we have been working pretty closely with the Centers for Disease Control and Prevention or CDC, in helping with their public health efforts on suicide prevention. Organizations like CDC are realizing that, you know, a lot of these conversations on mental health, on suicide are happening online. That is an entire piece that is missing from their public health work.

So the work that we have been doing together is to extract meaningful information about how people are talking about suicide, what kind of stressors are being expressed by people. And that knowledge could provide evidence to organizations such as the CDC, to figure out which communities might be in need of greater help than others, how do we allocate budget to assign mental health resources? And how can we do that in a real time fashion?

Noshir Contractor: And based on your encounters working with these different stakeholders, could you comment on what you see as their receptivity? First of all, is a patient typically enthusiastic or concerned about sharing this kind of information, giving consent to share this information with physicians? Are physicians excited about using these kinds of tools? And are the CDC policy makers enthusiastic about it? What’s been your experience?

Munmun De Choudhury: In our interactions, what we have seen is clinicians are curious about how this will impact their decision making, and about questions of power imbalances in therapeutic relationships. How will that impact their own relationship with the patient? Right, that’s an important one. And also, there are questions of liability. I mean, when you have an algorithm that looks at social media data and makes an assessment of risk, what happens if it’s correct, what happens if it’s wrong? I mean, who takes the responsibility for that? So there are definitely the legal questions, the questions of infrastructural, institutional support. So, as a clinician, they might not feel comfortable to use such technology, if there is no support from their whole institution in allowing them to do that.

Actually, from the patients, the people with the lived experience, we have seen the least skepticism among all stakeholders. And I think the reason is, they see the value that this can bring maybe directly to their own lives, or maybe the lives of other people like them.

The question of stakeholders like CDC is a very interesting one. I have been pleasantly surprised how open-minded they have been to technology. In my interactions with the researchers at CDC, I think there’s a great deal of interest in taking some of these algorithms that glean insights from the web, and somehow making them a part of their public health efforts.

Noshir Contractor: Well, we spend a lot of time talking about social media and websites that are there to be able to help individuals who are having these challenges. But there is also an undercurrent, a set of websites, that are actually there to facilitate people engaging in behavior, say eating disorders like anorexia and bulimia. What are your thoughts about those sites?

Munmun De Choudhury: We had so many aspirations from the web, in the 90s, about how it’s going to be liberating, and it could democratize our freedoms, in many ways that have been lacking until that point. And a lot of that has been true. But at the same time, it will be foolish for us to not recognize all of the things that are terrible on these platforms and about these technologies. And the example that you cited is a great one in how these platforms, while they can be used for good purposes, they can also be used for harmful reasons.

And we see health misinformation is a huge problem. And that we see in the context of mental health, we see in the context of substance misuse, there is a lot of misinformation that goes on around. That is an aspect we desperately need to attend to, when it comes to health more broadly, and also more particularly mental and psychological wellbeing. 

Noshir Contractor: As we begin to wrap up here, Munmun, can you talk a little bit about how the general work that you’re doing on wellbeing applies also in the context of workplace experiences, and of course, including now remote work, as well as perhaps hybrid work moving into the future?

Munmun De Choudhury: I mean, this was true even before the pandemic, that personal wellbeing and workplace wellbeing are deeply intertwined. But I think this blurring has only been intensified. What constitutes work, what constitutes not-work, those lines, we are not able to manage them very well. If we think about the future of work, there are also a lot of questions in that space. What does it mean to be able to understand workplace wellbeing now, and what is the role of technology?

Noshir Contractor: The way we’ve been working in the pandemic, a lot of our work even within the organization is using what is called enterprise social media, things like Microsoft Teams, Slack, Zoom and so on. And that of course means that the same kind of information that you’ve been studying in general social media platforms is now amenable as data to help detect issues within the workplace itself. Now, you have the tools and the data potentially to be able to look at interactions that are happening within the workplace.

Munmun De Choudhury: Absolutely, and workplace harassment needs more attention. I think there is tremendous opportunity to look at some of these workplace behaviors, but at the same time, how far should we go, so that it’s still justified and at one point does it become like, “Big Brother,” right? Remote work opens up these possibilities of using these technologies to both get an understanding of our struggles and difficulties, and at the same time, it can be deeply compromising to one’s privacy.

Noshir Contractor: Absolutely. Again, I want to thank you, Munmun, for taking time to talk about this really exciting area of research that you’ve been at the frontier of pushing, in terms of seeing how web science can help us with the general area of wellbeing, and mental wellbeing in particular. Your approaches and techniques are truly interdisciplinary, and the results and insights you’ve shared with us today, and the concerns that you’ve shared with us, are very reflective of the eclectic approaches you use, in terms of theories and models from a variety of social science and computer science disciplines. So thank you again for taking time to talk with us, Munmun. 

Munmun De Choudhury: Thank you so much for having me, and I enjoyed all the questions and conversations.

Episode 14 Transcript

Robert Ackland: Hyperlinks are connecting pages together, and allowing people to, as they surf the web, find new information. This, for me, has always been the thing that I’ve been most interested in, because there is a social science of why hyperlinks are created. And what does it mean for a website to create a hyperlink to another website? It’s used in order to guide people’s attention, shape people’s attention. And so the types of actors that I studied have been political parties, social movements, organizations, activists, and they are all making choices about who to hyperlink to, and why. And these choices have measurable impacts on shaping the attention of other people.

Noshir Contractor: My guest today is Professor Robert Ackland from the School of Sociology at the Australian National University in Canberra. You just heard him talk about his work with hyperlinks. Rob works at the intersection of network science and web science, to study networks on the Web. Under a 2005 special initiative of the Australian Research Council, he established the Virtual Observatory for the Study of Online Networks, VOSON for short. His research has been published in journals such as Social Networks, Journal of Social Structure, Computational Economics and Social Science Computer Review. And his book, Web Social Science: Concepts, Data and Tools for Social Scientists in the Digital Age was published in 2013.

Robert Ackland: Great to be here, Nosh.

Noshir Contractor: Welcome, Rob. Thanks again for joining us from Down Under. I want to start by asking you what got you interested in web science, coming as you did from an economics background, initially interested in issues that were related to economic development?

Robert Ackland: As you mentioned, Nosh, all my training is actually in economics. In my first academic job after my PhD, in 2001, I was working in an interdisciplinary research center, and I started working with a political scientist, Rachel Gibson, who’s now at the University of Manchester. And Rachel and colleagues were working on studying how political parties were using the web in order to undertake various political functions, such as raising awareness about issues, engaging with potential voters, raising revenue. A big aspect of that work revolved around the hyperlink: the idea that if you have more hyperlinks pointing to your website, that can bring more eyeballs to your content, and therefore allow you to raise more awareness about issues that could concern you.

I saw an opportunity there to use web crawlers to collect large scale hyperlink network data sets, and then to start studying these networks. I produced, for example, networks of political parties, looking at mainstream versus major parties, and conservative versus liberal parties. So I started to look at the structure of hyperlinks of political parties.

I realized that what I was doing was, in fact, social network analysis applied to the web. And a big part of my career has been looking at how can methods and approaches from social network analysis be adapted to study online networks?

Noshir Contractor: You use the word hyperlink a few times. And I know that that word just rolls off your tongue with a lot of ease. But for most people, when we think of social networks, we think of potentially, links between people. So you have a friend on a social media platform or a follower on a social media platform. But when you’re talking about hyperlinks, these are links not between people, but between websites. And then you use these to crawl the web. Can you unpack that a little bit more?

Robert Ackland: My interest in the web has always been the fact that it’s a socially generated network of resources, the resources are web pages, and also, the other media files. The piece of engineering that connects these resources together is the hyperlink. Hyperlinks do not get formed randomly. 

Hyperlinks are connecting pages together, and allowing people to, as they surf the web, find new information. This, for me, has always been the thing that I’ve been most interested in, because there is a social science of why hyperlinks are created. And what does it mean for a website to create a hyperlink to another website? It’s used in order to guide people’s attention, shape people’s attention. And so the types of actors that I studied have been political parties, social movements, organizations, activists, and they are all making choices about who to hyperlink to, and why. And these choices have measurable impacts on shaping the attention of other people.

When I started studying the web, there was not the availability of tools and techniques to allow a broad range of social scientists, particularly those with an interest in social network analysis, to easily access and collect hyperlink data, and turn these data into what I call research ready data sets, data sets that are amenable to social network analysis. And so I really designed the VOSON software to be a tool for social network analysis using hyperlink data.

The VOSON software was effectively a web crawler that allowed researchers to easily select a set of websites, and then find how those websites connected to one another through hyperlinks. 

Nosh, you made the point that today, it’s very common to think of people networking on the web, or via social media. But in the early days, before social media, web 1.0 was an era where you had to have quite a lot of resources in order to be able to put material on the web, for example, newspapers or academic institutions. The typical user of the web was a consumer of information. Web 2.0, which started with blogs, but then moved on to the social media era, became an era where it was possible to not only consume information but produce information.

And so today, it’s very easy to conceptualize this idea that people go onto social media and connect with one another. In the web 1.0 era, it was less easy to conceptualize this. But I really saw the hyperlink as the tool that allows organizations and groups to connect to one another. And I was interested in using social network analysis to study that phenomenon.
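The idea of starting from a chosen set of websites and finding how they link to one another can be sketched in a few lines. This is not the VOSON software or its API; it is a minimal illustration that assumes the pages have already been downloaded, and the names `LinkExtractor`, `hyperlink_edges`, `pages` and `seed_hosts` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def hyperlink_edges(pages, seed_hosts):
    """Return directed (source_host, target_host) edges between seed sites.

    pages maps a page URL to its already-downloaded HTML (assumed input).
    Links to sites outside the seed set, and links within a site, are ignored.
    """
    seeds = set(seed_hosts)
    edges = set()
    for url, html in pages.items():
        source = urlparse(url).netloc
        if source not in seeds:
            continue
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.hrefs:
            # Resolve relative links against the page URL before comparing hosts.
            target = urlparse(urljoin(url, href)).netloc
            if target in seeds and target != source:
                edges.add((source, target))
    return edges
```

The resulting edge set is the kind of "research ready" network that can then be handed to standard social network analysis tools.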

Noshir Contractor: So one of the things that you were pointing out is that websites are very strategic about which other websites they point to, because that’s how they represent themselves to the public, and are also very interested in which websites are pointing to them. And to the extent that we know in society that you’re judged by the friends you keep, what you’re saying, Rob, is that a website is judged by the hyperlinks it keeps.

Robert Ackland: It’s always of great interest to know, well, who is hyperlinking, to whom? It’s a measure of popularity. In an information context, it’s a measure of authority, is your website an authoritative source on a particular topic. 

It’s very important to know who is linking to you, and also, the perception of your organization is very much influenced by who you direct your hyperlinks to, and so on. It’s one of the aspects of web science that I find very interesting and compelling.
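The notion that inbound hyperlinks act as a measure of popularity or authority corresponds to what network analysts call in-degree. A minimal sketch, assuming the network is given as a list of (source, target) pairs and using the hypothetical name `degree_counts`:

```python
from collections import Counter


def degree_counts(edges):
    """Count inbound and outbound hyperlinks per site from (source, target) pairs."""
    indegree = Counter(target for _, target in edges)
    outdegree = Counter(source for source, _ in edges)
    return indegree, outdegree
```

Here `indegree` captures who is linking to a site, while `outdegree` captures how many sites it chooses to direct its own links to. Raw counts are only a first approximation of authority.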

Noshir Contractor: One of the things of course, that can happen with hyperlinks, as it does today with friend links, or follow links, is that you can create them and at some point, you could dissolve them or you can unfollow someone or unfriend someone or remove a link that you have with someone. So as you look at it from a historic point of view, is it possible to be able to go back in time and look at the archive to see when someone might have created a link from one website to another and when it might have dissolved. And what that might tell us about society?

Robert Ackland: It’s a really important aspect of research in the sense that the web is constantly changing. This is one of the reasons why governments are very concerned about preserving the web, because it’s a digital record of a country or of a society. 

From the perspective of a web scientist, I think there’s really two aspects of hyperlinks that are, in some ways, the holy grail for research. Number one: I find that when I present my hyperlink research to people, one of the first things they say is, you know, how has it changed over time? Another aspect that is very important is knowing what amount of attention is traveling through a hyperlink. It’s difficult to know exactly how many people were following that hyperlink.

Noshir Contractor: So one of the interesting and important contributions, Rob, that you have made to the study of web science, is the development of this virtual observatory for the study of online networks. When you began that effort, it was focused largely on mapping hyperlinks between websites, and since then you have evolved the entire project to also look at mapping links that happen between people or organizations that have Twitter accounts. Tell us a little bit about why you got interested in creating what I think has become a remarkable public good for anyone interested in studying web science.

Robert Ackland: I was always interested in developing tools that could be used by non-programmers. Web science brings people from a whole lot of disciplines. And the whole point of web science in my mind is studying how the web is contributing to society from a lot of different dimensions. It’s not just about the engineering, but it’s about the social, political and economic impacts of the web. As the web evolved to the social media era, I wanted to make sure the VOSON software evolved.

We started then, collecting data relating to Facebook. So an early version of the VOSON software enabled research of Facebook, and of course, API and privacy changes on behalf of the social media companies in terms of access to data mean that a tool like VOSON has to constantly be evolving as well, so VOSON’s designed to allow researchers to collect data from major social media platforms using application programming interfaces. So the current tool does enable collection of Twitter network data. This is, I believe, really important for the study of political deliberation, how that is occurring on social media. And so to the extent that the social media companies continue to provide open access to their data through APIs, then I’m very keen for the VOSON software to be a part of the web science toolkit.

Noshir Contractor: Along with your evolution of work, from hyperlinks to looking at other social media platforms, you’ve also evolved in your conceptualization of bots, where initially you could think of a bot as being a web crawler. We now know a lot about spam bots and chat bots and bots that can conduct automated high frequency trading in global financial markets. You also talk a lot about what you refer to as social bots. First of all, how do you define a social bot? And what differentiates a social bot from some of these other bots that we’ve just talked about?

Robert Ackland: So, my interest in social bots came about around the 2016 US presidential election. I think the 2016 US presidential election and also the Brexit referendum in that year really raised awareness about the potential for social media to be a vector for influence. And the influence might be coming from foreign influence operations. So, troll accounts, for example, set up by foreign governments in order to try and influence political conversations, but another area of concern related to so-called social bots. The idea of intelligent agents or bots is not new, but around the time of the 2016 US presidential election, there were concerns about how bots were being used in order to shape conversations on social media. I became very interested in how to understand how bots might be having an impact on political conversations on Twitter.

And so this really gets back to a very sort of a long standing and interesting question in social science research, and which is how we measure influence. Is it the case that they are influential? Because they’re very active? And they’re tweeting a lot? Or is it the case that they’re influential because the tweets somehow help to propagate particular information or raise prominence on particular themes or frames. 

I think the presence and impact of bots is a core issue and potential concern for web science.

Noshir Contractor: While there is a lot of research that highlights the dangers of the risks associated with social bots, can you talk a little bit about why and how you believe that social bots can actually help promote deliberative democracy in social media?

Robert Ackland: If we think about bots, in other areas of society and the economy, they’re generally designed to be useful, in the sense that they provide information that helps people make decisions in a financial market setting, for example.

The first work that I got involved in the area of social bots was with Tim Graham. We were interested in the potential for social bots to play a positive role in political deliberation online. If you think about what political deliberation involves, it’s this idea that people are engaging with one another, often with people who do not share the same views. And they are able to develop a common set of terms and understandings about a potentially divisive social issue, and potentially change minds, or at least come to a common understanding about what the problems are.

We were interested in the idea that it might be possible for social bots to be designed to have a positive impact on political deliberation, for example, by connecting groups of people who otherwise are not connected in online conversations. One of the bots that could be designed in such a setting was what we call the bridger bot, and the idea was that such a bot might have to try to meet communities in social media, who otherwise are not connected to one another, to help promote cross community dialogue. 

Another thought that we had with regards to the potential positive role of social bots was the idea that certain clusters of social media users could benefit potentially from being exposed to ideas that are different to the ones that they currently have. And so, the idea was that it was a bot that could somehow start to operate or start to be present in their conversation, participate by raising ideas that were somehow counter to what the current thinking was. However, I would like to say, this is where I think web science is really important, because it’s one thing for social scientists to conceptualize a popper bot or bridger bot, but this is an engineering and design issue. And so this is where web science plays a role in terms of connecting engineers, computer scientists and social scientists in projects that are trying to study, for example, the potential positive role of social bots.

Noshir Contractor: Speaking of positive roles of bots, you and Tim, inspired by Isaac Asimov’s three laws of robotics, postulate three principles of social bots.

Robert Ackland: So the paper on social bots that I co-authored with Tim Graham was partly inspired by our common interest in studying the web from a social network analysis perspective. But there are actually two literary inspirations for this work. The first inspiration is evident in the title of the paper, which is “Do Social Bots Dream of Popping the Filter Bubble.” So this was a reference to Philip K. Dick’s seminal novel, Do Androids Dream of Electric Sheep? And this is a novel that inspired the Blade Runner film. So we were interested in this idea of social bots as autonomous agents, with a purpose. And we were interested in the idea that the purpose of social bots could be a positive one, in the sense of making a positive contribution to deliberative democracy online.

However, this is an engineering problem. We’re social scientists. But we realized that the design of a social bot is an engineering task. And so another literary inspiration for this work was Isaac Asimov, who famously proposed the three principles of robotics. And so we drew on those principles.

And I want to emphasize that this is not an engineering paper that we’re proposing. In some ways it’s a thought piece. But the first principle of robotics, or our adaptation of Asimov’s principles, was that a social bot must do no harm to a human being. And so how might we think of a social bot creating harm to a human being? Well, by being annoying, for example, by butting into conversations where they’re not required, by creating noise in a social media conversation.

The second is that social bots must protect their own existence, except where doing that would conflict with the first one. The idea there is that a social bot has to be designed well, in the sense that it’s not annoying, it doesn’t get outed straight away as being a bot rather than a human. Because then that can lead to people on the social media platform banning it.

And then, the third principle that was adapted, again, from Asimov’s three laws of robotics, was that social bots must make a significant improvement to deliberative democracy.

Noshir Contractor: That’s brilliant. I love it. Another major contribution that you’ve made to web science is the book that you published in 2013, titled “Web Social Science: Concepts, Data and Tools for Social Scientists in the Digital Age”. Tell us a little about your thinking when you decided that you would write this book, and tell us what you’re hoping to achieve by people who would read this book.

Robert Ackland: I’ve been involved in teaching at the ANU for the last 10 years now. My teaching has been in the area of social science of the internet and online research methods. Essentially, my goal in my teaching has been to equip social science students with the concepts, and also the tools and the methodological training, to allow them to do web science, in the sense that they can work with data being generated from the web to understand the social, political, and economic impacts of the web.

My book really had two goals. Firstly, it was to introduce students and researchers to the web as a source of new data for studying social, political and economic behavior, with a heavy emphasis on social network analysis, but also other methodological approaches. The second aspect of the book was to provide an understanding of how social scientists can contribute to the future development and pathway of the web, in order to allow the web to reach its full potential or to continue to have its full potential in terms of making a positive contribution to society.

Noshir Contractor: I’d highly recommend that book to anyone who’s interested in helping us understand how we live online, and what are the consequences of that. It has been a true delight to get a chance to catch up with you, and to hear all about the ways in which you’ve been thinking about the past of web science, the present of web science, and also the future of web science. And I’m very encouraged and inspired by everything you’ve done to contribute to the web science community in terms of your own research, in terms of the platforms like VOSON, that you have helped develop, and the book that you helped write, to help shape the next generation of students. So thank you again, Rob, for joining us today.

Robert Ackland: Thank you. It’s been my pleasure to participate in Web Science and participate in this podcast. Thanks, Nosh.

Episode 13 Transcript

Jaime Teevan: When you start thinking about returning to the workplace, you can look at what we lose when we move remote, and what we gain. Let’s do the stuff that’s better remote, remote, and do the stuff that’s worse remote back in person. That suggests large group meetings we can probably keep remote first. There’s some pretty cool things about being able to share the slides, have in-meeting parallel chat, or see people’s names and know who everybody is. Like, those are actually benefits. On the other hand, meeting new people is something that you should really do face to face.

Noshir Contractor: Welcome to this episode of Untangling The Web, a podcast of the Web Science Trust. I am Noshir Contractor and I will be your host today. On this podcast we bring thought leaders to explore how the web is shaping society and how society in turn is shaping the web.

Our guest today is Jaime Teevan — you just listened to her talk about what work is better done in-person versus remotely as we prepare for the Next Normal. Dr. Teevan is chief scientist for Microsoft’s Experiences and Devices, where she’s charged with creating the future of productivity. Previously, she was the technical adviser to Microsoft CEO Satya Nadella and a principal researcher at Microsoft Research, where she led its productivity team. She developed the first personalized search algorithm used by Bing and introduced microproductivity into the office. Jaime was recognized as one of the MIT Technology Review Innovators under 35, and has received many awards, including the Anita Borg Early Career Award, the Karen Spärck Jones award, and the Special Interest Group on Information Retrieval (SIGIR for short) Test of Time award. Welcome, Jaime.

Jaime Teevan: Yeah, it’s a pleasure to be here.

Noshir Contractor: I want to start right now with the new report that you helped put together at Microsoft titled “The New Future of Work,” research from Microsoft into the pandemic’s impact on work practices. This was an excellent compilation of insights that came out of what I understand was an ongoing cross-company initiative to coordinate efforts towards understanding the impact of remote work.

Jaime Teevan: We’re in the middle of a pretty significant transition that, if we don’t come out of it better, then we’re going to come out of it worse. It’s really an opportunity ahead of us to create a new and better future of work. Prior to the pandemic, we were already in the midst of a pretty significant change in how people get things done, with a move to the cloud, and the proliferation of edge devices and real advances in artificial intelligence. But COVID took this primordial goo that was ripe for innovation, and it provided a spark. I mean, most of us moved from working from the office to working from home, literally, in a day. I’m pretty sure I have a plant back at my office that is dead. I haven’t seen it since last March.

And as a productivity company, Microsoft is really interested in understanding work practices and how people get things done. And we have a lot of sensors in place with which we can see work. So we obviously have large-scale telemetry data of how people are using our products. We have really rich customer panels that we set up with all sorts of different customers to get more direct feedback. We use survey instruments. We’re also a large company, just all on our own, you know, hundreds of thousands of employees who are working and had to shift from in-person work to remote work. So last March, when COVID hit, all of these sensors shifted to focus on remote work.

Researchers from across the company came together in what we believe is the largest research effort to happen to understand changing work practices. And the cool thing about that is all of the different non-traditional ways that people are coming together, too. We have all of these converging methods, quantitative and qualitative, all sorts of different approaches. We even have EEG studies of people’s brains, and we look at a number of different populations.

Noshir Contractor: In your report, you talk about a lot of different areas, and I want to just pick on a few of them. The first one was the impact of this sudden switch on collaboration and meetings. Can you talk a little bit about what these findings are in terms of how we change the nature of collaboration? To what extent did it broaden our networks at work or deepen our networks at work?

Jaime Teevan: Now, it’s a great question, because a lot of the insights from web science actually apply here, where we have seen really interesting evolutions of people’s networks as a result of the shift to remote work. When we moved remote, we had a lot of social capital that we had built up from interacting with people face to face, and we’ve spent the past year spending down that social capital. So when you look at the networks, and the way that people work together, what you see is that our strong ties, the people that we are close to and work with well, have stayed relatively strong, and we continue to meet one on one with our managers and our close collaborators. But our weak ties, the people we don’t know as well, those are atrophying. So collaboration trends in Microsoft Teams and Outlook show that communications with those outside of our immediate teams have diminished with the move to remote. You can see large group chats have decreased nominally by 5%, whereas one-on-one chats have increased by 87%. So we’re doing increased communication and interaction with people we know well, we’re doing decreased communication with the people we don’t know well, and that’s gonna have a lot of consequences for how work gets done moving forward.

Noshir Contractor: Absolutely. So we see that the technology is being used to deepen our networks, rather than to broaden the networks. And as you pointed out, the number of weak ties falling has consequences. Because again, we know from prior research that weak ties are very important in terms of engendering and fostering innovation and new ideas. Which is surprising though, Jaime, because in some ways, you could say that the technology now enables us to reach out more easily to people, when we are unfettered by geographic boundaries, etc. But even though technologically, it’s possible, what your research finds is that that’s not what people are doing.

Jaime Teevan: You probably remember at the start of the pandemic, like, virtual happy hours were a thing. I feel like we’ve all gotten too tired. But at the start of the pandemic, it was amazing. I was like, Ooh, I’m hanging out with my uncles and my friends on the East Coast and all these people. I was like, Ooh, I’m doing regular time with them. And it was amazing. And I don’t do it anymore. I think it just gets... it’s just work to sort of maintain that broad network.

Noshir Contractor: One of the things that you also talk about in the report is the impact not just on collaboration and meetings, but also on personal productivity and well-being, including the ways in which working from home, or living at work, take your pick, means that you’re breaking down boundaries, both of space and of time. Can you talk a little bit about what you found in that context?

Jaime Teevan: There’s a lot packed up in that question. We were using space as a technology to get work done, right? Space was delineating the start of the work day and the end of the work day. It was creating natural boundaries between home and work. It was creating opportunities for serendipitous conversations and spontaneous interaction. It was actually a useful limiting function for meetings, because you could only have as many people as could fit into a meeting room, and now everybody can join any meeting they ever want to. So there were all sorts of values in how we were using space as a technology. And that went away.

It stopped providing useful temporal boundaries for us. So we saw that people were sending a lot more messages on the weekends or after hours. I think the number of IMs that people send between 6pm and midnight went up 52%. And people who didn’t normally work on weekends saw their weekend collaboration triple. So the kind of time boundaries that we were used to went away.

It’s nice, because now I can be like, Oh, it’s lunchtime, I’m going to take a walk and hang out with my kids. Or I can say, I’m a morning person, so I like to wake up early, and I like to go to bed at eight. And that works. But it does make the coordination of work practices very challenging, because your personal decisions are never decisions that just impact you. They impact other people.

People are working from different states or different countries, and are really rethinking where they live. Our mutual colleague Brent Hecht had a really interesting point: a lot of the movement that used to happen was along latitudinal lines, because at similar latitudes you have similar environmental factors, like the same crops that grow at a certain latitude will do so elsewhere. And now what you’re actually starting to see is movement along longitudinal lines instead, because that’s when you’re in the same time zone. And we haven’t really figured out how to solve the time zone issue.

Noshir Contractor: You touched on the idea of giving you autonomy to be able to go for a walk in the middle of the day, etc. And that brings up of course, issues of those of us who have the economic resources to have a life that allows us to do that, as compared to those who might not have the opportunity.

Jaime Teevan: There’s several things embedded in that as well. So the report that we’re discussing right now focuses primarily on information workers, which represents a sizable portion of the working population in the country, but obviously not everybody. So after the pandemic hit, you saw about a third of people stay working in the workplace as essential workers. You saw about a third of folks furloughed because their physical presence was required to work and they couldn’t, and they weren’t deemed necessary to go into work. And then you saw about a third of people move from in-person work to remote work, and that primarily is the information worker population that we’re looking at. And these populations are quite different demographically as well.

A lot of the burden of having to either return to the workplace or being furloughed falls on people of color, falls on women, falls disproportionately on different people. Even when you look within the third that transitioned from face-to-face work to remote work, you see pretty significant differential impact there as well. Business leaders tend to be weathering the storm more successfully than others. Caregivers, and in particular mothers, have had real challenges, particularly with children being out of school and having to pick up child care.

Noshir Contractor: Your report does talk about some of these societal effects that are both negative and positive. Some argue it’s the K effect: some people are doing better in the situation, others are doing worse. The report notes the overrepresentation of BIPOC workers among firstline and other on-site workers. It also notes that the layoffs resulting from the work from home have disproportionately affected women, African Americans and Hispanics.

Jaime Teevan: The K effect is a good description of it. And my background is personalization; I think about how there’s individual variation across people. So there’s a piece of me that almost rolled my eyes, like, oh, there’s a lot of variation within the impact of COVID. Like, yeah, tell me something new. But when you dig into the data, it’s just an order of magnitude different than what we’re seeing anywhere else. It’s hitting in interesting ways, even in, like, the work setups.

So even when both parents are able to work, women are much more likely to have their workspace be set up in a shared space, more likely to be interrupted. There’s variation by job role as well. You’re seeing a particular challenge around well-being with managers and a particular bump in numbers of hours worked among them. New employees, anybody who’s changed roles, are having challenges as well. So you’re seeing a bunch of challenges showing up along a number of dimensions.

Noshir Contractor: Being a company that is involved in software and software engineering, your report also points out that software engineering got slightly more productive, actually, but also came with accompanying burnout.

Jaime Teevan: One of the things that I get asked about this report a lot is “Ooh, what surprised you about the findings?” And we forget how surprising it is that people were able to work remotely at all. It is true, we’re seeing developers are productive, we’re seeing information workers in general are being surprisingly productive by the standard metrics. But it’s coming at a huge cost. I think we can all feel that. There’s real hits to our well-being, and working in shared spaces, working longer hours.

People are being productive, but it has really driven a significant shift in the way that business leaders are thinking about work, to recognize that work isn’t just about the stuff we’re doing. It is about the person we bring to the task. It’s about the networks we have. You’d mentioned well-being: it’s about our ability to respond well and be thoughtful and make the connections we need and not sort of be living in our panicked mind. When you talk about taking a crisis and trying to make it positive, I think that’s one of the potential positive outcomes of this, that holistic view we’re increasingly taking towards work.

Noshir Contractor: As we begin to see our way out of this pandemic, hopefully in the near future, people are talking about moving from the new normal to the next normal, and no one expects that next normal to be remotely similar to the old normal. One of the things that I found quite interesting about the report was your efforts at trying to see which of these learnings and insights are going to stay post-COVID. People talk about the hybrid model. What are your thoughts about what that hybrid model might look like moving to the stage of post-COVID?

Jaime Teevan: It is hard to imagine. And as researchers, we like to make good, thoughtful data driven decisions, and we don’t like to get ahead of our skis. And yet, everybody right now is having to make important big long term decisions based on very little short term data. And that’s hard.  

We do have some places we can look to make this easier. Microsoft has offices in China, and China has actually opened up and we can start looking at what hybrid work looks like there and the kinds of decisions that people are making. The truth is even though we don’t really know the answers, we have a pretty good sense. And being able to make a decision from some data is better than being able to make it with none. 

So we do have some recommendations that we’re making. When you start thinking about returning to the workplace, you can look at what we lose when we move remote, and what we gain. Let’s do the stuff that’s better remote, remote, and do the stuff that’s worse remote back in person. That suggests large group meetings we can probably keep remote first. There’s some pretty cool things about being able to share the slides, have in-meeting parallel chat, or see people’s names and know who everybody is. Like, those are actually benefits. On the other hand, meeting new people is something that you should really do face to face.

Noshir Contractor: One of the studies that you have recently been involved in has been doing a large scale analysis trying to isolate the effects of working from home on collaboration activities, but removing or controlling for all the other effects of COVID-19. Can you tell us about how that led you to some results that might be counterintuitive?

Jaime Teevan: What we’ve been doing is essentially, yes, trying to partition out what is going on right now because we’re in the middle of a global health crisis and what’s going on because we’re working remotely, because it’s not the same. (Laughs). One of the ways that we do that is by looking at people who were working remotely prior to the pandemic, and so then you can see for those people how their behavior changed before and after March, as compared to other people who moved from working in-person. It’s hard to control for absolutely everything, but it actually shows that the folks who were working remotely beforehand didn’t have such a significant increase in meetings as the rest of us. A lot of different sources of evidence that we have looked at suggest there is an expertise that comes with remote work, and as we figure that out, then it becomes easier. We’ve all had this crash course in remote work now, so that as we return to the workplace, we’re going to be able to carry that over and still use it a little bit.

Noshir Contractor: I thought it was interesting that the results of this study point to the fact that working from home, after taking into account the partitioning of the COVID issues, actually resulted in less time on collaboration and more focus time.

Jaime Teevan: In general, one of the benefits of working from home is focus. People report some additional distraction from children and pets and leaf blowers. I feel like there’s a leaf blower in every meeting. (Laughs) But working from home is quite good for focused work.

Noshir Contractor: Speaking of focus, one of the things that you have also been looking at in a study is the role of multitasking. Until now, Jaime, when people said multitasking, it seemed to be a four letter word. But your research finds that multitasking in meetings has both positive and negative effects.

Jaime Teevan: First of all, it’s kind of cool that we can actually measure multitasking better than we could before. Every conversation you have is digitally mediated, everything you do is, and that provides us so much more information and so much more insight. And so you can look at exactly how often somebody emails during a meeting, and you can say, Oh, I actually know the answer now: 30% of people email during a meeting. And then you can start looking at which meetings people email in and which ones they don’t. It becomes interesting to start thinking about how you can sort of peripherally pay attention to a meeting, to jump in when it’s relevant. And think of all these long meetings that you go to, where there’s a lot of stuff you don’t care about. What if you were able to focus on it right when it was relevant?

The other thing we’re seeing a lot of, and I wouldn’t call it multitasking, sometimes we’re calling it deliberative tasking, is actually doing multiple things on a single task. So you can see during a meeting, there’s a parallel chat going on. And often the parallel chat’s quite rich and has a lot happening in it. And there’s a conversation, and maybe you’ve got the deck open, and you’re going through the deck as well. And doing all of those different things on the same task makes for a really rich, deep interaction. It’s exhausting. It’s part of what makes meetings even more exhausting, but it’s a really intense and deep way to engage on a task.

Noshir Contractor: And you’re absolutely right that the reason we are talking about so many of the insights that we are able to glean is because we are working on the web. And that’s why this is such an important area of work, in terms of web science, and being able to leverage these various forms of data, telemetry, as you call it. And one of the things that Microsoft has invested in over the last few years is what used to be called Workplace Analytics and has recently been renamed as Viva Insights. And the idea here, if I understand it correctly, is that if we are able to get all this insight about the way we work, and how we work, what if we could provide that information back to the workers and back to the organizations?

Jaime Teevan: So in recent years, we’ve really seen the value of data and behavioral data in particular, and I absolutely credit the web for that as well. The real insight with Viva Insights is actually that that same data, when you start aggregating and looking at data over time, actually allows you to understand, introspect and respond to things. And particularly during a disruption like COVID, the ability to understand what’s happening and make good decisions to be resilient to disruptions is really important.

Noshir Contractor: And at the same time, there are many who are also extremely concerned about the privacy implications of these data, who gets to see these data, what if these data are in some ways corrupted, and somebody is making decisions about you, including your job based on some kind of flawed data? So how are you and your colleagues thinking about the quality assurance issues associated with these data? 

Jaime Teevan: Those are all such important questions. And I love that you include bad science in there too, because it really highlights the importance of our job as scientists. There are all sorts of challenges with aggregating, understanding and making decisions based on data. And those challenges show up at the individual level with things like, you know, workplace surveillance and privacy concerns. They show up at the organizational level, when you start thinking about security issues, or data leakage from models that you might learn. And they show up at the societal level.

And you can see that in the conversations that are happening recently around responsible AI and ethics in computing, and just our ability to make reasonable inferences. We’re also seeing an increase in interest from countries, sort of thinking about their national interests and the data that they have within their boundaries. And so all of those are super important. And it’s a place where research really comes in to help, not just to help us figure out how to do good science and make good inferences. We’re investing a lot in thinking about research related to responsible AI, research related to privacy-preserving machine learning, homomorphic encryption, differential privacy. I think there’s a lot that we need to do here. One of the things that I do like, that is important not to forget, though, is that this allows us to make explicit what is happening, and there’s a value to that. Then you can introspect on those things and understand them and make decisions about what we think is correct and what we think isn’t correct. Like, the bare minimum is we should try to not build biases into our system. The opportunity is we can understand the biases that are there and start correcting them, and building systems that actually behave in a way that we would want things to happen in the world.

Noshir Contractor: One of the things that I think is important is that organizations like Viva Insights, they make the possible visible, and then invite the debates that have to be had as a society about the issues of privacy, and the positive and negative impacts of it. And so I see this as being the first step, to say, this is possible. Now let’s talk about how, why, and when we should be using these data and insights.

Jaime Teevan: I talked about how the web in many ways has created the current AI revolution with these feedback engagement loops, right? Where, you know, you engage with the system, data gets collected, that gets fed back into the system, and the system gets better. We’ve seen that there’s problems that come out of them as well.

But there’s an opportunity there to think about, like, how do we drive these feedback loops towards our goals and towards things that matter? So think of Viva Insights, and this opportunity organizations have to start thinking about their organizational goals. You can start thinking about the recommendations that happen in the context of an enterprise. So we talked about how weak ties are atrophying. Maybe we want people to have stronger weak ties, maybe we want Team A and Team B to be closer. And so instead of developing feedback loops that make recommendations within that context towards engagement, we can say, oh, let’s make recommendations that help Team A and Team B be closer.

Noshir Contractor: And so in closing, as you think about the scholarship that you have done in the area of web science, and that you think needs to be done, can you give us, from your point of view, what web science might have accomplished so far, and what it really needs to focus on moving forward?

Jaime Teevan: The big thing that it has accomplished is that it has allowed us to see behavior at scale, and make decisions related to that. And then it has allowed us to see the influence of the technology we build on society at scale as well, and start being able to quantify and understand that, and be thoughtful. And then that obviously creates a need for us to do that in a responsible way, in a way that is thoughtful. Another thing that I’ve found interesting about the web is how dynamic it is. The web is constantly changing and there’s such an opportunity to learn from those dynamics and grow from them. I even think of just, like, how much better you can understand a web page or piece of content if you don’t just see it right now, but you see its entire history. And I think we’re increasingly able to capture and understand the entire evolution of content in a way that’s really interesting. And then of course, that raises all sorts of challenging problems that are associated with that.

Noshir Contractor: Well, what I will say is that we as the web science community are grateful that organizations such as Microsoft Research, and Microsoft more generally, are engaging with these issues in a way that you are uniquely qualified to do, because you have access to these data, and also making those insights available to the larger scientific community.

Jaime Teevan: As a company we believe strongly that our success is other people’s success. We are a company that is designed to help other people accomplish their goals, other people get things done. And that requires a strong community, a strong academic research community and a strong business community. It’s fundamental to our mission in the world to support that.

Noshir Contractor: Jaime, thank you so much, both for your leadership in this area, your own scholarship, and also your ability to help steer this incredible report, which I recommend strongly to anyone who’s interested in learning about how the new future of work is going to be shaped. So thank you again for joining us today.

Jaime Teevan: Thank you, my pleasure.

Episode 12 Transcript

Danny Weitzner: What we realized with the web is that for better or for worse, we in fact have entered what might look like a panoptic world, a world in which maybe not everything that we do, but so much of what we do is recorded. The idea of somehow putting that genie back in the bottle, or somehow wrapping up all that data in a confidentiality framework, became obviously impractical, if not impossible, so what it pushed us to think about was a different approach to privacy, which I would actually argue is grounded in some early areas of law. But it’s an approach to privacy that emphasizes accountability.

Noshir Contractor: Welcome to this episode of Untangling the Web, a podcast of the Web Science Trust. I am Noshir Contractor and I will be your host today. On this podcast we bring in thought leaders to explore how the web is shaping society and how society in turn is shaping the web.

Our guest today is Danny Weitzner, who you just heard talking about privacy on the Web. Danny is a 3Com Founders Principal Research Scientist at MIT Computer Science and Artificial Intelligence Laboratory. That’s CSAIL for short. He’s also the founding director of the MIT Internet Policy Research Initiative. His research interests include accountable systems, privacy, cybersecurity and online freedom of expression. Danny was the U.S. Deputy Chief Technology Officer for Internet Policy in the White House under former President Obama and also led the World Wide Web Consortium’s public policy activities. He is a recipient of the International Association of Privacy Professionals Leadership Award (in 2013), the Electronic Frontier Foundation Pioneer Award (in 2016), and was named a Fellow of the National Academy of Public Administration (in 2019). Danny is a proud founding member of the Web Science Trust and will be a keynote speaker at the upcoming 2021 ACM Web Science virtual conference. Welcome, Danny.

Danny Weitzner: Thank you, Nosh. It’s great to be with you.

Noshir Contractor: Thank you so much for taking time. I want to start, of course, by remembering and going back in history. You were one of those who were there at the start of the entire web science movement. And as I just mentioned, you were a founding member of the Web Science Trust. And so take us back to how this all began.

Danny Weitzner: Thank you. And this really takes us back to 2004 and 2005, when Professor Wendy Hall, Professor Nigel Shadbolt, Professor Tim Berners-Lee and I, shortly thereafter joined by Professor Jim Hendler, got together, and frankly realized that the web didn’t quite have a place in computer science academia, and even more so didn’t quite have a place in the larger social science and humanities research communities. So the web by this time, of course, had an enormous impact on society. Much more was yet to come. But we knew, because of really both Tim’s invention and because so many people gathered around the web so quickly, that this was a world-changing technology. But what we realized was that, oddly enough, computer science didn’t think the web was all that interesting at the time, because it was actually quite simple technology, elegantly designed, of course, but not pressing the state of the art of any established field in computer science.

And at the same time, we knew that the way that the web was designed, and the way it was being adopted in societies all around the world, were creating enormous questions of privacy and cybersecurity and equity of access and the nature of democracy and the future of work and on and on and on. Coming from a computer science and a law background, we didn’t feel we had the tools to really wrestle with those questions. So the founding of web science was in some sense, a simultaneous play to both the social science community to help us understand the impact of this extraordinary invention, and the computer science community to focus its attention on how we should be designing the web and related technologies going forward in order to meet society’s most important goals.

Noshir Contractor: Let’s start by talking about what your focus has been, largely in the area of combining web science, and the computer science aspects of it, with the law, and in particular with privacy issues. When we think about technology and privacy, one of the things that has gone into the public sensibility is Michel Foucault’s notion of the panopticon, where you now have the ability to watch everyone, everywhere, all the time. But I also wanted to note that you have spent a lot of time on another concept that Michel Foucault brought up, one that doesn’t get quite as much attention. And that’s the concept of countervailance. So tell us, what are the differences between panopticon and countervailance, and how are both of these important for web science?

Danny Weitzner: Sure. You know, I think that what the web crystallized, for us, is a recognition that to a first approximation, almost every action of significance is going to end up being recorded digitally somewhere available for access, available for analysis, and available for reuse. What we realized with the web is that for better or for worse, we in fact, have entered what might look like a panoptic world, a world in which maybe not everything that we do, but so much of what we do is recorded. And it’s often recorded for good reasons, because we get some value out of it. We write messages in email, because we want to communicate, we record our fitness data because we want to monitor our health. The idea of somehow putting that genie back in the bottle, or somehow wrapping up all that data in a confidentiality framework, became obviously impractical, if not impossible. And so what it pushed us to think about was a different approach to privacy, which I would actually argue is grounded in some early areas of law. But it’s an approach to privacy that emphasizes accountability. 

This is where we come to focus on the idea of countervailance or surveillance. Because we can’t lock up all of our personal data perfectly to prevent misuse, what we instead have to do is actually surveil how that data is used, actually monitor how that data is used and by whom, and make sure that the rules that we want to live by are still respected. Our challenge now is to understand how to enforce those rules. The other side of web science has made, I think, a very important contribution to our work here as well. That’s really the social science methods that we’ve been able to deploy to understand the impact of different kinds of privacy environments on people’s behavior. One of the critical privacy harms that we have to insure against in this increasingly transparent world is the risk of creating chilling effects. The very worst thing we could do on the web, in our society, is chill people’s behavior and make them feel that if they interact too much in public, if they share their thoughts too much, that somehow there’ll be negative repercussions for that. The ultimate irony of the web, which is all about opening up information, would be if the result was that everyone went off and hid in their corners. We’ve taken this approach of trying, number one, to protect privacy through greater accountability, and number two, to really understand the nuances of how people’s behavior is affected one way or another by privacy.

Noshir Contractor: So on the one hand, where panopticon was saying pay attention to the prisoners, countervailance is saying pay attention to the prison guards.

Danny Weitzner: That is exactly right. We know that, particularly in the world today, so much of our data is held by a relatively small number of very powerful private organizations. I’d say we still have not cracked this code of providing adequate accountability, either from a technical perspective or from a legal perspective. You know, it’s one of the things that I think we didn’t anticipate even in the mid-2000s, and certainly not, I think, when the web was initially designed. The web was meant as a great decentralizing force. And, of course, what we now have, at least in some arenas, is an extraordinarily centralized set of forces. That, I think, remains one of the great challenges for web science.

Noshir Contractor: I’m reminded that, at least at one point, you were the head of a group at MIT called the Decentralized Information Group, or DIG for short.

Danny Weitzner: That’s right. I’ll say, with a bit of humility, that we probably placed too much emphasis on the power of technical architecture to control social outcomes. Many in the internet community, and the personal computer and computer community generally, thought that if individuals had information power in their hands, we would end up in a kind of decentralized nirvana where power was radically distributed, and large institutions wouldn’t be the kind of threat that we often see them as. And I think we just candidly got that wrong. In a way, it took the attitude of web science to recognize why that was the wrong perspective, why that was an overestimation of the power of technology and an underestimation of all the social forces that determine the context in which that technology is really used.

Noshir Contractor: You mentioned, Danny, that countervailance is something that should be embraced by these few private organizations that are controlling so much of our data. Can you talk a little about what active transparency would look like? 

Danny Weitzner: So really, accountability through countervailance requires a very thorough record, a log, an audit trail, you might say, of all the uses of personal information. And it requires the ability for independent third parties, which might include individuals but probably need to be more robust organizations, to actually assess how that information is being used.

As an example, we did a piece of work, a piece of research, with some colleagues of mine who work in cryptography and theory of computation. The case study that we took was the example of electronic surveillance conducted by governments for good reasons. When governments do a wiretap or surveil criminals’ email or something, we want that activity to be secret, at least from the criminal; otherwise the surveillance doesn’t work very well, and law enforcement aims are thwarted. But at the same time, we want to make sure that the police who are conducting those surveillance activities are actually following the rules: that they’re only getting the information they’re entitled to get, and are only using it for purposes related to the actual criminal investigation. And so we developed a cryptographic technique using zero-knowledge proofs and certain kinds of public ledger technologies in order to keep the computational processes secret, but at the same time record enough of them in a publicly visible way that you could prove to the public that the police were actually conducting this activity, using this very intrusive power, in a way that was accountable. The same kind of thing you would want to see in lots of private sector contexts.
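To make the audit-trail idea concrete, here is a minimal sketch of a tamper-evident log of data uses. It is not the zero-knowledge system described above; it swaps in a much simpler technique, a hash chain, and every name in it (`AuditLog`, `entry_hash`, `verify`, the record fields) is a hypothetical chosen for illustration. The assumption is that the data holder publishes only the latest hash, so a third-party auditor who is later given the full log can check that no entry was altered or dropped.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, record: dict) -> str:
    """Hash a record together with the hash of the entry before it."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("ascii") + payload).hexdigest()


class AuditLog:
    """Append-only log of uses of personal data (illustrative only)."""

    def __init__(self):
        self.entries = []  # list of (record, hash) pairs

    def head(self) -> str:
        """The latest hash; this is the value that would be published."""
        return self.entries[-1][1] if self.entries else GENESIS

    def append(self, actor: str, data_subject: str, purpose: str) -> str:
        record = {"actor": actor, "data_subject": data_subject, "purpose": purpose}
        digest = entry_hash(self.head(), record)
        self.entries.append((record, digest))
        return digest


def verify(entries, published_head: str) -> bool:
    """Recompute the chain and compare it with the published head."""
    prev = GENESIS
    for record, digest in entries:
        if entry_hash(prev, record) != digest:
            return False
        prev = digest
    return prev == published_head


log = AuditLog()
log.append("officer-17", "subject-A", "warrant 1: email metadata")
log.append("officer-17", "subject-A", "warrant 1: message bodies")
print(verify(log.entries, log.head()))
```

A hash chain only shows that the record was not rewritten after the fact. Unlike the approach described in the interview, it does not let the public check compliance while the entries themselves stay secret.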

Noshir Contractor: Well, this is going to be a good challenge because it’s almost like we see an adversarial arms race between the private companies that own a lot of this data and privacy activists who are trying to challenge them.

Danny Weitzner: That’s right. It’s, again, where the two sides of web science, the ingenuity in system design and the insight from social and behavioral science, really need to come together. Because we have to understand the larger picture of accountability that we’re trying to produce, the kind of human behavior and the kind of institutional behavior we’re trying to incentivize, and then figure out how to build the right systems to help that happen.

Noshir Contractor: Absolutely. That is very exciting for a lot of us in social and behavioral sciences to be thinking about those issues. I also want to take us back, as you have in your writings, to Ben Franklin, who was the first postmaster of the US and a newspaper publisher, and had a key role in getting a provision for our communication infrastructure written into the Constitution. Can you take us a little bit from there, and how it then led to all of the recent regulation and policies associated with what we know about the web today?

Danny Weitzner: Yeah, Ben Franklin, as the American Revolution was happening, recognized that we had to figure out how to knit together these 13 colonies, which had quite a bit of diversity to them, and quite a bit of distance separating them. It was for that reason that he persuaded his colleagues, the other founders, that creating a system of public post roads was an absolute priority. And this was in marked contrast to the European postal systems at the time, which were his reference point. These were mostly available to the royalty and privileged members of society. And Ben Franklin said that, oh, this has got to be a public service available to everyone, and most importantly, without discrimination as to content. So he really was, in some ways, the first American network architect. And secondly, he envisioned how this network of post roads could actually create a vibrant free press. Because of the intricacies of the way the postal charges worked, he realized that he had a way to give postmasters a particular advantage in being recipients of information. Even the newspaper names that we’re used to reflect this: the New York Times, well, that was the times at which the ships arrived. And the Washington Post: the reason that postmasters became newspaper publishers is that they got their mail for free. So it meant this network of postmasters could exchange information all around the country, and then publish it in their newspapers. And what they actually did is they sent each other their newspapers back and forth across these post roads. Franklin, in this way, built a network that not only enabled the movement of physical goods, but enabled the movement of ideas.

Noshir Contractor: And so we come from there to the provision where the information providers, or the platforms if you will, were not necessarily liable for what kind of content was put on some of these platforms. If you come to the 1996 Communications Decency Act, and Section 230 in particular, tell us what the original thinking behind that was, and how that may or may not have changed in today’s world.

Danny Weitzner: The carriers, the internet platforms, included everyone from internet service providers to web hosting providers to the current internet platforms that we know of today: social networks, Twitter, Facebook, Google, etc. Section 230 provided that these platforms would, number one, not be liable for the speech of their users. So if I get on Facebook and I insult someone, that person might be able to sue me if my insult was sufficiently harmful, but that person cannot sue Facebook.

I was at the Center for Democracy and Technology at the time, very much in the middle of Congress’s debate about how to approach the internet. What a number of us realized was that if we really wanted these platforms to enable robust free speech, we couldn’t put the platforms in the position of having to monitor and assess their potential liability for every single piece of information that flowed across the network, in exactly the same way, as you suggested, Nosh, that if we told postmasters they were liable for the contents of every message, every letter that they carried, the mail would come to a halt.

But Section 230 also did something else, which is very important. It also said that if platforms take steps to remove content that might be considered by their users to be offensive or harmful, for whatever reason, they would not be liable for those actions. And that was a very intentional design, to encourage platforms to create environments that would suit the communities that were using them. We knew we couldn’t possibly have one single content standard to govern all the information on the internet; it just would be too complicated. What we envisioned was that there would be many different platforms, each of which would perform this kind of function and create environments that were targeted to different audiences.

Now, what we didn’t envision, as we talked about before, is that we would only have three or four platforms at any one time. In the early 2000s, not long after Section 230 was enacted by Congress, there were over 8000 Internet Service Providers just in the United States, and literally hundreds and hundreds of web hosts. Again, the internet in its early days was a much more decentralized, less concentrated environment. Today, it’s reasonable to ask what we should be expecting of these dominant platforms when it comes to speech, but I still think the underlying goal of Section 230 as a free-speech-protecting rule remains every bit as important now.

Noshir Contractor: Indeed, I want to turn our attention now to some more recent work that you’ve been doing in the context of the pandemic, trying to understand how we reach a balance between privacy and public health. Your project is titled Private Automated Contact Tracing, or PACT for short. Can you tell us a little bit about how this would work in terms of balancing the public health good and the privacy of the individual?

Danny Weitzner: To begin with, just to set the context, we’re now a little bit more than a year into the pandemic, and just a little more than a year ago, my colleague at MIT, Ron Rivest, who’s really one of the world’s leading cryptographers and an extraordinary computer scientist, came to me and said, we need to figure out whether it’s possible to do privacy-preserving contact tracing. It was understood even then, in early March of 2020, that because COVID was an infectious disease, deploying the traditional public health approach of contact tracing was critical: once you find a case, you have to very quickly figure out who else might have been exposed to that individual and make sure that they quarantine or take appropriate steps both to protect themselves and to limit the spread of the disease.

What we saw at the time was that the countries that were hit with COVID first just happened to be in Asia: China, Taiwan, South Korea, and others. Some of these countries very quickly adopted very innovative, but very intrusive, smartphone-based surveillance techniques. After the pandemic started in China, in order to travel around at all, you had to have a kind of COVID pass, which showed that your risk of exposure was limited. And this was all based on a highly centralized system that Chinese public health authorities built in order to detect who might have been in close proximity to someone else who had tested positive for COVID. And these were all systems that were developed using the GPS capabilities of our smartphones. That is, they were location-based systems.

We realized that it was simply going to be unacceptable in the United States or other democratic countries to have that kind of intrusive surveillance, whether or not it was going to work. And we realized, from talking with colleagues in public health, that really all that mattered in assessing exposure risk for COVID was proximity to another individual, not your absolute location in the world. What public health authorities needed was a way to detect who had been in close proximity, for a sufficiently long time, to someone who tested positive, and a way to get notifications to those individuals who were potentially exposed. We at MIT, with some colleagues at BU and Carnegie Mellon and other universities, worked very quickly to develop a protocol that would provide this kind of exposure notification based on proximity, not location, in order to protect privacy. And we had colleagues based in Switzerland, mostly at EPFL, who were also developing very similar protocols.

Once we released our protocol, Apple and Google announced that they were going to work together to adopt a very similar design. And they have, I think, to their great credit, worked extraordinarily hard together to develop a single system, which is deployed on both the Android and the iOS systems, to enable exposure notification. The critical property of this system, from a privacy perspective, is that it doesn’t collect any information about your location, and it keeps any personal information about your proximity to an infected person, and any personal information about your infection status, entirely private to you as the individual user.

If I’m an infected person, I don’t know who I infected. If I’m a contact of an infected person, I don’t know who infected me. And probably most significantly, and perhaps most controversially, the public health authorities don’t know any of this. Our system relies on notification to individuals, who then are instructed to contact public health authorities to take further action. This was, you know, a decision that we took quickly, but not thoughtlessly, because we felt that if we built a system that required everyone to trust their governments to hold this information securely and without any risk of adverse consequences, that would just discourage too many people from using the system.
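The proximity-not-location idea can be illustrated with a toy model. The sketch below is not the PACT protocol or the Apple/Google system, both of which are considerably more elaborate; it assumes a hypothetical `Phone` class in which each phone broadcasts random tokens, remembers the tokens it hears nearby, and does all matching locally. No location, identity, or contact list leaves the device.

```python
import secrets


class Phone:
    """Toy model of proximity-based exposure notification (illustrative only)."""

    def __init__(self):
        self.sent = []      # random tokens this phone has broadcast
        self.heard = set()  # tokens received from phones that were nearby

    def broadcast(self) -> str:
        # A fresh random token carries no identity and no location.
        token = secrets.token_hex(16)
        self.sent.append(token)
        return token

    def listen(self, token: str) -> None:
        # In practice this would be a short-range radio reception.
        self.heard.add(token)

    def report_positive(self) -> list:
        # A user who tests positive voluntarily publishes their own tokens.
        return list(self.sent)

    def check_exposure(self, published_tokens) -> bool:
        # Matching happens on the device; nobody else learns the result.
        return bool(self.heard & set(published_tokens))


alice, bob, carol = Phone(), Phone(), Phone()
bob.listen(alice.broadcast())  # Bob was near Alice
carol.broadcast()              # Carol was never near Alice

published = alice.report_positive()
print(bob.check_exposure(published), carol.check_exposure(published))
```

In this toy version the published list reveals only random strings, so Alice does not learn that Bob was nearby, Bob does not learn that the match came from Alice, and no central authority sees either fact. Duration thresholds, signal strength, and token rotation are all omitted here.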

What we see around the world now is that in states and countries that have deployed the system, anywhere between 20 and 50% of the population uses it. That’s a big number, but it also reflects some substantial hesitation. This is another web science challenge. We are right now trying to learn how people have made decisions about whether or not to deploy this service, whether or not to turn this app on, on their smartphones. We don’t have what I would regard as statistically relevant data yet, and we certainly haven’t published anything on it yet. But very early indications suggest that people are making pretty complex decisions that have to do with a sense of trade-offs. It’s not just a question of whether I’m giving up too much privacy or not; it’s a question of what am I getting for it? Am I getting a system that’s actually going to protect me? Is it actually going to protect my community, my family, or not? There’s a huge amount to learn about how people make those decisions, and what’s the right way to communicate about them. Even though we’re well into the pandemic, and even though we designed the system really in a matter of weeks, I think it’s going to take us longer to really understand some of the details of how it’s actually being used and understood.

Noshir Contractor: Well, we’ve done quite a tour de force today, talking about all the thinking and research that you’ve done in helping with policy in the area of privacy, as applied to platform providers and as applied to public health. Talk a little bit about training the next generation of web science scholars, and one of the pioneering courses that you have been teaching in collaboration with colleagues at Georgetown Law School, bringing together computer science and law students.

Danny Weitzner: When I left the government in 2012, my colleague David Vladeck, who is a law professor at Georgetown and was the head of consumer protection at the Federal Trade Commission, where he did a lot of very, very important work on privacy investigations, and I wanted to keep working together. We had both spent a lot of time working together on privacy legislation that had been proposed by the Obama administration. So we said, well, let’s see if we can help law students and computer science students to work together on developing privacy legislation. This has developed into a course that’s now in its sixth year. We bring together 15 law students and 15 computer science students, and do a kind of crash course on privacy law and a crash course on relevant computer science concepts. And then we give teams of students, made up of two law students and two computer science students, a particular privacy challenge, and we say, understand the challenge technically, and come up with a legislative response to it. You know, we thought when we started this course that what we were teaching about was privacy. And of course we do that. What we learned is that what we actually are teaching is how lawyers and computer scientists can work together.

And I think what we’re really learning is that addressing the public policy challenges that web science brings to the fore is really a team sport, that it really isn’t something that any one discipline could go off by itself and either study alone or act on alone.

Noshir Contractor: In closing here, your research and your teaching are absolutely stellar examples of how web science has to live up to the spirit of serving at the intersection of these different disciplines. So I am really grateful that you took the time to talk with us about some of these issues, and for your thought leadership over the decades on this particular topic. And as a shameless plug for those who would like to hear more wonderful words of wisdom from Danny, I would encourage you to think about attending the virtual 2021 ACM Web Science conference from June 21 to June 25, where Danny has graciously agreed to be a keynote speaker. So thank you again, Danny, for joining us today.

Danny Weitzner: Nosh, thank you so much, I really appreciate you having me. This was wonderful.

Episode 11 Transcript

Ravindran Balaraman: In a country like India, the number of people who are active on the web far exceeds the populations of most countries. But then there’s a significant fraction of our population that still doesn’t have access to the web and access to the services that are being provided on the web. So there is this digital divide.

Noshir Contractor: Welcome to this episode of Untangling the Web, a podcast of the Web Science Trust. I am Noshir Contractor and I will be your host today. On this podcast we bring in thought leaders to explore how the web is shaping society and how society in turn is shaping the web.

Today I have the pleasure to welcome Ravindran Balaraman, who you just heard discussing the unique challenges people in India face accessing the Web. He is the Mindtree faculty fellow and a professor in the Department of Computer Science and Engineering at the Indian Institute of Technology Madras. He also heads the Robert Bosch Centre for Data Science and Artificial Intelligence at IIT Madras, which is the leading interdisciplinary AI research center in India and India’s first lab to join the Web Science Trust Network of laboratories from around the world. He co-founded the India chapter of the Association for Computing Machinery’s Special Interest Group on Knowledge Discovery and Data Mining (SIGKDD for short), and he is currently the president of that chapter. His research is pushing the boundaries of reinforcement learning, social network analysis, and data and text mining. And his work bridges the gap between theory and practice in machine learning. In 2019 he was instrumental in hosting the first Web Science Symposium in India. He was recognized in 2020 as a Senior Member of AAAI (the Association for the Advancement of Artificial Intelligence) for his significant accomplishments within the field of artificial intelligence. Welcome, Ravi.

Ravindran Balaraman: Noshir, thanks for having me on the podcast. 

Noshir Contractor: It’s my pleasure. Thank you for joining us. I must say that I’m especially thrilled to have you as the first guest on this particular series who is joining us from the global south. I’m absolutely delighted to have your insights about what web science means and can do or can’t do in the emerging economies of the world. You have mentioned, for example, that there are many India-specific challenges that need to be addressed by web science. What do you think web science means in the context of countries like India?

Ravindran Balaraman: In a country like India, the number of people who are active on the web far exceeds the populations of most countries. But then there’s a significant fraction of our population that still doesn’t have access to the web and access to the services that are being provided on the web, right. So there is this digital divide, which people talked about when IT services became more popular. Now, with the growth of the web, society’s interactions are happening on the web, and this kind of digital divide is getting exacerbated; it is getting much worse. Recently, with a colleague of mine from our social sciences department, we have been looking at the impact of this worsening digital divide on the migrant population and, in particular, their access to digital banking. With the enablement of digital banking, so much more of our commerce, even in India, now happens online. And there is a significant fraction of the society that is getting excluded from that. The migrant population have now actually been, you know, transplanted into a slightly alien culture for them within the country. But they don’t want to use these web services that are available for the rest of the country. They don’t want to use them because they’re feeling even more alien. This is not within their realm of experience. One of the theories that we are posing now is that maybe we should use techniques from AI to make sure that the interfaces on the web that these people are getting access to remind them of home, as opposed to having an impersonal voice that’s going to talk to them about, okay, you want to do banking, press this number or press that, and then enter something here. So can we have somebody talk to them in their local dialect?

Their portal to the web now becomes more like a slice of home. Given that very few countries have this kind of large internal migration of migrant population like India, it’s a problem that literally, we have to buckle down and start solving.

Noshir Contractor: This is really intriguing. Can you give a concrete example of what kind of migrant population you’re talking about, and what can be done to help them feel less alienated and more at home?

Ravindran Balaraman: So let’s take one concrete example. Almost 80 to 90% of the construction workers in India are people who are displaced internally. These are people who move in from a particular state in the north of the country called Bihar. Most of the construction workers in my home state, which is the southernmost state in the country, come from Bihar. And it’s a completely different culture, not just the language: the way we dress, the climate, the kind of festivals that we celebrate here, the food that is available to them, everything is different. So this is really alien country for them, and they tend to stick close to one another, right? And then you tell them that, okay, the government is offering you, you know, schemes, and all you have to do is go online and click a few buttons on your smartphone. All of them have smartphones; this is a surprise, all of them have smartphones, and they use them to connect with their families back home, right, to just call them or link with them on WhatsApp. They are happy to do that. It’s not that they can’t get online, but they can’t integrate with a larger web community, mainly because they just want to use it as a conduit for connecting back home. My colleague in the social sciences department has been doing a lot of study on migrant populations within India. So we are drawing on the insights that he has gathered about their assimilation into local society, and then trying to look at how that affects their assimilation into the web. And the insight that we have arrived at is that the web actually gives us an opportunity to give them a slice of home, if you can tell them, okay, all your interactions with the web can happen in the local language, and then you log into a portal, and it starts greeting you with local functions, local festival chat, asking you about your parents and stuff like that. So that’s the kind of idea that we are looking at.

But that’s a solution that has to come from India.

Noshir Contractor: You touched on something at the start that I want to go back to. You mentioned that, as a result of the digital divide, differences in access have been exacerbated recently. I want you to tell us a little bit more about the extent to which you think the presence of the web has contributed to this digital access divide, and/or the extent to which AI, which is now so pervasive on the web, is either mitigating or exacerbating these digital divide issues that you touched on.

Ravindran Balaraman: Almost every web service that you see online, whether it is online communities like Facebook, or professional communities like LinkedIn, or services like Amazon or Google, everything is strongly infused with AI. This enablement of AI is essentially making, you know, people rely more and more on the services because they are so much more convenient. Companies, because they are looking at where the bulk of their revenues are coming from, are tending to move more online. And so it makes it harder and harder for people to get services locally. So those who are not online are actually getting fewer and fewer services.

So it is certainly exacerbating the divide. Really, right now we are looking at how to make, you know, online access easier for people.

Noshir Contractor: It’s depressing in some ways to hear you say that AI might actually be exacerbating the divide, but you’re also exploring ways AI can be deployed to mitigate some of these access and divide issues. Can you give a specific example of something that is happening in India that gives you hope?

Ravindran Balaraman: Language is an important thing, right. Most of the interfaces now have improved tremendously in India. A lot of companies are actually investing money in India to build local language interfaces. My mother tongue is Tamil, and I can talk to my phone in Tamil, and it does a perfectly fine job of transcribing it. Even if you give me a Tamil keyboard, right? In fact, anyone who has tried using a Tamil keyboard knows that it is much, much harder to use than the English keyboard. I would prefer typing in English to typing in Tamil, but I would love to talk in Tamil rather than in English. I can see more people getting integrated because of that.

I still think AI is, at the end of the day, a technology, right; it’s up to us to figure out how to use it. And there is a stronger awareness among the government as well as among some of the bigger, you know, enterprises.

We have to actually start providing all the services in a more accessible manner. And that realization is now gaining ground.

Noshir Contractor: You point out a really important issue that we tend to take for granted in many of the developed countries. In India alone, according to the census of India in 2001, there were 122 major languages in one country, and 30 of these languages were spoken by more than a million native speakers. So what you just described, the technology of translating across these languages, really helps connectivity on the web in a way that we take for granted when we speak one dominant language in the West. Can you talk a little bit about the ways in which the study of artificial intelligence has in and of itself changed as a result of the web? I remember going back several years, when the initial Dartmouth studies were coming together to coin the term artificial intelligence, AI was mostly seen as a rule-based system where you would provide certain rules and certain kinds of reasoning systems. And today, that seems somewhat antiquated, or is it?

Ravindran Balaraman: Oh, yeah, that’s a huge debate. AI seems to go through these phases, right? At one point in time, they say it is all about logic, reasoning, rules, and inferencing on it. And then at the next point in time, we say, oh, no, throw out everything, you have to learn everything from data, learn from scratch, it’s all about statistics. We are seeing a strong swing towards the data-driven statistical approach to AI. And part of the reason is the web. So it has been something that really helped AI grow, because it’s giving you huge volumes of data. It’s not only just giving you data, right, but it’s also giving you data with tags on it, because people are so good at labeling what they are putting out on the web. Because everything is becoming more and more digital, that data is getting readily digitized.

Some of the techniques that AI is using now have been around for a couple of decades, if not longer. They couldn’t succeed because they didn’t have the kind of volume of data that the web has enabled us to gather, right? And so in that way, the web has had significant influence on the growth of AI.

At the same time, I also have to say that the web has also caused us to kind of topple over and do things in a not so careful manner. Because if you look at some of the latest AI systems built completely on web data, you kind of see that they also tend to mimic the significant biases and prejudices that people bring to their writing, the things that they post on the web. And if you don’t do a careful curation of the data that you’re getting from the web, you’re going to systematize the biases by putting them into a machine. And it actually makes it easier for people to make the argument that humans can be biased but machines can’t be. But then what they fail to see is that the machine is going to be biased because it’s digesting the biased data that people are putting out on the web.

So in some sense, all that the web did was great. And it really gave a quantum boost to what AI was doing. But we are coming to a point where we have to start thinking very carefully about how we are going to take advantage of the data on the web.

Noshir Contractor: So Ravi, one of the examples that got a lot of attention in the US, at least, was the fact that, exactly as you said, if you use AI to screen job candidates, then these AI systems will reproduce the same biases in terms of gender and underrepresented minorities in terms of interviewing and screening for job opportunities, etc. And one of the issues that this raised was that very often these kinds of AI techniques give you a result, but don’t necessarily explain how and why they got those results. Some of my friends joke that what AI lacks is a “why” button. And that if AI gives a result, you should be able to press a button that says why, and this raised the whole issue of explainable AI. Can you talk a little bit about whether you see that as helping address the issue and the concern that you just raised? But also, how far are we from being able to have explainable AI?

Ravindran Balaraman: So I strongly believe that before AI can be truly, you know, let out free in the wild, we need to solve the explainable AI question. In fact, the job screening thing was something that was pretty obviously AI going wrong. But then there are a lot of subtle ways in which AI is influencing our behavior. In fact, if I go online, right, and my phone starts recommending these stories for me, then it’s going to start coloring my view of what kind of stories I’m going to see. That’s just because the AI system is learning this, and then it’s putting those out. So it very quickly customizes it for your preferences.

We need to have something similar to the “why” button. So what people do nowadays as explainable AI is to say, oh, you asked me why I said that particular image is appropriate for your search, right? I want to see a football match. And then it shows me a picture of a football match. And it might say something like, oh, here at this top right corner, there is something that, you know, caused me to make it into a football match. So it can’t even tell you that, okay, well, I think it’s a football match because there are like 10 people here and there is one guy carrying a ball. It basically says, okay, it is this part of the image which makes me think that it is a football match. That’s certainly not a satisfying notion of explanation for people. So we are quite far away from getting to explainability as humans understand explainability. I’m not even sure how soon we will be able to get there.

But this is what I always tell people. You don’t know how a motor vehicle runs, you don’t know the details of an internal combustion engine, but you’re happy to drive a car, right? The reason you’re happy to drive a car around is because you know that there is somebody, at least, back there who understands and has done all the testing and everything for you. If we can come to a point where I can say that I know why the AI did this, me being an AI expert, right? As long as I can say that, okay, I understand the explanations the AI is putting out and I’m happy to certify that the AI is doing the right thing, the general public just has to accept it: okay, it has come with a certification from an AI expert that they understood what it is doing. But if you’re going to say that it has to go to a point where the general lay public, the end user, is going to understand completely what the AI is doing, I think we’re still a ways off from that.

Noshir Contractor: To what extent do you think that AI is enhancing trust on the web, or undermining it for the lay public? 

Ravindran Balaraman: I mean, I can tell you what I see around me, like at least in a large fraction of Indian society, right? So we, unfortunately, tend to trust the web too much. The latest WhatsApp forward is taken as gospel. That’s mainly because the forward comes from a person that they know. And therefore they transfer the trust that they have in the person to the message that’s been sent through them as well.

So the web in some sense really worsened the impact of rumors and things like that, because you have a verifiable media source that sent you the information. And we tend to kind of ascribe the same trust to that piece of information as well, right? Even though the person who forwarded it to you might not have known where the message came from. When we got news from newspapers and things like that, I mean, there was at least the hope that appropriate research had been done before things were put in print.

And I’m not sure whether AI is still playing a role here in terms of making this worse or better. But I think AI can play a role in making things much, much better in terms of attaching provenance, or at least doing a very, very quick analysis of, you know, the consistency of information on the web. The biggest challenge in fact-checking all the information that floats out on the web is you don’t really know what the ground truth is. And at the rate at which information is generated on the web, you also can’t go after the ground truth, right? So at least AI systems operating at scale can verify the consistency of the information that’s out there. If there are like 10 people saying one thing and 10 people saying something completely different, then at least you can say that, hey, look, I don’t think this is right, because there are just too many different opinions about this. And everybody is also working on this kind of fake news vector, and so on and so forth. But who’s to say your news is fake?

Noshir Contractor: In the past year, in particular, with the pandemic and the other global reckonings, there has been heightened focus on social justice issues. But I want you to talk a little bit about what social justice means, specifically within the Indian context. And what does the societal interplay and impact of AI and the web mean for social good in India?

Ravindran Balaraman: Throughout the country, the whole notion of social justice is very strongly embedded, in terms of opportunities, and jobs, and in academics and everywhere. It’s significantly different from state to state. There are places where these kinds of social inequities are much more pronounced. It could be that the same community in the society is discriminated against in one state, but not in another state. It’s a very, very, very complex dynamic within the country. So it’s not clear how we would, you know, build AI systems that are uniformly fair across the entire country.

And again, as far as social good is concerned, there are various issues that people have looked at, for which they build solutions in the West or in other countries. We kind of struggle with implementing them in India, because just implementing a system that would work for a million users alone, even though it will help the million users, is grossly unfair to the Indian population.

Noshir Contractor: Can you explain more of why it’s unfair?

Ravindran Balaraman: It’s unfair in the sense of, which million are you going to deploy to? Right, so who do you choose? Of course, there are a whole bunch of other factors that are going to come into play in terms of which fraction of the population you have access to, that you are able to deploy your system to.

We really need to figure out a way to scale it much, much larger, a couple of orders of magnitude larger than what we can do right now with our systems, in order to make it truly countrywide deployable.

Noshir Contractor: It sounds like what you’re describing is a scaling problem. Help me understand why scaling is such a challenge.

Ravindran Balaraman: Well, let’s say that I’ve developed a system that tells me that, okay, here are people with certain medical conditions and, you know, they are having difficulty keeping to the drug regimen, and you have to do some intervention to help them. Now I come up with a system that can look at and analyze a million people, and then filter out 1,000 people who need this kind of intervention, and then I can actually put, you know, healthcare workers in place who can go help these 1,000 people, right? Now I scale it. But now suddenly, I’m looking at 100,000 people who need this kind of intervention. It’s not just a question of computation being hard, it’s a question of actual deployment in the field that makes it much harder.

Noshir Contractor: So the challenge is not just in the technology, and the web might help us identify those who are in need. But that still begs the question of how we are going to reach all those people in any physical, tangible way to provide what the technology has helped identify that they need. In closing here, one of the questions that we’ve been asking our guests is this: as we have been going through 2020, and now into 2021, we’ve been dealing with, obviously, the pandemic as well as many global reckonings, sociocultural in nature, political in nature. And I was curious to get your take, specifically from an Indian vantage point, on how you think this period of 2020 and 2021 would have been different, for better or for worse, without the web.

Ravindran Balaraman: I can’t imagine 2020 without the web. We literally lived off the web. Not only was I working on the web, I was meeting friends, I mean, everything, right? So I just can’t imagine how we would have survived 2020 without the kind of online work and online meetings that are happening. I strongly feel that things would have been worse in the last year without the web. You might as well ask how we would have gone through 2020 without electricity.

Noshir Contractor: Yes, indeed, yes, it’s become a utility that we take for granted in most cases now. Well, I want to thank you again very much, Ravi, for taking time to talk with us and specifically for giving us insights into how web science has a different lens when seen from the context of the developing world, in this case, particularly from India. And we’re just delighted that IIT Madras, the Indian Institute of Technology, Madras, became the first member of the Web Science Trust Network of Laboratories from India, and you were certainly instrumental in making that happen.

And I wish you and your colleagues the very best in helping advance the notion of web science in developing countries etc. And we will be looking forward to hearing more about those insights in the years to come. So thank you very much again.

Episode 10 Transcript

Eszter Hargittai: Back in the early 2000s, my advisor in graduate school, Paul DiMaggio and I suggested the term digital inequality, to use instead of digital divide, to signal the spectrum of differences among people after they go online. So I actually kind of wish that people would not use second level digital divide, and third level of digital divide at all, and just would stick to digital inequality when they’re not talking about access differences. 

Noshir Contractor: Welcome to this episode of Untangling The Web, a podcast of the Web Science Trust. I am Noshir Contractor and I will be your host today. On this podcast we bring thought leaders to explore how the web is shaping society and how society in turn is shaping the web.

Today, we welcome to the podcast Eszter Hargittai. You just heard her talk about her work on digital inequality and web science. Eszter is a professor and holds the chair in Internet Use and Society in the Department of Communication and Media Research at the University of Zurich, where she also heads the Web Use Project research group. Her research focuses on the social and policy implications of digital media, with a particular interest in how differences in people’s web skills and digital literacy influence what they do online. She is one of the most highly cited researchers in web science, with more than 40 of her articles being cited over 100 times, and overall she’s been cited over 27,000 times. She has also edited three widely acclaimed books dealing with how we do research. In 2019, she was elected as a Fellow of the International Communication Association. In addition to her academic articles, she has published numerous op-eds in a wide variety of prestigious outlets. Welcome, Eszter.

Eszter Hargittai: Thank you. It’s great to be here. 

Noshir Contractor: I’m delighted to have you here, because what you study, at the cusp of looking at why web literacy is so important for understanding and advancing web science, is something that we all need to be thinking much more about within the area of web science. As I see it, a fundamental premise of your scholarship and public advocacy is that gaining access to the web and internet does not, in and of itself, solve the problem of the so-called digital divide. Tell us how you got interested in this topic, and came to this premise that has been so pivotal in your work.

Eszter Hargittai: So I was in college back in the 90s. And that was the time when the internet was starting to diffuse beyond academic circles. I actually started college when it was not yet automatic that you got an email address, but I asked for one. And so I started spending quite a bit of time online, and then studied abroad in Geneva my junior year, which was 1995-96. And interestingly, this was very close to where Tim Berners-Lee developed the web. And this got me quite interested in understanding how people keep in touch, especially since I was thousands of miles away from both my family and many of my college friends. I was interested in how people keep in touch through this medium. But I also realized pretty early on that just because people gained access to it didn’t mean that they would all use it in the same way. And I continued to see this just in my own life with people around me. And as a sociologist, which is what I was getting a degree in, and someone interested in social inequality, I started wondering how the ability to use the internet and the web and to understand the web, so web-use skills, related to people’s background.

Noshir Contractor: And I remember that when the word digital divide first gained currency, most people equated digital divide as being associated with whether you had a computer or not, whether you had access to the internet or not. And then subsequently whether you had access to the web or not. And you were amongst the first advocates to say that that’s a very superficial definition of digital divide. 

Eszter Hargittai: This phrase, digital divide, I think, lost a lot of utility as the internet diffused to a larger segment of the population. And as people started incorporating the web into their lives, it was no longer that meaningful to talk only about the digital divide as such, which is, as you said, about access differences. 

And so in one article, which I called the “Second level digital divide,” I suggested that web use skills were also very important to how people were integrating the web into their lives. Now, I will say in retrospect, I kind of wish I had not introduced that term. I think it has led to a lot of confusion in the literature. People now even talk about the third level digital divide. But back in the early 2000s, my advisor in graduate school, Paul DiMaggio, and I suggested the term digital inequality, to use instead of digital divide, to signal the spectrum of differences among people after they go online. So I actually kind of wish that people would not use second level digital divide and third level digital divide at all, and just would stick to digital inequality when they’re not talking about access differences. And in some ways, I’m almost upset with myself that I introduced what ultimately I think has become rather a lot of confusion in the literature.

Noshir Contractor: It’s a nice dilemma, to have something that you popularized, that you then want to backtrack and retract after the cat is out of the bag, so to speak. In your own work, you have used many different kinds of approaches to help understand these digital inequalities. Can you talk a little bit about the kinds of studies and creative approaches you’ve used to help tap into measuring the extent to which these digital inequalities might surface?

Eszter Hargittai: My dissertation actually was about people’s web use skills. And I did this by interviewing people in person and collecting some survey data from them, but then also observing them as they used the web. So giving them questions, so-called tasks to perform, and then recording what they did, and later analyzing what they did, and quantifying what they did. And then ultimately, what I did was I took the measures of what were actual skills, right, because I had data on their actual ability to solve tasks online, and then looked at what survey questions correlated with those actual skills. And this is how I came up with a proxy measure of web use skills, that has since fortunately been used by lots of others, and continues to be a helpful measure. What I have found interesting in the literature is that many people go about coming up with these proxies from more of a psychometric measure perspective. But while psychometric measures are helpful for intangible things, like if you want to measure trust or privacy, that’s not really the best approach to study something like web use skills, because web use skills is a trait in people that we can, in fact, measure objectively as a skill. And so then we should, and then come up with proxies. The reason I suspect people haven’t done this too much, having done this work myself, is that it is a lot of effort.

Noshir Contractor: So just to make this concrete, this is so interesting. What would be an example of a task that you might give someone to do on the web as part of these studies? And what would be examples of specific web use skills that you are looking to see whether they use those or not?

Eszter Hargittai: So one area that’s pretty hot these days is algorithm literacy. And so I’ve seen people try to come up with survey measures without the actual skills, and the actual skills would involve sitting down with someone and seeing how they navigate, say, YouTube, and how they look around the site, to see whether they understand why there are videos showing up on the side. Like, do they understand recommendations? Do they understand where those recommendations come from? Do they understand how the different feedback that they give to the site might influence what the site then gives them? So that would be a concrete example of actual behavior and skills you could measure and then come up with survey proxies for.

Noshir Contractor: And what kind of differences do you see in skills amongst people?

Eszter Hargittai: Skills vary very much across the population. A very consistent finding across time at this point and also across different national samples is that education is very much related to skills. So people who are more educated also tend to have a better understanding of the web and have higher level web use skills. 

A less obvious finding has been that there’s considerable variation, even among young people, right? So there are a lot of assumptions in the media, but also just generally, if you talk to a person on the street, that people who are young are automatically savvy with the web, because they grew up with it. And so I didn’t think this was the case, but I then studied this scientifically, collecting data on young people’s web use skills, again, both in-person observations and also survey measures, and found that even within young adults, there is quite a bit of variation.

And even within young adults, socioeconomic status matters. And young adults who come from more privileged backgrounds will have higher levels of skills. One other thing I’d like to say about age is that another assumption is that older people are necessarily worse than younger people. It is the case that once you get into ages in the 70s, 80s, and higher, people’s web use skills do drop. However, if you look at people 50 and under, and certainly 40 and under, there really is not actually an age correlation with skills.

Noshir Contractor: This is really an important finding. Because, as you pointed out, there is a lot of conventional wisdom that assumes that, oh, young kids know exactly what to do on the internet. And I’m glad that you brought that up. Some of the other areas that you’ve looked at include gender differences, nationality, education, socioeconomic status, ethnicity, and so on. In particular, I also want you to help us understand something that is so important across web science. And that is the impact of disability status in terms of web use skills. And I know that you’ve done some really interesting work in that area.

Eszter Hargittai: Disability status is not something that a lot of work has looked at in web use studies. So with my collaborator, Kerry Dobransky, using early 2000s data, we have looked at how people with disabilities compare to people who don’t have disabilities in their web use skills and what they do online. So from before, we have found that people with disabilities had lower level skills. But more recent data actually suggests that people with disabilities have caught up. And we no longer find this, what we would call a digital disability divide, although that takes us back to that whole digital divide issue.

And then beyond this, we’ve also found that in some cases, people with disabilities are actually more active on the web. So they participate in online activities where they make their voices heard, in some cases more than their non-disabled counterparts. 

Noshir Contractor: Well, one of the reasons why this disability inequality might have diminished is because there has been a concerted effort to make the web more accessible across different sectors of society. To what extent are the results and findings that you’re reporting an acknowledgement that our efforts to make the web more accessible are in fact yielding payoffs?

Eszter Hargittai: Yes, so I think one could legitimately see it as a sign of that. It seems that if people with disabilities are able to be actively engaged online, that means that the web is in fact more accessible. One of the challenges of this area of work is national samples only have so many people, only capture so many people with disabilities. So it would be helpful to have a better understanding of how different types of disabilities relate to online behavior. And for this, we’d need much larger samples on people with disabilities. So it would be nice if there were resources to do more targeted sampling of those populations. But overall, it’s fair to say that certainly, some parts of the web have become quite accessible to different types of people.

Noshir Contractor: So there is some encouraging news out there, but I’m sure you agree that there is much more that remains to be done, even as new technologies come on the web, including things like virtual reality, and augmented reality, and so on.

Eszter Hargittai: Exactly. And I think it also concerns partly educating the public, who are not necessarily disabled, right, so just to give one example, Twitter has the alt text when you upload an image, which means that you can say in text form what’s on your image. And I always do this when I upload images, but I suspect the vast majority of people don’t do it, not because they don’t care to, it’s partly, and this is where we’re back to web use skills, they don’t actually know it’s possible. 

Noshir Contractor: Who do you think is responsible, or should be responsible for helping educate the public? You have obviously done a lot in uncovering this as a scholar and a public intellectual. Do you see a role for some organizations, maybe the platform providers themselves to help inform and educate the users about these kinds of digital skills? 

Eszter Hargittai: I definitely think it has to be this multi-method approach, so to speak. First, we shouldn’t ignore educational institutions, right? So we have to move beyond this assumption that young people understand the web anyway, so we don’t have to teach them, because that’s wrong. But then, of course, the largest segment of the population is not in educational institutions, and you don’t want to ignore them. So where else can we help? So certainly, libraries and community centers can play a role, and they often do play a role, they offer workshops.

Part of it would be on platforms to recognize that their users come with different abilities to their platforms. It’s a question of usability, right? So I don’t think it’s realistic for them to do little skill enhancement programs. Rather, they should recognize that these skill differences exist and then address that in how they put together their platforms.

Noshir Contractor: You’ve talked several times about how creative one has to be to do research on topics like this, and how doing research on the web is, in some ways, qualitatively different from, and more challenging than, doing research in person in the pre-web days, for example. And I’m absolutely fascinated by the titles that you used for three books that you have edited and/or co-edited. Starting with a book titled Research Confidential in 2009, followed in 2015 by a book titled Digital Research Confidential, and a book in 2020 called Research Exposed. That gives a tabloid feel to the whole thing. Tell us about why you thought about these titles and what you want to convey by these titles. And obviously, the content of these books.

Eszter Hargittai: I should say that the titles came from other people. So I owe others credit. The idea here is that studying the web requires being creative, as you noted, and in all sorts of ways. While the web generally offers all sorts of opportunities, it also comes with challenges. And I felt like there just weren’t enough write-ups of how empirical social science actually gets done in terms of the day-to-day reality, right? So there are lots of methods books, lots, I mean, an infinite number of methods books, but they usually tell you the ideal type of a project or what you should strive for. But anyone who has actually done empirical work knows that nothing ever works exactly as you plan, that it’s much more messy, and there are just so many issues that come up. But we don’t tend to write about those in academic write-ups. So I wanted there to be a venue where people were just genuinely honest about all the challenges they encountered, but then also shared how they dealt with them. I’ve been extremely fortunate to have amazing contributors to these volumes, who have been very generous with their time in sharing their experiences of all sorts of projects studying the web, whether from big data, log data, to using more traditional methods like interviews to understand how people use the web, to web scraping. So there are lots of different types of methods in all of these volumes.

Noshir Contractor: And I think that what you just described is very well captured in the subtitles, the second part of the titles of each of these books. In the case of the first book, it was “Solutions to Problems Most Social Scientists Pretend They Never Have.” The second was called “The Secrets of Studying Behavior Online,” and the most recent, “How Empirical Social Science Gets Done in the Digital Age.” So you’re pointing out that your focus is not on aspirational methods, but on how it actually gets done, and how people then deal tactically with the messy aspects or the challenging aspects of doing research on the web.

Eszter Hargittai: Exactly. It’s the behind-the-scenes realities, it’s the ugly sides, the difficult sides that people do not write up in articles, partly because there’s just not usually room for these things in articles, but also because they may be deemed embarrassing. But part of the idea is precisely to acknowledge head-on that the messiness is part of the research project, and that there is no such thing as a perfect research project. Those subtitles very much capture exactly what the books are trying to do.

Noshir Contractor: Fantastic. Well, one of the things that you’ve also been doing lately is taking advantage of this pandemic situation that we find ourselves in, and you have used that to initiate a really large data collection effort about COVID-19, collecting nationally representative sample data across three countries. Tell us a little bit about what the study is, and also about the book on COVID and digital inequality that you are writing as an academic trade book for MIT Press.

Eszter Hargittai: So, back last spring, when the world went into lockdown, I was wondering how to cope, just like everybody else was. And it became quite clear very quickly that the web and digital media were going to be very important in this whole situation. And suddenly everybody was commenting on this. But I was someone who had been studying this for 20 years. So I felt like, okay, doctors were doing more than their share by treating people. What could I do as a social scientist? And I felt like I could contribute by trying to understand the social side of this and, as an expert in studying people’s web uses, trying to understand how the web played a role in all of this. So with my team at the University of Zurich, a wonderful group of junior scholars, we decided to do some surveys. And so in early April, we fielded a national survey in the US, and then in mid-April, one in Italy and one in Switzerland. And then in early May, also two more in the US. And so the book that I’m writing about digital inequality and COVID looks at the early days of the pandemic, and really the lockdown time, and how people were using the web at the time. And perhaps not surprisingly, but I think it’s very important to document, traditional markers of inequality like socioeconomic status, again, play a role in the extent to which people were able to pivot to the web for things that they needed done. But as we discussed earlier in this conversation, some groups that we might not expect to do well, like people with disabilities, were actually doing quite well compared to others in their engagement online.

Noshir Contractor: Well, we’re going to look forward to reading this book when it’s published. One of the things, Eszter, that you managed to do in the abundance of spare time you have, given everything we’ve already talked about, is spend time engaging outside academia. You’ve been a very active public intellectual in terms of general audiences, op-eds, etc., but also in talking with policymakers. Tell us a little bit about how web science can shape policy in terms of influencing policymakers in these kinds of contexts. What has been your experience? And what are lessons that we can learn on how to do that more effectively?

Eszter Hargittai: Academically, our work is important and interesting, and hopefully insightful. But ultimately, it’s very helpful if we can then influence policy, where our findings can be translated into real-world outcomes. And so at the very core of my work is this point: don’t assume that young people are automatically digitally savvy, and don’t assume that once people get connected, they will have equal access. And they definitely will not have equal skills in using the web. It’s very important to get this out to policymakers. It’s important to keep in touch with very different constituents and colleagues, right? So it’s important to attend different conferences, and it’s important to write op-eds.

So op-eds are a really terrific way to get your message out to a broader public. But it’s a very different kind of writing from writing academically, just incredibly different. That’s something I have spent time on, and I think it was a really good use of time.

I have been affiliated with Harvard’s Berkman Klein Center for Internet and Society, and they do really well in connecting with the policy world. Through them, I spoke with the Obama transition team in 2008.

Noshir Contractor: One of the things that you have actually taken on as a passion is to write about academic career advice. You have columns and blogs in places like Inside Higher Ed, etc. And so as we wind down this interview, what kind of advice would you give web science scholars at different stages in terms of what priorities they should have, what they should be thinking of, and what they should be mindful of, both from a scholarly point of view and in terms of public engagement?

Eszter Hargittai: Actually, it’s been very interesting. As I’ve worked on this COVID digital inequality book, I’ve found myself reaching out to people I knew from graduate school who are not in my specific field. So I was getting a PhD in sociology, and I’ve been reaching out to, for example, people in economics and political scientists. They could help with things that weren’t as obvious for me to tackle. Too often, I’ve seen junior scholars be hyper-focused and think that if someone is not doing their exact work, then that person can’t be relevant. I think this is really a shame, because you have so much to learn from people who are not necessarily doing what you’re focusing on.

Web science is, by definition, interdisciplinary. So I think it is extremely important for people to cultivate networks of others in web science who aren’t necessarily from their own discipline, right? So if you’re mostly doing social science, be sure to be talking to the experts who are more in computer science. Or if you do more traditional methods, talk to the computational social science and computational communication science people. You can learn so much from collaborating with people who have different methodological backgrounds, for example.

Noshir Contractor: Excellent advice. And then in closing, one last question. We spent a lot of time today talking about the pandemic, and of course 2020 and 2021 have had, in addition to the pandemic, several other global cultural reckonings. Taken together with the pandemic, what is the one thing that you think would have been different in this entire experience that we are still going through, for better or for worse, without the web?

Eszter Hargittai: The web cannot be taken out of the COVID conversation, right? So our experience of this is so much about web-based communication. I’d like to think mostly for the better, because of the connections we made: not having to feel as isolated if you were able to connect with others for social purposes, and many, many people being able to continue their work, even remotely. These are all positive aspects. Negative aspects would be the potential for very quick dissemination of misinformation, of disinformation. Recognizing this, then, we need to think about ways to counter that. And so yes, that’s a potential negative, but there is also the positive side in the racial justice issues that came up last summer in the United States: people’s ability to connect with like-minded others to be able to organize in support of those experiencing injustice. The web is very important to this.

So generally speaking, I think it’s important to recognize that the potential of any technology, including the web, depends on multiple factors. It depends on how governments respond to them, how they support them or restrict them. It depends on what actions the business sector takes. And then it very much depends on how users approach these technologies. Ultimately, though, I believe that it has been for the positive. 

Noshir Contractor: Fantastic. Well, thank you again, Eszter, for taking time to talk with us and to enlighten us about the nuanced ways in which one should be looking at digital inequalities in society. Notice I didn’t say digital divide. But also, more importantly, thank you for your incredible scholarship over the last decade and more, where you’ve really helped us understand and advance the process of doing research on the web and also, as we’ve just discussed, advocating for it to the general public and policymakers. So thank you, and I look forward to the next decade of research from you.

Eszter Hargittai: This has been a delightful conversation, Nosh, and thank you so much.

Noshir Contractor: Untangling the Web is a production of the Web Science Trust. This episode was edited by Molly Lubbers. I am Noshir Contractor. You can find out more about our conversation today in the show notes. Thanks for listening.