Episode 26 Transcript | Untangling the Web

Sandra González-Bailón: Even though there are a lot of political organizations and a lot of politically motivated individuals who are trying to organize the next big thing to pursue their cause, it’s very difficult to predict social dynamics. I don’t think that’s depressing. I think that’s actually a reminder that nothing in social life is fully determined.

Noshir Contractor: Welcome to this episode of Untangling the Web, a podcast of the Web Science Trust. I am Noshir Contractor, and I’ll be your host today. On this podcast we bring in thought leaders to explore how the web is shaping society, and how society in turn is shaping the web.

My guest today is Sandra González-Bailón, who you just heard talking about the unpredictability of political mobilization and how we can study that on the web. Sandra is on the faculty at the Annenberg School for Communication and affiliated faculty at the Warren Center for Network and Data Sciences at the University of Pennsylvania. Her research lies at the intersection of network science, data mining, computational tools and political communication. Her articles have appeared in journals such as the Proceedings of the National Academy of Sciences, Nature, and Social Networks, among others. She is the author of Decoding the Social World, published by MIT Press in 2017, and was also the keynote speaker at the ACM Web Science Conference in 2019, in Boston. Welcome, Sandra.

Sandra González-Bailón: Hi Nosh, thanks for the invitation to join! It’s a pleasure to be here.

Noshir Contractor: Thank you again for joining us today. Sandra, I’d like to believe that you are amongst the first of what I would call “bona fide” web scientists: you began your career looking at the social world through the lens of the web. I want to begin by asking you to help dissect the title of the book, Decoding the Social World, and the subtitle, Data Science and the Unintended Consequences of Communication.

Sandra González-Bailón: This idea of unintended consequences takes root in the fact that often in network systems, there’s no one who’s really in charge of the dynamics that take place in those networks. So, you know, you might want to create a message that goes viral, but it’s really not up to you whether that message goes viral. That depends on what other nodes in the network, other people in the network, will do. We can try to reverse engineer and unpack how those dynamics emerge and take place, and get a better sense of how they happen when no one is really designing those dynamics or in charge of determining how they emerge.

Noshir Contractor: Could you give an example from your own research where something that happened was not intended or not anticipated? Some collective behavior, for instance?

Sandra González-Bailón: A lot of my applied research, in which I use social media data, analyzes episodes of collective effervescence, right? Moments where the whole is more than the sum of the parts, when suddenly you have a critical mass of people who are communicating about a particular issue or a particular topic or organizing around a particular movement. And of course, we hear a lot about the episodes that are successful, the moments where those processes of collective effervescence result in massive mobilization on the streets, in massive protests. What we often forget is that for every successful instance of massive mobilization, there are many unsuccessful attempts at mobilizing that critical mass. So even though on one level those episodes of political mobilization are intended (of course, there’s a cause to fight for), what is unintended is the level of success. You don’t always have control over how many people will retweet your hashtag, or how many people you will convince to mobilize and take to the streets. That’s how I understand unintended consequences. It’s really shorthand for the lack of control that we often have over collective dynamics.

Noshir Contractor: Is the implication then, Sandra, that we can only explain some of these major events in retrospect? That we are incapable of engineering events such as these? That might sound a little depressing.

Sandra González-Bailón: It depends on how you think about it. To me, it’s the opposite of depressing: if we could anticipate and predict these events, it would mean that we live in a deterministic world, right? For me, unintended consequences really offer a window through which freedom and agency can squeeze in.

Even though there are a lot of political organizations and a lot of politically motivated individuals who are trying to organize the next big thing to pursue their cause, it’s very difficult to predict social dynamics. I don’t think that’s depressing. I think that’s actually a reminder that we are not determined, and that nothing in social life is fully determined. At the same time, I think there’s also value in trying to understand how these things happen.

Noshir Contractor: Let’s talk a little bit more about some other examples from your book, because you also look back at the historical development of technologies. I think that’s really important, because we tend to get caught up in the moment and make it sound like what we are seeing today is truly different from what has happened in the past. In many ways it is, but what have your historical insights about technology taught us about how to better prepare to understand the web today?

Sandra González-Bailón: I start the book with a preamble where I explain that the book really is a story of recurrence and change, right? The recurrent aspect is that, for some reason, we keep using the same metaphors to refer to these technological breakthroughs. In the 1800s and 1900s, we talked about the telegraph as the global nervous system of the planet, which is exactly how we talk about the internet these days. So some things don’t seem to change much, right? Human imagination seems to keep traveling the same old clichés when it comes to coming up with metaphors. But what has changed, and changed a lot, is how we use the data that we generate through those communication technologies to better understand their impact on society.

The fact that people suddenly could communicate across continents via the telegraph definitely shrank the world; it reduced social distance immensely. But they couldn’t get the sort of data we can get today. There’s a lot of progress when you look back: with that data, we can now answer many more questions than they could aspire to answer back then. The metaphors haven’t changed, but the answers to the same questions have improved immensely.

Noshir Contractor: One of your recent articles, published in PNAS in 2020, focuses on how exposure to news grows less fragmented due to mobile access. What is it that you found in this article that surprised you?

Sandra González-Bailón: The findings in that article are counterintuitive in the sense that the prevalent view of how people get exposed to news suggests that technologies are trapping people in clusters of like-minded individuals. In the context of news consumption, this means that you would only consume those sources that reinforce your opinions and predispositions. What we find in the article goes against those claims: rather than narrowing your news, digital technologies, and in particular mobile technologies, are widening the range of news sources that you consume.

Digital trace data is very rich, but there are very different ways in which we can collect that data. So one of the things that we do in that paper is to incorporate mobile access to news into our analysis, which changes the conclusions you can draw compared to what you would find if you only tracked people through their desktop computers.

Noshir Contractor: One of the things that was interesting about this study was the kind of data that you used. Can you talk a little bit about the data, which covered a five-year time window involving tens of thousands of panelists? How does that compare with pure digital trace data?

Sandra González-Bailón: Yeah, this is a data set that’s compiled by a media measurement company, and it does rely on log data, right? So they do have mechanisms to track the behavior of their panelists when they are online. What’s a little bit novel is the fact that they also track what people do on their mobile devices. We all know, intuitively and through our own personal experience, that everything has gone more mobile now, right? So from a scientific or measurement point of view, if we only track what happens on our computers, we are missing a huge part of online activity.

The other part is that, of course, the interface matters. The medium that you use as a content provider to deliver that content matters as well. Someone wrote a few years ago that the web was dead because of the rise of the app. Apps are like walled gardens, right? They are not the massive open space that the web is. The reality also is that online activities are more fragmented. One of the things that we also point to in the conclusion of the paper you mentioned is that we are at a crossroads here, where we have to decide how we’re going to collect data moving forward and how we’re going to guarantee access to those walled gardens. Otherwise, even though we have more data than ever before, if only a handful of people can get access to that data, then there’s really not much difference, right?

Noshir Contractor: So there’s a paradox: a lot of data is being generated, but much of that data is not accessible to researchers like yourself to help reach these conclusions. I’m still puzzled by what seems to be a counterintuitive conclusion.

You’re saying in the study that as people access news and other items through mobile devices, we grow less fragmented, yet society seems, at least according to popular media coverage, to be growing more fragmented. How do you reconcile the findings of your research with what we are told in public about us getting more and more fragmented? Does the public media have it wrong?

Sandra González-Bailón: Well, the public media often has it wrong, but in this particular case, it’s a matter of the level of analysis. What the paper does is analyze exposure to news and political content: we’re looking at the information sources that people get exposed to. We don’t really talk about the processing of that information. What are you going to do with that information? Because sometimes, you know, you may read Fox News, but that doesn’t mean that you agree with Fox News. Exposure to content and information is one of those things that was very difficult to measure in the past. Now we have more fine-grained data to measure exposure, but it is just that: exposure to content.

Of course, the second part of the equation refers to the effects of that exposure, right? And I think sometimes in public discussions we conflate a lot of things. We may be living in a very polarized society, but don’t blame digital technologies. What we see when we look at how people use those technologies to gain access to information is that their media diets are actually pretty rich and diverse. Now, of course, that’s not the only reason why we might have fragmentation.

One of the beautiful but also frustrating things about research is that it is tied to very specific questions, and you can extrapolate only so much from it, right? Reality is very complex; there are many moving parts. The conclusions that we draw from our analysis refer only to exposure to political content. It is not true that people get exposed only to a very narrow set of sources that they might already agree with, right? The number of sources you get exposed to widens over time, in part due to mobile technologies. Now, what the consequences of that are is a second question, and there might be another paper following up on that first paper where we consider that question.

Noshir Contractor: So to summarize, what I’m hearing you say is that we can’t blame the web for not giving us wider exposure. What we do with that wider exposure, in terms of polarization and creating echo chambers, is a different matter.

Sandra González-Bailón: Yes. And of course, again, the web is one network, one layer in the very complex media environment we inhabit these days. What happens within Facebook is a different world. Again, Facebook is a walled garden; maybe on Facebook we have this phenomenon of getting trapped in echo chambers and ideological bubbles. Twitter is another layer, right? What happens on YouTube? These are pockets outside of the web, very prominent pockets of activity, and maybe there the answers would be different, right? What we analyze in that particular paper refers only to web activity, to what happens in this vast, open public marketplace of ideas called the web, right? Apps are a different world.

Noshir Contractor: One of the things that I also want to talk about is an even more recent publication of yours, again in the Proceedings of the National Academy of Sciences, focusing on the role of bots versus verified accounts during contentious political events. Can you tell us a little bit about what you found in that study?

Sandra González-Bailón: Yeah, so that study was also motivated by journalistic accounts of how much influence bots have, these automated accounts that are engineered to meddle with organic communication on social media. Of course, the very first question you have to answer is how you are going to identify bots, right? So we capitalize on developments in automated classifiers that use a number of features to predict whether an account is a bot or not. And then we also look at the verified feature that Twitter itself uses to identify accounts of public interest.

And so we come up with three categories of Twitter accounts. First, what we call the media accounts: these are accounts that our classifier suggests are automated, but that have also been verified by the platform. Legitimate news sources often use bots to push notifications onto your feed in a systematic fashion, right? Then we have the bots, which are accounts that our classifier suggests are automated, meaning non-human, but that are not verified by the platform. And then we have the rest, which are what we call the human accounts.

One finding, which is consistent with what prior research suggests, is that a huge volume of the activity around the events that we analyzed comes out of these automated accounts. But we also find that the verified media accounts are more central in the networks of information flow, meaning they were the reference points during these events. So people going on Twitter to find information about the protests that we analyzed tend to pay more attention to these verified accounts and amplify them more often.

Noshir Contractor: So of the two movements that you studied, one was the Yellow Vest Movement in France in 2018, and the other was the act of civil disobedience around the Catalan referendum in 2017. Tell us a little bit about why you chose these two events, and give us examples of where a bot was more or less central than verified accounts in these two movements.

Sandra González-Bailón: The analysis that we ran really pays attention to the overall patterns, the aggregated patterns, right? This is partly to minimize cherry-picking episodes where the one account that turned out not to be a verified account got so many retweets. So we really look at the overall patterns of visibility.

And the reason why we decided to focus on these two events was partly because they got a lot of press coverage suggesting that these manipulated accounts were exacerbating conflict. When you have episodes where people are protesting on the street, things are highly volatile. So if bots had that kind of influence, that would be something that we should know. It’s dangerous, right? They could really make things much worse.

The other reason, which was more about research, is that a lot of the work done in this area focuses only on the US or the Anglo-Saxon world. That restricts what we can learn about how generalizable these dynamics are, and it also restricts the methods and tools that we have at our disposal. One of the things we do in the analysis is look at the sentiment, or the emotional content, of these tweets, and many of these tools are designed for English only. So what we did in the paper is adapt one of these tools for extracting sentiment to Spanish, Catalan and French. And I think there is value in doing that.

Noshir Contractor: So one of the things that I’m noticing, more broadly across these two papers that we’ve been discussing, is that in both cases your findings suggest that the general sentiment of blaming the web is not always well-founded. Is that a sentiment that you see broadly, or does it just happen to be a coincidence in the two papers that we’ve been discussing here? And what are the implications of your view about the role of technology in these two cases?

Sandra González-Bailón: You echo the right spirit of those papers. I think there are a lot of things we could blame social media companies or technology companies for, but we have to blame them for the right things, not for what we think they’re doing wrong. I’m a hardcore believer in evidence-based decision-making. So let’s pay attention to the evidence, so that we can think about how to redesign these platforms to make them do what we as a society want them to do. That discussion has to be based on the best available evidence, and technology companies need to be held accountable, because they do have an impact on the democratic process. But those discussions should be based on the best available evidence and not on newspaper headlines.

Noshir Contractor: So I’m going to put you on the spot, Sandra. You are a scientist in this area, and you pointed out that you would like to be able to influence change in media platforms. Do you have a pet suggestion, based on your own research, that you would like to see media companies adopt?

Sandra González-Bailón: Yeah, I think they do have a role to play in improving the quality of the information environment that we inhabit. They are not the only ones, right? Some of the old players have a responsibility too, and they don’t always do what they should be doing. And I do think that these companies have changed some important parameters. Algorithms, for example: the way they reinforce certain patterns, or the way they float certain information, is a new parameter. We don’t fully understand how those algorithms are shaping everything, including the democratic process. This is not to say that we need to get rid of algorithms, but we have to understand the impact that they have.

These companies often operate on the basis of business models that prioritize certain parameters, which get optimized in these algorithms. And those parameters are not necessarily the quality of online conversations or the quality of our democracy. I do believe that these platforms are not doing their best to facilitate or encourage the sort of healthy conversations that healthy democracies require. And we can do better in that respect.

Noshir Contractor: So the algorithms are optimizing for things, but not necessarily the things that are good for society. They might be optimizing for things that serve the interests of the platforms and their business models, as opposed to democracy, for example.

Sandra González-Bailón: Absolutely. And again, if we were to get together and start having a conversation about what’s good for democracy, I’m sure we wouldn’t agree, right? I think it’s also unfair that sometimes we place demands on these companies that we as a society don’t have answers for. I do believe that we can settle on common ground with enough agreement to facilitate that kind of answer. And then, once we have that answer, we can hopefully work together to make sure that we optimize for those parameters.

Noshir Contractor: I want to zoom out a little bit, Sandra, and ask you where you see the most important research that needs to be done on the web moving forward, either by yourself or by the web science community. What are the big questions that we need to be addressing?

Sandra González-Bailón: So one question relates to the impact of these technologies on democracy in general. I think there’s been a lot of emphasis on data sharing, on forcing many of these companies to offer data. That’s important, but I think more important than having access to the data is defining the questions. It’s very difficult to come up with a data dump that answers every question any researcher may have.

One of the big challenges for us, and one of the priorities we should all be focusing on, is: what are the main questions that these companies, but also the research community, should be working to answer? And what kind of data infrastructure and research infrastructure do we need to be able to answer those questions? Because this requires collaboration, building bridges across labs, and teamwork, and I’m not sure that academic institutions are designed to encourage that kind of teamwork. I think that’s one of the big challenges we face as well. We have to facilitate that kind of collaboration, and I think that’s where we should be putting all our energy.

Noshir Contractor: I want to thank you so much, Sandra, for joining us today. You’ve persuaded me that the lack of determinism in our models is a good sign for society. You’ve also talked us through two examples from your own recent research where people tend to blame technology for certain kinds of phenomena, whether it’s bots during contentious political events or increasing or decreasing polarization. And your research has convinced us that technology is one part of the puzzle, but that blaming technology is not, in and of itself, valid, at least in the case of these two studies. So thank you again so much for joining us, Sandra. I look forward to seeing your continued research in these areas.

Sandra González-Bailón: Thank you, Nosh, and I look forward to meeting you in-person again soon.