Episode 25 Transcript

Nigel Shadbolt: The reason the web was taking off at scale, the reason we have these extraordinary constructs emerging, like the blogosphere, was that human beings were involved: human beings who were incentivized to participate, to share and join information together.

Noshir Contractor: Welcome to this very special 25th episode of Untangling the Web, a podcast of the Web Science Trust. I am Noshir Contractor and I will be your host today. On this podcast we bring in thought leaders to explore how the web is shaping society, and how society in turn is shaping the web.

My guest today is Professor Sir Nigel Shadbolt, one of the founders of web science. You just heard him talk about why studying the web goes beyond the technical. Nigel is Principal of Jesus College and Professorial Research Fellow in Computer Science at the University of Oxford. In 2009, he was appointed, along with Sir Tim Berners-Lee, as information adviser to the UK Government. This work led to the release of many thousands of public sector data sets as open data. He is the chairman and cofounder of the Open Data Institute, and a founder and chief technology officer of the ID protection company Garlik. He is a fellow of both the Royal Academy of Engineering and the British Computer Society, and was knighted in 2013 for services to science and engineering. Nigel has researched and published on topics ranging from cognitive psychology to computational neuroscience and the Semantic Web. Welcome, Nigel.

Nigel Shadbolt: Thanks, it’s great to be here.

Noshir Contractor: Take us back to what prompted you and your colleagues to come up with the idea of taking what was then a relatively young web, and recognizing the importance of creating a discipline called web science. 

Nigel Shadbolt: I began my career a long time ago; my PhD was in artificial intelligence at the University of Edinburgh in the 1980s. And then I’d spent 15 years building an AI group within a department of psychology. I really found that extraordinarily enriching, you know, to understand the basis of human cognition, to understand, if you like, the basis of the existence proof of intelligent systems. And toward the end of my time at Nottingham, I got a series of PhD students who had been looking at this new explosive area of the web. And the web appeared as this extraordinary construct that suddenly brought data at scale together. So that was a turn for me.

And when I moved down to Southampton and joined Wendy Hall’s group, we were very much seeing the web as a decentralized data asset, as well as an extraordinary concept for combining human ingenuity. We can come to that later.

That project that we worked on together, through Directed Word, was called Advanced Knowledge Technologies (the AKT project), and it brought leading universities in the UK together to look at how we could exploit this emerging construct, the web, and tools from knowledge engineering and elsewhere, machine learning, as it was then, to try and understand data and knowledge at scale. And that brought me into contact with Tim Berners-Lee. It was our interest around the Semantic Web that really brought us together, but we were also in contact with people like Jim Hendler, who I’d known earlier, because he also had prior history in artificial intelligence, and Danny Weitzner. So we were sat there sharing ideas around the Semantic Web and the challenges therein. And the more we got into that, the more we felt there was an itch that needed to be scratched, which was all around this idea that too often the challenge became reduced to one simply of technical architectures, where of course, in fact, the reason the web was taking off at scale, the reason we have these extraordinary constructs emerging, like the blogosphere, was that human beings were involved: human beings who were incentivized to participate, to share and join information together. And as we shared our experiences (Danny with a background in law, Jim with a background in AI and cognitive science, somewhat like myself, Tim and Wendy), we realized that there were all sorts of aspects of what we were trying to understand in the web that would never be solved by simply appealing to the technical standards of web servers or web browsers.

So this immediately suggested a wider interdisciplinary need. And we had always been interested in convening larger groups to discuss these wider issues of the impact of the web. And we struggled for some time to think about whether this needed to be convened at all, or would it just simply take care of itself.

It is certainly the case that we were aware that there were cognate disciplines. But the unique phenomenon that the web presented us with was, for the first time, structures at scale that demanded to be explained and understood in their own terms. And we started with examples like the emergence of search engines at scale, like Google, the emergence of the blogosphere, the emergence of the beginnings of those social media platforms, the emergence of large collaborative activities like e-science. And I think we sat down and worked out that we wanted to persuade people that there were scientific questions that sat at the center of this intersection of methods that demanded their own singular attention.

Noshir Contractor: One of the things that you touched on briefly was the Semantic Web. For those who may not be familiar with that phrase, how does the Semantic Web distinguish itself from the web itself? We know the web as a set of web pages in its most primitive form that link to one another. But Semantic Web goes beyond that.

Nigel Shadbolt: It originates out of this really interesting idea, that if you could take some key ideas that were around in artificial intelligence and knowledge representation at the time, and distribute it at scale in the web, there was this notion that a little semantics went a long way. 

Now, what did that mean? It meant that, for the first time with the web protocols, we had ways of persistently pointing to objects of interest in the web, either concepts or relations, things in the world, things in cyberspace; we can argue about what those objects were. But they could be referenced, they could be dereferenced, with a URL: you click on a link, you get something back. So how could you think of the web as a linked graph of connected structure and content? We’re very used to thinking in those terms now, but back then, we didn’t have this way of thinking. So one of the early efforts was to generate a semantic markup language that went beyond, for example, what people understood at the time in HTML. And so the idea was to develop languages or ontologies that could be machine processed to describe the content, the semantics, the meaning of the content on the web.

Noshir Contractor: So can you give us a use case example? If a page has semantic markup, what could it do differently or more effectively than a page that simply has HTML?

Nigel Shadbolt: The Semantic Web of the early 2000s was a really rich place. So it wasn’t widely distributed enough. But there were, for example, ways of linking to academic texts. In fact, again, we see the legacy in the way that bibliographic content is linked and threaded nowadays: there are controlled vocabularies for publications, there are controlled vocabularies for certain sorts of work we do, and we as researchers have our own identifiers that describe something about the world we inhabit. And the vision was to try to do that much more at scale. So, you know, these experiments, these deployments still exist.

I would treat the web as a kind of distributed database. And I could send queries out to the web to find and collect information about all the conferences, the academic conferences in a particular subdomain, who were the key speakers. And that could be interrogated directly off the markup languages and the databases representing the markup languages in those pages.
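As an illustrative aside, the idea of querying marked-up pages like a distributed database can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the tooling used at the time: the triples, the `ex:` and `conf:` names, and the helper functions `match` and `key_speakers` are all hypothetical, and a real deployment would use RDF data and a query language such as SPARQL rather than an in-memory list.

```python
# Hypothetical (subject, predicate, object) statements, as might be harvested
# from semantic markup on conference pages. All identifiers are made up.
TRIPLES = [
    ("conf:ExampleA", "rdf:type", "ex:AcademicConference"),
    ("conf:ExampleA", "ex:subdomain", "web science"),
    ("conf:ExampleA", "ex:keySpeaker", "person:A"),
    ("conf:ExampleB", "rdf:type", "ex:AcademicConference"),
    ("conf:ExampleB", "ex:subdomain", "semantic web"),
    ("conf:ExampleB", "ex:keySpeaker", "person:C"),
    ("conf:ExampleB", "ex:keySpeaker", "person:B"),
]


def match(pattern, triples):
    """Return every triple that fits the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if all(p is None or p == v for p, v in zip(pattern, t))
    ]


def key_speakers(subdomain, triples):
    """Map each academic conference in a subdomain to its sorted key speakers."""
    result = {}
    for conf, _, _ in match((None, "ex:subdomain", subdomain), triples):
        # Keep only subjects that are explicitly typed as academic conferences.
        if match((conf, "rdf:type", "ex:AcademicConference"), triples):
            speakers = match((conf, "ex:keySpeaker", None), triples)
            result[conf] = sorted(obj for _, _, obj in speakers)
    return result


print(key_speakers("semantic web", TRIPLES))
# {'conf:ExampleB': ['person:B', 'person:C']}
```

The point of the sketch is only that once pages carry machine-readable statements, a question such as "which conferences in this subdomain, and who spoke?" becomes a pattern match over a graph rather than a text search.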

Noshir Contractor: All of a sudden, I feel that what I’m able to get from the web today pales in comparison to what you’re describing, we could be getting from the web if you’re using these kinds of semantic mark-up languages.

Nigel Shadbolt: (Laughs). I think many of us wondered, yeah, if it would have become widely distributed —  and it’s about network effects — you could get really powerful affordances. And for a while sites that were originally marked up using semantic web standards were extremely successful. The BBC, for example, ran its Olympics using this markup format, it ran its natural history programs, with a whole set of semantic web annotations that allowed you to literally query the content behind the web pages behind their great natural history programs.

I mean, some people think that there are important elements that have persisted. But the full-blown inference at scale across the web? I think one of the things that got in the way is that the perfect was sometimes the enemy of the good, and the standards that were being promoted spent far too much time worrying about detailed niceties. The original web succeeded because, in a way, it allowed things to be a bit scruffy around the edges; you know, there’s that great phrase: to let the web scale, let the links fail. Pragmatism is always an important feature, I think, in understanding the different forces at work on a web at scale.

Noshir Contractor: You’ve just talked about one example of a challenge. Where in general, in the field of web science, have you seen progress? And where do you see continuing challenges for the next decade of web science?

Nigel Shadbolt: So I think one extraordinarily powerful area (is) the whole understanding of the network structure and topology of the web. And I remember in the original article, an article that Tim and I published in Scientific American, and then again, when we published in Communications of the ACM, we knew a use case would be understanding how to extract insight from the web graph. I think that’s been a tremendously powerful success. I would also say that the push for a certain sort of openness around the underpinning data that was the resource was one of the key elements of the Semantic Web. It was no accident, in a way, that a number of us who were involved in that Semantic Web effort also became involved in the open data movement, because the key to success at scale has been open resources that everybody can exploit. And the greatest example of that in the earliest days of the web was effectively the Google phenomenon. You know, Google became the extraordinary organization it is off the back of open data, crowd-sourced effort, you know, humans making links that expressed their interest and relevance in content.

Noshir Contractor: You spoke about this example of how humans were crowdsourcing the development of links across the web. That is one of the early examples I imagine, of what you have written about in terms of social machines. Tell us what social machines mean to you and what you have been doing and what you’ve learned from that in terms of the web?

Nigel Shadbolt: When we launched the web science initiative, back in 2006, when we were kind of thinking about that as an enterprise, we were very aware that the confluence of challenges, the deep synergies that existed between disciplines, was something that we needed to understand. But of course, it was always understood that the web worked because it connected people at scale. And people are extraordinary information processors. There, of course, we have all the richness of our own humanity connected at scale. And when Tim wrote his book, Weaving the Web, he made a reference in that book to the concept of the social machine as being a world in which the machines did all the kind of routine boring stuff and that allowed humans to flourish. The truth, of course, decades later is somewhat more complex. Some people worry that it’s the people being given the boring tasks to do. Why aren’t those things fully automated by our machines?

But what we do see in a social machine is this intermix of data assets, linkage, algorithms at scale on the web, and human cognitive capacity, and that interpenetration of machines and human problem-solving at scale defines a social machine. Now some of them are realized very simply. So the social media platforms, which link people together (and largely, it’s the linking and sharing of experience and moments that define them), when they began had very little in terms of fundamental processing of the content of those interactions. As time has gone on, the amount of inference that can be drawn over our interactions, the amount of additional services that can be woven into a web at scale, from query answering to speech recognition through to photo recognition: there is so much now that machines are doing to organize and manage our own information that the social machine construct is very helpful.

It reminds us that really quite complex phenomena are made up of components of quite simple interactions, you know, likes and preferences, linkages, assemblages, aggregations. And in what we’ve done in the past, me and my team and others, in defining how to provide a classification (a taxonomy of different kinds of social machines, how to understand their characteristics), we see a spectrum from highly routine automated forms, where various forms of citizen science would count, through to much more, effectively, creative exploratory tools.

Noshir Contractor: What would be one of your favorite social machines that most people might not have heard of?

Nigel Shadbolt: Well, I don’t know that they wouldn’t have heard of it, but one that I admired from the outset was a particular crowdsourcing platform, Galaxy Zoo. These were astronomers who didn’t have enough fundamental research funds to spend on software engineers to build them automatic classification and recognition software. And they had all of this data coming in from the sky surveys, endless numbers of pictures of nebulae and stars, etc., and not enough machine processing to classify it.

And what they did in that work was provide a platform that allowed people to participate, and train them and induce them to be able to classify objects of interest. The human recognition and visual system is extremely powerful at categorizing and recognizing subtle distinctions, and still very often best in class at recognizing differences and equivalences.

And so we had millions of images being processed by hundreds of thousands of volunteers, who ripped through this and began to make actually individual discoveries as well. Famously, participants in this citizen science effort are featured as authors on scientific papers of newly discovered astronomical phenomena. And that’s a lovely example where, again, what started out as a necessity for the scientists became a valuable resource in and of itself.

It introduced a wide community to the challenges of astronomy, it solved the problem for them. And along the way, they learned something really interesting, which is: allow the people to participate. Because originally they had allowed a great deal of exchange or chat between users of the platform. Through time they realized that, in the side-channel conversations, some of these volunteers were becoming experts in their own right at forms of exotic phenomena that the astronomers were either too busy for or hadn’t noticed themselves.
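As an illustrative aside, one simple way a platform could combine many volunteers' classifications of the same image is a majority vote with an agreement threshold. This is a minimal sketch under stated assumptions, not a description of how Galaxy Zoo actually aggregates results: the function name `aggregate`, the `min_agreement` parameter, the image identifiers and the labels are all hypothetical.

```python
from collections import Counter


def aggregate(votes, min_agreement=0.6):
    """Combine volunteer labels per image by majority vote.

    votes maps an image id to the list of labels volunteers gave it.
    Returns image id -> (label, share), where label is None when the
    most common answer falls below the min_agreement share, so the
    image can be flagged for expert review instead.
    """
    results = {}
    for image_id, labels in votes.items():
        if not labels:
            # No volunteer has looked at this image yet.
            results[image_id] = (None, 0.0)
            continue
        label, count = Counter(labels).most_common(1)[0]
        share = count / len(labels)
        results[image_id] = (label if share >= min_agreement else None, share)
    return results


# Hypothetical volunteer answers for two images.
example_votes = {
    "img-001": ["spiral", "spiral", "elliptical", "spiral"],
    "img-002": ["spiral", "elliptical"],
}
print(aggregate(example_votes))
# {'img-001': ('spiral', 0.75), 'img-002': (None, 0.5)}
```

Images where volunteers disagree are left unlabeled here, which is one plausible place for the kind of side-channel discussion mentioned above to take over.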

Noshir Contractor: Which brings us to some of your more recent research. You made reference to the fact that social machines started out as computers doing the boring part and allowing the humans to focus more on the creative part. And you immediately gave a caveat that some people think that that might have flipped. The new work that you’ve been doing is called human-centric AI. To what extent are you concerned that it will stay human-centric?

Nigel Shadbolt: If we go back to our concerns, right back, when we began the web science effort, it was very much from the outset about recognizing the intrinsic value and worth of the human element, you know, in all of this. And in an age of a resurgence of AI and algorithmic decision-making, new powerful methods being deployed, the concern is, do we retain and confer the values that matter?

We want, essentially, to imagine building systems which augment us and don’t oppress us. And I think that’s why you see what some people call the renaissance of ethics in scientific areas where the concern is the maintenance of human values, and certainly AI ethics has a huge amount of attendant work around it.

And in my group, that materializes as concerns around choice: do we as individual consumers and citizens actually have effective choice when it comes to how our data is used? How our data is actually analyzed and aggregated? How can I effectively opt out? How can I exert more self-determination? That’d just be one example. The second would be, do we think hard enough about age-appropriate design, about how, as humans develop and grow up, their sensibilities and their ability for agency change? And persuasive design methods, which are all about clever software engineers working out how to hit the sweet spot to get the kid to click: we’ve got to think hard about the pros and cons of all of that.

Noshir Contractor: You know, one of the things that has changed since you first began to study web science, is the ubiquity of data. Initially, people were using it on desktops and creating a certain small category of websites and so on. How has the ubiquity of data and its impact on the use of AI changed the way we think of the web and web science?

Nigel Shadbolt: I think it’s a fundamental change. I think you’re absolutely right. In a way, it was the game changer for AI. I think AI, as a field, got the web quite late. We were busy building knowledge-based systems, we were busy very much with this kind of desktop or, at best, server-based model of our knowledge assets. And then suddenly, connectivity was out of the box.

And it used to be a very significant effort to arrange and integrate your data assets together. And data curation was a huge challenge. Suddenly, we have billions of pages of English text to analyze, billions of images, and so on. So that’s been a game changer.

Of course, it’s introduced new classes of concern, which is that, at scale, modern algorithms are extremely data-hungry. And do we know enough about the characteristics of data to understand that the outputs (the classifiers, the decision makers) are giving results that are representative? Well, they may be representative of the data that’s been collected. But is that data, even though it’s at large scale, representative of the problem you want to solve? And so I think we’ve become much more aware in data science of the need for understanding the qualities and characteristics of good and effective datasets for training.

There’s a considerable concern now around the data assets themselves: how can we guarantee that they have not in some form or other been tampered with? How can we authenticate them? And my work with the Open Data Institute is very much now around things like data assurance; we talk about data institutions, new ways of putting governance structures in place. And again, it’s not just the technical; we need technical architectures to deliver the web at scale. But we also need institutional architectures to make sure that data is held and governed ethically and responsibly.

Noshir Contractor: Nigel, you were there at the very onset of both the web and web science, certainly. Where do you see the field of web science going? Unlike some other fields, the early stages of web science were nurtured by the Web Science Trust, by the Web Science Research Institute before that, and that set up a trajectory that was somewhat different from perhaps the launch of other disciplines. How do you see that has shaped web science? And where do you see it going now?

Nigel Shadbolt: At the time, we felt we wanted to use the convening power we had to draw attention to urgent research questions. In some sense, the questions were sufficiently urgent that they were going to get attended to. We thought it was important that there was a framework and in fact, in trying to work out how we should be as broad-minded as possible when it came to methods and methodology and techniques, we spent quite a lot of time convening groups from other disciplines together, we spent a lot of time imagining what curricula could be that weren’t just about network science, for example, they went broader than that.

So the question is, how do you stop it being about everything? You know, how do you provide a practical, pragmatic solution? And I think that the problems that we sought to understand are still problems that we are seeking to understand. We have a better understanding, but I wouldn’t say we have perfect understanding. I remember that particular meeting, we tried to imagine what the grand challenges for our field were. And the ubiquity of data and the power of computing, and the developments in cognate disciplines, and just the sheer amount of development that there has been in network-based analysis and graph-based databases, for example, knowledge graph work, has provided for a real acceleration of work in that field. And I think we could take a number of areas and say that, similarly, this was a topic that web science called out, people were contributing to it, and it has succeeded through time, developed and matured. It doesn’t worry me too much that people necessarily say, “I want to put the label web science on this.” We often see in the development of subjects that field labels come and go. And indeed, what remains are the questions and the methods that have been put in place.

And I think what we would still argue is fundamentally important to web science is to be inclusive and admit and embrace diversity of disciplinary work. For me, the danger signs are always when people begin to patrol the boundaries of their discipline in a way that becomes exclusionary.

Noshir Contractor: Well, Nigel, you’ve been an incredible champion of this interdisciplinary work. And I suppose it comes somewhat easy to you, given that you yourself have had an interdisciplinary background, you’ve been interested in computer science in AI and philosophy and psychology. And so it makes sense, Nigel, that someone with your own interdisciplinary background would be championing for exactly that in the field of web science. And we’ve all been the beneficiaries of that. So thank you, again, Nigel, for what you’ve done to help advance web science. And certainly thank you very much for joining us today to share some of your ideas and your concerns.

Nigel Shadbolt: It’s been a great pleasure, Noshir — very, very good to talk with you.

Episode 23 Transcript

Rory Cellan-Jones: In the early stages, quite inexperienced, solo “bedroom developers,” as they were called, could make a big impact. And Edward Bentley, age 16, was one of them. He was this friend of my son’s, lived about a mile away. He developed this game, put it on the App Store. And one evening, the phone rang at the family home, and his father got a phone call from Apple on the West Coast saying, “Mr. Bentley, your app is being made App of the Week. And you’re going to need to open a bank account here for all the thousands of dollars that you’re going to earn.” And he was mystified. Turned out his son had put his dad’s name against this because he was too young to be officially the owner of the app.

Noshir Contractor: Welcome to this episode of Untangling the Web, a podcast of the Web Science Trust. I am Noshir Contractor and I’ll be your host today. On this podcast we bring in thought leaders to explore how the web is shaping society, and how society in turn is shaping the web.

You just heard from my guest today, Rory Cellan-Jones, talking about how the introduction of app stores to smartphones produced an enormous amount of creativity on the web and cemented the social smartphone era. Rory has been a reporter for the BBC for 40 years, covering business and technology stories for much of that time. At the beginning of 2007, he was appointed technology correspondent to expand BBC coverage of the impact of the internet on business and society. His first big story was the unveiling of the iPhone by Steve Jobs, something that we will be talking about today. He now covers technology for television, radio and the BBC website. And in 2014, he began presenting a new weekly program, Tech Tent, on the BBC World Service, a personal favorite of mine. He’s just published a new book, titled “Always On: Hope and Fear in the Social Smartphone Era,” and he spoke about that book at the recent ACM Web Science Conference. Welcome, Rory.

Rory Cellan-Jones: Good to be here.

Noshir Contractor: I want to start again by thanking you for all the incredible coverage and storytelling and weaving that you have done over the years as a technology correspondent. And most recently, in the book that I have enjoyed reading, titled “Always On: Hope and Fear in the Social Smartphone Era.” I want to start by trying to punctuate how you define the social smartphone era.

Rory Cellan-Jones: Well, my job as technology correspondent at the BBC started, as you say, in January 2007, and the first big story I covered was Steve Jobs unveiling the iPhone in San Francisco, which was an extraordinary event and an extraordinary performance by a brilliant, charismatic and very difficult man.

I made a big bet, really, on that event. I responded to complaints to the BBC that we were plugging a product on our nightly news program with my story by saying, ‘Well, I think this could end up being a Henry Ford Model T moment,’ which I thought at the time, maybe I went over the top there. But I think it’s proved to be correct. It was the moment that smartphones really became mainstream. There had been smartphones, but they’d been clunky, difficult devices. The iPhone transformed all that and brought mobile computing to the masses. But about the same time, all sorts of things were happening all together, if you think about those years. 2004, Facebook was created. 2005, YouTube was created. 2006, Twitter came along, and then 2007, the iPhone. And what you had, quite quickly, was not just these incredibly powerful devices in everybody’s pockets, but these extraordinarily powerful social networks. And my sort of thesis is those two combined to have an extraordinary impact on the way we lived.

Noshir Contractor: So back in 1991, Sir Tim Berners-Lee had released into the wild the World Wide Web. But as you said, in the run-up to 2007, we had the release of platforms like Facebook and YouTube and Twitter. But the central piece, as I understand, is that all of this changed dramatically when all of these events, activities on the web now became possible on mobile.

Rory Cellan-Jones: Let’s think of what Sir Tim said about the web when it came along. He talked about it as a ‘read-write web.’ So I grew up in the age of television, the great mass medium that was this sort of big box in the corner of the room, which you did not interact with. The web was supposed to be an interactive medium; you know, we were all supposed to participate in it, build it, create it, and so on. And that did happen a little bit at the beginning, but not a vast amount. Don’t forget that for millions, billions of people, there was no access to the web because they didn’t own a computer. It was a fixed-line experience. It was an experience largely confined to the office and the home. So the arrival of smartphones, and the connectivity they provided, was a mass democratizing force along with those social networks, and we can discuss later the negatives as well as the positives. But what that unleashed was not just a democratization of the web, but a huge wave of creativity, of content being created by these extraordinary devices.

I started in television in 1981, a very long time ago. I could no more have thought of creating my own television program all on my own than I could have thought of landing on the moon. But these devices enabled anyone to create quite sophisticated content, which we see done today on YouTube, and so on. There are lots of problems from that. But that did begin to really bring Tim Berners-Lee’s original vision to life in a very full way.

Noshir Contractor: One of the things that you chronicle in the book is that at the time there was a battle between what you call the bell heads, and the net heads, the people who came from the phone systems and those who came from the computer systems. 

Rory Cellan-Jones: Yeah. That culminated, obviously, in the arrival of the iPhone. Up until then, don’t forget, the mobile phone industry had been around since the mid-80s. But it was telecoms people, it wasn’t so much software people. And from 2007 onwards, we know who the big victors in this battle were. They were Apple and Google. I mean, Google, obviously, in some ways, much more important, and Android is on, you know, 80% of the world’s phones. So it was a triumph of software and apps over just the pure sort of telecoms engineering types.

Noshir Contractor: And alongside the fact that people are now using the mobile platform to engage with Facebook and YouTube and Twitter. One of the early applications also in the mobile space was mobile payments, which curiously got its innovation and start in Kenya.

Rory Cellan-Jones: I mean, what’s fascinating about the whole mobile payments world is that it took off, yeah, really, through M-Pesa, which allowed people in Kenya to transfer money easily, and not between smartphones but between very basic phones. It was far ahead of anything that happened, for instance, in the United States. And actually, the United States, even compared to Europe, is way behind, has always been way behind, in the payments area. Checks, I gather, are still quite big in the United States, whereas I’ve not written a check for years. So I think it’s all about need. There was a need in places like Kenya which there wasn’t quite in the United States. You had, you know, obviously a reasonably sophisticated payment system in the United States. You didn’t have that in Kenya, but they managed to leapfrog to being ahead.

Noshir Contractor: Alongside mobile payments, another area that also became, quote unquote, a killer app on mobile platforms was gaming. You talk about a very simple app developed by a 16-year-old, Edward Bentley, called The Impossible Game.

Rory Cellan-Jones: That’s a fun story. I mean, one of the things to remember is that although the iPhone was, of course, extraordinarily important when it came out in 2007, firstly it was quite a primitive device. It only had 2G. And secondly, it only had the apps that Apple put on it. And actually the arrival of Apple’s App Store the following year, and then Google’s Play Store, was what really cemented this revolution.

Steve Jobs was the ultimate control freak, didn’t really like the idea of people putting any old software on his phone, but was persuaded eventually to open this app store. And that, you know, sparked this extraordinary wave of (a) creativity and (b) economic activity. And in the early stages, quite inexperienced, solo “bedroom developers,” as they were called, could make a big impact. And Edward Bentley, age 16, was one of them. He was this friend of my son’s, lived about a mile away. He developed this game, put it on the App Store. And one evening, the phone rang at the family home and his father got a phone call from Apple on the West Coast saying, “Mr. Bentley, your app is being made App of the Week. And you’re going to need to open a bank account here for all the thousands of dollars that you’re going to earn.” And he was mystified. His son had put his dad’s name against this because he was too young to be officially the owner of the app. But of course, Mr. Bentley senior was very pleased with the money that rolled in.

Noshir Contractor: And I’m sure if it was Americans who were paying for it, they were sending it by check at that time.

Rory Cellan-Jones: Yes, yes. (Laughs).

Noshir Contractor: Of course, on a more serious note, you also talked about the sudden surge of messages that Biz Stone, cofounder of Twitter, began to get about a country that he hadn’t heard of called Moldova.

Rory Cellan-Jones: I talk in the book, obviously, about the positives and the negatives of social media. And we were incredibly optimistic, around 2011, 2012, about the impact social media was having. Biz Stone talked to me about the moment that really struck him, when he was suddenly being told that his company, Twitter, was helping to sort of foment a revolt in Moldova against the authorities. But more so the Arab Spring. I mean, don’t forget that Facebook was given a lot of credit. So that was the time when social media, obviously only made possible by smartphones, was an incredibly democratizing force. That’s what we felt back then.

Noshir Contractor: One of the things that you also point out is that the smartphone gave a big boost to AI, because it all of a sudden made a lot more data available that could be used by AI.

Rory Cellan-Jones: It was a sort of two-way relationship. AI helped, you know, make a lot of the things you do on a smartphone much more sophisticated. But the big breakthroughs in AI in the last decade in particular have been fed by vast amounts of data. And don’t forget, one of the huge changes the smartphone has brought is in the way photography works, in the sheer volume of pictures we’re taking. Teaching computers to, you know, recognize, for instance, the difference between a dog and a cat was one of the great triumphs of AI over the last decade. The sheer volume of data from all these billions of smartphones, and these people taking pictures of everything they saw, was one of the things that helped fuel that advance.

Noshir Contractor: And one of the things, though, that has now been fomented is a fear of what can happen with all of these data. You did an interview with Stephen Hawking, where he famously said that AI will make humans obsolete.

Rory Cellan-Jones: Yeah, this was an extraordinary interview. And this was quite early in the big conversations that we’ve had about AI; it was 2014. The way an interview with Stephen Hawking worked is that you had to send off the questions in advance. And he would then write the replies, and then eventually you’d record it. And I sent off half a dozen questions about this, that and the other, with a final question about ‘Oh, what do you think about AI?’ And that prompted his extraordinary answer, which basically said that if full AI was developed, it would be smarter than human beings and would therefore see no real use for us, and we would become obsolete. And it was such an extraordinary statement that I got very excited. And then I realized that this wouldn’t be news until he actually said it. And he got ill for a while. And it was about another six weeks before we could actually record the interview, where he pressed a button on his computer and his answer came out and rocketed around the world and helped to spark the ongoing debate we’re having about the ethics of AI.

Noshir Contractor: And in this he was joined by other comments. Elon Musk talked about AI being the biggest existential threat. Dame Wendy Hall, who you and I know from the web science community, is quoted as saying that AI might evolve faster than us and we might end up being slaves of the machine.

Rory Cellan-Jones: On the other hand, there was a certain amount of backlash when the Stephen Hawking interview came out, from people who were actual practitioners, who thought he was probably worrying about the wrong thing. And I think as years have gone by, we’ve all begun to think maybe we shouldn’t be worried about the kind of Terminator-style vision he was painting. There are far more imminent and immediate concerns about AI, things like bias being built in.

And there was an interesting postscript to that interview in the book. One of the great figures in AI, certainly in this country, is Demis Hassabis, the founder of DeepMind, which is now owned by Google. He told me that he’d gone and seen Stephen Hawking some months after my interview. Hassabis felt that he put Hawking’s mind at rest to some degree by explaining how far away that kind of vision of artificial general intelligence was.

Noshir Contractor: One of the things that I want to turn to is your interview with Sir Tim Berners-Lee, who, of course, in this very momentous Olympic moment in July of 2012, sent out that message, ‘This is for everyone,’ and caught a lot of journalists and the general public by surprise.

Rory Cellan-Jones: That was particularly the Americans, the NBC commentators, who said, ‘Who is this guy?’ And someone said, ‘Maybe you should google him.’ And of course, the point is that Google would not really have happened but for Tim.

I put that point as the high point of optimism about this area. I was actually there in the Olympic Stadium in London when that opening ceremony was happening. It was an amazing evening. And we did feel incredibly positive about all these developments: about the web, about the mobile economy, about social media. And I’ve interviewed Tim Berners-Lee over the years, and in the last two or three years, his mood has darkened so much about what has been done with his creation. And he told me that what had really woken him up was the Cambridge Analytica affair, and the way that he saw the web being used for malign purposes: for persecution of minorities, for manipulation of elections, and so on. And he said that for years he hadn’t worried when people said there are all sorts of bad things on the web. He said, ‘I mix with the people that I want to talk to on the web. They’re all really interesting. I just don’t interact with those people doing that bad stuff.’ And then, he said, he came to realize that those people doing the bad stuff on the web, as he put it, ‘these people vote.’ In other words, they can determine the future of my country and other countries. And therefore, I need to worry if they are being manipulated in malign ways.

Noshir Contractor: Given that the smartphone has created such incredible opportunities for surveillance, we now have to think about ways in which we can be made aware of, and have full control over, who is surveilling us and how we are being surveilled. You spoke with James Chappell, cofounder of Digital Shadows, a company that is trying to help us in this enterprise. And he is quoted saying something we’ve heard many times: if you’re not paying for something, you are the product.

Rory Cellan-Jones: Yeah, I took my phone to him just so that he could explain what was inside it and how it was tracking me. He took me through just how many different sensors there are in a modern smartphone, and how many different radio systems (I think he counted about five), providing this huge flow of data, which we have been providing, until recently very unconsciously, to advertisers. And of course, it’s advertising money that fuels the modern web, and is the source of the huge power that companies like Facebook and Google have. For instance, every time you sign up to an app, you are very often signing up to being tracked wherever you go on the web. That’s why that pair of shoes that you happened to look at yesterday keeps following you around on the web. And we have begun to have the debate about whether we’re comfortable with that. And it’s a difficult debate because we get something from it, i.e. we get free services. Google, Facebook, and whatever are free to us, but in return, we are consenting to being tracked. And of course in the last few months, Apple has brought in this new system whereby you’re asked if you want to be tracked, and of course that is changing the balance of power on the internet. There’s a debate to be had about whether Apple’s motives are as pure as it says, because it’s not really in the advertising business. But in any case, we are beginning to have that debate, because the other thing that’s been very prominent, just in recent weeks, is the use of these devices for surveillance by governments and cybercriminals. We’ve seen that story with the Israeli company providing software which can effectively turn your phone, your iPhone, on, turn the camera on, turn the microphone on, and spy on you incredibly effectively. And that’s obviously a big challenge for the mobile phone industry and a big, big concern for all of us.

Noshir Contractor: That is indeed a very scary story. On the potentially positive aspects of monitoring, a lot of the ways in which the smartphone also has the ability to monitor us is in terms of health related issues. 

Rory Cellan-Jones: Health tech has become a real interest for me. And it’s been a global interest, obviously, in the last year during the pandemic: what role could smartphones play? For instance, there have been a number of contact tracing apps developed. But for me, personally, the interest has been that I was diagnosed with Parkinson’s a couple of years ago. And so I’ve taken an interest in what the technology can do to help me, and there’s quite a lot of work going on. It’s more in monitoring, rather than treating, the condition at the moment, because the condition is quite difficult to monitor. If you’re a patient like me, you see your specialist once every four to six months, and they say, ‘How’s it gone?’ And you think, ‘Kind of okay, too hard to tell, really.’ But I’ve been taking part in a trial that uses sensors. The hope is to develop a smartwatch, or a more sophisticated version of the Apple Watch, perhaps, that would measure your symptoms on an ongoing basis. And for instance, it would be able to tell whether, an hour after you take a pill, which I take, you know, four times a day, your symptoms have improved or not. So there’s a lot of work going on in that area, and health tech generally is a huge, exciting and potentially life-changing area.

Noshir Contractor: And of course, talking about Parkinson’s disease and the fact that you’ve been so open about it, you narrate in the book the story about your conversation with then producer Priya Patel, who first sort of prompted you into putting a tweet out about this to share this news with the world.

Rory Cellan-Jones: I was doing a live broadcast about 5G one day on breakfast television (I’d been diagnosed some time earlier). I was talking about it, and my hand was shaking quite violently. I didn’t realize at the time. But then this great producer said to me, ‘Have you ever thought about going public? Because that was pretty obvious.’ And I said, ‘Yeah.’ And I just sent out a tweet. And it showed the positive side of social media, because within milliseconds, it felt like, I was getting huge amounts of response, and, you know, very warm and helpful and positive responses. That was great. There was one person out of thousands who said, ‘Oh, you’ve been standing too close to a 5G mast. That’s why you got Parkinson’s,’ but I was able to ignore that.

Noshir Contractor: Unfortunately, the world was not able to ignore that during the pandemic, where once again, the 5G virus theory was raised, especially in the UK.

Rory Cellan-Jones: I’ve spent a lot of the last year covering that. And one of the things that makes me actually quite angry is how much nonsense is talked generally about technology in that area, and the ridiculous rumors about some connection with the virus. I mean, there’s a spectrum. There are people who, you know, justifiably, I suppose, have concerns in general about the impact of mobile technology, and whether it’s causing them harm. I don’t believe it is causing them harm, but a lot of them genuinely do. But then there’s the end of the spectrum which goes way over into these wild conspiracy theories that say it was only because 5G was switched on in Wuhan that the virus started, or that it’s making people more vulnerable to the virus. And as a non-scientist, I’m very humble in front of science. So I trust the science. And I listen to the majority scientific opinion, just as I do on climate change. And the majority scientific opinion says this technology is not harmful.

Noshir Contractor: On a potentially more hopeful note, you discuss what the pandemic would have looked like if, instead of being COVID-19, it was COVID-05.

Rory Cellan-Jones: I think of how I’ve got through this pandemic, which has obviously been difficult, and how many, many millions of others have got through it. And if this had happened in 2005, it would have been just about impossible for me to carry on working from home. I use smartphones and great fast connectivity, which I didn’t have in 2005, to do my work. It would have been very difficult to shop as effectively from home. It’s not just that there’s better connectivity. It’s that the arrival of the mobile phone kind of supercharged the online economy, made things like, for instance, home delivery of food more economic, and gave them scale. So without those kinds of facilities, those sorts of services that have come in the smartphone era, it would have been pretty challenging.

Noshir Contractor: There is a sweet irony though, that even though we talk about the pandemic as being in lockdown, we are still seeing the value of the mobile phone. Because even while we are locked down, a lot of the services that we rely on are indeed mobile services, or mobile-enabled services.

Rory Cellan-Jones: That’s a very good point. Yeah, I mean, all of those delivery services, all of those couriers, they’re all powered by apps. It’s the app economy that has really come of age during the pandemic.

Noshir Contractor: Thank you again so much for taking us through the issues, the hopes and fears, around how the smartphone has helped us navigate the web and leverage the web in ways that we could not have imagined before the launch of the smartphone. I also want to thank you for all your incredible coverage, making all this technological progress accessible to people around the world. As I said, I’m personally a big fan of the show on the BBC World Service. And I certainly recommend your book to anyone who wants to get caught up quickly on the history, in terms of both the hopes and the fears, of the smartphone. Thank you again for joining us today.

Rory Cellan-Jones: Thank you. It’s been a lot of fun.

Episode 22 Transcript

Pablo Boczkowski: When I think of how my daughters access the world of information, how, for instance, they do homework, with three screens, not two, so they have the computer screen for work, they have their phone next to the computer screen where they are monitoring Snapchat, and they have the TV, where they’re binge-watching their favorite show, and all of that at the same time. So in order to understand their world, and how much their affect and their sociality really resides on the screen,

Noshir Contractor: Welcome to this episode of Untangling the Web, a podcast of the Web Science Trust. I am Noshir Contractor and I will be your host today. On this podcast we bring in thought leaders to explore how the web is shaping society and how society in turn is shaping the web.

Our guest today is Pablo Boczkowski. You just heard him talking about how his experience as a parent impacts how he thinks about his own research. Pablo is Hamad Bin Khalifa Al-Thani Professor in the Department of Communication Studies at Northwestern University, as well as the founder and director of the Center for Latinx Digital Media, where he hosts his very own podcast titled El Café Latinx. He’s also the cofounder and the co-director of the Center for the Study of Media and Society in Argentina, and has been a senior research fellow at the Weizenbaum Institute for the Networked Society in Berlin, Germany. He’s the author of six books, four edited volumes and over 40 journal articles. Three of his books are being published in 2021. They are Abundance: On the Experience of Living in a World of Information Plenty, published by Oxford University Press; The Digital Environment: How We Live, Learn, Work, Play and Socialize Now, with Eugenia Mitchelstein, at MIT Press; and The Journalism Manifesto, with Barbie Zelizer and Chris Anderson, published by Polity. His work was featured at a Meet the Authors session at the 2021 ACM Web Science conference. Welcome, Pablo.

Pablo Boczkowski: Thank you very much, Noshir. It’s a pleasure to be here. Thank you for having me.

Noshir Contractor: You’ve had an incredibly productive 2021. And I’m not even sure which of these three books to start with. But let’s start by talking about the one that I know has gotten a lot of attention. And it’s the book titled Abundance. Tell us a little bit about what got you interested in this particular topic, and why you chose to title the book Abundance.

Pablo Boczkowski: Well, Abundance is a book that came out in May of this year, but it was in the works since March of 2016. Abundance draws from two major sources of information. The most important one is 21 months of fieldwork in Argentina, mostly in Buenos Aires and its suburbs, but also in several provinces, amounting to 158 interviews. And about a third of the way into the field research we conducted an in-person survey with a nationally representative sample of people to get a better sense of the land, some of the larger structural issues.

The research project started generally as an exploration of the interrelationships between the consumption of news, entertainment, and digital technology, in particular mobile devices and social media platforms. And in December 2017, we ended the fieldwork. About six months after that I was in Buenos Aires. I was traveling with my eldest daughter, and we were walking down this avenue, which is the main artery in Buenos Aires, a very populous city of about 4 million people. And I saw an image that really stuck with me. It was an image that is sadly quite familiar in many large metropolises around the world. There were two people ostensibly living on the street. This was maybe 7 or 8 p.m., so it was dark already; it was wintertime. They were sitting next to each other on a couple of worn-out chairs. They were surrounded by cardboard boxes turned upside down as if they were somehow demarcating their semi-private space, you know, within the public space. They were facing the street with their possessions tucked away between their backs and the walls of a building. And they had sort of an improvised dinner table, which essentially was, as far as I could see, a large cardboard box turned upside down. They were having dinner; they were eating from a plastic container. And they had a can of Coke next to it.

And all of this was very familiar, sadly. What really caught my attention was that one of them was holding a mobile phone, right, that they were both looking at while they were eating. So there was that tiny bit of light emanating from the phone. And it was a little bit, if you wish, a popularized 21st century version of people eating in front of the TV as the iconic media moment of the 1960s, right?

So the reason why I mention the story is that I had been wrestling with many themes from the fieldwork and many findings from the survey. But what that image did for me was make coalesce what ended up being the main topic of the book, which was the contrast that even in an extreme situation of material scarcity they were connected to a world of abundant information.

Noshir Contractor: Well, the story is an extremely powerful and evocative way of capturing the central thesis of your book, in terms of seeing material scarcity coexisting simultaneously with the abundance of information in the digital environment. A lot of the time what you’re describing as abundance is equated with terms like information overload, or things like data smog. How do you distinguish what you’re doing as being perhaps more celebratory than words like information overload that make it sound more negative or pejorative?

Pablo Boczkowski: That’s an excellent question. There is a long tradition of thought dealing with this notion of information overload as the umbrella term. All the work about information overload has a few characteristics that cut across most of the scholarship. One of them is the idea that there is an optimum amount of information, and that after you reach that threshold, it sort of is an inverted U shape in which you start getting diminishing returns. Another is the idea that the information is used to make decisions about which there is a right and a wrong. And the notion of information overload tends to focus on the cognitive side of the human experience, and much less on other dimensions of experience. If you look at the terabytes and terabytes and terabytes of information that are consumed today, most of this information is not consumed on a daily basis by most people to make decisions. It is consumed to entertain themselves on Netflix, to learn about others on social media and to express themselves on social media, right? And there is really no optimum. So what is the optimum number of episodes of your favorite thriller that you should watch on the day of release of a new season? How many hours should you spend on social media? It’s the same question as how many hours you should spend socializing with your friends at the park. There is no real optimum there. It is situationally dependent.

And when you consume that information, you’re not only processing it cognitively, you’re living it emotionally and relationally. So information overload has this discourse of deficit associated with it. I wanted to move away from that. So the notion of abundance is much more sort of agnostic with regards to valuation. The idea is that whether something is positive or negative depends on the situation and the values that people assign to it, that there is no right or wrong answer to most of the uses of information, and that it’s not only about the cognitive, but also the emotional and the interpersonal.

Noshir Contractor: Well, I think you just absolved a lot of people who have been feeling guilty about the amount of time they have been spending doomscrolling and binge-watching. And now all of a sudden, they’re gonna feel good about the fact that they are doing exactly what you’re calling for. And that is wallowing in the abundance of information that they receive on the web.

Pablo Boczkowski: I have a funny story. My favorite show on Netflix is Money Heist. When season number three came out, I told everybody, ‘The day it comes out, I’m not leaving my apartment.’ People laughed, but I didn’t feel guilty at all.

Noshir Contractor: That basically says that you live up to your research in your own practices as well. Good for you! One of the things that is a recurring theme, both in this book and in other work that you’ve done, is the observation that a lot of people who are looking at the study of media in general, and web science in particular, tend to focus on what is happening in the global north. You have made a very concerted effort and a very passionate plea for being able to broaden that stage to include the global south, in the case of Abundance, specifically Argentina. Why Argentina?

Pablo Boczkowski: There are three factors that I think make Argentina a very, very suitable national setting for the questions that I posed in the book. The first one has to do with the issue of material scarcity. You know, when we talk global north, we talk about 14% of the global population, between 13 and 14%. So it’s a minority in statistical terms, and these are countries with much more prosperous economic conditions. And they tend to be countries with more stable political situations. Now, most of the world is not like that. The other 86% registers much less prosperous economic conditions, much more income inequality and inequality in terms of social capital, access to opportunities, etc., etc., and political environments and social institutions in general that are weaker and more uncertain. So Argentina sits a little bit in the middle. It is a middle-income country by World Bank standards. It has a long and very sad history of political and economic instability. During the 20th century, it had the recurrent cycle of democratic governments being interrupted by dictatorial regimes.

I mean, a lot of the discussion of information overload, etc., etc., most of the research has been done in the global north, and it assumes access to material resources; it takes that for granted. Looking at Argentina shows that you cannot really take for granted access to the wealth of information. So, for people to be able to access WhatsApp (and WhatsApp is by far the most popular platform in the country, more than Facebook), people have to make much more of an effort, you know, to get a smartphone, relative to their income, than they would have to make in a country like Norway, or Germany, or Canada, or the US or the UK. So it shows, you know, how much people care about this world of information, and the lengths to which they’re willing to go in order to access it.

And therefore, it puts the issue of inequality in a different light. The second reason why Argentina is a very good case for this is that, as I said before, people access this information and use these devices, these platforms, not only to make decisions, instrumentally, about work settings, but for the most part to connect, to relate to each other. So there’s been a lot of work over the past 10 years mostly, 20 stretching it, that has looked at the relational side of this: what does this mean for our everyday sociality?

Most of this work has been done in the global north, where patterns of everyday sociality tend to be more instrumental, and the cultures are more individualistic than the more gregarious, collectivistic cultures that you find in South America and Southeast Asia, for instance. So Argentina, in particular, has a very, very strong associational culture. It’s a very, very suitable space to test, then, whether these devices are really making us more lonely, as has been, in general, the idea circulating in academic and media settings, or whether in a context that has a very strong associational culture, the effects are different. And the third reason why Argentina is a very important case, I think, has to do with news, politics and trust.

So Argentina is a country with a long, deeply held distrust of institutions, in particular the news, and in that sense, it’s a little bit of an avant-garde of where the world has been going. So the combining of these three things, and the role of information in the polity, I think presents a very good combination for why this is not a lesser version of what you find in the global north, but a country where you have national conditions that are particularly suitable for an inquiry of this kind, and that reflect much more what is happening in the other 86% than what you can come up with if you study only the 14% of the global north and try to imagine that that also applies to Ghana, Nigeria, Paraguay, the Philippines, or Pakistan.

Noshir Contractor: I think you’ve just made a very eloquent argument in support of why being able to expand web science to focus on the global south is not just a luxury, but a necessity. Another issue that you raise in your work is generational differences. You wrote this book while you were parenting two teenage daughters. Tell us about how the experience of parenting, and listening to how media is being consumed by different generations, influenced your thinking in the book.

Pablo Boczkowski: If I situate myself in my adolescence: like most Argentines, I’m a huge soccer fan. Argentina won its first World Cup in 1978, at home. I was 13 years old, and I watched that tournament on a black and white TV. We had a landline at home that we had to wait over a year to get installed. And maybe there was also some bribing of the local government so they would please give us a phone, right? So when I think of, you know, how my daughters access the world of information, how, for instance, they do homework, with three screens, not two, so they have the computer screen for work, they have their phone next to the computer screen where they are monitoring Snapchat, and they have the TV, where they’re binge-watching their favorite show, and all of that at the same time. So in order to understand their world, and how much their affect and their sociality really resides on the screen, what happens on Snapchat or on Instagram, a lot of their sociality, of who they are as individuals, all of that happens through information that is mediated, that is not face to face. And in order to help them, you know, when they came to me with questions, when they told me stories in angst, or when they cried or when they laughed, for me, in order to fully participate in that, I had to ask them a million questions in order to understand that world. The project is a little bit the result of a bridge that my children and I built to communicate, so that I could partake of their world and they could express their world to me.

Now that crosses another dimension about age that was very surprising to me. And it has to do with the fact that, as measured, you know, by the survey and also clear in the interviews, age has become the dominant social structural organizer of access to and use of personal screens, social media platforms and the world of entertainment, more so than socioeconomic status.

Noshir Contractor: That is interesting. So in some ways, people share consumption of social media based more on age than on socioeconomic status.

Pablo Boczkowski: And which devices they use, not only the platforms but the hardware, if you wish, and how they entertain themselves. Now, the implications of this are humongous, because if you think about social structure, you know, social structure is something that in the daily lives of people is fairly stable. That is, what research has shown time and again is that you change socioeconomic status very, very rarely and very slowly. But you age every day, and you change cohorts, right? You know, when Karl Mannheim, who was the first social scientist to talk about the importance of cohorts and generations, really, in history, was writing, a generation lasted 20 years. A generation that is in part defined by access and use of technology lasts less than five years now. And we age every day; our experiences are changing all the time. So our society is much more in motion as a result of the dominance of age over socioeconomic status. It is changing constantly. It is in motion, and it’s much more uncertain.

Noshir Contractor: Well, one of the things that you have been working on even before the current work is focusing not just on media consumption, but on the production of news. I remember seeing your work a long time ago, where you were looking at how newspapers were trying to navigate what was happening with the digital environments, and with the web, etc. And then again, I see that you now have this book, titled The Journalism Manifesto. What does the journalism manifesto look like today that is different from a few decades ago?

Pablo Boczkowski: A lot has changed. You know, more has changed in the news industry in the production of news in the past 25 years than probably in the previous 50 to 75. So The Journalism Manifesto is a manifesto as it says in the title. It’s a strong and polemic argument that the news needs to change and how we think of it needs to change. 

So my coauthors, Barbie Zelizer and Chris Anderson, and I focus on three main interfaces of the news. The first is the role of elites. Historically, the news has been made for the elites and by elites, and we argue that that has led to a very narrow storytelling of the society we live in today, or first draft of history; that there are many groups that have been historically marginalized, even among the best intended, that have not been part of those telling the stories or those whose voices are represented in the news. The second interface is the interface of the norms. The idea of norms is how information is processed, right? Norms of objectivity, neutrality, etc., etc. We argue that norms have historically favored certain kinds of processing of information at the expense of others. So we argue for other norms to be included, like norms of inclusiveness, norms of cosmopolitanism. And the third interface is the interface of audiences. And the interesting thing about this is that 100 years ago, even 25 years ago, what the research shows is that newspaper people or news people told the stories for each other and to each other. They had very little knowledge of the audience. It was very badly seen that you would cater to the audience, and for that you needed to know them. And they assumed that the audiences were going to be there. They took the audiences for granted: essentially, if they build it, they will come; if they publish a story, somebody will read it. What the web has done is two things for the audiences. Number one, it revealed a lot of information about the audiences, because, as we know, every time the server serves your page, the server records information about that. Number two, what that revealed is that the audience became not only known but much more uncertain, because the web opened competition up across the information industries, right? Before, you know, news organizations had sort of a quasi-monopoly, natural monopoly, or oligopoly position. You know, in America, for instance, in 97% of the metropolitan markets, there was only one newspaper.

Now everybody competed with everybody else. And the other thing that we know now is that the audience is different from the audience we imagined in the 1960s. It is an audience that is much more emotional, driven by what the news makes them feel, not only by what it makes them think. It is an audience that really cares about kin and about being represented: not the abstract polity, but having their own kin, their own social network represented. It’s an audience that wants to express themselves, to tell their own stories, as much as they want to consume. So if the news media are to survive, they need to engage the audience where they are at. They need to tell stories that are told by people from different groups, that feature a broad spectrum of actors in society, guided by norms of inclusivity and cosmopolitanism, among others, and that therefore tell stories about kin in emotional ways, allowing people to express themselves, not only to listen.

Noshir Contractor: As I look at the conversation we’ve had today, as well as the corpus of your scholarship, I think you are making a really compelling intellectual argument for much more of a cultural perspective on web science. Where do you see this work going in the future? And how do you think web science needs to be paying more attention to these aspects that you have raised today?

Pablo Boczkowski: I think the development of computational tools in the social sciences has been one of the most incredible and productive areas of growth in the social sciences. And I think it’s only the beginning of that, for obvious, you know, technical reasons. But I think the energy and the attention that have been paid to that have made us sometimes pay comparatively less attention to dimensions of the human experience that cannot really be captured by that. For instance, counting frequencies of words would not tell you what those words mean to the people who are using them. So I wish that, as computational methods and actual, you know, computing technology develop, we don’t forget to continue investing intellectual resources, and capital resources for that matter, in the development of intellectual work on a more cultural perspective that can complement, from a cultural interpretive standpoint, the incredibly exciting work that is at the forefront of computational social science.

Noshir Contractor: Thank you again, Pablo, for all the work that you’ve been doing in this area, and for coming and sharing some of these insights with us. I will certainly recommend Abundance to anyone who is interested in learning about different ways of rethinking the extent to which we are consuming media for cognitive purposes rather than for affective purposes, as well as for understanding that we might be blinded by views of media consumption based on the global north versus the global south, or by our own age groups, as you’ve described. Thanks again, Pablo, very much for taking time to talk with us.

Pablo Boczkowski: My pleasure. Thank you very much for the invitation.

Episode 21 Transcript

Taha Yasseri: On Tinder, both sides swipe on each other. And then when there is a mutual interest, they can talk to one another. It’s symmetric by design. But in practice, we see that 80% of conversations are initiated by males. And even in the 20% of conversations where females take the initiative and start to talk, they are punished for that.

Noshir Contractor: Welcome to this episode of Untangling The Web, a podcast of the Web Science Trust. I am Noshir Contractor and I will be your host today. On this podcast we bring thought leaders to explore how the web is shaping society and how society in turn is shaping the web.

My guest today is Taha Yasseri. You just heard him talking about the gender gap that exists when it comes to who starts conversations on the dating app Tinder. Taha is an associate professor at the School of Sociology and a Geary Fellow at the Geary Institute for Public Policy at University College Dublin, Ireland. He has been a Senior Research Fellow at the University of Oxford, a Turing Fellow at the Alan Turing Institute for Data Science and AI, and a Research Fellow at Wolfson College at the University of Oxford. He has studied the dynamics of social machines on the Web, online collective memory, and my favorite: online dating. Welcome, Taha.

Taha Yasseri: Thank you very much, Noshir. I didn’t know you’re on the market.

Noshir Contractor: I am not in the market, which is exactly why I like to be an observer. But there are lots of people who are in the market for online dating. And as you mentioned in one of your recent articles, it’s over a $2 billion business just in the United States and is expected to continue growing in the foreseeable future.

Taha Yasseri: A lot of things that we do have changed due to internet-based technologies and web technologies. But to me, one of the most important things that have been revolutionized over the past 10 to 20 years is dating and mating — the way that we meet and we choose our partners for shorter relationships and for longer relationships, sometimes lifelong relationships. 

Noshir Contractor: I remember when online dating first began, and there was still a stigma against it. And people would not even be willing to admit that they were going online to look for dates. That changed. Tell us why you think it changed. What brought about that change? And what got you interested in doing research on this topic?

Taha Yasseri: Any new technology has its own stigma. Online dating in particular was seen as a tool or an environment for, you know, uncommitted relationships or promiscuous behavior, which in general is something that societies are less judgmental about at the moment. But also, people realized, no, actually people can find partners online, through these apps or websites, that they can, you know, settle down with and get married to and create lovely and happy families.

Noshir Contractor: One of the things that we have been asking ourselves well before online dating is what kinds of traits men find attractive in the partners they are seeking, as well as what females find attractive in the partners that they are seeking. What has your research on online dating told us about differences or disparities in user behavior between male and female users?

Taha Yasseri: One of the most striking things that we have seen is the imbalance or the gender gap in initiation of the conversation. These modern technologies, they try to give equal weight to both genders. For example, let’s say on Tinder, both sides swipe on each other. And then when there is a mutual interest, they can talk to one another. It’s symmetric by design. But in practice, we see that 80% of conversations are initiated by males. And even in the 20% of conversations where females take the initiative and start to talk, they are punished for that. They receive fewer responses compared to conversations that are initiated by males. As if collectively, we judge them because of cultural biases and the baggage that we as a society are still dealing with. So in a newer project, we looked at 10-year trends in online dating, and we were hoping to see that this gap is actually getting closed and the balance is increasing. However, it wasn’t the case. Actually, we realized that over 10 years, the gap in initiation rates has increased. And that simply tells me that it’s not only the technology. We have cultural baggage and we have things that we want to move on from. And having a shiny website is not the only solution. We require other things as well, which might not even be easy to attain through that technology.

Noshir Contractor: And indeed, online platforms are reifying the norms that preceded the platforms in terms of males being the ones who were expected to initiate these kinds of interactions.

Taha Yasseri: That’s very true and we can look at those trends at a very large scale. People click and people send messages. And we look into the logs generated by these activities rather than asking people, because of course when it comes to dating and mating, people’s behavior can be very, very different from what they say on a survey or on a questionnaire. And I think web science methods and computational social science methods are particularly well suited to address these questions, and to look at the traits in mating: our preferences and our behavior.

Noshir Contractor: Well, one of the things that we know historically, or at least it has been socially circulated, is that women put more emphasis on income and education when it comes to potential partners. And there is always the debate about the importance of physical attractiveness. What does your research show about these questions, and are these criteria changing since online dating first began?

Taha Yasseri: You’re absolutely right. It has been predicted or reported based on small-scale studies that females put more emphasis on societal features like income or education, and males put more emphasis on physical attractiveness. This was something we observed in our analysis as well. But what we saw, and it was interesting, was that the emphasis on education and income is decreasing. People are more accepting of differences between their own education and income level and their potential partners’, particularly when you look at female users. And it could have many reasons. Of course, one important factor here is that women are much more independent today compared to 10, 15, 20 years ago. And that makes the income or education level of their potential partners less relevant, so they can focus their interest on other factors.

When it comes to physical attractiveness, one of the things that I find fascinating coming out of the analysis was that we looked at the popularity of profiles, measured through the number of messages that people receive, versus the self-reported attractiveness. If I tell you, Noshir, I’m a 10 out of 10, you would think I would be very popular on these online websites. Well, actually, in practice the most popular male users or profile owners on online dating websites are not the ones who think they are a 10 out of 10; those are not receiving as many messages. This is something we call the douchebag effect, because you know, someone who thinks “I’m a perfect 10,” particularly a man who thinks “I’m a 10 out of 10,” probably is lacking some other personality features that are attractive to female users. But when we look at the female users, the higher they rated themselves, the more messages they received. The ones who thought they were a 10 out of 10 were actually the ones who received the most messages.

The other reason why very attractive males do not receive as many messages could have to do with the self-confidence and self-esteem of female users. They might think, oh, that guy is out of my league, I might not even try. Whereas men don’t have this understanding of their capabilities. You know, even if they are sure that the potential female partner is out of their league, they still try.

Noshir Contractor: There’s also been a perennial debate about whether dating and partnerships and romance are more likely to succeed when birds of a feather flock together. Or the other saying, which is opposites attract. What did you find about the similarity between profiles and the extent to which it might have predicted future success in terms of online dating?

Taha Yasseri: One of the things that people have credited online dating for is the higher ratio of interracial marriages and relationships today compared to 10, 15 or 20 years ago. It is very difficult to argue whether this is primarily due to the surge of online dating or it’s something that happens anyway, parallel to online dating. But I do think online dating provides us with a much more diverse pool of potential partners. When we looked at data, however, we realized that homophily, or similarity between potential partners, is not a very strong predictor of success. We couldn’t measure success after the relationship, of course, because the irony of online dating is that if it works, you lose your customers. But we could see, for example, if people exchanged phone numbers, or if people carried on chatting for a while; we took measures of success like that. And we realized homophily plays a very small role. This could be a reflection of a bigger change in our society, which is that now we are more curious and more accepting of people who are different to us.

Noshir Contractor: I want to move this to another part of your research where you have asked whether we can use algorithms, and intended bias injected within these algorithms, to move us away from our natural tendencies for homophily, for creating echo chambers or creating fragmentation. And so tell us a little bit about how you got interested in this notion, how we tend to naturally move towards segregated networks in segregated societies. And then you have a very depressing message: you say that, even if we are to use positive algorithms to try to break away from these tendencies for homophily and echo chambers, your research shows that we’re not likely to be successful.

Taha Yasseri: We all agree that we have become very fragmented in our political opinions, particularly in the US, I would say, and in the UK, some countries that have gone through a lot of trouble in recent years just because of the divide in the society. So in that sense, bubbles have formed and echo chambers are there. I had heard and I had read that people say it’s up to the platforms to break the bubbles. They have to use algorithms to mix people up and connect people from other opinion camps. And we thought, okay, well, let’s see if that works. So we developed a mathematical model, and we realized that as long as we have homophily in the network, as long as there is the slightest tendency for an individual to prefer a connection to like-minded people over a connection to someone dissimilar to them, no matter how much algorithmic bias we introduce, bubbles will form. We might postpone them, but we can never break them. Because that tendency, that homophily tendency, is so strong that we would practically need huge amounts of algorithmic intervention, which, of course, takes all of the joy out of the online social networks, right? We do not go on Facebook or Twitter just to fight. And that’s actually what social network companies have capitalized on. Because if we’re happy there, and we interact with people who are like-minded and support our opinions, we spend more time there and we are more likely to click on the ads and so on. So confrontation is not something social networks advocate for, and combining that with homophily and the ease of disconnecting from people who are different to us on social networks, all this together makes the formation of echo chambers and bubbles inevitable. It sounds very grim. I agree. But we also propose a couple of solutions. It’s not that that’s the end of the story.

You know, before the internet, I live in a village where not everyone thinks like me. My neighbor might vote differently, might think differently. It’s not that I just move the next day, you know. I still go to the same church or to the same strip club, depending on my interests. And I interact with people who are different to me, and through this interaction, I might not completely change my opinion. But at least I appreciate the differences. I learn to understand and acknowledge the existence of other opinions. On Twitter and Facebook, we are encouraged to block people, but we shouldn’t just block others or unfollow others because we don’t agree with them. It is such a new web thing that we can stop talking to a person so easily, and the web gives us the opportunity not to see that person ever again. Whereas in that village, I had to see that neighbor anyway. We somehow have to introduce mechanisms which encourage people to keep the interaction on and carry on interacting with people who are not exactly the same as themselves. And can I think of an example? Of course: Wikipedia. That’s where these conflicts and these clashes of opinion happen. And I have spent years studying edit wars between editors of Wikipedia. One thing that we realized is that the more conflict and the more interaction between opinions there is in an article, the more the quality of the article increases.
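The claim that algorithmic mixing can postpone bubbles but not break them can be illustrated with a deliberately simple toy calculation. This is not the mathematical model from the research discussed here; it is a sketch under stated assumptions: at each step a cross-camp tie is dropped with probability `homophily`, the platform replaces a dropped tie with another cross-camp tie with probability `algo_bias`, and same-camp ties are never dropped. All names and parameters are hypothetical.

```python
def steps_until_segregated(homophily, algo_bias, start=0.5,
                           threshold=0.01, max_steps=1_000_000):
    """Toy model: count steps until cross-camp ties fall below `threshold`.

    Assumed dynamics (illustrative only): each step, a fraction `homophily`
    of cross-camp ties is dropped, and a fraction `algo_bias` of the dropped
    ties is re-created across camps by the platform's algorithm.
    """
    # Net share of cross-camp ties lost per step.
    loss_rate = homophily * (1.0 - algo_bias)
    if loss_rate <= 0:
        # No homophily at all, or every dropped tie is forced across camps.
        return None
    cross_share = start
    steps = 0
    while cross_share > threshold and steps < max_steps:
        cross_share *= 1.0 - loss_rate
        steps += 1
    return steps


# Stronger algorithmic mixing delays segregation but does not prevent it.
for bias in (0.0, 0.5, 0.9):
    print(bias, steps_until_segregated(homophily=0.1, algo_bias=bias))
```

In this toy setting, any nonzero homophily combined with anything short of total algorithmic control drives the share of cross-camp ties toward zero; raising `algo_bias` only lengthens the time it takes.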

Noshir Contractor: And so what I’m hearing you say is that if Facebook were to work to make the algorithm make interventions that you think might help, then, unlike what you or the algorithm might hope, which is that I will look at this, consider other points of view, and broaden my perspective, what the user would do instead is simply walk away from Facebook because it’s not feeding them what they want to hear.

Taha Yasseri: Either that could happen. So that explains why social media platforms might not even try. The other thing, and that is based on research that Chris Bail and his colleagues have done in their controlled experiment: people who are exposed to content from a different opinion have become more extreme in their own opinion, because that wasn’t necessarily an interaction between humans. It was me seeing some content supporting the opposite opinion. And I never had the chance of having an active interaction with some human of that opinion. So I don’t think content sharing mediated by algorithms is the solution. All we need is direct human-to-human interaction. And it is not comfortable. We all know that. And the cost for the platform could be that people walk away and their revenue might go down.

Noshir Contractor: So you already spoke about Wikipedia as an example of good engagement, where editors would go after one another. And the more they argued with one another and debated one another, the higher the quality of the final Wikipedia page that they were debating. But you have also talked about the role of not just human editors of Wikipedia battling with one another, but bots in Wikipedia battling with one another.

Taha Yasseri: Yes, they do. They’re not doing much creative work there, but they do a lot. In some Wikipedia editions, more than half of the edits are coming from bots, because they never sleep; they work 24 hours a day. And they do very little things. They fix typos, they add commas. As we continue with Wikipedia, and as we develop the technology, bots now do more sophisticated things. They detect vandalism, they even create articles based on the structured information they are fed with. As I said, we were studying conflicts among humans. And it was a very long shot for me to think maybe we should also look at whether there are edit wars between bots. And my hypothesis was, there wouldn’t be any because bots are not emotional. They don’t take things personally. But then as soon as we looked into the data, we realized there have been pairs of bots undoing each other’s contributions for more than three years. And no one has noticed, because no one is actually looking at bots. We trust these machines, because they’re predictable. Yes, they’re predictable at the individual level, to some extent. But if we have learned one thing from complex systems studies, it is that system behavior is very different to individual behavior.
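As a rough illustration of how pairs of bots undoing each other's contributions might be spotted in edit logs, here is a minimal sketch. The log format, a list of `(reverting_bot, reverted_bot)` tuples, and the bot names are assumptions made for this example; they are not the data or the method used in the research described above.

```python
from collections import Counter


def mutual_revert_pairs(reverts):
    """Return bot pairs that have reverted each other in both directions.

    `reverts` is a hypothetical log: an iterable of
    (reverting_bot, reverted_bot) tuples. The result maps each pair
    (a, b) with a < b to (times a reverted b, times b reverted a).
    """
    counts = Counter(reverts)
    pairs = {}
    for (a, b), a_reverted_b in counts.items():
        b_reverted_a = counts.get((b, a), 0)
        # Report each pair once, only if reverts go both ways.
        if a < b and b_reverted_a > 0:
            pairs[(a, b)] = (a_reverted_b, b_reverted_a)
    return pairs


# Made-up log entries, purely for illustration.
log = [
    ("BotA", "BotB"),
    ("BotB", "BotA"),
    ("BotA", "BotB"),
    ("BotC", "BotA"),
]
print(mutual_revert_pairs(log))
```

One-directional reverts, such as the hypothetical `BotC` undoing `BotA`, are left out on purpose, since a single correction is not a back-and-forth conflict.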

Noshir Contractor: How do you think that humans will play a role in brokering or mediating these kinds of arguments that emerge and don’t seem to end amongst bots?

Taha Yasseri: As long as we know the system, and we can predict its behavior, the good thing about the sociology of machines, as opposed to the sociology of humans, is that we have full power, and we have all the agency that we need. Whereas if we understand a problem in a society, we might not be able to come up with an immediate solution, and even if we come up with the solution, we might not be able to implement it. But the good thing is that those bots have no agency and they are serving the purpose of the owners and the society they work for. In that sense, I think things are easier. However, the difficulty comes from the fact that we have zero history of sociology of machines. We just arrived in this land and we just discovered these creatures, or started to build them and embed them in every corner of our bedrooms and living rooms and the streets. We are creating the systems, and it’s already a bit late to start analyzing and studying their social behavior. But as soon as we do that, and we understand how they behave, coming up with a solution and implementing it, I think, should be easier than with the long-lasting problems we have in our own societies.

Noshir Contractor: Web science scholars have for a while thought about the web as being a social machine. And what you’re highlighting is that given that the web is a social machine, or a collection of social machines, we need to come up with a new sociology of these social machines.

Taha Yasseri: That’s a very elegant way of putting this, that is true. The thing that I might propose to change here is to turn machine to machines. Because we have different machines coming from the fact that we have different actors. One of the sad things we learned during the pandemic was that this utopian image of a global society is not relevant. Neighboring countries blocked each other’s purchases because of competition, when it was about the masks and the tests, and then the vaccines and so on. Therefore, the social machine of the web is not just one entity. There are competing entities. And we saw complexity in the behavior of Wikipedia bots, which are very, very nice and good. I can only imagine how things could go bad and wrong when we have competing interests among not very good and not very well behaved automated bots that are fighting for the benefit of their owners.

Noshir Contractor: I want to end by taking you to yet another issue on which you have been doing some really exciting research, and that is the topic of collective memory. One of the things that has been argued about the web in general is the fact that the internet doesn’t forget. We have archives that allow us to go back and rewind. At the same time, the European Union has led the way in terms of regulation that gives individuals the right to be forgotten, or at least for some of their actions to be forgotten. Tell us a little bit about how the web is able to advance our understanding of what our collective memory is, how we socially generate these common perceptions of any event on the web, and how those perceptions might change over time.

Taha Yasseri: Collective memory is not a new term. People have been talking about it for at least 100 years. But it’s the first time we can measure it. We can put a number on it. We can look at an airline crash, and measure how many people read the Wikipedia page about this event, or how many people googled it, using Google Trends data. And then 10 years later, look at the same rate and see how this number has declined over the years. This is a very materialistic and very operationalized, maybe oversimplified, way of measuring memory. But it’s a good starting point. We have taken a similar approach, as I just described, and looked at logs of pageviews on Wikipedia and Google search volumes, and so on. One of the first things we realized is that, well, our attention is biased. We are much more attentive to things that are closer to us, that are benefiting us and that are related to us. But then we also realized our memories are very much biased: we remember past events only if they are somehow connected to us. Web science allows us to study these patterns.
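To make the idea of putting a number on memory concrete, here is a minimal sketch that turns two attention measurements into a half-life. It assumes attention decays exponentially between the two measurements, which is a simplifying assumption made for this example and not a finding reported in the interview; the function name and the pageview figures are hypothetical.

```python
import math


def attention_half_life(views_then, views_now, years_elapsed):
    """Estimate how many years it takes for attention to halve.

    Assumes (for illustration only) exponential decay between an earlier
    measurement `views_then` and a later one `views_now`, taken
    `years_elapsed` years apart, e.g. yearly pageviews of an article.
    """
    if views_then <= 0 or views_now <= 0:
        raise ValueError("view counts must be positive")
    if views_now >= views_then:
        raise ValueError("attention did not decline; no half-life to estimate")
    return years_elapsed * math.log(2) / math.log(views_then / views_now)


# Made-up numbers: attention fell to a quarter over 10 years,
# i.e. it halved twice, so the half-life is about 5 years.
print(attention_half_life(views_then=1000, views_now=250, years_elapsed=10))
```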

Noshir Contractor: What did your research show us about any difference in generations when looking back at events in the past? Were there differences in how one generation might view a set of events compared to others?

Taha Yasseri: One of the limitations we have to admit that web science has is that it doesn’t give us much of a historical view. In our analysis, we of course had data for the last 20 years, but we couldn’t say how much our results generalize to 200 years ago. Based on the data that we had from recent years, one thing that we could say is that the time scale of collective memory is around 40 years.

Noshir Contractor: But Wikipedia does have entries for events that happened centuries ago.

Taha Yasseri: That’s very true. And that’s exactly why we could see how people reacted to those events in the last 20 years, and how people reacted to much more recent events in the last 20 years. And these are people who are using Wikipedia. What we cannot talk about is how people would have reacted to those pairs of events 100 years ago, because we simply didn’t have any tool to measure their behavior.

Noshir Contractor: Exactly. Well, there are many things that web science can do, and others where we must recognize its limitations, at least at this point in time. So again, I want to thank you so much for talking with us about how the web has changed online dating, or maybe hasn’t changed online dating; the extent to which algorithms may or may not be able to help us confront the challenges we face with echo chambers; the sociology of machines, as you said, and how we might be looking at bots fighting with bots, mediated by humans; and then again, how all of this shapes our collective memory. I want to thank you again for taking time to talk about this. You’ve been such an exciting scholar at the forefront of web science. And we all look forward to seeing continuing research come from you and your team of collaborators. So thank you again for joining us today.

Taha Yasseri: Thank you very much. It’s been a great pleasure. Thank you for having me. 

Episode 19 Transcript

Sinan Aral: While there has been misinformation and disinformation throughout human history, we’ve never had a technology that has essentially rewired the central nervous system of humanity within one decade, that accelerates the spread of information, as much as social media does, in an algorithmically controlled fashion. 

Noshir Contractor: Welcome to this episode of Untangling The Web, a podcast of the Web Science Trust. I am Noshir Contractor and I will be your host today. On this podcast we bring thought leaders to explore how the web is shaping society and how society in turn is shaping the web.

My guest today is Sinan Aral. You just heard him talking about the impact of social media in spreading misinformation and disinformation. Sinan is a global authority on business analytics, award-winning risk researcher, entrepreneur and venture capitalist. He is the David Austin Professor of Management, Marketing, IT and Data Science at MIT, where he also directs MIT’s Initiative on the Digital Economy. He’s also a founding partner at Manifest Capital. Sinan has won numerous awards, including the Microsoft Faculty Fellowship, the National Science Foundation Career Award, and the Fulbright Fellowship. In 2018, his article on the spread of false news online was published in Science. It went on to become the second most influential scientific publication of the year in any discipline. And in 2020, Sinan published his first book, The Hype Machine, which received a best book on AI award from Wired Magazine. Welcome, Sinan.

Sinan Aral: How you doing Nosh? Great to see you.

Noshir Contractor: Thank you so much for joining us on this podcast. You’ve had quite a year. I want to first start by asking you what got you interested in looking at information on the web?

Sinan Aral: Well, I was a PhD student at MIT and I knew I wanted to understand technology, but I didn’t know exactly how I wanted to get into it. As I was studying at MIT, I was taking statistics classes that assumed that all of our observations in the data were independent. And I was taking sociology classes with pictures of network diagrams in the research articles that uncovered the tremendous interdependence of our world. And I thought, a lot of the models of society could really be explained by the ebb and flow of information between us in our complex interdependence. The major thing that’s different today in that web of interconnections and the flow of information, even though the human species has been interdependent for a very long time, is the digital flow of information. And so I thought that’s where a lot of the answers to unexplained phenomena in society were going to come from. And I’ve been researching it and studying it ever since.

Noshir Contractor: You talked about this web of connection in the digital economy, which is exactly the focus of why your work is so central to web science. Can you give an example of what you mean by an unexplained phenomenon in society that can be well studied through these means?

Sinan Aral: So for instance, businesses try to understand demand patterns. You know, it’s thought about in terms of the products that are sold, and it’s thought about in terms of consumer preferences. And for many, many years, the thinking was that there’s some distribution of product characteristics: different products exhibit different characteristics. There’s some distribution of consumer tastes: different people have different tastes. And you know, there’s a match between these products’ characteristics and consumers’ tastes, and that can explain changing patterns of demand for different products over time. But one thing that has become clear over time is that people talk to each other. And they share their own opinions about products, and now we have digitized that communication in the form of social media likes, comments, and so on. And in the form of reviews and ratings, where we are constantly describing to each other, in a digitized, recorded, mass-scaled format, what we think about not just products, but political candidates, you know, bills before Congress, you name it. And it turns out that the patterns of choice that human beings make, whether it’s voting, or whether to get a vaccination, or whether to buy a certain product, are heavily influenced by the patterns of communication and sharing of information about those political candidates or public health behaviors like vaccination or products. And so a significant fraction of the variance can be explained by the ebb and flow of information online, whether it’s through person-to-person communication on WhatsApp, or through microblogging services like Twitter, or whether it is reviews and ratings from the crowd. The scaling of public opinion is changing the nature of the way we decide and act.

Noshir Contractor: I want to take you back a decade, to when you published one of the first articles that looked at doing natural experiments in the field, so to speak. I’m thinking back to your article in Management Science, titled Creating Social Contagion Through Viral Product Design. That article was one of the first attempts, in my opinion, to look at how word-of-mouth campaigns could be digitally transmitted. And you did this experiment on Facebook that I would love for you to describe. And in particular, tell us about what got you to think about the question, but also the strategies that you used to answer those questions, which were quite novel at the time.

Sinan Aral: Our experiment was an experiment in the wild among 1.5 million Facebook users. So it was perhaps the first very large scale online experiment about the causal effects of information flow in networks on behavior. And the reason I became interested in it is to basically scratch an itch, which is how a lot of research starts. And that is to understand really the causal effects of networks on outcomes and behaviors in society. What we did was we developed an app for Facebook, along with a movie studio. And it was a movie app, you could friend celebrities, buy movie tickets, rate movies. We wanted to understand this concept of what we call viral design — Can we design products that are more likely to be shared amongst friends? So we built several features, a invite your friends button, and then a passive awareness campaign feature. The first feature allows you to press a button, there were buttons throughout the app that invite your friends, that showed you a list of your Facebook friends, you can pick who you wanted to invite and invite them. The passive awareness whenever you took a key action in the app, like rated a movie, it would send a message to all of your Facebook friends that said, :Hey, Sinan just rated this movie four out of five stars, you might be interested in the app, here’s a link to download it.” We created three versions of the app, a control group with neither of these features, and two other experimental versions of the app with these features. And then as people downloaded the app, we randomly assigned them to one of these three versions of the app. And then we just simply observed the sharing and diffusion of the app through the Facebook network. And any differences in the speed, breadth or depth of the diffusion of any of these versions of the app has to be causally driven by the existence of these features, because we randomly vary just toggling on and off those features. And that was it. And we found very significant differences. 
And then we were able to study, what is the power of a personal invitation? What’s the power of a passive awareness campaign? And as well, what are the actual differences in the rates at which apps spread with or without these features?
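The design described above (random assignment to one of three app versions, then a comparison of how each version spreads) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the arm names, the `ASSUMED_SPREAD_PROB` values and the one-number outcome per user are invented for the sketch and are not figures or code from the study.

```python
import random
from statistics import mean

ARMS = ("control", "passive", "invite")

# Hypothetical probabilities that a user's friend adopts the app.
# These numbers are invented for illustration; they are NOT study results.
ASSUMED_SPREAD_PROB = {"control": 0.10, "passive": 0.30, "invite": 0.20}


def run_experiment(n_users, seed=0):
    """Randomly assign each user to an arm and record a simulated outcome."""
    rng = random.Random(seed)
    outcomes = {arm: [] for arm in ARMS}
    for _ in range(n_users):
        arm = rng.choice(ARMS)  # random assignment: the only thing that varies
        spread = 1 if rng.random() < ASSUMED_SPREAD_PROB[arm] else 0
        outcomes[arm].append(spread)
    # Average outcome per arm
    return {arm: mean(values) for arm, values in outcomes.items()}


def effects_vs_control(rates):
    """Difference in average outcome between each treatment arm and control."""
    return {arm: rates[arm] - rates["control"] for arm in ARMS if arm != "control"}


rates = run_experiment(30_000)
effects = effects_vs_control(rates)
```

Because users land in an arm by chance alone, a difference between an arm and the control group can be attributed to the toggled feature rather than to who chose to use it.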

And we found evidence of network effects: when you used the app with your close friends, you were less likely to give the app up. If you used the app with acquaintances, you were more likely to give it up, which indicated that network effects varied depending on the closeness of the relationship. Now we’re doing massive scale experiments to measure that across all the platforms, Instagram, Snapchat, Facebook, Twitter, and so on. Because we think that’s such an important part of the economics of social media today.

Noshir Contractor: And one of the things that I think distinguishes that kind of work you do, this carefully controlled experiment where you put people in different conditions, helps address an issue that has plagued prior research in this area. I’ve heard you talk about this before — talk about that analogy you’ve used here.

Sinan Aral: When we, as network scientists or web scientists, study patterns of behavior in a population, sometimes we tend to foreground the explanations that we are investigating, and tend to ignore that there could be many other confounding effects. So when we study the diffusion of a behavior through the network, and we see that people who are connected to one another tend to exhibit this behavior in succession, one after the other rapidly, we say, “Wow, there must be some sort of network effect here, that this behavior is passing from person to person to person, because it only shows up in the network in succession between people who are linked together, rather than more randomly in the network of people.” And that seems like a logical conclusion. But it’s not always a logical conclusion. In fact, many times it’s not a logical conclusion.

So the analogy is to a crowd of people in a field watching a political rally or a concert, and you see the first umbrella go up in the bottom left hand corner of the field, and then immediately the umbrella next to it opens. And then the umbrella next to it opens. And then the umbrella next to it opens. And you see this crowd of umbrellas, opening one right after the other from the bottom left hand corner of the field to the top right hand corner of the field. And one explanation for that is that the first person opened their umbrella and then nudged the person next to them and said, “Hey, open your umbrella, you know, it’s the cool thing to do, everybody’s doing it.” That’s one explanation, the social influence explanation. But another explanation that’s much more likely is that there’s a passing shower that is moving dynamically from the bottom left hand corner of the crowd to the top right hand corner, dropping raindrops on the heads of the people. And that is what’s causing them to open their umbrellas. And so I use that analogy, because sometimes as web scientists and network scientists, we assume social influence, where it’s really probably some third factor that’s causing that pattern of behavior in the data. So we have to be careful about correlation and causation, if we intend to be rigorous about network effects.
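The umbrella analogy can be made concrete with a small simulation. The sketch below is an illustration, not anything from the research discussed: the function names, the rain-front speed and the reaction delay are all assumptions. Nobody in this model influences anybody else, yet every person still opens an umbrella right after their neighbor.

```python
import random


def shower_opening_times(n_people, front_speed=1.0, max_delay=0.2, seed=0):
    """Umbrella-opening times for people standing in a line at positions 0..n-1.

    The rain front reaches position i at time i / front_speed. Each person
    opens an umbrella after a small independent delay. There is no social
    influence anywhere in this model.
    """
    rng = random.Random(seed)
    return [i / front_speed + rng.uniform(0, max_delay) for i in range(n_people)]


def fraction_opening_after_neighbor(times):
    """Share of people who open their umbrella after the person beside them."""
    later = sum(1 for a, b in zip(times, times[1:]) if b > a)
    return later / (len(times) - 1)


times = shower_opening_times(50)
share = fraction_opening_after_neighbor(times)
```

With these settings `share` is 1.0: the data look exactly like person-to-person contagion, even though the only cause is the shower moving across the field.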

Noshir Contractor: Sinan, when you began working in this area, I got the sense that a lot of your interest was to see how products could be made viral — how they could be designed so that we could have social contagion, and do it in a way that was scientifically grounded and effective. Somewhere along the way, I got the sense that what you thought was going to be a positive set of strategies began to trouble you where information was being sent, or misinformation was being sent. How did that happen? 

Sinan Aral: I mean, I think that technology is agnostic. And you know, one of the themes of my book The Hype Machine is that if we intend to solve the social media crisis that we find ourselves in, we have to get past this debate about whether social media is good or evil, because the answer is yes. And we need to understand how it can promote positive outcomes in society and how it can be dramatically negative for our democracies and our economies and our public health. And so I’m interested in both the good and the bad, in part to promote the good and in part to contain the bad, or to work to build systems that reduce the negative effects of social media.

Noshir Contractor: Let’s talk about the book, The Hype Machine. I love the title. I’m going to first ask you, how did you decide on the title of the book, which I noticed is not only the title of the book, but also the title of chapter three in the book: The Hype Machine.

Sinan Aral: I considered a lot of things. I thought carefully about the chapter titles. And it’s super interesting to have that conversation with you. Because I’ve done many interviews over the last year. And nobody’s asked me that question, I’ve been waiting for someone to ask me. You know, I titled the book The Hype Machine because social media is built on an engagement model, a business model built on engagement. And the way it works, obviously, is that social media companies sell attention as a precursor to persuasion. And they sell that as ad inventory. So the way that they maximize the opportunities to sell advertising is to engage people, and to keep them engaged. And so the machine is designed to hype us up. And that’s where the title of the book comes from.

Noshir Contractor: You have a chapter that is titled “The End of Reality.” And that sounds very depressing, Sinan. Tell us about how that title came about, and what’s the thesis of that chapter.

Sinan Aral: In 2018, we published a 10 year study of the spread of fake news online. We worked directly in collaboration with Twitter and had access to the entire Twitter historical archives. And we studied the spread of all of the verified true and false news stories that ever spread on Twitter over 10 years. And what we found was that false news traveled farther, faster, deeper and more broadly than the truth in every category of information that we studied, sometimes by an order of magnitude. While there has been misinformation and disinformation throughout human history, we’ve never had a technology that has essentially rewired the central nervous system of humanity within one decade, that accelerates the spread of information, as much as social media does, in an algorithmically controlled fashion. 

And so we’re at a particular moment of risk. And in this chapter, I discuss the potential impact of the spread of falsity on democracies, economies and our public health. I talk about coronavirus misinformation, I talk about meme stocks and how misinformation can affect the stock market. And I talk, of course, about democracy and elections: the 2016 US presidential election, the 2020 presidential election. I talk a lot about deep fakes and the science of fake news. What is it about human cognition that makes us susceptible to falsity? And what might some of the solutions be? In this book, I try very hard to remain rigorous throughout and allow the science to lead. And you don’t need to exaggerate it for it to be dramatic, because there is dramatic, rigorous science out there that should be compelling enough to alert policymakers and platform designers and leaders to the potential peril that we face with social media, as well as the tremendous promise.

Noshir Contractor: I’m aware of several people, perhaps Mark Twain, and you can correct me if I’m wrong here, who, a long time ago, said that lies can travel twice around the world before truth could put on its boots. So why is it that today, the web might be changing that particular phenomenon? And is it simply a question of the lies traveling 100 times around the world before truth could even put on its socks, let alone its boots?

Sinan Aral: The phrase fake news was first mentioned in a Harper’s Magazine news article in the 1920s, so you’re right when you say that, you know, fake news is not new. And it’s interesting, the Mark Twain quote is actually not a Mark Twain quote. I’ve heard that quote attributed to so many different people incorrectly, which itself is ironic. To bring it back to current times, there are a couple of things that make the spread of falsity particularly dangerous today. And that is speed and the algorithmic amplification of falsity, as well as the targeting that happens. So we don’t know who is seeing which information. The speed with which information travels today is nearly instantaneous. And it is much faster than even just a decade ago in terms of the spread of false news, but also true news. This favors falsity, because we’re 70% more likely to share a verified false news tweet than a verified true news tweet. 70% more likely, over a 10 year period of all the tweets on Twitter, that’s a big number. And false news travels about six times as fast as true news. The entire planet could be mistaken about something, about a consequential choice that needs to be made, and if the truth doesn’t catch up quickly enough, then we can make very significant errors.

Elections are a dramatic example. The rise and fall of equity prices are another example. And then finally, whether or not you get vaccinated in time to stop the spread and whether you can achieve herd immunity against coronavirus. These are three dramatic examples. The Hype Machine was published in September 2020. And in the book I predicted all three of these things would happen. In the book I said that we were gonna see violence during the 2020 election. We had the Capitol Riot in January. I predicted the rise of meme stocks. We saw the GameStop stock price rise. I predicted that misinformation around vaccines would create protests around vaccines that would disrupt the vaccination process in the United States. In January 2021, we saw Dodger Stadium in Los Angeles shut down by anti-vaccine protests. And I don’t consider myself an oracle. These outcomes were entirely predictable. We’re at a particular moment in history, where the spread of falsity can have even more dramatic implications than it has in the past.

Noshir Contractor: Amazing. Now another chapter title: “A Network’s Gravity Is Proportional to Its Mass.”

Sinan Aral: I wanted to make a point about economics, but in a way that non-economists might understand. And you know, people who have taken grade school physics understand mass and gravity in a sense. And really the point I wanted to get across in this chapter is how important the economics of the social media economy are to the outcomes we see regarding democracy, public health, our economy, stock market, business outcomes, and so on. And the main economic force that shapes the social media economy is what economists call network externalities or network effects. And that is to say that the value of a platform or a product is a function of the number of people who use the platform or product. And so the size of a Facebook network, the size of Twitter’s network, has implications for the attraction it has to new users and the stranglehold it has on current users. We’ve got a big conversation about whether we need to break up Facebook and whether we need to break up big tech; we’re worried about the rise of monopolies in the social economy. And what people I think are missing is that in order to really create competition in the social economy, we have to deal with the structural economic phenomena that are creating market concentration. And that really is driven by network effects. And in an economy driven by network effects, big platforms have big power: they have power to attract new users, they have a stranglehold on current users. If you want to speak with your friends and family, you can’t leave Facebook, Twitter, and the major platforms today, because that’s where everybody is to talk to. And it’s hard for new entrants to get new users because of the network economics of the social economy.
And so a lot of the outcomes we see, in terms of misinformation, in terms of effects on democracy, economy, public health, in terms of the very nature of competition in the social economy, stem from the simple fact that a network’s gravity is proportional to its mass: the amount of power that a social network platform like Facebook has is proportional to its size. And if we want to address that, from a policymaking standpoint or a business standpoint, we have to make structural reforms to the economy itself that address network effects. And as I describe in the last chapter of the book, that means instituting social network interoperability and data portability, which is itself the main structural reform to the social media economy that will have the single biggest impact on the level of competition in that economy.
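One way to picture “a network’s gravity is proportional to its mass” is a toy model in which each new user joins a platform with probability proportional to that platform’s current size. This is a modeling assumption made here for illustration (the platform names and starting sizes are hypothetical), not a formula taken from the book.

```python
import random


def simulate_growth(initial_sizes, new_users, seed=0):
    """Toy model: each new user picks a platform with probability
    proportional to its current number of users (an assumed rule)."""
    rng = random.Random(seed)
    sizes = dict(initial_sizes)
    names = list(sizes)
    for _ in range(new_users):
        weights = [sizes[name] for name in names]
        chosen = rng.choices(names, weights=weights, k=1)[0]
        sizes[chosen] += 1
    return sizes


# Hypothetical starting point: one incumbent and one small entrant.
start = {"incumbent": 900, "entrant": 100}
final = simulate_growth(start, new_users=10_000)
entrant_share = final["entrant"] / sum(final.values())
```

Under this assumed rule the entrant’s share tends to stay close to where it started, however many new users arrive, which is one way to see why a new entrant struggles without a structural change such as interoperability.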

Noshir Contractor: Let’s go to that final chapter, which you title “Building a Better Hype Machine.” What you just described was, the larger the network, the more sticky it is, the more likely it is to keep you on it and not let you go away from it. How does interoperability solve the problem? And why would the large networks today have any incentive to participate in that?

Sinan Aral: That’s the main question. And I think the answer is obvious. They have no incentive to change, because they’re making money hand over fist, and consumers have no choice but to use the platforms that have the greatest network externalities, because that’s where everyone is to talk to. Now, it’s interesting that you started with the 2011 paper that we wrote, where we first began to measure network externalities in the social economy. That was a decade ago. Now we’re doing a very similar experiment across all the platforms to measure, well, how big is the network effect for a Facebook or a Twitter or Snapchat? And what can we do to break this network effect? So if I were to ask you, how would you feel if you couldn’t send a text message from Verizon to Sprint? You’d think I was crazy. You’d think I was absolutely insane: “What do you mean, I can’t send a text message from one mobile carrier to another? Of course I should be able to do that.” And then I say, “Well, why can’t you send a message from Facebook to Twitter, or from Instagram to Snapchat?” And suddenly you think to yourself, “Hmm, that’s interesting. Why can’t I do that?” Well, the reason is that they have made themselves incompatible in order to retain their network externality and their network value.

And because of that, consumers have no choice. And when consumers have no choice, and they can’t switch from one platform to another without incurring significant cost, then there’s no incentive for that platform to give consumers what they want, which is a clean internet that protects their privacy, that reduces bullying, that reduces misinformation, that does something about social media manipulation during elections, that reduces the amount of hate speech on the platforms and so on. Because all of that stuff is engaging and profit maximizing, it’s allowed to continue. But there’s a bill in front of Congress now called the ACCESS Act, which would require any social media platform with more than 100 million users to become interoperable with other social media platforms. If that were enacted, and if I came up with a social network that said, I will protect your privacy, I will eliminate phishing on my platform, I will have tight security, and you can speak to anyone on Facebook, Instagram, Snapchat, and Twitter, then people would be able to choose that social network. And as more people chose that social network, the larger platforms would have to make reforms to provide similar levels of privacy and security in their platforms, and so forth. That is an example of how interoperability creates competition in networked industries.

Noshir Contractor: Now, for a long time, even before we got to social media platforms, the notion of having some kind of interoperability goes back to the very start of the internet. The brilliance of the web was in large part based upon HTTP, the Hypertext Transfer Protocol, which allowed interoperability. Do you see that there is any reason to believe that through legislation or technological innovation, we will see a breakthrough in this interoperability dilemma that we face right now?

Sinan Aral: I’m sincerely hopeful that we do, because I think this is just an obvious point. Imagine a balkanized internet where you had not a global internet where you could share information, but many different privately controlled intranets that people were part of and had to pay to be a part of, and to get access to information and so on. The level of innovation and collaboration and communication and life-saving health information, and everything that is shared because the internet is interoperable and is free to build on, is staggering. If you don’t create interoperability, you really stifle the innovation and creation of value tremendously.

Noshir Contractor: Well, thank you again so much for joining us and giving us a chapter-by-chapter tour, in some ways, of The Hype Machine. It’s a wonderful read. And I would definitely recommend that to anyone in web science. I look at your scholarship as being one of the poster children of really excellent cutting edge web science. Thank you again Sinan, and good luck for all the additional work that we will be looking forward to seeing coming out of your research in the years ahead.

Sinan Aral: It’s a true honor and a pleasure to join you. I’m looking forward to seeing you in person again very soon and to give you a hug and a high five, because it’s been too long. We’ve all been through a lot and it’s great to connect virtually but I’m also looking forward to connecting with you physically in person as well soon. 

Episode 18 Transcript

Matt Weber: The Internet Archive has about — last count — nine petabytes of archive data, I wouldn’t be able to begin to tell a student how to begin cracking open that repository, we don’t really have the tools for that yet. So in many ways, we’re still developing the technology to be able to look at some of these questions at scale.

Noshir Contractor: Welcome to this episode of Untangling the Web, a podcast of the Web Science Trust. I am Noshir Contractor and I will be your host today. On this podcast we bring thought leaders to explore how the web is shaping society and how society in turn is shaping the web.

My guest today is Matt Weber. You just heard him talking about some challenges and questions related to web archiving. Matt is a faculty member in the Department of Communication at the School of Communication and Information at Rutgers University. With more than a decade of experience researching information ecosystems, organizations and communities, Matt focuses on the use of large scale web data to study processes of change. Some of his current areas of focus include work and algorithms and knowledge, public policy processes, production of media and the science of communication within these information ecosystems. In addition, Matt has been an active member of the web science community. He’s the program co-chair for the ACM 2021 Web Science Conference, and delivered a keynote at this year’s conference just earlier this week. Welcome, Matt.

Matt Weber: Thank you, Noshir. It’s a pleasure to be here. 

Noshir Contractor: To get us started, take us back to when you first got interested in looking at the web as a changing process. What got you interested in this?

Matt Weber: I came into academia having been a marketing professional working in the media industry. I worked at an ad agency for a number of years, and then ended up working at the Chicago Tribune, Tribune Corp, and saw firsthand how badly companies were responding to shifts in communication technology and shifts in news media production. So come 2008, I found myself as a graduate student having the opportunity to head over to Oxford to take part in their summer doctoral program. And it happened that that year the Web Science Trust sponsored the summer doctoral program at the Oxford Internet Institute. And so that experience was my introduction to a lot of the core ideas of interdisciplinary research that are central to web science scholarship. And so from very early in my career, I started to see these intersections between the broader questions that I wanted to ask about the interplay between technology and information ecosystems, and some of the core questions that were being asked by scholars who are working in the area of web science.

Noshir Contractor: A lot of the work that you’ve done is based on the assumption that the web is not a fixed entity, and that in fact, it is not just growing, but also in many ways, losing a lot of what is on it. The web itself is extremely ephemeral. And you note in your work that in a study of 10 million web pages, researchers found that the average web page remains live for barely three years. And that a study of Twitter data, focusing on major social events found that 11% of relevant tweets were not available after one year and 27% were not available after two years. How does this ephemeral nature of the web and Twitter affect our ability to understand society?

Matt Weber: So many of us who study topics related to the web look at single snapshots of instances that we record when we engage in our scholarship. And it’s rare that we look backwards in our research or that we have the opportunity to look at larger periods of time when we engage in web based research. A lot of that has to do with the availability of data, a lot of that has to do with access to data.

This is a question that I confronted when I first started my research. I came into graduate school and I wanted to study how news media had been adapting to web based technology. By that point, most newspapers already had web pages. I didn’t have a time machine, I didn’t have a way that I could go back in time and record what I wanted to see. So go back to that opportunity I had to be at the summer doctoral program: it happened that the web science group had brought in a speaker from the Internet Archive. Well, I found my time machine.

In that moment, I found this resource that had, for at that point 12 years, been archiving all of the available web data they could get their hands on. And so I spent a lot of time as a graduate student building the foundations for providing researchers with access, so they could open up archived web data and gain access to these larger swaths of data that previously had been inaccessible.

Now to answer your question about the ephemerality of the web and web based technology, that’s really key. Having archives and having repositories allows us to examine change over time. And it allows us also to see what has been lost over time.

Noshir Contractor: What does it mean to create an Internet Archive? Is it taking a snapshot every day, every minute? How does that work?

Matt Weber: When we think of a term like Internet Archive, many of us at first glance think, Oh, I’ll be able to go and look at the web page and just replay it across time. And the reality is far different from that. The Internet Archive itself (archive.org), the initial example of what an internet archive is, was started with the idea that this would become the library of the world: it would become an online home for all web content, digitized music, digitized books; this would be a go-to resource for anyone looking for free information, free knowledge on the web. In many ways, that’s what the Internet Archive has become. But the concept of internet archiving is much broader than that. The Internet Archive is an example. But then look at other organizations like the Library of Congress, the British Library, countless national libraries across the globe, that have all created their own separate repositories to archive either their national web domains or specific subdomains of content that are of interest. The question within each of those domains is what is actually being preserved? Each librarian or each archivist who is working within the library has to make a decision about how often do we record webpages? How detailed are those records going to be? How much of the content are we going to preserve? How accurate is the replay going to be? And what we find is an incredibly wide variance across these libraries.

Noshir Contractor: I want to turn to this question: given that we have the data and given that we now are able to roll back and essentially play, like a movie, how the web looked at different points in time, what are the kinds of questions that you think we can begin to ask now in ways that we weren’t able to do before?

Matt Weber: Before I answer the question about the kinds of research that we’re able to address with this type of web archive data, I want to point out that one of the fundamental problems we still have is that even with all of the advances in technology, we still are not very good at accessing this type of archival web data at scale. The Internet Archive has about, at last count, nine petabytes of archive data. I wouldn’t be able to begin to tell a student how to begin cracking open that repository; we don’t really have the tools for that yet. So in many ways, we’re still developing the technology to be able to look at some of these questions at scale. There are a number of researchers right now, including myself, who are working to tackle these challenges. There are obviously a whole host of questions that we can start addressing. For me, I think one of the most fascinating things is to be able to look at different aspects of our information ecosystems, the environments within which we live and operate today, to understand how those ecosystems are evolving over time.

Noshir Contractor: Can you share some insights about what are the kinds of things we are able to learn that we were not able to do before we began in web science to study the internet archives?

Matt Weber: One area where I’ve been looking for the past five, six years is at the growth and demise of various aspects of our local news ecosystem. And by looking at scale and leveraging web archive data, we’re able to unpack specific findings such as the connection the percent of minority residents within a community has to the overall health of the information ecosystem in a local community. For instance, we see that as there is a greater and growing Hispanic population, there is a pullback on the part of corporate news organizations in terms of the amount of content that’s being provided to that community. We see more niche newspaper outlets coming in to fill the gap. And that’s a story that without being able to look back through the repositories that we’ve built up, we would never have been able to detect and to pull out. 

Completely different topic, we have a repository built around social media data, and news media data tracking the events around both Superstorm Sandy and Hurricane Katrina. And you’re able to see how community partners were able to work together in very niche micro clusters to basically fill the gap of information in the early weeks, early days after each of these disasters, to create a nexus for information for the communities that were affected.

Noshir Contractor: As you look at these insights that we get from people studying past events on the web, are there ways in which the insights are helping us come up with actionable ways of doing things differently moving forward?

Matt Weber: When we talk about web archives, the term archive is maybe a misnomer in many ways, because it implies something that’s been archived, stored and put away. We’re talking about data that’s contemporary today, and then chronologically works backwards. And so much of the web archiving research that we’re talking about, is present to modern day, but allows us the ability to then look at the evolutionary path going backwards. And so again, I’ll come back to my own work, right now, looking at local news information ecosystems — we’re leveraging that work today to advocate for policy solutions in a number of states that are looking to create new models to support local media environments.

Noshir Contractor: You mentioned earlier some of the challenges that we face, that there are technological challenges associated with being able to navigate a study of the web. Yet, the combination of new tools and metadata formats has demonstrated that some of this analysis can be conducted more affordably. Tell us how hopeful you are about this trend towards making these data more accessible.

Matt Weber: So let’s start first with advances in computing. We’ve seen significant progress in our ability to work at scale on the web. And web science, being the interdisciplinary home that it is, is a perfect venue to be talking about this type of scholarship and this type of education. With regards to research, increasingly work that took a supercomputer, work that took a computing cluster, can today be run through Amazon Web Services from your laptop.

Add to that some of the more programmatic technological advancements. All of that goes to say that we can work with larger sets of data in a fashion that is much easier than it was even two or three years ago.

The challenge with web archive data is still in translating between, say, your library and the WARC file format. There are groups out there that are making a lot of advancements in this area. And I think in the next three or four years, we’re going to see even more gains that are going to make this research much more commonplace.

Noshir Contractor: That sounds exciting. We’ve been celebrating all the incredible insights we can get from looking at the Internet Archives. But do you also have examples of concerns about limitations, things that may be lost in a biased or systematic fashion? That might then in some ways limit the confidence we have in our inferences based on looking at the Internet archives?

Matt Weber: That’s a fantastic question. I think it’s a fundamental problem with web archiving that hasn’t been fully addressed. The process of archiving works very much like a network: you start with a few central nodes. And you archive out from those central nodes, you pick your starting point and say who’s linked to these nodes, and then who’s linked to those nodes, and you continue to crawl on. And so this sets a dominant hierarchy for what is going to be archived, what is going to be stored. If you are a niche community on the web, if you are, say, a group of minority-serving newspapers in Newark, New Jersey that has a very small web presence but a very strong impact in your community, and you’re not connected to the main network of media organizations and the main network of information websites in the state, most web archiving platforms will miss you, will simply skip over you as if you never existed. And so when the researchers then go back to use this archive, to leverage this archive for their own work, if they’re not aware of these gaps that may exist in the archive, the presumption will be that those websites never existed, that those communities never had access to information that was being provided, even though there was a very robust community there. And I use a Newark example because that’s exactly what happened in my own work.
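The crawl process described here, starting from a few central nodes and following links outward, is essentially a breadth-first traversal of the link graph. The sketch below is a minimal illustration with an invented link graph (all the `.example` site names are hypothetical); it is not how any particular archive’s crawler is implemented.

```python
from collections import deque


def crawl(seeds, links):
    """Breadth-first crawl: archive the seeds, then everything they link to,
    then everything those pages link to, and so on."""
    archived = set(seeds)
    queue = deque(seeds)
    while queue:
        page = queue.popleft()
        for target in links.get(page, ()):
            if target not in archived:
                archived.add(target)
                queue.append(target)
    return archived


# Hypothetical link graph: site -> sites it links to.
LINKS = {
    "state-daily.example": ["metro-news.example", "city-hall.example"],
    "metro-news.example": ["state-daily.example"],
    "city-hall.example": [],
    # Links out to others, but no crawled site links to it.
    "community-weekly.example": ["city-hall.example"],
}

archived = crawl(["state-daily.example"], LINKS)
missed = set(LINKS) - archived
```

Here `missed` contains only `community-weekly.example`: the site exists and even links into the main network, but because nothing reachable from the seed links to it, the crawl never finds it.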

Noshir Contractor: And if everything that you’ve just described is a problem in New Jersey, I can only imagine how much more of an issue that is in other parts of the world, in the global south, which have a much weaker digital footprint in some ways. To what extent do you see that as a limitation in terms of our ability to make inferences?

Matt Weber: You mentioned the global south. We can travel around the globe, and pick your example; let’s go, say, into India and look at news provision in India. A lot of that is either (a) still happening through printed paper or (b) happening on technological platforms that skip what we know to be the mainstream web. So a lot of news dissemination happens via apps like WhatsApp, which are increasingly community based platforms for spreading news and information through a community. And all of that is overlooked by this traditional web archiving type of technology.

As web scientists, as researchers studying in this domain, we have to be increasingly attentive to multi-method research that allows us to more accurately represent the gaps that may exist in dominant modes of data collection. Unless you engage with the communities and talk to people living in these communities to understand how they’re getting access to information, how they’re getting access to news, you wouldn’t understand where those gaps were. And so more and more today, when we have greater access to data at scale, we simultaneously need to be leveraging partnerships with other scholars, partnerships within communities, to better identify where the gaps exist in the data that we’re relying on for our research.

Noshir Contractor: How concerned are you that, as we move towards platforms like WhatsApp, which are often in some way, shape, or form walled gardens, archives may not necessarily be tapping into what is happening within these sorts of private spaces?

Matt Weber: We were deeply concerned about walled gardens a decade ago when newspaper companies started putting up paywalls and limiting access to certain types of content unless you were a paying subscriber. At a very different scale, and a very different level, we’re having a similar conversation today: when we talk about walled gardens, we’re talking about information that we can’t access. Now, some of that today is happening because of increased concerns on the part of consumers around privacy. Part of the shift to messaging and information dissemination on platforms like WhatsApp comes from an increased desire on the part of consumers to have privacy in the information that we’re sharing. This creates a lot of challenges for us as scholars, in terms of the information that we study and the information that we hope to have access to in order to better understand the social lived world that we engage in day-to-day. There are no ready answers for this. But the lessons that we’ve learned from the past two decades of research, examining web data, living in the world of web science, have prepared us to better tackle these questions going forward. Hand-in-hand with that, we talked about Twitter as a great example of a company that’s opened up data, and we’re seeing pressure on other companies like Facebook to make some of their data more available. And so we are seeing some forward progress in terms of opening up other platforms.

Noshir Contractor: You brought up the issue of privacy amongst individuals as being a major driver of these moves to platforms that provide walls within which people can discuss, which brings me to one of our closing points here: Article 17 of the GDPR, the General Data Protection Regulation, talks about the right to erasure, or the right to be forgotten. To what extent do you see this right to be forgotten, which gives people the ability to go back in time and delete some of the information that has been aired about them, as a serious concern in terms of being able to study the archives?

Matt Weber: The joke has always been something along the lines of: Be careful what you say on the web, because once you say it, it’s impossible to delete. Web archives are the embodiment of that challenge. Once content has been stored into a web archive and preserved in some fashion or other, it’s technically very hard to go back in and scrub every mention from every archive. We as researchers have to be very careful about what we share in terms of personally identifying information when we go back and use web archives in our scholarship. 

And on the other hand, the archivists themselves also have an obligation to better understand how we can work across web archives to adhere to standards like what GDPR has set forward with regard to the right to be forgotten. I expect that that type of legislation will continue to grow over the next decade. And we will have to find other ways to make sure that even going back in time, you have the right to have your information removed from these types of repositories.

Noshir Contractor: Given how much you have been thinking deeply about web science and the web over the course of your academic career already, what do you see as some of the most challenging issues that you or others within the web science community need to be placing more of an emphasis on than we currently do?

Matt Weber: In this conversation alone, we’ve hit on a number of the key themes that are pressing issues for the web science community at large, but also for the group of scholars who are thinking about the data that we have available to address these questions. And I would include web archiving in that set of data. We need to do a better job of thinking about how we address privacy in the data, privacy rights in the data that we are accessing. We need to do a much better job of making sure that a diverse set of populations are accurately represented in the data that we’re using. I think on both of those fronts, there are decades’ worth of unanswered questions that I know many of us are working to address. Those are critical areas right now.

Noshir Contractor: Wonderful. Well, again, I want to thank you so much, Matt, for taking time to talk with us. You’ve done so much in helping us recognize the importance of looking at the web as an ephemeral, changing, dynamic process and telling us about how we can learn so much about society by not just looking at a snapshot of the web at one point in time, but by essentially rewinding and playing back how the web has constantly been changing over the last few decades here. And I thank you again, for both your research and your engagement with the web science community. As I mentioned, you’re the program co-chair for the ACM 2021 Web Science Conference. And we all look forward to listening to your keynotes that you will be delivering on the 21st of June. So thanks again, Matt. And we look forward to learning more about your research in the years ahead.

Matt Weber: Thank you Noshir, I really appreciate the conversation.

Noshir Contractor: Untangling the Web is a production of the Web Science Trust. This episode was edited by Molly Lubbers. I am Noshir Contractor. You can find out more about our conversation today in the show notes. Thanks for listening. 

Episode 10 Transcript

Eszter Hargittai: Back in the early 2000s, my advisor in graduate school, Paul DiMaggio and I suggested the term digital inequality, to use instead of digital divide, to signal the spectrum of differences among people after they go online. So I actually kind of wish that people would not use second level digital divide, and third level of digital divide at all, and just would stick to digital inequality when they’re not talking about access differences. 

Noshir Contractor: Welcome to this episode of Untangling The Web, a podcast of the Web Science Trust. I am Noshir Contractor and I will be your host today. On this podcast we bring thought leaders to explore how the web is shaping society and how society in turn is shaping the web.

Today, we welcome to the podcast Eszter Hargittai. You just heard her talk about her work on digital inequality and web science. Eszter is a professor and holds the chair in Internet Use and Society in the Department of Communication and Media Research at the University of Zurich, where she also heads the Web Use project research group. Her research focuses on the social and policy implications of digital media, with a particular interest in how differences in people’s web skills and digital literacy influence what they do online. She is one of the most highly cited researchers in web science, with more than 40 of her articles being cited over 100 times, and overall she’s been cited over 27,000 times. She has also edited three widely acclaimed books dealing with how we do research. In 2019, she was elected as a Fellow of the International Communication Association. In addition to her academic articles, she has published numerous op-eds in a wide variety of prestigious outlets. Welcome, Eszter.

Eszter Hargittai: Thank you. It’s great to be here. 

Noshir Contractor: I’m delighted to have you here, because what you study, how important web literacy is to understanding and advancing web science, is something that we all need to be thinking much more about within the area of web science. As I see it, a fundamental premise of your scholarship and public advocacy is that gaining access to the web and internet does not, in and of itself, solve the problem of the so-called digital divide. Tell us how you got interested in this topic, and came to this premise that has been so pivotal in your work.

Eszter Hargittai: So I was in college back in the 90s. And that was the time when the internet was starting to diffuse beyond academic circles. I actually started college when it was not yet automatic that you got an email address, but I asked for one. And so I started spending quite a bit of time online, and then studied abroad in Geneva my junior year, which was 1995-96. And interestingly, this was very close to where Tim Berners-Lee developed the web. And this got me quite interested in understanding how people keep in touch, especially since I was thousands of miles away from both my family and many of my college friends. I was interested in how people keep in touch through this medium. But I also realized pretty early on that just because people gained access to it didn’t mean that they would all use it the same way. And I continued to see this just in my own life with people around me. And as a sociologist, which is what I was getting a degree in, and someone interested in social inequality, I started wondering how the ability to use the internet and the web and to understand the web, so web-use skills, related to people’s background.

Noshir Contractor: And I remember that when the word digital divide first gained currency, most people equated digital divide as being associated with whether you had a computer or not, whether you had access to the internet or not. And then subsequently whether you had access to the web or not. And you were amongst the first advocates to say that that’s a very superficial definition of digital divide. 

Eszter Hargittai: This phrase, digital divide, I think, lost a lot of utility as the internet diffused to a larger segment of the population. And as people started incorporating the web into their lives, it was no longer that meaningful to talk only about the digital divide as such, which is, as you said, about access differences. 

And so in one article that I called the “Second level digital divide,” I suggested that web use skills were also very important to how people were integrating the web into their lives. Now, I will say in retrospect, I kind of wish I had not introduced that term. I think it has led to a lot of confusion in the literature. People now even talk about the third level digital divide. But back in the early 2000s, my advisor in graduate school, Paul DiMaggio, and I suggested the term digital inequality, to use instead of digital divide, to signal the spectrum of differences among people after they go online. So I actually kind of wish that people would not use second level digital divide and third level digital divide at all, and would just stick to digital inequality when they’re not talking about access differences. And in some ways, I’m almost upset with myself that I introduced what ultimately I think has become rather a lot of confusion into the literature.

Noshir Contractor: It’s a nice dilemma, to have something that you popularized, that you then want to backtrack and retract after the cat is out of the bag, so to speak. In your own work, you have used many different kinds of approaches to help understand these digital inequalities. Can you talk a little bit about the kinds of studies and creative approaches you’ve used to help tap into measuring the extent to which these digital inequalities might surface?

Eszter Hargittai: My dissertation actually was about people’s web use skills. And I did this by interviewing people in person and collecting some survey data from them, but then also observing them as they used the web. So giving them questions, so-called tasks to perform, and then recording what they did, and later analyzing what they did, and quantifying what they did. And then ultimately, what I did was I took the measures of what were actual skills, right, because I had data on their actual ability to solve tasks online, and then looked at what survey questions correlated with those actual skills. And this is how I came up with a proxy measure of web use skills that has since fortunately been used by lots of others, and continues to be a helpful measure. What I have found interesting in the literature is that many people go about coming up with these proxies from more of a psychometric measure perspective. But while psychometric measures are helpful for intangible things, like if you want to measure trust or privacy, that’s not really the best approach to study something like web use skills, because web use skills are a trait in people that we can, in fact, measure objectively as a skill. And so then we should, and then come up with proxies. The reason I suspect people haven’t done this too much, having done this work myself, is that it is a lot of effort.

Noshir Contractor: So just to make this concrete, this is so interesting. What would be an example of a task that you might give someone to do on the web as part of these studies? And what would be examples of specific web use skills that you are looking to see whether they use those or not?

Eszter Hargittai: So one area that’s pretty hot these days is algorithm literacy. And I’ve seen people try to come up with survey measures without measuring the actual skills. Measuring the actual skills would involve sitting down with someone and seeing how they navigate, say, YouTube, and how they look around the site, to see whether they understand why there are videos showing up on the side. Like, do they understand recommendations? Do they understand where those recommendations come from? Do they understand how different feedback that they give to the site might influence what the site then gives them? So that would be a concrete example of actual behavior and skills you could measure and then come up with survey proxies for.

Noshir Contractor: And what kind of differences do you see in skills amongst people?

Eszter Hargittai: Skills vary very much across the population. A very consistent finding across time at this point and also across different national samples is that education is very much related to skills. So people who are more educated also tend to have a better understanding of the web and have higher level web use skills. 

A less obvious finding has been that there’s considerable variation, even among young people, right. So there are a lot of assumptions in the media, but also just generally, if you talk to a person on the street, that people who are young are automatically savvy with the web, because they grew up with it. I didn’t think this was the case, so I then studied this scientifically, collecting data on young people’s web use skills, again, both in-person observations and also survey measures, and found that even among young adults, there is quite a bit of variation.

And even among young adults, socioeconomic status matters. Young adults who come from more privileged backgrounds will have higher levels of skills. One other thing I’d like to say about age is that another assumption is that older people are necessarily worse than younger people. It is the case that once you get into the 70s, 80s, and higher, people’s web use skills do drop. However, if you look at people 50 and under, and certainly 40 and under, there really is not actually an age correlation with skills.

Noshir Contractor: This is really an important finding. Because, as you pointed out, there is a lot of conventional wisdom that assumes that, oh, young kids know exactly what to do on the internet. And I’m glad that you brought that up. Some of the other areas that you’ve looked at include gender differences, nationality, education, socioeconomic status, ethnicity, and so on. In particular, I also want you to help us understand something that is so important across web science. And that is the impact of disability status in terms of web use skills. And I know that you’ve done some really interesting work in that area.

Eszter Hargittai: Disability status is not something that a lot of work has looked at in web use studies. So with my collaborator, Kerry Dobransky, using early 2000s data, we have looked at how people with disabilities compare to people who don’t have disabilities in their web use skills and what they do online. From that earlier data, we found that people with disabilities had lower level skills. But more recent data actually suggests that people with disabilities have caught up. And we no longer find this, what we would call a digital disability divide, although that takes us back to that whole digital divide issue.

And then beyond this, we’ve also found that in some cases, people with disabilities are actually more active on the web. So they participate in online activities where they make their voices heard, in some cases more than their non-disabled counterparts. 

Noshir Contractor: Well one of the reasons why this disability inequality might have diminished is because there has been a concerted effort to make the web more accessible across different sectors of society. To what extent are the results and findings that you’re reporting, an acknowledgement that our efforts to make the web more accessible are in fact yielding payoffs?

Eszter Hargittai: Yes, so I think one could legitimately see it as a sign of that. It seems that if people with disabilities are able to be actively engaged online, that means that the web is in fact more accessible. One of the challenges of this area of work is national samples only have so many people, only capture so many people with disabilities. So it would be helpful to have a better understanding of how different types of disabilities relate to online behavior. And for this, we’d need much larger samples on people with disabilities. So it would be nice if there were resources to do more targeted sampling of those populations. But overall, it’s fair to say that certainly, some parts of the web have become quite accessible to different types of people.

Noshir Contractor: So there is some encouraging news out there, but I’m sure you agree that there is much more that remains to be done, even as new technologies come on the web, including things like virtual reality, and augmented reality, and so on.

Eszter Hargittai: Exactly. And I think it also concerns partly educating the public, who are not necessarily disabled, right, so just to give one example, Twitter has the alt text when you upload an image, which means that you can say in text form what’s on your image. And I always do this when I upload images, but I suspect the vast majority of people don’t do it, not because they don’t care to, it’s partly, and this is where we’re back to web use skills, they don’t actually know it’s possible. 

Noshir Contractor: Who do you think is responsible, or should be responsible for helping educate the public? You have obviously done a lot in uncovering this as a scholar and a public intellectual. Do you see a role for some organizations, maybe the platform providers themselves to help inform and educate the users about these kinds of digital skills? 

Eszter Hargittai: I definitely think it has to be this multi-method approach, so to speak. First, we shouldn’t ignore educational institutions, right? So we have to move beyond this assumption that young people understand the web anyway, so we don’t have to teach them, because that’s wrong. But then, of course, the largest segment of the population is not in educational institutions, and you don’t want to ignore them. So where else can we help? Certainly, libraries and community centers can play a role, and they often do play a role; they offer workshops.

Part of it would be on platforms to recognize that their users come with different abilities to their platforms. It’s a question of usability, right? So I don’t think it’s realistic for them to do little skill enhancement programs. Rather, they should recognize that these skill differences exist and then address that in how they put together their platforms.

Noshir Contractor: You’ve talked several times about how important it is to be creative in doing research on topics like this, and how doing research on the web is, in some ways, qualitatively different from, and more challenging than, doing research in person in the pre-web days, for example. And I’m absolutely fascinated by the titles that you used for three books that you have edited and/or co-edited, starting with a book titled Research Confidential in 2009, followed in 2015 by a book titled Digital Research Confidential, and a book in 2020 called Research Exposed. That gives a tabloid feel to the whole thing. Tell us why you chose these titles and what you want to convey by them, and, obviously, about the content of these books.

Eszter Hargittai: I should say that the titles came from other people. So I owe others credit. The idea here is that studying the web requires being creative, as you noted, in all sorts of ways. While the web generally offers all sorts of opportunities, it also comes with challenges. And I felt like there just weren’t enough write-ups of how empirical social science actually gets done in terms of the day-to-day reality, right? There are lots of methods books, I mean, an infinite number of methods books, but they usually tell you the ideal type of a project or what you should strive for. But anyone who has actually done empirical work knows that nothing ever works exactly as you plan, that it’s much more messy, and there are just so many issues that come up. But we don’t tend to write about those in academic write-ups. So I wanted there to be a venue where people were just genuinely honest about all the challenges they encountered, but then also shared how they dealt with them. I’ve been extremely fortunate to have amazing contributors to these volumes, who have been very generous with their time in sharing their experiences of all sorts of projects studying the web, whether from big data, log data, to using more traditional methods, like interviews to understand how people use the web, to web scraping. So there are lots of different types of methods in all of these volumes.

Noshir Contractor: And I think that what you just described is very well captured in the subtitles. In the case of the first book, it was “Solutions to Problems Most Social Scientists Pretend They Never Have.” The second was “The Secrets of Studying Behavior Online,” and the most recent was “How Empirical Social Science Gets Done in the Digital Age.” So you’re pointing out that your focus is not on aspirational methods, but on how research actually gets done, and how people then deal tactically with the messy or challenging aspects of doing research on the web.

Eszter Hargittai: Exactly. It’s the behind-the-scenes realities, it’s the ugly sides, the difficult sides that people do not write up in articles, partly because there’s just not usually room for these things in articles, but also because they may be deemed embarrassing. But part of the idea is precisely to acknowledge head-on that the messiness is part of the research project; there is no such thing as a perfect research project. Those subtitles very much capture exactly what the books are trying to do.

Noshir Contractor: Fantastic. Well, one of the things that you’ve also been doing lately is taking advantage of this pandemic situation that we find ourselves in, and using that to initiate a really large data collection effort about COVID-19, collecting nationally representative sample data across three countries. Tell us a little bit about what the study is, and also about the book on COVID and digital inequality that you are writing as an academic trade book for MIT Press.

Eszter Hargittai: So, back last spring, when the world went into lockdown, I was wondering how to cope, just like everybody else was. And it became quite clear very quickly that the web and digital media were going to be very important in this whole situation. And suddenly everybody was commenting on this. But I was someone who had been studying this for 20 years. So I felt like, okay, doctors were doing more than their share by treating people. What could I do as a social scientist? And I felt like I could contribute by trying to understand the social side of this and, as an expert in studying people’s web uses, trying to understand how the web played a role in all of this. So with my team at the University of Zurich, a wonderful group of junior scholars, we decided to do some surveys. In early April, we fielded a national survey in the US, and then in mid-April, one in Italy and one in Switzerland. And then in early May, also two more in the US. And so the book that I’m writing about digital inequality and COVID looks at the early days of the pandemic, and really the lockdown time, and how people were using the web at the time. And perhaps not surprisingly, but I think it’s very important to document, traditional markers of inequality like socioeconomic status again play a role in the extent to which people were able to pivot to the web for things that they needed done. But as we discussed earlier in this conversation, some groups that we might not expect to do well, like people with disabilities, were actually doing quite well compared to others in their engagement online.

Noshir Contractor: Well, we’re going to look forward to reading this book when it’s published. One of the things, Eszter, that you have managed to do in the abundance of spare time you have, given everything we’ve already talked about, is spend time engaging outside academia. You’ve been a very active public intellectual in terms of general audiences, op-eds, etc., but also in talking with policymakers. Tell us a little bit about how web science can shape policy in terms of influencing policymakers in these kinds of contexts. What has been your experience? And what are lessons that we can learn on how to do that more effectively?

Eszter Hargittai: Academically, our work is important and interesting, and hopefully insightful. But ultimately, it’s very helpful if we can then influence policy, where our findings can be translated into real world outcomes. And so at the very core of my work is this point: don’t assume that young people are automatically digitally savvy, and don’t assume that once people get connected, they will have equal access, even. And they definitely will not have equal skills in using the web. It’s very important to get this out to policymakers. It’s important to keep in touch with very different constituents and colleagues, right? So it’s important to attend different conferences, and it’s important to write op-eds.

So op-eds are a really terrific way to get your message out to a broader public. But it’s a very different kind of writing from writing academically, just incredibly different. That’s something I had spent time on, and I think it was a really good use of time.

I have been affiliated with Harvard’s Berkman Klein Center for Internet & Society, and they do really well in connecting with the policy world. Through them, I spoke with the Obama transition team in 2008.

Noshir Contractor: One of the things that you have actually taken on as a passion is to write about academic career advice. You have columns and blogs in places like Inside Higher Ed, etc. And so as we wind down this interview, what kind of advice would you give web science scholars at different stages in terms of what priorities they should have, what they should be thinking of, and what they could be mindful of, both from a scholarly point of view and in terms of public engagement?

Eszter Hargittai: Actually, it’s been very interesting. As I’ve worked on this COVID digital inequality book, I’ve found myself reaching out to people I knew from graduate school who are not in my specific field. So I was getting a PhD in sociology, and I’ve been reaching out to, for example, people in economics and political scientists. They could help with things that weren’t as obvious for me to tackle. Too often, I’ve seen junior scholars be hyper-focused and assume that if someone is not doing their exact work, then that person can’t be relevant. I think this is really a shame, because you have so much to learn from people who are not necessarily doing what you’re focusing on.

Web science is, by definition, interdisciplinary. So I think it is extremely important for people to cultivate networks of others in web science who aren’t necessarily from their own discipline, right? So if you’re mostly doing social science, be sure to be talking to the experts who are more in computer science, or if you do more traditional methods, talk to the computational social science and computational communication science people. You can learn so much from collaborating with people who have different methodological backgrounds, for example.

Noshir Contractor: Excellent advice. And then in closing, one last question. We spent a lot of time today talking about the pandemic, and of course 2020 and 2021 have seen, in addition to the pandemic, several other global cultural reckonings. Taken together with the pandemic, what is the one thing that you think would have been different in this entire experience that we are still going through, for better or for worse, without the web?

Eszter Hargittai: The web cannot be taken out of the COVID conversation, right? Our experience of this is so much about web-based communication. I’d like to think mostly for the better, because of the connections we made, not having to feel as isolated if you were able to connect with others for social purposes, and many, many people being able to continue their work, even remotely. These are all positive aspects. Negative aspects would be the potential for very quick dissemination of misinformation, of disinformation. Recognizing this, we then need to think about ways to counter that. And so yes, that’s a potential negative, but there is also the positive: with the racial justice issues that came up last summer in the United States, people were able to connect with like-minded others and to organize in support of those experiencing injustice. The web is very important to this.

So generally speaking, I think it’s important to recognize that the potential of any technology, including the web, depends on multiple factors. It depends on how governments respond to them, how they support them or restrict them. It depends on what actions the business sector takes. And then it very much depends on how users approach these technologies. Ultimately, though, I believe that it has been for the positive. 

Noshir Contractor: Fantastic. Well, thank you again, Eszter, for taking time to talk with us and to enlighten us about the nuanced ways in which one should be looking at digital inequalities in society. Notice I didn’t say digital divide. But also, more importantly, thank you for your incredible scholarship over the last decade and more, where you’ve really helped us understand and advance the process of doing research on the web and also, as we’ve just discussed, advocated for it to the general public and policymakers. So thank you, and I look forward to the next decade of research from you.

Eszter Hargittai: This has been a delightful conversation, Nosh, and thank you so much.

Noshir Contractor: Untangling the Web is a production of the Web Science Trust. This episode was edited by Molly Lubbers. I am Noshir Contractor. You can find out more about our conversation today in the show notes. Thanks for listening.

 

Episode 8 Transcript

Sandy Pentland: The area of privacy and data ownership is the main thing that I’m trying to sort of push on. So we have things like GDPR, the California privacy protocols, things like that. But having rights over your data doesn’t really do much for you. It’s just like, bunches of bits, right? What do you do? And we have this problem that some small number of organizations have huge amounts of data, very unequal. And I think that the solution to that is the area that web science and people ought to think about, which is how can people take control of their data to get the medical service, the government, etc, that they want? 

Noshir Contractor: Welcome to this episode of Untangling The Web, a podcast of the Web Science Trust. I am Noshir Contractor and I will be your host today. On this podcast we bring thought leaders to explore how the web is shaping society and how society in turn is shaping the web.

Just a moment ago, you heard Professor Alex “Sandy” Pentland talk about the intersection of data and privacy, which is just one of the areas he studies. Sandy is one of the most cited web scholars at the crossroads of web science, network science, and computational social science. He’s a professor of Media Arts and Sciences at MIT, and directs the MIT Connection Science Research Initiative. He also helped create and direct the MIT Media Lab and Media Lab Asia in Mumbai, India. He heads MIT’s Human Dynamics Group, which is one of two groups at MIT that is a member of the Web Science Trust global network of laboratories. And he co-leads the World Economic Forum Big Data and Personal Data initiatives. In 2011, Forbes named Sandy as one of the seven most powerful data scientists in the world, putting him in company with the then-CEO of Google, Larry Page. His work has pioneered organizational engineering, wearable computing, and modern biometrics, among other things. Welcome, Sandy.

Sandy Pentland: Hey, thank you for inviting me.

Noshir Contractor: Delighted that you’re able to join us here today! I want to start by asking you to tell us a little bit about this incredible work that you’ve been doing on the web and how the web is shaping society. To what extent and how did you get interested in looking at the web? Because I do know your scholarship: even before you were looking at the web, you had already made a name for yourself in areas such as image recognition, etc. What got you interested in looking at the web? And how did you get started in that?

Sandy Pentland: Well, I’ve always been interested in human interaction, human perception. And, you know, sort of around the end of the 90s, I was setting up laboratories in India, we were living in India, trying to set up sort of things like the (MIT) Media Lab, but in India, and I noticed that the Board of Directors we had was terrible. And it wasn’t that they weren’t smart. It’s just that they had too much charisma, too much personal force. And I got interested in how did that sort of nonlinguistic, the sort of style of speaking, change decision-making. And then that was this sort of honest signals work that, you know, people are aware of, and it turns out that you can do things like get early warnings of depression and things like that using this. But when mobile phones came along in the sort of mid 2000s, we started using those. And, of course, that’s part of the mobile web, that is the mobile web. And so suddenly, we were looking at not just two or three people talking, but hundreds of people talking and even more. And I did experiments like looking at how communities make decisions by looking at their face-to-face interaction, as well as their digital interaction. And of course, as the web exploded, and you got lots more video conferencing, and things like that, it just became web science, right?

Noshir Contractor: You were one of the early people who looked at the web as an opportunity to be able to study signals. Your book, Honest Signals, was very influential in making that point, and more generally, the social signals that you looked at, etc. One of the things that strikes me about your work is that you are amongst the first who looked at the web as an opportunity to be able to get data about human interaction and perceptions.

And at the same time, your work has also been equally influential in recognizing the web as a source of concern in terms of privacy, etc. Can you share with us a little bit about how you straddle and how you reconcile these issues in your own scholarship, and what you advise policymakers in this context?

Sandy Pentland: Now, the core attitude I bring is one of science. I’d like to understand, particularly human nature, and how it is that we learn, make decisions, form society. And what you see very early on, when you begin looking at this, is that people form into cliques around topics. So there’s your buddies, the people you talk to, and then people go off and explore to find other sorts of opportunities. And that exploration is critical for development of a community, but the interaction of the community, the separateness from the rest of society, is critical for people developing modes of operation or norms of operation.

As you begin to look at this, you realize, “Oh, my God, I can tell who this guy’s friends are. And I can tell who the boss is. And I can tell that he broke up with this person over there.” And so this brings all of the sort of classic privacy concerns. And it’s actually a lot more concerning than, say, Twitter or something like that, or Facebook, because records from cell phones in particular, right, are where you actually were. So it’s who you actually spend time with, not what you say about it. And it’s extremely illuminating for people’s, not just personality, but their social structure, what they believe, where they’re going, independent of what they say. So that led me to start the discussion at Davos that led to GDPR, and I have been developing technologies to be able to preserve the good parts, which is the communication, the community building, the exploration for new ideas, but without having the downside of privacy and security risk.

Noshir Contractor: Well, one of the things that you must have thought about is also the variation in norms around the world on what constitutes privacy. Clearly, you know, some of the things that we see in GDPR, for example, may not be very appealing to even a US audience, and certainly would be different from what the Chinese or the Russians, to just name two, would want. What are your thoughts about the extent to which these kinds of policies need to be responsive to specific countries and cultures? Or do you have the belief that there are certain human rights, basic fundamental principles of privacy, that should be true for everyone on the planet?

Sandy Pentland: There is a certain way of thinking about it, which is universal, it has to do with human nature. But there are variations. And so what I see is that privacy fundamentally, is about individual freedom: can you make, learn things, work with people, do things without any interference, and also without other people knowing, so that you feel more free to try things that you might not want to talk about in public, and, you know, this doesn’t have to be dark. This is like, you know, I’m gonna date this person, but I don’t want that written on my record for the rest of my life, because it may not work out, right. 

But there’s a big axis in societies that has to do with individualism versus the social fabric. So in the United States, we’re extremely towards the individualistic side. In Eastern culture, it’s much more for the good of the group. But the issues are always the same. It’s just there’s a control knob that has to do with the value of the collective versus the value of the individual. And one of the things that people get wrong about this, I’m talking about the law, as well as the large discussions that people have, is that a certain amount of clannishness is key for social support, for mental health. You’ve got to have your buds that support you, you got to have people you bounce ideas off, if you don’t have that, people go haywire. They really do. It’s not a tiny thing. The biggest predictor of mental health is social interaction.

That’s why solitary confinement is this horrible, horrible sort of punishment. But the question is, what do you mean by social? Is it just your gang, a small group of people? Is it the people in your neighborhood? Is it, you know, your government? Different societies, different cultures have very different answers to that. But we shouldn’t forget that there needs to be a circle of trust for a human to be healthy.

Noshir Contractor: That’s really a very interesting way of addressing these variations that we see around the world. We’ve talked a lot about phones and mobile phones and how that is protected. That became very influential in getting you started looking at these kinds of signals. You’ve also been one of the pioneers of these intelligent wearable devices, and sociometric badges come to mind. Can you talk a little bit about what led you to that, and what you see moving forward about the future of those kinds of devices as a way of tapping into social signals, but also maybe providing you some feedback on the basis of that?

Sandy Pentland: Yeah. So I started the wearables group, which was really sort of the pioneering wearables in society type of group. So we had people running around with displays on their head and computers in their backpack, and stuff like that. And it was a response to the realization that we were going to have wireless communication — at that point, we didn’t have WiFi, or even cell phones — and that computation was going to get very small. And we wanted to experiment with what happens if your glasses had displays in them, what happens if things could whisper in your ear, etc, etc, etc. What quickly became clear is, first of all, things on your body have a social dimension that computer screens don’t. You present yourself, you want to look attractive, you want to look credible. So what you wear is really important. And also, the second thing was, the main things you do in your physical world are social things, not information tasks. And so connecting with people better, being more responsive, being more of who you want to be, those are the things that really take off. And people keep forgetting this. So what we saw, what I’ve seen is very slowly, people are figuring out ways to incorporate this in social interactions, in the real texture of your life. One of the main barriers has been batteries, for God’s sakes, yes, it turns out the battery technology is critical, because you can’t be like, you know, recharging things all the time. There were mistakes like Google Glass, which actually is a brilliant idea. But then by putting the camera on it and making it look all space age-y and George Jetson, people revolted. On the other hand, something that reminded me of your name when I met you again would be pretty awesome for most of us, right? Or directions so that you don’t get lost. So what’s happening with that is that it’s being driven by health concerns now, of course, monitoring yourself for COVID, or just staying healthy.

And then also, people being at home more, they’re interested in things that maybe aren’t exactly wearable, but are very different formats from what we use now. And so we’re going to see a lot of creativity in these wearable things, driven probably principally by health concerns.

Noshir Contractor: Wearable devices have now become an extended part of the web, of the mobile web, if you may.

Sandy Pentland: Wearable devices, payment devices, things for managing traffic, and so forth. And of course, the downside of that is privacy, because now there’s a lot more information about you. And cyberattack, because the surface for attack is growing exponentially. Which means, you know, taking down all the payments or using the payment machine to get at the core bank, all this sort of cyber attack stuff is going to increase by an order of magnitude at least.

Noshir Contractor: You published recently, some research looking at blockchain transactions. Can you talk a little bit about how that research should be really important and salient for those interested in web science?

Sandy Pentland: Yeah, so we published something, a book, actually — MIT Press — called Trusted Data. And what it does is it lays out the architecture that you need to have to survive in this coming era of much greater cyber attacks, IoT, and other sorts of problems, echo chambers, etc. And the core thing is that the web was designed as a communication medium, getting bits from here to there, not 100% reliable, but cheap, good, fast. But it was not designed as a transaction medium, the sort of thing where, you know, I pay you a little bit of money, you do a service, and if you screw up, I can sue you. I mean that sort of legally binding real transactions, it’s terrible at it, because you don’t know if the messages get through there. They don’t have standing in legal courts. And so what’s happening now with IoT, Internet of Things, blockchain — don’t think Bitcoin, think just ledgers that record stuff in an immutable way, a very serious way — and AI also, is you’re getting the evolution of the web from a communication medium to a transaction medium.

And you can get a picture of this on Amazon, right? You know, you like, ask for things, it shows you, you click, you bought it. Right? One click, all that. And that’s because within that tiny walled garden, they can take care of all the security and who you are, and payments and contractual things. Imagine that was on the web as a whole. So you know, you could, say, design and build a house with one click, because it would go off and find the architect and the architect would put the plans in the mix, and the computers would merge it and find a builder. I mean, you can imagine a world that is almost magical, because things would happen reliably: auditable, traceable, legally-binding, fair. You know, you can build that in there.

And to help do that, we’ve set up these sort of protocols that have blockchain and stuff in them. And I’ve gotten the European Union to adopt this as their core architecture for data. And also a number of large organizations like Fidelity now use this architecture. Intuit uses an architecture like this, as do other companies that handle a lot of our life.

One of the aspects of it that is really interesting is that law is turning out to be a network of web science. Because as you get more things on the web, more of it has to do with legal rights and complaints and resolutions and so forth. So I just launched a thing called law.mit.edu, which is an alliance of law schools around the world, to think about how law can make the transition to this sort of digital age, because it’s not obvious. I mean, there’s so much that has to be human-centric or human-centered in law, human judgment. And that’s under threat as things become more computerized. But you have to become more computerized to deal with this much more extensive digital environment. And so resolving that conflict is something that I wanted lawyers and computational people to think about together, so that we don’t end up in some horrible place.

Noshir Contractor: That shows again, why web science is so fundamentally an interdisciplinary initiative. And that requires us to think systemically across all disciplines to address and understand and enable the web as we know it now. You’ve already touched on many things that need more progress. But if I were to ask you, amongst the many things that you’ve been thinking about, what are some of the areas where you believe we have seen the least progress in web science? And that you see we need to be able to put much more emphasis in the near future?

Sandy Pentland: Well, the area of privacy and data ownership is the main thing that I’m trying to sort of push on. So we have things like GDPR, the California privacy protocols, things like that. But having rights over your data doesn’t really do much for you. It’s just like, bunches of bits, right? What do you do? And we have this problem that some small number of organizations have huge amounts of data, very unequal. And I think that the solution to that is the area that web science and people ought to think about, which is how can people take control of their data to get the medical service, the government, etc, that they want? It’s not a matter of money. Money is a distraction. It’s really, are you getting the care, the government and the opportunities that you ought to? All of this sort of anti-racism stuff revolves around this because this community wants to be treated fairly. Well, what does that even mean? How is it being treated? They don’t have the data. What is beginning to happen, now, is you see data cooperatives, data unions forming, or a community or a neighborhood. Everybody doesn’t give away their data, they set up something like a credit union that holds their data for them. So they still own their data. But now the data is in a place, in this credit union, this data union, where they can analyze it and ask, are we getting the same medical services as those guys over there, because they have data from lots of people?

Data is the new resource, like labor or capital. We have credit unions, those came about as agricultural credit unions in the 1870s, 1860s. We have labor unions, those came around the 1900s. And now we need data unions, where groups of people pool their assets, which they have the legal right to do under GDPR and CPP, to be able to stand up for themselves. And I think that sort of forming of community, rebuilding trust around facts, around data, will have a transformative effect.

Because as people work together, to have freedom, fairness, power to thrive, they build trust with each other, it reconstructs a lot of what is damaged today. Community should have the right, more, of self determination. And that will help solve a lot of the problems out there. That’s fundamentally a web science type of issue.

Noshir Contractor: Yeah, you raise a really good point that we have spent a lot of time and energy focusing on being able to have access to our own data. But what you’re pointing out, Sandy, is that that is a necessary but not sufficient condition for us to reap the benefits, collectively, of owning our data.

Sandy Pentland: Exactly. So at the beginning, I talked about this axis of individualism to social fabric. Even in places like the US, which are very individualistic, the community that you’re part of, and you get to choose what community, the community that you’re part of is necessary for your support, for you to thrive. And we tend not to recognize that. And I think the time is now where people are waking up to the fact that we need to pool assets locally among our community, to be able to reinvent ourselves and get what we need.

Noshir Contractor: I think that’s really on point. In closing, Sandy, I wonder if you could reflect a bit on what our current situation, whether you’re thinking of COVID-19, or about the race issues that we’ve been confronting as a society and globally, in fact, how would this have been different if it were not for the web? 

Sandy Pentland: An interesting observation is that our large scale, non-web administrative structures — and I’m not talking about federal government only, all the way down — were unprepared and uncoordinated. And part of that is that different places, they have different culture. Texas is different than Massachusetts, I’ll tell you, oh, Montana is different than New York City. It really is. Not just the culture, but the physical. And what’s ended up happening is the more effective things have been things that were adaptive to local communities. And you see the same thing in the science that’s going on.

So there’s an explosion of science to find vaccines and treatments for COVID. But none of that is an NIH or CDC or WHO project. That’s all local research groups working over the web, to find new solutions that they didn’t write any proposals for. Now the NIH furnished those labs. So they’ve provided the ground and the tools, but the actual research direction, that’s people grassroots pulling together, and the web is the thing that enables it. Alright. So what they’re doing is a dynamic learning network. So the web as learning network, and that is the way we’re making lightning speed progress on this problem, not big government programs. The big government programs set the infrastructure, just like the big government program helped invent the web. But it was these grassroots things that allow this sort of agile, locally adaptive exploration that’s gonna save us.

Noshir Contractor: It again highlights the fact that even though the web was not originally invented with a particular set of activities in mind, certainly, perhaps not the activities that we’re experiencing right now, it has served us well in being able to, as you said, at the grassroots level, dynamically coordinate large groups of people and large communities to come together and mobilize in helping us address these challenges.

Sandy Pentland: Yeah, I think maybe two parting bits. One is, you know, the birth of the World Wide Web was research projects, distributed research projects that weren’t directed from the top, they came from the bottom. And the second thing is, what we’re doing right now would have been impossible five years ago. We have constructed a web that supports all this sort of stuff, just in time for the pandemic. It’s crazy. If this had happened 10 years ago, none of that zooming stuff would have been plausible. Lots of these other things just wouldn’t have happened. And so by luck, or whatever, we find ourselves in a world that is webified, for better and for worse, but a lot of it is for good because now we can learn faster, we can adapt. We can do things together in ways that just simply weren’t possible until just very recently.

Noshir Contractor: Thank you again, Sandy, for taking time to talk with us today and sharing your insights. You’ve been one of the pioneers of a lot of what has been happening across disciplines and helping understand the web, and we greatly value you taking time to talk to us today.

Sandy Pentland: Thank you for thinking of me, and I enjoyed talking to you. Take care. Thank you.

Noshir Contractor: Untangling the Web is a production of the Web Science Trust. This episode was edited by Molly Lubbers. I am Noshir Contractor. You can find out more about our conversation today in the show notes. Thanks for listening.

Episode 6 Transcript

Brooke Foucault Welles: The really interesting thing about hashtag activism in particular is that it becomes this kind of shorthand organizing principle for people who have experiences that don’t normally get covered in mainstream media or by mainstream press. To come together and share those experiences and the collection of those experiences does two interesting things. 

So first it validates them right: so you’ve experienced something, I’ve experienced the same thing. If we can connect those experiences, then suddenly, our experiences collectively feel more real. And when people can collect these things together, they become newsworthy in themselves. So we’ve seen, over time, the mainstream press maybe not covering individual incidents, but covering the hashtag and the collection of those incidents as a newsworthy event. 

Noshir Contractor: Welcome to this episode of Untangling The Web, a podcast of the Web Science Trust. I am Noshir Contractor and I will be your host today. On this podcast we bring thought leaders to explore how the web is shaping society and how society in turn is shaping the web.

Our guest today is Brooke Foucault Welles — you just heard her talk about #HashtagActivism, the title of an award-winning book she recently co-authored. She’s a professor of Communication Studies and a core faculty member of the Network Science Institute at Northeastern University. She’s also the director of the Communication Media and Marginalization Lab at Northeastern. That’s CoMM for short. She studies how online communication networks enable and constrain behavior, with particular emphasis on how these networks both enhance and mitigate marginalization. And in 2019, she was a general co-chair for the 11th International ACM Conference on Web Science. Welcome to the podcast, Brooke.

Brooke Foucault Welles: Thanks Nosh. It’s great to be here.

Noshir Contractor: First, Brooke, I want to start by congratulating you on the publication of your book #HashtagActivism: Networks of Race and Gender Justice, which you co-authored with Sarah Jackson and Moya Bailey, and which was published by MIT Press earlier this year. I also want to congratulate you on being recognized by the International Communication Association with a 2020 Applied Public Policy Research Award that was associated with the publication of this book.

I want to start with the title of the book, #HashtagActivism. What does that mean to you?

Brooke Foucault Welles: Thanks, Nosh. So it’s an honor obviously to both publish the book and receive an award for our work on hashtag activism. You know, it was a term that was coined by the press to kind of malign this form of activism that emerged around the so-called Arab Spring and the Occupy movement, where folks used the internet as a way of organizing and proliferating messages of resistance and solidarity with marginalized communities.

We kind of co-opted or hijacked that term to really interrogate the role the web can play in advancing progressive social justice movements. And our argument in the book, and more broadly in our body of work on hashtag activism, is that hashtag activism is a logical and sensible extension of the use of media by resistance movements and social justice activists, dating back to the historical Black press and the civil rights movement and everything in between. Hashtags in particular have become associated with social justice movements in a way that’s meaningful and powerful in effecting social change.

Noshir Contractor: And in your research you’ve talked about several specific hashtags that you have looked at over the years, things like “Girls Like Us,” “Ferguson,” “my NYPD.” What did you find when you were looking at these specific hashtags?

Brooke Foucault Welles: Yeah. So the really interesting thing about hashtag activism in particular is that it becomes this kind of shorthand organizing principle for people who have experiences that don’t normally get covered in mainstream media or by mainstream press. To come together and share those experiences and the collection of those experiences does two interesting things. So first it validates them right: so you’ve experienced something, I’ve experienced the same thing. 

If we can connect those experiences, then suddenly, our experiences collectively feel more real. And when people can collect these things together, they become newsworthy in themselves. So we see over time, the mainstream press maybe not covering individual incidents, but covering the hashtag and the collection of those incidents as a newsworthy event. As you know, in communication, mainstream media coverage is still the gold standard for setting an agenda and creating policy change.

Noshir Contractor: And from a point of view of web science, what insights have we gained by looking at the role that hashtag activism is playing in changing society and transforming the public discourse?

Brooke Foucault Welles: That’s a great question. So, one of the things that’s really striking about web science as a set of tools, and also just sort of a logic of thinking about the world, is that these conversations have always been happening, but they were happening in private, and in a way that was really hard to track, especially at scale. And now we can — Not only can activists and people, regular people, find each other online, but we as researchers can find those spaces and reflect on them more fully and completely. Obviously we don’t have access to everything that’s happening, but we have access to so much more. And so much more of the kind of routine everyday organizing efforts that are going on. And so web science gives us the tools and the access to study that and understand how it works.

Noshir Contractor: Indeed, your book covers this topic quite extensively, but it ends in about 2017. A lot of people call the summer of 2020 a global reckoning on social justice. Given everything that we’ve seen recently, what new insights or what generalizations do you think about and reflect upon in light of these social justice movements?

Brooke Foucault Welles: So one thing people will often ask, when they ask about this book, is why these particular events or why did this happen in the way that it did. And of course, we don’t have the counterfactual world where Ferguson didn’t happen or the Me Too movement didn’t happen or some other thing didn’t happen.

But one of the things that I think gets lost when we talk about hashtag activism is that there are, of course, these sort of spectacular events. So, high profile murders draw a lot of attention where people rally around and there’s massive spikes in certain hashtags. But these networks get built. So, the web is sort of inherently networked and these networks persist, right, so these conversations are still going on. It may be in quiet ways. And so it’s not as if people stopped talking about Black Lives in 2017 and didn’t talk about it again. In fact, they’ve been talking about it the whole time. And these networks kind of lay dormant. They weren’t covered in the media. And then another horrific incident happens. Lots of people are paying attention and we suddenly see not only the activation of the folks who were talking about these things in 2017 and 2018, but this whole new swath of folks who are suddenly in tune with what oppression and marginalization looks like because they’re seeing it and they’re experiencing it in their everyday lives. So I think part of the reason we have that sort of massive surge of attention and the sustained both online and offline protest is that the networks are there and the networks are building, feeding each other and sustaining each other to keep this movement going.

Noshir Contractor: You raised two really interesting points. One is that these networks, if you may, go through periods of dormancy or latency and not much in the public eye, but they’re all there, and then they occasionally will surge in visibility. And the second thing I also heard you mention was that in many cases these networks are not exclusive with each other, that there’s a lot of overlap amongst these various networks and they sort of build on a symbiotic relationship amongst them.

Brooke Foucault Welles: Mmhm. I think that’s right. So, you know, in the book, we focus on race and gender justice and we do have examples where it’s one or the other. But almost all of them involve both race and gender justice because, of course, those things are intertwined, and I’ll add, you know, in this sort of COVID pandemic era, environmental and health justice are also intertwined with all of those things. So we see the kind of multiplicity of oppression and marginalization coming to bear and really being discussed in these networks and grappled with in real ways.

Noshir Contractor: And another major contribution that you’ve made to web science is co-editing a volume called the Oxford Handbook of Network Communication. Along with your own work in that area, you’ve talked a lot about the transformative power of networked counterpublics. How does the term networked counterpublics relate to hashtag activism?

Brooke Foucault Welles: Yeah, so this is thanks in huge part to my co-authors; we coined this term together. But it’s an extension of public sphere theory. So, very briefly, you know, publics are groups of people who engage in democratic deliberation. Certain folks, historically and presently, are excluded from those kinds of deliberations, so people of color, LGBTQ folks, women and so on aren’t full participants, or aren’t always full participants. They form counterpublics. And, you know, fast forward to the networked and online era. Of course, these things are playing out online as well. So “networked counterpublic” really captures this idea that there are groups of marginalized folks who are coming together online to discuss issues and then also advance counter narratives in the mainstream. So it’s kind of a heady theoretical term, but it also has these really applied consequences and implications, in that we can see it happening in these coalitions of folks on the internet bridging across web pages, blogs, different social media platforms and so on, to really create and advance an agenda for social change.

Noshir Contractor: Now this might sound as though the web is just absolutely superb and fantastic and utopian for celebrating network counterpublics, hashtag activism, etc. And yet there are stories on the web, where, for example, in the gaming community, there was a lot of attacks against female members of the community, etc. Can you talk about how the web might be having the dual edge effect — both positive and negative — in the context of some of these issues?

Brooke Foucault Welles: Yeah, of course, that’s right. The very same systems that enable progressive racial and gender justice activism, among other things, also enable regressive radicalization and harassment. And I’d also note that although I think the architecture of the web is set up to be open and available for everyone, the corporatization and the capitalization of the web have created structures that are actually hard for marginalized folks to engage in. So the fact that we have Black Lives Matter or Me Too is not entirely because of the web, but sort of in spite of some of these corporate structures in place.

So one of the things I think web science in general, and, you know, society in general, frankly, really needs to grapple with right now is what does an open web actually look like, and how might the web better serve the cause of justice, more than the flow of information sort of unfettered, in ways that can be harmful. And, you know, there are no right answers to that. But I’m confident the web science community can figure them out.

Noshir Contractor: So are you suggesting that we need to have a hashtag activism that focuses on the design of the web and to open the web, #OpenTheWeb?

Brooke Foucault Welles: I love that idea. Sign me up.

Noshir Contractor: Great. Well, I think there are many who would agree with you on that. So, based on your perspectives and all the scholarship and activism practitioner work that you have focused on in your own research and scholarship, what do you consider some of the most significant issues that need to be further addressed by web science?

Brooke Foucault Welles: For sure, #OpenTheWeb is one of them, so how can we think about not just ethics in web science but justice in web science, and how we optimize a web for justice in order to correct the current and historical harms. You know, I would also love to see a tighter integration, you know, not just of the social sciences and engineering and STEM sciences, which I think web science does really well, but bridging into the arts and humanities as well. So I think there is space to come up with interesting collaborations and interesting futures for the web, when we bring in kind of the full spectrum of folks working on these spaces.

Noshir Contractor: I think that’s a very inspiring idea. Just off the top of your head, to put you on the spot, can you think of a couple of strategies that you would offer if #OpenTheWeb became a thing? And if someone approached you and said, “What’s one thing we can do to step in that direction?” what would you come up with?

Brooke Foucault Welles: That’s a great question. So you know immediately, one thing that comes to mind is creating activist-centered tools. So activists are now working within the sets of tools that are provided by corporations which you know comes with things like surveillance of their data, corporate control over what they can and can’t say. So more open access tools, things that folks can use on their own terms, where they can retract their data if they so choose, or maybe end-to-end encryption in such a way that they can’t be surveilled comes to mind easily. You know, I would also love to see obviously more Black Indigenous and minority ownership over some of these systems. So, promoting, not only the, you know, kind of educational pipelines, which I think we’re increasingly sensitive to but also the corporate pipeline. So how do we get folks who are not just developers, but CEOs of companies and organizations working in this space and obviously then collaborating with folks like that to make sure that their businesses are sustainable, you know, well researched and accessible to everyone.

Noshir Contractor: That sounds really exciting. You mentioned that one of the things that web science does well is cultivate collaboration between the social sciences with STEM fields science, technology, engineering, mathematics, and you also advocated for bringing in more of the arts and the humanities. Can you give an example of a web science project, can you point to one that has done that well, or a hypothetical project that could do that really well?

Brooke Foucault Welles: There’s a book called Data Feminism, which is just a lovely example of, of how embracing the arts, humanities, social sciences and technical sciences, yields new insights on how to observe and subvert power on the web. So, I totally recommend folks read it. And they did a lovely job, which is integrating across all of these things, showing how, you know, systems can oppress people but also help people kind of subvert those oppressions through clever hacks and understanding both the art and the science of the web.

Noshir Contractor: And so, in what way did the arts and humanities contribute to this effort?

Brooke Foucault Welles: So in some ways, studying arts and artist collectives gives inspirational ideas about how to think differently about power and organizational structure, right? So folks often have very different ways of organizing over there. So that’s one concrete example. I think another way is, you know, sort of applying web science tools in order to create things that don’t have sort of capitalist monetary value, if that makes sense. So things that are lovely to engage with or are beautiful, but aren’t necessarily efficient or profitable, I think help scientists think differently about the value of their work, so engaging in these exercises as a student or practitioner might inspire new ways of thinking about what it is we’re doing here.

Noshir Contractor: That’s really good. Well, in closing here, I wanted to ask you a question that is relevant to our current times. What are the one or two most significant things, in your opinion, that would have been significantly different, for better or worse, during the COVID-19 crisis and the other crises that have now come along with it? To what extent would this have been different without the web? Can you conceive of what today would have been without the web?

Brooke Foucault Welles: So I’m going to give this response from my own sort of unique perspective as a, you know, an American, living in Massachusetts, raising kids. At a federal level, we had a pretty complete failure of communication, right? So that’s an interesting example of when centralized communication really broke down, in terms of what to do and how to do it, but we saw lots of people rising up on the web and taking that spot, disseminating good information, science-backed information on how to handle ourselves during a pandemic here in the US. My friend, an epidemiologist, suddenly got zillions of followers on Twitter, you know, because folks are looking to scientists directly. I don’t think that would be possible without the social media or web.

The other thing I want to lift up is just the incredible work of teachers and educational technology-makers to create opportunities for children to stay connected and sustain learning, you know, as a parent of kids in that age category. It’s been incredibly helpful not only you know for their benefit, but for my benefit as someone who then needs to work and find a way to educate children and work at the same time. The fact that those tools exist and that they can be disseminated locally, you know, regionally and even globally and kids can continue to have learning experiences engage with one another and with educators all over the world is pretty incredible. So I’m grateful for that.

Noshir Contractor: And I’m glad that you prefaced it by saying that you were making these observations as an American based in Boston, because we know that these kinds of privileges that we might have here are not necessarily universal and that while the web is the World Wide Web, the benefits of the web are not necessarily worldwide. And so your points are really well taken there.

I want to again thank you so much for talking with us about the work you do. You are uniquely positioned as one of the rising stars in the area of web science, both in terms of the research you do and in terms of the activism you do, translating it into applied areas and working across a variety of disciplines. And I also want to take a minute to thank you for helping build a community of web science. You’ve been one of the organizers of web science conferences in previous years, and you’ve been a very active and engaged member of that community. So thank you again very much for taking time to talk with us today.

Brooke Foucault Welles: It’s my pleasure.

Noshir Contractor: Untangling the Web is a production of the Web Science Trust. This episode was edited by Molly Lubbers. I am Noshir Contractor. You can find out more about our conversation today in the show notes. Thanks for listening.

Episode 5 Transcript

Fil Menczer: Astroturf is alive and well, unfortunately, and it’s getting more sophisticated and harder to detect. And so in some sense, it’s job security. There’s no shortage of research challenges, you know, even 10 years later to try to identify this kind of manipulation.

Noshir Contractor: A decade ago, Fil Menczer was studying digital astroturfing right as it was ramping up online, and he’s continued with that work. But that’s not his entire breadth of research. Fil is a distinguished professor of Informatics and Computer Science at the Indiana University School of Informatics, Computing and Engineering. He’s also the Director of OSoMe — not just the word awesome, but the Observatory on Social Media. Shortened, it becomes OSoMe, pronounced “awesome.” 

His research spans web science, computational social science, network science, and data science. He focuses on analyzing and modeling the spread of information and misinformation in social networks, and detecting and countering the manipulation of social media. Besides all his professional activities and accomplishments, Fil has been an early fan of the web science movement, and in fact organized the Web Science Conference in 2014. Welcome, Fil.

Fil Menczer: Thank you very much for having me, Nosh.

Noshir Contractor: Let me start with something that I know you spend a lot of time thinking about and are uniquely positioned to help kick us off with. How do you think social media can be manipulated for the spread of information?

Fil Menczer: Essentially, you know, social media are platforms that let people communicate and share their opinions and their thoughts. And also everybody has a responsibility in spreading other people’s opinions that they agree with. So in some sense, we’re all editors, but we don’t all have, you know, the ethics and the experience and the skills of journalists. We’re vulnerable to being misinformed, we’re vulnerable to spreading misinformation ourselves.

On top of that, platforms have all kinds of mechanisms that they use, for various reasons, often very good reasons. For example, trying to figure out what’s interesting and making recommendations about who to follow or who to friend, or what to pay attention to. And our research shows that all of these mechanisms have some unintended consequences. So for example, showing people how many people have liked a video makes them more likely to look at it. And that’s something that can be gamed. Or recommending a friend of a friend might accelerate the formation of echo chambers, where you are exposed to less diverse points of view, and perhaps are even more vulnerable, you know, to being manipulated.

And then on top of all of that, platforms generally have APIs (application programming interfaces), which are ways in which one can write code and programs to interface with these platforms. On the one hand, this is wonderful because it allows us to collect data and do research. It also allows different people to come up with new applications, new ways to use this data. And those are good applications. But at the same time, it also allows bad actors to manipulate the platform by creating fake personas, by impersonating people, by creating the appearance that many, many people are sharing your opinion, or are angry or happy or supporting an idea or attacking a candidate, where in fact this is all the work of maybe one single entity. And so people can be tricked, because our natural, you know, cognitive and social biases lead us to trust things that come from our friends or pay attention to things that look like they’re getting a lot of attention. And those things can be gamed.

So it is really easy actually to create social bots; that’s the term that we came up with several years ago to identify these inauthentic accounts. And then those accounts can be used to game and manipulate and also to amplify the spread of misinformation. We’ve shown that in our work as well. So it’s not a simple answer. There are very complex interactions between different algorithmic biases and social and cognitive biases that play together in creating this ecosystem of information, which unfortunately is vulnerable in many ways.

Noshir Contractor: You were one of the first people if I remember talking about astroturfing on social media. Can you tell us a little bit more about how far we’ve come in both the growth of astroturfing and how we can combat astroturfing today?

Fil Menczer: Yeah, in fact, it was 2010 when we started collecting data from Twitter, on a large scale, and actually there is a connection to web science there. I don’t know if you know the story. But at the Web Science Conference in 2010, there was an article by (Panagiotis) Takis Metaxas and Eni Mustafaraj on a bunch of fake accounts that had attempted to manipulate a special election that was happening in Massachusetts at that time, and to replace Kennedy who had died. And they found that the night before the elections, a bunch of fake accounts pushed some misinformation about the Democratic candidate. And that generated a lot of traffic, even though Twitter took down those accounts very, very quickly, because they were doing typical things that spammers do. And despite this on the day of the election, if you search the name of the candidate on Google, you would find this fake news because those social bots had been successful in creating a viral cascade. And then Google picked up that signal in their search engine. 

So that was a fascinating paper, it actually got the best paper award at Web Science. And so as I was watching it, and talking with, you know, with Takis and Eni afterwards, I was thinking, you know, is this an isolated incident? We need to get more data and see if this is, in fact, just the tip of the iceberg. And that’s where we started this whole study of manipulation of social media, and astroturf, which is like fake grassroots campaigns. And what we found, in fact, is that it was very widespread. When you looked systematically at everything that was being shared on Twitter about the elections (that was a midterm election year), there were thousands and thousands of memes and links to fake news. And that’s the year that we found the first instances of fake news websites, and we found bots that were coordinating to support a candidate, and to amplify and make it trend, and bots that were spreading fake news, real fake news, like completely manufactured, made-up attacks against candidates, and then targeting journalists, trying to get it to go viral. And that’s when we realized this was a system that was extremely vulnerable.

And our first tools to detect this were based on looking at the structure of the network, of the diffusion networks. And that gave us some good signals, so we could build very simple machine learning algorithms to try and detect these kinds of astroturf. And over the next 10 years, you know, that has continued, and now we’re looking at individual accounts that may be inauthentic as well as coordination that happens even without automation. They may not be bots, but they may be a bunch of accounts that are run by people, but that impersonate other people. So even though it looks like it’s 1000 independent voices that are pushing a particular message or conspiracy theory, it’s really one, you know, entity that’s controlling all those accounts; maybe they are using software, maybe they’re not using software. So astroturf is alive and well, unfortunately, and it’s getting more sophisticated and harder to detect. And so in some sense, it’s job security. There’s no shortage of research challenges, you know, even 10 years later, to try to identify this kind of manipulation.

Noshir Contractor: Could you talk a little bit about the network by which these messages spread? Is there a way to tell whether a message was astroturfed? In other words, artificially made to spread as compared to one that was truly organic and truly grassroots, rather than artificially grassroots?

Fil Menczer: Those were the early days, when a lot of these kinds of manipulation were easier to spot than they are today. It was easy to detect some of this manipulation, but of course there might be other astroturf and social bots and malware and manipulation that we did not catch, you know, so we only know the things that we did find. But among those at that time, our intuition was that, like I said, the structure of the network could provide useful cues. So what we did is we built this diffusion network, where a node is an account, and a link between two accounts identifies either a retweet (at that time quoted tweets didn’t exist yet), a mention, or a reply.

And so now we have this network with different kinds of edges. And we can look at things like, you know, how influential a node is by looking at how many times it is retweeted. So we could look at the distribution of hubs or popularity or influence among the nodes and extract statistical features, for example, you know, the skewness of the distribution of the degree or the strength, which is the weighted degree of these nodes. We could also look at the community structure. Was the network fragmented into many different groups, or was there one big connected component in the network? And also, you know, was the idea, whether it was a link to a fake news site or a hashtag or whatever, injected by many independent people?
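The features described here can be made concrete with a small sketch. This is not the OSoMe code; it is a minimal illustration in plain Python, assuming a hypothetical list of (source, target, kind) interaction records. The function names, the sample accounts, and the choice of population skewness are all assumptions made for the example.

```python
from collections import Counter, defaultdict

# Hypothetical interaction records: (source_account, target_account, kind).
# "alice retweets news_hub" becomes an edge alice -> news_hub.
interactions = [
    ("alice", "news_hub", "retweet"),
    ("bob", "news_hub", "retweet"),
    ("carol", "news_hub", "mention"),
    ("dave", "erin", "reply"),
]

def build_network(records):
    """Return edge weights keyed by (source, target); weight = number of interactions."""
    weights = Counter()
    for source, target, _kind in records:
        weights[(source, target)] += 1
    return weights

def in_strength(weights):
    """Weighted in-degree (strength) of every account that appears in the network."""
    strength = Counter()
    for (source, target), weight in weights.items():
        strength[target] += weight
        strength[source] += 0  # make sure accounts with no incoming edges appear
    return strength

def skewness(values):
    """Population skewness: third central moment over variance to the power 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5 if m2 > 0 else 0.0

def count_components(weights):
    """Number of connected components, ignoring edge direction."""
    neighbors = defaultdict(set)
    for source, target in weights:
        neighbors[source].add(target)
        neighbors[target].add(source)
    seen, components = set(), 0
    for start in neighbors:
        if start in seen:
            continue
        components += 1
        stack = [start]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            stack.extend(neighbors[node] - seen)
    return components

weights = build_network(interactions)
strengths = in_strength(weights)
features = {
    "strength_skewness": skewness(list(strengths.values())),
    "components": count_components(weights),
}
```

In this toy data one account collects most of the attention, so the strength distribution is right-skewed, and the network splits into two components. Features like these would then be fed to a classifier; the classifier itself is outside this sketch.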

And also, we could look at the distribution of the weights on the edges, right. So for example, if you have two accounts that retweet each other thousands of times, you know, that would be demonstrated by a very heterogeneous distribution of weighted degree. And that was a very strong signal. So the very, very first two bots that we discovered were found in that way. We found that there was this edge between two accounts that had a weight of 10,000. And we thought there was a bug in the code. And eventually we realized, no, no, this is no mistake, these two accounts are retweeting each other 10,000 times in the last week. And so we looked at them. And then we realized, “Oh, my gosh, these are obviously bots.” They were two accounts that were just automatically posting and reposting things at very, very high volume. And now today, two accounts that did that would be immediately detected and suspended by Twitter. So you have to be a little bit more sophisticated in order to evade detection. But at that time, simple signals like those were sufficient.
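The heavy-edge signal can be sketched in a few lines. This is an illustration only, assuming a hypothetical list of (retweeter, retweeted) pairs and an arbitrary threshold; neither the data nor the threshold comes from the research described.

```python
from collections import Counter

def suspicious_pairs(retweets, threshold):
    """Find pairs of accounts that retweet each other at least `threshold` times each way.

    `retweets` is an iterable of (retweeter, retweeted) pairs; `threshold` is a
    positive, hand-picked cutoff (an assumption of this sketch).
    """
    weights = Counter(retweets)
    pairs = []
    for (a, b), w_ab in weights.items():
        w_ba = weights.get((b, a), 0)
        # a < b reports each reciprocal pair once rather than twice.
        if a < b and min(w_ab, w_ba) >= threshold:
            pairs.append((a, b, w_ab, w_ba))
    return pairs

# Hypothetical data: two accounts retweeting each other 10,000 times,
# next to a handful of ordinary retweets.
retweets = (
    [("bot_a", "bot_b")] * 10_000
    + [("bot_b", "bot_a")] * 10_000
    + [("alice", "bob")] * 3
)
flagged = suspicious_pairs(retweets, threshold=1_000)
```

As the interview notes, a fixed cutoff like this only catches unsophisticated accounts, which is why it is shown here as a historical signal rather than a current detection method.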

And these days, our bot detection algorithms use much more sophisticated algorithms to look at over 1000 different features that characterize not only the structure of the network of diffusion, but also characteristics of the accounts, of the profiles of their friends, the content that they generate. We do speech analysis for the content, we look at sentiment analysis, we look at temporal patterns, for example, not just how frequently they tweet, but also whether they do it in a bursty way, like humans, or in a regular way that looks more automated. So there’s lots and lots of different signals that we try to pick up to try and infer whether there is some, you know, automation.
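One way to put a number on "bursty versus regular" is a burstiness coefficient over the gaps between posts, commonly defined as (sigma - mu) / (sigma + mu), where mu and sigma are the mean and standard deviation of the gaps. The sketch below assumes that definition and hypothetical timestamps in seconds; it is one possible temporal feature, not necessarily one of the features the speaker's tools use.

```python
from statistics import mean, pstdev

def burstiness(timestamps):
    """Burstiness of posting times, in the range [-1, 1].

    Close to -1: perfectly regular gaps (looks scheduled).
    Around 0: gaps resemble a random (Poisson-like) process.
    Close to 1: long silences broken by rapid bursts.
    """
    times = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    if len(gaps) < 2:
        raise ValueError("need at least three timestamps")
    mu, sigma = mean(gaps), pstdev(gaps)
    if mu + sigma == 0:
        raise ValueError("all timestamps are identical")
    return (sigma - mu) / (sigma + mu)

# Hypothetical accounts: one posts every five minutes, one posts in clusters.
regular_account = [0, 300, 600, 900, 1200]
bursty_account = [0, 5, 9, 12, 3600, 3604, 3607, 7200]

regular_score = burstiness(regular_account)
bursty_score = burstiness(bursty_account)
```

A single score like this is weak evidence on its own; in the setting described it would be one feature among many.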

Noshir Contractor: Well, one of the things that you’ve described is that there’s a constant cat and mouse game between your ability to detect structural signatures and signals, and the ability of those who are trying to evade your detection to keep changing those signals. In the context of bots, are we at a stage where you find that bots are being created to create new bots?

Fil Menczer: (Laughs) That’s a very interesting question. Meta bots. We haven’t seen evidence of that. However, what we do find evidence of is some sets of accounts that are all very, very similar to each other. So for example, they all have, you know, a pattern in their name, like a common first name, followed by an underscore, by a common last name, followed by a sequence of digits. Also, they might all have the same description. So a lot of times what we find suspicious is not the behavior of a single account. So you have to look at not the pattern of an individual account, but the pattern of a group of accounts. And then you might say, each one of these accounts looks perfectly reasonable, it looks like maybe, you know, a person who posts about politics, maybe supporting this candidate or that candidate. But now if you look at 10 of them, and you see that they are tweeting at the same time, or they’re tweeting exactly the same sequences of hashtags, or they’re all retweeting one account that they’re trying to support and amplify, then that’s where you say, “Well, what is the probability that by chance, you have this kind of behavior by many independent accounts,” and if that probability is very low, you know, let’s say 10 to the minus four, then you say, “Okay, this is a suspiciously similar behavior, probably there is coordination.”
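The group-level reasoning can be sketched as follows. Everything here is a toy: the account names and hashtags are made up, the regular expression is one guess at the name pattern described, and the chance model (each account independently picks one of a fixed number of equally likely hashtag sequences) is a deliberately crude assumption used only to show how a figure like 10 to the minus four can arise.

```python
import re
from collections import defaultdict

# Name pattern from the description: first name, underscore, last name, digits.
NAME_PATTERN = re.compile(r"^[A-Za-z]+_[A-Za-z]+\d+$")

def groups_by_hashtag_sequence(posts, min_size):
    """Group accounts that posted exactly the same hashtag sequence."""
    groups = defaultdict(set)
    for account, hashtags in posts:
        groups[tuple(hashtags)].add(account)
    return {seq: accts for seq, accts in groups.items() if len(accts) >= min_size}

def chance_probability(num_sequences, group_size):
    """Probability that `group_size` independent accounts all pick the same sequence,
    if each picks uniformly among `num_sequences` options (toy null model)."""
    return float(num_sequences) ** (1 - group_size)

# Hypothetical posts: five similar-looking accounts sharing one hashtag sequence.
shared = ["#vote", "#election", "#candidateX"]
posts = [
    ("john_smith4821", shared),
    ("mary_jones1093", shared),
    ("john_jones5567", shared),
    ("mary_smith2210", shared),
    ("john_smith9034", shared),
    ("local_reporter", ["#election", "#townhall"]),
]

coordinated = groups_by_hashtag_sequence(posts, min_size=5)
p_chance = chance_probability(num_sequences=10, group_size=5)
```

With ten equally likely sequences and five accounts, the toy model gives a chance probability of one in ten thousand. A realistic analysis would need an empirical null model, since popular hashtag sequences are shared by many genuinely independent people.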

Other examples are accounts that post the same images in sequence, or very similar images. So there’s lots of ways that we’re looking at to identify this kind of coordination. And so that coordination in some sense comes because under the surface, there is probably an entity that is using software to automatically control all these different accounts. Even if the messages are coming from a person like there is a human that says, you know, go red or vote blue. But that human now is doing that on 100 accounts or 1000 accounts. And so we can spot the pattern of similarity that gives it away in some sense. And so those are ways in which, you know, the arms race that you were talking about is really happening. Not only individual, you know, bots are becoming more sophisticated, but also humans are mixing with software to create accounts that are more difficult to detect by looking at individual accounts. And of course, looking at large groups of accounts is computationally much, much more challenging. And so it requires a lot more work and, and sophistication and also more, you know, computational power. So it is tough to catch all of this abuse.

Noshir Contractor: It almost seems like we need a Turing test to detect whether it’s a human being or a bot that you’re dealing with on social media.

Fil Menczer: That’s a very interesting observation. In fact, the key of the Turing test is that, you know, you were talking through an interface with either a human or the computer. And so the only thing that you could see was, you know, whatever they were saying, and how. And in some sense, social media have made that easy for anyone, because all you see is the presence on social media; you have no way of knowing who’s behind that identity. Even platforms, you know, they have access to some additional signals, like, you know, maybe phone numbers, IP addresses, but even platforms sometimes cannot really know for sure who’s behind an account. Typically, they can see whether they are violating some terms of service, and whether there is coordination, but nobody knows who’s behind it.

And so this means that there is plausible deniability. If this campaign is trying to promote a particular candidate, that candidate can claim, perhaps correctly, that they had nothing to do with it. And there is no way of proving who’s behind it. Very, very rarely, and only through extensive work by intelligence services, can we say, “Oh, for sure, there was that particular state actor behind this activity.” In the majority of cases, at best we can detect them and maybe alert people or perhaps remove them, if they are manipulating the public or abusing, you know, the rules, but we cannot really say, “Oh, that actor is behind it.” And so that actor is free to just start over, maybe tweak the algorithm and do it again. So yeah, it is as hard as the Turing test.

Noshir Contractor: One of the things that you are really well known for is being the director of the Observatory on Social Media, which is abbreviated to OSoMe, and it’s pronounced “awesome.” As someone who appreciates the creation of clever acronyms, I’m truly impressed with “OSoMe.” And I wanted to ask you a little bit about what went on behind that. There are many people who, for decades, have talked about creating some kind of an observatory to study the web. You have done it, and you’ve done it successfully. And you now host several tools that you’ve created that go beyond your own research, and actually get used not only by others within the research community, but routinely by journalists, and so on and so forth. Talk a little bit about how you made this happen. And what were the lessons you’ve learned from it?

Fil Menczer: Yeah, so OSoMe. It is a cute acronym. I can’t take credit for it. I think it was our Associate Director for Technology who came up with this idea. The idea of using the word observatory in it actually came from web science, because the web science community, you leading it, among others, was really sort of pushing this idea of collecting data on a large scale from the web, to get to a deeper understanding about some of the social, you know, social impact and social phenomena, you know, of society, in some sense. Our behaviors, how are they affected by the information that we see, how is data about our online behavior telling us something about social action, about norms, about behaviors, and about vulnerabilities, which was the part that interested me in particular. And so that’s where the idea of observatory came from, it came from the web science community.

So, as you say, in addition to doing lots of research, we also like to develop tools. And that’s because, you know, in web science, we think that it is important to go beyond just research and to actually do work that can impact society in a good way. And for me, you know, based on our skills, one of the things that we could do to help a little bit was to take the tools that we build for research and push them a little bit further, to the point that they can be used by a broader audience, so that they’re not only useful to write a paper, although that’s important, too. But they can also be used, like you said, by journalists, by investigative reporters, by civil society organizations, and also by common citizens to gain an appreciation for whether they are vulnerable, whether they are talking with another human being, whether they’re being manipulated.

So, for example, our most popular tool is called Botometer. And it started from our research on bot detection. And then we had a demo for a grant that we had, and as part of that demo, we thought, okay, let’s just, you know, put it on a little website so that you could, and then we realized, wow, this could be useful to other people. And so eventually, it became a public tool that is now called Botometer. And it is used a lot. We serve between 500,000 and 600,000 queries per day.

Noshir Contractor: Wow. 

Fil Menczer: It’s very, very popular. Now, obviously, it’s not perfect, because just like any machine learning algorithm, it makes mistakes. And there are a lot of research challenges that we’re pursuing. And some of my students are working in their dissertations on how to improve these tools, how to make them better able to recognize suspicious behaviors that are different from those in the training data. Also how to combine supervised learning and unsupervised learning, as I was saying earlier, to detect these coordinated manipulation campaigns that may not necessarily use automation. So there are a lot of research challenges in building these tools. But we try to also bring the results of those out into things that can help other people.

So Botometer is our best example. But we have several others.

Our latest tool that we’re kind of excited about is called BotSlayer. And it is a way to let people, even without technical skills, set up an infrastructure in the cloud, where they can, with just a few clicks, track all of the tweets that match some query as they happen and look at them in real time on their screen, and also have all of the entities that we extract from these tweets. An entity can be a link to a news article, it could be a hashtag, a username, a phrase. And then for each of these entities, see how many people are sharing it. How many unique people are sharing it? Are bots more likely to share this particular entity than something else? And also, is there coordination among these accounts? Are they all retweeting the same, you know, the same set of users and so on. So, in some sense we’re taking a very complex tool that we’ve been developing for our research and putting it at the fingertips of other researchers, journalists, and nonprofit organizations. So we have hundreds of organizations around the world that are licensing this, and we hope soon to have the next version that will be a little bit better and more robust, and make it available so that people can use it to study COVID-19, to study, you know, the current protests around the Black Lives Matter movement, and so on.
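The per-entity counts mentioned here (how many times an entity is shared, and by how many unique accounts) are simple to illustrate. The sketch below is not BotSlayer’s implementation; it assumes hypothetical (user, text) pairs and a rough regular expression that treats hashtags, mentions, and links as entities.

```python
import re
from collections import defaultdict

# Rough entity pattern (an assumption): hashtags, @-mentions, and http(s) links.
ENTITY_PATTERN = re.compile(r"#\w+|@\w+|https?://\S+")

def entity_stats(tweets):
    """Map each entity to (number of tweets sharing it, number of unique sharers)."""
    shares = defaultdict(int)
    sharers = defaultdict(set)
    for user, text in tweets:
        # set() counts an entity once per tweet even if it is repeated in the text.
        for entity in set(ENTITY_PATTERN.findall(text)):
            shares[entity] += 1
            sharers[entity].add(user)
    return {entity: (shares[entity], len(sharers[entity])) for entity in shares}

# Hypothetical tweets matching some tracked query.
tweets = [
    ("u1", "#vote now https://example.org/a"),
    ("u1", "#vote again"),
    ("u2", "#vote @u1"),
]
stats = entity_stats(tweets)
```

A large gap between the two numbers for an entity (many shares, few unique sharers) is the kind of pattern that would prompt a closer look.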

So those are some of the things that we have out there, there’s a few more. We really think that creating tools, you know, and making them openly and freely available to the community is an important part of our mission of the observatory.
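
The per-entity questions described for BotSlayer above (how many shares, how many unique people, how bot-heavy the sharers are) can be illustrated with a small sketch. This is not BotSlayer’s actual code: the function name `entity_stats`, the regular expression, the `bot_scores` dictionary, and the 0.5 threshold are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Hypothetical entity pattern: hashtags, @usernames, and links.
# A real system would extract entities far more carefully (phrases, expanded URLs).
ENTITY_PATTERN = re.compile(r"(#\w+|@\w+|https?://\S+)")


def entity_stats(tweets, bot_scores, bot_threshold=0.5):
    """Aggregate simple sharing statistics for each entity.

    tweets: iterable of (user, text) pairs.
    bot_scores: dict mapping user -> assumed bot-likelihood score in [0, 1].
    bot_threshold: assumed cutoff above which an account counts as a likely bot.
    """
    shares = defaultdict(int)   # entity -> number of tweets containing it
    sharers = defaultdict(set)  # entity -> set of distinct users sharing it

    for user, text in tweets:
        # Normalize hashtags and usernames to lowercase; leave links untouched.
        entities = {
            e if e.startswith("http") else e.lower()
            for e in ENTITY_PATTERN.findall(text)
        }
        for entity in entities:
            shares[entity] += 1
            sharers[entity].add(user)

    stats = {}
    for entity, users in sharers.items():
        likely_bots = [u for u in users if bot_scores.get(u, 0.0) >= bot_threshold]
        stats[entity] = {
            "shares": shares[entity],
            "unique_users": len(users),
            "bot_fraction": len(likely_bots) / len(users),
        }
    return stats
```

Comparing `bot_fraction` across entities gives a rough sense of whether likely bots favor one entity over another. Detecting coordination among accounts, which is also mentioned above, would need additional analysis that this sketch does not attempt.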

Noshir Contractor: And your group is so good at it. And it’s really making a major contribution to academia, but also to society at large. And so thank you, again, for all the work that you’re doing on that front. One might get the impression listening to this conversation, that bots are always evil, especially when you have apps called BotSlayer, for example. Now, is that true? And if not, then how can you distinguish between a good bot and a bad bot?

Fil Menczer: That’s a very good question. And absolutely, it’s not true that all bots are bad. You’re absolutely right. In fact, many, many bots are very useful. And we all use them, right? For example, if you follow the feed of, I don’t know, The Wall Street Journal, or The New York Times, or your favorite news source, that’s a bot, right? It’s an account that automatically posts things that you can extract from an RSS feed or, you know, some other source. And then there are some bots that are funny and interesting and entertaining, and others that are kind of trivial. So there is a huge range of behaviors, but many of them are perfectly innocuous or even helpful. Our research is focusing on detecting automation, because when that automation is not revealed, then the bots can be used to manipulate. Now, if a bot says, “I’m a bot that tells the time every hour,” like at Big Ben, there is nothing wrong with that. It’s not trying to mislead anyone, right? And if it says, “I am, you know, The New York Times, and I post the news every five minutes,” there’s nothing misleading about that. People know who they are following. But if a bot says, “I am Nosh Contractor, and I’m a professor at Northwestern, and here’s why you should really believe that if you want to, you know, be cured from COVID, you should drink Clorox,” I mean, that’s an inauthentic account that is impersonating a person and making it look like that person is saying something which, in this example, is false, and in fact dangerous, very dangerous. And this is done a lot.

So we hope that the focus on detection of automated accounts and not only automated accounts, also coordinated accounts, like I was saying earlier, can be useful in spotting this kind of abuse. That’s the one that we are worried about. Obviously, we’re not worried about benign bots. 

But sometimes the same technology that lets you detect one also lets you detect the other. So we train our machine learning algorithms with whatever bots we can find out there, either because they tell us themselves that they are automated, or because some human experts have looked at them carefully and concluded that they are automated, or perhaps because, you know, Twitter has taken them down, so we know that they were inauthentic. And so we use those data sets of labeled accounts to train our algorithms. And the hope is that, obviously, they’re not used to do anything against, you know, benign bots, but they could be used, hopefully, to alert people about the malicious ones.
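
The idea of training on labeled accounts can be illustrated with a toy supervised classifier. This is only a sketch under stated assumptions and not Botometer’s model: the two features (posting regularity and retweet fraction, both scaled to 0..1), the function names, and the choice of plain logistic regression are all hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(features, labels, lr=0.1, epochs=2000):
    """Fit logistic regression with stochastic gradient descent.

    features: list of equal-length feature lists, one per labeled account.
    labels: list of 0 (human) or 1 (automated), taken from labeled data.
    """
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            predicted = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            error = predicted - y  # gradient of the log loss w.r.t. the logit
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


def bot_score(weights, bias, x):
    """Return a score in (0, 1); higher means more bot-like under this toy model."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
```

A model like this only learns what its labeled examples show, which is why accounts that behave differently from the training data remain hard to recognize, as noted earlier in the conversation.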

Noshir Contractor: Thank you for clarifying that difference, because I think it’s an important distinction to recognize and appreciate that bots are not intrinsically nefarious and that we interact with them all the time.

Fil Menczer: But there is also, like, everything in between, right? There are accounts that for a while are doing good things, and then they are turned, either because, you know, they are hacked or because people let applications post on their behalf. And so you’ll have accounts that are partly automated, partly manually controlled. So it’s a very complex ecosystem where you find all sorts of complex behaviors, and it’s really hard to make sense of it.

Noshir Contractor: In closing, I want to ask you about something you have already referenced. We live right now in an age of reckoning when it comes to social upheaval, in addition to the pandemic. And I want to know if you could share some of your opinions about how things would have been different, for better or for worse, if we were experiencing this without the web.

Fil Menczer: Oh, without the web? Oh, my gosh, that’s a really, really interesting question, and tough as well. (Sighs). Well, I’m an optimist and a technologist. So I would say that overall, probably the better outweighs the worse. But certainly, we have plenty of examples of both, right? In some sense, the web has, you know, enabled some amazing advances, whether it’s in, you know, sustaining social movements for the advancement of humankind, or creating public awareness about huge planetary issues and challenges like global warming and, you know, pandemics and racism. Imagine the economic harm of the current pandemic. As bad as it is, and it is terrible, imagine how much worse it would be if we didn’t have communication technology, so that we could still teach remotely, as badly as we do that. But at least, you know, it’s something that we could do remotely, let alone conferences and teleconferencing and so on. But just given the capability of being able to connect with each other, even at a distance, you know, the world would be much worse off if the web didn’t exist to enable those kinds of interactions.

So there is a lot of good in it. There is good in it in allowing, you know, minorities or groups that have less power to put their message out there. So in some sense, the democratization of information. That was this utopia of the early days of the web that we all bought into, certainly I did. To some extent, it has happened. And so the world is all the better for it. For the same reasons why the web can be used for all these good things, it can also be used for all sorts of bad things. This has been true of every technology in history. And it’s also true of the web. And it’s true of social media. And it’s true of Zoom even, the latest thing that we now realize can be abused. So, of course, technology can be abused, in all the ways that we have been talking about.

And our research is really focused on those kinds of abuses and manipulation, whether it is spreading, you know, misinformation, or suppressing the vote, which for me is one of the huge challenges ahead of us. That is one of the reasons that motivates our desire to detect manipulation, because we see that that’s one of the main applications. You’re probably not going to change people’s opinion. If you were going to vote for one candidate, you’re not going to vote for the other candidate. But you might get somebody to decide not to vote, if you convince them that that candidate is really not that much better than the other. And I think that this probably has happened in the past and will continue to happen. And then we haven’t even yet seen large-scale consequences of new technologies that are just now becoming mature, like deep fakes. And I think that those possibly could pose big challenges in the next few months. So as for anything, there are good things and bad things, and certainly the world would be very, very different without the web, for better or for worse, I would say more for worse. Overall, I’m still an optimist, and I think that we can make things better.

Noshir Contractor: Well, thank you again, Fil, for talking with us and giving us some really interesting insights about the role of bots and the ways in which you and your team have helped contribute to detecting the nefarious bots and helping make the world a better place as a result of that, especially with not only your own research, but the tools that you’ve made available. So thank you again very much.

Fil Menczer: Thank you so much for having me. 

Noshir Contractor: Untangling the Web is a production of the Web Science Trust. This episode was edited by Molly Lubbers. I am Noshir Contractor. You can find out more about our conversation today in the show notes. Thanks for listening.