Episode 35 Transcript

Tim Berners-Lee: We have to realize that the people who run the social networks, the existing social networks, can change them. You know, they’re all just code. Maybe you’re a feminist blogger on Twitter or something. And you get up in the morning, and it’s hard to avoid people saying really nasty things about you. Then it’s reasonable to go to the platform. It’s reasonable to hold them accountable.

Noshir Contractor: Welcome to this episode of Untangling the Web, a podcast of the Web Science Trust. I am Noshir Contractor and I will be your host today. On this podcast, we bring in thought leaders to explore how the web is shaping society, and how society in turn is shaping the web. My guest today is Sir Tim Berners-Lee, who you just heard talking about how we can reengineer digital technologies for good. Tim is a computer scientist best known as the inventor of the World Wide Web. 

But before I introduce Tim, an announcement: This is not only the 35th episode of Untangling the Web but the last one for this season. Keep your ears open on your favorite podcast platforms for a mini episode reflecting on the past, present and future of the Untangling the Web podcast series.

That said, let’s get back to Tim. Tim is a professorial fellow of computer science at the University of Oxford and a professor at the Massachusetts Institute of Technology. He is the director of the World Wide Web Consortium, W3C for short, which oversees the continued development of the web. He co-founded the World Wide Web Foundation and is the founder and president of the Open Data Institute. He is also the founder of the Web Science Trust, which produces the Untangling the Web podcast. In 2004, Sir Tim Berners-Lee was knighted by Queen Elizabeth for services to the global development of the Internet through his invention of the World Wide Web. Welcome, Tim.

Tim Berners-Lee: Thanks, Nosh. 

Noshir Contractor: It’s a true delight to have you on an episode of Untangling the Web as somebody who created the World Wide Web. I want to start by asking how, following the creation of the World Wide Web, you decided that one needed to think about the science surrounding it, and how you came to use the phrase “web science”?

Tim Berners-Lee: People think simplistically of the web as being a technical system. Somebody makes a link and somebody else follows a link. The computer, when they click on the link, does the business of bringing up the destination webpage. And the person who made the original link, why did they make the link? So things like podcasts, or before them blogs, you know, they spread because people interact through the web. The web and all things like it, they’re sociotechnical systems. They are people and computers interacting. And so we needed, at the time, to get people from both sides, the social sciences and the technical side, to collaborate together very closely. Hence web science. 

Noshir Contractor: And the significant difference is that, unlike computer science, which we created as a discipline once the computer was invented, your vision of web science from the start was for it to be an interdiscipline, drawing, as you said, on both sides.

Tim Berners-Lee: Yes, in a way, one of the models was cognitive science, where people very deliberately took psychology and neuroscience and a bunch of different disciplines, including AI I suspect, and said, okay, if we’re going to study the brain, we have to get all the people who are looking at it from different sciences to talk together. We had a figure like, yes, there are lots of neurons in the brain, but there are even more web pages out there on the web. So if something is going to require study in its complexity, the web merits it, just like the brain.

Noshir Contractor: What are the areas where you think we need to see a lot more progress? If you had to make a recommendation about what web science today should be focusing on, what would be your recommendations?

Tim Berners-Lee: For a lot of people out there, one huge concern with the web in general is fake news. And not just fake news, but the deliberate manipulation of people to do things which are not in fact in their best interests. So I think, because the web is such an important tool for making democratic decisions and so on, what I think we need to do is build more tools for democracy, build more tools for coming to collective agreement and understanding about the way forward. Something like Wikipedia or Reddit or something where people come online and interact together, when it’s a system which allows people to interact in a way that’s managed. The system provides the process, which ends up leading to people being more constructive, respecting each other’s opinions rather than attacking them, and so on. So I think the way we build our social networks in particular, and our interactive collaborative tools, has a lot to answer for. And so I think it’s important that we build new ones, which are better at that, which are better at helping us be collaborative, constructive, and democratic.

Noshir Contractor: One of the reasons why the web has been so fascinating is that in some ways, there is no centralized control. So engineering in that sense becomes a tricky issue, where you’re creating something that has, at its very premise, a level of decentralization.

Tim Berners-Lee: Yes, but you still create it. You create a little protocol, for example, whether it’s email, or the ability to link to somebody’s blog, or something, and that technical protocol comes along with an accompanying social protocol, the social protocol being that people like to be read and will do almost anything to get that. There used to be hit numbers on the bottoms of blogs. And you’d get up in the morning and see how many people read your blog overnight. And so people are motivated by being read, and to be read they need to have a good blog, or a blog that people thought was worth linking to, and so on. The sociotechnical system then evolves: you get a virtuous circle where people try to make their blogs better and better, and they link to more and more interesting blogs, and the blogosphere just seems to become ridiculously beneficial, ridiculously valuable. And that’s the way the technology itself should work – on a good day.

Noshir Contractor: I was just going to touch on that – on a good day. Because as you pointed out, in many cases, you’ve described examples of virtuous cycles. And we also witness many instances of vicious cycles, whether it is in terms of overall societal well-being or human rights, democracy, education, take your pick. What do you think today are ways in which we need to reengineer the web to help make the good days more frequent and the virtuous cycles more likely than the vicious cycles?

Tim Berners-Lee: We have to realize that the people who run the social networks, the existing social networks, can change them. You know, they’re all just code. Maybe you’re a feminist blogger on Twitter or something, and you get up in the morning, and it’s hard to avoid people saying really nasty things about you. Then it’s reasonable to go to the platform, it’s reasonable to hold them accountable. When the Web Foundation looked at online gender-based violence, I think there were three cases where they looked at what was going on with the platform, and they suggested that the platform could make a bit of a change. And the platform made that change. But certainly, things like: should you let people know how many people have liked their stuff? Or can you remove the toxicity by letting people just talk about what they’re doing without being able to see what the reactions were? We should do the web science, do the analysis, but also we should be prepared to force platforms to change if they are creating badness in the world.

Noshir Contractor: And you’re absolutely right, there are small pieces of information or nudges that we could tweak on social media platforms that can have enormous implications in terms of our motivations for posting things or also nurturing more constructive deliberative discussions. You’ve been very instrumental in creating this ecosystem of institutions, the World Wide Web Consortium (W3C), the World Wide Web Foundation, and the Web Science Trust, among others, to nurture and to tend the growth of the web. What prompted you to build this ecosystem? How do you see these connected to each other and helping one another as part of an ecosystem?

Tim Berners-Lee: So in a way, it was a tremendous luxury to be able to sit down in an office at CERN and just write down the HTTP and HTML specs and code them up without anybody looking over my shoulder. But those days were immediately over the moment the thing started to take off. Then immediately, we had to make sure we had agreement, we had interoperability; every web browser across the planet had to speak the same language. It had to be able to talk to any web server. That’s a big ask. So the engineers working on areas where protocols and policy are important, they have to do this very grown-up thing of realizing that other people have got great ideas too. And if you go to the working group and sit down with them, between you, you will actually come up with something which is even more powerful than you could have got. 

So the W3C is a standards body. It started in ‘94. People from Digital Equipment Corporation, an American computer company, came into my office at CERN and said, you know, we need to have standards, you need to make a consortium. The X Consortium was run by MIT; we liked that. I wanted it to be international, so we had a base in France and a base in Japan and China. There are hundreds of exciting specs being produced, adding accessibility and adding internationalization. So we put all sorts of values behind that community, saying: everything you do must be as accessible as possible to people with disabilities; everything you do must work in any culture, no matter what language people speak. Starting the consortium as a community with those values, I think, has been really crucial to the healthy progress of the web.

There was a feeling, when we got to the point where somebody noticed that 20% of the world was actually on the web, that this 20% of the world was that much more enabled. If we’re constantly making the digital divide bigger by making the web more powerful, don’t we have a moral obligation to try to get the other 80% of the world online as quickly as possible? Partly that’s in the way we design the web, to make it work on mobile devices and so on, and partly it’s talking to governments about making sure that they allow a competitive industry for internet connectivity, for example, that the internet is affordable, things like that. So the Web Foundation came out of two pieces: first, shouldn’t we make sure that the other 80% of the world gets online as quickly as possible; and second, we have a moral obligation to make sure that the web serves humanity, that it’s used for good. That was the Web Foundation, while meanwhile the Open Data Institute came from a more focused realization of the power of data. When Gordon Brown was prime minister of the UK, he asked, what should the UK do to make the best use of the Internet? And we said, “Well, you should put all your government data online for free.” And then he said, “Okay, let’s do that.” And so that project, the Open Data Institute, was set up to help with that.

Noshir Contractor: Tell us a little bit about what prompted you to come up with the Web Science Trust.

Tim Berners-Lee: So web science was initially very much on the academic side, compared to the consortium. It was important that we got our colleagues in different fields to come and help us figure out how to analyze this complicated thing that is the web. It needed an institutional home and an institutional existence and the Web Science Trust turned out to be the most useful thing. We have a web science journal and a web science conference, and so on, and so the Trust holds those things together.

Noshir Contractor: You have, for a long time, obviously, been passionate about the virtue of the web being decentralized. Your most recent work is to revisit that decentralization in terms of launching Solid, or SOcial LInked Data. 

Tim Berners-Lee: Solid is a new level of protocol for the web; it’s using HTTP and it’s using linked data. And you know, we talked about the blogosphere. That’s one place we could start. So back in the day, imagine you were a total geek, and you had a computer in your basement which was running a web server, and you connected it. You had to get a domain name, then you put interactive things on your web server. You could be creative. And most particularly, you could chat there with other people, and you could link to other things. Whereas now everybody’s on Facebook. Everyone’s on the big social networks. So what went wrong with this picture? The feeling with the blogosphere was, it was decentralized. Everybody had their own website. And there was a certain glorious power. People call it now sovereign identity. You wrote the rules of your own website. You could turn it off, you could unplug it, or you could plug it in and just be part of humanity. So the Solid project, the SOcial LInked Data project, is a movement, if you like, to say: how can we go back to those days? We’ll make it so that everybody does have a website. You have a place where you can publish stuff. It’s a bit like a Dropbox account or a G-drive. It’s a personal cloud. It’s like a thumb drive in the sky. The very enabling thing is that once you put stuff in your Solid Pod, you can let anybody anywhere access it. A Solid Pod gives you the ability to connect to anybody else, to share with anybody else. So you don’t have to get everybody onto Flickr. You can certainly share the photographs with people on Flickr. You can share them with people who are on Facebook or people on LinkedIn. In a way, it breaks down those silos. Also, from an ethical point of view, because you’ve left the user in control of their data, that’s actually what you should do, because of GDPR and just the ethics of data ownership and data control. So we believe philosophically that everybody should be able to control their own data. Solid gives you the technology to be able to do that. So that’s more or less where Solid came from. 
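To make the “personal cloud” idea concrete, here is a minimal sketch of what reading a public resource from a Solid Pod looks like at the HTTP level; Solid resources are ordinary web resources addressed by URL and served as linked data. The Pod address and resource path below are hypothetical, and a real application would normally go through a Solid client library and an authenticated session rather than a bare fetch.

```typescript
// Minimal sketch: reading a public resource from a Solid Pod over plain HTTP.
// The Pod URL below is hypothetical; real applications would usually go through
// a Solid client library and an authenticated session.

const resourceUrl = "https://alice.example-pod.org/public/notes/todo.ttl"; // hypothetical Pod resource

async function readPodResource(url: string): Promise<string> {
  // Solid resources are ordinary web resources: a GET with an RDF content type
  // (here Turtle) returns linked data that any conforming app can interpret.
  const response = await fetch(url, { headers: { Accept: "text/turtle" } });
  if (!response.ok) {
    throw new Error(`Could not read ${url}: HTTP ${response.status}`);
  }
  return response.text();
}

readPodResource(resourceUrl)
  .then((turtle) => console.log(turtle))
  .catch((err) => console.error(err));
```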

Noshir Contractor: That’s fascinating, Tim. Is it fair to say that two key tenets of Solid are that each individual is now in more control of their own data, and that the interoperability available through Solid means that social media platforms don’t own your data but can essentially share the data that you want to share with them through the protocols of Solid?

Tim Berners-Lee: Interoperability is, yes, it is really important. It’s really valuable. At the moment, the social network programs are talking about portability, and the government may mandate portability. That is, if I’ve got all my photographs in Flickr or Instagram, I should have the legal right to extract them and then put them into Google or Dropbox or whatever. Interoperability is much more powerful. Interoperability means it doesn’t matter where your photographs are. You can use any app. So I’m building a great slideshow with my favorite slideshow app, but my kids are all watching the slideshow using completely different apps on their phones, because the way it’s stored in Solid Pods follows standards. So that means we need a lot of standards. In internet stack-level terminology, this is layer seven. This is the application layer. So what we’re doing with Solid is we’re taking all that stuff and we’re making standards which will be interoperable between people’s Pods – very much more empowering for the user, who can choose to run different applications at the same time on the same data.
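As a rough illustration of the interoperability point, the sketch below shows two hypothetical, independent apps consuming the same Pod-hosted photo list: the data never moves, only the apps differ. The URL is invented, and the JSON “photo list” is a deliberate simplification of the RDF vocabularies a real Solid app would use.

```typescript
// Sketch of interoperability: two independent, hypothetical apps render the same
// Pod-hosted photo list. The data stays in one place; only the apps differ.

interface Photo { title: string; url: string; } // simplified stand-in for a standard vocabulary

async function loadPhotos(podResource: string): Promise<Photo[]> {
  const res = await fetch(podResource, { headers: { Accept: "application/json" } });
  return res.json() as Promise<Photo[]>; // real Pods would serve RDF, parsed by a library
}

function slideshowApp(photos: Photo[]): void { // "my favorite slideshow app"
  photos.forEach((p, i) => console.log(`slide ${i + 1}: ${p.title} (${p.url})`));
}

function galleryApp(photos: Photo[]): void { // a completely different app, same data
  console.log(photos.map((p) => p.title).join(" | "));
}

const podResource = "https://family.example-pod.org/photos/holiday.json"; // hypothetical
loadPhotos(podResource).then((photos) => {
  slideshowApp(photos);
  galleryApp(photos);
});
```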

Noshir Contractor: I can see that this is very empowering for the user. What would you say is the incentive for the existing platform providers to play and make their work compatible with Solid? From their point of view, what is the incentive to do this when they already control large parts of the market for example?

Tim Berners-Lee: To a certain extent, if they don’t, that’ll be fine. We’ve seen this before. If you insist on being a silo, you can be a silo for quite a long time. AOL was a silo for quite a long time. But eventually AOL had to admit that there was this internet thing, and then, bit by bit, it became a part of the internet rather than the other way around. So when you have a silo which does not interoperate with the rest of the world, the classic walled garden, the jungle outside the garden always becomes more exciting. 

Noshir Contractor: It is extremely exciting to see the vision of Solid. Solid would make sharing your vaccine status, for example, a lot easier than it might be currently. As we look at the last couple of years in the pandemic, what are some of the ways in which you believe the web has helped humanity cope with the pandemic, learn from the pandemic? 

Tim Berners-Lee: Well, I suppose what people will remember will be that we worked from home. When the only way you can work is to work from home, then clearly it’s about: do you have a device? Do you have connectivity? One of the relatively recent standards on the web is WebRTC, real-time communication, which is the way we can sit in a webpage and have a video chat. So obviously, that’s been a huge part of it. One of the interesting things to think about is that there’s always been the synchronous-asynchronous question. So if you’re teaching a class in person, it’s synchronous: they have to be in the same space, and they have to be in the same time zone. If now they’ve gone home to their homes all over the world, they’re in different time zones, and actually, that doesn’t work. Is it fair to prioritize synchronous communication if people are actually spread all over the world? I think the ways in which we can optimize between synchronous and asynchronous are interesting. I use Solid a lot for managing my life. I have lots of different to-do lists. We have a tracker of all the incoming requests for me to do things like this. So when the office of TBL goes over those requests, that shared state is really important. We can see which things we’ve said yes to, which things we’ve said no to, and which things we’re not sure about. So while synchronous communication maybe is something we’ve learned to do a lot more during the pandemic, we should now also really think about asynchronous communication: having a web of data which captures the state of our company, of all the relationships in a company and its providers and suppliers, the status of all of the relationships, the contracts, and the tasks, and so on. So in a way, one of the things I want to use Solid for is, I want everybody to be able to capture the way they collaborate, in particular, and especially the way they do democracy. 
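For readers unfamiliar with it, WebRTC is the browser standard Tim mentions: a web page can capture camera and microphone streams and negotiate a direct media connection with a peer. A minimal browser-side sketch follows; the signalling step, exchanging the offer and answer between the two parties, is application-specific and omitted here.

```typescript
// Minimal browser-side WebRTC sketch. Exchanging the offer/answer and ICE
// candidates between peers (signalling) is application-specific and omitted.

async function startCall(): Promise<RTCPeerConnection> {
  const pc = new RTCPeerConnection();

  // Capture the local camera and microphone and attach the tracks to the connection.
  const stream = await navigator.mediaDevices.getUserMedia({ video: true, audio: true });
  stream.getTracks().forEach((track) => pc.addTrack(track, stream));

  // Render whatever the remote peer sends into a <video> element on the page.
  pc.ontrack = (event) => {
    const video = document.querySelector("video#remote") as HTMLVideoElement;
    video.srcObject = event.streams[0];
  };

  // Create and apply the local session description; it would then be sent to
  // the other peer over some signalling channel (not shown).
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  return pc;
}
```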

Noshir Contractor: That’s really interesting, because in some ways, we were so fascinated with the ability of technology to enable synchronous communication that we have to, as you point out, now reimagine the virtues of asynchronicity and the ability it gives us to have an ambient awareness of a lot of various facets of our life.

Tim Berners-Lee: Yes, and you and I are having a nice synchronous communication. It was nice to chat to you, Nosh, but then actually, the reason we’re doing this is so that it will become a piece of audio. I’ve been listening to lots of podcasts and reading lots of blogs, and you can listen to something happily until somebody comes into the room, and then you have to start reading it. So one of the things I want us to be able to see is much more facility for moving between these modes. Can you get your technicians to produce this podcast so that there’s a printed version and a listened-to version, such that I can switch between them at any point along the way? For us, it’s synchronous. For everybody else, it’s an asynchronous resource which they can go and use, whoever they are, whenever they want. To be realistic, we have to maximize that side of the value.

Noshir Contractor: And indeed, this podcast does have a transcript. And so if there are those who prefer to read it, rather than to listen to the conversation, they have the ability to do that. I take that point really seriously as a way of us reimagining what communication would be post-pandemic, whatever way, shape, or form that takes. Well, Tim, thank you, again, so much for taking time out of your extremely busy schedule to share with us some of the ways in which you not only invented the World Wide Web, but also have continued to nurture an ecosystem of institutions that are doing their best to keep this web decentralized. I look forward to our next opportunity to meet in person again. But in the meantime, I want to thank you so much again for taking time to talk with us about the web and your vision for it.

Tim Berners-Lee: It’s me that thanks you. Thanks for having me. 

Noshir Contractor: Untangling the Web is a production of the Web Science Trust. This episode was edited by Susanna Kemp. I am Noshir Contractor. You can find out more about our conversation today in the show notes. Thanks so much for listening to both this episode and all of our episodes over the last year and a half.

 

Episode 34 Transcript

Brewster Kahle: We’re now going backwards and digitizing books, music, video. And we really want an open library system as opposed to a commercial answer to the whole thing. There are so many wonderful things that are just not being read because they’re not that available, and people are going to read whatever it is they can get their hands on. Misinformation can be rife and just published out the wazoo. 

Noshir Contractor: Welcome to this episode of Untangling the Web, a podcast of the Web Science Trust. I am Noshir Contractor and I will be your host today. On this podcast, we bring in thought leaders to explore how the web is shaping society, and how society in turn is shaping the web. My guest today is Brewster Kahle, who you just heard talking about his vision to create an all-encompassing online archive.

Brewster has spent his career intent on a singular focus: providing universal access to all knowledge. In 1989, he created the internet’s first publishing system, called Wide Area Information Server (WAIS for short), which he later sold to America Online. In 1996, he co-founded two sites to help catalog the web: Alexa Internet, which he sold to Amazon, and the Internet Archive. The Internet Archive is one of the largest libraries in the world and now preserves 99 unique Petabytes of data, books, web pages, music, television and software of our cultural heritage. In 2001, Brewster implemented the Wayback Machine, which allows public access to the World Wide Web archive that the Internet Archive has been gathering since 1996. Brewster was elected a member of the National Academy of Engineering in 2010. He’s also a member of the Internet Hall of Fame, a fellow of the American Academy of Arts and Sciences and serves on several boards. Welcome, Brewster.

Brewster Kahle: Thank you. It’s great to be here.

Noshir Contractor: This podcast, of course, is Untangling the Web. But when I think of you and everything you have been doing in your career, I think of you as somebody who’s contributed to helping us rewind the web rather than just untangling it. And in that spirit of the Wayback Machine, I want you to take us back to 1992, when you first came up with the idea of WAIS. Tell us what prompted that. And it’s important to note that in many ways, WAIS was a precursor to the World Wide Web.

Brewster Kahle: Absolutely. The idea of the internet, the opportunity, was to build the library, well, of everything. Could you take the published works of humankind and make them available to anybody, and not just to anybody, but to any computer? Could we go and mush people, networks, and computers together? This was sort of the dream back in 1980, to try to figure out, how do we go and build this? First, we needed to go and build computers that actually could handle this. And Danny Hillis at MIT, who I worked with, had a great idea called the Connection Machine: making a supercomputer out of lots of little computers. And so I helped build that, to try to make it so we could handle building the library of everything. And then I built WAIS, and did that in 1989, and then made it publicly available for free on the internet in 1992, as you point out. The idea was to get publishing going so that you don’t just have one big database of everything; you wanted people to be able to have their own information on lots of different servers, a decentralized system. And that was the idea of WAIS. WAIS was kind of the search thing at that time.

Noshir Contractor: So as you looked at it, the library that you were building was a distributed search and document retrieval system where documents could be distributed all over the internet. And what you were providing was an indexing system, in the parlance of library talk, and you were trying to see how one could search for these documents anywhere on the web and then how one could retrieve it. At the same time that you were thinking about WAIS and how it fed into the World Wide Web, you also were thinking about a different product called the Alexa Internet. 

Brewster Kahle: So WAIS helped get people online and made everyone able to become a publisher. And could you even, you know, control the distribution of your works? Could you even charge for it? We made the first subscription-based service on the internet. We made the first ad-based system on the early web, to try to help make that all work. But once we got the commercial side going by ‘94, ‘95, then the idea was, we could turn to building the library. So Alexa Internet and the Internet Archive started on the same day in 1996. And one was a for-profit, and one was a nonprofit. And the for-profit, Alexa Internet, was to catalog the web. So we could start crawling the whole World Wide Web and trying to find related links. The thought was that the search engines were going to run out of steam, that keywords weren’t going to be enough to get you the right document out of billions. Well, I was kind of wrong, because Google has done such a fabulous job. But we do really need some of these other things like related links: if you’re looking at a webpage, tell me, is this crap or is this good? What else have people said about it? How long has it been there? If I’m looking for other things like it, or maybe other points of view, what can I go see? Maybe now that we have disinformation being broadcast so widely on the internet, the technology that Alexa Internet was really designed for is important. And the idea was also to go and leverage the link structure of the web and the usage trails of the web. And the idea, also, was that for-profits don’t last that long. So we said, okay, let’s go and build a contract into the soul of Alexa Internet that all the data that was collected would be put into this new nonprofit called the Internet Archive. So every day since 1996, it’s been donating data to the Internet Archive.

Noshir Contractor: If I recall correctly, they announced that they’re shutting down Alexa in May of this year.

Brewster Kahle: Oh, so sad. Yes. But it was a good 25-year run, which is a lot longer than most commercial tech organizations last; nonprofits tend to last much longer.

Noshir Contractor: One of the ways in which I first encountered Alexa was basically as a way of understanding web traffic. A lot of people who do web science research would use Alexa data: you can know a little bit about the status of a website, how well trafficked it is. And so that’s one piece of metadata that you might consider when you’re looking at a website. But as you point out, Alexa was also archiving the web. And when you say it was doing that every single day, help me understand. Does it take a snapshot of the internet every single day? Does it sample it and say, I’m going to do every part of the internet once every week or month? How does that work?

Brewster Kahle: Let’s take the Internet Archive, in this post-Alexa Internet era. What we do is, we have many different crawlers that basically go through the web – each one has a different mandate. There are about 3,000 crawlers that run on any particular day. There are about 900 organizations now – libraries, archives, museums – that work with the Internet Archive, where they go and state particular mandates to these crawlers: they say they want this particular subject area, they want this particular language, they want this particular whole country domain. They want it this deep, they want it this often. And a total of over a billion URLs every day get archived by the Internet Archive, just to try to keep up with what’s going on out there. Then we index it to make it available in lots of different ways, including the Wayback Machine.
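As a rough sketch of what one of those per-partner “mandates” might look like as a data structure, consider the type below. The field names are invented for illustration and are not the Internet Archive’s actual crawl configuration format.

```typescript
// Hypothetical shape of a crawl "mandate": which part of the web a partner
// organization wants archived, how deep, and how often. Field names are
// invented for illustration only.

interface CrawlMandate {
  partner: string;            // e.g. a library, archive, or museum
  seeds: string[];            // starting URLs or whole country domains
  language?: string;          // restrict to pages in this language
  subject?: string;           // restrict to a particular subject area
  maxDepth: number;           // how many links deep to follow from each seed
  frequency: "daily" | "weekly" | "monthly";
}

const exampleMandate: CrawlMandate = {
  partner: "Example National Library",
  seeds: ["https://example.gov/", "https://news.example/"],
  language: "en",
  maxDepth: 3,
  frequency: "weekly",
};
```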

Noshir Contractor: So tell us a little bit about how the Wayback Machine sits in some ways on top of the Internet Archive.

Brewster Kahle: We found that the average life of a webpage is about 100 days before it’s either changed or deleted. So we basically needed to try to keep up with that and then make all the out-of-print web pages available to people. So the way that the Wayback Machine works is completely simple. Fundamentally, it’s a line in a file for every URL we have. And it’s sorted based on the URL and the date. And every time somebody wants to look up a URL, we go and binary search this, well, multi-terabyte file to be able to find the most relevant page for that user. Every GIF, every JPG, every JavaScript file is indexed in this way. And by running it on a parallel computer, much like the Connection Machine, we’re able to go and pull these out at thousands of times per second, for the millions of users that use the Internet Archive’s resources every day.
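The lookup Brewster describes amounts to a binary search over an index sorted by URL and capture date. The toy version below works over a small in-memory array; the real index is a multi-terabyte file spread across many machines, but the search idea is the same.

```typescript
// Toy version of the Wayback-style lookup: binary search over index lines
// sorted by (url, timestamp). The real index is a huge on-disk file; this
// in-memory array just illustrates the idea.

interface Capture { url: string; timestamp: string; location: string; }

// Must be sorted by url, then timestamp, for binary search to work.
const index: Capture[] = [
  { url: "example.com/", timestamp: "19970101", location: "warc-001#4242" },
  { url: "example.com/", timestamp: "20050615", location: "warc-387#17" },
  { url: "example.com/about", timestamp: "20010309", location: "warc-042#9" },
];

// Return the latest capture of `url` made at or before the `wanted` date, if any.
function lookup(url: string, wanted: string): Capture | undefined {
  let lo = 0;
  let hi = index.length - 1;
  let best: Capture | undefined;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const c = index[mid];
    if (c.url + " " + c.timestamp <= url + " " + wanted) {
      if (c.url === url) best = c; // candidate: same URL, not newer than wanted
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  return best;
}

console.log(lookup("example.com/", "20000101")); // -> the 19970101 capture
```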

Noshir Contractor: As you know, there have been several movements around the world, especially from the European Union, to legalize the right to be forgotten. And I imagine that the archive might make it difficult for people to have the right to be forgotten. What are you doing in the archive in terms of addressing this issue?

Brewster Kahle: Oh, yeah, a lot of the web was not really meant to be publicly available always. And so we take requests from people to remove things from the Wayback Machine, and those come in all the time from users, and you can write to info@archive.org and, you know, say what URLs or domain name, and then you have to try to prove that you own that so you can’t delete microsoft.com or something like that, and then it’s removed. And that seems to work pretty well. 

Noshir Contractor: I was struck by a comment that you wrote, that for the cost of 60 miles of highway, we can have a 10-million-book digital library available to a generation that is growing up reading on screen. 

Brewster Kahle: You know, being brought up during the tail end of the hippie generation, right, so the utopian “let’s build a better world,” I took that all very seriously, and being a technologist, tried to figure out what we could do. We thought, let’s start with what became the World Wide Web. But then also, let’s do television, radio. So we’re trying to get good at those. But we’re now going backwards and digitizing books, music, video. And we really want an open library system as opposed to a commercial answer to the whole thing. There are so many wonderful things that are just not being read, because they’re not that available, and people are going to read whatever it is they can get their hands on. And this next generation is going to learn from whatever they can get. And a lot of it’s crap. Misinformation can be rife and just published out the wazoo by anybody with some budget, because a lot of the good materials are locked up behind paywalls, are still in print, or just haven’t really moved into the bigger picture of the opportunity of the internet. And so we’re going to want to put the best we have to offer into the hands of our children.

Noshir Contractor: So it sounds like while you began by trying to create an archive of the internet, you’re now moving more towards creating an archive on the internet.

Brewster Kahle: It’s a good point. Absolutely. We’ve got maybe five or six million books that have been digitized. And we’re starting to do periodicals. First going and digitizing these for the blind and dyslexic. Then we make them somewhat available, you know, for instance, to machine learning researchers, but also through borrowing, interlibrary loan, controlled digital lending, those sorts of things. You shouldn’t have to be at Yale to be able to see some of these good works.

Noshir Contractor: At the end of the day, even what is digitized is being supported on some material resource, whether it’s a disk drive or something else. I recall reading that you were inspired by the Global Seed Vault idea of trying to keep one physical copy of perhaps every book. Now maybe it’s not a physical copy as in papyrus or paper, but a digital storage record. And talk a little bit about the fragility of all of these different media that we have, starting with paper, but including many of the servers that you have and how often you have to be careful to make sure that those servers don’t get obsolete or die.

Brewster Kahle: I mean, it’s such a problem. You see these beautiful pieces of papyrus from 5,000 years ago, and it’s great. But it seems like the lifespan of our media is getting shorter and shorter. So some of these new technologies like microfilm and microfiche, they were reported to last 500 years. And so we’re starting to collect the microfilm and microfiche, not only to preserve the microfilm and microfiche, but to then also digitize it. So we’re moving forward, but we’re always keeping the physical materials. The Internet Archive works with other libraries that have these large physical archives to keep these.

Noshir Contractor: You have also argued that the value of digital archives is not just to historians, but also to help resolve common infrastructure complaints about the internet, such as adding reliability to the 404 “document not found” error. Tell us a little bit more about what you see as the value in that space.

Brewster Kahle: Yeah, at least let’s fix some of the bugs on the web. The 404 document-not-found error is just bad engineering. We made a little extension that you can add to your browser such that if any of a number of errors come up, then we’ll probe the Internet Archive’s Wayback Machine and see if it’s got it. I think, also, the big opportunity is thinking at scale. My friend Jesse Ausubel put it this way: humanity got a long way with the microscope; what we need now is a macroscope, an ability to step back and understand the bigger trends. There’s a great interface on top of the Television Archive – just the transcripts – that was built by a fellow named Kalev, who made GDELT, and you can go and do queries to find out how much particular terms were on one cable channel versus another over time, and you can start to see biases in these bubbles by stepping back and getting a bigger picture of what’s going on. I think that’s absolutely critical. People are very good at getting excited about some tweet or blog post or Facebook something or other, some dramatic cable news, whatever. And it’s difficult to put it in context. If there’s one wish that I’d have for the next 10 years of web science and the like, it’s: let’s build context into our online experience. That’s not necessarily fact checking. It’s, what were the debates around it? What’s the information around it? It’s all the sorts of things that scientists in academic publishing used to know about before the paywalls sort of took over. This whole approach, I think we need to bring to a much broader population.
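The browser extension Brewster describes works roughly like this: when a page fails to load, ask the Wayback Machine whether it holds a snapshot of that URL. The Internet Archive publishes a simple availability endpoint for this; the sketch below assumes the commonly documented request and response shape, which is worth checking against the current API documentation.

```typescript
// Sketch of "fix the 404": ask the Wayback Machine whether it holds a snapshot
// of a dead URL. Uses the Internet Archive's public availability endpoint;
// check the current API documentation for the authoritative response format.

interface WaybackAvailability {
  archived_snapshots?: {
    closest?: { available: boolean; url: string; timestamp: string };
  };
}

async function findArchivedCopy(deadUrl: string): Promise<string | undefined> {
  const endpoint = `https://archive.org/wayback/available?url=${encodeURIComponent(deadUrl)}`;
  const res = await fetch(endpoint);
  const data = (await res.json()) as WaybackAvailability;
  return data.archived_snapshots?.closest?.url; // undefined if nothing archived
}

findArchivedCopy("http://example.com/some-vanished-page").then((archived) => {
  console.log(archived ?? "No archived copy found");
});
```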

Noshir Contractor: In many ways, what you’re talking about is a much more nuanced version of what we sometimes call metadata, that is, data about the data in this particular case.

Brewster Kahle: So Bill Dunn was one of my mentors. He did the electronic side of Dow Jones. He was the first purchaser of a Connection Machine, to go and do full-text search. And he had this saying back in the mid-80s that the metadata is more important than the data itself. And that’s what Google leveraged with their anchor text and PageRank. It’s what Alexa did by looking at user trails, to be able to ask: people who like this webpage, what web pages did they like even more? The importance of the internet is not the computers at the edges; it’s that we’re all connected.
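Since PageRank comes up here as the canonical example of leveraging link metadata, the sketch below shows the textbook power-iteration form of the algorithm: a page is important if important pages link to it. This is illustrative only, not Google’s production system.

```typescript
// Textbook PageRank by power iteration: a page's score is a weighted sum of the
// scores of pages linking to it. Illustrative only, not Google's production system.

type LinkGraph = Record<string, string[]>; // page -> pages it links to

function pageRank(graph: LinkGraph, damping = 0.85, iterations = 50): Record<string, number> {
  const pages = Object.keys(graph);
  const n = pages.length;
  let rank: Record<string, number> = Object.fromEntries(pages.map((p) => [p, 1 / n] as const));

  for (let i = 0; i < iterations; i++) {
    // Every page starts with the "teleport" share, then receives link contributions.
    const next: Record<string, number> = Object.fromEntries(
      pages.map((p) => [p, (1 - damping) / n] as const),
    );
    for (const page of pages) {
      const outLinks = graph[page];
      if (outLinks.length === 0) {
        // Dangling page: spread its rank evenly over all pages.
        for (const p of pages) next[p] += (damping * rank[page]) / n;
      } else {
        for (const target of outLinks) next[target] += (damping * rank[page]) / outLinks.length;
      }
    }
    rank = next;
  }
  return rank;
}

// Tiny example graph: "b" is linked to by both other pages, so it ends up ranked highest.
const ranks = pageRank({ a: ["b"], b: ["c"], c: ["b"] });
console.log(ranks);
```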

Noshir Contractor: So you mentioned GDELT as being an example of a repository that has become incredibly helpful to scientists, including web scientists, to study ways in which information is flowing and how it impacts public opinion and so on. Given that you have built this incredible Internet Archive, and given that there are so many people who are using it to help understand society today and how it has been in the past, can you share with us some of the most interesting insights that you have learned from what you or others have found by using the Internet Archive as a way of studying society? 

Brewster Kahle: Oh, I wish I had more time to go and study society. Mostly I’m just a librarian building the darn thing. We went and studied all of the political ads in the United States, to understand the wash of money that’s going over the media system based on Citizens United and other decisions by the United States to allow corporations to pay for politicians. And it’s fascinating to see how much money, and just the barrage of ads, that you would get if you were in a battleground state. I mean, you couldn’t flip channels fast enough to not be seeing an ad at all times. So there are these things you can kind of see by stepping back. Let’s see. The World Wide Web made it so that you could take unpopular websites and they could become popular, which is a really good sign for having an ecosystem that’s alive. When you get too much regulation or too many monopolies controlling things, a lot of that will slow down and stop. And I’m very excited about the decentralized web technologies. Let’s see another round or two of these, to go and put people back in charge of some of these technologies rather than just these very large corporations that have started to take over whole media types. Let’s build open systems that lots of people can play in. I like games with many winners. 

Noshir Contractor: It sounds a bit like history repeating itself, because when the World Wide Web and WAIS and other technologies were spawning, it was also in response to corporations at the time, think of companies like AT&T, for example. The mantra was to decentralize. Now we are saying we want a new wave of decentralized technologies. So was there a cycle where this decentralization gave way to a certain level of centralized authority, such that we now need to renew our efforts at decentralizing the web? 

Brewster Kahle: We certainly need to renew our efforts. But the centralization never needed to come. If you actually had government antitrust law that was actually, you know, used as much as it was before 1980, when things started to collapse in terms of antitrust, then I think we would have had an ecosystem without having to go through revolutions. And I’m hoping that we invent something better.

Noshir Contractor: Speaking of inventing something better, I was fascinated by a recent blog posting by you titled “Imagining the Internet: Explaining our Digital Transition.” My understanding here is that you have talked about the different metaphors that we have used to talk about the internet from the time it began. Tell us a little bit about what those metaphors are and how you see us trying to imagine the internet of the future.

Brewster Kahle: If we’re trying to, you know, look forward, I find that looking backwards and seeing the trajectory we’re on, to try to understand where we were going, might be useful. So what I did is I went and tried to look at what the metaphors were that people had for the internet, and tried to track how they changed over time, if you will. So the first one, I would say, would be the library. But then it moved and it started to become other things, like just portrayals of a raw network. I would say then cyberspace was a term that people used. So it was far away. Then it started coming home towards being a frontier, the Electronic Frontier Foundation. It was a wild west and it had to be navigated. Then there was the information superhighway. We moved to surfing. So now, it’s not just some place out there; now you can experience it, you can ride on it, you can use it. Then I would say the next one was the Facebook, right? The idea of the Borg. Your cell phone was glued to your face. So now, where does it go from here? I would say the thing we’re wrestling with now is algorithms. If that’s where we are, then what happens next? And I would say machines are starting to not need us anymore. [The machines sort of detach.] In The Matrix in ‘99, Agent Smith has this terrific rant about people being a disease.

Noshir Contractor: That does paint a somewhat dark picture of where we are headed. 

Brewster Kahle: I mean, you can’t see a movie these days without it being frickin’ dystopian. People are anxious. They are really worried about what’s going on. They are not feeling in control. I would like to make it so people have a feeling that they’ve got some level of control of what it is they’re reading, what it is they’re writing, where it’s going, their privacy, their sense of self, their friends. And we have done almost everything we can to strip that away from them. I do like the Alan Kay line, “don’t predict the future, go and invent it.” We as technologists should do a better job than just going, “Hey, let’s go make a ton of money and be like a rich internet mogul.” Let’s leave a better environment for people to be the most they can be, where they can be creative and feel safe and achieve and build and grow. That’s what our technologies and our internet should be for.

Noshir Contractor: And that’s a wonderful place to end this conversation. A very upbeat note, very inspiring. Brewster, thank you so much again for joining us and for all the work that you’ve done in helping us to understand the archive of the internet and, as I said, to rewind the web and the Wayback Machine. I would certainly recommend folks take a look at the blog entry that Brewster has just been summarizing at brewster.kahle.org, as well as play with the Wayback Machine if you haven’t. It’s a lot of fun and somewhat embarrassing to go back and see what kind of websites we created back in the ‘90s and also in the early part of the century. So thank you again, Brewster, so much for joining us today. 

Brewster Kahle: Thank you very much, Noshir. 

Noshir Contractor: Untangling the Web is a production of the Web Science Trust. This episode was edited by Susanna Kemp. I am Noshir Contractor. You can find out more about our conversation, whether you are listening to us today or via the Wayback machine decades from now, in the show notes. Thanks for listening.

 

Episode 33 Transcript

Howard Rheingold: I ended up creating a course called “Social Media Issues” around my book Net Smart. My answer to Is this any good for us? has been, it depends on what people know, that it’s no longer a matter of hardware or software or regulation or policy. It has to do with who knows how to use this medium well. And I felt that if you mastered these five fundamental literacies or fluencies, you would do better. 

Noshir Contractor: Welcome to this episode of Untangling the Web, a podcast of the Web Science Trust. I am Noshir Contractor and I will be your host today. On this podcast we bring in thought leaders to explore how the web is shaping society, and how society in turn is shaping the web. My guest today is Howard Rheingold, who you just heard talking about how we can use media responsibly.  

Howard is an American critic, writer, and teacher. He specializes in the cultural, social, and political implications of modern communication media, such as the internet, mobile telephony, and virtual communities. In the mid 80s, he worked on and wrote about the earliest personal computers at Xerox Palo Alto Research Center or Xerox PARC, for short. He was also one of the early users of the Whole Earth ‘Lectronic Link or The WELL, an influential early online community. And in 1994, he was hired as the founding executive director of HotWired. He is the author of several books, including The Virtual Community, Smart Mobs: The Next Social Revolution, and Net Smart: How to Thrive Online. Welcome, Howard. 

Howard Rheingold: Good to be here. 

Noshir Contractor: Howard, you were one of the folks who was there, even before the birth of the web, and certainly at the birth of the personal computer and the very first online communities. Take us back to what things were at Xerox PARC, where so many important things were invented that helped shape the web that was yet to come.

Howard Rheingold: I found my way to Xerox PARC because I heard that you could edit writing on a television-like screen with a computer. And I had been a freelance writer for 10 years at that point, and my technology was a correcting electric typewriter, which meant that you could white out the last line that you wrote. And people who lived through that era know that you marked up your pages, and sometimes you literally cut and pasted them, and then at a certain point, you had to retype them, which is really a pain. If you were going to write a book of 400 pages, you probably retyped 3,000 pages. I found an article in the 1977 Scientific American titled “Microelectronics and the Personal Computer” by Alan Kay, and it had images of what he called a Dynabook of the future, pretty much an iPad. I called and asked if there were any writing jobs that they needed at PARC. Eventually, I got the job of roaming around and finding interesting people and writing about them, and then the Xerox PR department would place it in magazines. I drove half an hour from my home in San Francisco every day so that I could work on their computer there. 

When ARPA decided they only wanted to do defense-related research, all of these smart young researchers came to Xerox, because they hired Bob Taylor and they gave him $100 million in 10 years before he had to produce anything, and so he got all of these superstars, really superstars, in one place. I mean, they ended up creating not only the visual interface for the personal computer we know today, but also the laser printer and the local area network. My research tools were a typewriter, a telephone, and a library card. I was interested in extending those capabilities. And then I met Doug Engelbart. Engelbart was talking about using the computer to extend human cognitive capabilities – augmentation, he called it. I’m interested in the intersection of technology and the mind. 

It occurred to me in 1983, I think it was, Time magazine made the personal computer the person of the year, and I thought, boy, there’s a much bigger story here to be told. So I wrote a book called Tools for Thought. And I wrote a chapter about what was happening online. The internet didn’t exist yet. The ARPANET did. And I got a modem, and I plugged my telephone and my computer into it. And that’s when I discovered the WELL, which had been started by the Whole Earth people. The WELL was like three dollars an hour. And I got totally sucked up into that. And I tell the story in my book, my wife became concerned that I was spending so much time having fun online. And I wrote an article for the Whole Earth Review in 1986 or 1987 on virtual communities. And I wrote that because so many people had been saying, or implying to me, that there’s something pathological about communicating with people you don’t already know through computer networks. And I had seen that all the things that happen in a real face-to-face community, you know, people meet and fall in love and get married, people get divorced, there were funerals and parties, and we passed a hat when people were having hard times, we sat by people’s bedsides when they were dying. So that’s why I wrote about virtual community. I discovered that this diverse group of people I could connect with through a computer, not because we knew each other, but because we had similar interests, could really serve as an online think tank for me and help amplify my ability to learn about the things that I was writing about. So I became interested professionally with this as a tool, but also as a writer, I became interested in where is this all leading? What is this all doing to us as individuals and as communities and societies? 

And because I wrote enthusiastically back then, a lot of people since then have set me up as kind of a straw-man utopian. But in fact, if you read the last chapter of my book The Virtual Community, it’s called Disinformocracy. I’ve been writing about what might go wrong as well for a long time. And I think it’s important to have a nuanced view of technology, that it’s okay for someone who’s critical to also be enthusiastic. Another thing I’ve always been interested in is, how can you look at the signals that we see today and make some kind of extrapolations about the future? So, back in The WELL, the few hundred of us who were very enthusiastic thought that this was going to be a big deal someday. And back then it was like words on the screen. But we knew that someday there would be the processing power and the bandwidth for us to have audio and video and graphics. And so I’ve just had a fortunate position in time and space, being in the San Francisco Bay Area in the 1970s and 1980s, to be a participant observer.

Noshir Contractor: And you were not alone. There were so many other people. I mean, you had this fascination for using the tools for your own trade, but then also using that same curiosity to project further not just how it was going to help you at that point in time, but how these tools, the computers, the mouse, what was happening at Xerox PARC and The WELL, had the potential to transform society. Several people were in a similar situation to you. What do you think motivated you to say, no, I really see something special happening here, and I’m going to write about it, whether it’s to evangelize, or as you point out also, point out cautionary aspects about it. 

Howard Rheingold: I thought, here is something very important happening. People recognized something important was happening. Steve Jobs and Bill Gates, they knew what Xerox PARC was doing. They adopted it. I thought it was very important because this was our consciousness and our capacity to think and communicate meeting our ability to build technologies that are very powerful. And you know, one thing that I think we the human race noticed from the nuclear physicists and the bomb was that the human ability to create powerful technologies seems to be racing ahead of our ability to know what to do with them morally and ethically. And so it struck me that big changes were going to come. When I was writing The Virtual Community, I found a graduate student at UCLA, his name was Marc Smith, a sociologist. He was studying Usenet, and I asked him, why do people give information away to other people that they don’t really know? He said, “knowledge capital, social capital, and communion.” That was a great lens for looking at things. 

So fast forward to 1999, 2000. I’m in Tokyo. I noticed that people are walking around looking at their telephones. They’re not listening to them, they’re looking at them. A couple of weeks later, I happened to be in Helsinki, other side of the world, and I noticed some teenagers looking at their phones and showing their phones to each other. What was going on here? Those were signals. 1999, the World Trade Organization meeting in Seattle was disrupted by protesters who used the internet to coordinate. In the Philippines, Joseph Estrada, the president, was deposed after mass demonstrations were organized spontaneously using SMS, which wasn’t happening in the U.S. in 2000. It really took off after the iPhone in 2007. So I asked Marc, what’s going on here? And he said, it looks like the merger of the telephone and the internet was lowering the barriers for collective action. So, you know, like any good freelance writer, I went and did some research. Trying to find social scientists to help me understand what the signals meant was really part of this process of looking at the future. 

I was saying, the computer, the telephone and the network are merging into a new medium. We don’t really have a name for it yet. Well, now we call it the smartphone. But that ability, I thought, would signal another kind of phase change in the world, in which people were able to organize collective action in the physical world through their connection online. I guess you would say that the insurrection of January 6th was an example of that, as well. So again, throughout this process, the question of Is this stuff any good for us? kept arising. I started teaching, I guess, about 2005, because I saw college students were using these. But at the universities, you couldn’t take a course on What does it mean? anywhere. They invited me to teach this course on digital journalism at Stanford. I noticed that there were very few teachers using forums and wikis and blogs. Because I was teaching about social media, you know, it only made sense that we use that social media in the process of doing so. I ended up creating a course called Social Media Issues around my book Net Smart. My answer to Is this any good for us? has been, it depends on what people know; it’s no longer a matter of hardware or software or regulation or policy, it has to do with who knows how to use this medium well. And I felt that if you mastered these five fundamental literacies or fluencies, you would do better.

Noshir Contractor: And when you say “how to use social media well,” you parse that into how to use social media intelligently, humanely, and above all, mindfully.

Howard Rheingold: Yeah. So what are these five essential literacies? Attention, crap detection, participation, collaboration, and network awareness. I start with attention. The bad news is that the business model of the web has to do with attracting and engaging and maintaining your attention so that they can sell you things. And the people who are engineering these apps are very good at doing that, and we’re all suckers. The good news is that there’s ample evidence both from millennia-old contemplative traditions and from neuroscience that you can begin to understand how to deploy your attention more productively, something called metacognition. So one of the things I taught my students was, you know, becoming aware of where you put your attention is important. 

So that was the first chapter, but then I told the story of my daughter when she was in middle school. This was before Google, but she was using search engines. She was beginning to put queries in to do her homework. And I sat her down and showed her a website called martinlutherking.org. I think that they’ve changed their identity. But it’s actually run by white nationalists. And I showed her how to find that out. You can go to the library and get out a book, and that book was edited, it was published, it was purchased for your library, it was assigned by your teacher. Each of those was a kind of gatekeeper to guarantee that what you’re reading is accurate. You can now ask any question anywhere, anytime and get a million answers in a second. But it’s now up to you to determine which of those are accurate information, because a lot of them are wrong. So crap detection comes from Hemingway saying every journalist should have a good internal crap detector. 

And then the next one was participation. And we really wouldn’t be having this conversation about the web if it wasn’t for participation. It was created by millions of people who put up websites and put up links to other websites. From the Google twins to Mark Zuckerberg, people invent things in their dorm rooms, and it changes the world. And part of that is the miracle of the architecture of the internet. You don’t have to get permission to start a new search engine or social network, as long as it operates according to the technical protocols of the internet. You just need people to come to your website. 

So when I wrote Smart Mobs, I became interested in dynamics of collective action. How humans cooperate and what the barriers to cooperation are is probably at the root of our most significant global problems, from climate change to nuclear weapons to interstate conflict to land management. Elinor Ostrom won her Nobel Prize because she came up with design principles that if a group that was managing a scarce resource used these design principles, they would succeed. 

Noshir Contractor: You were amongst the first who introduced or at least popularized the term collective intelligence: when all of us can be smarter than any of us. Today, there’s a lot more interest in collective intelligence. There are conferences on the topic, centers around the world studying it. But again here, there was a signal that you picked up on before others.

Howard Rheingold: It was pretty obvious even back in The WELL. You got a group of people together, you could solve problems together online. Going back to Engelbart. Engelbart was not primarily interested in hardware and software. He was interested in, and he used these words, “increasing the collective intelligence of organizations,” what he called collective IQ.

Noshir Contractor: Net Smart talked about five fundamental digital literacies. And we talked about attention, crap detection, participation, collective intelligence. Can you talk a little bit about the fifth one – the network smarts?

Howard Rheingold: Although we’re used to the term “social network” in response to Facebook, social networks are something that precede technology by a long ways. The way I would describe it is, well, your family, your friends, your teachers, your neighbors, those are your community. The person you buy coffee from, the stranger you see when you’re walking your dog, the people you communicate with online, those are your network. They don’t all know each other. In a community, people know each other. Way back when Marc Smith told me about knowledge capital, social capital, and communion, one of the things that I taught my students was how social capital is cultivated and harvested online. The traditional definition of social capital is the ability of groups of people to get things done together outside of formal mechanisms like laws, governments, corporations, and contracts. If you are a farmer and you have good relationships with your neighbors and you break your leg, your neighbors will come in and help you with your harvest. Well, there’s a lot of social capital to be had online if you know what you’re doing. I learned this way back in The WELL. I learned, if somebody has a question and I have the answer, even if I don’t know that person, doesn’t cost me anything to give them the answer. Well, if you get several hundred people together who have different kinds of expertise and they all do that, suddenly, everybody is empowered. But you know what, people aren’t gonna give you answers unless you give answers yourself. I think anybody who is in a support group online knows about that. 

Noshir Contractor: You can’t go there and simply want to take things from other people and not also then contribute to the public good.

Howard Rheingold: Oh, that’s right. When you study human cooperation, what’s called altruistic punishment is a big part of that. It’s not just laws that enable people to live together, it’s norms. Why do you get angry when someone cuts ahead of you in line? It’s because they’re breaking the norm, and you feel that you need to enforce that. 

Noshir Contractor: Yeah. So in closing, then Howard, I want to go back to something that you’ve done so well over the last several decades, and that is detect signals and use those to project what’s coming down the pike. What are the signals that you’re detecting today that might tell us about what is going to happen in the next couple of decades?

Howard Rheingold: I think the most important one is the disintegration of consensus about what’s real and what’s not. Misinformation seems to travel much faster than corrections. The anti-vax movement worldwide is a good example. You know, the Enlightenment came along and said, well, let’s not have theological arguments about what causes disease, let’s use microscopes and see if we can discover the physical causes of it, so relying on science and coming to some kind of consensus about what we all agree is real. That consensus now seems to be very much in question. That’s a very troubling signal to me.

Noshir Contractor: Does this also have implications for another term that you spent a lot of time thinking and writing about: virtual reality or augmented reality? And what is real or not real in that context?

Howard Rheingold: I spent some time in Second Life, which is not immersive, but a kind of metaverse. There were people doing very interesting things, but it was not the next big thing. I don’t think people are gonna want to have avatar meetings and buy avatar groceries and socialize to the degree that the Metaverse vision from Facebook is promulgating. I just don’t think it’s going to appeal to everybody that way. I also think that there’s some problems. In Second Life, there were what were called griefers. You would be having a seminar and a bunch of flying penises would disrupt it. I think we’re going to see that kind of disruption in the Metaverse, and we’ve seen that Facebook has been unable to moderate even in its two-dimensional form. What would be really interesting in something like that would be a molecular biologist taking you through a walkthrough of a ribosome, an archaeologist taking you on a walkthrough of the pyramids to do things in three dimensions that you can’t do any other way. And I know that they’re using it for things like protein folding these days. And I think that being able to navigate and manipulate a three-dimensional world has research and educational implications that really have not been tapped.

Noshir Contractor: It’s been a real delight, Howard, hearing from you as somebody who was witness to the birth of many of these technologies. You have done a great job of envisioning so many of the phenomena that we have been experiencing, and perhaps if we had paid more attention to you when you first raised these issues, we might not have found ourselves in some of the predicaments that we do today. I also obviously want to thank you for all your work as an educator, helping to make the next generation more network smart than we were. So thanks again for joining me today, Howard. It’s been a real pleasure.

Howard Rheingold: Mine too. 

Noshir Contractor: Untangling the Web is a production of the Web Science Trust. This episode was edited by Susanna Kemp. I am Noshir Contractor. You can find out more about our conversation today in the show notes. Thanks for listening.

 

Episode 32 Transcript

Safiya Noble: I have found in my own community, as a, you know, Black woman living in Los Angeles watching so many different kinds of predictive technologies, predictive policing, experimented upon my own community, that I think we have to abolish many of these technologies, and we actually have to understand to what degree are digital technologies implicated in some of the most horrific forms of violence and inhumanity. 

Noshir Contractor: Welcome to this episode of Untangling the Web, a podcast of the Web Science Trust. I am Noshir Contractor and I will be your host today. On this podcast we bring in thought leaders to explore how the web is shaping society, and how society in turn is shaping the web. My guest today is Dr. Safiya Noble, an associate professor of Gender Studies and African American Studies at the University of California, Los Angeles. She is the co-founder and faculty director of the UCLA Center for Critical Internet Inquiry, an interdisciplinary research center focused on the intersection of human rights, social justice, democracy and technology. She is the author of the best-selling book Algorithms of Oppression: How Search Engines Reinforce Racism. Safiya was also the recipient of a 2021 MacArthur Foundation fellowship. Her nonprofit community work to foster civil and human rights, the expansion of democracy, and intersectional racial justice is developing at the Equity Engine. Welcome, Safiya.

Safiya Noble: Hi. First of all, I just want to say thanks so much for having me on the podcast today. This is such a thrill and such an honor to be in conversation with you. 

Noshir Contractor: Thanks for joining us here today. Safiya, your book about the algorithms of oppression focuses on search engines. And you talk about your experience using these search engines and recognizing very quickly that they are not quite as objective and neutral as one might like to believe they are. 

Safiya Noble: A decade ago, I was thinking about large scale digital media platforms. And I had kind of spent my whole first career in advertising and marketing. And as I was leaving the ad industry and going back to graduate school, it was so interesting to me the way that people were talking about search engines at the university. So I was at the University of Illinois at Urbana-Champaign in the information school there. Lycos, and Google and Yahoo, you know, these new technologies were amazing the way that they were indexing the web. But I also had just left this experience of trying to game these systems for my clients when I was in advertising. So really, I understood them as media systems. We were buying ads and trying to optimize. We were hiring, like, programmers to come into the agency and, like, help us get this General Motors ad up on the first listing, things like that. So it was interesting to kind of come into the academy and study search engines in particular, because I was just so fascinated by them. They were just kind of so banal and non-interesting compared to social networking sites that were coming into vogue. And it was kind of in this inquiry that I started doing searches and looking to see what kind of results we get. And one day I just kind of stumbled upon searching for Black girls. You know, I’m a Black woman. My daughter at the time was a tween. I realized that when you search on Black girls, you were met with almost exclusively pornography. And I was thinking about like, what does it mean that you don’t have to add the word “sex,” you don’t have to add the word “porn,” but Black girls themselves, that phrase is synonymous with hyper sexualization. This was kind of like a sexism 101, racism 101. And that really was the thread that I started pulling on that led to the book Algorithms of Oppression.

Noshir Contractor: Are you suggesting that the reason the search engines were privileging these kinds of search results is because that actually reflected something that was happening in society already and it was simply being amplified here? Or was this some other kind of manipulation that resulted in those?

Safiya Noble: That’s the question, right? I mean, I think that the prevailing logic at the time, 10 years ago, was that whatever we found in technology was purely a reflection of society, right, that the tech itself, those mechanisms were neutral, and that if you found these aberrations, it was because people were misusing the technology and that’s what was becoming reflected back. And I felt like that was an insufficient answer. Because I knew from my previous career, we had spent time trying to figure out how to manipulate these technologies. So I knew that if they were gamable, that that was a design choice or a set of kind of business imperatives.

Noshir Contractor: Sometimes referred to innocuously as search engine optimization, which was a term that was used at the time.

Safiya Noble: It was interesting, because search engine optimization was kind of a nascent industry. And then, you know, our beloved colleague Siva Vaidhyanathan wrote this incredible book called The Googlization of Everything: (And Why We Should Worry). I felt like, this was the jumping-off point; this book is actually the place where I can now get even more specific about how this skews in these kind of historically racist and sexist ways toward vulnerable people, toward communities of color, toward women and girls, under the guise of being neutral, normal, and apolitical.

Noshir Contractor: During the COVID crisis, we see algorithms now are also playing a sinister role in the propagation of information, not just in terms of giving us search results. Talk about some of the work that you have been concerned about with regards to how algorithms were employed by university hospitals in terms of guiding vaccine distribution.

Safiya Noble: This was one of the most egregious examples, I think, of the kind of distorted logics that are embedded into a lot of different kinds of software that we use every day and really don’t think twice about. So you know, there was this story that kind of went viral pretty quickly about how the vaccine during COVID-19 would be distributed. And of course, we had so many frontline workers who desperately needed it. And at Stanford University Hospital, in the heart of Silicon Valley, right, you have the hospital deploying an algorithmic decision-making tool to determine who should get the vaccine first. And the algorithm suggested a group of people who happened to be retired doctors, who were at home, who were mostly protected. We have to look at and understand the data, the logics that are imbued into the different kinds of systems that we are making ubiquitous, because they will inevitably have huge consequences.

Noshir Contractor: Concurrent to that we also saw an infodemic. You’ve talked about the role of the algorithms and the propaganda in terms of escalating violence against Asian Americans.

Safiya Noble: One of the things we want to remember is this dynamic interplay between social media and search. During the Trump administration, he was one of the greatest propagators of racist propaganda against Asians and Asian Americans by invoking sarcasm and hostility toward our communities, and in suggesting that Asian Americans and South Asians and really Asians throughout the diaspora were responsible for Coronavirus, right. And so this, of course, we know, also elicited incredible violence. If you come across something like racist propaganda against Asian and Asian American communities in social media, on Facebook, you might go and turn to a search engine to query whether it’s true. And this of course becomes extremely dangerous, because we know that search engines also are likely to be flooded with disinformation and propaganda too. Using something like Google as a fact checker for propaganda, and then having that propaganda be made visible to you, only confirms the dangerous kinds of ideas that you might be experiencing or exposed to.

Noshir Contractor: So there is almost a symbiotic relationship between what propagates in social media and then what shows up in your search results. One example that you’ve talked about is how Dylann Roof was influenced by reading white nationalist websites before massacring nine African American churchgoers.

Safiya Noble: Yes, that’s right. What we know is that the most titillating and often most egregious material on the web, which includes racist and sexist propaganda against religious minorities and sexual minorities, is actually very, very engaging. It’s fashioned many times like clickbait, so that it looks like it could be kind of true. And this is one of the main arguments that I really try to make in my work, which is that it is not just a matter of the fact that people are clicking on it, because guess what, people who are against that kind of material are also clicking on it, trying to understand what in the world is going on. But every one of those clicks really translates to profits. It will, in fact, contribute quite handsomely to the bottom line for many of these companies.

Noshir Contractor: Which takes you back to your initial profession, in the ad business.

Safiya Noble: Right. Publicly traded companies are required to maximize shareholder investment and profit at all costs, so there is no social responsibility, there is no social justice in the frameworks of Wall Street. I think we’re seeing now a decade of the results of that kind of misplaced value in our society.

Noshir Contractor: A lot of the tech industry engages in some pretty brazen experiments, where they will try out some kind of an intervention or a particular kind of manipulation for a day. And you’ve compared that to something that would not be allowed in industries such as the pharmaceutical industry or the tobacco industry.

Safiya Noble: The way in which Silicon corridors around the world are able to articulate their work is shrouded in math, science, engineering. We have to be careful about how we deploy even words and frameworks like science, that get used as a shield sometimes, right, for some of the most egregious kinds of products and services. I heard the investigative journalists who broke the story about COMPAS, the recidivism prediction software, right, that is profoundly racist, predicting that African Americans who were arrested would go back to jail or stay in prison at, I think it was, four or five times the rate of white people. They had all these boxes of paper documents to prove how the harm was happening. And the programmers, they didn’t want to hear it. And I remember sitting on this panel, in fact, it was at Stanford, with these journalists. I thought to myself, you know, we would never let three guys like those guys rent some space in a strip mall, cook up some drugs, and roll them out in pharmacies, right, and then when people die or people are harmed, say, “Hey, it’s just chemistry. Chemistry can’t be dangerous,” right? Like, we would never do that. So why is it that in the tech industry, we allow this kind of deep belief in the neutrality of these technologies without realizing that so many people have personally been harmed? And I think that we have to look at other industries, even like the era of big cotton, you know, during the transatlantic slave trade. We had many, many arguments during that era, where people said, you know, “We can’t do away with the institution of slavery and the enslavement of African people and Indigenous people, because the American economies are built on it, it’s impossible, our economy would collapse.” And that is actually the same kind of discourse that we use today. And I think there’s a lot to learn from these other historical moments, and figure out how we will shift the paradigm as we did for those other industries.

Noshir Contractor: That includes the abolitionist movement as well as a historical precedent.

Safiya Noble: Absolutely. When I set out a decade ago to work in this area, and to think about these things, I didn’t think of myself at the time as being, like, an abolitionist. I thought I was curious in doing this inquiry, and I knew that there was something unjust happening, and I wanted to sort it out. And now I can truly say that there are so many technologies that are deployed that are made with no oversight, with no regulatory framework. I have found in my own community, as a, you know, Black woman living in Los Angeles, watching so many different kinds of predictive technologies, predictive policing, experimented upon my own community, that I think we have to abolish many of these technologies. And we actually have to understand to what degree are digital technologies implicated in some of the most horrific forms of violence and inhumanity. Doing this work for 10 years has definitely moved me to the place of thinking of myself as kind of an abolitionist in the sense that, like during the transatlantic slave trade and during the institution of slavery, it really was a small handful of people that were persistent about the moral and ethical failings of the economic and kind of religious political system that was holding up such an inhumane set of business practices and social practices and political practices for centuries. And I think it will be those of us who are trying to point to these dangerous moves who will probably be articulated as some type of abolitionists in this sector, trying to raise attention to the costs, the long-term costs.

Noshir Contractor: How does one bring about a more concerted way of addressing all of these injustices?

Safiya Noble: We need the structural changes, we need different laws, we need different policies. The law is not the only thing, but it certainly is very important. What I worry about with predictive analytics is that so much information is collected on people that it is then used to determine whether they will have an opportunity, but also to foreclose opportunities. And of course, you know, we have to think about each of us. Imagine the worst moments of our lives. And I think, what if that moment is the snapshot in time collected about me, and that determines it’s a no-go for Safiya Noble, right? And of course, it also forecloses any possibility of redemption, of forgiveness, of empathy, of learning, of change. And I think we don’t want to live in a society without those qualities. And predictive analytics really forecloses the opportunity for being known, for changing, and for having a high quality of life. Cathy O’Neil says in her book Weapons of Math Destruction that predictive analytics make things much better for people who are already doing great, and much worse for people who are already not doing well. And I think we want to heed that seriously.

Noshir Contractor: Technology has had a history of widening the knowledge gap and the information gap in society. And so this is a natural progression of what has preceded it. One of the things you have also discussed as a way of addressing some of the issues you just talked about was proposing an awareness campaign and digital amnesty legislation to combat the harms perpetuated by algorithmic bias. 

Safiya Noble: When I was a graduate student, I was thinking about what happens when you are ensnared in a search engine, for example, and you can’t fight your way out of it, right, your name is destroyed. And of course, we have legislation in the EU like the right to be forgotten that helps address this, right. We don’t have this yet in the United States. Every engagement we have on the web is bought, sold, traded by thousands of companies. So how do we withdraw? How do we pull ourselves out of these systems? How do we create amnesty out of these situations? I’ve been trying to talk to lawmakers in California about what it would mean for us to pass our own kind of localized versions of GDPR. I love this article written by Jean-François Blanchette and his collaborators about the social value of forgetting, like why we seal juvenile records, so that those mistakes don’t follow you into the future. So how do we grapple with that now in this global web? Could we imagine and reimagine the way in which we appear? I once heard the director of the FBI at a conference, and he said, “As far as the government is concerned, who you are is your digital profile.” Can you imagine? I mean, people like us who study the worst parts of the web are on the internet looking at terrible things all the time. Of all the things we’re doing on the internet, nothing could be further from the truth about who I am as a person.

Noshir Contractor: That’s a really important point, because we celebrate the fact that now we can store and put into memory everything. But you’re pointing out the perverse aspects of keeping that memory always available to everyone. And you have worked with engineers, executives, artists, and policymakers to think through these broader ramifications of how technology is built, how it gets deployed, and, most importantly, how it gets used in unfair ways.

Safiya Noble: I think one of the most impactful organizations that I’ve been able to be a part of and on the board of is the Cyber Civil Rights Initiative, which is really the organization that has helped develop and pass all of the non-consensual pornography or revenge porn laws that we have in the United States. I think that’s a place that actively engages with trust and safety staff in large tech companies to try and help them understand truly the cost of their algorithmic distortions of people, many times young women. I have many relationships in Silicon Valley. The benefit of having had my whole first career in corporate America for 15 years before being an academic is that I really understand, when you work in a large global company, you’re one person, like, sometimes on the Titanic, and it’s going down, and you can’t actually stop it yourself. You know, you’re trying to figure out how to leverage and work across all kinds of teams and all kinds of people. I don’t only stay in academic and, kind of, activist spaces, or community spaces; I also go into these corporate spaces, and I try to talk about the work and give them the examples and challenge them to think differently. And I do think that now, if I look out at engineering programs, you see schools slowly changing, and that’s because industry is also saying, maybe we need people who have some background and experience in social sciences and humanities, too.

Noshir Contractor: But you are saying that in general, you find many of them to be receptive to these ideas and willing to be educated about these issues?

Safiya Noble: Absolutely. I think there are a lot of people who do not want to look back and feel that they are on the wrong side of history, that they didn’t ask tough enough questions, that they took for granted the wrong assumptions. I can tell you in the classroom, for sure, as I train engineering and computer science students who take my classes, 10 years ago, they were hostile to the idea that their work had any political value. And now, traditionally aged undergraduates are completely clear when they enter the field, and they’re here to change it.

Noshir Contractor:  So in addition to the work you do with corporate America, as well as activism, you’ve also rubbed shoulders with celebrities. Meghan Markle has cited Algorithms of Oppression as key to understanding the online vitriol that was spewed about her. 

Safiya Noble: I got this email that said, you know, “Please save the date to meet with the Duke and Duchess of Sussex.” And I thought it was like a scam email. Okay, long story short, it was real. They had been given my book by my former dean from USC. Meghan, I think she really saw an explanation for the incredible racist and sexist vitriol that she’s experienced on the internet. You know, the way that I could articulate how Black women and women of color become fodder for destruction, almost like a sport on the internet, I think she had really experienced herself. And here was someone explaining that this is actually a business imperative for companies. And so they have given, you know, resources to the UCLA Center for Critical Internet Inquiry. Bot Sentinel just, you know, issued a couple of really important reports that showed how just a few dozen high-impact accounts on social media were coordinated to basically try to destroy their family, destroy them. But I will tell you, Meghan and Harry, they understand that a girl living in Iowa, a teenager living in Oakland, people who are vulnerable and don’t have the platform and resources they have, could never fight off these kinds of internet attacks and cyberbullying and trolling. And I think that is why they want to put their time and effort and support behind people who get that, who care about that, and who are also working on that, too.

Noshir Contractor: Wonderful. So as we wrap things up here, I want you to look ahead. I want you to tell us a little bit about what plans you have as part of working as a MacArthur Genius, and also the launch of your new nonprofit Equity Engine.

Safiya Noble: I’m still pinching myself. I’m very grateful. And there are many Black women and women of color who, with just a little bit of support and scaffolding, could continue to have a very big impact. I mean, I look around, and I see, whether it’s Stacey Abrams, or, you know, a whole host of Black women, too many to name, and women of color, who are holding up families, villages, neighborhoods, the country, democracy, and who are really under-resourced in doing that work. And so I’m really trying to use this opportunity to build Equity Engine. I’m hoping that people will just give resources and their networks and their power to women of color, because the one thing that women of color have is an incredible sense of justice. And our work has been on the frontlines of expanding human and civil rights around the world. And we are also the least resourced in doing that. And so the Equity Engine is really a place for people to help us hold Black women and women of color up and build their power, in many of the ways that this MacArthur Fellowship is helping me do.

Noshir Contractor: I love the term Equity Engine as well, I think it’s a very apt name for what your vision is. Thank you again so much, Safiya for speaking with us. You brought us lots of really interesting insights and awareness about some of the ways in which we need to be much more skeptical about the web in general and about the algorithms and do something about it to make a difference.

Safiya Noble: Yes. Well, thank you. It’s such an honor. I have followed you my whole career, and I just am so honored that I’ve done something right in life to get to be in conversation with you, so thank you so much for this opportunity.

Noshir Contractor: Untangling the Web is a production of the Web Science Trust. This episode was edited by Susanna Kemp. I am Noshir Contractor. You can find out more about our conversation today in the show notes. Thanks for listening.

 

Episode 31 Transcript

Vint Cerf: When I joined the company, Larry and Eric and Sergey said to me, “What title do you want?” And I said, “How about Archduke?” And they said, “You know, the previous Archduke was Ferdinand, and he was assassinated in 1914 and it started World War One. Why don’t you be our Chief Internet Evangelist?” When people ask me about this, I tell them I’m Geek Orthodox, because my intent is to spread the internet religion. The idea here is that people should be able to get access to information and collaborate with each other on a global scale. 

Noshir Contractor: Welcome to this episode of Untangling the Web, a podcast of the Web Science Trust. I am Noshir Contractor, and I will be your host today. On this podcast we bring in thought leaders to explore how the web is shaping society, and how society in turn is shaping the web. Today my guest is Vinton Cerf, who you just heard talking about taking on a leadership role at Google, where he contributes to global policy development and the continued standardization and spread of the internet. 

Vint is widely recognized as one of the fathers of the internet. He’s the co-designer of the TCP/IP protocols and is vice president and Chief Internet Evangelist for Google. He is a former member of the U.S. National Science Board and a past president of the Association for Computing Machinery. He is a recipient of numerous awards, including the Presidential Medal of Freedom, the National Medal of Technology, and the ACM Turing Award. Vint, thank you so very much for joining us.

Vint Cerf: Well thanks so much for inviting me to join the show, Noshir. This is a topic dear to my heart, which is, you know, what’s happening to the internet and the World Wide Web and how is it affecting our society? So let’s have a look at that.

Noshir Contractor: Absolutely. So let’s go back to the early-to-mid-’70s, when you were working first as a PhD student at UCLA and then went on to be a faculty member at Stanford, coming up with the design of something called the TCP/IP protocol. And you did this in collaboration with Bob Kahn. What do you think was remarkable about that moment, in coming up with a quote-unquote protocol?

Vint Cerf: This is sort of like the question that says, describe the universe in 25 words or less, give three examples. First of all, the work at UCLA on the ARPANET was commissioned because the Defense Department was spending money on research in artificial intelligence way back then, in the mid- to late ’60s, for as many as a dozen universities. Everybody kept asking for new computers every year. And ARPA said, “We can’t afford that, so we’re going to build a network and you can all share your computing resources.” So we built the ARPANET first as a resource-sharing system and second as a test of a theory that packet switching would be suitable for computer communication. It worked extremely well. And somewhere around 1971, one of our colleagues at Bolt Beranek and Newman, Ray Tomlinson, came up with networked electronic mail. So we were early adopters of that technology. Bob Kahn came to visit me at my lab at Stanford in the spring of ’73. He had worked on and been very key to the architecture of the ARPANET project. He then went to ARPA and began working on a problem related to command and control. And that is, how do I use computers in that application? And of course, he immediately recognized that in order for this to work, the computers would have to be in mobile vehicles and ships at sea and in aircraft, in addition to fixed installations. So Bob shows up and he says, I’ve already started working on a mobile packet radio network and the packet satellite network, and we’ve already got the ARPANET. How are we going to hook them all together to make it all look uniform so that any computer on any network can talk to any other computer on any other network? And within a period of about six months, we came up with a strategy for doing that. So by January of 1974, I was already working with a team of graduate students at Stanford to develop a detailed specification of the TCP protocol. And then by January of 1983, we were able to turn the entire shebang on. So the internet began operation on January 1, 1983.

Noshir Contractor: The goal of the protocols that you’re describing was to provide interconnectivity between all kinds of networks, and this network of networks came to be called the internet as we now know it. Arguably the second-most famous protocol on the web, besides TCP/IP, is the HTTP protocol that we use. Why do you think that some of these protocols succeed and take off the way they did while others might be technically sound but are unable to take off as much?

Vint Cerf: Well, several things influence those outcomes. The first thing is that Bob and I gave away the protocols. We basically published freely. Now, others were pursuing similar ideas, but those were not open source, whereas TCP/IP was fully open, which led, of course, to commercial availability of routers executing IP and BGP and the other protocols that make the internet work. So Tim, emulating this release mechanism, announced the World Wide Web in December of 1991, released the protocols freely, and encouraged people to make use of them. The early browsers, including Tim’s and others like Netscape Communications’, had the property that you could ask the browser to show the HTML source code that produced the web page you were looking at. So that meant you could copy other people’s web pages and change them and explore and so on. This is super important from the standpoint of learning from other people. So there weren’t classes for webmasters. They were aggregations of people who were trying these things out and sharing what they knew and taking advantage of seeing what other people had done freely.

Noshir Contractor: Your next foray after DARPA was at MCI Mail. One of the things that I think I heard you say, and I just want you to clarify, is that the original idea of connecting these computers was to share computing resources and that email messaging was an afterthought that came about in that network.

Vint Cerf: That’s correct. People were leaving messages for each other on the same machine. And so the next step is to be able to leave messages for someone on a different machine that’s part of the same network. And so that’s what led to the internet or ARPANET email. So we all made heavy use of that plus remote access to the time sharing systems, terminal access, and also file transfers to move data back and forth. I was invited to come to MCI and build an electronic mail service for them. That was late 1982. Already, there existed email services. CompuServe, for example, had one, and I think General Electric had one. So we tried very hard to put the MCI Mail email system into a place where it invited interaction with other email services and other communication services. So we introduced into MCI Mail the ability to cause the email to be printed and mailed or printed and FedExed or sent through Telex, which is of course a 19th century, you know, invention. And eventually, we got it to send faxes, as well. And so I was very proud of the fact that we were able to design a system that was that general in order to bring people into the email world even before they had that capability.

Noshir Contractor: And at that time, giving access from MCI Mail to print, for example, which today might seem antiquated, must have been at least a way of getting some people into the email space.

Vint Cerf: That’s right. And, you know, there are situations where people want to deliver hard copy. MCI Mail is retired now, so I don’t have that ability anymore; now I have to print things out and put them in envelopes and post them myself. And I’m kind of missing the capability that we invented 40 years ago.

Noshir Contractor: So now we come into the 21st century. You’ve been at Google as the Chief Internet Evangelist. I’ve heard of evangelists in other areas, such as religion, but you were the first person I heard of who had taken that title at a tech company.

Vint Cerf: It was not a title that I asked for. When I joined the company, Larry and Eric and Sergey said to me, “What title do you want?” And I said, “How about Archduke?” You know, that sounded like a fantastic title. And they said, “You know, the previous Archduke was Ferdinand, and he was assassinated in 1914 and it started World War One. Why don’t you be our Chief Internet Evangelist?” And I said, “Okay, I can do that.” When people ask me about this, I tell them I’m Geek Orthodox, because my intent is to spread the internet religion. The idea here is that people should be able to get access to information, they should be able to find it, and share it, and make use of it, and collaborate with each other on a global scale. And of course, that’s what the internet and some of its applications try to do. And one of those applications, of course, is the World Wide Web, which Tim Berners-Lee introduced in late 1991. And he did so by putting another layer of protocol, HTTP, which you referenced earlier, on top of TCP/IP. It created a capability that had not existed before, except perhaps in the form of the online system that Douglas Engelbart developed in the mid-1960s. Tim certainly augmented much of that with video and audio as well as formatted text and imagery. It was an idea ready for its time, because now we see web pages in the billions, and billions of users as well.
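
To make the layering Vint describes concrete, here is a minimal, illustrative Python sketch of an HTTP request riding on an ordinary TCP connection. The host example.com and the exact headers are arbitrary choices for the example, not anything from the conversation.

```python
import socket

# Minimal sketch of "HTTP on top of TCP/IP": open a TCP connection,
# then send a plain-text HTTP request over it. Host and headers are
# illustrative; any public web server would do.
host = "example.com"
request = (
    "GET / HTTP/1.1\r\n"
    f"Host: {host}\r\n"
    "Connection: close\r\n"
    "\r\n"
)

with socket.create_connection((host, 80)) as conn:  # TCP/IP carries the bytes
    conn.sendall(request.encode("ascii"))           # HTTP is just structured text on top
    response = b""
    while chunk := conn.recv(4096):
        response += chunk

# The reply begins with the status line and headers, followed by the HTML
# source of the page -- the same source early browsers let you inspect.
print(response.decode("utf-8", errors="replace")[:600])
```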

Noshir Contractor: As you worked now in this new role, one of the things that you have been at the forefront of is trying to help us get ready for the time when everything will have its own unique internet address. And a lot of that has now been referred to as the Internet of Things and the advent of the Internet Protocol version 6. 

Vint Cerf: When Bob and I did the original design, we actually asked ourselves, you know, how much address space should we plan for? And we did a few back-of-the-envelope calculations. So we allocated eight bits of address space for networks, which would allow for up to 256 networks. Then, very quickly, as the Ethernet took off as a commercial product, and as other networks started to emerge, the consumption of address space became very rapid. So we had to redesign the interpretation of the 32-bit address space. By 1992, it became very clear that we were going to run out of that 32-bit address space no matter how we sliced it up. So the Internet Engineering Task Force, which is the primary standards body for internet standards, devoted about four years of intense debate over what new version of the Internet Protocol should be adopted. And they ended up with something we now call IP version 6. So it had 128 bits of address space. It’s 340 trillion trillion trillion addresses, and I’m hoping we won’t run out until after I’m dead. Then it’s somebody else’s problem. 1996 is when the standardization happened. And I thought everybody would instantly recognize the intelligence value of switching over quickly to the larger address space, so we wouldn’t end up with a terrible transition problem. Well, unfortunately, 1996 was exactly in the middle of the dot boom. And so everybody was too busy throwing money at anything that looked like it had something to do with internet. And nobody had run out of IPv4 address space yet. And so there was no real motivation to implement it. And so here we are today, in 2021, when only about on the average 30 percent of the possible parties have IPv6 implemented. The Internet of Things is going to consume address space like crazy. Already, we have people who have anywhere from 10 to 50 devices at home that are consuming IP address space. So we’re going to need a lot more than can be provided by IP version 4.
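
For readers who want to check the arithmetic behind those figures, here is a quick back-of-the-envelope calculation in Python. The numbers follow directly from the 32-bit and 128-bit address field widths Vint mentions; nothing else is assumed.

```python
# Back-of-the-envelope check of the address-space figures above.
ipv4_total = 2 ** 32    # 32-bit IPv4 address field
ipv6_total = 2 ** 128   # 128-bit IPv6 address field

print(f"IPv4 addresses: {ipv4_total:,}")        # about 4.3 billion
print(f"IPv6 addresses: {ipv6_total:.2e}")      # about 3.4e38, i.e. 340 trillion trillion trillion
print(f"IPv6/IPv4 ratio: {ipv6_total / ipv4_total:.2e}")  # roughly 7.9e28 times as many
```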

Noshir Contractor: What would you suggest are some of the most compelling reasons why the world needs to switch over ASAP to IP 6?

Vint Cerf: If a device needs to communicate on the internet, it does need an IP address. I mean, that’s sort of a given. We’re starting to see a proliferation of programmable devices, including appliances, you know, things like the refrigerator and the microwave, and all these other things are increasingly driven by software. And the utility of having them online is that they can be serviced online, new programming can be provided, I mean, take a Tesla car as an example. It is, essentially, a computer on wheels. So making it do new things is a matter of downloading new software. That will be true for many, many devices. So having IP addresses is really a critical part of being able to interact, download new software, correct bugs, provide status information. One of the things that the pandemic has forced us to do is to explore remote medicine, because the doctors are saying “Don’t come into the office.” Well, that’s not a very satisfactory situation if they don’t have information about your blood pressure, and you know, your pulse rate and all the other things that might be important. So you can see medical instruments, and in fact, your mobiles becoming remote medicine devices, but they will need internet access in order to deliver the result to the doctor who can then evaluate it. So I’m anticipating that healthcare, in addition to security, in addition to device appliances and manufacturing plants, are all going to require internet address space in order to be part of this online environment.

Noshir Contractor: I’m also taken back to the comment you made about how the web was invented on the backbone of the internet in some ways. How do you see the relationship, symbiotic or otherwise, between the internet and the web?

Vint Cerf: Well, the web wouldn’t exist, I don’t think, without the internet, and the internet wouldn’t be nearly as useful without the web. And in fact, there’s one other technology that merits mention here. In 2007, you’ll recall that the iPhone was introduced by Apple. The project to develop a handheld mobile telephone was started in 1973. And that’s the same year that Bob and I started working on the internet. And then these two technologies kind of went along in parallel, you know, not interacting particularly, until Steve Jobs introduces a mobile telephone that has full-up computing capability, has the ability to access the World Wide Web and the internet, in addition to having a camera and other sensors on board. I mean, the most astonishing and rich collection of enabling technology in a handheld device. And the result was that the mobile made the internet more accessible, because you could get to it wherever you could get a mobile signal. The internet and the World Wide Web, these two technologies were mutually reinforcing.

Noshir Contractor: Fascinating. You spoke a little bit about how the mobile phone increased the accessibility of the web to everyone. I want to talk about another aspect of accessibility. You and your wife Sigrid both had to deal with hearing deficiencies. You’ve become a leading advocate of accessibility. What grade would you give accessibility today in terms of the internet and the web?

Vint Cerf: Well, with a few exceptions, mostly C-minus, but to be fair, making things accessible for such a broad range of disabilities is hard. My biggest concern, honestly, is that too few people who are making applications in the web or on the Internet, are familiar with methods for making things accessible. If they don’t have real experience with those disabilities and the technologies to assist them, then their intuitions may not drive satisfactory design. And if I could add one other statistic, many people who argue in favor of making things accessible will quote a statistic that says there are a billion people in the world with disability of one kind or another, leaving you with the impression that investment in accessibility will help one eighth of the world’s population. What they’ve left out of that calculus is that every piece of assistive technology helps people who need to communicate with people with a disability. So this is an important investment for everyone in the world, not just for people who happen to have a specific disability need.

Noshir Contractor: That’s a very fair point. One of the things that you have recently raised concerns about are the risks of digital obsolescence. Tell us a little bit about what has made that a serious concern for us.

Vint Cerf: Well, there are several elements to this concern. I’ve been calling it a digital Dark Age. Here’s the problem: digital media are not known to have significant lifetimes. You know, you think about a DVD or a three-and-a-half-inch floppy. You don’t have readers to read them anymore. That’s what triggers my big concern about digital preservation: that we will have a big pile of useless bits. Think of a spreadsheet or think of a video game. The software that makes those bits useful may not run on operating systems of the day 100 years from now, or even 10 years from now. And things that you thought were important and should be of legacy interest to our descendants may not be accessible to them, because we didn’t take into account how to assure the longevity of interpretation of digital content.

Noshir Contractor: So you’re implying that we may have better records, going back centuries of materials that were on paper than we might have within the 20th and the 21st century.

Vint Cerf: That is correct. And if you think about it, there are no digital media that have anything close to the lifetime of even today’s crappy wood pulp paper, let alone the much higher quality rag content paper. And of course, if you want to go back further, you go to vellum, which is goatskin or calfskin, or lambskin or something. And that stuff lasts a couple of thousand years. Now, I’m not proposing that we should switch to some kind of vellum, you know, calfskin digital medium. But I do think a notion of digital vellum is important. And what that means is that whatever the medium is, and assuming that we’ve copied things into new media in order to provide longevity of the bits, that same digital vellum needs to have a software environment that makes it possible to correctly interpret content.

Noshir Contractor: So alongside being a visionary of what happens to the internet and the web and digital content over time, you’ve also been a champion of looking at what happens to the internet and web over space. And here you are today consulting with NASA and being a visionary on an interplanetary network. Tell us about the challenges it faces and the opportunities it presents as we decide to go back to the moon and on to Mars.

Vint Cerf: Well, first of all, I was so fortunate as a high school student to go to work for a company called Rocketdyne, which was part of what was then called North American Aviation. I ended up working on the statistical analysis of the F-1 engines, which formed the booster phase for the Apollo spacecraft, the Saturn rocket in particular. So I had this early introduction, and considered it an absolute delight to find, later in life, an opportunity to reengage. So when the Pathfinder landed on Mars in 1997, after 20 years without a successful Mars landing, I flew out to the Jet Propulsion Laboratory to meet with the team that was working on packet communications in space. And we sat around the table, asking ourselves, what should we be doing now that will be needed 25 years from now? And we concluded that we should think about how to design and build an interplanetary extension of the internet. And over the ensuing 23 years or so, we refined and standardized those protocols. They are in operation on the International Space Station. We have had prototype software running on Mars since 2004 in order to support many of the Mars landers. And we anticipate application of these protocols in the Artemis and Gateway missions. There’s nothing magic about this. And it’s not necessarily visionary; it is simply recognizing that there will be a real need. The thing that’s the most interesting, though, is the question of how this evolves in such a way that the commercialization of space happens. We’re asking questions like, well, gee, what’s the legal structure that should apply to this? Can people own anything in space? Can you buy an asteroid or claim an asteroid? And we don’t have answers to these questions yet. But we need to get them before they become pressing.

Noshir Contractor: So amongst the challenges, obviously, for this interplanetary internet is that there are really long distances between these planets and heavenly bodies. That clearly provides challenges not just for the transfer of data, but for voice interaction, which by definition would then become largely asynchronous.

Vint Cerf: The standardized protocols for the interplanetary internet are designed to take into account variable and lengthy delay as well as disruption. And that’s why we were forced to go design a whole new suite of interplanetary communication protocols to take these problems into account. So we have variably delayed and disrupted communication, which is a parametric space outside of the one in which the TCP/IP protocols were designed. But that’s exactly why this has been such an interesting exploration, because it’s terra incognita, to sort of mess up a metaphor. So at this point, I’m very confident that we will see this emerge. The Consultative Committee for Space Data Systems, which is an international organization made up of all the spacefaring nations, has already engaged in standardization, along with the Internet Engineering Task Force. So for me, this is the beginning of another adventure into the solar system. Now, we’ve even started thinking about interstellar communication.
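
To give a sense of the delays involved, here is a small illustrative Python calculation of one-way light time for an Earth-Mars link. The distances are approximate published values for closest approach and maximum separation, not figures from the conversation.

```python
# Rough one-way signal delay for an Earth-Mars link, to make the
# "variable and lengthy delay" concrete. Distances are approximate.
SPEED_OF_LIGHT_KM_PER_S = 299_792.458

distances_km = {
    "Mars at closest approach": 54_600_000,
    "Mars at maximum separation": 401_000_000,
}

for label, km in distances_km.items():
    minutes = km / SPEED_OF_LIGHT_KM_PER_S / 60
    print(f"{label}: ~{minutes:.1f} minutes one-way")

# Roughly 3 minutes at best and over 22 minutes at worst -- far outside
# the round-trip assumptions baked into TCP's handshakes and timers,
# which is why a delay- and disruption-tolerant design is needed.
```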

Noshir Contractor: Well, what a tour we’ve had today from the start of the TCP/IP protocols, and here we are talking about interplanetary internet and the relationship and interdependencies between the internet and the web overall. Thank you so much, again, Vint for sharing your thoughts and your visions with us about all of this.

Vint Cerf: Your listeners will have to decide how to distinguish hallucination from vision. Hopefully I’ve succeeded. I always enjoy these chats. I really appreciate the invitation to join you. And of course, I’m eager to hear from your listeners if they have ideas that they’d like to pursue. 

Noshir Contractor: Untangling the Web is a production of the Web Science Trust. This episode was edited by Susanna Kemp. I am Noshir Contractor. You can find out more about our conversation today, whether you are here on earth or via the interplanetary internet, in the show notes. Thanks for listening.

 

Episode 30 Transcript

David Lazer: Facebook has this large pile of things it could show you. It chooses some and not others. If we’re trying to characterize the global emergent tendencies of those social algorithms in promoting some content versus others, what are those kinds of tendencies? With this new observatory, the objective is to create a large panel of subjects where we’ll monitor both their online behaviors and the behaviors of the platforms with which they are engaged.

Noshir Contractor: Welcome to this episode of Untangling the Web, a podcast of the Web Science Trust. I am Noshir Contractor, and I will be your host today. On this podcast we bring in thought leaders to explore how the web is shaping society, and how society in turn is shaping the web. Today, my guest is David Lazer, who you just heard talk about an effort he’s leading, funded by the National Science Foundation, to build an observatory to study online human behavior as well as the algorithmic strategies of social media platforms. 

David is a University Distinguished Professor of Political Science and Computer and Information Science at Northeastern University. He’s among the leading scholars in the world on misinformation, with some of the most highly cited papers. His research, published in journals such as Science and Proceedings of the National Academy of Sciences, has received extensive coverage in the media. In 2019, he was elected a Fellow of the National Academy of Public Administration. Welcome, David.

David Lazer:  Thank you for having me, Noshir. It’s delightful to be here. 

Noshir Contractor: I want to start by trying to trace the ways in which you started out as a political scientist and then got interested in issues related to the web. 

David Lazer: Well, I was really interested in the role that the web and related technologies played in our political system and in how the government works. In the early years, I was interested in the notion that there were transformational effects of network technologies on both the organization of government – that is how it ran at odds with the hierarchical structures of government – but also could change the relationship between government and citizens. With colleagues, we did a number of online experiments, for example, of having citizen town halls where citizens met with their members of Congress online. This was back in 2006. These highlighted the potential for raising up the discourse about politics between citizens and their representatives. That was where we started or where I started, was thinking about how government could be rewired because of the web and related technologies.

Noshir Contractor: People in general, at the time, were still excited about the web, and all the potential good it could do.

David Lazer: I still actually have some fair degree of optimism. There was a session that Harvard was holding for all the newly elected members of Congress in 2018. And we gave literally every sitting member of Congress a copy of our book on the potential of the internet to change their relationships with their constituents in positive ways.

Noshir Contractor: I imagine you’re talking about the book Politics with the People: Building Directly Representative Democracy that you published along with your co-authors in 2018.

David Lazer: It was explicitly a book about the potential transformation of our democracy. We did these experiments with constituents meeting with members of Congress, and they were full-blown randomized control-treatment experiments, and so we could really make robust scientific inferences about the impact that these kinds of discussions had on individuals.

Noshir Contractor: For those who are uninitiated, what does an online experiment in this context look like? 

David Lazer: In these experiments with members of Congress, we used a survey firm to recruit a sample of people who said that they were willing to participate in an online discussion. And the actual session was a member of Congress having a discussion with them around a hot-button issue, immigration. And we then randomly assigned people to participate or not participate. Comparing the people who participated in the sessions versus the people who didn’t, the people who participated did have a sort of shift in the direction of their members. And they were also more likely subsequently to vote. So we were able to look at voter data and see who voted and who didn’t vote. These are sort of administrative data that are generally available. The critical flavor here is that we can actually take people from around the world and put them in the same virtual room, and then structure that room in a way that enables certain kinds of communications and disables other kinds of communications. So in a sense, the web is the ultimate laboratory for studying human interaction, because it’s so malleable, and because so much of the world is readily accessible wherever they are.
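
As a rough illustration of the randomized-assignment logic David describes, here is a small Python sketch on entirely synthetic data. The group sizes, turnout rates, and names are made up for the example and are not from the actual study.

```python
import random
import statistics

# Synthetic illustration of a randomized online experiment: recruit a
# pool, randomly assign half to the treatment (the online session with
# a member of Congress), then compare a later outcome across groups.
random.seed(42)
pool = [f"participant_{i}" for i in range(1000)]
random.shuffle(pool)
treatment, control = pool[:500], pool[500:]

# Pretend outcome: whether the person voted afterward (made-up rates).
voted = {p: random.random() < 0.62 for p in treatment}
voted.update({p: random.random() < 0.55 for p in control})

treatment_rate = statistics.mean(voted[p] for p in treatment)
control_rate = statistics.mean(voted[p] for p in control)

print(f"Turnout in treatment group: {treatment_rate:.3f}")
print(f"Turnout in control group:   {control_rate:.3f}")
print(f"Estimated effect of participating: {treatment_rate - control_rate:+.3f}")
```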

Noshir Contractor: And the web is a treasure trove both in terms of having digital traces and, as you point out now, in providing a platform to collect data from individuals.

David Lazer: One of my favorite quotes from one of the citizens coming out of it was, “Huh, policy is a lot more complicated than I thought it was.” Which is certainly the truth. I think that these sessions made me feel quite proud of our democracy and what our democracy could be. And so part of our commitment was to figure out how to translate those findings into actual democratic practice. And we actually wrote up a report, a guide on how politicians should and could use the web to support deliberative democracy. We’ve talked to politicians about this, to all the big tech companies. We did a speaking tour in Silicon Valley to talk about the role that these platforms could play in enabling democracy and not just disabling democracy.

Noshir Contractor: In addition to your own work in online platforms, you’ve actually helped develop platforms for other scientists to be able to conduct research. Talk a little bit about Volunteer Science and what hopes you have for that moving forward. 

David Lazer: So Volunteer Science is a platform we started building around 10 years ago. There are a lot of startup costs to doing an experiment. You have to build all that infrastructure, then you run your experiment, and then you stop running it, and it just sort of decays – it’s not replicable, or it’s expensive to replicate. So the objective with Volunteer Science was to make it easier for experimenters to get experiments up and running quickly. The platform carries a lot of the weight. It allows you to rapidly instantiate versions of your experiment. So it substantially lowers both the startup costs and the management costs of running an experiment. And of course, you can also recruit samples from around the world. We’ve run the same kind of data collection with people participating from India, South America, and North America, so we’re able to recruit much, much more interesting and diverse samples as well. These screens are almost literally windows into people’s lives. And we can literally reach through those windows and say, “Please, participate in this experiment with us, help out science, volunteer for science.” And we get on the order of 100,000 people a year who come and volunteer to participate as subjects.

Noshir Contractor: And you have spent a lot of time, and been among the leading scholars, looking at the use of big data to understand human behavior. Tell us about your 2014 article in Science critiquing Google Flu Trends. What was Google Flu Trends, and what was your critique?

David Lazer: So Google Flu Trends grew out of a landmark paper published in Nature that examined the relationship between searches for the flu and prevalence of the flu. In particular, it aggregated flu-related searches – or, I should note, searches that are correlated with the prevalence of the flu. The idea was nowcasting: standard methods for estimating the prevalence of the flu involve a multi-week lag of collecting all the data, aggregating it, and then looking at how many cases there were two weeks ago. When a contagious disease is spreading, you’d like to know when and where it’s spiking, because that can direct both preventive measures and measures that mitigate the harm, like a surge of capacity in hospitals and the like. So the Nature paper asserted that you could do this all much faster. It offered a method to do that by saying: we have tons of searches on Google, and we can see how they correlate with the prevalence of the flu in the U.S.

Noshir Contractor: So the assumption here is that when flu is breaking out, people will go on the web and search for terms that are relevant to the flu, like “fever.” And that list of search queries is then raw data that they put into a model to make a prediction about when and where the flu is spiking.

David Lazer: Exactly. Although they looked at all search queries, including ones that were clearly not related to the flu, and then fit them to flu prevalence. What happened was that it repeatedly stopped working well. You know, high school basketball turned out to be predictive of flu prevalence – because high school basketball apparently has the peak of its season at what is typically roughly the peak of the flu season. They said they looked at some terms and weeded them out by hand, which is not best practice. There was an off-season flu – I think it was H1N1, if I’m remembering correctly – that it then did very badly at predicting, partly because they had built something that was partially a flu predictor and partially a winter predictor. So what we were doing in our critique, which appeared in 2014, was not to discredit the whole notion of what was being done in the Google Flu Trends paper, but to say that there are ways that, if you’re not careful, big data will lead you to the wrong places. Then, in a follow-up paper, we proposed a new method involving what we called human computation, which had humans look at some of the search terms. Humans are much better at interpreting what the intent was when someone searched: why did they search for this? Oh, I bet it’s because they were sick. So we devised a way for human coders to come up with an evaluation like that. We were then able to fit it to a sample of people who had agreed to have their search terms evaluated. We asked them whether they had the flu, and we were able to predict whether individuals had the flu, and then aggregate that upwards, nationally. That showed a very different kind of methodology, one that in part leveraged human interpretive abilities in addition to the big data component.
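The failure mode and the human-computation fix can be illustrated with a toy example. Everything below is hypothetical (the term lists, the weekly series, and the 0.8 threshold are made up for illustration), but it shows how purely correlational term selection also scoops up merely seasonal queries, and how a human judgment about search intent filters them out.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length series."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical weekly series over one winter (arbitrary units).
flu_prevalence = [1, 2, 4, 8, 9, 7, 4, 2]
searches = {
    "fever and chills":       [2, 3, 5, 9, 10, 8, 5, 3],  # genuinely flu-related
    "high school basketball": [1, 3, 5, 8, 9, 8, 5, 2],   # peaks in winter too
    "beach vacation":         [9, 8, 6, 3, 2, 3, 6, 8],   # anti-correlated
}

# Purely correlational selection keeps the basketball term as well:
# it tracks winter, not illness.
selected = {t for t, s in searches.items() if pearson(s, flu_prevalence) > 0.8}
print("Correlation-only selection:", selected)

# The "human computation" step: coders judge whether someone issuing the
# query was plausibly searching because they were sick.
plausibly_sick = {"fever and chills"}
print("After human review:", selected & plausibly_sick)
```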

Noshir Contractor: So you got some ground truth data by actually collecting the search queries from the same individuals who then reported whether they had the flu or not.

David Lazer: That’s right. The other thing we saw was that there are – and this is not shocking, but – real differences in how people search. We found, for example, very sizable gender differences in search tendencies. Women in our sample actually had higher levels of searching about the flu before flu season, and then when there was flu in the household, their search level didn’t go up that much. Men, on the other hand, had a baseline level of searching for flu information that was pretty darn close to zero, and then they’d freak out online when they had symptoms. So what we’re really finding out when we look at these kinds of search queries is how often men have the flu. Now, that may actually be fairly predictive of how often the population has the flu, but there’s some perverse element to where we’re getting signal. That’s a more general concern in the social sciences: who’s on the web, how much, and from where is not representative of the general population. And even if it were, their behaviors may not be, because there may be very differentiated ways that those behaviors manifest.

Noshir Contractor: And of course, today, during the pandemic, we have become much more focused on getting these data in as close to real time as possible so that policies can be responsive to them. I want to switch our attention from pandemics to infodemics. Talk to us a little bit about what got you interested in, or concerned about, fake news.

David Lazer: I was particularly concerned, as many people were after the election of 2016 in the United States, because it felt like there was a real breakdown in our information ecosystem. And so with a number of collaborators, we put on a major conference in February 2017 on the science of fake news. There were psychologists looking at this, there were political scientists looking at this, there were computer scientists looking at this. But rarely do they connect outside of their disciplinary silos. And we pulled together all the speakers from that conference, and out of that came a 2018 paper that appeared in Science, titled “The science of fake news,” and it was putting together a multidisciplinary perspective on misinformation and fake news. Fake news is a very narrow and specific thing, but misinformation is the more general thing. 

Noshir Contractor: Can you point to some insights that might not have been gleaned in “The science of fake news” were it not for having an interdisciplinary take on it?

David Lazer: I’ll point to another paper of mine from 2019 that also appeared in Science, which examined the prevalence of fake news on Twitter. That paper was a collaboration among three computer scientists, a cognitive psychologist, and myself, a political scientist by training. From the social science perspective, it involved thinking about how to build a high-quality sample. We said, “Well, we really care about humans.” So how do we develop a large sample of humans? The way we did that was to build computational methods to disambiguate and match the Twitter data to voter data. We were then able to computationally extract from Twitter the prevalence of fake news, as well as make certain inferences about what people were exposed to. There’s no way that just a team of social scientists could have done this. But also, there’s no way that a team of computer scientists would have done it either. It required, I think, really the best elements of computer science and the social sciences.
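As a rough illustration only (the published study’s matching procedure was far more careful than this), the core idea of linking social media profiles to voter records can be sketched as indexing voter records by name and location and keeping only unambiguous one-to-one matches. All records, fields, and names below are hypothetical.

```python
from collections import defaultdict

# Hypothetical profiles and voter records, for illustration only.
twitter_profiles = [
    {"handle": "@jsmith_bos", "name": "John Smith",   "location": "Boston, MA"},
    {"handle": "@maria_r",    "name": "Maria Rivera", "location": "Austin, TX"},
]

voter_records = [
    {"voter_id": "MA-001", "name": "John Smith",   "city": "Boston", "state": "MA"},
    {"voter_id": "MA-002", "name": "John Smith",   "city": "Boston", "state": "MA"},  # ambiguous
    {"voter_id": "TX-310", "name": "Maria Rivera", "city": "Austin", "state": "TX"},
]

def key_from_voter(v):
    return (v["name"].lower(), v["city"].lower(), v["state"].lower())

def key_from_profile(p):
    city, state = [part.strip() for part in p["location"].split(",")]
    return (p["name"].lower(), city.lower(), state.lower())

# Index voter records by (name, city, state).
index = defaultdict(list)
for v in voter_records:
    index[key_from_voter(v)].append(v)

# Keep only profiles that match exactly one voter record; ambiguous names
# (two John Smiths in Boston) are dropped rather than guessed at.
matches = {}
for p in twitter_profiles:
    candidates = index.get(key_from_profile(p), [])
    if len(candidates) == 1:
        matches[p["handle"]] = candidates[0]["voter_id"]

print(matches)  # {'@maria_r': 'TX-310'}
```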

Noshir Contractor: That’s an excellent example of why web science calls itself an interdiscipline, because it transcends specific disciplines that contribute to it. I want to touch on some of the recent efforts that you’ve just launched with a major grant from the National Science Foundation that picks up some of your earlier work back from 2013 on issues such as algorithmic auditing and online personalization. So, to begin with, what do you mean by algorithmic auditing?

David Lazer: So much of what we see on the web is mediated by platforms. Facebook has this large pile of things it could show you. It chooses some and not others. So what we really have are computational curators. Facebook is looking at your Facebook friends and saying, you know, that looks awfully boring, we’re not going to show it to you. Typically, we’ve described that curation process as being algorithmic, but really it’s something that I and others have called the social algorithm: an interplay of humans and computers that results in emergent behaviors in terms of what you see and what you don’t see. These algorithms are predictive models, trying to optimize on some metric. That metric might be something like what gets you to click on something, because what platforms are able to do is translate your time on the platform into profits in various ways. So the question, when we talk about algorithmic bias – and this is a phrase that comes up not just about the web, but in everything from criminal justice to housing to credit decisions – is: if we’re trying to characterize the global emergent tendencies of those social algorithms in promoting some content versus others, what are those tendencies, and what are the patterns? For example, to what extent do we see personalization? If you’re searching for something on Google, do you see the same thing as I do? And the answer is, actually, generally you do. One of the things Google does do is geolocate some of your searches. If I search for pizza, it will show pizzas around Boston. It doesn’t do that with politics: if you’re searching for my member of Congress, Google doesn’t geolocate that. So it’s making decisions about how to treat different kinds of information, different kinds of queries. There’s also a question of what kinds of sources Google promotes. Does it promote misinformation sources? Then we could ask whether social media – say, Facebook – tends to promote more emotional content, or to demote civic content. So we can imagine auditing a platform like Facebook in terms of understanding what content it systematically promotes or demotes. In any case, the objective of this new observatory is to create a very large panel of tens of thousands of people, where we’ll monitor, all with consent, both their online behaviors and the behaviors of the platforms with which they are engaged. In that way, we’ll be able to get more of a handle on these algorithmic structures that are so very important in modern-day society.
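One small piece of such an audit – measuring personalization by issuing the same query for different panelists and comparing the returned results – can be sketched as follows. The result lists and domains are hypothetical; a real observatory would collect them through instrumented browsers with panelists’ consent.

```python
def jaccard(a, b):
    """Overlap between two result sets (1.0 = identical, 0.0 = disjoint)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0

# Top results for the same "pizza" query as seen by two hypothetical
# panelists in different cities: the local result differs.
results_user_a = ["pizza-place-boston.example", "pizza-chain.example",
                  "local-reviews.example", "pizza-recipes.example"]
results_user_b = ["pizza-place-seattle.example", "pizza-chain.example",
                  "local-reviews.example", "pizza-recipes.example"]

# Top results for a political query: identical for both panelists.
politics_user_a = ["rep-ma-07.example", "congress.example", "news.example"]
politics_user_b = ["rep-ma-07.example", "congress.example", "news.example"]

print("Pizza query overlap:   ", jaccard(results_user_a, results_user_b))
print("Politics query overlap:", jaccard(politics_user_a, politics_user_b))
```

Aggregating such overlap scores across many panelists, queries, and platforms is one way an audit can quantify which kinds of queries get personalized and which do not.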

Noshir Contractor: So tell us what you expect you will be able to do once you have this web observatory up and running. What are the kinds of questions that you will be able to answer more definitively than we are able to do today?

David Lazer: I should note that this is really a communal resource. It’s not just a resource for me and my collaborators. The objective will be to set up an infrastructure that provides access to these data – analytic access – while guarding the security and privacy of people who are participating in it. And so the objective then is to understand everything from what kinds of content do different platforms promote? You know, do we see higher quality or lower quality information being promoted? Do we see biases in terms of what’s promoted? Do we see certain demographics being targeted? So really trying to understand how people get information, the role that the platforms play, and how the platforms play together. Our objective is to have a picture of the emergent and evolving web and people’s behaviors on the web.

Noshir Contractor: Wow, what a tour de force we have gone through today, going back to you being among the early people to see how the web could change politics, and journeying all the way to now setting up this communal good: an observatory for studying online human and platform behavior – and, as you pointed out, not just a single platform, but the ecosystem of platforms as they interact with one another. David, such a pleasure to have you as my guest on this 30th episode of Untangling the Web. I look forward to continuing to see the research that comes out of this observatory and all of your other very exciting initiatives. Thanks again very much for joining us today.

David Lazer: Noshir, it’s been a pleasure and an honor speaking with you.

Noshir Contractor: Untangling the Web is a production of the Web Science Trust. This episode was edited by Susanna Kemp. I am Noshir Contractor. You can find out more about our conversation today, provided they are not demoted by the algorithms, in the show notes. Thanks for listening.

 

Episode 29 Transcript

Siva Vaidhyanathan: Mark Zuckerberg, you know, the picture is where he’s focused. It has a lot to do with the vision that Google has, that Microsoft has, that Amazon has, that Apple has, to some degree. They are all in a race to become what I had called in my book “the operating system of our lives.” If any of these companies become dominant, that kind of concentration of power should make us all worried deeply.

Noshir Contractor: Welcome to this episode of Untangling the Web, a podcast of the Web Science Trust. I am Noshir Contractor and I will be your host today. On this podcast, we bring in thought leaders to explore how the web is shaping society and how society, in turn, is shaping the web. 

My guest today is Siva Vaidhyanathan, who you just heard talking about how big tech companies have expanded to serve a much bigger purpose in our lives than they originally did.

Siva is the Robertson Professor of Media Studies and Director of the Center for Media and Citizenship at the University of Virginia. The Center publishes the Virginia Quarterly Review and produces several podcasts, including Democracy in Danger, which is now in its third season. Siva is the author of The Googlization of Everything: (And Why We Should Worry), published by University of California Press in 2011, as well as Anti-Social Media: How Facebook Disconnects Us and Undermines Democracy, first published by Oxford in 2018. He has written several other books, appeared in several documentary films, and written for many major periodicals. He is currently a regular columnist for The Guardian. In 2012, he was a keynote speaker at the annual ACM Web Science Conference. Welcome, Siva.

Siva Vaidhyanathan: Oh, thank you, Nosh. It’s really good to reconnect with you and to be part of this conversation.

Noshir Contractor: Well, thank you again for joining us. It’s been a while since you presented your work at the ACM Web Science Conference back in 2012, which happened to be shortly after the publication of The Googlization of Everything: (And Why We Should Worry). In retrospect, 10 years later, that was pretty prescient. Tell us what got you interested in writing this book and what prompted you to use the phrase “the Googlization of everything.”

Siva Vaidhyanathan: You might remember back in 2004, there were pretty big headlines about Google’s effort to scan in the entire collection of the University of Michigan library and substantial portions of, like, six other libraries, including Harvard’s, Oxford’s, and Stanford’s.

Noshir Contractor: Yes. 

Siva Vaidhyanathan: That was the early version of what ultimately became Google Books. At that moment, Google was six years old. Google had been around for less time than Brad Pitt and Jennifer Aniston had been married, and that didn’t last. And all of these libraries and major universities around the world were saying, by all means, take control of how we will encounter and discover centuries of knowledge and centuries of culture. I thought this was bizarre, because remember, Google’s mission statement is to organize the world’s information and make it universally accessible and useful. That’s a lovely mission statement for Oxford University or for Harvard University or the University of Michigan. Why were those libraries not just outsourcing this project, but yielding control to a commercial service which violated, from my point of view, all of the ethics and norms and values of librarianship? There were going to be privacy issues, copyright issues. Now, like everybody else in 2004, I loved Google, I used Google constantly. But of course we were all learning quickly that there are biases built into the search process, by virtue of the record of us using the search service, but also by the algorithmic choices that Google’s engineers had made along the way and were constantly tweaking. This was going to be an opaque system, as every corporate system is. It was going to be without accountability. It was going to have tremendous power over what we think is true and beautiful and good. And I said, “Wait a minute. Let’s figure out what values we would want in a global digital library,” and then ask ourselves, is Google the right agent? It started out of a conversation with librarians, who were immediately appalled by the fact that their bosses had signed these one-sided contracts. So the other question was, who knows what’s going to happen to Google? Why would you place this impressive, bold vision for the organization of the world’s knowledge in the hands of a company so young, so inexperienced, so locked in itself, so arrogant, when you have thousands of trained librarians and millions, maybe billions, of dollars if you collectively pursued a project? I looked at, for instance, the Human Genome Project, which was a knowledge organization challenge that had really hit its peak around 2000. It was about who would map the human genome, and there was a company called Celera that had come in and said, we’ve got this shotgun method of examining the genome and we are going to do it faster than any of the publicly funded projects. The governments of the world set up a project that ended up tying Celera in the race. And I said to myself, why can’t we do that with all of the poetry, and all of the history, and all of the almanacs that sit in these libraries? But then I had to figure out: what does Google mean to us, what does it do to us, how do we live through it? What happens if it becomes more important in our lives than it was in 2004? The book ultimately came out in 2011, at a moment when people weren’t yet ready to question Google. I was fortunate in the sense that I could raise some very crucial questions that are now part of everyday discourse.

Noshir Contractor: So today, as you look back, what is the one thing you regret not worrying about then that has now become a worry?

Siva Vaidhyanathan: YouTube. YouTube, which Google had bought a few years before I finished the book, was important, and yet it had not become the corrosive force that it is now. It had not yet clearly shown itself as a recruitment tool for extremism around the world. I also didn’t do much with Android. I didn’t foresee that the Googlization of everything included the Googlization of our operating systems for most of the world – step outside the United States and Western Europe and almost nobody uses iPhones; they all use Android devices. In an update for 2021, I think half of the book or more would have been about YouTube, and maybe a substantial chapter would have been on Android as well. It would have been a very different book, but with similar themes: issues of concentration of power, both over knowledge and politics, and issues of the ways algorithmic choices and values are baked in.

Noshir Contractor: And after you had begun to till and plow the concerns with Googlization, in 2018, you published a book titled Anti-Social Media: How Facebook Disconnects Us and Undermines Democracy. What prompted the switch to Facebook?

Siva Vaidhyanathan: The election of 2016 came. The Trump campaign had exploited Facebook quite deftly and had done so under the radar of political journalists, who were used to following television buys – they would report weekly on which states Clinton and Trump were up in, you know, the same story they had been writing since 1968. And I said to myself, they are missing what really happened: the Trump campaign had no professional politicians working for it except for, you know, Paul Manafort. So you’re dealing with amateurs – Trump is notoriously cheap – and you’re hiring all these people from the Trump Organization, which is basically a Facebook ad scam company; it had been selling steaks and ties on Facebook for years. That’s what those people knew. They knew how to do Facebook marketing. They spent very little money, because you can do that on Facebook, and they precisely targeted and motivated voters who were otherwise unlikely to vote, increasing turnout slightly for their own side, while using Facebook ads and targeted content to disengage voters who might otherwise have voted for Clinton. It only took a total of 80,000 votes, split over Wisconsin, Michigan, and Pennsylvania, to make Trump president. Now that’s the Trump story, but Narendra Modi had done the same thing two years earlier in India – and had done more with Facebook, and worse things on Facebook. And we saw Rodrigo Duterte running the same kind of campaign in the Philippines in early 2016. It became clear to me that if you’re that kind of political candidate running that kind of movement, one that depends on inciting fears and passions, Facebook is the perfect system for you, because that’s what it amplifies. I wanted to make the claim that Facebook is undermining democracy, because when you look globally, it is creating or amplifying the coarseness of political discourse and crowding out any form of political discourse that can be rich and deliberative and humane. While that’s a long and slow process, and Facebook is but one actor in it, it’s deadly to anybody who believes in the future of a democratic republic.

Noshir Contractor: So, unlike with The Googlization of Everything, in the case of Anti-Social Media you did provide an update in 2021. What was the takeaway from the update?

Siva Vaidhyanathan: By 2018, by the time the book came out, journalists had caught on to the story. They also dug deeper, or helped me understand things more deeply, in other places, and we had extra stories. All of a sudden, by 2021, we can add Brazil; we can add Mexico – a lot of people ignore the effect of Facebook and WhatsApp in Mexico, and AMLO and his authoritarian tendencies as well – so now I have more data points to tell my own story. So I knew that there had to be factual updates about Facebook. For instance, we had gone from 2 billion users to nearly 3 billion users in the three years since I had written the book, so that had to change. And there was so much more to talk about in terms of the potential for, or at least interest in, regulation. In 2018 I was very bullish on antitrust as a way we could reduce the power of Facebook and Google, but specifically Facebook, and I no longer think that. And I made those arguments in a series of articles making what are somewhat counterintuitive arguments – like that quitting Facebook isn’t going to do any good, and you might actually do more to limit the power of Facebook by staying on it.

Noshir Contractor: Well, let’s talk about some of those recent columns, focusing on 2021. Back in January, as you mentioned, you wrote a column in the New Republic titled “Making sense of the Facebook menace: Can the largest media platform in the world ever be made safe for democracy?” In July, you wrote an article in WIRED magazine titled “What if regulating Facebook fails? It seems increasingly likely that antitrust and content moderation tools aren’t up to the task.” I sense a more and more ominous tone as we progress through your columns in 2021. Talk about this.

Siva Vaidhyanathan: As I watched both the tone and the substance of regulatory debates in Brussels, in London, in Ottawa, in the United States, and in New Delhi, I started to lose faith in any sense that a company as big and powerful and wealthy and embedded in our lives as Facebook could be sufficiently restrained by the toolkit we have brought forward from the 20th century. All the discussions were about employing 20th-century means to address a company that didn’t exist in the 20th century. The best example of this is competition law in Europe, or antitrust in the U.S., where you can fine Google or Facebook for anti-competitive behavior – and that takes them a week to make back. The notion of breaking up Facebook the way that Standard Oil was broken up in 1911, or the way that AT&T was broken up in 1984 – none of that tracked well for me. Facebook’s sins and crimes against competition are unlike those companies’. You can’t make the case that Facebook is restricting or holding back innovation, which is an important economic argument one has to make within antitrust, and you can’t make the case that advertising is more expensive or less effective since Facebook rose. I also became dissatisfied with the way American public discourse can’t seem to grasp the idea that Facebook matters more in the world’s largest democracy than it matters in this democracy, the world’s largest economy. I am willing to bet that when Mark Zuckerberg wakes up and logs in in the morning, his first thought is about India, and his second thought is about the United States. India has nearly 300 million Facebook users and WhatsApp users. That’s still only a fraction of its population. The potential for growth in India is astronomical, and clearly the future, if not the present, of Facebook is India. You have to include Egypt, you have to include Turkey, Brazil, the Philippines, Indonesia – one of the countries we tend to ignore. You think Facebook cares about its own image in the United States? Facebook already achieved the level of user penetration it was ever going to achieve in this country back in 2009 or 2010. I wanted to make the case: we need to think bolder and more radically about regulation, and we need to think more globally about what we are confronting.

Noshir Contractor: You’re right that the number of users may have reached a plateau, but the growth has come through the acquisition of other platforms like Instagram and WhatsApp.

Siva Vaidhyanathan: I mean, Zuckerberg knows what he’s doing when he buys platforms like that, and we have to add Oculus, the virtual reality platform, to that as well. It’s also pretty clear to me that over the next five years, we will see Instagram and WhatsApp, and probably to some degree Oculus, folded into the Facebook experience – into what they call inside the company “Blue,” the standard Facebook interface that we use on our phones and on the web. And I think that’s really what Zuckerberg would like. He doesn’t like having this trifurcated experience. He would like there to be a meta company, which he now calls Meta.

Noshir Contractor: Well, speaking of which, in Slate in November of 2021, right after Facebook renamed itself, you had a column titled: “You don’t change your name to ‘Meta’ if you think anyone can stop you. Facebook’s rebranding isn’t a PR move. It’s a vision.” Tell us more about that.

Siva Vaidhyanathan: So in my observations of how Mark Zuckerberg works in the world, I think he is a supremely confident person. He has never let a scandal or an uproar or a problem significantly change his outlook, his agenda, or his company. All the changes he has made have been cosmetic. You know, the big picture is where he’s focused, and the big picture has been consistent probably since the day that Facebook went public, maybe sooner. It has a lot to do with the vision that Google has for its future, that Microsoft has, that Amazon has, that Apple has, to some degree. They are all in a race to become what I had called in my book “the operating system of our lives”: to be the company that most significantly manages, monitors, and monetizes the data that flow through our houses, our cars, our bodies, our refrigerators, our minds, our eyeglasses. Google, and to a lesser degree Apple, already won the battle to be the operating system of our mobile devices, our phones. Microsoft, for the most part, won the battle globally to be the operating system of the computers on our desks. Facebook has won the race to manage our social lives. Google has won the race to manage knowledge and navigation. So you could envision all of these companies saying, we’ve carved out our lane, nobody’s coming close to us. Instead, they all have a much bolder vision. Zuckerberg has this idea of the metaverse, where our consciousness is embedded in flows of data, and flows of data are embedded in our consciousness and our bodies and our world, and there is no clear distinction among reality with a capital R, virtual reality, and augmented reality. What I see coming is potentially much more dangerous than even Facebook, because if any of these companies becomes dominant in this world, that is a tremendous amount of power. That kind of concentration of power should make us all worried deeply. How that power is exercised I can’t predict. The pattern has been that it’s exercised largely benevolently in intent but clumsily in execution, which allows for easy hijacking by nefarious forces. But one thing we know is that giving so much power to that industry has not made life richer, better, more peaceful, more satisfying, more humane. We’re living faster, we’re living more conveniently. I don’t have to leave this chair. What kind of life do we want to live? We can live a glorious life as human beings who are so easily connected, who have such access to information, but we can’t think of connectivity and information as ends in themselves. They are resources to be harnessed and used carefully toward a good life.

Noshir Contractor: A lot of people are trying to make sense of the Metaverse; some of them are critical and are coining the phrase “not averse” as a way of pushing back against the Metaverse. These new developments are qualitatively different from what we’ve seen so far with Facebook or Google, and they represent not only the next intellectual challenge for web science but a civil society challenge for all of us.

Siva Vaidhyanathan: Our scholarly community has been fully engaged with these issues for decades. Katy Pearce at the University of Washington has been writing for 10 or 15 years about the ways that authoritarian dictators exploit social media outside of the gaze of Western powers, and had some reporter for the New York Times or the Washington Post or CNN taken her work seriously in 2011, 2013, 2015, then what we saw in 2015, 2016, 2021 would not have been a surprise. I am in awe of Katy Pearce, of Meredith Clark, who’s doing amazing work on Black Twitter, among other questions of how Twitter affects daily life and the future of journalism. These are the sorts of questions that we ask in our worlds, in our conference rooms. The conversations are exciting. I think we are getting to the point where more scholars are able to get their work out to a larger public. Our community is not prescient, but we’re careful, and we’re engaged. We are still looking at all of this with fresh eyes, with the tool sets of multiple disciplines.

Noshir Contractor: I thank you so much for taking time to talk with us today, and I look forward to your next venture, whether it’s going after the next company, the Metaverse, or something else you choose to do. It’s always going to be exciting, and your ability to tell stories and make compelling arguments is exactly why I would again recommend folks listen to the podcast you co-host, Democracy in Danger.

Siva Vaidhyanathan: Oh Nosh, it’s been such a pleasure to catch up with you. I hope I can see you in person very soon, let’s do that. 

Noshir Contractor: Untangling the Web is a production of the Web Science Trust. This episode was edited by Susanna Kemp. I am Noshir Contractor. You can find out more about our conversation today in the show notes using, with caution, your Google, Apple, Amazon, or Microsoft devices. Thanks for listening.

 

Episode 28 Transcript

Sonia Livingstone: I’m actually so horrified by that metaphor of policing. You know, the first wave of parental controls was all about various forms of spying and secretly monitoring your child and then punishing them when you found they’d done something wrong. But what are children doing? You know, they absolutely believe that new technology is their way ahead and should be under their agency and control. It’s become very pernicious that the discourse has somehow set parents and children against each other in some kind of mutual struggle.

Noshir Contractor: Welcome to this episode of Untangling the Web, a podcast of the Web Science Trust. I am Noshir Contractor and I will be your host today. On this podcast we bring in thought leaders to explore how the web is shaping society and how society in turn is shaping the web. 

My guest today is Sonia Livingstone, who you just heard talking about the tensions internet use can cause between parents and their children. Professor Livingstone is in the Department of Media and Communications at the London School of Economics and Political Science. Her research examines how the changing conditions of mediation are reshaping everyday practices and possibilities for action. She has published 20 books on media audiences, specifically focusing on children and young people’s risks and opportunities, media literacy, and rights in the digital environment. Professor Livingstone currently directs the Digital Futures Commission with the 5Rights Foundation and the Global Kids Online project with UNICEF, and her recent book with Alicia Blum-Ross, Parenting for a Digital Future: How Hopes and Fears About Technology Shape Children’s Lives, was published by Oxford in 2020. Sonia was a keynote speaker at the 2012 ACM Web Science Conference and was appointed Officer of the Order of the British Empire in 2014 for services to children and child internet safety. Welcome, Sonia.

Sonia Livingstone: Thank you so much, Noshir. It’s great to be here.

Noshir Contractor: Well, let’s take on that last statement: your recognition for your services to children and child internet safety. What prompted you to focus so much of your attention on this particular topic?

Sonia Livingstone: I began my career thinking about media audiences, and I focused on everybody. And when I did one project on children and the changing media environment, I discovered how keen all kinds of stakeholders and the public were to have and engage with the kind of knowledge that we academics produce. The question around risk to children – risk and harm and internet safety – has been building and building as an area where there can be real consequences for action from the evidence that I and my colleagues produce, and I found I wanted to get engaged in that and influence that process as well.

Noshir Contractor: And it’s a great example of taking the research that we do as academics and making a real difference in the world. When you talk about digital futures, you focus on several dimensions. Can you tell us a little bit more about play and education?

Sonia Livingstone: I guess I like being where things are contentious. So at the Digital Futures Commission, we chose these two topics to focus on because both are oddly contentious. Our focus on education data began as trying to think about a way that the move towards data could be positive for children – how their data from their learning could be used in ways that would benefit them – and it turned into a kind of dystopian exploration of how value is being extracted from childhood and how big tech is profiting from everyday activities. And play seems like the essence of childhood, but if you talk to parents about playing online, they don’t want their kids to break rules or make new friends or experiment and explore or get into trouble – all the things that are part of play suddenly become really difficult in the online context. The purpose of the Digital Futures Commission is to try to think our way through the difficult, knotty instances in order to, yeah, maybe redesign the web.

Noshir Contractor: One of the issues that you discuss in detail in your work is the amount of agency parents give their children in these contexts. Can you talk some more about the role of parenting in this particular situation? 

Sonia Livingstone: One thing that we learned from a public consultation we held with children earlier this year is how much they want the commercial games made for them to give them more agency. Agency is really hard for children to exercise online. When Alicia and I wrote the book about parenting for a digital future – and we were one British, one American – we had a lot of discussions about the difference between a kind of American ethos, which is perhaps more protectionist and talks more about parents’ right to manage and organize children’s digital activities, whereas in Britain and in Europe we’re a bit more child-rights focused, and there is more emphasis, I think, on children’s agency and autonomy, even though that might mean both greater risk to the child and greater privacy between the child and the parent. And that does change how one can take a child rights perspective, which is what I always try to do in my work, but also what the responsibility and role of the parent is in this situation.

Noshir Contractor: When you use the phrase children’s rights, is it only about protecting children, or is it also about giving them certain privileges rather than just focusing on protection?

Sonia Livingstone: With the UN Committee on the Rights of the Child, I’ve been developing this document – it’s called a general comment – which offers guidance to states on how to think about children’s rights in relation to the digital environment. So the purpose is to remind states that children have rights too, in other words, that human rights do apply to children. People get it, I think, for the right to protection, but they don’t get it so much for the right to privacy or for civil rights and freedoms – the right to expression, freedom of thought and assembly, and so forth. And the Convention on the Rights of the Child adds some things that are specific to children: one is that children should have their rights considered according to their evolving capacity, and another really emphasizes the child’s right to be heard, because children by and large are not attended to in forums where decisions that affect them are taken. Then the Convention adds a number of procedural rights – ways in which, for example, remedy must be child-specific. And buried in the Convention are a few extra rights, like the right to play. States need to adjust their mechanisms to recognize children’s rights, otherwise those rights will be infringed.

Noshir Contractor: And so you really are engaging on these children’s rights issues with parents on the one hand, as well as with governments, and also with industry, I imagine?

Sonia Livingstone: In all that work I did on internet safety, it really was quite policy focused. It’s increasingly becoming important to think more about the industry side, the design side. So many policymakers over the years have said to me, “Well, we want X or Y to happen, but the industry is always pushing back, saying it’s too late, because things have been set up in this way.” So I think the whole sizable movement now of “by design” – privacy by design, safety by design, security by design, and so forth – is an effort that I’m keen to capitalize on, so that we do begin to ask the questions about users – people, children also – from when digital products are first designed.

Noshir Contractor: So is there any success story, small or large, that you can point to that came out of these efforts at helping to influence design, especially when dealing with industry?

Sonia Livingstone: In Britain, we have a new code which embeds privacy by design for children, and in fact requires that providers treat children in what’s called an age-appropriate way. So we saw a whole raft of changes from social media platforms: turning off the possibility of unknown adults contacting somebody identified as a child, turning off autoplay for children on YouTube. We’ve just proposed “playful by design,” which says not only that the hygiene factors – safety, privacy, security, ethics – have got to be dealt with, but also that designers should build in imagination and choices for children, so they can determine their own pathway through digital play, and ensure that things are more diverse in terms of the emotional experiences on offer and the forms of representation. So we’ve identified not just how to eliminate problems, but also what would be good, and we’re now working with games designers to workshop that and co-design tools. In some ways the business models are, of course, against us. But I’m written to every day by a range of providers, large and small, who say, you know, we want to do the right thing by children, but what is it, and who’s going to guide us, and where do we go for resources? And that’s the need that we’re trying to address.

Noshir Contractor: One of the things that you’ve also discussed is screen time and what you call “healthy screen time.” A lot of the publicity we see is about parents policing their children, and platform developers making screen-time tools available for users to monitor and assess. But at the same time, you’ve also told parents to stop policing their children.

Sonia Livingstone: I’m actually so horrified by that metaphor of policing. You know, the so-called first wave of parental controls was all about various forms of spying and secretly monitoring your child and then punishing them when you found they’d done something wrong. But what are children doing? You know, they absolutely believe that new technology is their way ahead and should be under their agency and control. It’s become very pernicious that the discourse has somehow set parents and children against each other in some kind of mutual struggle. And, interestingly, research shows how unproductive this is, because if parents do take that kind of authoritarian approach, that’s when we see children find ways to go online anyway. I don’t know that I really want to talk about healthy screen time so much as thinking in a more nuanced way about the content that children engage with on screens and the context in which they do that. That always requires an evaluation of the merits of the content and what the child is getting from it, and I think parents are actually really keen on that too. But I think as a society we haven’t given parents many bearings in how to make that judgment, so they don’t know what’s a safe or unsafe website, and we have a lot of marketing that makes a lot of false claims, so they feel without moorings in making parental judgments.

Noshir Contractor: You’ve said on some occasions that “coding is the new Latin.” Talk a little about how children are dealing with this – and what we can do, as you said, to focus not just on the hygiene practices of protecting them, but also on unleashing their creativity and innovation.

Sonia Livingstone: That “coding is the new Latin” was said to us by several parents in our Parenting for a Digital Future book. I think in a number of parts of the world, certainly here in Britain, there’s been this big push over the last 10 years to introduce coding into schools and informal learning settings, and this is often taken up with great enthusiasm by both children and parents who, when they’re handed this sense that there are technological skills they could gain that are going to be the Latin of the future, are very keen to try to take them up. But, as we tracked in our book, there are also all kinds of ways in which relatively privileged middle-class parents can put more resources behind it, so it becomes yet another form of concerted cultivation. And we did trace a very disheartening series of small but really consequential ways in which poorer kids dropped out or couldn’t make the connection between what their teachers wanted of them and what their parents were able to support them in. So the opportunities are there, but inequalities are a really major challenge.

Noshir Contractor: It won’t be the first time in history that technologies have had these differential impacts based on socioeconomic status, for example.

Sonia Livingstone: In a way, because it’s a familiar problem, it doesn’t necessarily gain attention. Though we are having some interesting debates here about the idea of digital poverty – the idea that one should specify the minimum technological support that a family might need, just as one might look at their minimum economic or nutrition needs.

Noshir Contractor: What you’re discussing takes on even more meaning and importance when we think of all the emerging-economy countries, the kids in those countries, and where their futures are headed.

Sonia Livingstone: Most of the research I’ve been talking about is, yes, Global North – you know, that’s 10 percent of the world’s children. When we did the work on children’s rights for the UN Committee on the Rights of the Child, we did a global consultation focusing on countries in the Global South, and what was really fascinating to me was the incredible range of challenges that children are perfectly articulate in telling us about – very often around access, cost, the difficulties of living in rural or impoverished circumstances, different family compositions, different cultural values, all kinds of diversity in what it is that children want from technology. As a field, we’ve got a lot of mind-stretching and diversifying to do in terms of setting the frames, and that of course has to be a global conversation, one that’s much more inclusive and collaborative.

Noshir Contractor: I would be remiss in not asking you for your reaction to the recent stories, especially the so-called Facebook Files and Frances Haugen’s comment about the research being done internally at Instagram, pointing to the fact that they are turning 10-year-olds into social media addicts – and that’s a quote.

Sonia Livingstone: You know, as an academic I believe research should be independently funded, peer reviewed, and fully published. There is something very problematic about industry doing research which reveals problems with its product that damage its users, and then not making that public. So I think she’s done the world an enormous favor in making known what we kind of knew already. That said, clearly it wasn’t great research. There’s plenty of research out there in the world, done independently, which shows that social media content can be harmful, especially to vulnerable teenagers in certain circumstances. I think really no researcher is going to stand up and say social media makes children addicted or social media is the sole cause of harm. So we have to have a more nuanced debate, we have to think about the quality of the research, we have to think about the public conditions under which research gets properly reviewed and critiqued, and then we need to keep in mind that any harms, anything going wrong in the lives of our children, is multiply caused. Social media is just part of a much bigger picture.

Noshir Contractor: On the one hand, you are encouraged by the fact that these private companies and platforms are at least making an effort to do some research, but the quality of that research remains suspect. The dilemma is that these private platforms have access to incredibly large amounts of data that would lend themselves to a lot of rigorous research, and yet much of the research being done by academics is not able to leverage these data. Is there an opening for academia to start a dialogue with the private platforms so that they make their data available for rigorous research that is more transparent than the studies Frances Haugen is reporting on?

Sonia Livingstone: I think, sometimes at least, the politicians representing the interests of the academy have said, give us your data. And the platforms say, “We have an awesome amount of data – what do you want, and what are your questions?” So I think we need to get clever about specifying what data we want, so that it becomes more precise. I just wonder if there are analogies that we could learn from. In the transport industry, how did they work out the safety of cars and traffic and trains and planes, using industry data but also having independent scrutiny? There must be some kind of precedent. It feels like we’re inventing this discussion de novo, and we’re not doing a very good job at it.

Noshir Contractor: Well, I think that’s a really good idea – looking for analogies as a way to set up better academic-industry partnerships. In closing here, Sonia, can you talk a little bit about what’s next, what’s coming down the pike for the Digital Futures Commission? What’s next on the horizon?

Sonia Livingstone: I think for the Digital Futures Commission, what I really want to do is distill, from what we’ve learned about play and what we’re learning about education data, what the guidance would be for innovators and designers, so that they have a place to turn when they want to know how to get it right for children. There isn’t synthesized guidance. And I think for me, the intellectual question coming up is really: how much are children – not exactly the canaries in the coal mine, but – a way in which society can think about vulnerable internet users generally? Because there are clearly all kinds of parallels with other vulnerable or disadvantaged groups when it comes to digital design and technology policy. And it’s still an open question in my mind whether each group has to fight its own cause, or whether there’s a way of coming together and asking whether the days of digital design for rather privileged, able-bodied people are done. I don’t know what that new world will look like, but I think it’s going to be a really fascinating debate.

Noshir Contractor: Well, again, thank you so much, Sonia, for taking time to talk to us about this incredibly intricate relationship between children, parents, governments and policymakers, and platform developers. I think you’ve really helped us a great deal in rethinking and reimagining what these relationships should be, with the goal of helping preserve the rights of children and indeed unleashing their creativity moving forward. So thank you so much, Sonia, for talking to us about this today.

Sonia Livingstone: Thank you so much. This is a journey with many people on it, but it’s always fun to discuss it with you. Thank you.

Noshir Contractor: Untangling the Web is a production of the Web Science Trust. This episode was edited by Susanna Kemp. I am Noshir Contractor. You can find out more about our conversation today in the show notes. Thanks for listening.

 

Episode 27 Transcript

Deborah McGuinness: So I’m kind of famous for this wine and foods ontology that I literally did in my very early days, when I was taking a graduate class. You know, I had to write an expert system that would make a recommendation. And so I said, “Okay, well, what am I passionate about?” Well, I happen to be passionate about wine and food. 

Noshir Contractor: Welcome to this episode of Untangling the Web, a podcast of the Web Science Trust. I am Noshir Contractor, and I will be your host today. On this podcast, we bring in thought leaders to explore how the web is shaping society, and how society, in turn, is shaping the web. 

My guest today is Deborah McGuinness, who you just heard talking about creating ontologies for computers. These ontologies can help us pair the perfect glass of wine with our steak – or develop personal health management plans. Deborah is the Tetherless World Senior Constellation Chair and Professor of Computer, Cognitive, and Web Sciences at Rensselaer Polytechnic Institute, or RPI, in the United States. She is also the founding director of the Web Science Research Center and a fellow of the American Association for the Advancement of Science. She is also the recipient of the Robert Engelmore Award from the Association for the Advancement of Artificial Intelligence. Welcome, Deb.

Deborah McGuinness: Thanks for that wonderful introduction. It’s wonderful to be here.

Noshir Contractor: So I have to say that the title of your chair intrigues me. Tell us more about the Tetherless World Constellation.

Deborah McGuinness: Well, a constellation is a feature that our university president put together. Usually, universities have one professor in one area and don’t have overlapping professors. But her idea was to bring together constellations – groups of stars – to make significant contributions in carefully chosen areas. So the original plan was for this contribution to be in, kind of, mobile computing and the future of the web. And then we modified that to be less about mobility and more about the future of how we work with tetherless as well as tethered communications. I typically refer to that as the future of the web.

Noshir Contractor: That’s a fascinating vision.

Deborah McGuinness: Yes. One of the reasons I left my position directing the Knowledge Systems Lab at Stanford University was the interdisciplinary nature and strengths of RPI. I find that my most fascinating work is at the intersection of communities. And actually, that’s kind of a perfect tie-in to web science, because I don’t think I know of any discipline that’s more interdisciplinary than web science.

Noshir Contractor: That’s absolutely where I wanted to go next, given your interest and skills at being able to navigate interdisciplinary work. You’ve been one of the pioneers in this area, so take us back a little bit to how you got started in the area of web science.

Deborah McGuinness: Well, you know, I've been working in knowledge representation and reasoning and the languages and environments to model and reason with knowledge for my entire career. So in the early days, that was languages literally for the Semantic Web, but it was before we called it the Semantic Web. So it was languages that let you get to the implicit information from the explicit statements and were computationally amenable to working with computers. Then, when I went to Stanford, we did a lot of really big, often government-sponsored projects to do ontology-enabled encodings of meaning. So we did large applications that understood what terms meant, because we encoded those meanings in ontologies. And so, you know, for my entire career, I was making these languages and I was making these environments that were making kind of smart recommender systems or smart data portals. And then when I went to RPI, we kind of took that to another level and made it even bigger. And so when web science was emerging, they needed people who had languages that could not just encode how you're going to write something on a page, or how you're going to link one page to another, but actually what those terms in the page mean. And then also, as I mentioned earlier, I'm really just fascinated by interdisciplinary work. And this just seemed to be a complete and total perfect match for that.

Noshir Contractor: So I'm going to take you back and try to help unpack some terms that you use in the context of web science, for somebody who may not be familiar. You use the words knowledge representation, language, ontology. By language, you mean computer languages, I guess?

Deborah McGuinness: Yes, I typically mean languages for computers. We might focus more on markup languages, so languages that help you annotate terms that you're going to see in a description of something. And that has initially been, well, "I'm going to write this in red in a particular font." You know, I'm kind of famous for this wine and foods ontology that I literally did in my very early days when I was taking a graduate class. You know, I had to write an expert system that would make a recommendation. And so I said, "Okay, well, what am I passionate about?" Well, I happen to be passionate about wine and food. Later, we called it The Semantic Sommelier. Someone would say, "I'm having steak for dinner." And we also had some rules in the background that said, with a meat dish without a spicy sauce, we might have a red, full-bodied, dry wine. Once we've got that markup, and I've got, say, Forman Cabernet Sauvignon in my database, then we can retrieve not just that particular wine but also the description of the wine. So let's say I'm in a restaurant and they don't have that wine. I can say to the sommelier, "Well, do you have any other red, full-bodied, dry wines?" And then they could list off the ones that match that description.
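
[Editor's note: A minimal sketch, in plain Python, of the kind of description-based retrieval Deb describes. The wine entries, attribute names, and the single pairing rule are illustrative assumptions, not drawn from her actual wines and foods ontology.]

```python
from dataclasses import dataclass

@dataclass
class Wine:
    name: str
    color: str       # e.g. "red" or "white"
    body: str        # e.g. "full" or "light"
    sweetness: str   # e.g. "dry" or "sweet"

# Hypothetical cellar; the descriptions, not just the names, are what we store.
CELLAR = [
    Wine("Forman Cabernet Sauvignon", "red", "full", "dry"),
    Wine("Some Pinot Noir", "red", "light", "dry"),
    Wine("Some Riesling", "white", "light", "sweet"),
]

def recommend(color: str, body: str, sweetness: str) -> list[Wine]:
    """Return every wine whose description matches the requested attributes."""
    return [w for w in CELLAR
            if w.color == color and w.body == body and w.sweetness == sweetness]

if __name__ == "__main__":
    # "I'm having steak for dinner" -> the pairing rule asks for red, full-bodied, dry.
    for wine in recommend("red", "full", "dry"):
        print(wine.name)
```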

Noshir Contractor: And so in the context of a recommender system for a search, you would say, "Show me wines that have a certain quality." And if the information about the wine is encoded in a markup language that includes those characteristics, then rather than just searching for the words Sauvignon Blanc, you will now be able to get a Sauvignon Blanc recommendation based on certain attributes of the wine that have been encoded into the website. Can you amplify that in a more accessible manner than I just fumbled through?

Deborah McGuinness: Oh, well, actually, I thought you did a pretty good job. So I created this wines ontology and this foods ontology. I made it public. And I was also very active in the description logic community. And so at the time, anybody who was doing work in description logic always looked around for a way to test their work. So almost everybody who did a thesis in the 80s and the 90s – and I think they're still doing it – tested their work on some version of the wines and foods ontology. And then later, when I was very active in the World Wide Web Consortium's standardization effort to recommend languages for encoding meaning on the web, we also wrote a guide to how to use the language, and we used a version of my wines and foods ontology.

Noshir Contractor: That’s a great story. I recall you at web science summer schools and web science workshops for the Web Science Conference talking about these issues and getting excited about it. You mentioned that one of the things that the World Wide Web Consortium has tried to focus on is creating these standards. And the example you gave early on in terms of markup language for things like, you know, whether you want something in red or whether you want a particular kind of font – those kinds of markup languages are extremely standardized, I would argue, around the world. How do you assess the extent to which ontologies have been standardized and embraced and adopted on the web?

Deborah McGuinness: You know, that's a really interesting question. To get a very detailed, precise description that you can really make critical decisions based on, like how you should treat somebody in a healthcare setting, for example, you really better have somebody who understands the domain – so in this case medicine – very well, and understands what the language that you're going to encode that meaning in is capable of doing, and then further understands what the reasoning systems that are going to use that encoding can do with it. And that's a couple of skills that not everybody has put together. So the ontologies see great success when people really understand what they can do. And then they start to see some disillusionment when people understand how hard it is to get them really well done and very precise. So they've taken off, and they've kind of gone through the Gartner trough of disillusionment maybe a couple of times. And the reason, I believe, they're on the upswing again is because, as the world knows, machine learning has exploded, and the datasets are getting larger and more accessible. The machine learning community and the extraction community and the embedding community are starting to realize that if they get a little bit of semantics, they can start to tell their algorithms how to use the meaning and get even better results.

Noshir Contractor: So most people have heard of things like tagging on the web, and in a sense, tagging is a form of ontology, but it’s a crowdsourced form of ontology, so it doesn’t have some of the rigor that you’re talking about.

Deborah McGuinness: Yes, so you can see efforts like ConceptNet. You know, in the very early days, MIT just said, "put a bunch of sentences together." And so those sentences have words in them, but you didn't have the connections between them, and you didn't give people information about how to make them. But if you have even simple synonyms, like automobile, auto, and car, which we might call synonymous, then you can make that link. You can start with just small amounts of semantics from making relationships between synonymous terms. But then you can also start to make more sophisticated relationships, like wines might have a color associated with them, and they might be made from a particular type of grape. And then, over time, when we're trying to make more sophisticated recommendations, such as, say, health advisors, you might start to have information about when your blood work is out of range, say for a glucose measurement, which is related to diabetes. You might want to target an intervention with a drug that can help to get your blood work back into the right range.
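
[Editor's note: A small, hypothetical sketch of the idea above: even simple synonym links, stored as triples, add enough semantics to connect terms, and richer typed relationships (color, grape) can be layered onto the same structure. All terms and relation names here are illustrative.]

```python
# Illustrative triples: (subject, relation, object).
triples = [
    ("automobile", "synonym_of", "car"),
    ("auto", "synonym_of", "car"),
    ("Cabernet Sauvignon", "has_color", "red"),
    ("Cabernet Sauvignon", "made_from", "Cabernet Sauvignon grape"),
]

def synonyms(term: str) -> set[str]:
    """Collect terms linked to `term`, in either direction, by synonym_of."""
    out = {term}
    for s, p, o in triples:
        if p == "synonym_of":
            if s == term:
                out.add(o)
            if o == term:
                out.add(s)
    return out

print(synonyms("car"))  # {'car', 'automobile', 'auto'}
```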

Noshir Contractor: You have actually been working for a long time, specifically, applying semantic web concepts in the area of health. Tell us a little bit about where things started in that area and where you think there is potential for further advancement in terms of health web science.

Deborah McGuinness: That's really, I think, an up-and-coming area. One of my large projects right now is from the National Institute of Environmental Health Sciences, and it's to create a data portal. I'm in charge of the data science piece of that, where basically I need to come up with the ontology, or the terminology, that allows us to integrate data. And in this case, it's about exposure, like whether your mother might have been exposed to heavy metals at a time during your development where that was not good for you. So it captures information about exposure and health outcomes. That in itself, I think, is critically important, because we can collect data and we can integrate it in a way that lets you pool the data together and do studies on larger numbers of people, which might let you have more confidence in the outcome of any statistical correlation that you're seeing. You're only going to be able to do that integration and harmonization if you understand what terms mean. You typically get some data that comes with a data dictionary. So when I see an education code like ED1, that means that the mother went to junior high school as her highest level of education, which allows me to figure out whether I've got studies whose highest education level was college or beyond. And that lets me pull data together and look at more studies that might be compatible to put together. So that's kind of step one. But then the next step, the one that I think I'm even more excited about, is personalized health and, you know, precision medicine, where I can enable people to help themselves. I want to help everybody in a patient's ecosystem. So I want to help the person to make wiser choices when they're not going to their doctor. And I want to help the medical professional make suggestions that are aware of a person's individual status. So if I've got some blood work and one of those numbers is out of range, we can see whether there's some intervention that might be amenable to me, so that I might be able to make a small lifestyle change before I start to make a medication change. But I think the future is something like a personal health knowledge graph. So a graph has nodes and arcs.
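
[Editor's note: A minimal, hypothetical sketch of the data-dictionary harmonization step Deb describes above, where each study's local education codes (such as ED1) are mapped onto one shared scale so records can be pooled. The codes, dictionaries, and records are invented for illustration.]

```python
# Hypothetical data dictionaries shipped with two different studies.
STUDY_A_EDUCATION = {"ED1": "junior high school", "ED2": "high school", "ED3": "college or beyond"}
STUDY_B_EDUCATION = {"1": "high school", "2": "college or beyond"}

# Shared target vocabulary used for integration.
SHARED_SCALE = {"junior high school": 1, "high school": 2, "college or beyond": 3}

def harmonize(records, dictionary):
    """Translate a study's local education codes onto the shared scale."""
    return [
        {**r, "education_level": SHARED_SCALE[dictionary[r["education_code"]]]}
        for r in records
    ]

pooled = (
    harmonize([{"id": "A-01", "education_code": "ED1"}], STUDY_A_EDUCATION)
    + harmonize([{"id": "B-07", "education_code": "2"}], STUDY_B_EDUCATION)
)
print(pooled)  # both records now share one comparable education_level field
```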

Noshir Contractor: What would be an example of that?

Deborah McGuinness: So I might have a personal knowledge graph about Deborah. So Deborah’s a node. And then she’s probably got a lot of arcs coming out. One of the arcs might be demographic information. We might have information about my age. We might have information about my location. And so all of those are going to be arcs that are going to have values. But then you might also feed in the information from the monitor that I wear on my wrist that captures my motion, my steps, and actually also my sleep score. And you might have information from my smart scale, for what I weighed this morning. And then you might actually also be able to track that over time.
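
[Editor's note: A toy rendering of the personal health knowledge graph Deb sketches, with nodes and labeled arcs stored as simple triples. The arc names and values are illustrative assumptions, not a real data model.]

```python
# Each entry is ((node, arc_label): value); values can themselves be nodes or lists.
graph = {
    ("Deborah", "has_demographics"): "demographics_record",
    ("Deborah", "has_location"): "home_city",
    ("Deborah", "wears_device"): "wrist_monitor",
    ("Deborah", "owns"): "smart_scale",
    ("wrist_monitor", "reports"): ["steps", "sleep_score"],
    ("smart_scale", "reports"): ["weight"],
}

def arcs_from(node: str):
    """List the arcs (label, value) leaving a given node."""
    return [(label, value) for (subj, label), value in graph.items() if subj == node]

print(arcs_from("Deborah"))
print(arcs_from("wrist_monitor"))
```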

Noshir Contractor: Okay, so we’ve got the knowledge graph, I have an idea. How does this then translate into helping you leverage this personal health knowledge graph that you just described?

Deborah McGuinness: Yeah, so I want my personal health knowledge graph to appear to be locally available, you know, probably through my smartphone. And maybe I'm allowing it to give alerts. So I could let it give me alerts when I'm near a healthy venue, when it's close to the time that I might eat while I'm away from home. I'm not aware of anything that does that today, but there's probably some startup doing it somewhere.

Noshir Contractor: And so one of the ways it knows whether a place is healthy or not, is because they are using an ontology system, where they label themselves as “I am healthy.” Is that how this would work?

Deborah McGuinness: Another way of doing it is labeling the site with your menu items. A lot of sites these days have some kind of nutritional information about the things that they’re serving. So you could have a query that says, “Does this restaurant expose that it’s got items for sale that fit particular characteristics?” So let’s say under a certain number of grams of carbohydrates, maybe that have ethnic aspects, you know, maybe I want Indian food that meets those characteristics or something. So it’s not just that the restaurant says, “I put a label of ‘healthy’ on my restaurant,” but they expose information that lets the smart query ask the right kind of questions that are personalized to me.
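
[Editor's note: A hypothetical sketch of the kind of personalized query Deb describes, assuming restaurants expose structured menu data with nutritional attributes. The restaurants, dishes, cuisine labels, and thresholds are invented for illustration.]

```python
# Hypothetical exposed menu data, keyed by restaurant.
MENUS = {
    "Spice Route": [
        {"item": "tandoori chicken", "cuisine": "Indian", "carbs_g": 8},
        {"item": "naan", "cuisine": "Indian", "carbs_g": 45},
    ],
    "Corner Diner": [
        {"item": "pancakes", "cuisine": "American", "carbs_g": 60},
    ],
}

def find_items(cuisine: str, max_carbs_g: int):
    """Return (restaurant, item) pairs matching a cuisine and a carbohydrate cap."""
    return [
        (restaurant, dish["item"])
        for restaurant, dishes in MENUS.items()
        for dish in dishes
        if dish["cuisine"] == cuisine and dish["carbs_g"] <= max_carbs_g
    ]

# "Indian food under a certain number of grams of carbohydrates."
print(find_items("Indian", max_carbs_g=20))  # [('Spice Route', 'tandoori chicken')]
```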

Noshir Contractor: You mentioned a few minutes ago that you haven’t seen many of these applications out there. Why do you think we haven’t seen it? And why do you think now is the time that perhaps a startup is working in this area?

Deborah McGuinness: That’s a very good question. I think we’re poised to do that now, maybe better than, say, 10 years ago. There’s way more open data all over the world. I think it’s more common today that restaurants have this kind of information. And there’s also potentially more awareness. There’s more awareness that being overweight or metabolically unhealthy is a tremendous risk. It’s a tremendous risk for a lot of diseases. But it’s definitely a risk for COVID. 

Noshir Contractor: So you've given us a lot of food and drink for thought in talking about how web science is contributing to our health. You co-authored a book on this back in 2014. And as you look at that now, seven years later, where do you see us going in the next few years in terms of health web science?

Deborah McGuinness: You know, I think we might have been a little bit early on health web science then. I think there was less acceptance in the medical community. These days, medical professionals have too big a workload; they don't have enough time. I think we're starting to see a lot of apps or services that they're beginning to trust, because those apps or services are vetted and are showing, with an evidence base, that they're making good recommendations that the doctors can at least somewhat rely on. You don't want your app to be taking over what your doctor does for you all the time, but you want it to be helping. And then at the same time, I think more and more we're seeing the regular Joe looking for tools and applications that can help them lead a healthier, high-quality life. I don't want to rely on having to go to my medical doctor every week, because, you know, nobody can afford that, time- or money-wise. I want to be able to have tools that help me do that in my day-to-day life. And you're also seeing a push from technologists who realize that we've got a lot of foundational hardware, a lot of foundational data, and what appears to be unlimited compute power. So the time is kind of ripe for these applications to take hold.

Noshir Contractor: Which brings us back full circle to the idea of web science being so interdisciplinary, because this is a classic example, as you've described it, of people like yourself who come from a background in computer science and web science having to work very closely with, and gain the trust of, in this case, the professionals in the health community, as well as the laypeople, the general public. And until you have that sort of connection and trust amongst these various stakeholders, you're not going to see health web science reach critical mass.

Deborah McGuinness: Yes, that’s exactly right. 

Noshir Contractor: And I can imagine that many doctors might be initially threatened or suspicious of these technologies, because, for example, there is quite a lot of chatter these days about whether the notion of you going for an annual physical checkup is somewhat antiquated. Why go once a year when you have all these health monitoring devices that are monitoring a lot of your vital statistics 24/7?

Deborah McGuinness: Well, I don’t think any of these tools are going to replace the need for a medically trained professional, I think they’re just going to augment the professional. I don’t think there’s really any replacement for a truly caring, trained medical professional seeing you at least now and then, and certainly helping in a time of crisis.

Noshir Contractor: I’m sure you have reassured many physicians who might be listening in on this podcast. Again, Deb, thank you so much for taking time to give us a lot of insight about how much more we can do in the area of health web science than maybe a few years ago when we were fascinated by websites like WebMD and so on. There’s so much more that we could be doing, and you have certainly been one of the thought leaders and visionaries in this area. And thank you again for taking time to talk with us about some of where you see health web science going.

Deborah McGuinness: And thank you very much for your insightful questions. And I look forward to continuing this discussion on the web and off.

Noshir Contractor: Absolutely. Untangling the Web is a production of the Web Science Trust. This episode was edited by Susanna Kemp. I am Noshir Contractor. You can find out more about our conversation today – while enjoying a highly recommended wine – in the show notes. Thanks for listening.

Episode 26 Transcript

Sandra González-Bailón: Even though there are a lot of political organizations and a lot of politically motivated individuals who are trying to organize the next big thing to pursue their cause, it's very difficult to predict social dynamics. I don't think that's depressing. I think that's actually a reminder that nothing in social life is fully determined.

Noshir Contractor: Welcome to this episode of Untangling the Web, a podcast of the Web Science Trust. I am Noshir Contractor, and I'll be your host today. On this podcast we bring in thought leaders to explore how the web is shaping society, and how society in turn is shaping the web. 

My guest today is Sandra González-Bailón, who you just heard talking about the unpredictability of political mobilization and how we can study that on the web. Sandra is on the faculty at the Annenberg School for Communication and affiliated faculty at the Warren Center for Network and Data Sciences at the University of Pennsylvania. Her research lies at the intersection of network science, data mining, computational tools and political communication. Her articles have appeared in journals such as the Proceedings of the National Academy of Sciences, Nature, and Social Networks, among others. She is the author of Decoding the Social World, published by MIT Press in 2017, and was also the keynote speaker of the ACM Web Science Conference in 2019, in Boston. Welcome Sandra.

Sandra González-Bailón: Hi Nosh, thanks for the invitation to join! It’s a pleasure to be here.

Noshir Contractor: Thank you again for joining us today. Sandra, I'd like to believe that you are amongst the first of what I would call "bona fide" web scientists, who began your career looking at the social world through the lens of the web. And I want to begin by asking you to help dissect the title of the book, Decoding the Social World, and the subtitle, Data Science and the Unintended Consequences of Communication.

Sandra González-Bailón: This idea of unintended consequences takes root in the fact that often in network systems, there's no one who's really in charge of the dynamics that take place in those networks. And so, you know, you might want to create a message that would go viral, but it's really not up to you to make that message go viral. That depends on what other nodes in the network, other people in the network, will do. We can try to reverse engineer and unpack how those dynamics emerge and take place, and get a better sense as to how they happen, when no one really is designing those dynamics or in charge of determining how those dynamics emerge.

Noshir Contractor: Could you give an example from your own research where something that happened was not intended or not anticipated? Some collective behavior, for instance?

Sandra González-Bailón: A lot of my applied research in which I use social media data analyzes episodes of collective effervescence, right? Moments where the whole is more than the sum of the parts, when suddenly you have a critical mass of people who are communicating about a particular issue or a particular topic or organizing around a particular movement. And of course, we hear a lot about the episodes that are successful, the moments where those processes of collective effervescence result in massive mobilization on the streets, in massive protests. And what we often forget is that for every successful instance of massive mobilization, there are many examples of unsuccessful attempts at mobilizing that critical mass. And so even though on one level, those episodes of political mobilization are intended — of course, there's a cause to fight for — what is unintended is the level of success. You don't always have control over how many people will be retweeting your hashtag, or how many people you will convince to mobilize and take to the streets. That's how I understand unintended consequences. And it's really a shorthand to refer to the lack of control that we often have over collective dynamics.

Noshir Contractor: Is the implication then, Sandra, that we can only explain some of these major events in retrospect? That we are incapable of engineering events such as these? That might sound a little depressing.

Sandra González-Bailón: It depends on how you think about it. Because to me, it's the opposite of depressing — if we could anticipate and predict, it would mean that we live in a deterministic world, right? For me, unintended consequences offer really a window through which freedom and agency can squeeze in. 

Even though there are a lot of political organizations and a lot of politically motivated individuals who are trying to organize the next big thing to pursue their cause, it's very difficult to predict social dynamics. I don't think that's depressing. I think that's actually a reminder that we are not determined, and that nothing in social life is fully determined. And I think at the same time, there's also value in trying to understand how these things happen.

Noshir Contractor: Let's talk a little bit more about some other examples from your book, because you also look back at historical developments in technologies. And I think that's really important, because we tend to be caught up in the moment and make it sound like what we are seeing today is truly different from what has happened in the past. In many ways it is, but what have your historical insights about technology taught us about how to better prepare to understand the web today?

Sandra González-Bailón: I start the book with a preamble where I explain that the book really is a story of recurrence and change, right? The recurrent aspect is that for some reason, we keep on using the same metaphors to refer to these technological breakthroughs, right? In the 1800s and 1900s, we talked about the telegraph as the global nervous system of the planet, which is exactly how we talk about the internet these days. And so some things don't seem to change too much, right? Human imagination seems to keep traveling the same old clichés when it comes to coming up with metaphors. But what has changed, and definitely has changed a lot, is how we use the data that we generate through the use of those communication technologies to try to understand their impact on society better. 

The fact that people suddenly could communicate across continents via the telegraph definitely shrank the world; it reduced social distance immensely. But they couldn't get the sort of data we can get today. There's a lot of progress when you look back, you know, in the sort of questions that we can answer now with that data. We have many more answers than they could aspire to have back then. The metaphors haven't changed, but the answers to the same questions have improved immensely.

Noshir Contractor: One of your recent articles published in PNAS in 2020, focuses on how exposure to news grows less fragmented due to mobile access. What is it that you found in this article that surprised you?

Sandra González-Bailón: The findings in that article are counterintuitive in the sense that the prevalent view around how people get exposure to news suggests that technologies are entrapping classes of like-minded people. In the context of news consumption, this means that you would only consume those sources that will reinforce your opinions and your kind of predispositions. And what we find in the article goes against those claims, in the sense that we find that rather than narrowing down your news, digital technologies, and in particular mobile technologies, are sort of widening the range of news sources that you consume. 

Digital trace data is very rich, but there are very different ways in which we can collect that data. And so one of the things that we do in that paper is to incorporate into our analysis mobile access to news, which changes the sort of conclusions that you can draw from the analysis compared to what you would find if you only tracked people through their desktop computers.

Noshir Contractor: One of the things that was interesting about this study was the kind of data that you used. Can you talk a little bit about the data, which covered a five-year time window involving tens of thousands of panelists? How does that compare with pure digital trace data?

Sandra González-Bailón: Yeah, this is a data set that's compiled by a media measurement company. And it does rely on log data, right. So they do have mechanisms to track the behavior of their panelists when they are online. And what's a little bit novel is the fact that they also track what people do on their mobile devices. We all know, intuitively and through our own personal experience, that everything has gone more mobile now, right? And so from a scientific or measurement point of view, if we only track what happens on our computer, we are missing a huge part of online activities.
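
[Editor's note: A minimal, hypothetical sketch of the measurement point Sandra makes: the number of distinct news sources a panelist is exposed to looks very different with desktop-only logs than with desktop plus mobile logs. The panelist and domains are invented; this is not the study's actual dataset or method.]

```python
# Invented log data for one panelist: domains visited on each device type.
desktop_visits = {"panelist_1": ["nytimes.com", "foxnews.com"]}
mobile_visits  = {"panelist_1": ["bbc.com", "nytimes.com", "lemonde.fr"]}

def distinct_sources(panelist: str, include_mobile: bool) -> int:
    """Count unique news domains seen, with or without the mobile logs."""
    sources = set(desktop_visits.get(panelist, []))
    if include_mobile:
        sources |= set(mobile_visits.get(panelist, []))
    return len(sources)

print(distinct_sources("panelist_1", include_mobile=False))  # 2
print(distinct_sources("panelist_1", include_mobile=True))   # 4
```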

The other part is that, of course, the interface matters. The medium that you use as a content provider to deliver that content matters as well. Someone wrote a few years ago that the web was dead because of the rise of the app. So apps are like walled gardens, right? They are not this massive open space that the web is. The reality also is that online activities are more fragmented. One of the things that we also point at in the conclusion of the paper that you mentioned is that we are also at a crossroads here, where we have to decide how we're going to collect data moving forward, how we're going to guarantee access to those walled gardens. Otherwise, even though we have more data than ever before, if only a handful of people can get access to that data, then there's really not much difference, right? 

Noshir Contractor: So there’s a paradox between the fact that there is a lot of data being generated, but a lot of that data is then not available for access to researchers like yourself to help reach these conclusions. I’m still puzzled by what seems to be a counterintuitive conclusion. 

You're saying in the study that as people access news and other items through mobile devices, we grow less fragmented, yet society seems, at least in terms of popular media coverage, to be increasing in its fragmentation. How do you reconcile the findings of your research with what we are told in the public about us getting more and more fragmented? Does the public media have it wrong?

Sandra González-Bailón: Well, the public media often has it wrong, but in this particular case, it's a matter of what the level of analysis is. What the paper does is analyze exposure to news and political content, but we're looking at the information sources that people get exposed to. We don't really talk about the processing of that information. What are you going to do with that information? Because sometimes, you know, you may read Fox News; that doesn't mean that you are agreeing with Fox News. Exposure to content and information is one of those things that was very difficult to measure in the past. Now we have more fine-grained, granular data to measure exposure, but it is just that: exposure to content. 

Of course, the second part of the equation refers to the effects of that exposure, right? And so I think sometimes in the public discussions, we conflate a lot of things, right? We may be living in a very polarized society, but don't blame digital technologies. What we see when we look at how people use those technologies to gain access to information is that their media diets are actually pretty rich and diverse. Now, of course, you know, that's not the only reason why we might have fragmentation.

One of the beautiful but also frustrating things about research is that it is very specific to very specific questions, and then you can extrapolate only so much from that research, right? Reality is very complex; there are many moving parts. The conclusions that we draw from our analysis refer only to exposure to political content. It is not true that people get exposed only to a very narrow set of sources that they might agree with already, right? The number of sources you get exposed to widens over time, especially due to mobile technologies. Now, what the consequences of that are is a second question. And there might be another paper following up on that first paper where we consider that question.

Noshir Contractor: So to summarize, what I’m hearing you say is that we can’t blame the web, for not giving us wider exposure. What we do with the wider exposure is different in terms of polarization and creating echo chambers.

Sandra González-Bailón: Yes. And of course, again, the web is one network, one layer in a very complex media environment we inhabit these days. What happens within Facebook is a different world. Again, Facebook is a walled garden; maybe in Facebook we have this phenomenon of us getting trapped into echo chambers and ideological bubbles. Twitter is another layer, right? What happens on YouTube? These are kind of pockets outside of the web, very prominent pockets of activity. And maybe there, the answers would be different, right? What we analyze in that particular paper refers only to web activity, to what happens in this vast, public, open marketplace of ideas called the web, right? Apps are a different world.

Noshir Contractor: One of the things that I also want to talk about is an even more recent publication of yours, again in the Proceedings of the National Academy of Sciences, focusing on the role of bots versus verified accounts in dealing with contentious political events. Can you tell us a little bit about what you found in that study?

Sandra González-Bailón: Yeah, so that study was also motivated by journalistic accounts of how much influence bots, these automated accounts that are engineered to meddle with organic communication in social media, actually have. Of course, the very first question you have to answer is, how are you going to identify bots, right? And so we capitalize on developments in automated classifiers that use a number of features to predict whether an account is a bot or not. And then we also look at the verified feature that Twitter itself uses to identify accounts of public interest. 

And so we come up with three categories of Twitter accounts. We have what we call the media accounts: accounts that our classifier suggests are automated, but that have also been verified by the platform. Legitimate news sources oftentimes use bots to push notifications onto your feed in a systematic fashion, right? And then we have the bots, which are accounts that our classifier suggests are automated, meaning non-human, but that are also not verified by the platform. And then we have the rest, which are what we call the human accounts. 
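
[Editor's note: A minimal sketch of the three-way split Sandra describes, assuming each account already comes with a bot-classifier decision and the platform's verified flag. The handles are invented, and this is an illustration of the categorization logic, not the paper's actual pipeline.]

```python
def categorize(is_automated: bool, is_verified: bool) -> str:
    """Map classifier output plus the verified flag onto the three categories."""
    if is_automated and is_verified:
        return "media account"   # automated but platform-verified, e.g. a news outlet's feed
    if is_automated and not is_verified:
        return "bot"             # automated and unverified
    return "human"               # everything else

accounts = [
    {"handle": "@example_news",  "is_automated": True,  "is_verified": True},
    {"handle": "@suspicious123", "is_automated": True,  "is_verified": False},
    {"handle": "@regular_user",  "is_automated": False, "is_verified": False},
]
for a in accounts:
    print(a["handle"], "->", categorize(a["is_automated"], a["is_verified"]))
```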

One finding, which is consistent with what prior research suggests, is that a huge volume of the activity around the events that we analyze comes out of these automated accounts. But we also find that the verified media accounts are more central in the networks of information flow, meaning they were the reference points during these events. So people going on Twitter to find information about the protests that we analyze tend to pay more attention to these verified accounts and amplify those verified accounts more often.

Noshir Contractor: So the two movements that you studied were the Yellow Vest movement in France in 2018 and the act of civil disobedience in the Catalan referendum in 2017. Tell us a little bit about why you chose these two events, and give examples of where bots were more or less central than verified accounts in these two movements that you studied.

Sandra González-Bailón: The analysis that we ran really pays attention to the overall patterns, the aggregated patterns, right? And this is partly to minimize cherry-picking episodes where the one account that turned out not to be a verified account got so many retweets. So we really look at the overall patterns of visibility. 

And the reason why we decided to focus on these two events was partly because they got a lot of press coverage suggesting that these manipulated accounts were exacerbating conflict. And when you have episodes where people are protesting on the street, it's highly volatile. And so if bots had that kind of influence, that would be something that we should know. It's dangerous, right? They could really turn things for the worse. 

The other reason, and that was more in terms of research, is that a lot of the work that is done in this area focuses attention only on the US or the Anglo-Saxon world. That means the kind of knowledge that you can gain about how generalizable these dynamics are gets restricted, but so do the methods and the tools that we have at our disposal. One of the things we do in the analysis is look at the sentiment, or kind of the emotional content, of these tweets. And many of these tools are designed for English only. So what we did in the paper is adapt one of these tools that allows us to extract sentiment. We adapted it to Spanish, Catalan, and French. And I think there is value in doing that.
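
[Editor's note: A hypothetical sketch of lexicon-based sentiment scoring with a word list adapted to another language, to illustrate the kind of adaptation Sandra mentions. The lexicon, weights, and scoring function are invented for illustration and are not the tool used in the paper.]

```python
# Invented French lexicon: word -> sentiment weight in [-1, 1].
LEXICON_FR = {"magnifique": 1.0, "solidarité": 0.6, "violence": -0.8, "chaos": -0.9}

def sentiment(text: str, lexicon: dict) -> float:
    """Average the lexicon scores of the words in a tweet that appear in the lexicon."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    scores = [lexicon[w] for w in words if w in lexicon]
    return sum(scores) / len(scores) if scores else 0.0

print(sentiment("Solidarité magnifique malgré le chaos", LEXICON_FR))  # ~0.23
```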

Noshir Contractor: So one of the things that I'm noticing, more broadly across these two papers that we've been discussing, is that in both of these cases your findings suggest that the general sentiment of blaming the web is not always well-founded. Is that a sentiment that you see broadly, or does it just happen to be a coincidence based on the two papers that we've been discussing here? And what are the implications of your sense of the role of technology in these two cases?

Sandra González-Bailón: You echo the right spirit of those papers. I think there are a lot of things we could blame social media companies or technology companies for, but we have to blame them for the right things, not for what we think they're doing wrong. I'm a hardcore believer in evidence-based decision-making. So let's pay attention to the evidence, so that we can think about how to redesign these platforms to make them do what we as a society want them to do. That discussion has to be based on the best available evidence, and technology companies need to be held accountable, because they do have an impact on the democratic process. But those discussions should be based on the best available evidence and not on headlines in newspapers.

Noshir Contractor: So I'm going to put you on the spot, Sandra, given that you are a scientist in this area and you've pointed out that you would like to be able to influence change in media platforms. Do you have a pet suggestion, based on your own research, that you would like to see media companies act on?

Sandra González-Bailón: Yeah, I think they do have a role to play in improving the quality of the information environment that we inhabit. They are not the only ones, right? Some of the old players have a responsibility too, and they don't always do what they should be doing. And I do think that these companies have changed some important parameters. Algorithms, for example, the way they reinforce certain patterns or the way they float certain information: that's a new parameter. And we don't fully understand how those algorithms are shaping everything, including the democratic process. This is not to say that we need to get rid of algorithms, but we have to understand the impact that those algorithms have. 

These companies often operate on the basis of business models that prioritize certain parameters, that get optimized in these algorithms. And those parameters are not necessarily the quality of the online conversations or the quality of our democracy. I do believe that these platforms are not doing their best in facilitating or encouraging the sort of healthy conversations that healthy democracies require. And we can do better in that respect.

Noshir Contractor: And so the algorithms are optimizing for things, but not necessarily the things that are good for society; they might be optimizing for things that serve the interests of the platforms and their business model, as opposed to democracy, for example.

Sandra González-Bailón: Absolutely. And again, if we were to get together and start having a conversation about what's good for democracy, I'm sure we wouldn't agree, right? I think it's also unfair that sometimes we place demands on these companies that we as a society don't have answers for. I do believe that we can settle on a common ground that has enough agreement to facilitate that kind of answer. And then once we have that answer, we can hopefully work together to make sure that we optimize for those parameters.

Noshir Contractor: I want to zoom out a little bit, Sandra, and ask you: where do you see the most important research that needs to be done in terms of the web moving forward, either by yourself or by the web science community? What are the big questions that you see we need to be addressing?

Sandra González-Bailón: So one question relates to the impact of these technologies on democracy in general. And I think there's been a lot of emphasis on data sharing, sort of forcing many of these companies to offer data. That's important, but I think more important than having access to the data is defining the questions. It's very difficult to come up with a data dump that gets at every question that any researcher may have. 

One of the big challenges for us, and one of the priorities we should all be focusing on, is this: what are the main questions that these companies, but also the research community, should be working to answer? And what are the kinds of data infrastructure and research infrastructure that we need to be able to answer those questions? Because this requires collaboration and creating bridges across labs, and teamwork. And I'm not sure that academic institutions are designed to encourage that kind of teamwork. I think that's one of the big challenges we face as well. We have to facilitate that kind of collaboration. And I think that's where we should be putting all our energies.

Noshir Contractor: I want to thank you so much, Sandra, for joining us today. You've persuaded me that the lack of determinism in our models is a good sign for society. And you've also talked us through two examples in your own recent research where people tend to blame technology for certain kinds of phenomena, whether it's contentious political events and bots or whether it's increasing or decreasing polarization. And your research has convinced us that technology is one part of the puzzle, but that blaming it is not in and of itself valid, at least in the case of these two studies that you've done. So thank you again so much for joining us, Sandra. And I look forward to seeing your continued research in these areas.

Sandra González-Bailón: Thank you, Nosh, and I look forward to meeting you in-person again soon.