Episode 4 Transcript

 

Jen Golbeck: You know, I kind of jokingly said to someone at some point that I want to be the world’s expert on dogs on the internet. And I might be at this point, or at least up there with kind of pets on social networks. 

Noshir Contractor: That was Jen Golbeck. Jen is not only an expert on internet pets, but a leading voice in web science. She’s a professor in the College of Information Studies at the University of Maryland at College Park. You may know her from her TEDx talks or podcasts about web science and pets. She has been a research fellow of the Web Science Research Initiative, and gave a keynote address at the 2017 ACM Web Science Conference.

Jen is also known for her work on computational social network analysis. Her models for computing trust between people in social networks were amongst the first in the field. And Jen’s also received a lot of attention for her work on computing personality traits and political preferences of individuals based upon their activity on online social networks.

Welcome, Jen.

Jen Golbeck: Thanks. I’m glad to be here.

Noshir Contractor: Jen, let’s start by learning about how you got interested in studying what you do now.

Jen Golbeck: I’m really lucky that the time of what was going on in technology and the time of my life intersected in a kind of fortuitous way. So the web came about, I think I was probably in middle school. And then when I was in high school, the early to mid 90s, I started designing web pages professionally, which you could do as a 15-year-old at that point. I did that throughout undergrad, and you know, through my master’s degree. Sometimes it was my entire income, sometimes it was a side income. But I was also on the path to get a PhD the whole time. And so when I came to the University of Maryland to get my PhD in 2001 in computer science, I met with Jim Hendler, who was my advisor. And I had actually started as an economics major at the University of Chicago, changed to computer science. But Chicago has the guys who did Freakonomics, you know, this behavioral stuff that wasn’t just, you know, markets and finance. And I loved that. 

And so I was like, Jim, how can I take this sort of stuff about how people behave, and things emerge out of that, and then cross it with the web, which is something that I’ve just been immersed in, you know, since I was kind of a thinking pseudo-adult? I said, “Can we maybe do like a social network and put that on the web,” and he was like, I mean, “That sounds interesting. Go ahead and try it and see what happens.”

That was 2001. So pre-Facebook, Myspace was just kind of getting started. And I was like, alright, I’m gonna study social networks on the web. I built some, you know, I studied some of the early ones that were out there. And so it got me into doing research, right as the entire universe of the web shifted into this place where humans were creating tons of content, people were spending a lot of time. And so it was just kind of natural, then, to flow into web science, you know, working in a lab that was looking at knowledge representation and putting information online, and then pulling my own interest of people online and what they’re doing and how to merge that with AI.

Noshir Contractor: It looks like you had the right time, and the right people to be working with, in addition to having the right skills for doing all of that stuff. One of the things that I remember reading earlier on about your work was this work in the area of trust-based recommender systems, and you developed a platform called FilmTrust. Can you tell us a little bit about what you learned from that experience? And what do you think of the future of those kinds of recommender systems?

Jen Golbeck: That’s my dissertation work that you’re talking about. So I basically built my own social network because there was no data like this in existing social networks. And in there, you could go in and like, rate your favorite movies. Like you do now with Netflix or Amazon, whatever. And then you also could add friends like on any social network. But I added this system where you could rate how much do you trust this person to recommend a good movie to you, basically. 

And the question was, we had recommender systems at the time, like we have now with Amazon and Netflix and say, here are some movies that you might want to watch. Those generally worked by finding people with similar tastes to you and suggesting stuff they like, essentially. And so I was interested: Could we use trust that people express about their actual friends, and do a bunch of interesting AI with that and use that in place of similarity? So if I say I trust you about movies, even if it looks like we’re statistically different, can I maybe get some good information? And it turned out from that, that it does work. And it works really well in cases where I’m just very different than everyone else.

So an example I used to give all the time was, I’m a real film buff. I used to be a projectionist in a theater. I hated A Clockwork Orange, which is like a classic piece of cinema. I wish I had those hours back in my life. No one who is a film buff hates that movie, but I hated everything about it. And so recommender systems would see, okay, well, she loves all this classic cinema, of course, she’s gonna like that movie. And I’m like, any system that tells me to watch A Clockwork Orange is not one I want to use. Like, it doesn’t understand me. And trust is great at capturing those really extreme preferences on either end. 

And so it was this really interesting lesson that our social relationships and our understanding of how we relate to people have a power that statistics alone don’t capture. But we can put those things together with AI and some statistical analysis and all this data on the web, to kind of get the best of both worlds. And that’s now something that in all these personalization algorithms, like you see on Facebook, sorting your timeline, like you see in, you know, a lot of recommender systems, they’re incorporating those elements of social relationships. That was one of the things that I first investigated in that dissertation work.
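The idea of using stated trust in place of statistical similarity can be sketched in a few lines. This is only an illustration under stated assumptions, not the FilmTrust algorithm itself, which the conversation does not spell out: the function name, the 1-to-10 trust scale, the 1-to-5 rating scale, and the two friends are all hypothetical.

```python
from typing import Dict


def trust_weighted_rating(trust: Dict[str, float], ratings: Dict[str, float]) -> float:
    """Predict a rating for one movie by averaging friends' ratings,
    weighting each friend by how much the user says they trust them."""
    # Keep only friends who are trusted (weight > 0) and who rated the movie.
    usable = {friend: t for friend, t in trust.items() if t > 0 and friend in ratings}
    if not usable:
        raise ValueError("no trusted friend has rated this movie")
    total_trust = sum(usable.values())
    return sum(t * ratings[friend] for friend, t in usable.items()) / total_trust


# Hypothetical data: the user trusts "ana" a lot and "ben" very little.
trust = {"ana": 9.0, "ben": 1.0}
# Ratings for a single movie on an assumed 1-to-5 scale.
ratings = {"ana": 1.0, "ben": 5.0}

predicted = trust_weighted_rating(trust, ratings)
plain_average = sum(ratings.values()) / len(ratings)
print(predicted, plain_average)
```

With these made-up numbers the trusted friend's strong dislike dominates the prediction, while the unweighted average lands in the middle, which is the kind of extreme preference the answer above says trust captures.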

Noshir Contractor: And that was very influential at the time. I remember looking at it, and people were beginning to understand whether trust-based recommender systems may be different, or augment purely algorithmic-based recommender systems. 

Netflix, for the most part, is making its recommendations based only on its internal algorithms, certainly improved by the Netflix challenge. Is there a difference in the kinds of recommendations? Do you see some day when algorithmic recommendations like the ones that Netflix is doing will just get so good that you don’t need to rely on trust-based recommender systems or social-network-based recommender systems? And at the end of the day, I guess, do you trust your ability to report accurately who you trust?

Jen Golbeck: That’s a really good question. So I think it depends on the domain. You know, Netflix, I don’t think they really need to use a lot of social network data. Because for movies, you can get a lot about people’s preferences with the genre, the actors, like all this really detailed information we have. The same thing goes for like music recommenders, and all these kind of streaming music services that will make a channel for you. You don’t need a lot of social data for that; it may help in little instances.

But there are a lot of cases like, what do you want to look at on your social network feed, that are much more social, and not just news stories, right? But like, whose friends’ kids’ updates do you want to see? You know, if it’s the person that you’ve been friends with since elementary school, you may totally want to see that. If it’s, you know, some guy you met at a professional conference that you happen to friend on Facebook, you may not care at all about that. And your social network and your friends’ preferences can shape those sorts of personalizations in a way that I don’t think we’ll ever really capture with a purely statistical algorithmic recommender system. So I think depending on the context, the more social that context is, the more important it is to have social input to it.

Noshir Contractor: It also seems that in some situations, if you really trust someone, and they tell you something different than what you’ve seen previously, you might be more open to looking at it and considering it. While if a computer is basing its algorithm exactly on what you like, it may be less likely to provide you enough variety. One of the things that recommender systems have been criticized for is that they make you live more and more in an echo chamber. While in a social network, somebody that you trust might actually tell you something a little different that you may or may not like. If you don’t like it, you may not trust that person anymore, but you might like it as well.

Jen Golbeck: Yeah, it’s, it’s an interesting combination. I’ve had PhD students, I have one just graduated a year ago who was in journalism, looking at, you know, how do we figure out the news that people trust? You know, are they more likely to believe conspiracy theories or fake news or kind of legitimate, real journalistic standard kind of news? And how does that relate to the people who are sharing it with them? And I think that’s really important. 

If I have a really trustworthy source, who’s coming to me with something, that’s not what I might normally believe, it may make me more likely to consider that and understand that information. And so that’s one of those interesting ways where you could merge something like, here’s Facebook or Twitter with a really good model of what I’m going to like or click on or comment on, very good at that. Here’s stuff that’s gonna keep me engaged. But let’s broaden that out with a more diverse perspective. 

And something that they can see is different than what I might normally click, but coming from someone I trust, that’s a way to sort of say, okay, like, let’s expand to this viewpoint, and maybe look at optimizing things like the social good, or how informed someone is or how much they’ve considered a breadth of perspectives. It’s something that we need to get more into in the research, but I think social connections will be really critical if we start expanding recommendations like that. 

Noshir Contractor: And you continued to work beyond predicting which movies you might like, and you spent a lot of time looking at data from the web to understand individuals’ activities, attitudes, and behaviors. One in particular was your work on trying to predict the extent to which a person might be able to stay within the Alcoholics Anonymous program or not. Can you tell us a little bit more about what you found there?

Jen Golbeck: Yeah, so with that project, we originally had started wanting to look at DUI recidivism: someone gets a DUI, how likely are they to get another DUI? And we really dug for data on social media about that, but not surprisingly, people aren’t posting a lot about their DUIs. But what we did find in the process of looking for that is a lot of people talking about their problems with alcohol, going to Alcoholics Anonymous, if they were drinking. And so we did this study where we basically looked for everyone who announced on Twitter that they were going to their first AA meeting. And then we followed what they tweeted after that, you know, after filtering it out for jokes, or whatever, to keep people who legitimately had drinking problems. And after they said that, we looked at: Did they stay sober for 90 days? Or did they go back to drinking? And we made sure they said so. It could be, you know, two weeks later, they complain they were hungover at work, so we knew they were drinking; it could be six months later, they were celebrating their six months of sobriety, so we knew they’d made it those 90 days.

And then we just took all the data that we could model from their Twitter feeds, to try to see if we could predict that, you know, on the day, you announced you’re going to AA, can we predict if you’ll be sober? And so we looked at things like: Who are the people that you follow on Twitter? And how much do they talk about booze? How much do they use words about alcohol? Are you over 21 or under 21? How do you cope with stress, kind of using other AI as input. And these are all things that addiction researchers might consider. And so we use that as an input to our model. And we can predict with astonishingly high accuracy, 80% accuracy, if someone is going to stay sober or not, on the day they decide to go into treatment.

Is it good or bad, I really struggle with it. We have not made that tool available to the public, because I can see a lot of dangerous ways for it to be used. But it is also explainable. So if you say I’m going to go to AA and my algorithm says I don’t think it’s going to work, it can tell you, this kind of therapy might be helpful or changing up your social circle might be helpful, which I think could be really useful. And it’s one of those things where I am impressed as a scientist with the computational power of what we can predict from this web data. I am also very concerned as someone who plays in the social science space about the implications of that algorithm. There are good and there are bad and I think we just need a lot more work in the kind of policy space, the regulation space before a tool like that is brought out to the world.
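To make the idea of an explainable prediction concrete, here is a minimal sketch, assuming a simple logistic model as a stand-in. The conversation does not say what kind of model the study used, and every feature name, weight, and input value below is invented for illustration; none of it comes from the study.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical weights, invented for this sketch (negative = pushes toward relapse).
WEIGHTS: Dict[str, float] = {
    "share_of_followed_accounts_mentioning_alcohol": -2.0,
    "own_alcohol_word_rate": -1.5,
    "is_under_21": -0.5,
    "uses_healthy_coping_language": 1.0,
}
BIAS = 1.0  # also hypothetical


def predict_sober(features: Dict[str, float]) -> Tuple[float, List[Tuple[str, float]]]:
    """Return an estimated probability of staying sober, plus each feature's
    contribution to the score, sorted from most harmful to most helpful."""
    contributions = {name: w * features.get(name, 0.0) for name, w in WEIGHTS.items()}
    score = BIAS + sum(contributions.values())
    probability = 1.0 / (1.0 + math.exp(-score))  # logistic function
    ranked = sorted(contributions.items(), key=lambda item: item[1])
    return probability, ranked


# Hypothetical person: half of the accounts they follow talk about alcohol.
person = {"share_of_followed_accounts_mentioning_alcohol": 0.5}
probability, ranked = predict_sober(person)
print(round(probability, 2), ranked[0])
```

Because each feature's contribution is visible, the most negative entry points at what might be changed, for example the social circle, which is the sense in which the answer above calls the tool explainable.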

Noshir Contractor: And this is exactly why web science is trying to navigate this balance between what can be accomplished technologically and what should be accomplished, or how should it be accomplished, from a social standpoint or a policy standpoint. And your work is just a really excellent illustration of how one tries to navigate through that dilemma. One of the things that this work shows is that if it goes in the wrong hands, for example, I can imagine somebody who is being pulled over by a cop, for example, right? And then the cop could potentially be using this algorithm in different ways to determine what kind of response in that particular situation. 

Jen Golbeck: One thing that we’ve seen with AI is that it’s used in sentencing guidelines now. And it’s used in ways that we know are already profoundly unfair. But you can imagine this algorithm being included in the decision about whether to send someone to jail for a DUI or to send them into treatment, you know, if the algorithm says treatment will work, they can go to treatment, if not, they can go to jail. But the algorithm is wrong 20% of the time. It’s also pessimistic, so when it’s wrong, it tends to say you won’t recover when you will. Who knows what other biases are in there? We can’t really tell, but there certainly are some. So yeah, it’s really worrying to think about people who may very well mean well and want to make the right decision using this technology. Because AI has this veneer of objectivity, right, it’s math, it’s totally objective, it can’t be racist, or sexist or biased. But of course, it totally is, it just reflects human society. And people who don’t understand that and the errors and the pitfalls, may try to use it in ways that just echo all the problems that we already have, that we’re kind of trying to fix with technology. And that, you know, is worrying not just in this case, but in all these applications of AI and web data together that get out into the world.
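The difference between being wrong 20% of the time and being pessimistic when wrong can be shown with a little arithmetic. The counts below are hypothetical, chosen only so that overall accuracy comes out at 80%; they are not results from the study.

```python
# Hypothetical outcomes for 100 people (illustrative only, not study data).
predicted_sober_and_stayed_sober = 30
predicted_relapse_and_relapsed = 50
predicted_relapse_but_stayed_sober = 15  # pessimistic error
predicted_sober_but_relapsed = 5         # optimistic error

total = (
    predicted_sober_and_stayed_sober
    + predicted_relapse_and_relapsed
    + predicted_relapse_but_stayed_sober
    + predicted_sober_but_relapsed
)
errors = predicted_relapse_but_stayed_sober + predicted_sober_but_relapsed

# Accuracy counts every correct call, regardless of direction.
accuracy = (predicted_sober_and_stayed_sober + predicted_relapse_and_relapsed) / total
# The share of mistakes that wrongly predict relapse measures the pessimism.
pessimistic_share_of_errors = predicted_relapse_but_stayed_sober / errors

print(accuracy, pessimistic_share_of_errors)
```

Two models with the same accuracy can split their errors very differently, which matters if a "won't recover" prediction is what would send someone to jail instead of treatment.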

Noshir Contractor: Well, one of the things that you highlighted in your own work, is that when you have these predictions, if it goes into the wrong hands, for all the reasons you’ve been describing, could be something that could have unintended negative consequences. Do you believe that these predictions should always be given to the person involved? 

Jen Golbeck: I mean, generally, I think they should always be made available. Right? If I ask, I should be told 100% of the time exactly what I’ve asked for, I think I have a right to know that. I may not have that legal right right now, especially in the US, but it’s something that I am working hard for us to get, I think it’s important. Will everyone want to know? You know, not necessarily. A lot of this is benign: personality traits, what are your political preferences, stuff we already know about ourselves. But take this alcoholism example as one: you may not want to know when you go into AA if an algorithm says it’s going to work. You know, if the algorithm says AA won’t work for you, when you’re going and you really want to solve your problem, it may be discouraging to the point that you are so fragile in that recovery that you decide not to continue. So you may say I don’t want to know what the algorithm says, because it may tell me something that won’t help. So I think they should have a right to know if they want to, but it doesn’t necessarily need to be automatically shared.

Noshir Contractor: I want to move us from social networking to social petworking. And I wanted you to tell us a little bit about the research that you’ve done on looking at social network sites for dogs versus cats. And why, turns out, that they do different things on these websites.

Jen Golbeck: I kind of jokingly said to someone at some point that I want to be the world’s expert on dogs on the internet, and I might be at this point, or at least up there with kind of pets on social networks. So I’ve always been fascinated by how people put their pets on social networks, as I’ve followed the development of these. And the work you’re talking about is some work that was looking at some of these early pet social networks, Dogster and Catster (there is a Hamsterster), where you could create a profile for your pet, make them friends with other pets just like you would do on Facebook or any other network.

And what we saw when we studied this is that people use them quite differently. Cat people tended to participate in these kind of community forum discussions, they would do these role playing kind of games and exercises from their cats’ perspective. So there would be these like, cat weddings were very common. People would pick their cats to get married, everyone would come like they’d send invitations, they talk through the reception and everything at a particular time. Dog people tended not to do that kind of thing. Now, this was kind of early days of social networking, you know, mid 2000s. And we’ve seen that kind of behavior shift on to things like Twitter and Instagram now, where I have very popular social media pages for my dogs. I don’t post in their voice, but some people do that. 

It’s a really interesting way, where you can see like cat videos were the thing on social media for a long time. Dogs maybe have eclipsed that a little bit recently. But people interact in really different ways through that. One of the most popular cat social media accounts is, I think, “black metal cats.” And it’s like, death metal lyrics with pictures of cats. And they tend to kind of embrace that, that stereotype of cats, where dogs’ social media tends to be very wholesome and encouraging and supportive. It’s interesting that, you know, all the research bears out that dog and cat people tend to have different ways of approaching life. As that moves on to the internet, we’ve seen that consistently, that they tend to behave in different ways, which I think is kind of fun and wholesome, interesting kind of research in the space.

Noshir Contractor: One of the things that I found really interesting is your explanation as to why cat people are more likely to organize virtual playdates on the web, as compared to dog people. And you mentioned that one of the reasons that might be the case is that dog people take their dogs on walks. And that might make them more sociable than most cat people, who don’t take their cats on walks.

Jen Golbeck: I am doing some research on this topic right now, sort of the benefits of having a dog. And a lot of the research shows that a benefit of having a dog, not just online but offline, is that it absolutely makes you more social, because you’re out there walking the dog. Even if you don’t go to a dog park, you tend to encounter other dog people, you can talk about your dog. If you’re a dog person, you know, if there’s people you see regularly on your walks, you may not even know the humans’ names, but you know the dogs’ names, you recognize those people. So you get to have these social interactions around your dogs that you really don’t get to have as much as a cat person, because you’re not out in the world encountering it. So virtual spaces provide an opportunity to socialize around those pets. And I think that’s, that’s one of the like, really good things that we’ve seen in general on the web, is that it’s created these spaces for people to connect socially, whether it’s around a rare disease that they have, or you know, a life struggle they’re going through, or their pets or their hobbies, where it was hard to do that in-person, because there just wasn’t a lot of density or opportunities. The web has created those spaces, and so even though it can look a little weird looking in on these online cat communities, I think it’s great that it provides those opportunities to socialize.

Noshir Contractor: And unfortunately, in many cases now, virtual playdates have become much more prevalent in general today because of the pandemic, in addition, of course, to the social and cultural movements that we are experiencing right now as we have this conversation. I want to close here by asking you if you could reflect on one thing about what we are experiencing in either of these two areas, and to see how it would be different, for better or worse, if we didn’t have the web today.

Jen Golbeck: So I tend to be the pessimist painting the picture of our dystopian future and warning people of the bad things that are gonna happen. I’m not going to do that here, I will give you a positive view of what’s going on right now. You know, you especially look at the Black Lives Matter movement, everything that’s going on with the protests against police brutality, and then also against the administration’s handling of COVID. Social media has been a really powerful place for that.

And I think an interesting way to think about these movements, not just right now, but going back to like Ferguson, looking at the Me Too movement, these, these social movements that have come, is that if we look in pre-web times, our media was very much controlled and gatekept. We had a few major networks, they decided what was going to be shown, those were the voices that we saw. And they tended to be white voices, and male voices. I was born basically, you know, a little before 1980. So I remember as a kid, constantly being irritated in a way that I couldn’t describe, at the way women were portrayed in commercials and on TV. It was not how I was, and it’s not how I was raised, but it was so frustrating that the dominant view of women was like, “Oh, we need to be helped.” And you know, “I’m sort of ditzy and whatever.” And, you know, I can only imagine the experiences that people of color had with that.

Social media and the web have given voices to every community that cannot be suppressed in the way that they were when there were these gatekeepers. And we have seen these movements. We saw Ferguson, which is something that I think would not have been covered by the media in the same way, if everyone was not on Twitter, when that was happening. We have the Me Too movement that gives people a voice to sort of challenge these large voices in ways that they couldn’t before that. 

And I think if we look right now at what’s happening, especially with the introduction of mobile and everyone having access to these platforms from their mobile devices, posting videos, posting pictures, challenging what we would always see the police say. You know, there was a video that came out of police arresting, taking a teenager off the streets of New York City, throwing her into an unmarked van and driving her away. There’s tons of videos of this, and the police say she was wanted on a bunch of other charges, and that the police were then attacked with rocks and bottles. The video shows that they were not attacked with rocks and bottles. None of that was happening. And before social media, we would have gone “Well, the police said this. So that’s probably what actually happened.” And now everyone has the power to challenge these dominant voices.

I think that shift of power away from institutions and people that have traditionally had it is very uncomfortable for the people and institutions that have traditionally had it. But it’s incredibly powerful in shaping and pushing for the change that we have desperately needed for a long time. So I think the web has facilitated that shift of power in a way that is so good for society. Even though we’re very disrupted right now, for lots of reasons. I think the web is playing a net good role in that. And it’s, it’s one of the, you know, most powerful influences that we’ve seen in the last 20 years. And I think it’s going to continue playing that role.

Noshir Contractor: I’m glad to hear that you are so optimistic about these things. And frankly, I’m optimistic knowing that scholars like you are leading and pushing the frontier in the area of Web Science. As you mentioned, I have been following your work since your dissertation days and have been really impressed with the ways in which you’ve been doing high-quality work, socially responsible web science, and being able to translate it well. And I definitely recommend that our listeners follow you on Twitter, on one of the many Twitter accounts that you have, as well as listen to your talks on TEDx, because they are really, really compelling. Thank you so much again, Jen, for taking time to talk with us.

Jen Golbeck: Thank you. It was a real pleasure as always.

Noshir Contractor: Untangling the Web is a production of the Web Science Trust. This episode was edited by Molly Lubbers. I am Noshir Contractor. You can find out more about our conversation today in the show notes. Thanks for listening.

 

Episode 3 Transcript

 

Wendy Hall: I always say there’s two things I don’t like about web science. One is web and the other is science. But the idea there was that it wasn’t just about the technologies, HTML, HTTP. It was about the web of people actually, it was about interconnectivity, and science in the sense of study of it. And nowadays, I say it’s, we’re studying, you know, our lives online, basically finding ways to do that.

Noshir Contractor: Welcome to this episode of Untangling the Web, a podcast of the Web Science Trust. I am Noshir Contractor and I will be your host today. On this podcast, we bring in thought leaders to explore how the web is shaping society, and how society in turn is shaping the web. 

Our guest today, Dame Wendy Hall, was involved in naming the very field we’re talking about. In a different world, we could’ve sat down to chat about the philosophical engineering movement, or psycho history. But Wendy and other co-founders decided to call it web science.

Wendy was a Founding Director of the Web Science Research Initiative and is the Managing Director of the Web Science Trust. She became a Dame Commander of the British Empire in 2009, and was elected a Fellow of the Royal Society in the same year. She was elected President of the Association for Computing Machinery in July 2008, and was the first person from outside North America to hold that position. And now, Wendy is the Regius Professor of Computer Science at the University of Southampton, and the Executive Director of the Web Science Institute there. In 2020, Wendy was appointed as Chair of the Ada Lovelace Institute by the Nuffield Foundation. Welcome, Wendy.

Wendy Hall: Hi. It’s lovely to be here.

Noshir Contractor: Thank you so much for joining us, Wendy. It’s a special privilege to be able to talk with you today about web science, because in many contexts, I would consider you as the matriarch of web science. And I would like to begin by asking you to take us through what motivated you and your colleagues to come up with the idea of creating this entity called web science.

Wendy Hall: Well, thank you very much for asking me. We, being myself, Tim (Berners-Lee), Nigel (Shadbolt), and Danny (Weitzner), started meeting in Tim’s office at MIT, to talk about why the Semantic Web was not being taken up more than it had been: Why weren’t people interested in linking data? I’ve known Tim since before he launched the web, and had been around the evolution of the web. And he talked about Semantic Web in his very first keynote at the first Web Conference in 1994. But then everyone was focused on getting the web up and running. And that was thought of as a web of linked documents. And we didn’t have social networks yet. The Semantic Web was always part of Tim’s big vision: that machines could help you link data, and when you could link data, then you could infer knowledge from the documents that you were linking or from whatever you were linking, if you could describe it with data.

And people just didn’t get it. The web consortium, W3C, had developed the Resource Description Framework. He published his paper with Jim Hendler and others, all about what the Semantic Web would mean. And he just couldn’t get people to think about linking data. And so when we started talking about this, and this was 2004, 2005, so 15 years after the web was launched, it was clear that we had to look back in order to look forward.
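The idea of describing things with linked data and then inferring new knowledge from the links can be sketched with plain subject-predicate-object tuples. This is not RDF syntax or any W3C API, only the general shape of the idea; the identifiers, the predicate names, and the single inference rule are all hypothetical.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical facts, each one linking two things with a named relationship.
triples: Set[Triple] = {
    ("doc:paper1", "ex:author", "person:alice"),
    ("person:alice", "ex:worksAt", "org:example_university"),
}


def infer_affiliations(facts: Set[Triple]) -> Set[Triple]:
    """Apply one illustrative rule: if a document has an author and that
    author works at an organisation, link the document to the organisation."""
    works_at = {(s, o) for s, p, o in facts if p == "ex:worksAt"}
    inferred: Set[Triple] = set()
    for subject, predicate, obj in facts:
        if predicate != "ex:author":
            continue
        for person, org in works_at:
            if person == obj:
                inferred.add((subject, "ex:affiliatedWith", org))
    return inferred


print(infer_affiliations(triples))
```

Neither original fact mentions the document and the organisation together; the connection only appears once the two pieces of data are linked through the shared author.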

So we started to look at how the web had evolved: what had been the tipping points that had to happen for the web to take off? Why did people start adopting the standards that Tim eventually made completely universal? And we started drawing pictures of how it happened, and we realized that it was actually to do with people and not so much about... not just the technology. It was what people did with it, and how companies used it to create new businesses.

People started having computers at home, and then the smartphone appeared. And that was all happening as we were talking. So you can see it was taking off, but it was so interesting to think about what had happened. It was clearly a sociotechnical story. We had to study the web as an ecosystem. And it had to be interdisciplinary; it had to bring in people from social science and law and economics and geography and physics and maths and history and education and politics and business studies and anything.

We called it the Web Science Research Initiative, between Southampton and MIT because that’s where we were based. Jim came on later, and then you came on later, while we were still the Web Science Research Initiative. 

But we didn’t really know what to call it. Tim wanted to call it philosophical engineering. He studied physics at Oxford, and that was when it was called natural philosophy, so he wanted to call it philosophical engineering. We all wanted to call it psycho history, from Foundation by (Isaac) Asimov. Because you can’t predict what a person is going to do. But by looking back at the history of development, can you forecast, not predict, what the mass of people will do? What society will do? And that was the really founding idea, but we felt people wouldn’t understand psycho history. I think when we make the film or write the book, they will, but we called it Web Science, for good or bad. And I always say there’s two things I don’t like about web science. One is web and the other is science. But the idea there was that it wasn’t just about the technologies, HTML, HTTP, it was about the web of people actually, it was about interconnectivity, and science in the sense of study of it. And nowadays, I say it’s, we’re studying, you know, our lives online, basically finding ways to do that.

For web science, the other thing that was so important was the timing. 2005 was when we did our real thinking about this. And we thought about the name. And we actually launched it on the world in 2006. That’s when we did the press release from MIT, we had the piece in Science. And the amazing thing was that Facebook didn’t start till 2004, so. And Twitter hadn’t started then. So we were doing all this thinking before there were the social media networks, they didn’t exist, but we could see that they were coming and that the issues were the big issues for the future, were going to be issues like privacy and security and trust. I remember writing them, they were like the three term mantra we had, because we could see that they were going to be the big issues for the future, as this opened up to everybody. And the trouble is, everybody includes the good and the bad of us. 

When you think about Vint Cerf and Bob Kahn, when they founded the internet, it was a league of gentlemen who all trusted each other, they were all friends. If somebody did something wrong, you would just tell them to stop doing it. But once the web opened up to the planet effectively, then you can’t stop people. It’s very hard to actually stop people doing bad things on it. We think now about what would we do if we started again now, what would we do differently? Because that was what made it work, was this openness and its accessibility and the fact that anybody could set up a web page and a server. That’s what Tim gave to the world. But that meant anybody, including people that want to steal and harm and do all the nasty things that exist in the real world, do them online at scale. And that’s what we’re living with today.

Noshir Contractor: You’ve touched upon the issue that the web can be used for good and bad. And I want to ask you, to what extent do you see the mission of web science to be focused on the cautionary tale of things that could go wrong, as compared to the opportunities that it creates for novel ways of organizing, for example. How do you reconcile these two aspects of the vision of web science?

Wendy Hall: Well, when we started, certainly, in my mind, it was definitely the former idea. It was, how could we forecast what will happen if we do this with the web, if we create this ability, if we develop this standard, if we allow people to put videos on the web, right?

When the web first started, you couldn’t get a picture or a video on it. It was a dream. And as those standards emerged, so you could, you then of course have to think about what people will do with that. Now we know. Tim will often say he would have put more security protocols in the standards if he’d realized, you know, what people were going to do with his invention. So it was the idea that this is never going to be a predictable science, but we can forecast how people will behave and what bad things could happen. And I think of it like scenario planning, and you sort of ask, well, how can we make it shift? What can we do to make sure it goes more in the good direction than in the harmful direction? How can we mitigate against harm? And the problem with that, of course, is you’ve got to then observe what’s going on, you’ve got to have a way of observing, and analyzing what has happened in order to look forward. And then if you are observing what people are doing, you potentially change what they’re doing, you know. So it’s quite a difficult science to evolve in that sense.

Noshir Contractor: You spoke about the invention of the Internet, and how it was different in some ways from when the web was invented. In your own work, you’ve spent a lot of time thinking about today’s fragmentation of the internet and of the web more generally. I’d like you to share a little bit about the concerns and issues that you see in terms of the fragmentation.

Wendy Hall: What has happened over the last 30 or so years is that the internet has evolved in different ways in different regions of the world. And the geopolitical nature of that is fragmenting the internet, and people often talk about the internet becoming bimodal between the US and China, the bifurcation of the internet. But actually, we think it’s more nuanced than that.

My colleague Kieron O’Hara and I, who have developed this idea, have just written a book about it. Think of it as four internets. So the first internet is the open free universal one that we think of as coming from Silicon Valley, all the big companies are there, and then you’ve got their mirror images in China. But actually, the different regions of the world act culturally differently.

So in the US, the internet is very market driven. The big companies there, they lobby Washington for the regulations and tax reliefs, that will help them grow their companies and bring value to their shareholders, which is fair enough. 

Europe has taken a very different attitude and put data protection first. We don’t have any of the big social media platforms, they’re all based in the States. So the civil libertarian views from Europe have moved in a data protection way. So it’s culminated in GDPR, the General Data Protection Regulation. And if you want to be on the internet in Europe, you have to abide by those rules. If other countries want to sell their digital services in Europe, they have to abide by GDPR. So that’s the regulation-before-innovation sort of idea.

And then of course, moving to the east, you’ve got China, 1.4 billion people in the world. From the very get-go, China realized the power of the communication medium that the internet was, is. And so the basic rule in China is the government can look at anything. And so if you’re a company, you want to have a digital business in China, you have to abide by Chinese laws. And this is really beginning to fragment the internet.

And I’ll say two other things. We talk in the book about Russia being the spoiler. It’s not trying to create a new type of internet, it’s just trying to use the internet to interfere with other countries in various different ways. There are other regions that do that as well.

And then the other big point is that last year, 2019, we reached the 50/50 point on the internet, which means that 50% of the planet have access to the internet. That’s awesome. In 30 years, that’s happened. But it also means there’s 50% still to come. And that 50% is, largely, in rural China, rural India, and rural Africa. The way India goes in terms of internet governance is really important for the future. And Africa will probably go the way of China because of the Chinese investment in Africa.

And if you look at the population numbers, we can end up with a very small part of the internet that is run by democratic governments unless India sticks with the open and free type of access that we have. And I can say a lot more, but it’s that sort of geopolitical analysis of what’s going on. It’s really important. I think our message of the book is keep the technical standards open. Because if that goes and people start to create alternative views of the internet, which means the web can’t run globally, freely, then you know, all bets are off. And the key thing is nobody owns any of this. Right? The web and the internet are not owned; there’s no one company, no one government. It is us, they are ours. So we have, I think, as much duty to look after them as we do to look after the physical planet.

Noshir Contractor: Yes, I think what you point out is that having these common technical standards does provide a prerequisite for creating a public good that would be global. And yet, that may not be enough in this context, that in some cases, you can still see fragmentation based on geopolitical forces.

Wendy Hall: And it all comes down to how countries govern data.

Noshir Contractor: Given that 50% of the planet is still not on the internet, a lot of places that you referenced were what we might call the global south. Do you see that as creating a fifth internet? Or do you forecast, to use your term, that it is going to fold into one of the existing four internets?

Wendy Hall: Well, our forecast is it will, it will fall into one of the existing four. But we do talk about a fifth internet in the book. But I’m not going to tell you; you have to buy the book.

Noshir Contractor: That’s a wonderful teaser. Wendy, in addition to playing a prominent role in research, you’ve also helped shape science and engineering policy and education. You co-chaired the UK government’s AI review, which was published in 2017. And the UK Government announced you as the first Skills Champion for AI in the UK.

One topic that is particularly near and dear to your heart is the role of women in computing, and more generally, in science and engineering. Can you talk a little bit about where you think that is headed?

Wendy Hall: (Sighs) It is a tale of two cities in a way. When I was young, you know, I was born in the 50s. And the world was very different. And no one in my family had been to university before. And the reason I would be expected to go to university would be to find a better husband, and get married and have kids. That was the expectation. My parents wanted more than that for me. But the expectation was, my future was marriage and kids. And we didn’t have the equality laws in the UK then. I was a mathematician originally, and I went for a job as a lecturer in maths at a university in the UK that I won’t name here (it wasn’t Southampton). It was my very first job interview, and at the end of the interview, on the day, the head of department said to me, “I’m afraid, Wendy, you didn’t get the job because you’re a woman.” I was young, and they didn’t think I’d be able to control classes of engineers and computer scientists. And anyway, the very next week, I got a job doing the same thing. But that was my first sort of realization that things were different for women. Now, they couldn’t do that today, but they might still think it.

Then when I went to Southampton, we realized in 1987 that we had three years of computer science undergraduates with no women on them at all. Women had just turned away, and there had been women before. This was the time of the personal computers, the Spectrums and the BBC Bs and the Commodore PETs, and suddenly computers had become, overnight, toys for the boys. That really switched a whole generation of women off almost overnight. In the West, we have never really recovered from that situation. Countries like your home country, India, that came later to the world of computers didn’t have quite that issue. And so if I go to a computer science class in India, more than 50% of the students will be women, right? So it isn’t genetic, it is deeply cultural.

I’ve tried all my life to turn that round and try and get girls interested in computing, and really failed quite miserably, because the stats are just so bad. But the world around us has changed dramatically, of course, and, you know, women now aspire to much, much more. I think they’re still under pressure; they still have this problem of “you can do everything.” And so you try and be a mother and career woman. But, you know, it is possible.

The culture of computing in some parts of the industry is just toxic for women. Silicon Valley, in particular, is well known for really, really being toxic for women. It’s so sad for me that we still have this problem. I meet more and more women. So in the world of AI, in some ways, this gets even worse, because when you take a master’s degree in machine learning, you can’t really take those degrees unless you have a computer science or maths undergraduate degree. So you’ve already got a much, much shorter pipeline for those. So you’re going to increase the stereotyping.

And we were so worried about this, when we wrote the AI review about how we would get more women coming into AI. But I do see lots of women are involved in AI in the areas of ethics, thinking about how AI is going to be used in society, that is attracting a much more diverse pool of talent. And so I think we need to capture that, that’s what we’ve been trying to do with the skills program in the UK, there will be lots of new jobs that are not to do with programming, to do with auditing AI, looking at bias in AI, design of AI to make sure it’s for the good, not going to be harmful to people or get the wrong results. You know, be biased. 

And I always make sure diversity is firmly in that ethical framework. My argument is about diversity in its broadest sense. I mean, here, gender, race, culture, age, disability, and everything you can think of in that broadest range. If your workforce isn’t diverse, then there’s more chance that your AI systems are going to be unethical or biased in some way. So my mantra is, if it’s not diverse, then it’s not ethical.

I still want to get more women feeding into the pipeline. I want to get more women interested in school, so they do the qualifications to study, and want to study, computer science. But at least we see more women in the workforce.

Noshir Contractor: I think that’s a very fair point. I think there has been a temptation in the past to talk about balancing the need for diversity against the need for excellence. And what you’re pointing out is that in fact, the two are not opposed to each other, that they actually are symbiotically related.

You have been such a wonderful role model in all of these respects, and so we salute you for that. You’ve given us a really good story about how the web got started, and how web science in particular got started. Based on your perspective, what do you consider as some of the most significant issues that need to be addressed by web science moving forward?

Wendy Hall: Well, I will have to answer that in terms of data. As you know, I’ve been passionate about the idea of building observatories for web science, using the term the way the physicists observe the stars and the planets and use all that data to, you know, work out where we came from and where we’re going.

And part of that also is how you visualize, how you analyze the data that’s in there. But for me, the really difficult thing is getting the evidence we need. 

And to me, it was all about how we can share the data that we collect, because it takes so much effort to collect that data. And then when the person who’s collected it retires, or leaves or moves to another job, it just all evaporates.

And we need some way of being able to share data with other people in ways that are legal and ethical, where, you know, people are not abusing it, and you get cited if your data is used, or you get some money for it if people make money out of it. This is important for companies, but it’s important for research scientists too. And we’re still struggling to find a way to do that. The hardest thing is actually curating the data and making it available. And then you’ve got the issue of, well, what if it’s, you know, data about people and data that’s confidential to companies? So I think that is our biggest challenge: how, as a community, we can crack that one.

Noshir Contractor: That’s a really important issue. You also spent some time working with the Library of Congress on some of these issues, haven’t you?

Wendy Hall: Can we tell the Twitter story? I mean, one of the reasons I went there was we all knew that the Library of Congress was getting the Twitter feed and all the data. There was a server down in the basement of the Library of Congress, which was getting a Twitter feed every day. When that deal was done, Twitter wasn’t the company that it is today. And that data represents how the company is doing, and so it’s very confidential. And of course, even though Twitter is open, you know, you tweet to the world, Twitter allows people to delete. So you know, there’s very confidential information in there. So they were collecting all this stuff and had nobody using it. And so they turned it off as a project. And I understand why. But my worry is, who is the custodian of all this data, when in 100, 200 years’ time, people want to know, what were we saying on Twitter? What was on Facebook? Well, this is the record of our society. Right? And we are collecting snapshots of it in the libraries. They have, you know, the British Library, the Library of Congress, they have Web Archiving projects, but it’s snapshots. And the Internet Archive takes what it can, but they’re snapshots. And I don’t know if the companies are storing it for the future historians. I think that’s another challenge for us as a community: how do we retain our memories, digital memories?

Noshir Contractor: In fact in some cases, I believe that there’s regulation that does not allow the company to hold on to data beyond a certain number of years. I wanted to ask you, as we consider this moment of social reckoning that we are experiencing alongside the pandemic, what is the one or maybe two significant things that would have been different, for better or for worse, if we were going through this period without the web?

Wendy Hall: Well, can you imagine, I say this to people, if the pandemic had happened anytime before 2000, 2010? We would not have been able to deal with it as we have. Not only has it kept our communities going, it’s enabled us to see friends and family and talk to them, to actually have lockdowns in a way to save lives, and to keep our spirits up and to communicate. Without it, life would have been really difficult.

And also the international work to share how to deal with the virus, right? Work about vaccines and antibodies, and what treatments to use and when to lock down and when to ease up. So we’ve rediscovered our love of the web and the internet in COVID: the TikTok videos and the Zoom cocktail evenings. It will change our lives, it will change our world. It has taught us we don’t have to travel halfway around the world to go to a conference to give a single paper. I want to get back on an airplane. I’m sure you do too. But, you know, we’re beginning to understand that there is a world other than jetting around all the time. And before the pandemic, in the West certainly, we were worried about the harmful things that were happening on the web and the internet. We were worried about how to deal with that. We still are, but as I say, we have learned to love it again. We’ve remembered why it was invented in the first place. And I think that’s hugely important.

Noshir Contractor: Well, thank you very much again, Wendy, for taking time to take us through a journey of Web Science from where it started to where it’s headed. It was really a pleasure to speak with you. And again, thank you for all your efforts in leadership in terms of developing this field, but also in terms of the work you’ve done in related areas of policy and education. Thank you again.

Wendy Hall: Thank you Nosh, thank you for doing this series.

Noshir Contractor: Untangling the Web is a production of the Web Science Trust. This episode was edited by Molly Lubbers. I am Noshir Contractor. You can find out more about our conversation today in the show notes. Thanks for listening.

 

Episode 2 Transcript

 

 

Noshir Contractor: Welcome to this episode of Untangling the Web, a podcast of the Web Science Trust. I am Noshir Contractor and I will be your host today.

On this podcast we bring in thought leaders to explore how the web is shaping society and how society, in turn, is shaping the web. Today, I am speaking with Professor Susan Halford from the University of Bristol in the UK, where she is a founding co-Director of the Bristol Digital Futures Institute.

Susan is a sociologist whose research is focused on studying digital data and infrastructures from a socio-technical perspective.

Before joining the University of Bristol, she was a founding director of the Web Science Institute and co-Director of the Center for Doctoral Training in Web Science at the University of Southampton in the UK.

Professor Susan Halford is a fellow of the UK Academy of Social Sciences and Royal Society of Arts and is currently president of the British Sociological Association.

Today, I talk with Susan about how and why she got involved with web science and her role in advocating a social science perspective to both research and education in web science. Susan, welcome to Untangling the Web.

Susan Halford: Thank you very much for inviting me, Noshir.

Noshir Contractor: I want to talk with you about your leadership and your engagement with web science in general, the challenges and opportunities that it can help address, and also its relevance during the COVID-19 era. So to start with, Susan, what does web science mean to you?

Susan Halford: What web science means to me as a discipline or as an area of research is a very, very broad umbrella of activities that explores the past, and the present, and the future of the web.

And I think under that really broad umbrella there are an enormous number of different and very challenging research fields.

I think history is really important because the web didn’t just come out of nowhere one day on a Thursday afternoon. No, there’s a long history that predates even Tim Berners-Lee’s ideas about the web, and that has really shaped what the web has become in the present.

But the present never ceases to surprise us in terms of how the world has shaped up. And it’s about getting underneath those dynamics: the economic dynamics, the technical dynamics and the social and political dynamics of how the web is evolving – and, exactly as you say, how the web has changed the world and the world is changing the web – and what that might mean, not only for the future of the web, but for the future of society. And I think web science is uniquely placed to explore those really fundamental questions that matter to everybody on the planet. In fact, even for those people who are not yet connected to the web, their lives are really shaped by it in terms of global markets, in terms of the management of war, and all state activities. So I think web science is incredibly important to all of us.

Noshir Contractor: And you’ve spent years now focusing on your research on what you call socio-technical theory and methods, and how they apply to digital data and infrastructure. Tell us a little bit about what you understand by the term socio-technical theory and methods in this context.

Susan Halford: So I was an organizational sociologist for many, many years, not thinking particularly about technology at all, but thinking about organizational transformation of one sort or another.

And I suppose in about the year 2000, I started thinking about digital innovation and organizational practice, and particularly, exploring how digital innovation was changing, or, was imagined to be changing healthcare.

So I was working at that time in Norway, right inside the Arctic Circle, at the Norwegian Center for Telemedicine, which is located in Tromsø, 200 miles north of the Arctic Circle.

At that time, the Norwegian government was making very big investments in digital innovation in healthcare for all kinds of reasons – to do with the geography of Norway, to do with the distributed population, to do with planning for a post-oil economy and the transition to a services economy in Norway. A high value services economy. And so I started thinking about what the implications of digital innovation in healthcare were for the organization and practice of care, and that became the point at which I realized that sociologists needed to understand much more about technology.

Not only about the idea of technology and the rhetoric around technology, but actually what technologies can and can’t do, and how they’re developed and what kinds of knowledge it takes, and what kinds of performances technologies have.

But also that technologists need to know a lot more about sociology.

So that was the beginning of thinking in the socio-technical way. And for me, what socio-technical means is the inextricability of the social and the technical. So if we think about how Bruno Latour interprets it, he says that you can’t look at the world and say that there is part of social life that has nothing to do with technology and that there is part of technology that has nothing to do with social life. You know, they are inextricably, unavoidably connected, and that’s why this word socio-technical is so helpful, because it conflates the two and insists that they’re joined together.

Noshir Contractor: And you talk about socio-digital transformations as an important object of study. Can you give an example of a socio-digital transformation that you have been thinking about?

Susan Halford: Socio-digital transformations at work, say distributed workplaces, for example, like we’re doing to some extent now.

And what that means for individuals, what it means for organizations, what it means for organizational cultures, what it means for markets.

So that’s a socio-digital transformation. I think we’re in the middle of one right now. And I know we’re going to come back to COVID, but, you know, I was talking recently to a very large UK technology company, and they moved 8000 people from office work to home work overnight.

You know, that is a socio-digital transformation. You can’t do that without digital technologies.

But digital technologies on their own don’t make that happen. It’s people who every morning get out of their beds and go to their laptops and work as if they were in the office, and they manage their children somehow, and organizations manage their appraisal processes and recruitment processes somehow. So that’s the socio-digital. You know that you can’t do one without the other.

Noshir Contractor: That is really an important insight. You’ve been at the forefront, at least in the field of sociology, and more generally, in the social sciences of making this connection to studying the web. And you are currently involved in a big project on social sciences, social data, and the semantic web.

Tell us what you have been thinking about in terms of the role of big data, the opportunities, the challenges that it offers to advance a sociological analysis of this socio-technical system.

Susan Halford: Well, these days – I don’t know what language you use – we tend to talk about new and emerging forms of data because people have become a little bit tired of big data.

So we are interested in large scale data sets, of course, and then the questions of velocity and veracity and dynamism and so on which people associate with big data. And I think we can’t get away from the fact that these data are really exciting and interesting in terms of the traces that they offer of everyday activities of social life.

And at the same time, they are unfamiliar. They’re generated in ways that we don’t necessarily understand for purposes that are not social science research.

They’re controlled and managed by companies in ways that mean we don’t necessarily have access to knowing what the provenance of the data is.

So my relationship with big data or with new and emerging forms of data is very much one that walks the line between recognizing the huge value of these data for shedding light on social practices in a way that we’ve not been able to do before and an insistence that we have to recognize the strengths and weaknesses of those data for social research.

So a lot of my work has been in that middle space and refuting the criticisms of some sociologists who will say, “We don’t want anything to do with those data. They’re not remotely interesting and are not the kind of survey data or interview data that we’re really comfortable with. So we’re not having anything to do with them.”

And the big data evangelists, as I would call them, who think that we don’t need anything else; now, all we need is social media data or browser data or sensor data and that will tell us everything we need to know about social life and social practice – which it won’t.

So it’s walking that middle line and trying to bring together sociology and big data and big data methods, because for many sociologists – not you, because you’re a computational social scientist – but for many sociologists, we don’t have those computational skills. We don’t know how to work with those data.

So it’s about trying to bring the two worlds together in a way that makes the best of what each has to offer and enables us to do research with big data that is really constructive and perceptive and valuable.

Noshir Contractor: And in your current position as President of the British Sociological Association, you have a really good vantage point in order to be able to make this argument to your fellow sociologists. You use the phrase symphonic social science. What does that mean to you?

Susan Halford: So that’s a paper that I wrote in collaboration with a very good colleague of mine, Mike Savage, who is based at the London School of Economics, and it came out of many of these debates about what the value of new forms of data and computational methods was to sociology, and a concern that sociologists were too hasty in dismissing those data and those methods. And the term symphonic came from a recognition that some of the really popular – in fact, the most popular – publications in sociology, probably of the last 10 to 15 years, were working with diverse forms of data – found data – data that were not really contaminants, but didn’t really join up in the kinds of ways.

But were pulling together all these different data sources in order to make a much bigger argument. And the symphony is the metaphor for that: you bring in the violins or the cello or, you know, the brass section.

And each of them on their own brings something valuable, but together they make something that’s greater than the sum of the parts.

And so the three studies that we were looking at were Thomas Piketty’s Capital, Bowling Alone, and the third one was The Spirit Level by Wilkinson and Pickett. And all three of those books, although they’re very, very different and they’re not part of a concerted movement in social science at all (they come from different social sciences, in fact), were doing three things. They were taking these diverse forms of data and pulling them together to do something bigger than the sum of the parts.

And they were using visualizations, in order to articulate that argument. So you could summarize all three of those books with a single diagram, actually, you know, the U shaped curve or the linear regression, whatever it is.

And they were using social theory in order to interpret the data. And so, we were using that, I suppose, as a device to say, look, you know, this is what very, very successful social science projects have done. This is not so different from working with new and emerging forms of data and it’s not so different from engaging with computational methods.

It’s not that, on their own, those things will be the same. But if you combine them with social theory and with conceptual analysis, you know, actually, we can see this as a continuous or a recognizable way of using sociological evidence and making powerful sociological arguments.

Noshir Contractor: And indeed this is what has made web science so exciting, because it allows us to draw on a very diverse set of methods and measures and address a broad array of sociological and societal concerns.

Susan, based on your own work, what would you consider as one of the key insights and contributions that your scholarship has made that as of relevance to web science?

Susan Halford: I think we’ve all been building the boat as we row, which is an old Norwegian expression as it happens, of web science. So I think what all of us have done has contributed to making web science, what it is, because we started out with a big umbrella that had nothing underneath it. And we’ve created what web science is through the things that we have done.

So, I hope that my contribution has been to insist on a critical, in a social science sense, a critical approach to data, to method, to be critically constructive about engaging across disciplines. You know, it’s still not terribly common and everywhere there’s calls for interdisciplinarity, from funding councils, from governments, from industry.

And yet it’s still really quite difficult to do. And I think what web science has done is to really make some big achievements in that area, whether that’s studying hate speech online, whether it’s studying the activities of states and insight in the cyber realm, whether it’s looking at mental health and young people, whether it’s looking at delivering healthcare to rural communities. 

That’s an answer about what I think web science can do. But the key thing that I hope I’ve contributed is in bringing together in-depth sociological research with in-depth computer science research.

Noshir Contractor: And speaking of interdisciplinary work, when you were at the University of Southampton you co-directed the Center for Doctoral Training in Web Science, which was funded by the UK’s Engineering and Physical Sciences Research Council (EPSRC). In that role, how did you take on the challenge of looking at interdisciplinary training for people who are interested in web science?

Susan Halford: Yeah, I think that the Doctoral Training Center was really critical to developing the Web Science Institute, because it threw together computer scientists and social scientists who’d never worked together before and said, create a training program, in three months, if you don’t mind. And that was a really tall order. And there had to be a lot of learning, a lot of compromise, a lot of collaboration, and we did it.

We created a program that was not “here is computer science with a bit of social science on the edge,” not “here is social science with a bit of computer science on hand,” but that was 50/50.

That was absolutely integrating both the social science and the computer science and insisting that students who came to us with a background in computing had to read social theory, and students who came from social science had to learn semantic web technologies.

Everybody got thrown out of their comfort zone and by the end of the year, everybody had become an interdisciplinary scholar and that was remarkable.

Noshir Contractor: That is indeed remarkable. What would you consider as some of the most significant issues that still need to be addressed by Web Science?

Susan Halford: I think the future of the web is the most significant issue facing the world today. The fact is, as we know, that the web will not stand still.

You know, the web is not fixed, finished, or done. The web is changing all the time, and it is absolutely essential that we use that 10 years of experience that we have in web science to address now the future of the web: in terms of ownership, in terms of control, in terms of privacy and security. You know, in terms of human futures, we need to be working extremely hard as web scientists right now.

Noshir Contractor: And then in closing, Susan. We are of course going through the COVID 19 pandemic right now and I would like to get your thoughts on what this pandemic would have been and how it would have been different, for better or for worse, if we didn’t have the web?

Susan Halford: I think, you know, over the past three or four months, probably every day, somebody has said to me, we wouldn’t be getting through this crisis if it wasn’t for technology. I think, actually, what they mean is the internet and the web, because we wouldn’t be talking. Businesses would have ground to a halt. Education would have ground to a halt.

There would be far worse situations with regard to businesses, you know, at least those ones that have been able to move online. So in many ways, I think that’s right – except – as I said to you earlier, it’s not just the technology that is allowing us, many of us, to move through COVID reasonably successfully. It is the effort that people in organizations and governments and markets are making to ensure that that happens.

So we couldn’t manage in the way that we are without the web but it absolutely underscores that the web is both human and technical.

Noshir Contractor: I think you made some really important inferences there about how we did have pandemics before the web, but the trajectory of those pandemics was quite different from what we’re experiencing right now.

And as you point out, it is clear that the web has definitely changed what a pandemic experience would be like today, as compared to in the past.

Susan Halford: Yeah, it has. I think it’s also highlighted, you know, there’s no doubt that the pandemic has turbocharged digital technologies and thinking about digital technology. So, things that weren’t impossible but seemed impossible before Christmas: you know, mass online education, which disability activists have been campaigning for for 20 years and have always been told is impossible, suddenly becomes possible. And so the goalposts have been moved, and the goalposts are not going to be pushed back now.

So the question is what do we do with that and how do we treat that in a really constructive way? Because if we look at the push towards data-sharing or towards, you know, apps or biometric passports, or apps for population monitoring, there are some really serious concerns there. And we need to make sure that that movement of the goalposts doesn’t allow in all kinds of practices and activities around the web that many of us would find very troubling and worrying for the future of the web.

So, you know, along with the positive, the socio-technical positive, and the things that the web has allowed us to carry on with during this pandemic, we really need to keep that critical scholarly and political attention to the web alive, more than ever, during the pandemic and ensure it doesn’t become a Trojan horse for many of the practices or rapid legislative changes, for example, that we really don’t want to see.

Noshir Contractor: Yeah, things that we might be willing to do during a pandemic are not necessarily the things we want to continue doing post-pandemic.

Susan Halford: Absolutely, and also, on that, that might include data sharing, but it might also include working from home all the time. You know, I think just because people have been able to do it under extraordinary circumstances, while homeschooling their children and doing whatever else at the same time, doesn’t mean that that is a long-term sustainable solution.

Noshir Contractor: Wonderful. Susan, thank you very much for taking time to join us today to share your insights with us, and more importantly, for your thought leadership in bringing interdisciplinary approaches to web science. Thank you very much again.

Susan Halford: You’re very welcome. It’s a pleasure. Thank you.

Noshir Contractor: Untangling the Web is a production of the Web Science Trust. Thanks to Carmen Chan for editing and technical assistance. I am Noshir Contractor. Thanks for listening.

Episode 1 Transcript

Noshir Contractor: Welcome to this episode of Untangling the Web, a podcast of the Web Science Trust. I’m Noshir Contractor and I will be your host today. On this podcast we bring important leaders to explore how the web is shaping society and how society, in turn, is shaping the web.

Today I’m speaking with Professor James Hendler. Jim is the director of the Institute for Data Exploration and Applications, IDEA for short. He is also the Tetherless World Professor of computer, web, and cognitive sciences at Rensselaer Polytechnic Institute (RPI) in the United States. He’s acting director of the RPI IBM artificial intelligence research collaboration and serves as a member of the board of the UK’s charitable Web Science Trust.

Hendler is a man of many accomplishments. He’s a fellow of the American Association for Artificial Intelligence, the British Computer Society, the Institute of Electrical and Electronics Engineers, the American Association for the Advancement of Science, the Association for Computing Machinery, and the National Academy of Public Administration.

Jim might well be the only individual who is a fellow of all of these professional associations. On a lighter note, in 2010, Jim Hendler was named one of the 20 most innovative professors in America by Playboy magazine.

Besides being involved in the start of the web, Jim was one of the pioneers of the interdisciplinary field we call web science. Today, I talked with him about the origins of web science, how it has evolved over the years, and its relevance during the COVID 19 era.

Jim, welcome to this podcast.

Jim Hendler: Thanks, Noshir.

Noshir Contractor: I wanted to start by giving you an opportunity to share with our listeners what is meant by web science. What does that term mean to you and how did it get started?

Jim Hendler: Sure, great question. So, you know, the web was invented, and there are different dates you could use. 89 is when Tim Berners-Lee actually wrote the proposal for what became known as the web. By 90 and 91 there was code that was being shared. Really around 95, 96 you started to see the takeoff of this and people becoming more and more aware that the web was there and that things were happening.

But it really started to have a much bigger impact by, really, the late 90s, where you started to have search engines, you started to have monetization of things on the web. You started to have the social networks growing.

So, again, a lot of that was happening all around the same time, from the 90s to, you know, the early 2000s, and at some point, those of us who had been involved in the web and the web architecture were starting to feel like understanding this thing was breaking up into many different pieces – you would go to one kind of conference and hear a lot of discussion about the mathematical underpinnings of networks and network science, but some of that was really not about the web. The web was just one example of something much bigger.

And then you would go to another meeting and it might be, you know, the social impact of the web or legal aspects, because we were starting to see some of the early days of people beginning to worry about privacy and security on the web, things that are now much bigger issues. And then, there was sort of the engineering of the web. How do we build it? How do we do a better job of knowing, if we deploy a certain technology, you know, how it will get used, what impacts it might have?

So, some of us started to believe that the web itself had become something we needed to understand. The web sits on top of a larger system known as the Internet, which has a lot of mathematical properties of its networking and, you know, a lot of the routing and things like that that gets talked about happened.

But the web thing was really sitting on top of that, and was its own entity that needed its own understanding.

And there were principles of how you design things, there were standards groups, but there really wasn’t a lot of research going into the interaction between all these different pieces.

So some of us started to feel like maybe there was kind of a systems science to really understand the web in all its forms.

So around 2005, a group of us got together. Wendy Hall was one of the organizers. I was one of the organizers. Nigel Shadbolt was there. Danny Weitzner, I believe, was there, and some other people: a lot of other computer scientists and a few social scientists who were really trying.

It was an invite-only workshop of about 30 people, held in London, sponsored by the British Computer Society. And the goal was to say, you know, what did we really need to understand to understand the web? So a report was generated coming out of that workshop.

The report really got boiled down to a couple of pages that got accepted as a perspectives piece by Science magazine. So what most people call the start of web science was an article called “Creating a Science of the World Wide Web.”

And it didn’t use the term web science in that article. It’s just that as people started to refer to the thing we had called for, which was something that would put the math, the engineering and the social on the same page, and get those different communities talking and working together. That’s when the term web science started to be used.

Noshir Contractor: And that article didn’t appear until August of 2006.

Jim Hendler: Yes.

Noshir Contractor: And so really, would you consider that as one of the dates that is most associated with, not the start of the web, but the start of web science?

Jim Hendler: Yeah, so 2006, with that article. You know, people tend to try to find some definitive thing that’s the start of something like this. So obviously, a lot of us were talking about stuff that now might be called web science, but the term itself really grew out of that 2006 article, and the first Web Science Conference, held in Athens, was in 2007, started in part by the community that had come together around then.

Noshir Contractor: You mentioned that as we got started on this, you were focused on a group of people, some of whom were at this invite-only event, and then subsequently, unlike some other scientific interdisciplines, web science decided to form a Web Science Trust. Can you tell us a little bit about why – what was the thinking behind the creation of the Web Science Trust, as opposed to, say, a learned society or some other division within another context?

Jim Hendler: Yeah, so it’s a good question, and you know, it’s a combination of design and accident, but what actually happened was the two leading institutions that had really been trying to create web science, sort of institutionalized, at that point were MIT and Southampton University.

They created a joint statement to create something called the Web Science Research Initiative (WSRI), and very quickly a few other organizations joined. The University of Maryland, where I was at that time, became the third school, and there were basically five of us who were kind of leading things at that time.

As it started to grow, we started to create a network of laboratories, including your lab and others, and realized we needed a more formal way for people to interact, because something that is inherently interdisciplinary has a tendency to coalesce around one of the disciplines. Just, you know, that’s historically what’s happened.

And it becomes less and less interdisciplinary as it becomes its own discipline, and we really felt that was the wrong thing, that this needed to be, you know, I use the analogy sometimes of climate science.

Right? When you’re studying the climate, you need geologists and you need, you know, people who study the atmosphere and you need people who study the ocean and you need… but not everybody who studies the ocean is looking at climate, not everybody who studies the atmosphere is looking at climate, and so it was the same thing with the web. We had people who were studying networks, but only some of them really cared about the web. We had people who were looking at social impacts of growing communication networks, but only some of them in particular were looking at the World Wide Web.

Now, in the past, you know, 15 years, as we’ve grown, the definitions have slid a little bit about what exactly is the web and where the boundaries are and things like that. But the Trust has really tried to be an entity that would help promote web science, help keep this network of labs going, help make sure there was a conference and eventually a journal. So, some of the things a learned society does, but without really trying to be a learned society and create the kind of disciplinarity that tends to come with it.

So, again, web science tries to both be an entity that brings people together, but also an entity that doesn’t pull people out of the other things they do. So you know, someone who studies networks and network science can also be a web scientist without there being any tension between them.

Noshir Contractor: It’s fascinating because it takes on the role of both helping to build a community, but also to curate that community intellectually, and one of the challenges that I imagine you might have faced is the difference, if any, between those who think that what they are doing is internet science versus web science. Do you have any thoughts on that distinction?

Jim Hendler: You know, there’s been a lot of different terms that have a lot of overlap. So information science in the US was taking off around the same time, and a lot of people were arguing that web science belonged in information science. Other people were saying no, because information science, really, in some schools doesn’t really include the computational or mathematical side of things. So, you know, same thing with internet science. There was a feeling that web might be too limited of a term. And frankly, I would love to see so-called internet science and so-called web science come further together.

But by and large, the desire to keep the social science piece wedded to the math and the engineering side has tended to differentiate the web science approach from others. That’s not to say no one else wants to do it, but I think the dedication to that is a sort of core value, that we’re trying to bring people together across these different ways of looking at the web. And nowadays, you know, the mobile web, the big companies, the things that look at information and challenges to that information, you know, again, they happen at a lot of different places, and web science really would like to be an integral place that brings these people together.

Noshir Contractor: And I imagine that some folks also will be questioning whether the study of things that don’t happen specifically on the web, things that are sort of migrating into apps and different kinds of ways in which we are navigating this new world, should or should not be included as part of web science?

Jim Hendler: You know, there’s always a tension in those kinds of directions for any interdisciplinary science, but I think that the goal has been open. The definition at a technical level of what the web is is actually very different than what a lot of people think of. So a lot of people, when they’re opening something on their browser, I’m sorry, opening something on their phone, not on their browser, and looking at, you know, the apps world, aren’t realizing that it’s sitting on top of the web architecturally. So you know, some of us like to think of it as, you know, I hate the word ecosystem, but I don’t have a better word, that’s evolving.

And so you have parts of the web that move one way and parts that move the other. And again, part of what web science wants to do is not reject any of those parts and say, really, if we’re going to understand this thing, we have to understand it as a system, or a system of systems. We have to understand the interacting pieces, whether they are considered by a particular practitioner to be “web” or not.

You know, there’s a lot of overlap with work that’s done at the World Wide Web Conference. But for example, some of the work there, really, is not particularly seen as web science per se, because it’s technologies to enhance web products more than technologies that are really understanding the interactions happening on this huge network of information.

Noshir Contractor: Great. So in 2006, when you co-authored that article and laid out some priorities, to what extent were those priorities focused more on seeing the web as an opportunity, or web science as something that explores new opportunities, versus focusing attention on potential concerns as the web became more and more prevalent? And while you think about that, are there some areas where you feel that web science has made the most progress, and others where you would like to see a lot more progress being made?

Jim Hendler: Sure, so you know, in that document, and then not long after that we had something we produced, we called it a manifesto, which may or may not be the right name, something that became a book on web science, or a publication, that you know later you joined for the second edition.

But we were really looking at trying to get thematically what this thing was, what was happening, how it worked, and we always wanted to express both the positive and the negative.

Again, as a science, you have to be looking at what’s happening. But again, part of what makes web science somewhat unique is the goal to bring together the people who are building it, and can put in mechanisms to try to solve some of these problems, with the people who may be studying the problems or the opportunities, trying to understand why do some things on the web take off and “go viral”.

And here I don’t necessarily mean just like a video or something, but the whole use of the web for video: how does that change the world when, you know, people can video chat rather than talking, in person or not? You know, again, how does that change from the phone network to the web, when you have a web of information?

Not, you know, how does search work, but what does search do? What’s the impact of being able to find this kind of information?

As crowdsourcing, Wikipedia, things like that have grown, that’s become one of the things we’ve been trying to study and understand, and that includes misinformation as well as information, right? So if you look at sort of the papers that have been presented at web science, in fact, some of the early best papers, one of them was how trolling on the web, right, influenced an election. This was actually a Congressperson from Boston, from Massachusetts, and it was interesting that that was one of the earliest studies of trolling and misinformation in an election, long before it became part of the national election in the US and Brexit and things like that. So again, web scientists were really looking at these impacts in a very deep way.

Noshir Contractor:​ Yes because the idea of, for example, seeing the extent to which search became so important in the age of what was happening at the time.

I remember a remark made by someone who said that before search became a thing, the World Wide Web was like a library with all the books strewn all over the floor and no easy way to look for the right book and to be able to get to it. And I think search was an example of something that really helped address an early challenge that was faced by the growth of information on the World Wide Web.

Jim Hendler: Right, to give you the counterexample, because, of course, it’s going to be a science: no matter what someone says, someone’s gonna disagree. But also, as search engines took over, it became harder and harder to find the opposing opinion. Right? It’s hard to go to some of the major search engines now and say, I’m searching for this, but show me something very different. Show me… you know, somebody who disagrees with this approach.

So it says, this is what most people believe. And I actually think, you know, and you’ve studied some of this, that’s part of what creates information bubbles, because then the people who don’t believe some piece of that go off and create their own search structures and their own ways of doing things. So again, a lot of different ways of looking at how this plays out. And, you know, again, we used to say, you know, the metaphor of surfing the web had a connotation of a little bit of danger and serendipity. You might not end up where you were looking to get to. Search makes some of that different.

And so people have been looking at how do we reimpose creativity and new kinds of search, how do we look at argumentation. So, you know, some of the exciting stuff happening in web science nowadays looks at some of the impact of these technologies becoming centralized and says, can we re-decentralize? Can we find a way to put it back, you know, away from everything being owned by a few big companies and much more back into the hands of the users? That remains the tension we look at today, and things like privacy, privacy-preserving technologies and things like that, which I’m sure will be topics for later podcasts.

Noshir Contractor: Absolutely. I think that I have a couple of closing questions, and I’m going to wait and ask you a closing question on COVID, but fast forwarding to 2020 or 2019 domain, what are the areas where you feel that web science has made the most progress, and what are areas where you don’t see as much progress or you would like to see more progress?

Jim Hendler: So I think where web science has made a lot of progress is helping to focus attention on, I’d say, really two different things. Several things have been really impacted by web science. One is transparency and the whole open data movement.

So that was coincident with the growth of web science. That was in part because a lot of the leading people in web science, including Tim Berners-Lee himself, were very involved in helping to try to get governments to open their data, to make it more available, to develop some of those technologies.

I think also predicting some of the dangers. So early web science papers were already saying, you know, let’s look at privacy. Let’s look at security. As companies on the web grow, they’re going to be able to see our information, to share it, to track it, you know, as cookies came along.

So, you know, I find when I say things at a Web Science Conference that really bother some, you know, if I tell people in a normal setting that when you go on the web and you look at the price of something on a particular website, it may be different than, you know, for someone else looking at it, because they’re using information about you to try to adjust the price, people are surprised. Web scientists aren’t surprised.

We’re exploring it. We want to understand both what are the algorithms that are being used, but also, for people who believe that that is problematic, how we might control it, things like that. So I think, you know, it’s more that we’re embracing exploring the problems. But of course, the web is a very fast-moving thing.

I think the whole mobile web and app space that you talked about, you know, many of us view it still as a web development platform, while other, newer people coming into web science are beginning to really look at those apps themselves. But then you start getting into these boundary issues, right? If somebody has studied a particular Twitter phenomenon, is it or isn’t it? And you know, what we try to do is be very embracing, right? If the work is important and talks to it, that’s good. On the other hand, if it’s just a pure mathematical analysis of something happening in some different network, then the paper that shows why that applies to the web is going to be much more interesting than the paper that just says all networks have some feature.

So again, the boundaries are very hard to see, but I think that, again, the challenges of the web were something we embraced very early and are still looking at.

I think the opportunities become more apparent to people just as more people, you know, it’s just part of our lives.

I think the social impacts are something. So we keep trying harder and harder to bring in more social scientists, particularly people who really can talk to where the qualitative meets the quantitative, to really try to, again, look at that triumvirate of the underlying math of what’s going on, the engineering and building of this thing, because the web’s not a natural phenomenon, and the social impacts and policy impacts of that.

Noshir Contractor: One last question that we plan to ask everyone who is appearing on the podcast during this pandemic time is: what is one way you personally believe the web is, was, or could have been a real help during COVID, and/or one way you think the web has hurt society during the COVID crisis?

Jim Hendler: Great question. So you know, you and I are sitting here on opposite ends of a Zoom channel, and we could be on any of five or six of its competitors talking.

More people are working from home. Imagine the lockdowns that we had around COVID, in entire countries, entire cities, without the communication infrastructure.

And the communication infrastructure that provides the bits to move between things is the internet. And so, you know, it’s sort of hard to pull that apart. But the thing that lets people really interact at the information level, that includes finding each other for these videos, that includes, you know, when I clicked on a link to open this Zoom chat, that clicking on the link and how that happens, and the protocols that made that happen, that’s all web. So the web itself has really been absolutely instrumental in allowing communication to happen.

You know, big international conferences that were canceled at the last minute were held online this year, again virtually, and that virtual conference is made possible by the very technologies we’ve studied. Where the negative comes in is that the negatives of the web get amplified, and, you know, cultural differences, things like that, have an impact. But certainly in the States, we’re seeing the astounding growth of misinformation, the weaponization for politics of misinformation about COVID. A significant amount of that is, you know, bots and trolls.

In fact, the largest networks of pro-COVID, call it, and anti-COVID, whatever that means, you cover your face or not, those sorts of things, both have the same origin, from the same trolling point, which appears to be mostly trying to push divisiveness rather than push a particular point of view. So again, understanding how that works.

Understanding the math of that and being able to show people that will allow both people to understand what’s happening, I hope. But also, you know, engineers to understand what we might do to improve that situation. And what we can’t do.

Noshir Contractor: Thank you again, Jim, for taking time to talk with us about the history of web science and how it all got started, because you have a very privileged position from which to share those insights with us, since you were there at the time it actually happened, and were also very involved in making it happen.

Jim Hendler:​ Thank you.

Noshir Contractor:​ Untangling the Web is a production of the Web Science Trust. Thanks to Carmen Chan for editing and technical assistance. I am Noshir Contractor. Thanks for listening.