Episode 25 Transcript | Untangling the Web

Nigel Shadbolt: The reason the web was taking off at scale, the reason we have these extraordinary constructs emerging, like the blogosphere, was that human beings were involved: human beings who were incentivized to participate, to share and join information together.

Noshir Contractor: Welcome to this very special 25th episode of Untangling the Web, a podcast of the Web Science Trust. I am Noshir Contractor and I will be your host today. On this podcast we bring in thought leaders to explore how the web is shaping society, and how society in turn is shaping the web.

My guest today is Professor Sir Nigel Shadbolt, one of the founders of web science. You just heard him talk about why studying the web goes beyond the technical. Nigel is Principal of Jesus College and Professorial Research Fellow in Computer Science at the University of Oxford. In 2009, he was appointed, along with Sir Tim Berners-Lee, as information adviser to the UK Government. This work led to the release of many thousands of public sector data sets as open data. He is the chairman and cofounder of the Open Data Institute, and a founder and chief technology officer of the ID protection company Garlik. He is a fellow of both the Royal Academy of Engineering and the British Computer Society, and was knighted in 2013 for services to science and engineering. Nigel has researched and published on topics ranging from cognitive psychology and computational neuroscience to the Semantic Web. Welcome, Nigel.

Nigel Shadbolt: Thanks, it’s great to be here.

Noshir Contractor: Take us back to what prompted you and your colleagues to come up with the idea of taking what was then a relatively young web, and recognizing the importance of creating a discipline called web science.

Nigel Shadbolt: I began my career a long time ago. My PhD was in artificial intelligence at the University of Edinburgh in the 1980s. And then I’d spent 15 years building an AI group within a department of psychology. I really found that extraordinarily enriching, you know, to understand the basis of human cognition, to understand, if you like, the basis of the existence proof of intelligent systems. And toward the end of my time at Nottingham, I got a series of PhD students who had been looking at this new explosive area of the web. And the web appeared as this extraordinary construct that suddenly brought data at scale together. So that was a turn for me.

And when I moved down to Southampton and joined Wendy Hall’s group, we were very much seeing the web as a decentralized data asset, as well as an extraordinary concept for combining human ingenuity. We can come to that later.

That project that we worked on together, through Directed Word, was called Advanced Knowledge Technologies, the AKT project, which brought leading universities in the UK together to look at how we could exploit this emerging construct, the web, and tools from knowledge engineering and elsewhere, machine learning as it was then, to try and understand data and knowledge at scale. And that brought me into contact with Tim Berners-Lee. It was our interest around the Semantic Web that really brought us together, but we were also in contact with people like Jim Hendler, who I’d known earlier because he also had prior history in artificial intelligence, and Danny Weitzner. So we were sat there sharing ideas around the Semantic Web and the challenges therein. And the more we got into that, the more we felt there was an itch that needed to be scratched, which was all around this idea that too often the challenge became reduced to one simply of technical architectures, when, of course, in fact, the reason the web was taking off at scale, the reason we have these extraordinary constructs emerging, like the blogosphere, was that human beings were involved: human beings who were incentivized to participate, to share and join information together. And as we shared our experiences (Danny with a background in law, Jim with a background in AI and cognitive science, somewhat like myself, Tim and Wendy), we realized that there were all sorts of aspects of what we were trying to understand in the web that would never be solved by simply appealing to the technical standards of web servers or web browsers.

So this immediately suggested a wider interdisciplinary need. And we had always been interested in convening larger groups to discuss these wider issues of the impact of the web. And we struggled for some time to think about whether this needed to be convened at all, or whether it would just simply take care of itself.

It is certainly the case that we were aware that there were cognate disciplines. But the unique phenomenon that the web presented us with was, for the first time, structures at scale that demanded to be explained and understood in their own terms. And we started with examples like the emergence of search engines at scale, like Google, the emergence of the blogosphere, the emergence of the beginnings of those social media platforms, the emergence of large collaborative activities like e-science. And I think we sat down and worked out that we wanted to persuade people that there were scientific questions that sat at the center of this intersection of methods that demanded their own singular attention.

Noshir Contractor: One of the things that you touched on briefly was the Semantic Web. For those who may not be familiar with that phrase, how does the Semantic Web distinguish itself from the web itself? We know the web as a set of web pages in its most primitive form that link to one another. But the Semantic Web goes beyond that.

Nigel Shadbolt: It originates out of this really interesting idea, that if you could take some key ideas that were around in artificial intelligence and knowledge representation at the time, and distribute them at scale in the web, there was this notion that a little semantics went a long way.

Now, what did that mean? It meant that, for the first time with the web protocols, we had ways of persistently pointing to objects of interest in the web, either concepts or relations, things in the world, things in cyberspace; we can argue about what those objects were. But they could be referenced, they could be dereferenced, with a URL: you click on a link, you get something back. So how could you think of the web as a linked graph of connected structure and content? We’re very used to thinking in those terms now, but back then, we didn’t have this way of thinking. So one of the early efforts was to generate a semantic markup language that went beyond, for example, what people understood at the time in HTML. And so the idea was to develop languages or ontologies that could be machine processed to describe the content, the semantics, the meaning of the content on the web.

Noshir Contractor: So can you give us a use case example? If a page has semantic markup, what could it do differently, or more effectively, than a page that simply has HTML?

Nigel Shadbolt: The Semantic Web of the early 2000s was a really rich place. So it wasn’t widely distributed enough. But there were, for example, ways of linking to academic texts. In fact, again, we see the legacy in the way that bibliographic content is linked and threaded nowadays: there are controlled vocabularies for publications, there are controlled vocabularies for certain sorts of work we do, and we, as researchers, have our own identifiers that describe something about the world we inhabit. And the vision was to try to do that much more at scale. So you know, these experiments, these deployments still exist.

I would treat the web as a kind of distributed database. And I could send queries out to the web to find and collect information about all the conferences, the academic conferences in a particular subdomain, who were the key speakers. And that could be interrogated directly off the markup languages and the databases representing the markup languages in those pages.

Noshir Contractor: All of a sudden, I feel that what I’m able to get from the web today pales in comparison to what you’re describing we could be getting from the web if we were using these kinds of semantic markup languages.

Nigel Shadbolt: (Laughs). I think many of us wondered, yeah. If it had become widely distributed, and it’s about network effects, you could get really powerful affordances. And for a while sites that were originally marked up using semantic web standards were extremely successful. The BBC, for example, ran its Olympics using this markup format, and it ran its natural history programs with a whole set of semantic web annotations that allowed you to literally query the content behind the web pages behind their great natural history programs.

I mean, some people think that there are important elements that have persisted, but not the full-blown inference at scale across the web. I think one of the things that got in the way is that the perfect was sometimes the enemy of the good, and the standards that were being promoted spent far too much time worrying about detailed niceties. The original web succeeded because, in a way, it allowed things to be a bit scruffy around the edges. You know, there’s that great phrase: to let the web scale, let the links fail. Pragmatism is always an important feature, I think, in understanding the different forces at work on a web at scale.

Noshir Contractor: You’ve just talked about one example of a challenge. Where in general, in the field of web science, have you seen progress? And where do you see continuing challenges for the next decade of web science?

Nigel Shadbolt: So I think one extraordinarily powerful area is the whole understanding of the network structure and topology of the web. And I remember in the original article, an article that Tim and I published in Scientific American, and then again when we published in Communications of the ACM, we knew a use case would be understanding how to extract insight from the web graph. I think that’s been a tremendously powerful success. I would also point to the push for a certain sort of openness around the underpinning data that was the resource, one of the key elements of the Semantic Web. It was no accident, in a way, that a number of us who were involved in that Semantic Web effort also became involved in the open data movement, because the key to success at scale has been open resources that everybody can exploit. And the greatest example of that in the earliest days of the web was effectively the Google phenomenon. You know, Google became the extraordinary organization it is off the back of open data and crowdsourced effort: humans making links that expressed their interest and relevance in content.

Noshir Contractor: You spoke about this example of how humans were crowdsourcing the development of links across the web. That is one of the early examples I imagine, of what you have written about in terms of social machines. Tell us what social machines mean to you and what you have been doing and what you’ve learned from that in terms of the web?

Nigel Shadbolt: When we launched the web science initiative back in 2006, when we were kind of thinking about that as an enterprise, we were very aware that the confluence of challenges, the deep synergies that existed between disciplines, was something that we needed to understand. But of course, it was always understood that the web worked because it connected people at scale. And people are extraordinary information processors. There, of course, we have all the richness of our own humanity connected at scale. And when Tim wrote his book, Weaving the Web, he made a reference in that book to the concept of the social machine as being a world in which the machines did all the kind of routine boring stuff and that allowed humans to flourish. The truth, of course, decades later is somewhat more complex. Some people worry that it’s the people being given the boring tasks to do. Why aren’t those things fully automated by our machines?

But what we do see in a social machine is this intermix of data assets, linkage, algorithms at scale on the web, and human cognitive capacity, and that interpenetration of machines and human problem-solving at scale defines a social machine. Now some of them are realized very simply. So the social media platforms, which link people together (and largely, it’s the linking and sharing of experience and moments that define them), when they began had very little in terms of fundamental processing of the content of those interactions. As time has gone on, the amount of inference that can be drawn over our interactions and the amount of additional services that can be woven into a web at scale, from query answering to speech recognition through to photo recognition, have grown. There is so much now that machines are doing to organize and manage our own information that the social machine construct is very helpful.

It reminds us that really quite complex phenomena are made up of components of quite simple interactions, you know, likes and preferences, linkages, assemblages, aggregations. And in what we’ve done in the past, me and my team and others, in defining how to provide a classification, a taxonomy of different kinds of social machines, and how to understand their characteristics, we see a spectrum from highly routine automated forms (various forms of citizen science would count) through to much more, effectively, creative exploratory tools.

Noshir Contractor: What would be one of your favorite social machines that most people might not have heard of?

Nigel Shadbolt: Well, I don’t know that people wouldn’t have heard of it, but one that I admired from the outset was a particular crowdsourced platform, Galaxy Zoo. These were astronomers who didn’t have enough fundamental research funds to spend on software engineers to build them automatic classification and recognition software. And they had all of this data coming in from the sky surveys, endless numbers of pictures of nebulae and stars, etc., and not enough machine processing to classify it.

And what they did in that work was provide a platform that allowed people to participate, and train them and induce them to be able to classify objects of interest. The human recognition and visual system is extremely powerful at categorizing and recognizing subtle distinctions, and is still very often best in class at recognizing differences and equivalences.

And so we had millions of images being processed by hundreds of thousands of volunteers, who ripped through this and began to make actually individual discoveries as well. Famously, participants in this citizen science effort are featured as authors on scientific papers of newly discovered astronomical phenomena. And that’s a lovely example where, again, what started out as a necessity for the scientists became a valuable resource in and of itself.

It introduced a wide community to the challenges of astronomy, and it solved the problem for them. And along the way, they learned something really interesting, which is: allow the people to participate. Because originally they hadn’t allowed a great deal of exchange or chat between users of the platform. Over time they realized that, through the side-channel conversations, some of these volunteers were becoming experts in their own right on forms of exotic phenomena that the astronomers were either too busy to look at or hadn’t noticed themselves.

Noshir Contractor: Which brings us to some of your more recent research. You made reference to the fact that social machines started out as computers doing the boring part and allowing the humans to focus more on the creative part. And you immediately gave a caveat that some people think that that might have flipped. In the new work that you’ve been doing, which is called human-centric AI, to what extent are you concerned about whether it will stay human-centric?

Nigel Shadbolt: If we go back to our concerns, right back, when we began the web science effort, it was very much from the outset about recognizing the intrinsic value and worth of the human element, you know, in all of this. And in an age of a resurgence of AI and algorithmic decision-making, with new powerful methods being deployed, the concern is, do we retain and confer the values that matter?

We want, essentially, to imagine building systems which augment us and don’t oppress us. And I think that’s why you see what some people call the renaissance of ethics in scientific areas where the concern is the maintenance of human values, and certainly AI ethics has a huge amount of attendant work around it.

And in my group, that materializes as concerns around choice: do we as individual consumers and citizens actually have effective choice when it comes to how our data is used? How our data is actually analyzed and aggregated? How can I effectively opt out? How can I exert more self-determination? That’d just be one example. The second would be, do we think hard enough about age-appropriate design, about how, as humans develop and grow up, their sensibilities and their ability for agency change? And with persuasive design methods, which are all about clever software engineers working out how to hit the sweet spot to get the kid to click, we’ve got to think hard about the pros and cons of all of that.

Noshir Contractor: You know, one of the things that has changed since you first began to study web science, is the ubiquity of data. Initially, people were using it on desktops and creating a certain small category of websites and so on. How has the ubiquity of data and its impact on the use of AI changed the way we think of the web and web science?

Nigel Shadbolt: I think it’s a fundamental change. I think you’re absolutely right. In a way, it was the game changer for AI. I think AI, as a field, got the web quite late. We were busy building knowledge-based systems, we were busy very much with this kind of desktop or at best server-based model of our knowledge assets. And then suddenly, connectivity was out of the box.

And it used to be a very significant effort to arrange and integrate your data assets together. And data curation was a huge challenge. Suddenly, we have billions of pages of English text to analyze, billions of images, and so on. So that’s been a game changer.

Of course, it’s introduced new classes of concern, which is that at scale, modern algorithms are extremely data-hungry. And do we know enough about the characteristics of data to understand that the outputs (the classifiers, the decision makers) are giving results that are representative? Well, they may be representative of the data that’s been collected. But is that data, even though it’s at large scale, representative of the problem you want to solve? And so I think we’ve become much more aware in data science of the need for understanding the qualities and characteristics of good and effective datasets for training.

There’s considerable concern now around the data assets themselves: how can we guarantee that they have not in some form or other been tampered with? How can we authenticate them? And my work with the Open Data Institute is very much now around things like data assurance. We talk about data institutions, new ways of putting governance structures in place. And again, it’s not just the technical; we need technical architectures to deliver the web at scale. But we also need institutional architectures to make sure that data is held and governed ethically and responsibly.

Noshir Contractor: Nigel, you were there at the very onset of both the web and web science, certainly. Where do you see the field of web science going? Unlike some other fields, the early stages of web science were nurtured by the Web Science Trust, by the Web Science Research Institute before that, and that set up a trajectory that was somewhat different from perhaps the launch of other disciplines. How do you see that has shaped web science? And where do you see it going now?

Nigel Shadbolt: At the time, we felt we wanted to use the convening power we had to draw attention to urgent research questions. In some sense, the questions were sufficiently urgent that they were going to get attended to. We thought it was important that there was a framework and, in fact, in trying to work out how we should be as broad-minded as possible when it came to methods and methodology and techniques, we spent quite a lot of time convening groups from other disciplines together. We spent a lot of time imagining what curricula could be that weren’t just about network science, for example; they went broader than that.

So the question is, how do you stop it from being everything? You know, how do you provide a practical, pragmatic solution? And I think that the problems that we sought to understand are still problems that we are seeking to understand. We have a better understanding, but I wouldn’t say we have perfect understanding. I remember that at one particular meeting we tried to imagine what the grand challenges for our field were. And the ubiquity of data, the power of computing, the developments in cognate disciplines, and just the sheer amount of development there has been in network-based analysis and graph-based databases (knowledge graph work, for example) have provided for a real acceleration of work in that field. And I think we could take a number of areas and say that, similarly, this was a topic that web science called out, people were contributing to it, and it has succeeded through time, developed and matured. It doesn’t worry me too much whether people necessarily say, “I want to put the label web science on this.” We often see in the development of subjects that field labels come and go. And indeed, what remains are the questions and the methods that have been put in place.

And I think what we would still argue is fundamentally important to web science is to be inclusive and admit and embrace diversity of disciplinary work. For me, the danger signs are always when people begin to patrol the boundaries of their discipline in a way that becomes exclusionary.

Noshir Contractor: Well, Nigel, you’ve been an incredible champion of this interdisciplinary work. And I suppose it comes somewhat easy to you, given that you yourself have had an interdisciplinary background: you’ve been interested in computer science, in AI, and in philosophy and psychology. And so it makes sense, Nigel, that someone with your own interdisciplinary background would be championing exactly that in the field of web science. And we’ve all been the beneficiaries of that. So thank you, again, Nigel, for what you’ve done to help advance web science. And certainly thank you very much for joining us today to share some of your ideas and your concerns.

Nigel Shadbolt: It’s been a great pleasure, Noshir — very, very good to talk with you.