Episode 11 Transcript | Untangling the Web

Ravindran Balaraman: In a country like India, the number of people who are active on the web far exceeds the populations of most countries. But then there’s a significant fraction of our population that still doesn’t have access to the web and access to the services that are being provided on the web. So, there is this digital divide.

Noshir Contractor: Welcome to this episode of Untangling the Web, a podcast of the Web Science Trust. I am Noshir Contractor and I will be your host today. On this podcast we bring in thought leaders to explore how the web is shaping society and how society in turn is shaping the web.

Today I have the pleasure to welcome Ravindran Balaraman, who you just heard discussing the unique challenges people in India face accessing the web. He is the Mindtree faculty fellow and a professor in the Department of Computer Science and Engineering at the Indian Institute of Technology Madras. He also heads the Robert Bosch Centre for Data Science and Artificial Intelligence at IIT Madras, which is the leading interdisciplinary AI research center in India and India’s first lab to join the Web Science Trust Network of laboratories from around the world. He co-founded the India chapter of the Association for Computing Machinery’s Special Interest Group on Knowledge Discovery and Data Mining (SIGKDD for short), and he is currently the president of that chapter. His research is pushing the boundaries of reinforcement learning, social network analysis, and data and text mining, and his work bridges the gap between theory and practice in machine learning. In 2019 he was instrumental in hosting the first Web Science Symposium in India. He was recognized in 2020 as a Senior Member of AAAI (Association for the Advancement of Artificial Intelligence) for his significant accomplishments within the field of artificial intelligence. Welcome, Ravi.

Ravindran Balaraman: Noshir, thanks for having me on the podcast.

Noshir Contractor: It’s my pleasure. Thank you for joining us. I must say that I’m especially thrilled to have you as the first guest in this series to join us from the Global South. I’m absolutely delighted to have your insights about what web science means and what it can or can’t do in the emerging economies of the world. You have mentioned, for example, that there are many India-specific challenges that need to be addressed by web science. What do you think web science means in the context of countries like India?

Ravindran Balaraman: In a country like India, the number of people who are active on the web far exceeds the populations of most countries. But then there’s a significant fraction of our population that still doesn’t have access to the web and access to the services that are being provided on the web, right? So there is this digital divide, which people talked about when IT services became more popular. Now, with the growth of the web, as more of society’s interactions happen on the web, this kind of digital divide is getting exacerbated; it is getting much worse. Recently, a colleague of mine from our social sciences department and I have been looking at the impact of this worsening digital divide on the migrant population and, in particular, on their access to digital banking. With the enablement of digital banking, so much more of our commerce, even in India, now happens online. And there is a significant fraction of society that is getting excluded from that. Take the migrant population: they have actually been, you know, transplanted into a culture that is somewhat alien to them within their own country, and they don’t want to use these web services that are available to the rest of the country. They don’t want to use them because it makes them feel even more alien. This is not within their realm of experience. One of the theories that we are posing now is that maybe we should use techniques from AI to make sure that the web interfaces these people are getting access to remind them of home, as opposed to having an impersonal voice that talks to them like, okay, you want to do banking, press this number or press that, and then enter something here. So can we have somebody talk to them in their local dialect?

Their portal to the web then becomes more like a slice of home. Given that very few countries have internal migration on the scale that India does, it’s a problem that we literally have to buckle down and start solving.

Noshir Contractor: This is really intriguing. Can you give a concrete example of what kind of migrant population you’re talking about, and what can be done to help them feel less alienated and more at home?

Ravindran Balaraman: So let’s take one concrete example. Almost 80 to 90% of the construction workers in India are people who are displaced internally, and many of them move in from a particular state in the north of the country called Bihar. Most of the construction workers in my home state, which is the southernmost state in the country, come from Bihar. And it’s a completely different culture, not just the language: the way we dress, the climate, the kind of festivals that we celebrate here, the food that is available to them, everything is different. So this is really an alien country for them, and they tend to stick close to one another, right? And then you tell them, okay, the government is offering you new schemes, and all you have to do is go online and click a few buttons on your smartphone. And, surprisingly, all of them have smartphones; they use them to connect with their families back home, just calling them or linking up with them on WhatsApp. So they are happy to do that. It’s not that they can’t get online, but they can’t integrate with the larger web community, mainly because they just want to use it as a conduit for connecting back home. My colleague in the social sciences department has done a lot of study on migrant populations within India, so we are drawing on his insights into their assimilation into local society and trying to look at how that affects their assimilation into the web. The insight we have arrived at is that the web actually gives us an opportunity to give them a slice of home. We can tell them, okay, all your interactions with the web can happen in your local language, and then you log into a portal and it starts greeting you with local functions and local festival chat, asking you about your parents and things like that. So that’s the kind of idea that we are looking at.
But that’s a solution that has to come from India.

Noshir Contractor: I want to go back to something you touched on at the start. You mentioned that differences in access stemming from the digital divide have been exacerbated recently. Can you tell us a little bit more about the extent to which you think the presence of the web has contributed to this digital access divide, and/or the extent to which AI, now so pervasive on the web, is either mitigating or exacerbating these digital divide issues?

Ravindran Balaraman: Almost every web service that you see online, whether it is online communities like Facebook, professional communities like LinkedIn, or services like Amazon or Google, is strongly infused with AI. This infusion of AI is essentially making people rely more and more on these services because they are so much more convenient. Companies, looking at where the bulk of their revenues are coming from, are tending to move more online. And so it is getting harder and harder for people to get services locally. Those who are not online are actually getting fewer and fewer services.

So it is certainly exacerbating the divide. Right now we are really looking at how to make online access easier for people.

Noshir Contractor: It’s depressing in some ways to hear you say that AI might actually be exacerbating the divide, but you’re also exploring ways that AI can be deployed to mitigate some of these access and divide issues. Can you give a specific example of something happening in India that gives you hope?

Ravindran Balaraman: Language is an important thing, right? Most of the interfaces in India have now improved tremendously. A lot of companies are actually investing money in India to build local-language interfaces. My mother tongue is Tamil, and I can talk to my phone in Tamil and it does a perfectly fine job of transcribing it. Compare that with a Tamil keyboard: anyone who has tried using one knows that it is much, much harder to use than an English keyboard. I would rather type in English than in Tamil, but I would much rather talk in Tamil than in English. I can see more people getting integrated because of that.

I still think AI is, at the end of the day, a technology, right? It’s up to us to figure out how to use it. And there is a stronger awareness of that among the government as well as among some of the bigger enterprises.

We have to actually start providing all the services in a more accessible manner, and that realization is now gaining ground.

Noshir Contractor: You point out a really important issue that we tend to take for granted in many of the developed countries. In India alone, according to the 2001 Census of India, there were 122 major languages, and 30 of these languages were spoken by more than a million native speakers. So the kind of technology you just described, which works across these languages, really helps connectivity on the web in a way that we take for granted when we speak one dominant language in the West. Can you talk a little bit about the ways in which the study of artificial intelligence has in and of itself changed as a result of the web? If we go back to when the original Dartmouth project came together to coin the term artificial intelligence, AI was mostly seen in terms of rule-based systems, where you would provide certain rules and certain kinds of reasoning. Today, that seems somewhat antiquated, or is it?

Ravindran Balaraman: Oh, yeah, that’s a huge debate. AI seems to go through these phases, right? At one point in time, we say it is all about logic, reasoning, rules, and inferencing on them. And then at the next point in time, we say, oh, no, throw out everything; you have to learn everything from data, learn from scratch, it’s all about statistics. We are now seeing a strong swing towards the data-driven, statistical approach to AI, and part of the reason is the web. The web has really helped AI grow, because it’s giving you huge volumes of data. And it’s not just giving you data, right? It’s also giving you data with tags on it, because people are so good at labeling what they are putting out on the web. And because everything is becoming more and more digital, that data is readily getting digitized.

Some of the techniques that AI is using now have been around for a couple of decades, if not longer. They couldn’t succeed because they didn’t have the volume of data that the web has enabled us to gather, right? So in that way, the web has had a significant influence on the growth of AI.

At the same time, I also have to say that the web has caused us to kind of topple over and do things in a not-so-careful manner. If you look at some of the latest AI systems built completely on web data, you see that they tend to mimic the significant biases and prejudices that people bring to their writing, to the things that they post on the web. And if you don’t do careful curation of the data that you’re getting from the web, you’re going to systematize those biases by putting them into a machine. It actually makes it easier for people to make the argument that humans can be biased but machines can’t be. But what they fail to see is that the machine is going to be biased because it’s digesting the biased data that people are putting out on the web.

So in some sense, all that the web did was great, and it really gave a quantum boost to what AI was doing. But we are coming to a point where we have to start thinking very carefully about how we are going to take advantage of the data on the web.

Noshir Contractor: So Ravi, one of the examples that got a lot of attention, in the US at least, was that, exactly as you said, if you use AI to screen job candidates, these AI systems will reproduce the same biases in terms of gender and underrepresented minorities when interviewing and screening for job opportunities. And one of the issues that this raised was that very often these kinds of AI techniques give you a result but don’t necessarily explain how and why they got that result. Some of my friends joke that what AI lacks is a “why” button: if AI gives a result, you should be able to press a button that says why. This raised the whole issue of explainable AI. Can you talk a little bit about whether you see that as helping address the concern that you just raised? And also, how far are we from being able to have explainable AI?

Ravindran Balaraman: I strongly believe that before AI can truly be, you know, let loose in the wild, we need to solve the explainable AI question. The job screening case was pretty obviously AI going wrong, right? But there are a lot of subtle ways in which AI is influencing our behavior. In fact, if I go online, my phone starts recommending stories for me, and that is going to start coloring my view of what kinds of stories I am going to see, just because the AI system is learning my preferences and putting those stories out. It very quickly customizes itself to your preferences.

We need to have something similar to the “why” button. What people do nowadays as explainable AI is something like this: you ask why a particular image is appropriate for your search, right? Say I want to see a football match, and it shows me a picture of a football match. It might say something like, oh, there is something here in this top right corner that caused me to classify it as a football match. It can’t even tell you, okay, well, I think it’s a football match because there are like 10 people here and there is one guy carrying a ball. It basically says, okay, it is this part of the image that makes me think it is a football match. That’s certainly not a satisfying notion of explanation for people. So we are still quite a way from getting to explainability as humans understand it, and I’m not even sure how soon we will be able to get there.

But this is what I always tell people. You may not know how a motor vehicle runs or the details of an internal combustion engine, but you’re happy to drive a car, right? The reason you’re happy to drive a car around is because you know that there is at least somebody back there who understands it and has done all the testing for you. Suppose we come to a point where I, as an AI expert, can say that I know why the AI does this. As long as I can say, okay, I understand the explanations the AI is putting out and I’m happy to certify that the AI is doing the right thing, the general public just has to accept it: okay, it comes with a certification from an AI expert who understood what it is doing. But if you’re going to say that it has to get to a point where the general lay public, the end user, completely understands what the AI is doing, I think we are still a ways off from that.

Noshir Contractor: To what extent do you think that AI is enhancing trust on the web, or undermining it for the lay public?

Ravindran Balaraman: I can tell you what I see around me, at least in a large fraction of Indian society, right? We, unfortunately, tend to trust the web too much. The latest WhatsApp forward is taken as gospel. That’s mainly because the forward comes from a person that they know, and therefore they transfer the trust that they have in that person to the message that has been sent through them as well.

So the web in some sense has really worsened the impact of rumors and things like that, because you have a verifiable source, a person you know, who sent you the information, and we tend to ascribe the same trust to that piece of information as well, right? Even though the person who forwarded it to you might not have known where the message came from. When we got our news from newspapers and things like that, there was at least the hope that appropriate research had been done before things were put in print.

I’m not sure whether AI is yet playing a role here in making this worse or better. But I think AI can play a role in making things much, much better in terms of attaching provenance, or at least doing a very, very quick analysis of, you know, the consistency of information on the web. The biggest challenge in fact-checking all the information that floats around on the web is that you don’t really know what the ground truth is. And given the rate at which information is generated on the web, you can’t go after the ground truth either, right? But AI systems operating at scale can at least verify the consistency of the information that’s out there. If there are like 10 people saying one thing and 10 people saying something completely different, then at least you can say, hey, look, I don’t think this is right, because there are just too many different opinions about this. And everybody is also working on fake news detectors and so on and so forth. But who’s to say your news is fake?

Noshir Contractor: In the past year in particular, with the pandemic and the other global reckonings, there has been heightened focus on social justice issues. I want you to talk a little bit about what social justice means specifically within the Indian context. And to what extent do AI and the web, and their interplay with society, have an impact on social good in India?

Ravindran Balaraman: Throughout the country, the whole notion of social justice is very strongly embedded, right? In terms of opportunities, in jobs, in academics, everywhere. But it’s significantly different from state to state; there are places where these kinds of social inequities are much more pronounced. It could be that the same community is discriminated against in one state but not in another state. It’s a very, very, very complex dynamic within the country. So it’s not clear how we would, you know, build AI systems that are uniformly fair across the entire country.

And again, as far as social good is concerned, there are various issues for which people have built solutions in the West or in other countries that we struggle to implement in India, because implementing a system that works for just a million users, even though it will help those million users, is grossly unfair to the Indian population.

Noshir Contractor: Can you explain more of why it’s unfair?

Ravindran Balaraman: It’s unfair in the sense of: which million are you going to deploy to? Who do you choose? Of course, a whole bunch of other factors are going to come into play, in terms of which fraction of the population you actually have access to, so that you are able to deploy your system to them.

We really need to figure out a way to scale it much, much larger, a couple of orders of magnitude larger than what we can do right now with our systems, in order to make it truly deployable countrywide.

Noshir Contractor: It sounds like what you’re describing is a scaling problem. Help me understand why scaling is such a challenge.

Ravindran Balaraman: Well, let’s say that I’ve developed a system that tells me, okay, here are people with a certain medical condition who are having difficulty, you know, keeping to their drug regimen, and you have to do some intervention to help them. Say I come up with a system that can analyze a million people and then filter out 1,000 people who need this kind of intervention, and then I can actually send, you know, healthcare workers who can go help these 1,000 people. Now I scale it up, and suddenly I’m looking at 100,000 people who need this kind of intervention. It’s not just a question of computation being hard; it’s the actual deployment in the field that makes it much harder.

Noshir Contractor: So the challenge is not just in the technology. The web might help us identify those who are in need, but that still raises the question of how we are going to reach all those people in a physical, tangible way to provide what the technology has identified that they need. In closing, one of the questions that we’ve been asking our guests is this: as we have been going through 2020 and now into 2021, we’ve obviously been dealing with the pandemic as well as many global reckonings, sociocultural and political in nature. I was curious to get your take, specifically from an Indian vantage point, on how this period, 2020 and 2021, would have been different, for better or for worse, without the web.

Ravindran Balaraman: I can’t imagine 2020 without the web. We literally lived off the web. Not only was I working on the web, I was meeting friends, I mean, everything, right? So I just can’t imagine how we would have survived 2020 without the kind of online work and online meetings that were happening. I strongly feel that things would have been worse over the last year without the web. You might as well ask how we would have gone through 2020 without electricity.

Noshir Contractor: Yes, indeed, it’s become a utility that we take for granted in most cases now. Well, I want to thank you again very much, Ravi, for taking time to talk with us and specifically for giving us insights into how web science has a different lens when seen from the context of the developing world, in this case particularly from India. And we’re just delighted that IIT Madras, the Indian Institute of Technology Madras, became the first member of the Web Science Trust Network of laboratories from India, and you were certainly instrumental in making that happen.

And I wish you and your colleagues the very best in helping advance the notion of web science in developing countries. We will be looking forward to hearing more about those insights in the years to come. So thank you very much again.