Shout Future

Educational blog about Data Science, Business Analytics and Artificial Intelligence.

In this industry, it's a tired old cliche to say that we're building the future. But that's true now more than at any time since the Industrial Revolution. The proliferation of personal computers, laptops, and cell phones has changed our lives, but by replacing or augmenting systems that were already in place. Email supplanted the post office; online shopping replaced the local department store; digital cameras and photo sharing sites such as Flickr pushed out film and bulky, hard-to-share photo albums. AI presents the possibility of changes that are fundamentally more radical: changes in how we work, how we interact with each other, how we police and govern ourselves.

Fear of a mythical "evil AI" derived from reading too much sci-fi won't help. But we do need to ensure that AI works for us rather than against us; we need to think ethically about the systems that we're building. Microsoft's CEO, Satya Nadella, writes:
The debate should be about the values instilled in the people and institutions creating this technology. In his book Machines of Loving Grace, John Markoff writes, 'The best way to answer the hard questions about control in a world full of smart machines is by understanding the values of those who are actually building these systems.' It's an intriguing question, and one that our industry must discuss and answer together.
What are our values? And what do we want our values to be? Nadella is deeply right in focusing on discussion. Ethics is about having an intelligent discussion, not about answers, as such—it's about having the tools to think carefully about real-world actions and their effects, not about prescribing what to do in any situation. Discussion leads to values that inform decision-making and action.
The word "ethics" comes from "ethos," which means character: what kind of a person you are. "Morals" comes from "mores," which basically means customs and traditions. If you want rules that tell you what to do in any situation, that's what customs are for. If you want to be the kind of person who executes good judgment in difficult situations, that's ethics. Doing what someone tells you is easy. Exercising good judgement in difficult situations is a much tougher standard.
Exercising good judgement is hard, in part, because we like to believe that a right answer has no bad consequences; but that's not the kind of world we have. We've damaged our sensibilities with medical pamphlets that talk about effects and side effects. There are no side effects; there are just effects, some of which you might not want. All actions have effects. The only question is whether the negative effects outweigh the positive ones. That's a question that doesn't have the same answer every time, and doesn't have to have the same answer for every person. And doing nothing because thinking about the effects makes us uncomfortable is, in fact, doing something.
The effects of most important decisions aren't reversible. You can't undo them. The myth of Pandora's box is right: once the box is opened, you can't put the stuff that comes out back inside. But the myth is right in another way: opening the box is inevitable. It will always be opened; if not by you, by someone else. Therefore, a simple "we shouldn't do this" argument is always dangerous, because someone will inevitably do it, for any possible "this." You may personally decide not to work on a project, but any ethics that assumes people will stay away from forbidden knowledge is a failure. It's far more important to think about what happens after the box has been opened. If we're afraid to do so, we will be the victims of whoever eventually opens the box.
Finally, ethics is about exercising judgement in real-world situations, not contrived situations and hypotheticals. Hypothetical situations are of very limited use, if not actually harmful. Decisions in the real world are always more complex and nuanced. I'm completely uninterested in whether a self-driving car should run over the grandmothers or the babies. An autonomous vehicle that can choose which pedestrian to kill surely has enough control to avoid the accident altogether. The real issue isn't who to kill, where either option forces you into unacceptable positions about the value of human lives, but how to prevent accidents in the first place. Above all, ethics must be realistic, and in our real world, bad things happen.
That's my rather abstract framework for an ethics of AI. I don't want to tell data scientists and AI developers what to do in any given situation. I want to give scientists and engineers tools for thinking about problems. We surely can't predict all the problems and ethical issues in advance; we need to be the kind of people who can have effective discussions about these issues as we anticipate and discover them.

Talking through some issues

What are some of the ethical questions that AI developers and researchers should be thinking about? Even though we're still in the earliest days of AI, we're already seeing important issues rise to the surface: issues about the kinds of people we want to be, and the kind of future we want to build. So, let's look at some situations that made the news.

Pedestrians and passengers

The self-driving car/grandmother versus babies thing is deeply foolish, but there's a variation of it that's very real. Should a self-driving car that's in an accident situation protect its passengers or the people outside the car? That's a question that is already being discussed in corporate board rooms, as it recently was at Mercedes, which decided that the company's duty was to protect the passengers rather than pedestrians. I suspect that Mercedes' decision was driven primarily by accounting and marketing: who will buy a car that will sacrifice the owner to avoid killing a pedestrian? But Mercedes made an argument that's at least ethically plausible: they have more control over what happens to the person inside the car, so better to save the passenger than to roll the dice on the pedestrians. One could also argue that Mercedes has an ethical commitment to the passengers, who have put their lives in the hands of its AI systems.
The bigger issue is to design autonomous vehicles that can handle dangerous situations without accidents. That's the real ethical choice. How do you trade off cost, convenience, and safety? It's possible to make cars that are more or less safe; AI doesn't change that at all. It's impossible to make a car (or anything else) that's completely safe, at any price. So, the ethics here ultimately come down to a tradeoff between cost and safety, to ourselves and to others. How do we value others? Not grandmothers or babies (who will inevitably be victims, just as they are now, though hopefully in smaller numbers), but passengers and pedestrians, Mercedes' customers and non-customers? The answers to these questions aren't fixed, but they do say something important about who we are.

Crime and punishment

COMPAS is commercial software used in many state courts to recommend prison sentences, bail terms, and parole. In 2016, ProPublica published an excellent article showing that COMPAS consistently scores blacks as greater risks for re-offending than whites who committed similar or more serious crimes.
Although COMPAS's developers have been secretive about the specifics of their software, ProPublica published the data on which its reports were based. Abe Gong, a data scientist, followed up with a multi-part study, using ProPublica's data, showing that the COMPAS results were not "biased." Abe is very specific: he means "biased" in a technical, statistical sense. Statistical bias is a statement about the relationship between the outputs (the risk scores) and the inputs (the data). It has little to do with whether we, as humans, think the outputs are fair.
Abe is by no means an apologist for COMPAS or its developers. As he says, "Powerful algorithms can be harmful and unfair, even when they're unbiased in a strictly technical sense." The results certainly had disproportionate effects that most of us would be uncomfortable with. In other words, they were "biased" in the non-technical sense. "Unfair" is a better word that doesn't bring in the trappings of statistics.
The output of a program reflects the data that goes into it. "Garbage in, garbage out" is a useful truism, especially for systems that build models based on terabytes of training data. Where does that data come from, and does it embody its own biases and prejudices? A program's analysis of the data may be unbiased, but if the data reflects arrests, and if police are more likely to arrest black suspects, while letting whites off with a warning, a statistically unbiased program will necessarily produce unfair results. The program also took into account factors that may be predictive, but that we might consider unfair: is it fair to set a higher bail because the suspect's parents separated soon after birth, or because the suspect didn't have access to higher education?
There's not a lot that we can do about bias in the data: arrest records are what they are, and we can't go back and un-arrest minority citizens. But there are other issues at stake here. As I've said before, I'm much more concerned about what happens behind closed doors than what happens in the open. Cathy O'Neil has frequently argued that secret algorithms and secret data models are the real danger. That's really what COMPAS shows. It is almost impossible to discuss whether a system is unfair if we don't know what the system is and how it works. We don't just need open data; we need to open up the models that are built from the data.
COMPAS demonstrates, first, that we need a discussion about fairness, and what that means. How do we account for the history that has shaped our statistics, a history that was universally unfair to minorities? How do we address bias when our data itself is biased? But we can't answer these questions if we don't also have a discussion about secrecy and openness. Openness isn't just nice; it's an ethical imperative. Only when we understand what the algorithms and the data are doing, can we take the next steps and build systems that are fair, not just statistically unbiased.

Child labor

One of the most penetrating remarks about the history of the internet is that it was "built on child labor." The IPv4 protocol suite, together with the first implementations of that suite, was developed in the 1980s, and was never intended for use as a public, worldwide, commercial network. It was released well before we understood what a 21st century public network would need. The developers couldn't foresee more than a few tens of thousands of computers on the internet; they didn't anticipate that it would be used for commerce, with stringent requirements for security and privacy; putting a system on the internet was difficult, requiring handcrafted static configuration files. Everything was immature; it was "child labor," technological babies doing adult work.
Now that we're in the first stages of deploying AI systems, the stakes are even higher. Technological readiness is an important ethical issue. But like any real ethical issue, it cuts both ways. If the public internet had waited until it was "mature," it probably would never have happened; if it had happened, it would have been an awful bureaucratic mess, like the abandoned ISO-OSI protocols, and arguably no less problematic. Unleashing technological children on the world is irresponsible, but preventing those children from growing up is equally irresponsible.
To move that argument to the 21st century: my sense is that Uber is pushing the envelope too hard on autonomous vehicles. And we're likely to pay for that—in vehicles that perhaps aren't as safe as they should be, or that have serious security vulnerabilities. (In contrast, Google is being very careful, and that care may be why they've lost some key people to Uber.) But if you go to the other extreme and wait until autonomous vehicles are "safe" in every respect, you're likely to end up with nothing: the technology will never be deployed. Even if it is deployed, you will inevitably discover risk factors that you didn't foresee, and couldn't have foreseen without real experience.
I'm not making an argument about whether autonomous vehicles, or any other AI, are ready to be deployed. I'm willing to discuss that, and if necessary, to disagree. What's more important is to realize that this discussion needs to happen. Readiness itself is an ethical issue, and one that we need to take seriously. Ethics isn't simply a matter of saying that any risk is acceptable, or (on the other hand) that no risk is acceptable. Readiness is an ethical issue precisely because it isn't obvious what the "right" answer is, or whether there is any "right" answer. Is it an "ethical gray area"? Yes, but that's precisely what ethics is about: discussing the gray areas.

The state of surveillance

In a chilling article, The Verge reports that police in Baltimore used a social media monitoring application called Geofeedia, together with photographs shared on Instagram, Facebook, and Twitter, to identify and arrest protesters. The Verge's report is based on a more detailed analysis by the ACLU. Instagram and the other companies quickly terminated Geofeedia's account after the news went public, though they willingly provided the data before it was exposed by the press.
Applications of AI to criminal cases quickly get creepy. We should all be nervous about the consequences of building a surveillance state. People post pictures to Instagram without thinking of the consequences, even when they're at demonstrations. And, while it's easy to say "anything you post should be assumed to be public, so don't post anything that you wouldn't want anyone to see," it's difficult, if not impossible, to think about all the contexts in which your posts might be used.
The ACLU suggests putting the burden on the social media companies: social media companies should have "clear, public, and transparent policies to prohibit developers from exploiting user data for surveillance." Unfortunately, this misses the point: just as you can't predict how your posts will be used or interpreted, who knows the applications to which software will be put? If we only have to worry about software that's designed for surveillance, our task is easy. It's more likely, though, that applications designed for innocent purposes, like finding friends in crowds, will become parts of surveillance suites.
The problem isn't so much the use or abuse of individual Facebook and Instagram posts, but the scale that's enabled by AI. People have always seen other people in crowds, and identified them. Law enforcement agencies have always done the same. What AI enables is identification at scale: matching thousands of photos from social media against photos from drivers' license databases, passport databases, and other sources, then taking the results and crossing them with other kinds of records. Suddenly, someone who participates in a demonstration can find themselves facing a summons over an old parking ticket. Data is powerful, and becomes much more powerful when you combine multiple data sources.
We don't want people to be afraid of attending public gatherings, or in terror that someone might take a photo of them. (A prize goes to anyone who can find me on the cover of Time. These things happen.) But it's also unreasonable to expect law enforcement to stick to methodologies from the 80s and earlier: crime has certainly moved on. So, we need to ask some hard questions—and "should law enforcement look at Instagram" is not one of them. How does automated face recognition at scale change the way we relate to each other, and are those changes acceptable to us? Where's the point at which AI becomes harassment? How will law enforcement agencies be held accountable for the use, and abuse, of AI technologies? Those are the ethical questions we need to discuss.

Our AIs are ourselves

Whether it's fear of losing jobs or fear of a superintelligence deciding that humans are no longer necessary, it's always been easy to conjure up fears of artificial intelligence.
But marching to the future in fear isn't going to end well. And unless someone makes some fantastic discoveries about the physics of time, we have no choice but to march into the future. For better or for worse, we will get the AI that we deserve. The bottom line of AI is simple: to build better AI, be better people.
That sounds trite, and it is trite. But it's also true. If we are unwilling to examine our prejudices, we will implement AI systems that are "unfair" even if they're statistically unbiased, merely because we won't have the interest to examine the data on which the system is trained. If we are willing to live under an authoritarian government, we will build AI systems that subject us to constant surveillance: not just through Instagrams of demonstrations, but in every interaction we take part in. If we're slaves to a fantasy of wealth, we won't object to entrepreneurs releasing AI systems before they're ready, nor will we object to autonomous vehicles that preferentially protect the lives of those wealthy enough to afford them.
But if we insist on open, reasoned discussion of the tradeoffs implicit in any technology; if we insist that both AI algorithms and models are open and public; and if we don't deploy technology that is grossly immature, but also don't suppress new technology because we fear it, we'll be able to have a healthy and fruitful relationship with the AIs we develop. We may not get what we want, but we'll be able to live with what we get.
Walt Kelly said it best, back in 1971: "we have met the enemy and he is us." In a nutshell, that's the future of AI. It may be the enemy, but only if we make it so. I have no doubt that AI will be abused and that "evil AI" (whatever that may mean) will exist. As Tim O'Reilly has argued, large parts of our economy are already managed by unintelligent systems that aren't under our control in any meaningful way. But evil AI won't be built by people who think seriously about their actions and the consequences of their actions. We don't need to foresee everything that might happen in the future, and we won't have a future if we refuse to take risks. We don't even need complete agreement on issues such as fairness, surveillance, openness, and safety. We do need to talk about these issues, and to listen to each other carefully and respectfully. If we think seriously about ethical issues and build these discussions into the process of developing AI, we'll come out OK.
To create better AI, we must be better people.
February 18, 2017
After establishing the relationship between two variables, we may be interested in estimating (predicting) the value of one variable given the value of the other. The variable predicted on the basis of the other variables is called the "dependent" or "explained" variable, and the other the "independent" or "predicting" variable. The prediction is based on the average relationship derived statistically by regression analysis. The equation, linear or otherwise, is called the regression equation or the explaining equation.

For example, if we know that advertising and sales are correlated, we may estimate the expected amount of sales for a given advertising expenditure, or the required amount of expenditure for attaining a given amount of sales.

Such relationships can be examined between, say, rainfall and agricultural production, the price of an input and the overall cost of the product, or consumer expenditure and disposable income. Thus, regression analysis reveals the average relationship between two variables, and this makes estimation or prediction possible.

 Definition: 

Regression is the measure of the average relationship between two or more variables in terms of the original units of the data.

Types Of Regression:

The regression analysis can be classified into:
a) Simple and Multiple
b) Linear and Non-Linear
c) Total and Partial

a) Simple and Multiple: 

In the case of a simple relationship, only two variables are considered, for example, the influence of advertising expenditure on sales turnover. In the case of a multiple relationship, more than two variables are involved.

Here, one variable is the dependent variable and the remaining variables are independent. For example, turnover (y) may depend on advertising expenditure (x) and the income of the people (z). Then the functional relationship can be expressed as y = f(x, z).
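As a rough sketch of such a multiple relationship, one simple choice of f is the linear form y = a + bx + cz, which can be fitted by ordinary least squares. The turnover, advertising, and income figures below are invented purely for illustration.

```python
# A minimal sketch of a multiple linear regression y = f(x, z) = a + b*x + c*z.
# The data below is invented purely for illustration.
import numpy as np

x = np.array([10, 12, 15, 18, 20], dtype=float)       # advertising expenditure
z = np.array([50, 55, 53, 60, 65], dtype=float)       # income of the people
y = np.array([110, 125, 138, 160, 175], dtype=float)  # sales turnover

# Design matrix with a column of ones for the intercept a.
A = np.column_stack([np.ones_like(x), x, z])
(a, b, c), *_ = np.linalg.lstsq(A, y, rcond=None)

print(f"y = {a:.2f} + {b:.2f}*x + {c:.2f}*z")
```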

b) Linear and Non-linear: 

Linear relationships are based on a straight-line trend, the equation of which has no power higher than one. But remember, a linear relationship can be either simple or multiple. Normally a linear relationship is used because, besides its simplicity, it has better predictive value: a linear trend can easily be projected into the future. In the case of a non-linear relationship, curved trend lines are derived; their equations are, for example, parabolic.

c) Total and Partial: 

In the case of total relationships, all the important variables are considered. Normally, they take the form of multiple relationships, because most economic and business phenomena are affected by a multiplicity of causes. In the case of a partial relationship, one or more variables are considered, but not all, thus excluding the influence of those not found relevant for a given purpose.

Linear Regression Equation: 

If two variables have a linear relationship, then as the independent variable (X) changes, the dependent variable (Y) also changes. If the different values of X and Y are plotted, then two straight lines of best fit can be made to pass through the plotted points. These two lines are known as regression lines, and they are based on two equations known as regression equations. These equations give the best estimate of one variable for a known value of the other. The equations are linear.
The linear regression equation of Y on X is

Y = a + bX ....... (1)

and that of X on Y is

X = a' + b'Y ....... (2)

where a, b, a', b' are constants.

From equation (1) we can estimate Y for a known value of X, and from equation (2) we can estimate X for a known value of Y.
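As a rough illustration of equation (1), here is how the constants a and b can be estimated by least squares; the regression of X on Y in equation (2) is obtained the same way with the roles of the variables swapped. The advertising and sales figures below are invented, not real data.

```python
# A minimal sketch: estimate the regression line Y = a + bX by least squares.
# The advertising (X) and sales (Y) figures are invented for illustration only.
import numpy as np

X = np.array([2, 3, 5, 7, 9], dtype=float)     # advertising expenditure
Y = np.array([4, 5, 7, 10, 15], dtype=float)   # sales

# b = sum((X - X_mean)(Y - Y_mean)) / sum((X - X_mean)^2),  a = Y_mean - b * X_mean
b = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
a = Y.mean() - b * X.mean()

print(f"Y = {a:.2f} + {b:.2f} X")          # equation (1)
print("Estimated Y at X = 6:", a + b * 6)  # predicting Y for a known X
```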
February 18, 2017
In the distributed age, news organizations are likely to see their stories shared more widely, potentially reaching thousands of readers in a short amount of time. At the Washington Post, we asked ourselves if it was possible to predict which stories would become popular. For the Post newsroom, this would be an invaluable tool, allowing editors to more efficiently allocate resources to support a better reading experience and richer story package, adding photos, videos, links to related content, and more, in order to more deeply engage the new and occasional readers clicking through to a popular story.
Here’s a behind-the-scenes look at how we approached article popularity prediction.

Data science application: Article popularity prediction

There has not been much formal work in article popularity prediction in the news domain, which made this an open challenge. For our first approach to this task, Washington Post data scientists identified the most-viewed articles on five randomly selected dates, and then monitored the number of clicks they received within 30 minutes after being published. These clicks were used to predict how popular these articles would be in 24 hours.
Using the clicks 30 minutes after publishing yielded poor results. As an example, here are five very popular articles:
[Figures 1–5: screenshots of five very popular articles. Credit: Shuguang Wang and Eui-Hong (Sam) Han, used with permission.]
Table 1 lists the actual number of clicks these five articles received 30 minutes and 24 hours after being published. The takeaway: looking at how many clicks a story gets in the first 30 minutes is not an accurate way to measure its potential for popularity:
Table 1. Five popular articles.

Article            # clicks @ 30 mins    # clicks @ 24 hours
9/11 Flag          6,245                 67,028
Trump Policy       2,015                 128,217
North Carolina     1,952                 11,406
Hillary & Trump    1,733                 310,702
Gary Johnson       1,318                 196,798

Prediction features

In this prediction task, Washington Post data scientists have explored four groups of features: metadata, contextual, temporal, and social features. Metadata and contextual features, such as authors and readability, are extracted from the news articles themselves. Temporal features come mainly from an internal site-traffic collection system. Social features are statistics from social media sites, such as Twitter and Facebook.
Figure 6 lists all of the features we used in this prediction task. (More details about these features can be found in the paper "Predicting the Popularity of News Articles," on which we collaborated with Dr. Naren Ramakrishnan and Yaser Keneshloo from the Discovery Analytics Center at Virginia Tech.)
Figure 6. List of features used. Credit: Yaser Keneshloo, Shuguang Wang, Eui-Hong Han, Naren Ramakrishnan, used with permission.

Regression task

Figure 7 illustrates the process that we used to build regression models. In the training phase, we built several regression models using 41,000 news articles published by the Post. To predict the popularity of an article, we first collected all features within 30 minutes after its publication, and then used pre-trained models to predict its popularity in 24 hours.
Figure 7. Statistical modeling. Credit: Shuguang Wang and Eui-Hong (Sam) Han, used with permission.
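We won't detail the exact regression algorithm here, but as a rough sketch of the training and prediction flow in Figure 7, assuming a feature matrix built from the kinds of features in Figure 6 and 24-hour click counts as the target, one might do something like the following with scikit-learn. The file names, the choice of gradient boosting, and the log-transform of the target are illustrative assumptions, not the Post's actual pipeline.

```python
# A hypothetical sketch of the training and prediction phases in Figure 7.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

# X_train: one row per article; columns = metadata, contextual, temporal, and
# social features collected 30 minutes after publication (placeholder files).
X_train = np.load("features_30min.npy")
# y_train: total clicks 24 hours after publication.
y_train = np.load("clicks_24h.npy")

model = GradientBoostingRegressor()
# Click counts are heavy-tailed, so predict log(clicks) rather than raw counts.
model.fit(X_train, np.log1p(y_train))

# 10-fold cross validation, analogous to the evaluation summarized in Table 2.
scores = cross_val_score(model, X_train, np.log1p(y_train), cv=10, scoring="r2")
print("mean R^2 across folds:", scores.mean())

# Predicting a new article's 24-hour popularity from its 30-minute features.
x_new = np.load("new_article_features.npy").reshape(1, -1)
predicted_clicks = np.expm1(model.predict(x_new))[0]
print("predicted clicks at 24 hours:", int(predicted_clicks))
```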

Evaluation

To measure the performance of the prediction task, we conducted two evaluations. First, we conducted a 10-fold cross validation experiment on the training articles. Table 2 enumerates the results of this evaluation. On average, the adjusted R2 is 79.4 (out of 100) with all features. At the same time, we realized that metadata information is the most useful feature aside from the temporal clickstream feature.
Table 2. 10-fold cross validation results.

Features              Adjusted R2
Baseline              69.4
Baseline + Temporal   70.4
Baseline + Social     72.5
Baseline + Context    71.1
Baseline + Metadata   77.2
All                   79.4
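For reference, adjusted R2 corrects the ordinary R2 for the number of features used relative to the number of articles. A small helper, shown here with an invented feature count, might look like this:

```python
# Adjusted R^2 penalizes R^2 for the number of features p relative to the
# number of samples n. The feature count below is illustrative only.
def adjusted_r2(r2, n, p):
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# e.g., an R^2 of 0.80 on 41,000 training articles with, say, 40 features
print(round(adjusted_r2(0.80, n=41_000, p=40), 4))
```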
The second evaluation was done in production after we deployed the prediction system at the Post. Using all articles published in May 2016, we got an adjusted R2 of 81.3 (out of 100).
Figure 8 shows scatter plots of prediction results for articles published in May 2016. The baseline system on the left uses a single feature: the total number of clicks at 30 minutes. On the right is a more complete system using all features listed in Figure 6. The red lines in each plot are the lower and upper error bounds. Each dot represents an article, and, of course, ideally all dots would fall within the error bounds. As you can see, there are many more errors in the baseline system.
Figure 8. Scatter plots of prediction results (May 2016). Credit: Shuguang Wang and Eui-Hong (Sam) Han, used with permission.

Production deployment

We built a very effective regression model to predict the popularity of news articles. The next step was to deploy it to production at the Post.
Prediction quality depends on the accuracy of the features and on how quickly we can obtain them, so we preferred to build the prediction task as a streaming service that collects up-to-date features in real time. This comes with a challenge, however: we have to process tens of millions of click-data points every day to predict the popularity of thousands of Post articles. A streaming infrastructure lets us make these predictions quickly, with minimal delay.

Architecture

Figure 9 illustrates the overall architecture of the prediction service in the production environment at the Post. Visitors who read news articles generate page view data, which is stored in a Kafka server and then fed into back-end Spark Streaming services. Other features such as metadata and social features are collected by separate services, and then fed into the same Spark Streaming services. With all these collected features, prediction is done with a pre-trained regression model, and results are stored to an HBase server and also forwarded to the Kafka server. The Post newsroom is also alerted of popular articles via Slack and email.
Figure 9. System architecture. Credit: Shuguang Wang and Eui-Hong (Sam) Han, used with permission.

Spark Streaming in clickstream collection

The Spark Streaming framework is used in several components in our prediction service. Figure 10 illustrates one process we use Spark Streaming for: to collect and transform the clickstream (page view) data into prediction features.
The clickstream stored in Kafka is fed into Spark Streaming in real time. The streaming process converts this real-time stream into smaller batches of clickstream data. Each batch is converted into a simplified form, and then into a format the pre-trained regression model can easily consume. Finally, the page view features are stored in the HBase database.
Figure 10. Clickstream processing. Credit: Shuguang Wang and Eui-Hong (Sam) Han, used with permission.
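Below is a minimal, hypothetical sketch of this pipeline in PySpark. The topic name "pageviews", the HBase table "article_features", the one-minute batch interval, and the use of happybase for the HBase writes are all illustrative assumptions, not the Post's actual code.

```python
# A minimal sketch of the Kafka -> Spark Streaming -> HBase clickstream pipeline.
import json

from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils  # Spark Streaming Kafka integration

sc = SparkContext(appName="clickstream-features")
ssc = StreamingContext(sc, 60)  # one micro-batch per minute (illustrative)

# Real-time page-view events from Kafka, converted into micro-batches.
stream = KafkaUtils.createDirectStream(
    ssc, ["pageviews"], {"metadata.broker.list": "kafka:9092"})

def to_click(record):
    """Simplify a raw page-view event into (article_id, 1)."""
    event = json.loads(record[1])  # record is a (key, value) pair
    return (event["article_id"], 1)

def save_batch(rdd):
    """Persist accumulated per-article click counts to HBase (via happybase)."""
    import happybase
    conn = happybase.Connection("hbase-host")
    table = conn.table("article_features")
    for article_id, clicks in rdd.collect():
        table.counter_inc(article_id.encode("utf-8"), b"f:clicks", clicks)
    conn.close()

stream.map(to_click).reduceByKey(lambda a, b: a + b).foreachRDD(save_batch)

ssc.start()
ssc.awaitTermination()
```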

System in the real world

Washington Post journalists monitor predictions using real-time Slack and email notifications. The predictions can be used to drive promotional decisions on the Post home page and social media channels.
We created a Slack bot to notify the newsroom if, 30 minutes after being published, an article is predicted to be extremely popular. Figure 11 shows Slack notifications with the current number of clicks and the forecasted number of clicks at 24 hours.
Figure 11. Slack bot. Credit: Shuguang Wang and Eui-Hong (Sam) Han, used with permission.
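As a hypothetical illustration of such an alert (the webhook URL, the 50,000-click threshold, and the message format are invented, not the Post's actual bot):

```python
# A minimal, hypothetical sketch of a Slack alert for articles predicted to be
# extremely popular. The webhook URL and the 50,000-click threshold are invented.
import requests

SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/XXX/YYY/ZZZ"  # placeholder
ALERT_THRESHOLD = 50_000  # predicted clicks at 24 hours

def notify_newsroom(article_title, clicks_30min, predicted_clicks_24h):
    """Post a message to Slack when the 24-hour prediction crosses the threshold."""
    if predicted_clicks_24h < ALERT_THRESHOLD:
        return
    message = (f"*{article_title}* has {clicks_30min:,} clicks after 30 minutes "
               f"and is predicted to reach {predicted_clicks_24h:,} clicks in 24 hours.")
    requests.post(SLACK_WEBHOOK_URL, json={"text": message})

# Example call, using the "9/11 Flag" numbers from Table 1.
notify_newsroom("9/11 Flag", 6245, 67028)
```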
We also automatically generate emails that gather that day's predictions and summarize articles' predicted and actual performance at the end of the day. Figure 12 shows an example of these emails. Each email contains the publication time, predicted clicks, actual clicks in the first 30 minutes, actual clicks in the first 24 hours, and actual clicks from social media sites in the first 30 minutes.
Figure 12. Popularity summary email. Credit: Shuguang Wang and Eui-Hong (Sam) Han, used with permission.
In addition to this being a tool for our newsroom, we are also integrating it into Washington Post advertising products such as PostPulse. PostPulse packages advertiser content with related editorial content, and delivers a tailored, personalized advertisement to the target group. Figure 13 shows an example of this product in action, in which an advertiser's video on 5G wireless technology is paired with editorially produced technology articles. A member of the advertising team puts the package together and receives candidate editorial articles as recommendations to include in the package. These articles are ranked according to relevance and expected popularity.
Figure 13. PostPulse example. Credit: Shuguang Wang and Eui-Hong (Sam) Han, used with permission.

Practical challenges

We faced two main challenges when we deployed this service to production. First is the scale of the data. Each day, we process a huge and increasing amount of data for prediction; the system must scale with it, using limited resources. We profiled the service's performance in terms of execution time and identified that persistent storage (HBase) is a significant bottleneck. Writes and updates to HBase are expensive, so to reduce them we accumulate multiple updates before physically writing to HBase. This risks some data loss, and less accurate predictions, if the service crashes between two updates; we've tuned the system to strike a good balance so that updates are not delayed too long.
The second challenge is our dependence on the external services we use to collect various features. If these external APIs are not reachable, the prediction service should still be available. Thus, we adopted a decoupled microservice infrastructure, in which each feature collection process is a separate microservice. If one or more microservices are down, the overall prediction service remains available, just with reduced accuracy until these external services are back online.

Continuous experiments and future work

Moving forward, we will explore a few directions. First, we want to identify the time frame in which an article is expected to reach peak traffic (initial experiments have shown promising results). Second, we want to extend our prediction to articles not published by the Washington Post. Last but not least, we want to address distribution biases in the prediction process: articles can get much more attention when they are in a prominent position on our home page or spread through large channels on social media.
February 18, 2017