Algorithms writing news stories? How unambitious

Are algorithms the future of news writing? Wired had an interesting article on that topic last week, focusing on a company called Narrative Science that is already doing it. Here’s the excerpt Wired provides from a Narrative Science story:

Friona fell 10-8 to Boys Ranch in five innings on Monday at Friona despite racking up seven hits and eight runs. Friona was led by a flawless day at the dish by Hunter Sundre, who went 2-2 against Boys Ranch pitching. Sundre singled in the third inning and tripled in the fourth inning … Friona piled up the steals, swiping eight bags in all …

I know as a writer I’m expected to either cower in fear or boast that no algorithm can ever spin prose like mine. But I had a totally different reaction. At the point where algorithms are handling the news, why are we still using news stories?

The news story is, from an informational perspective, pretty unsophisticated. It’s a block of text, a headline, some tags. There’s barely any structure or metadata.

But for an algorithm to be able to report the news it would seem that you pretty much have to impose this kind of structure on the information, and it’s clear from Wired that that’s what Narrative Science does:

Narrative Science’s writing engine requires several steps. First, it must amass high-quality data. That’s why finance and sports are such natural subjects…

…So Narrative Science’s engineers program a set of rules that govern each subject, be it corporate earnings or a sporting event. But how to turn that analysis into prose? The company has hired a team of “meta-writers,” trained journalists who have built a set of templates. They work with the engineers to coach the computers to identify various “angles” from the data…

…Then comes the structure. Most news stories, particularly about subjects like sports or finance, hew to a pretty predictable formula, and so it’s a relatively simple matter for the meta-writers to create a framework for the articles. To construct sentences, the algorithms use vocabulary compiled by the meta-writers.
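To make that concrete: Narrative Science's actual engine is proprietary, but the workflow Wired describes (structured data in, an "angle" chosen by rules, a template filled with vocabulary) is easy to sketch. Everything below, field names and the angle rule included, is my invention for illustration:

```python
# A toy version of the pipeline Wired describes: structured game data in,
# an "angle" chosen by a rule, a template filled in. All field names and
# rules here are invented; Narrative Science's actual engine is proprietary.

game = {
    "loser": "Friona", "winner": "Boys Ranch",
    "loser_runs": 8, "winner_runs": 10, "innings": 5,
    "loser_hits": 7, "loser_steals": 8,
    "star": {"name": "Hunter Sundre", "hits": 2, "at_bats": 2},
}

def pick_angle(g):
    """A crude 'angle' rule: a high-scoring loss is more interesting."""
    return "high_scoring_loss" if g["loser_runs"] >= 8 else "routine_loss"

TEMPLATES = {
    "high_scoring_loss": (
        "{loser} fell {winner_runs}-{loser_runs} to {winner} in {innings} "
        "innings despite racking up {loser_hits} hits and {loser_runs} runs. "
        "{star_name} led the way, going {star_hits}-{star_at_bats} at the plate."
    ),
    "routine_loss": "{loser} lost to {winner}, {winner_runs}-{loser_runs}.",
}

def write_recap(g):
    return TEMPLATES[pick_angle(g)].format(
        star_name=g["star"]["name"],
        star_hits=g["star"]["hits"],
        star_at_bats=g["star"]["at_bats"],
        **g,
    )

print(write_recap(game))
```

Notice that the interesting artifact here is the game dictionary, not the paragraph it gets rendered into. The prose is just one lossy view of the data.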

My question is this: Why, when you’ve imposed all this structure on the information, do you package it in such a “dumb” format? Yeah, I get that people are accustomed to reading news articles, and that if you experiment with some new information format you risk users not understanding or embracing it.

But is there any reason to think that the news story is the ideal way to take in information? Yes, humans like narrative. But they’ll get that in magazine journalism. The basic news item – who won a game, what happened to a stock – doesn’t need to be digested as a story.

That’s why we have headlines, that’s why we use bullet points and bold stuff.

As Google’s Richard Gingras recently said:

“Do we not deserve to rethink the architecture of what a ‘story’ is, the form of presentation and narrative to meet the needs of people who are consuming, not just by articles?”

If you have an algorithm smart enough to parse events happening in the world and translate them into structured data, you ought to be dreaming a little bigger about how to present that data to your audience. The inverted-pyramid format made sense when turning news into data would have required an extra step. With algorithmic news, that structuring has already been done as a matter of necessity.


Algorithms and the future of divorce

In Chapter 21 of Thinking, Fast and Slow, Daniel Kahneman discusses the frequent superiority of algorithms over intuition. He documents a wide range of studies showing that algorithms tend to beat expert intuition in areas such as medicine, business, career satisfaction, and more. In general, the advantage of algorithms tends to show up in “low-validity environments,” which are characterized by “a significant degree of uncertainty and unpredictability.”*

Further, says Kahneman, the algorithms in question need not be complex:

…it is possible to develop useful algorithms without any prior statistical research. Simple equally weighted formulas based on existing statistics or on common sense are often very good predictors of significant outcomes. In a memorable example, Dawes showed that marital stability is well predicted by a formula:

frequency of lovemaking minus frequency of quarrels

You don’t want your result to be a negative number.

Kahneman concludes the chapter with an example of how this might be used practically: hiring someone at work.

A vast amount of research offers a promise: you are much more likely to find the best candidate if you use this procedure than if you do what people normally do in such situations, which is to go into the interview unprepared and to make choices by an overall intuitive judgment such as “I looked into his eyes and liked what I saw.”
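As I recall the chapter, the procedure Kahneman has in mind is strikingly low-tech: pick roughly six traits that are prerequisites for the job, score each independently on a 1-5 scale, add the scores, and commit in advance to the highest total. A minimal sketch, with made-up traits and scores; note that Dawes’s marital formula above has the same equal-weights shape, just with a weight of -1 on quarrels:

```python
# A minimal sketch of an equally weighted formula: score each trait
# independently on a 1-5 scale, sum, and commit to the highest total.
# The traits and scores are made up for illustration.

TRAITS = ["technical skill", "reliability", "communication",
          "diligence", "judgment", "teamwork"]

def total_score(scores):
    """Equal weights: the prediction is just the sum of the trait scores."""
    assert set(scores) == set(TRAITS)
    return sum(scores.values())

candidates = {
    "A": dict(zip(TRAITS, [4, 3, 5, 4, 3, 4])),
    "B": dict(zip(TRAITS, [5, 2, 3, 3, 4, 3])),
}

best = max(candidates, key=lambda name: total_score(candidates[name]))
print(best, {name: total_score(s) for name, s in candidates.items()})
```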

All of this makes me think of online dating. This is an area where we are transitioning from almost pure intuition to a mixture of algorithms and intuition. Though algorithms aren’t making any final decisions, they are increasingly playing a major role in shaping people’s dating activity. If Kahneman is right, and if finding a significant other is a “low-validity environment,” will our increased use of algorithms lead to better outcomes? What truly excites me about this is that we should be able to measure it. Of course, doing so will require very careful attention to the various confounding variables, but I can’t help but wonder: will couples who met online have a lower divorce rate in 20 years than couples who didn’t? Will individuals who spent significant time dating online be less likely to have been divorced than those who never tried it?
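Once the data existed, the comparison itself would be statistically mundane; something like a two-proportion z-test. The numbers below are entirely invented, and the confounders are the hard part, not the test:

```python
# Entirely hypothetical numbers. The test is trivial; the hard part is
# controlling for confounders (who selects into online dating, age, income).
from statsmodels.stats.proportion import proportions_ztest

divorced = [180, 240]    # couples who met online vs. offline
couples  = [1000, 1000]  # each group followed for 20 years

z, p = proportions_ztest(count=divorced, nobs=couples)
print(f"z = {z:.2f}, p = {p:.4f}")
```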

*One might reasonably object that this definition stacks the deck against intuition, and I think this aspect of the debate deserved a mention in the chapter. The focus on “low-validity environments” is, by definition, a focus on areas where intuition is lousy. So how shocking is it that these are the cases where other methods do better? And yet the conclusion is extremely valuable: even though we know that these “low-validity” scenarios are tough to predict, we still tend to overrate our ability to predict them via intuition and underrate the value of simple algorithms. So in the end this caveat, while worth making, doesn’t really take away from Kahneman’s point.


Initial thoughts on Eli Pariser

Eli Pariser, president of the board at MoveOn.org, has a new book out called The Filter Bubble, and based on his recent NYT op-ed and some interviews he’s done I’m extremely excited to read it. Pariser hits on one of my pet issues: the danger of Facebook, Google, etc. personalizing our news feeds in a way that limits our exposure to news and analysis that challenges us. (I’ve written about that here, here, and here.) In this interview with Mashable he even uses the same metaphor of feeding users their vegetables!


So, thus far my opinion of Pariser’s work is very high. But what kind of blogger would I be if I didn’t quibble? So here goes…

From the Mashable interview (the question is Mashable’s; the response is Pariser’s):

Isn’t seeking out a diversity of information a personal responsibility? And haven’t citizens always lived in bubbles of their own making by watching a single news network or subscribing to a single newspaper?

There are a few important ways that the new filtering regime differs from the old one. First, it’s invisible — most people aren’t aware that their Google search results, Yahoo News links, or Facebook feed is being tailored in this way.

When you turn on Fox News, you know what the editing rule is — what kind of information is likely to get through and what kind is likely to be left out. But you don’t know who Google thinks you are or on what basis it’s editing your results, and therefore you don’t know what you’re missing.

I’m just not sure that this is true. I completely recognize the importance of algorithmic transparency, given the enormous power these algorithms have over our lives. But it’s not obvious to me that we’re living in a less transparent world. Do we really know more about how Fox’s process works than about how Google’s does? It seems to me that in each case we have a rough sketch of the primary factors that drive decisions, but in neither case do we have perfect information.

But to me there is an important difference: Google knows how its process works better than Fox knows how its own works. Such is the nature of algorithmic decision-making. At least to the people who can see the algorithm, it’s quite easy to tell how the filter works. This seems fundamentally different from the Fox newsroom, where even those involved probably have imperfect knowledge of the filtering process.
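Here’s what I mean concretely. An algorithmic filter is, at bottom, a function somebody can read. The rule below is invented, but anyone with access to code like it could state the editing policy exactly, which is a claim no newsroom can make:

```python
# An invented editing rule -- but a legible one: anyone who can read
# this code can state the filtering policy exactly.

def score(story, user):
    s = 0.0
    if story["topic"] in user["clicked_topics"]:
        s += 2.0                      # boost topics the user has clicked
    if story["source"] in user["followed_sources"]:
        s += 1.0                      # boost sources the user follows
    s -= story["age_hours"] / 24.0    # penalize stale stories
    return s

def front_page(stories, user, n=10):
    """Rank stories by the rule above and keep the top n."""
    return sorted(stories, key=lambda st: score(st, user), reverse=True)[:n]
```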

Life offline might feel transparent, but I’m not sure it is. Back in November I wrote a post responding to The Atlantic’s Alexis Madrigal and a piece he’d written on algorithms and online dating. Here was my argument then:

Madrigal points out that dating algorithms are 1) not transparent and 2) can accelerate disturbing social phenomena, like racial inequity.

True enough, but is this any different from offline dating?  The social phenomena in question are presumably the result of the state of the offline world, so the issue then is primarily transparency.

Does offline dating foster transparency in a way online dating does not?  I’m not sure.  Think about the circumstances by which you might meet someone offline.  Perhaps a friend’s party.  How much information do you really have about the people you’re seeing?  You know a little, certainly.  Presumably they are all connected to the host in some way.  But beyond that, it’s not clear that you know much more than you do when you fire up OkCupid.  On what basis were they invited to the party?  Did the host consciously invite certain groups of friends and not others, based on who he or she thought would get along together?

Is it at least possible that, given the complexity of life, we are no more aware of the real-world “algorithms” that shape our lives?

So to conclude… I’m totally sympathetic to Pariser’s focus and can’t wait to read his book. I completely agree that we need to push for greater transparency with regard to the code and the algorithms that increasingly shape our lives. But I hesitate to call a secret algorithm less transparent than the offline world, simply because I’m not convinced anyone really understood how our offline filters worked either.


Google News: Trust the algorithm

I’ve written about the potential dangers of Google and Facebook using algorithms to recommend news, with the basic fear being that they’ll recommend stories that confirm my biases rather than “feed me my vegetables.” But Nieman Lab has an interview with the founder of Google News who has quite a different take on what he’s doing:

“Over time,” he replied, “I realized that there is value in the fact that individual editors have a point of view. If everybody tried to be super-objective, then you’d get a watered-down, bland discussion,” he notes. But “you actually benefit from the fact that each publication has its own style, it has its own point of view, and it can articulate a point of view very strongly.” Provided that perspective can be balanced with another — one that, basically, speaks for another audience — that kind of algorithmic objectivity allows for a more nuanced take on news stories than you’d get from individual editors trying, individually, to strike a balance. “You really want the most articulate and passionate people arguing both sides of the equation,” Bharat says. Then, technology can step in to smooth out the edges and locate consensus. “From the synthesis of diametrically opposing points of view,” in other words, “you can get a better experience than requiring each of them to provide a completely balanced viewpoint.”

“That is the opportunity that having an objective, algorithmic intermediary provides you,” Bharat says. “If you trust the algorithm to do a fair job and really share these viewpoints, then you can allow these viewpoints to be quite biased if they want to be.”

[emphasis from Nieman Lab]
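As I read it, Bharat is describing something like the sketch below: rather than requiring any single article to be balanced, surface the most strongly opposed takes on the same story and let the reader synthesize. The stance scores are my invention, not anything Google has disclosed:

```python
# A toy version of viewpoint pairing: given coverage of one story, pick
# the two most strongly opposed takes. Stance scores (-1 to 1) are an
# invented stand-in for whatever signal a real system would use.

def opposing_pair(articles):
    """Return the most negative-stance and most positive-stance articles."""
    ranked = sorted(articles, key=lambda a: a["stance"])
    return ranked[0], ranked[-1]

coverage = [
    {"source": "Outlet A", "stance": -0.8},
    {"source": "Outlet B", "stance": 0.1},
    {"source": "Outlet C", "stance": 0.9},
]

left, right = opposing_pair(coverage)
print(left["source"], "vs.", right["source"])
```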

A few thoughts:

1. It is very encouraging that Krishna Bharat is thinking about this, even if only as a piece of what he’s doing.

2. He’s right that whether or not you can trust the algorithm matters tremendously.

3. I remain skeptical that there’s any incentive for the algorithm to challenge me. Is Bharat claiming that challenging me produces something I want and am more likely to click on, so that this vision fits nicely with Google’s bottom line? Or is he suggesting that he and his team are worried about more than the bottom line?

Bottom line: it’s great he’s thinking about this but he needs to explain why we should really believe it’s a priority if he wants us to truly trust the algorithm.


Google won’t feed me my vegetables

I had a post months back called “Who will feed me my vegetables?” about the dangers of social news feeds. Here was the gist:

Consider politics. Facebook knows I self-designate as “liberal”. They know I’m a “fan” of Barack Obama and the Times’ Nick Kristof. They can see I’m more likely to “like” stories from liberal outlets.

So what kind of political news stories will they send my way? If the algorithm’s aim is merely to feed me stories I will like then it’s not hard to imagine the feed becoming an echo chamber.

Imagine if Facebook were designing an algorithm to deliver food instead of news.  It wouldn’t be hard to determine the kind of food I enjoy, but if the goal is just to feed me what I like I’d be in trouble.  I’d eat nothing but pizza, burgers and fries.

This is not just idle speculation. Here’s an entry today at the Google News Blog:

Last summer we redesigned Google News with new personalization features that let you tell us which subjects and sources you’d like to see more or less often. Starting today — if you’re logged in — you may also find stories based on articles you’ve clicked on before.

For signed-in users in the Personalized U.S. Edition, “News for You” will now include stories based on your news-related web history. For example, if you click on a lot of articles about baseball, we’ll make sure that you get a chance to see breaking baseball stories. We found in testing that more users clicked on more stories when we added this automatic personalization, sending more traffic to publishers.

Emphasis mine. In many ways this is obviously useful. But it carries real risks, and I bolded that last line to highlight the driving force behind these efforts: profit. What you should be reading is nowhere in the equation. Even what you want to read matters only to the extent that it serves up traffic and ad revenue.

Somewhat related: I’m increasingly curious about the possibility for “responsible algorithms” to add a new layer to the web experience for users on an opt-in basis. That’s something I’ll expand on in a future post.
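In the meantime, it’s worth being concrete about what “stories based on your news-related web history” probably means mechanically. The sketch below is a guess at the shape of the thing, not Google’s actual system; notice that a topic you never click can never surface:

```python
# A guess at the shape of click-history personalization -- not Google's
# actual system. Count the topics a user has clicked and rank stories by
# those counts. The objective baked in here is clicks, nothing else.
from collections import Counter

def personalize(stories, click_history, n=10):
    topic_counts = Counter(click["topic"] for click in click_history)
    # A story's score is how often the user clicked its topic before;
    # unclicked topics score zero and sink to the bottom.
    return sorted(stories, key=lambda s: topic_counts[s["topic"]],
                  reverse=True)[:n]

history = [{"topic": "baseball"}] * 12 + [{"topic": "politics"}] * 3
stories = [{"title": "Breaking baseball story", "topic": "baseball"},
           {"title": "Budget vote today", "topic": "politics"},
           {"title": "Drought update", "topic": "climate"}]

print([s["title"] for s in personalize(stories, history)])
```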


Netflix vs. the market

I wrote a lengthy post last year on how the networked information economy offers potential challenges to the market as the primary form of economic organization. In that post I noted the market’s efficiency at aggregating information through individual preferences, and used Netflix as an example of how networks are making information aggregation easier in a way that could potentially challenge markets. I want to expand on that. Here’s what I wrote last year:

Just as the market’s claim to dominance in motivating us is starting to be challenged, some are revisiting its dominance in aggregating information.  Sunstein explores the subject in Infotopia and highlights increasing efforts to aggregate human preferences online, including Amazon and Netflix.  If it’s obvious that we are doing better and better at aggregating information thanks to the Net, it’s less obvious how this might challenge the role of the market.

Imagine that Netflix has a small, set number of a rare movie to rent, and that it’s in high demand.  Who should get it first?  Auction the privilege off to the highest bidder, responds the free market advocate.  And, particularly in a scenario where customers have equal wealth at their disposal, this method has a lot to recommend it.  The market is incredibly efficient at allocating resources under ideal settings.  Tremendous gains in human welfare have been predicated on this fact.  But Netflix is developing sophisticated algorithms to use your preferences for movies you’ve seen to predict what movies you’ll like.  Is it so hard to believe that some day in the future an algorithm could – given the aim of maximizing viewer enjoyment – “beat the market” in determining how to distribute the movie?

I want to articulate this challenge in slightly greater detail. Let’s start small and simple…

There are 100 video customers, each of whom has a token for one free movie. Each customer browses the library of videos and picks out the one they want. They watch it and fill out a survey rating how much they enjoyed it. That’s scenario 1.

Scenario 2: Still 100 customers, but this time Netflix’s algorithm, looking at each customer’s past ratings, picks the movie for them. After watching, they rate their satisfaction.

Which group will be more satisfied? What does it mean if Netflix gets to a point where its algorithm wins out? And how about if we change the example by adding scarcity, as was implied in the quote above? Now there’s only one of each movie and we’re comparing Netflix doling out movies via its algorithm to customers bidding on movies in an auction of some kind.
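Here’s roughly how I’d set up the scarcity version as a simulation, under heavy stylization: each customer has a true enjoyment score for each movie; the algorithm assigns movies to maximize its noisy estimate of those scores (an assignment problem); the auction sells each movie to the highest noisy-value bidder, with everyone holding equal wealth, as in the quote above. Every distribution and noise level below is an arbitrary assumption:

```python
# A stylized simulation of the scarcity scenario: 100 customers, one copy
# of each of 100 movies. The "algorithm" assigns movies to maximize its
# noisy estimate of enjoyment; the "auction" gives each movie to whoever
# bids the most (bids are noisy private values, equal wealth assumed).
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
n = 100
true_enjoyment = rng.uniform(0, 10, size=(n, n))   # customer x movie

# Algorithm: maximize estimated total enjoyment (an assignment problem).
estimate = true_enjoyment + rng.normal(0, 2, size=(n, n))
rows, cols = linear_sum_assignment(-estimate)      # negate to maximize
algo_total = true_enjoyment[rows, cols].sum()

# Auction: movies sold one at a time to the highest bidder among
# customers who don't yet have a movie.
bids = true_enjoyment + rng.normal(0, 2, size=(n, n))
unassigned = set(range(n))
auction_total = 0.0
for movie in range(n):
    winner = max(unassigned, key=lambda c: bids[c, movie])
    auction_total += true_enjoyment[winner, movie]
    unassigned.remove(winner)

print(f"algorithm: {algo_total:.0f}, auction: {auction_total:.0f}")
```

The interesting design questions are exactly the ones this version hides: how accurate the algorithm’s estimates are, how bids relate to true enjoyment, and what happens once wealth isn’t equal.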

I’d like to ask for some help in thinking about this.

To my tech friends: obviously a ton of people are writing smart stuff about recommendation algorithms. What should I be reading on this?

To my economist and wonk friends: what do you think of the comparison of these two distribution mechanisms? What am I missing? I’d love help thinking about how you’d actually design the specifics of a challenge. While the comparison makes broad sense to me at the macro level I’d appreciate hearing from someone familiar with the economics of auctions, etc. It’s been a while since I’ve thought too much about basic micro.