How to make better predictions

Over the weekend I argued that people are really quite good at making predictions, when you zoom out and think of all the various ways we do so in science and in everyday life. Talk about how “predictions are hard, especially about the future” tends to concentrate on a narrow band of particularly difficult topics.

But even in those cases there are ways to improve your ability to predict the future. The classic book on the subject is Phil Tetlock’s Expert Political Judgment, which I recommend. And if you want the short version, and happen to have a subscription to The Financial Times, you’re in luck: Tim Harford’s latest column there gives a useful summary of Tetlock’s research.

His early research basically uncovered the role of personality in forecasting accuracy. More open-minded thinkers — prone to caution and the appreciation of uncertainty, who tended to weigh multiple mental models about how the world works against each other — make more accurate predictions than other people. (They still fail to do better than even simple algorithms.)

I’ll excerpt just the last bit, which focuses on Tetlock’s latest project, an ongoing forecasting tournament (I’m participating in the current round; it’s a lot of fun and quite difficult). Here’s the nickel summary of how to be a better forecaster, beyond cultivating open-mindedness:

How to be a superforecaster

Some participants in the Good Judgment Project were given advice on how to transform their knowledge about the world into a probabilistic forecast – and this training, while brief, led to a sharp improvement in forecasting performance.

The advice, a few pages in total, was summarised with the acronym CHAMP:

● Comparisons are important: use relevant comparisons as a starting point;

● Historical trends can help: look at history unless you have a strong reason to expect change;

● Average opinions: experts disagree, so find out what they think and pick a midpoint;

● Mathematical models: when model-based predictions are available, you should take them into account;

● Predictable biases exist and can be allowed for. Don’t let your hopes influence your forecasts, for example; don’t stubbornly cling to old forecasts in the face of news.
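The "Average opinions" step is easy to make concrete. Here is a minimal sketch, assuming each expert gives a probability for the same yes/no question. The function name and the forecasts are hypothetical, and reporting the median alongside the mean is just one reasonable reading of "pick a midpoint", not the Good Judgment Project's own procedure.

```python
from statistics import mean, median

def midpoint_forecast(expert_probs):
    """Combine several experts' probabilities for a single event.

    Hypothetical helper: returns both the mean and the median, so you can
    see how much one outlying expert pulls the simple average.
    """
    if not expert_probs:
        raise ValueError("need at least one forecast")
    for p in expert_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
    return {"mean": mean(expert_probs), "median": median(expert_probs)}

# Hypothetical forecasts from five experts who disagree.
forecasts = [0.20, 0.35, 0.40, 0.45, 0.90]
print(midpoint_forecast(forecasts))
```

Here the single 0.90 outlier pulls the mean up to about 0.46, while the median stays at 0.40. Which midpoint makes more sense depends on how much you trust the outlier.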

Humans are great at prediction

Humans are terrible at making forecasts, we’re often told. Here’s one recent example at Bloomberg View:

I don’t mean to pick on either of those folks; you can randomly name any 10 strategists, forecasters, pundits and commentators and the vast majority of their predictions will be wrong. Not just a little wrong, but wildly, hilariously off.

The author is talking specifically about the economy, and I mostly agree with what I think he’s trying to say. But I’m tired of this framing:

Every now and again, it is worth reminding ourselves just how terrible humans are at predicting what will happen in markets and/or the economy.

Humans are amazing at predicting the future, and yes that includes what will happen in the economy. It’s just that when we sit down to talk about forecasting, for some reason we decide to throw out all the good predictions, and focus on the stuff that’s just hard enough to be beyond our reach.

There are two main avenues through which this happens. The first is that we idolize precision, and ignore the fact that applying a probability distribution to a range of possibilities is a type of prediction. So the piece above is right that it’s incredibly difficult for an economist to predict exactly the number of jobs that will be added in a given month. But experts can assign probabilities to different outcomes. They can say with high confidence, for example, that the unemployment rate for August will be somewhere between, say, 5.5% and 6.5%.
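To see what a prediction stated as a range looks like in numbers, here is a minimal sketch. Purely for illustration, it assumes a forecaster's belief about the August unemployment rate is a normal distribution centered on 6.0% with a standard deviation of 0.25 percentage points. Neither figure comes from real data. Python's standard-library `statistics.NormalDist` then gives the probability assigned to the 5.5% to 6.5% band.

```python
from statistics import NormalDist

def prob_in_range(center, spread, low, high):
    """Probability that a normally distributed forecast lands in [low, high].

    Assumes (hypothetically) that uncertainty about the rate is normal,
    with mean `center` and standard deviation `spread`.
    """
    belief = NormalDist(mu=center, sigma=spread)
    return belief.cdf(high) - belief.cdf(low)

# Hypothetical belief: 6.0% expected, 0.25-point standard deviation.
p = prob_in_range(6.0, 0.25, 5.5, 6.5)
print(f"P(5.5% <= rate <= 6.5%) = {p:.3f}")  # about 0.954
```

Under these assumptions the band spans two standard deviations on either side of the center, so it captures roughly 95% of the belief. That is a confident, useful forecast, even though it says nothing about the exact monthly jobs number.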

You might think that’s not very impressive. But it’s a prediction, and a useful one. The knowledge that the unemployment rate is unlikely to spike over any given month allows businesses to feel confident in making investments, and workers to feel confident making purchases. I’m not saying we’re perfect at this probabilistic approach — recessions still surprise us. But it’s a legitimate form of prediction at which we do far better than random.

That example leads me to the second way in which we ignore good predictions. Talk of how terrible we are at forecasting ignores the “easy” cases. Will the sun rise tomorrow? Will Google still be profitable in a week? Will the price of milk triple over the next 10 days? We can answer these questions fairly easily, with high confidence. Yes, they seem easy. But they seem easy precisely because human knowledge and the scientific process have been so successfully incorporated into modern life.

And there are plenty of other predictions between these easy cases and the toughest ones that get thrown around. If you invest in the stock market for the long-term, you’re likely to make money. Somewhere around a third of venture-backed startups won’t make it to their 10th birthday. A few years down the line, today’s college graduates will have higher wages on average than their peers without a degree. None of these things are certain. But we can assign probabilities to them that would exceed that of a dart-throwing chimp. Perhaps you’re not impressed, but to me this is the foundation of modern society.

None of this is to say we shouldn’t hold pundits and experts more accountable for their bad predictions, or that we shouldn’t work to improve our predictions where possible. (And research suggests such improvement is often possible.)

But let’s not lose sight of all the ways in which we excel at prediction. Forecasting is a really broad category of thinking that is at the center of modern science. And compared to our ancestors, we’re pretty good at it.



Does social media make people less happy?

I’ve written a bunch over the last few years attacking the idea that social media is isolating, causes loneliness, etc. But Technology Review has a piece up on some new and seemingly credible research on the relationship between social media use and reported happiness:

But there is growing evidence that the impact of online social networks is not all good or even benign. A number of studies have begun to find evidence that online networks can have significant detrimental effects. This question is hotly debated, often with conflicting results and usually using limited varieties of subjects, such as undergraduate students…

…They found for example that face-to-face interactions and the trust people place in one another are strongly correlated with well-being in a positive way. In other words, if you tend to trust people and have lots of face-to-face interactions, you will probably assess your well-being more highly.

But of course interactions on online social networks are not face-to-face and this may impact the trust you have in people online. It is this loss of trust that can then affect subjective well-being rather than the online interaction itself.

Perhaps, like so many other things, the truth is that it depends. Use the internet to meet and stay in touch with a wide circle of people with whom you sometimes interact in person, and it probably makes you better off. Use it to monitor a wide circle of weak ties or strangers with whom you seldom have much interaction, and maybe it doesn’t.

I’ve been critical of even that line of thinking, because from the studies I’ve seen previously it’s seemed that online networking has been, on net, positive. So it should be noted that this research isn’t a confirmation of what we already knew. It runs contrary to it. Previous research has tended to find that online activity has been highly social, and hasn’t caused social isolation. (I’m not aware of anything specifically looking at the impact of social media on trust before this.)

No doubt we’ll continue to get a better picture of the interaction between social media and happiness over time. But for now, this research offers a note of caution for the optimists.

A reply to Ethan Zuckerman

There’s a great piece in The Atlantic this month on advertising, the internet’s “original sin”, by MIT’s Ethan Zuckerman. Advertising, he argues, has stripped us of our privacy, to the point of normalizing constant online surveillance. Add to that the fact that it’s not yet clear how good even these sophisticated, data-heavy ad products are at actually making money — at least for companies without the scale of a Facebook — and it’s hard to disagree with the sentiment that the internet deserves something better.

But what?

Here I want to push back a bit. The example that Zuckerman cites is Pinboard, an online bookmarking service that charges its users. (Full disclosure: I am pretty sure I’m the last active Delicious user on the planet.) Pinboard has an appealing premise:

Why Pay For Bookmarking?

It boils down to this: running a web service costs money. If you’re not paying for your bookmarking, then someone else is, and their interests may not be aligned with yours.

This is what Zuckerman has in mind when he writes:

One simple way forward is to charge for services and protect users’ privacy, as Cegłowski is doing with Pinboard. What would it cost to subscribe to an ad-free Facebook and receive a verifiable promise that your content and metadata wasn’t being resold, and would be deleted within a fixed window? Google now does this with its enterprise and educational email tools, promising paying users that their email is now exempt from the creepy content-based ad targeting that characterizes its free product. Would Google allow users to pay a modest subscription fee and opt out of this obvious, heavy-handed surveillance?

I should say, I love this option, and think it would be an improvement over the status quo. But I’m not sure this is actually the most plausible alternative to ad-supported sites and services.

Let’s take the hypothetical further: what if tomorrow advertising were no longer an option for web services and content providers? We’d see some users paying up for things like Pinboard, certainly. Perhaps even for a Facebook subscription.

But I’d argue for another possibility: that we’d continue to get lots of these services for free, minus a modest hosting fee paid to a separate party.

The site you’re reading right now runs on WordPress, an open-source content management system for which I pay nothing. My only cost is a few dollars a year to a hosting provider to keep the site online. This hosting provider has no connection to the company behind WordPress* or the community of WordPress developers who help maintain it.

That’s subtly different from the Pinboard model (which again, is a great model) in that the software comes free. All I’m paying for is storage, and storage is cheap.

If suddenly advertising weren’t an option, whether because it turned out to be a terrible business model or because we decided we cared about privacy, my hope is that something closer to the WordPress + hosting model might spring up to replace it.

You’d pay a few dollars a month for some storage somewhere, which in turn would offer, at the click of a button, installation of the various open source software packages to support things like social networking. These services would no longer be centralized, and so would work a bit more like email or RSS, but fundamentally most users probably wouldn’t even recognize the difference.

This distinction matters, in part because I suspect it’s bound to disappoint some people. For those interested in privacy, both alternatives are good. For those interested in as many people using the internet for as many things as possible, the cheaper, open alternative is better. But for those whose real interest is in getting paid, the open alternative is a disaster.

That’s not what Zuckerman or Pinboard creator Maciej Cegłowski are about. But my sense is there’s a serious faction of web-ad haters for whom this is the primary gripe. This is particularly true on the content side, where, unlike in software, an increasing number of workers find their skills no longer in demand.

To this group, the idea of customers just forking it over already feels righteous. And so it’s worth noting that, if advertising went away, that’s not mostly what would happen.

In fact, the shift to “open” content would work even more seamlessly than for web services. Making consumers pay to access any professional content might increase purchases on the margin, but more than that it would shift attention away from the pros and toward the amateurs. Independent blogging would come back in vogue, and we’d discover that, while the average amateur can’t create as engaging a list of images as the average pro, thousands of amateurs giving it a try will in a lot of cases produce something about as good. Luckily, the internet is great at filtering up that one good one, and bringing it to everyone’s attention. This model doesn’t work for everything, particularly certain expensive and time-consuming kinds of journalism. But for, say, political commentary, it works better than lots of paid commentators care to admit.

All of which makes me agree even more wholeheartedly that ad-backed business models aren’t an ideal way of supporting the internet. But if they suddenly went away, don’t expect me and everyone else to start paying up.

*It’s true that WordPress wouldn’t be where it is without Automattic, the company its founder created around it. But that’s not the only model for open source development. It’s certainly possible to offer mature, first-class open source software without a single company at the center, driving the initiative. My hope is that this is what we would see increasingly on an internet that relied less heavily on ads.

The mobile media paradox

Mobile apps are dominating media consumption, but getting users to use your app is really, really hard:

U.S. users are now spending the majority of their time consuming digital media within mobile applications, according to a new study released by comScore this morning. That means mobile apps, including the number 1 most popular app Facebook, eat up more of our time than desktop usage or mobile web surfing, accounting for 52% of the time spent using digital media. Combined with mobile web, mobile usage as a whole accounts for 60% of time spent, while desktop-based digital media consumption makes up the remaining 40%.

Apps today are driving the majority of media consumption activity, the report claims, now accounting for 7 out of every 8 minutes of media consumption on mobile devices. On smartphones, app activity is even higher, at 88% usage versus 82% on tablets.
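The comScore shares fit together with a little arithmetic. Here is a quick check that uses only the percentages quoted above. Mobile web’s share is the gap between all mobile time and app time, and apps’ share of mobile time can be compared with the report’s “7 out of every 8 minutes”. The variable names are just labels for the quoted figures.

```python
# Shares of total U.S. digital media time, as quoted from the comScore report.
app_share = 52       # percent of time spent in mobile apps
mobile_total = 60    # percent of time on mobile (apps + mobile web)
desktop_share = 40   # percent of time on desktop

# The quoted shares should account for all digital media time.
assert mobile_total + desktop_share == 100

# Mobile web is whatever mobile time isn't spent in apps.
mobile_web_share = mobile_total - app_share

# Apps' share of *mobile* time, versus the report's "7 out of every 8 minutes".
app_share_of_mobile = app_share / mobile_total
seven_of_eight = 7 / 8

print(f"mobile web: {mobile_web_share}% of all digital media time")
print(f"apps as a share of mobile time: {app_share_of_mobile:.1%}")
print(f"7 out of 8 minutes: {seven_of_eight:.1%}")
```

The two shares of mobile time (about 86.7% versus 87.5%) are close but not identical, so “7 out of every 8 minutes” is best read as a rounded figure.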

So apps are where media gets consumed. But that doesn’t mean media companies should throw all their efforts at mobile apps. Because getting users to download and use those apps is incredibly difficult:

Only about one-third of smartphone owners download any apps in an average month, with the bulk of those downloading one to three apps. …a “staggering 42% of all app time spent on smartphones occurs on the individual’s single most used app,” comScore reports.

For most media companies, taking advantage of mobile apps means optimizing content for the mobile apps of Facebook and Twitter, and for email. That might not sound ideal — no one wants to be dependent on a monolithic third party like Facebook to deliver their product — but that’s how it is, for now at least.

How to promote economic growth, in 1 paragraph

To hear pundits talk about it, it’s easy to conclude that we have no idea what policies will help with economic growth. After all, we’re debating whether we’re stuck in stagnation or about to witness a new era of technology-led expansion. But there is a set of policies the majority of economists believe will be growth enhancing, and — spoiler alert — cutting taxes isn’t on the list.

This is from a Foreign Affairs essay on automation and the economy by Erik Brynjolfsson, Andrew McAfee, and Michael Spence:

As for spurring economic growth in general, there is a near consensus among serious economists about many of the policies that are necessary. The basic strategy is intellectually simple, if politically difficult: boost public-sector investment over the short and medium term while making such investment more efficient and putting in place a fiscal consolidation plan over the longer term. Public investments are known to yield high returns in basic research in health, science, and technology; in education; and in infrastructure spending on roads, airports, public water and sanitation systems, and energy and communications grids. Increased government spending in these areas would boost economic growth now even as it created real wealth for subsequent generations later.

That’s not to say growth is primarily a function of policy. No doubt it’s exogenous to a significant degree. But that sort of agenda would almost certainly help.

The essay overall is a concise summary of the global pressures on labor (and even capital) created by digital technologies, and for the longer version Brynjolfsson and McAfee’s recent book The Second Machine Age is also quite good.

Beating the algorithm, for now

The New York Times has a fun interactive that lets you compete against an algorithm designed to predict which tweets will get the most retweets. The Times also has a story about the algorithm and its implications. Its takeaway:

That an algorithm can make these kinds of predictions shows the power of “big data.” It also illustrates a fundamental limitation of big data: Specifically, guessing which tweet gets retweeted is significantly easier than creating one that gets retweeted.

Sure. But predicting which of two phrasings will get more retweets is still plenty difficult, and since part of my job is social media, it’s totally a “skill” I supposedly have.

So I was pleased to beat the algorithm 18 to 13. But even in victory, it’s clear that this is a losing gambit for me, over time. First off, there’s a chance I just got lucky, especially since the story says the algorithm usually guesses right 67% of the time. (I’m not sure how the interactive works, and whether they picked a single set of harder examples, if mine were different from others, or what.)

But in any case, even if I wasn’t lucky, there’s a single question that to me is the bottom line in terms of humans’ race with machines: which of us do you think will get better faster?

If you do this same kind of test in two years, how much better will I be? Sure, maybe I’ll have improved a bit (although maybe not). But with more data at its disposal, faster processing power, new statistical techniques, etc., the room for an algorithm to improve is far greater.

The same holds for many other kinds of forecasting. That humans still beat algorithms at predicting, say, geopolitical events (I’m making this up) is interesting. But even as we get better, our progress is incremental, linear. It’s the algorithm that’s poised to win most improved.

Our patent problems go way beyond trolls

UPDATE: More recent data documents the serious uptick in patent troll litigation. It’s likely still true that the patent problem goes way beyond trolls, but they are a problem nonetheless. Recent research is here.

I did a Google Hangout with two intellectual property experts this week, and wrote an article to go along with it. The jumping off point was Tesla’s patent sharing announcement, but really it ended up being broader than that, covering the problems with our patent system and the possibility for reform.

One thing it was not really about was patent trolls, and it occurs to me based on some of the reaction to the article that I should have made this more explicit.

Here’s the chart from the post showing the explosion of patent litigation in the U.S.:



(If you’re curious about that spike at the end, read the update at the bottom of my post.)

The consequence of this dramatic increase is that patents have the effect of making innovation less profitable, rather than more so, in all industries except pharma and chemicals. In other words, when you count up the benefits to innovators from excluding others from their invention, and then subtract the cost of litigation, you get a negative number.

There are many reasons for this, among them the fact that in industries like software the “boundaries” around patents aren’t clear. So you have a patent and I have a patent and neither of us is quite sure what either patent does and doesn’t cover. That leads to a lot of unnecessary litigation, and beyond that just a lot of uncertainty.

But it’s worth spelling out that while patent trolls are a problem — one that needs to be addressed — they are not the primary driver of this explosion in litigation. Much of my post borrows from James Bessen of BU, one of the experts I interviewed, who has done research on this question. Here’s what he says in his book Patent Failure:

We also considered the role of patent “trolls,” which we define narrowly as individual inventors who do not commercialize or manufacture their inventions. One story claims that the increasing availability of patent litigators willing to work on contingency fees has spurred lawsuits by such trolls, who might otherwise be unable to afford litigation. The share of lawsuits initiated by public firms has not declined, however, nor has the share of lawsuits involving patents awarded to independent inventors increased. This suggests that the increase in litigation cannot be mainly attributed to patent “trolls,” at least through 1999. Of course, if we use a broader definition of “troll” that includes all sorts of patentees who opportunistically take advantage of poor patent notice to assert patents against unsuspecting firms, then troll-like behavior might be a more important explanation. Indeed, if patent notice is poor, then all sorts of patent owners might quite reasonably assert patents more broadly than they deserve. But then it is more appropriate to attribute the surge in litigation to poor patent notice, not to trolls per se.

As indefensible as the business model of companies like Intellectual Ventures is, that pure troll model does not itself explain the rise in patent litigation.

I wish I’d made this point even in passing in my HBR piece this week. It’s easy to blame the trolls, as well we should. But our patent problems go well beyond them.

Be Xerox not Apple

Having just finished the Steve Jobs bio, and so freshly reminded of how Apple borrowed the idea of the graphical user interface from Xerox to great effect (only to later turn around and call Microsoft a thief for doing the same), I enjoyed this back-and-forth on Xerox. It’s a piece of an extremely long Q&A between Felix Salmon and Jonah Peretti about… everything.

JP: The thing is, if you care about having an impact on the world, the too-early mode is the highest leverage point because you can have an idea, build a mock or a prototype of it, and then have those ideas find themselves in products that other people build that then scale up to massive.

FS: You’d rather be Xerox than Apple?

JP: People always talk about Xerox as a sad story.

FS: Maybe.

JP: I mean some people do. If all you value is money, then it’s a sad story. But if you think that the graphical user interface is a cool thing and you worked on the graphical user interface at Xerox, you can feel like, “This has a big impact on the world.”

FS: It’s a way of looking at the world through a lens of capital rather than labor. I’m sure the people who invented the graphical user interface at Xerox are doing very well for themselves right now.

JP: Right. And some of them have a sense of personal satisfaction that they had a big impact, even if they didn’t profit from it at the scale that they could have.

The history of the tech industry is full of case studies where we ask why some once great company couldn’t capitalize on a breakthrough it had created. It’s a fascinating question, but it’s worth remembering that it’s a question for firms more than it is for people. It’s easy to see why a firm would want to capture some of the financial value of a breakthrough, but for many people that’s not necessarily the goal. Sure, some great technologists truly are motivated by capturing that value. But not all of them; many are motivated by getting their innovations into as many hands as possible.

Sometimes, as Peretti says, the true impact gets made by the innovators who are too early, while the riches go to those who follow later. For lots of people, it’s the impact that matters.

Net neutrality is about more than small vs. big

With the FCC reportedly considering allowing paid “fast lanes” for internet traffic, the principle of net neutrality looks more at risk than ever. One of the big concerns of net neutrality advocates is that its absence might empower incumbent firms over newer, smaller, more innovative ones. That is a very valid and important concern.

But small firms represent only one sort of innovator, and arguably not the one most at risk from pay-to-play operations.

Here are a couple of examples of the protect-the-startups meme in recent coverage. From NYT:

Consumer groups immediately attacked the proposal, saying that not only would costs rise, but also that big, rich companies with the money to pay large fees to Internet service providers would be favored over small start-ups with innovative business models — stifling the birth of the next Facebook or Twitter.

And here’s an editorial from The Financial Times, arguing that net neutrality may no longer be the right goal:

The fine detail of the FCC’s decision will matter. The regulator will have to ensure its reforms do not create barriers to entry for small and innovative companies – the internet giants of the future.

At The New Yorker, net neutrality advocate and media scholar Tim Wu goes a bit broader:

We take it for granted that bloggers, start-ups, or nonprofits on an open Internet reach their audiences roughly the same way as everyone else. Now they won’t.

To the extent that we can protect innovative new firms from being crushed by incumbents before they get off the ground, that’s great. But the next Facebook or Twitter, while at risk, also has the ability to raise capital and spend it on faster content delivery. (The details of the FCC regulations aren’t yet clear, but it sounds like there will be a requirement that similar pay-to-play offers be available to all comers.)

An even bigger risk, then, is for non-professional content producers and for peer-to-peer, commons-based production. The bloggers Wu mentions could fall into this category, though if they’re using a proprietary platform like Tumblr or Medium they might not. A peer-to-peer project like Wikipedia has its nonprofit arm, but little ability to raise the capital necessary to ensure delivery. Less organized peer production efforts would be at even greater risk. A distributed network of independent bloggers might produce great content, but that content will be delivered more slowly than content produced by professionals, or by amateurs who’ve bought into a commercial platform. Suddenly, peer production is at a huge disadvantage relative to commercial production unless it has the weight of a commercial enterprise behind it.

The promise of large-scale production outside of firms or governments, from open source software to Wikipedia to independent blogging, was once one of the greatest promises of the internet. And it is even more at risk from the legalization of pay-to-play than are startups. Sure, incumbents might lean on startups who can’t afford to pay for faster delivery. But just as worrying is the thought that startups might raise venture capital to pay for faster delivery in order to crowd out commons-based peer production.

The net neutrality debate isn’t just about small vs. big. It’s also about commercial vs. the commons.