Collective reasoning

The political scientist Henry Farrell recently resurfaced a post of his about knowledge and the role of criticism. (It’s pegged to a debate over whether we should or shouldn’t be less negative/critical online, which to me is the less interesting aspect.) It contains some excellent bits that dovetail with some of the posts I’ve done over the years on the social aspect of epistemology:

Reasoning has not evolved in the ways that we think it has – as a process of ratiocination that is intended independently to figure out the world. Instead, it has evolved as a social capacity – as a means to justify ourselves to others. We want something to be so, and we use our reasoning capacity to figure out plausible seeming reasons to convince others that it should be so. However (and this is the main topic of a more recent book by Hugo), together with our capacity to generate plausible sounding rationales, we have a decent capacity to detect when others are bullshitting us. In combination, these mean that we are more likely to be closer to the truth when we are trying to figure out why others may be wrong, than when we are trying to figure out why we ourselves are right…

…Our ability to see the motes in others’ eyes while ignoring the beams in our own can be put to good work, when we criticize others and force them to improve their arguments. There are strong benefits to collective institutions that underpin a cognitive division of labor.

This superficially looks to resemble the ‘overcoming bias’/’not wrong’ approaches to self-improvement that are popular on the Internet. But it ends up going in a very different direction: collective processes of improvement rather than individual efforts to remedy the irremediable. The ideal of the individual seeking to eliminate all sources of bias so that he (it is, usually, a he) can calmly consider everything from a neutral and dispassionate perspective is replaced by a Humean recognition that reason cannot readily be separated from the desires of the reasoner. We need negative criticisms from others, since they lead us to understand weaknesses in our arguments that we are incapable of coming at ourselves, unless they are pointed out to us.

This is why it’s useful to have an integrative mindset, or to triangulate the truth. We learn from others, sometimes by seeing an error in our reasoning and updating it–and other times by just granting the possibility that, though we can’t see any error in our own view, there’s still some chance that our critics are right.

Agents

Henry Farrell has a good post at Crooked Timber about AI, where he argues that we should think of AI systems as more similar to markets, bureaucracies, or other complex social structures than to a human brain. He ties it to the meme of the “shoggoth”. Here’s the gist:

“LLMs too are collective information systems that condense impossibly vast bodies of human knowledge to make it useful… Supervised fine tuning can make a raw LLM system sound more like a human being. This is the mask depicted in the shoggoth meme. Reinforcement learning – repeated interactions with human or automated trainers, who ‘reward’ the algorithm for making appropriate responses – can make it less likely that the model will spit out inappropriate responses, such as spewing racist epithets, or providing bomb-making instructions. This is the smiley-face.

LLMs can reasonably be depicted as shoggoths, so long as we remember that markets and other such social technologies are shoggoths too. None are actually intelligent, or capable of making choices on their own behalf. All, however, display collective tendencies that cannot easily be reduced to the particular desires of particular human beings. Like the scrawl of a Ouija board’s planchette, a false phantom of independent consciousness may seem to emerge from people’s commingled actions. That is why we have been confused about artificial intelligence for far longer than the current “AI” technologies have existed.”

I like this comparison a lot, and it reminds me of this CSET paper: “Machines, Bureaucracies, and Markets as Artificial Intelligence.”

But if you were really impressed by the power of LLMs, I think the response to Henry would be something like this:

Yes, LLMs have things in common with markets and bureaucracies, like their complexity and inscrutability and their usefulness for accomplishing various tasks. But markets and bureaucracies aren’t agents. They aren’t actors pursuing a goal. The market cannot “want” a certain price; it is not trying to sell a certain amount of a good. But is it so far-fetched to think that advanced AI systems are — or could soon be — agentic? That they might pursue goals in a way that a bureaucracy doesn’t?

But then consider this interview that Annie Duke did with Daniel Kahneman, on a totally different topic. He’s talking about his use of “System 1 and System 2” in his book Thinking, Fast and Slow to describe intuitive and reasoned thinking. For a long time, he says, there was a taboo in psychology against describing things in terms like this — because it suggested there were multiple actors inside your own head.

In Thinking, Fast and Slow, I deliberately broke that taboo and I said, yes, there are agents in your mind. It’s very good to think about in that way. And there is an interesting piece of reasoning behind it, I think. Not that I believe there are really two agents in the mind. I don’t. Even the word system is not applicable, and it’s more of a continuum and, you know, it’s much vaguer than it sounds. But it turns out that people’s minds are specialized for thinking about agents. They are surrounded by agents with intentions, with personalities, and we understand agents. Understanding categories or types—we are much poorer at it.

The point isn’t that there literally are agents in your mind. But it was a compelling way to describe an important phenomenon because our minds are good at thinking about agents. In this case, we see them (in our heads) even when they aren’t there.

That’s the risk when we think about AI: we are prone to imagining agents that aren’t really there. We think about the world as if everything around us is acting in pursuit of goals, just like we are.

Maybe AI agents will materialize, and in that sense they’ll be quite different from markets and bureaucracies. But we might also fall prey to imagining that they are agentic even when they aren’t, just like we’ve long done with other social systems.

Three cultures of governance

From Lessons from the Covid War:

The Covid war is a story of how our wondrous scientific knowledge has run far, far ahead of the organized human ability to apply that knowledge in practice…

…Real strategy is a notion of how someone plans concretely to connect ends with means. It is a theory of the ‘how.’ It is realized in blueprints of design, and by organizing, funding, training, and equipping many people to play their part in this choreography, people who sometimes must do very difficult things.

There are three main cultures in governance. One is a culture of programs and process. People get authority and money for a program. They administer processes. Many programs are controlled more by congressional committees than by their nominal agency heads. The programs are created to dispense money and they do that, following the given process.

Another is a culture of research and investigation, sometimes to offer advice or inform regulation. It is the dominant culture of high science, the realm of basic research and some ethical reflection. Its strengths were apparent in this crisis in, for example, the understanding of molecular biology and the NIH’s support for messenger RNA technology. The CDC nurtures cadres of disease detectives. It is a culture that can become insular, if the researchers mainly just judge each other, and judge only by their own cultural standards for methods, insight, and value.

A third is a culture of operations, to do things, to produce results in the field. It is a culture that can be resilient and adaptable, since the operators have to adjust to the real conditions they encounter in the field. It too can become insular and clannish in other ways. It is the dominant culture in most private firms, especially those that make products or deliver services. It is the dominant culture in a large part of the governance most Americans actually interact with every day, little of which is in the federal government.

The challenge in the Covid war, as in any great emergency, is to meld all these cultures in practice. It is very difficult to do this. What the Covid war exposed, what every recent crisis has exposed — even in Iraq and Afghanistan — is the erosion of operational capabilities in much of American civilian governance.

pp. 14-19

On mergers

Over the last six years I have edited and written dozens of articles about industry concentration, declining competition, “superstar firms”, and antitrust policy. For a long while, I was trying to catalog all that I’d been reading on this topic here (I’ve since given up tracking).

The confusing thing about this topic is that over the last few decades, digital technology has transformed competition at the same time that there has been less antitrust enforcement and a more laissez-faire approach to big business. The former has both increased the advantage of being a big company and reallocated market share toward companies that are good at software, making them bigger. The latter has made it easier for big firms to merge, given firms more power over workers, and coincided with a rise in corporate lobbying that allowed firms to rig the rules in their favor.

It’s tricky to tease apart these two trends, and it frankly annoys me that the conversation about them remains somewhat two-tracked. A lot of antitrust people resist the idea that tech may be driving industry concentration for less nefarious reasons; a lot of people studying firm strategy and the role of tech don’t focus so much on rent-seeking.

Even with these two things tangled up, it’s clear at this point that competition is declining in the US economy. And in my view there’s one fairly straightforward policy lever that could go a long way toward fixing that:

It should be much harder for large firms to merge and acquire.

I have written and edited pieces about this before, and today I have a piece out in ProMarket, UChicago’s online publication covering competition, with my colleague and editor Brooke Fox. We recap some data and some theory on why so many mergers are harmful to competition and conclude that there ought to be a stronger presumption that mergers are anticompetitive.

This is not the same thing as being anti-bigness. Big firms have advantages. But we should make it much tougher for them to acquire their way to dominance.

Neoliberalism, again

There’s a new New York Times Opinion essay about the US turn away from neoliberalism. Twitter thread versions of the piece here and here.

I don’t really have any big point here except maybe a frustration about all the arguments that get tangled up in a topic like this. Here’s a sampling of the ideas in the piece: The claim that we’re turning away from the assumption that “What was good for markets was good for America.” Assuming this means financial markets, it’s definitely not true that what’s good for financial markets is always or even usually good for America. So, yay, sounds positive.

Here’s another one: “The ‘new consensus’ has meant enormous state investment, directed toward industrial revival all around the postindustrial world.” I like it!

But then: “Five years ago, sanctimonious neoliberals mocked Donald Trump’s zero-sum view of the world as a kind of Dunning-Kruger geopolitics. But today, you hear few invocations among politicians or diplomats or bureaucrats of any truly universal positive-sum model of free markets or economic growth.” I mean… most trade pretty clearly is positive sum. Is anyone really claiming most international trade is zero sum? Anyone? Bueller?

It’s hard to add all these ideas up and take stock of where people stand. Did we change our views on trade policy? Or did we know all along that trade was usually net positive but had all sorts of possible ill effects that had to be mitigated (and that we weren’t mitigating them)? Do we actually think ‘Made in America’ clauses are a good idea? Or are we holding our nose because they’re popular? When topics like this get rolled up into these big new paradigm pieces, it can be tough to keep track. (I still like big new paradigm pieces.)

Anyway, hence the post clipping together my writing and blogging on a few of the sub-themes that come up here…

Center-left neoliberalism: Kinda good!

First off, neoliberalism can mean a bunch of different things. It can mean ridiculous laissez-faire conservatism, which I have zero interest in defending. But to the extent that one wants to take on the Obama administration’s center-left version of it… that version, I think, holds up fairly well? I wrote about this here, tracing Obama-era economic policy and grading it on two specific policy areas. (I will also say, though, that to the extent neoliberalism is about fealty to markets, that’s not the right perspective. The whole game here is about reinventing the mixed economy.)

Free trade: Also kinda good?

One big frustration I have here is that it’s hard to tell which parts of the previous economic consensus on trade were actually wrong. Some areas of the US were hurt more by the China shock than policymakers suspected — though not necessarily more than economic models predicted. But as a general rule trade seems fairly positive if the right supporting policies are in place. My mind can definitely be changed, but I wish discussions of this topic were clearer.

Industrial policy: Also also good!

Governments should be more active in their economies, taking careful, evidence-based actions to boost productivity, steer the economy in positive ways, address climate change, etc. etc. I’ve been saying this for a long while. More recently I commissioned a whole series on industrial policy; this piece lays out the case nicely.

Political economy: ¯\_(ツ)_/¯

I’ve long thought the best critique of center-left liberalism starts with political economy. Maybe a lot of these policies that seem good, and actually even maybe are good, don’t lend themselves to building and maintaining good political coalitions. Or maybe they seem good or start out good, but then get distorted and captured and so end up worse than the wonks predict. In the last year I’ve written two essays about this.

One was about the need to consider political economy and the idea that real-world policies are always second-best at best. The other was a review of Mancur Olson’s ‘The Rise and Decline of Nations’ which was republished for its 40th anniversary. I did a link roundup on other influences in this vein here.


Alright, that’s it for now. Onward, into the new, good-ish, deeply confusing economic era.

Political economy

I have a new essay on economic policy and political economy that I want to quote in a second. The piece is an attempt to respond to several questions that keep coming up for me. Those questions arose, in part, from a few different sources that didn’t make it into the piece. So let me start with those.

Here’s Joseph Stiglitz in an essay for the volume “New Perspectives on Regulation” published in 2009:

Previously the presumption that markets were efficient was widespread, with the corollary that only under exceptional circumstances (such as monopoly and massive pollution) were there failures that warranted intervention. Now, among mainstream economists, there is no presumption that markets are efficient.

This has certainly been my experience talking to lots of mainstream economists over the years. There is absolutely no presumption that markets are perfectly competitive when left to their own devices.

Here is political scientist Henry Farrell, writing about left-leaning neoliberalism in a 2019 piece on the excellent blog Crooked Timber:

Two questions follow (for me, anyway). One is for the neoliberal leftists, as a part of a broader left coalition. When and to what extent will their preferred approach to delivering policy clash with, or undermine, the necessary conditions for achieving collective action and making the policy sustainable? If they are pushing for market means towards social democratic ends, that is fine and good – markets can indeed sometimes be the best way to deliver those ends, and few of us would want to be completely without them (including Marxists like Sam Gindin). But one key lesson of the last couple of decades is that market provision of benefits makes it harder to build and sustain coalitions – private gain and public solidarity are at best uncomfortable bedfellows. Figuring out the political tradeoffs – when market means are worthwhile even when they make collective action tougher, or where non-market means might be better for sustainability reasons, even when markets are more efficient – is going to be hard, and we need to start building shared language and concepts to make it easier to resolve the inevitable disputes.

It is not a coincidence that Farrell was also the person who sent me back to Mancur Olson.

Then there’s this line from Brad DeLong in Concrete Economics: He describes the New Deal, admiringly, as “pragmatic experimentalism.”

Last but not least is the book Democracy for Realists which argues, more or less, that democracy is under-theorized.

Two questions from all the scattered links above:

  1. What does good policymaking look like when you grant that market failures are ubiquitous?
  2. What does good policymaking look like once you take political economy seriously?

That’s what my new essay for the Stigler Center/ProMarket is about. The title is “Biden’s Second-Best Economic Agenda.” Here are a couple bits of it:

In the wake of the Trump presidency and the pandemic, the Biden administration is keenly aware that policies that seem optimal when considered on their own might in practice not be optimal at all. Concerns about efficiency have taken a backseat while concerns over political economy have grown…

If the case for some of these policies is not efficiency or innovation but political economy, it’s reasonable to ask for a more-detailed justification on those grounds. If we’re going to rely less on economic theory, let’s at least have more political science in its place…

Without a better theory of political economy concerns, the theory of the second-best can become a shield for defending almost any policy…

Still, the fact remains that the Biden administration was able to accomplish through industrial policy what the Obama administration couldn’t get done with a carbon price… Last year, the US finally passed a climate bill on the scale of the climate challenge. The second-best approach sure looks better than doing nothing.

More on why data helps

I’ve written before about the puzzle of why data helps. Why do even really basic, descriptive analytics seem anecdotally to be so useful?

First, one more piece of evidence that data sure seems to help. From a paper in Nature about forecasting social change:

We compared forecasting approaches relying on (1) no data modelling (but possible consideration of theories), (2) pure data modelling (but no consideration of subject matter theories) and (3) hybrid approaches. Roughly half of the teams relied on data-based modelling as a basis for their forecasts, whereas the other half of the teams in each tournament relied only on their intuitions or theoretical considerations… Forecasts that considered historical data as part of the forecast modelling were more accurate than models that did not… There were no domains where data-free models were more accurate than data-inclusive models.

(Amazingly the data-only models did even better than hybrid models.)

By contrast:

The results from two forecasting tournaments conducted during the first year of the COVID-19 pandemic show that for most domains, social scientists’ predictions were no better than those from a sample of the (non-specialist) general public.

The only time the experts did better than the public? When they used data:

Which strategies and team characteristics were associated with more effective forecasts? One defining feature of more effective forecasters was that they relied on prior data rather than theory alone. This observation fits with prior studies on the performance of algorithmic versus intuitive human judgements. Social scientists who relied on prior data also performed better than lay crowds and were overrepresented among the winning teams.

Why does data work? Why does quantifying seem to be so useful?

Here’s a totally separate study at VoxEU that compares stories to data and illustrates a key driver of the systematic biases that lead human judgment astray: memory.

To examine the belief impact of stories versus statistics, we conducted controlled online experiments. The key idea of these experiments is to compare the immediate belief impact of stories and statistics to the belief impact after some delay, to isolate the role of memory. Participants in our experiment were informed that hypothetical products received a number of reviews. The task of participants was to guess whether a randomly selected review is positive. Before stating their guess, participants either received news in the form of a statistic, a story, or no information. We conceptualise statistics as abstract summaries of multiple data points (multiple reviews). Stories, by contrast, contain one datapoint (one review), but in addition provide contextual qualitative information about the review. Each participant saw three different product scenarios across which they were presented with one story, one statistic, and once no information. Crucial to our experimental design was that we elicited beliefs from participants twice, once immediately after they received the information and once following a delay of one day. This allows us to track the belief impact of stories versus statistics over time…

…both stories and statistics have an immediate effect on beliefs. On average, subjects immediately adjust their beliefs by about 20 percentage points for statistics, and by about 18 percentage points for stories. This pattern, however, looks markedly different after a one-day delay. While there remains a substantial belief impact for stories (about 12 percentage points), the belief impact of statistics drops to about five percentage points. In other words, we document a pronounced story-statistic gap in the evolution of beliefs. While the impact of statistics on beliefs decays rapidly, stories have a more persistent effect on beliefs. Using recall data, we confirm that the reason for this dynamic pattern is that stories are more easily retrieved than statistics.

As I wrote in my summary of behavioral economics, “We rely heavily on the information we can easily recall.” Memory gives us a biased view based on the stories we can most easily recall. But what comes easily to mind may have little to do with what’s actually going on: we’re misled by what’s most unusual or extreme or striking. Data works because it focuses us on what usually happens, not what is most memorable, and so has a de-biasing effect.

The amazing thing is that our judgment is so poor that a lot of the time we can’t do any better than just totally deferring to a super basic, content-free extrapolation from that data. Quantification has its own problems, of course. It helps not because it’s so great but because of how limited human reason can be.
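
To make “super basic, content-free extrapolation” concrete, here is a minimal sketch of the kind of data-only baseline I have in mind. It’s my own toy example, not code from the Nature paper, and the numbers are placeholders: it just fits a straight line to a short history and projects it one step ahead.

```python
# Toy illustration (not from the Nature paper): a "content-free" forecast that
# simply extrapolates the recent linear trend in a historical series.
import numpy as np

# Hypothetical monthly values of some social indicator (placeholder data)
history = np.array([52.1, 53.0, 52.7, 54.2, 55.0, 54.8, 55.9, 56.3])
months = np.arange(len(history))

# Fit a straight line; np.polyfit with deg=1 returns (slope, intercept)
slope, intercept = np.polyfit(months, history, deg=1)

# Project one period ahead: no theory, no story, just the trend
naive_forecast = intercept + slope * len(history)
print(f"Trend-only forecast for next month: {naive_forecast:.1f}")
```

Nothing in this sketch knows what the indicator measures, and that is the point: it anchors the forecast to what the data usually do rather than to whatever story is most memorable.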

Forecasting on INFER

Last year I had the opportunity to be a “Pro” forecaster on INFER, the crowd forecasting tournament run by Cultivate Labs and the University of Maryland (formerly Foretell of Georgetown’s CSET). Basically you get a small stipend to participate each month. It was fun and I recommend it!

Ultimately, I decided not to keep going as a Pro in 2023. I started Nonrival in August, which I’ll blog about one of these days, and since then my forecasting time has been focused on that project.

I still plan to collaborate with the INFER team in some other capacities (more on that soon too perhaps) but I won’t be paid to make forecasts there.

I’ve “exited” the three questions that I’d forecasted on that hadn’t yet resolved; I’ll still eventually be scored for the period I was active on them, so this assessment isn’t quite a true scoring of my time there, but it’s close. How’d I do?

  • In the 2022 season, 344 users made at least 5 forecasts (that resolved), and by default that’s the cutoff INFER uses on its leaderboard, so I’ll use it here too.
  • I finished 76th, with 8 questions resolved and scored. That puts me at the 78th percentile (the rank-to-percentile arithmetic is sketched after this list).
  • On the “all-time” leaderboard for INFER (which for me counts my two questions forecast in 2021) I’m 71st of 620, which puts me at the 89th percentile.
  • Lifetime on INFER, I’m better-than-median on 9 out of 10 questions (7 out of 8 for 2022 season), with my one blunder being a forecast of Intel’s earnings where I seemingly underrated the chance of an out-of-sample result.
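
For anyone checking my math, here is a minimal sketch of the rank-to-percentile arithmetic behind the figures above; the small helper function is my own, not anything INFER provides.

```python
# Convert a leaderboard rank into "better than roughly X% of forecasters".
def rank_to_percentile(rank: int, total: int) -> float:
    return (1 - rank / total) * 100

print(round(rank_to_percentile(76, 344)))  # 2022 season: ~78th percentile
print(round(rank_to_percentile(71, 620)))  # all-time: ~89th percentile
```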

Overall, my MO seems to be consistently finishing just a tiny bit better than the crowd. Not bad! But that leaves plenty of room for improvement. Part of it, I think, is that I could do better simply by spending more time and updating more regularly on news and on shifts by other forecasters.

But there’s also a “tenaciousness” that Tetlock describes when talking about superforecasters: a willingness, or even an eagerness, to sift through details when necessary until you find what you need. I saw some of that in teammates during my year as a Pro. And that’s something I’ve not had the time or maybe the inclination for. I think I’ve done a pretty consistent job of avoiding the basic mistakes that lead to poor forecasts: I look for quantitative data, seek out multiple perspectives, often blend my own judgment with others’, and so on. But if I want to get to the next level, I need to immerse myself more in the details of a topic, at least some of the time.

Past forecasting record posts here and here.

Code is not law

I’m fond of Lessig’s saying that “code is law” and I often mention it on the blog. But there’s a deeply distorted version of this idea cropping up in crypto lately and it’s worth distinguishing it from the original meme.

Lessig’s idea was that human behavior is shaped by four types of “governance”: markets, laws, norms, and what he called “architecture.” Architecture (if I’m remembering the book correctly) encompassed stuff we built in the physical world that affected human behavior. If I build a speed bump, you might drive differently; if I build a skyscraper, it might affect your view or change your walk to work or what-have-you. The things we build impose certain constraints on us — they shift how we behave.

Lessig then argued that in the digital world, code was the architecture. You could make some things possible or impossible, easy or hard, through the way the software was built. Code became a form of digital governance, alongside markets, laws, and norms.

Compare that to the crypto-maximalist version of “code is law,” which holds that anything the code allows is fair game. Here, via Matt Levine, is the defense provided by a trader who allegedly rigged a crypto market in a way that clearly would not be allowed in any normal financial market:

I believe all of our actions were legal open market actions, using the protocol as designed, even if the development team did not fully anticipate all the consequences of setting parameters the way they are.

You see the logic here: If you wanted what I did to be illegal, why did you write the code to make it possible? This is code-is-law maximalism.

There’s a less maximalist, dorm-room version of this that you sometimes see in crypto and that maybe deserves some consideration. This version doesn’t argue that anything the code allows is OK. But it does say we should rely more on code for our regulation. It wants code to play a bigger role in setting the rules, bringing us closer to a world where “anything the code allows is OK” — even if we’re not there yet and even if we never get all the way there. I’m OK with a bit of utopianism and so I don’t mind entertaining this as a thought experiment. But so far crypto has mostly served to show why “anything the code allows is OK” is not OK.

To see just how damaging this maximalism is, compare it to a totally different case:

The U.S. Supreme Court on Monday let Meta Platforms Inc’s (META.O) WhatsApp pursue a lawsuit accusing Israel’s NSO Group of exploiting a bug in the WhatsApp messaging app to install spy software allowing the surveillance of 1,400 people, including journalists, human rights activists and dissidents.

If you exploit a bug to do bad things, you can’t just hide behind “anything the code allows is OK.” In this case, we’re talking about the murky world of international affairs, where law is less effective. No one thinks this is a good thing: the world of international espionage is much closer than other spheres to “anything the code allows is OK,” and no one in their right mind would want to run the rest of the economy that way. Code-is-law maximalism forfeits three-fourths of Lessig’s original formulation: governing human behavior is hard, and we need all the tools we can find. As much as code increasingly does govern our behavior, laws, markets, and norms are all still essential.

On falsification

From Richard McElreath’s textbook Statistical Rethinking, via Data Elixir:

…The above is a kind of folk Popperism, an informal philosophy of science common among scientists but not among philosophers of science. Science is not described by the falsification standard, and Popper recognized that. In fact, deductive falsification is impossible in nearly every scientific context. In this section, I review two reasons for this impossibility: 1) Hypotheses are not models… 2) Measurement matters…

…For both of these reasons, deductive falsification never works. The scientific method cannot be reduced to a statistical procedure, and so our statistical methods should not pretend. Statistical evidence is part of the hot mess that is science, with all of its combat and egotism and mutual coercion. If you believe, as I do, that science does often work, then learning that it doesn’t work via falsification shouldn’t change your mind. But it might help you do better science. It might open your eyes to many legitimately useful functions of statistical golems…

…So if attempting to mimic falsification is not a generally useful approach to statistical methods, what are we to do? We are to model. Models can be made into testing procedures–all statistical tests are also models–but they can also be used to design, forecast, and argue…
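
To make the “all statistical tests are also models” point concrete, here is a small example of my own (not from the book): an equal-variance two-sample t-test and a linear regression of the outcome on a group dummy are the same model, and they return the same test statistic and p-value.

```python
# My own illustration of "all statistical tests are also models": an
# equal-variance two-sample t-test is equivalent to OLS on a group dummy.
import numpy as np
from scipy import stats
import statsmodels.api as sm

rng = np.random.default_rng(0)
group_a = rng.normal(loc=0.0, scale=1.0, size=50)
group_b = rng.normal(loc=0.5, scale=1.0, size=50)

# Classical two-sample t-test
t_stat, p_val = stats.ttest_ind(group_a, group_b)

# The same comparison written as a linear model: y = b0 + b1 * is_group_b
y = np.concatenate([group_a, group_b])
is_group_b = np.concatenate([np.zeros(50), np.ones(50)])
ols = sm.OLS(y, sm.add_constant(is_group_b)).fit()

print(t_stat, ols.tvalues[1])  # same magnitude (sign depends on group order)
print(p_val, ols.pvalues[1])   # identical p-values
```

Seen this way, the test is just one use of the model; the same model can also be used to estimate, forecast, or argue, which is the point of the passage above.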

Related: Strevens’ iron rule of explanation.