Non-rival data and AI safety

In a recent paper, OpenAI discusses the concern that competitive pressures could push AI developers to underinvest in responsible development, triggering a race to the bottom on safety that leaves everyone worse off. Avoiding this outcome may require deliberate action to cultivate systemic conditions that favor cooperation on safety. OpenAI describes two factors that affect how likely an organization is to cooperate: (1) the amount of trust between developers and (2) the incentive landscape for cooperation versus defection. The incentive landscape might be changed through regulation, product liability law, and market forces.

Here I explore an idea for a regulatory action which might have a positive effect on the incentive landscape. 

In a recent Andreessen Horowitz podcast episode, Oleg Rogynskyy, founder of a sales automation startup, describes competition around AI in terms that might make the OpenAI authors cringe: “Unlike automation in the industrial revolution from 100 years ago,” he says, “the AI arms race is a zero-sum game.” Rogynskyy’s view is that AI systems involve network effects that lead to rich-get-richer, winner-take-all scenarios. Wherever a stream of human activity produces data that can be captured without manual entry and aggregated, machine learning can be applied to understand the micro trends and predict what actions people should take next. These predictions lead to smarter actions, which lead to more profit. This lures more players into the network, which brings more data, which leads to ever smarter predictions. Thus the first player to enter a market with an AI tool might benefit from a virtuous cycle that prevents competitors from catching up.

If this is the case, then entrepreneurs and business leaders would have an extraordinarily strong incentive to underinvest in AI safety in order to get to market faster. This is the race dynamic that OpenAI hopes to avoid.

If we want to think about how to modify the system to reduce the risk of this sort of race, the place to focus our attention is the data. Data is the scarce resource that creates the runaway feedback effects. 

Economists say that a good is rival if its use by one party limits its use by another party. For example, rice is a rival good because if Bob eats a cup of rice, then Adam cannot eat it. Goods are non-rival if one party’s use does not limit another’s. Broadcast radio waves are non-rival: Bob and Adam can both tune in to the same radio station at the same time. 

Data is intrinsically non-rival. At the level of technology, it can be copied and reused indefinitely. But in current practice, data behaves like a rival good: the firm that gets the user controls the data and doesn’t share it. If data were non-rival in practice, the runaway rich-get-richer effect would lose its main fuel, and winner-take-all outcomes would be much less likely.
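To make the claim concrete, here is a toy sketch of the feedback loop with two firms, one of which enters the market with a head start in data. Everything in it is an assumption made for illustration, not something taken from OpenAI, Rogynskyy, or Tonetti: the function name `simulate`, the parameter values, and especially the rule that prediction quality grows more than proportionally with data (`gamma` greater than 1), which is what produces the rich-get-richer effect here.

```python
def simulate(shared, periods=50, head_start=10.0, gamma=2.0):
    """Return the early entrant's market share in each period (toy model)."""
    # Data held by firm A (early entrant) and firm B (late entrant).
    data = [1.0 + head_start, 1.0]
    shares = []
    for _ in range(periods):
        # If data is shared, both firms can train on the pooled data;
        # otherwise each firm can use only the data it collected itself.
        usable = [sum(data)] * 2 if shared else list(data)
        # Assumption: quality rises more than proportionally with data.
        quality = [d ** gamma for d in usable]
        share_a = quality[0] / sum(quality)
        shares.append(share_a)
        # Each firm collects new data in proportion to the users it serves.
        data[0] += share_a
        data[1] += 1.0 - share_a
    return shares


if __name__ == "__main__":
    siloed = simulate(shared=False)
    pooled = simulate(shared=True)
    print("siloed data, leader's share: first", siloed[0], "last", siloed[-1])
    print("shared data, leader's share: first", pooled[0], "last", pooled[-1])
```

With siloed data the early entrant starts far ahead and its share keeps climbing. With shared data both firms train on the same pool, so in this model neither has a data advantage and the market splits evenly. The model deliberately leaves out every other first-mover advantage, such as brand and switching costs, so it shows only the data channel.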

With regulatory and legislative action, we could conceivably make data non-rival in practice. This would be difficult, but there’s reason to think it might be the societally optimal move. Economists like Christopher Tonetti at Stanford have proposed models under which data is owned by the consumer who produces the behavior rather than by the company that tracks the behavior. Tonetti argues that consumer-controlled data produces more net good for a society. Data privacy advocates have also argued for such a shift.

A question for technical AI developers is how useful this shared data would be. ML systems are trained on data collected in idiosyncratic environments. For example, the data from one self-driving car company might not be useful to another company because their systems use different sensors and different sensor placements. 

This change wouldn’t eliminate all of the competitive pressure contributing to AI safety races. There will always be advantages to entering a market earlier: maybe not winner-take-all, but winner-take-more. The recent example of Boeing skimping on safety to catch up to Airbus is a good illustration. This regulatory change would simply reduce the stakes. If you’re late to the game, you’re not out; you just have to buy the data.

