A growing number of machine learning architectures, such as Generative Adversarial Networks, rely on the design of games which implement a desired functionality via a Nash equilibrium. In practice these games have an implicit complexity (e.g. from underlying datasets and the deep networks used) that makes directly computing a Nash equilibrium impractical or impossible. For this reason, numerous learning algorithms have been developed with the goal of iteratively converging to a Nash equilibrium. Unfortunately, the dynamics generated by the learning process can be very intricate and instances of training failure hard to interpret. In this paper we show that, in a strong sense, this dynamic complexity is inherent to games. Specifically, we prove that replicator dynamics, the continuous-time analogue of Multiplicative Weights Update, even when applied in a very restricted class of games -- known as finite matrix games -- is rich enough to be able to approximate arbitrary dynamical systems. Our results are positive in the sense that they show the nearly boundless dynamic modelling capabilities of current machine learning practices, but also negative in implying that these capabilities may come at the cost of interpretability. As a concrete example, we show how replicator dynamics can effectively reproduce the well-known strange attractor of Lonrenz dynamics (the "butterfly effect") while achieving no regret.
Winner-take-all competitions in forecasting and machine-learning suffer from distorted incentives. Witkowskiet al. identified this problem and proposed ELF, a truthful mechanism to select a winner. We show that, from a pool of n forecasters, ELF requires Θ(nlogn) events or test data points to select a near-optimal forecaster with high probability. We then show that standard online learning algorithms select an ϵ-optimal forecaster using only O(log(n)/ϵ2) events, by way of a strong approximate-truthfulness guarantee. This bound matches the best possible even in the nonstrategic setting. We then apply these mechanisms to obtain the first no-regret guarantee for non-myopic strategic experts.
Given a prediction task, understanding when one can and cannot design a consistent convex surrogate loss, particularly a low-dimensional one, is an important and active area of machine learning research. The prediction task may be given as a target loss, as in classification and structured prediction, or simply as a (conditional) statistic of the data, as in risk measure estimation. These two scenarios typically involve different techniques for designing and analyzing surrogate losses. We unify these settings using tools from property elicitation, and give a general lower bound on prediction dimension. Our lower bound tightens existing results in the case of discrete predictions, showing that previous calibration-based bounds can largely be recovered via property elicitation. For continuous estimation, our lower bound resolves on open problem on estimating measures of risk and uncertainty.
We introduce a theoretical framework of elicitability and identifiability of set-valued functionals, such as quantiles, prediction intervals, and systemic risk measures. A functional is elicitable if it is the unique minimiser of an expected scoring function, and identifiable if it is the unique zero of an expected identification function; both notions are essential for forecast ranking and validation, and M- and Z-estimation. Our framework distinguishes between exhaustive forecasts, being set-valued and aiming at correctly specifying the entire functional, and selective forecasts, content with solely specifying a single point in the correct functional. We establish a mutual exclusivity result: A set-valued functional can be either selectively elicitable or exhaustively elicitable or not elicitable at all. Notably, since quantiles are well known to be selectively elicitable, they fail to be exhaustively elicitable. We further show that the class of prediction intervals and Vorob'ev quantiles turn out to be exhaustively elicitable and selectively identifiable. In particular, we provide a mixture representation of elementary exhaustive scores, leading the way to Murphy diagrams. We give possibility and impossibility results for the shortest prediction interval and prediction intervals specified by an endpoint or a midpoint. We end with a comprehensive literature review on common practice in forecast evaluation of set-valued functionals.
We consider several computational problems related to conjugacy between subshifts of finite type, restricted to k-block codes: verifying a proposed k-block conjugacy, deciding if two shifts admit a k-block conjugacy, and reducing the representation size of a shift via a k-block conjugacy. We give a polynomial-time algorithm for verification, and show GI- and NP-hardness for deciding conjugacy and reducing representation size, respectively. Our approach focuses on 1-block conjugacies between vertex shifts, from which we generalize to k-block conjugacies and to edge shifts. We conclude with several open problems.
A property, or statistical functional, is said to be elicitable if it minimizes expected loss for some loss function. The study of which properties are elicitable sheds light on the capabilities and limitations of point estimation and empirical risk minimization. While recent work asks which properties are elicitable, we instead advocate for a more nuanced question: how many dimensions are required to indirectly elicit a given property? This number is called the elicitation complexity of the property. We lay the foundation for a general theory of elicitation complexity, including several basic results about how elicitation complexity behaves, and the complexity of standard properties of interest. Building on this foundation, our main result gives tight complexity bounds for the broad class of Bayes risks. We apply these results to several properties of interest, including variance, entropy, norms, and several classes of financial risk measures. We conclude with discussion and open directions.
One way to define the randomness of a fixed individual sequence is to ask how hard it is to predict relative to a given loss function. A sequence is memoryless if, with respect to average loss, no continuous function can predict the next entry of the sequence from a finite window of previous entries better than a constant prediction. For squared loss, memoryless sequences are known to have stochastic attributes analogous to those of truly random sequences. In this paper, we address the question of how changing the loss function changes the set of memoryless sequences, and in particular, the stochastic attributes they possess. For convex differentiable losses we establish that the statistic or property elicited by the loss determines the identity and stochastic attributes of the corresponding memoryless sequences. We generalize these results to convex non-differentiable losses, under additional assumptions, and to non-convex Bregman divergences. In particular, our results show that any Bregman divergence has the same set of memoryless sequences as squared loss. We apply our results to price calibration in prediction markets.
A common technique in supervised learning with discrete losses, such as 0-1 loss, is to optimize a convex surrogate loss over R d , calibrated with respect to the original loss. In particular, recent work has investigated embedding the original predictions (e.g. labels) as points in R^d , showing an equivalence to using polyhedral surrogates. In this work, we study the notion of the embedding dimension of a given discrete loss: the minimum dimension d such that an embedding exists. We characterize d-embeddability for all d, with a particularly tight characterization for d = 1 (embedding into the real line), and useful necessary conditions for d > 1 in the form of a quadratic feasibility program. We illustrate our results with novel lower bounds for abstain loss.
We formalize and study the natural approach of designing convex surrogate loss functions via embeddings for problems such as classification or ranking. In this approach, one embeds each of the finitely many predictions (e.g. classes) as a point in Rd, assigns the original loss values to these points, and convexifies the loss in between to obtain a surrogate. We prove that this approach is equivalent, in a strong sense, to working with polyhedral (piecewise linear convex) losses. Moreover, given any polyhedral loss L, we give a construction of a link function through which $L$ is a consistent surrogate for the loss it embeds. We go on to illustrate the power of this embedding framework with succinct proofs of consistency or inconsistency of various polyhedral surrogates in the literature.
We extend and demonstrate the applicability of computational Conley index techniques for computing symbolic dynamics and corresponding lower bounds on topological entropy for discrete-time systems governed by maps. In particular, we describe an algorithm that uses Conley index information to construct sofic shifts that are topologically semi-conjugate to the system under study. As illustration, we present results for the two-dimensional Hénon map, the three-dimensional LPA map, and the infinite-dimensional Kot-Schaffer map. This approach significantly builds on methods first presented in [DFT08] and is related to work in [Kwa00, Kwa04].
A state amalgamation of a directed graph is a node contraction which is only permitted under certain configurations of incident edges. In symbolic dynamics, state amalgamation and its inverse operation, state splitting, play a fundamental role in the the theory of subshifts of finite type (SFT): any conjugacy between SFTs, given as vertex shifts, can be expressed as a sequence of symbol splittings followed by a sequence of symbol amalgamations. It is not known whether determining conjugacy between SFTs is decidable.
We focus on conjugacy via amalgamations alone, and consider the simpler problem of deciding whether one can perform k consecutive amalgamations from a given graph. This problem also arises when using symbolic dynamics to study continuous maps, where one seeks to coarsen a Markov partition in order to simplify it. We show that this state amalgamation problem is NP-complete by reduction from the hitting set problem, thus giving further evidence that classifying SFTs up to conjugacy may be undecidable.
Scoring functions are commonly used to evaluate a point forecast of a particular statistical functional. This scoring function should be calibrated, meaning the correct value of the functional is the Bayes optimal forecast, in which case we say the scoring function elicits the functional. Heinrich (2014) shows that the mode functional is not elicitable. In this work, we ask whether it is at least possible to indirectly elicit the mode, wherein one elicits a low-dimensional functional from which the mode can be computed. We show that this cannot be done: neither the mode nor modal interval are indirectly elicitable with respect to the class of identifiable functionals.
Given a data set of (x,y) pairs, a common learning task is to fit a model predicting y (a label or dependent variable) conditioned on x. This paper considers the similar but much less-understood problem of modeling "higher-order" statistics of y's distribution conditioned on x. Such statistics are often challenging to estimate using traditional empirical risk minimization (ERM) approaches. We develop and theoretically analyze an ERM-like approach with multi-observation loss functions. We propose four algorithms formalizing the concept of empirical risk minimization for this problem, two of which have statistical guarantees in settings allowing both slow and fast convergence rates, but which are out-performed empirically by the other two. Empirical results illustrate potential practicality of these algorithms in low dimensions and significant improvement over standard approaches in some settings.
Power diagrams, a type of weighted Voronoi diagrams, have many applications throughout operations research. We study the problem of power diagram detection: determining whether a given finite partition of Rd takes the form of a power diagram. This detection problem is particularly prevalent in the field of information elicitation, where one wishes to design contracts to incentivize self-minded agents to provide honest information. We devise a simple linear program to decide whether a polyhedral cell decomposition can be described as a power diagram. Further, we discuss applications to property elicitation, peer prediction, and mechanism design, where this question arises. Our model is able to efficiently decide the question for decompositions of Rd or of a restricted domain in Rd. The approach is based on the use of an alternative representation of power diagrams, and invariance of a power diagram under uniform scaling of the parameters in this representation.
Recent work has shown that we can use partial verification instead of money to implement truthful mechanisms. In this paper we develop tools to answer the following question. Given an allocation rule that can be made truthful with payments, what is the minimal verification needed to make it truthful without them? Our techniques leverage the geometric relationship between the type space and the set of possible allocations.
A property or statistic of a distribution is said to be elicitable if it can be expressed as the minimizer of some loss function in expectation. Recent work shows that continuous real-valued properties are elicitable if and only if they are identifiable, meaning the set of distributions with the same property value can be described by linear constraints. From a practical standpoint, one may ask for which such properties do there exist convex loss functions. In this paper, in a finite-outcome setting, we show that in fact every elicitable real-valued property can be elicited by a convex loss function. Our proof is constructive, and leads to convex loss functions for new properties.
Prior work has investigated variations of prediction markets that preserve participants' (differential) privacy, which formed the basis of useful mechanisms for purchasing data for machine learning objectives. Such markets required potentially unlimited financial subsidy, however, making them impractical. In this work, we design an adaptively-growing prediction market with a bounded financial subsidy, while achieving privacy, incentives to produce accurate predictions, and precision in the sense that market prices are not heavily impacted by the added privacy-preserving noise. We briefly discuss how our mechanism can extend to the data-purchasing setting, and its relationship to traditional learning algorithms.
Prediction markets are well-studied in the case where predictions are probabilities or expectations of future random variables. In 2008, Lambert, et al. proposed a generalization, which we call "scoring rule markets" (SRMs), in which traders predict the value of arbitrary statistics of the random variables, provided these statistics can be elicited by a scoring rule. Surprisingly, despite active recent work on prediction markets, there has not yet been any investigation into more general SRMs. To initiate such a study, we ask the following question: in what sense are SRMs "markets"? We classify SRMs according to several axioms that capture potentially desirable qualities of a market, such as the ability to freely exchange goods (contracts) for money. Not all SRMs satisfy our axioms: once a contract is purchased in any market for prediction the median of some variable, there will not necessarily be any way to sell that contract back, even in a very weak sense. Our main result is a characterization showing that slight generalizations of cost-function-based markets are the only markets to satisfy all of our axioms for finite-outcome random variables. Nonetheless, we find that several SRMs satisfy weaker versions of our axioms, including a novel share-based market mechanism for ratios of expected values.
Minimal peer prediction mechanisms truthfully elicit private information (e.g., opinions or experiences) from rational agents without the requirement that ground truth is eventually revealed. In this paper, we use a geometric perspective to prove that minimal peer prediction mechanisms are equivalent to power diagrams, a type of weighted Voronoi diagram. Using this characterization and results from computational geometry, we show that many of the mechanisms in the literature are unique up to affine transformations. We also show that classical peer prediction is "complete" in that every minimal mechanism can be written as a classical peer prediction mechanism for some scoring rule. Finally, we use our geometric characterization to develop a general method for constructing new truthful mechanisms, and we show how to optimize for the mechanisms’ effort incentives and robustness.
One way to define the "randomness" of a fixed individual sequence is to ask how hard it is to predict. When prediction error is measured via squared loss, it has been established that memoryless sequences (which are, in a precise sense, hard to predict) have some of the stochastic attributes of truly random sequences. In this paper, we ask how changing the loss function used changes the set of memoryless sequences, and in particular, the stochastic attributes they possess. We answer this question using tools from property elicitation, showing that the property elicited by the loss function determines the stochastic attributes of the corresponding memoryless sequences. We apply our results to the question of price calibration in prediction markets.
We study loss functions that measure the accuracy of a prediction based on multiple data points simultaneously. To our knowledge, such loss functions have not been studied before in the area of property elicitation or in machine learning more broadly. As compared to traditional loss functions that take only a single data point, these multi-observation loss functions can in some cases drastically reduce the dimensionality of the hypothesis required. In elicitation, this corresponds to requiring many fewer reports; in empirical risk minimization, it corresponds to algorithms on a hypothesis space of much smaller dimension. We explore some examples of the tradeoff between dimensionality and number of observations, give some geometric characterizations and intuition for relating loss functions and the properties that they elicit, and discuss some implications for both elicitation and machine-learning contexts.
Models for collecting and aggregating categorical data on crowdsourcing platforms typically fall into two broad categories: those assuming agents honest and consistent but with heterogeneous error rates, and those assuming agents strategic and seek to maximize their expected reward. The former often leads to tractable aggregation of elicited data, while the latter usually focuses on optimal elicitation and does not consider aggregation. In this paper, we develop a Bayesian model, wherein agents have differing quality of information, but also respond to incentives. Our model generalizes both categories and enables the joint exploration of optimal elicitation and aggregation. This model enables our exploration, both analytically and experimentally, of optimal aggregation of categorical data and optimal multiple-choice interface design.
We study the problem of designing optimal auctions under restrictions on the set of permissible allocations. In addition to allowing us to restrict to deterministic mechanisms, we can also indirectly model non-additive valuations. We prove a strong duality result, extending a result due to Daskalakis, Deckelbaum, and Tzamos (2015), that guarantees the existence of a certificate of optimality for optimal restricted mechanisms. As a corollary of our result, we provide a new characterization of the set of allocations that the optimal mechanism may actually use. To illustrate our result we find and certify optimal mechanisms for four settings where previous frameworks do not apply, and provide new economic intuition about some of the tools that have previously been used to find optimal mechanisms.
The problem of peer prediction is to elicit information from agents in settings without any objective ground truth against which to score reports. Peer prediction mechanisms seek to exploit correlations between signals to align incentives with truthful reports. A long-standing concern has been the possibility of uninformative equilibria. For binary signals, a multi-task output agreement (OA) mechanism due to Dasgupta and Ghosh achieves strong truthfulness, so that the truthful equilibrium strictly maximizes payoff. In this paper, we first characterize conditions on the signal distribution for which the OA mechanism remains strongly-truthful with non-binary signals. Our analysis also yields a greatly simplified proof of the earlier, binary-signal strong truthfulness result. We then introduce the multi-task CA mechanism, which extends the OA mechanism to multiple signals and provides informed truthfulness: no strategy provides more payoff in equilibrium than truthful reporting, and truthful reporting is strictly better than any uninformed strategy (where an agent avoids the effort of obtaining a signal). The CA mechanism is also maximally strongly truthful, in that no mechanism in a broader class of mechanisms is strongly truthful on more signal distributions. We then present a detail-free version of the CA mechanism that works without any knowledge requirements on the designer while retaining epsilon-informed truthfulness.
The study of property elicitation is gaining ground in statistics and machine learning as a way to view and reason about the expressive power of emiprical risk minimization (ERM). Yet beyond a widening frontier of special cases, the two most fundamental questions in this area remain open: which statistics are elicitable (computable via ERM), and which loss functions elicit them? Moreover, recent work suggests a complementary line of questioning: given a statistic, how many ERM parameters are needed to compute it? We give concrete instantiations of these important questions, which have numerous applications to machine learning and related fields.
Peer prediction is the problem of eliciting private, but correlated, information from agents. By rewarding an agent for the amount that their report "predicts" that of another agent, mechanisms can promote effort and truthful reports. A common concern in peer prediction is the multiplicity of equilibria, perhaps including high-payoff equilibria that reveal no information. Rather than assume agents counterspeculate and compute an equilibrium, we adopt replicator dynamics as a model for population learning. We take the size of the basin of attraction of the truthful equilibrium as a proxy for the robustness of truthful play. We study different mechanism designs, using models estimated from real peer evaluations in several massive online courses. Among other observations, we confirm that recent mechanisms present a significant improvement in robustness over earlier approaches.
Minimal peer prediction mechanisms truthfully elicit private information (e.g., opinions or experiences) from rational agents without the requirement that ground truth is eventually revealed. In this paper, we use a geometric perspective to prove that minimal peer prediction mechanisms are equivalent to power diagrams, a type of weighted Voronoi diagram. Using this characterization and results from computational geometry, we show that many of the mechanisms in the literature are unique up to affine transformations, and introduce a general method to construct new truthful mechanisms.
We propose a mechanism for purchasing information from a sequence of participants. The participants may simply hold data points they wish to sell, or may have more sophisticated information; either way, they are incentivized to participate as long as they believe their data points are representative or their information will improve the mechanism's future prediction on a test set. The mechanism, which draws on the principles of prediction markets, has a bounded budget and minimizes generalization error for Bregman divergence loss functions. We then show how to modify this mechanism to preserve the privacy of participants information: At any given time, the current prices and predictions of the mechanism reveal almost no information about any one participant, yet in total over all participants, information is accurately aggregated.
Elicitation is the study of statistics or properties which are computable via empirical risk minimization. While several recent papers have approached the general question of which properties are elicitable, we suggest that this is the wrong question---all properties are elicitable by first eliciting the entire distribution or data set, and thus the important question is how elicitable. Specifically, what is the minimum number of regression parameters needed to compute the property?
Building on previous work, we introduce a new notion of elicitation complexity and lay the foundations for a calculus of elicitation. We establish several general results and techniques for proving upper and lower bounds on elicitation complexity. These results provide tight bounds for eliciting the Bayes risk of any loss, a large class of properties which includes spectral risk measures and several new properties of interest.
Prediction markets are economic mechanisms for aggregating information about future events through sequential interactions with traders. The pricing mechanisms in these markets are known to be related to optimization algorithms in machine learning and through these connections we have some understanding of how equilibrium market prices relate to the beliefs of the traders in a market. However, little is known about rates and guarantees for the convergence of these sequential mechanisms, and two recent papers cite this as an important open question.
In this paper we show how some previously studied prediction market trading models can be understood as a natural generalization of randomized coordinate descent which we call randomized subspace descent (RSD). We establish convergence rates for RSD and leverage them to prove rates for the two prediction market models above, answering the open questions. Our results extend beyond standard centralized markets to arbitrary trade networks.
The elicitation of a statistic, or property of a distribution, is the task of devising proper scoring rules, equivalently proper losses, which incentivize an agent or algorithm to truthfully estimate the desired property of the underlying probability distribution.
Exploiting the connection of such elicitation to convex analysis, we address the vector-valued property case, which has received relatively little attention in the literature but is relevant for machine learning applications. We first provide a very general characterization of linear and ratio-of-linear properties, which resolves an open problem by unifying several previous characterizations in machine learning and statistics. We then ask which vectors of properties admit nonseparable scores, which cannot be expressed as a sum of scores for each coordinate separately, a natural desideratum for machine learning. We show that linear and ratio-of-linear do admit nonseparable scores, and provide evidence for the conjecture that these are the only such properties (up to link functions). Finally, we provide a general method for producing identification functions and address an open problem by showing that convex maximal level sets are insufficient for elicitability in general.
Mixability is a property of a loss which characterizes when fast convergence is possible in the game of prediction with expert advice. We show that a key property of mixability generalizes, and the exp and log operations present in the usual theory are not as special as one might have thought. In doing this we introduce a more general notion of Φ-mixability where Φ is a general entropy (i.e., any convex function on probabilities). We show how a property shared by the convex dual of any such entropy yields a natural algorithm (the minimizer of a regret bound) which, analogous to the classical aggregating algorithm, is guaranteed a constant regret when used with Φ-mixable losses. We characterize precisely which Φ have Φ-mixable losses and put forward a number of conjectures about the optimality and relationships between different choices of entropy.
We study the problem of eliciting and aggregating probabilistic information from multiple agents. In order to successfully aggregate the predictions of agents, the principal needs to elicit some notion of confidence from agents, capturing how much experience or knowledge led to their predictions. To formalize this, we consider a principal who wishes to elicit predictions about a random variable from a group of Bayesian agents, each of whom have privately observed some independent samples of the random variable, and hopes to aggregate the predictions as if she had directly observed the samples of all agents. Leveraging techniques from Bayesian statistics, we represent confidence as the number of samples an agent has observed, which is quantified by a hyperparameter from a conjugate family of prior distributions. This then allows us to show that if the principal has access to a few samples, she can achieve her aggregation goal by eliciting predictions from agents using proper scoring rules. In particular, if she has access to one sample, she can successfully aggregate the agents' predictions if and only if every posterior predictive distribution corresponds to a unique value of the hyperparameter. Furthermore, this uniqueness holds for many common distributions of interest. When this uniqueness property does not hold, we construct a novel and intuitive mechanism where a principal with two samples can elicit and optimally aggregate the agents' predictions.
In this note we elaborate on an emerging connection between three areas of research: (a) the concept of a risk measure developed within financial mathematics for reasoning about risk attitudes of agents under uncertainty, (b) the design of automated market makers for prediction markets, and (c) the family of probability distributions known as exponential families.
We present a model of truthful elicitation which generalizes and extends mechanisms, scoring rules, and a number of related settings that do not quite qualify as one or the other. Our main result is a characterization theorem, yielding characterizations for all of these settings, including a new characterization of scoring rules for non-convex sets of distributions. We generalize this model to eliciting some property of the agent's private information, and provide the first general characterization for this setting. We also show how this yields a new proof of a result in mechanism design due to Saks and Yu.
We develop a generalization of randomized coordinate descent for smooth convex problems, where the coordinates specify arbitrary subspaces, and derive standard O(1/epsilon) and O(1/log epsilon) rates. For the special case of overlapping 2-block subspaces (i.e. graphs), which has received attention in the literature recently, we derive a convergence rate on a given graph in terms of its algebraic connectivity. Using this connection, we introduce bounds for graph topologies not previously considered. We conclude with preliminary progress toward the interesting open question: what is the best network structure for a given optimization problem?
We study information elicitation in cost-function-based combinatorial prediction markets when the market maker’s utility for information decreases over time. In the sudden revelation setting, it is known that some piece of information will be revealed to traders, and the market maker wishes to prevent guaranteed profits for trading on the sure information. In the gradual decrease setting, the market maker’s utility for (partial) information decreases continuously over time. We design adaptive cost functions for both settings which: (1) preserve the information previously gathered in the market; (2) eliminate (or diminish) rewards to traders for the publicly revealed information; (3) leave the reward structure unaffected for other information; and (4) maintain the market maker’s worst-case loss. Our constructions utilize mixed Bregman divergence, which matches our notion of utility for information.
We introduce a framework for automated market making for prediction markets, the volume parameterized market (VPM), in which securities are priced based on the market maker's current liabilities as well as the total volume of trade in the market. We provide a set of mathematical tools that can be used to analyze markets in this framework, and show that many existing market makers (including cost-function based markets, profit-charging markets, and buy-only markets) all fall into this framework as special cases. Using the framework, we design a new market maker, the perspective market, that satisfies four desirable properties (worst-case loss, no arbitrage, increasing liquidity, and shrinking spread) in the complex market setting, but fails to satisfy information incorporation. However, we show that the sacrifice of information incorporation is unavoidable: we prove an impossibility result showing that any market maker that prices securities based only on the trade history cannot satisfy all five properties simultaneously. Instead, we show that perspective markets may satisfy a weaker notion that we call center-price information incorporation.
Combining two existing rigorous computational methods, for verifying hyperbolicity [Arai, 2007] and for computing topological entropy bounds [Day et al., 2008], we prove lower bounds on topological entropy for 43 hyperbolic plateaus of the Hénon map. We also examine the 16 area-preserving plateaus studied by Arai and compare our results with related work. Along the way, we augment the entropy algorithms of Day et al. with routines to optimize the algorithmic parameters and simplify the resulting semi-conjugate subshift.
We present an approach to maximum entropy models that highlights the convex geometry and duality of Generalized Exponential Families (GEFs) and their connection to Bregman divergences. Using our framework, we are able to resolve a puzzling aspect of the bijection of Banerjee et al. (2005) between classical exponential families and what they call regular Bregman divergences. Their regularity condition rules out all but Bregman divergences generated from log-convex generators. We recover their bijection and show that a much broader class of divergences correspond to GEFs via two key observations: 1) Like classical exponential families, GEFs have a ``cumulant’’ C whose subdifferential contains the mean; 2) Generalized relative entropy is a C-Bregman divergence between parameters, which becomes the KL divergence for Shannon entropy. We also show that every incomplete market with cost function C (see Abernethy et al. (2011)) can be expressed as a complete market, where the prices are constrained to be a GEF with cumulant C. This provides an entirely new interpretation of prediction markets, relating their design back to the principle of maximum entropy.
We consider a popular problem in finance, option pricing, through the lens of an online learning game between Nature and an Investor. In the Black-Scholes option pricing model from 1973, the Investor can continuously hedge the risk of an option by trading the underlying asset, assuming that the asset's price fluctuates according to Geometric Brownian Motion (GBM). We consider a worst-case model, in which Nature chooses a sequence of price fluctuations under a cumulative quadratic volatility constraint, and the Investor can make a sequence of hedging decisions. Our main result is to show that the value of our proposed game, which is the regret of the hedging strategy, converges to the Black-Scholes option price. We use significantly weaker assumptions than previous work---for instance, we allow large jumps in the asset price---and show that the Black-Scholes hedging strategy is near-optimal for the Investor even in this non-stochastic framework.
Ever since the Internet opened the floodgates to millions of users, each looking after their own interests, modern algorithm design has needed to be increasingly robust to strategic manipulation. Often, it is the inputs to these algorithms which are provided by strategic agents, who may lie for their own benefit, necessitating the design of algorithms which incentivize the truthful revelation of private information -- but how can this be done? This is a fundamental question with answers from many disciplines, from mechanism design to scoring rules and prediction markets. Each domain has its own model with its own assumptions, yet all seek schemes to gather private information from selfish agents, by leveraging incentives. Together, we refer to such models as elicitation.
This dissertation unifies and advances the theory of incentivized information elicitation. Using tools from convex analysis, we introduce a new model of elicitation with a matching characterization theorem which together encompass mechanism design, scoring rules, prediction markets, and other models. This lays a firm foundation on which the rest of the dissertation is built.
It is natural to consider a setting where agents report some alternate representation of their private information, called a property, rather than reporting it directly. We extend our model and characterization to this setting, revealing even deeper ties to convex analysis and convex duality, and we use these connections to prove new results for linear, smooth nonlinear, and finite-valued properties. Exploring the linear case further, we show a new four-fold equivalence between scoring rules, prediction markets, Bregman divergences, and generalized exponential families.
Applied to mechanism design, our framework offers a new perspective. By focusing on the (convex) consumer surplus function, we simplify a number of existing results, from the classic revenue equivalence theorem, to more recent characterizations of mechanism implementability.
Finally, we follow a line of research on the interpretation of prediction markets, relating a new stochastic framework to the classic Walrasian equilibrium and to stochastic mirror descent, thereby strengthening ties between prediction markets and machine learning.
We describe a new, simplifed, and general analysis of a fusion of Nesterov's accelerated gradient with parallel coordinate descent. The resulting algorithm, which we call BOOM, for boosting with momentum, enjoys the merits of both techniques. Namely, BOOM retains the momentum and convergence properties of the accelerated gradient method while taking into account the curvature of the objective function. We describe an distributed implementation of BOOM which is suitable for massive high dimensional datasets. We show experimentally that BOOM is especially effective in large scale learning problems with rare yet informative features.
We strengthen recent connections between prediction markets and learning by showing that a natural class of market makers can be understood as performing stochastic mirror descent when trader demands are sequentially drawn from a fixed distribution. This provides new insights into how market prices (and price paths) may be interpreted as a summary of the market's belief distribution by relating them to the optimization problem being solved. In particular, we show that under certain conditions the stationary point of the stochastic process of prices generated by the market is equal to the market's Walrasian equilibrium of classic market analysis. Together, these results suggest how traditional market making mechanisms might be replaced with general purpose learning algorithms while still retaining guarantees about their behaviour.
We consider the design of proper scoring rules, equivalently proper losses, when the goal is to elicit some function, known as a property, of the underlying distribution. We provide a full characterization of the class of proper scoring rules when the property is linear as a function of the input distribution. A key conclusion is that any such scoring rule can be written in the form of a Bregman divergence for some convex function. We also apply our results to the design of prediction market mechanisms, showing a strong equivalence between scoring rules for linear properties and automated prediction market makers.
Option contracts are a type of financial derivative that allow investors to hedge risk and speculate on the variation of an asset's future market price. In short, an option has a particular payout that is based on the market price for an asset on a given date in the future. In 1973, Black and Scholes proposed a valuation model for options that essentially estimates the tail risk of the asset price under the assumption that the price will fluctuate according to geometric Brownian motion. More recently, DeMarzo et al., among others, have proposed more robust valuation schemes, where we can even assume an adversary chooses the price fluctuations. This framework can be considered as a sequential two-player zero-sum game between the investor and Nature. We analyze the value of this game in the limit, where the investor can trade at smaller and smaller time intervals. Under weak assumptions on the actions of Nature (an adversary), we show that the minimax option price asymptotically approaches exactly the Black-Scholes valuation. The key piece of our analysis is showing that Nature's minimax optimal dual strategy converges to geometric Brownian motion in the limit.
We present new methods of automating the construction of index pairs, essential ingredients of discrete Conley index theory. These new algorithms are further steps in the direction of automating computer-assisted proofs of semi-conjugacies from a map on a manifold to a subshift of finite type. We apply these new algorithms to the standard map at different values of the perturbative parameter ε and obtain rigorous lower bounds for its topological entropy for ε in [.7, 2].
We study a model of learning on social networks in dynamic environments, describing a group of agents who are each trying to estimate an underlying state that varies over time, given access to weak signals and the estimates of their social network neighbors.
We study three models of agent behavior. In the "fixed response" model agents use a fixed linear combination to incorporate information from their peers into their own estimate. This can be thought of as an extension of the DeGroot model to a dynamic setting. In the "best response" model players calculate minimum variance linear estimators of the underlying state.
We show that regardless of the initial configuration, fixed response dynamics converge to a steady state, and that the same holds for best response on the complete graph. We show that best response dynamics can, in the long term, lead to estimators with higher variance than is achievable using well chosen fixed responses.
The "penultimate prediction" model is an elaboration of the best response model. While this model only slightly complicates the computations required of the agents, we show that in some cases it greatly increases the efficiency of learning, and on the complete graphs is in fact optimal, in a strong sense.
Machine Learning competitions such as the Netflix Prize have proven reasonably successful as a method of "crowdsourcing" prediction tasks. But these competitions have a number of weaknesses, particularly in the incentive structure they create for the participants. We propose a new approach, called a Crowdsourced Learning Mechanism, in which participants collaboratively "learn" a hypothesis for a given prediction task. The approach draws heavily from the concept of a prediction market, where traders bet on the likelihood of a future event. In our framework, the mechanism continues to publish the current hypothesis, and participants can modify this hypothesis by wagering on an update. The critical incentive property is that a participant profits exactly how much her update improves the performance of the hypothesis on a released test set.
Can learning algorithms find a Nash equilibrium? This is a natural question for several reasons. Learning algorithms resemble the behavior of players in many naturally arising games, and thus results on the convergence or non-convergence properties of such dynamics may inform our understanding of the applicability of Nash equilibria as a plausible solution concept in some settings. A second reason for asking this question is in the hope of being able to prove an impossibility result, not dependent on complexity assumptions, for computing Nash equilibria via a restricted class of reasonable algorithms. In this work, we begin to answer this question by considering the dynamics of the standard multiplicative weights update learning algorithms (which are known to converge to a Nash equilibrium for zero-sum games). We revisit a 3x3 game defined by Shapley in the 1950s in order to establish that fictitious play does not converge in general games. For this simple game, we show via a potential function argument that in a variety of settings the multiplicative updates algorithm impressively fails to find the unique Nash equilibrium, in that the cumulative distributions of players produced by learning dynamics actually drift away from the equilibrium.
The aim of this paper is to introduce a method for computing rigorous lower bounds for topological entropy. The topological entropy of a dynamical system measures the number of trajectories that separate in finite time and quantifies the complexity of the system. Our method relies on extending existing computational Conley index techniques for constructing semiconjugate symbolic dynamical systems. Besides offering a description of the dynamics, the constructed symbol system allows for the computation of a lower bound for the topological entropy of the original system. Our overall goal is to construct symbolic dynamics that yield a high lower bound for entropy. The method described in this paper is algorithmic and, although it is computational, yields mathematically rigorous results. For illustration, we apply the method to the Hénon map, wherne we compute a rigorous lower bound of 0.4320 for topological entropy.
The goal of this paper is to find a competitive algorithm for the following problem. We are given a hypergraph G = (X, E), |E| = n, with weighted edges and maximum edge size k. We wish to maximize the weight of a matching M ⊆ E of G, given that edges are revealed in a random order, and that when an edge e is revealed the algorithm must decide irrevocably whether or not e ∈ M. Note that this is a generalization of the classic secretary problem.
In this paper, we classify and describe a method for constructing fractal trees in three dimensions. We explore certain aspects of these trees, such as space-filling and self-contact.