ABSTRACT

Contents

11.1 Introduction
11.1.1 Main Results
11.2 Rate Adaptation Problem Formulation
11.2.1 System Description and TDMA Access Rule
11.2.2 Action and Costs
11.2.2.1 Transmission Cost
11.2.2.2 Holding Cost
11.2.3 Switching Control Game and Transition Probabilities
11.2.4 Switching Control Markovian Game Formulation
11.3 Randomized Threshold Nash Equilibrium
11.3.1 Value Iteration Algorithm
11.3.2 Structural Result on Randomized Threshold Policy
11.3.3 Proof of Theorem 11.2
11.3.4 Proof of Lemma 11.1
11.3.5 Justification of Assumptions in Theorem 11.2
11.3.6 Stochastic Approximation Algorithm
11.4 Numerical Examples
11.4.1 General-Sum Constrained Game: Structure of the Nash
11.4.2 Zero-Sum Constrained Game: Structure of the Nash
11.4.3 Stochastic Approximation Algorithm for Learning Nash Equilibrium Policy
11.5 Conclusions and Open Issues
References

This chapter introduces a special type of game, the switching control Markovian dynamic game, and derives a structural result for its Nash equilibrium policy. The system considered is an uplink time division multiple access (TDMA) cognitive radio network in which multiple cognitive radios (secondary users) attempt to access a spectrum hole. Each secondary user is assumed to access the channel according to a decentralized, predefined access rule based on the channel qualities and transmission delays of the secondary users. By modeling each secondary user's block-fading channel quality as a finite-state Markov chain, the transmission rate adaptation problem of each secondary user is formulated as a general-sum Markovian dynamic game with a delay constraint. It is shown that, under certain conditions, the Nash equilibrium transmission policy of each secondary user is a randomized mixture of pure threshold policies. This structural result enables a stochastic approximation algorithm that adaptively estimates the Nash equilibrium policies and can track them in nonstationary problems where the statistics of the channel and of the user parameters evolve with time.
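To make the structural result concrete, the following is a minimal sketch of what a "randomized mixture of pure threshold policies" looks like, together with a Robbins–Monro-style projected update of the randomization weight in the spirit of the chapter's stochastic approximation algorithm. All specifics here are illustrative assumptions, not the chapter's formulation: the number of channel states `K`, the two pure thresholds, the mixing probability `p`, and the helper names are hypothetical.

```python
import random

K = 8                 # number of channel-quality states of the Markov chain (assumption)
THRESHOLDS = (3, 5)   # two pure threshold policies being mixed (assumption)


def pure_threshold_policy(state: int, threshold: int) -> int:
    """Pure threshold policy: transmit (1) iff channel quality >= threshold."""
    return 1 if state >= threshold else 0


def randomized_threshold_policy(state: int, p: float, rng: random.Random) -> int:
    """Randomized mixture: pick one pure threshold policy with probability p,
    the other with probability 1 - p, then act according to it."""
    threshold = THRESHOLDS[0] if rng.random() < p else THRESHOLDS[1]
    return pure_threshold_policy(state, threshold)


def sa_update(p: float, gradient_estimate: float, step: float) -> float:
    """One stochastic-approximation step on the mixing probability p,
    projected back onto [0, 1] (Robbins-Monro-style update; illustrative)."""
    return min(1.0, max(0.0, p - step * gradient_estimate))


rng = random.Random(0)
actions = [randomized_threshold_policy(s, 0.4, rng) for s in range(K)]
```

Note that for states below both thresholds the mixture always stays silent, and for states above both it always transmits; randomization only matters in the intermediate band, which is where the two pure threshold policies disagree.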