Consider a dynamic system based on a discrete set A {\displaystyle {\mathcal {A}}} of actions and a discrete set O {\displaystyle {\mathcal {O}}} of observations.3 A history h {\displaystyle h} is a sequence a 1 o 1 … a ℓ o ℓ {\displaystyle a_{1}o_{1}\dots a_{\ell }o_{\ell }} where a 1 , … , a ℓ {\displaystyle a_{1},\dots ,a_{\ell }} are the actions taken by the agent in that order and o 1 , … , o ℓ {\displaystyle o_{1},\dots ,o_{\ell }} are the observations made up by the agent. Let us write P ( a 1 o 1 … a ℓ o ℓ ) {\displaystyle P(a_{1}o_{1}\dots a_{\ell }o_{\ell })} to be the conditional probability of observing o 1 , … , o ℓ {\displaystyle o_{1},\dots ,o_{\ell }} given that the actions taken are a 1 , … , a ℓ {\displaystyle a_{1},\dots ,a_{\ell }} .
We now want to characterize a given hidden state reached after some history h {\displaystyle h} . To do that, we introduce the notion of test. A test t {\displaystyle t} is of the same type that a history: it is a sequence of action-observation pairs. The idea is now to consider a set of tests { t 1 , … , t n } {\displaystyle \{t_{1},\dots ,t_{n}\}} to full characterize a hidden state. To do that, we first define the probability of a test t {\displaystyle t} conditional on a history h {\displaystyle h} : P ( t ∣ h ) := P ( h t ) P ( h ) {\displaystyle P(t\mid h):={\frac {P(ht)}{P(h)}}} .
We now define the prediction vector p ( h ) = [ P ( t 1 ∣ h ) , … , P ( t n ∣ h ) ] {\displaystyle p(h)=[P(t_{1}\mid h),\dots ,P(t_{n}\mid h)]} . We say that p ( h ) {\displaystyle p(h)} is a predictive state representation (PSR) if and only if it forms a sufficient statistic for the system. In other words, p ( h ) {\displaystyle p(h)} is a predictive state representation (PSR) if and only if for all possible tests t {\displaystyle t} , there exists a function f t {\displaystyle f_{t}} such that for all histories h {\displaystyle h} , P ( t ∣ h ) = f t ( p ( h ) ) {\displaystyle P(t\mid h)=f_{t}(p(h))} .
The functions f t {\displaystyle f_{t}} are called projection functions. We say that the PSR is linear when the function f t {\displaystyle f_{t}} is linear for all possible tests t {\displaystyle t} . The main theorem proved in 4 is stated as follows.
Theorem. Consider a finite POMDP with k {\displaystyle k} states. Then there exists a linear PSR with a number of tests n {\displaystyle n} smaller that k {\displaystyle k} .
James, Michael R.; Singh, Satinder (2004). "Learning and discovery of predictive state representations in dynamical systems with reset". Twenty-first international conference on Machine learning - ICML '04. p. 53. CiteSeerX 10.1.1.67.5179. doi:10.1145/1015330.1015359. ISBN 978-1-58113-838-2. S2CID 9111832. 978-1-58113-838-2 ↩
Izadi, Masoumeh T.; Precup, Doina (9 August 2003). "A planning algorithm for predictive state representations". Proceedings of the 18th International Joint Conference on Artificial Intelligence. Ijcai'03: 1520–1521. https://dl.acm.org/doi/abs/10.5555/1630659.1630919 ↩
Littman, Michael; Sutton, Richard S (2001). "Predictive Representations of State". Advances in Neural Information Processing Systems. 14. MIT Press. https://papers.nips.cc/paper_files/paper/2001/hash/1e4d36177d71bbb3558e43af9577d70e-Abstract.html ↩