How Early Rewards Influence Choice: Targeting model-free processing through reward timing
By Diego Garaialde
Abstract
“While many people claim to have the intention to perform certain behaviours, it is commonly the case these intentions do not come to fruition. This issue is particularly pronounced in cases where there is a long delay between intention and the behaviour, or cases where there is a strong automatic impulse that acts against the intention. According to dual-process theories, this intention-behaviour gap is a result of a conflict between two types of systems: a habitual model-free system and a deliberate model-based system. Usually, interventions target the model-based system, providing important information necessary to convince individuals that the behaviour is desirable or beneficial. However, this approach mostly ignores the model-free system, leaving a large part of the decision-making process outside of the intervention. The early reward strategy is a method to target the model-free system directly and considers the known mechanisms behind how reward information is processed. In particular, it focuses on how reward timing affects decision making within a sequence of actions. Due to how temporal discounting and temporal difference learning lead to reductions in the value of the reward based on how far it is placed from the first action in the sequence, placing the reward as close to the start of the sequence as possible is likely to prevent this reduction from occurring as much as possible. This early reward strategy was tested across four experiments and was found to successfully alter behaviour in a way predicted by the theory. Two of the experiments focused on a computational approach, using reinforcement learning algorithms to predict behaviour and compare it against the participant responses. The other two experiments were conducted with a more applied approach that used tasks more representative of real-world action sequences to test the extent to which behaviour was affected by early rewards. 
Whether the reward was monetary or gamified, placing a reward earlier in a sequence improved the frequency of selection for that sequence significantly when compared to other reward placements. The results have important implications for anyone attempting to incentivise new behaviours by providing a theory-driven approach towards maximising the effectiveness of the reward, particularly to the model-free system. As a result, consideration for reward timing should be integral to any incentive system that involves sequences of actions, with a strong emphasis on providing rewards as early in the interaction as possible.”
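The mechanism the abstract describes can be illustrated with a minimal sketch. This is not the thesis's own model: the discount factor `gamma`, learning rate `alpha`, four-step chain, and both function names are assumptions chosen only for illustration. The first function applies exponential temporal discounting to a reward seen from the first action. The second runs tabular TD(0) on a fixed action sequence and reports the learned value of the first state, which is the value that would drive the choice to start the sequence.

```python
def discounted_value(reward, delay_steps, gamma=0.9):
    """Value of a reward as seen from the first action, when the reward
    arrives `delay_steps` actions later (exponential discounting)."""
    return reward * gamma ** delay_steps


def td0_first_state_value(reward_step, n_steps=4, episodes=20,
                          alpha=0.5, gamma=0.9, reward=1.0):
    """Run tabular TD(0) over a fixed chain of `n_steps` actions and
    return the learned value of the first state.

    The reward is delivered on the action taken at index `reward_step`
    (0 = the very first action). All parameters are illustrative.
    """
    # One value per state, plus a terminal state whose value stays 0.
    values = [0.0] * (n_steps + 1)
    for _ in range(episodes):
        for s in range(n_steps):
            r = reward if s == reward_step else 0.0
            # TD(0) update: move V(s) toward r + gamma * V(s + 1).
            values[s] += alpha * (r + gamma * values[s + 1] - values[s])
    return values[0]


if __name__ == "__main__":
    for step in range(4):
        print(step,
              round(discounted_value(1.0, step), 3),
              round(td0_first_state_value(step), 3))
```

Under these assumptions, a reward on the first action keeps its full value at the start of the sequence and is learned from the first episode onward, whereas a reward on the last action is both discounted and has to propagate backwards through the chain over repeated episodes before it affects the value of the first state.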
Reference
Garaialde, D. (2021). How early rewards influence choice: Targeting model-free processing through reward timing. Retrieved May 12, 2022, from https://researchrepository.ucd.ie/handle/10197/12806
Keywords
Dual process theory, rewards, temporal discounting, temporal difference learning, research