Basic Models of Learning

How do organizations offer appropriate rewards in a timely fashion?
Learning may be defined, for our purposes, as a relatively permanent change in behavior that occurs as a result of experience. That is, a person is said to have learned something when she consistently exhibits a new behavior over time. Several aspects of this definition are noteworthy.

We can best understand the learning process by looking at four stages in the development of research on learning (see Exhibit 4.2). Scientific interest in learning dates from the early experiments of Pavlov and others around the turn of the twentieth century. This research focused on stimulus-response relationships and the environmental determinants of observable behaviors. It was followed by the discovery of the law of effect, experiments in operant conditioning, and, finally, the formulation of social learning theory.

Next, Pavlov paired the unconditioned stimulus (meat) with a neutral one (the ringing of a bell). Normally, the ringing of the bell by itself would not be expected to elicit salivation. However, over time, a learned linkage developed for the dog between the bell and meat, ultimately resulting in an S-R bond between the conditioned stimulus (the bell) and the response (salivation) without the presence of the unconditioned stimulus (the meat). Evidence emerged that learning had occurred and that this learning resulted from conditioning the dogs to associate two normally unrelated objects, the bell and the meat.

Although Pavlov’s experiments are widely cited as evidence of the existence of classical conditioning, it is necessary from the perspective of organizational behavior to ask how this process relates to people at work. Ivancevich, Szilagyi, and Wallace provide one such work-related example of classical conditioning:

An illustration of classical conditioning in a work setting would be an airplane pilot learning how to use a newly installed warning system. In this case the behavior to be learned is to respond to a warning light that indicates that the plane has dropped below a critical altitude on an assigned glide path. The proper response is to increase the plane’s altitude. The pilot already knows how to appropriately respond to the trainer’s warning to increase altitude (in this case we would say the trainer’s warning is an unconditioned stimulus and the corrective action of increasing altitude is an unconditioned response). The training session consists of the trainer warning the pilot to increase altitude every time the warning light goes on. Through repeated pairings of the warning light with the trainer’s warning, the pilot eventually learns to adjust the plane’s altitude in response to the warning light even though the trainer is not present. Again, the unit of learning is a new S-R connection, or habit.2

Although classical conditioning clearly has applications to work situations, particularly in the area of training and development, it has been criticized as explaining only a limited part of total human learning. Psychologist B. F. Skinner argues that classical conditioning focuses on respondent, or reflexive, behaviors; that is, it concentrates on explaining largely involuntary responses that result from stimuli.3 More complex learning cannot be explained solely by classical conditioning. As an alternative explanation, Skinner and others have proposed the operant conditioning model of learning.

Operant Conditioning
The major focus of operant conditioning is on the effects of reinforcements, or rewards, on desired behaviors. One of the first psychologists to examine such processes was J. B. Watson, a contemporary of Pavlov, who argued that behavior is largely influenced by the rewards one receives as a result of actions.4 This notion is best summarized in Thorndike’s law of effect. This law states that of several responses made to the same situation, those that are accompanied or closely followed by satisfaction (reinforcement) will be more likely to occur; those that are accompanied or closely followed by discomfort (punishment) will be less likely to occur.5

In other words, it posits that behavior that leads to positive or pleasurable outcomes tends to be repeated, whereas behavior that leads to negative outcomes or punishment tends to be avoided. In this manner, individuals learn appropriate, acceptable responses to their environment. If we repeatedly dock the pay of an employee who is habitually tardy, we would expect that employee to learn to arrive early enough to receive a full day’s pay.
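The law of effect can be illustrated with a small simulation of the tardy-employee example. This is only a sketch under stated assumptions: the function names, the starting tendencies, the fixed step size, and the floor of 0.05 are all hypothetical choices made for illustration, not part of the law itself.

```python
def update_tendency(tendencies, response, satisfying, step=0.2):
    """Return a copy of `tendencies` with one response strengthened or weakened.

    Assumption: a fixed step size, and strengths clamped to [0.05, 1.0] so that
    no response ever disappears completely.
    """
    updated = dict(tendencies)
    change = step if satisfying else -step
    updated[response] = min(1.0, max(0.05, updated[response] + change))
    return updated


def response_probabilities(tendencies):
    """Normalize response strengths into probabilities that sum to 1."""
    total = sum(tendencies.values())
    return {response: strength / total for response, strength in tendencies.items()}


# Hypothetical starting point: the employee is habitually tardy.
tendencies = {"arrive_late": 0.8, "arrive_on_time": 0.2}

# Hypothetical record of six workdays.
observed_days = ["arrive_late", "arrive_late", "arrive_on_time",
                 "arrive_late", "arrive_on_time", "arrive_on_time"]

for response in observed_days:
    # Docked pay follows lateness (discomfort); full pay follows punctuality (satisfaction).
    full_pay = (response == "arrive_on_time")
    tendencies = update_tendency(tendencies, response, satisfying=full_pay)

probabilities = response_probabilities(tendencies)
print(probabilities)
```

In this sketch, the response followed by satisfaction ends up more likely than the response followed by discomfort, which is the pattern the law describes.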

A basic operant model of learning is presented in Exhibit 4.2. Three concepts are central to this model:

Drive. A drive is an internal state of disequilibrium; it is a felt need. It is generally believed that drive increases with the strength of deprivation. A drive, or desire, to learn must be present for learning to take place. For example, not currently being able to afford the house you want is likely to lead to a drive for more money to buy your desired house. Living in a run-down shack is likely to increase this drive compared to living in a nice apartment.

Habit. A habit is the experienced bond or connection between stimulus and response. For example, if a person learns over time that eating satisfies hunger, a strong stimulus-response (hunger-eating) bond will develop. Habits thus determine the behaviors, or courses of action, we choose.

Reinforcement or reward. This represents the feedback individuals receive as a result of action. For example, if as a salesperson you are given a bonus for greater sales and plan to use the money to buy the house you have always wanted, this will reinforce the behaviors that you believed led to greater sales, such as smiling at customers, repeating their name during the presentation, and so on.

A stimulus activates an individual’s motivation through its impact on drive and habit. The stronger the drive and habit (S-R bond), the stronger the motivation to behave in a certain way. As a result of this behavior, two things happen. First, the individual receives feedback that reduces the original drive. Second, the individual strengthens his or her belief in the veracity of the S-R bond to the extent that it proved successful. That is, if one’s response to the stimulus satisfied one’s drive or need, the individual would come to believe more strongly in the appropriateness of the particular S-R connection and would respond in the same way under similar circumstances.
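The feedback loop just described can be sketched in a few lines of code. The text says only that motivation grows with drive and habit; the multiplicative rule, the 0-to-1 scales, the `learning_rate` parameter, and the class and method names below are assumptions introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class Learner:
    drive: float  # felt need, on an assumed 0..1 scale
    habit: float  # strength of the S-R bond, on an assumed 0..1 scale

    def motivation(self) -> float:
        # Assumption: motivation is the product of drive and habit, so it
        # rises when either one rises.
        return self.drive * self.habit

    def receive_feedback(self, reward: float, learning_rate: float = 0.3) -> None:
        """Apply the two effects of a rewarded behavior (reward in 0..1)."""
        # First effect: feedback reduces the original drive.
        self.drive *= (1.0 - reward)
        # Second effect: a successful response strengthens the S-R bond,
        # moving habit part of the way toward its maximum of 1.0.
        self.habit += learning_rate * reward * (1.0 - self.habit)


learner = Learner(drive=0.9, habit=0.4)
motivation_before = learner.motivation()
learner.receive_feedback(reward=0.5)
print(motivation_before, learner.drive, learner.habit)
```

After one rewarded response, the learner's drive is lower and the habit is stronger, so the same stimulus would be expected to draw the same response more reliably the next time the drive returns.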

An example will clarify this point. Several recent attempts to train chronically unemployed workers have used a daily pay system instead of weekly or monthly systems. The primary reason for this is that the workers, who do not have a history of working, can more quickly see the relationship between coming to work and receiving pay. An S-R bond develops more quickly because of the frequency of the reinforcement, or reward.

Operant versus Classical Conditioning
Operant conditioning can be distinguished from classical conditioning in at least two ways.6 First, the two approaches differ in what is believed to cause changes in behavior. In classical conditioning, changes in behavior are thought to arise through changes in stimuli—that is, a transfer from an unconditioned stimulus to a conditioned stimulus. In operant conditioning, on the other hand, changes in behavior are thought to result from the consequences of previous behavior. When behavior has not been rewarded or has been punished, we would not expect it to be repeated.

Second, the two approaches differ in the role and frequency of rewards. In classical conditioning, the unconditioned stimulus, acting as a sort of reward, is administered during every trial. In contrast, in operant conditioning the reward results only when individuals choose the correct response. That is, in operant conditioning, individuals must correctly operate on their environment before a reward is received. The response is instrumental in obtaining the desired reward.
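The second difference, in how often the reward is delivered, can be made concrete with a short sketch. The function names and the list of trial outcomes are hypothetical; the only thing the code encodes is the rule stated above for each approach.

```python
def classical_reward(response_made: bool) -> bool:
    # Classical conditioning: the unconditioned stimulus is administered on
    # every trial, whatever the learner does.
    return True


def operant_reward(response_correct: bool) -> bool:
    # Operant conditioning: the reward arrives only when the learner chooses
    # the correct response.
    return response_correct


# Hypothetical record of five trials; True means the correct response was made.
responses = [False, True, False, True, True]

classical_count = sum(classical_reward(r) for r in responses)
operant_count = sum(operant_reward(r) for r in responses)
print(classical_count, operant_count)
```

Over the same five trials, the classical schedule delivers its stimulus five times, while the operant schedule rewards only the three correct responses.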