The “Treatless Click”

I was at the post office the other day when a man walked up and laid a package on the counter. He touched the artificial voice box at his throat and asked the clerk how much it would cost to mail the package. The clerk weighed the package and then asked, “Would you prefer Express or Priority?” The man with the artificial voice box was slow in reaching up to his neck and did not answer instantly. The clerk again said, “Do you want Express or Priority?” The man touched his communicator and croaked out “Priority.” Over the next few minutes, the routine repeated several times: the man took extra time to speak because of the apparatus, and the clerk repeated himself automatically, even though he could see why the man hadn’t answered promptly. The clerk was offering an operant behavior on a “Fixed Interval” schedule of positive reinforcement.

A simple definition of an operant behavior is “a behavior determined by its consequences.” At a basic level, this requires that the animal be able to change its behavior based on a single reinforcing event. Each reinforced response produces a closer approximation of the final behavior, until the behavior becomes firmly established. This sensitivity to a single reinforcement is the foundation of successive approximation.

While paying attention to the consequences of a single action is important, it is only one component of survival. The next level of responding to changing consequences is the ability to perceive trends, or “schedules,” of reinforcement. If a particular prey species becomes scarce, a wolf must be able to sense that the aggregate reinforcement for hunting that species has changed. Entirely new behavior patterns, like hunting in a different territory or choosing a different prey species, are evoked by the change in the schedule of reinforcement.

In order to understand the various levels of adaptation to consequences, scientists record the rate of response of a single operant, such as lever pressing or key pecking; that is, the behavior is held constant while the schedule of reinforcement is varied. Once a single operant is established, the more complex issues of reinforcement schedules can be studied. While clicker trainers most often use a couple of simple schedules to build and maintain individual behaviors, a better understanding of the topic can yield significant rewards.

Types of Schedules:
The simplest way to deliver reinforcement is with a “fixed” schedule. Fixed schedules deliver a reinforcer after a specific number of responses or a specific amount of time. The two types of fixed schedules are fixed ratio and fixed interval. These are usually abbreviated as FR(x) or FI(x), where FR stands for “fixed ratio,” FI stands for “fixed interval,” and the (x) represents the number of responses necessary to “cause” a reinforcement or the amount of time that must elapse between reinforcements. For instance, FR(3) means that every third response brings a reinforcement. FI(10-sec) means that the first response offered after 10 seconds have elapsed is reinforced.
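To make the mechanics concrete, here is a minimal Python sketch of the two fixed schedules. The function names and the idea of asking a should_reinforce() question on every response are illustrative assumptions, not standard tools from the training literature:

```python
import time

def fixed_ratio(n):
    """FR(n): reinforce every nth response."""
    count = 0
    def should_reinforce():
        nonlocal count
        count += 1
        if count >= n:  # the nth response since the last reinforcement
            count = 0
            return True
        return False
    return should_reinforce

def fixed_interval(seconds):
    """FI(seconds): reinforce the first response after the interval elapses."""
    last = time.monotonic()
    def should_reinforce():
        nonlocal last
        if time.monotonic() - last >= seconds:
            last = time.monotonic()
            return True
        return False
    return should_reinforce

# FR(3): every third response earns the reinforcement.
fr3 = fixed_ratio(3)
print([fr3() for _ in range(6)])  # [False, False, True, False, False, True]
```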

Another type of reinforcement schedule is labeled “variable.” As with fixed ratios and intervals, variable ratios and intervals are also abbreviated, with a slight difference: the number in parentheses is an average rather than a constant. VR(3) means that, on average, the animal is reinforced every third time it offers the response. This could mean two reinforcements in a row, or one reinforcement after six unreinforced behaviors.

The other type of variable schedule is the variable interval. This is also abbreviated, using the average amount of time that elapses between reinforcements. VI(10) means that, on average, the first response offered after 10 seconds is reinforced. Variable interval schedules are tricky to use unless the behavior is firmly established in the animal’s repertoire.
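The variable schedules can be sketched the same way, randomizing the requirement around the stated average. As before, the names and structure are assumptions for illustration only:

```python
import random
import time

def variable_ratio(mean):
    """VR(mean): reinforce after a response count that averages `mean`."""
    count, target = 0, random.randint(1, 2 * mean - 1)
    def should_reinforce():
        nonlocal count, target
        count += 1
        if count >= target:
            count, target = 0, random.randint(1, 2 * mean - 1)
            return True
        return False
    return should_reinforce

def variable_interval(mean_seconds):
    """VI(mean): reinforce the first response after a randomized wait."""
    last, wait = time.monotonic(), random.uniform(0, 2 * mean_seconds)
    def should_reinforce():
        nonlocal last, wait
        if time.monotonic() - last >= wait:
            last, wait = time.monotonic(), random.uniform(0, 2 * mean_seconds)
            return True
        return False
    return should_reinforce

# VR(3): reinforcement may come twice in a row or after a long dry run,
# but over many responses it averages one reinforcement per three.
vr3 = variable_ratio(3)
print(sum(vr3() for _ in range(300)))  # roughly 100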

From Schedules To Cycles:
Without getting bogged down in all these abbreviations, here is the most important part of this information: in the early stages of shaping, rate of response is generally more important than topography. In other words, getting lots of responses is more important than which responses you get. A good way to think of this is to set the pace of the shaping so that the animal is offering some type of response every few seconds. This ensures that the rate of reinforcement will be high enough to sustain the animal through the minor frustrations that are a natural part of learning. If you can, try to get this cycle down to five seconds or less. Once the rate of responding is cyclic and quick, you can go to the next stage of the process.
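If you want to check whether a session is actually holding that five-second cycle, a small tracker along these lines would do. The class and its methods are hypothetical, sketched only to make the pacing idea concrete:

```python
import time

class CycleTimer:
    """Track the seconds between responses during a shaping session."""

    def __init__(self, target_seconds=5.0):
        self.target = target_seconds
        self.last = None
        self.cycles = []

    def mark_response(self):
        """Call once each time the animal offers a response."""
        now = time.monotonic()
        if self.last is not None:
            self.cycles.append(now - self.last)
        self.last = now

    def on_pace(self):
        """True if the last few cycles average at or under the target."""
        recent = self.cycles[-5:]
        return bool(recent) and sum(recent) / len(recent) <= self.target
```

Calling mark_response() at each click and glancing at on_pace() between repetitions tells you whether the cycle is quick enough to carry the animal past a schedule change.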

When the animal is performing a behavior at a high rate, it’s time to get a little more complicated. If you suddenly change the ratio or interval of reinforcement, the animal is going to know it. Ideally, this sudden change in consequences results in a corresponding change in behavior. For some animals, however, this shift in reinforcement causes a startle response. This is where the rapidly repeating cycle comes in handy. If you have created enough momentum with your cycle of responding, the startle will not have a chance to interrupt the action. Bingo! Before the dog can stop short, it has offered the next behavior: click, then treat.

Schedule Change as a Cue:
It is obvious that animals are capable of changing their behavior in response to changes in ratios and intervals of reinforcement. What may not be obvious is that as soon as the animal learns to sense schedule changes, the change from one schedule to another can itself be used as a discriminative stimulus (SD). In essence, you can use a predictable change in schedule to tell the animal what’s coming next, and you can define that meaning to your specifications. Over a series of training sessions, this use of a schedule change as an SD can take the animal to the next level of learning sophistication.

Example: Teach a puppy to bump an object with his nose. Reinforce the behavior on an FR(1). After about 30 repetitions, click the clicker without giving a treat. What is likely to happen? The puppy will wait a moment and then toss an extra behavior at you. Click and treat the second behavior, and you have made the simple shift from FR(1) to FR(2). Or you can treat it as the first repetition of a new VR(2), or maybe something else entirely.
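A tiny simulation shows how that shift plays out across a session, assuming you settle on FR(2). The session structure and numbers below are illustrative, not a prescription:

```python
def session(responses=40, shift_at=30):
    """Simulate the example: FR(1) for the first `shift_at` responses,
    then FR(2), where every response is clicked but only every second
    response earns a treat."""
    log, since_treat = [], 0
    for r in range(1, responses + 1):
        since_treat += 1
        treated = r <= shift_at or since_treat >= 2
        if treated:
            since_treat = 0
        log.append((r, treated))
    return log

# Around the shift: response 31 gets the treatless click.
for r, treated in session()[28:34]:
    print(r, "click +", "treat" if treated else "no treat")
```

Response 31 draws the click with no treat; the next response earns both, which is exactly the moment the puppy tosses the extra behavior at you.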

On the first instance of this pattern, any of these assumptions could be correct, but what happens when you make this a predictable pattern for the dog? After 20-30 training sessions, the click without a treat can become a discriminative stimulus for any of several things. You could use it to mean “you are close, but you need to refine the behavior a little.” Or, you could use the treatless click to mean “give me something different” — as Ogden Lindsley calls it, an SD for variability. As with all operant behavior, what you reinforce is what you get — only the animal’s behavior can tell you which type of discrimination you have achieved.

In most general shaping, the goal is to build the rate of response high enough so that the behavior is occurring on a fairly short cycle. Once the animal is offering the behavior every few seconds, you can begin to apply other schedules to create different effects. Teaching your dog to recognize different schedules of reinforcement can lead to a new level of sophistication for both of you.
