DRO – Differential Reinforcement of “Other” Behavior:

DRO stands for “differential reinforcement of ‘other’ behavior”. You can shorten it to “teach something different.” It is a widely recommended means of fixing behavior problems. For example, if a dog jumps up on people, teach it to lie down instead. There are variations on DRO, such as DRI (teaching an incompatible behavior) and DRA (teaching an alternate behavior). In essence they all mean the same thing – teach the dog something new to replace an existing behavior. Despite the popularity of these tricks, they are not suggested because they are the most effective way to stop unacceptable behavior. They are suggested because they avoid considering punishment. This approach is often described as the “scientific” way to solve behavior problems. It’s not.

In the history of behavior analysis there is a name that stands out for courage behind enemy lines. Nate Azrin, PhD, was one of the first scientists to study behavior without a bias toward reinforcement or punishment. He started his career trying to correct the preference for positive reinforcement that was already present in B.F. Skinner’s ideology as early as the 1950s.

“B. F. Skinner’s published views on punishment were well known at the time of my arrival in September 1953 at Harvard University, which I attended for the sole purpose of studying under Skinner. He had coauthored some studies of punishment earlier with W.K. Estes, but had devoted virtually all of his other animal research to the study of positive reinforcement. He was opposed to the use of punishment to influence human behavior, a view strongly expressed in his books Science and Human Behavior (1953) and Walden Two (1948), and indeed shared generally by psychology at large.

My own view at the time was that the strong opinions and ethical views regarding punishment had prevented the serious study of that process to the same extent that was true of positive reinforcement. I believed that punishment deserved more study; more specifically I believed that such study should address some of the same factors, such as the schedule of presentations, as had been found by Skinner to be so important with positive reinforcement.” (Reflections and Comments, JEAB, 2002, 77(3), 373–392)

It is hard to disagree with someone who wishes to balance the investigation of behavior between reinforcement and punishment to create an objective study. Logically, real science doesn’t prefer one natural phenomenon over another. Physicists do not prefer momentum over inertia. By contrast, the bias described by Azrin is stronger now than in the 1950s. Behavior analysts, the children of Skinner, hold an overwhelming bias for positive reinforcement over aversive control. You can draw your own conclusions about that and the validity of “behavioral science”. This preference is easy to spot, and DRO is one of the best examples that confirm the bias. Consider this comment about changing behavior. It’s from that same Nate Azrin. Remember, his credentials are impeccable. He is just as smart, just as educated and just as much an authority as any other PhD behavior analyst. He worked in B.F. Skinner’s rat lab at Harvard. He had a long and productive career as a behavior analyst. He was the real deal.

“Providing negative consequences is the fastest, most effective means of eliminating unwanted behavior – far faster than developing stimulus control or teaching an alternate behavior.” (Personal communication with the author.)

As you hold that thought, consider that the vast majority of behaviorists and modern trainers recommend “teaching an alternate behavior.” That is a direct contradiction of Azrin’s simple solution. Their bias leads them to suggest a solution that isn’t a solution. The fundamental problem is that DRO is based on an illogical assumption. Teaching new behaviors does not remove old behaviors. If that were true, teaching you French would make you forget English or be unable to speak it.

To show that this isn’t a unique perspective, here is a quote from Dr. Ron Van Houten – a respected behavior analyst. This is from The Effects of Punishment on Human Behavior, Axelrod and Apsche, Academic Press, 1983.

“Another way of suppressing unwanted behavior is to reinforce incompatible behavior. However, just as it can be difficult to teach a new behavior entirely through the use of punishment, it can be very difficult to suppress an old behavior entirely through the reinforcement of incompatible behavior. If reinforcement for the unwanted behavior cannot be completely eliminated, it will likely continue even if several new behaviors are established. Hence, the best formula for suppressing behavior involves reinforcing desirable behavior at the same time that one punishes undesirable behavior. Indeed, as has been pointed out earlier, punishment is most effective when an alternative reinforced behavior that is not punished is available. If, on the other hand, one provides an alternative behavior but does not punish the unwanted behavior, a concurrent schedule of reinforcement would prevail that would be expected to maintain both behaviors at strengths proportional to the amount of reinforcement associated with each behavior.”
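
The proportional relationship described at the end of that passage is commonly formalized in behavior analysis as the matching law. As a rough sketch only, assuming the simple two-behavior form of that law and purely made-up reinforcement counts (the function name and the numbers are illustrative and are not taken from Van Houten), the expected split between an old behavior and its replacement could be worked out like this:

```python
def matching_share(reinforcers_a: float, reinforcers_b: float) -> float:
    """Predicted share of responding allocated to behavior A.

    Assumes the simple two-alternative matching relation:
        B_a / (B_a + B_b) = R_a / (R_a + R_b)
    """
    total = reinforcers_a + reinforcers_b
    if total <= 0:
        raise ValueError("at least one behavior must be reinforced")
    return reinforcers_a / total


# Hypothetical numbers: the old behavior (rushing the door) still pays off
# 10 times a week; the new behavior (going to a mat) earns 30 treats a week.
old_share = matching_share(10, 30)
new_share = matching_share(30, 10)
print(f"old behavior: {old_share:.0%}, new behavior: {new_share:.0%}")
# prints "old behavior: 25%, new behavior: 75%"
```

Under these assumed numbers the old behavior shrinks but does not disappear. In this simplified model it reaches zero only when its own reinforcement is removed entirely, which is the condition the passage says is hard to meet.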

To examine this from the ground up, let’s go to a dog example and study how learning takes place.

The Nature of New Behaviors:
When you create a behavior to replace a behavior, the dog doesn’t know your purpose or that one behavior is better than another. It simply goes happily along learning something new. This is like teaching an English speaker the Spanish phrase, “que paso?” The new phrase is integrated into the verbal repertoire of the speaker and used, at will, to elicit a specific response. It does not affect the speaker’s ability to use English phrases such as “what’s happening”, “what’s up?”, “how’s it going”, “hey, dude”, or dozens of other casual greetings. Here’s an example of how this works when teaching behaviors.

EG: Teach a dog to turn in a circle. Then teach him to truncate the behavior. (If you are using a clicker, simply start clicking when the dog is at the half-way point. Alternate with a word like “wrong”, said in a normal tone of voice, for anything that goes beyond the half-way point.) You are now requiring only a half-turn. Teaching the truncated ½ turn doesn’t do anything to the full circle. That a human would call it a mistake if the dog sometimes does a full circle is irrelevant. The organism thinks the new behavior is additive, because it is. However, the addition of a behavior creates no prejudice against the old behavior, so it continues to exist and be an option for the dog. Consider buying a new pair of sandals that sits in your closet right next to your insulated snow boots. You have no inhibition against wearing the boots. To the dog, we simply now have Behavior A (full circle) and Behavior B (half circle). That the two appear similar is exactly that – an appearance in the mind of the trainer. That the dog thinks of them as related is an assumption that may or may not be true. We are capable of disassociating things at the drop of a hat. The summer shoes are different from the winter shoes, but neither is in the underwear drawer. Selecting one pair over the other is based on knowledge of its particular benefit in a specific context. If you leave the house wearing sandals and drive to a snow resort, you need to switch gears and put on your warmer shoes.

Having shoes that appear similar but are treated as different carries over to many aspects of our behavior. Things may appear completely similar and yet be treated differently based on context. Think of homographs – words that are spelled the same but can only be defined by how they are used in a sentence.

“The soldier wound his watch carefully to avoid irritating the wound on his wrist.”

Did you understand the sentence? One word is a verb and the other is a noun, but it looks like the same word. The point to take away from this is that animals make associations as they make associations, not based on a human assumption about what they know or don’t know. Unless you test them logically, making assumptions that a dog “knows” something leads to sloppy control. In the case of this homograph, the same written word triggers two completely separate meanings. The meaning of each is based on context created by syntax – the order in which the word appears in the sentence and other cues – subject, object and predicate. Now let’s take a look at how assumptions about additive behaviors influence performance.

A Practical Example: Drop on Recall
Consider a classic obedience behavior called “Drop on Recall.” The dog is asked to come and then, half-way to the handler, is told to drop to the ground. Then the dog is asked to finish the ‘come’. The dog knows both cues as separate things. Come means go to the handler’s position. Drop means lie down. Each has its own power to control behavior in real time. However, when placed in a sequence something odd happens.

Within a few repetitions of the new pattern, ‘come’ followed by ‘drop’ will become a new behavior that needs only the word ‘come’ to trigger. Now the dog comes toward the handler and drops automatically without being asked. This is a huge point. The dog’s natural ability to anticipate has trumped its ability to listen to commands in real time. ‘Come’ suddenly doesn’t mean “come” anymore, but ‘drop’ still means drop. The embedded behavior is unchanged. The triggering command has morphed. This is no different from someone bringing both sandals and snow boots from their closet when you ask for “shoes.” The brain must be able to hold or cancel associations at any given moment to satisfy the demands of a specific context.

This leads to a problem for the obedience trainer. A plain ‘come’ turns into a drop on recall gone wrong as the dog stops paying attention to the commands in real time and anticipates what is going to happen, which leads to a combination of two or more behaviors. This is how a chain is formed. That it forms when we don’t want it to, or when we want something else to happen, is irrelevant to the reality. The dog encapsulates the behaviors that appear to be connected by tangible reinforcement. It stops listening and performs a knee-jerk behavior. Now let’s put it into the context of DRO.

DRO and Front Door Arousal: An everyday occurrence
Take a dog that rushes the front door. Teach it to go to a different location. Reinforce that heavily. Nothing has changed except that a behavior has been added to the dog’s repertoire. We just bought a pair of sandals during the summer. Though we have added a new behavior, the old skill still exists. Heavy, continual reinforcement is needed to prevent the old behavior from returning on its own. Slack off on the reinforcement and the dog senses a variation in consequences. That triggers variability and makes any other behavior more likely to occur. This is just like an English speaker saying “what’s happening?” to someone who doesn’t speak English. If the English speaker also knows a Spanish greeting, it will be triggered instantly. Both behaviors exist in the speaker’s repertoire and can be used at any time that communication is needed. As an aside, this is what the term fluency means.

The net cost for the trainer is far more effort invested in teaching and maintaining the alternate behavior. In this case, you lose your time fussing with a process that sucks up training time and doesn’t really solve the problem. Few owners will remain diligent enough to make this process work because it requires perpetual diligence, an always hungry dog, treats at the ready and nothing coming through the door that is more important to the dog than the treats that have been traditionally connected with the event. As a recommendation to rapidly solve an unacceptable behavior, DRO fails, every time. Considering that the primary cause of owners taking dogs to shelters is problem behavior, this is a serious problem. Tax the owner’s patience too far and they get rid of the dog. Shelters kill about 80% of what they get. Anything too complicated or too time-consuming puts the dog’s life at risk – and that is not an exaggeration.

EG: Sophia Yin, DVM, did a study of her remote feeder machine. It took her four months to teach 20 dogs to lie down rather than rush the front door. It took hours of devotion by the owners to practice the “solution” and, in the end, also required the assistance of live trainers. (This raises the question of why the remote feeder was important. Why not just have live trainers do that work?) In all real-world problems there are time and cost constraints that must be integrated into the solution. Now you know why Nate Azrin’s quote describes reality and DRO does not. You can’t block existing unacceptable behavior by playing “bait and switch”. Unless you confront the original behavior you will never remove “que paso” or objectionable terms like “The N Word” from someone’s vocabulary.

Nailing a Solution:
Rather than leave you with just a critique of DRO, I am going to give you a simple way to prove whether the criticism is justified. Find a dog that rushes the front door. They are all around you. Then find one that doesn’t like having its nails trimmed. They, also, are all around you. Now ring the doorbell. When the dog comes flying forward barking hysterically, restrain the dog and nip the tiniest bit off one nail. Then let the dog go. Ring the bell again. If the dog comes to the door, restrain the dog and nip the tiniest bit from another nail. Repeat. As soon as the dog stops rushing the door, start giving treats for “not rushing.” Done. This process can be accomplished in a couple of five-minute sessions, with periodic updates if the behavior starts to return. No harm came to the dog. No horrible side-effects of punishment occurred. Compare that to weeks of trying to teach the dog an alternate behavior and perpetual maintenance. One additional thought. I am not limited to using only punishment or positive reinforcement. In the example of the nail-clippers I would start using positive reinforcement for DRO after the punishment procedure has inhibited the dog from rushing the door. In essence, DRO is a procedure that is logically only part of a solution. Without the punishment component (wearing sandals on a cold, rainy day will punish sandal selection on cold, rainy days) teaching an alternate behavior is unworkable.

Nate Azrin was a brilliant scientist because his research did what good science always does – it revealed nature. People who allow ideological biases to govern their methods invariably create complex, ineffective methods that serve themselves rather than those who need help. The next time someone suggests teaching an alternate behavior to get rid of unacceptable behavior, teach them to be silent instead. Good luck.

3 thoughts on “DRO – Differential Reinforcement of ‘Other’ Behavior”

  1. My question is, since punishment is a much more viable and quick option–why do people choose not to apply it? Are they scared?

  2. Brandon, they are indeed scared. The underbelly of “positive” methods is immediate punishment for anyone who would suggest using punishment or actually applying it. As I also mentioned elsewhere, many people make good money and have elevated status because they preach “positive” methods. When you poke a hole in their fraud they suddenly respond instinctively. In dogs it’s called “resource guarding.” Threaten an ideologue’s cash or status and they get nasty, very quickly.

  3. Very interesting article! I’m curious – if a dog is fearful or lacks confidence, and that is the reason for rushing the door or barking, would you still recommend the same tack? i.e., clip a nail or provide a ‘punishment’? I also read an example in one of your other articles about a dog who barks during mealtimes – using a click and treat method to reduce the barking. Is that better to help a fearful barker?

    Thanks for the fascinating writing and making my brain burn – in a good way.
