Reinforce Every Behavior?

By KPCT on 12/12/2006

Dear Clicker Friends,

In September I gave a workshop at the annual meeting of the Association of Pet Dog Trainers, always both an honor and a pleasure. In the workshop I demonstrated an exercise I'd learned at an earlier APDT meeting from Massachusetts trainer Tibby Chase for teaching inattentive dogs to walk politely at a person's side. The exercise involves targeting and shaping, and works even if neither the handler nor the dog knows anything about clicker training.

APDT had arranged for a pet owner to bring three friendly but largely untrained dogs. None of the dogs were accustomed to being in public, and while they were fairly quiet, they were, of course, trying to smell everything and greet everyone, pulling on their leashes and paying very little attention to the person holding them. The owner found a volunteer handler for each dog so I could put them through the exercise, one at a time.

I set out about ten circular colored floor markers (the kind that are used in kids' soccer practice) in a straight line about four feet (or two paces) apart across the front of the room. I had the handler and dog start at one end of the line and walk to the other; the only instruction was that every time I clicked, the handler must stop and give the dog a treat.

If the dog was walking at the owner's left side, I clicked just before they came to the next dot. By the fifth dot I didn't have to worry about where the dog was. I had deliberately set the dots so close to each other that the dog hardly had time to be distracted or move away between dots.

At the end of the row I asked the handler to turn around and bring the dog back along the line again. At first the dog's attention wandered during the turnaround, and either the dog or handler might be pulling on the leash, but as soon as they started down the line again the dog fell into position. Click-stop/treat, click-stop/treat, click-stop/treat all the way past ten dots. By now the dog was staying next to the handler on purpose, and the leash was slack between them.

So far, I was using a continuous schedule of reinforcement. The dog was doing what I had in mind, the click was marking it over and over, and the click was always followed by food.

Before the next pass I stepped in and removed the third, fifth, and seventh dots. I had thus raised one criterion: the distance. Now there were three gaps in the line that were longer than before. "Sometimes, dog, you may have to walk a little further to get to a clickable moment." Just as I expected, the first dog strayed a bit in the new gap. Then as the dog and handler approached the next dot, if the dog was nearer the handler again, I clicked. I was shaping the behavior of "walk next to the handler for longer and longer distances." Usually by the time the handler and the dog hit the gap between the sixth and eighth dots, the dog was once again glued to the handler's side and remained so from then on. One dog, however, a large hound mix that had been the most inattentive and pull-happy of the three at the beginning, did need three passes down the line to stay at heel during all the longer gaps.

You might say I was still reinforcing the behavior continuously, because I definitely clicked every time the correct behavior occurred: every time dog, person, and dot were in close proximity. But from the dog's standpoint, it was getting reinforced on a predictable basis, and now, suddenly, it was not so predictable. The dog must try a little harder, maintain the behavior a little longer, to find out how to get the click to happen again for sure.
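For readers who like to see the arithmetic, here is a minimal Python sketch of the dot line. It assumes exactly ten dots spaced exactly four feet apart (the letter says "about" for both), and the function names are hypothetical, chosen only for this illustration. It shows how far the dog must walk between click opportunities before and after the third, fifth, and seventh dots are removed.

```python
# Assumed layout: 10 dots, 4 feet apart (the letter gives both as approximate).
SPACING_FT = 4
NUM_DOTS = 10


def dot_positions(num_dots=NUM_DOTS, spacing=SPACING_FT):
    """Map each dot number (1-indexed) to its distance in feet from the first dot."""
    return {n: (n - 1) * spacing for n in range(1, num_dots + 1)}


def remove_dots(positions, removed):
    """Return the layout with the given dot numbers taken away."""
    return {n: p for n, p in positions.items() if n not in removed}


def gaps(positions):
    """Distances between consecutive remaining dots, i.e. between click opportunities."""
    ordered = sorted(positions.values())
    return [b - a for a, b in zip(ordered, ordered[1:])]


first_pass = dot_positions()
later_pass = remove_dots(first_pass, {3, 5, 7})

print(gaps(first_pass))  # [4, 4, 4, 4, 4, 4, 4, 4, 4]
print(gaps(later_pass))  # [4, 8, 8, 8, 4, 4]
```

Under these assumptions the second layout contains three double-length gaps, and the remaining gaps are unchanged, so most clicks still arrive at the distance the dog expects.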

During shaping of a new behavior, each time you establish the behavior, the dog is being reinforced on a continuous schedule: that is, it does the behavior and it gets the click/treat. As soon as you want to improve the behavior, however, and you raise a criterion, the dog is on a less predictable schedule. The requirements are a little different and it will probably not get reinforced every time. From the dog's standpoint, the schedule has become variable. When the dog is meeting the new criterion every time, the reinforcement becomes continuous again.

Marian Breland Bailey told me she called this a "shaping schedule." It's a natural part of the shaping process. Reinforcement may go from predictable to a little unpredictable back to predictable, as you climb, step by step, toward your ultimate goal.
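The swing from predictable to unpredictable and back can be sketched in a few lines of Python. This is only an illustration: the distances below are invented numbers, not measurements from the workshop, and `reinforcement_rate` is a hypothetical helper. It counts the fraction of the dog's attempts that meet the current criterion and therefore earn a click.

```python
def reinforcement_rate(attempts_ft, criterion_ft):
    """Fraction of attempts that meet the criterion and so earn a click/treat."""
    clicks = [distance >= criterion_ft for distance in attempts_ft]
    return sum(clicks) / len(clicks)


# Invented example data: feet walked at the handler's side on each attempt.
established = [4, 5, 4, 6]     # behavior is established at the 4-foot criterion
just_raised = [5, 8, 6, 9]     # criterion raised to 8 feet; some attempts fall short
re_established = [8, 9, 8, 10]  # the dog now meets the new criterion every time

print(reinforcement_rate(established, 4))     # 1.0 (continuous)
print(reinforcement_rate(just_raised, 8))     # 0.5 (variable, from the dog's standpoint)
print(reinforcement_rate(re_established, 8))  # 1.0 (continuous again)
```

The trainer's rule never changes in this sketch: every attempt that meets the criterion is clicked. Only the dog's hit rate changes, which is why the schedule feels variable to the dog right after a criterion is raised.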

Sometimes a novice animal may find this very disconcerting. If two or three expected reinforcers fail to materialize, the animal may simply give up and quit on you. You can see this clearly on the video of my fish learning to swim through a hoop. When three tries "didn't work" the fish not only quit trying, he had an emotional collapse, lying on the bottom of the tank in visible distress. He offered no more hoop-swimming; scientists would say the behavior had extinguished.

Extinction does not erase a behavior; once learned, it still exists in the animal's nervous system. There are a number of ways to recover a behavior that has gone into extinction, such as reducing your criterion (going back to a dot every two feet) or simply asking for some other well-learned behavior, or waiting an hour or a day and trying again. But perhaps the most graceful way is to build a little confidence, a little resilience in the animal, by introducing a little variability in the reinforcement schedule on purpose, but very tactfully. The animal mostly gets the reinforcement it expected for the behavior it is just learning, but sometimes it has to do the behavior two times, or go twice as far, or twice as long, for a single click. This is what I was causing, in these naïve dogs, by removing an occasional dot: sometimes the dogs had to go the usual distance, and sometimes twice the usual distance.

At first each dog had thought the game was over, but then they discovered it was still working. Both their confidence and the strength of the behavior increased. By the fifth pass down the line, each of the three dogs looked like a polished obedience class graduate: locked into position next to the handler on a nicely loose leash, tail up and waving, head cranked around to look eagerly at the person's face, watching for the next magic moment when a click-stop/treat might occur.

Just for fun, when the last dog, that big hound mix, came down the line perfectly, obedience prance, turned head, and all, I grabbed up a few more dots and laid them out several yards apart across an empty section of the ballroom toward the distant entrance. From the end of the line I sent the handler out across the empty spaces, with the dots as targets to guide her. With just two or three clicks and treats each way, the hound walked nicely at heel, gazing up at her eagerly, clear across the ballroom and back. This easygoing dog could now accept quite large increases in criterion, and still give the behavior so well that he was on a continuous schedule again. Good boy!

So that's a place where a variable ratio of click/treat to offered behavior occurs: when you are selectively reinforcing better or stronger or different behavior. It may happen again when you are adding the cue. Some behaviors may be reinforced and some not; from the animal's standpoint, it is not sure why, and it must be a little resilient about those missed clicks to figure out how to meet the new criterion. And again, when the behavior becomes part of a longer repertoire or rolled into daily life, and natural reinforcers take over, reinforcement may be erratic, and consequently (in my view) on a variable ratio schedule. Yet the behavior is maintained.

Once a simple behavior has been learned, a long and unpredictable schedule can in fact maintain behavior that you DON'T want, with incredible power. People inadvertently train cats to get them up in the night, dogs to pull like freight trains, and children to have tantrums, by holding out for some of the time and then giving in, feeding the cat, going along where the dog wants to go, or buying the candy in the supermarket, on an irregular basis. Casinos, believe me, use the power of the variable ratio schedule to develop behaviors, such as playing slot machines, that are very resistant to extinction, despite highly variable and unpredictable reinforcement.

So: where do you deliberately use a variable ratio schedule of reinforcement? In raising criteria. For building resistance to extinction during shaping. For extending duration and distance of a behavior (ping-ponging, as Morgan Spector and Corally Burmaster say).

Where do you NOT use it?

NEVER purely as a maintenance tool. Behaviors that occur in just the same way with the same level of difficulty each time are better maintained by continuous reinforcement, or by reinforcing in various combinations with other behaviors, than by deliberately letting satisfactory behavior go unreinforced.

NEVER for maintaining chains. I once had the privilege of co-presenting a workshop with Debi Davis, and saw her service dog, a papillon, jump down from her lap to pick up and bring back to her a dollar bill she had dropped. Debi promptly clicked and treated, and then told me people routinely remonstrated with her for doing that, saying that the behavior should NOT be reinforced every time. But this was a chained behavior involving multiple steps. The environment provided the cue for each step of the chain. (See money fall, jump down. Reach money, pick it up. Got the money? Take it back to Debi, etc.) Each cue reinforced the behavior that preceded it. But failing to reinforce the whole chain at the end of it would inevitably lead to pieces of the chain beginning to extinguish down the road. Debi was right. Pay the pup for that great job!

NEVER for discrimination problems such as scent articles. If you are asking the dog to make a choice between two objects or stimuli, you have to tell him when he's right; putting him on "twofers" just punishes correct answers.

I was very happy with the super performance of my three pullers at APDT, and the demonstration of how oscillating between continuous and intermittent reinforcement allows you to raise criteria extremely fast—until a trainer complained afterwards that I had used the wrong dogs. "It would have been a better demonstration," she said, "if they hadn't already been so well trained."

Happy Clicking!

Sunshine Books, Inc.
49 River St., Suite 3
Waltham, MA 02453
1-800-47-CLICK(2-5425)
