This first multiprocessing attempt was a difficult one for Fidelity.
Looking at the Printed Circuit Boards (PCBs), it is not easy to understand the strategy used by the engineers and the programmers.
The letter from the programmers, signed by Dan and Kathe Spracklen, is useful and allows us to follow their reasoning. Technically, it makes sense and justifies the hardware design choices for a master and a slave microprocessor.
The result is a performance increase: speed is boosted by a factor of 1.5 to 1.6, but that does not mean the program played better. In fact, it is difficult to gauge the improvement brought by multiprocessing. Simply put, the question is why the EAG-V5 did not win more competitions.
The following document shows how the Spracklens conceptualized their approach to multiprocessing in the EAG-V5 and later in the EAG-V8.
A letter from Dan and Kathe Spracklen to the Elite Avant-Garde Multi-Processor Owner
Congratulations on the purchase of your new Elite Avant-Garde Multi-Processor unit. It is our hope that you will thoroughly enjoy your new chess computer, which is one of the finest and strongest available. Because the multi-processor
concept is new, we have included the following technical
information regarding your unit.
NOTES ON MULTI-PROCESSING
If you play with Option E3 (described in the Instruction Manual Addendum Sheet), giving the same positions to the machine in both Single Processor Mode and Multi-Processor Mode, you may notice that the multi-processor is about 1.7 times as fast as the single processor. Perhaps you were wondering why it isn't twice as fast, since two processors are dividing up the work. Here are a few notes on how the multi-processing in your new Elite works.
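As a rough check on the figure above, the observed 1.7x speed-up can be converted into a parallel-efficiency number. This is a back-of-the-envelope sketch; the variable names are mine, not Fidelity's:

```python
# Back-of-the-envelope: how efficient is a 1.7x speed-up on 2 processors?
processors = 2
speedup = 1.7                      # observed: multi vs. single processor

efficiency = speedup / processors  # fraction of the ideal speed-up achieved
overhead = 1 - efficiency          # share lost to the overheads discussed below

print(f"efficiency: {efficiency:.0%}")  # prints "efficiency: 85%"
print(f"overhead:   {overhead:.0%}")    # prints "overhead:   15%"
```

In other words, about 15% of the second processor's time is absorbed by the four kinds of overhead the letter goes on to describe.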
TERMINOLOGY
Let's start with a little terminology. The processor that reads the keys and lights the LEDs on your Multi is called the master. The processor that helps out with the work of deciding on a move is called the slave. The process of deciding on a move is called a search. The job of the master is to divide up the search, sharing the work load with the slave and deciding on the best move. In most positions, there are many possible moves and the search must decide which is best.
DIVIDING UP THE MOVE LIST
A simple and obvious way of dividing up the search would be to give the master and the slave each a move to work on. As soon as one is finished, it could be given the next move on the list and so on, until all the moves have been examined. This method is feasible and gives typical speed-ups of about 1.4 times as fast as one processor working alone. Why so slow? The reason lies in the nature of the Alpha-Beta Search itself. Without going into a lengthy explanation of this process, suffice it to say that the first move examined normally takes far longer than any of the other moves. The reason for this is that the first move establishes a score which can be used in the examination of the other moves. When the move list is divided up, with both the master and the slave processing a different move, neither processor has an established score to work with and both are working on a long, difficult job. This means a lot of wasted effort and an overall result that is less than impressive. Nor would it help for the slave processor to sit idly by and wait for the master to finish the first move, which gives about the same results.
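The reason the first move dominates can be seen in a minimal alpha-beta sketch (illustrative Python, not the Spracklens' 68000 code): the score returned by the first child tightens the alpha bound, which lets every later move be cut off sooner.

```python
# Minimal alpha-beta on a toy game tree (nested lists = nodes, ints = leaf scores).
# The score from the FIRST child raises alpha, so later children are searched
# with a tighter window and cut off sooner -- which is why splitting the move
# list naively between two processors, neither holding that score, wastes work.
def alpha_beta(node, alpha=float("-inf"), beta=float("inf"), maximizing=True):
    if isinstance(node, int):          # leaf: static evaluation
        return node
    best = float("-inf") if maximizing else float("inf")
    for child in node:
        score = alpha_beta(child, alpha, beta, not maximizing)
        if maximizing:
            best = max(best, score)
            alpha = max(alpha, best)   # the first child establishes this bound
        else:
            best = min(best, score)
            beta = min(beta, best)
        if beta <= alpha:              # cutoff: remaining children are skipped
            break
    return best

tree = [[3, 5], [2, 9], [0, 1]]        # depth-2 toy tree
print(alpha_beta(tree))                # prints 3
```

With the score 3 from the first branch in hand, the second and third branches are abandoned after a single leaf each; without it, both would have to be searched in full.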
PVS TREE SPLITTING
A far superior method of dividing up the search was presented by Professor Tony Marsland of the University of Alberta. His system, the one which we use, involves sharing the work of the long, difficult first move and then dividing up the rest of the moves. In this system, the master gives the slave assignments to search parts of the tree needed to complete the examination of the first move. The result is the impressive 1.7 speed-up that you see in your Multi.
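Marsland's scheme can be caricatured in a few lines. This is a sketch under my own drastic simplifications (a dummy `search` function, a fixed move list), not the EAG firmware: the point is only the two-phase shape, where all workers first finish move one, and only then is the remaining move list divided, so every later move is searched with the score move one established.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy sketch of PV-splitting (after Marsland), NOT the actual EAG firmware.
# Phase 1: the first ("principal variation") move is finished first.
# Phase 2: only the REMAINING moves are divided between two workers, each
# searching with the score the first move established as its lower bound.
SCORES = {"e4": 30, "d4": 28, "Nf3": 25, "c4": 24, "g3": 10}
moves = list(SCORES)

def search(move, alpha):
    # stand-in for a real alpha-beta search of `move` with lower bound `alpha`
    return max(alpha, SCORES[move])

best_score = search(moves[0], float("-inf"))     # phase 1: first move done first

with ThreadPoolExecutor(max_workers=2) as pool:  # phase 2: split the rest
    rest = list(pool.map(lambda m: search(m, best_score), moves[1:]))

print(max([best_score, *rest]))                  # prints 30
```

In the real machine, phase 1 is itself shared (the master hands the slave subtrees of the first move), which is what lifts the speed-up from roughly 1.4 to 1.7.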
WHERE THE REMAINING TIME IS LOST
There are four types of overhead in a multi-processor system. Each contributes to frustrate the attempt to reach the ideal speed-up of 2.0 times faster.

Communication Overhead
This is the time lost when the two processors must talk to each other and send messages back and forth to one another. One processor must compose messages telling the other about the direction or results of a search. The other processor must read and act on the messages.

Synchronization Overhead
To understand synchronization overhead, we must think about what happens when the search is nearly finished. Let's say the master is thinking about the last move, and the slave is thinking about the next-to-the-last move. One will finish before the other and will have nothing to do. The time one processor must sit idly by and wait for the other to finish a task is called Synchronization Overhead.

Divided Search Overhead
Think back to our discussion of dividing up the search using the you-take-one, I'll-take-one approach. We said that the reason this method didn't work out so well was that the score established by examining the first move helps the rest of the moves go by faster. In fact, the better the score,
the faster the rest of the moves go by. If, for example, our
first move wins the enemy's Queen, we won't pay much
attention to the rest of the moves that win less material or
win nothing at all. If, on the other hand, our first move
can only get us a little better position for some piece, we
will have to look longer at the other possible moves. Usually, thanks to sorting, the
first move on the list will be the best move. Sometimes it
happens that a move further down on the list will turn out
to be better. When that happens, the score established by
the first move is not as helpful to the other moves as the
score established by the new move. The processor that found
the better move will begin at once to use the new score. The
other processor is still stuck working with the old score
until it finishes the move it is currently examining. The
loss of efficiency due to the inability to promptly share
score information is called Divided Search Overhead.

An interesting side effect of Divided Search Overhead shows up in test results using composed chess problems. Usually, the idea behind using chess problems to compare the performance of different chess computers is that there is one clear-cut winning move, which is difficult to find. Testers can measure how long it takes a particular computer to spot the winning move, and that will often be a fair measure of how fast and powerful the chess computer is. By their very nature, however, chess problems involve positions where the best move is almost never the first move examined. Thus, they are positions which are artificially high in Divided Search Overhead. For this reason, when researchers want to test the performance of their tree-splitting programs, they are better advised to use positions taken at random from master games than to use these chess problems, since the chess problems will cause the multi-processor to appear worse than its true performance.

Split Hash Overhead
A hash table is a large area of memory into which a processor can store information it has gained during the search. If the same position turns up again, the processor can look up the result in the hash table instead of computing it all over again. Hash tables are particularly effective in the endgame, where they can make a search run many times as fast as a program without them. This is because repeated positions show up so often in the endgame. Split Hash Overhead comes about because each processor has its own private hash table. The master can't see into the slave's hash table, and the slave can't see into the master's hash table. If one gets a position that the
other has already examined, it is stuck computing the result
all over again. In the opening, the middlegame, and even the
early endgame, this is not so serious. But when the number
of pieces on the board drops way down, Split Hash Overhead
can cripple the effectiveness of the multi-processor system.
Occasionally, endgame positions having only Kings and Pawns
may even take longer to run than they do on a single
processor. It was the debilitating effect of Split Hash Overhead that put us into a scramble for a way around it. We held off the Sales Department with one arm, while we searched for and found a way around the problem: if splitting the hash tables was the problem, we reasoned, why not look into the idea of Shared Hash Tables? In this implementation, both processors would share the same hash table (the master's would be used, since it is larger). The drawback here is that the Communication Overhead would increase due to the necessity of passing hash table information back and forth. In the hardware configuration originally designed for the Multi, this approach would not be feasible, because the processors had no way to "tap one another on the shoulder" to say, "Give me some hash information" or "Here is some hash information for you". Each processor would waste so much time waiting for the other to get around to reading its request that the resulting Communication Overhead would be worse than the Split Hash Overhead. This approach would be feasible
if the hardware design were changed to allow the slave
processor to interrupt the master to retrieve or store hash
table information.
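The hash table the letter describes is what the chess-engine literature calls a transposition table. A minimal sketch (illustrative Python, with a dummy recursion standing in for a position search) shows the lookup-before-compute pattern, and why a table private to one processor cannot help the other:

```python
# Minimal transposition-table sketch (illustrative, not the EAG firmware).
# A "position" here is just a number, and "searching" it is a dummy recursion;
# the point is the lookup-before-compute pattern the letter describes.
table = {}

def search(position):
    if position in table:          # seen this position before: reuse the result
        return table[position]
    if position < 2:               # trivial "endgame" position
        result = position
    else:                          # dummy recursion standing in for a search
        result = search(position - 1) + search(position - 2)
    table[position] = result       # store the result for next time
    return result

search(30)
print(len(table))                  # prints 31: one shared table serves everything
# With SPLIT tables -- one per processor -- each processor would rebuild these
# same entries on its own, which is exactly the Split Hash Overhead above.
```

The endgame effect follows directly: the more often the same positions recur, the more a shared table saves, and the more a split table throws away.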
We fought for more time with the Marketing Department, and teamed up with the Engineering Department. Yes, with an ingeniously simple modification to the design, we could have interrupts! The rest, as they say, is history.
Instead of slowing down in the endgame, your Multi now
boasts speed-ups of 1.5 to 1.6!
Now you have an idea of what the software and hardware engineers at Fidelity Electronics have been doing to increase your enjoyment of computer chess. We hope you will enjoy the
exciting advances in multi-processing as much as we have
enjoyed bringing them to you.

Sincerely,
Dan and Kathe Spracklen
Fidelity Electronics International, Inc.

A Letter from the Programmers

Elite Avant-Garde Multi-Processor Version 5
- 2265 Series, model 6114-5
- two Motorola 68000 microprocessors (MC68HC60P12F, 16 bits)
- each microprocessor mounted on a separate PCB
- 16 MHz
- RAM (Master): 128KB, expandable to 256KB across four banks of DRAM
- ROM (Master): 2 x 64KB (128KB) + 4 x 8KB (32KB)
- ROM (Slave): 2 x 32KB (64KB)
- RAM (Slave): 2 x 8KB (16KB); 2 empty RAM sockets for expansion to 32KB
- 48.3 x 45.7 x 6.2 cm
- 9V adapter, 1100 mA
Global view of the Hardware
Two Printed Circuit Boards and two Microprocessors
The two PCBs are distinct.
They communicate through connectors.
Master Printed Circuit Board (PCB)
1) Master Microprocessor Motorola 68000 (MC68HC60P12F, 16 bits, 16MHz)
2) Master DRAM: 2 x 64KB (128KB)
Only two of the four banks are occupied; two are free.
3) Master ROM
2 ROM chips with white labels: 2 x 64KB (128KB)
Underneath: 4 RAM chips of 8KB each (32KB)
Slave Printed Circuit Board (PCB)
1) Slave Microprocessor Motorola 68000 (MC68HC60P12F, 16 bits, 16MHz)
2) Slave ROM: 2x32KB (64KB)
3) Slave RAM: 2 x 8KB (16KB),
plus 2 empty sockets for expansion to 32KB
Many thanks to Jose Guttierez (elpeon.com) for his help with his own EAG-V5.
Also, my gratitude to Steve Braid for his technical help
in reviewing this article.