How to Build a Speech Recognition Circuit

In the near future, speech recognition will become the method of choice for controlling appliances, toys, tools, computers and robotics. There is a huge commercial market waiting for this technology to mature.
This article details the construction of a stand-alone, trainable speech recognition circuit that may be interfaced to control just about anything electrical, such as appliances, robots, test instruments, VCRs, TVs, etc. The circuit is trained (programmed) to recognize the words you want it to recognize.
Controlling an appliance (computer, VCR, TV, security system, etc.) by speaking to it makes the device easier to use, while increasing the efficiency and effectiveness of working with that device.
At its most basic level, speech recognition allows the user to perform parallel tasks (i.e. while hands and eyes are busy elsewhere) while continuing to work with the computer or appliance.
This circuit allows one to experiment with many facets of speech recognition technology.
The heart of the circuit is the HM2007 speech recognition integrated circuit. The chip can recognize either forty 0.96-second words or twenty 1.92-second words, and this circuit lets the user choose between the two: the 0.96-second word length (40-word vocabulary) or the 1.92-second word length (20-word vocabulary). For memory the circuit uses an 8K x 8 static RAM.
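The two vocabulary settings trade word count against word length. The short Python sketch below just works through that arithmetic. Note that the even split of the 8K x 8 RAM across word spaces is an assumption made here for illustration; the article does not say how the chip actually lays out its memory.

```python
# Vocabulary options described above: (number of words, seconds per word).
MODES = {"short": (40, 0.96), "long": (20, 1.92)}

RAM_BYTES = 8 * 1024  # 8K x 8 static RAM


def total_speech_seconds(mode):
    """Total speech time the vocabulary covers in the given mode."""
    words, seconds = MODES[mode]
    return words * seconds


def bytes_per_word(mode):
    """RAM per word space, ASSUMING an even split (not stated in the article)."""
    words, _ = MODES[mode]
    return RAM_BYTES / words


if __name__ == "__main__":
    for name in MODES:
        print(name, round(total_speech_seconds(name), 2), bytes_per_word(name))
```

Both modes cover the same total amount of speech, so choosing the longer word length halves the vocabulary.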
The chip has two operational modes: manual mode and CPU mode. The CPU mode is designed to allow the chip to work under a host computer. This is an attractive approach to speech recognition for computers because the speech recognition chip operates as a co-processor to the main CPU. The jobs of listening and recognition don't occupy any of the computer's CPU time. When the HM2007 recognizes a command, it can signal an interrupt to the host CPU and then relay the command code. HM2007 chips can be cascaded to provide a larger word recognition library.
The SR-07 circuit we are building operates in manual mode. Manual mode allows one to build a stand-alone speech recognition board that doesn't require a host computer and may be integrated into other devices to add speech control.
Applications
Command and control of appliances and equipment
Telephone assistance systems
Data entry
Speech controlled toys
Speech and voice recognition security systems
Software Approach
Most speech recognition systems available today are programs that run on personal computers. These add-on programs operate continuously in the background of the computer's operating system (Windows, OS/2, etc.) and require the computer to be equipped with a compatible sound card. The disadvantage of this approach is the necessity of a computer. While these speech programs are impressive, it is not economically viable for manufacturers to add full-blown computer systems to control a washing machine or VCR. Even on a PC, the programs add to the processing required of the computer's CPU: there is a noticeable slowdown in the operation and function of the computer when voice recognition is enabled.
Learning to Listen
We take our ability to listen for granted. For instance, we are capable of listening to one person speak among several at a party. We subconsciously filter out the extraneous conversations and sounds. This filtering ability is beyond the capabilities of today's speech recognition systems.
Speech recognition is not speech understanding. Understanding the meaning of words is a higher intellectual function. That a computer can respond to a vocal command does not mean it understands the command spoken. Voice recognition systems will one day have the ability to distinguish linguistic nuances and the meaning of words, to "Do what I mean, not what I say!"
Speaker Dependent / Speaker Independent
Speech recognition is classified into two categories: speaker dependent and speaker independent.
Speaker dependent systems are trained by the individual who will be using the system. These systems are capable of achieving a high command count and better than 95% accuracy for word recognition. The drawback to this approach is that the system responds accurately only to the individual who trained it. This is the most common approach employed in software for personal computers.
Speaker independent systems are trained to respond to a word regardless of who speaks. The system must therefore respond to a large variety of speech patterns, inflections and enunciations of the target word. The command word count is usually lower than for speaker dependent systems; however, high accuracy can still be maintained within processing limits. Industrial applications more often need speaker independent voice systems, such as the AT&T system used in the telephone systems.
Recognition Style
Speech recognition systems have another constraint concerning the style of speech they can recognize. There are three styles of speech: isolated, connected and continuous.
Isolated speech recognition systems can only handle words that are spoken separately. These are the most common speech recognition systems available today. The user must pause between each word or command spoken. The speech recognition circuit is set up to identify isolated words 0.96 seconds in length.
Connected is a halfway point between isolated word and continuous speech recognition. It allows users to speak multiple words. The HM2007 can be set up to identify words or phrases 1.92 seconds in length. This reduces the word recognition vocabulary to 20.
Continuous is the natural conversational speech we are used to in everyday life. It is extremely difficult for a recognizer to sift through the speech, as the words tend to merge together. For instance, "Hi, how are you doing?" sounds like "Hi, howyadoin". Continuous speech recognition systems are on the market and are under continual development.
Speech Recognition Circuit
The demonstration circuit operates in the HM2007's manual mode. This mode uses a simple keypad and digital display to communicate with and program the HM2007 chip.

Figure 1
Keypad: The keypad is made up of 12 switches.

   Figure 2
When the circuit is turned on, the HM2007 checks the static RAM. If everything checks out, the board displays "00" on the digital display and lights the red LED (READY). The circuit is now in the "Ready" state, waiting for a command.
To Train:
To train the circuit, begin by pressing the number of the word you want to train on the keypad. The circuit can be trained to recognize up to 40 words, so use any number from 1 to 40. For example, press "1" to train word number 1. When you press the number(s) on the keypad, the red LED turns off and the number is shown on the digital display. Next press the "#" key for train. Pressing "#" signals the chip to listen for a training word, and the red LED turns back on. Now speak the word you want the circuit to recognize clearly into the microphone. The LED should blink off momentarily; this signals that the word has been accepted.
Continue training new words using the procedure outlined above. Press the "2" key then the "#" key to train the second word, and so on. The circuit will accept up to forty words, but you do not have to enter all 40 into memory to use the circuit. Use as many or as few word spaces as you want.
Testing Recognition:
The circuit is continually listening. Repeat a trained word into the microphone. The number of the word should be displayed on the digital display. For instance, if the word "directory" was trained as word number 25, saying "directory" into the microphone will cause the number 25 to be displayed.
Error Codes:
The chip provides the following error codes:
55 = word too long
66 = word too short
77 = word no match
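Any circuit or program that reads the two-digit display value has to tell these error codes apart from valid word numbers. The following Python sketch shows one way to classify a reading. The function name `decode_display` and the returned labels are hypothetical, chosen for illustration only; the word range assumes the 40-word setting.

```python
# Error codes listed above, plus the "00" shown when the board is ready.
ERROR_CODES = {55: "word too long", 66: "word too short", 77: "word no match"}
MAX_WORDS = 40  # assumes the 0.96-second, 40-word setting


def decode_display(code):
    """Classify a two-digit display value (hypothetical helper).

    Returns a (kind, detail) pair where kind is one of
    "error", "word", "ready" or "unknown".
    """
    if code in ERROR_CODES:
        return ("error", ERROR_CODES[code])
    if 1 <= code <= MAX_WORDS:
        return ("word", code)
    if code == 0:
        return ("ready", None)
    return ("unknown", code)


if __name__ == "__main__":
    for value in (0, 25, 55, 66, 77):
        print(value, decode_display(value))
```

Checking for error codes first keeps the logic correct even if the valid word range were ever widened.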
Training the HM2007 IC
Clearing the memory:
To erase all the words in the RAM memory (training), press "99" on the keypad, then press the "*" key. The display will scroll through the numbers 1-40 quickly, clearing out the memory.
To erase a single word space, press the number of the word you want to clear, then press the "*" key.
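The keypad procedures for training and clearing are short key sequences, which can be written down compactly. The sketch below is only a notation aid in Python: the helper names are hypothetical, and it does not drive real hardware.

```python
MAX_WORDS = 40  # assumes the 40-word setting


def _digits(word_number):
    """Keypad digits for a word number, e.g. 25 -> ["2", "5"]."""
    if not 1 <= word_number <= MAX_WORDS:
        raise ValueError("word number must be between 1 and %d" % MAX_WORDS)
    return list(str(word_number))


def train_keys(word_number):
    """Keys to press before speaking the training word: number, then '#'."""
    return _digits(word_number) + ["#"]


def clear_word_keys(word_number):
    """Keys to erase a single word space: number, then '*'."""
    return _digits(word_number) + ["*"]


def clear_all_keys():
    """Keys to erase the whole memory: '99', then '*'."""
    return ["9", "9", "*"]


if __name__ == "__main__":
    print(train_keys(25))
    print(clear_word_keys(1))
    print(clear_all_keys())
```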
Circuit Construction:
The schematic is shown in Figure 1. Three PC boards are available for this project; see the parts list. The components are mounted on the top side of the board; see Figure 3. Begin construction by soldering the IC sockets onto the PC boards. Next mount and solder all the resistors. Now mount and solder the 3.57 MHz crystal and the red LED. The long lead of the LED is positive. Next solder the capacitors and the 7805 voltage regulator. Solder the seven-position headers that connect the keypad to the main circuit board, as shown in Figures 2 and 3. Next solder the 10-position headers on the display board and the main circuit board.

Figure 3
Independent Recognition System
This demo circuit allows you to experiment with speaker dependent as well as speaker independent systems. The system is typically trained as speaker dependent, meaning the person who trained the circuit is also the one who uses it.
To train the system for speaker independent (multi-user) recognition, use the following technique. We will use four word spaces for each target word. Let's arrange the words so that they can be recognized by decoding just the least significant digit on the digital display.
To accomplish this, word spaces 01, 11, 21 and 31 are allocated to the first target word. By decoding only the least significant digit, in this case the "1" of "X1" (where X is any digit from 0 to 3), we can recognize the target word.
We do the same for the remaining word spaces. For instance, the second target word will use word spaces 02, 12, 22 and 32. We continue in this manner until all the words are programmed.
If possible, have a different person speak the word for each word space. This will enable the system to recognize different voices, inflections and enunciations of the target word. The more system resources that are allocated to independent recognition, the more robust the circuit will become.
There are certain caveats to be aware of. First, you are trading vocabulary size for speaker independence: with four word spaces per target word, the effective vocabulary drops from forty words to ten.
Second, the decoding circuit that recognizes the word number and performs a function must be designed to recognize error codes 55, 66 and 77 and not confuse them with word spaces 5, 6 and 7. Our interface circuit does this.
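The least-significant-digit scheme and the error-code caveat can be sketched in a few lines of Python. This is an illustrative software model, not the article's interface circuit. The function names are hypothetical, and treating word spaces 10, 20, 30 and 40 as a tenth target (digit 0) is an assumption made here to reach the ten-word count.

```python
ERROR_CODES = {55, 66, 77}  # too long, too short, no match
MAX_WORDS = 40


def slots_for_target(target_digit):
    """Word spaces sharing one least significant digit, e.g. 1 -> [1, 11, 21, 31]."""
    if not 0 <= target_digit <= 9:
        raise ValueError("target digit must be 0-9")
    return [s for s in range(1, MAX_WORDS + 1) if s % 10 == target_digit]


def decode_target(code):
    """Map a display value to its target digit, or None for errors/invalid values.

    Error codes are checked first so that 55, 66 and 77 are never
    mistaken for targets 5, 6 and 7.
    """
    if code in ERROR_CODES:
        return None
    if not 1 <= code <= MAX_WORDS:
        return None
    return code % 10


if __name__ == "__main__":
    print(slots_for_target(1))
    print(decode_target(31), decode_target(5), decode_target(55))
```

Each target word would be trained once in every word space returned by `slots_for_target`, ideally by a different speaker each time.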
Voice Security System
The HM2007 wasn't designed for use in a voice security system, but this doesn't prevent you from experimenting with it for that purpose. You may want to use three or four keywords that must be spoken and recognized in sequence in order to activate a circuit that opens a lock or allows entry.
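The keyword-sequence idea can be modeled as a small state machine that advances on each correct word and resets on anything else. The Python sketch below is one possible design for experimentation, not a tested security mechanism. The class name `VoiceLock` and the choice to reset on error codes are assumptions made here.

```python
ERROR_CODES = {55, 66, 77}  # too long, too short, no match


class VoiceLock:
    """Unlocks only after the keyword numbers are recognized in order."""

    def __init__(self, keyword_numbers):
        self.keywords = list(keyword_numbers)
        if not self.keywords:
            raise ValueError("at least one keyword is required")
        self.position = 0  # index of the next expected keyword

    def feed(self, code):
        """Feed one recognized display value; return True when the sequence completes."""
        if code in ERROR_CODES or code != self.keywords[self.position]:
            # Any error or wrong word restarts the sequence from the beginning.
            self.position = 0
            return False
        self.position += 1
        if self.position == len(self.keywords):
            self.position = 0  # re-arm for the next attempt
            return True
        return False


if __name__ == "__main__":
    lock = VoiceLock([3, 17, 8])  # hypothetical word numbers
    print([lock.feed(c) for c in (3, 17, 8)])
```

Resetting on every mismatch is the strictest choice; a more forgiving design might allow one retry before starting over.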
CPU Mode
The HM2007 speech recognition chip is made to be connected to a host computer system. Physically connecting the chip to the IBM PC bus, parallel port or serial port isn't a problem. However, the circuit will require driver software for controlling training, storage and recognition. The programming will present more of a challenge than the circuit.
For anyone wanting to interface the HM2007 to a PC bus, I recommend purchasing the additional HM2007 data sheet.
A kit is available from us. See the kit in the online catalog.