Hardware Diagnostics and Power On Self Tests

Source: Baidu Wenku. Editor: Shenma Wenxue Wang. Time: 2024/05/05 05:54:12

Hardware Diagnostics

Most embedded systems run hardware diagnostics to check the health of the hardware. Diagnostics are also used to confirm a fault that might have been detected during normal operation. This article covers the different types of diagnostic tests that are run in an operational embedded system. These tests are summarized below:

  • Power On Self Tests (POST)
    • CPU and Register Test
    • Interrupt and Exception Test
    • EPROM Checksum Test
    • RAM March Test
    • DMA Controller Test
    • Device Tests
    • Loop Back Test
  • Out of Service Tests
    • Interface Tests
    • Echo Back Test
  • In-Service Monitoring
    • Transient Error Monitoring
    • Link Monitoring

Power On Self Tests (POST)

As the name suggests, Power On Self Tests (POST) are run just after a board powers up. These tests run diagnostics on the hardware components on the board. Typically, the code for these tests resides in the EPROM that boots the card. When the board boots from the EPROM, these tests are triggered automatically.

The main limitation of these tests is that they can only test the internal functioning of the card. The external interface logic of the card is not tested by the Power On Self Tests.

CPU and Register Test

The CPU test is one of the first tests in POST. It checks the internal working of the CPU by executing processor instructions and then verifying the output of each instruction. All the processor registers are also exercised in this test. For example, as part of this test, data contained in a register might be shifted by one bit and the result of the shift operation compared with a pre-computed value.
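The shift-and-compare idea can be sketched in C as follows. This is only an illustration: a real CPU test is normally written in assembly so that specific registers and instructions are exercised, and the function name `cpu_shift_test` and the 8-bit pattern table are assumptions made for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Pre-computed values expected after each one-bit left shift of 0x01. */
static const uint8_t expected_patterns[8] = {
    0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80
};

/* Hypothetical test routine: shifts a single 1 bit through an 8-bit value
 * and compares every intermediate result with the pre-computed table.
 * `volatile` asks the compiler not to fold the shifts away at build time. */
bool cpu_shift_test(void)
{
    volatile uint8_t reg = 0x01;

    for (unsigned i = 0; i < 8; i++) {
        if (reg != expected_patterns[i]) {
            return false;              /* shift produced a wrong value */
        }
        reg = (uint8_t)(reg << 1);     /* shift by one bit */
    }
    return reg == 0x00;                /* the bit must fall off the end */
}
```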

Interrupt and Exception Test

This test checks the interrupt and exception processing of the processor. The test is run by creating interrupt and exception conditions and then looping until the expected interrupt is recognized. For example, a timer interrupt might be enabled and the test then checks a flag that is set by the interrupt handler. Exception tests are carried out by deliberately creating exception conditions like "divide by zero" and then verifying that control has been transferred to the appropriate handler.
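A minimal sketch of the flag-polling approach is given below. The names `timer_isr`, `interrupt_test` and the `enable_timer_irq` hook are hypothetical, and the bounded poll count is an addition made here so that a dead interrupt line fails the test instead of hanging it. The two `fake_` functions are host-side stand-ins, not board code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag set by the (hypothetical) timer interrupt handler. */
static volatile bool timer_irq_seen = false;

/* Hypothetical handler; on a real board it would be installed in the
 * processor's interrupt vector table. */
void timer_isr(void)
{
    timer_irq_seen = true;
}

/* enable_timer_irq is a hypothetical board hook that arms the timer
 * interrupt. The loop polls the flag at most max_polls times. */
bool interrupt_test(void (*enable_timer_irq)(void), uint32_t max_polls)
{
    timer_irq_seen = false;
    enable_timer_irq();

    for (uint32_t i = 0; i < max_polls; i++) {
        if (timer_irq_seen) {
            return true;    /* handler ran, interrupt path works */
        }
    }
    return false;           /* interrupt never arrived */
}

/* Host-side stand-ins: one "fires" the interrupt at once, one never does. */
static void fake_enable_fires(void) { timer_isr(); }
static void fake_enable_dead(void)  { }
```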

EPROM Checksum Test

When an EPROM is programmed, the last two bytes in the EPROM are deliberately initialized to zero. Once the EPROM programmer has computed the checksum, the computed checksum is fused into the last two bytes.

This test calculates the checksum for the EPROM by computing a 16-bit Exclusive OR (XOR) of the EPROM contents, excluding the last two bytes. The calculated checksum is then compared with the checksum that was fused into the last two bytes. The test passes if the computed and the fused checksums match.
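The checksum comparison might look like the sketch below. The function names are illustrative, and the big-endian word order is an assumption; the text does not say how the 16-bit words are laid out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Read one 16-bit word; big-endian byte order is assumed here. */
static uint16_t read_word(const uint8_t *p)
{
    return (uint16_t)(((uint16_t)p[0] << 8) | p[1]);
}

/* XOR every 16-bit word of the image except the final word,
 * which holds the stored checksum. */
uint16_t eprom_xor16(const uint8_t *image, size_t size)
{
    uint16_t sum = 0;

    for (size_t i = 0; i + 2 < size; i += 2) {   /* stops before last word */
        sum ^= read_word(&image[i]);
    }
    return sum;
}

/* Compare the computed checksum with the one stored in the last two bytes. */
bool eprom_checksum_ok(const uint8_t *image, size_t size)
{
    if (size < 2 || (size % 2) != 0) {
        return false;                            /* not a whole number of words */
    }
    return eprom_xor16(image, size) == read_word(&image[size - 2]);
}
```

For a six-byte image holding the words 0x1234 and 0xABCD, the stored checksum would have to be 0xB9F9 for the test to pass.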

RAM March Test

The RAM March Test is run to test the integrity of the read-write memory on the board. The test focuses on catching three types of problems with memory:

  • Address Line Faults: The address lines on the board or inside the memory chip might be shorted to each other, or they might be stuck at 0 or 1. In either case, when memory is written, multiple locations or a wrong location might get written. A read might result in data corruption when two different locations in the memory output data on the data bus.
  • Data Line Faults: The data lines on the board or inside the memory chip might be shorted to each other, or they might be stuck at 0 or 1. This condition will result in wrong data being written to or read from the memory.
  • Data Loss: Data written to a particular location might be fine when read just after writing, but it is lost a little while later. Here the address and data lines are fine but the memory cells get corrupted over time.

Memory testing techniques can get fairly complicated, and the actual algorithm used also depends on the layout of the memory banks. We will cover a simple test that does a reasonably good job of detecting the fault scenarios mentioned above. The RAM March test is carried out by executing the following steps:

  1. Initializing: Write a 0 in all memory locations on the board.
  2. Marching Ones: Repeat the following steps starting from the lowest address until the highest address is reached:
    • Check that the content of the memory location is zero
    • Write a 1 in the bit 0 position
    • Read the memory location to confirm that the bit has been written successfully.
    • Repeat the above steps until a 1 has been written in all bits of that location
  3. Marching Zeros: Repeat the following steps starting from the highest address until the lowest address is reached:
    • Check that the content of the memory location is 0xFF (i.e. all bits are still set to one after the ones march)
    • Write a 0 in the bit 0 position
    • Read the memory location to confirm that the bit has been written successfully.
    • Repeat the above steps until a 0 has been written in all bits of that location
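The three steps can be sketched in C as below, assuming byte-wide memory. The function name `ram_march_test` is illustrative. The test overwrites the region it checks, so on a real board it would have to run before that memory holds the stack or any live data.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Byte-wide march test over mem[0..len-1]. Destroys the contents. */
bool ram_march_test(volatile uint8_t *mem, size_t len)
{
    /* Step 1: initialize every location to zero. */
    for (size_t i = 0; i < len; i++) {
        mem[i] = 0x00;
    }

    /* Step 2: marching ones, lowest address to highest. */
    for (size_t i = 0; i < len; i++) {
        if (mem[i] != 0x00) {
            return false;                   /* disturbed by an earlier write */
        }
        uint8_t pattern = 0x00;
        for (unsigned bit = 0; bit < 8; bit++) {
            pattern |= (uint8_t)(1u << bit);
            mem[i] = pattern;               /* set one more bit */
            if (mem[i] != pattern) {
                return false;               /* read back does not match */
            }
        }
    }

    /* Step 3: marching zeros, highest address to lowest. */
    for (size_t i = len; i-- > 0; ) {
        if (mem[i] != 0xFF) {
            return false;                   /* lost data since the ones march */
        }
        uint8_t pattern = 0xFF;
        for (unsigned bit = 0; bit < 8; bit++) {
            pattern &= (uint8_t)~(1u << bit);
            mem[i] = pattern;               /* clear one more bit */
            if (mem[i] != pattern) {
                return false;
            }
        }
    }
    return true;
}
```

Checking each location for its expected value before modifying it is what exposes address line faults: a write that lands on the wrong location leaves a non-zero (or non-0xFF) value to be found later in the march.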

DMA Controller Test

Direct Memory Access controllers (DMA controllers) are present on almost all boards. DMA operations transfer data to and from peripheral devices without involving the processor. DMA operation on the board can be checked simply by initiating a DMA transfer and then verifying that the source and destination memory areas match after the transfer has completed.
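A sketch of the transfer-and-compare check follows. The `dma_copy_fn` hook is hypothetical and stands for whatever driver call starts a memory-to-memory transfer and waits for completion on a given board. The `fake_` and `broken_` functions are host-side stand-ins used only to exercise the check.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical driver hook: performs the transfer and returns true once
 * the controller reports completion. */
typedef bool (*dma_copy_fn)(void *dst, const void *src, size_t len);

bool dma_test(dma_copy_fn dma_copy, uint8_t *src, uint8_t *dst, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        src[i] = (uint8_t)(i * 7u + 3u);   /* non-constant test pattern */
        dst[i] = (uint8_t)~src[i];         /* make sure dst differs first */
    }

    if (!dma_copy(dst, src, len)) {
        return false;                      /* controller reported failure */
    }
    return memcmp(src, dst, len) == 0;     /* areas must match afterwards */
}

/* Host-side stand-ins: a working transfer and one that moves nothing. */
static bool fake_dma_copy(void *dst, const void *src, size_t len)
{
    memcpy(dst, src, len);
    return true;
}

static bool broken_dma_copy(void *dst, const void *src, size_t len)
{
    (void)dst; (void)src; (void)len;
    return true;                           /* claims success, copies nothing */
}
```

Filling the destination with the inverted pattern beforehand keeps a stale buffer from passing the comparison by accident.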

Device Tests

Peripheral devices used on a board need to be tested during the self tests. These tests are very specific to the device being tested. Many vendors implement special support for device tests by providing a test mode of operation. The device is programmed into the test mode to perform these tests. When a device does not support a test mode of operation, board designers provide extra functionality on the board to test the peripheral device.

Loop Back Test

Loop Back tests are performed by connecting the transmitter on the device to the receiver on the same device. This is achieved by programming the device into loop back mode. Once the device has been programmed, the test transmits data and waits until the receiver receives the data after loop back. The main advantage of this test is that it can be carried out independently on the board under test. However, the loop back test often does not exercise the transmit and interface data paths, because the loop back is performed within the chip. The Echo Back Test, covered later in this article, addresses this problem.
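The transmit-and-wait sequence could be sketched as follows. The `serial_dev` interface and all function names are hypothetical; a real driver would wrap the chip's registers. The bounded polling and the `fake_` device are additions for illustration, so the sketch can run on a host.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical device interface. */
typedef struct {
    void (*set_loopback)(bool on);
    void (*send)(uint8_t byte);
    bool (*try_receive)(uint8_t *byte);   /* true when a byte was available */
} serial_dev;

bool loopback_test(const serial_dev *dev, const uint8_t *data, size_t len,
                   uint32_t max_polls)
{
    bool ok = true;

    dev->set_loopback(true);
    for (size_t i = 0; i < len && ok; i++) {
        uint8_t rx = 0;
        bool got = false;

        dev->send(data[i]);
        for (uint32_t p = 0; p < max_polls && !got; p++) {
            got = dev->try_receive(&rx);  /* wait for the looped-back byte */
        }
        ok = got && (rx == data[i]);
    }
    dev->set_loopback(false);             /* leave the device in normal mode */
    return ok;
}

/* Host-side stand-in device with a one-byte loop back path. */
static bool fake_loop_on;
static bool fake_has_byte;
static uint8_t fake_byte;

static void fake_set_loopback(bool on) { fake_loop_on = on; fake_has_byte = false; }
static void fake_send(uint8_t b)
{
    if (fake_loop_on) { fake_byte = b; fake_has_byte = true; }
}
static bool fake_try_receive(uint8_t *b)
{
    if (!fake_has_byte) { return false; }
    *b = fake_byte;
    fake_has_byte = false;
    return true;
}
static bool dead_try_receive(uint8_t *b) { (void)b; return false; }

static const serial_dev fake_dev = { fake_set_loopback, fake_send, fake_try_receive };
static const serial_dev dead_dev = { fake_set_loopback, fake_send, dead_try_receive };
```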

Out of Service Tests

We covered Power On Self Tests in the previous section. POST can test the internal working of the board quite well, but it falls short when it comes to testing the interfaces with other boards in the system. In this section we cover tests that are run in an active system by taking the board to be tested out of service and then verifying its interfaces with neighboring boards.

Interface Tests

Interface tests are a broad category of tests that check the interfaces with other cards. These tests generally involve participation from the neighboring cards. The basic steps in an interface test are listed below:

  1. Bring the card to be tested out of service.
  2. Configure the neighboring cards to work in an interface test mode. (In some cases this might require bringing the neighboring cards out of service).
  3. Instruct the card under test to perform the test.
  4. Restore the configuration on the neighboring cards by bringing them out of interface test mode.
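The four steps can be expressed as a small sequencing routine. All names below are hypothetical; the hooks stand for whatever maintenance software a particular system provides. The point of the sketch is that the neighboring cards are restored whether the test passes or fails.

```c
#include <stdbool.h>

/* Hypothetical hooks, one per step in the list above. */
typedef struct {
    bool (*take_out_of_service)(void);   /* step 1: card under test */
    bool (*enter_test_mode)(void);       /* step 2: neighboring cards */
    bool (*run_test)(void);              /* step 3: card under test */
    void (*leave_test_mode)(void);       /* step 4: neighboring cards */
} interface_test_ops;

typedef enum { IFTEST_PASS, IFTEST_FAIL, IFTEST_NOT_RUN } iftest_result;

iftest_result run_interface_test(const interface_test_ops *ops)
{
    if (!ops->take_out_of_service()) {
        return IFTEST_NOT_RUN;           /* card could not be released */
    }
    if (!ops->enter_test_mode()) {
        ops->leave_test_mode();          /* undo any partial configuration */
        return IFTEST_NOT_RUN;
    }

    bool passed = ops->run_test();

    ops->leave_test_mode();              /* restore on pass and on fail */
    return passed ? IFTEST_PASS : IFTEST_FAIL;
}

/* Host-side stand-ins used only to exercise the sequence. */
static int restore_calls;
static bool step_ok(void)        { return true; }
static bool step_fail(void)      { return false; }
static void count_restore(void)  { restore_calls++; }
```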

Echo Back Test

The main disadvantage of the Loop Back Test is that it does not test the hardware logic at the transmitter and receiver interfaces. This problem can be solved by performing the Echo Back Test. Here the interfacing card is configured in echo back mode, i.e. the interfacing card receives the data and echoes it back by transmitting it to the card under test. Thus the card under test receives back the data that it had transmitted. The important difference from loop back mode is that this test exercises the transmit and receive driver logic. The picture shown below points out this difference between loop back and echo back.

Note that echo back is a special case of Interface Tests. Thus it follows the same sequence of operations:

  1. Bring the card to be tested out of service.
  2. Configure the interfacing card to echo back all the data it receives. 
  3. Instruct the card under test to perform a loop back test.
  4. Restore the configuration on the interfacing card by bringing it out of echo back.

In-Service Monitoring

We have considered running diagnostic tests at power on and in out-of-service mode. Here we discuss techniques to check the health of the card while the card is in service.

Transient Error Monitoring

When a card is in service, it should keep track of transient errors detected by the software. Transient errors are errors that occur occasionally even when the hardware is functioning normally. Because these errors are transient, the failed operation would succeed if it were attempted again. In a healthy system such problems are caused by power glitches, spikes and interference from other cards.

A good example of a transient error is a spurious interrupt. A spurious interrupt condition is detected when the processor detects an interrupt but the interrupt handler does not find a device that initiated the interrupt. In such cases a leaky bucket error counter is incremented. If spurious interrupts become too frequent, the leaky bucket counter will overflow. When the counter overflows, the system should trigger complete hardware diagnostics to isolate the problem.
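A leaky bucket counter can be sketched as below. The structure and function names are illustrative. The counter goes up by one on each error and is drained by one at a fixed interval (for example from a periodic timer), so occasional errors leak away while a burst of errors overflows the bucket. Resetting the level after an overflow is a design choice made for this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t level;      /* errors currently held in the bucket */
    uint32_t capacity;   /* level above which the bucket overflows */
} leaky_bucket;

/* Call once per detected error (e.g. per spurious interrupt).
 * Returns true when the bucket overflows, i.e. diagnostics should run. */
bool leaky_bucket_add(leaky_bucket *b)
{
    b->level++;
    if (b->level > b->capacity) {
        b->level = 0;    /* start afresh after reporting the overflow */
        return true;
    }
    return false;
}

/* Call periodically; drains one error so rare errors never accumulate. */
void leaky_bucket_leak(leaky_bucket *b)
{
    if (b->level > 0) {
        b->level--;
    }
}
```

With a capacity of 3, three errors followed by one leak leave the level at 2, and it then takes two more errors before the next call reports an overflow.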

Link Monitoring

Link monitoring is also a very important tool for in-service monitoring of a card. Monitoring the bit error rate on the links can give advance warning about the health of the system. When the bit error rate exceeds a certain threshold, diagnostics might have to be triggered.
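The threshold check might be sketched like this. The function name and the way the threshold is expressed (as the inverse of the allowed error rate, so 1000000 means one error per million bits) are assumptions for illustration. Integer arithmetic is used because many embedded targets lack floating point hardware.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true when bit_errors / bits_received exceeds 1 / threshold_inverse.
 * The comparison is rearranged to avoid division. The multiplication is
 * assumed not to overflow 64 bits for realistic error counts. */
bool ber_exceeds(uint64_t bit_errors, uint64_t bits_received,
                 uint64_t threshold_inverse)
{
    if (bits_received == 0) {
        return false;    /* nothing measured yet */
    }
    return bit_errors * threshold_inverse > bits_received;
}
```

The error and bit counts would typically be read from link hardware counters over a fixed measurement window and then cleared for the next window.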