{"title": "Applications of Neural Networks in Video Signal Processing", "book": "Advances in Neural Information Processing Systems", "page_first": 289, "page_last": 295, "abstract": null, "full_text": "Applications of Neural Networks in Video Signal Processing \n\nJohn C. Pearson, Clay D. Spence and Ronald Sverdlove \n\nDavid Sarnoff Research Center \n\nCN5300 \n\nPrinceton, NJ 08543-5300 \n\nAbstract \n\nAlthough color TV is an established technology, there are a number of \nlongstanding problems for which neural networks may be suited. Impulse \nnoise is such a problem, and a modular neural network approach is presented \nin this paper. The training and analysis were done on conventional \ncomputers, while real-time simulations were performed on a massively parallel \ncomputer called the Princeton Engine. The network approach was \ncompared to a conventional alternative, a median filter. Real-time simulations \nand quantitative analysis demonstrated the technical superiority of \nthe neural system. Ongoing work is investigating the complexity and cost \nof implementing this system in hardware. \n\n1 THE POTENTIAL FOR NEURAL NETWORKS IN CONSUMER ELECTRONICS \n\nNeural networks are most often considered for application in emerging new technologies, \nsuch as speech recognition, machine vision, and robotics. The fundamental \nideas behind these technologies are still being developed, and it will be some time \nbefore products containing neural networks are manufactured. As a result, research \nin these areas will not drive the development of inexpensive neural network hardware \nwhich could serve as a catalyst for the field of neural networks in general. \n\nIn contrast, neural networks are rarely considered for application in mature technologies, \nsuch as consumer electronics. 
These technologies are based on established \nprinciples of information processing and communication, and they are used in millions \nof products per year. The embedding of neural networks within such mass-market \nproducts would certainly fuel the development of low-cost network hardware, \nas economics dictates rigorous cost reduction in every component. \n\n2 IMPULSE NOISE IN TV \n\nThe color television signaling standard used in the U.S. was adopted in 1953 (McIlwain \nand Dean, 1956; Pearson, 1975). The video information is first broadcast as an \namplitude modulated (AM) radio-frequency (RF) signal, and is then demodulated \nin the receiver into what is called the composite video signal. The composite signal \nis composed of the high-bandwidth (4.2 MHz) luminance (black and white) signal \nand two low-bandwidth color signals whose amplitudes are modulated in quadrature \non a 3.58 MHz subcarrier. This signal is then further decoded into the red, green \nand blue signals that drive the display. One image \"frame\" is formed by interlacing \ntwo successive \"fields\" of 262.5 horizontal lines. \n\nElectric sparks create broad-band RF emissions which are transformed into oscillatory \nwaveforms in the composite video signal, called AM impulses. See Figure 1. \nThese impulses appear on a television screen as short, horizontal, multi-colored \nstreaks which clearly stand out from the picture. Such sparks are commonly created \nby electric motors. There is little spatial (within a frame) or temporal (between \nframes) correlation between impulses. \n\nGeneral considerations suggest a two-step approach for the removal of impulses from \nthe video signal: detect which samples have been corrupted, and replace them with \nvalues derived from their spatio-temporal neighbors. 
Although impulses are quite \nvisible, they form a small fraction of the data, so only those samples detected as \ncorrupted should be altered. An interpolated average of some sort will generally be a \ngood estimate of impulse-corrupted samples because images are generally smoothly \nvarying in space and time. \n\nFigure 1: Seven Representative AM Impulse Waveforms. They have been digitized \nand displayed at the intervals used in digital receivers (8 bits, 0.07 μsec). The largest-amplitude \nimpulses are 20-30 samples wide, approximately 3% of the width of one \nline of active video (752 samples). \n\nFigure 2: Corrupted Video Scan Line. (Top) Scan line of a composite video signal \ncontaining six impulse waveforms. (Bottom) The impulse waveforms, derived by \nsubtracting the uncorrupted signal from the corrupted signal. Note the presence of \nmany impulse-like features in the video signal. \n\nThere are a number of difficulties associated with this detection/replacement approach \nto the problem. There are many impulse-like waveforms present in normal \nvideo, which can cause \"false positives\" or \"false alarms\". See Figure 2. The algorithms \nthat decode the composite signal into RGB spread impulses onto neighboring \nlines, so it is desirable to remove the impulses in the composite signal. However, \nthe color encoding within the composite signal complicates matters. The subcarrier \nfrequency is near the ringing frequency of the impulses and tends to hide the impulses. \nFurthermore, the replacement function cannot simply average the nearest 
The impulses also have \na wide variety of waveforms (Figure I), including some variation caused by clipping \nin the receiver. \n\n3 MODULAR NEURAL NETWORK SYSTEM \n\nThe impulse removal system incorporates three small multi-layer perceptron net(cid:173)\nworks (Rumelhart and McClelland, 1986), and all of the processing is confined to \none field of data. See Figure 3. The replacement function is performed by one \nnetwork, termed the i-net (\"i\" denotes interpolation). Its input is 5 consecutive \nsamples each from the two lines above and the two lines below the current line. \nThe network consists of 10 units in the first hidden layer, 5 in the second, and-one \noutput node trained to estimate the center sample of the current line. \n\nThe detection function employs 2 networks in series. (A single network detector \nhas been tried, but it has never performed as well as this two-stage detector.) The \ninputs to the first network are 9 consecutive samples from the current line centered \non the sample of interest. It has 3 nodes in the first layer, and one output node \ntrained to compute a moving average of the absolute difference between the clean \nand noisy signals of the current inputs. It is thus trained to function as a filter for \nimpulse energy, and is termed the e-net. The output of the e-net is then low-pass \nfiltered and sub-sampled to remove redundant information. \nThe inputs to the second network are 3 lines of 5 consecutive samples each, drawn \nfrom the post-processed output of the e-net, centered on the sample of interest. \nThis network, like the e-net, has 3 nodes in the first layer and one output node. It \nis trained to output 1 if the sample of interest is contaminated with impulse noise, \nand 0 otherwise. It is thus an impulse detector, and is called the d-net. 
\n\nThe output of the d-net is then fed to a binary switch, which passes through to the \nfinal system output either the output of the i-net or the original signal, depending \non whether the d-net output exceeds an adjustable threshold. \n\nFigure 3: The Neural Network AM Impulse Removal System. The cartoon face is \nused to illustrate salient image processing characteristics of the system. The e-net \ncorrectly signals the presence of the large impulse (chin), misses the small impulse \n(forehead), and incorrectly identifies edges (nose) and points (eyes) as impulses. \nThe d-net correctly disregards the vertically correlated impulse features (nose) and \ndetects the large impulse (chin), but misses the small impulse (forehead) \nand falsely detects the non-correlated impulse-like features (eyes). The i-net produces a fuzzy \n(doubled) version of the original, which is used to replace segments identified as \ncorrupted by the d-net. \n\nExperience showed that the d-net tended to produce narrow spikes in response to \nimpulse-like features of the image. To remove this source of false positives, the \noutput of the d-net is averaged over a 19-sample region centered on the sample of \ninterest. This reduces the peak amplitude of signals due to impulse-like features \nmuch more than the broad signals produced by true impulses. 
An impulse is considered \nto be present if this smoothed signal exceeds a threshold, the level of which \nis chosen so as to strike a balance between low false positive rates (high threshold), \nand high true positive rates (low threshold). \n\nExperience also showed that the fringes of the impulses were not being detected. \nTo compensate for this, sub-threshold d-net output samples are set high if they are \nwithin 9 samples of a super-threshold d-net sample. Figure 4 shows the output of \nthe resulting trained system for one scan line. \n\nThe detection networks were trained on one frame of video containing impulses of \n5 different amplitudes with the largest twenty times the smallest. Visually, these \nranged from non-objectionable to brightly colored. Standard incremental backpropagation \nand conjugate gradient (NAG, 1990) were the training procedures \nused. The complexity of the e-net and d-net was reduced in phases. These nets \nbegan as 3-layer nets. After a phase of training, redundant nodes were identified \nand removed, and training re-started. This process was repeated until there were \nno redundant nodes. \n\nFigure 4: Input and Network Signals. \n\n4 REAL-TIME SIMULATION ON THE PRINCETON ENGINE \n\nThe trained system was simulated in real time on the Princeton Engine (Chin et al., \n1988), and a video demonstration was presented at the conference. The Princeton \nEngine (PE) is a 29.3 GIPS image processing system consisting of up to 2048 \nprocessing elements in a SIMD configuration. 
Each processor is responsible for the \noutput of one column of pixels, and contains a 16-bit arithmetic unit, a multiplier, a \n64-word triple-port register stack, and 16,000 words of local processor memory. In \naddition, an interprocessor communication bus permits exchanges of data between \nneighboring processors during one instruction cycle. \n\nWhile the i-net performs better than conventional interpolation methods, the difference \nis not significant for this problem because of the small amount of signal \nwhich is replaced. (When the whole image was replaced, the neural net interpolator gave \nabout 1.5 dB better performance than a conventional method.) Thus the i-net has not \nbeen implemented on the PE. The i-net may be of value in other video tasks, such \nas converting from an interlaced to a non-interlaced display. \n\n16-bit fixed-point arithmetic was used in these simulations, with 8 bits of fraction, \nand 10-bit sigmoid function look-up tables. Comparison with the double-precision \narithmetic used on the conventional computers showed no significant reduction in \nperformance. Current work is exploring the feasibility of implementing training on \nthe PE. \n\nFigure 5: ROC Analysis of Neural Network and Median Detectors. \n\n5 PERFORMANCE ANALYSIS \n\nThe mean squared error (MSE) is well known to be a poor measure of subjective \nimage quality (Roufs and Bouma, 1980). A better measure of detection performance \nis given by the receiver operating characteristic, or ROC (Green and Swets, 1966, \n1974). The ROC is a parametric plot of the fraction of corrupted samples correctly \ndetected versus the fraction of clean samples that were falsely detected. 
In this case, \nthe decision threshold for the smoothed output of the d-net was the parameter \nvaried. Figure 5 (left) shows the neural network detector ROC for five different \nimpulse amplitudes (tested on a video frame that it was not trained on). This \nquantifies the sharp breakdown in performance observed in real-time simulations at \nlow impulse amplitude. This breakdown is not observed in analysis of the MSE. \n\nMedian filters are often suggested for impulse removal tasks, and have been applied \nto the removal of impulses from FM TV transmission systems (Perlman et al., \n1987). In order to assess the relative merits of the neural network detector, a \nmedian detector was designed and analyzed. This detector computes the median of \nthe current sample and its 4 nearest neighbors with the same color sub-carrier phase. \nA detection is registered if the difference between the median and the current sample \nis above threshold (the same additional measures were taken to ensure that impulse \nfringes were detected as were described above for the neural network detector). \n\nFigure 5 (right) shows both the neural network and median detector ROCs for \ntwo different video frames, each of which contained a mixture of all 5 impulse \namplitudes. One frame was used in training the network (TRAIN), and the other \nwas not (TEST). This verifies that the network was not overtrained, and quantifies \nthe superior performance of the network detector observed in real-time simulations. \n\n6 CONCLUSIONS \n\nWe have presented a system using neural network algorithms that outperforms a \nconventional method, median filtering, in removing AM impulses from television \nsignals. Of course an additional essential criterion is the cost and complexity of \nhardware implementations. Median filter chips have been successfully fabricated \n(Christopher et al., 1988). 
We are currently investigating the feasibility of casting \nsmall neural networks into special purpose chips. We are also applying neural nets \nto other television signal processing problems. \n\nAcknowledgements \n\nThis work was supported by Thomson Consumer Electronics, under Erich Geiger \nand Dietrich Westerkamp. This work was part of a larger team effort, and we \nacknowledge their help, in particular: Nurit Binenbaum, Jim Gibson, Patrick Hsieh, \nand John Ju. \n\nReferences \n\nChin, D., J. Passe, F. Bernard, H. Taylor and S. Knight, (1988). The Princeton \nEngine: A Real-Time Video System Simulator. IEEE Transactions on Consumer \nElectronics 34:2 pp. 285-297. \n\nChristopher, L.A., W.T. Mayweather III, and S. Perlman, (1988). A VLSI Median \nFilter for Impulse Noise Elimination in Composite or Component TV Signals. IEEE \nTransactions on Consumer Electronics 34:1 p. 262. \n\nGreen, D.M., and J.A. Swets, (1966 and 1974). Signal Detection Theory and Psychophysics. \nNew York, Wiley (1966). Reprinted with corrections, Huntington, \nN.Y., Krieger (1974). \n\nMcIlwain, K. and C.E. Dean (eds.); Hazeltine Corporation Staff, (1956). Principles \nof Color Television. New York. John Wiley and Sons. \n\nNAG, (1990). The NAG Fortran Library Manual, Mark 14. Downers Grove, IL \n(The Numerical Algorithms Group Inc.). \n\nPearson, D.E., (1975). Transmission and Display of Pictorial Information. New \nYork. John Wiley and Sons. \n\nPerlman, S.S., S. Eisenhandler, P.W. Lyons, and M.J. Shumila, (1987). Adaptive \nMedian Filtering for Impulse Noise Elimination in Real-Time TV Signals. IEEE \nTransactions on Communications COM-35:6 p. 646. \n\nRoufs, J.A. and H. Bouma, (1980). Towards Linking Perception Research and \nImage Quality. Proceedings of the SID 21:3, pp. 247-270. \n\nRumelhart, D.E. and J.L. McClelland (eds.), (1986). Parallel Distributed Processing: \nExplorations in the Microstructure of Cognition. Cambridge, Mass., MIT \nPress. 
", "award": [], "sourceid": 313, "authors": [{"given_name": "John", "family_name": "Pearson", "institution": null}, {"given_name": "Clay", "family_name": "Spence", "institution": null}, {"given_name": "Ronald", "family_name": "Sverdlove", "institution": null}]}