{"title": "Remote Sensing Image Analysis via a Texture Classification Neural Network", "book": "Advances in Neural Information Processing Systems", "page_first": 425, "page_last": 432, "abstract": null, "full_text": "Remote Sensing Image Analysis via a Texture Classification Neural Network \n\nHayit K. Greenspan and Rodney Goodman \n\nDepartment of Electrical Engineering \n\nCalifornia Institute of Technology, 116-81 \n\nPasadena, CA 91125 \n\nhayit@electra.micro.caltech.edu \n\nAbstract \n\nIn this work we apply a texture classification network to remote sensing image \nanalysis. The goal is to extract the characteristics of the area depicted \nin the input image, thus achieving a segmented map of the region. We have \nrecently proposed a combined neural network and rule-based framework \nfor texture recognition. The framework uses unsupervised and supervised \nlearning, and provides probability estimates for the output classes. We \ndescribe the texture classification network and extend it to demonstrate \nits application to the Landsat and aerial image analysis domain. \n\n1 Introduction \n\nIn this work we apply a texture classification network to remote sensing image \nanalysis. The goal is to segment the input image into homogeneous textured regions \nand identify each region as one of a prelearned library of textures, for example \ndistinguishing tree areas from urban areas. Classification of remote sensing imagery is of importance in \nmany applications, such as navigation, surveillance and exploration. It has become \na very complex task spanning a growing number of sensors and application domains. \nThe applications include landcover identification (with systems such as the AVIRIS \nand SPOT), atmospheric analysis via cloud-coverage mapping (using the AVHRR \nsensor), oceanographic exploration for sea/ice type classification (SAR input) and \nmore. 
\n\nMuch attention has been given to the use of the spectral signature for the identification \nof region types (Wharton, 1987; Lee and Philpot, 1991). Only recently has the \nidea of adding spatial information been presented (Ton et al., 1991). In this work \nwe investigate the possibility of gaining information from textural analysis. We \nhave recently developed a texture recognition system (Greenspan et al., 1992) which \nachieves state-of-the-art results on natural textures. In this paper we apply the \nsystem to remote sensing imagery and check the system's robustness in this noisy \nenvironment. Texture can play a major role in segmenting the images into homogeneous \nareas and enhancing other sensors' capabilities, such as multispectral analysis, \nby indicating areas of interest in which further analysis can be pursued. Fusion of \nthe spatial information with the spectral signature will enhance the classification \nand the overall automated analysis capabilities. \n\nMost of the work in the literature focuses on human expert-based rules with specific \nsensor data calibration. Some of the existing problems with this classic approach \nare the following (Ton et al., 1991): \n- Experienced photointerpreters are required to spend a considerable amount of \ntime generating rules. \n- The rules need to be updated for different geographical regions. \n- No spatial rules exist for the complex Landsat imagery. \nAn interesting question is whether one can automate the rule generation. In this paper we \npresent a learning framework in which spatial rules are learned by the system from \na given database of examples. \n\nThe learning framework and its contribution to a texture-recognition system are the \ntopic of section 2. Experimental results of the system's application to remote sensing \nimagery are presented in section 3. 
\n\n2 The texture-classification network \n\nWe have previously presented a texture classification network which combines a \nneural network and rule-based framework (Greenspan et al., 1992) and enables both \nunsupervised and supervised learning. The system consists of three major stages, \nas shown in Fig. 1. The first stage performs feature extraction and transforms the \nimage space into an array of 15-dimensional feature vectors, each vector corresponding \nto a local window in the original image. There is much evidence in animal visual \nsystems supporting the use of multi-channel orientation-selective band-pass filters \nin the feature-extraction phase. An open issue is the decision regarding the appropriate \nnumber of frequencies and orientations required for the representation of the \ninput domain. We define an initial set of 15 filters and achieve a computationally \nefficient filtering scheme via the multi-resolution pyramidal approach. \n\nThe learning mechanism shown next derives a minimal subset of the above filters \nwhich conveys sufficient information about the visual input for its differentiation \nand labeling. In an unsupervised stage a machine-learning clustering algorithm is \nused to quantize the continuous input features. A supervised learning stage follows \nin which labeling of the input domain is achieved using a rule-based network. Here \nan information theoretic measure is utilized to find the most informative correlations \nbetween the attributes and the pattern class specification, while providing probability \nestimates for the output classes. 
Ultimately, a minimal representation for a \nlibrary of patterns is learned in a training mode, following which the classification \nof new patterns is achieved. \n\nFigure 1: System block diagram. In the feature-extraction phase, a window of the input image passes through orientation-selective band-pass filters (BPF), giving an N-dimensional continuous feature vector. In the learning phase, unsupervised clustering produces an N-dimensional quantized feature vector, and supervised learning maps it to texture classes. \n\n2.1 The system in more detail \n\nThe initial stage for a classification system is the feature extraction phase. In the \ntexture-analysis task there is both biological and computational evidence supporting \nthe use of Gabor-like filters for the feature extraction. In this work, we use \nthe Log Gabor pyramid, or the Gabor wavelet decomposition, to define an initial \nfinite set of filters. A computationally efficient scheme involves using a pyramidal \nrepresentation of the image which is convolved with fixed spatial support oriented \nGabor filters (Greenspan et al., 1993). Three scales are used with 4 orientations per \nscale (0, 45, 90, 135 degrees), together with a non-oriented component at each scale, to produce a \n15-dimensional feature vector (3 scales, 5 components per scale) as the output of the feature extraction stage. Using \nthe pyramid representation is computationally efficient as the image is subsampled \nin the filtering process. Two such size reduction stages take place in the three-scale \npyramid. The feature values thus generated correspond to the average power of the \nresponse, to specific orientation and frequency ranges, in an 8*8 window of the \ninput image. Each such window gets mapped to a 15-dimensional attribute vector \nas the output of the feature extraction stage. 
\n\nThe goal of the learning system is to use the feature representation described above \nto discriminate between the input patterns, or textures. Both unsupervised and \nsupervised learning stages are utilized. A minimal set of features is extracted from \nthe 15-dimensional attribute vector, which conveys sufficient information about the \nvisual input for its differentiation and labeling. \n\nThe unsupervised learning stage can be viewed as a preprocessing stage for achieving \na more compact representation of the filtered input. The goal is to quantize the \ncontinuous-valued features which are the result of the initial filtering, thus shifting \nto a more symbolic representation of the input domain. This clustering stage was \nfound experimentally to be of importance as an initial learning phase in a classification \nsystem. The need for discretization becomes evident when trying to learn \nassociations between attributes in a symbolic representation, such as rules. \n\nThe output of the filtering stage consists of N (=15) continuous-valued feature \nmaps, each representing a filtered version of the original input. Thus, each local \narea of the input image is represented via an N-dimensional feature vector. An \narray of such N-dimensional vectors, viewed across the input image, is the input \nto the learning stage. We wish to detect characteristic behavior across the N-dimensional \nfeature space, for the family of textures to be learned. In this work, each \ndimension of the 15-dimensional attribute vector is individually clustered. All \ntraining samples are thus projected onto each axis of the space and one-dimensional \nclusters are found using the K-means clustering algorithm (Duda and Hart, 1973). 
\nThis statistical clustering technique consists of an iterative procedure of finding \nK means in the training sample space, following which each new input sample is \nassociated with the closest mean in Euclidean distance. The means, arbitrarily labeled 0 through \nK-1, correspond to discrete codewords. Each continuous-valued input \nsample gets mapped to the discrete codeword representing its associated mean. The \noutput of this preprocessing stage is a 15-dimensional quantized vector of attributes \nwhich is the result of concatenating the discrete-valued codewords of the individual \ndimensions. \n\nIn the final, supervised stage, we utilize the existing information in the feature \nmaps for higher level analysis, such as input labeling and classification. A rule-based \ninformation theoretic approach, which is an extension of a first-order \nBayesian classifier, is used because of its ability to output probability estimates for the output \nclasses (Goodman et al., 1992). The classifier defines correlations between input \nfeatures and output classes as probabilistic rules. A data-driven supervised learning \napproach utilizes an information theoretic measure to learn the most informative \nlinks or rules between features and class labels. The classifier then uses these links \nto provide an estimate of the probability of a given output class being true. When \npresented with a new input evidence vector, a set of rules R can be considered to \n\"fire\". The classifier estimates the posterior probability of each class x given the rules \nthat fire, in the form log p(x|R), and the largest estimate is chosen as the initial \nclass label decision. The probability estimates for the output classes can now be \nused for feedback purposes and further higher level processing. \n\nThe rule-based classification system can be mapped into a 3-layer feed-forward \narchitecture as shown in Fig. 2 (Greenspan et al., 1993).
 The input layer contains \na node for each attribute. The hidden layer contains a node for each rule and the \noutput layer contains a node for each class. Each rule (second layer node j) is \nconnected to a class via a multiplicative weight of evidence Wj. \n\nFigure 2: Rule-based network. The input nodes feed the rule nodes, which feed the class probability estimates. \n\n3 Results \n\nThe above-described system has achieved state-of-the-art results on both structured \nand unstructured natural texture classification [5]. In this work we present initial \nresults of applying the network to the noisy environment of satellite and airborne \nimagery. \n\nFig. 3 presents two such examples. The first example (top) is an image of Pasadena, \nCalifornia, taken via the AVIRIS system (Airborne Visible/Infrared Imaging Spectrometer). \nThe AVIRIS system covers 224 contiguous spectral bands simultaneously, \nat 20 meters per pixel resolution. The presented example is taken as an \naverage of several bands in the visual range. In this input image we can see that \na major distinguishing characteristic is urban area vs. hilly surround. These are \nthe two categories we set out to learn. The training set consists of a 128*128 image \nsample for each category. The test input is a 512*512 image which is very noisy \nand, because of its low resolution, very difficult to segment into the two categories, \neven for our own visual perception. In the presented output (top right), the urban \narea is labeled in white, the hillside in gray, and unknown, undetermined areas \nare in darker gray. We see that a rough segmentation into the desired regions has \nbeen achieved. The probabilistic network's output allows for the identification of \nunknown or unspecified regions, in which more elaborate analysis can be pursued \n(Greenspan et al., 1992). 
The dark gray areas correspond to such regions; one example \nis the hill and urban contact (bottom right) in which some urban suburbs \non the hill slopes form a mixture of the classes. Note that in the initial results \npresented the blockiness perceived is the result of the analysis resolution chosen. \nFusing additional spectral bands into the system as input would enable pixel \nresolution as well as the detection of additional classes (not visually detectable), \nsuch as concrete material, a variety of vegetation, etc. \n\nA higher resolution Airborne image is presented at the bottom of Fig. 3. The \nclasses learned are bush (output label dark gray), ground (output label gray) and a \nstructured area, such as the field or the man-made structures (white). Here, \nthe training was done on 128*128 image examples (1 example per class). The input \nimage is 800*800. In the result presented (right) we see that the three classes have \nbeen found and a rough segmentation into the three regions is achieved. Note in \nparticular the detection of the bush areas and the three main structured areas in \nthe image, including the man-made field, indicated in white. \n\nOur final example relates to an autonomous navigation scenario. Autonomous vehicles \nrequire an automated scene analysis system to avoid obstacles and navigate \nthrough rough terrain. Fusion of several visual modalities, such as intensity-based \nsegmentation, texture, stereo, and color, together with other domain inputs, such \nas soil spectral decomposition analysis, will be required for this challenging task. In \nFig. 4 we present preliminary results on outdoor photographed scenes taken by an \nautonomous vehicle at JPL (Jet Propulsion Laboratory, Pasadena). The presented \nscenes (left) are segmented into bush and gravel regions (right). The training set \nconsists of four 64*64 image samples from each category. 
In the top example (a \n256*256 pixel image), light gray indicates gravel while black represents bushy regions. \nWe can see that intensity alone does not suffice in this task (for example, top \nright corner). The system has learned some textural characteristics which guided \nthe segmentation in otherwise similar-intensity regions. Note that this is also probably \nthe cause for identifying the track-like region (e.g., center bottom) as a bush \nregion. We could learn track-like regions as a third category, or specifically include \nsuch examples as gravel in our training set. \nIn the second example (a 400*400 input image, bottom) light gray indicates gravel, \ndark gray represents a bush-like region, and black represents the unknown category. \nHere, the top right region of the sky is labeled correctly as an unknown, or new, \ncategory. Note that intensity alone would have confused that region as being gravel. \nOverall, the texture classification neural network succeeds in achieving a correct, \nyet rough, segmentation of the scene based on textural characteristics alone. These \nare encouraging results indicating that the learning system has learned informative \ncharacteristics of the domain. \n\nFigure 3: Remote sensing image analysis results. The input test image is shown \n(left) followed by the system output classification map (right). In the AVIRIS (top) \ninput, white indicates urban regions, gray is a hilly area and dark gray reflects \nundetermined or different region types. In the Airborne output (bottom), dark \ngray indicates a bush area, light gray is a ground cover region and white indicates \nman-made structures. Both robustness to noise and generalization are demonstrated \nin these two challenging real-world problems. 
\n\nFigure 4: Image analysis for autonomous navigation \n\n4 Summary and Discussion \n\nThe presented results demonstrate the network's capability for generalization and \nrobustness to noise in very challenging real-world problems. In the presented framework \na learning mechanism automates the rule generation. This framework can answer \nsome of the current difficulties in using the human expert's knowledge. Furthermore, \nthe automation of the rule generation can enhance the expert's knowledge \nregarding the task at hand. We have demonstrated that the use of textural spatial \ninformation can segment complex scenery into homogeneous regions. Some of \nthe system's strengths include generalization to new scenes, invariance to intensity, \nand the ability to enlarge the feature vector representation to include additional \ninputs (such as additional spectral bands) and learn rules characterizing the integrated \nmodalities. Future work includes fusing several modalities within the learning \nframework for enhanced performance and testing the performance on a large \ndatabase. \n\nAcknowledgements \n\nThis work is supported in part by Pacific Bell, and in part by DARPA and ONR \nunder grant no. N00014-92-J-1860. H. Greenspan is supported in part by an Intel \nfellowship. The research described in this paper was carried out in part by the \nJet Propulsion Laboratory, California Institute of Technology. We would like to \nthank Dr. C. Anderson for his pyramid software support and Dr. L. Matthies for \nthe autonomous vehicle images. \n\nReferences \n\nS. Wharton. (1987) A Spectral-Knowledge-Based Approach for Urban Land-Cover \nDiscrimination. IEEE Transactions on Geoscience and Remote Sensing, Vol. GE-25[3]:272-282. \n\nJ. Lee and W. Philpot. (1991) Spectral Texture Pattern Matching: A Classifier \nFor Digital Imagery. 
IEEE Transactions on Geoscience and Remote Sensing, Vol. \n29[4]:545-554. \n\nJ. Ton, J. Sticklen and A. Jain. (1991) Knowledge-Based Segmentation of Landsat \nImages. IEEE Transactions on Geoscience and Remote Sensing, Vol. 29[2]:222-232. \n\nH. Greenspan, R. Goodman and R. Chellappa. (1992) Combined Neural Network \nand Rule-Based Framework for Probabilistic Pattern Recognition and Discovery. In \nJ. E. Moody, S. J. Hanson, and R. P. Lippmann (eds.), Advances in Neural Information \nProcessing Systems 4, 444-452, San Mateo, CA: Morgan Kaufmann Publishers. \n\nH. Greenspan, R. Goodman, R. Chellappa and C. Anderson. (1993) Learning \nTexture Discrimination Rules in a Multiresolution System. Submitted to IEEE \nTransactions on Pattern Analysis and Machine Intelligence. \n\nR. O. Duda and P. E. Hart. (1973) Pattern Classification and Scene Analysis. John \nWiley and Sons, Inc. \n\nR. Goodman, C. Higgins, J. Miller and P. Smyth. (1992) Rule-Based Networks for \nClassification and Probability Estimation. Neural Computation, [4]:781-804. \n\n", "award": [], "sourceid": 636, "authors": [{"given_name": "Hayit", "family_name": "Greenspan", "institution": null}, {"given_name": "Rodney", "family_name": "Goodman", "institution": null}]}