{"title": "The RA Scanner: Prediction of Rheumatoid Joint Inflammation Based on Laser Imaging", "book": "Advances in Neural Information Processing Systems", "page_first": 1433, "page_last": 1440, "abstract": "", "full_text": "The RA Scanner: Prediction of Rheumatoid Joint Inflammation Based on Laser Imaging\n\nAnton Schwaighofer (1, 2)\n\n(1) TU Graz, Institute for Theoretical Computer Science\nInffeldgasse 16b, 8010 Graz, Austria\nhttp://www.igi.tugraz.at/aschwaig\n\nVolker Tresp, Peter Mayer\n\n(2) Siemens Corporate Technology, Department of Neural Computation\nOtto-Hahn-Ring 6, 81739 Munich, Germany\nhttp://www.tresp.org, peter.mayer@mchp.siemens.de\n\nAlexander K. Scheel, Gerhard M\u00fcller\n\nUniversity G\u00f6ttingen, Department of Medicine, Nephrology and Rheumatology\nRobert-Koch-Stra\u00dfe 40, 37075 G\u00f6ttingen, Germany\nascheel@gwdg.de, gmueller@med.uni-goettingen.de\n\nAbstract\n\nWe describe the RA scanner, a novel system for the examination of patients suffering from rheumatoid arthritis. The RA scanner is based on a novel laser-based imaging technique which is sensitive to the optical characteristics of finger joint tissue. Based on the laser images, finger joints are classified according to whether the inflammatory status has improved or worsened. To perform the classification task, various linear and kernel-based systems were implemented and their performances were compared. Special emphasis was put on measures to reliably perform parameter tuning and evaluation, since only a very small data set was available. Based on the results presented in this paper, it was concluded that the RA scanner permits a reliable classification of pathological finger joints, thus paving the way for a further development from prototype to product stage.\n\n1 Introduction\n\nRheumatoid arthritis (RA) is the most common inflammatory arthropathy with 1\u20132% of the population being affected. 
This chronic, mostly progressive disease often leads to early disability and joint deformities. Recent studies have convincingly shown that early treatment, and therefore an early diagnosis, is mandatory to prevent or at least delay joint destruction [2]. Unfortunately, long-term medication with disease-modifying anti-rheumatic drugs (DMARDs) often acts very slowly on clinical parameters of inflammation, making it difficult to find the right drug for a patient within adequate time. Conventional radiology, such as magnetic resonance imaging (MRI) and ultrasound, may provide information on soft tissue changes, yet these techniques are time-consuming and, in the case of MRI, costly. New imaging techniques for RA diagnosis should thus be non-invasive, of low cost, examiner-independent and easy to use.\n\nFollowing recent experiments on absorption and scattering coefficients of laser light in joint tissue [6], a prototype laser imaging technique was developed [7]. As part of the prototype development, it became necessary to analyze whether the rheumatic status of a finger joint can be reliably classified on the basis of the laser images. The aim of this article is to provide an overview of this analysis. Employing different linear and kernel-based classifiers, we investigate how well the laser imaging technique predicts the status of the rheumatic joint inflammation. Provided that the accuracy of the overall system is sufficiently high, the imaging technique and the automatic inflammation classification can be combined into a novel device that allows an inexpensive and objective assessment of inflammatory joint changes.\n\nThe paper is organized as follows. In Sec. 2 we describe the RA scanner in more detail, as well as the process of data acquisition. In Sec. 3 we describe the linear and kernel-based classifiers used in the experiments. In Sec. 
4 we describe how the methods were evaluated and compared. We present experimental results in Sec. 5. Conclusions and an outlook are given in Sec. 6.\n\n2 The RA Scanner\n\nThe rheumatoid arthritis (RA) scanner provides a new medical imaging technique, developed specifically for the diagnosis of RA in finger joints. The RA scanner [7] allows the in vivo trans-illumination of finger joints with laser light in the near infrared wavelength range. The scattered light distribution is detected by a camera and is used to assess the inflammatory status of the finger joint. Example images, taken from an inflamed joint and from a healthy control, are shown in Fig. 1.\n\nStarting out from the laser images, image pre-processing is used to obtain a description of each laser image by nine numerical features. A brief description of the features is given in Fig. 1. Furthermore, for each finger joint examined, the circumference is measured using a conventional measuring tape. The nine image features plus the joint circumference make up the data that are used in the classification step of the RA scanner to predict the inflammatory status of the joint.\n\n2.1 Data Acquisition\n\nOne clinically important question is to know as early as possible whether a prescribed medication improves the state of rheumatoid arthritis. Therefore the goal of the classification step in the RA scanner is to decide, based on features extracted from the laser images, whether there was an improvement of arthritis activity or whether the joint inflammation remained unchanged or worsened.\n\nThe data for the development of the RA scanner stem from a study on 22 patients with rheumatoid arthritis. Data from 72 finger joints were used for the study. All of these 72 finger joints were examined at baseline and during a follow-up visit after a mean duration of 42 days. 
Earlier data from an additional 20 patients had to be discarded since experimental conditions were not controlled properly.\n\nEach joint was examined and the clinical arthritis activity was classified from 0 (inactive, not swollen, tender or warm) to 3 (very active) by a rheumatologist. The characteristics of joint tissue were recorded by the laser imaging technique described above. In a preprocessing step nine features were derived from the distribution of the scattered laser light (see Fig. 1). The tenth feature is the circumference of the finger joint.\n\n[Figure 1, panel (a): Laser image of a healthy finger joint. Panel (b): Laser image of an inflamed finger joint. The inflammation changes the joint tissue\u2019s absorption coefficient, giving a darker image.]\n\nFigure 1: Two examples of the light distribution captured by the RA scanner. A laser beam is sent through the finger joint (the finger tip is to the right, the palm is on the left), and the light distribution below the joint is captured by a CCD element. To calculate the features, first a horizontal line near the vertical center of the finger joint is selected. The distribution of light intensity along that line is bell-shaped. The features used in the classification task are the maximum light intensity, the curvature of the light intensity at the maximum and seven additional features based on higher moments of the intensity curve.\n\nSince there are high inter-individual variations in optical joint characteristics, it is not possible to tell the inflammatory status of a joint from one single image. Instead, special emphasis was put on the intra-individual comparison of baseline and follow-up data. 
For every joint examined, data from baseline and follow-up visit were compared and changes in arthritis activity were rated as improvement, unchanged or worsening. This rating divided the data into two classes: class 1 contains the joints where an improvement of arthritis activity was observed (a total of 46 joints), and class $-1$ contains the joints that remained unchanged or worsened (a total of 26 joints). For all joints, the differences in feature values between baseline and follow-up visit were computed.\n\n3 Classification Methods\n\nIn this section, we describe the employed linear and kernel-based classification methods, where we focus on design issues.\n\n3.1 Gaussian Process Classification (GPC)\n\nIn Gaussian processes, a function\n\n$$f(x) = \\sum_{j=1}^{M} w_j K(x, x_j, \\Theta) \\qquad (1)$$\n\nis described as a superposition of $M$ kernel functions $K(x, x_j, \\Theta)$, defined for each of the $M$ training data points $x_j$, with weight $w_j$. The kernel functions are parameterized by the vector $\\Theta = (\\theta_0, \\ldots, \\theta_d)$. In two-class Gaussian process classification, the logistic transfer function $\\sigma(f(x)) = (1 + e^{-f(x)})^{-1}$ is applied to the prediction of a Gaussian process to produce an output which can be interpreted as $\\pi(x)$, the probability of the input $x$ belonging to class 1 [10].\n\nIn the experiment we chose the Gaussian kernel function\n\n$$K(x, x_j, \\Theta) = \\theta_0 \\exp\\left( -\\frac{1}{2} (x - x_j)^T \\, \\mathrm{diag}(\\theta_1^2, \\ldots, \\theta_d^2)^{-1} (x - x_j) \\right) \\qquad (2)$$\n\nwith input length scales $\\theta_1, \\ldots, \\theta_d$, where $d$ is the dimension of the input space. $\\mathrm{diag}(\\theta_1^2, \\ldots, \\theta_d^2)$ denotes a diagonal matrix with entries $\\theta_1^2, \\ldots, \\theta_d^2$. For training the Gaussian process classifier (that is, determining the posterior probabilities of the parameters $w_1, \\ldots, w_M, \\theta_0, \\ldots, \\theta_d$) we used a full Bayesian approach, implemented with Radford Neal\u2019s freely available FBM software (see footnote 1).\n\n3.2 Gaussian Process Regression (GPR)\n\nIn GPR we treat the classification problem as a regression problem with target values $\\{-1, +1\\}$, i.e. we do not apply the logistic transfer function as in the last subsection. Any GP output $< 0$ is treated as indicating an example from class $-1$, any output $\\geq 0$ as an indicator for class 1. The disadvantage is that the GPR prediction cannot be treated as a posterior class probability; the advantage is that the fast and non-iterative training algorithms for GPR can be applied. GPR for classification problems can be considered as a special case of Fisher discriminant analysis with kernels [4] and of least squares support vector machines [9].\n\nThe parameters $\\Theta = (\\theta_0, \\ldots, \\theta_d)$ of the covariance function Eq. (2) were chosen by maximizing the posterior probability of $\\Theta$, $P(\\Theta \\mid t, X) \\propto P(t \\mid X, \\Theta) P(\\Theta)$, via a scaled conjugate gradient method. Later on, this method will be referred to as \u201cGPR Bayesian\u201d. Results are also given for a simplified covariance function with $\\theta_0 = 1$, $\\theta_1 = \\theta_2 = \\ldots = \\theta_d = r$, where the common length scale $r$ was chosen by cross-validation (later on referred to as \u201cGPR crossval\u201d).\n\n3.3 Support Vector Machine (SVM)\n\nThe SVM is a maximum margin linear classifier. As in Sec. 3.2, the SVM classifies a pattern according to the sign of $f(x)$ in Eq. (1). The difference is that the weights $w = (w_1, \\ldots, w_M)^T$ in the SVM minimize the particular cost function [8]\n\n$$w^T K w + \\sum_{i=1}^{M} C_i \\left( 1 - y_i f(x_i) \\right)_+ \\qquad (3)$$\n\nwhere $(\\cdot)_+$ sets all negative arguments to zero. Here, $y_i \\in \\{-1, +1\\}$ is the class label for training point $x_i$. $C_i \\geq 0$ is a constant that determines the weight of errors on the training data, and $K$ is an $M \\times M$ matrix containing the amplitudes of the kernel functions at the training data, i.e. $K_{ij} = K(x_i, x_j, \\Theta)$. The motivation for this cost function stems from statistical learning theory [8]. Many authors have previously obtained excellent classification results by using the SVM. One particular feature of the SVM is the sparsity of the solution vector $w$, that is, many elements $w_i$ are zero.\n\nIn the experiments, we used both an SVM with linear kernel (\u201cSVM linear\u201d) and an SVM with a Gaussian kernel (\u201cSVM Gaussian\u201d), equivalent to the Gaussian process kernel Eq. (2), with $\\theta_0 = 1$, $\\theta_1 = \\theta_2 = \\ldots = \\theta_d = r$. The kernel parameter $r$ was chosen by cross-validation.\n\nFootnote 1: As a prior distribution for kernel parameter $\\theta_0$ we chose a Gamma distribution. $\\theta_1, \\ldots, \\theta_d$ are samples of a hierarchical Gamma distribution. 
In FBM syntax, the prior is 0.05:0.5 x0.2:0.5:1. Sampling from the posterior distribution was done by persistent hybrid Monte Carlo, following the example of a 3-class problem in Neal [5].\n\nTo compensate for the unbalanced distribution of classes, the penalty term $C_i$ was chosen to be 0.8 for the examples from the larger class and 1 for the smaller class. This was found empirically to give the best balance of sensitivity and specificity (cf. Sec. 4). A formal treatment of this issue can be found in Lin et al. [3].\n\n3.4 Generalized Linear Model (GLM)\n\nA GLM for binary responses is built up from a linear model for the input data, and the model output $f(x) = w^T x$ is in turn input to the link function. For Bernoulli distributions, the natural link function [1] is the logistic transfer function $\\sigma(f(x)) = (1 + e^{-f(x)})^{-1}$. The overall output of the GLM, $\\sigma(f(x))$, computes $\\pi(x)$, the probability of the input $x$ belonging to class 1. Training of the linear model was done by iteratively re-weighted least squares (IRLS).\n\n4 Training and Evaluation\n\nOne of the challenges in developing the classification system for the RA scanner is the low number of training examples available. 
Data were collected through an extensive medical study, but only data from 72 fingers were found to be suitable for further use. Further data can only be acquired in carefully controlled future studies, once the initial prototype method has proven sufficiently successful.\n\nTraining. From the currently available 72 training examples, classifiers need to be trained and evaluated reliably. Part of the standard methodology for small data sets is $N$-fold cross-validation, where the data are partitioned into $N$ equally sized sets and the system is trained on $N - 1$ of those sets and tested on the $N$th data set left out. Since we wish to make use of as much training data as possible, $N = 36$ seemed the appropriate choice (see footnote 2), giving test sets with two examples in each iteration. For some of the methods, model parameters needed to be tuned (for example, choosing the SVM kernel width), where again cross-validation is employed. The nested cross-validation ensures that in no case any of the test examples is used for training or to tune parameters, leading to the following procedure:\n\nRun 36 fold CV\n  For Bayesian methods or methods without tunable parameters\n  (SVM linear, GPC, GPR Bayesian, GLM):\n    Use full training set to tune and train classifier\n  For Non-Bayesian methods (SVM Gaussian, GPR crossval):\n    Run 35 fold CV on the training set\n    choose parameters to minimise CV error\n    train classifier with chosen parameters\n  evaluate the classifier on the 2 example test set\n\nSignificance Tests. In order to compare the performance of two given classification methods, one usually employs statistical hypothesis testing. We use here a test that is best suited for small test sets, since it takes into account the outcome on the test examples one by one, thus perfectly matching the 36-fold cross-validation scheme described above. 
A similar test has been used by Yang and Liu [11] to compare text categorization methods.\n\nThe test is based on two counts: $b$ (how many examples in the test set were correctly classified by method B, but misclassified by method A) and $c$ (number of examples misclassified by B, correctly classified by A). We assume that examples misclassified (resp. correctly classified) by both A and B do not contribute to the performance difference. We take the counts $b$ and $c$ as the sufficient statistics of a binomial random variable with parameter $\\theta$, where $\\theta$ is the proportion of cases where method A performs better than method B. The null hypothesis $H_0$ is that the parameter $\\theta = 0.5$, that is, both methods A and B have the same performance. Hypothesis $H_1$ is that $\\theta > 0.5$. The test statistic under the null hypothesis is the Binomial distribution $\\mathrm{Bi}(i \\mid b + c, \\theta)$ with parameter $\\theta = 0.5$. We reject the null hypothesis if the probability of observing a count $k \\geq c$ under the null hypothesis,\n\n$$P(k \\geq c) = \\sum_{i=c}^{b+c} \\mathrm{Bi}(i \\mid b + c, \\theta = 0.5),$$\n\nis sufficiently small.\n\nFootnote 2: Thus, it is equivalent to a leave-one-out scheme, yet with only half the time consumption.\n\nMethod                            Error rate\nGLM                               20.83%\nGLM, reduced feature set          16.67%\nGPR Bayesian                      13.89%\nGPR crossval                      22.22%\nGPC                               23.61%\nSVM linear                        22.22%\nSVM linear, reduced feature set   16.67%\nSVM Gaussian                      20.83%\n\nTable 1: Error rates of different classification methods on the rheumatoid arthritis prediction problem. All error rates have been computed by 36-fold cross-validation. \u201cReduced feature set\u201d indicates experiments where a priori feature selection has been done.\n\nROC Curves. In medical diagnosis, biometrics and other areas, the common means of assessing a classification method is the receiver operating characteristics (ROC) curve. An ROC curve plots sensitivity versus 1\u2212specificity (see footnote 3) for different thresholds of the classifier output. Based on the ROC curve it can be decided how many false positives or false negatives one is willing to tolerate, thus helping to tune the classifier threshold to best suit a certain application.\n\nAcquiring the ROC curve typically requires the classifier output on an independent test set. We instead use the union of all test set outputs in the cross-validation routine. This means that the ROC curve is based on outputs of slightly different models, yet this still seems to be the most suitable solution for so few data. For all classifiers we assess the area under the ROC curve and the cross-validation error rate. Here the above-mentioned threshold on the classifier output is chosen such that sensitivity equals specificity.\n\n5 Results\n\nTab. 1 lists error rates for all methods listed in Sec. 3. Gaussian process regression (GPR Bayesian) with an error rate of about 14% clearly outperforms all other methods, which all achieve comparable error rates in the range of 20\u201324%. We attribute the good performance of GPR to its inherent feature relevance detection, which is done by adapting the length scales $\\theta_i$ in the covariance function Eq. (2), i.e. a large $\\theta_i$ means that the $i$-th feature is essentially ignored.\n\nSurprisingly, Gaussian process classification implemented with Markov chain Monte Carlo sampling (GPC) showed rather poor performance. We currently have no clear explanation for this fact. 
We found no indications of convergence problems; furthermore, we achieved similar results with different sampling schemes.\n\nIn an additional experiment we wanted to find out if classification results could be improved by using only a subset of input features (see footnote 4). We found that only the performance of the two linear classifiers (GLM and SVM linear) could be improved by the input feature selection. Both now achieve an error rate of 16.67%, which is slightly worse than GPR on the full feature set (see Tab. 1).\n\nFootnote 3: $\\text{sensitivity} = \\frac{\\text{true positives}}{\\text{true positives} + \\text{false negatives}}$, $\\text{specificity} = \\frac{\\text{true negatives}}{\\text{true negatives} + \\text{false positives}}$.\n\n[Figure 2: plot of sensitivity (vertical axis, 0 to 1) versus 1\u2212specificity (horizontal axis, 0 to 1), with curves for GPR Bayesian; GLM, reduced feature set; and SVM linear, reduced feature set.]\n\nFigure 2: ROC curves of the best classification methods, both on the full data set and on a reduced data set where a priori feature selection was used to retain only the three most relevant features. Integrating the area under the ROC curves gives similar results for all three methods, with an area of 0.86 for SVM linear and GLM, and 0.84 for GPR Bayesian.\n\nSignificance Tests. Using the statistical hypothesis test described in the previous section, we compared all classification methods pairwise. It turned out that the three best methods (GPR Bayesian, and GLM and SVM linear with reduced feature set) perform better than all other methods at a confidence level of 90% or more. 
Amongst the three best methods, no significant difference could be observed.\n\nROC Curves. For the three best classification methods (GPR Bayesian, and GLM and SVM linear with reduced feature set), we have plotted the receiver operating characteristics (ROC) curve in Fig. 2. According to the ROC curve a sensitivity of 80% can be achieved with a specificity at around 90%. GPR Bayesian seems to give best results, both in terms of error rate and shape of the ROC curve.\n\nSummary. To summarize, when the full set of features was used, best performance was obtained with GPR Bayesian. We attribute this to the inherent input relevance detection mechanisms of this approach. Comparable yet slightly worse results could be achieved by performing feature selection a priori and reducing the number of input features to the three most significant ones. In particular, the error rates of linear classifiers (GLM and linear SVM) improved by this feature selection, whereas more complex classifiers did not benefit. We can draw the important conclusion that, using the best classifiers, a sensitivity of 80% can be reached at a specificity of approximately 90%.\n\nFootnote 4: This was done with the input relevance detection algorithm of the neural network tool SENN, a variant of sequential backward elimination where the feature that least affects the neural network output is removed. The feature set was reduced to the three most relevant ones.\n\n6 Conclusions\n\nIn this paper we have reported results of the analysis of a prototype medical imaging system, the RA scanner. The aim of the RA scanner is to detect soft tissue changes in finger joints, which occur in early stages of rheumatoid arthritis (RA). 
The basis of the RA scanner is a novel laser imaging technique that is sensitive to inflammatory soft tissue changes.\n\nWe have analyzed whether the laser images are suitable for an accurate prediction of the inflammatory status of a finger joint, and which classification methods are best suited for this task. Out of a set of linear and kernel-based classification methods, Gaussian process regression performed best, followed closely by generalized linear models and the linear support vector machine, the latter two operating on a reduced feature set. In particular, we have shown how parameter tuning and classifier training can be done on the basis of the scarce available data. For the RA prediction task, we achieved a sensitivity of 80% at a specificity of approximately 90%. These results show that a further development of the RA scanner is desirable.\n\nIn the present study the inflammatory status is assessed by a rheumatologist, taking into account the patient\u2019s subjective degree of pain. Thus we may expect a certain degree of label noise in the data we have trained the classification system on. Further developments of the classification system in the RA scanner will thus incorporate information from established medical imaging systems such as magnetic resonance imaging (MRI). MRI is known to provide accurate information about soft tissue changes in finger joints, yet is too costly to be routinely used for RA diagnosis. By incorporating MRI results into the RA scanner\u2019s classification system, we expect to significantly improve the overall accuracy.\n\nAcknowledgments. AS gratefully acknowledges support through an Ernst-von-Siemens scholarship. Thanks go to Radford Neal for making his FBM software available to the public, and to Ian Nabney and Chris Bishop for the Netlab toolbox.\n\nReferences\n\n[1] Fahrmeir, L. and Tutz, G. Multivariate Statistical Modelling Based on Generalized Linear Models. 
Springer Verlag, 2nd edn., 2001.\n\n[2] Kim, J. and Weisman, M. When does rheumatoid arthritis begin and why do we need to know? Arthritis and Rheumatism, 43:473\u2013482, 2000.\n\n[3] Lin, Y., Lee, Y., and Wahba, G. Support vector machines for classification in nonstandard situations. Tech. Rep. 1016, Department of Statistics, University of Wisconsin, Madison, WI, USA, 2000.\n\n[4] Mika, S., R\u00e4tsch, G., Weston, J., Sch\u00f6lkopf, B., Smola, A. J., and M\u00fcller, K.-R. Invariant feature extraction and classification in kernel spaces. In S. A. Solla, T. K. Leen, and K.-R. M\u00fcller, eds., Advances in Neural Information Processing Systems 12. MIT Press, 2000.\n\n[5] Neal, R. M. Monte Carlo implementation of Gaussian process models for Bayesian regression and classification. Tech. Rep. 9702, Department of Statistics, University of Toronto, 1997.\n\n[6] Prapavat, V., Runge, W., Krause, A., Beuthan, J., and M\u00fcller, G. A. Bestimmung von gewebeoptischen Eigenschaften eines Gelenksystems im Fr\u00fchstadium der rheumatoiden Arthritis (in vitro). Minimal Invasive Medizin, 8:7\u201316, 1997.\n\n[7] Scheel, A. K., Krause, A., Mesecke-von Rheinbaben, I., Metzger, G., Rost, H., Tresp, V., Mayer, P., Reuss-Borst, M., and M\u00fcller, G. A. Assessment of proximal finger joint inflammation in patients with rheumatoid arthritis, using a novel laser-based imaging technique. Arthritis and Rheumatism, 46(5):1177\u20131184, 2002.\n\n[8] Sch\u00f6lkopf, B. and Smola, A. J. Learning with Kernels. MIT Press, 2002.\n\n[9] Van Gestel, T., Suykens, J. A., Lanckriet, G., Lambrechts, A., De Moor, B., and Vandewalle, J. Bayesian framework for least-squares support vector machine classifiers, Gaussian processes and kernel Fisher discriminant analysis. Neural Computation, 14(5):1115\u20131147, 2002.\n\n[10] Williams, C. K. and Barber, D. 
Bayesian classification with Gaussian processes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(12):1342\u20131351, 1998.\n\n[11] Yang, Y. and Liu, X. A re-examination of text categorization methods. In Proceedings of ACM SIGIR 1999. ACM Press, 1999.", "award": [], "sourceid": 2175, "authors": [{"given_name": "Anton", "family_name": "Schwaighofer", "institution": null}, {"given_name": "Volker", "family_name": "Tresp", "institution": null}, {"given_name": "Peter", "family_name": "Mayer", "institution": null}, {"given_name": "Alexander", "family_name": "Scheel", "institution": null}, {"given_name": "Gerhard", "family_name": "M\u00fcller", "institution": null}]}