{"title": "Learning shape correspondence with anisotropic convolutional neural networks", "book": "Advances in Neural Information Processing Systems", "page_first": 3189, "page_last": 3197, "abstract": "Convolutional neural networks have achieved extraordinary results in many computer vision and pattern recognition applications; however, their adoption in the computer graphics and geometry processing communities is limited due to the non-Euclidean structure of their data. In this paper, we propose Anisotropic Convolutional Neural Network (ACNN), a generalization of classical CNNs to non-Euclidean domains, where classical convolutions are replaced by projections over a set of oriented anisotropic diffusion kernels. We use ACNNs to effectively learn intrinsic dense correspondences between deformable shapes, a fundamental problem in geometry processing, arising in a wide variety of applications. We tested ACNNs performance in very challenging settings, achieving state-of-the-art results on some of the most difficult recent correspondence benchmarks.", "full_text": "Learning shape correspondence with\n\nanisotropic convolutional neural networks\n\nDavide Boscaini1, Jonathan Masci1, Emanuele Rodol`a1, Michael Bronstein1,2,3\n\n1USI Lugano, Switzerland\n\n2Tel Aviv University, Israel\n\n3Intel, Israel\n\nname.surname@usi.ch\n\nAbstract\n\nConvolutional neural networks have achieved extraordinary results in many com-\nputer vision and pattern recognition applications; however, their adoption in the\ncomputer graphics and geometry processing communities is limited due to the\nnon-Euclidean structure of their data. In this paper, we propose Anisotropic Con-\nvolutional Neural Network (ACNN), a generalization of classical CNNs to non-\nEuclidean domains, where classical convolutions are replaced by projections over\na set of oriented anisotropic diffusion kernels. 
We use ACNNs to effectively learn intrinsic dense correspondences between deformable shapes, a fundamental problem in geometry processing arising in a wide variety of applications. We tested ACNN's performance in challenging settings, achieving state-of-the-art results on recent correspondence benchmarks.\n\n1 Introduction\n\nIn geometry processing, computer graphics, and vision, finding intrinsic correspondence between 3D shapes affected by different transformations is one of the fundamental problems, with a wide spectrum of applications ranging from texture mapping to animation [25]. Of particular interest is the setting in which the shapes are allowed to deform non-rigidly. Traditional hand-crafted correspondence approaches are divided into two main categories: point-wise correspondence methods [17], which establish the matching between (a subset of) the points on two or more shapes by minimizing metric distortion, and soft correspondence methods [23], which establish a correspondence among functions defined over the shapes, rather than the vertices themselves. Recently, the emergence of 3D sensing technology has brought the need to deal with acquisition artifacts, such as missing parts and geometric and topological noise, as well as to match 3D shapes in different representations, such as meshes and point clouds. With new and broader classes of artifacts comes the need to learn from data the invariance that is otherwise impossible to model axiomatically.\n\nIn the past years, we have witnessed the emergence of learning-based approaches for 3D shape analysis. The first attempts were aimed at learning local shape descriptors [15, 5, 27] and shape correspondence [20]. 
The dramatic success of deep learning (in particular, convolutional neural\nnetworks [8, 14]) in computer vision [13] has led to a recent keen interest in the geometry processing\nand graphics communities to apply such methodologies to geometric problems [16, 24, 28, 4, 26].\n\nExtrinsic deep learning. Many machine learning techniques successfully working on images were\ntried \u201cas is\u201d on 3D geometric data, represented for this purpose in some way \u201cdigestible\u201d by stan-\ndard frameworks. Su et al. [24] used CNNs applied to range images obtained from multiple views\nof 3D objects for retrieval and classi\ufb01cation tasks. Wei et al. [26] used view-based representation\nto \ufb01nd correspondence between non-rigid shapes. Wu et al. [28] used volumetric CNNs applied to\nrasterized volumetric representation of 3D shapes. The main drawback of such approaches is their\ntreatment of geometric data as Euclidean structures. Such representations are not intrinsic, and vary\n\n30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, 
Spain.\n\nFigure 1: Illustration of the difference between extrinsic (left) and intrinsic (right) deep learning methods on geometric data. Intrinsic methods work on the manifold rather than its Euclidean realization and are isometry-invariant by construction.\n\nas the result of pose or deformation of the object. For instance, in Figure 1, the filter that responds to features on a straight cylinder would not respond to a bent one. Achieving invariance to shape deformations, a common requirement in many applications, is extremely hard with the aforementioned methods and requires complex models and huge training sets due to the large number of degrees of freedom involved in describing non-rigid deformations.\n\nIntrinsic deep learning approaches try to apply learning techniques to geometric data by generalizing the main ingredients, such as convolutions, to non-Euclidean domains. In an intrinsic representation, the filter is applied to some data on the surface itself, thus being invariant to deformations by construction (see Figure 1). The first intrinsic convolutional neural network architecture (Geodesic CNN) was presented in [16]. While producing impressive results on several shape correspondence and retrieval benchmarks, GCNN has a number of significant drawbacks. First, the charting procedure is limited to meshes; second, there is no guarantee that the chart is always topologically meaningful. Another intrinsic CNN construction (Localized Spectral CNN), using an alternative charting technique based on the windowed Fourier transform [22], was proposed in [4]. This method is a generalization of previous work [6] on spectral deep learning on graphs. One of the key advantages of LSCNN is that the same framework can be applied to different shape representations, in particular meshes and point clouds. A drawback of this approach is its memory and computation requirements, as each window needs to be explicitly produced.\n\nContributions. We present Anisotropic Convolutional Neural Networks (ACNN), a method for intrinsic deep learning on non-Euclidean domains. Though it is a generic framework that can be used to handle different tasks, we focus here on learning correspondence between shapes. Our approach is related to two previous methods for deep learning on manifolds, GCNN [16] and ADD [5]. Compared to [5], where a learned spectral filter is applied to the eigenvalues of the anisotropic Laplace-Beltrami operator, we use anisotropic heat kernels as spatial weighting functions, allowing us to extract a local intrinsic representation of a function defined on the manifold. Unlike ADD, our ACNN is a convolutional neural network architecture. Compared to GCNN, our construction of the \u201cpatch operator\u201d is much simpler, does not depend on the injectivity radius of the manifold, and is not limited to triangular meshes. Overall, ACNN combines all the best properties of the previous approaches without inheriting their drawbacks. We show that the proposed framework outperforms GCNN, ADD, and other state-of-the-art approaches on challenging correspondence benchmarks.\n\n2 Background\n\nWe model a 3D shape as a two-dimensional compact Riemannian manifold (surface) X. Let TxX denote the tangent plane at x, modeling the surface locally as a Euclidean space. A Riemannian metric is an inner product ⟨·,·⟩TxX : TxX × TxX → R on the tangent plane, depending smoothly on x. Quantities which are expressible entirely in terms of the Riemannian metric, and therefore independent of the way the surface is embedded, are called intrinsic. Such quantities are invariant to isometric (metric-preserving) deformations.\n\nHeat diffusion on manifolds is governed by the heat equation, which has the most general form\n\nf_t(x, t) = -div_X(D(x) ∇_X f(x, t)),   (1)\n\nwith appropriate boundary conditions if necessary. Here ∇_X and div_X denote the intrinsic gradient and divergence operators, and f(x, t) is the temperature at point x at time t. D(x) is the thermal conductivity tensor (2×2 matrix) applied to the intrinsic gradient in the tangent plane. This formulation allows modeling heat flow that is position- and direction-dependent (anisotropic). Andreux et al. [1] considered anisotropic diffusion driven by the surface curvature. Boscaini et al. [5], assuming that at each point x the tangent vectors are expressed w.r.t. the orthogonal basis v_m, v_M of principal curvature directions, used a thermal conductivity tensor of the form\n\nD_αθ(x) = R_θ(x) diag(α, 1) R_θ(x)^T,   (2)\n\nwhere the 2×2 matrix R_θ(x) performs a rotation by θ w.r.t. the maximum curvature direction v_M(x), and α > 0 is a parameter controlling the degree of anisotropy (α = 1 corresponds to the classical isotropic case). We refer to the operator Δ_αθ f(x) = -div_X(D_αθ(x) ∇_X f(x)) as the anisotropic Laplacian, and denote by {φ_αθ,i, λ_αθ,i}_{i≥0} its eigenfunctions and eigenvalues (computed, if applicable, with the appropriate boundary conditions), satisfying Δ_αθ φ_αθ,i(x) = λ_αθ,i φ_αθ,i(x). Given some initial heat distribution f_0(x) = f(x, 0), the solution of the heat equation (1) at time t is obtained by applying the anisotropic heat operator H^t_αθ = e^{-t Δ_αθ} to f_0,\n\nf(x, t) = H^t_αθ f_0(x) = ∫_X f_0(ξ) h_αθt(x, ξ) dξ,   (3)\n\nwhere h_αθt(x, ξ) is the anisotropic heat kernel, and the above equation can be interpreted as a non-shift-invariant version of convolution. In the spectral domain, the heat kernel is expressed as\n\nh_αθt(x, ξ) = Σ_{k≥0} e^{-t λ_αθ,k} φ_αθ,k(x) φ_αθ,k(ξ).   (4)\n\nAppealing to the signal processing intuition, the eigenvalues play the role of \u2018frequencies\u2019, and e^{-tλ} acts as a low-pass filter (larger t, corresponding to longer diffusion, results in a filter with a narrower pass band). This construction was used in ADD [5] to generalize the OSD approach [15] using anisotropic heat kernels (considering the diagonal h_αθt(x, x) and learning a set of optimal task-specific spectral filters replacing the low-pass filters e^{-t λ_αθ,k}).\n\nDiscretization. In the discrete setting, the surface X is sampled at n points V = {x_1, ..., x_n}. The points are connected by edges E and faces F, forming a manifold triangular mesh (V, E, F). To each triangle ijk ∈ F, we attach an orthonormal reference frame U_ijk = (û_M, û_m, n̂), where n̂ is the unit normal vector to the triangle and û_M, û_m ∈ R^3 are the directions of principal curvature. The thermal conductivity tensor for the triangle ijk operating on tangent vectors is expressed w.r.t. U_ijk as the 3×3 matrix diag(α, 1, 0). The discretization of the anisotropic Laplacian takes the form of an n×n sparse matrix L = S^{-1}W. The mass matrix S is a diagonal matrix of area elements s_i = (1/3) Σ_{jk : ijk ∈ F} A_ijk, where A_ijk denotes the area of triangle ijk. The stiffness matrix W is composed of weights\n\nw_ij = (1/2) ( ⟨ê_kj, ê_ki⟩_{H_θ} / sin α_ij + ⟨ê_hj, ê_hi⟩_{H_θ} / sin β_ij ) if (i, j) ∈ E;  w_ij = -Σ_{k≠i} w_ik if i = j;  w_ij = 0 otherwise,   (5)\n\nwhere the notation is according to the inset figure (edge vectors ê_ki, ê_kj, ê_hi, ê_hj and angles α_ij, β_ij of the two triangles ijk, ijh sharing edge ij), and the shear matrix H_θ = R_θ U_ijk diag(α, 1, 0) U_ijk^T R_θ^T encodes the anisotropic scaling up to an orthogonal basis change. Here R_θ denotes the 3×3 rotation matrix rotating the basis vectors U_ijk on each triangle around the normal n̂ by angle θ.\n\n3 Intrinsic deep learning\n\nThis paper deals with the extension of the popular convolutional neural networks (CNN) [14] to non-Euclidean domains. The key feature of CNNs is the convolutional layer, implementing the idea of \u201cweight sharing\u201d, wherein a small set of templates (filters) is applied to different parts of the data. In image analysis applications, the input into the CNN is a function representing pixel values given on a Euclidean domain (plane); due to shift-invariance, the convolution can be thought of as passing a template across the plane and recording the correlation of the template with the function at that location. One of the major problems in applying the same paradigm to non-Euclidean domains is the lack of shift-invariance: the template now has to be location-dependent.\n\nAmong the recent attempts to develop intrinsic CNNs on non-Euclidean domains [6, 4, 16], the most related to our work is GCNN [16]. The latter approach was introduced as a generalization of CNNs to triangular meshes based on geodesic local patches. The core of this method is the construction of local geodesic polar coordinates using a procedure previously employed for intrinsic shape context descriptors [12]. 
The patch operator (D(x)f)(θ, ρ) in GCNN maps the values of the function f around vertex x into the local polar coordinates (θ, ρ), leading to the definition of the geodesic convolution\n\n(f ∗ a)(x) = max_{Δθ ∈ [0,2π)} ∫ a(θ + Δθ, ρ) (D(x)f)(θ, ρ) dρ dθ,   (6)\n\nwhich follows the idea of multiplication by a template, but is defined up to an arbitrary rotation Δθ ∈ [0, 2π) due to the ambiguity in the selection of the origin of the angular coordinate. The authors propose to take the maximum over all possible rotations of the template a(ρ, θ) to remove this ambiguity. Here, and in the following, f is some feature vector defined on the surface (e.g. texture, geometric descriptors, etc.).\n\nThere are several drawbacks to this construction. First, the charting method relies on a fast-marching-like procedure requiring a triangular mesh. While relatively insensitive to triangulation [12], it may fail if the mesh is very irregular. Second, the radius of the geodesic patches must be sufficiently small compared to the injectivity radius of the shape, otherwise the resulting patch is not guaranteed to be a topological disk. In practice, this limits the size of the patches one can safely use, or requires an adaptive radius selection mechanism.\n\n4 Anisotropic convolutional neural networks\n\nThe key idea of the Anisotropic CNN presented in this paper is the construction of a patch operator using anisotropic heat kernels. We interpret heat kernels as local weighting functions and construct\n\n(D_α(x)f)(θ, t) = ( ∫_X h_αθt(x, ξ) f(ξ) dξ ) / ( ∫_X h_αθt(x, ξ) dξ ),   (7)\n\nfor some anisotropy level α > 1. This way, the values of f around point x are mapped to a local system of coordinates (θ, t) that behaves like a polar system (here t denotes the scale of the heat kernel and θ is its orientation). 
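In the discrete setting, the spectral heat kernel (4) and the patch operator (7) reduce to a few matrix operations. The following NumPy sketch illustrates this under our own naming conventions (Phi, Lam, kernels, etc. are illustrative, not from the paper), assuming the eigenpairs of the discretized anisotropic Laplacian for each orientation θ are already available:

```python
import numpy as np

def heat_kernel(Phi, Lam, t):
    # Spectral heat kernel (4): h_t = Phi diag(exp(-t * Lam)) Phi^T, given
    # eigenvectors Phi (n, k) and eigenvalues Lam (k,) of the discretized
    # anisotropic Laplacian for one orientation theta.
    return (Phi * np.exp(-t * Lam)) @ Phi.T  # (n, n)

def patch_operator(f, kernels):
    # Patch operator (7): heat-kernel-weighted averages of f, one map per
    # (theta, t) pair. f: (n, p) features; kernels: {(theta, t): (n, n)}.
    patches = {}
    for key, H in kernels.items():
        w = H / H.sum(axis=1, keepdims=True)  # normalization in (7)
        patches[key] = w @ f                  # (n, p)
    return patches
```

Because each row of weights is normalized, a constant function is mapped to itself, mirroring the normalization by the kernel's mass in (7).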
We define the intrinsic convolution as\n\n(f ∗ a)(x) = ∫ a(θ, t) (D_α(x)f)(θ, t) dt dθ.   (8)\n\nNote that unlike the arbitrarily oriented geodesic patches in GCNN, necessitating the maximum over all template rotations (6), in our construction it is natural to use the principal curvature direction as the reference θ = 0.\n\nSuch an approach has a few major advantages compared to previous intrinsic CNN models. First, being a spectral construction, our patch operator can be applied to any shape representation (like LSCNN and unlike GCNN). Second, being defined in the spatial domain, the patches and the resulting filters have a clear geometric interpretation (unlike LSCNN). Third, our construction accounts for local directional patterns (like GCNN and unlike LSCNN). Fourth, the heat kernels are always well defined independently of the injectivity radius of the manifold (unlike GCNN). We summarize the comparative advantages in Table 1.\n\nACNN architecture. Similarly to Euclidean CNNs, our ACNN consists of several layers that are applied subsequently, i.e. the output of the previous layer is used as the input into the subsequent one.\n\nTable 1: Comparison of different intrinsic learning models. Our ACNN model combines all the best properties of the other models. Note that OSD and ADD are local spectral descriptors operating with intrinsic geometric information of the shape and cannot be applied to arbitrary input, unlike the Random Forest (RF) and convolutional models.\n\nMethod    | Input    | Task           | Repr. | Filters  | Context | Directional | Generalizable\nOSD [15]  | Geometry | Descriptors    | Any   | Spectral | No      | No          | Yes\nADD [5]   | Geometry | Any            | Any   | Spectral | No      | Yes         | Yes\nRF [20]   | Any      | Correspondence | Any   | Spectral | No      | No          | Yes\nGCNN [16] | Any      | Any            | Mesh  | Spatial  | Yes     | Yes         | Yes\nSCNN [6]  | Any      | Any            | Any   | Spectral | Yes     | No          | No\nLSCNN [4] | Any      | Any            | Any   | Spectral | Yes     | No          | Yes\nACNN      | Any      | Any            | Any   | Spatial  | Yes     | Yes         | Yes\n\nACNN, as any convolutional network, is applied in a point-wise manner on a function defined on the manifold, producing a point-wise output that is interpreted as soft correspondence, as described below. Our intrinsic convolutional layer IC_Q, with Q output maps, replaces the convolutional layer used in classical Euclidean CNNs with the construction (8). The IC_Q layer contains PQ filters arranged in banks (P filters in each of Q banks); each bank corresponds to an output dimension. The filters are applied to the input as follows:\n\nf_q^out(x) = Σ_{p=1}^{P} (f_p^in ∗ a_qp)(x),   q = 1, ..., Q,   (9)\n\nwhere a_qp(θ, t) are the learnable coefficients of the p-th filter in the q-th filter bank. A visualization of such filters is available in the supplementary material.\n\nOverall, the ACNN architecture, combining several layers of different types, acts as a non-linear parametric mapping of the form f_Θ(x) at each point x of the shape, where Θ denotes the set of all learnable parameters of the network. The choice of the parameters is done by an optimization process minimizing a task-specific cost, and can thus be rather general. 
Here, we focus on learning shape correspondence.\n\nLearning correspondence. Finding correspondence in a collection of shapes can be cast as a labelling problem, where one tries to label each vertex of a given query shape X with the index of a corresponding point on some reference shape Y [20]. Let n and m denote the number of vertices in X and Y, respectively. For a point x on a query shape, the output of ACNN f_Θ(x) is m-dimensional and is interpreted as a probability distribution (\u2018soft correspondence\u2019) on Y; its y-th component represents the probability of x being mapped to y. Let us denote by y*(x) the ground-truth correspondence of x on the reference shape. We assume to be provided with examples of points from shapes across the collection and their ground-truth correspondence, T = {(x, y*(x))}. The optimal parameters of the network are found by minimizing the multinomial regression loss\n\nℓ_reg(Θ) = -Σ_{(x, y*(x)) ∈ T} log f_Θ(x, y*(x)).   (10)\n\n5 Results\n\nIn this section, we evaluate the proposed ACNN method and compare it to state-of-the-art approaches. Anisotropic Laplacians were computed according to (5). Heat kernels were computed in the frequency domain using all the eigenpairs. In all experiments, we used L = 16 orientations and the anisotropy parameter α = 100. Neural networks were implemented in Theano [2]. The ADAM [11] stochastic optimization algorithm was used with an initial learning rate of 10^{-3}, β_1 = 0.9, and β_2 = 0.999. As the input to the networks, we used the local SHOT descriptor [21] with 544 dimensions and default parameters. For all experiments, training was done by minimizing the loss (10). For shapes with 6.9K vertices, Laplacian computation and eigendecomposition took 1 second and 4 seconds per angle, respectively, on a desktop workstation with 64GB of RAM and an i7-4820K CPU. 
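The loss (10) is the familiar negative log-likelihood (cross-entropy) of the groundtruth correspondence under the network's softmax output. A minimal NumPy sketch, with array names and shapes of our own choosing:

```python
import numpy as np

def soft_correspondence_loss(logits, gt_idx):
    # Multinomial regression loss (10): negative log-likelihood of the
    # groundtruth correspondence y*(x) under the softmax output f_Theta(x).
    # logits: (n, m) scores of n query points over m reference vertices;
    # gt_idx: (n,) groundtruth reference indices.
    z = logits - logits.max(axis=1, keepdims=True)  # stabilized log-softmax
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(gt_idx)), gt_idx].sum()
```

Subtracting the row-wise maximum before exponentiating keeps the log-softmax numerically stable without changing its value.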
Forward propagation of the trained model takes approximately 0.5 seconds to produce the dense soft correspondence for all the vertices.\n\nFigure 2: Performance of different correspondence methods, left to right: FAUST meshes, SHREC\u201916 Partial cuts and holes. Evaluation of the correspondence was done using the Princeton protocol.\n\nFull mesh correspondence. We used the FAUST humans dataset [3], containing 100 meshes of 10 scanned subjects, each in 10 different poses. The shapes in the collection manifest strong non-isometric deformations. Vertex-wise groundtruth correspondence is known between all the shapes. The zeroth FAUST shape containing 6890 vertices was used as reference; for each point on the query shape, the output of the network represents the soft correspondence as a 6890-dimensional vector, which was then converted to point correspondence with the technique explained in Section 4. The first 80 shapes were used for training and the remaining 20 for testing, following verbatim the settings of [16]. Batch normalization [9] allowed us to effectively train larger and deeper networks. For this experiment, we adopted the following architecture inspired by GCNN [16]: FC64+IC64+IC128+IC256+FC1024+FC512+Softmax. The soft correspondences produced by the net were refined using functional maps [18]. 
We refer to the supplementary material for the details.\nWe compare to Random Forests (RF) [20], Blended Intrinsic Maps (BIM) [10], Localized Spectral\nCNN (LSCNN) [4], and Anisotropic Diffusion Descriptors (ADD) [5].\nFigure 2 (left) shows the performance of different methods. The performance was evaluated us-\ning the Princeton protocol [10], plotting the percentage of matches that are at most r-geodesically\ndistant from the groundtruth correspondence on the reference shape. Two versions of the proto-\ncol consider intrinsically symmetric matches as correct (symmetric setting, solid curves) or wrong\n(asymmetric, more challenging setting, dashed curves). Some methods based on intrinsic structures\n(e.g. LSCNN or RF applied on WKS descriptors) are invariant under intrinsic symmetries and thus\ncannot distinguish between symmetric points. The proposed ACNN method clearly outperforms\nall the compared approaches and also perfectly distinguishes symmetric points. Figure 3 shows the\npointwise geodesic error of different correspondence methods (distance of the correspondence at a\npoint from the groundtruth). ACNN shows dramatically smaller distortions compared to other meth-\nods. Over 60% of matches are exact (zero geodesic error), while only a few points have geodesic\nerror larger than 10% of the geodesic diameter of the shape 1. Please refer to the supplementary\nmaterial for an additional visualization of the quality of the correspondences obtained with ACNN\nin terms of texture transfer.\n\nPartial correspondence We used the recent very challenging SHREC\u201916 Partial Correspon-\ndence benchmark [7], consisting of nearly-isometrically deformed shapes from eight classes, with\ndifferent parts removed. Two types of partiality in the benchmark are cuts (removal of a few\nlarge parts) and holes (removal of many small parts). In each class, the vertex-wise groundtruth\ncorrespondence between the full shape and its partial versions is given. 
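The Princeton-protocol curves reported in Figure 2 plot, for each geodesic radius r, the fraction of matches whose error does not exceed r. A small sketch of this evaluation, with function and variable names of our own:

```python
import numpy as np

def princeton_curve(geo_err, radii):
    # Princeton-protocol evaluation [10]: fraction of matches whose geodesic
    # error (expressed as a fraction of the shape's geodesic diameter) is at
    # most r, for each threshold r in radii.
    geo_err = np.asarray(geo_err, dtype=float)
    return np.array([(geo_err <= r).mean() for r in radii])
```

The value of the curve at r = 0 is the fraction of exact matches; a perfect correspondence gives a curve identically equal to 1.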
The dataset was split into disjoint training and testing sets. For cuts, training was done on 15 shapes per class; for holes, training was done on 10 shapes per class. We used the following ACNN architecture: IC32+FC1024+DO(0.5)+FC2048+DO(0.5)+Softmax. The soft correspondences produced by the net were refined using partial functional correspondence [19]. We refer to the supplementary material for the details. The dropout regularization, with π_drop = 0.5, was crucial to avoid overfitting on such a small training set. We compared ACNN to RF [20] and Partial Functional Maps (PFM) [19]. For the evaluation, we used the protocol of [7], which closely follows the Princeton benchmark.\n\n1 Per-subject leave-one-out produces comparable results, with mean accuracy of 59.6 \u00b1 3.7%.\n\nFigure 3: Pointwise geodesic error (in % of geodesic diameter) of different correspondence methods (top to bottom: Blended Intrinsic Maps, GCNN, ACNN) on the FAUST dataset. Error values are saturated at 10% of the geodesic diameter. Hot colors correspond to large errors.\n\nFigure 2 (middle) compares the performance of different partial matching methods on the SHREC\u201916 Partial (cuts) dataset. ACNN outperforms other approaches with a significant margin. Figure 4 (top) shows examples of partial correspondence on the horse shape as well as the pointwise geodesic error. We observe that the proposed approach produces high-quality correspondences even in such a challenging setting. Figure 2 (right) compares the performance of different partial matching methods on the SHREC\u201916 Partial (holes) dataset. In this setting as well, ACNN outperforms other approaches with a significant margin. 
Figure 4 (bottom) shows examples of partial correspondence on the dog shape as well as the pointwise geodesic error.\n\n6 Conclusions\n\nWe presented Anisotropic CNN, a new framework generalizing convolutional neural networks to non-Euclidean domains, allowing us to perform deep learning on geometric data. Our work follows the very recent trend of bringing machine learning methods to computer graphics and geometry processing applications, and is currently the most generic intrinsic CNN model. Our experiments show that ACNN outperforms previously proposed intrinsic CNN models, as well as additional state-of-the-art methods, in the shape correspondence application in challenging settings. Being a generic model, ACNN can be used for many other applications. The most promising future work direction is applying ACNN to learning on graphs.\n\nFigure 4: Examples of partial correspondence on the SHREC\u201916 Partial cuts (top) and holes (bottom) datasets. Rows 1 and 4: correspondence produced by ACNN. Corresponding points are shown in similar color. Reference shape is shown on the left. Rows 2, 5 and 3, 6: pointwise geodesic error (in % of geodesic diameter) of the ACNN and RF correspondence, respectively. Error values are saturated at 10% of the geodesic diameter. Hot colors correspond to large errors.\n\nAcknowledgments\n\nThe authors wish to thank Matteo Sala for the textured models. This research was supported by the ERC Starting Grant No. 307047 (COMET), a Google Faculty Research Award, and an Nvidia equipment grant.\n\nReferences\n\n[1] M. Andreux, E. Rodolà, M. Aubry, and D. Cremers. Anisotropic Laplace-Beltrami operators for shape analysis. In Proc. NORDIA, 2014.\n\n[2] J. Bergstra et al. Theano: a CPU and GPU math expression compiler. In Proc. SciPy, June 2010.\n\n[3] F. Bogo, J. Romero, M. Loper, and M. J. Black. 
FAUST: Dataset and evaluation for 3D mesh registration. In Proc. CVPR, 2014.\n\n[4] D. Boscaini, J. Masci, S. Melzi, M. M. Bronstein, U. Castellani, and P. Vandergheynst. Learning class-specific descriptors for deformable shapes using localized spectral convolutional networks. Computer Graphics Forum, 34(5):13\u201323, 2015.\n\n[5] D. Boscaini, J. Masci, E. Rodolà, M. M. Bronstein, and D. Cremers. Anisotropic diffusion descriptors. Computer Graphics Forum, 35(2), 2016.\n\n[6] J. Bruna, W. Zaremba, A. Szlam, and Y. LeCun. Spectral networks and locally connected networks on graphs. In Proc. ICLR, 2014.\n\n[7] L. Cosmo, E. Rodolà, M. M. Bronstein, A. Torsello, D. Cremers, and Y. Sahillioğlu. SHREC'16: Partial matching of deformable shapes. In Proc. 3DOR, 2016.\n\n[8] K. Fukushima. Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics, 36(4):193\u2013202, 1980.\n\n[9] S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proc. ICML, pages 448\u2013456, 2015.\n\n[10] V. G. Kim, Y. Lipman, and T. Funkhouser. Blended intrinsic maps. TOG, 30(4):79, 2011.\n\n[11] D. P. Kingma and J. Ba. ADAM: A method for stochastic optimization. In Proc. ICLR, 2015.\n\n[12] I. Kokkinos, M. M. Bronstein, R. Litman, and A. M. Bronstein. Intrinsic shape context descriptors for deformable shapes. In Proc. CVPR, 2012.\n\n[13] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In Proc. NIPS, 2012.\n\n[14] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel. Backpropagation applied to handwritten zip code recognition. Neural Computation, 1(4):541\u2013551, 1989.\n\n[15] R. Litman and A. M. Bronstein. Learning spectral descriptors for deformable shape correspondence. PAMI, 36(1):170\u2013180, 2014.\n\n[16] J. Masci, D. Boscaini, M. M. Bronstein, and P. Vandergheynst. Geodesic convolutional neural networks on Riemannian manifolds. In Proc. 3dRR, 2015.\n\n[17] F. Mémoli. Gromov-Wasserstein distances and the metric approach to object matching. Foundations of Computational Mathematics, pages 1\u201371, 2011.\n\n[18] M. Ovsjanikov, M. Ben-Chen, J. Solomon, A. Butscher, and L. Guibas. Functional maps: a flexible representation of maps between shapes. TOG, 31(4):1\u201311, 2012.\n\n[19] E. Rodolà, L. Cosmo, M. M. Bronstein, A. Torsello, and D. Cremers. Partial functional correspondence. Computer Graphics Forum, 2016.\n\n[20] E. Rodolà, S. Rota Bulò, T. Windheuser, M. Vestner, and D. Cremers. Dense non-rigid shape correspondence using random forests. In Proc. CVPR, 2014.\n\n[21] S. Salti, F. Tombari, and L. Di Stefano. SHOT: unique signatures of histograms for surface and texture description. CVIU, 125:251\u2013264, 2014.\n\n[22] D. I. Shuman, B. Ricaud, and P. Vandergheynst. Vertex-frequency analysis on graphs. arXiv:1307.5708, 2013.\n\n[23] J. Solomon, A. Nguyen, A. Butscher, M. Ben-Chen, and L. Guibas. Soft maps between surfaces. Computer Graphics Forum, 31(5):1617\u20131626, 2012.\n\n[24] H. Su, S. Maji, E. Kalogerakis, and E. Learned-Miller. Multi-view convolutional neural networks for 3D shape recognition. In Proc. ICCV, 2015.\n\n[25] O. van Kaick, H. Zhang, G. Hamarneh, and D. Cohen-Or. A survey on shape correspondence. Computer Graphics Forum, 20:1\u201323, 2010.\n\n[26] L. Wei, Q. Huang, D. Ceylan, E. Vouga, and H. Li. Dense human body correspondences using convolutional networks. In Proc. CVPR, 2016.\n\n[27] T. Windheuser, M. Vestner, E. Rodolà, R. Triebel, and D. Cremers. Optimal intrinsic descriptors for non-rigid shape analysis. In Proc. BMVC, 2014.\n\n[28] Z. Wu, S. Song, A. Khosla, et al. 3D ShapeNets: A deep representation for volumetric shapes. In Proc. CVPR, 2015.\n", "award": [], "sourceid": 1585, "authors": [{"given_name": "Davide", "family_name": "Boscaini", "institution": "University of Lugano"}, {"given_name": "Jonathan", "family_name": "Masci", "institution": "Universit\u00e0 della Svizzera italiana"}, {"given_name": "Emanuele", "family_name": "Rodol\u00e0", "institution": "University of Lugano"}, {"given_name": "Michael", "family_name": "Bronstein", "institution": "University of Lugano"}]}