{"title": "Unsupervised Learning of Artistic Styles with Archetypal Style Analysis", "book": "Advances in Neural Information Processing Systems", "page_first": 6584, "page_last": 6593, "abstract": "In this paper, we introduce an unsupervised learning approach to automatically dis-\ncover, summarize, and manipulate artistic styles from large collections of paintings.\nOur method is based on archetypal analysis, which is an unsupervised learning\ntechnique akin to sparse coding with a geometric interpretation. When applied\nto deep image representations from a data collection, it learns a dictionary of\narchetypal styles, which can be easily visualized. After training the model, the style\nof a new image, which is characterized by local statistics of deep visual features,\nis approximated by a sparse convex combination of archetypes. This allows us\nto interpret which archetypal styles are present in the input image, and in which\nproportion. Finally, our approach allows us to manipulate the coefficients of the\nlatent archetypal decomposition, and achieve various special effects such as style\nenhancement, transfer, and interpolation between multiple archetypes.", "full_text": "Unsupervised Learning of Artistic Styles with\n\nArchetypal Style Analysis\n\nUniv. Grenoble Alpes, Inria, CNRS, Grenoble INP\u2217, LJK, 38000 Grenoble, France\n\nDaan Wynen, Cordelia Schmid, Julien Mairal\n\nfirstname.lastname@inria.fr\n\nAbstract\n\nIn this paper, we introduce an unsupervised learning approach to automatically dis-\ncover, summarize, and manipulate artistic styles from large collections of paintings.\nOur method is based on archetypal analysis, which is an unsupervised learning\ntechnique akin to sparse coding with a geometric interpretation. When applied to\nneural style representations from a collection of artworks, it learns a dictionary of\narchetypal styles, which can be easily visualized. 
After training the model, the style\nof a new image, which is characterized by local statistics of deep visual features,\nis approximated by a sparse convex combination of archetypes. This enables us\nto interpret which archetypal styles are present in the input image, and in which\nproportion. Finally, our approach allows us to manipulate the coef\ufb01cients of the\nlatent archetypal decomposition, and achieve various special effects such as style\nenhancement, transfer, and interpolation between multiple archetypes.\n\n1\n\nIntroduction\n\nArtistic style transfer consists in manipulating the appearance of an input image such that its semantic\ncontent and its scene organization are preserved, but a human may perceive the modi\ufb01ed image\nas having been painted in a similar fashion as a given target painting. Closely related to previous\napproaches to texture synthesis based on modeling statistics of wavelet coef\ufb01cients [8, 19], the\nseminal work of Gatys et al. [5, 6] has recently shown that a deep convolutional neural network\noriginally trained for classi\ufb01cation tasks yields a powerful representation for style and texture\nmodeling. Speci\ufb01cally, the description of \u201cstyle\u201d in [5] consists of local statistics obtained from deep\nvisual features, represented by the covariance matrices of feature responses computed at each network\nlayer. Then, by using an iterative optimization procedure, the method of Gatys et al. [5] outputs an\nimage whose deep representation should be as close as possible to that of the input content image,\nwhile matching the statistics of the target painting. This approach, even though relatively simple,\nleads to impressive stylization effects that are now widely deployed in consumer applications.\nSubsequently, style transfer was improved in many aspects. 
First, removing the relatively slow\noptimization procedure of [5] was shown to be possible by instead training a convolutional neural\nnetwork to perform style transfer [10, 22]. Once the model has been learned, stylization of a new\nimage requires a single forward pass of the network, allowing real-time applications. Whereas these\nnetworks were originally trained to transfer a single style (e.g., a network trained for producing a\n\u201cVan Gogh effect\u201d was unable to produce an image resembling Monet\u2019s paintings), recent approaches\nhave been able to train a convolutional neural network to transfer multiple styles from a collection of\npaintings and to interpolate between styles [1, 4, 9].\nThen, key to our work, Li et al. [11] recently proposed a simple learning-free and optimization-free\nprocedure to modify deep features of an input image such that their local statistics approximately\n\n\u2217Institute of Engineering Univ. Grenoble Alpes\n\n32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montr\u00e9al, Canada.\n\n\fmatch those of a target style image. Their approach is based on decoders that have been trained to\ninvert a VGG network [21], allowing them to iteratively whiten and recolor feature maps of every\nlayer, before eventually reconstructing a stylized image for any arbitrary style. Even though the\napproach may not preserve content details as accurately as other learning-based techniques [24], it\nnevertheless produces outstanding results given its simplicity and universality. Finally, another trend\nconsists of extending style transfer to other modalities such as videos [20] or natural photographs [12]\n(to transfer style from photo to photo instead of painting to photograph).\nWhereas the goal of these previous approaches was to improve style transfer, we address a different\nobjective and propose to use an unsupervised method to learn style representations from a potentially\nlarge collection of paintings. 
Our objective is to automatically discover, summarize, and manipulate\nartistic styles present in the collection. To achieve this goal, we rely on a classical unsupervised learning\ntechnique called archetypal analysis [3]; despite its moderate popularity, this approach is related to\nwidely-used paradigms such as sparse coding [13, 15] or non-negative matrix factorization [16]. The\nmain advantage of archetypal analysis over these other methods is mostly its better interpretability,\nwhich is crucial to conduct applied machine learning work that requires model interpretation.\nIn this paper, we learn archetypal representations of style from image collections. Archetypes are\nsimple to interpret since they are related to convex combinations of a few image style representations\nfrom the original dataset, which can thus be visualized (see, e.g., [2] for an application of\narchetypal analysis to image collections). When applied to painter-speci\ufb01c datasets, they may for\ninstance capture the variety and evolution of styles adopted by a painter during his career. Moreover,\narchetypal analysis offers a dual interpretation: on the one hand, archetypes can be seen as\nconvex combinations of image style representations from the dataset; on the other hand, each image\u2019s\nstyle can be decomposed into a convex combination of archetypes. Then, given an image, we\nmay automatically interpret which archetypal style is present in the image and in which proportion,\nwhich is much richer information than what a simple clustering approach would produce. 
When\napplied to rich data collections, we sometimes observe trivial associations (e.g., the image\u2019s style\nis very close to one archetype), but we also discover meaningful and interesting ones (when an image\u2019s\nstyle may be interpreted as an interpolation between several archetypes).\nAfter establishing archetypal analysis as a natural tool for unsupervised learning of artistic style, we\nalso show that it provides a latent parametrization that allows us to manipulate style by extending the universal\nstyle transfer technique of [11]. By changing the coef\ufb01cients of the archetypal decomposition\n(typically of small dimension, such as 256) and applying stylization, various effects on the input\nimage may be obtained in a \ufb02exible manner. Transfer to an archetypal style is achieved by selecting a\nsingle archetype in the decomposition; style enhancement consists of increasing the contribution of an\nexisting archetype, making the input image more \u201carchetypal\u201d. More generally, exploring the latent\nspace allows us to create and use styles that were not necessarily seen in the dataset, see Figure 1.\nTo the best of our knowledge, [7] is the closest work to ours in terms of latent space description of\nstyle; our approach is however based on signi\ufb01cantly different tools and our objective is different.\nWhereas a latent space is learned in [7] for style description in order to improve the generalization of\na style transfer network to new unseen paintings, our goal is to build a latent space that is directly\ninterpretable, with one dimension associated with one archetypal style.\nThe paper is organized as follows: Section 2 presents the archetypal style analysis model, and its\napplication to a large collection of paintings. Section 3 shows how we use archetypes for various style\nmanipulations. 
Finally, Section 4 is devoted to additional experiments and implementation details.\n\n2 Archetypal Style Analysis\n\nIn this section, we show how to use archetypal analysis to learn style from large collections of\npaintings, and subsequently perform style decomposition on arbitrary images.\n\nLearning a latent low-dimensional representation of style. Given an input image, denoted by I,\nwe consider a set of feature maps F1, F2, . . . , FL produced by a deep network. Following [11], we\nconsider \ufb01ve layers of the VGG-19 network [21], which has been pre-trained for classi\ufb01cation. Each\nfeature map Fl may be seen as a matrix in Rpl\u00d7ml where pl is the number of channels and ml is the\nnumber of pixel positions in the feature map at layer l. Then, we de\ufb01ne the style of I as the collection\nof \ufb01rst-order and second-order statistics {\u00b51, \u03a31, . . . , \u00b5L, \u03a3L} of the feature maps, de\ufb01ned as\n\n\u00b5l = (1/ml) \u2211_{j=1}^{ml} Fl[j] \u2208 Rpl and \u03a3l = (1/ml) \u2211_{j=1}^{ml} (Fl[j] \u2212 \u00b5l)(Fl[j] \u2212 \u00b5l)\u22a4 \u2208 Rpl\u00d7pl,\n\nwhere Fl[j] represents the column in Rpl that carries the activations at position j in the feature\nmap Fl.\n\nFigure 1: Using deep archetypal style analysis, we can represent an artistic image (a) as a convex\ncombination of archetypes. The archetypes can be visualized as synthesized textures (b), as a convex\ncombination of artworks (c) or, when analyzing a speci\ufb01c image, as stylized versions of that image\nitself (d). Free recombination of the archetypal styles then allows for novel stylizations of the input.\n\nA style descriptor is then de\ufb01ned as the concatenation of all parameters from the collection\n{\u00b51, \u03a31, . . .
, \u00b5L, \u03a3L}, normalized by the number of parameters at each layer\u2014that is, \u00b5l and \u03a3l\nare divided by pl(pl + 1), which was found to be empirically useful for preventing layers with more\nparameters from being over-represented. The resulting vector is very high-dimensional, but it contains\nkey information for artistic style [5]. Then, we apply a singular value decomposition on the style\nrepresentations from the paintings collection to reduce the dimension to 4 096 while keeping more\nthan 99% of the variance. Next, we show how to obtain a lower-dimensional latent representation.\nArchetypal style representation. Given a set of vectors X = [x1, . . . , xn] in Rp\u00d7n, archetypal\nanalysis [3] learns a set of archetypes Z = [z1, . . . , zk] in Rp\u00d7k such that each sample xi can be well\napproximated by a convex combination of archetypes\u2014that is, there exists a code \u03b1i in Rk such that\nxi \u2248 Z\u03b1i, where \u03b1i lies in the simplex\n\n\u2206k = {\u03b1 \u2208 Rk s.t. \u03b1 \u2265 0 and \u2211_{j=1}^{k} \u03b1[j] = 1}.\n\nConversely, each archetype zj is constrained to be in the convex hull of the data and there exists\na code \u03b2j in \u2206n such that zj = X\u03b2j. The natural formulation resulting from these geometric\nconstraints is then the following optimization problem\n\nmin_{\u03b11,...,\u03b1n\u2208\u2206k; \u03b21,...,\u03b2k\u2208\u2206n} (1/n) \u2211_{i=1}^{n} \u2016xi \u2212 Z\u03b1i\u2016\u00b2 s.t. zj = X\u03b2j for all j = 1, . . . , k, (1)\n\nwhich can be addressed ef\ufb01ciently with dedicated solvers [2]. Note that the simplex constraints lead to\nnon-negative sparse codes \u03b1i for every sample xi, since the simplex constraint enforces the vector \u03b1i\nto have unit \u21131-norm, which has a sparsity-inducing effect [13]. 
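To make the encoding step concrete, the simplex-constrained least-squares subproblem above can be solved by projected gradient descent. The sketch below is a minimal NumPy illustration with hypothetical function names; it uses the standard sort-based simplex projection of Duchi et al. (2008) rather than the dedicated solvers [2] used in our experiments:

```python
import numpy as np

def project_simplex(v):
    # Euclidean projection of v onto the probability simplex
    # (sort-based algorithm of Duchi et al., 2008).
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u * idx > (css - 1.0))[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def decompose_style(x, Z, n_iter=3000):
    # Approximate argmin_{alpha in simplex} ||x - Z alpha||^2
    # by projected gradient descent (the alpha-step of problem (1)).
    k = Z.shape[1]
    alpha = np.full(k, 1.0 / k)              # start at the simplex barycenter
    step = 1.0 / np.linalg.norm(Z.T @ Z, 2)  # 1/L, with L the gradient Lipschitz constant
    for _ in range(n_iter):
        grad = Z.T @ (Z @ alpha - x)         # gradient of 0.5 * ||x - Z alpha||^2
        alpha = project_simplex(alpha - step * grad)
    return alpha
```

The beta-step of (1) has the same form with the roles of samples and archetypes exchanged, which is why alternating such updates yields a simple archetypal analysis solver.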
As a result, a sample xi will be\nassociated in practice to a few archetypes, as observed in our experimental section. Conversely, an\narchetype zj = X\u03b2j can be represented by a non-negative sparse code \u03b2j and thus be associated to\na few samples corresponding to non-zero entries in \u03b2j.\nIn this paper, we use archetypal analysis on the 4 096-dimensional style vectors previously described,\nand typically learn between k = 32 and k = 256 archetypes. Each painting\u2019s style can then be\nrepresented by a sparse low-dimensional code \u03b1 in \u2206k, and each archetype is itself associated to a\nfew input paintings, which is crucial for their interpretation (see the experimental section). Given a\n\ufb01xed set of archetypes Z, we may also quantify the presence of archetypal styles in a new image I by\nsolving the convex optimization problem\n\n\u03b1\u22c6 \u2208 arg min_{\u03b1\u2208\u2206k} \u2016x \u2212 Z\u03b1\u2016\u00b2, (2)\n\nwhere x is the high-dimensional input style representation described at the beginning of this section.\nEncoding an image style into a sparse vector \u03b1 allows us to obtain interesting interpretations in terms\nof the presence and quanti\ufb01cation of archetypal styles in the input image. Next, we show how to\nmanipulate the archetypal decomposition by modifying the universal feature transform of [11].\n\n3 Archetypal Style Manipulation\n\nIn the following, we brie\ufb02y present the universal style transfer approach of [11] and introduce a\nmodi\ufb01cation that allows us to better preserve the content details of the original images, before\npresenting how to use the framework for archetypal style manipulation.\n\nA new variant of universal style transfer. We assume, in this section only, that we are given a\ncontent image I c and a style image I s. 
We also assume that we are given pairs of encoders/decoders\n(dl, el) such that el(I) produces the l-th feature map previously selected from the VGG network and\ndl is a decoder that has been trained to approximately \u201cinvert\u201d el\u2014that is, dl(el(I)) \u2248 I.\nUniversal style transfer builds upon a simple idea. Given a \u201ccontent\u201d feature map Fc in Rp\u00d7m, making\nlocal features match the mean and covariance structure of another \u201cstyle\u201d feature map Fs can be\nachieved with simple whitening and coloring operations, leading overall to an af\ufb01ne transformation:\n\nC s(Fc) := CsWc(Fc \u2212 \u00b5c) + \u00b5s,\n\nwhere \u00b5c, \u00b5s are the means of the content and style feature maps, respectively, Cs is the coloring\nmatrix and Wc is a whitening matrix that decorrelates the features. We simply summarize this\noperation as a single function C s : Rp\u00d7m \u2192 Rp\u00d7m.\nOf course, feature maps between network layers are interconnected and such coloring and whitening\noperations cannot be applied simultaneously at every layer. For this reason, the method produces\na sequence of stylized images \u02c6Il, one per layer, starting from the last one l = L in a coarse-to-\ufb01ne\nmanner, and the \ufb01nal output is \u02c6I1. Given a stylized image \u02c6Il+1 (with \u02c6IL+1 = I c), we propose the\nfollowing update, which differs slightly from [11], for a reason we will detail below:\n\n\u02c6Il = dl(\u03b3 (\u03b4 C s_l(el(\u02c6Il+1)) + (1 \u2212 \u03b4) C s_l(el(I c))) + (1 \u2212 \u03b3) el(I c)), (3)\n\nwhere \u03b3 in (0, 1) controls the amount of stylization since el(I c) corresponds to the l-th feature map\nof the original content image. The parameter \u03b4 in (0, 1) controls how much one should trust the\ncurrent stylized image \u02c6Il+1 in terms of content information before stylization at layer l. 
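For concreteness, the affine whitening and coloring operation C s can be sketched on raw p x m feature matrices with NumPy; this is a simplified, hypothetical helper based on symmetric eigendecompositions, not the actual VGG encoder/decoder pipeline:

```python
import numpy as np

def whiten_color(Fc, Fs, eps=1e-8):
    # Affine map matching the mean and covariance of content features
    # Fc (p x m) to those of style features Fs: Cs @ Wc @ (Fc - mu_c) + mu_s.
    mu_c = Fc.mean(axis=1, keepdims=True)
    mu_s = Fs.mean(axis=1, keepdims=True)
    cov_c = np.cov(Fc)
    cov_s = np.cov(Fs)
    wc, Uc = np.linalg.eigh(cov_c)  # cov_c = Uc diag(wc) Uc^T
    ws, Us = np.linalg.eigh(cov_s)
    # Wc = cov_c^{-1/2} (whitening), Cs = cov_s^{1/2} (coloring)
    Wc = Uc @ np.diag(1.0 / np.sqrt(np.maximum(wc, eps))) @ Uc.T
    Cs = Us @ np.diag(np.sqrt(np.maximum(ws, 0.0))) @ Us.T
    return Cs @ Wc @ (Fc - mu_c) + mu_s
```

Up to the small regularization eps, the output has exactly the style mean and covariance, while the spatial arrangement of the content features is preserved.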
Intuitively,\n(a) dl(C s_l(el(\u02c6Il+1))) can be interpreted as a re\ufb01nement of the stylized image at layer l + 1 in order to\ntake into account the mean and covariance structure of the image style at layer l;\n(b) dl(C s_l(el(I c))) can be seen as a stylization of the content image by looking at the correlation/mean\nstructure of the style at layer l regardless of the structure at the preceding stylization steps.\nWhereas \u02c6Il+1 takes into account the style structure of the top layers, it may also have lost a signi\ufb01cant\namount of content information, in part due to the fact that the decoders dl do not perfectly invert the\nencoders and do not correctly recover \ufb01ne details. For this reason, being able to make a trade-off\nbetween (a) and (b) to explicitly use the original content image I c at each layer is important.\nIn contrast, the update of [11] involves a single parameter \u03b3 and is of the form\n\n\u02c6Il = dl(\u03b3 C s_l(el(\u02c6Il+1)) + (1 \u2212 \u03b3) el(\u02c6Il+1)). (4)\n\nNotice that here the original image I c is used only once at the beginning of the process, and details\nthat have been lost at layer l + 1 have no chance to be recovered at layer l. We present in the\nexperimental section the effect of our variant. Whenever one is not looking for a fully stylized\nimage\u2014that is, \u03b3 < 1 in (3) and (4)\u2014content details can be much better preserved with our approach.\n\nArchetypal style manipulation. We now aim to analyze styles and change them in a controllable\nmanner based on styles present in a large collection of images rather than on a single image. To this\nend, we use the archetypal style analysis procedure described in Section 2. Given now an image I,\nits style, originally represented by a collection of statistics {\u00b51, \u03a31, . . . , \u00b5L, \u03a3L}, is approximated\nby a convex combination of archetypes [z1, . . . 
, zk], where archetype zj can also be seen as the\nconcatenation of statistics {\u00b5j_1, \u03a3j_1, . . . , \u00b5j_L, \u03a3j_L}. Indeed, zj is associated to a sparse code \u03b2j in \u2206n,\nwhere n is the number of training images\u2014allowing us to de\ufb01ne for archetype j and layer l\n\n\u00b5j_l = \u2211_{i=1}^{n} \u03b2j[i] \u00b5(i)_l and \u03a3j_l = \u2211_{i=1}^{n} \u03b2j[i] \u03a3(i)_l,\n\nwhere \u00b5(i)_l and \u03a3(i)_l are the mean and covariance matrices of training image i at layer l. As a convex\ncombination of covariance matrices, \u03a3j_l is positive semi-de\ufb01nite and can also be interpreted as a valid\ncovariance matrix, which may then be used for a coloring operation producing an \u201carchetypal\u201d style.\nGiven now a sparse code \u03b1 in \u2206k, a new \u201cstyle\u201d { \u02c6\u00b51, \u02c6\u03a31, . . . , \u02c6\u00b5L, \u02c6\u03a3L} can be obtained by considering\nthe convex combination of archetypes:\n\n\u02c6\u00b5l = \u2211_{j=1}^{k} \u03b1[j] \u00b5j_l and \u02c6\u03a3l = \u2211_{j=1}^{k} \u03b1[j] \u03a3j_l.\n\nThen, the collection of means and covariances { \u02c6\u00b51, \u02c6\u03a31, . . . , \u02c6\u00b5L, \u02c6\u03a3L} may be used to de\ufb01ne a\ncoloring operation. Three practical cases come to mind: (i) \u03b1 may be a canonical vector that\nselects a single archetype; (ii) \u03b1 may be any convex combination of archetypes for archetypal style\ninterpolation; (iii) \u03b1 may be a modi\ufb01cation of an existing archetypal decomposition to enhance a\nstyle already present in an input image I\u2014that is, \u03b1 is a variation of \u03b1\u22c6 de\ufb01ned in (2).\n\n4 Experiments\n\nIn this section, we present our experimental results on two datasets described below. Our implementation\nis in PyTorch [17] and relies in part on the universal style transfer implementation2. 
Archetypal\nanalysis is performed using the SPAMS software package [2, 14], and the singular value decomposition\nis performed by scikit-learn [18]. Our implementation will be made publicly available. Further\nexamples can be found at http://pascal.inrialpes.fr/data2/archetypal_style.\n\nGanGogh is a collection of 95997 artworks3 downloaded from WikiArt.4 The images cover most\nof the freely available WikiArt catalog, with the exception of artworks that are not paintings. Due to\nthe collaborative nature of WikiArt, there is no guarantee of an unbiased selection of artworks, and\nthe presence of various styles varies signi\ufb01cantly. We compute 256 archetypes on this collection.\n\nVincent van Gogh As a counterpoint to the GanGogh collection, which spans many styles over a\nlong period of time and has a signi\ufb01cant bias towards certain art styles, we analyze the collection\nof Vincent van Gogh\u2019s artwork, also from the WikiArt catalog. Based on the WikiArt metadata, we\nexclude a number of works not amenable to artistic style transfer such as sketches and studies. The\ncollection counts 1154 paintings and drawings in total, with the dates of their creation ranging from\n1858 to 1926. Given the limited size of the collection, we only compute 32 archetypes.\n\n2https://github.com/sunshineatnoon/PytorchWCT\n3https://github.com/rkjones4/GANGogh\n4https://www.wikiart.org\n\n\f4.1 Archetypal Visualization\n\nTo visualize the archetypes, we \ufb01rst synthesize one texture per archetype by using its style representation\nto repeatedly stylize an image \ufb01lled with random noise, as described in [11]. We then display\npaintings with signi\ufb01cant contributions. In Figure 2, we present visualizations for a few archetypes.\nThe strongest contributions usually exhibit a common characteristic like stroke style or choice of\ncolors. 
Smaller contributions are often more dif\ufb01cult to interpret (see supplementary material for the\nfull set of archetypes). Figure 2a also highlights correlation between content and style: the archetype\non the third row is only composed of portraits.\nTo see how the archetypes relate to each other, we also compute t-SNE embeddings [23] and display\nthem with two spatial dimensions. In Figure 3, we show the embeddings for the GanGogh collection,\nby using the texture representation for each archetype. The middle of the \ufb01gure is populated by\nBaroque and Renaissance styles, whereas the right side exhibits abstract and cubist styles.\n\n(a) Archetypes from GanGogh collection.\n\n(b) Archetypes from van Gogh\u2019s paintings.\n\nFigure 2: Archetypes learned from the GanGogh collection and van Gogh\u2019s paintings. Each row\nrepresents one archetype. The leftmost column shows the texture representations, the following\ncolumns the strongest contributions from individual images in order of descending contribution. Each\nimage is labelled with its contribution to the archetype. For layout considerations, only the center\ncrop of each image is shown. Best seen by zooming on a computer screen.\n\nSimilar to showing the decomposition of an archetype into its contributing images, we display in\nFigure 4 examples of decompositions of image styles into their contributing archetypes. Typically,\nonly a few archetypes contribute strongly to the decomposition. Even though often interpretable, the\ndecomposition is sometimes trivial, whenever the image\u2019s style is well described by a single archetype.\nSome paintings\u2019 styles also turn out to be hard to interpret, leading to non-sparse decompositions.\nExamples of such trivial and \u201cfailure\u201d cases are provided in the supplementary material.\n\n4.2 Archetypal Style Manipulation\n\nFirst, we study the in\ufb02uence of the parameters \u03b3, \u03b4 and make a comparison with the baseline method\nof [11]. 
Even though this is not the main contribution of our paper, this apparently minor modi\ufb01cation\nyields signi\ufb01cant improvements in terms of preservation of content details in stylized images. Besides,\nthe heuristic \u03b3 = \u03b4 appears to be visually reasonable in most cases, reducing the number of effective\nparameters to a single one that controls the amount of stylization. The comparison between our\nupdate (3) and the update (4) from [11] is illustrated in Figure 5, where the goal is to transfer an archetypal\nstyle to a Renaissance painting. More comparisons on other images and illustrations with pairs\nof parameters \u03b3 \u2260 \u03b4, as well as a comparison of the processing work\ufb02ows, are provided in the\nsupplementary material, con\ufb01rming our conclusions.\n\n\fFigure 3: t-SNE embeddings of 256 archetypes computed on the GanGogh collection. Each archetype\nis represented by a synthesized texture. Best seen by zooming on a computer screen.\n\n(a) Picasso\u2019s \u201cPitcher and Fruit Bowl\u201d.\n\n(b) \u201cPortrait of Patience Escalier\u201d by van Gogh.\n\nFigure 4: Image decompositions from the GanGogh collection and van Gogh\u2019s work. Each archetype\nis represented as a stylized image (top), as a texture (side) and as a decomposition into paintings.\n\nThen, we conduct style enhancement experiments. To obtain variations of an input image, the\ndecomposition \u03b1\u22c6 of its style can serve as a starting point for stylization. Figure 6 shows the results\nof enhancing archetypes that an image already exhibits. Intuitively, this can be seen as taking one aspect\nof the image, and making it stronger with respect to the other ones. In Figure 6, while increasing\nthe contributions of the individual archetypes, we also vary \u03b3 = \u03b4, so that the middle image is very\nclose visually to the original image (\u03b3 = \u03b4 = 0.5), while the outer panels put a strong emphasis on\nthe modi\ufb01ed styles. 
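Concretely, style enhancement amounts to amplifying one coordinate of the archetypal decomposition and renormalizing it onto the simplex, after which the modified code is turned back into statistics by the convex recombination of Section 3. A minimal sketch with hypothetical helper names:

```python
import numpy as np

def enhance_archetype(alpha, j, boost=2.0):
    # Increase the contribution of archetype j and renormalize so the
    # code stays a convex combination (non-negative, sums to one).
    a = np.asarray(alpha, dtype=float).copy()
    a[j] *= boost
    return a / a.sum()

def mix_style(alpha, mus, covs):
    # Convex combination of archetypal statistics:
    # mu_hat = sum_j alpha[j] mu_j,  Sigma_hat = sum_j alpha[j] Sigma_j.
    mu_hat = sum(a * m for a, m in zip(alpha, mus))
    cov_hat = sum(a * c for a, c in zip(alpha, covs))
    return mu_hat, cov_hat
```

Since a convex combination of positive semi-definite matrices remains positive semi-definite, cov_hat is always a valid covariance for the subsequent coloring step.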
As can be seen especially in the panels surrounding the middle, modifying the\ndecomposition coef\ufb01cients allows very gentle movements through the styles.\nAs can be seen in the leftmost and rightmost panels of Figure 6, enhancing the contribution of an\narchetype can produce signi\ufb01cant changes. As a matter of fact, it is also possible, and sometimes\ndesirable, depending on the user\u2019s objective, to manually choose a set of archetypes that are originally\nunrelated to the input image, and then interpolate with convex combinations of these archetypes.\nThe results are images akin to those found in classical artistic style transfer papers. In Figure 7, we\napply for instance combinations of freely chosen archetypes to \u201cThe Bitter Drunk\u201d. Other examples\ninvolving stylizing natural photographs are also provided in the supplementary material.\n\n\fFigure 5: Top: stylization with our approach for \u03b3 = \u03b4, varying the product \u03b3\u03b4 from 0 to 1 on an\nequally-spaced grid. Bottom: results using [11], varying \u03b3. At \u03b3 = \u03b4 = 1, the approaches are\nequivalent, resulting in equal outputs. Otherwise however, especially for \u03b3 = \u03b4 = 0, [11] produces\nstrong artifacts. These are not artifacts of stylization, since in this case, no actual stylization occurs.\nRather, they are the effect of repeated, lossy encoding and decoding, since no decoder can recover\ninformation lost in a previous one. Best seen on a computer screen.\n\n(a) \u201cLes Alpilles, Mountain Landscape near South-Reme\u201d by van Gogh, from the van Gogh collection.\n\n(b) Self-Portrait by van Gogh, from the van Gogh collection.\n\n(c) \u201cSchneeschmelze\u201d by Max Pechstein, from the GanGogh collection.\n\n(d) \u201cVenice\u201d by Maurice Prendergast, from the GanGogh collection.\n\nFigure 6: We demonstrate the enhancement of the two most prominent archetypal styles for different\nartworks. 
The middle panel shows a near-perfect reconstruction of the original content image in\nevery case and uses parameters \u03b3 = \u03b4 = 0.5. Then, we increase the relative weight of the strongest\ncomponent towards the left, and of the second component towards the right. Simultaneously, we\nincrease \u03b3 and \u03b4 from 0.5 in the middle panel to 0.95 on the outside.\n\n\f(a) Content image\n\n(b) Pairwise interpolations between four freely chosen archetypal styles.\nFigure 7: Free archetypal style manipulation of \u201cThe Bitter Drunk\u201d by Adriaen Brouwer.\n\n5 Discussion\n\nIn this work, we introduced archetypal style analysis as a means to identify styles in a collection of\nartworks without supervision, and to use them for the manipulation of artworks and photos. Whereas\nother techniques may be used for that purpose, archetypal analysis admits a dual interpretation which\nmakes it particularly appropriate for the task: On the one hand, archetypes are represented as convex\ncombinations of input image styles and are thus directly interpretable; on the other hand, an image\nstyle is approximated by a convex combination of archetypes, allowing various kinds of visualizations.\nBesides, archetypal coef\ufb01cients may be used to perform style manipulations.\nOne of the major challenges we want to address in future work is the exploitation of metadata available\non the WikiArt repository (period, schools, art movement. . . ) to link the learned styles to the\ndescriptions employed outside the context of computer vision and graphics, which we believe will\nmake them more useful beyond style manipulation.\n\nAcknowledgements\n\nThis work was supported by a grant from ANR (MACARON project ANR-14-CE23-0003-01), by\nthe ERC grant number 714381 (SOLARIS project) and the ERC advanced grant Allegro.\n\n\fReferences\n[1] D. Chen, L. Yuan, J. Liao, N. Yu, and G. Hua. StyleBank: an explicit representation for neural image style\n\ntransfer. In Proc. 
Conference on Computer Vision and Pattern Recognition (CVPR), 2017.\n\n[2] Y. Chen, J. Mairal, and Z. Harchaoui. Fast and robust archetypal analysis for representation learning. In\n\nProc. Conference on Computer Vision and Pattern Recognition (CVPR), 2014.\n\n[3] A. Cutler and L. Breiman. Archetypal analysis. Technometrics, 36(4):338\u2013347, 1994.\n\n[4] V. Dumoulin, J. Shlens, and M. Kudlur. A learned representation for artistic style. In Proc. International\n\nConference on Learning Representations (ICLR), 2017.\n\n[5] L. A. Gatys, A. S. Ecker, and M. Bethge. A Neural Algorithm of Artistic Style. preprint arXiv:1508.06576,\n\n2015.\n\n[6] L. A. Gatys, A. S. Ecker, and M. Bethge. Texture synthesis using convolutional neural networks. In Adv.\n\nin Neural Information Processing Systems (NIPS), 2015.\n\n[7] G. Ghiasi, H. Lee, M. Kudlur, V. Dumoulin, and J. Shlens. Exploring the structure of a real-time, arbitrary\n\nneural artistic stylization network. In Proc. British Machine Vision Conference (BMVC), 2017.\n\n[8] D. J. Heeger and J. R. Bergen. Pyramid-based texture analysis/synthesis. In Proc. 22nd annual conference\n\non Computer graphics and interactive techniques (SIGGRAPH), 1995.\n\n[9] X. Huang and S. Belongie. Arbitrary style transfer in real-time with adaptive instance normalization. In\n\nProc. International Conference on Computer Vision (ICCV), 2017.\n\n[10] J. Johnson, A. Alahi, and L. Fei-Fei. Perceptual losses for real-time style transfer and super-resolution. In\n\nEuropean Conference on Computer Vision (ECCV), 2016.\n\n[11] Y. Li, C. Fang, J. Yang, Z. Wang, X. Lu, and M.-H. Yang. Universal style transfer via feature transforms.\n\nIn Adv. Neural Information Processing Systems (NIPS), 2017.\n\n[12] F. Luan, S. Paris, E. Shechtman, and K. Bala. Deep photo style transfer. In Proc. Conference on Computer\n\nVision and Pattern Recognition (CVPR), 2017.\n\n[13] J. Mairal, F. Bach, J. Ponce, et al. Sparse modeling for image and vision processing. 
Foundations and\n\nTrends in Computer Graphics and Vision, 8(2-3):85\u2013283, 2014.\n\n[14] J. Mairal, F. Bach, J. Ponce, and G. Sapiro. Online learning for matrix factorization and sparse coding.\n\nJournal of Machine Learning Research (JMLR), 11(Jan):19\u201360, 2010.\n\n[15] B. A. Olshausen and D. J. Field. Emergence of simple-cell receptive \ufb01eld properties by learning a sparse\n\ncode for natural images. Nature, 381:607\u2013609, 1996.\n\n[16] P. Paatero and U. Tapper. Positive matrix factorization: a non-negative factor model with optimal utilization\n\nof error estimates of data values. Environmetrics, 5(2):111\u2013126, 1994.\n\n[17] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and\n\nA. Lerer. Automatic differentiation in pytorch. 2017.\n\n[18] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer,\nR. Weiss, V. Dubourg, et al. Scikit-learn: Machine learning in python. Journal of machine learning\nresearch (JMLR), 12(Oct):2825\u20132830, 2011.\n\n[19] J. Portilla and E. P. Simoncelli. A parametric texture model based on joint statistics of complex wavelet\n\ncoef\ufb01cients. International journal of computer vision, 40(1):49\u201370, 2000.\n\n[20] M. Ruder, A. Dosovitskiy, and T. Brox. Artistic style transfer for videos and spherical images. International\n\nJournal on Computer Vision (IJCV), 2018.\n\n[21] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition.\n\npreprint arXiv:1409.1556, 2015.\n\n[22] D. Ulyanov, V. Lebedev, A. Vedaldi, and V. Lempitsky. Texture networks: feed-forward synthesis of\n\ntextures and stylized images. In Proc. International Conference on Machine Learning (ICML), 2016.\n\n[23] L. van der Maaten and G. Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research\n\n(JMLR), 9(Nov):2579\u20132605, 2008.\n\n[24] M.-C. Yeh, S. Tang, A. Bhattad, and D. A. Forsyth. 
Quantitative evaluation of style transfer. preprint\n\narXiv:1804.00118, 2018.\n\n\f", "award": [], "sourceid": 3313, "authors": [{"given_name": "Daan", "family_name": "Wynen", "institution": "INRIA"}, {"given_name": "Cordelia", "family_name": "Schmid", "institution": "Inria / Google"}, {"given_name": "Julien", "family_name": "Mairal", "institution": "Inria"}]}