{"title": "Structured Learning for Cell Tracking", "book": "Advances in Neural Information Processing Systems", "page_first": 1296, "page_last": 1304, "abstract": "We study the problem of learning to track a large quantity of homogeneous objects such as cell tracking in cell culture study and developmental biology. Reliable cell tracking in time-lapse microscopic image sequences is important for modern biomedical research. Existing cell tracking methods are usually kept simple and use only a small number of features to allow for manual parameter tweaking or grid search. We propose a structured learning approach that allows to learn optimum parameters automatically from a training set. This allows for the use of a richer set of features which in turn affords improved tracking compared to recently reported methods on two public benchmark sequences.", "full_text": "Structured Learning for Cell Tracking\n\nXinghua Lou, Fred A. Hamprecht\n\nHeidelberg Collaboratory for Image Processing (HCI)\nInterdisciplinary Center for Scienti\ufb01c Computing (IWR)\nUniversity of Heidelberg, Heidelberg 69115, Germany\n\n{xinghua.lou,fred.hamprecht}@iwr.uni-heidelberg.de\n\nAbstract\n\nWe study the problem of learning to track a large quantity of homogeneous objects\nsuch as cell tracking in cell culture study and developmental biology. Reliable\ncell tracking in time-lapse microscopic image sequences is important for modern\nbiomedical research. Existing cell tracking methods are usually kept simple and\nuse only a small number of features to allow for manual parameter tweaking or\ngrid search. We propose a structured learning approach that allows to learn op-\ntimum parameters automatically from a training set. 
This allows for the use of a\nricher set of features which in turn affords improved tracking compared to recently\nreported methods on two public benchmark sequences.\n\n1\n\nIntroduction\n\nOne distinguishing property of life is its temporal dynamics, and it is hence only natural that time\nlapse experiments play a crucial role in current research on signaling pathways, drug discovery and\ndevelopmental biology [17]. Such experiments yield a very large number of images, and reliable\nautomated cell tracking emerges naturally as a prerequisite for further quantitative analysis.\nEven today, cell tracking remains a challenging problem in dense populations, in the presence of\ncomplex behavior or when image quality is poor. Existing cell tracking methods can broadly be\ncategorized as deformable models, stochastic \ufb01ltering and object association. Deformable models\ncombine detection, segmentation and tracking by initializing a set of models (e.g. active contours) in\nthe \ufb01rst frame and updating them in subsequent frames (e.g. [17, 8]). Large displacements are dif\ufb01-\ncult to capture with this class of techniques and are better handled by state space models, e.g. in the\nguise of stochastic \ufb01ltering. The latter also allows for sophisticated observation models (e.g. [20]).\nStochastic \ufb01ltering builds on a solid statistical foundation, but is often limited in practice due to its\nhigh computational demands. Object association methods approximate and simplify the problem by\nseparating the detection and association steps: once object candidates have been detected and char-\nacterized, a second step suggests associations between object candidates at different frames. This\nclass of methods scales well [21, 16, 13] and allows the tracking of thousands of cells in 3D [19].\nAll of the above approaches contain energy terms whose parameters may be tedious or dif\ufb01cult\nto adjust. 
Recently, considerable effort has gone into producing better energy terms with the help of machine learning techniques. This was first accomplished by casting tracking as a local affinity prediction problem such as binary classification with either offline [1] or online learning [11, 5, 15], weakly supervised learning with imperfect oracles [27], manifold appearance model learning [25], or ranking [10, 18]. However, these local methods fail to capture the very important dependency among associations, hence the resulting local affinities do not necessarily guarantee a better global association [26]. To address this limitation, [26] extended the RankBoost method from [18] to rank global associations represented as a Conditional Random Field (CRF). Nevertheless, this approach has two major drawbacks. Firstly, it depends on a set of artificially generated false association samples that can make the training data particularly imbalanced and the training procedure too expensive for large-scale tracking problems. Secondly, RankBoost requires each ranking feature to be positively correlated with the final ranking (i.e. the association score) [10]. This in turn requires careful pre-adjustment of the sign of each feature based on some prior knowledge [18]. In practice, this prior knowledge may not always be available or reliable.\nThe contribution of this paper is two-fold. We first present an extended formulation of the object association models proposed in the literature. This generalization improves the expressiveness of the model, but also increases the number of parameters. We hence, secondly, propose to use structured learning to automatically learn optimum parameters from a training set, and hence profit fully from this richer description. Our method addresses the limitations of the aforementioned learning approaches in a principled way.\nThe rest of the paper is organized as follows. 
In section 2, we present the extended object association\nmodels and a structured learning approach for global af\ufb01nity learning. In section 3, an evaluation\nshows that our framework inherits the runtime advantage of object association while addressing\nmany of its limitations. Finally, section 4 states our conclusions and discusses future work.\n\n2 Structured Learning for Cell Tracking\n\n2.1 Association Hypotheses and Scoring\n\nWe assume that a previous detection and segmentation step has identi\ufb01ed object candidates in all\nframes, see Fig. 1. We set out to \ufb01nd that set of object associations that best explains these obser-\nvations. To this end, we admit the following set E of standard events [21, 13]: a cell can move\nor divide and it can appear or disappear. In addition, we allow two cells to (seemingly) merge, to\naccount for occlusion or undersegmentation; and a cell can (seemingly) split, to allow for the lifting\nof occlusion or oversegmentation. These additional hypotheses are useful to account for the errors\nthat typically occur in the detection and segmentation step in crowded or noisy data. The distinction\nbetween division and split is reasonable given that typical \ufb02uorescence stains endow the anaphase\nwith a distinctive appearance.\n\nFigure 1: Toy example: two sets of object candidates, and a small subset of the possible associa-\ntion hypotheses. One particular interpretation of the scene is indicated by colored arrows (left) or\nequivalently by a con\ufb01guration of binary indicator variables z (rightmost column in table).\n\nGiven a pair of object candidate lists x = {C, C(cid:48)} in two neighboring frames, there is a multitude\nof possible association hypotheses, see Fig. 1. We have two tasks: \ufb01rstly, to allow only consistent\nassociations (e.g. 
making sure that each cell in the second frame is accounted for only once); and secondly to identify, among the multitude of consistent hypotheses, the one that is most compatible with the observations, and with what we have learned from the training data.\nWe express this compatibility of the association between c \u2208 P(C) and c\u2032 \u2208 P(C\u2032) by event e \u2208 E as an inner product \u27e8f^e_{c,c\u2032}, w^e\u27e9. Here, f^e_{c,c\u2032} is a feature vector that characterizes the discrepancy (if any) between object candidates c and c\u2032; and w^e is a parameter vector that encodes everything we have learned from the training data. Summing over all object candidates in either of the frames and over all types of events gives the following compatibility function:\n\nL(x, z; w) = \u03a3_{e\u2208E} \u03a3_{c\u2208P(C)} \u03a3_{c\u2032\u2208P(C\u2032)} \u27e8f^e_{c,c\u2032}, w^e\u27e9 z^e_{c,c\u2032}    (1)\n\ns. t. \u03a3_{e\u2208E} \u03a3_{c\u2208P(C)} z^e_{c,c\u2032} = 1 and \u03a3_{e\u2208E} \u03a3_{c\u2032\u2208P(C\u2032)} z^e_{c,c\u2032} = 1 with z^e_{c,c\u2032} \u2208 {0, 1}    (2)\n\nThe constraints in the last line involve binary indicator variables z that reflect the consistency requirements: each candidate in the first frame must have a single fate, and each candidate from the second frame a unique history. As an important technical detail, note that P(C) := C \u222a (C \u2297 C) is a set comprising each object candidate, as well as all ordered pairs of object candidates from a frame\u00b9. This allows us to conveniently subsume cell divisions, splits and mergers in the above equation. Overall, the compatibility function L(x, z; w), i.e. the global affinity measure, states how well a set of associations z matches the observations f(x) computed from the raw data x, given the knowledge w from the training set.\nThe remaining tasks, discussed next, are how to learn the parameters w from the training data (section 2.2); given these, how to find the best possible associations z (section 2.3); and finding useful features (section 2.4).\n\n2.2 Structured Max-Margin Parameter Learning\n\nIn learning the parameters automatically from a training set, we pursue two goals: first, to go beyond manual parameter tweaking in obtaining the best possible performance; and second, to make the process as facile as possible for the user. This is under the assumption that most experimentalists find it easier to specify what a correct tracking should look like, rather than what value a more-or-less obscure parameter should have.\nGiven N training frame pairs X = {x_n} and their correct associations Z\u2217 = {z\u2217_n}, n = 1, . . . , N, the best set of parameters is the optimizer of\n\narg min_w R(w; X, Z\u2217) + \u03bb\u2126(w)    (3)\n\nHere, R(w; X, Z\u2217) measures the empirical loss of the current parametrization w given the training data X, Z\u2217. To prevent overfitting to the training data, this is complemented by the regularizer \u2126(w) that favors parsimonious models. We use L1 or L2 regularization (\u2126(w) = ||w||_p^p / p, p \u2208 {1, 2}), i.e. a measure of the length of the parameter vector w. The latter is often used for its numerical efficiency, while the former is popular thanks to its potential to induce sparse solutions (i.e., some parameters can become zero). The empirical loss is given by R(w; X, Z\u2217) = (1/N) \u03a3_{n=1}^{N} \u2206(z\u2217_n, \u1e91_n(w; x_n)). Here \u2206(z\u2217, \u1e91) is a loss function that measures the discrepancy between a true association z\u2217 and a prediction \u1e91 by specifying the fraction of missed events w.r.t. the ground truth:\n\n\u2206(z\u2217, \u1e91) = (1/|z\u2217|) \u03a3_{e\u2208E} \u03a3_{c\u2208P(C)} \u03a3_{c\u2032\u2208P(C\u2032)} z^{\u2217e}_{c,c\u2032} (1 \u2212 \u1e91^e_{c,c\u2032}).    (4)\n\nThis decomposable function allows for exact inference when solving Eq. 5 [6].\nImportantly, both the input (objects from a frame pair) and output (associations between objects) in this learning problem are structured. We hence resort to max-margin structured learning [2] to exploit the structure and dependency within the association hypotheses. In comparison to the other aforementioned learning methods, structured learning allows us to directly learn the global affinity measure, avoid generating many artificial false association samples, and drop any assumptions on the signs of the features. 
Structured learning has been successfully applied to many complex real world problems such as word/sequence alignment [22, 24], graph matching [6], static analysis of binary executables [14] and segmentation [3].\nIn particular, we attempt to find the decision boundary that maximizes the margin between the correct association z\u2217_n and the closest runner-up solution. An equivalent formulation is the condition that the score of z\u2217_n be greater than that of any other solution. To allow for regularization, one can relax this constraint by introducing slack variables \u03be_n, which finally yields the following objective function for the max-margin structured learning problem from Eq. 3:\n\narg min_{w, \u03be\u22650} (1/N) \u03a3_{n=1}^{N} \u03be_n + \u03bb\u2126(w)\ns. t. \u2200n, \u2200\u1e91_n \u2208 Z_n : L(x_n, z\u2217_n; w) \u2212 L(x_n, \u1e91_n; w) \u2265 \u2206(z\u2217_n, \u1e91_n) \u2212 \u03be_n,    (5)\n\nwhere Z_n is the set of possible consistent associations and \u2206(z\u2217_n, \u1e91_n) \u2212 \u03be_n is known as \u201cmargin-rescaling\u201d [24]. Intuitively, it pushes the decision boundary further away from the \u201cbad\u201d solutions with high losses.\n\n\u00b9 For the example in Fig. 1, P(C) = {c1, c2, c3, {c1, c2}, {c1, c3}, {c2, c3}}.\n\n2.3 Inference and Implementation\n\nSince Eq. 5 involves an exponential number of constraints, the learning problem cannot be represented explicitly, let alone solved directly. We thus resort to the bundle method [23] which, in turn, is based on the cutting-planes approach [24]. The basic idea is as follows: Start with some parametrization w and no constraints. Iteratively find, first, the optimum associations for the current w by solving, for all n, \u1e91_n = arg max_z {L(x_n, z; w) + \u2206(z\u2217_n, z)}. Use all these \u1e91_n to identify the most violated constraint, and add it to Eq. 5. Update w by solving Eq. 
5 (with added constraints), then find new best associations, and so on. For a given parametrization, the optimum associations can be found by integer linear programming (ILP) [16, 21, 13].\nOur framework has been implemented in Matlab and C++, including a labeling GUI for the generation of training set associations, feature extraction, model inference and the bundle method. To reduce the search space and eliminate hypotheses with no prospect of being realized, we constrain the hypotheses to a k-nearest neighborhood with distance thresholding. We use IBM CPLEX\u00b2 as the underlying optimization platform for the ILP, quadratic programming and linear programming as needed for solving Eq. 5 [23].\n\n2.4 Features\n\nTo differentiate similar events (e.g. division and split) and resolve ambiguity in model inference, we need rich features to characterize different events. In addition to basic features such as size/position [21] and intensity histogram [16], we also designed new features such as \u201cshape compactness\u201d for oversegmentation and \u201cangle pattern\u201d for division. Shape compactness relates the summed areas of two object candidates to the area of their union\u2019s convex hull. Angle pattern describes the constellation of two daughter cells relative to their mother. Features can be defined on a pair of object candidates or on an individual object candidate only. Our features are categorized in Table 1. 
Note that the same feature can be used for different events.\n\nTable 1: Categorization of features.\n\nFeature | Description\nPosition | difference in position, distance to border, overlap with border\nIntensity | difference in intensity histogram/sum/mean/deviation, intensity of father cell\nShape | difference in shape, difference in size, shape compactness, shape evenness\nOthers | division angle pattern, mass evenness, eccentricity of father cell\n\n3 Results\n\nWe evaluated the proposed method on two publicly available image sequences provided in conjunction with the DCellIQ project\u00b3 [16] and the Mitocheck project\u2074 [12]. The two datasets differ to some degree in illumination, cell density and image compression artifacts (Fig. 2). The GFP stained cell nuclei were segmented using the method in [19], yielding an F-measure over 99.3% by counting. Full ground truth associations for training and evaluation were generated with a Matlab GUI tool at a rate of approximately 20 frames/hour. Some statistics about these two datasets are shown in Table 2.\n\n\u00b2 http://www-01.ibm.com/software/integration/optimization/cplex-optimizer/\n\u00b3 http://www.cbi-tmhs.org/Dcelliq/files/051606 HeLaMCF10A DMSO 1.rar\n\u2074 http://www.mitocheck.org/cgi-bin/mtc?action=show movie;query=243867\n\nTable 2: Some statistics about the datasets in our evaluation.\n\nName | Image Size | No. of Frames | No. of Cells | Segm. F-Measure | Compressed\nDCellIQ | 512 \u00d7 672 | 100 | 10664 | 99.5% | No\nMitocheck | 1024 \u00d7 1344 | 94 | 24096 | 99.3% | Yes\n\nFigure 2: Selected raw images from the DCellIQ sequence (top) and the Mitocheck sequence (bottom). 
The Mitocheck sequence exhibits higher cell density, larger intensity variability and \u201cblockness\u201d artifacts due to image compression.\n\nTask 1: Efficient Tracking for a Given Sequence\nWe first evaluate our method on a task that is frequently encountered in practice: the user simply wishes to obtain a good tracking for a given sequence with the smallest possible effort. For a fair comparison, we extended Padfield\u2019s method [21] to account for the six events described in section 2.1 and used the same features (viz., size and position) and weights as in [21]. Hand-tuning of the parameters results in a high accuracy of 98.4% (i.e. 1 - total loss) as shown in Table 3 (2nd row). A detailed analysis of the error counts for specific events shows that the method accounts well for moves, but has difficulty with disappearance and split events. This is mainly due to the limited descriptive power of the simple features used. To study the difference between manual tweaking and learning of the parameters, we used the learning framework presented here to optimize the model and obtained a reduction of the total loss from 1.64% to 0.65% (3rd row). This can be considered as the limit of this model. Note that the learned parametrization actually deteriorates the detection of divisions because the learning aims at minimizing the overall loss across all events. In obtaining these results, one third of the entire sequence was used for training, just as in all subsequent comparisons.\nWith 37 features included and their weights optimized using structured learning, our model fully profits from this richer description and achieves a total loss of only 0.30% (4th row) which is a significant improvement over [21, 16] (2nd/7th row) and manual tweaking (6th row). Though a certain amount of effort is needed for creating the training set, our method allows experimentalists to contribute their expertise in an intuitive fashion. 
Some example associations are shown in Fig. 3.\n\nTable 3: Performance comparison on the DCellIQ dataset. The header row shows the number of events occurring for moves, divisions, appearance, disappearance, splits and mergers. The remaining entries give the error counts for each event, summed over the entire sequence.\n\nMethod | mov (10156) | div (104) | app (78) | dis (76) | spl (54) | mer (55) | total loss\nPadfield et al. [21] | 71 | 18 | 16 | 26 | 30 | 12 | 1.64%\nPadfield et al. w/ learning | 21 | 25 | 5 | 5 | 6 | 10 | 0.65%\nOurs w/ learning (L2 regula.) | 15 | 6 | 4 | 1 | 2 | 6 | 0.30%\nOurs w/ learning (L1 regula.) | 22 | 6 | 9 | 3 | 4 | 9 | 0.45%\nOurs w/ manual tweaking | 56 | 24 | 16 | 19 | 2 | 5 | 1.12%\nLi et al. [16] | - | - | - | - | - | - | 6.18% (a)\nLocal learning by Random Forest | 18 | 14 | 2 | 0 | 12 | 13 | 0.55%\n\n(a) Here we use the best reported error matching rate in [16] (similar to our loss).\n\nFigure 3: Some diverging associations by [21] (top) and our method (bottom). Color code: yellow \u2013 move; red \u2013 division; green \u2013 split; cyan \u2013 merger.\n\nThe learned parameters are summarized in Fig. 4 (top). They afford the following observations: Firstly, features on cell size and shape are generally of high importance, which is in line with the assumption in [21]. Secondly, the correlations of the features with the final association score are automatically learned. For example, shape compactness is positively correlated with split but negatively with division. This is in line with the intuition that an oversegmentation conserves compact shape, while a true division seemingly pushes the daughters far away from each other (in the present kind of data, where only DNA is labeled). 
Finally, in spite of the regularization, many features are associated with large parameter values, which is key to the improved expressive power.\nTask 2: Tracking for High-Throughput Experiments\nThe experiment described in the foregoing draws both training and test samples from the same time lapse experiment. However, in high-throughput experiments such as in the Mitocheck project [12], it is more desirable to train on one or a few sequences, and make predictions on many others. To emulate this situation, we took the parameters w trained in the foregoing on the DCellIQ sequence [16] and used them to estimate the tracking of the Mitocheck dataset. The main focus of the Mitocheck project is on accurate detection of mitosis (cell division). Despite the difference in illumination and cell density from the training data, and despite the segmentation artifacts caused by the compression of the image sequence, our method shows a high generalization capability and obtains a total loss of 0.78%. In particular, we extract 93.2% of 384 mitosis events which is a significant improvement over the mitosis detection rate reported in [12] (81.5%, 294 events).\nComparison to Local Affinity Learning\nWe also developed a local affinity learning approach in the spirit of [1, 15]. Rather than using AdaBoost [9], we chose Random Forest (RF) [4] which provides fairly comparable classification power [7]. We sample positive associations from the ground truth and randomly generate false associations. RF classifiers are built for each event independently. The probabilities predicted by the RF classifiers are used to compute the overall association score as in Eq. 6 (with the same constraints as in Eq. 2). 
Since we have multiple competing events (one cell can only have a single fate), we also introduce weights {\u03b1_e} to capture the dependencies between events. These weights are optimized via a grid search on the training data.\n\nL(x, z; w) = \u03a3_{e\u2208E} \u03a3_{c\u2208P(C)} \u03a3_{c\u2032\u2208P(C\u2032)} \u03b1_e Prob(f^e_{c,c\u2032}) z^e_{c,c\u2032}    (6)\n\nThe results are shown in Table 3 (8th row) and Table 4 (5th row), which afford the following observations. Firstly, a locally strong affinity prediction does not necessarily guarantee a better global association. Secondly, local learning shows particularly weak generalization capability.\n\nFigure 4: Parameters w learned from the training data with L2 (top) or L1 (bottom) regularization. Parameters weighing the features for different events are colored differently. Both parameter vectors are normalized to unit 1-norm, i.e. ||w||_1 = 1.\n\nTable 4: Performance comparison on the Mitocheck dataset. The method was trained on the DCellIQ dataset. The header row shows the number of events occurring for moves, divisions, appearance, disappearance, splits and mergers. The remaining entries give the error counts for each event, summed over the entire sequence.\n\nMethod | mov (22520) | div (384) | app (310) | dis (304) | spl (127) | mer (132) | total loss\nPadfield et al. w/ learning | 171 | 85 | 58 | 47 | 53 | 13 | 1.39%\nOurs w/ learning (L2 regula.) | 98 | 26 | 31 | 25 | 43 | 9 | 0.78%\nOurs w/ learning (L1 regula.) | 93 | 35 | 54 | 25 | 26 | 48 | 0.98%\nLocal learning by Random Forest | 214 | 281 | 162 | 10 | 82 | 68 | 2.33%\n\nSensitivity to Training Set\nThe success of supervised learning depends on the representativeness (and hence also size) of the training set. 
To test the sensitivity of the results to the training data used, we drew different numbers of training image pairs randomly from the entire sequence and used the remaining pairs for testing. For each training set size, this experiment was repeated 10 times. The mean and deviation of the losses on the respective test sets are shown in Fig. 5. According to the one-standard-error-rule, associations between at least 15 or 20 image pairs are desirable, which can be accomplished in well below an hour of annotation work.\n\nL1 vs. L2 Regularization\nThe results of L1 vs. L2 regularization are comparable (see Table 3 and Table 4). While L1 regularization yields sparser feature selection (Fig. 4, bottom), it has a much slower convergence rate (Fig. 6). The staircase structure shows that, due to sparse feature selection, the bundle method has to find more constraints to escape from a local minimum.\n\nFigure 5: Learning curve of structured learning (with L2 regularization).\n\nFigure 6: Convergence rates of structured learning (L1 vs. L2 regularization).\n\n4 Conclusion & Future Work\n\nWe present a new cell tracking scheme that uses more expressive features and comes with a structured learning framework to train the larger number of parameters involved. 
Comparison to related methods shows that this learning scheme brings significant improvements in performance and, in our opinion, usability.\nWe are currently working on further improving the tracking by considering more than two frames at a time, and on an active learning scheme that should reduce the amount of required training input.\n\nAcknowledgement\n\nWe are very grateful for partial financial support by CellNetworks Cluster (EXC81), FORSYS-ViroQuant (0313923), SBCancer, DFG (GRK 1653) and \u201cEnable fund\u201d of University of Heidelberg. We also thank Bjoern Andres, Jing Yuan and Christoph Straehle for their comments on the manuscript.\n\nReferences\n[1] S. Avidan. Ensemble Tracking. In CVPR, 2005.\n[2] G. Bakir, T. Hofmann, B. Schoelkopf, A. J. Smola, B. Taskar, and S. Vishwanathan. Predicting Structured Data. MIT Press, Cambridge, MA, 2006.\n[3] L. Bertelli, T. Yu, D. Vu, and B. Gokturk. Kernelized Structural SVM Learning for Supervised Object Segmentation. In CVPR, 2011.\n[4] L. Breiman. Random Forests. Mach Learn, 45(1):5\u201332, 2001.\n[5] M. D. Breitenstein, F. Reichlin, B. Leibe, E. Koller-Meier, and L. V. Gool. Robust Tracking-by-Detection using a Detector Confidence Particle Filter. In ICCV, 2009.\n[6] T. S. Caetano, J. J. McAuley, L. Cheng, Q. V. Le, and A. J. Smola. Learning Graph Matching. IEEE T Pattern Anal, 31(6):1048\u20131058, 2009.\n[7] R. Caruana and A. Niculescu-Mizil. An Empirical Comparison of Supervised Learning Algorithms. In ICML, pages 161\u2013168, 2006.\n[8] O. Dzyubachyk, W. A. van Cappellen, J. Essers, et al. 
Advanced Level-Set-Based Cell Tracking in Time-Lapse Fluorescence Microscopy. IEEE T Med Imag, 29(3):852, 2010.\n[9] Y. Freund. An adaptive version of the boost by majority algorithm. Mach Learn, 43(3):293\u2013318, 2001.\n[10] Y. Freund, R. Iyer, R. E. Schapire, and Y. Singer. An Efficient Boosting Algorithm for Combining Preferences. J Mach Learn Res, 4:933\u2013969, 2003.\n[11] H. Grabner and H. Bischof. On-line Boosting and Vision. In CVPR, 2006.\n[12] M. Held, M. H. A. Schmitz, et al. CellCognition: time-resolved phenotype annotation in high-throughput live cell imaging. Nature Methods, 7(9):747\u2013754, 2010.\n[13] T. Kanade, Z. Yin, R. Bise, S. Huh, S. E. Eom, M. Sandbothe, and M. Chen. Cell Image Analysis: Algorithms, System and Applications. In WACV, 2011.\n[14] N. Karampatziakis. Static Analysis of Binary Executables Using Structural SVMs. In NIPS, 2010.\n[15] C.-H. Kuo, C. Huang, and R. Nevatia. Multi-Target Tracking by On-Line Learned Discriminative Appearance Models. In CVPR, 2010.\n[16] F. Li, X. Zhou, J. Ma, and S. Wong. Multiple Nuclei Tracking Using Integer Programming for Quantitative Cancer Cell Cycle Analysis. IEEE T Med Imag, 29(1):96, 2010.\n[17] K. Li, E. D. Miller, M. Chen, et al. Cell population tracking and lineage construction with spatiotemporal context. Med Image Anal, 12(5):546\u2013566, 2008.\n[18] Y. Li, C. Huang, and R. Nevatia. Learning to Associate: HybridBoosted Multi-Target Tracker for Crowded Scene. In CVPR, 2009.\n[19] X. Lou, F. O. Kaster, M. S. Lindner, et al. DELTR: Digital Embryo Lineage Tree Reconstructor. In ISBI, 2011.\n[20] E. Meijering, O. Dzyubachyk, I. Smal, and W. A. van Cappellen. Tracking in cell and developmental biology. Semin Cell Dev Biol, 20(8):894\u2013902, 2009.\n[21] D. Padfield, J. Rittscher, and B. Roysam. Coupled Minimum-Cost Flow Cell Tracking for High-Throughput Quantitative Analysis. Med Image Anal, 2010.\n[22] B. 
Taskar, S. Lacoste-Julien, and M. I. Jordan. Structured Prediction, Dual Extragradient and Bregman Projections. J Mach Learn Res, 7:1627\u20131653, 2006.\n[23] C. H. Teo, S. V. N. Vishwanathan, A. J. Smola, and Q. V. Le. Bundle methods for regularized risk minimization. J Mach Learn Res, 11:311\u2013365, 2010.\n[24] I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun. Large Margin Methods for Structured and Interdependent Output Variables. J Mach Learn Res, 6(2):1453, 2006.\n[25] X. Wang, G. Hua, and T. X. Han. Discriminative Tracking by Metric Learning. In ECCV, 2010.\n[26] B. Yang, C. Huang, and R. Nevatia. Learning Affinities and Dependencies for Multi-Target Tracking using a CRF Model. In CVPR, 2011.\n[27] B. Zhong, H. Yao, S. Chen, et al. Visual Tracking via Weakly Supervised Learning from Multiple Imperfect Oracles. In CVPR, 2010.\n", "award": [], "sourceid": 766, "authors": [{"given_name": "Xinghua", "family_name": "Lou", "institution": null}, {"given_name": "Fred", "family_name": "Hamprecht", "institution": null}]}