__ Summary and Contributions__: This paper considers learning from mixtures of private and public populations -- where some individuals require privacy of their data and some do not. The data content may also be correlated with the privacy requirement, so it is not sufficient to learn on only one population, or to treat these as i.i.d. samples. For example, in a medical study, healthy individuals may not require privacy of their data since it is "not embarrassing", while sick individuals may require strong differential privacy guarantees. The authors provide a formal model and definition for this new setting, and show how to learn halfspaces in this model.

__ Strengths__: I support accepting this paper. To the best of my knowledge, it is the first work to formalize this problem. It addresses a novel technical problem of practical importance, introduces a new model and accompanying definitions, provides solid technical results, and is well written.

__ Weaknesses__: Minor comments:
The privacy model introduced on page 5 was confusing because of its name: it sounds like a new privacy definition or an attack model, when it actually describes the conditional distribution of the privacy bits.

__ Correctness__: Yes

__ Clarity__: Yes

__ Relation to Prior Work__: Yes

__ Reproducibility__: Yes

__ Additional Feedback__: EDIT: I have read the authors' response, and I remain enthusiastic about accepting the paper. I think the authors did a great job of responding to comments from all three reviewers.

__ Summary and Contributions__: This paper studies the problem of supervised learning from mixtures of private and public populations, where the private and public data are drawn from different subpopulations. The paper focuses on learning halfspaces and provides a differentially private algorithm. The authors also give a sample complexity analysis for the algorithm under the label-determined privacy model, and show that the sample complexity is comparable to that of non-private PAC learning.

__ Strengths__: The paper considers an interesting and practical angle in which the private and public data come from different populations. I can see many extensions along this direction.
The algorithm is clean and the sample complexity analysis is tight.

__ Weaknesses__: I think the label-determined privacy model is too restrictive. Intuitively, the theoretical sample complexity should be comparable to the non-private one, since the additional noise is only added to the private data, which behaves very differently. I would expect a simple but somewhat more advanced model, such as a Bernoulli model that characterizes the relationship between privacy status and the target label, with a sample complexity that accounts for the parameter $p$.
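To make the suggestion concrete, here is a minimal sketch of the kind of Bernoulli model I have in mind (the function and parameter names are my own illustration, not anything from the paper): the label is drawn first, and a label-1 point is then marked private with probability $p$, so $p = 1$ recovers the label-determined model as a special case.

```python
import random

def sample_with_privacy_status(n, p, label_dist=0.5, seed=0):
    """Hypothetical Bernoulli privacy model (reviewer's illustration):
    each label is Bernoulli(label_dist); a label-1 point is private
    with probability p, while label-0 points are always public.
    Setting p = 1 recovers the label-determined privacy model."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        y = 1 if rng.random() < label_dist else 0
        private = (y == 1) and (rng.random() < p)
        data.append((y, private))
    return data
```

A sample complexity bound in this relaxed model would presumably interpolate in $p$ between the label-determined case and a setting with no label/privacy correlation.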

__ Correctness__: I did not fully examine the sample complexity analysis, but the proof sketch seems correct.

__ Clarity__: This is a well-written paper.

__ Relation to Prior Work__: Mostly yes. I would like to see a short discussion of the line of work on private knowledge transfer and on private learning of halfspaces.

__ Reproducibility__: Yes

__ Additional Feedback__: EDIT: The authors gave a good response. I am convinced that the sample complexity under the label-determined model is non-trivial, and I am happy to increase my score. One additional comment (just for future work): I am curious whether reweighting in knowledge transfer would be useful in this setting, where the two distributions are different.

__ Summary and Contributions__: The paper studies a new model of privacy for supervised learning. The authors propose a model in which the dataset consists of samples from a mixture of a private data distribution and a public data distribution; the goal is to minimize excess error over the marginal distribution (marginalizing over privacy status) while satisfying differential privacy only with respect to the private data. They call this Private-Public Mixture (PPM) learning.
The authors also give an improper learning algorithm for halfspaces under the PPM model, under the assumption that whether a data point is private is perfectly determined by its label; they call this the label-determined privacy model. For example, data with label 1 is private and data with label 0 is public. The authors give sample complexity results for the excess risk, and the hypothesis returned is a combination of up to $d$ halfspaces (since it is an improper learner).
The authors do mention that similar guarantees also hold under small relaxations of the label-determined privacy model.

__ Strengths__: The proposed model of preserving privacy with respect to only the private data, while having access to public data that carries no privacy concerns, is not exactly new, as prior works (e.g., Bassily et al. '18) have considered the same. The novelty in this work is that the model assumes different distributions for the private and public data, whereas prior works assumed the public data came from the same distribution as the private data; I do believe this model is novel. The authors motivate why this privacy requirement with different distributions for private and public data is realistic using a couple of examples about health data and credit score data. They give an intuitive framework for learning in this setting, which they formalize as PPM learning.
The sample complexity results for halfspaces under the label-determined privacy model also have some merit, as they show the existence of a learner (albeit improper) under the PPM framework, but I do feel the label-determined privacy model is very restrictive.
-------------------------------------------------------------------------------
Edit after author feedback: I have read the rebuttal. I do agree that there is definitely novelty in the setting proposed by the authors and since it's the first result, I can let the restricted label-determined privacy model slide. I do believe this work can inspire a lot of future work.

__ Weaknesses__: The extra novelty in the new privacy framework introduced in this work is only that the distributions of the private and public data differ, as using public data to aid private learning has already been studied before.
The results for halfspaces hold under the very restrictive label-determined privacy setting, and I wonder what kind of results are even possible without such assumptions on the privacy model. The learning algorithm for halfspaces is improper (i.e., the returned hypothesis may not be in the hypothesis class being compared to) and is not polynomial-time in the dimension, since the exponential mechanism requires sampling from $O(2^d \cdot n^{d^2})$ hypotheses; I wonder if there is an efficient algorithm for the same task. The sample complexity given is also $O(d^2)$, and I wonder if it is possible to get something close to the non-private case of only $O(d)$; a discussion of this would have been appreciated. The tools used for proving the halfspace case are also specialized, and these ideas may not generalize to general classes of bounded VC dimension.
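To clarify why the sampling cost is the bottleneck: the following is a minimal sketch of the generic exponential mechanism (not the paper's specific instantiation; all names are my own), whose naive runtime is linear in the size of the candidate set, so a candidate set of size $O(2^d \cdot n^{d^2})$ is immediately super-polynomial in $d$.

```python
import math
import random

def exponential_mechanism(candidates, score, epsilon, sensitivity, rng=None):
    """Generic exponential mechanism sketch: select a candidate with
    probability proportional to exp(epsilon * score / (2 * sensitivity)).
    Naive sampling enumerates all candidates, hence runtime is
    Theta(len(candidates)) -- the source of the inefficiency noted above.
    Assumes scores are small enough that exp() does not overflow."""
    rng = rng or random.Random(0)
    weights = [math.exp(epsilon * score(c) / (2.0 * sensitivity))
               for c in candidates]
    total = sum(weights)
    r = rng.random() * total
    acc = 0.0
    for c, w in zip(candidates, weights):
        acc += w
        if acc >= r:
            return c
    return candidates[-1]  # guard against floating-point rounding
```

An efficient algorithm would need either a smaller candidate set or a way to sample from this distribution without enumerating it.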

__ Correctness__: The claims are correct as per my understanding. There are no experiments in the paper.

__ Clarity__: The paper is well written for a draft and was easy to follow. The setting was well motivated and the appropriate background information has been provided to understand the context and follow the paper.

__ Relation to Prior Work__: Yes, the authors discuss prior work on learning while preserving privacy and on learning with public and private data, and how those settings differ from the one studied here. A comparison or discussion of the halfspace sample complexity against the non-private case, and against the public/private setting with identical distributions, should perhaps have been included.

__ Reproducibility__: Yes

__ Additional Feedback__: