__ Summary and Contributions__: The paper proposes and analyzes primal-dual algorithms for a number of online optimization problems, which are augmented by an arbitrary solution prediction.
Given a parameter lambda, which controls the trust in the prediction, the authors provide performance bounds on the obtained solutions based on both the optimal offline solution and the predicted solution.
The theoretical results are complemented by experiments for one of the problems, which illustrate the behavior of the method.

__ Strengths__: The theoretical results appear to be novel for all presented problems and are enough material for a journal publication on the topic.
From what I understand (not being an expert), this work significantly extends the results on online optimization, by incorporating predictions in the primal-dual method.
The results seem very valuable to online optimization community.

__ Weaknesses__: I do not see major weaknesses in this work.
--- Update ---
After the rebuttal, my initial assessment remains.

__ Correctness__: I could not check all technical details (e.g. Lemma 20) due to limited time. I checked the proofs of the main claims, however, and I could not find major flaws. Minor comments are included below.

__ Clarity__: The paper is clearly structured and generally well written. The accessibility of the proofs could be improved a bit at times by defining important quantities outside of the text.

__ Relation to Prior Work__: The related work is discussed along with each presented problem. The differences are laid out clearly.
Another section discussing the commonalities of the presented problems together with the related work may be desirable. I am not an expert for this type of problems, however.

__ Reproducibility__: Yes

__ Additional Feedback__: The proof sketches might be valuable for readers very familiar when the presented techniques. For others it may be better to just discuss the main ideas underlying all presented results and then make sure that the main proofs are easy to read.
Please properly define the value d in line 111. The informal definition used in the text is unclear and misleading.
Questions about proofs:
Please clarify where you assume that w_S >= 1 (line 449).
Why does Alg 4 stop after at most N updates if N^{pred} <= B (proof of Thm 2) ?
In the proof of Lem 15 between line 651 and 652, the second equality is actually an inequality. Also clarify why the prior inequality holds.
Typos:
"banhcard" line 201
summation index lines 647-648

__ Summary and Contributions__: The authors study an adaptation of the primal-dual method that incorporates predictions. The show a (somewhat) black box way of converting algorithms that were designed using the primal dual method using a covering LP to include predictions, which are in the form of the actual prediction A and a 'confidence' \lambda in [0,1]. Using this, they augment previously known state-of-the-art online algorithms for Set Cover, Ski Rental, Bahncard Problem and Dynamic TCP acknowledgement to use advice and are able to upper bound their competitive ratio by two terms: one which includes only \lambda and the ex-post error if the advice is followed blindly and one which includes \lambda and the optimal algorithm, therefore achieving a trade-off that between using the advice, but never straying too far from the worst case, even if the advice is useless.
To briefly summarize the primal-dual method for covering problems: a covering problem is one whose offline version can be solved using a covering LP, i.e. one where all constraints are of the form A*X >= B where A, B contains positive coefficients. To design the online algorithm, the LP is initially almost empty but at every turn, a new set of constraints is added while the variables stay the same. Then, some variables are continuously increased until the new constraint is met, while also keeping the increase of the dual variables as small as possible to obtain a guarantee by weak duality. The type of constraints is very important: having only non-negative coefficients can capture problems where different sets of resources can be spent at different rates (set cover, ski rental etc) but usually not ones where the solution can move 'back-and-forth', such as the k-server problem. In their approach the actual LP is not modified, however when updating the variables each time a constraint is added the change can be biased to move slightly faster towards the advice. This change usually doesn't disrupt the primal-dual proof, so a slightly weaker bound against the optimal algorithm is easy, while at the same time a guarantee against following the advice blindly can be obtained using a slightly ad-hoc, charging argument.
Finally, the authors have included experiments that showcase that the performance is indeed better for small inputs, and is not drowned by hidden factors or only holds asymptotically.

__ Strengths__: 1) The paper provides a generalized way of adapting a specific class of online algorithms to incorporate advice.
2) The actual settings studied in the paper are interesting in their own right and are not just toy problems.
3) The paper managed to fit enough interesting content in 8 pages, even though it's theoretical. Someone could decide if this approach works for them without having to go through the full version.

__ Weaknesses__: 1) I am not sure I would call the approach black box in the strict sense, as there is still a fair amount of tailor made analysis for each setting (and the algorithm modification, although intuitive, still requires a deep understanding of the problem).
2) Covering problems are generally very well understood in the online literature. Are there any interesting covering problems left for others to apply this technique?

__ Correctness__: I did not read everything in the supplementary material, but the main lemmata and theorems statements follow what would be expected for an online paper using the primal dual method.

__ Clarity__: Yes, it was fun to read.

__ Relation to Prior Work__: Yes.

__ Reproducibility__: Yes

__ Additional Feedback__: After reading the rebuttal and discussing with the other reviewers, I am confident that this paper is a very nice result indeed.

__ Summary and Contributions__: This paper presents a new version of the Primal-Dual algorithm for solving online set-covering problems, which is able to take into account a solution predicted by any other algorithm.
The Primal-Dual algorithm for online problems builds fractional solutions by augmenting the primal and the dual solutions each time a new item to be covered becomes available (the problem is considered online). The proposed modification for handling predicted solution consists basically of augmenting the variable of the primal solutions using a convex combination of the increment given by the original Primal-Dual algorithm (see Algorithm 2), and a quantity which consider the predicted integral solution given in input to the new algorithm. The constant (parameter) \lambda of the convex combination, which is indeed in the interval [0,1], should reflect how much we trust the predicted solution: for \lambda=0 the online algorithm will exactly build the predicted solution; for \lambda=1, the algorithm will produce the same solution of the original Primal-Dual algorithm.
The proposed algorithm is detailed for three different problems, which are all a variant of the online set covering problem: the ski rental problem, the Banchard problem, and a Dynamic TCP acknowledgment problem. For each variant of the set covering problem, the authors provide a formal bound obtained with their algorithm, which is a specialization of their Theorem 1.

__ Strengths__: The main advantage of this paper is that it presents a simple and intuitive idea for building fractional solutions for online problems which can benefit from a predicted solution. In addition, the authors prove a formal bound for each variant of the set covering problem considered.
The results given in the theorems are intuitive and the proofs look correct.

__ Weaknesses__: The main weakness of this paper is that the three problems considered are very similar, and the computational results are limited and somehow artificial.
Moreover, it inherits the same issues of the Primal-Dual algorithm: it only build fractional solutions, hence, we are still left with the problem of building online an integral solution. Little comments are provided with respect with this issue.
The computational results are limited and they are restricted only to the last problem (Dynamic TCP Acknowledgement).

__ Correctness__: The results (the five theorems in the paper) looks correct, and they generalize previous results on standard online algorithms.

__ Clarity__: All the algorithms could be better presented, for instance, by reporting only Algorithms 2, 4, and 6 with the line numbers, and then by specifying which lines are present only in the Augmented version (and by omitting the description for Algorithms 1, 3, and 5). This way the authors could save space to better explain how to build integral solutions.
Otherwise, the paper is well written, but a general conclusion is missing.

__ Relation to Prior Work__: The paper clearly presents related works and it discusses how it generalize previous results, appeared recently in the ML literature, as for instance [25].

__ Reproducibility__: Yes

__ Additional Feedback__: UPDATE AFTER REBUTTAL:
We are thankful to the authors for their precise answer. After reading the rebuttal and the comments of the other reviewers, we have increased the overall score.
---------------------------------------------------------------------------------------------------
The main questions for the authors are:
- how do you recover optimal integral solutions, from the fractional solutions you get with your algorithms?
- do standard dataset exist for benchmarking online algorithms, likely with data coming from real industrial applications?
- could the authors benefit from distribution errors coming from the predictors?