Log file for this run: /data/moran/distiller_KURE_2/Robust_Quantization_With_KURE/distiller_with_kure/distiller/distiller/quantization/logs/2020.06.08-172603/2020.06.08-172603.log
Random seed: 51433
=> created a resnet18 model with the imagenet dataset
The "--resume" flag is deprecated. Please use "--resume-from=YOUR_PATH" instead.
If you wish to also reset the optimizer, call with: --reset-optimizer
=> loading checkpoint ../../examples/classifier_compression/logs/2020.06.07-115638/best.pth.tar
=> Checkpoint contents:
+----------------------+-------------+----------+
| Key                  | Type        | Value    |
|----------------------+-------------+----------|
| arch                 | str         | resnet18 |
| compression_sched    | dict        |          |
| dataset              | str         | imagenet |
| epoch                | int         | 37       |
| extras               | dict        |          |
| is_parallel          | bool        | True     |
| optimizer_state_dict | dict        |          |
| optimizer_type       | type        | SGD      |
| quantizer_metadata   | dict        |          |
| state_dict           | OrderedDict |          |
+----------------------+-------------+----------+

=> Checkpoint['extras'] contents:
+--------------+--------+---------+
| Key          | Type   |   Value |
|--------------+--------+---------|
| best_epoch   | int    |  37     |
| best_top1    | float  |  70.348 |
| current_top1 | float  |  70.348 |
+--------------+--------+---------+

Loaded compression schedule from checkpoint (epoch 37)
Loaded quantizer metadata from the checkpoint
Optimizer of type <class 'torch.optim.sgd.SGD'> was loaded from checkpoint
Optimizer Args: {'initial_lr': 0.001, 'nesterov': False, 'momentum': 0.9, 'lr': 0.0001, 'weight_decay': 0.0001, 'dampening': 0}
Preparing model for quantization using DorefaQuantizer
Parameter 'module.layer1.0.conv1.weight' will be quantized to 6 bits
Parameter 'module.layer1.0.conv2.weight' will be quantized to 6 bits
Parameter 'module.layer1.1.conv1.weight' will be quantized to 6 bits
Parameter 'module.layer1.1.conv2.weight' will be quantized to 6 bits
Parameter 'module.layer2.0.conv1.weight' will be quantized to 6 bits
Parameter 'module.layer2.0.conv2.weight' will be quantized to 6 bits
Parameter 'module.layer2.0.downsample.0.weight' will be quantized to 6 bits
Parameter 'module.layer2.1.conv1.weight' will be quantized to 6 bits
Parameter 'module.layer2.1.conv2.weight' will be quantized to 6 bits
Parameter 'module.layer3.0.conv1.weight' will be quantized to 6 bits
Parameter 'module.layer3.0.conv2.weight' will be quantized to 6 bits
Parameter 'module.layer3.0.downsample.0.weight' will be quantized to 6 bits
Parameter 'module.layer3.1.conv1.weight' will be quantized to 6 bits
Parameter 'module.layer3.1.conv2.weight' will be quantized to 6 bits
Parameter 'module.layer4.0.conv1.weight' will be quantized to 6 bits
Parameter 'module.layer4.0.conv2.weight' will be quantized to 6 bits
Parameter 'module.layer4.0.downsample.0.weight' will be quantized to 6 bits
Parameter 'module.layer4.1.conv1.weight' will be quantized to 6 bits
Parameter 'module.layer4.1.conv2.weight' will be quantized to 6 bits
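
Note on the lines above: the log says each listed weight tensor is quantized to 6 bits by DorefaQuantizer. The following is a minimal sketch of DoReFa-style weight quantization in plain Python, assuming the commonly described formulation (tanh squashing, rescaling to [0, 1], uniform rounding, mapping back to [-1, 1]). The function name `dorefa_quantize_weights` is hypothetical and is not Distiller's API; the actual implementation works on tensors and uses a straight-through estimator for gradients, which this sketch omits.

```python
import math


def dorefa_quantize_weights(weights, num_bits):
    """Quantize a flat list of float weights DoReFa-style.

    Hypothetical helper for illustration only. Returns values in [-1, 1]
    that lie on a uniform grid of 2**num_bits levels.
    """
    levels = 2 ** num_bits - 1
    # Squash weights into (-1, 1).
    squashed = [math.tanh(w) for w in weights]
    max_abs = max(abs(t) for t in squashed)
    quantized = []
    for t in squashed:
        # Rescale to [0, 1]; an all-zero tensor maps to the midpoint.
        unit = 0.5 if max_abs == 0.0 else t / (2.0 * max_abs) + 0.5
        # Uniform rounding to one of `levels + 1` grid points in [0, 1].
        q = round(unit * levels) / levels
        # Map back to the symmetric range [-1, 1].
        quantized.append(2.0 * q - 1.0)
    return quantized
```

With `num_bits=6`, as in this run, the grid has 64 levels; the largest-magnitude weights land on the ends of the range.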
Quantized model:

DataParallel(
  (module): ResNet(
    (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
    (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu): ClippedLinearQuantization(num_bits=6, clip_val=1, inplace)
    (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
    (layer1): Sequential(
      (0): DistillerBasicBlock(
        (conv1): Conv2d(
          64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False, 
          Distiller_QuantAwareTrain: weight --> 6 bits
        )
        (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ClippedLinearQuantization(num_bits=6, clip_val=1, inplace)
        (conv2): Conv2d(
          64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False, 
          Distiller_QuantAwareTrain: weight --> 6 bits
        )
        (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (add): EltwiseAdd()
        (relu2): ClippedLinearQuantization(num_bits=6, clip_val=1, inplace)
      )
      (1): DistillerBasicBlock(
        (conv1): Conv2d(
          64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False, 
          Distiller_QuantAwareTrain: weight --> 6 bits
        )
        (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ClippedLinearQuantization(num_bits=6, clip_val=1, inplace)
        (conv2): Conv2d(
          64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False, 
          Distiller_QuantAwareTrain: weight --> 6 bits
        )
        (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (add): EltwiseAdd()
        (relu2): ClippedLinearQuantization(num_bits=6, clip_val=1, inplace)
      )
    )
    (layer2): Sequential(
      (0): DistillerBasicBlock(
        (conv1): Conv2d(
          64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False, 
          Distiller_QuantAwareTrain: weight --> 6 bits
        )
        (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ClippedLinearQuantization(num_bits=6, clip_val=1, inplace)
        (conv2): Conv2d(
          128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False, 
          Distiller_QuantAwareTrain: weight --> 6 bits
        )
        (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (downsample): Sequential(
          (0): Conv2d(
            64, 128, kernel_size=(1, 1), stride=(2, 2), bias=False, 
            Distiller_QuantAwareTrain: weight --> 6 bits
          )
          (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
        (add): EltwiseAdd()
        (relu2): ClippedLinearQuantization(num_bits=6, clip_val=1, inplace)
      )
      (1): DistillerBasicBlock(
        (conv1): Conv2d(
          128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False, 
          Distiller_QuantAwareTrain: weight --> 6 bits
        )
        (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ClippedLinearQuantization(num_bits=6, clip_val=1, inplace)
        (conv2): Conv2d(
          128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False, 
          Distiller_QuantAwareTrain: weight --> 6 bits
        )
        (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (add): EltwiseAdd()
        (relu2): ClippedLinearQuantization(num_bits=6, clip_val=1, inplace)
      )
    )
    (layer3): Sequential(
      (0): DistillerBasicBlock(
        (conv1): Conv2d(
          128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False, 
          Distiller_QuantAwareTrain: weight --> 6 bits
        )
        (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ClippedLinearQuantization(num_bits=6, clip_val=1, inplace)
        (conv2): Conv2d(
          256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False, 
          Distiller_QuantAwareTrain: weight --> 6 bits
        )
        (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (downsample): Sequential(
          (0): Conv2d(
            128, 256, kernel_size=(1, 1), stride=(2, 2), bias=False, 
            Distiller_QuantAwareTrain: weight --> 6 bits
          )
          (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
        (add): EltwiseAdd()
        (relu2): ClippedLinearQuantization(num_bits=6, clip_val=1, inplace)
      )
      (1): DistillerBasicBlock(
        (conv1): Conv2d(
          256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False, 
          Distiller_QuantAwareTrain: weight --> 6 bits
        )
        (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ClippedLinearQuantization(num_bits=6, clip_val=1, inplace)
        (conv2): Conv2d(
          256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False, 
          Distiller_QuantAwareTrain: weight --> 6 bits
        )
        (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (add): EltwiseAdd()
        (relu2): ClippedLinearQuantization(num_bits=6, clip_val=1, inplace)
      )
    )
    (layer4): Sequential(
      (0): DistillerBasicBlock(
        (conv1): Conv2d(
          256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False, 
          Distiller_QuantAwareTrain: weight --> 6 bits
        )
        (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ClippedLinearQuantization(num_bits=6, clip_val=1, inplace)
        (conv2): Conv2d(
          512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False, 
          Distiller_QuantAwareTrain: weight --> 6 bits
        )
        (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (downsample): Sequential(
          (0): Conv2d(
            256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False, 
            Distiller_QuantAwareTrain: weight --> 6 bits
          )
          (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
        (add): EltwiseAdd()
        (relu2): ClippedLinearQuantization(num_bits=6, clip_val=1, inplace)
      )
      (1): DistillerBasicBlock(
        (conv1): Conv2d(
          512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False, 
          Distiller_QuantAwareTrain: weight --> 6 bits
        )
        (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ClippedLinearQuantization(num_bits=6, clip_val=1, inplace)
        (conv2): Conv2d(
          512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False, 
          Distiller_QuantAwareTrain: weight --> 6 bits
        )
        (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (add): EltwiseAdd()
        (relu2): ReLU(inplace=True)
      )
    )
    (avgpool): AdaptiveAvgPool2d(output_size=(1, 1))
    (fc): Linear(in_features=512, out_features=1000, bias=True)
  )
)
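
Note on the model printout above: most ReLU modules were replaced by `ClippedLinearQuantization(num_bits=6, clip_val=1, inplace)`. The following is a rough sketch of what such a clipped, unsigned linear activation quantizer is usually understood to compute: clamp to [0, clip_val], then round onto a uniform grid. The function `clipped_linear_quantize` is a hypothetical scalar illustration under that assumption, not the Distiller module itself.

```python
def clipped_linear_quantize(x, num_bits=6, clip_val=1.0):
    """Clamp a scalar activation to [0, clip_val] and round it onto a
    uniform grid of 2**num_bits levels (hypothetical illustration)."""
    levels = 2 ** num_bits - 1
    scale = levels / clip_val
    # Clipping replaces the unbounded ReLU output range.
    clipped = min(max(x, 0.0), clip_val)
    # Quantize to an integer index, then dequantize back to a float.
    return round(clipped * scale) / scale
```

Negative inputs map to 0 and inputs above `clip_val` saturate at `clip_val`, so the rounding error inside the clipped range is at most half a grid step.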

=> loaded checkpoint '../../examples/classifier_compression/logs/2020.06.07-115638/best.pth.tar' (epoch 37)

reset_optimizer flag set: Overriding resumed optimizer and resetting epoch count to 0
Dataset sizes:
	test=50000
Loading checkpoint from ../../examples/classifier_compression/logs/2020.06.07-115638/best.pth.tar
START
Could not find the logger configuration file (logging.conf) - using default logger configuration

--------------------------------------------------------
Logging to TensorBoard - remember to execute the server:
> tensorboard --logdir='./logs'

best point: [6, 1.2024588335533528, 70.314]
[x, loss, top1] = [2, 20.456014282849363, 0.08200000000000429]
[x, loss, top1] = [3, 10.082381384713308, 0.30799999999999716]
[x, loss, top1] = [4, 2.4197814428076474, 47.333999999999996]
[x, loss, top1] = [5, 1.2751472242632689, 68.71199999999999]
[x, loss, top1] = [6, 1.2024793168719943, 70.312]
[x, loss, top1] = [8, 1.2275072950489665, 69.704]
[x, loss, top1] = [fp32, 1.226094699331692, 69.704]
Data saved to resnet18_WALLANone_BCWF.pkl
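
Note on the sweep results above: the `[x, loss, top1]` lines can be read back programmatically. The following is a small sketch, assuming the line format stays exactly as printed in this log. The names `parse_sweep` and `best_by_loss` are hypothetical helpers, and choosing the lowest loss is an assumption about how a "best point" might be selected; the script's own criterion is not shown in the log.

```python
import re

# Matches lines such as: [x, loss, top1] = [6, 1.2024793168719943, 70.312]
SWEEP_LINE = re.compile(r"^\[x, loss, top1\] = \[(\w+), ([^,\]]+), ([^,\]]+)\]$")

# Lines copied from the log above.
SAMPLE = """\
[x, loss, top1] = [2, 20.456014282849363, 0.08200000000000429]
[x, loss, top1] = [3, 10.082381384713308, 0.30799999999999716]
[x, loss, top1] = [4, 2.4197814428076474, 47.333999999999996]
[x, loss, top1] = [5, 1.2751472242632689, 68.71199999999999]
[x, loss, top1] = [6, 1.2024793168719943, 70.312]
[x, loss, top1] = [8, 1.2275072950489665, 69.704]
[x, loss, top1] = [fp32, 1.226094699331692, 69.704]
"""


def parse_sweep(log_text):
    """Return a list of (x, loss, top1) tuples; x stays a string
    because one entry is the label 'fp32' rather than a bit width."""
    rows = []
    for line in log_text.splitlines():
        match = SWEEP_LINE.match(line.strip())
        if match:
            rows.append((match.group(1), float(match.group(2)), float(match.group(3))))
    return rows


def best_by_loss(rows):
    """Pick the row with the lowest loss (assumed selection criterion)."""
    return min(rows, key=lambda row: row[1])
```

Calling `best_by_loss(parse_sweep(SAMPLE))` selects the `x = 6` row, which matches the bit width reported on the `best point` line.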
