This project provides (a) the code used to generate Figures 1 and 2 in the paper (on synthetic data), and (b) code for clustering images from the CIFAR datasets using the (WMC) model from the paper.

Dependencies to generate Figures 1 and 2:
numpy, spams, scipy, sklearn, itertools, matplotlib, cvxopt 

Dependencies to cluster images from the CIFAR datasets:
torch, clip, numpy, spams, scipy, sklearn, torchvision, itertools, matplotlib, cvxopt 
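
Before running anything, it can help to check which of the listed modules are importable in the current environment. The snippet below is a minimal sketch, not part of the repository: the helper name missing_modules is hypothetical, and the module names are simply the ones listed above (itertools ships with Python, so it should never be reported as missing).

```python
import importlib.util

# Module names exactly as listed in this README.
FIGURE_DEPS = ["numpy", "spams", "scipy", "sklearn", "itertools", "matplotlib", "cvxopt"]
CIFAR_DEPS = FIGURE_DEPS + ["torch", "clip", "torchvision"]


def missing_modules(names):
    """Return the names that cannot be located as importable top-level modules."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Missing for Figures 1 and 2:", missing_modules(FIGURE_DEPS))
    print("Missing for CIFAR clustering:", missing_modules(CIFAR_DEPS))
```

An empty list means every module for that task was found. The check only locates modules; it does not verify versions.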


1. To recreate Figure 1 in the paper, execute the following 2 lines:
python WMC_synthetic/240519_compute_geometric_Lambdavlambda.py
python WMC_synthetic/240519_plot_geometric_Lambdavlambda.py

2. To recreate Figure 2 in the paper, execute the following 2 lines:
python WMC_synthetic/240519_compute_geometric_LambdavN.py
python WMC_synthetic/240519_plot_geometric_LambdavN.py
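
The two commands for each figure can also be run from a small Python wrapper. This is a sketch only: the helper names figure_commands and recreate_figure are hypothetical, and it assumes the commands are launched from the repository root and that each plot script must run after its compute script, in the order listed above.

```python
import subprocess
import sys

# Script paths exactly as listed in this README (relative to the repository root).
FIGURE_SCRIPTS = {
    1: ["WMC_synthetic/240519_compute_geometric_Lambdavlambda.py",
        "WMC_synthetic/240519_plot_geometric_Lambdavlambda.py"],
    2: ["WMC_synthetic/240519_compute_geometric_LambdavN.py",
        "WMC_synthetic/240519_plot_geometric_LambdavN.py"],
}


def figure_commands(figure):
    """Return the commands for one figure: compute script first, plot script second."""
    return [[sys.executable, script] for script in FIGURE_SCRIPTS[figure]]


def recreate_figure(figure):
    """Run both scripts in order; check=True stops at the first failing script."""
    for command in figure_commands(figure):
        subprocess.run(command, check=True)
```

Calling recreate_figure(1) or recreate_figure(2) then runs the corresponding pair of scripts with the current Python interpreter.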

3. To cluster images using the (LWMC) and (EWMC) models:

(a) When running for the first time, execute the following:

python exps/240206-export-clip-features.py --dataset {dataset name}

where {dataset name} = cifar10 or cifar100 or cifar100coarse

This will export the CLIP features and save them.

(b) Next, execute the following:

To cluster using (EWMC):
python exps/240301-benchmark-mc.py --dataset {dataset name} --save_path {output directory path} --expo_weights

To cluster using (LWMC):
python exps/240301-benchmark-mc.py --dataset {dataset name} --save_path {output directory path}

where 
{dataset name} = cifar_10_clip or cifar_100_clip or cifar_20_clip
{output directory path} = path of directory where output will be saved

This will solve the (WMC) model using an adaptation of the method available at 'https://github.com/ChongYou/subspace-clustering/tree/master'

Optionally, you can include the following arguments:
--N {number of data samples}: This takes an integer value < 60000 and clusters the first N images in the dataset
--seed {integer seed value}
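
The Step (b) command line, including the optional arguments, can be assembled programmatically. The following is a minimal sketch: the function name benchmark_command is hypothetical, only the flags documented above are used, and the lower bound on N (N > 0) is an assumption, since this README only states N < 60000.

```python
import shlex
import sys

BENCHMARK_SCRIPT = "exps/240301-benchmark-mc.py"
DATASETS = ("cifar_10_clip", "cifar_100_clip", "cifar_20_clip")


def benchmark_command(dataset, save_path, expo_weights=False, n=None, seed=None):
    """Build the Step (b) command; expo_weights=True selects (EWMC), otherwise (LWMC)."""
    if dataset not in DATASETS:
        raise ValueError(f"dataset must be one of {DATASETS}")
    # The README requires N < 60000; N > 0 is an assumption made in this sketch.
    if n is not None and not 0 < n < 60000:
        raise ValueError("--N must be a positive integer below 60000")
    command = [sys.executable, BENCHMARK_SCRIPT,
               "--dataset", dataset, "--save_path", save_path]
    if expo_weights:
        command.append("--expo_weights")
    if n is not None:
        command += ["--N", str(n)]
    if seed is not None:
        command += ["--seed", str(seed)]
    return command


if __name__ == "__main__":
    # Print the command instead of running it, so it can be inspected first.
    print(shlex.join(benchmark_command("cifar_10_clip", "results", expo_weights=True, n=1000, seed=0)))
```

The returned list can be passed to subprocess.run(command, check=True) from the repository root once the CLIP features from Step (a) have been exported.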

(c) To construct affinity using a combination of three different strategies, and cluster the images using k-means, execute the following:

python exps/240329_MC_generate_affinity.py --dataset {dataset name} --save_path {output directory path}

where {dataset name} = cifar_10_clip or cifar_100_clip or cifar_20_clip

Optionally, you can include the following argument:
--N {number of data samples}: This takes an integer value < 60000 and clusters the first N images in the dataset

Note: the {output directory path} and N given as input arguments in this step should be the same as those used in Step (b)
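
One way to keep the arguments of Steps (b) and (c) consistent is to define them once and derive both commands from the same settings. This is a sketch under stated assumptions: the names RunSettings and pipeline_commands are hypothetical, and reusing the same dataset name in both steps is assumed here (the note above only mentions the output directory and N).

```python
import sys
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class RunSettings:
    dataset: str             # cifar_10_clip, cifar_100_clip or cifar_20_clip
    save_path: str           # output directory shared by Steps (b) and (c)
    n: Optional[int] = None  # optional --N, shared by both steps

    def shared_args(self) -> List[str]:
        """Arguments that are passed unchanged to both scripts."""
        args = ["--dataset", self.dataset, "--save_path", self.save_path]
        if self.n is not None:
            args += ["--N", str(self.n)]
        return args


def pipeline_commands(settings, expo_weights=False):
    """Return the Step (b) command followed by the Step (c) command."""
    benchmark = [sys.executable, "exps/240301-benchmark-mc.py", *settings.shared_args()]
    if expo_weights:
        benchmark.append("--expo_weights")  # (EWMC); omit for (LWMC)
    affinity = [sys.executable, "exps/240329_MC_generate_affinity.py", *settings.shared_args()]
    return [benchmark, affinity]
```

Because both commands are built from one RunSettings instance, the output directory and N cannot drift apart between the two steps.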


