# FKEA Clustering Demo

This repository contains a demo script for performing image and text clustering using the `FKEA_Evaluator` class from the FKEA package. The script demonstrates how to set up and run the FKEA evaluator for clustering tasks and compute diversity scores for both image and text datasets.

## Requirements

- Python 3.x
- FKEA package
- pickle (part of the Python standard library)
- torch
- torchvision (for image processing)
- transformers
- datasets

## Image Clustering
The script performs image clustering using a specified feature extractor and evaluates the diversity of the clustering. Set the parameters for the FKEA evaluator:

- `num_samples`: Number of samples to use for clustering.
- `sigma`: Bandwidth parameter of the Gaussian kernel.
- `rff_dim`: Number of random Fourier features (note that the final dimension of the feature matrix is twice this number).

Initialize the `FKEA_Evaluator` with the desired settings and set the feature extractor.

Create an `ImageFilesDataset` with the path to your image dataset, then compute the clusters and scores.
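
To illustrate why the final feature dimension is twice `rff_dim`, here is a minimal sketch of random Fourier features for a Gaussian kernel. It is not the FKEA package's implementation: the helper names (`sample_frequencies`, `rff_features`, `gaussian_kernel`) are hypothetical, the kernel convention `exp(-||x - y||^2 / (2 * sigma^2))` is an assumption, and plain Python lists are used instead of torch tensors to keep the example dependency-free.

```python
import math
import random


def sample_frequencies(input_dim, rff_dim, sigma, seed=0):
    """Draw rff_dim frequency vectors with i.i.d. N(0, 1 / sigma^2) entries."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0 / sigma) for _ in range(input_dim)]
            for _ in range(rff_dim)]


def rff_features(x, frequencies):
    """Map one vector x to 2 * rff_dim random Fourier features."""
    rff_dim = len(frequencies)
    projections = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in frequencies]
    scale = 1.0 / math.sqrt(rff_dim)
    # The cosine and sine halves are concatenated, hence the doubled dimension.
    return ([scale * math.cos(p) for p in projections]
            + [scale * math.sin(p) for p in projections])


def gaussian_kernel(x, y, sigma):
    """Exact Gaussian kernel (assumed convention, see the note above)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


x, y = [0.2, -0.1, 0.4], [0.0, 0.3, 0.1]
sigma, rff_dim = 1.0, 2000

freqs = sample_frequencies(len(x), rff_dim, sigma)
phi_x, phi_y = rff_features(x, freqs), rff_features(y, freqs)

# The inner product of the feature vectors approximates the kernel value.
approx = sum(a * b for a, b in zip(phi_x, phi_y))
exact = gaussian_kernel(x, y, sigma)

print(len(phi_x))     # 4000, i.e. 2 * rff_dim
print(approx, exact)  # the two values should be close
```

A larger `rff_dim` tightens the approximation at the cost of a wider feature matrix, which is the trade-off to keep in mind when choosing this parameter.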

## Text Clustering
The script also performs text clustering and evaluates the diversity of the clustering using a specified feature extractor for text embeddings. Set the parameters for the FKEA evaluator as in the image clustering section.

Initialize the `FKEA_Evaluator` and set the feature extractor, then load the text data and the corresponding embeddings. Currently, only the image pipeline computes embeddings itself; the text module requires embeddings generated externally, which can be loaded directly into the script.
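
As a rough sketch of the loading step, the snippet below reads precomputed text embeddings from a pickle file and converts them to a torch tensor. It assumes the file holds a list (or array) of equal-length vectors; the helper names `load_embeddings` and `to_tensor` and the filename in the usage comment are hypothetical and not part of the FKEA package.

```python
import pickle


def load_embeddings(path):
    """Load pickled embeddings and check that they form a rectangular 2-D array."""
    # Only unpickle files from a trusted source: pickle can execute arbitrary code.
    with open(path, "rb") as f:
        rows = pickle.load(f)
    rows = [list(map(float, row)) for row in rows]
    if not rows or any(len(row) != len(rows[0]) for row in rows):
        raise ValueError("expected a non-empty list of equal-length vectors")
    return rows


def to_tensor(rows):
    """Convert the loaded rows to a float32 torch tensor of shape (n, dim)."""
    import torch  # imported here so the loader itself does not require torch
    return torch.tensor(rows, dtype=torch.float32)


# Example usage (hypothetical filename):
# embeddings = to_tensor(load_embeddings("text_embeddings.pkl"))
```

Validating the shape before conversion gives a clearer error than a failed tensor construction if the pickle contains ragged or empty data.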

## Notes
- Ensure you update the paths and filenames to match your local environment.
- Adjust `result_name` and other parameters as needed for your specific use case.
- The script assumes that the embeddings for text data are stored in a pickle file (or equivalent) and can be converted to a torch tensor.



