STLM: Steganography in Text using Language Models

Implementation of Text Steganography using two differently conditioned language models.

Our protocol supports several kinds of models, such as:

gpt2: small, medium, large, xl
BERT
RoBERTa

Usage

You can play with our protocol through the notebook DemoStego.ipynb. You can either run it on your local machine if you have enough computational power, or use the Colab version for a quick demo:

Experiments on a local machine:

We advise you to have at least 8GB of RAM and ideally a properly working GPU. First, run:

pip install -r requirements.txt

After that, run jupyter-notebook and select the DemoStego.ipynb notebook. You can then follow the tutorial explained in the notebook.

If you use a virtual environment, please add your virtual env to the notebook with the following command:

ipython kernel install --name "your-venv" --user

Run on Google Colab

If you do not have enough computational power and want a quick try, please refer to our (user-friendly) Google Colab shared notebook.

Experiments on Adversarial attacks

All the experiments that we ran on adversarial attacks are explained in this notebook.

Requirements

First of all, you need to get the testing data from this public folder. After that:

Log into Google Drive.
In Google Drive, make a folder named data.

Inside the notebook:

Mount your Drive in the notebook.
cd to /content/drive/MyDrive/data
Unzip the folder on your Drive (using !unzip <folder_name> inside the Colab notebook).
Replace the path in the notebook with /content/drive/MyDrive/data/
cd again to /
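The steps above can be sketched in Python as follows. This is only a minimal sketch, not the code used in the notebook: the helper names mount_drive and extract_archive are hypothetical, and the archive name is left as a parameter because it depends on what you downloaded. The import of google.colab only succeeds inside a Colab runtime, so the mount helper simply returns False elsewhere.

```python
import os
import zipfile

# Path used in the steps above.
DATA_DIR = "/content/drive/MyDrive/data"


def mount_drive(mount_point="/content/drive"):
    """Mount Google Drive when running inside Colab (hypothetical helper).

    Returns False when the google.colab package is not available,
    for example on a local machine.
    """
    try:
        from google.colab import drive
    except ImportError:
        return False
    drive.mount(mount_point)
    return True


def extract_archive(archive_path, target_dir=DATA_DIR):
    """Unzip archive_path into target_dir and return the extracted entries.

    Plays the role of `!unzip <folder_name>` in the notebook.
    """
    os.makedirs(target_dir, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target_dir)
    return sorted(os.listdir(target_dir))
```

Inside Colab you would call mount_drive() first, then extract_archive with the path of the archive you placed in the data folder.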

Run the notebook

Just follow the steps inside the notebook.

Core structure

Protocol

The entire protocol is implemented in the files stored inside the folder wrappedCode.

Building the models:

The important functions for building the corresponding model are defined in the file createModel.py. There is a specific function for each type of model, and the main function calls the correct one for the requested model type.
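The dispatch pattern described above could look roughly like the sketch below. It is an illustration only: the names create_model, _build_gpt2, _build_bert and _build_roberta are hypothetical and are not taken from createModel.py, and the builders return a small description dictionary instead of loading real weights so that the example runs without any download.

```python
# Hypothetical builders: each one stands in for a model-specific
# construction function. They return a description, not a real model.
def _build_gpt2(variant):
    return {"family": "gpt2", "variant": variant or "small"}


def _build_bert(variant):
    return {"family": "bert", "variant": variant}


def _build_roberta(variant):
    return {"family": "roberta", "variant": variant}


# Table mapping a model type to its builder.
_BUILDERS = {
    "gpt2": _build_gpt2,
    "bert": _build_bert,
    "roberta": _build_roberta,
}


def create_model(model_type, variant=None):
    """Call the builder that matches model_type (hypothetical entry point)."""
    try:
        builder = _BUILDERS[model_type.lower()]
    except KeyError:
        supported = ", ".join(sorted(_BUILDERS))
        raise ValueError(
            f"Unknown model type {model_type!r}; expected one of: {supported}"
        )
    return builder(variant)
```

For example, create_model("gpt2", "medium") selects the GPT-2 builder, and an unsupported name raises a ValueError that lists the supported types.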

Encryption:

The function encryptMessage wraps all the functions defined in the file encryptionWrapped.py. Depending on the model you choose, we call a specific rank-generating function and its associated cover-text generation function.

Decryption:

The function decryptMessage wraps all the functions defined in the file decryptionWrapped.py. Depending on the model you choose, we call a specific function for rank retrieval and for secret generation.
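To make the rank-based idea behind encryption and decryption more concrete, here is a toy round trip. It is a sketch under strong assumptions and not the code of encryptionWrapped.py or decryptionWrapped.py: the "language model" is replaced by a deterministic hash-based ordering of a tiny vocabulary, and all names (ranked_vocab, text_to_ranks, ranks_to_text, encrypt_message, decrypt_message) are hypothetical. The secret is turned into a sequence of ranks under one conditioning, and the cover text is produced by picking the tokens with those ranks under a different conditioning. Decryption applies the same two steps in the opposite direction.

```python
import hashlib

# Tiny vocabulary shared by both sides (assumption for the toy example).
VOCAB = ["the", "a", "cat", "dog", "sat", "ran", "on", "under", "mat", "tree"]


def ranked_vocab(context):
    """Toy stand-in for a language model.

    Returns VOCAB in a deterministic order that depends on the context,
    the way a real model would order tokens by predicted probability.
    """
    key = " ".join(context)
    return sorted(
        VOCAB,
        key=lambda tok: hashlib.sha256(f"{key}|{tok}".encode()).hexdigest(),
    )


def text_to_ranks(tokens, precond):
    """Rank of each token given the preconditioning and the tokens so far."""
    context = list(precond)
    ranks = []
    for tok in tokens:
        ranks.append(ranked_vocab(context).index(tok))
        context.append(tok)
    return ranks


def ranks_to_text(ranks, precond):
    """Generate tokens by picking, at each step, the token with the given rank."""
    context = list(precond)
    tokens = []
    for rank in ranks:
        tok = ranked_vocab(context)[rank]
        tokens.append(tok)
        context.append(tok)
    return tokens


def encrypt_message(secret, secret_precond, cover_precond):
    """Secret tokens -> ranks -> cover tokens."""
    return ranks_to_text(text_to_ranks(secret, secret_precond), cover_precond)


def decrypt_message(cover, secret_precond, cover_precond):
    """Cover tokens -> ranks -> secret tokens."""
    return ranks_to_text(text_to_ranks(cover, cover_precond), secret_precond)
```

Both parties must share the same model and the same two preconditionings, since the ranks are only recoverable when the contexts evolve identically on both sides.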

Testing set

If you want to have a look at the testing set on which we evaluated our models and generated our cover texts, you can freely access it here.

Folder structure

Raw articles new: Contains all the raw articles considered as the secrets that we want to share. These articles were selected from the DailyMail corpus.
Preconditionings new: Contains the preconditioning associated with each article. Each article that we want to cover has its own specific preconditioning.
Generated texts <model_name>: Contains the texts generated with the model <model_name>, on which we ran our evaluations.
