Implementation of Text Steganography using two differently conditioned language models.
Our protocol supports several kinds of models, such as:
- gpt2: small, medium, large, xl
- BERT
- RoBERTa
You can play with our protocol through the notebook DemoStego.ipynb. You can either run it on your local machine if you have enough computational power, or use the Colab version for a quick demo.
We advise you to have at least 8 GB of RAM and ideally a properly working GPU. First, run:
pip install -r requirements.txt
After that:
jupyter-notebook
and select the DemoStego.ipynb notebook.
After this step, you can just follow the tutorial explained in the notebook.
If you use a virtual environment, please add your virtual env to the notebook with the following command:
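Since a working GPU is recommended, a quick check like the one below can tell you whether one is visible before you start the notebook. This is only a sketch: it assumes the models run on PyTorch, which this README does not state, and pick_device is a hypothetical helper name, not part of the repository.

```python
import importlib.util


def pick_device() -> str:
    """Return "cuda" if PyTorch is installed and sees a GPU, otherwise "cpu"."""
    # Assumption: the notebook relies on PyTorch; the README does not confirm this.
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    import torch

    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(f"Models would run on: {pick_device()}")
```

If this reports cpu, the Colab version is probably the more comfortable option.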
ipython kernel install --name "your-venv" --user
If you do not have enough computational power and want to have a quick try, please refer to our (user-friendly) Google Colab shared notebook.
We explain all the experiments that we did for adversarial attacks in this notebook.
First of all, you need to get the testing data from this public folder. After that:
- Log into Google Drive.
- In Google Drive, make a folder named data.
Inside the notebook:
- Mount the Drive in the notebook.
- cd to /content/drive/MyDrive/data
- Unzip the folder on your Drive (using !unzip <folder_name> inside the Colab notebook).
- Replace the path by /content/drive/MyDrive/data/
- cd again to /
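The Drive steps above can be sketched in Python as follows. This is a minimal sketch, not the code of the shared notebook: mount_drive and unpack_test_data are hypothetical helper names, and the archive name is left as a placeholder because it depends on what you downloaded from the public folder.

```python
import os
import zipfile


def mount_drive(mount_point: str = "/content/drive") -> bool:
    """Mount Google Drive when running inside Colab; return False elsewhere."""
    try:
        from google.colab import drive  # only importable inside Colab
    except ImportError:
        return False
    drive.mount(mount_point)
    return True


def unpack_test_data(data_dir: str, archive_name: str) -> list[str]:
    """Unzip archive_name inside data_dir and return the names of its entries."""
    previous_dir = os.getcwd()
    os.chdir(data_dir)  # same effect as "cd /content/drive/MyDrive/data"
    try:
        with zipfile.ZipFile(archive_name) as archive:
            archive.extractall(".")
            return archive.namelist()
    finally:
        # Restore the earlier working directory; in the notebook the steps
        # above go back to "/" explicitly instead.
        os.chdir(previous_dir)


if __name__ == "__main__":
    if mount_drive():
        # Placeholder: replace with the name of the archive you uploaded.
        unpack_test_data("/content/drive/MyDrive/data", "<folder_name>")
```

Outside Colab, mount_drive returns False and nothing is unpacked, so the snippet is safe to run locally.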
Then just follow the steps inside the notebook.
The whole protocol is implemented in the files stored inside the folder wrappedCode.
The important functions for building the corresponding model are defined in the file createModel.py. There is a specific function for each type of model, and the main function calls the correct one for the requested model type.
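The "one builder per model type, one main function that dispatches" layout described above could look like the sketch below. All names here (build_gpt2, build_bert, build_roberta, create_model) are hypothetical and the builders are stubs that return a description instead of loading weights; the actual functions in createModel.py may be organized differently.

```python
from typing import Callable, Dict, Optional


# Stub builders: real ones would load the pretrained model and tokenizer.
def build_gpt2(size: Optional[str]) -> dict:
    return {"family": "gpt2", "size": size or "small"}


def build_bert(size: Optional[str]) -> dict:
    return {"family": "bert", "size": size}


def build_roberta(size: Optional[str]) -> dict:
    return {"family": "roberta", "size": size}


# Lookup table from model family to its dedicated builder.
BUILDERS: Dict[str, Callable[[Optional[str]], dict]] = {
    "gpt2": build_gpt2,
    "bert": build_bert,
    "roberta": build_roberta,
}


def create_model(family: str, size: Optional[str] = None) -> dict:
    """Dispatch to the builder registered for the requested model family."""
    try:
        builder = BUILDERS[family.lower()]
    except KeyError:
        raise ValueError(f"Unsupported model family: {family!r}") from None
    return builder(size)
```

A table of builders keeps the main function short and makes adding a new model family a one-line change.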
The function encryptMessage wraps all the functions defined in the file encryptionWrapped.py. Depending on the model you choose, a specific rank-generating function is called, along with its associated cover text generation function.
The function decryptMessage wraps all the functions defined in the file decryptionWrapped.py. Depending on the model you choose, there is a specific function for rank retrieval and secret generation.
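To illustrate how rank generation, cover text generation, rank retrieval and secret generation fit together, here is a toy round trip. It is a sketch under strong assumptions: the language model is replaced by a deterministic pseudo-random ranking of a ten-word vocabulary, and the names ranked_vocab, text_to_ranks, ranks_to_text, encrypt_message and decrypt_message are hypothetical, not the functions in encryptionWrapped.py or decryptionWrapped.py.

```python
import random

# Toy vocabulary; a real model would use its tokenizer's vocabulary.
VOCAB = ["the", "a", "cat", "dog", "sat", "ran", "on", "mat", "fast", "home"]


def ranked_vocab(conditioning: str, context: list[str]) -> list[str]:
    """Stand-in for a conditioned language model: a deterministic ranking
    of VOCAB that depends on the conditioning text and the tokens so far."""
    rng = random.Random(conditioning + "|" + " ".join(context))
    order = list(VOCAB)
    rng.shuffle(order)
    return order


def text_to_ranks(tokens: list[str], conditioning: str) -> list[int]:
    """Replace each token by its rank under the conditioned model."""
    ranks, context = [], []
    for token in tokens:
        ranks.append(ranked_vocab(conditioning, context).index(token))
        context.append(token)
    return ranks


def ranks_to_text(ranks: list[int], conditioning: str) -> list[str]:
    """Generate text by picking, at each step, the token with the given rank."""
    context: list[str] = []
    for rank in ranks:
        context.append(ranked_vocab(conditioning, context)[rank])
    return context


def encrypt_message(secret: list[str], secret_cond: str, cover_cond: str) -> list[str]:
    # Rank generation under one conditioning, cover generation under the other.
    return ranks_to_text(text_to_ranks(secret, secret_cond), cover_cond)


def decrypt_message(cover: list[str], secret_cond: str, cover_cond: str) -> list[str]:
    # Rank retrieval from the cover text, then secret generation.
    return ranks_to_text(text_to_ranks(cover, cover_cond), secret_cond)
```

Decryption recovers the secret only because both sides use the same model and the same two conditionings, so the rankings are reproduced exactly at every step.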
If you want to have a look at the testing set on which we evaluated our models and generated our cover texts, you can freely access it here.
- Raw articles new: Contains all the raw articles considered as the secret that we want to share. Those articles have been selected from the DailyMail corpus.
- Preconditionings new: Contains the preconditioning associated with every article. Each article that we want to cover has its own specific preconditioning.
- Generated texts <model_name>: Contains the texts generated with the model <model_name>, on which we ran our evaluations.