guess-master


Commits

List of commits on branch main.
  • bb83f51c4e647654a2c84bca248a240f1415c7ff - chore: more objects and small fixes (nnotentered committed a year ago, unverified)
  • f2432920cb9b388b2049f44847ca57971ab1cc21 - chore: stage some files (nnotentered committed a year ago, unverified)
  • c87d3e504a669403984cce42c8bb439c10569e0c - feat: various small changes (nnotentered committed a year ago, unverified)
  • a0a2a967d63bfdfb02b7f94a97f027bb84f7f315 - feat: more UI improvements (nnotentered committed a year ago, unverified)
  • 8ece725a99aab5133ddcb1f373bb05a7dd3ea0fb - feat: speech recognition (nnotentered committed a year ago, unverified)
  • 02011e797f659e606ade4411c334d7e51277bbd4 - feat: new visuals (nnotentered committed a year ago, unverified)

README


guess-master

guess-master is an attempt at using AI to build a voice and visual conversational tool for children with certain disabilities.

The main idea is that guess-master shows DALL-E-generated images to the user and asks them to guess what is shown. Each image contains a single object. The conversation and the choice of object are handled by a GPT-4 LLM.

Users interact with guess-master via voice, and guess-master responds with voice as well. Text-to-speech is handled by an ElevenLabs model.

An interesting aspect is that GPT-4 is asked to generate JSON documents in which the object for the image is returned in a separate field, so the object is never mentioned in the text shown to the user.
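
For illustration, here is a minimal sketch of what parsing such a reply could look like; the field names are an assumption, chosen to mirror the server API described below:

# A hypothetical GPT-4 reply; "text" and "new_object" mirror the server API
# below, but the internal format is an assumption.
import json

assistant_message = '{"text": "Yes, this is a cat! What do you think the next object is?", "new_object": "big orange balloon"}'

reply = json.loads(assistant_message)
spoken_text = reply["text"]              # safe to show the user; never names the object
hidden_object = reply.get("new_object")  # kept server-side to generate the next image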

Running

  1. Put your OpenAI API key in the conf/openai_api_key.txt file.
  2. Put your ElevenLabs API key in the conf/elevenlabs_api_key.txt file.
  3. Create a voice on the ElevenLabs website and get its ID.
  4. Put your ElevenLabs voice ID in the conf/voice_id.txt file.
  5. Create a virtual environment, e.g. python3 -m venv path_to_venv.
  6. Activate it, e.g. source path_to_venv/bin/activate.
  7. Install the requirements with pip install -r requirements.txt.
  8. Optionally, change the assistant instructions in conf/assistant_instructions.txt.
  9. Create an OpenAI assistant with python ./create_assistant.py.
  10. Put a list of objects (newline-separated) in conf/objets.txt.
  11. Create the folder db/images (see the folder sketch after this list).
  12. Generate the object images with python ./generate_object_images.py.
  13. Generate the sad-face/happy-face feedback images with python ./generate_feedback_images.py.
  14. Generate an initial voice message (e.g. in ElevenLabs) and save it as db/start.mp3.
  15. Run the server with flask --app server run -p 5002.
  16. Optionally, run the client simulator with python ./client_simulator.py, or interact with the server in another way (e.g. a web app).
  17. Optionally, open localhost:5002/index.html for a web interface.
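
After these steps, the configuration and data folders should look roughly like this (a sketch listing only the files and folders mentioned above; the scripts may create additional files):

conf/
    openai_api_key.txt
    elevenlabs_api_key.txt
    voice_id.txt
    assistant_instructions.txt
    objets.txt
db/
    images/          (filled by generate_object_images.py)
    start.mp3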

Server API

The server exposes two endpoints:

  • startThread - starts a fresh thread and requires no parameters
  • sendPrompt - sends a user prompt in an existing thread and requires the following parameters:
    • thread_id - the ID of the thread in which to send the prompt
    • prompt - the prompt text

Both endpoints return a JSON document with the following fields:

  • thread_id - the thread ID the response from guess-master is for
  • text - the text response from guess-master
  • audio - a base64-encoded MP3 file of the text from guess-master
  • new_object - a description of a new object; present only when this is the first message in a thread or when the user guessed the previous object and a new one was generated
  • new_image - a URL to an image of the new_object; present only when new_object is present

One thing to note is that the server itself is stateless; all conversation state is kept in the OpenAI thread.

Example requests and responses look like this (a minimal Python client sketch follows them):

// startThread request
{}

// startThread response
{
    "thread_id": "abcddd",
    "text": "Hi! Let's play! What do you think the object in this picture is?",
    "audio": "aabbbbb....",
    "new_object": "big orange balloon",
    "new_image": "http://...."
}

// sendPrompt request with a correct guess
{
    "thread_id": "abcddd",
    "prompt": "I think this is a cat!"
}

// sendPrompt response for a correct guess
{
    "thread_id": "abcddd",
    "text": "Yes, this is a cat! What do you think the next object is?",
    "audio": "aabbbbb....",
    "new_object": "big orange baloon",
    "new_image": "http://...."
}
// sendPrompt request with an incorrect guess 
{
    "thread_id": "abcddd",
    "prompt": "I think this is a cat!"
}

// sendPrompt response for an incorrect guess and a hint
{
    "thread_id": "abcddd",
    "text": "Nope, this is not a cat. It is a much bigger animal. What do you think it is?",
    "audio": "aabbbbb....",
}
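
The endpoints can also be exercised with a short script. The sketch below assumes the endpoints accept HTTP POST requests with JSON bodies; the request method and encoding are assumptions, so see client_simulator.py for the actual client:

import base64
import requests

BASE_URL = "http://localhost:5002"

# Start a fresh thread; startThread takes no parameters.
start = requests.post(f"{BASE_URL}/startThread", json={}).json()
thread_id = start["thread_id"]

# Send a guess in the same thread.
resp = requests.post(
    f"{BASE_URL}/sendPrompt",
    json={"thread_id": thread_id, "prompt": "I think this is a cat!"},
).json()

print(resp["text"])
if "new_object" in resp:  # the guess was correct and a new object was generated
    print("Next image:", resp["new_image"])

# "audio" is a base64-encoded MP3; decode it to a playable file.
with open("reply.mp3", "wb") as f:
    f.write(base64.b64decode(resp["audio"]))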

Frontend

The frontend uses voice recognition by default, which works only in Chrome, Edge, and Safari. However, the chat history, together with a text field and a submit button, can be brought up by clicking the "Слушам те!" ("I'm listening!") label on the main screen of the game.