ComfyUI Middleware

Introduction

This project serves as middleware for ComfyUI, designed to wrap various workflows into accessible APIs. It primarily supports my undergraduate thesis project, which aims to empower a narrative card game with multimodal AI capabilities.

Motivation

With the evolution of the gaming industry, especially narrative games, the demand for immersive experiences has significantly increased. Game developers face substantial content creation requirements, which traditionally involve considerable human and time resources. Multimodal AI technology can significantly optimize this process, providing more efficient and flexible development tools.

This research aims to build a comprehensive model workflow using Python and TypeScript, integrating ms-swift and ComfyUI. By leveraging large language models like Qwen2.5Coder-7B and Qwen2.5-7B-Instruct, as well as text-to-image models like SDXL or Flux, we provide customized multimodal API tools for narrative games. The goal is to enhance the speed and quality of game content generation, thereby reducing development costs and improving efficiency.

Objectives

Build and Train Custom Models: Train and optimize large language models and text-to-image models using ms-swift and sd-scripts based on specific game requirements.
Optimize Multimodal Workflows: Use ComfyUI to build efficient image generation workflows, and integrate them with NodeJS (Express, Nodemon, Axios) for API encapsulation, enabling efficient reuse and rapid generation.
API Integration and Performance Optimization: Optimize API call efficiency through POST requests and Promise concurrency control, and integrate with mainstream game engines like Unity and Unreal Engine (UE).
Evaluation and Validation: Assess model generation effects using quality evaluation metrics (SSIM, FID, CLIPScore) and LLM Bench, and validate the project's practical impact.

Installation

Requirements

Steps

Clone the repository:

git clone https://github.com/neverbiasu/comfyui-midware.git
cd comfyui-midware

Install dependencies:
```
npm install
```
Set up environment variables: Create a .env file in the root directory and add the following:
```
COMFYUI_DIR=/path/to/comfyui
```
Run the application:
```
npm run dev
```

Exposing Local Server

Cloudflare Tunnel allows you to expose your local server to the public.

Installation

Download and Install: Get Cloudflare Tunnel from the Cloudflare Tunnel website.

Configuration

Login to Cloudflare:

cloudflared login

Authenticate with your Cloudflare account in PowerShell.

Start Tunnel:

cloudflared tunnel --url http://localhost:3000

Run the Cloudflare Tunnel command to generate a public URL for your local server.

API Usage

Available Endpoints

Style Transfer: /api/style_transfer
Text to Portrait: /api/text_to_portrait

Example Request

Style Transfer

curl -X POST http://localhost:3000/api/style_transfer \
  -F "content=@path/to/content.jpg" \
  -F "style=@path/to/style.jpg" \
  -F "positivePrompt=positive prompt text" \
  -F "negativePrompt=negative prompt text"

Text to Portrait

curl -X POST http://localhost:3000/api/text_to_portrait \
  -F "text=a girl"

Conclusion

This project demonstrates the application of multimodal AI in narrative game development, providing a comprehensive workflow and API tools to enhance game content generation. By leveraging large language models and text-to-image models, we aim to improve the efficiency and quality of game development, offering a valuable toolset for developers.

ComfyUI-Midware

Commits

feat: Add text to scene API

feat: Add random_seed

refactor: Extract a sendImage function

docs: Init README

chore: Fix some issues

feat: Init text to portrait api

README

ComfyUI Middleware

Introduction

Motivation

Objectives

Installation

Requirements

Steps

Exposing Local Server

Installation

Configuration

API Usage

Available Endpoints

Example Request

Style Transfer

Text to Portrait

Conclusion