ComfyUI-Midware

Public repository: 2 stars, 0 forks, 0 issues

Commits

List of commits on branch main.
0fa4ea907d7eb4e7edacb6c8f8b05830840bdccd

feat: Add text to scene API

neverbiasu committed 2 months ago

50a34a5cc0923cd71965d9bacb95201963119238

feat: Add random_seed

neverbiasu committed 2 months ago

5ab038dceb8070f60e5dc9742a9ae109f7f831f8

refactor: Extract a sendImage function

neverbiasu committed 2 months ago

523328e1f70df2948604568f5f2adf4c725303cf

docs: Init README

neverbiasu committed 2 months ago

4b810f31ac9ceb84198f15bcbaa837977046204e

chore: Fix some issues

neverbiasu committed 2 months ago

e8f2e44e21bb7cea3e200dec0b521193189cd485

feat: Init text to portrait api

neverbiasu committed 2 months ago

README

The README file for this repository.

ComfyUI Middleware

Introduction

This project serves as middleware for ComfyUI, designed to wrap various workflows into accessible APIs. It primarily supports my undergraduate thesis project, which aims to empower a narrative card game with multimodal AI capabilities.

Motivation

As the gaming industry evolves, and narrative games in particular, demand for immersive experiences has grown. Game developers face heavy content creation requirements, which traditionally consume considerable staff time and budget. Multimodal AI technology can streamline this process and give developers more efficient, flexible tools.

This research aims to build a complete model workflow in Python and TypeScript that integrates ms-swift and ComfyUI. Using large language models such as Qwen2.5-Coder-7B and Qwen2.5-7B-Instruct, together with text-to-image models such as SDXL or Flux, it provides customized multimodal API tools for narrative games. The goal is to increase the speed and quality of game content generation and thereby reduce development costs.

Objectives

  1. Build and Train Custom Models: Train and optimize large language models with ms-swift and text-to-image models with sd-scripts, based on specific game requirements.
  2. Optimize Multimodal Workflows: Use ComfyUI to build efficient image generation workflows, and wrap them as APIs with Node.js (Express, Nodemon, Axios) so they can be reused and called quickly.
  3. API Integration and Performance Optimization: Optimize API call efficiency through POST requests and Promise concurrency control, and integrate with mainstream game engines such as Unity and Unreal Engine (UE).
  4. Evaluation and Validation: Assess generation quality using image metrics (SSIM, FID, CLIPScore) and LLM Bench, and validate the project's practical impact.
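
Objective 3 mentions concurrency control for API calls. The middleware itself is a Node.js project, so the following is not its implementation; it is a minimal Python sketch of the same idea using an asyncio semaphore in place of a Promise pool. The names `run_limited`, `InFlightCounter`, and `fake_generate` are hypothetical, and `fake_generate` only simulates a request rather than calling the real API.

```python
import asyncio


async def run_limited(jobs, limit):
    """Run coroutine factories with at most `limit` running at once.

    Results are returned in the same order as `jobs`.
    """
    semaphore = asyncio.Semaphore(limit)

    async def guarded(job):
        # Wait for a free slot before starting the job.
        async with semaphore:
            return await job()

    return await asyncio.gather(*(guarded(job) for job in jobs))


class InFlightCounter:
    """Tracks how many simulated requests run at the same time."""

    def __init__(self):
        self.current = 0
        self.peak = 0

    async def fake_generate(self, prompt):
        # Hypothetical stand-in for a POST request to the middleware.
        self.current += 1
        self.peak = max(self.peak, self.current)
        await asyncio.sleep(0.01)
        self.current -= 1
        return f"image for {prompt}"


def demo(prompts, limit):
    """Run all prompts under the limit and report the peak concurrency."""
    counter = InFlightCounter()
    jobs = [lambda p=p: counter.fake_generate(p) for p in prompts]
    results = asyncio.run(run_limited(jobs, limit))
    return results, counter.peak
```

Calling `demo(["a girl", "a knight", "a castle"], 2)` returns the results in prompt order together with the highest number of simulated requests that were in flight at once, which never exceeds the limit. Capping concurrency this way is useful when a single ComfyUI instance processes generation jobs and flooding it with requests would not make them finish sooner.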

Installation

Requirements

Steps

  1. Clone the repository:

    git clone https://github.com/neverbiasu/comfyui-midware.git
    cd comfyui-midware
  2. Install dependencies:

    npm install
  3. Set up environment variables: Create a .env file in the root directory and add the following:

    COMFYUI_DIR=/path/to/comfyui
  4. Run the application:

    npm run dev
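
Before running the application, it can help to confirm that the .env file from step 3 is readable and that COMFYUI_DIR points to a real directory. The sketch below is one way to do that check in Python. It assumes the .env file uses plain KEY=VALUE lines; the helper names `parse_env` and `check_comfyui_dir` are hypothetical and not part of this repository.

```python
import os


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def check_comfyui_dir(env_text):
    """Return a list of problems with COMFYUI_DIR (empty if it looks fine)."""
    path = parse_env(env_text).get("COMFYUI_DIR", "")
    if not path:
        return ["COMFYUI_DIR is not set"]
    if not os.path.isdir(path):
        return [f"COMFYUI_DIR does not point to a directory: {path}"]
    return []
```

A typical use would be `check_comfyui_dir(open(".env").read())` from the repository root, printing any returned messages before starting the server.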

Exposing Local Server

Cloudflare Tunnel lets you expose your local server to the public internet through a generated URL.

Installation

  1. Download and install: Get the cloudflared client from the Cloudflare Tunnel website.

Configuration

  1. Log in to Cloudflare: Authenticate with your Cloudflare account in PowerShell.

    cloudflared login
  2. Start the tunnel: Run the Cloudflare Tunnel command to generate a public URL for your local server.

    cloudflared tunnel --url http://localhost:3000

API Usage

Available Endpoints

  • Style Transfer: /api/style_transfer
  • Text to Portrait: /api/text_to_portrait

Example Requests

Style Transfer

curl -X POST http://localhost:3000/api/style_transfer \
  -F "content=@path/to/content.jpg" \
  -F "style=@path/to/style.jpg" \
  -F "positivePrompt=positive prompt text" \
  -F "negativePrompt=negative prompt text"

Text to Portrait

curl -X POST http://localhost:3000/api/text_to_portrait \
  -F "text=a girl"
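
The curl commands above can also be issued from a script. The following is a minimal Python sketch that builds equivalent multipart/form-data POST requests using only the standard library. The endpoint paths and field names are taken from the curl examples; the helper names (`encode_multipart`, `build_request`, `text_to_portrait_request`, `style_transfer_request`) are hypothetical. The README does not document the response format, so the sketch stops at building the request and leaves response handling to the caller.

```python
import mimetypes
import urllib.request
import uuid


def encode_multipart(fields, files=None):
    """Encode text fields and files as multipart/form-data.

    fields: dict of name -> text value
    files:  dict of name -> (filename, bytes)
    Returns (body_bytes, content_type_header).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode("utf-8")
        )
    for name, (filename, data) in (files or {}).items():
        mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
        header = (
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'
            f'Content-Type: {mime}\r\n\r\n'
        ).encode("utf-8")
        parts.append(header + data + b"\r\n")
    # Closing boundary marks the end of the form.
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_request(base_url, endpoint, fields, files=None):
    """Build (but do not send) a POST request for one of the endpoints."""
    body, content_type = encode_multipart(fields, files)
    return urllib.request.Request(
        base_url.rstrip("/") + endpoint,
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )


def text_to_portrait_request(text, base_url="http://localhost:3000"):
    return build_request(base_url, "/api/text_to_portrait", {"text": text})


def style_transfer_request(content, style, positive, negative,
                           base_url="http://localhost:3000"):
    # content and style are (filename, bytes) tuples.
    return build_request(
        base_url,
        "/api/style_transfer",
        {"positivePrompt": positive, "negativePrompt": negative},
        {"content": content, "style": style},
    )
```

To send a request, pass it to `urllib.request.urlopen(...)` and read the raw bytes of the response. When the server is exposed through Cloudflare Tunnel, pass the generated public URL as `base_url` instead of the localhost default.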

Conclusion

This project demonstrates the application of multimodal AI in narrative game development, providing a comprehensive workflow and API tools to enhance game content generation. By leveraging large language models and text-to-image models, we aim to improve the efficiency and quality of game development, offering a valuable toolset for developers.