GitXplorerGitXplorer
a

sljit-gpu

public
5 stars
0 forks
2 issues

Commits

List of commits on branch master.
Unverified
f654dce9120bb8f17d31149a305efb9a3e63176e

Allow sljit_emit_label before sljit_emit_enter

aasuhan committed 9 years ago
Unverified
d5d5f2598bc48ee3277eafad18379a9ef54960a0

Create the IR builder in sljit_create_compiler

aasuhan committed 9 years ago
Unverified
b590081bc756d44b253dc973e34827b59b5c5eea

Update copyright notice

aasuhan committed 9 years ago
Unverified
622aef1635b0473dd04424e9a5778fbcfbaaf50c

Fix test with existing backend

aasuhan committed 9 years ago
Unverified
f5d2dc8956a08a236785f6cb0e585313886d7102

Converge to the canonical sljit API

aasuhan committed 9 years ago
Unverified
5eeb3b54ca677628256ce2b95ee02a947754646b

Update README.md

aasuhan committed 9 years ago

README

The README file for this repository.

sljit for GPUs

Goal

Make PCRE run (well) on GPUs.

Approach

Keep the regex translator as close to the original form as possible and add a backend to sljit which can target GPUs. Since there's no backwards compatibility at the binary level for GPUs, we're stuck with LLVM, for better or worse.

Status

Most of the instructions required by the regex translator are in place and most of sljit tests pass on x64 with this backend. We're only using x64 as a validation step. The LLVM IR we generate should run fine on the GPUs except for the overflow intrinsics. We currently rely on signed / unsigned overflow intrinsics to implement the sljit flags, which are the building blocks for implementing control flow in sljit. The regex translator only relies on control flow and doesn't actually care about flags. We'll add a custom LLVM pass to recognize the control flow patterns and replace flags usage with idiomatic LLVM control flow generation in order to unblock running on GPUs. Registers are implemented as alloca's for simplicity, which means we'll need to enable the SROA optimization pass for performance. Control flow is implemented through continuations and it'll also need some optimization work for competitive performance.

This project doesn't include any parallelization changes yet (but they will land), which are crucial for performance on GPUs; current focus is on correctness.

Non-goals

Replacing (or improving) any of the current backends is a non-goal. It's highly unlikely we'll invest a lot of effort into features which are not fundamentally required by the regex engine (floating point, self-modifying code, fast calls). We also assume that the strings are already in the GPU memory and stored contiguously; using GPUs as an "offload engine" over PCI Express is a non-goal.

Contributing

Pull requests are warmly welcomed and will be promptly reviewed. Fixes and improvements to the features required by the regex engine are preferred, but any quality contributions will be accepted.