sljit-gpu

Commits

List of commits on branch master.
825b04a42b2f4e541722a4e433c7e14274f2ab66

Move to sljit r313

aasuhan committed 9 years ago
c08ef51879dcb5a6ed7151549f9c0e94e3914e1a

Set llvm_func field to the entry function when compilation is done

aasuhan committed 9 years ago
cc651bfbf5ba1dcf882d5c89f87584b8e78ad453

Add NVVM IR to PTX backend

aasuhan committed 9 years ago
3d89bf8ede541a4879de674c91c69832e07c91c3

Restore sljit_free_code to original signature

aasuhan committed 9 years ago
67741ed3b0743fcc142dfda3cf0eeb1664489b33

Bring range_set_label in regexJIT back to original

aasuhan committed 9 years ago
d0f2b149684c0b26a99b8191ecdc2bfbe54feb67

Another API fix: go back to sljit_set_label

aasuhan committed 9 years ago

README

sljit for GPUs

Goal

Make PCRE run (well) on GPUs.

Approach

Keep the regex translator as close to the original form as possible and add a backend to sljit which can target GPUs. Since there's no backwards compatibility at the binary level for GPUs, we're stuck with LLVM, for better or worse.

Status

Most of the instructions required by the regex translator are in place, and most of the sljit tests pass on x64 with this backend. We use x64 only as a validation step. The LLVM IR we generate should run fine on GPUs except for the overflow intrinsics.

We currently rely on the signed and unsigned overflow intrinsics to implement the sljit flags, which are the building blocks of control flow in sljit. The regex translator relies only on control flow and doesn't actually care about the flags themselves. To unblock running on GPUs, we'll add a custom LLVM pass that recognizes the control flow patterns and replaces flag usage with idiomatic LLVM control flow.

Registers are implemented as allocas for simplicity, which means we'll need to enable the SROA optimization pass for performance. Control flow is implemented through continuations, which will also need some optimization work to be competitive.
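To make the flags issue concrete, here is a minimal C sketch of the two shapes involved: a comparison routed through overflow-derived flags, and the same comparison written directly. It is an illustration only, not code from this backend. The `flags_t` record and all function names are hypothetical, and the sketch assumes GCC or Clang, whose `__builtin_sub_overflow` is typically lowered by Clang to the LLVM overflow intrinsics.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag record; not the backend's actual representation. */
typedef struct {
    bool zero;     /* result was zero */
    bool carry;    /* unsigned overflow (a borrow, for subtraction) */
    bool overflow; /* signed overflow */
} flags_t;

/* Flag-setting subtraction built on the compiler's overflow builtins. */
static uint64_t sub_set_flags(uint64_t a, uint64_t b, flags_t *f)
{
    uint64_t udiff;
    int64_t sdiff;
    f->carry = __builtin_sub_overflow(a, b, &udiff);
    f->overflow = __builtin_sub_overflow((int64_t)a, (int64_t)b, &sdiff);
    f->zero = (udiff == 0);
    return udiff;
}

/* "Unsigned less" through flags: subtract, then test the borrow. */
static bool less_via_flags(uint64_t a, uint64_t b)
{
    flags_t f;
    (void)sub_set_flags(a, b, &f);
    return f.carry;
}

/* "Signed less" through flags: sign of the result XOR signed overflow. */
static bool sig_less_via_flags(int64_t a, int64_t b)
{
    flags_t f;
    uint64_t diff = sub_set_flags((uint64_t)a, (uint64_t)b, &f);
    bool negative = (diff >> 63) != 0;
    return negative != f.overflow;
}

/* The flag-free form: a plain comparison, with no overflow intrinsic. */
static bool less_direct(uint64_t a, uint64_t b) { return a < b; }
static bool sig_less_direct(int64_t a, int64_t b) { return a < b; }
```

The `*_via_flags` and `*_direct` functions return the same results for all inputs. A pass of the kind described above would presumably recognize the first shape and emit the second, since the regex translator consumes only the branch outcome and never the flags.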

This project doesn't include any parallelization changes yet. They are planned and are crucial for performance on GPUs, but the current focus is on correctness.

Non-goals

Replacing (or improving) any of the current backends is a non-goal. It's highly unlikely we'll invest much effort in features that are not fundamentally required by the regex engine (floating point, self-modifying code, fast calls). We also assume that the strings are already in GPU memory and stored contiguously; using GPUs as an "offload engine" over PCI Express is a non-goal.
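As an illustration of the "stored contiguously" assumption, one possible layout packs all strings back to back in a single buffer and uses an offsets array to mark the boundaries. This is a sketch under that assumption only. The README does not specify a layout, and `string_batch_t` and `batch_string` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical batch layout: strings packed back to back in one buffer.
 * offsets has count + 1 entries; string i occupies
 * data[offsets[i]] .. data[offsets[i + 1] - 1]. */
typedef struct {
    const char *data;
    const uint32_t *offsets;
    size_t count;
} string_batch_t;

/* Return a pointer to string i and store its length in *len.
 * The strings are not NUL-terminated individually. */
static const char *batch_string(const string_batch_t *b, size_t i, size_t *len)
{
    *len = (size_t)(b->offsets[i + 1] - b->offsets[i]);
    return b->data + b->offsets[i];
}
```

With a layout like this, a matcher needs only the batch and an index to locate its subject string, so no per-string pointers or host round trips are required once the buffer is resident on the device.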

Contributing

Pull requests are warmly welcomed and will be promptly reviewed. Fixes and improvements to the features required by the regex engine are preferred, but any quality contributions will be accepted.