perceus-for-ocaml-notes

Commits

List of commits on branch main.
Verified
7e1ddcd5314baa0a337e0c0f09c9e3d9994710d7

Update status-fall2022.md

11ntEgr8 committed 2 years ago
Verified
9b468ef0f6d533791b194a67c302e0824427ed52

Create status-fall2022.md

11ntEgr8 committed 2 years ago
Verified
31f195d4251f7b22dbc225dd5df7c602079acce2

Update closures.md

11ntEgr8 committed 2 years ago
Verified
9e54853df6cdc3cfed0f51ca95fc4b2e43e3d24f

Update README.md

11ntEgr8 committed 2 years ago
Verified
f320db6855117e501fa9c97b48dc6b0dcfc591d8

Update closures.md

11ntEgr8 committed 2 years ago
Verified
3d0f943359405c68c1efca787683526b1251266c

Update closures.md

11ntEgr8 committed 2 years ago

Perceus for OCaml (notes)

2022-10-03

  • Created repo
  • Edited dup/drop assembly to use fewer instructions
  • Discovered error while building system from scratch
    • compat-32 is set during the build, which crashes our system because we don't support ref-counting on 32-bit systems
    • Wosize_hd fails as a result
    • possible fix is to disable compat-32 for now

2022-09-18

  • Figured out what code Koka generates
    • Looked at src/refcount.c.s
  • Added a new primitive to read refcount
    • Remember to first run make coldstart before make coreall opt-core
    • Currently it just returns the entire object header
  • dup/drop asm codegen:
    • Mark rax, r10, r11 as destroyed
    • In asmcomp/amd64/emit.mlp, move the argument into rax and emit a call to an assembly routine defined in runtime/amd64.S
    • runtime/amd64.S has two new functions: caml_obj_rc_dup, caml_obj_rc_drop
    • Note: both routines check if the object is a block

2022-09-10

  • https://github.com/ocaml/ocaml/blob/4.14.0/asmcomp/selectgen.ml
    • instruction selection (cmm to machine-independent assembly)
  • status of adding dup/drop instruction:
    • parsetree: done
    • typedtree: done (may need to edit typechecking portion)
    • lambda: done
    • clambda: done
    • flambda: ignored
    • bytecode gen: ignored
    • cmm: done
    • selectgen: done
  • some concerns:
    • not sure what effect and coeffect mean in selectgen
      • selected arbitrary effect for dup/drop, will have to stare at it more to see if it breaks things
      • CSEgen has certain operations marked as "handled specially"; couldn't find the code that handles Ialloc and Ipoll for amd64
  • played around with register allocation
    • need to understand how the calling convention is implemented
    • tried this:
      • marked r10, r11 as destroyed in destroyed_at_oper
      • used r10, r11 destructively in Idup, Idrop
    • expected:
      • codegen should save r10, r11 on the stack
    • actual:
      • no saving is done?

2022-09-06

Plan of attack:

  1. Make it possible to write dup/drop as OCaml functions:

dup : 'a -> unit

drop : 'a -> unit

(later variants: e.g. a guaranteed-pointer drop, drop_ptr : 'a -> unit)

  2. These functions should become real instructions Idup / Idrop. Extend the destroyed_at_oper function (https://github.com/ocaml/ocaml/blob/trunk/asmcomp/amd64/proc.ml) and implement the emitted assembly.

  3. Write a map(xs,inc) example (or inlined sum) with that.

  4. Change the assembly generation:

    a. Ialloc should become a call to "rc_fast_malloc"; implement malloc_fast in assembly (https://github.com/ocaml/ocaml/blob/4.14.0/runtime/amd64.S)

    b. That assembly will just save/restore the needed registers, call "malloc", and then put the result in r15 (this is the inefficient version 😊).

    c. Emit assembly for Idup/Idrop; each calls "rc_checked_dup" / "rc_checked_drop". Checked dup does nothing; checked drop will call "free" if the rc was 0 (saving/restoring registers).

    d. There are multiple layers:
      (A) the assembly instructions generated for each instruction. For example, drop: "r0 <- p->refcount; if (r0 <= 0) call checked_drop else p->refcount = r0 - 1" (actually, check first if it is a pointer?).
      (B) the fast "rc_xxx" assembly routines that use an "all callee save" convention (like caml_gc does now).
      (C) the actual C routines, where we need to save the right registers.

  5. Add refcount to object headers in the runtime, and turn off garbage collection.

Instead of :

+--------+-------+-----+ 
| wosize | color | tag | 
+--------+-------+-----+ 
 63    10 9     8 7   0 

Use:

+--------+--------+-------+-----+ 
| rc     | wosize | color | tag | 
+--------+--------+-------+-----+ 

 63    31 30    10 9     8 7   0 

(probably use the profile info of 32 bits)
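The proposed layout can be sketched as plain bit arithmetic. This is only a model, not runtime code: it uses Int64 as a stand-in for the 64-bit header word, takes the field boundaries from the diagram above (rc in bits 63..31, wosize in 30..10, color in 9..8, tag in 7..0), and all function names are hypothetical.

```ocaml
(* Field widths and shifts, read off the diagram in the notes. *)
let tag_bits = 8
let color_bits = 2
let wosize_bits = 21
let color_shift = tag_bits                     (* 8 *)
let wosize_shift = color_shift + color_bits    (* 10 *)
let rc_shift = wosize_shift + wosize_bits      (* 31 *)

(* A mask with the low [bits] bits set. *)
let mask bits = Int64.(sub (shift_left 1L bits) 1L)

(* Pack the four fields into one header word.
   Assumes every argument already fits in its field. *)
let make_header ~rc ~wosize ~color ~tag =
  Int64.(
    logor (shift_left (of_int rc) rc_shift)
      (logor (shift_left (of_int wosize) wosize_shift)
         (logor (shift_left (of_int color) color_shift) (of_int tag))))

(* Field accessors: shift down, then mask off the higher fields. *)
let tag_of hd = Int64.(to_int (logand hd (mask tag_bits)))
let color_of hd =
  Int64.(to_int (logand (shift_right_logical hd color_shift) (mask color_bits)))
let wosize_of hd =
  Int64.(to_int (logand (shift_right_logical hd wosize_shift) (mask wosize_bits)))
let rc_of hd = Int64.(to_int (shift_right_logical hd rc_shift))
```

With this layout the refcount sits above bit 30, so a Wosize-style accessor has to mask it off rather than just shifting, as wosize_of does here.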

  6. Make this work reliably; add the edge cases, atomics, etc.

  7. Write "fast" malloc/free unwrappings and link to mimalloc.

  8. Do some initial perf measurements: this will tell us whether it can work at all.
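The inline dup/drop logic from the assembly-generation step (layer A) can be modelled in a few lines of OCaml. This is a sketch under stated assumptions, not the emitted code: the value and block types are hypothetical, checked_drop stands in for the slow path that would call "free", and a refcount of 0 is taken to mean "one remaining reference", as the notes imply.

```ocaml
(* A heap block with a refcount; [freed] records that the slow path ran. *)
type block = { mutable rc : int; mutable freed : bool }

(* A value is either an immediate or a pointer to a block. *)
type value = Imm of int | Ptr of block

(* Hypothetical slow path: stands in for rc_checked_drop calling free. *)
let checked_drop b = b.freed <- true

(* dup: do nothing for immediates, otherwise bump the count. *)
let dup = function
  | Imm _ -> ()
  | Ptr b -> b.rc <- b.rc + 1

(* drop: check for a pointer first, then either take the slow path
   (count already at 0) or decrement inline. *)
let drop = function
  | Imm _ -> ()
  | Ptr b -> if b.rc <= 0 then checked_drop b else b.rc <- b.rc - 1
```

For example, a fresh block with rc = 0 survives one dup followed by one drop, and is freed by the next drop.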

Keep calling convention clear:

We may reserve more registers for our fast Idup/Idrop/Ialloc by modifying https://github.com/ocaml/ocaml/blob/trunk/asmcomp/amd64/proc.ml

2022/09/05

  • Build without bootstrapping
    • First, before making any change to the compiler, get a stable build
      • make world
    • Now, add your changes
    • To build the native compiler (without bootstrapping)
      • make coreall opt-core
    • Using the native compiler
      • ./boot/ocamlrun ./ocamlopt -I ./stdlib <prog>

2022/08/25

  • Goal: Emit code to invoke custom C function for allocation

    • Naively calling a custom C function by editing emit.mlp doesn't seem like it's going to work. I need to understand exactly which registers to save across the C-OCaml interface
    • Here's what I tried (it doesn't compile; segfaults):
      • Add a file called refcnt.c that has a custom routine that prints something
      • Update amd64.S and add an assembly stub that invokes this routine (doesn't do calling convention stuff)
      • Edit amd64/emit.mlp to emit a call to this stub whenever an allocation instruction is found
  • Got something simple to work

    • Save/restore ALL registers
    • Make the call
    • Note: printing something to stdout in the custom routine breaks the build system. It messes with the generation of the .depend file. Use stderr :)

2022/08/24

  • Compilation pipeline
    • Parsetree -> Typedtree -> Lambda -> CLambda -> Cmm -> Mach -> code gen
    • Type information about blocks is lost when lowering from CLambda to Cmm
      • So, we cannot insert dups/drops in Cmm (or it will be a little more complicated)
    • Allocation in a separate heap
      • https://github.com/ocaml/ocaml/blob/4.14.0/runtime/caml/address_class.h#L16
        • Classification of addresses
        • Explains page table, naked pointers
      • Proposal:
        • Similar to kk_lib, use malloc/mimalloc to create and manage a separate heap (not managed by GC)
        • Compile allocation instruction to allocate in this new heap
        • Compile drop instruction (which needs to be added) to deallocate in this new heap
  • Changing the obj header format

2022/08/22

  • How are the GC assembly snippets generated?
    • Does it merge allocations?

    • A special register which saves the allocation pointer is reserved for each architecture

      • r15 for amd64
      • Grepping for "allocation pointer" in the asmcomp directory will list the others
      • Declared in asmcomp/{architecture}/proc.ml
    • https://github.com/ocaml/ocaml/blob/trunk/asmcomp/amd64/emit.mlp#L614-L643

      • Generation of GC assembly for amd64
      • n is not a constant; depends on the value set in the Ialloc instruction
      • GC is only called at allocation and polling sites
        • Polling site determined by the Ipoll instruction
    • Calling C from OCaml (links specific to amd64)

      • PREPARE_FOR_C_CALL + C_call
      • Stack switching (relevant only for 5.0)
      • General structure
        • Save regs (typically alloc ptr and any scratch regs you want to clobber)
        • Switch to C stack
        • Make the call
        • Switch to OCaml stack
        • Restore regs
      • Issue: ref counting functions will be invoked frequently, making a C call the naïve way is expensive!
        • Compile ref counting functions to use a subset of registers, and then only save those regs before making the C call
        • Inline ref count instructions?
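The allocation fast path described above (allocation pointer kept in a reserved register, GC invoked only when the minor heap runs out) can be modelled as a toy bump allocator. This is a simplified sketch, not the code in emit.mlp: addresses are word indices, call_gc is a hypothetical stand-in that just empties the heap, and the record fields are illustrative names.

```ocaml
(* Toy model of the minor-heap fast path. Addresses are word indices. *)
type minor_heap = {
  top : int;                 (* highest address; the heap grows downward *)
  limit : int;               (* stands in for the young limit *)
  mutable alloc_ptr : int;   (* stands in for r15 on amd64 *)
  mutable gc_calls : int;    (* how often the slow path ran *)
}

let create ~words = { top = words; limit = 0; alloc_ptr = words; gc_calls = 0 }

(* Hypothetical stand-in for the GC call: it simply empties the heap. *)
let call_gc h =
  h.gc_calls <- h.gc_calls + 1;
  h.alloc_ptr <- h.top

(* Allocate a block with [wosize] fields plus one header word. *)
let rec alloc h ~wosize =
  let n = wosize + 1 in
  if n > h.top - h.limit then invalid_arg "alloc: block larger than heap";
  let p = h.alloc_ptr - n in
  if p < h.limit then begin
    (* Slow path: collect, then retry the allocation. *)
    call_gc h;
    alloc h ~wosize
  end else begin
    (* Fast path: one subtraction and one comparison. *)
    h.alloc_ptr <- p;
    p + 1  (* address of the first field, just past the header *)
  end
```

The size n varies per allocation site, which matches the note that n depends on the value carried by the Ialloc instruction.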

2022/08/10

2022/08/09

  • Read the "Compiler and Runtime System" section of Real World OCaml
  • Memory representation
    • Blocks vs values, pointer tagging
    • Header format
      • Block size: 22 bits (32-bit platforms) or 54 bits (64-bit platforms)
      • Color: 2 bit
      • Tag: 8 bit
    • Opts
      • Float-array
      • Variants without payload
    • Defs: runtime/caml/mlvalues.h
  • IR
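The block-versus-immediate distinction above can be observed from OCaml itself through the standard Obj module. This is a small exploratory sketch; the shape type and the describe helper are made up for illustration, and the float-array line assumes the default configuration with flat float arrays enabled.

```ocaml
(* A variant with one constant constructor and one with a payload. *)
type shape = Point | Circle of float

(* Report how the runtime represents a value, using the Obj module. *)
let describe x =
  let r = Obj.repr x in
  if Obj.is_int r then "immediate"
  else Printf.sprintf "block tag=%d size=%d" (Obj.tag r) (Obj.size r)

let () =
  print_endline (describe 42);              (* an int is an immediate *)
  print_endline (describe Point);           (* so is a payload-free constructor *)
  print_endline (describe (Circle 1.0));    (* a constructor with a payload is a block *)
  print_endline (describe [| 1.0; 2.0 |])   (* float arrays get their own tag *)
```

The same is-it-a-block test is what a dup/drop routine needs before touching a header, since immediates have none.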

2022/08/08

Test program (lifted from https://ocaml.org/docs/garbage-collection)

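(* Logistic map: applies x -> r * x * (1 - x) to x_init, i - 1 times. *)
let rec iterate r x_init i =
  if i = 1 then x_init
  else
    let x = iterate r x_init (i - 1) in
    r *. x *. (1.0 -. x)

let () =
  Random.self_init ();
  for x = 0 to 100 do
    let r = 4.0 *. float_of_int x /. 640.0 in
    for i = 0 to 39 do
      let x_init = Random.float 1.0 in
      let x_final = iterate r x_init 500 in
      (* The result is unused; discard it explicitly to avoid an unused-variable warning. *)
      ignore (int_of_float (x_final *. 480.))
    done
  done;
  (* Print GC statistics for the run. *)
  Gc.print_stat stdout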