uniset-snapshots

Commits

List of commits on branch master.
e2e80c80a0fcb903b7a7aa345c0632908f77fbb5 (verified)
snapshot of project "uniset", label t20240911
ThomasDickey committed 4 months ago

56a63761ec5876370e2e6a7f8051e9ff19ebabec (verified)
snapshot of project "uniset", label t20240704
ThomasDickey committed 6 months ago

83ca7172f560ba70a41c0d5037b6e2b80c7555a4 (verified)
snapshot of project "uniset", label t20230925
ThomasDickey committed a year ago

dc6c2131af92cc7c2d04d8788e2a44cbbaa1f94b (verified)
snapshot of project "uniset", label t20220923
ThomasDickey committed 2 years ago

3c3a719f8acc062dfa07f4e5ac4b7026c72b8d65 (unverified)
snapshot of project "uniset", label t20220316
ThomasDickey committed 3 years ago

6f4082400da3fab3a9fc2e701cad6813efc83a55 (unverified)
snapshot of project "uniset", label t20220213
ThomasDickey committed 3 years ago

README

-- $Id: README,v 1.1 2019/11/24 13:21:51 tom Exp $

Copyright 2019, Thomas E. Dickey

These scripts (and the associated data) are used for periodic updates to xterm's wcwidth.c file. A comment in the file refers to these scripts:

 * Originally added to xterm in 2000 (patch #141), there were a couple of
 * updates from Kuhn until 2005 (patch #202), renaming entrypoints and applying
 * data from Unicode.org (e.g., 3.2, 4.0, 4.1.0).  The Unicode data is
 * transformed into tables in this file by a script "uniset" written by Kuhn.
 *
 * While Kuhn implemented the original CJK variant, it was unused by xterm
 * until Jungshik Shin used it in 2002 to implement the -cjk_width command-line
 * option.
 *
 * Kuhn added a check for the vertical forms block (double-width) in 2007;
 * other updates were derived from the Unicode.org data (release 5.0).

See 00README for Kuhn's original description of the scripts.

The uniset scripts transform the Unicode data into a table of ranges for the character-values, which (with a few special cases) gives xterm a portable method for obtaining character widths. At runtime, xterm can be told to use these tables in preference to (possibly old) system tables of character widths.
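To illustrate how a table of ranges can answer "is this character in the set?", here is a minimal sketch of a binary search over sorted, non-overlapping intervals. It is not taken from xterm. The names in_table, sample_combining and sample_is_combining are hypothetical, and the three ranges are a hand-picked excerpt of well-known combining-mark blocks, not output of the uniset scripts.

```c
#include <stddef.h>

/* Inclusive range of code points, in the style of the generated tables. */
struct interval {
    unsigned long first;
    unsigned long last;
};

/*
 * Illustrative excerpt only (an assumption for this sketch, not uniset
 * output).  Entries must be sorted and must not overlap.
 */
static const struct interval sample_combining[] = {
    { 0x0300, 0x036F },   /* Combining Diacritical Marks */
    { 0x0483, 0x0489 },   /* Cyrillic combining marks */
    { 0x20D0, 0x20F0 },   /* Combining Diacritical Marks for Symbols */
};

/* Binary search: returns 1 if ucs lies inside any interval, else 0. */
static int in_table(unsigned long ucs,
                    const struct interval *table, size_t count)
{
    size_t lo = 0;
    size_t hi = count;          /* search the half-open range [lo, hi) */

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (ucs > table[mid].last)
            lo = mid + 1;       /* ucs is above this interval */
        else if (ucs < table[mid].first)
            hi = mid;           /* ucs is below this interval */
        else
            return 1;           /* first <= ucs <= last */
    }
    return 0;
}

/* Hypothetical convenience wrapper around the sample table. */
int sample_is_combining(unsigned long ucs)
{
    return in_table(ucs, sample_combining,
                    sizeof(sample_combining) / sizeof(sample_combining[0]));
}
```

With this excerpt, sample_is_combining(0x0301) is nonzero (COMBINING ACUTE ACCENT) and sample_is_combining(0x0041) is zero (LATIN CAPITAL LETTER A). Storing ranges rather than one entry per character keeps the tables small, and the lookup cost grows only with the logarithm of the number of ranges.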

When a new release of the Unicode data is available, I get a copy of it. For instance, the last 2019 release was 12.1.0, here:

ftp://ftp.unicode.org/Public/12.1.0/ucd/UCD.zip

Unzipping that into "this" directory, and running

make clean
make

gives these chunks which can be used to replace corresponding text in xterm's wcwidth.c file:

uniset.out: static const struct interval combining[] = {

uniset_cjk.out: static const struct interval ambiguous[] = {

uniset_dbl.out: static const struct interval doublewidth[] = {

uniset_unk.out: static const struct interval unknowns[] = {
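As a rough picture of how four such tables might be consulted together, the sketch below maps a code point to a column width. It is an assumption-laden illustration, not xterm's code: the function sketch_width, the helper cmp_interval, the IN_TABLE macro, the order of the checks, and the return values are all hypothetical. Each table holds a single hand-picked range instead of the generated data, and control characters are not handled.

```c
#include <stdlib.h>

struct interval {
    unsigned long first;
    unsigned long last;
};

/* One-entry excerpts for illustration; real contents come from the *.out files. */
static const struct interval combining[]   = { { 0x0300, 0x036F } }; /* diacritical marks */
static const struct interval ambiguous[]   = { { 0x00A1, 0x00A1 } }; /* inverted exclamation mark */
static const struct interval doublewidth[] = { { 0x1100, 0x115F } }; /* Hangul Jamo initial consonants */
static const struct interval unknowns[]    = { { 0x0378, 0x0379 } }; /* unassigned code points */

/* bsearch() comparator: the key is a code point, the element is an interval. */
static int cmp_interval(const void *key, const void *elem)
{
    unsigned long ucs = *(const unsigned long *) key;
    const struct interval *range = (const struct interval *) elem;

    if (ucs < range->first)
        return -1;
    if (ucs > range->last)
        return 1;
    return 0;
}

/* Nonzero if the lvalue ucs falls inside the array t (t must be a true array). */
#define IN_TABLE(ucs, t) \
    (bsearch(&(ucs), (t), sizeof(t) / sizeof((t)[0]), \
             sizeof((t)[0]), cmp_interval) != NULL)

/*
 * Hypothetical width function: -1 for unknown, 0 for combining,
 * 2 for double-width (and for ambiguous characters when cjk is set),
 * 1 otherwise.
 */
int sketch_width(unsigned long ucs, int cjk)
{
    if (IN_TABLE(ucs, unknowns))
        return -1;
    if (IN_TABLE(ucs, combining))
        return 0;
    if (IN_TABLE(ucs, doublewidth))
        return 2;
    if (cjk && IN_TABLE(ucs, ambiguous))
        return 2;
    return 1;
}
```

In this sketch the cjk flag only changes the result for characters in the ambiguous table, which is the kind of switch a -cjk_width style option would need.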

NOTE: there are other data files in this directory which are of historical interest, and are retained here because they are less accessible than before.