Fix potential conversion error

ayaka14732 · ayaka14732 · commit 979419e3d104 · 2020-10-04T23:16:18.000+08:00
And update opencc-data to 1.0.5
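
The change to build/main.py replaces plain fixed-size chunking with chunking that never mixes entries of different key lengths in one subtable. The sketch below illustrates the difference on made-up data. The names `chunks`, `chunks_by_key`, `source_len` and the sample entries are hypothetical stand-ins for `grouper`, `grouper2` and the real conversion tables, and `itertools.islice` is used here in place of the explicit `next()` loop in the patch, so treat it as an illustration rather than the code that ships.

```python
from itertools import groupby, islice


def chunks(iterable, n):
    """Yield lists of at most n items taken in order from iterable."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, n))
        if not chunk:
            return
        yield chunk


def chunks_by_key(iterable, n, key):
    """Yield lists of at most n consecutive items that share the same key."""
    for _, group in groupby(iterable, key=key):
        # Each group is fully consumed before groupby advances to the next one.
        yield from chunks(group, n)


def source_len(item):
    """Length of the source side of a (source, target) pair."""
    return len(item[0])


# Hypothetical entries, already sorted from longest to shortest source.
entries = [('ABC', 'x'), ('DEF', 'y'), ('GHI', 'z'), ('JK', 'w'), ('LM', 'v')]

plain = list(chunks(entries, 2))
by_len = list(chunks_by_key(entries, 2, source_len))

print(plain)   # the second chunk holds both a 3-character and a 2-character source
print(by_len)  # every chunk holds sources of a single length
```

Because `groupby` only merges adjacent items, the input has to be sorted by the same key first, which is what the longest-first sort in `build_opencc_word_table` provides.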
diff --git a/.github/workflows/build.yml b/.github/workflows/build.yml
@@ -25,8 +25,15 @@ jobs:
       run: |
         build/prepare.sh
         python build/main.py
-    - name: Upload artifact
+    - name: Upload FanWunMing
       uses: actions/upload-artifact@v2
       with:
-        name: Font files
-        path: output/*.ttf
+        name: FanWunMing
+        path: |
+          output/FanWunMing-*.ttf
+          !output/FanWunMing-TW-*.ttf
+    - name: Upload FanWunMing-TW
+      uses: actions/upload-artifact@v2
+      with:
+        name: FanWunMing-TW
+        path: output/FanWunMing-TW-*.ttf
diff --git a/LICENSE b/LICENSE
@@ -1,35 +1,28 @@
-Copyright 2020 Ayaka Mikazuki (https://ayaka.shn.hk/).
+This Font Software is licensed under the SIL Open Font License,
+Version 1.1.
 
-
-Copyright 2014-2019 Adobe (http://www.adobe.com/), with Reserved Font
-Name 'Source'. Source is a trademark of Adobe in the United States
-and/or other countries.
-
-
-This Font Software is licensed under the SIL Open Font License, Version 1.1.
 This license is copied below, and is also available with a FAQ at:
 http://scripts.sil.org/OFL
 
-
 -----------------------------------------------------------
 SIL OPEN FONT LICENSE Version 1.1 - 26 February 2007
 -----------------------------------------------------------
 
 PREAMBLE
 The goals of the Open Font License (OFL) are to stimulate worldwide
-development of collaborative font projects, to support the font creation
-efforts of academic and linguistic communities, and to provide a free and
-open framework in which fonts may be shared and improved in partnership
-with others.
+development of collaborative font projects, to support the font
+creation efforts of academic and linguistic communities, and to
+provide a free and open framework in which fonts may be shared and
+improved in partnership with others.
 
 The OFL allows the licensed fonts to be used, studied, modified and
 redistributed freely as long as they are not sold by themselves. The
-fonts, including any derivative works, can be bundled, embedded, 
+fonts, including any derivative works, can be bundled, embedded,
 redistributed and/or sold with any software provided that any reserved
 names are not used by derivative works. The fonts and derivatives,
 however, cannot be released under any other type of license. The
-requirement for fonts to remain under this license does not apply
-to any document created using the fonts or their derivatives.
+requirement for fonts to remain under this license does not apply to
+any document created using the fonts or their derivatives.
 
 DEFINITIONS
 "Font Software" refers to the set of files released by the Copyright
@@ -39,25 +32,25 @@ include source files, build scripts and documentation.
 "Reserved Font Name" refers to any names specified as such after the
 copyright statement(s).
 
-"Original Version" refers to the collection of Font Software components as
-distributed by the Copyright Holder(s).
+"Original Version" refers to the collection of Font Software
+components as distributed by the Copyright Holder(s).
 
-"Modified Version" refers to any derivative made by adding to, deleting,
-or substituting -- in part or in whole -- any of the components of the
-Original Version, by changing formats or by porting the Font Software to a
-new environment.
+"Modified Version" refers to any derivative made by adding to,
+deleting, or substituting -- in part or in whole -- any of the
+components of the Original Version, by changing formats or by porting
+the Font Software to a new environment.
 
 "Author" refers to any designer, engineer, programmer, technical
 writer or other person who contributed to the Font Software.
 
 PERMISSION & CONDITIONS
 Permission is hereby granted, free of charge, to any person obtaining
-a copy of the Font Software, to use, study, copy, merge, embed, modify,
-redistribute, and sell modified and unmodified copies of the Font
-Software, subject to the following conditions:
+a copy of the Font Software, to use, study, copy, merge, embed,
+modify, redistribute, and sell modified and unmodified copies of the
+Font Software, subject to the following conditions:
 
-1) Neither the Font Software nor any of its individual components,
-in Original or Modified Versions, may be sold by itself.
+1) Neither the Font Software nor any of its individual components, in
+Original or Modified Versions, may be sold by itself.
 
 2) Original or Modified Versions of the Font Software may be bundled,
 redistributed and/or sold with any software, provided that each copy
@@ -67,9 +60,9 @@ in the appropriate machine-readable metadata fields within text or
 binary files as long as those fields can be easily viewed by the user.
 
 3) No Modified Version of the Font Software may use the Reserved Font
-Name(s) unless explicit written permission is granted by the corresponding
-Copyright Holder. This restriction only applies to the primary font name as
-presented to the users.
+Name(s) unless explicit written permission is granted by the
+corresponding Copyright Holder. This restriction only applies to the
+primary font name as presented to the users.
 
 4) The name(s) of the Copyright Holder(s) or the Author(s) of the Font
 Software shall not be used to promote, endorse or advertise any
@@ -80,8 +73,8 @@ permission.
 5) The Font Software, modified or unmodified, in part or in whole,
 must be distributed entirely under this license, and must not be
 distributed under any other license. The requirement for fonts to
-remain under this license does not apply to any document created
-using the Font Software.
+remain under this license does not apply to any document created using
+the Font Software.
 
 TERMINATION
 This license becomes null and void if any of the above conditions are
diff --git a/build/main.py b/build/main.py
@@ -1,29 +1,51 @@
 from collections import defaultdict
 from datetime import date
 from glob import glob
-from itertools import chain
+from itertools import chain, groupby
 import json
 from opencc import OpenCC
 import os
 import subprocess
 
-FONT_VERSION = 1.003
+FONT_VERSION = 1.004
 
 # Define the max entries size in a subtable.
 # We define a number that is small enough here, so that the entries will not exceed
 # the size limit.
 SUBTABLE_MAX_COUNT = 4000
 
-# This function is used to split a GSUB table into several subtables.
-def grouper(lst, n, start=0):
+# The following two functions are used to split a GSUB table into several subtables.
+def grouper(iterable, n=SUBTABLE_MAX_COUNT):
 	'''
 	Split a list into chunks of size n.
-	>>> list(grouper([1, 2, 3, 4, 5], 2))
+	>>> list(grouper([1, 2, 3, 4, 5], n=2))
 	[[1, 2], [3, 4], [5]]
+	>>> list(grouper([1, 2, 3, 4, 5, 6], n=2))
+	[[1, 2], [3, 4], [5, 6]]
 	'''
-	while start < len(lst):
-		yield lst[start:start+n]
-		start += n
+	iterator = iter(iterable)
+	while True:
+		lst = []
+		try:
+			for _ in range(n):
+				lst.append(next(iterator))
+		except StopIteration:
+			if lst:
+				yield lst
+			break
+		yield lst
+
+def grouper2(iterable, n=SUBTABLE_MAX_COUNT, key=None):
+	'''
+	Split an iterable into chunks of at most n consecutive items that share the same key.
+	>>> list(grouper2(['AA', 'BBB', 'CCC', 'DDD', 'EE'], n=3, key=len))
+	[['AA'], ['BBB', 'CCC', 'DDD'], ['EE']]
+	>>> list(grouper2(['AA', 'BBB', 'CCC', 'DDD', 'EE'], n=2, key=len))
+	[['AA'], ['BBB', 'CCC'], ['DDD'], ['EE']]
+	'''
+	for _, vx in groupby(iterable, key=key):
+		for vs in grouper(vx, n):
+			yield vs
 
 # An opentype font can hold at most 65535 glyphs.
 MAX_GLYPH_COUNT = 65535
@@ -142,7 +164,8 @@ def build_opencc_word_table(codepoints_tonggui, codepoints_font, twp=False):
 					codepoints.update(codepoints_v)
 
 	# Sort from longest to shortest to force longest match
-	return sorted(((k, v) for k, v in entries.items()), key=lambda k_v: (-len(k_v[0]), k_v[0])), codepoints
+	conversion_item_len = lambda conversion_item: len(conversion_item[0])
+	return sorted(entries.items(), key=conversion_item_len, reverse=True), codepoints
 
 def disassociate_codepoint_and_glyph_name(obj, codepoint, glyph_name):
 	'''
@@ -275,29 +298,34 @@ def insert_empty_feature(obj, feature_name):
 	obj['GSUB']['features'][feature_name] = []
 
 def create_word2pseu_table(obj, feature_name, conversions):
+	conversion_item_len = lambda conversion_item: len(conversion_item[0])
+	subtables = [{'substitutions': [{'from': glyph_names_k, 'to': pseudo_glyph_name} for glyph_names_k, pseudo_glyph_name in subtable]} for subtable in grouper2(conversions, key=conversion_item_len)]  # {from: [a1, a2, ...], to: b}
 	obj['GSUB']['features'][feature_name].append('word2pseu')
 	obj['GSUB']['lookups']['word2pseu'] = {
 		'type': 'gsub_ligature',
 		'flags': {},
-		'subtables': [{'substitutions': subtable} for subtable in grouper(conversions, SUBTABLE_MAX_COUNT)]
+		'subtables': subtables
 	}
 	obj['GSUB']['lookupOrder'].append('word2pseu')
 
 def create_char2char_table(obj, feature_name, conversions):
+	subtables = [{k: v for k, v in subtable} for subtable in grouper(conversions)]
 	obj['GSUB']['features'][feature_name].append('char2char')
 	obj['GSUB']['lookups']['char2char'] = {
 		'type': 'gsub_single',
 		'flags': {},
-		'subtables': [{k: v for k, v in subtable} for subtable in grouper(conversions, SUBTABLE_MAX_COUNT)]
+		'subtables': subtables
 	}
 	obj['GSUB']['lookupOrder'].append('char2char')
 
 def create_pseu2word_table(obj, feature_name, conversions):
+	conversion_item_len = lambda conversion_item: len(conversion_item[1])
+	subtables = [{k: v for k, v in subtable} for subtable in grouper2(conversions, key=conversion_item_len)]
 	obj['GSUB']['features'][feature_name].append('pseu2word')
 	obj['GSUB']['lookups']['pseu2word'] = {
 		'type': 'gsub_multiple',
 		'flags': {},
-		'subtables': [{k: v for k, v in subtable} for subtable in grouper(conversions, SUBTABLE_MAX_COUNT)]
+		'subtables': subtables
 	}
 	obj['GSUB']['lookupOrder'].append('pseu2word')
 
@@ -341,6 +369,8 @@ def build_dest_path_from_src_path(path, twp=False):
 def go(path, twp=False):
 	font = load_font(path, ttc_index=0)
 
+	# Determine the final Unicode range from the original font and the OpenCC conversion tables
+
 	codepoints_font = build_codepoints_font(font)
 	codepoints_tonggui = build_codepoints_tonggui() & codepoints_font
 
@@ -358,6 +388,8 @@ def go(path, twp=False):
 	available_glyph_count = MAX_GLYPH_COUNT - get_glyph_count(font)
 	assert available_glyph_count >= len(entries_word)
 
+	# Build the glyph substitution tables and insert them into the font
+
 	word2pseu_table = []
 	char2char_table = []
 	pseu2word_table = []
@@ -367,7 +399,7 @@ def go(path, twp=False):
 		glyph_names_k = [codepoint_to_glyph_name(font, codepoint) for codepoint in codepoints_k]
 		glyph_names_v = [codepoint_to_glyph_name(font, codepoint) for codepoint in codepoints_v]
 		insert_empty_glyph(font, pseudo_glyph_name)
-		word2pseu_table.append({'from': glyph_names_k, 'to': pseudo_glyph_name})
+		word2pseu_table.append((glyph_names_k, pseudo_glyph_name))
 		pseu2word_table.append((pseudo_glyph_name, glyph_names_v))
 
 	for codepoint_k, codepoint_v in entries_char:
diff --git a/build/prepare.sh b/build/prepare.sh
@@ -1,13 +1,8 @@
 #!/bin/sh
-mkdir -p output
-wget -q -nc -P cache https://github.com/ButTaiwan/genyo-font/releases/download/v1.501/GenYoMin.zip
-wget -q -nc -P cache https://cdn.jsdelivr.net/npm/opencc-data@1.0.4/data/STCharacters.txt
-wget -q -nc -P cache https://cdn.jsdelivr.net/npm/opencc-data@1.0.4/data/STPhrases.txt
-wget -q -nc -P cache https://cdn.jsdelivr.net/npm/opencc-data@1.0.4/data/TWPhrasesIT.txt
-wget -q -nc -P cache https://cdn.jsdelivr.net/npm/opencc-data@1.0.4/data/TWPhrasesName.txt
-wget -q -nc -P cache https://cdn.jsdelivr.net/npm/opencc-data@1.0.4/data/TWPhrasesOther.txt
-wget -q -nc -P cache https://cdn.jsdelivr.net/npm/opencc-data@1.0.4/data/TWVariants.txt
-cat cache/TWPhrasesIT.txt cache/TWPhrasesName.txt cache/TWPhrasesOther.txt > cache/TWPhrases.txt
-wget -q -nc -P cache https://gist.githubusercontent.com/fatum12/941a10f31ac1ad48ccbc/raw/59d7e29b307ae3439317a975ef390cd729f9bc17/ttc2ttf.pe
-wget -q -nc -P cache https://raw.githubusercontent.com/rime-aca/character_set/e7d009a8a185a83f62ad2c903565b8bb85719221/通用規範漢字表.txt
-unzip -q -n -d cache cache/GenYoMin.zip "*.ttc"
+mkdir -p cache output
+cd cache
+curl -LsSO https://github.com/ButTaiwan/genyo-font/releases/download/v1.501/GenYoMin.zip
+curl -LsSZ --remote-name-all https://cdn.jsdelivr.net/npm/opencc-data@1.0.5/data/{STCharacters.txt,STPhrases.txt,TWPhrasesIT.txt,TWPhrasesName.txt,TWPhrasesOther.txt,TWVariants.txt}
+curl -LsSo 通用規範漢字表.txt https://raw.githubusercontent.com/rime-aca/character_set/e7d009a8a185a83f62ad2c903565b8bb85719221/%E9%80%9A%E7%94%A8%E8%A6%8F%E7%AF%84%E6%BC%A2%E5%AD%97%E8%A1%A8.txt
+cat TWPhrasesIT.txt TWPhrasesName.txt TWPhrasesOther.txt > TWPhrases.txt
+unzip -q -n GenYoMin.zip "*.ttc"