How to build kagome v2 for the web? #263
Here are the steps to follow:

1. Fix main.go to use a Korean dictionary.

The working directory is ./sample/wasm/.

main.go

package main

import (
	"strings"
	"syscall/js"

	ko "github.com/ikawaha/kagome-dict-ko" // ← ※
	"github.com/ikawaha/kagome/v2/tokenizer"
)

func igOK(s string, _ bool) string {
	return s
}

func tokenize(_ js.Value, args []js.Value) interface{} {
	if len(args) == 0 {
		return nil
	}
	t, err := tokenizer.New(ko.Dict(), tokenizer.OmitBosEos()) // ← ※
	if err != nil {
		return nil
	}
	var ret []interface{}
	tokens := t.Tokenize(args[0].String())
	for _, v := range tokens {
		//fmt.Printf("%s\t%+v%v\n", v.Surface, v.POS(), strings.Join(v.Features(), ","))
		ret = append(ret, map[string]interface{}{
			"word_id":       v.ID,
			"word_type":     v.Class.String(),
			"word_position": v.Start,
			"surface_form":  v.Surface,
			"pos":           strings.Join(v.POS(), ","),
			"base_form":     igOK(v.BaseForm()),
			"reading":       igOK(v.Reading()),
			"pronunciation": igOK(v.Pronunciation()),
		})
	}
	return ret
}

func registerCallbacks() {
	_ = ko.Dict() // ← ※
	js.Global().Set("kagome_tokenize", js.FuncOf(tokenize))
}

func main() {
	c := make(chan struct{})
	registerCallbacks()
	println("Kagome Web Assembly Ready")
	<-c
}

diff

diff --git a/sample/wasm/go.mod b/sample/wasm/go.mod
index 89d4416..7fea152 100644
--- a/sample/wasm/go.mod
+++ b/sample/wasm/go.mod
@@ -1,3 +1,8 @@
 module sample
 go 1.16
+
+require (
+	github.com/ikawaha/kagome-dict-ko v1.1.0
+	github.com/ikawaha/kagome/v2 v2.7.0
+)
diff --git a/sample/wasm/main.go b/sample/wasm/main.go
index 6d42af1..1379a06 100644
--- a/sample/wasm/main.go
+++ b/sample/wasm/main.go
@@ -4,7 +4,7 @@ import (
 	"strings"
 	"syscall/js"
-	"github.com/ikawaha/kagome-dict/ipa"
+	ko "github.com/ikawaha/kagome-dict-ko"
 	"github.com/ikawaha/kagome/v2/tokenizer"
 )
@@ -16,7 +16,7 @@ func tokenize(_ js.Value, args []js.Value) interface{} {
 	if len(args) == 0 {
 		return nil
 	}
-	t, err := tokenizer.New(ipa.Dict(), tokenizer.OmitBosEos())
+	t, err := tokenizer.New(ko.Dict(), tokenizer.OmitBosEos())
 	if err != nil {
 		return nil
 	}
@@ -39,7 +39,7 @@ func tokenize(_ js.Value, args []js.Value) interface{} {
 }
 func registerCallbacks() {
-	_ = ipa.Dict()
+	_ = ko.Dict()
 	js.Global().Set("kagome_tokenize", js.FuncOf(tokenize))
 }

2. Build WASM and prepare WASM libs.

Build main.go

GOOS=js GOARCH=wasm go build -trimpath -o kagome.wasm main.go

Copy wasm_exec.js to the current directory.

cp $(go env GOROOT)/misc/wasm_exec.js .

3. Start an HTTP server and access it.

Prepare a simple script to set up an HTTP server on your local host.

server.py

# -*- coding: utf-8 -*-
import http.server
import socketserver

PORT = 8080

Handler = http.server.SimpleHTTPRequestHandler
# Map file extensions to MIME types; .wasm has to be served as application/wasm.
Handler.extensions_map = {
    '.manifest': 'text/cache-manifest',
    '.html': 'text/html',
    '.png': 'image/png',
    '.jpg': 'image/jpeg',
    '.svg': 'image/svg+xml',
    '.css': 'text/css',
    '.js': 'text/javascript',
    '.wasm': 'application/wasm',
    '': 'application/octet-stream',  # Default
}

httpd = socketserver.TCPServer(("", PORT), Handler)
print("serving at port", PORT)
httpd.serve_forever()

Start the HTTP server:

python3 server.py

Access it!

This sample only supports Korean analysis. To support both Japanese and Korean, you would need to prepare a Japanese tokenizer and a Korean tokenizer and switch between them. However, loading both dictionaries may require too much memory to run in a web browser.

I hope this helps.
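
The idea of preparing two tokenizers and switching between them could be sketched roughly as follows. This is only a minimal sketch, not code from the kagome repository: the `Tokenizer` interface, `Registry`, and `stubTokenizer` are hypothetical names introduced for illustration, and the stub splits on whitespace so that the example runs without any dictionary. In a real build the factories would presumably call `tokenizer.New(ipa.Dict(), ...)` and `tokenizer.New(ko.Dict(), ...)` instead. Each tokenizer is constructed lazily on first use, which may help a little with the memory concern, although both dictionaries would still be compiled into the WASM binary.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// Tokenizer is a hypothetical minimal interface used only in this sketch.
// With kagome it would wrap a *tokenizer.Tokenizer.
type Tokenizer interface {
	Tokenize(text string) []string
}

// stubTokenizer stands in for a dictionary-backed tokenizer so that the
// sketch runs without downloading a dictionary. It just splits on whitespace.
type stubTokenizer struct{ lang string }

func (s stubTokenizer) Tokenize(text string) []string {
	return strings.Fields(text)
}

// Registry builds each tokenizer at most once, the first time its language
// is requested, and reuses it afterwards.
type Registry struct {
	mu        sync.Mutex
	factories map[string]func() (Tokenizer, error)
	cache     map[string]Tokenizer
}

// NewRegistry returns an empty registry.
func NewRegistry() *Registry {
	return &Registry{
		factories: map[string]func() (Tokenizer, error){},
		cache:     map[string]Tokenizer{},
	}
}

// Register associates a language key such as "ja" or "ko" with a factory.
func (r *Registry) Register(lang string, f func() (Tokenizer, error)) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.factories[lang] = f
}

// Get returns the tokenizer for lang, constructing it on first use.
func (r *Registry) Get(lang string) (Tokenizer, error) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if t, ok := r.cache[lang]; ok {
		return t, nil
	}
	f, ok := r.factories[lang]
	if !ok {
		return nil, fmt.Errorf("unsupported language %q", lang)
	}
	t, err := f()
	if err != nil {
		return nil, err
	}
	r.cache[lang] = t
	return t, nil
}

func main() {
	r := NewRegistry()
	// Assumption: real factories would build kagome tokenizers here.
	r.Register("ja", func() (Tokenizer, error) { return stubTokenizer{lang: "ja"}, nil })
	r.Register("ko", func() (Tokenizer, error) { return stubTokenizer{lang: "ko"}, nil })

	t, err := r.Get("ko")
	if err != nil {
		panic(err)
	}
	fmt.Println(t.Tokenize("두부 냄비"))
}
```

On the JavaScript side, the exported function would then take a language key as an extra argument and look up the tokenizer with `Get` before tokenizing.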
The build was successful. Most functions are working.
A sample user dictionary file can be found at sample/userdict.txt. This sample is in Japanese, but the same applies to Korean. A simple morpheme specification is in the following format:

for example,

(I can't read or write Korean, so the Korean part above may not be appropriate. :p )

In your program, you can use your user dictionary as follows:

package main

import (
	"fmt"

	ko "github.com/ikawaha/kagome-dict-ko"
	"github.com/ikawaha/kagome-dict/dict"
	"github.com/ikawaha/kagome/v2/tokenizer"
)

func main() {
	// load a user dictionary.
	udic, err := dict.NewUserDict("userdict.txt")
	if err != nil {
		panic(err)
	}
	// specify the user dictionary.
	t, err := tokenizer.New(ko.Dict(), tokenizer.UserDict(udic), tokenizer.OmitBosEos())
	if err != nil {
		panic(err)
	}
	tokens := t.Tokenize("두부냄비")
	for _, token := range tokens {
		fmt.Printf("%s\t%v\n", token.Surface, token.Features())
	}
}

Output:

See also: https://zenn.dev/ikawaha/books/kagome-v2-japanese-tokenizer/viewer/user_dictionary

I want to host a kagome v2 tokenizer for Korean and Japanese on GitHub Pages.
I know the main file is sample/demo.html...
Looking at the example, "dic" is set up for Japanese and Chinese.
I want to change the example to Japanese and Korean.
How can I do that?