Advanced search results

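Natural Language Processing

橋田浩一

The Journal of The Institute of Electrical Engineers of Japan
2001, Vol. 121, No. 3, pp. 195-198
Published: 2001/03/01
Released on J-STAGE: 2008/04/17

DOI https://doi.org/10.1541/ieejjournal.121.195

Free access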

Natural Language Processing

野村浩郷, 島津明

Computer Software
1986, Vol. 3, No. 4, pp. 4_409-4_410
Published: 1986/10/15
Released on J-STAGE: 2018/11/05

DOI https://doi.org/10.11309/jssst.3.4_409

Free access

Prefix Alignment for Training Simultaneous Machine Translation

Yasumasa Kano, Katsuhito Sudoh, Satoshi Nakamura

Journal of Natural Language Processing

2024, Vol. 31, No. 1, pp. 79-104
Published: 2024
Released on J-STAGE: 2024/03/15

DOI https://doi.org/10.5715/jnlp.31.79

Free access

Simultaneous translation is a task that starts translation even before the speaker has finished speaking. This study focuses on prefix-to-prefix translation and proposes a method to align prefixes in a bilingual sentence pair iteratively to train a machine translation model to work with prefix-to-prefix. In the experiments, the proposed method demonstrated higher BLEU than those of the baseline methods in low latency ranges on the IWSLT simultaneous translation benchmark. However, the proposed method degraded the performance in high latency ranges in the English-to-Japanese experiments; thus, we analyzed it in length ratios and prefix boundary prediction accuracies. The obtained results suggested that the degraded performance was due to the large word order difference between English and Japanese.

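Achievements and Challenges in Japanese Question Answering as Seen from an Analysis of Quiz Competition Results

有山知希, 鈴木潤, 鈴木正敏, 田中涼太, 赤間怜奈, 西田京介

Journal of Natural Language Processing

2024, Vol. 31, No. 1, pp. 47-78
Published: 2024
Released on J-STAGE: 2024/03/15

DOI https://doi.org/10.5715/jnlp.31.47

Free access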

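Question answering is one of the important research themes in natural language processing. With recent advances in deep learning and the growing availability of language resources, question answering technology has made dramatic progress. However, most of this research targets English, and at present research on question answering in Japanese is not very active. Against this background, to promote research on Japanese question answering, we organized "AI 王", a question answering competition based on Japanese quizzes, and have held it three times so far. In this paper, with the goal of clarifying the current state of the art and the open challenges in Japanese question answering, we analyze the quiz questions used and the submitted question answering systems, together with large language models as a point of comparison, and report the results.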

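Editor's Postscript, Manuscript Guidelines, Editorial Schedule, Statistics, and Association Information

Journal of Natural Language Processing

2024, Vol. 31, No. 1, pp. 310-325
Published: 2024
Released on J-STAGE: 2024/03/15

DOI https://doi.org/10.5715/jnlp.31.310

Free access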

🚀 NLP Colloquium

丹羽彩奈, 横井祥, 高山隼矢, 斉藤いつみ

Journal of Natural Language Processing

2024, Vol. 31, No. 1, pp. 300-309
Published: 2024
Released on J-STAGE: 2024/03/15

DOI https://doi.org/10.5715/jnlp.31.300

Free access

Bidirectional Transformer Reranker for Grammatical Error Correction

Ying Zhang, Hidetaka Kamigaito, Manabu Okumura

Journal of Natural Language Processing

2024, Vol. 31, No. 1, pp. 3-46
Published: 2024
Released on J-STAGE: 2024/03/15

DOI https://doi.org/10.5715/jnlp.31.3

Free access

Pre-trained sequence-to-sequence (seq2seq) models have achieved state-of-the-art results in the grammatical error correction tasks. However, these models are plagued by prediction bias owing to their unidirectional decoding. Thus, this study proposed a bidirectional transformer reranker (BTR) that re-estimates the probability of each candidate sentence generated by the pre-trained seq2seq model. The BTR preserves the seq2seq-style transformer architecture but utilizes a BERT-style self-attention mechanism in the decoder to compute the probability of each target token using masked language modeling to capture bidirectional representations from the target context. To guide the reranking process, the BTR adopted negative sampling in the objective function to minimize the unlikelihood. During inference, the BTR yielded the final results after comparing the reranked top-1 results with the original ones using an acceptance threshold λ. Experimental results showed that, when reranking candidates from a pre-trained seq2seq model, the T5-base, the BTR on top of T5-base yielded scores of 65.47 and 71.27 F_0.5 on the CoNLL-14 and building educational applications 2019 (BEA) test sets, respectively, and yielded 59.52 GLEU score on the JFLEG corpus, with improvements of 0.36, 0.76, and 0.48 points compared with the original T5-base. Furthermore, when reranking candidates from T5-large, the BTR on top of T5-base improved the original T5-large by 0.26 on the BEA test set.

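Issues in Using ChatGPT with Similar Languages: The Case of Malay and Indonesian

野元裕樹

Journal of Natural Language Processing

2024, Vol. 31, No. 1, pp. 294-299
Published: 2024
Released on J-STAGE: 2024/03/15

DOI https://doi.org/10.5715/jnlp.31.294

Free access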

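Subsampling Based on Model Predictions for Knowledge Graph Completion

馮昕璨

Journal of Natural Language Processing

2024, Vol. 31, No. 1, pp. 287-293
Published: 2024
Released on J-STAGE: 2024/03/15

DOI https://doi.org/10.5715/jnlp.31.287

Free access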

Interpreting Languages with Bits

Yiran Wang, Taro Watanabe, Masao Utiyama, Yuji Matsumoto

Journal of Natural Language Processing

2024, Vol. 31, No. 1, pp. 280-286
Published: 2024
Released on J-STAGE: 2024/03/15

DOI https://doi.org/10.5715/jnlp.31.280

Free access

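LLM-jp: A Cross-Organizational Project for Research and Development of Large Language Models Strong in Japanese

河原大輔, 空閑洋平, 黒橋禎夫, 鈴木潤, 宮尾祐介

Journal of Natural Language Processing

2024, Vol. 31, No. 1, pp. 266-279
Published: 2024
Released on J-STAGE: 2024/03/15

DOI https://doi.org/10.5715/jnlp.31.266

Free access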

Focused Prefix Tuning for Controllable Text Generation

Congda Ma, Tianyu Zhao, Makoto Shing, Kei Sawada, Manabu Okumura

Journal of Natural Language Processing

2024, Vol. 31, No. 1, pp. 250-265
Published: 2024
Released on J-STAGE: 2024/03/15

DOI https://doi.org/10.5715/jnlp.31.250

Free access

In a controllable text generation dataset, unannotated attributes may provide irrelevant learning signals to models that use them for training, thereby degrading their performance. We propose focused prefix tuning (FPT) to mitigate this problem and enable control to focus on the desired attribute. Experimental results show that FPT can achieve better control accuracy and text fluency than baseline models in single-attribute control tasks. In multi-attribute control tasks, FPT achieves control accuracy comparable to that of the state-of-the-art approach while maintaining the flexibility to control new attributes without retraining existing models.

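Generating Responses That Express Disagreement in Attentive Listening to Narratives

伊藤滉一朗, 村田匡輝, 大野誠寛, 松原茂樹

Journal of Natural Language Processing

2024, Vol. 31, No. 1, pp. 212-249
Published: 2024
Released on J-STAGE: 2024/03/15

DOI https://doi.org/10.5715/jnlp.31.212

Free access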

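Conversational agents such as communication robots are expected to take on the role of listening to narratives. To be accepted as listeners, they need a function that conveys to the narrator that they are listening attentively. The explicit means of doing so is to respond to the narrative, and a promising approach is to produce attentive listening responses, that is, utterances that respond to the narrative in order to show attentive listening. In attentive listening, accepting the narrator's utterances is the listener's basic response strategy. However, narratives sometimes contain utterances such as self-deprecation or modesty. In such cases, the listener must be able to reliably produce responses indicating that it does not agree with the narrator's utterance, that is, disagreement responses. This paper demonstrates the feasibility of appropriately generating disagreement responses in attentive listening to narratives. To this end, we first defined a scheme for tagging narrative data with the timing and expression of disagreement responses in an environment without time constraints. Using the resulting corpus, we verify that disagreement response timings can be tagged comprehensively and that disagreement response expressions can be tagged consistently. We then implemented a method for detecting disagreement response timings and a method for classifying disagreement response expressions, both based on a pre-trained Transformer-based model, and experimentally verified the feasibility of generating disagreement responses using the response corpus.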

A Table Question Alignment based Cell-Selection Method for Table-Text QA

Jian Wu, Yicheng Xu, Börje F. Karlsson, Manabu Okumura

Journal of Natural Language Processing

2024, Vol. 31, No. 1, pp. 189-211
Published: 2024
Released on J-STAGE: 2024/03/15

DOI https://doi.org/10.5715/jnlp.31.189

Free access

Hybrid Question-Answering (HQA), which targets reasoning over tables and passages linked from table cells, has witnessed significant research in recent years. A common challenge in HQA and other passage-table QA datasets is that it is generally unrealistic to iterate over all table rows, columns, and linked passages to retrieve evidence. Such a challenge made it difficult for previous studies to show their reasoning ability in retrieving answers. To bridge this gap, we propose a novel Table-alignment-based Cell-selection and Reasoning model (TACR) for hybrid text and table QA, evaluated on the HybridQA and WikiTableQuestions datasets. In evidence retrieval, we design a table-question-alignment enhanced cell-selection method to retrieve fine-grained evidence. In answer reasoning, we incorporate a QA module that treats the row containing selected cells as context. Experimental results over the HybridQA and WikiTableQuestions (WTQ) datasets show that TACR achieves state-of-the-art results on cell selection and outperforms fine-grained evidence retrieval baselines on HybridQA, while achieving competitive performance on WTQ. We also conducted a detailed analysis to demonstrate that being able to align questions to tables in the cell-selection stage can result in important gains from experiments of over 90% table row and column selection accuracy, meanwhile also improving output explainability.

DiverSeg: Leveraging Diverse Segmentations with Cross-granularity Alignment for Neural Machine Translation

Haiyue Song, Zhuoyuan Mao, Raj Dabre, Chenhui Chu, Sadao Kurohashi

Journal of Natural Language Processing

2024, Vol. 31, No. 1, pp. 155-188
Published: 2024
Released on J-STAGE: 2024/03/15

DOI https://doi.org/10.5715/jnlp.31.155

Free access

In this study, we proposed DiverSeg to exploit diverse segmentations from multiple subword segmenters that capture the various perspectives of each word for neural machine translation. In DiverSeg, multiple segmentations are encoded using a subword lattice input, a subword-relation-aware attention mechanism integrates relations among subwords, and a cross-granularity embedding alignment objective enhances the similarity across different segmentations of a word. We conducted experiments on five datasets to evaluate the effectiveness of DiverSeg in improving machine translation quality. The results demonstrate that DiverSeg outperforms baseline methods by approximately two BLEU points. Additionally, we performed ablation studies to investigate the improvement over non-subword methods, the contribution of each component of DiverSeg, the choice of subword relations, the choice of similarity metrics in alignment loss, and combinations of segmenters.

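Kaeriten Annotation and Kakikudashi-bun Generation for Classical Chinese Poetry and Prose Using Language Models

王昊, 清水博文, 河原大輔

Journal of Natural Language Processing

2024, Vol. 31, No. 1, pp. 134-154
Published: 2024
Released on J-STAGE: 2024/03/15

DOI https://doi.org/10.5715/jnlp.31.134

Free access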

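Recent research in natural language processing has centered on modern languages and has achieved high performance on many tasks. In contrast, classical texts and related tasks have received little attention. Kanbun (classical Chinese) is presumed to have been brought from China to Japan about 2,000 years ago, during the Yayoi period, and has since had a profound influence on Japanese literature. Even today, kanbun accounts for 50 of the 200 points in the Japanese language section of the Common Test for University Admissions. However, compared with the abundant language resources available in China, resources of kakikudashi-bun (Japanese readings of kanbun) in Japan are very scarce. To address this problem, this study targets classical Chinese poetry and prose and constructs a kanbun kundoku dataset consisting of hakubun (unannotated original text) and kakikudashi-bun. We then use language models to try to improve accuracy on two tasks considered important for understanding kanbun: kaeriten (reading-order mark) annotation and kakikudashi-bun generation. We also discuss the most suitable automatic evaluation metrics by comparison with human evaluation results. The dataset and code are available at https://github.com/nlp-waseda/Kanbun-LM.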

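Integrating Evidence Recognition into Document-Level Relation Extraction

Youmi Ma, An Wang, 岡崎直観

Journal of Natural Language Processing

2024, Vol. 31, No. 1, pp. 105-133
Published: 2024
Released on J-STAGE: 2024/03/15

DOI https://doi.org/10.5715/jnlp.31.105

Free access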

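Document-level relation extraction (DocRE) is the task of inferring the relations between all entity pairs in a document. The set of sentences that contains sufficient clues for inferring the relation of an entity pair is called evidence. Evidence can improve relation extraction performance, but existing work has modeled DocRE and evidence recognition as separate tasks. This paper proposes a method that integrates evidence recognition into the relation extraction model. Specifically, when encoding an entity pair, we guide the self-attention mechanism to assign high weights to the evidence, thereby obtaining representations that focus on the evidence. We further propose a method that assigns pseudo supervision signals for evidence to data without evidence annotations, making use of large amounts of automatically labeled data. In experiments, the proposed method achieved the current state-of-the-art performance in both relation extraction and evidence recognition on the document-level relation extraction benchmarks DocRED and Re-DocRED.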

Three-Dimensional Language

永田亮

Journal of Natural Language Processing

2024, Vol. 31, No. 1, pp. 1-2
Published: 2024
Released on J-STAGE: 2024/03/15

DOI https://doi.org/10.5715/jnlp.31.1

Free access

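Controlling Difficulty in Text Simplification via Lexically Constrained Decoding Based on Edit Operation Prediction

舌達也, 梶原智之, 荒瀬由紀

Journal of Natural Language Processing

2023, Vol. 30, No. 3, pp. 991-1010
Published: 2023
Released on J-STAGE: 2023/09/15

DOI https://doi.org/10.5715/jnlp.30.991

Free access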

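Difficulty control in text simplification is a technology that contributes to language learning support by simplifying sentences according to a target difficulty level. Existing methods for this task have two problems: it is difficult to learn to paraphrase the input substantially, and it is difficult to generate sentences flexibly. The proposed method creates constraints on words that should appear in the simplified output sentence and on words that should not appear, and uses them to perform text simplification while controlling difficulty. The constraints are created through edit operation prediction for each word in the sentence, difficulty assessment, and simple paraphrasing of difficult words. By using positive and negative constraints, the proposed method encourages paraphrasing while generating sentences flexibly with a sequence-to-sequence model, and can therefore solve the problems of existing methods. Evaluation experiments confirmed that the proposed method achieves text simplification according to the target difficulty level without harming grammaticality or substantially losing the meaning of the sentence.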

A Comprehensive Empirical Study on Personalized Dialogue Generation

Itsugun Cho, Dongyang Wang, Ryota Takahashi, Hiroaki Saito

Journal of Natural Language Processing

2023, Vol. 30, No. 3, pp. 959-990
Published: 2023
Released on J-STAGE: 2023/09/15

DOI https://doi.org/10.5715/jnlp.30.959

Free access

Current studies on the generation of personalized dialogue primarily contribute to an agent presenting a consistent personality and driving a more informative response. However, we found that the responses generated from most previous models were self-centered, with little consideration for the user in the dialogue. Moreover, we consider human-like conversations to be essentially based on inferring information about the persona of the other party. Therefore, we propose a novel personalized dialogue generator that detects implicit user personas. Because it is difficult to collect a large amount of detailed personal facts for each user, we attempted to model the potential persona of a user and its representation from the dialogue history with no external knowledge. The perception and fader variables were conceived using conditional variational inference. The two latent variables simulate the process of people becoming aware of each other’s personas and producing a corresponding expression in conversations. Subsequently, posterior-discriminated regularization was performed to enhance the training procedure. Finally, a selector was designed to help our model provide long-sighted responses. Comprehensive experiments demonstrate that compared to the state-of-the-art methods, our approach is more concerned with the user’s persona and achieves a notable boost across both automatic metrics and human evaluations.
