Ryuto Koike
Email: ryuto.koike [at] nlp.c.titech.ac.jp
About me
Hello! I am a final-year PhD student (est. March 2026) at the Institute of Science Tokyo, advised by Prof. Naoaki Okazaki.
My research aims to develop principles and methods for building safe, secure, and reliable AI
systems and mitigating their negative societal implications. My current work focuses on
membership inference attacks, AI-generated text detection, jailbreak defense, and reliable
LLM-as-a-judge. My work has been published in top-tier conferences such as ACL, EMNLP, and AAAI,
and I have led multiple collaborative projects with Prof. Chris Callison-Burch at UPenn and
Prof. Preslav Nakov at MBZUAI.
Outside of academia, I served as a research advisor to a startup in Japan working on
multilingual text generation.
Selected Works (*: equal contribution. †: undergraduate/master's mentee.)
[Google Scholar]
Machine Text Detectors are Membership Inference Attacks
Ryuto Koike*, Liam Dugan*, Masahiro Kaneko, Chris Callison-Burch, Naoaki Okazaki
Preprint 2025
TL;DR - We theoretically prove that membership inference attacks (MIA) and machine-generated text
detection share the same optimal metric, and empirically demonstrate strong cross-task transferability
(ρ > 0.6) across diverse domains and generators. Notably, a machine text detector outperforms
a state-of-the-art MIA on MIA benchmarks. To support cross-task development and fair evaluation,
we introduce MINT, a unified evaluation suite implementing 15 recent methods from both tasks.
ExaGPT: Example-Based Machine-Generated Text Detection for Human Interpretability
Ryuto Koike, Masahiro Kaneko, Ayana Niwa, Preslav Nakov, Naoaki Okazaki
Preprint 2025
TL;DR - We propose ExaGPT, an interpretable AI text detector that classifies a text by checking
whether it shares more similar spans with human-written or machine-generated texts in a datastore,
and presents those spans as evidence so that users can assess how reliable the decision is. ExaGPT
achieves both high interpretability and strong detection performance, outperforming prior
interpretable detectors by up to +37.0% accuracy at 1% FPR.
OUTFOX: LLM-Generated Essay Detection Through In-Context Learning with Adversarially Generated Examples
Ryuto Koike, Masahiro Kaneko, Naoaki Okazaki
AAAI 2024
TL;DR - We propose OUTFOX, a framework that improves the robustness of AI text detectors by
allowing both the detector and the attacker to adversarially learn from each other's output through
in-context learning, achieving a +41.3% F1 improvement against strong adaptive attacks. This paper is
among the first to effectively use AI to detect AI.
How You Prompt Matters! Even Task-Oriented Constraints in Instructions Affect LLM-Generated Text Detection
Ryuto Koike, Masahiro Kaneko, Naoaki Okazaki
EMNLP Findings 2024
TL;DR - We reveal the vulnerabilities of AI text detectors to prompt diversity in text generation.
Specifically, even task-oriented constraints (constraints that would naturally be included in an
instruction and are unrelated to detection evasion) degrade the detection performance of existing
strong detectors.
We highlight the importance of ensuring prompt diversity to build robust benchmarks grounded in
real-world scenarios.
Likelihood-based Mitigation of Evaluation Bias in Large Language Models
Masanari Ohi†, Masahiro Kaneko, Ryuto Koike, Mengsay Loem, Naoaki Okazaki
ACL Findings 2024
TL;DR - We identify a self-preference bias in LLM-as-a-judge, i.e., LLMs overrate texts with higher
likelihoods while underrating those with lower likelihoods. We further propose a simple yet
effective mitigation method via in-context learning, achieving better alignment with human
evaluations.
All Publications
Journal
- Ryuto Koike, Masahiro Kaneko, Naoaki Okazaki. On the Robustness of LLM-Generated Text
Detection Against Instruction Diversity. Journal of Natural Language Processing, 33(1),
to appear, March 2026.
- 大井 聖也, 金子 正弘, 小池 隆斗, Mengsay Loem, 岡崎 直観.
大規模言語モデルにおける評価バイアスの尤度に基づく緩和. 自然言語処理, 32(2):480–496, July 2025. 最優秀論文賞 🎉
- 小池 隆斗, 萩原 将文. スタイル変換を用いたYouTube動画タイトルの生成支援システム. 日本感性工学会論文誌,
21(1):49–55, 2022.
International Conferences
- Ryuto Koike, Masahiro Kaneko, Naoaki Okazaki. OUTFOX: LLM-generated Essay Detection through
In-context Learning with Adversarially Generated Examples. AAAI, pp. 21258–21266, Vancouver,
Canada, February 2024.
- Ryuto Koike, Masahiro Kaneko, Naoaki Okazaki. How You Prompt Matters! Even Task-Oriented
Constraints in Instructions Affect LLM-Generated Text Detection. Findings of EMNLP,
pp. 14384–14395, Miami, Florida, USA, November 2024.
- Masanari Ohi, Masahiro Kaneko, Ryuto Koike, Mengsay Loem, Naoaki Okazaki.
Likelihood-based Mitigation of Evaluation Bias in Large Language Models.
Findings of ACL, pp. 3237–3245, Bangkok, Thailand, August 2024.
- Yuxia Wang, Artem Shelmanov, Jonibek Mansurov, Akim Tsvigun, Vladislav Mikhailov, Rui Xing,
Zhuohan Xie, Jiahui Geng, Giovanni Puccetti, Ekaterina Artemova, Jinyan Su, Minh Ngoc Ta, Mervat
Abassy, Kareem Elozeiri, Saad El Dine Ahmed, Maiya Goloburda, Tarek Mahmoud, Raj Vardhan Tomar,
Alexander Aziz, Nurkhan Laiyk, Osama Mohammed Afzal, Ryuto Koike, Masahiro
Kaneko, Alham Fikri Aji, Nizar Habash, Iryna Gurevych, Preslav Nakov. GenAI Content
Detection Task 1: English and Multilingual Machine-generated Text Detection: AI vs. Human.
Proceedings of the 1st Workshop on GenAI Content Detection (GenAIDetect), pp. 244–261, COLING,
Abu Dhabi, UAE.
Preprints
- Masanari Oi, Koki Maeda, Ryuto Koike, Daisuke Oba, Nakamasa Inoue, Naoaki
Okazaki. From Correspondence to Actions: Human-Like Multi-Image Spatial Reasoning in
Multi-modal Large Language Models. arXiv. 2026.
- Ryuto Koike*, Liam Dugan*, Masahiro Kaneko, Chris Callison-Burch, Naoaki
Okazaki. Machine Text Detectors are Membership Inference Attacks. arXiv. 2025.
- Ryuto Koike, Masahiro Kaneko, Ayana Niwa, Preslav Nakov, Naoaki Okazaki.
ExaGPT: Example-Based Machine-Generated Text Detection for Human
Interpretability. arXiv. 2025.
- Yuxia Wang, Rui Xing, Jonibek Mansurov, Giovanni Puccetti, Zhuohan Xie, Minh Ngoc Ta, Jiahui
Geng, Jinyan Su, Mervat Abassy, Saad El Dine Ahmed, Kareem Elozeiri, Nurkhan Laiyk, Maiya
Goloburda, Tarek Mahmoud, Raj Vardhan Tomar, Alexander Aziz, Ryuto Koike,
Masahiro Kaneko, Artem Shelmanov, Ekaterina Artemova, Vladislav Mikhailov, Akim Tsvigun, Alham
Fikri Aji, Nizar Habash, Iryna Gurevych, Preslav Nakov. Is Human-like Text Liked by Human?
Multilingual Human Detection and Preference Against AI. arXiv. 2025.
Domestic Conferences and Symposiums
- 小池 隆斗*, Liam Dugan*, 金子 正弘, Chris Callison-Burch, 岡崎 直観.
機械生成文検出とメンバーシップ推論は相互に転移可能. 言語処理学会第32回年次大会 (NLP2026), 2026年3月. 若手奨励賞 🎉
- 一瀬 達矢, Youmi Ma, 大井 聖也, 小池 隆斗, 岡崎 直観.
対照的デコーディングを用いた指示学習データの合成. 言語処理学会第32回年次大会 (NLP2026), 2026年3月.
- 島田 比奈理, 大葉 大輔, 小池 隆斗, 金子 正弘, 岡崎 直観.
現在と将来の応答の有害性を低減させることによるマルチターン脱獄攻撃の防御手法. 言語処理学会第32回年次大会 (NLP2026), 2026年3月.
- 齋藤 幸史郎, 小池 隆斗, 金子 正弘, 岡崎 直観. 機械文としての検出されやすさと文章の品質は両立する.
言語処理学会第32回年次大会 (NLP2026), 2026年3月. 若手奨励賞 🎉
スポンサー賞(CyberAgent、ELYZA)🎉
- 島田 比奈理, 大葉 大輔, 小池 隆斗, 金子 正弘, 岡崎 直観.
マルチターンJailbreak攻撃に対する防御アルゴリズムの提案. 第20回YANSシンポジウム (YANS2025), S3-P49, 2025年9月.
スポンサー賞(Polaris.AI)🎉
- 齋藤 幸史郎, 小池 隆斗, 金子 正弘, 岡崎 直観.
PUPPET:タスク性能を維持しながらLLMとして検出されやすくする学習フレームワーク. 第20回YANSシンポジウム (YANS2025), S2-P13,
2025年9月.
- 一瀬 達矢, Youmi Ma, 大井 聖也, 小池 隆斗, 岡崎 直観.
対照的デコーディングを用いた指示学習データの合成. 第20回YANSシンポジウム (YANS2025), S1-P34, 2025年9月. スポンサー賞(CyberAgent)🎉
- 小池 隆斗, 金子 正弘, 丹羽 彩奈, Preslav Nakov, 岡崎 直観. ExaGPT:
解釈性向上に向けた事例ベース機械文検出. 人工知能学会第39回年次大会 (JSAI2025), 2025年5月.
- 齋藤 幸史郎, 小池 隆斗, 金子 正弘, 岡崎 直観.
PUPPET:タスク性能を維持しながらLLMとして検出されやすくする学習フレームワーク. 言語処理学会第31回年次大会 (NLP2025), P7-5,
pp. 2791–2796, 2025年3月.
- 齋藤 幸史郎, 小池 隆斗, 金子 正弘, 岡崎 直観.
強化学習を用いた、言語理解能力を維持したLLM検出器の性能向上. 第19回YANSシンポジウム (YANS2024), S1-P23, 2024年9月.
奨励賞🎉 スポンサー賞(CyberAgent)🎉
- 大井 聖也, 金子 正弘, 小池 隆斗, Mengsay Loem, 岡崎 直観.
大規模言語モデルにおける評価バイアスの尤度に基づく緩和. 言語処理学会第30回年次大会 (NLP2024), A11-4, pp. 3021–3026,
2024年3月. 若手奨励賞 🎉
- 小池 隆斗, 金子 正弘, 岡崎 直観. 制約が異なる指示で生成された文章に対するLLM生成検出の頑健性.
言語処理学会第30回年次大会 (NLP2024), A4-4, pp. 943–948, 2024年3月.
- 小池 隆斗, 金子 正弘, 岡崎 直観. 敵対的事例を用いたIn-context
learningによるLLM生成エッセイの検出. 第18回NLP若手の会シンポジウム, S3-P13, 2023年8月. スポンサー賞(PKSHA Technology、HAKUHODO Technologies)🎉
Experiences
Institute of Science Tokyo, Tokyo, Japan
Doctoral Researcher (2023.04 - Present)
Advisor: Prof. Naoaki Okazaki
University of Pennsylvania, Philadelphia, PA, USA
Visiting Researcher (2024.10 - 2025.10)
Advisor: Prof. Chris Callison-Burch
Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), Abu Dhabi, UAE
Research Collaborator (2024.04 - 2025.04)
Advisor: Prof. Preslav Nakov
Exawizards, Inc., Tokyo, Japan
Machine Learning Engineer Intern (2022.02 - 2022.03)
CyberAgent, Inc., Tokyo, Japan
Research Intern (2021.09 - 2022.01)
Software Engineer Intern (2021.07 - 2021.08)
Education
Institute of Science Tokyo (formerly Tokyo Institute of Technology), Tokyo, Japan
Ph.D. in Computer Science (2023.04 - est. 2026.04)
Keio University, Tokyo, Japan
M.S. in Information and Computer Science (2021.04 - 2023.03)
B.S. in Information and Computer Science (2017.04 - 2021.03)
Grants
- Off-Campus Study Plus in Tokyo Tech SPRING Scholarship
Tokyo Institute of Technology, 2024.
Research Funds: 900,000 JPY / approx. 6,000 USD
- Tobitate! (Leap for Tomorrow) Study Abroad Scholarship (Acceptance Rate: 16.7%)
The Ministry of Education, Culture, Sports, Science and Technology (MEXT), 2024.
Scholarship: 1,920,000 JPY / approx. 12,100 USD per year, Preparation Funds: 350,000 JPY /
approx. 2,200 USD
- Tokyo Tech SPRING Scholarship
Tokyo Institute of Technology, Apr. 2024 - Mar. 2026.
Scholarship: 2,160,000 JPY / approx. 14,400 USD per year, Research Funds: 300,000 JPY /
approx. 2,000 USD per year, Full Tuition Exemption.
- Tokyo Tech Advanced Human Resource Development Fellowship for Doctoral Students
Tokyo Institute of Technology, Apr. 2023 - Mar. 2024.
Scholarship: 1,800,000 JPY / approx. 12,000 USD per year, Research Funds: 300,000 JPY /
approx. 2,000 USD per year, Full Tuition Exemption.