John AI Lab: Home

News

January 23, 2025
One paper will be presented at NAACL 2025. Congrats to Xiangyan Liu and our collaborators!
CodexGraph: Bridging Large Language Models and Code Repositories via Code Graph Databases
January 23, 2025
Three papers will be presented at ICLR 2025. Congrats to Jinjie Ni, Yiran Zhao, Guanzheng Chen, and our collaborators!
MixEval-X: Any-to-Any Evaluations from Real-World Data Mixtures
Identifying and Tuning Safety Neurons in Large Language Models
LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization
December 19, 2024
One paper will be presented at AAAI 2025. Congrats to Leon Lin, Hannah Brown, and our collaborators!
Single Character Perturbations Break LLM Alignment

Featured Research

Safety Neuron

We propose a neuron detection method that identifies safety neurons, which make up less than 1% of model parameters, and tune them with SN-Tune, enhancing safety without affecting general model performance.

MixEval-X

The first any-to-any evaluation benchmark built from real-world data mixtures.

Noisy Student

The first method to use extra unlabeled data to achieve state-of-the-art accuracy on ImageNet. Also employed in AlphaFold 2 (Nobel Prize 2024), Google Search, and other state-of-the-art AI systems.

RACE Dataset

The first large-scale language understanding dataset collected from exams designed for humans. Today, MMLU and MATH are also collected from exams.

Mission

Feel the AGI. Use 10k GPUs like rich kids in the industry. Get 100k citations. Get Nobel prizes and Turing awards.

Just kidding haha. We work on democratizing AI, including but not limited to resource-efficient AI, vision-language models, and more exploratory topics.

People

Faculty

Michael Qizhe Shieh

Assistant Professor

Jinjie Ni

Research Fellow

PhD Students

Esther Gan

Co-advised with Min-Yen Kan

Master's Students

Fengtao He

National University of Singapore

Leon Lin

National University of Singapore

Undergraduate Students

Qilong Feng

National University of Singapore

Zhennan Shen

Shanghai Jiao Tong University

Yang Zhang

Peking University

Publications

*: equal contribution, †: equal advising

2025 January	CodexGraph: Bridging Large Language Models and Code Repositories via Code Graph Databases. Xiangyan Liu, Bo Lan, Zhiyuan Hu, Yang Liu, Zhicheng Zhang, Fei Wang, Michael Qizhe Shieh, and Wenmeng Zhou. NAACL 2025
2025 January	MixEval-X: Any-to-Any Evaluations from Real-World Data Mixtures. Jinjie Ni, Yiran Song, Deepanway Ghosal, Bo Li, David Junhao Zhang, Xiang Yue, Fuzhao Xue, Zian Zheng, Kaichen Zhang, Mahir Shah, Kabir Jain, Yang You, and Michael Qizhe Shieh. ICLR 2025 (Spotlight)
2025 January	Identifying and Tuning Safety Neurons in Large Language Models. Yiran Zhao, Wenxuan Zhang, Yuxi Xie, Anirudh Goyal, Kenji Kawaguchi, and Michael Qizhe Shieh. ICLR 2025
2025 January	LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization. Guanzheng Chen, Xin Li, Michael Qizhe Shieh, and Lidong Bing. ICLR 2025
2024 December	Single Character Perturbations Break LLM Alignment. Leon Lin, Hannah Brown, Kenji Kawaguchi, and Michael Shieh. AAAI 2025
2024 September	Accelerating greedy coordinate gradient via probe sampling. Yiran Zhao, Wenyue Zheng, Tianle Cai, Xuan Long Do, Kenji Kawaguchi, Anirudh Goyal, and Michael Qizhe Shieh. NeurIPS 2024
2024 September	Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning. Yuxi Xie, Anirudh Goyal, Wenyue Zheng, Min-Yen Kan, Timothy P Lillicrap, Kenji Kawaguchi, and Michael Shieh. NeurIPS 2024 Workshop
2024 September	Advancing Adversarial Suffix Transfer Learning on Aligned Large Language Models. Hongfu Liu, Yuxi Xie, Ye Wang, and Michael Shieh. EMNLP 2024
2024 September	Reasoning Robustness of LLMs to Adversarial Typographical Errors. Esther Gan, Yiran Zhao, Liying Cheng, Yancan Mao, Anirudh Goyal, Kenji Kawaguchi, Min-Yen Kan, and Michael Shieh. EMNLP 2024
2024 May	InstructCoder: Instruction tuning large language models for code editing. Kaixin Li, Qisheng Hu, James Zhao, Hui Chen, Yuxi Xie, Tiedong Liu, Michael Shieh†, and Junxian He†. ACL 2024 Workshop
2024 May	Prompt optimization via adversarial in-context learning. Xuan Long Do, Yiran Zhao, Hannah Brown*, Yuxi Xie, James Xu Zhao, Nancy F. Chen, Kenji Kawaguchi, Michael Shieh†, and Junxian He†. ACL 2024 (Oral)
2023 October	Self-evaluation guided beam search for reasoning. Yuxi Xie, Kenji Kawaguchi, Yiran Zhao, James Xu Zhao, Min-Yen Kan†, Junxian He†, and Michael Xie†. NeurIPS 2023
2023 October	Automatic model selection with large language models for reasoning. James Xu Zhao, Yuxi Xie, Kenji Kawaguchi, Junxian He, and Michael Qizhe Xie. EMNLP 2023 Findings
2023 May	Multi-Source Test-Time Adaptation as Dueling Bandits for Extractive Question Answering. Hai Ye, Qizhe Xie, and Hwee Tou Ng. ACL 2023
2021 February	Meta pseudo labels. Hieu Pham, Zihang Dai, Qizhe Xie, and Quoc V. Le. CVPR 2021
2020 October	Unsupervised data augmentation for consistency training. Qizhe Xie, Zihang Dai, Eduard Hovy, Thang Luong, and Quoc V. Le. NeurIPS 2020
2020 February	Self-training with noisy student improves ImageNet classification. Qizhe Xie, Minh-Thang Luong, Eduard Hovy, and Quoc V. Le. CVPR 2020
2017 November	RACE: Large-scale ReAding Comprehension Dataset From Examinations. Guokun Lai, Qizhe Xie, Hanxiao Liu, Yiming Yang, and Eduard Hovy. EMNLP 2017