Minghui Liao (廖明辉)

Researcher

Huawei (华为)

My recent work focuses on Multimodal Vision Language Models and Document Intelligence.

About Me

I am currently a Researcher at Huawei (2021–Present). Before joining Huawei, I received my Ph.D. degree.

My research experience includes a visit to Johns Hopkins University (with Prof. Alan Yuille, 2019–2020) and research internships at Facebook (2019) and Megvii/Face++ (2018–2019).

I am always looking for self-motivated research interns.

Research directions: Multimodal Vision Language Models, Document Intelligence.
Priority is given to candidates who have top-tier publications or are available for a long-term internship (more than 6 months). Email: liaominghui1@huawei.com

News

  • Recent 🔥 EMMA is released.
  • Recent 🚀 VisuRiddles is released.

Publications

EMMA: Efficient Multimodal Understanding, Generation, and Editing with a Unified Architecture

Xin He, et al.

arXiv preprint, 2025

Cross-Lingual Text-Rich Visual Comprehension: An Information Theory Perspective

Xinmiao Yu, et al.

AAAI Conference on Artificial Intelligence (AAAI), 2025

VisuRiddles

VisuRiddles: Fine-grained Perception is a Primary Bottleneck for Multimodal Large Language Models in Abstract Visual Reasoning

Hao Yan, et al.

arXiv preprint, 2025

Project Lead, GitHub

UI-Hawk

UI-Hawk: Unleashing the Screen Stream Understanding for GUI Agents

Jiwen Zhang, et al.

EMNLP, 2025

Project Lead, GitHub

TextHawk2

TextHawk2: A Large Vision-Language Model Excels in Bilingual OCR and Grounding with 16x Fewer Tokens

Ya-Qi Yu, Minghui Liao, Jiwen Zhang, Jihao Wu

arXiv preprint arXiv:2410.05261, 2024

Project Lead, GitHub

TextHawk

TextHawk: Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models

Ya-Qi Yu, Minghui Liao, Jihao Wu, Yongxin Liao, Xiaoyu Zheng, Wei Zeng

arXiv preprint arXiv:2404.09204, 2024

Co-first author, Project Lead, GitHub

WuKong

WuKong: A Large Multimodal Model for Efficient Long PDF Reading with End-to-End Sparse Sampling

Xudong Xie, et al.

arXiv preprint arXiv:2410.05970, 2024

Android Zoo

Android in the Zoo: Chain-of-Action-Thought for GUI Agents

Jiwen Zhang, et al.

EMNLP Findings, 2024

Partial Scene Text

Partial Scene Text Retrieval

Hao Wang, Minghui Liao, Zhouyi Xie, Wenyu Liu, Xiang Bai

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024

Reading and Writing: Discriminative and Generative Modeling for Self-Supervised Text Recognition

Mingkun Yang, Minghui Liao, Pu Lu, Jing Wang, Shenggao Zhu, Hualin Luo, Qi Tian, Xiang Bai

ACM Multimedia (ACM MM), 2022

Co-first author, GitHub

DBNet

Real-Time Scene Text Detection with Differentiable Binarization and Adaptive Scale Fusion

Minghui Liao, Zhisheng Zou, Zhaoyi Wan, Cong Yao, Xiang Bai

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022

GitHub

MOST

MOST: A Multi-Oriented Scene Text Detector with Localization Refinement

Minghang He, Minghui Liao, Zhibo Yang, Humen Zhong, Jun Tang, Wenqing Cheng, Cong Yao, Yongpan Wang, Xiang Bai

CVPR, 2021

Co-first author

Mask TextSpotter v3

Mask TextSpotter v3: Segmentation Proposal Network for Robust Scene Text Spotting

Minghui Liao, Guan Pang, Jing Huang, Tal Hassner, Xiang Bai

European Conference on Computer Vision (ECCV), 2020

GitHub

DBNet

Real-time Scene Text Detection with Differentiable Binarization

Minghui Liao, Zhaoyi Wan, Cong Yao, Kai Chen, Xiang Bai

AAAI, 2020 (Oral)

GitHub

SynthText3D

SynthText3D: Synthesizing Scene Text Images from 3D Virtual Worlds

Minghui Liao, Boyu Song, Minghang He, Shangbang Long, Cong Yao, Xiang Bai

Science China Information Sciences (SCIS), 2020

GitHub

Mask TextSpotter v2

Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes

Minghui Liao, Pengyuan Lyu, Minghang He, Cong Yao, Wenhao Wu, Xiang Bai

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)

GitHub

Symmetry-constrained Rectification Network for Scene Text Recognition

Mingkun Yang, Yushuo Guan, Minghui Liao, Xin He, Kaigui Bian, Song Bai, Cong Yao, Xiang Bai

International Conference on Computer Vision (ICCV), 2019

Scene Text Recognition from Two-Dimensional Perspective

Minghui Liao, Jian Zhang, Zhaoyi Wan, Fengming Xie, Jiajun Liang, Pengyuan Lyu, Cong Yao, Xiang Bai

AAAI, 2019 (Oral)

Mask TextSpotter v1

Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes

Pengyuan Lyu, Minghui Liao, Cong Yao, Wenhao Wu, Xiang Bai

European Conference on Computer Vision (ECCV), 2018

Co-first author, GitHub

Rotation Sensitive Regression

Rotation-Sensitive Regression for Oriented Scene Text Detection

Minghui Liao, Zhen Zhu, Baoguang Shi, Gui-Song Xia, Xiang Bai

CVPR, 2018

GitHub

TextBoxes++

TextBoxes++: A Single-Shot Oriented Scene Text Detector

Minghui Liao, Baoguang Shi, Xiang Bai

IEEE Transactions on Image Processing (TIP), 2018

GitHub

TextBoxes

TextBoxes: A Fast Text Detector with a Single Deep Neural Network

Minghui Liao, Baoguang Shi, Xiang Bai, Xinggang Wang, Wenyu Liu

AAAI, 2017 (Oral)

GitHub

Honors & Awards

2023–2025 Elsevier Highly Cited Chinese Researchers
2023–2025 Stanford World's Top 2% Scientists
2024 First Prize of Hubei Natural Science Award
2022 CSIG Excellent Doctoral Dissertation Award
2021 Top 100 Chinese New Stars in AI
2019 National Scholarship for Ph.D. Students