About Me
I am currently a Researcher at Huawei (2021–Present). Before joining Huawei, I received my Ph.D. degree.
Previously, I visited Johns Hopkins University (with Prof. Alan Yuille, 2019–2020) and completed research internships at Facebook (2019) and Megvii/Face++ (2018–2019).
I am always looking for self-motivated research interns.
Research directions: Multimodal Vision Language Models, Document Intelligence.
Priority is given to candidates who have top-tier publications or are available for a long-term internship (>6 months). Email: liaominghui1@huawei.com
News
- Recent 🔥 EMMA is released.
- Recent 🚀 VisuRiddles is released.
Publications
EMMA: Efficient Multimodal Understanding, Generation, and Editing with a Unified Architecture
Xin He, et al.
arXiv preprint, 2025
Cross-Lingual Text-Rich Visual Comprehension: An Information Theory Perspective
Xinmiao Yu, et al.
AAAI Conference on Artificial Intelligence (AAAI), 2025
VisuRiddles: Fine-grained Perception is a Primary Bottleneck for Multimodal Large Language Models in Abstract Visual Reasoning
Hao Yan, et al.
arXiv preprint, 2025
UI-Hawk: Unleashing the Screen Stream Understanding for GUI Agents
Jiwen Zhang, et al.
EMNLP, 2025
TextHawk2: A Large Vision-Language Model Excels in Bilingual OCR and Grounding with 16x Fewer Tokens
Ya-Qi Yu, Minghui Liao, Jiwen Zhang, Jihao Wu
arXiv preprint arXiv:2410.05261, 2024
TextHawk: Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models
Ya-Qi Yu, Minghui Liao, Jihao Wu, Yongxin Liao, Xiaoyu Zheng, Wei Zeng
arXiv preprint arXiv:2404.09204, 2024
WuKong: A Large Multimodal Model for Efficient Long PDF Reading with End-to-End Sparse Sampling
Xudong Xie, et al.
arXiv preprint arXiv:2410.05970, 2024
Android in the Zoo: Chain-of-Action-Thought for GUI Agents
Jiwen Zhang, et al.
EMNLP Findings, 2024
Partial Scene Text Retrieval
Hao Wang, Minghui Liao, Zhouyi Xie, Wenyu Liu, Xiang Bai
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024
Reading and Writing: Discriminative and Generative Modeling for Self-Supervised Text Recognition
Mingkun Yang, Minghui Liao, Pu Lu, Jing Wang, Shenggao Zhu, Hualin Luo, Qi Tian, Xiang Bai
ACM Multimedia (ACM MM), 2022
Real-Time Scene Text Detection with Differentiable Binarization and Adaptive Scale Fusion
Minghui Liao, Zhisheng Zou, Zhaoyi Wan, Cong Yao, Xiang Bai
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
GitHub
MOST: A Multi-Oriented Scene Text Detector with Localization Refinement
Minghang He, Minghui Liao, Zhibo Yang, Humen Zhong, Jun Tang, Wenqing Cheng, Cong Yao, Yongpan Wang, Xiang Bai
CVPR, 2021
Co-first author
Mask TextSpotter v3: Segmentation Proposal Network for Robust Scene Text Spotting
Minghui Liao, Guan Pang, Jing Huang, Tal Hassner, Xiang Bai
European Conference on Computer Vision (ECCV), 2020
GitHub
Real-time Scene Text Detection with Differentiable Binarization
Minghui Liao, Zhaoyi Wan, Cong Yao, Kai Chen, Xiang Bai
AAAI, 2020 (Oral)
GitHub
SynthText3D: Synthesizing Scene Text Images from 3D Virtual Worlds
Minghui Liao, Boyu Song, Minghang He, Shangbang Long, Cong Yao, Xiang Bai
Science China Information Sciences (SCIS), 2020
GitHub
Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes
Minghui Liao, Pengyuan Lyu, Minghang He, Cong Yao, Wenhao Wu, Xiang Bai
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)
GitHub
Symmetry-constrained Rectification Network for Scene Text Recognition
MingKun Yang, Yushuo Guan, Minghui Liao, Xin He, Kaigui Bian, Song Bai, Cong Yao, Xiang Bai
International Conference on Computer Vision (ICCV), 2019
Scene Text Recognition from Two-Dimensional Perspective
Minghui Liao, Jian Zhang, Zhaoyi Wan, Fengming Xie, Jiajun Liang, Pengyuan Lyu, Cong Yao, Xiang Bai
AAAI, 2019 (Oral)
Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes
Pengyuan Lyu, Minghui Liao, Cong Yao, Wenhao Wu, Xiang Bai
European Conference on Computer Vision (ECCV), 2018
Rotation-Sensitive Regression for Oriented Scene Text Detection
Minghui Liao, Zhen Zhu, Baoguang Shi, Gui-Song Xia, Xiang Bai
CVPR, 2018
GitHub
TextBoxes++: A Single-Shot Oriented Scene Text Detector
Minghui Liao, Baoguang Shi, Xiang Bai
IEEE Transactions on Image Processing (TIP), 2018
GitHub
TextBoxes: A Fast Text Detector with a Single Deep Neural Network
Minghui Liao, Baoguang Shi, Xiang Bai, Xinggang Wang, Wenyu Liu
AAAI, 2017 (Oral)
GitHub
Honors & Awards
| 2023-2025 | Elsevier Highly Cited Chinese Researchers (爱思唯尔中国高被引学者) |
| 2023-2025 | Stanford World's Top 2% Scientists (斯坦福大学全球前2%顶尖科学家) |
| 2024 | First Prize of Hubei Natural Science Award (湖北省自然科学奖一等奖) |
| 2022 | CSIG Excellent Doctoral Dissertation Award (CSIG 优秀博士学位论文奖) |
| 2021 | Top 100 Chinese New Stars in AI (AI华人新星百强) |
| 2019 | National Scholarship for Ph.D. Students |