My name is Minghui Liao (廖明辉). I am now a researcher at Huawei. My recent work focuses mainly on multimodal vision-language models.
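We are recruiting research interns on an ongoing basis, in areas including vision-language multimodality and (multimodal) document intelligence. Candidates with papers at top-tier conferences or journals are preferred, as are those who can commit to a long-term internship (6 months or more). Interested students can send their CVs to: liaominghui1<at>huawei.com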
Jiwen Zhang, Yaqi Yu, Minghui Liao, Wentao Li, Jihao Wu, Zhongyu Wei, “UI-Hawk: Unleashing the Screen Stream Understanding for GUI Agents.” Preprints, manuscript/202408.2137 (2024).
Ya-Qi Yu, Minghui Liao, Jiwen Zhang, Jihao Wu, “TextHawk2: A Large Vision-Language Model Excels in Bilingual OCR and Grounding with 16x Fewer Tokens.” arXiv preprint arXiv:2410.05261 (2024).
Ya-Qi Yu, Minghui Liao, Jihao Wu, Yongxin Liao, Xiaoyu Zheng, Wei Zeng, “TextHawk: Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models.” arXiv preprint arXiv:2404.09204 (2024).
Xinmiao Yu, et al., “Cross-Lingual Text-Rich Visual Comprehension: An Information Theory Perspective”. AAAI 2025.
Xudong Xie, et al. “WuKong: A Large Multimodal Model for Efficient Long PDF Reading with End-to-End Sparse Sampling.” arXiv preprint arXiv:2410.05970 (2024).
Jiwen Zhang, et al. “Android in the Zoo: Chain-of-Action-Thought for GUI Agents.” EMNLP Findings 2024.
Hao Wang, Minghui Liao, Zhouyi Xie, Wenyu Liu, Xiang Bai, “Partial Scene Text Retrieval”. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024.
Mingkun Yang, Minghui Liao, Pu Lu, Jing Wang, Shenggao Zhu, Hualin Luo, Qi Tian, Xiang Bai, “Reading and Writing: Discriminative and Generative Modeling for Self-Supervised Text Recognition”. ACM MM, 2022. Co-first author
Minghui Liao, Zhisheng Zou, Zhaoyi Wan, Cong Yao, Xiang Bai, “Real-Time Scene Text Detection with Differentiable Binarization and Adaptive Scale Fusion”. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022. code
Minghang He, Minghui Liao, Zhibo Yang, Humen Zhong, Jun Tang, Wenqing Cheng, Cong Yao, Yongpan Wang, Xiang Bai, “MOST: A Multi-Oriented Scene Text Detector with Localization Refinement”. CVPR, 2021. Co-first author
Minghui Liao, Guan Pang, Jing Huang, Tal Hassner, Xiang Bai, “Mask TextSpotter v3: Segmentation Proposal Network for Robust Scene Text Spotting”. ECCV, 2020. code
Minghui Liao, Zhaoyi Wan, Cong Yao, Kai Chen, Xiang Bai, “Real-time Scene Text Detection with Differentiable Binarization”. AAAI, 2020. oral presentation code
Minghui Liao, Boyu Song, Minghang He, Shangbang Long, Cong Yao, Xiang Bai, “SynthText3D: Synthesizing Scene Text Images from 3D Virtual Worlds”. SCIS 2020. code and data
Minghui Liao, Pengyuan Lyu, Minghang He, Cong Yao, Wenhao Wu, Xiang Bai, “Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes”. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) code
Mingkun Yang, Yushuo Guan, Minghui Liao, Xin He, Kaigui Bian, Song Bai, Cong Yao, Xiang Bai, “Symmetry-constrained Rectification Network for Scene Text Recognition”. International Conference on Computer Vision (ICCV), 2019.
Minghui Liao, Jian Zhang, Zhaoyi Wan, Fengming Xie, Jiajun Liang, Pengyuan Lyu, Cong Yao, Xiang Bai, “Scene Text Recognition from Two-Dimensional Perspective”. Thirty-Third AAAI Conference on Artificial Intelligence (AAAI), 2019. oral presentation
Pengyuan Lyu, Minghui Liao, Cong Yao, Wenhao Wu, Xiang Bai, “Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes”. European Conference on Computer Vision (ECCV), 2018, pp. 67-83. Co-first author code
Minghui Liao, Zhen Zhu, Baoguang Shi, Gui-song Xia, Xiang Bai, “Rotation-Sensitive Regression for Oriented Scene Text Detection”. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 5909-5918. code
Minghui Liao, Baoguang Shi, Xiang Bai, “TextBoxes++: A Single-Shot Oriented Scene Text Detector”. IEEE Transactions on Image Processing (TIP), 2018, 27(8): 3676-3690. code
Minghui Liao, Baoguang Shi, Xiang Bai, Xinggang Wang, Wenyu Liu, “TextBoxes: A Fast Text Detector with a Single Deep Neural Network”. Thirty-First AAAI Conference on Artificial Intelligence (AAAI), 2017, pp. 4161-4167. oral presentation code
Yingying Zhu, Minghui Liao, Mingkun Yang, Wenyu Liu, “Cascaded Segmentation-Detection Networks for Text-Based Traffic Sign Detection”. IEEE Transactions on Intelligent Transportation Systems, 2018, 19(1): 209-219.
Baoguang Shi, Cong Yao, Minghui Liao, Mingkun Yang, Pei Xu, Linyan Cui, Serge Belongie, Shijian Lu, Xiang Bai, “ICDAR2017 Competition on Reading Chinese Text in the Wild (RCTW-17)”. 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), IEEE, 2017, 1: 1429-1434.