LI Miqi (Mickey)

Education

Beijing University of Posts and Telecommunications (BUPT)

2018.9 - 2022.6

School of Computer Science | Bachelor of Engineering | Major in Data Science and Big Data Technology

The University of Hong Kong (HKU)

2022.9 - 2024.1

Faculty of Science, Department of Statistics and Actuarial Science | Master of Science | Major in Data Science

Summary

AI and data science enthusiast with hands-on experience in machine learning, large language models, and data analytics across tech and industry settings.

Work Experience

AS Watson Group, DataLab - AI Lab

Hong Kong

Full-Time | LLM Developer

2025.8 - Present

Designed and implemented production-grade agentic applications for commercial analysis and daily business workflows, leveraging Watson enterprise data and LangGraph / Langfuse toolchains.
Category management system (agent-driven analytics application)
- Architected and built a multi-agent application that ingests and reasons over real enterprise data to deliver KPI monitoring, root-cause analysis, and actionable recommendations.
- Capabilities demonstrated (multi-agent workflow):
  - Intent recognizer: leverages the RAG system over internal knowledge to infer user intent, resolve internal terminology (e.g., product lines, channel names, abbreviations), and determine which agents (KPI, sales tree, customer segmentation, SQL, visualization, driver analysis) are relevant to the request.
  - Planner: constructs a coordinated execution plan across the selected agents based on the user query and intent recognizer output, including data dependencies and intermediate artifacts.
  - Finalizer: aggregates outputs from all participating agents into a coherent, evidence-backed response; evaluates coverage and confidence; and, when results are insufficient to answer the question, invokes the replanner with targeted guidance.
  - Replanner: iteratively refines the multi-agent plan (e.g., adding new sub-tasks, adjusting SQL queries, requesting additional context) and re-runs the workflow until the user's question is satisfactorily addressed or clear limitations are surfaced.
- Outcome and production readiness:
  - Delivered automated investigative narratives and dashboards that reduced manual analyst time on high-priority incidents and accelerated root-cause detection.
  - Deployed with monitoring and retraining hooks (via Langfuse telemetry and LangGraph orchestration) to continuously improve prompt templates, agent policies, and model selection.
Trend system (agent-driven product discovery and market analysis application)
- Architected and built a multi-agent trend analysis platform that discovers trending external products via web search, ranks them against user preferences, and validates market availability across competitor platforms to identify white-space opportunities.
- Capabilities demonstrated (reactive frontend with real-time progress tracking):
  - Progress bar component: renders animated, segmented step indicators driven by AG-UI protocol events (activity snapshots and step-completion events streamed via SSE); visually tracks multi-phase agent execution with step labels, colour-coded segments, and clickable navigation.
  - Interactive form and selection cards: renders form inputs, multiple-choice cards, and SKU selectors as rich UI components within the chat stream; supports locked/read-only state after submission for clear conversation history.
  - Workflow node topology: declarative workflow graph defining agent steps and transitions, enabling the frontend to map backend progress events to visual step indicators across the hero SKU and white-space pipelines.
- Outcome and production readiness:
  - Delivered an end-to-end product discovery and competitive intelligence tool that automates trending product identification, preference-weighted ranking, and multi-market competitor availability analysis, replacing manual analyst research workflows.
  - Built with LangGraph subgraph composition, AG-UI event streaming, and MCP-based tool abstraction for modularity and extensibility.

Huawei Hong Kong Research Center, Design Automation Lab

Hong Kong

Full-Time | R&D Engineer

2024.6 - 2025.8

Developed and deployed AI-driven automation and data science solutions for chip design and manufacturing, focusing on scalable technologies for schematic review, yield improvement, and process optimization.
Retrieval-Augmented Generation (RAG) System for Schematic Review
- Designed and implemented a multi-modal data processing pipeline for RAG, supporting various document formats (PDF, DOCX, PPTX, PNG, JPG).
- Integrated vision LLMs for image-based information extraction alongside text extraction systems.
- Built a vector database storing both dense and sparse embeddings, enabling advanced semantic and keyword-based search; applied the reciprocal rank fusion (RRF) algorithm and a rerank model to further refine and present the most relevant results.
- Developed a Gradio-based frontend that lets users query PCB and chip-related issues and receive high-accuracy answers. The system now serves enterprise customers and meets their key requirements.
Automated Code Generation for Schematic Reviews
- Developed AI-powered solutions to automate schematic review processes, significantly reducing manual scripting effort. Established a reliable code generation framework, optimizing both accuracy and response time through extensive architecture testing.
- Designed a workflow where user prompts specify only the core logic of the code, which is then expanded by an LLM-based code writer using pseudo-APIs. A secondary LLM replaces pseudo-APIs with real, validated APIs.
- Implemented a recursive process to iteratively search the database for missing APIs and integrate them into the final code.
- Validated the generated code on real-world schematic files, achieving results consistent with manually written scripts.
Deep-semi-sight LLM Project
- Developed an end-to-end framework for fine-tuning, deploying, and evaluating LLMs, leveraging vLLM, vllm-ascend, and LLaMA-Factory. Supported advanced fine-tuning techniques (full-parameter, LoRA, QLoRA, DPO) and concurrent deployment on non-NVIDIA platforms (e.g., Huawei Ascend hardware).
- Built an evaluation pipeline for both open-source general datasets and custom domain-specific data. Successfully fine-tuned Qwen3-32B and Qwen2.5-32B-Instruct, achieving an average improvement of 9.64% (up to 12.69%) in general semiconductor capability while maintaining overall model performance.
- Delivered robust, customer-facing LLM solutions and comprehensive evaluation results for enterprise deployment.
Yield Loss Data Analysis
- Designed two high-efficiency statistical algorithms for single-step yield loss identification, reducing users’ analysis time from 0.5 days to 5 minutes in real-world cases, and significantly lowering the probability of misjudgement caused by excessive highlighting compared to previous methods.
- Validated the algorithms against expert results in a two-week production test, leading to seamless integration into the latest product release.
- Developed a multi-step yield loss identification algorithm and graphical interface, completing two rounds of feature iteration. This automation replaced manual experience-based analysis, cutting root-cause identification time from weeks to hours.
- Demonstrated expert-level accuracy and efficiency in a three-month real-world validation, substantially reducing manual investigation time. Delivered a user-friendly executable application, deployed for customer grey testing and now fully adopted in production environments.

BASF East Asia Regional Headquarters, Global Digitalization Unit

Hong Kong

Intern | Data and AI Engineer

2024.1 - 2024.6

Contributed to BASF's AI and digitalization initiatives through two key projects: NewsBot Marketing Intelligence and Abnormal Result Inference.

Alibaba Group, Taobao and Tmall Group, Alimama

Beijing

Intern | Algorithm Engineer

2023.5 - 2023.9

Contributed to the development of Alibaba's LLM technologies through two key projects: Natural Language to Crowd Package Generation and participation in the AI Hackathon Competition.

Skills

C, C++, Python, R, SQL.
Data Science, Machine Learning, Artificial Intelligence, LLM.
LangChain, LangGraph, Langfuse, vLLM, Ollama, LLaMA-Factory.
English (Business Professional, TOEFL 109, College English Test-6 677), Mandarin (Native), Cantonese (Intermediate).
Debate: Third Place, Chinese Debate World Cup 2023; Champion, Chinese College Debate Tournament 2020.