Welcome to Hong Hong’s Project Page
About Myself
I am an applied scientist at Meta, previously at Microsoft, with 9+ years of experience in science modeling and 4+ years in engineering development. Here is my summary:
- Experienced in R&D in areas such as LLM, DocAI, NLP, and KG, as well as training models within compliance boundaries
- Passionate about probabilistic modeling with small data, and about weak / semi-supervision and transfer learning in enterprise settings
- Hands-on experience in distributed system data processing, online / offline experiments and metrics evaluation
- Former expert in high-performance / concurrent / async development with C++
I am an active member (moderator) of the Chinese AI community on Clubhouse
I am a co-host of the Chinese AI podcast EnterAI
The Blogs
- Random and Fair Red Pockets: A Statistical Approach
- Shared a common but interesting “stats cookie” problem: how to randomly and fairly split money in Chinese red pockets.
- Discussed 3 different statistical approaches to splitting the red-pocket money.
- Analyzed the characteristics of each approach and demonstrated the sampling process in Python code.
- Commenting on the O1 Implementation
- Discussed 2 hypotheses on how OpenAI o1 might have been implemented
- Hypothesis 1 - using MCTS to perform self CoT training
- Hypothesis 2 - using self-play to align the model’s output to human preference
- Sampling and Estimation Step-by-Step
- Generative Model and Discriminative Model
- Perfect Coin
- Boost Series AdaBoost and GBDT
- A Story about e
The Lab Projects
- The Toy Sample for Message-Passing Variational Bayesian Inference
- Simple implementation demo for message-passing-based Bayesian inference
- Capable of building simple Bayesian graphs in the Gaussian family
- With detailed documentation and sample code
- Private Domain Topic Representation Training
- Solves topic representation training issues in private domain settings (small corpus size)
- Joint training of topic representations and token representations
- Based on BERT/ERL pretrained models
The Fun Coding Life
- The Airline Price Optimization from Kaggle
- Basic solution for airline price optimization challenge with detailed explanation.
- Risk of Collisions in Fast Rolling Hash Implementation
- Code samples demonstrating a magic case that causes hash collisions in one of the fast rolling hash implementations
Previous Work Projects
- Microsoft Viva Topics
- Smartly discovers topic-based knowledge in your organization
- Worked on extraction, conflation, and ranking
- Answers @ Microsoft Search
- Microsoft Outlook Online
- The modern enterprise smart email client
- Worked on people search and email search
- Visual C++ STL Library
- The Microsoft version of C++ standard library
- Contributed to versions 2015 and 2017
- Contributed to the C++ coroutine design proposal to the language committee
- C++ Rest SDK
- The open-source REST SDK for C++
- Worked as one of the primary project contributors
- Microsoft Parallel Pattern Library
- An earlier effort from Microsoft for C++ parallel computing
- Worked as one of the primary project contributors