Welcome to Hong Hong’s Project Page
About Myself
I am an applied scientist at Meta, previously at Microsoft, with 9+ years of experience in science modeling and 4+ years in engineering development. Here is my summary:
  - Experienced in R&D in areas such as LLM, DocAI, NLP, and KG, as well as training models within compliance boundaries
 
  - Passionate about probabilistic modeling with small data, weak / semi-supervision, and transfer learning in enterprise settings
 
  - Hands-on experience in distributed system data processing, online / offline experiments and metrics evaluation
 
  - Formerly an expert in high-performance / concurrent / async development with C++
 
I am an active member (moderator) of the Chinese AI community on Clubhouse
I am a co-host of the Chinese AI podcast EnterAI
The Blogs
  - Random and Fair Red Pockets: A Statistical Approach
    
      - Shared a common but interesting “stats cookie” problem: how to randomly and fairly split money in Chinese red pockets.
 
      - Discussed 3 different statistical approaches to splitting the red-pocket money.
 
      - Analyzed the characteristics of each approach and demonstrated the sampling process in Python code.
 
    
   
  - Commenting on the O1 Implementation
    
      - Discussed 2 hypotheses on how OpenAI O1 might have been implemented
 
      - Hypothesis 1 - using MCTS to perform self CoT training
 
      - Hypothesis 2 - using self-play to align the model’s output to human preference
 
    
   
  - Sampling and Estimation Step-by-Step
 
  - Generative Model and Discriminative Model
 
  - Perfect Coin
 
  - Boost Series AdaBoost and GBDT
 
  - A Story about e
 
The Lab Projects
  - The Toy Sample for Message-Passing Variational Bayesian Inference
    
      - Simple implementation demo for message-passing-based Bayesian inference
 
      - Capable of building simple Bayesian graphs in the Gaussian family
 
      - With detailed documentation and sample code
 
    
   
  - Private Domain Topic Representation Training
    
      - Solves topic representation training issues in private-domain settings (small corpus size)
 
      - Joint training of topic representations and token representations
 
      - Based on BERT/ERL pretrained models
 
    
   
The Fun Coding Life
  - The Airline Price Optimization from Kaggle
    
      - Basic solution for the airline price optimization challenge, with a detailed explanation.
 
    
   
  - Risk of Collisions in Fast Rolling Hash Implementation
    
      - Code samples demonstrating a magic case that causes hash collisions in one of the fast rolling hash implementations
 
    
   
Previous Work Projects
  - Microsoft Viva Topics
    
      - Smartly discovers topic-based knowledge in your organization
 
      - Worked on extraction, conflation, and ranking
 
    
   
  - Answers @ Microsoft Search
    
  
 
  - Microsoft Outlook Online
    
      - The modern enterprise smart email client
 
      - Worked on people search and email search
 
    
   
  - Visual C++ STL Library
    
      - The Microsoft version of C++ standard library
 
      - Contributed to versions 2017 and 2015
 
      - Contributed to the C++ coroutine design proposal to the language committee
 
    
   
  - C++ Rest SDK
    
      - The open-source REST SDK for C++
 
      - Worked as one of the primary project contributors
 
    
   
  - Microsoft Parallel Pattern Library
    
      - An earlier effort from Microsoft for C++ parallel computing
 
      - Worked as one of the primary project contributors