Hi, I'm

Qianjin (Zac) Zhou

Data Scientist & ML Engineer

Passionate about building data science solutions to complex business problems.

About Me

I'm a Data Scientist and Machine Learning Engineer currently pursuing my M.S. in Data Science at New York University. I hold a B.S. in Data Science and Business Economics from UC San Diego, where I graduated Magna Cum Laude with a 3.97 GPA.

My background spans quantitative analysis, economic modeling, and advanced statistical methods. I have hands-on internship experience at Fortune Global 500 companies and leading policy think tanks, building data-driven solutions with machine learning, NLP, and cloud computing.

New York University

M.S. Data Science

Sep 2025 – May 2027 · GPA: 4.00 / 4.00 · Graduate TA for Causal Inference

UC San Diego

B.S. Data Science & Business Economics (Double Major)

Sep 2021 – Jun 2025 · GPA: 3.97 / 4.00 (Top 3%) · Magna Cum Laude · Provost's Honors · Dean's List

Technical Skills

Languages

Python (pandas, NumPy) · SQL (MySQL / Postgres / BigQuery) · R · Java · HTML / CSS / JavaScript

ML & Deep Learning

PyTorch · TensorFlow / Keras · scikit-learn · Hugging Face Transformers · XGBoost / LightGBM

Cloud & Big Data

AWS (S3, EC2, SageMaker) · Apache Spark (PySpark) · Hive · Dask · Docker · Git · Linux

Visualization & Analytics

Tableau · Power BI · matplotlib / plotly · Jupyter · Streamlit · Excel

Econ & Stats

Causal Inference · A/B Testing · Econometrics · Statistical Modeling · Stata · MATLAB

AI Tools & Applications

Claude Code · Cursor · ChatGPT · Gemini · DeepSeek · Prompt Engineering · AI Agents

Experience

Data Scientist Intern

YINGDA Securities Co., Ltd

Brokerage & investment bank owned by State Grid Corporation of China (Fortune Global 500)

Jul 2024 – Sep 2024
  • Built Python ETL pipelines to scrape, clean, and transform private fund data from databases and regulatory filings; automated multiple financial data collection and cleaning workflows.
  • Designed normalized schemas and indexes in MySQL that cut fund data query and access times by ~30%.
  • Co-authored research reports for 10+ alternative investment funds, analyzing quantitative trading strategies in private equity funds and providing portfolio adjustment recommendations.
  • Developed a Power BI dashboard to deliver regularly updated fund data reports with interactive visualizations, improving accessibility and decision support for financial analysts.

Research & Data Analyst Intern

China Development Institute

Leading economic research think tank in China

Jul 2023 – Sep 2023
  • Programmed a Python scraper to collect China-EU bilateral trade and investment datasets and news from multiple media sources.
  • Automated the extraction of statistics and keywords from the collected datasets using LLM APIs and prompt engineering.
  • Created an interactive Power BI dashboard to analyze Guangdong province's electric vehicle market trends, integrating sales data, policy incentives, and more to support strategic recommendations.
  • Co-authored a 60-page report on the economic development of the Guangdong–Hong Kong–Macao Greater Bay Area.

Featured Projects

Sponsored by Prism Data (FinTech Startup)

Fair-Access Credit Scoring System

  • Developed an ML pipeline on 6M bank transactions to predict default risk; fine-tuned BERT and Llama for transaction classification, achieving 96.9% accuracy.
  • Engineered 260+ behavioral features and optimized XGBoost (ROC-AUC 0.81); presented at UCSD Data Science Showcase.
BERT · Llama · XGBoost · NLP · Python

Machine Learning · NLP · Recommender

Recommender Systems for Google Local Reviews

  • Built a leak-free recommender pipeline on 3.1M reviews and 21K+ businesses with temporal evaluation; reduced test MSE by 40% through multi-model benchmarking.
  • Designed an ALS retrieval + LambdaMART reranking framework, improving NDCG@5 by 3.3x over baseline.
LightGBM · TF-IDF · ALS · LambdaMART · NLP

PySpark · Spark MLlib · AWS

Big Data E-Commerce Rating Prediction

  • Processed 12.3 GiB Amazon data via PySpark ETL; enhanced features with Word2Vec embeddings and PCA dimensionality reduction.
  • Achieved 88% rating prediction accuracy using Random Forest with hyperparameter tuning on AWS.
PySpark · Spark MLlib · Word2Vec · Random Forest · AWS