Overview
Layoffs are usually reported as headlines and headcounts, but the root causes are buried in vague corporate language. We extracted and categorized those reasons systematically, surfaced macro patterns, and tested whether layoffs follow predictable patterns across industries and economic cycles.
Key question: What are the dominant drivers of layoffs globally, and how do they vary by industry, funding stage, and time?
Why This Is Hard
Real-world layoff reasons are inconsistent ("restructuring", "strategic realignment", "cost optimization"), noisy, and entirely unstructured. We found no off-the-shelf classifier for this task, so the signal had to be engineered from scratch.
My Contributions
- Pipeline architecture: Designed the end-to-end workflow and implemented preprocessing, normalization, and feature engineering for modeling.
- NLP reason extraction: Implemented web scraping and local LLM extraction (Ollama/Qwen2.5 7B) with KeyBERT keyword mining, then standardized reasons via synonym normalization.
- Modeling and clustering: Built clustering experiments and trained predictive models (classification and regression) with evaluation and feature importance analysis.
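The local LLM extraction step listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes an Ollama server on its default local port, the model tag `qwen2.5:7b`, and a hypothetical prompt; the names `build_payload` and `extract_reason` are invented for this example.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"
# Assumption: the project only says "Qwen2.5 7B"; this tag is a guess.
MODEL_TAG = "qwen2.5:7b"

# Hypothetical prompt; the real wording used in the project is not documented.
PROMPT_TEMPLATE = (
    "Below are sentences from a news article about layoffs at {company}.\n"
    "State the root cause of the layoffs in one short phrase.\n"
    "If the text gives no reason, answer 'unknown'.\n\n"
    "{sentences}"
)


def build_payload(company, sentences):
    """Build the JSON body for a single non-streaming generate request."""
    prompt = PROMPT_TEMPLATE.format(
        company=company, sentences="\n".join(sentences)
    )
    return {"model": MODEL_TAG, "prompt": prompt, "stream": False}


def extract_reason(company, sentences, timeout=120):
    """Send the filtered sentences to the local model and return its answer."""
    data = json.dumps(build_payload(company, sentences)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    # With "stream": False, the generated text arrives in the "response" field.
    return body["response"].strip()
```

Keeping the payload builder separate from the network call makes the prompt easy to inspect and adjust without a running model.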
Data Pipeline
- Collect: Ingest 3,600+ layoff events with metadata — company, industry, stage, country, date, percentage laid off.
- Clean and explore: Handle missing fields, standardize numeric formats, validate distributions and outliers.
- Extract reasons: Scrape linked sources → filter relevant sentences → summarize root causes with a local LLM.
- Normalize text: Merge synonyms and corporate euphemisms into consistent reason categories.
- Cluster and predict: Cluster layoff behavior; train classification and regression models to predict layoff severity and magnitude.
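The sentence-filtering and normalization steps above might look something like the sketch below. The cue words, regex patterns, and category names are all hypothetical placeholders chosen for illustration; the project's actual synonym map is not shown here.

```python
import re

# Hypothetical cue phrases that suggest a sentence states a reason.
REASON_CUES = ("due to", "because", "amid", "as a result", "in order to", "citing")

# Hypothetical synonym map: euphemism pattern -> canonical reason category.
# Order matters: the first matching pattern wins.
CATEGORY_PATTERNS = [
    (re.compile(r"cost|expense|runway|burn", re.I), "cost reduction"),
    (re.compile(r"restructur|realign|reorgani[sz]", re.I), "restructuring"),
    (re.compile(r"demand|slowdown|downturn|macro", re.I), "weak demand"),
    (re.compile(r"pivot|refocus|sunset|discontinu", re.I), "strategy pivot"),
]


def filter_relevant(text):
    """Split scraped text into sentences and keep those containing a cue phrase."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if any(cue in s.lower() for cue in REASON_CUES)]


def normalize_reason(raw):
    """Map a free-text reason onto one canonical category, or 'other'."""
    for pattern, category in CATEGORY_PATTERNS:
        if pattern.search(raw):
            return category
    return "other"
```

For example, `normalize_reason("strategic realignment")` returns `"restructuring"` and `normalize_reason("cost optimization")` returns `"cost reduction"`. A first-match rule like this assigns only one category per reason, so a phrase that hides several causes would need a multi-label variant.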
Technical Stack
- pandas, numpy, BeautifulSoup (scraping)
- KeyBERT, SentenceTransformers (NLP)
- Ollama / Qwen2.5 7B (local LLM reason extraction)
- scikit-learn (KMeans, Random Forest, Gradient Boosting)
- Tableau (interactive dashboard)
Key Findings
- Funding ≠ stability. Mass layoffs often follow fresh funding rounds, not only periods of financial distress.
- "Restructuring" is an umbrella label that frequently masks multiple distinct operational root causes.
- Fiscal rhythm: Layoffs show seasonality aligned with corporate planning cycles, with notable Q1 and Q4 concentration.
- Industry specificity: Root causes differ meaningfully by sector — demand shifts vs. cost cutting vs. product strategy pivots.
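One simple way to check the quarterly concentration described above is to compute each calendar quarter's share of events. The sketch below is illustrative only: `quarter_shares` is a hypothetical helper, and the sample dates are made up rather than taken from the dataset.

```python
from collections import Counter
from datetime import date


def quarter_shares(dates):
    """Return the fraction of events falling in each calendar quarter (1-4)."""
    counts = Counter((d.month - 1) // 3 + 1 for d in dates)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {quarter: counts.get(quarter, 0) / total for quarter in range(1, 5)}


# Made-up sample dates, for illustration only (not project data).
sample = [date(2023, 1, 15), date(2023, 2, 3), date(2023, 6, 20), date(2023, 11, 9)]
shares = quarter_shares(sample)  # {1: 0.5, 2: 0.25, 3: 0.0, 4: 0.25}
```

If layoffs were spread evenly across the year, each quarter would hold a share of about 0.25, so shares well above that in Q1 and Q4 would support the seasonality finding.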
Interactive Dashboard
Explore the full layoff dataset — filter by industry, year, and country. Built in Tableau Public from 3,642 layoff events spanning 2020–2024.