Scoring Methodology
Full transparency on how Chapa decodes your developer impact. Every weight, cap, and decision is explained here.
Philosophy
AI-assisted development makes traditional volume metrics — commits, lines of code, PR counts — increasingly meaningless. A single number that blends everything together hides the difference between a prolific code shipper and a dedicated reviewer.
Chapa replaces that single number with a multi-dimensional impact breakdown: four core dimension scores (each 0-100) plus an optional fifth Craft dimension, a developer archetype, a composite score, and a confidence rating. Each contribution style can shine on its own terms.
Normalization
Most raw metrics are transformed using logarithmic normalization, which rewards genuine contribution while making gaming impractical. The transformation maps each raw value to a number between 0 and 1. The logarithmic curve means early contributions add significant value, but volume beyond the cap has zero effect: pushing 1,000 commits does not produce a score 10x higher than 100 commits.
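One plausible form of this transformation is a log curve over the capped value. This is a sketch, not Chapa's exact function — the per-signal percentages in the caps table below suggest signal-specific tuning that is not reproduced here:

```python
import math

def normalize(value: float, cap: float) -> float:
    """Hypothetical logarithmic normalization: maps a raw signal to
    [0, 1] with diminishing returns and a hard cap. The exact curve
    Chapa uses is an assumption."""
    clipped = min(max(value, 0.0), cap)
    return math.log1p(clipped) / math.log1p(cap)

# Early contributions move the needle much faster than a linear ratio,
# while volume beyond the cap adds nothing:
early = normalize(30, 300)          # well above the linear 30/300 = 0.10
capped = normalize(1000, 300)       # identical to normalize(300, 300)
```

The key property is the flat ceiling: once a signal hits its cap, extra volume is simply discarded rather than discounted.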
Signal caps
Each signal has a cap — the point beyond which additional volume adds nothing. Caps are calibrated to the P50-P75 developer range, so a solid year of consistent work can reach maximum credit.
| Signal | Cap | Rationale |
|---|---|---|
| Commits | 300 | 150 commits/year normalizes to ~81%; a strong year is fully rewarded |
| PR Weight | 60 | Weighted by complexity, not count; 25 weight normalizes to ~83% |
| Reviews | 80 | 30 reviews/year normalizes to ~78%; collaboration without extreme volume |
| Issues | 40 | 10 issues/year normalizes to ~70%; meaningful resolution, not churn |
| Repos | 12 | Cross-project work; 5 repos normalizes to ~42% |
| Stars | 150 | Community recognition; 10 stars normalizes to ~49% |
| Forks | 80 | People building on your work; 10 forks normalizes to ~55% |
| Watchers | 50 | Active repo followers; retained for normalization but dropped from Breadth weights |
The core dimensions
Each dimension is scored 0-100 independently. A dimension returns 0 when its primary signal is completely absent.
Delivery — shipping meaningful changes
| Signal | Weight | Rationale |
|---|---|---|
| PR Weight | 70% | Merged PRs weighted by file count and line changes are the strongest signal of meaningful code shipped |
| Issues Closed | 20% | Resolving issues shows end-to-end ownership from problem to solution |
| Commits | 10% | Raw commit count is the weakest signal — easy to inflate, so it gets the lowest weight |
PR weight is not a simple count. Each merged PR is weighted by its size and complexity, capped at 3.0 per PR to prevent a single massive PR from dominating. A size multiplier scales the weight from 0 to 1 as total changes (files + additions + deletions) grow from 0 to 10, so trivial or empty PRs contribute zero weight while normal PRs are unaffected.
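The mechanics above can be sketched as follows. Only the 3.0 per-PR cap and the 0-to-10 size ramp come from the methodology text; the complexity term itself is a placeholder guess:

```python
import math

def pr_weight(files: int, additions: int, deletions: int) -> float:
    """Sketch of per-PR weight: a complexity term capped at 3.0,
    scaled by a 0-1 size multiplier so trivial or empty PRs
    contribute ~0. The complexity formula is hypothetical."""
    total_changes = files + additions + deletions
    # Ramps 0 -> 1 as total changes grow from 0 to 10 (per the text);
    # normal PRs hit 1.0 and are unaffected.
    size_multiplier = min(total_changes / 10.0, 1.0)
    complexity = 1.0 + math.log1p(total_changes) / 4.0  # assumed shape
    return min(complexity, 3.0) * size_multiplier
```

An empty PR scores exactly zero, and even a massive PR cannot exceed 3.0, so no single change dominates the Delivery signal.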
Delivery also includes a flow efficiency modifier (±5%) based on median PR lead time — how quickly your PRs go from creation to merge. Fast turnaround (≤4 hours) earns a 5% boost; slow flow (>7 days) applies a 5% penalty. This aligns with the DORA "lead time for changes" metric.
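The thresholds for this modifier are stated explicitly; treating everything between them as neutral is an assumption in this sketch:

```python
def flow_modifier(median_lead_hours: float) -> float:
    """±5% Delivery modifier from median PR lead time (creation to
    merge): <=4h earns a 5% boost, >7 days a 5% penalty. The neutral
    middle band is assumed."""
    if median_lead_hours <= 4:
        return 1.05
    if median_lead_hours > 7 * 24:
        return 0.95
    return 1.0
```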
Quality — engineering discipline
Quality is measured differently depending on your profile type. Collaborative developers (those who actively review others' code) are scored on review behavior. Solo developers — those with a review-to-PR ratio below 15% — are scored on engineering discipline signals visible in their PR workflow.
Collaborative Quality
| Signal | Weight | Rationale |
|---|---|---|
| Reviews Submitted | 60% | The core signal — how much time you spend reviewing others' code |
| Review-to-PR Ratio | 25% | A high ratio means you review more than you ship, signaling a quality-focused role. Capped at 5:1 |
| Batch Size Score | 15% | Fraction of PRs in the reviewable sweet spot (20-500 lines). Rewards small, focused changes; penalizes both micro and oversized PRs |
Solo Quality
| Signal | Weight | Rationale |
|---|---|---|
| PR Description Rate | 40% | Percentage of merged PRs with a non-empty description — the strongest solo discipline signal |
| Feature Branch Rate | 25% | Percentage of PRs from a feature branch (not main/master/develop) — shows structured development workflow |
| Issue Linkage Rate | 20% | Percentage of PRs that close at least one issue — connects code to tracked work |
| Batch Size Score | 15% | Fraction of PRs in the reviewable sweet spot (20-500 lines). Rewards small, focused changes |
Solo developers are never penalized for working alone. Instead of skipping Quality entirely, Chapa evaluates the engineering habits visible in their PR workflow. All core dimensions are always scored for every developer.
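The routing between the two Quality variants hinges on the 15% review-to-PR threshold mentioned above. A minimal sketch, assuming zero merged PRs routes on whether any reviews exist:

```python
def quality_profile(reviews_submitted: int, prs_merged: int) -> str:
    """Route a developer to collaborative or solo Quality scoring.
    The 15% threshold is from the methodology; the zero-PR edge
    case handling is an assumption."""
    if prs_merged == 0:
        return "collaborative" if reviews_submitted > 0 else "solo"
    ratio = reviews_submitted / prs_merged
    return "solo" if ratio < 0.15 else "collaborative"
```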
Consistency — reliable, sustained contributions
| Signal | Weight | Rationale |
|---|---|---|
| Active Days (sqrt curve) | 45% | Square root of activeDays/365 — rewards getting started while still valuing sustained contribution. 120 active days scores ~57% (vs 33% with linear) |
| Heatmap Evenness | 40% | Measures how evenly activity is distributed across weeks. A steady rhythm scores higher than concentrated bursts |
| Week Coverage | 15% | Fraction of weeks with at least one contribution — captures sustainable cadence, how regularly you show up |
The active days signal uses a square root curve instead of a linear ratio. This makes it easier to build early momentum — 50 active days scores 37% instead of 14% — while preserving the full range for dedicated daily contributors. Heatmap evenness uses the inverted coefficient of variation across weekly totals, with outlier weeks clipped at 3× the median to prevent a single ultra-productive week from dominating the variance. Week coverage measures what fraction of weeks had any activity at all — rewarding developers who show up consistently regardless of daily output volume.
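The two main Consistency signals can be sketched directly from the description above. The square-root curve is stated exactly; for evenness, the precise inversion of the coefficient of variation (here `1 - CV`, floored at 0) is an assumption:

```python
import math
import statistics

def active_days_score(active_days: int) -> float:
    """Square-root curve over activeDays/365, as described:
    120 active days scores ~57% vs 33% linear."""
    return math.sqrt(min(active_days, 365) / 365)

def heatmap_evenness(weekly_totals: list[int]) -> float:
    """Inverted coefficient of variation over weekly totals, with
    outlier weeks clipped at 3x the median. The exact inversion is
    an assumption."""
    median = statistics.median(weekly_totals)
    clipped = [min(w, 3 * median) for w in weekly_totals]
    mean = statistics.mean(clipped)
    if mean == 0:
        return 0.0
    cv = statistics.pstdev(clipped) / mean
    return max(0.0, 1.0 - cv)
```

A perfectly steady week-over-week rhythm yields an evenness of 1.0, while one burst week surrounded by silence collapses toward 0.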
Breadth — cross-project influence
| Signal | Weight | Rationale |
|---|---|---|
| Repos Contributed | 40% | How many repos you contributed 3+ commits to. Single-commit drive-by contributions are excluded to ensure depth |
| Inverse Top-repo Share | 25% | Rewards diverse contribution across repos rather than concentration in one. If 95% of your work is in one repo, this approaches 0 |
| Docs-only PR Ratio | 15% | Documentation contributions show breadth of involvement beyond code — something every developer can control |
| Stars | 10% | Broadest community recognition signal, but weighted lower because it is outside your direct control |
| Forks | 5% | Deeper engagement than stars — someone intends to build on your work. Narrower and noisier signal |
Breadth prioritizes signals you can directly control — repo diversity (40%), contribution spread (25%), and documentation (15%) — over community signals like stars and forks that depend on external recognition. Watchers have been dropped entirely as the weakest and most passive indicator.
Developer archetypes
Your archetype is derived from the shape of your dimension profile. It tells you what kind of developer you are, not how good you are.
| Archetype | Rule | What it means |
|---|---|---|
| Emerging | Average < 25 OR no dimension >= 40 | Getting started or light activity period |
| Balanced | All dimensions within 20 pts AND avg >= 50 | Well-rounded contributor across all areas |
| Artificer | Craft is highest AND >= 60 | Master of AI tool collaboration — amplifies output while maintaining quality |
| Polymath | Breadth is highest AND >= 60 | Cross-project influence is your strongest suit |
| Quality Champion | Quality is highest AND >= 60 | Your strongest trait is engineering discipline — through reviews (collaborative) or PR hygiene (solo) |
| Marathoner | Consistency is highest AND >= 60 | Your most notable trait is sustained, reliable contribution |
| Builder | Delivery is highest AND >= 60 | You ship a high volume of meaningful code changes |
Tie-breaking priority: Polymath > Quality Champion > Marathoner > Builder > Artificer. If no specific archetype matches (highest dimension < 60 and not Balanced), the fallback is Emerging.
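The rules and tie-breaking above can be expressed as a small decision function. Exact boundary handling (e.g. ties at thresholds) is an assumption:

```python
def archetype(dims: dict[str, float]) -> str:
    """Derive the archetype from dimension scores (0-100), following
    the table's rules and the stated tie-breaking priority."""
    avg = sum(dims.values()) / len(dims)
    top = max(dims.values())
    if avg < 25 or top < 40:
        return "Emerging"
    if top - min(dims.values()) <= 20 and avg >= 50:
        return "Balanced"
    # Tie-break order: Polymath > Quality Champion > Marathoner >
    # Builder > Artificer.
    priority = [("Breadth", "Polymath"), ("Quality", "Quality Champion"),
                ("Consistency", "Marathoner"), ("Delivery", "Builder"),
                ("Craft", "Artificer")]
    for dim, name in priority:
        if dims.get(dim, -1) == top and top >= 60:
            return name
    return "Emerging"  # fallback when the highest dimension is < 60
```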
Composite score and tiers
The composite score is the average of all active dimensions, rounded to an integer. For collaborative developers, all 4 (or 5 with Craft) dimensions are included. For solo developers, Quality is excluded from the composite — only Delivery, Consistency, Breadth, and optional Craft are averaged. The score then passes through recency weighting and confidence adjustment:
recencyWeighted = composite × recencyMultiplier
adjustedScore = recencyWeighted × (0.85 + 0.15 × confidence / 100)
The recency multiplier ranges from 0.98x (all activity is old) to 1.06x (all activity is recent), with proportional activity (25% in the last 90 days) being neutral at 1.0x. This cushions the impact of old contributions dropping out of the rolling window.
At full confidence (100), there is no confidence reduction. At minimum confidence (50), the reduction is only 7.5% — deliberate and gentle.
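Putting the pipeline together, with final rounding as an assumption:

```python
def adjusted_score(dimensions: list[float], recency_multiplier: float,
                   confidence: float) -> int:
    """Composite pipeline as described: average the active dimensions,
    then apply the recency multiplier (0.98-1.06) and the gentle
    confidence adjustment (50-100)."""
    composite = sum(dimensions) / len(dimensions)
    recency_weighted = composite * recency_multiplier
    adjusted = recency_weighted * (0.85 + 0.15 * confidence / 100)
    return round(min(adjusted, 100))
```

At confidence 100 the multiplier is exactly 1.0; at the floor of 50 it is 0.925, the stated 7.5% reduction.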
| Tier | Score Range | Description |
|---|---|---|
| Emerging | 0 – 29 | Getting started or light activity period |
| Solid | 30 – 69 | Active hobbyists through consistent contributors |
| High | 70 – 84 | Strong impact across multiple dimensions |
| Elite | 85 – 100 | Exceptional breadth and depth of contribution |
Confidence system
Confidence (50-100) measures signal clarity, not morality. A low confidence score never implies wrongdoing — it simply means the data patterns make it harder to assess impact precisely.
Confidence starts at 100 and can be reduced by detected patterns:
| Pattern | Penalty | Trigger | What it means |
|---|---|---|---|
| Burst activity | -15 | 100+ contributions in a single day | Extreme daily spikes reduce timing confidence |
| Micro-commits | -10 | 60%+ of commits are very small | Many tiny changes reduce signal clarity |
| Generated changes | -15 | 20,000+ lines changed AND fewer than 3 reviews | Large volume with limited review suggests possible automation |
| Low collaboration | -10 | 10+ PRs merged AND 1 or fewer reviews given | Significant output without peer interaction |
| Single repo focus | -5 | 95%+ of activity in one repo AND only 1 repo | Less cross-project signal (not bad, just less diverse data) |
| Supplemental data | -5 | Includes merged EMU account data | Data from a linked account that cannot be independently verified |
| Low activity signal | -10 | Fewer than 30 active days OR fewer than 50 commits | Very limited activity reduces the signal available for scoring |
| Review volume imbalance | -10 | 50+ reviews submitted AND fewer than 3 PRs merged | High review volume with very few merged changes reduces confidence in the activity mix |
The confidence floor is 50. No combination of penalties can push confidence below 50. All messaging is non-accusatory — we describe patterns, not intent.
Review volume imbalance and low collaboration are mutually exclusive: only one can apply at a time, so at most 7 of the 8 penalties can ever apply simultaneously.
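The penalty mechanics reduce to a capped subtraction. A minimal sketch, assuming the detection step has already enforced the mutual exclusion:

```python
def confidence_score(penalties: dict[str, int]) -> int:
    """Start at 100, subtract detected-pattern penalties, and floor
    at 50. Pattern names here are illustrative labels."""
    # Caller is assumed to pass at most one of the mutually exclusive
    # pair (review volume imbalance / low collaboration).
    return max(50, 100 - sum(penalties.values()))
```

Even the worst case (all 7 compatible penalties, totaling 70 points) lands on the floor of 50 rather than below it.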
Score smoothing
Your displayed score uses an exponential moving average (EMA) to prevent jarring day-to-day swings. Instead of recalculating from scratch each time, the displayed score blends 15% of the newly computed score with 85% of the previous day's smoothed score.
This means a sudden 10-point raw drop manifests as roughly -1.5 on the first day, with about half of the remaining gap closing every 4 days. On your first visit (no previous score), the raw score passes through with no smoothing.
What we deliberately exclude
Some signals are intentionally left out of scoring:
- Followers — a social metric with no correlation to engineering output
- Lines of code — easily gamed; used only for confidence heuristics, never for dimension scoring
- Private repo names — we track repo count, not identities. Your private repos are never exposed
Help us improve this
We believe scoring methodology should be a conversation, not a black box. If you have ideas on how to make this fairer, more accurate, or more transparent — we want to hear from you.