AI Resume Screening Agent: Ranked Shortlists Without the Manual Review Hours

Metadata Morph
AI & Data Engineering Team

A typical job posting for a technical role receives 200–400 applications. A recruiter manually reviews each one, spending 30–60 seconds per resume to decide whether to advance the candidate. At those rates, that's roughly 2–7 hours of screening work per role, per round, and most of it is pattern-matching against a known rubric.

A resume screening agent replaces the manual first pass entirely. It evaluates every application against a structured rubric, produces a scored and ranked shortlist with written reasoning, and surfaces only the edge cases that genuinely need human judgment.

What Makes This Different from Keyword Matching

Legacy ATS keyword matching is brittle. It rejects candidates who describe the same skill differently ("built data pipelines" vs. "ETL development") and passes candidates who list keywords they barely know.
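
A small illustration of that brittleness, assuming a hypothetical exact-phrase filter. The function `keyword_match`, the keyword list, and both resume snippets are made up for this sketch and are not taken from any particular ATS:

```python
def keyword_match(resume_text: str, required_keywords: list[str]) -> bool:
    """Naive filter: pass only if every keyword appears verbatim (case-insensitive)."""
    text = resume_text.lower()
    return all(keyword.lower() in text for keyword in required_keywords)


required = ["ETL"]

# Same skill, different wording: the filter rejects this candidate.
experienced = "Built data pipelines moving 2B rows nightly into the warehouse."
# Keyword present with no evidence of depth: the filter accepts this one.
keyword_stuffer = "Skills: ETL, SQL, Python, Snowflake, Kafka, Airflow."

print(keyword_match(experienced, required))      # False
print(keyword_match(keyword_stuffer, required))  # True
```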

An LLM-based screening agent reads for demonstrated competence, not keyword presence:

  • "Led migration of 3TB Postgres database to Snowflake" → scores higher on data warehousing than "Experience with Snowflake" with no context
  • "Maintained dbt models" → scores lower on dbt experience than "Architected dbt project structure for 40-person analytics org"
  • Gaps in employment are noted but not penalized — the agent flags them for the recruiter to assess in context

The Rubric

The rubric is the foundation of a reliable screening agent. A vague rubric produces inconsistent scores. The rubric should be specific enough that two humans applying it independently would produce similar results.

{
  "role": "Senior Data Engineer",
  "required_criteria": [
    {
      "id": "sql_proficiency",
      "description": "Demonstrated SQL experience with complex queries (window functions, CTEs, performance tuning)",
      "weight": 0.20
    },
    {
      "id": "pipeline_experience",
      "description": "Built or maintained production data pipelines handling >1B records",
      "weight": 0.20
    },
    {
      "id": "cloud_warehouse",
      "description": "Hands-on experience with Snowflake, BigQuery, or Redshift in a production context",
      "weight": 0.15
    },
    {
      "id": "orchestration",
      "description": "Experience with Airflow, Prefect, or similar orchestration tool",
      "weight": 0.15
    },
    {
      "id": "python_proficiency",
      "description": "Python for data engineering (not just scripting — libraries like pandas, PySpark, SQLAlchemy)",
      "weight": 0.15
    }
  ],
  "preferred_criteria": [
    {
      "id": "dbt_experience",
      "description": "Used dbt in production; understanding of testing and documentation patterns",
      "weight": 0.10
    },
    {
      "id": "streaming",
      "description": "Experience with Kafka, Kinesis, or Pub/Sub",
      "weight": 0.05
    }
  ],
  "disqualifying_criteria": [
    "Less than 3 years of professional data engineering experience",
    "No evidence of production system ownership (only academic or toy projects)"
  ]
}
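
Before a rubric like this reaches the agent, it can be worth checking it mechanically. The sketch below is one possible check, not part of the agent described here: `validate_rubric` is a hypothetical helper, and the rule that required and preferred weights sum to 1.0 is an assumption inferred from the example (where they do sum to 1.00).

```python
import math


def validate_rubric(rubric: dict) -> list[str]:
    """Return a list of problems; an empty list means the rubric passed."""
    criteria = rubric.get("required_criteria", []) + rubric.get("preferred_criteria", [])
    problems = []
    ids = [c["id"] for c in criteria]
    if len(ids) != len(set(ids)):
        problems.append("criterion ids are not unique")
    for c in criteria:
        if not 0 < c["weight"] <= 1:
            problems.append(f"{c['id']}: weight {c['weight']} is outside (0, 1]")
    # Assumption: weights across both groups should add up to 1.0.
    total = sum(c["weight"] for c in criteria)
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        problems.append(f"weights sum to {total:.2f}, expected 1.00")
    return problems


# Ids and weights copied from the rubric above; descriptions omitted for brevity.
rubric = {
    "required_criteria": [
        {"id": "sql_proficiency", "weight": 0.20},
        {"id": "pipeline_experience", "weight": 0.20},
        {"id": "cloud_warehouse", "weight": 0.15},
        {"id": "orchestration", "weight": 0.15},
        {"id": "python_proficiency", "weight": 0.15},
    ],
    "preferred_criteria": [
        {"id": "dbt_experience", "weight": 0.10},
        {"id": "streaming", "weight": 0.05},
    ],
}
print(validate_rubric(rubric))  # []

rubric["preferred_criteria"][1]["weight"] = 0.15  # simulate a careless edit
print(validate_rubric(rubric))  # ['weights sum to 1.10, expected 1.00']
```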

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                     RESUME SCREENING AGENT                      │
│                                                                 │
│  Trigger: new applications batch (daily or on-demand)           │
│                                                                 │
│  For each application:                                          │
│    1. Read resume from Filesystem MCP                           │
│    2. Score against rubric criteria                             │
│    3. Check disqualifying criteria                              │
│    4. Generate written reasoning (1–2 sentences per criterion)  │
│    5. Compute weighted total score                              │
│                                                                 │
│  After batch:                                                   │
│    6. Rank all candidates                                       │
│    7. Write shortlist report to Filesystem MCP                  │
│    8. Notify recruiting team via Notification MCP               │
└──────────┬──────────────────────────┬───────────────────────────┘
           │                          │
    ┌──────▼──────┐           ┌───────▼────────┐
    │ Filesystem  │           │ Notification   │
    │ MCP         │           │ MCP            │
    │ (resumes,   │           │ (Slack / email)│
    │  reports)   │           │                │
    └─────────────┘           └────────────────┘
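
A rough sketch of steps 1 through 6 of that loop, under several assumptions: a local directory of `.txt` files stands in for the Filesystem MCP, `score_resume` is a hypothetical callable wrapping the LLM call, and `fake_scorer` is a stand-in so the example runs offline. Report writing and notification (steps 7 and 8) are left out.

```python
import tempfile
from pathlib import Path
from typing import Callable


def screen_batch(resume_dir: Path, score_resume: Callable[[str], dict]) -> list[dict]:
    """Score every .txt resume in resume_dir and return results ranked best-first."""
    results = []
    for path in sorted(resume_dir.glob("*.txt")):                # step 1: read resume
        result = score_resume(path.read_text(encoding="utf-8"))  # steps 2-5: score it
        result["candidate_id"] = path.stem
        results.append(result)
    # Step 6: disqualified candidates sort last, the rest by descending total score.
    results.sort(key=lambda r: (r["disqualified"], -r["total_score"]))
    return results


def fake_scorer(text: str) -> dict:
    # Stand-in for the LLM call: scores by text length, purely for demonstration.
    return {"total_score": min(10.0, len(text) / 10), "disqualified": "student" in text}


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "app_1.txt").write_text("Owns production Airflow DAGs, 2.5B events daily", encoding="utf-8")
    (folder / "app_2.txt").write_text("student project", encoding="utf-8")
    (folder / "app_3.txt").write_text("SQL and Python", encoding="utf-8")
    ranked = screen_batch(folder, fake_scorer)

print([r["candidate_id"] for r in ranked])  # ['app_1', 'app_3', 'app_2']
```

Passing the scorer in as a parameter keeps the loop testable without a model behind it.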

Scoring Prompt

You are a technical recruiting agent evaluating candidates for a {role_title} position.

Evaluate the resume below against each criterion. For each criterion:
1. Assign a score from 0–10 (0 = no evidence, 5 = mentioned without context, 10 = strong demonstrated evidence)
2. Write 1–2 sentences of reasoning citing specific resume content
3. Note the strongest evidence (direct quote or paraphrase from resume)

Check each disqualifying criterion. If any applies, set disqualified: true
and provide the reason.

Be conservative. When in doubt, score lower. A score of 7+ requires
specific, concrete evidence from the resume — not just keywords.

Return structured JSON only. Do not add commentary outside the JSON schema.

Rubric: {rubric_json}
Resume: {resume_text}
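
Filling the three placeholders can be as simple as Python's `str.format`. This is a minimal sketch: `build_prompt` is a hypothetical helper, the template is abbreviated (the bracketed line stands for the instructions in the prompt above), and taking `role_title` from the rubric's `role` field is an assumption.

```python
import json

# Abbreviated template; only the placeholders matter for this sketch.
PROMPT_TEMPLATE = (
    "You are a technical recruiting agent evaluating candidates "
    "for a {role_title} position.\n\n"
    "[evaluation instructions as in the prompt above]\n\n"
    "Rubric: {rubric_json}\n"
    "Resume: {resume_text}"
)


def build_prompt(rubric: dict, resume_text: str) -> str:
    """Fill the placeholders; the rubric is serialized so the model sees exact ids and weights."""
    return PROMPT_TEMPLATE.format(
        role_title=rubric["role"],
        rubric_json=json.dumps(rubric, indent=2),
        resume_text=resume_text.strip(),
    )


rubric = {
    "role": "Senior Data Engineer",
    "required_criteria": [{"id": "sql_proficiency", "weight": 0.20}],
}
prompt = build_prompt(rubric, "  Reduced nightly reconciliation query from 4h to 12min.  ")
print(prompt)
```

`str.format` does not re-parse substituted values, so the braces inside the rubric JSON are safe. A literal brace in the template itself would need to be doubled.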

Sample Output (Single Candidate)

{
  "candidate_id": "app_4821",
  "name": "Jordan Martinez",
  "disqualified": false,
  "total_score": 7.6,
  "criteria_scores": {
    "sql_proficiency": {
      "score": 9,
      "reasoning": "Resume describes optimizing a query that reduced runtime from 4 hours to 12 minutes using window functions and partition pruning. Strong demonstrated evidence of advanced SQL.",
      "evidence": "Reduced nightly revenue reconciliation query from 4h to 12min via window function refactor and partition pruning"
    },
    "pipeline_experience": {
      "score": 8,
      "reasoning": "Built Airflow DAGs processing 2.5B events/day at current employer. Production scale confirmed.",
      "evidence": "Designed and owns 3 production Airflow DAGs processing 2.5B clickstream events daily"
    },
    "cloud_warehouse": {
      "score": 7,
      "reasoning": "Snowflake mentioned with migration context. Lacks detail on performance tuning or warehouse sizing.",
      "evidence": "Migrated 6TB legacy Redshift cluster to Snowflake"
    },
    "dbt_experience": {
      "score": 5,
      "reasoning": "dbt listed under skills with no contextual evidence of production use or project scope.",
      "evidence": "Skills: dbt, Airflow, Python, SQL"
    }
  },
  "recruiter_notes": "Strong SQL and pipeline evidence. dbt experience unclear — worth probing in screening call. No streaming experience.",
  "recommended_action": "ADVANCE"
}
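
One way to compute the weighted total is in code rather than trusting the model's arithmetic. The sketch below assumes the total is a plain weighted sum of the 0–10 criterion scores and that a criterion missing from the output counts as 0; `weighted_total` is a hypothetical helper. The sample output above does not show scores for `orchestration` or `python_proficiency`, so the values used for them here are placeholders, and the result is not meant to reproduce the 7.6 in the sample.

```python
# Weights copied from the rubric.
WEIGHTS = {
    "sql_proficiency": 0.20,
    "pipeline_experience": 0.20,
    "cloud_warehouse": 0.15,
    "orchestration": 0.15,
    "python_proficiency": 0.15,
    "dbt_experience": 0.10,
    "streaming": 0.05,
}


def weighted_total(criteria_scores: dict, weights: dict = WEIGHTS) -> float:
    """Weighted sum of 0-10 criterion scores; criteria missing from the output count as 0."""
    for criterion_id, entry in criteria_scores.items():
        if criterion_id not in weights:
            raise ValueError(f"unknown criterion: {criterion_id}")
        if not 0 <= entry["score"] <= 10:
            raise ValueError(f"{criterion_id}: score {entry['score']} is outside 0-10")
    total = sum(
        weight * criteria_scores.get(criterion_id, {"score": 0})["score"]
        for criterion_id, weight in weights.items()
    )
    return round(total, 2)


scores = {
    "sql_proficiency": {"score": 9},      # from the sample output
    "pipeline_experience": {"score": 8},  # from the sample output
    "cloud_warehouse": {"score": 7},      # from the sample output
    "dbt_experience": {"score": 5},       # from the sample output
    "orchestration": {"score": 9},        # hypothetical placeholder
    "python_proficiency": {"score": 9},   # hypothetical placeholder
}
print(weighted_total(scores))  # 7.65
```

Rejecting unknown ids and out-of-range scores up front catches malformed model output before it reaches the ranking step.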

The Shortlist Report

After scoring all applications, the agent produces a ranked report:

# Screening Report — Senior Data Engineer
Date: 2025-12-15 | Applications reviewed: 247 | Time: 4m 12s

## Tier 1 — Advance to Screening Call (12 candidates)
| Rank | Candidate | Score | Standout |
|------|-----------|-------|---------|
| 1 | Jordan Martinez | 7.6 | 2.5B/day pipeline, advanced SQL proven |
| 2 | Alex Kim | 7.4 | dbt project architect, Kafka experience |
| ... | | | |

## Tier 2 — Human Review Needed (8 candidates)
Candidates where the agent's confidence was low or the criteria were ambiguous.

## Disqualified (227 candidates)
- 189: Under 3 years experience
- 31: No production system ownership evidence
- 7: Role mismatch (data analyst, not engineer)

247 applications reviewed in about 4 minutes. The recruiter reviews 12 ranked candidates with written reasoning, plus 8 flagged for a closer look, instead of skimming 247 resumes cold.
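
The split into tiers could be sketched as follows. Everything specific here is an assumption: the 7.0 cutoff, the 0.6 confidence floor, and the `confidence` field itself, which does not appear in the sample output. Only `app_4821` and its 7.6 score come from the example; the other records are invented. As a simplification, any candidate who is neither disqualified nor a confident advance goes to human review.

```python
from collections import Counter


def assign_tier(result: dict, advance_at: float = 7.0, min_confidence: float = 0.6) -> str:
    """Map one scored candidate to a report section."""
    if result["disqualified"]:
        return "disqualified"
    confident = result.get("confidence", 1.0) >= min_confidence
    if confident and result["total_score"] >= advance_at:
        return "tier_1_advance"
    return "tier_2_human_review"


results = [
    {"candidate_id": "app_4821", "total_score": 7.6, "disqualified": False, "confidence": 0.9},
    {"candidate_id": "app_0001", "total_score": 7.4, "disqualified": False, "confidence": 0.4},
    {"candidate_id": "app_0002", "total_score": 8.1, "disqualified": False, "confidence": 0.9},
    {"candidate_id": "app_0003", "total_score": 3.1, "disqualified": True, "confidence": 0.9},
]

# Tier 1 is ranked by score; the counts feed the section headings of the report.
shortlist = sorted(
    (r for r in results if assign_tier(r) == "tier_1_advance"),
    key=lambda r: r["total_score"],
    reverse=True,
)
counts = Counter(assign_tier(r) for r in results)

print([r["candidate_id"] for r in shortlist])  # ['app_0002', 'app_4821']
print(dict(counts))
```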

What the Recruiting Team Retains

The agent handles the first pass. Everything after — screening calls, technical assessments, hiring manager interviews, offer decisions — remains with the humans. The agent's job is to surface the right candidates, not to make hiring decisions.

The recruiter reviews the shortlist, can override any disqualification, and always has the agent's written reasoning to audit. No black box.
