docs: add Apec ingestion plan
This commit is contained in:
parent
ad36de0a3f
commit
cfbd1943ec
612
docs/superpowers/plans/2026-06-01-apec-ingestion.md
Normal file
# Apec Ingestion Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Build one `fetch-apec` command that reads `data/candidate-profile.yaml`, derives deterministic Apec searches, fetches up to 50 public listings, stores raw HTML snapshots, and writes a normalized `listings.yaml` file plus run metadata.

**Architecture:** The implementation is a fetch-and-normalize pipeline that writes explicit artifacts for each run. Profile-driven query derivation feeds an Apec adapter, each successfully fetched detail page is persisted as a raw snapshot, and a small normalizer plus within-run deduper writes inspectable YAML outputs for later ranking work.

**Tech Stack:** Python 3.13, Typer, Playwright for Python, BeautifulSoup4, Pydantic v2, PyYAML, pytest

---

## File Map

- Modify: `pyproject.toml` (add the Playwright and BeautifulSoup4 dependencies if missing)
- Modify: `src/job_research/cli.py` (add the `fetch-apec` command)
- Modify: `src/job_research/models.py` (add normalized listing and run metadata models)
- Modify: `src/job_research/storage.py` (add helpers for per-run artifact paths and YAML writes)
- Create: `src/job_research/apec/__init__.py` (Apec package marker)
- Create: `src/job_research/apec/query_derivation.py` (deterministic query derivation from the candidate profile)
- Create: `src/job_research/apec/adapter.py` (public Apec search and detail-page fetching)
- Create: `src/job_research/apec/normalize.py` (normalize Apec detail pages into listing records)
- Create: `src/job_research/apec/dedupe.py` (minimal within-run deduplication)
- Create: `tests/apec/test_query_derivation.py` (profile-driven query tests)
- Create: `tests/apec/test_normalize.py` (normalized listing extraction tests)
- Create: `tests/apec/test_dedupe.py` (within-run dedupe tests)
- Create: `tests/test_apec_cli.py` (CLI integration tests for `fetch-apec`)
- Create: `tests/test_apec_storage.py` (run artifact persistence tests)

## Task 0: Dependencies for Apec Ingestion

**Files:**
- Modify: `pyproject.toml`

- [ ] **Step 1: Write the failing import check**

Run: `uv run python -c "import playwright, bs4"`
Expected: FAIL with `ModuleNotFoundError` for any dependency that is not installed yet

- [ ] **Step 2: Add the minimal dependencies for this slice**

```toml
# pyproject.toml
[project]
dependencies = [
    "beautifulsoup4>=4.12,<5",
    "playwright>=1.52,<2",
    "pydantic>=2.7,<3",
    "pypdf>=5.0,<6",
    "pyyaml>=6.0,<7",
    "typer>=0.12,<1",
]
```

- [ ] **Step 3: Sync and verify the imports work**

Run: `uv sync && uv run python -c "import playwright, bs4"`
Expected: PASS with no output

- [ ] **Step 4: Commit the dependency update**

```bash
git add pyproject.toml uv.lock
git commit -m "chore: add Apec ingestion dependencies"
```

## Task 1: Listing and Run Artifact Models

**Files:**
- Modify: `src/job_research/models.py`
- Create: `tests/test_apec_storage.py`

- [ ] **Step 1: Write the failing model serialization test**

```python
# tests/test_apec_storage.py
from job_research.models import ApecListing, ApecRunMeta, ListingWarning


def test_apec_models_serialize_expected_listing_shape() -> None:
    listing = ApecListing(
        source="apec",
        source_job_id="123",
        url="https://example.test/job/123",
        title="Data Engineer",
        company="Example",
        location="Paris",
        contract_type="CDI",
        description_text="Build pipelines",
        published_at="2026-06-01",
        fetched_at="2026-06-01T10:00:00Z",
        warnings=[ListingWarning(field="location", message="Location inferred from page text")],
    )
    run_meta = ApecRunMeta(
        derived_queries=["Data Engineer"],
        fetched_count=1,
        normalized_count=1,
        deduplicated_count=1,
        failed_count=0,
        listing_errors=[],
    )

    assert listing.model_dump()["source"] == "apec"
    assert run_meta.model_dump()["derived_queries"] == ["Data Engineer"]
```

- [ ] **Step 2: Run the model test to verify it fails**

Run: `uv run pytest tests/test_apec_storage.py::test_apec_models_serialize_expected_listing_shape -v`
Expected: FAIL with `ImportError` or `AttributeError` for missing Apec models

- [ ] **Step 3: Add normalized listing and run metadata models**

```python
# src/job_research/models.py
from pydantic import BaseModel, Field  # skip if models.py already imports these


class ListingWarning(BaseModel):
    # Non-fatal note about one extracted field.
    field: str
    message: str


class ListingError(BaseModel):
    # Failure for one listing; recorded in run metadata instead of aborting the run.
    url: str
    stage: str
    message: str


class ApecListing(BaseModel):
    # Only source, url, and fetched_at are required; every extracted field may be missing.
    source: str
    source_job_id: str | None = None
    url: str
    title: str | None = None
    company: str | None = None
    location: str | None = None
    contract_type: str | None = None
    description_text: str | None = None
    published_at: str | None = None
    fetched_at: str
    warnings: list[ListingWarning] = Field(default_factory=list)


class ApecRunMeta(BaseModel):
    # Per-run summary written to run-meta.yaml.
    derived_queries: list[str] = Field(default_factory=list)
    fetched_count: int = 0
    normalized_count: int = 0
    deduplicated_count: int = 0
    failed_count: int = 0
    listing_errors: list[ListingError] = Field(default_factory=list)
```

- [ ] **Step 4: Run the model test to verify it passes**

Run: `uv run pytest tests/test_apec_storage.py::test_apec_models_serialize_expected_listing_shape -v`
Expected: PASS

- [ ] **Step 5: Commit the models**

```bash
git add src/job_research/models.py tests/test_apec_storage.py
git commit -m "feat: add Apec listing artifact models"
```

## Task 2: Run Artifact Storage Layout

**Files:**
- Modify: `src/job_research/storage.py`
- Modify: `tests/test_apec_storage.py`

- [ ] **Step 1: Write the failing run-path test**

```python
# tests/test_apec_storage.py
from pathlib import Path

from job_research.storage import apec_run_paths


def test_apec_run_paths_builds_expected_layout(tmp_path: Path) -> None:
    paths = apec_run_paths(tmp_path, run_id="2026-06-01T10-00-00Z")

    assert paths["run_dir"] == tmp_path / "apec" / "runs" / "2026-06-01T10-00-00Z"
    assert paths["listings"] == tmp_path / "apec" / "runs" / "2026-06-01T10-00-00Z" / "listings.yaml"
    assert paths["run_meta"] == tmp_path / "apec" / "runs" / "2026-06-01T10-00-00Z" / "run-meta.yaml"
    assert paths["snapshots"] == tmp_path / "apec" / "runs" / "2026-06-01T10-00-00Z" / "snapshots"
```

- [ ] **Step 2: Run the run-path test to verify it fails**

Run: `uv run pytest tests/test_apec_storage.py::test_apec_run_paths_builds_expected_layout -v`
Expected: FAIL because `apec_run_paths` does not exist yet

- [ ] **Step 3: Implement run-path helpers and artifact writes**

```python
# src/job_research/storage.py
from pathlib import Path  # skip if storage.py already imports Path


def apec_run_paths(data_root: Path, run_id: str) -> dict[str, Path]:
    # All artifacts for one run live under <data_root>/apec/runs/<run_id>/.
    run_dir = data_root / "apec" / "runs" / run_id
    return {
        "run_dir": run_dir,
        "listings": run_dir / "listings.yaml",
        "run_meta": run_dir / "run-meta.yaml",
        "snapshots": run_dir / "snapshots",
    }
```

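As a rough illustration of how this layout might be used at run time, the sketch below builds a directory-safe run id and creates the snapshot directory. `make_run_id` and `prepare_run` are hypothetical helpers that this plan does not define; `apec_run_paths` is repeated only so the example is self-contained.

```python
from datetime import datetime, timezone
from pathlib import Path


def make_run_id(now: datetime) -> str:
    """Hypothetical helper: format a UTC timestamp without colons,
    so the result is safe to use as a directory name."""
    return now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H-%M-%SZ")


def apec_run_paths(data_root: Path, run_id: str) -> dict[str, Path]:
    """Same layout as the plan's helper, repeated for self-containment."""
    run_dir = data_root / "apec" / "runs" / run_id
    return {
        "run_dir": run_dir,
        "listings": run_dir / "listings.yaml",
        "run_meta": run_dir / "run-meta.yaml",
        "snapshots": run_dir / "snapshots",
    }


def prepare_run(data_root: Path, now: datetime) -> dict[str, Path]:
    """Hypothetical helper: resolve the run paths and create the snapshot directory."""
    paths = apec_run_paths(data_root, make_run_id(now))
    # parents=True also creates apec/, runs/, and the run directory itself.
    paths["snapshots"].mkdir(parents=True, exist_ok=True)
    return paths
```

Keeping path construction separate from directory creation leaves `apec_run_paths` a pure function, which is why the run-path test can assert on it without touching the filesystem.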
- [ ] **Step 4: Run the run-path test to verify it passes**

Run: `uv run pytest tests/test_apec_storage.py::test_apec_run_paths_builds_expected_layout -v`
Expected: PASS

- [ ] **Step 5: Commit the storage layout helper**

```bash
git add src/job_research/storage.py tests/test_apec_storage.py
git commit -m "feat: add Apec run artifact paths"
```

## Task 3: Deterministic Query Derivation

**Files:**
- Create: `src/job_research/apec/__init__.py`
- Create: `src/job_research/apec/query_derivation.py`
- Create: `tests/apec/test_query_derivation.py`

- [ ] **Step 1: Write the failing query derivation test**

```python
# tests/apec/test_query_derivation.py
from job_research.apec.query_derivation import derive_apec_queries
from job_research.models import CandidateProfileOutput


def test_derive_apec_queries_from_candidate_profile() -> None:
    profile = CandidateProfileOutput(
        target_roles=["Data Engineer", "Analytics Engineer"],
        strengths=["Python", "SQL"],
        skills_to_emphasize=["BigQuery", "GCP"],
        constraints=["CDI only", "France only"],
    )

    queries = derive_apec_queries(profile)

    assert "Data Engineer" in queries
    assert "Analytics Engineer" in queries
    assert len(queries) <= 5
```

- [ ] **Step 2: Run the query test to verify it fails**

Run: `uv run pytest tests/apec/test_query_derivation.py::test_derive_apec_queries_from_candidate_profile -v`
Expected: FAIL with missing module or function

- [ ] **Step 3: Implement deterministic query derivation**

```python
# src/job_research/apec/query_derivation.py
from job_research.models import CandidateProfileOutput


def derive_apec_queries(profile: CandidateProfileOutput) -> list[str]:
    # Keep target roles in profile order, drop exact duplicates, cap at five queries.
    queries: list[str] = []
    for title in profile.target_roles:
        if title not in queries:
            queries.append(title)
    return queries[:5]
```

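The derivation above only drops exact duplicates, so `Data Engineer` and ` data  engineer` would count as two queries. If that turns out to matter, one possible refinement is sketched below. It is an assumption, not a requirement of this plan: `derive_queries_normalized` is a hypothetical name, and it takes a plain list of roles so the example runs without the project models.

```python
def derive_queries_normalized(target_roles: list[str], limit: int = 5) -> list[str]:
    """Hypothetical variant: collapse whitespace and compare case-insensitively,
    while preserving the first-seen spelling and the original order."""
    seen: set[str] = set()
    queries: list[str] = []
    for role in target_roles:
        cleaned = " ".join(role.split())  # trim and collapse inner whitespace
        key = cleaned.casefold()          # case-insensitive comparison key
        if not cleaned or key in seen:
            continue
        seen.add(key)
        queries.append(cleaned)
    return queries[:limit]
```

The output depends only on the input order, so repeated runs over the same profile produce the same queries, which is the property this task calls deterministic.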
- [ ] **Step 4: Run the query test to verify it passes**

Run: `uv run pytest tests/apec/test_query_derivation.py::test_derive_apec_queries_from_candidate_profile -v`
Expected: PASS

- [ ] **Step 5: Commit query derivation**

```bash
git add src/job_research/apec/__init__.py src/job_research/apec/query_derivation.py tests/apec/test_query_derivation.py
git commit -m "feat: add deterministic Apec query derivation"
```

## Task 4: Listing Normalization and Within-Run Deduplication

**Files:**
- Create: `src/job_research/apec/normalize.py`
- Create: `src/job_research/apec/dedupe.py`
- Create: `tests/apec/test_normalize.py`
- Create: `tests/apec/test_dedupe.py`

- [ ] **Step 1: Write the failing normalization and dedupe tests**

```python
# tests/apec/test_normalize.py
from job_research.apec.normalize import normalize_apec_listing


def test_normalize_apec_listing_extracts_minimal_shape() -> None:
    html = """
    <html>
      <body>
        <h1>Data Engineer</h1>
        <div class="company">Example Corp</div>
        <div class="location">Paris</div>
        <div class="contract">CDI</div>
        <div class="description">Build pipelines</div>
      </body>
    </html>
    """

    listing = normalize_apec_listing(url="https://example.test/job/123", html=html, fetched_at="2026-06-01T10:00:00Z")

    assert listing.title == "Data Engineer"
    assert listing.company == "Example Corp"
    assert listing.contract_type == "CDI"
```

```python
# tests/apec/test_dedupe.py
from job_research.apec.dedupe import dedupe_apec_listings
from job_research.models import ApecListing


def test_dedupe_apec_listings_by_url() -> None:
    listings = [
        ApecListing(source="apec", url="https://example.test/job/1", fetched_at="2026-06-01T10:00:00Z"),
        ApecListing(source="apec", url="https://example.test/job/1", fetched_at="2026-06-01T10:01:00Z"),
    ]

    deduped = dedupe_apec_listings(listings)

    assert len(deduped) == 1
```

- [ ] **Step 2: Run the normalization and dedupe tests to verify they fail**

Run: `uv run pytest tests/apec/test_normalize.py tests/apec/test_dedupe.py -v`
Expected: FAIL with missing modules/functions

- [ ] **Step 3: Implement minimal normalization and dedupe**

```python
# src/job_research/apec/normalize.py
from bs4 import BeautifulSoup

from job_research.models import ApecListing


def normalize_apec_listing(url: str, html: str, fetched_at: str) -> ApecListing:
    soup = BeautifulSoup(html, "html.parser")
    title = soup.find("h1")
    company = soup.select_one(".company")
    location = soup.select_one(".location")
    contract = soup.select_one(".contract")
    description = soup.select_one(".description")

    # A missing element yields None rather than an error.
    return ApecListing(
        source="apec",
        url=url,
        title=title.get_text(strip=True) if title else None,
        company=company.get_text(strip=True) if company else None,
        location=location.get_text(strip=True) if location else None,
        contract_type=contract.get_text(strip=True) if contract else None,
        description_text=description.get_text(" ", strip=True) if description else None,
        fetched_at=fetched_at,
    )
```

```python
# src/job_research/apec/dedupe.py
from job_research.models import ApecListing


def dedupe_apec_listings(listings: list[ApecListing]) -> list[ApecListing]:
    # Keep the first listing seen for each exact URL, preserving input order.
    seen: set[str] = set()
    deduped: list[ApecListing] = []
    for listing in listings:
        if listing.url in seen:
            continue
        seen.add(listing.url)
        deduped.append(listing)
    return deduped
```

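The dedupe above compares URLs as exact strings, so the same listing reached through two queries with different tracking parameters would survive twice. A possible follow-up is to compare a canonical form of the URL instead. The sketch below is only an illustration under one stated assumption: that the listing identity lives in the URL path, so query strings and fragments can be dropped. That has not been checked against real Apec URLs, and both function names are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_listing_url(url: str) -> str:
    """Hypothetical helper: lower-case scheme and host, drop the query string,
    fragment, and trailing slash. Assumes the job identity is in the path."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, "", ""))


def dedupe_urls(urls: list[str]) -> list[str]:
    """Keep the first URL seen for each canonical form, preserving input order."""
    seen: set[str] = set()
    kept: list[str] = []
    for url in urls:
        key = canonical_listing_url(url)
        if key in seen:
            continue
        seen.add(key)
        kept.append(url)
    return kept
```

If the job id turns out to be carried in a query parameter, this canonical form would merge distinct listings, so the exact-URL comparison is the safer default until real URLs have been inspected.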
- [ ] **Step 4: Run the normalization and dedupe tests to verify they pass**

Run: `uv run pytest tests/apec/test_normalize.py tests/apec/test_dedupe.py -v`
Expected: PASS

- [ ] **Step 5: Commit normalization and dedupe**

```bash
git add src/job_research/apec/normalize.py src/job_research/apec/dedupe.py tests/apec/test_normalize.py tests/apec/test_dedupe.py
git commit -m "feat: add Apec normalization and dedupe"
```

## Task 5: Public Apec Adapter and Snapshot Persistence

**Files:**
- Create: `src/job_research/apec/adapter.py`
- Modify: `tests/test_apec_storage.py`

- [ ] **Step 1: Write the snapshot persistence test**

```python
# tests/test_apec_storage.py
from pathlib import Path

from job_research.storage import apec_run_paths


def test_apec_run_artifacts_include_snapshot_and_meta(tmp_path: Path) -> None:
    paths = apec_run_paths(tmp_path, run_id="2026-06-01T10-00-00Z")
    paths["snapshots"].mkdir(parents=True, exist_ok=True)
    snapshot = paths["snapshots"] / "job-123.html"
    snapshot.write_text("<html>snapshot</html>", encoding="utf-8")

    assert snapshot.exists()
```

- [ ] **Step 2: Run the snapshot test to confirm the run layout supports snapshots**

Run: `uv run pytest tests/test_apec_storage.py::test_apec_run_artifacts_include_snapshot_and_meta -v`
Expected: PASS, because the test only exercises `apec_run_paths` from Task 2 and `pathlib`; a failure here points to a path-handling problem to fix before continuing

- [ ] **Step 3: Implement the Apec adapter skeleton**

```python
# src/job_research/apec/adapter.py
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ApecSearchResult:
    url: str
    source_job_id: str | None = None


class ApecAdapter:
    def __init__(self, max_listings: int = 50) -> None:
        self.max_listings = max_listings

    def search(self, queries: list[str]) -> list[ApecSearchResult]:
        # Placeholder: returns no results until the public search is implemented.
        return []

    def fetch_listing_html(self, url: str) -> str:
        # Placeholder: returns an empty page until detail-page fetching is implemented.
        return ""
```

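Snapshot files need names that are stable, unique within a run, and safe on any filesystem. One way this could be handled is sketched below. `snapshot_filename` is a hypothetical helper, not something this plan defines, and both the `job-` and `url-` prefixes and the 16-character digest length are arbitrary choices made for the illustration.

```python
import hashlib
import re
from typing import Optional


def snapshot_filename(url: str, source_job_id: Optional[str] = None) -> str:
    """Hypothetical helper: derive a filesystem-safe snapshot name.

    Prefers the job id when one is available; otherwise falls back to a
    short hash of the URL so listings without an id still get distinct files.
    """
    if source_job_id:
        # Replace anything outside a conservative character set with hyphens.
        stem = re.sub(r"[^A-Za-z0-9._-]+", "-", source_job_id).strip("-.")
        if stem:
            return f"job-{stem}.html"
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]
    return f"url-{digest}.html"
```

Because the fallback is derived from the URL rather than from a counter, re-fetching the same listing in a later run produces the same file name, which makes snapshots easier to compare across runs.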
- [ ] **Step 4: Run the snapshot test and any adapter-adjacent tests**

Run: `uv run pytest tests/test_apec_storage.py -v`
Expected: PASS

- [ ] **Step 5: Commit the adapter scaffold**

```bash
git add src/job_research/apec/adapter.py tests/test_apec_storage.py
git commit -m "feat: add Apec adapter scaffold"
```

## Task 6: fetch-apec Command Orchestration

**Files:**
- Modify: `src/job_research/cli.py`
- Create: `tests/test_apec_cli.py`

- [ ] **Step 1: Write the failing CLI orchestration tests**

```python
# tests/test_apec_cli.py
from pathlib import Path

from typer.testing import CliRunner

from job_research.cli import app


def test_fetch_apec_reads_profile_and_writes_run_artifacts(monkeypatch, tmp_path: Path) -> None:
    data_dir = tmp_path / "data"
    data_dir.mkdir()
    (data_dir / "candidate-profile.yaml").write_text(
        "target_roles:\n  - Data Engineer\nstrengths:\n  - Python\nskills_to_emphasize:\n  - BigQuery\nconstraints:\n  - CDI only\n",
        encoding="utf-8",
    )

    result = CliRunner().invoke(app, ["fetch-apec", "--data-root", str(data_dir)])

    assert result.exit_code == 0
    assert "normalized listing count" in result.stdout.lower()
```

- [ ] **Step 2: Run the CLI orchestration test to verify it fails**

Run: `uv run pytest tests/test_apec_cli.py::test_fetch_apec_reads_profile_and_writes_run_artifacts -v`
Expected: FAIL because `fetch-apec` does not exist yet

- [ ] **Step 3: Implement fetch-apec command orchestration**

```python
# src/job_research/cli.py
from datetime import UTC, datetime
from pathlib import Path

import typer

from job_research.apec.adapter import ApecAdapter
from job_research.apec.dedupe import dedupe_apec_listings
from job_research.apec.normalize import normalize_apec_listing
from job_research.apec.query_derivation import derive_apec_queries
from job_research.models import ApecListing, ApecRunMeta, CandidateProfileOutput, ListingError
from job_research.storage import apec_run_paths, load_yaml, save_yaml


@app.command("fetch-apec")
def fetch_apec(
    data_root: Path = typer.Option(Path("data")),
) -> None:
    profile_payload = load_yaml(data_root / "candidate-profile.yaml")
    profile = CandidateProfileOutput.model_validate(profile_payload)
    queries = derive_apec_queries(profile)
    if not queries:
        raise typer.BadParameter("No usable Apec queries could be derived from candidate-profile.yaml")

    started_at = datetime.now(UTC)
    # Hyphens replace colons so the run id is safe to use as a directory name.
    run_id = started_at.strftime("%Y-%m-%dT%H-%M-%SZ")
    # Listings record a regular ISO 8601 timestamp, not the directory-safe run id.
    fetched_at = started_at.strftime("%Y-%m-%dT%H:%M:%SZ")
    paths = apec_run_paths(data_root, run_id)
    paths["snapshots"].mkdir(parents=True, exist_ok=True)

    adapter = ApecAdapter(max_listings=50)
    # Enforce the listing cap once, even if the adapter returns more results.
    search_results = adapter.search(queries)[: adapter.max_listings]
    listings: list[ApecListing] = []
    errors: list[ListingError] = []

    for index, result in enumerate(search_results, start=1):
        try:
            html = adapter.fetch_listing_html(result.url)
            # Fall back to a per-run index so listings without an id do not overwrite each other.
            snapshot_name = (result.source_job_id or f"listing-{index}").replace("/", "-")
            snapshot_path = paths["snapshots"] / f"{snapshot_name}.html"
            snapshot_path.write_text(html, encoding="utf-8")
            listings.append(normalize_apec_listing(url=result.url, html=html, fetched_at=fetched_at))
        except Exception as exc:
            # One bad listing is recorded and skipped; the run continues.
            errors.append(ListingError(url=result.url, stage="fetch_or_normalize", message=str(exc)))

    deduped = dedupe_apec_listings(listings)
    run_meta = ApecRunMeta(
        derived_queries=queries,
        fetched_count=len(search_results),
        normalized_count=len(listings),
        deduplicated_count=len(deduped),
        failed_count=len(errors),
        listing_errors=errors,
    )

    save_yaml(paths["listings"], {"listings": [listing.model_dump(mode="json") for listing in deduped]})
    save_yaml(paths["run_meta"], run_meta.model_dump(mode="json"))

    typer.echo(f"Query count: {len(queries)}")
    typer.echo(f"Fetched listing count: {run_meta.fetched_count}")
    typer.echo(f"Normalized listing count: {run_meta.normalized_count}")
    typer.echo(f"Deduplicated count: {run_meta.deduplicated_count}")
    typer.echo(f"Failed listing count: {run_meta.failed_count}")
```

Implementation requirements:
- load `candidate-profile.yaml` from the data root (`data/` by default)
- validate into `CandidateProfileOutput`
- derive queries
- create a run id and run paths
- invoke adapter search/fetch flow
- persist snapshots, listings.yaml, run-meta.yaml
- print summary counts

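The partial-success behaviour in the requirements above (one failing listing is recorded and the run carries on) can be illustrated without any network access. The sketch below uses only the standard library. `FakeAdapter`, `FetchOutcome`, and `fetch_all` are hypothetical names introduced for the example and are not part of the planned modules.

```python
from dataclasses import dataclass, field


@dataclass
class FetchOutcome:
    """Hypothetical container: fetched pages by URL plus (url, message) failures."""
    pages: dict[str, str] = field(default_factory=dict)
    errors: list[tuple[str, str]] = field(default_factory=list)


class FakeAdapter:
    """Hypothetical stand-in adapter: serves canned HTML and raises for unknown URLs."""

    def __init__(self, pages: dict[str, str]) -> None:
        self.pages = pages

    def fetch_listing_html(self, url: str) -> str:
        if url not in self.pages:
            raise RuntimeError(f"no page for {url}")
        return self.pages[url]


def fetch_all(adapter: FakeAdapter, urls: list[str], max_listings: int = 50) -> FetchOutcome:
    """Fetch up to max_listings URLs, recording failures instead of aborting."""
    outcome = FetchOutcome()
    for url in urls[:max_listings]:
        try:
            outcome.pages[url] = adapter.fetch_listing_html(url)
        except Exception as exc:  # broad on purpose: any single failure is recorded
            outcome.errors.append((url, str(exc)))
    return outcome
```

A stub of this shape could also be swapped in for the real adapter in a CLI test, so the orchestration is exercised with non-empty search results while the adapter itself is still a skeleton.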
- [ ] **Step 4: Run the CLI orchestration test to verify it passes**

Run: `uv run pytest tests/test_apec_cli.py::test_fetch_apec_reads_profile_and_writes_run_artifacts -v`
Expected: PASS

- [ ] **Step 5: Commit the fetch-apec command**

```bash
git add src/job_research/cli.py tests/test_apec_cli.py
git commit -m "feat: add fetch-apec command"
```

## Task 7: Full Regression and Manual Smoke Test

**Files:**
- Modify: none

- [ ] **Step 1: Run the full test suite**

Run: `uv run pytest tests -v`
Expected: PASS with all Apec-slice and profile-slice tests green

- [ ] **Step 2: Run a manual fetch-apec smoke test that makes no network calls**

Run: `uv run job-research fetch-apec --help`
Expected: exit code 0, with help text that lists the `--data-root` option

- [ ] **Step 3: Commit the validated Apec ingestion slice (skip if the working tree is already clean)**

```bash
git add pyproject.toml src/job_research tests
git commit -m "feat: complete Apec ingestion slice"
```

## Spec Coverage Check

- Explicit `fetch-apec` command: covered by Task 6
- Read `data/candidate-profile.yaml`: covered by Task 6
- Deterministic query derivation: covered by Task 3
- 50-listing cap and adapter behavior: covered by Task 5 and Task 6
- Raw HTML snapshot persistence: covered by Task 2, Task 5, and Task 6
- Normalized YAML listing output: covered by Task 1, Task 4, and Task 6
- Minimal within-run deduplication: covered by Task 4
- Partial-success metadata and run summary: covered by Task 1, Task 2, and Task 6