news_reminder Pipeline Redesign

Issue: #328 | Created: 2026-04-07 | Type: Architecture design document

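Current Issues
Design Concept
Domain Definitions
Source Abstraction
Common Pipeline
Unified LLM Invocation
Slack Posting Design
Scheduler Integration
Data Management and Rotation
Directory Layout
Interface Definitions
Migration Plan
Test Strategy
Security and Performance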

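1. Current Issues

Issue	Details	Impact
Mixed AI/finance architectures	`scanner.py` (974 lines, AI + geopolitics) and `pipeline/steps/` (finance) coexist with different architectures. Fetching, evaluation, and posting logic is duplicated	The scope of impact of a change is unclear, and adding a new domain is difficult
Three styles of Claude CLI invocation	scanner.py (`subprocess.run + --model haiku`), summarize.py (`asyncio + bypassPermissions`), dashboard.py (`subprocess.run + --model sonnet`)	Configuration is scattered, and error handling differs from place to place
scanned.json bloat	There is a 1,000-entry cap but no date-based rotation	Old articles linger, which slows duplicate detection for new articles
Serial article evaluation in the IF layer	Each candidate article is judged one at a time by the IF layer (Claude CLI). Five articles can take up to 10 minutes	Latency grows further as domains are added
No tests	No tests exist for scanner.py or for the pipeline as a whole	There is no safety net when refactoring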

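2. Design Concept

Basic policy: "Domains are configuration; the pipeline is shared"

AI, finance, and geopolitics (plus any domains added later) are treated as differently configured runs of the same pipeline. Domain-specific logic is expressed only through differences in the evaluation prompt and the posting template.

Design pillars

Domain = JSON config file: adding a domain needs one JSON config file plus its two prompt files (see 3.3), with no code changes
Source = adapter pattern: RSS / HN API / arXiv API / Google News RSS are handled through a unified interface. A future X API source is just one more adapter
Pipeline = four fixed steps: fetch → evaluate → summarize → post
LLM invocation = consolidated in one module: a single wrapper around the Claude CLI
Slack posting = per-domain templates: finance gets a holdings impact table, AI gets MNML improvement opportunities, and so on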

3. Domain Definitions

3.1 Domain Config Files

Each domain is defined in data/domains/{domain_id}.json. The file name is the domain ID itself (e.g. ai.json, finance.json, geopolitics.json).

AI domain: `data/domains/ai.json`

{
  "id": "ai",
  "name": "AI",
  "emoji": "🤖",
  "thread_prefix": "AINews",
  "sources": ["openai_blog", "google_ai_blog", "deepmind_blog",
              "meta_engineering", "ms_research", "huggingface_blog",
              "arxiv_ai", "techcrunch_ai", "theverge_ai",
              "mit_tech_review", "hn_ai"],
  "evaluate_prompt": "prompts/evaluate_ai.txt",
  "summarize_prompt": "prompts/summarize_ai.txt",
  "formatter": "ai",
  "max_articles_per_run": 30,
  "mention_ceo": true
}

Finance domain: `data/domains/finance.json`

{
  "id": "finance",
  "name": "金融",
  "emoji": "📊",
  "thread_prefix": "金融News",
  "sources": ["google_news"],
  "source_config": {
    "google_news": {
      "holdings_file": "data/holdings.json"
    }
  },
  "evaluate_prompt": "prompts/evaluate_finance.txt",
  "summarize_prompt": "prompts/summarize_finance.txt",
  "formatter": "finance",
  "max_articles_per_run": 50,
  "mention_ceo": true
}

Geopolitics domain: `data/domains/geopolitics.json`

{
  "id": "geopolitics",
  "name": "地政学",
  "emoji": "🌍",
  "thread_prefix": "地政学News",
  "sources": ["bbc_world", "aljazeera", "nhk_intl", "nhk_politics", "jiji"],
  "evaluate_prompt": "prompts/evaluate_geopolitics.txt",
  "summarize_prompt": "prompts/summarize_geopolitics.txt",
  "formatter": "default",
  "max_articles_per_run": 30,
  "mention_ceo": true
}

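3.2 Domain Config Schema

Field	Type	Required	Description
`id`	str	Yes	Unique domain ID (also used as the config file name)
`name`	str	Yes	Display name (Slack thread header, etc.)
`emoji`	str	Yes	Emoji for the Slack header
`thread_prefix`	str	Yes	Domain label used in the thread name (e.g. `AINews` yields `20260407_AINews`)
`sources`	list[str]	Yes	List of source IDs to use (the `id` values in `data/sources.json`)
`source_config`	dict	No	Extra source-specific settings (e.g. holdings for finance)
`evaluate_prompt`	str	Yes	Path to the article evaluation prompt file, relative to `data/`
`summarize_prompt`	str	Yes	Path to the batch summarization prompt file, relative to `data/`
`formatter`	str	Yes	Slack post formatter name (`default` / `ai` / `finance`)
`max_articles_per_run`	int	No	Maximum number of articles processed per run (default: 30)
`mention_ceo`	bool	No	Whether to append the CEO mention (default: true)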
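The schema above could be enforced at load time roughly as follows. This is a minimal sketch rather than the project's loader: `validate_domain_config`, `REQUIRED_FIELDS`, and `DOMAIN_ID_RE` are hypothetical names, and the ID pattern is borrowed from the path traversal rule in section 14.1.

```python
import re

# Assumption: required fields and their JSON types, mirroring the schema table.
REQUIRED_FIELDS = {
    "id": str,
    "name": str,
    "emoji": str,
    "thread_prefix": str,
    "sources": list,
    "evaluate_prompt": str,
    "summarize_prompt": str,
    "formatter": str,
}
# Assumption: same character set as the domain_id restriction in 14.1.
DOMAIN_ID_RE = re.compile(r"[a-z0-9_-]+")


def validate_domain_config(raw: dict) -> dict:
    """Return a copy of raw with defaults applied, or raise ValueError."""
    for key, expected in REQUIRED_FIELDS.items():
        if key not in raw:
            raise ValueError(f"missing required field: {key}")
        if not isinstance(raw[key], expected):
            raise ValueError(f"field {key} must be {expected.__name__}")
    if not DOMAIN_ID_RE.fullmatch(raw["id"]):
        raise ValueError(f"invalid domain id: {raw['id']!r}")
    if not all(isinstance(source, str) for source in raw["sources"]):
        raise ValueError("sources must be a list of source ID strings")
    # Defaults are built per call so no mutable object is shared between configs.
    defaults = {"source_config": {}, "max_articles_per_run": 30, "mention_ceo": True}
    return {**defaults, **raw}
```

Validating before constructing a `DomainConfig` keeps a malformed file from failing halfway through a scheduled run.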

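3.3 Adding a Domain

Adding a new domain = adding three files

data/domains/{new_domain}.json: domain config
data/prompts/evaluate_{new_domain}.txt: evaluation prompt
data/prompts/summarize_{new_domain}.txt: summarization prompt

No code changes are required. The default formatter is sufficient in most cases; add a formatter only if the domain needs its own post layout. If the domain uses sources that are not yet defined, they also need entries in data/sources.json.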

4. Source Abstraction

4.1 Source Definition File

data/sources.json remains in use. An id field is added on top of the current format.

{
  "sources": [
    {
      "id": "openai_blog",
      "name": "OpenAI Blog",
      "type": "rss",
      "url": "https://openai.com/blog/rss.xml",
      "max_articles": 5,
      "tier": "primary"
    },
    {
      "id": "arxiv_ai",
      "name": "arXiv AI",
      "type": "arxiv_api",
      "categories": ["cs.AI", "cs.CL", "cs.LG"],
      "max_articles": 20,
      "tier": "primary"
    },
    {
      "id": "hn_ai",
      "name": "Hacker News AI",
      "type": "hn_api",
      "min_score": 50,
      "max_articles": 15,
      "tier": "secondary"
    },
    {
      "id": "google_news",
      "name": "Google News (銘柄検索)",
      "type": "google_news_rss",
      "max_articles": 20,
      "tier": "primary"
    }
  ]
}

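Change: the current category field is removed. The source-to-domain mapping is managed through the sources list in each domain config, so a single source can be shared by multiple domains.

4.2 Source Adapters

An adapter class for each source type lives in app/sources/.

Source type	Adapter	Input	Notes
`rss`	`RssAdapter`	URL	Uses feedparser. Ported from `_fetch_rss_source()` in the current scanner.py
`hn_api`	`HackerNewsAdapter`	min_score	Firebase API. Ported from the current `_fetch_hn_api()`
`arxiv_api`	`ArxivAdapter`	categories	arXiv REST API. Ported from the current `_fetch_arxiv_api()`
`google_news_rss`	`GoogleNewsAdapter`	holdings_file	Ported from the current pipeline/steps/fetch.py. Fetches RSS using per-holding search terms
`x_api`	`XAdapter`	(not implemented)	Future work. To be implemented once the researcher's findings are available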

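4.3 Common Interface

from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime
from typing import Protocol

from news_reminder.app.domain import DomainConfig


class SourceAdapter(Protocol):
    """Common interface for source adapters."""

    async def fetch(
        self,
        source_config: dict,
        domain_config: DomainConfig,
    ) -> list[Article]:
        """Fetch articles and return them."""
        ...

@dataclass
class Article:
    """Unified data model for a fetched article."""
    id: str               # Hash generated from the URL or similar
    title: str
    url: str
    source_name: str      # Source name (e.g. "OpenAI Blog")
    source_id: str        # Source ID (e.g. "openai_blog")
    published: datetime | None
    content: str          # Body text if available (e.g. the RSS summary)
    tier: str             # "primary" | "secondary"
    metadata: dict        # Source-specific extras (HN score, arXiv authors, etc.)

The google_news_rss type receives the holdings_file path from domain_config.source_config and fetches RSS using per-holding search terms. The Article objects it returns include ticker / holding_name in metadata.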

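5. Common Pipeline

5.1 The Four Pipeline Steps

Step 1: Fetch (fetch)

Run each source in the domain config's source list in order
Each adapter returns list[Article]
Exclude known articles with the already-read filter (scanned.json)
Deduplicate (URL hash)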
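The already-read filter and the URL-hash deduplication in Step 1 could look like the sketch below. `make_article_id` and `filter_new` are hypothetical helpers, and the choice of SHA-256 truncated to 12 hex characters is an assumption (the length matches the example key in section 9.3, but the document does not name the hash).

```python
import hashlib


def make_article_id(url: str) -> str:
    """Derive a short, stable ID from the URL (assumed: SHA-256, first 12 hex chars)."""
    return hashlib.sha256(url.strip().encode("utf-8")).hexdigest()[:12]


def filter_new(urls: list[str], scanned_ids: set[str]) -> list[tuple[str, str]]:
    """Return (id, url) pairs that are neither already scanned nor repeated in this run."""
    seen: set[str] = set()
    fresh: list[tuple[str, str]] = []
    for url in urls:
        article_id = make_article_id(url)
        if article_id in scanned_ids or article_id in seen:
            continue  # known from scanned.json, or a duplicate within this fetch
        seen.add(article_id)
        fresh.append((article_id, url))
    return fresh
```

Because several sources may surface the same story, deduplicating inside the run matters as much as checking scanned.json.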

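Step 2: Evaluate (evaluate)

Evaluate each article with the LLM
Use the domain-specific evaluation prompt
Improve throughput with batch evaluation (see 5.3)
Store the result in Article.evaluation

Step 3: Summarize (summarize)

Batch-summarize the evaluated articles with the LLM
Use the domain-specific summarization prompt
Output: summary + list of important articles

Step 4: Post (post)

Build Block Kit blocks with the per-domain formatter
Post to the daily thread (append)
Update scanned.json
Notify the IF layer of candidates (AI domain)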

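5.2 Pipeline Entry Point

from slack_sdk.web.async_client import AsyncWebClient

from news_reminder.app.models import PipelineResult


async def run_domain_pipeline(
    domain_id: str,
    *,
    slack_client: AsyncWebClient | None = None,
    dry_run: bool = False,
) -> PipelineResult:
    """Run the pipeline for one domain.

    Args:
        domain_id: Domain ID (e.g. "ai", "finance", "geopolitics")
        slack_client: Slack API client (when None, as in a CLI run, posting is skipped)
        dry_run: When True, LLM calls and Slack posting are skipped

    Returns:
        PipelineResult: Execution result (article count, candidate count, etc.)
    """
    ...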

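5.3 Improving Throughput with Batch Evaluation

Current: one Claude CLI call per article (serial). 30 articles = 30 calls.
New design: evaluate 20 articles per batch. 30 articles = 2 calls.

Multiple articles are passed together in the evaluation prompt, and the results come back as a JSON array. The batch size defaults to 20, which leaves ample headroom within Haiku's context length (200K).

# Batch evaluation flow
articles = fetch_step(domain)          # e.g. 30 articles
batches = chunk(articles, size=20)     # split into 2 batches
for batch in batches:
    evaluations = await llm.evaluate_batch(batch, domain.evaluate_prompt)
    # strict=True (Python 3.10+) raises if the LLM returns a different number of results
    for article, evaluation in zip(batch, evaluations, strict=True):
        article.evaluation = evaluation

The batch size is configurable via Settings.evaluate_batch_size (default: 20).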
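The flow above relies on a `chunk` helper that the document does not define. A minimal sketch of it, plus a hypothetical `pair_results` guard for the case where the LLM returns fewer or more results than articles, might be:

```python
from collections.abc import Iterator, Sequence
from typing import TypeVar

T = TypeVar("T")


def chunk(items: Sequence[T], size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def pair_results(batch: Sequence[T], results: list[dict]) -> list[tuple[T, dict]]:
    """Pair each article with its evaluation, failing loudly on a count mismatch."""
    if len(batch) != len(results):
        raise ValueError(
            f"expected {len(batch)} evaluations, got {len(results)}"
        )
    return list(zip(batch, results))
```

Checking the count explicitly avoids silently attaching an evaluation to the wrong article when the model drops or merges an entry.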

6. Unified LLM Invocation

6.1 Unified module `app/llm.py`

The three current styles of Claude CLI invocation are merged into one module.

class ClaudeCli:
    """Unified wrapper around the Claude CLI."""

    def __init__(
        self,
        cli_path: str = "claude",
        default_model: str = "haiku",
        timeout: int = 120,
    ):
        ...

    async def run(
        self,
        prompt: str,
        *,
        model: str | None = None,    # None falls back to default_model
        timeout: int | None = None,   # None falls back to self.timeout
        json_output: bool = False,    # When True, attempt JSON extraction
    ) -> str:
        """Run the Claude CLI and return the text result."""
        ...

    async def run_json(
        self,
        prompt: str,
        *,
        model: str | None = None,
        timeout: int | None = None,
    ) -> dict | list:
        """Run the Claude CLI, parse the output as JSON, and return it."""
        ...

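6.2 Unified Settings

Setting	Current	New design
CLI path	Defined separately in three places	`Settings.claude_cli_path`, in one place
Environment variable filter	Implemented only in summarize.py	Applied uniformly in `ClaudeCli.__init__`
JSON extraction	scanner.py has its own `_extract_json()`	Merged into `ClaudeCli.run_json()`
Error handling	Differs from place to place	Unified exceptions `LlmError` / `LlmTimeoutError`
Model selection	Fixed to haiku or fixed to sonnet	Selectable per call (default: haiku)
Permission mode	`bypassPermissions` only in summarize.py	`--permission-mode bypassPermissions --output-format text` added to every call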
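The pure parts of the wrapper (environment filtering, argument building, JSON extraction) can be sketched without launching the CLI. The helper names below are hypothetical, and passing the prompt with `-p` (print mode) is an assumption; the blocked variable names come from section 14.1 and the remaining flags from the table above.

```python
from __future__ import annotations

import json
import os

# Variables removed before spawning the CLI (listed in section 14.1).
BLOCKED_ENV_VARS = ("CLAUDECODE", "ANTHROPIC_API_KEY", "CLAUDE_CODE_OAUTH_TOKEN")


def filtered_env(env: dict[str, str] | None = None) -> dict[str, str]:
    """Copy the environment without the blocked variables."""
    source = os.environ if env is None else env
    return {k: v for k, v in source.items() if k not in BLOCKED_ENV_VARS}


def build_command(prompt: str, *, cli_path: str = "claude", model: str = "haiku") -> list[str]:
    """Build the argv list; `-p` for the prompt is an assumed invocation style."""
    return [
        cli_path, "-p", prompt,
        "--model", model,
        "--permission-mode", "bypassPermissions",
        "--output-format", "text",
    ]


def extract_json(text: str) -> dict | list:
    """Return the first JSON object or array embedded in free-form CLI output."""
    decoder = json.JSONDecoder()
    for index, char in enumerate(text):
        if char in "[{":
            try:
                value, _ = decoder.raw_decode(text, index)
            except json.JSONDecodeError:
                continue  # not valid JSON from here, keep scanning
            return value
    raise ValueError("no JSON value found in CLI output")
```

In the real wrapper these would feed something like `asyncio.create_subprocess_exec(*build_command(...), env=filtered_env())`, with the timeout and the `LlmError` / `LlmTimeoutError` mapping layered on top.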

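7. Slack Posting Design

7.1 Thread Management

The current thread.py is extended to support the CEO-required thread name format `{YYYYMMDD}_{thread_prefix}` (e.g. `20260407_AINews`).

Item	Current	New design
Thread name	`🤖 AINews 20260407分`	`20260407_AINews` (CEO-specified format)
Key format	`ai_20260407`	`ai_20260407` (unchanged)
Category name	Hard-coded	Taken from `thread_prefix` in the domain config
Thread append	Already supported	Post to the same day's thread (existing behavior kept)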
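As an illustration of the naming rules in the table, the thread name and key could be derived as below. `thread_title`, `thread_key`, and `header_text` are hypothetical helper names, not functions from the current thread.py.

```python
from datetime import date


def thread_title(day: date, thread_prefix: str) -> str:
    """Thread name in the CEO-specified format, e.g. 20260407_AINews."""
    return f"{day:%Y%m%d}_{thread_prefix}"


def thread_key(day: date, domain_id: str) -> str:
    """Key used to look up the thread ts, e.g. ai_20260407 (unchanged format)."""
    return f"{domain_id}_{day:%Y%m%d}"


def header_text(day: date, emoji: str, thread_prefix: str) -> str:
    """Header block text: the domain emoji followed by the thread name."""
    return f"{emoji} {thread_title(day, thread_prefix)}"
```

Keeping the key format unchanged means existing entries in thread_ts.json stay valid while only the visible thread name changes.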

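7.2 Post Structure (Per-Domain Formatters)

All formatters build posts with Slack Block Kit. Markdown tables (the | col | col | form) are not used.

Default formatter (default): geopolitics and others

Header block (header)
Summary (section + mrkdwn)
Important news (one section + mrkdwn per article, separated by divider)
References (URL list in a context block)
CEO mention (section)

AI formatter (ai)

Header block
Summary
MNML-agent improvement opportunities (only when relevant articles exist; shown in two columns with section + fields)
Important news
References
CEO mention

Finance formatter (finance)

Header block
Summary (overall market conditions)
Holdings impact (tabular layout with section + fields)
Important news
References
CEO mention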

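7.3 Block Kit Structure Examples

The formatters do not rely on a table element in Block Kit; tabular data is reproduced with the fields of a section block (two-column layout).

Example output of the default formatter (Block Kit JSON)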

[
  {
    "type": "header",
    "text": { "type": "plain_text", "text": "🌍 20260407_地政学News" }
  },
  {
    "type": "section",
    "text": {
      "type": "mrkdwn",
      "text": "*本日のサマリー*\n米中関係の緊張が高まり、半導体輸出規制の追加措置が発表された。中東情勢は..."
    }
  },
  { "type": "divider" },
  {
    "type": "section",
    "text": {
      "type": "mrkdwn",
      "text": "*🔴 米国、対中半導体規制を強化*\n新たにEUV関連装置の輸出を全面禁止。ASML株に影響。\n<https://example.com/article1|記事を読む>"
    }
  },
  { "type": "divider" },
  {
    "type": "context",
    "elements": [
      { "type": "mrkdwn", "text": "📎 <https://example.com/1|BBC> | <https://example.com/2|NHK>" }
    ]
  },
  {
    "type": "section",
    "text": { "type": "mrkdwn", "text": "<@U0AHXTRDQMA>" }
  }
]

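Finance formatter: example holdings impact table (Block Kit JSON)

Using section + fields, two-column pairs of "holding / impact + action" are laid out. fields holds at most 10 items (5 holdings); beyond that, the content is split across multiple section blocks.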

[
  {
    "type": "header",
    "text": { "type": "plain_text", "text": "📊 20260407_金融News" }
  },
  {
    "type": "section",
    "text": {
      "type": "mrkdwn",
      "text": "*市場概況*\n米国株は利下げ観測で続伸。日経平均は円安を好感し+1.2%。"
    }
  },
  { "type": "divider" },
  {
    "type": "section",
    "text": { "type": "mrkdwn", "text": "*📊 保有銘柄影響*" }
  },
  {
    "type": "section",
    "fields": [
      { "type": "mrkdwn", "text": "*🟢 eMAXIS S&P500*" },
      { "type": "mrkdwn", "text": "ポジティブ: FRB利下げ示唆\n→ 静観（積立継続）" },
      { "type": "mrkdwn", "text": "*🔴 リクルート*" },
      { "type": "mrkdwn", "text": "ネガティブ: 求人広告規制強化\n→ 決算注視" },
      { "type": "mrkdwn", "text": "*⚪ BTC*" },
      { "type": "mrkdwn", "text": "中立: ETF資金流入鈍化\n→ 静観" }
    ]
  },
  { "type": "divider" },
  {
    "type": "section",
    "text": {
      "type": "mrkdwn",
      "text": "*🔴 求人広告規制法案が上院通過*\nリクルートHD（6098）に直接影響...\n<https://example.com/article1|記事を読む>"
    }
  },
  {
    "type": "context",
    "elements": [
      { "type": "mrkdwn", "text": "📎 <https://example.com/1|日経> | <https://example.com/2|Bloomberg>" }
    ]
  },
  {
    "type": "section",
    "text": { "type": "mrkdwn", "text": "<@U0AHXTRDQMA>" }
  }
]

AI formatter: example MNML improvement opportunities (Block Kit JSON)

Only when relevant articles exist, "technology / proposed change" pairs are shown in two columns with section + fields.

[
  {
    "type": "section",
    "text": { "type": "mrkdwn", "text": "*🔧 MNML-agent改修余地*" }
  },
  {
    "type": "section",
    "fields": [
      { "type": "mrkdwn", "text": "*Claude 4.5 Tool Use改善*" },
      { "type": "mrkdwn", "text": "W層のツール呼び出し精度向上の可能性\n→ プロンプト見直し検討" },
      { "type": "mrkdwn", "text": "*MCP Server安定版リリース*" },
      { "type": "mrkdwn", "text": "外部ツール連携の標準化\n→ ops/ パイプラインへの適用検討" }
    ]
  }
]

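Block Kit implementation notes

fields allows at most 10 elements per block. With two columns that is five rows. Split into additional section blocks when exceeded
Each mrkdwn field text is limited to 2000 characters. Truncate the summary if the impact description is long
The text of a section is limited to 3000 characters. Split into multiple blocks if an article description is long
The formatter's build_blocks() returns the Block Kit blocks together with a fallback text (see 11.4). The caller passes them to chat_postMessage(blocks=blocks, text=fallback_text)
A message is limited to 50 blocks. When exceeded, split into multiple messages posted to the same thread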
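The splitting rules in these notes could be implemented along the following lines. This is a sketch with hypothetical helper names (`field_sections`, `split_messages`); the limits are the ones listed above.

```python
MAX_FIELDS_PER_SECTION = 10   # two columns, so five rows per section
MAX_FIELD_TEXT = 2000         # characters per mrkdwn field
MAX_BLOCKS_PER_MESSAGE = 50


def field_sections(pairs: list[tuple[str, str]]) -> list[dict]:
    """Turn (label, detail) pairs into section blocks of at most 10 fields each."""
    fields: list[dict] = []
    for label, detail in pairs:
        # Truncate each cell so a long impact description cannot break the post.
        fields.append({"type": "mrkdwn", "text": label[:MAX_FIELD_TEXT]})
        fields.append({"type": "mrkdwn", "text": detail[:MAX_FIELD_TEXT]})
    return [
        {"type": "section", "fields": fields[i:i + MAX_FIELDS_PER_SECTION]}
        for i in range(0, len(fields), MAX_FIELDS_PER_SECTION)
    ]


def split_messages(blocks: list[dict]) -> list[list[dict]]:
    """Split a block list into chunks that each fit in one Slack message."""
    return [
        blocks[i:i + MAX_BLOCKS_PER_MESSAGE]
        for i in range(0, len(blocks), MAX_BLOCKS_PER_MESSAGE)
    ]
```

Because the field limit is even, a label and its detail always land in the same section, so a row is never split across two blocks.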

7.4 CEO Mention

<@U0AHXTRDQMA> is appended to the end of each post, only for domains with mention_ceo: true.

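8. Scheduler Integration

8.1 Integration target: `DailyScheduler` in `bot/scheduler.py`

The pipeline is integrated into the DailyScheduler class already implemented in bot/scheduler.py. No new scheduler is created.

Current DailyScheduler jobs

Job key	Time	Description	After redesign
`news_scan`	07:00	RSS crawl for AI + geopolitics	Retired → merged into `news_domain`
`mail_filing`	Hourly	Fetch mail attachments	Kept
`mail_digest`	08:00	Mail digest	Kept
`news_pipeline`	08:00/12:00/18:00	Finance news pipeline	Retired → merged into `news_domain`
`geo_dashboard`	Monthly	Geopolitics dashboard update	Kept

8.2 New job: `news_domain`

CEO requirement: update each domain three times a day

Time	Job key	What runs
07:00	`news_domain_07`	Pipeline for all domains (AI, finance, geopolitics)
12:00	`news_domain_12`	Pipeline for all domains
18:00	`news_domain_18`	Pipeline for all domains

8.3 Integration into scheduler.py

The current news_scan (07:00) and news_pipeline (08:00/12:00/18:00) jobs are retired and unified into the news_domain job.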

# bot/scheduler.py: changes

# Constants
NEWS_DOMAIN_HOURS = [7, 12, 18]  # Formerly: NEWS_PIPELINE_HOURS + scan_hour

# Inside the run() loop
if now.hour in NEWS_DOMAIN_HOURS:
    key = f"news_domain_{now.hour:02d}"
    if self._last_runs.get(key) != today:
        self._last_runs[key] = today
        self._save_last_runs()
        await self._run_all_domains()

async def _run_all_domains(self) -> None:
    """Run the pipeline for every domain, one after another."""
    domain_ids = load_all_domain_ids()  # Scans data/domains/*.json
    for domain_id in domain_ids:
        try:
            await run_domain_pipeline(
                domain_id,
                slack_client=self.client,
            )
        except Exception:
            log.exception("Pipeline error in domain %s", domain_id)
            await self._notify_job_error(f"news_{domain_id}", ...)

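8.4 Jobs and Processes to Retire

Target	Kind	Action
`news_scan` (07:00, AI + geopolitics)	DailyScheduler job	Merged into `news_domain`
`news_pipeline_{HH}` (finance only)	DailyScheduler job	Merged into `news_domain`
`com.news-reminder.pipeline`	launchd plist	Retired. Everything runs through DailyScheduler
`geo_dashboard` (monthly)	DailyScheduler job	Kept, not retired (monthly processing remains a separate job)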

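9. Data Management and Rotation

9.1 scanned.json Rotation

scanned.json is used for duplicate detection of already-read articles. Currently there is only the 1,000-entry cap and no date-based rotation.

Condition	Action
More than 1,000 articles	Delete the oldest articles (the excess, in ascending `scanned_at` order)
`scanned_at` older than 90 days	Delete

Whichever condition is reached first applies.

Rotation runs in _rotate_scanned() after each pipeline execution.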
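The two rotation conditions could be combined as in the sketch below. `rotate_scanned` is a hypothetical stand-in for `_rotate_scanned()`, operating on the `articles` mapping from section 9.3; it assumes every `scanned_at` value is an ISO 8601 string with a UTC offset and that `now` is timezone-aware.

```python
from datetime import datetime, timedelta, timezone

JST = timezone(timedelta(hours=9))  # offset used in the scanned_at examples


def rotate_scanned(
    articles: dict[str, dict],
    now: datetime,
    max_articles: int = 1000,
    max_days: int = 90,
) -> dict[str, dict]:
    """Drop entries older than max_days, then keep only the newest max_articles."""
    cutoff = now - timedelta(days=max_days)
    kept = {
        article_id: entry
        for article_id, entry in articles.items()
        if datetime.fromisoformat(entry["scanned_at"]) >= cutoff
    }
    if len(kept) > max_articles:
        newest_first = sorted(
            kept.items(),
            key=lambda item: datetime.fromisoformat(item[1]["scanned_at"]),
            reverse=True,
        )
        kept = dict(newest_first[:max_articles])
    return kept
```

Applying the age cutoff first means the count cap only ever removes entries that are still within the 90-day window.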

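9.2 Data File Cleanup

File	Current	New design
`data/scanned.json`	Processed articles for AI + geopolitics	Unified across all domains. Rotation added
`data/notified.json`	Notified IDs for finance	Merged into `scanned.json` (`notified: true` flag)
`data/fetched.json`	Intermediate fetch data for finance	Removed (processed in memory within the pipeline)
`data/sources.json`	Source definitions (with category)	Source definitions (category removed, id added)
`data/holdings.json`	Holdings	Unchanged
`data/domains/*.json`	(new)	Domain configs
`data/prompts/*.txt`	(new)	LLM prompt templates
`~/.mnml/news_reminder/thread_ts.json`	Thread ts management	Unchanged (kept outside the repository)

9.3 New scanned.json Format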

{
  "articles": {
    "a1b2c3d4e5f6": {
      "title": "...",
      "url": "...",
      "source_id": "openai_blog",
      "domain_id": "ai",
      "scanned_at": "2026-04-07T07:00:00+09:00",
      "notified": true,
      "evaluation": { ... }
    }
  }
}

The only differences from the current format are the added domain_id and notified fields. The format is backward compatible (an existing entry without domain_id is treated as "ai").
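The backward-compatibility rule could be applied when scanned.json is read, as in this small sketch. `normalize_entry` is a hypothetical helper, and defaulting a missing `notified` to `False` is an assumption; the document only specifies the `domain_id` fallback.

```python
def normalize_entry(entry: dict) -> dict:
    """Fill in fields that entries written in the current format do not have."""
    normalized = dict(entry)                   # leave the caller's dict untouched
    normalized.setdefault("domain_id", "ai")   # rule stated in the document
    normalized.setdefault("notified", False)   # assumed default, not specified
    return normalized


def normalize_scanned(data: dict) -> dict:
    """Normalize every entry under the top-level "articles" key."""
    articles = data.get("articles", {})
    return {
        "articles": {
            article_id: normalize_entry(entry)
            for article_id, entry in articles.items()
        }
    }
```

Normalizing on load keeps the rest of the pipeline free of checks for missing keys.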

10. Directory Layout

ops/news_reminder/
├── __init__.py
├── app/
│   ├── __init__.py
│   ├── config.py                 changed  Settings extended
│   ├── pipeline.py               new      Common pipeline
│   ├── llm.py                    new      Unified Claude CLI wrapper
│   ├── domain.py                 new      DomainConfig loader
│   ├── models.py                 new      Article, Evaluation, PipelineResult
│   ├── scanner.py                retired  Replaced by the new pipeline
│   ├── thread.py                 changed  Header format changed
│   ├── dashboard.py              kept     Monthly processing (unchanged)
│   ├── sources/                  new
│   │   ├── __init__.py                   Adapter registry
│   │   ├── base.py                       SourceAdapter Protocol
│   │   ├── rss.py                        RssAdapter
│   │   ├── hn.py                         HackerNewsAdapter
│   │   ├── arxiv.py                      ArxivAdapter
│   │   └── google_news.py                GoogleNewsAdapter
│   └── formatters/               new
│       ├── __init__.py                   Formatter registry
│       ├── base.py                       BaseFormatter
│       ├── default.py                    DefaultFormatter (geopolitics, etc.)
│       ├── ai.py                         AiFormatter (MNML improvement opportunities)
│       └── finance.py                    FinanceFormatter (holdings impact table)
├── pipeline/                     retired  Except the CLI entry points and sync.py below
│   ├── __init__.py                       retired
│   ├── __main__.py              changed  CLI kept (internals now call the new modules; the file remains)
│   ├── cli.py                   changed  CLI kept (internals now call the new modules; the file remains)
│   ├── config.py                         retired
│   └── steps/                            retired
│       ├── fetch.py                      → ported to sources/google_news.py
│       ├── summarize.py                  → merged into llm.py + pipeline.py
│       ├── notify.py                     → ported to formatters/finance.py
│       └── sync.py                       kept (Excel sync is an independent CLI command)
├── data/
│   ├── sources.json              changed  id added, category removed
│   ├── holdings.json             kept
│   ├── scanned.json              changed  domain_id added, rotation
│   ├── domains/                  new
│   │   ├── ai.json
│   │   ├── finance.json
│   │   └── geopolitics.json
│   └── prompts/                  new
│       ├── evaluate_ai.txt
│       ├── evaluate_finance.txt
│       ├── evaluate_geopolitics.txt
│       ├── summarize_ai.txt
│       ├── summarize_finance.txt
│       └── summarize_geopolitics.txt
└── tests/                        new
    ├── __init__.py
    ├── conftest.py                       Shared fixtures
    ├── test_sources.py                   Source adapter tests
    ├── test_pipeline.py                  Pipeline integration tests
    ├── test_formatters.py                Formatter tests
    ├── test_llm.py                       LLM wrapper tests
    └── test_domain.py                    Domain loader tests

11. Interface Definitions

11.1 Data model: `app/models.py`

from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Article:
    """A fetched article."""
    id: str
    title: str
    url: str
    source_name: str
    source_id: str
    published: datetime | None = None
    content: str = ""
    tier: str = "primary"
    metadata: dict = field(default_factory=dict)
    evaluation: Evaluation | None = None


@dataclass
class Evaluation:
    """Article evaluation produced by the LLM."""
    relevant: bool = False
    summary_ja: str = ""
    impact: str = ""           # "high" | "medium" | "low"
    action: str = ""           # Recommended action
    reason: str = ""           # Reason for the evaluation
    extra: dict = field(default_factory=dict)  # Domain-specific (urgency, etc.)


@dataclass
class PipelineResult:
    """Result of a pipeline run."""
    domain_id: str
    total_fetched: int = 0
    total_new: int = 0
    total_relevant: int = 0
    summary_text: str = ""
    articles: list[Article] = field(default_factory=list)
    error: str | None = None

11.2 Domain loader: `app/domain.py`

from __future__ import annotations

from dataclasses import dataclass, field
from pathlib import Path

DOMAINS_DIR = Path(__file__).resolve().parent.parent / "data" / "domains"
PROMPTS_DIR = Path(__file__).resolve().parent.parent / "data" / "prompts"


@dataclass
class DomainConfig:
    """Domain configuration."""
    id: str
    name: str
    emoji: str
    thread_prefix: str
    sources: list[str]
    source_config: dict = field(default_factory=dict)
    evaluate_prompt_path: str = ""    # Loaded from the "evaluate_prompt" JSON key
    summarize_prompt_path: str = ""   # Loaded from the "summarize_prompt" JSON key
    formatter: str = "default"
    max_articles_per_run: int = 30
    mention_ceo: bool = True

    def load_evaluate_prompt(self) -> str:
        """Read the evaluation prompt template."""
        ...

    def load_summarize_prompt(self) -> str:
        """Read the summarization prompt template."""
        ...


def load_domain(domain_id: str) -> DomainConfig:
    """Load the configuration for the given domain."""
    ...


def load_all_domain_ids() -> list[str]:
    """Return every domain ID found in data/domains/."""
    ...

11.3 Source adapters: `app/sources/base.py`

from __future__ import annotations

from typing import Protocol

from news_reminder.app.domain import DomainConfig
from news_reminder.app.models import Article


class SourceAdapter(Protocol):
    """Protocol for source adapters."""

    async def fetch(
        self,
        source_config: dict,
        domain_config: DomainConfig,
    ) -> list[Article]:
        ...


# Adapter registry: source type -> adapter class
_ADAPTERS: dict[str, type] = {}

def register(source_type: str):
    """Decorator: register an adapter class for a source type."""
    def decorator(cls):
        _ADAPTERS[source_type] = cls
        return cls
    return decorator

def get_adapter(source_type: str) -> SourceAdapter:
    """Return a new adapter instance for the given source type."""
    cls = _ADAPTERS.get(source_type)
    if cls is None:
        raise ValueError(f"Unknown source type: {source_type}")
    return cls()

11.4 Formatters: `app/formatters/base.py`

from __future__ import annotations

from typing import Protocol

from news_reminder.app.domain import DomainConfig
from news_reminder.app.models import Article, PipelineResult


class Formatter(Protocol):
    """Protocol for Slack post formatters."""

    def build_blocks(
        self,
        result: PipelineResult,
        domain: DomainConfig,
    ) -> tuple[list[dict], str]:
        """Return the Block Kit blocks and the fallback text."""
        ...

11.5 Settings extension: `app/config.py`

# Assumption: pydantic v2 with the pydantic-settings package (model_config style).
from pydantic_settings import BaseSettings


class Settings(BaseSettings):
    """Settings loaded from environment variables."""

    # Slack
    slack_bot_token: str = ""
    slack_channel: str = "news-reminder"

    # Claude CLI
    claude_cli_path: str = "claude"
    default_model: str = "haiku"
    llm_timeout: int = 120
    evaluate_batch_size: int = 20

    # Data management
    scanned_max_articles: int = 1000
    scanned_max_days: int = 90

    # CEO
    ceo_user_id: str = "U0AHXTRDQMA"

    model_config = {"env_file": ".env", "env_file_encoding": "utf-8", "extra": "ignore"}

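12. Migration Plan

Policy: implement everything at once. All modules are built together, and the old code is deleted after behavior has been verified.

Step 1: Implement everything

All modules of the new architecture are created in one pass.

app/models.py: data model definitions
app/domain.py: domain loader
app/llm.py: unified Claude CLI wrapper
app/sources/: source adapters (functions ported from existing code)
app/pipeline.py: common pipeline
app/formatters/: Slack post formatters (Block Kit output)
data/domains/*.json: domain config files
data/prompts/*.txt: prompt templates
bot/scheduler.py: switch to _run_all_domains()
pipeline/__main__.py / cli.py: internals replaced with the new pipeline
scanned.json rotation implementation
Migration that merges notified.json into scanned.json
tests/: tests for all modules

Step 2: Delete old code

Once behavior is verified and the tests pass, the old code is deleted in one pass.

Retire scanner.py
Retire pipeline/config.py and pipeline/steps/ (only sync.py remains, as a CLI command)
Delete fetched.json / notified.json
Remove the old category field from sources.json
Remove the old jobs (news_scan / news_pipeline_*) from scheduler.py
Retire the launchd plist (com.news-reminder.pipeline)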

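13. Test Strategy

13.1 Test Targets and Approach

Target	Test type	Approach
Source adapters	Unit tests	Mock HTTP responses (`httpx` `MockTransport`). Normal and error cases for RSS/JSON parsing
Domain loader	Unit tests	Loading a valid config file, errors on missing required fields, nonexistent domain IDs
LLM wrapper	Unit tests	Mock the CLI (patch `subprocess`). JSON extraction, timeouts, error cases
Formatters	Unit tests	Verify the Block Kit structure (keys and types). Finance table with 0 / 1 / many holdings
Pipeline	Integration tests	Mock sources and the LLM and verify that all steps connect. Reading and writing scanned.json
scanned.json rotation	Unit tests	Deletion above 1,000 entries, deletion past 90 days, empty file

13.2 What the Tests Do Not Verify

Real Claude CLI calls (replaced by mocks)
Real Slack API calls (replaced by mocks)
Availability of external RSS feeds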

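14. Security and Performance

14.1 Security

Item	Measure
Environment variable filter	Exclude `CLAUDECODE`, `ANTHROPIC_API_KEY`, and `CLAUDE_CODE_OAUTH_TOKEN` when running the Claude CLI (the approach in the current summarize.py, applied to every call)
Input validation	Validate domain config files against a JSON schema on load (required fields and type checks)
Prompt injection	Strip control characters when passing article titles and bodies into LLM prompts. Validate the JSON of evaluation results
File path traversal	Restrict the characters allowed in `domain_id` to `[a-z0-9_-]`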
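The control-character stripping mentioned for prompt injection could be done as sketched here. `strip_control_chars` is a hypothetical helper; keeping newlines and tabs is an assumption, since the document does not say which characters count. Stripping control characters is only one layer, which is why the table also calls for validating the evaluation JSON.

```python
import unicodedata

# Assumption: newlines and tabs are legitimate in article bodies and are kept.
_ALLOWED_CONTROLS = {"\n", "\t"}


def strip_control_chars(text: str) -> str:
    """Remove Unicode control characters (category Cc) except newline and tab."""
    return "".join(
        ch for ch in text
        if ch in _ALLOWED_CONTROLS or unicodedata.category(ch) != "Cc"
    )


def sanitize_article_text(title: str, content: str, max_chars: int = 4000) -> tuple[str, str]:
    """Clean both fields and cap the body length (max_chars is an illustrative value)."""
    return strip_control_chars(title).strip(), strip_control_chars(content)[:max_chars]
```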

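14.2 Performance

Item	Measure
Fewer LLM calls	Batch evaluation (20 articles per call) sharply reduces the number of calls
HTTP connection pool	One `httpx.AsyncClient` instance is shared across the whole pipeline
Reading scanned.json	Read once at startup and filter in memory. Write once when the pipeline completes
Serial execution across domains	Serial at first. The structure allows concurrent execution across domains later (because `run_domain_pipeline` is an independent function)
Slack API rate limit	Split sending (50 blocks per message) is kept as it is today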
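If domains are later run concurrently, one possible shape is shown below. This is a sketch of a future option, not part of the initial design: `run_domains_concurrently` is a hypothetical function, `run_one` stands in for `run_domain_pipeline`, and the concurrency limit of 2 is an arbitrary illustrative value.

```python
import asyncio
from collections.abc import Awaitable, Callable


async def run_domains_concurrently(
    domain_ids: list[str],
    run_one: Callable[[str], Awaitable[object]],
    limit: int = 2,
) -> dict[str, object]:
    """Run one pipeline per domain with bounded concurrency.

    A failure in one domain is captured as the exception object in the result
    map, so the other domains still complete (mirroring the per-domain
    try/except in the serial scheduler loop).
    """
    semaphore = asyncio.Semaphore(limit)

    async def guarded(domain_id: str) -> tuple[str, object]:
        async with semaphore:
            try:
                return domain_id, await run_one(domain_id)
            except Exception as exc:
                return domain_id, exc

    pairs = await asyncio.gather(*(guarded(d) for d in domain_ids))
    return dict(pairs)
```

The semaphore keeps the number of simultaneous Claude CLI processes bounded, which matters more here than raw parallelism.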

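14.3 Caching Strategy

Data	Update frequency	Caching
Domain configs	Rarely changes	Read once at process startup
Source definitions	Rarely changes	Read once at process startup
scanned.json	Updated three times a day	Held in memory only during a pipeline run
thread_ts.json	Updated once a day	Read on every pipeline run (the file is small)

Prerequisites and Constraints

X (Twitter) API support is limited to defining the source type in the design. Implementation waits for the researcher's findings (Grok API / X API v2)
The Claude CLI runs under the Max Plan (no API billing). Using Haiku optimizes cost
dashboard.py (BlackRock monthly interpretation) is out of scope this time and stays as it is
pipeline/steps/sync.py (Excel holdings sync) is kept as an independent CLI command
IF-layer candidate notification (AI domain only) keeps its current logic on the scheduler.py side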