Kaf Extract API

Turn any website into structured JSON with a single HTTP call. No selectors, no scrapers, no broken pipelines.

🟢 v0.2.0 · Production Ready · Bearer Token + API Key Auth · JSON / CSV / MD Export

Quick Start

Three lines. Any language. One API call.

Python SDK

```bash
pip install kaf-extract
```

```python
# 1. Create client
from kaf_extract import KafExtract

client = KafExtract(api_key="kaf_your_key")

# 2. Extract
result = client.extract_sync(
    "https://books.toscrape.com",
    fields=[{"name": "title", "selector": "h1", "type": "text"}]
)

# 3. Use the data
print(result.data["title"])  # text of the element matched by "h1"
```

JavaScript / TypeScript

```bash
npm install kaf-extract
```

```typescript
import { KafExtract } from "kaf-extract";

const client = new KafExtract({ apiKey: "kaf_your_key" });

const result = await client.extract("https://books.toscrape.com", {
    fields: [{ name: "title", selector: "h1", type: "text" }],
});

// Clean JSON, ready to use
console.log(result.data?.title);
```

cURL

```bash
curl -X POST "https://extract.kafcenter.com/api/v1/extract" \
  -H "Content-Type: application/json" \
  -H "X-API-Key: kaf_your_api_key" \
  -d '{
    "url": "https://books.toscrape.com",
    "schema": {
      "fields": [
        {"name": "title", "selector": "h1", "type": "text"},
        {"name": "price", "selector": ".price_color", "type": "text"}
      ]
    }
  }'
```

SDKs

🐍 Python

pip install kaf-extract
Async/await, type hints, httpx-based. Full async streaming support.

📦 JavaScript / TypeScript

npm install kaf-extract
ESM + CJS, full TypeScript declarations, fetch-based.

🖥️ cURL

Zero dependencies. Works in every language, on every platform.


API Reference

Base URL: https://extract.kafcenter.com
Auth: X-API-Key for extraction endpoints. Authorization: Bearer <JWT> for user endpoints.
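
The split between the two auth schemes can be sketched with the Python standard library alone. This is an illustration, not part of the SDK: the helper name `build_request` is hypothetical, and the request is only built, never sent.

```python
import json
import urllib.request

BASE_URL = "https://extract.kafcenter.com"


def build_request(path, payload=None, *, api_key=None, jwt=None, method="POST"):
    """Build (but do not send) a request carrying the matching auth header."""
    headers = {"Content-Type": "application/json"}
    if api_key is not None:
        headers["X-API-Key"] = api_key              # extraction endpoints
    elif jwt is not None:
        headers["Authorization"] = f"Bearer {jwt}"  # user endpoints
    else:
        raise ValueError("provide either api_key or jwt")
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


# Same body as the cURL quick start, authenticated with an API key
req = build_request(
    "/api/v1/extract",
    {
        "url": "https://books.toscrape.com",
        "schema": {"fields": [{"name": "title", "selector": "h1", "type": "text"}]},
    },
    api_key="kaf_your_key",
)
```

Passing the result to `urllib.request.urlopen(req)` would send it; any HTTP client works the same way as long as the header matches the endpoint family.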

Extraction

POST /api/v1/extract

Extract structured data from a URL using CSS selectors, AI mode, or a custom schema. Responses are cached in Redis with a 5-minute TTL.

POST /api/v1/extract/ai

AI-powered extraction with no selectors needed. Uses kimi-k2.6:cloud or glm-5.1:cloud for LLM extraction.

POST /api/v1/extract/batch

Extract from up to 50 URLs in parallel. Same schema applied to all pages.
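
For URL lists longer than one batch allows, the list has to be split client-side. A minimal sketch follows; the request-body key `urls` is an assumption (only the single-URL body is documented here), and `batch_payloads` is a hypothetical helper.

```python
from typing import Iterator

DEFAULT_BATCH_SIZE = 50  # documented per-call maximum; your plan's limit may be lower


def batch_payloads(urls: list, fields: list, size: int = DEFAULT_BATCH_SIZE) -> Iterator[dict]:
    """Yield one request body per chunk of at most `size` URLs."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(urls), size):
        # "urls" is an assumed key name; the shared schema mirrors the single-URL body
        yield {"urls": urls[start:start + size], "schema": {"fields": fields}}


urls = [f"https://example.com/item/{n}" for n in range(1, 121)]
fields = [{"name": "title", "selector": "h1", "type": "text"}]
payloads = list(batch_payloads(urls, fields))
print([len(p["urls"]) for p in payloads])  # [50, 50, 20]
```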

GET /api/v1/extract/{job_id}

Poll for async extraction job results by job ID.
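
Polling usually needs an interval and a timeout. The sketch below keeps the HTTP call outside the loop so it can be tested offline; the `status` field and its values (`pending`, `completed`, `failed`) are assumptions about the job response, and `wait_for_job` is a hypothetical helper.

```python
import time

TERMINAL_STATES = {"completed", "failed"}  # assumed status values


def wait_for_job(fetch_job, job_id, interval=2.0, timeout=60.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Call fetch_job(job_id) until the job reaches a terminal state or the timeout passes."""
    deadline = clock() + timeout
    while True:
        job = fetch_job(job_id)
        if job.get("status") in TERMINAL_STATES:
            return job
        if clock() >= deadline:
            raise TimeoutError(f"job {job_id} still running after {timeout}s")
        sleep(interval)


# Stand-in for a real GET /api/v1/extract/{job_id} call
_fake_responses = iter([
    {"status": "pending"},
    {"status": "completed", "data": {"title": "Example"}},
])
result = wait_for_job(lambda job_id: next(_fake_responses), "job_123", interval=0.0)
print(result["status"])  # completed
```

In real use, `fetch_job` would issue the GET request with your API key and return the parsed JSON body.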

POST /api/v1/extract/screenshot

Capture screenshots: full page, viewport, or specific element. Returns base64 PNG.
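
Since the image comes back as base64 text, it has to be decoded before it can be written to disk. A small sketch, assuming you have already pulled the base64 string out of the response (the response field name is not documented here); `save_screenshot` is a hypothetical helper.

```python
import base64
import tempfile
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file


def save_screenshot(b64_png: str, path) -> int:
    """Decode a base64 PNG string, sanity-check it, and write it to `path`."""
    raw = base64.b64decode(b64_png, validate=True)
    if not raw.startswith(PNG_SIGNATURE):
        raise ValueError("decoded payload does not start with the PNG signature")
    return Path(path).write_bytes(raw)  # number of bytes written


# Demo with a stand-in payload (signature plus filler, not a viewable image)
sample = base64.b64encode(PNG_SIGNATURE + b"demo").decode("ascii")
target = Path(tempfile.gettempdir()) / "kaf_screenshot_demo.png"
written = save_screenshot(sample, target)
print(written)  # 12
```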

POST /api/v1/extract/schedule

Schedule recurring extractions with cron expressions. Results delivered via webhook.

GET /api/v1/extract/history

Paginated extraction history with metadata, status, and export links.

Authentication

POST /auth/register

Create an account. Returns JWT access + refresh tokens. Optional TOTP setup.

POST /auth/login

Authenticate with email + password. Returns token pair.

POST /auth/refresh

Exchange refresh token for a new access + refresh token pair.

GET /auth/me

Get current user profile: name, email, role, subscription tier.

PUT /auth/me/password

Change password. Requires current password confirmation.

POST /auth/totp/setup

Generate TOTP secret and provisioning URI. Returns QR code data.

POST /auth/totp/verify

Verify TOTP code. Required after setup before 2FA is active.
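
For scripted tests of the 2FA flow, a code can be computed locally from the secret instead of read off an authenticator app. The sketch below implements standard RFC 6238 TOTP and assumes the service uses the common defaults (HMAC-SHA1, 30-second step, 6 digits), which this page does not confirm.

```python
import base64
import hashlib
import hmac
import struct
import time
from typing import Optional


def totp(secret_b32: str, at: Optional[float] = None, digits: int = 6, step: int = 30) -> str:
    """Compute an RFC 6238 TOTP code from a base32 secret."""
    padded = secret_b32.upper() + "=" * (-len(secret_b32) % 8)  # tolerate missing padding
    key = base64.b32decode(padded)
    counter = int((time.time() if at is None else at) // step)
    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation (RFC 4226)
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


# RFC 6238 test vector: ASCII secret "12345678901234567890" at T=59s, 8 digits
print(totp("GEZDGNBVGY3TQOJQGEZDGNBVGY3TQOJQ", at=59, digits=8))  # 94287082
```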

API Keys

POST /api/v1/keys

Create a new API key with optional name and expiration. Scoped to user/organization.

GET /api/v1/keys

List all API keys with creation date, last used, and status.

PUT /api/v1/keys/{key_id}

Update key name, scopes, or expiration. Cannot regenerate the secret.

DELETE /api/v1/keys/{key_id}

Revoke an API key permanently. Immediate invalidation.

Billing

GET /api/v1/billing/subscription

Current plan: Hobby, Pro, or Enterprise. Includes current usage against quota.

POST /api/v1/billing/checkout

Initiate LemonSqueezy checkout session. Returns checkout URL for the user to complete.

GET /api/v1/billing/invoices

Invoice history with PDF download links.

POST /api/v1/billing/cancel

Schedule subscription cancellation at end of billing period.

Vouchers

POST /api/v1/vouchers/redeem

Redeem a voucher code for credits or subscription tier upgrade.

GET /api/v1/vouchers/history

Voucher redemption history: code, value, applied at.

GET /api/v1/vouchers/{code}/check

Check voucher validity before redemption. Returns remaining uses and tier.

Organizations

POST /api/v1/orgs

Create a new organization. Owner gets full admin privileges.

GET /api/v1/orgs/me

Get current organization details, member count, and quota usage.

POST /api/v1/orgs/members

Invite a member by email to the organization. They must register first.

PUT /api/v1/orgs/members/{user_id}

Update member role: admin, editor, viewer. RBAC enforced.

DELETE /api/v1/orgs/members/{user_id}

Remove a member from the organization. The removal cascades to that member's API keys.

System

GET /health

Health check: API, Postgres, Redis. No auth required.

GET /metrics

Usage metrics: total requests, cache hits, avg latency, error rate.

GET /version

Current version string. Useful for client compatibility checks.

Extraction Types

text

Inner text of matched element. Most common. Use for headings, prices, descriptions.

html

Full inner HTML. Preserves inline tags. Useful for rich descriptions.

attribute

Value of a specific HTML attribute, chosen with attribute: "href" or attribute: "src". If no attribute is given, the inner text is returned.

exists

Returns boolean true if the selector matches anything on the page. Useful for availability flags.

markdown

Full page content converted to clean Markdown. Strips ads, nav, and noise. Great for content feeds.

screenshot

Base64-encoded PNG of the matched element. Use with viewport or element selector.

ai

LLM-powered extraction via kimi-k2.6:cloud or glm-5.1:cloud. Describe what you want in natural language. No CSS selectors needed.
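
The types above can be combined in one `fields` list. A minimal sketch of a field builder that rejects unknown types before a request is sent; `make_field` is a hypothetical helper, the selectors are illustrative, and the exact keys accepted for `ai` fields are not documented here, so none are shown.

```python
FIELD_TYPES = ("text", "html", "attribute", "exists", "markdown", "screenshot", "ai")


def make_field(name, type_="text", selector=None, attribute=None):
    """Build one entry for the `fields` list, checking it against the documented types."""
    if type_ not in FIELD_TYPES:
        raise ValueError(f"unknown field type: {type_!r}")
    if attribute is not None and type_ != "attribute":
        raise ValueError("`attribute` only applies to type 'attribute'")
    field = {"name": name, "type": type_}
    if selector is not None:
        field["selector"] = selector
    if attribute is not None:
        field["attribute"] = attribute
    return field


fields = [
    make_field("title", "text", "h1"),
    make_field("link", "attribute", "a.product", attribute="href"),
    make_field("in_stock", "exists", ".instock"),
    make_field("content", "markdown"),  # full-page type, no selector
]
```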

Rate Limits

All extraction endpoints enforce per-API-key sliding window limits. Exceeding returns HTTP 429. Response headers include rate limit details.

| Header | Description |
|---|---|
| X-RateLimit-Limit | Max requests per window for your plan |
| X-RateLimit-Remaining | Requests remaining in current window |
| X-RateLimit-Reset | Unix timestamp when the window resets |
| Retry-After | Seconds until retry (only on HTTP 429) |
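
A client can turn these headers into a wait time before retrying. The sketch below prefers `Retry-After` and falls back to `X-RateLimit-Reset`; the one-second fallback when neither header is present is an arbitrary choice, and `seconds_until_retry` is a hypothetical helper.

```python
import time
from typing import Mapping, Optional


def seconds_until_retry(status: int, headers: Mapping[str, str],
                        now: Optional[float] = None) -> float:
    """Return how long to wait before retrying; 0.0 if the request was not rate limited."""
    if status != 429:
        return 0.0
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        return max(0.0, float(retry_after))
    reset = headers.get("X-RateLimit-Reset")
    if reset is not None:
        current = time.time() if now is None else now
        return max(0.0, float(reset) - current)
    return 1.0  # neither header present: arbitrary short pause


print(seconds_until_retry(429, {"Retry-After": "12"}))  # 12.0
```

Header lookup here is case-sensitive; with a plain `dict`, normalize the header names first, or pass a case-insensitive mapping from your HTTP client.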

Tier Limits

| Tier | Requests/Min | Requests/Month | Batch Size |
|---|---|---|---|
| Hobby | 60 | 5,000 | 10 |
| Pro | 300 | 50,000 | 50 |
| Enterprise | 1,000 | 500,000 | 100 |
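
To stay under the per-minute limit rather than react to 429s, requests can be throttled on the client. This is a sketch of a client-side sliding-window limiter using the per-minute figures from the table; it approximates the server's accounting, which may differ in detail.

```python
import time
from collections import deque

REQUESTS_PER_MINUTE = {"Hobby": 60, "Pro": 300, "Enterprise": 1000}


class SlidingWindowLimiter:
    """Track send times and report how long to wait before the next request."""

    def __init__(self, max_requests: int, window: float = 60.0, clock=time.monotonic):
        self._max = max_requests
        self._window = window
        self._clock = clock
        self._sent = deque()  # timestamps of requests inside the current window

    def wait_time(self) -> float:
        now = self._clock()
        # Drop timestamps that have slid out of the window
        while self._sent and now - self._sent[0] >= self._window:
            self._sent.popleft()
        if len(self._sent) < self._max:
            return 0.0
        return self._window - (now - self._sent[0])

    def record(self) -> None:
        self._sent.append(self._clock())


limiter = SlidingWindowLimiter(REQUESTS_PER_MINUTE["Hobby"])
delay = limiter.wait_time()  # 0.0 until 60 requests fall inside one minute
limiter.record()             # call right after each request is sent
```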

Integrations

🔔 Slack Notifications

Push extraction results to Slack with Block Kit formatting. Auto-detects Slack webhook URLs.

```bash
curl -X POST "https://extract.kafcenter.com/api/v1/extract/schedule" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -d '{
    "name": "Price Monitor",
    "cron_expression": "0 */6 * * *",
    "url": "https://shop.example.com/product",
    "fields": [{"name": "price", "selector": ".price", "type": "text"}],
    "webhook_url": "https://hooks.slack.com/services/T.../B.../xxx"
  }'
```

Test your Slack webhook before scheduling:

```bash
curl -X POST "https://extract.kafcenter.com/api/v1/integrations/slack/test" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -d '{"webhook_url": "https://hooks.slack.com/services/T.../B.../xxx"}'
```

📊 Export Formats

Download results in JSON (default), CSV, or Markdown. Add ?format=csv or ?format=markdown to any extraction endpoint.

```bash
# CSV export
curl -X POST "https://extract.kafcenter.com/api/v1/extract?format=csv" \
  -H "Content-Type: application/json" \
  -H "X-API-Key: kaf_key" \
  -d '{"url":"https://example.com","schema":{"fields":[{"name":"title","selector":"h1"}]}}'

# Markdown export
curl -X POST "https://extract.kafcenter.com/api/v1/extract?format=markdown" \
  -H "Content-Type: application/json" \
  -H "X-API-Key: kaf_key" \
  -d '{"url":"https://example.com","schema":{"fields":[{"name":"content","type":"markdown"}]}}'
```

🪝 Webhooks

All webhook payloads are signed with HMAC-SHA256, and the hex signature is sent in the X-Kaf-Signature header. Verify it in your endpoint:

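```python
import hashlib
import hmac

secret = "your_jwt_secret"  # same as Kaf Extract JWT_SECRET

def verify_kaf_signature(body: bytes, signature, secret: str) -> bool:
    """Check the X-Kaf-Signature header value against the raw request body."""
    if not signature:
        return False  # header missing or empty
    expected = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time, unlike ==
    return hmac.compare_digest(expected, signature)

# In your handler, pass the raw body bytes and request.headers.get("X-Kaf-Signature"),
# and reject the request when this returns False.
```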

Pricing

🆓 Hobby

$0 — 1,000 extractions/month, 60/min. CSS/XPath mode. Community support. Free forever.

🚀 Pro

$29/mo — 50,000 extractions/month, 300/min. AI extraction, batch, webhooks, priority support. 7-day trial.

🏢 Enterprise

$199/mo — 500,000 extractions/month. Dedicated proxies, SSO, SLA guarantee. 99.9% uptime.