Kaf Extract — API Documentation

Quick Start

Three lines. Any language. One API call.

Python SDK

pip install kaf-extract

# 1. Create client
from kaf_extract import KafExtract

client = KafExtract(api_key="kaf_your_key")

# 2. Extract
result = client.extract_sync(
    "https://books.toscrape.com",
    fields=[{"name": "title", "selector": "h1", "type": "text"}]
)

# 3. Use the data
print(result.data["title"])  # the page's <h1> text, e.g. "All products"

JavaScript / TypeScript

npm install kaf-extract

import { KafExtract } from "kaf-extract";

const client = new KafExtract({ apiKey: "kaf_your_key" });

const result = await client.extract("https://books.toscrape.com", {
    fields: [{ name: "title", selector: "h1", type: "text" }],
});

// Clean JSON, ready to use
console.log(result.data?.title);

cURL

curl -X POST "https://extract.kafcenter.com/api/v1/extract" \
  -H "Content-Type: application/json" \
  -H "X-API-Key: kaf_your_api_key" \
  -d '{
    "url": "https://books.toscrape.com",
    "schema": {
      "fields": [
        {"name": "title", "selector": "h1", "type": "text"},
        {"name": "price", "selector": ".price_color", "type": "text"}
      ]
    }
  }'
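
For environments without the SDK, the same request can be assembled with Python's standard library. This is a minimal sketch that mirrors the cURL call above; build_extract_request is a hypothetical helper name, not part of the Kaf Extract SDK, and the response shape is not shown because the source does not document it.

```python
import json
import urllib.request

API_URL = "https://extract.kafcenter.com/api/v1/extract"


def build_extract_request(api_key: str, url: str, fields: list) -> urllib.request.Request:
    """Build (but do not send) the POST request from the cURL example."""
    payload = {"url": url, "schema": {"fields": fields}}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )


req = build_extract_request(
    "kaf_your_api_key",
    "https://books.toscrape.com",
    [
        {"name": "title", "selector": "h1", "type": "text"},
        {"name": "price", "selector": ".price_color", "type": "text"},
    ],
)
# To send it: urllib.request.urlopen(req), then json.load() the response body.
```

Building the request separately from sending it keeps the payload easy to inspect or log before any network call is made.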

SDKs

🐍 Python

pip install kaf-extract
Async/await, type hints, httpx-based. Full async streaming support.

📦 JavaScript / TypeScript

npm install kaf-extract
ESM + CJS, full TypeScript declarations, fetch-based.

🖥️ cURL

Zero dependencies. Works in every language, on every platform.

API Reference

Base URL: https://extract.kafcenter.com
Auth: send an X-API-Key header for extraction endpoints and an Authorization: Bearer <JWT> header for user endpoints.

Extraction

POST /api/v1/extract

Extract structured data from a URL using CSS selectors, AI mode, or a custom schema. Responses are cached in Redis with a 5-minute TTL.

POST /api/v1/extract/ai

AI-powered extraction — no selectors needed. Uses kimi-k2.6:cloud or glm-5.1:cloud for LLM extraction.

POST /api/v1/extract/batch

Extract from multiple URLs in parallel, up to your plan's batch size (10 on Hobby, 50 on Pro, 100 on Enterprise; see Tier Limits). The same schema is applied to all pages.
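
Because each batch request is capped, a longer URL list has to be split client-side. The sketch below shows one way to do that. The "urls" key in the payload is an assumption (the source does not show the batch request body), and chunk_urls and build_batch_payloads are hypothetical helper names.

```python
from typing import Iterator


def chunk_urls(urls: list, batch_size: int) -> Iterator[list]:
    """Yield consecutive slices of at most batch_size URLs."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(urls), batch_size):
        yield urls[start:start + batch_size]


def build_batch_payloads(urls: list, fields: list, batch_size: int) -> list:
    """One request body per chunk; the same schema is reused for every chunk."""
    # ASSUMPTION: the batch endpoint accepts a "urls" list next to "schema".
    return [
        {"urls": chunk, "schema": {"fields": fields}}
        for chunk in chunk_urls(urls, batch_size)
    ]


urls = [f"https://books.toscrape.com/catalogue/page-{n}.html" for n in range(1, 121)]
fields = [{"name": "title", "selector": "h1", "type": "text"}]
payloads = build_batch_payloads(urls, fields, batch_size=50)
```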

GET /api/v1/extract/{job_id}

Poll for async extraction job results by job ID.
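
Polling usually means repeating the GET until the job reaches a terminal state or a deadline passes. The loop below is a sketch under stated assumptions: the "status" field and its "completed" / "failed" values are hypothetical, since the source does not document the job response, and fetch_job stands in for whatever function performs the HTTP call.

```python
import time
from typing import Callable


def poll_job(
    fetch_job: Callable[[str], dict],
    job_id: str,
    interval: float = 2.0,
    timeout: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_job(job_id) until the job finishes or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_job(job_id)
        # ASSUMPTION: terminal states are reported as "completed" or "failed".
        if job.get("status") in ("completed", "failed"):
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} did not finish within {timeout}s")
        sleep(interval)


# Demo with a fake fetcher instead of a real HTTP call.
fake_responses = iter([
    {"status": "pending"},
    {"status": "pending"},
    {"status": "completed", "data": {"title": "All products"}},
])
result = poll_job(lambda job_id: next(fake_responses), "job_123", sleep=lambda s: None)
```

Passing the fetcher and the sleep function as parameters keeps the loop testable without network access or real delays.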

POST /api/v1/extract/screenshot

Capture screenshots: full page, viewport, or specific element. Returns base64 PNG.

POST /api/v1/extract/schedule

Schedule recurring extractions with cron expressions. Results delivered via webhook.

GET /api/v1/extract/history

Paginated extraction history with metadata, status, and export links.

Authentication

POST /auth/register

Create an account. Returns JWT access + refresh tokens. Optional TOTP setup.

POST /auth/login

Authenticate with email + password. Returns token pair.

POST /auth/refresh

Exchange refresh token for a new access + refresh token pair.
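
A client typically calls the refresh endpoint shortly before the access token expires. The sketch below decides when to refresh by reading the token's exp claim. It assumes the access token is a standard three-part JWT carrying exp in Unix seconds, which the source does not confirm; jwt_expires_at and needs_refresh are hypothetical helper names. The payload is decoded without signature verification, which is enough for scheduling a refresh but not for trusting the token.

```python
import base64
import json
import time
from typing import Optional


def jwt_expires_at(token: str) -> int:
    """Read the exp claim from a JWT payload without verifying the signature."""
    payload_b64 = token.split(".")[1]
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)  # restore base64 padding
    payload = json.loads(base64.urlsafe_b64decode(padded))
    return int(payload["exp"])


def needs_refresh(token: str, now: Optional[float] = None, leeway: int = 30) -> bool:
    """True when the token expires within leeway seconds (or already has)."""
    current = time.time() if now is None else now
    return jwt_expires_at(token) - current <= leeway


def _b64url(obj: dict) -> str:
    """Encode a dict the way JWT segments are encoded (unpadded base64url)."""
    return base64.urlsafe_b64encode(json.dumps(obj).encode()).rstrip(b"=").decode()


# A hand-built, unsigned token used only to exercise the helpers.
demo_token = ".".join([_b64url({"alg": "HS256"}), _b64url({"exp": 1000}), "sig"])
```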

GET /auth/me

Get current user profile: name, email, role, subscription tier.

PUT /auth/me/password

Change password. Requires current password confirmation.

POST /auth/totp/setup

Generate TOTP secret and provisioning URI. Returns QR code data.

POST /auth/totp/verify

Verify TOTP code. Required after setup before 2FA is active.
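
For scripted testing of the setup and verify flow, a TOTP code can be computed from the secret directly. The sketch below follows RFC 6238 and assumes Kaf Extract uses the common defaults (base32 secret, SHA-1, 6 digits, 30-second step); the source does not state these parameters.

```python
import base64
import hashlib
import hmac
import struct
import time
from typing import Optional


def totp(secret_b32: str, at: Optional[float] = None, digits: int = 6, step: int = 30) -> str:
    """Compute an RFC 6238 TOTP code (HMAC-SHA1) for a base32-encoded secret."""
    padded = secret_b32.upper() + "=" * (-len(secret_b32) % 8)
    key = base64.b32decode(padded)
    timestamp = time.time() if at is None else at
    counter = int(timestamp // step)  # number of elapsed time steps
    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation offset (RFC 4226)
    number = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(number % 10 ** digits).zfill(digits)


# The RFC 6238 test secret, base32-encoded as an authenticator app would store it.
rfc_secret = base64.b32encode(b"12345678901234567890").decode()
code_now = totp(rfc_secret)
```

The code submitted to the verify endpoint would be the value of totp(secret) at the current time, where secret comes from the setup response.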

API Keys

POST /api/v1/keys

Create a new API key with optional name and expiration. Scoped to user/organization.

GET /api/v1/keys

List all API keys with creation date, last used, and status.

PUT /api/v1/keys/{key_id}

Update key name, scopes, or expiration. Cannot regenerate the secret.

DELETE /api/v1/keys/{key_id}

Revoke an API key permanently. Immediate invalidation.

Billing

GET /api/v1/billing/subscription

Current plan: Hobby, Pro, or Enterprise. Includes current usage against quota.

POST /api/v1/billing/checkout

Initiate LemonSqueezy checkout session. Returns checkout URL for the user to complete.

GET /api/v1/billing/invoices

Invoice history with PDF download links.

POST /api/v1/billing/cancel

Schedule subscription cancellation at end of billing period.

Vouchers

POST /api/v1/vouchers/redeem

Redeem a voucher code for credits or subscription tier upgrade.

GET /api/v1/vouchers/history

Voucher redemption history: code, value, applied at.

GET /api/v1/vouchers/{code}/check

Check voucher validity before redemption. Returns remaining uses and tier.

Organizations

POST /api/v1/orgs

Create a new organization. Owner gets full admin privileges.

GET /api/v1/orgs/me

Get current organization details, member count, and quota usage.

POST /api/v1/orgs/members

Invite a member by email to the organization. They must register first.

PUT /api/v1/orgs/members/{user_id}

Update member role: admin, editor, viewer. RBAC enforced.

DELETE /api/v1/orgs/members/{user_id}

Remove a member from the organization. The removal cascades to that member's API keys.

System

GET /health

Health check: API, Postgres, Redis. No auth required.

GET /metrics

Usage metrics: total requests, cache hits, avg latency, error rate.

GET /version

Current version string. Useful for client compatibility checks.

Extraction Types

text

Inner text of matched element. Most common. Use for headings, prices, descriptions.

html

Full inner HTML. Preserves inline tags. Useful for rich descriptions.

attribute

Value of a specific HTML attribute. Name the attribute with attribute: "href" or attribute: "src"; if no attribute is given, the inner text is returned.

exists

Returns boolean true if the selector matches anything on the page. Useful for availability flags.

markdown

Full page content converted to clean Markdown. Strips ads, nav, and noise. Great for content feeds.

screenshot

Base64-encoded PNG of the matched element. Use with viewport or element selector.

ai

LLM-powered extraction via kimi-k2.6:cloud or glm-5.1:cloud. Describe what you want in natural language. No CSS selectors needed.
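
The types above can be checked client-side before a request is sent. The sketch below builds field definitions in the shape used by the Quick Start examples and rejects unknown types. make_field is a hypothetical helper, not an SDK function, and which keys the API requires for each type is an assumption.

```python
from typing import Optional

# The seven extraction types listed above.
FIELD_TYPES = {"text", "html", "attribute", "exists", "markdown", "screenshot", "ai"}


def make_field(
    name: str,
    field_type: str = "text",
    selector: Optional[str] = None,
    attribute: Optional[str] = None,
) -> dict:
    """Return a field definition dict, validating the type name first."""
    if field_type not in FIELD_TYPES:
        raise ValueError(f"unknown extraction type: {field_type!r}")
    if attribute is not None and field_type != "attribute":
        raise ValueError("attribute only applies to fields of type 'attribute'")
    field = {"name": name, "type": field_type}
    if selector is not None:
        field["selector"] = selector
    if attribute is not None:
        field["attribute"] = attribute
    return field


fields = [
    make_field("title", "text", "h1"),
    make_field("next_page", "attribute", "li.next a", attribute="href"),
    make_field("in_stock", "exists", ".instock"),
]
```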

Rate Limits

All extraction endpoints enforce per-API-key sliding window limits. Exceeding returns HTTP 429. Response headers include rate limit details.

Header	Description
`X-RateLimit-Limit`	Max requests per window for your plan
`X-RateLimit-Remaining`	Requests remaining in current window
`X-RateLimit-Reset`	Unix timestamp when the window resets
`Retry-After`	Seconds until retry (only on HTTP 429)
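
These headers are enough to decide how long a client should pause before its next request. The function below is a sketch: it reads the four headers in the table from a plain dict, so header names are matched case-sensitively here, while most HTTP libraries expose case-insensitive mappings. seconds_until_retry is a hypothetical helper name.

```python
import time
from typing import Mapping, Optional


def seconds_until_retry(status: int, headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Return how many seconds to wait before sending the next request."""
    current = time.time() if now is None else now
    # On HTTP 429 the server states the delay directly.
    if status == 429 and "Retry-After" in headers:
        return max(0.0, float(headers["Retry-After"]))
    # Requests left in the current window: no need to wait.
    if int(headers.get("X-RateLimit-Remaining", "1")) > 0:
        return 0.0
    # Window exhausted: wait until the reset timestamp.
    reset_at = float(headers.get("X-RateLimit-Reset", current))
    return max(0.0, reset_at - current)


wait = seconds_until_retry(429, {"Retry-After": "7"})
```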

Tier Limits

Tier	Requests/Min	Requests/Month	Batch Size
Hobby	60	5,000	10
Pro	300	50,000	50
Enterprise	1,000	500,000	100
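
A client can stay under the per-minute limits by tracking its own request timestamps. The sketch below is a simple client-side sliding-window limiter seeded with the Requests/Min column above. It illustrates the general technique and is not a description of how the server enforces limits; SlidingWindowLimiter is a hypothetical class name.

```python
from collections import deque

# Requests per minute, taken from the Tier Limits table.
REQUESTS_PER_MINUTE = {"hobby": 60, "pro": 300, "enterprise": 1000}


class SlidingWindowLimiter:
    """Track request timestamps and report how long to wait for a free slot."""

    def __init__(self, limit: int, window: float = 60.0) -> None:
        self.limit = limit
        self.window = window
        self._sent = deque()  # timestamps of requests inside the window

    def wait_time(self, now: float) -> float:
        """Seconds until one more request fits in the window (0.0 if it fits now)."""
        while self._sent and now - self._sent[0] >= self.window:
            self._sent.popleft()  # drop timestamps that left the window
        if len(self._sent) < self.limit:
            return 0.0
        return self._sent[0] + self.window - now

    def record(self, now: float) -> None:
        """Register that a request was sent at time now."""
        self._sent.append(now)


limiter = SlidingWindowLimiter(REQUESTS_PER_MINUTE["hobby"])
```

In real use, call wait_time(time.monotonic()) before each request, sleep for the returned duration, then call record(time.monotonic()) once the request is sent.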

Integrations

🔔 Slack Notifications

Push extraction results to Slack with Block Kit formatting. Auto-detects Slack webhook URLs.

curl -X POST "https://extract.kafcenter.com/api/v1/extract/schedule" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -d '{
    "name": "Price Monitor",
    "cron_expression": "0 */6 * * *",
    "url": "https://shop.example.com/product",
    "fields": [{"name": "price", "selector": ".price", "type": "text"}],
    "webhook_url": "https://hooks.slack.com/services/T.../B.../xxx"
  }'

Test your Slack webhook before scheduling:

curl -X POST "https://extract.kafcenter.com/api/v1/integrations/slack/test" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -d '{"webhook_url": "https://hooks.slack.com/services/T.../B.../xxx"}'
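
The cron_expression in the schedule request above, "0 */6 * * *", can be read as minute 0 of every sixth hour. The sketch below expands the minute and hour fields to make that explicit. It assumes standard five-field cron semantics, and the timezone the scheduler uses is not stated in the source; expand_field is a hypothetical helper that covers only the *, lists, ranges, and /step forms.

```python
def expand_field(expr: str, lo: int, hi: int) -> list:
    """Expand one cron field into the sorted list of values it matches."""
    values = set()
    for part in expr.split(","):
        rng, _, step = part.partition("/")
        step_n = int(step) if step else 1
        if rng == "*":
            start, end = lo, hi
        elif "-" in rng:
            first, last = rng.split("-")
            start, end = int(first), int(last)
        else:
            start = int(rng)
            end = hi if step else start  # "5/15" means: from 5 upward in steps of 15
        values.update(range(start, end + 1, step_n))
    return sorted(values)


minute, hour, *_ = "0 */6 * * *".split()
run_times = [
    f"{h:02d}:{m:02d}"
    for h in expand_field(hour, 0, 23)
    for m in expand_field(minute, 0, 59)
]
# run_times == ["00:00", "06:00", "12:00", "18:00"]
```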

📊 Export Formats

Download results in JSON (default), CSV, or Markdown. Add ?format=csv or ?format=markdown to any extraction endpoint.

# CSV export
curl -X POST "https://extract.kafcenter.com/api/v1/extract?format=csv" \
  -H "Content-Type: application/json" \
  -H "X-API-Key: kaf_key" \
  -d '{"url":"https://example.com","schema":{"fields":[{"name":"title","selector":"h1"}]}}'

# Markdown export
curl -X POST "https://extract.kafcenter.com/api/v1/extract?format=markdown" \
  -H "Content-Type: application/json" \
  -H "X-API-Key: kaf_key" \
  -d '{"url":"https://example.com","schema":{"fields":[{"name":"content","type":"markdown"}]}}'
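
On the client side, the format parameter only changes the URL and how the response body is parsed. The sketch below builds the export URL and reads a CSV body into dictionaries. The sample CSV is made up for illustration, and its layout (one header row of field names, one row per result) is an assumption about the export format; export_url and parse_csv_export are hypothetical helper names.

```python
import csv
import io
from urllib.parse import urlencode

EXTRACT_URL = "https://extract.kafcenter.com/api/v1/extract"


def export_url(endpoint: str, fmt: str = "json") -> str:
    """Append ?format=... for non-default formats; JSON needs no parameter."""
    if fmt not in ("json", "csv", "markdown"):
        raise ValueError(f"unsupported export format: {fmt!r}")
    if fmt == "json":
        return endpoint
    return f"{endpoint}?{urlencode({'format': fmt})}"


def parse_csv_export(csv_text: str) -> list:
    """Parse a CSV response body into a list of row dicts keyed by header."""
    return list(csv.DictReader(io.StringIO(csv_text, newline="")))


# ASSUMPTION: made-up sample body, not real API output.
sample_body = 'title,price\r\n"A Light in the Attic",£51.77\r\n'
rows = parse_csv_export(sample_body)
csv_url = export_url(EXTRACT_URL, "csv")
```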

🪝 Webhooks

All webhook payloads are signed with HMAC-SHA256, and the signature is sent in the X-Kaf-Signature header. Verify it in your endpoint:

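import hashlib
import hmac

secret = "your_jwt_secret"  # same as Kaf Extract JWT_SECRET

def verify_signature(body: bytes, sig: str) -> bool:
    """Check X-Kaf-Signature against the HMAC-SHA256 hex digest of the raw body."""
    if not sig:
        return False  # header missing: reject
    expected = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time, so it does not leak the digest via timing
    return hmac.compare_digest(expected.encode(), sig.encode())

# In your handler (request comes from your web framework), reject the call when
# verify_signature(request.body, request.headers.get("X-Kaf-Signature")) is False.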
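
To exercise a webhook handler locally, the same HMAC can be computed on the sending side. The sketch below signs a test payload and checks that a valid signature passes while a tampered body fails. The payload fields are hypothetical (the source does not document the webhook body), and sign_payload and is_valid_signature are illustrative names.

```python
import hashlib
import hmac
import json
from typing import Optional


def sign_payload(secret: str, body: bytes) -> str:
    """Hex HMAC-SHA256 of the raw body, as carried in X-Kaf-Signature."""
    return hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()


def is_valid_signature(secret: str, body: bytes, signature: Optional[str]) -> bool:
    """Constant-time comparison of the received and the recomputed signature."""
    if not signature:
        return False
    expected = sign_payload(secret, body)
    return hmac.compare_digest(expected.encode(), signature.encode())


# ASSUMPTION: made-up payload shape, for local testing only.
body = json.dumps({"job_id": "job_123", "data": {"price": "£10.00"}}).encode("utf-8")
signature = sign_payload("test_secret", body)

accepted = is_valid_signature("test_secret", body, signature)
rejected = is_valid_signature("test_secret", body + b" ", signature)
```

The signature must be computed over the exact raw bytes received. Re-serializing parsed JSON before hashing can change whitespace or key order and make a valid signature fail.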

Pricing

🆓 Hobby

$0 — 1,000 extractions/month, 60/min. CSS/XPath mode. Community support. Free forever.

🚀 Pro

$29/mo — 50,000 extractions/month, 300/min. AI extraction, batch, webhooks, priority support. 7-day trial.

🏢 Enterprise

$199/mo — 500,000 extractions/month. Dedicated proxies, SSO, SLA guarantee. 99.9% uptime.
