Kaf Extract API
Turn any website into structured JSON with a single HTTP call. No selectors, no scrapers, no broken pipelines.
Quick Start
Three lines. Any language. One API call.
Python SDK
pip install kaf-extract

# 1. Create client
from kaf_extract import KafExtract
client = KafExtract(api_key="kaf_your_key")
# 2. Extract
result = client.extract_sync(
    "https://books.toscrape.com",
    fields=[{"name": "title", "selector": "h1", "type": "text"}],
)
# 3. Use the data
print(result.data["title"])  # text of the page's <h1>
JavaScript / TypeScript
npm install kaf-extract

import { KafExtract } from "kaf-extract";
const client = new KafExtract({ apiKey: "kaf_your_key" });
const result = await client.extract("https://books.toscrape.com", {
  fields: [{ name: "title", selector: "h1", type: "text" }],
});
// Clean JSON, ready to use
console.log(result.data?.title);
cURL
curl -X POST "https://extract.kafcenter.com/api/v1/extract" \
  -H "Content-Type: application/json" \
  -H "X-API-Key: kaf_your_api_key" \
  -d '{
    "url": "https://books.toscrape.com",
    "schema": {
      "fields": [
        {"name": "title", "selector": "h1", "type": "text"},
        {"name": "price", "selector": ".price_color", "type": "text"}
      ]
    }
  }'
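For environments without either SDK, the same call can be assembled with the Python standard library. This is a minimal sketch that mirrors the cURL request above; the function name `build_extract_request` is made up for illustration, and the request is only built here, not sent.

```python
import json
import urllib.request


def build_extract_request(url: str, fields: list, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request shaped like the cURL example."""
    # Same body layout as the cURL call: the fields sit under "schema".
    body = json.dumps({"url": url, "schema": {"fields": fields}}).encode("utf-8")
    return urllib.request.Request(
        "https://extract.kafcenter.com/api/v1/extract",
        data=body,
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )


req = build_extract_request(
    "https://books.toscrape.com",
    [{"name": "title", "selector": "h1", "type": "text"}],
    "kaf_your_api_key",
)
# Sending is left to the caller, e.g. urllib.request.urlopen(req).
```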
SDKs
pip install kaf-extract
Async/await, type hints, httpx-based. Full async streaming support.
npm install kaf-extract
ESM + CJS, full TypeScript declarations, fetch-based.
Zero dependencies. Works in every language, on every platform.
API Reference
Base URL: https://extract.kafcenter.com
Auth: X-API-Key for extraction endpoints. Authorization: Bearer <JWT> for user endpoints.
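As a rough illustration of the two auth schemes, the helper below returns the header for a given endpoint family. The function name and the `kind` labels are hypothetical; only the header names come from the line above.

```python
def auth_headers(kind: str, credential: str) -> dict:
    """Return the auth header for an endpoint family.

    `kind` is a label invented for this sketch: "extraction" or "user".
    """
    if kind == "extraction":
        # Extraction endpoints take the API key in X-API-Key.
        return {"X-API-Key": credential}
    if kind == "user":
        # User endpoints take a JWT access token as a Bearer credential.
        return {"Authorization": f"Bearer {credential}"}
    raise ValueError(f"unknown endpoint kind: {kind!r}")
```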
Extraction
Extract structured data from a URL. CSS selectors, AI mode, or custom schema. Redis-cached with 5min TTL.
AI-powered extraction — no selectors needed. Uses kimi-k2.6:cloud or glm-5.1:cloud for LLM extraction.
Extract from up to 50 URLs in parallel. Same schema applied to all pages.
Poll for async extraction job results by job ID.
Capture screenshots: full page, viewport, or specific element. Returns base64 PNG.
Schedule recurring extractions with cron expressions. Results delivered via webhook.
Paginated extraction history with metadata, status, and export links.
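Polling an async job usually comes down to a small loop. The sketch below is independent of any endpoint path: the caller supplies `fetch_status`, a function that returns the job as a dict. The `status` key and the `pending`/`running` values are assumptions for illustration, not confirmed field names.

```python
import time
from typing import Callable


def poll_job(
    fetch_status: Callable[[], dict],
    interval: float = 2.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_status until the job leaves a pending state or attempts run out."""
    for attempt in range(max_attempts):
        job = fetch_status()
        # Assumed field name and values; adjust to the real job payload.
        if job.get("status") not in ("pending", "running"):
            return job
        if attempt < max_attempts - 1:
            sleep(interval)  # wait before the next poll
    raise TimeoutError(f"job still pending after {max_attempts} polls")
```

Passing `sleep` as a parameter keeps the loop testable without real delays.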
Authentication
Create an account. Returns JWT access + refresh tokens. Optional TOTP setup.
Authenticate with email + password. Returns token pair.
Exchange refresh token for a new access + refresh token pair.
Get current user profile: name, email, role, subscription tier.
Change password. Requires current password confirmation.
Generate TOTP secret and provisioning URI. Returns QR code data.
Verify TOTP code. Required after setup before 2FA is active.
API Keys
Create a new API key with optional name and expiration. Scoped to user/organization.
List all API keys with creation date, last used, and status.
Update key name, scopes, or expiration. Cannot regenerate the secret.
Revoke an API key permanently. Immediate invalidation.
Billing
Current plan: Hobby, Pro, or Enterprise. Includes current usage against quota.
Initiate LemonSqueezy checkout session. Returns checkout URL for the user to complete.
Invoice history with PDF download links.
Schedule subscription cancellation at end of billing period.
Vouchers
Redeem a voucher code for credits or subscription tier upgrade.
Voucher redemption history: code, value, applied at.
Check voucher validity before redemption. Returns remaining uses and tier.
Organizations
Create a new organization. Owner gets full admin privileges.
Get current organization details, member count, and quota usage.
Invite a member by email to the organization. They must register first.
Update member role: admin, editor, viewer. RBAC enforced.
Remove a member from the organization. Cascades API keys.
System
Health check: API, Postgres, Redis. No auth required.
Usage metrics: total requests, cache hits, avg latency, error rate.
Current version string. Useful for client compatibility checks.
Extraction Types
Inner text of matched element. Most common. Use for headings, prices, descriptions.
Full inner HTML. Preserves inline tags. Useful for rich descriptions.
Value of a specific HTML attribute. Specify attribute: "href" or attribute: "src". Default is inner text.
Returns boolean true if the selector matches anything on the page. Useful for availability flags.
Full page content converted to clean Markdown. Strips ads, nav, and noise. Great for content feeds.
Base64-encoded PNG of the matched element. Use with viewport or element selector.
LLM-powered extraction via kimi-k2.6:cloud or glm-5.1:cloud. Describe what you want in natural language. No CSS selectors needed.
Rate Limits
All extraction endpoints enforce a sliding-window rate limit per API key. Exceeding the limit returns HTTP 429. Every response includes the rate limit headers below.
| Header | Description |
|---|---|
| X-RateLimit-Limit | Max requests per window for your plan |
| X-RateLimit-Remaining | Requests remaining in current window |
| X-RateLimit-Reset | Unix timestamp when the window resets |
| Retry-After | Seconds until retry (only on HTTP 429) |
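A client can turn these headers into a wait time after a 429. The sketch below prefers Retry-After and falls back to X-RateLimit-Reset. The function name and the 1-second fallback are assumptions, and it expects header names spelled exactly as in the table (a plain dict is case-sensitive).

```python
import time
from typing import Mapping, Optional


def seconds_until_retry(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Seconds to wait after an HTTP 429, based on the documented headers."""
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        # Documented as a number of seconds, present only on 429 responses.
        return max(0.0, float(retry_after))
    reset = headers.get("X-RateLimit-Reset")
    if reset is not None:
        # Unix timestamp of the window reset; never return a negative wait.
        current = time.time() if now is None else now
        return max(0.0, float(reset) - current)
    return 1.0  # arbitrary fallback when neither header is present
```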
Tier Limits
| Tier | Requests/Min | Requests/Month | Batch Size |
|---|---|---|---|
| Hobby | 60 | 5,000 | 10 |
| Pro | 300 | 50,000 | 50 |
| Enterprise | 1,000 | 500,000 | 100 |
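When a URL list is longer than the plan's batch size, it has to be split across several batch calls. A minimal sketch, with the sizes copied from the table above; the helper name `batches` is illustrative.

```python
from typing import Iterator, List, Sequence

# Batch sizes copied from the Tier Limits table.
BATCH_SIZE = {"hobby": 10, "pro": 50, "enterprise": 100}


def batches(urls: Sequence[str], tier: str) -> Iterator[List[str]]:
    """Yield consecutive slices of urls no larger than the tier's batch size."""
    size = BATCH_SIZE[tier.lower()]
    for start in range(0, len(urls), size):
        yield list(urls[start:start + size])
```

Each yielded list can then be sent as one batch request, subject to the per-minute limit for the same tier.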
Integrations
🔔 Slack Notifications
Push extraction results to Slack with Block Kit formatting. Auto-detects Slack webhook URLs.
curl -X POST "https://extract.kafcenter.com/api/v1/extract/schedule" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -d '{
    "name": "Price Monitor",
    "cron_expression": "0 */6 * * *",
    "url": "https://shop.example.com/product",
    "fields": [{"name": "price", "selector": ".price", "type": "text"}],
    "webhook_url": "https://hooks.slack.com/services/T.../B.../xxx"
  }'
Test your Slack webhook before scheduling:
curl -X POST "https://extract.kafcenter.com/api/v1/integrations/slack/test" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -d '{"webhook_url": "https://hooks.slack.com/services/T.../B.../xxx"}'
📊 Export Formats
Download results in JSON (default), CSV, or Markdown. Add ?format=csv or ?format=markdown to any extraction endpoint.
# CSV export
curl -X POST "https://extract.kafcenter.com/api/v1/extract?format=csv" \
  -H "Content-Type: application/json" \
  -H "X-API-Key: kaf_key" \
  -d '{"url":"https://example.com","schema":{"fields":[{"name":"title","selector":"h1"}]}}'

# Markdown export
curl -X POST "https://extract.kafcenter.com/api/v1/extract?format=markdown" \
  -H "Content-Type: application/json" \
  -H "X-API-Key: kaf_key" \
  -d '{"url":"https://example.com","schema":{"fields":[{"name":"content","type":"markdown"}]}}'
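In client code, the format selection can be wrapped in a small helper that rejects unknown formats before a request is made. A sketch using only the three formats named above; `export_url` is a hypothetical name. Since JSON is the default, the helper leaves the URL untouched for it rather than assuming `?format=json` is accepted.

```python
from urllib.parse import urlencode

EXTRACT_URL = "https://extract.kafcenter.com/api/v1/extract"
FORMATS = ("json", "csv", "markdown")


def export_url(fmt: str = "json", base: str = EXTRACT_URL) -> str:
    """Return the endpoint URL with the export format applied."""
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    if fmt == "json":
        return base  # JSON is the default, so no query parameter is added
    return f"{base}?{urlencode({'format': fmt})}"
```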
🪝 Webhooks
All webhook payloads are signed with HMAC-SHA256. The hex-encoded signature is sent in the X-Kaf-Signature header. Verify it in your endpoint:
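import hmac, hashlib
secret = "your_jwt_secret"  # same as Kaf Extract JWT_SECRET
body = request.body  # raw request bytes, exactly as received (framework-specific)
sig = request.headers.get("X-Kaf-Signature", "")
expected = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
# Constant-time comparison; raise explicitly, since assert is stripped under python -O
if not hmac.compare_digest(expected, sig):
    raise PermissionError("invalid X-Kaf-Signature")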
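To exercise a webhook endpoint locally, it helps to produce a signature the same way and check the round trip. The sketch below wraps the verification above in two framework-free functions. The names `sign` and `is_valid` and the sample payload are made up for illustration; hex encoding follows the `hexdigest()` call in the snippet above.

```python
import hashlib
import hmac
from typing import Optional


def sign(secret: str, body: bytes) -> str:
    """Hex-encoded HMAC-SHA256 of the raw request body."""
    return hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()


def is_valid(secret: str, body: bytes, signature: Optional[str]) -> bool:
    """Return True only if the signature matches the body under the secret."""
    if not signature:
        return False  # missing header: reject instead of raising
    # Compare as bytes in constant time to avoid timing leaks.
    return hmac.compare_digest(sign(secret, body).encode(), signature.encode())


payload = b'{"example": "payload"}'  # sample body, not a real Kaf payload
signature = sign("s3cret", payload)
```

The body must be the raw bytes as received; re-serializing parsed JSON can change whitespace and break the comparison.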
Pricing
Hobby: $0. 1,000 extractions/month, 60/min. CSS/XPath mode. Community support. Free forever.
Pro: $29/mo. 50,000 extractions/month, 300/min. AI extraction, batch, webhooks, priority support. 7-day trial.
Enterprise: $199/mo. 500,000 extractions/month. Dedicated proxies, SSO, SLA guarantee. 99.9% uptime.