NVD — National Vulnerability Database

nvd
Table: cve Rows: ~333,000 Size: ~5 GB

Every CVE published by NIST, including CVSS scores, severity ratings, CWE weaknesses, and advisory references. Updated periodically from the NVD API v2.0.

Schema — table cve
ColumnTypeDescription
idstringCVE identifier (e.g. CVE-2024-12345)
publishedstringPublication date
lastModifiedstringLast modification date
vulnStatusstringAnalyzed, Modified, Awaiting Analysis
descriptions[0].valuestringEnglish description
metrics.cvssMetricV31[0]
.cvssData.baseScore
floatCVSS v3.1 base score (0–10)
metrics.cvssMetricV31[0]
.cvssData.baseSeverity
stringLOW / MEDIUM / HIGH / CRITICAL
metrics.cvssMetricV31[0]
.cvssData.vectorString
stringFull CVSS vector string
weaknesses[0].description[0].valuestringPrimary CWE identifier
referencesarrayAdvisory URLs and sources

Note: Timestamp columns require backtick-quoted ISO 8601 literals for range comparisons — single-quoted strings return zero rows. Example: `2024-01-01T00:00:00Z`

-- Critical CVEs published in 2024
SELECT id, published,
       metrics.cvssMetricV31[0].cvssData.baseScore AS score,
       descriptions[0].value AS description
FROM   cve
WHERE  metrics.cvssMetricV31[0].cvssData.baseSeverity = 'CRITICAL'
  AND  published >= `2024-01-01T00:00:00Z`
ORDER BY score DESC
LIMIT 20

OSV — Open Source Vulnerabilities

osv
Table: advisories Rows: ~4,140,000 Size: ~400 MB

OSV.dev advisories flattened to one row per (advisory, affected package). Covers PyPI, npm, Go, Maven, Rust, Debian, Alpine, and more. Superset of NVD — includes ecosystem-specific advisories (GHSA, PYSEC, RUSTSEC) with precise package-version ranges for direct dependency scanning.

Schema — table advisories
ColumnTypeDescription
idstringAdvisory ID: CVE-..., GHSA-..., PYSEC-...
publishedstringFirst published date
modifiedstringLast modified date
withdrawnstringWithdrawal date if retracted (optional)
aliasesarrayRelated IDs (e.g. CVE alias for a GHSA)
summarystringShort description
severity_typestringCVSS type (e.g. CVSS_V3)
severity_scorestringFull CVSS vector string
cwe_idsarrayCWE weakness identifiers
ecosystemstringPyPI, npm, Go, Maven, crates.io, ...
package_namestringPackage name within the ecosystem
purlstringPackage URL (optional)
introducedstringFirst affected version (0 = all versions before fixed)
fixedstringFirst fixed version; absent if not yet fixed
-- All advisories for a specific npm package
SELECT id, introduced, fixed, summary
FROM   advisories
WHERE  ecosystem = 'npm' AND package_name = 'lodash'

-- Unfixed critical advisories in PyPI (CVSS network-reachable, high impact)
SELECT id, package_name, summary
FROM   advisories
WHERE  ecosystem = 'PyPI'
  AND  severity_score ~ 'AV:N.*C:H'
  AND  fixed IS MISSING
LIMIT 50

-- Advisory counts by ecosystem
SELECT ecosystem, COUNT(*) AS n
FROM   advisories
GROUP BY ecosystem
ORDER BY n DESC

PubMed — Biomedical Literature

pubmed
Table: baseline Rows: ~36,000,000 Size: ~350 GB

Full PubMed baseline: titles, abstracts, authors, journals, MeSH terms, and publication metadata for biomedical literature. Data originates from NLM XML; nested fields preserve the original XML element names.

Schema — table baseline (selected fields)
ColumnTypeDescription
MedlineCitation.PMIDstringPubMed identifier
MedlineCitation.Article
.ArticleTitle
stringArticle title
MedlineCitation.Article
.Abstract.AbstractText
stringAbstract text (may be structured)
MedlineCitation.Article
.Journal.Title
stringJournal name
MedlineCitation.Article
.AuthorList.Author
arrayAuthors (LastName, ForeName, Affiliation)
MedlineCitation
.MeshHeadingList.MeshHeading
arrayMeSH controlled vocabulary terms
MedlineCitation
.DateCompleted.Year
stringYear NLM indexing was completed
PubmedData.ArticleIdList
.ArticleId
arrayDOI, PMC ID, and other identifiers
-- Recent articles matching a topic (regex title search)
SELECT MedlineCitation.PMID,
       MedlineCitation.Article.ArticleTitle        AS title,
       MedlineCitation.Article.Journal.Title        AS journal,
       MedlineCitation.DateCompleted.Year           AS year
FROM   baseline
WHERE  MedlineCitation.Article.ArticleTitle ~ '(?i)CRISPR'
  AND  MedlineCitation.DateCompleted.Year >= '2023'
LIMIT 20

SEC EDGAR — Financial Facts

sec
Table: edgar Rows: ~97,800,000 Companies: ~8,900 Size: ~8 GB

Structured XBRL financial facts from SEC filings (10-K, 10-Q, 8-K). One row per reported fact — every line item a public company files: revenue, net income, assets, shares outstanding, and thousands of other GAAP/IFRS concepts.

Note: The end column (period end date) is a reserved keyword in Sneller SQL. Do not use it in SELECT or ORDER BY — filter by filed or fy instead.

Schema — table edgar
ColumnTypeDescription
cikstringSEC Central Index Key
tickerstringStock ticker symbol
companystringCompany name
taxonomystringus-gaap, ifrs-full, dei, ...
conceptstringXBRL concept (e.g. Revenues, NetIncomeLoss)
labelstringHuman-readable label for the concept
unitstringUSD, shares, pure, ...
endstringPeriod end date (reserved keyword — avoid in SELECT)
valfloatReported value
formstringFiling form: 10-K, 10-Q, 8-K, ...
fystringFiscal year
fpstringFiscal period: FY, Q1, Q2, Q3
filedstringFiling date
accnstringSEC accession number
-- Annual revenue for a company across fiscal years
SELECT ticker, fy, fp, val AS revenue, filed
FROM   edgar
WHERE  ticker = 'AAPL'
  AND  concept = 'Revenues'
  AND  form = '10-K'
ORDER BY fy DESC
LIMIT 10

-- Compare net income across S&P 500 for latest fiscal year
SELECT ticker, company, val AS net_income, fy
FROM   edgar
WHERE  concept = 'NetIncomeLoss'
  AND  form = '10-K' AND fy = '2024' AND unit = 'USD'
ORDER BY net_income DESC
LIMIT 20

-- Balance sheet: total assets vs total liabilities for one company
SELECT ticker, fy, fp, concept, val, filed
FROM   edgar
WHERE  ticker = 'MSFT'
  AND  concept IN ('Assets', 'Liabilities')
  AND  form = '10-K'
ORDER BY fy DESC
LIMIT 10

-- Discover which concepts a company has reported (useful before writing queries)
SELECT concept, label, unit, COUNT(*) AS filings
FROM   edgar
WHERE  ticker = 'TSLA'
GROUP BY concept, label, unit
ORDER BY filings DESC
LIMIT 50

-- Quarterly earnings per share trend
SELECT ticker, fy, fp, val AS eps, filed
FROM   edgar
WHERE  ticker = 'NVDA'
  AND  concept = 'EarningsPerShareBasic'
  AND  form = '10-Q'
ORDER BY filed DESC
LIMIT 12

Ethereum — Token Transfers & Logs

eth
Tables: transfers, logs History: 2015 – present Size: ~1.4 TB full history

On-chain Ethereum data sourced from the AWS Public Blockchain dataset. Both tables are partitioned by date — queries can target a rolling window (e.g. last 90 days, ~42 GB) rather than full history.

Table transfers — decoded ERC-20 / ERC-721 Transfer events  ·  ~361 GB full history
ColumnTypeDescription
block_numberintBlock height
block_timestampstringBlock timestamp (ISO 8601)
transaction_hashstringTransaction hash
log_indexintLog position within the transaction
token_addressstringContract address of the token
from_addressstringSender address
to_addressstringRecipient address
valuestringTransfer amount (raw; token decimals vary)
-- Recent large USDC transfers (USDC = 6 decimals, $1M = 1e12)
SELECT transaction_hash, from_address, to_address, value
FROM   transfers
WHERE  token_address = '0xa0b86991c6218b36c1d19d4a2e9eb0ce3606eb48'
  AND  CAST(value AS FLOAT) > 1000000000000
  AND  block_timestamp >= '2025-01-01'
LIMIT 20
Table logs — raw contract event logs  ·  ~1 TB full history
ColumnTypeDescription
block_numberintBlock height
block_timestampstringBlock timestamp (ISO 8601)
transaction_hashstringTransaction hash
addressstringContract that emitted the log
topicsarrayIndexed event topics (event sig + params)
datastringABI-encoded non-indexed parameters
log_indexintLog position within the transaction
transaction_indexintTransaction position within the block
-- All events from a specific contract (e.g. Uniswap v2 router)
SELECT block_timestamp, transaction_hash,
       topics[0] AS event_sig, data
FROM   logs
WHERE  address = '0x7a250d5630b4cf539739df2c5dacb4c659f2488d'
  AND  block_timestamp >= '2025-01-01'
LIMIT 20