Methodology & Sources

Data provenance at a glance

Every figure on this site traces to one of the sources below. We separate measured data (reported by companies or published by statistical agencies) from generated & modeled data (an AI-generated network and this project's own editorial estimates), and never present the latter as if it were observed. Full citations, licenses and acknowledgments follow in the references section.

Source	Type	What we use it for	Origin
U.S. SEC EDGAR (company-facts XBRL API)	Measured	All company financials (FY2021–25) from audited 10-K / 20-F / 40-F filings.	data.sec.gov
FRED (Federal Reserve Bank of St. Louis)	Measured	Monthly commodity prices (IMF) and producer-price indexes, 2015–present.	fred.stlouisfed.org
U.S. Bureau of Labor Statistics (BLS Public Data API)	Measured	CPI-U, average hourly earnings, industry output PPIs, import price indexes.	bls.gov/data
AIPNET (AI-generated Production Network)	AI-generated	Global production-network layer: product centrality, world-trade share, and edge validation of our flow graph.	aipnet.io
Material taxonomy, flows & tags (this project)	Editorial	The 64 categories, 127 flow edges, and company–material produce/consume tags.	curated in-repo

Every machine-readable file this site loads carries an inline _provenance block naming its source and citation, so the data can be attributed correctly if reused. A single manifest of all sources is published at data/SOURCES.json.
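For anyone reusing the files, a minimal sketch of reading that block might look like the following. Only the `_provenance` key comes from the text above; the inner field names (`source`, `citation`), the sample payload and the `attribution` helper are assumptions for illustration, not the site's documented schema.

```python
import json

# Hypothetical payload: only the "_provenance" key is taken from the text above.
# The inner field names and values are illustrative assumptions.
SAMPLE = """
{
  "_provenance": {"source": "example source", "citation": "example citation"},
  "series": [1.0, 2.0]
}
"""

def attribution(payload: dict) -> str:
    """Return a one-line attribution string, or raise if provenance is missing."""
    prov = payload.get("_provenance")
    if not isinstance(prov, dict):
        raise ValueError("file carries no _provenance block")
    source = prov.get("source", "unknown source")
    citation = prov.get("citation", "no citation given")
    return f"{source} ({citation})"

print(attribution(json.loads(SAMPLE)))
```

Failing loudly when the block is absent is a deliberate choice in this sketch: a file without provenance should not be silently attributed to anyone.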

1. Company universe

The universe is the 1,500 largest U.S.-listed registrants by approximate market capitalization, roughly the S&P 1500. It is taken from the SEC's company ticker file (which is ordered by market cap) and de-duplicated by CIK, so multi-class issuers (e.g. GOOGL/GOOG) appear once. On top of the size cutoff, a curated list of ~25 smaller material-critical producers (e.g. Boise Cascade in lumber, Hecla in silver, Warrior Met in coking coal) is always included, since materials coverage matters more here than market-cap purity. Roughly 200 entries are foreign companies trading as unsponsored ADRs that do not file structured annual reports with the SEC; they remain in the directory for reference but carry no financial data. Large foreign producers that never list in the U.S. (e.g. Saudi Aramco) are outside the register entirely.
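The selection rule can be sketched roughly as below. This assumes rows shaped like the entries of the SEC ticker file (`cik_str`, `ticker`, `title`) and already in file order; the function name `build_universe`, the toy tickers and the CIK values are hypothetical, and the project's own fetch_companies.py may differ.

```python
from typing import Iterable

def build_universe(ticker_rows: Iterable[dict], size_cutoff: int,
                   always_include: set[int]) -> list[int]:
    """Return CIKs: the first `size_cutoff` unique CIKs in file order,
    plus any `always_include` CIKs that fall below the cutoff."""
    seen: set[int] = set()
    ordered: list[int] = []
    for row in ticker_rows:
        cik = int(row["cik_str"])
        if cik in seen:
            continue  # another share class of an issuer already counted
        seen.add(cik)
        ordered.append(cik)
    top = ordered[:size_cutoff]
    extras = [c for c in ordered[size_cutoff:] if c in always_include]
    return top + extras

# Toy rows with made-up tickers and CIKs (illustration only).
rows = [
    {"cik_str": 1001, "ticker": "XYZA", "title": "Example Multi-Class Co."},
    {"cik_str": 1001, "ticker": "XYZB", "title": "Example Multi-Class Co."},
    {"cik_str": 1002, "ticker": "BBB", "title": "Example B"},
    {"cik_str": 1003, "ticker": "CCC", "title": "Example C"},
    {"cik_str": 1004, "ticker": "DDD", "title": "Small Material Producer"},
]
universe = build_universe(rows, size_cutoff=2, always_include={1004})
print(universe)
```

De-duplicating before applying the cutoff matters: otherwise a second share class would consume one of the 1,500 slots.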

2. Financial data

All figures come from the SEC EDGAR company facts XBRL API (data.sec.gov/api/xbrl/companyfacts), which exposes every numeric fact companies tag in their audited filings. We retain values reported in annual filings (forms 10-K, 20-F, 40-F and amendments), in USD, for full-year periods (330–400 days) ending in fiscal years 2021–2025. When a company restates a figure in a later filing, the most recently filed value wins. Around 30 line items are tracked per company — income statement, cash-flow investment lines (capex, land purchases, acquisitions, buybacks), balance-sheet stocks, the PP&E breakdown (land, buildings, machinery, construction in progress) and the inventory breakdown (raw materials, work in process, finished goods).
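As a rough illustration of those retention rules, the sketch below filters one concept's USD facts. It assumes each fact is a dict with `start`, `end`, `val`, `form` and `filed` keys, as in the company-facts JSON; the helper name `annual_values` and the sample numbers are made up. It covers duration facts only (balance-sheet stocks are point-in-time and have no `start`) and leaves out the fiscal-year window.

```python
from datetime import date

ANNUAL_FORMS = {"10-K", "20-F", "40-F", "10-K/A", "20-F/A", "40-F/A"}

def annual_values(usd_facts: list[dict]) -> dict[str, float]:
    """Map period end date -> value for full-year facts from annual forms.
    When a period is reported more than once, the latest `filed` date wins."""
    best: dict[tuple[str, str], dict] = {}
    for fact in usd_facts:
        if fact.get("form") not in ANNUAL_FORMS or "start" not in fact:
            continue  # skip non-annual forms and instant (point-in-time) facts
        days = (date.fromisoformat(fact["end"]) - date.fromisoformat(fact["start"])).days
        if not 330 <= days <= 400:
            continue  # not a full-year period
        key = (fact["start"], fact["end"])
        # ISO dates compare correctly as strings
        if key not in best or fact["filed"] > best[key]["filed"]:
            best[key] = fact
    return {end: f["val"] for (_, end), f in sorted(best.items())}

# Made-up facts: an original value, a later restatement, a quarter, and a 10-Q.
facts = [
    {"start": "2023-01-01", "end": "2023-12-31", "val": 100.0, "form": "10-K", "filed": "2024-02-15"},
    {"start": "2023-01-01", "end": "2023-12-31", "val": 95.0, "form": "10-K", "filed": "2025-02-14"},
    {"start": "2023-10-01", "end": "2023-12-31", "val": 30.0, "form": "10-K", "filed": "2024-02-15"},
    {"start": "2023-01-01", "end": "2023-12-31", "val": 101.0, "form": "10-Q", "filed": "2025-05-01"},
]
print(annual_values(facts))
```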

Each metric maps to an ordered list of US-GAAP (or IFRS, for foreign private issuers) concepts, and the first concept the company reports is used. Companies that report under unusual concepts may show gaps. Fiscal years are labeled by the calendar year containing the majority of the period, so a January-2025 fiscal year end (e.g. Walmart) is labeled FY2024. All figures are nominal, as reported, unadjusted for inflation.
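Both rules are small enough to sketch. The concept names below are real US-GAAP tags, but their priority order is an assumption for illustration, and `first_reported` and `fiscal_year_label` are hypothetical helper names. The year label uses the period's midpoint, which for a period of about one year always falls in the calendar year holding the majority of its days.

```python
from datetime import date
from typing import Optional

def first_reported(concepts_in_priority: list[str],
                   reported: dict[str, float]) -> Optional[tuple[str, float]]:
    """Return (concept, value) for the first concept the company reports."""
    for concept in concepts_in_priority:
        if concept in reported:
            return concept, reported[concept]
    return None  # shows up as a gap on the site

def fiscal_year_label(start: date, end: date) -> int:
    """Calendar year containing the majority of the period (year of its midpoint)."""
    midpoint = start + (end - start) / 2
    return midpoint.year

# Assumed priority order, for illustration only.
REVENUE_CONCEPTS = ["RevenueFromContractWithCustomerExcludingAssessedTax", "Revenues"]

print(first_reported(REVENUE_CONCEPTS, {"Revenues": 5.0}))
# A fiscal year running February 2024 through January 2025:
print(fiscal_year_label(date(2024, 2, 1), date(2025, 1, 31)))
```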

3. Material taxonomy & flows

The 64 material/product categories and the 127 directed flow edges between them (e.g. iron ore → steel → automobiles) are a curated editorial model, designed to capture the main physical input–output relationships of the industrial economy at a readable scale. It is intentionally far coarser than official input–output tables (such as the BEA's 400-industry use tables) and makes no claim about dollar magnitudes of any single flow.

4. Company–material tags

Each company is tagged with materials it produces and consumes using two layers: (a) rules keyed on the company's SIC industry classification as reported to the SEC, then (b) a curated override list for ~100 major companies whose business mix the SIC code misses (e.g. Tesla as battery producer, hyperscalers as data-center builders, Weyerhaeuser as timber REIT). Tags are estimates intended for dependency tracing, not a statement that a material is a specific fraction of any company's costs. The "upstream dependency chain" on company pages walks the flow graph transitively from the company's tags.
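The transitive walk can be pictured as a breadth-first search over reversed flow edges. In this sketch the iron ore → steel → automobiles edges come from the example in section 3; the coking coal → steel edge and the function name `upstream_chain` are assumptions for illustration.

```python
from collections import deque

def upstream_chain(tags: set[str], flows: list[tuple[str, str]]) -> set[str]:
    """All materials reachable by walking flow edges backwards from `tags`."""
    inputs_of: dict[str, set[str]] = {}
    for src, dst in flows:
        inputs_of.setdefault(dst, set()).add(src)
    found: set[str] = set()
    queue = deque(tags)
    while queue:
        node = queue.popleft()
        for parent in inputs_of.get(node, ()):
            if parent not in found and parent not in tags:
                found.add(parent)  # record once, so cycles terminate
                queue.append(parent)
    return found

flows = [
    ("iron ore", "steel"),
    ("steel", "automobiles"),
    ("coking coal", "steel"),  # assumed edge, for illustration
]
print(sorted(upstream_chain({"automobiles"}, flows)))
```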

4b. Filing-text evidence

To ground the heuristic tags in primary sources, the pipeline also downloads the text of each company's most recent annual report from EDGAR and counts occurrences of a curated alias vocabulary per material (“coking coal”, “lithium hydroxide”, “containerboard”, …). Company pages show these mention counts with a context snippet and a link to the filing, marked as corroborating a tag or as a material the company discusses but is not tagged with. Counts are raw term frequency — a company can mention a material it neither produces nor meaningfully consumes — so mentions are presented as supporting evidence, not used to generate tags automatically.
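A minimal sketch of the counting step, assuming case-insensitive whole-phrase matching. The alias terms are taken from the examples above, while the material keys, the extra alias "metallurgical coal", the sample text and the `count_mentions` helper are illustrative assumptions.

```python
import re

def count_mentions(text: str, aliases: dict[str, list[str]]) -> dict[str, int]:
    """Raw term frequency per material; materials with zero hits are omitted."""
    counts: dict[str, int] = {}
    for material, terms in aliases.items():
        alternation = "|".join(re.escape(term) for term in terms)
        pattern = re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)
        hits = len(pattern.findall(text))
        if hits:
            counts[material] = hits
    return counts

# Hypothetical vocabulary and filing excerpt.
ALIASES = {
    "coking coal": ["coking coal", "metallurgical coal"],
    "lithium": ["lithium hydroxide"],
    "paper packaging": ["containerboard"],
}
TEXT = ("We sell coking coal to steelmakers. Coking coal prices rose this year. "
        "We do not produce lithium hydroxide.")
print(count_mentions(TEXT, ALIASES))
```

The last sentence of the sample text shows why counts are treated only as supporting evidence: the company mentions lithium hydroxide precisely to say it does not make it.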

5. Price series & correlations

The price layer consists of 54 monthly series, 2015–present, retrieved from FRED (Federal Reserve Bank of St. Louis). They fall into three groups: IMF global commodity prices (copper, aluminum, iron ore, nickel, coal, uranium, wheat, beef, cotton, sugar); spot energy prices (WTI, Henry Hub); and BLS producer-price indexes where no world price exists (lumber, steel, cement, glass, chemicals, resins, fertilizers, semiconductors and, since June 2026, scrap steel, industrial gases, graphite products, batteries, pharmaceuticals, processed foods, apparel, aircraft, machinery, appliances, medical devices, railroad equipment, ethanol, solar-inclusive semiconductor devices, electronic components, semiconductor equipment, data-center hosting services and freight transportation). Where no material-specific series exists, the nearest published index is used and labeled with its official series title; ten categories (e.g. lithium, cobalt, rare earths, silicon wafers) have no credible free monthly series and are deliberately left unmapped rather than proxied. PPI series are indexes, not dollar prices, so cross-series charts are shown indexed to 100 at a common date.
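Indexing to 100 at a common date is a simple rescaling, sketched here with made-up values. The function name `rebase_to_100` and the `YYYY-MM` key format are assumptions.

```python
def rebase_to_100(series: dict[str, float], base_month: str) -> dict[str, float]:
    """Rescale a series so that its value in `base_month` equals 100."""
    base = series[base_month]
    return {month: 100.0 * value / base for month, value in series.items()}

# Made-up index levels and dollar prices; after rebasing they share one scale.
ppi_like = {"2015-01": 200.0, "2015-02": 210.0}
price_like = {"2015-01": 50.0, "2015-02": 45.0}
print(rebase_to_100(ppi_like, "2015-01"))
print(rebase_to_100(price_like, "2015-01"))
```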

Correlations are Pearson r computed on year-over-year percent changes of the monthly series (this removes seasonality and most common trend), over all overlapping months with a minimum of 36 observations. Correlation is not causation — commodities share macro drivers (energy costs, construction cycles, the dollar), and a high r between two series does not establish a physical supply linkage.
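The calculation can be sketched in plain Python as follows. The sketch assumes series keyed by `YYYY-MM` and applies the 36-observation minimum to the overlapping year-over-year changes; the function names and the synthetic demo data are illustrative, not the project's code.

```python
from math import sqrt
from typing import Optional

def yoy_changes(series: dict[str, float]) -> dict[str, float]:
    """Year-over-year percent change per month (needs the same month a year earlier)."""
    changes: dict[str, float] = {}
    for month, value in series.items():
        year, mm = month.split("-")
        prior = series.get(f"{int(year) - 1}-{mm}")
        if prior:  # skips months with no (or a zero) year-earlier value
            changes[month] = (value / prior - 1.0) * 100.0
    return changes

def pearson_yoy(a: dict[str, float], b: dict[str, float],
                min_obs: int = 36) -> Optional[float]:
    """Pearson r of YoY changes over overlapping months, or None if too few."""
    ya, yb = yoy_changes(a), yoy_changes(b)
    months = sorted(ya.keys() & yb.keys())
    if len(months) < min_obs:
        return None
    xs = [ya[m] for m in months]
    ys = [yb[m] for m in months]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None  # a constant series has no defined correlation
    return sxy / sqrt(sxx * syy)

# Synthetic demo data (not real prices): 60 months, second series is a scaled copy.
months = [f"{2015 + i // 12}-{i % 12 + 1:02d}" for i in range(60)]
series_a = {m: 100.0 + i + (i * 7) % 13 for i, m in enumerate(months)}
series_b = {m: 2.0 * v for m, v in series_a.items()}
r = pearson_yoy(series_a, series_b)  # scaled copies have identical YoY changes
```

Sixty months of levels yield only 48 year-over-year observations, so a series needs at least four years of history to clear the 36-observation floor.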

5b. BLS-direct series & inflation adjustment

Three groups of series come straight from the BLS Public Data API rather than FRED: headline CPI-U (CUUR0000SA0) and average hourly earnings (CES0500000003, shown as the “Labor” series); industry output PPIs (NAICS-based PCU series) mapped to ~25 SEC industry classifications and shown on the Industries page; and import price indexes (BEA end-use EIUIR series) for seventeen import-heavy materials (crude oil, natural gas, refined fuels, chemicals, fertilizers, lumber, steelmaking materials, steel products, aluminum, nonferrous metals, semiconductors, computers & electronics, machinery, passenger cars, pharmaceuticals, apparel and textiles), shown on material pages as “what US importers pay”. The inflation scorecard deflates each material price by CPI-U over the same months: real change = (1 + nominal) ÷ (1 + CPI change) − 1. CPI-U is the deflator throughout; a different deflator would shift levels but rarely the ranking. Real-terms views on material and company pages use the same arithmetic (constant latest-month dollars for monthly prices; constant FY2021 dollars, via calendar-year average CPI, for annual financials).
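As a worked example of the deflation arithmetic, the sketch below implements the formula above plus a constant-dollar conversion. The function names and all CPI values are illustrative assumptions, not published figures.

```python
def real_change(nominal_change: float, cpi_change: float) -> float:
    """Real change = (1 + nominal) / (1 + CPI change) - 1, all as fractions."""
    return (1.0 + nominal_change) / (1.0 + cpi_change) - 1.0

def to_constant_dollars(amount: float, cpi_of_year: float, cpi_base_year: float) -> float:
    """Restate a nominal amount in base-year dollars using average CPI levels."""
    return amount * cpi_base_year / cpi_of_year

# A price up 50% while CPI rose 25% gained about 20% in real terms, not 25%.
print(round(real_change(0.50, 0.25), 4))

# Made-up CPI averages: revenue of 110 in a year whose CPI is 10% above the base year.
print(round(to_constant_dollars(110.0, cpi_of_year=110.0, cpi_base_year=100.0), 2))
```

Dividing the growth factors, rather than subtracting the two percentages, is what keeps the result exact when inflation is large.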

6. AIPNET global production network (AI-generated)

The “position in the global production network” sections, the world-trade-share column, the confirmed/dashed distinction on the dependency graph, and the “links AIPNET sees” table all come from AIPNET — the AI-generated Production Network, an academic dataset and working paper by Thiemo Fetzer (University of Warwick), Peter John Lambert and Bennet Feld (London School of Economics), and Prashant Garg (Imperial College London). It is a knowledge graph of input→output relationships between ~5,000 six-digit HS products (429,871 directed edges, HS2002 vintage), with each product scored by Integrated Global Product Centrality (IGPC) and its share of world goods trade. The network is generated by an ensemble of prompt-tuned large-language-model classifications, so its edges are model predictions of technological feasibility — what can feed into what — not measured trade flows. We present it as such throughout and mark it AI-generated wherever it appears beside measured data.

This site is not affiliated with or endorsed by the AIPNET authors. We use their publicly released v1.0 data pack for non-commercial research with attribution, per the dataset's stated terms, and cite the accompanying paper (full reference below). The original data, paper and methodology are at aipnet.io — please go there for the authoritative version and to cite it in your own work.

We hand-mapped each material to HS2002 headings/subheadings in scripts/fetch_aipnet.py (870 products matched across 59 materials; longest-prefix-wins, so e.g. solar cells 854140 belong to renewable equipment, not semiconductors). Material-to-material link counts project AIPNET's product edges onto our taxonomy: a curated flow is “confirmed” if at least one product-level edge crosses it. Unconfirmed flows are mostly services (labor, electricity, transport, data centers), which lie outside AIPNET's traded-goods universe; buildings and refined fuels are also missing from its v1.0 node list, so they have no centrality scores. AIPNET edges are technological feasibility (what can feed into what), not realized trade volumes.
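The mapping and confirmation rules can be sketched as below. The HS prefixes chosen for each material, the single product edge and the helper names are assumptions for illustration; the actual mapping lives in scripts/fetch_aipnet.py and may differ.

```python
from typing import Optional

def material_for(hs6: str, prefix_map: dict[str, str]) -> Optional[str]:
    """Longest-prefix-wins lookup of a six-digit HS code."""
    for length in range(len(hs6), 0, -1):
        material = prefix_map.get(hs6[:length])
        if material is not None:
            return material
    return None

def confirmed_flows(curated: list[tuple[str, str]],
                    product_edges: list[tuple[str, str]],
                    prefix_map: dict[str, str]) -> set[tuple[str, str]]:
    """Curated flows crossed by at least one product-level edge."""
    crossing = set()
    for src, dst in product_edges:
        m_src, m_dst = material_for(src, prefix_map), material_for(dst, prefix_map)
        if m_src and m_dst and m_src != m_dst:
            crossing.add((m_src, m_dst))
    return {flow for flow in curated if flow in crossing}

# Assumed prefixes: the six-digit entry overrides the four-digit heading.
PREFIX_MAP = {
    "8541": "semiconductors",
    "854140": "renewable equipment",
    "2601": "iron ore",
    "7208": "steel",
}
EDGES = [("260111", "720810")]  # hypothetical product edge
CURATED = [("iron ore", "steel"), ("electricity", "steel")]

print(material_for("854140", PREFIX_MAP))
print(confirmed_flows(CURATED, EDGES, PREFIX_MAP))
```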

7. Known limitations

Private companies (Cargill, Koch) and non-U.S.-listed giants (Aramco, CATL) are absent.
“All producers combined” figures sum each producer's whole-company revenue, including segments unrelated to the material, so they overstate the material-specific market.
XBRL tagging quality varies by company; PP&E and inventory detail is the sparsest.
Material tags are editorial estimates; corrections are cheap to apply in scripts/build_tags.py.

8. Reproducing the data

The register is fully reproducible from public APIs with the included scripts: fetch_companies.py (EDGAR pull, ~10 min), build_tags.py (tagging), fetch_prices.py (FRED pull), fetch_mentions.py (filing-text evidence), fetch_bls.py (BLS-direct series; run after fetch_prices.py), fetch_aipnet.py (AIPNET production network), build_site_data.py (bakes the static JSON this site reads). Re-running the pipeline refreshes the register to the latest filings.
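A small runner along these lines could chain the steps. It assumes the scripts live in a scripts/ directory, take no arguments, and can run in the order listed above; only the fetch_prices.py before fetch_bls.py ordering is stated explicitly. The `PIPELINE`, `commands` and `run_all` names are hypothetical.

```python
import subprocess
import sys

# Order follows the listing above; treating it as the run order is an assumption.
PIPELINE = [
    "fetch_companies.py",
    "build_tags.py",
    "fetch_prices.py",
    "fetch_mentions.py",
    "fetch_bls.py",       # must come after fetch_prices.py
    "fetch_aipnet.py",
    "build_site_data.py",
]

def commands(scripts_dir: str = "scripts") -> list[list[str]]:
    """Build one interpreter command per pipeline step."""
    return [[sys.executable, f"{scripts_dir}/{name}"] for name in PIPELINE]

def run_all(scripts_dir: str = "scripts", dry_run: bool = True) -> None:
    """Print each step; execute it only when dry_run is False."""
    for cmd in commands(scripts_dir):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step

if __name__ == "__main__":
    run_all()  # dry run by default
```

Stopping at the first failure (`check=True`) avoids baking site data from a partially refreshed register.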

References & acknowledgments

This project stands on publicly released data and research by others. We gratefully acknowledge the sources below and encourage anyone reusing figures from this site to cite the original sources directly.

Academic dataset — AIPNET. The global production-network layer is the work of academic researchers and should be credited to them, not to this site:

Fetzer, T., Lambert, P. J., Feld, B., & Garg, P. (2024). AI-Generated Production Networks: Measurement and Applications to Global Trade. Working paper, AIPNET project, aipnet.io. Data: AIPNET Data Pack v1.0 (HS2002 vintage), released 2024-12-04, retrieved June 2026. Authors’ affiliations: University of Warwick (Fetzer); London School of Economics (Lambert, Feld); Imperial College London (Garg). Used for non-commercial research with attribution, per the dataset’s terms; this site is independent of and not endorsed by the authors.

Official statistical & regulatory sources. Public-domain U.S. government data:

U.S. Securities and Exchange Commission. EDGAR company-facts XBRL API (annual filings 10-K / 20-F / 40-F). data.sec.gov. Retrieved June 2026.
Federal Reserve Bank of St. Louis. FRED — Federal Reserve Economic Data (IMF Primary Commodity Prices; Bureau of Labor Statistics producer-price indexes redistributed via FRED). fred.stlouisfed.org. Retrieved June 2026.
U.S. Bureau of Labor Statistics. Public Data API (CPI-U, Current Employment Statistics average hourly earnings, industry producer-price indexes, U.S. import/export price indexes). bls.gov/data. Retrieved June 2026.

This project’s own contributions — the 64-category material taxonomy, the 127 flow edges, the company–material produce/consume tags, the material→HS code mapping, and all derived aggregates and narratives — are editorial estimates by this project, offered under the same non-commercial research spirit. They are clearly labeled as estimates and are not a substitute for official input–output tables (e.g. the BEA Input-Output Accounts). Corrections are welcome and cheap to apply in the source scripts.

When using data from this site, please attribute the original sources above. If you use the production-network figures specifically, cite Fetzer, Lambert, Feld & Garg (2024) and link to aipnet.io.