Developer Documentation

DevCannabis Docs

How DevCannabis works, how to use public API surfaces, and how to request approved contributor/source access.

Overview

DevCannabis is a transparent platform that extracts, standardizes, and publishes cannabis regulatory data from US state agencies. We turn PDFs, messy spreadsheets, and hard-to-access dashboards into clean, accessible datasets.

Tech Stack

FastAPI + SQLAlchemy + Scrapy + pdfplumber. Python-first, no Java dependencies.

Multi-State

Modular architecture. Each state is a self-contained module with its own scrapers and config.

Data Quality

Operator aliasing, lineage tracking, and validation rules keep data accurate and consistent across reports.

Quick Start

Public users can start with the API and docs. Local source setup is for approved contributors under project terms.

1. Public docs / dashboard path

    # Start here without source access
    browse public dashboards, methodology, and source notes
    submit source-backed corrections or new source suggestions

    # API keys and exports are signup-gated
    request non-commercial API access, bulk access, or partner permissions

2. Approved contributor source setup

    # After maintainer approval/source access
    git clone https://github.com/GJManno/devcannabis.git
    cd devcannabis/ommu  # legacy app path, current implementation directory

    # Start with Docker (recommended)
    docker-compose -f docker-compose.dev.yml up -d

    # Seed the database with all data
    docker-compose -f docker-compose.dev.yml exec web python scripts/seed_all.py

3. Access the Dashboard

    # Approved local contributor workflow
    open http://localhost:8000

    # Local API docs are for approved contributors/dev review
    open http://localhost:8000/docs

4. Run the Scraper (Approved contributor workflow)

    # Full scrape + process pipeline
    docker-compose -f docker-compose.dev.yml exec scraper python cli.py --full

    # Just download new PDFs
    docker-compose -f docker-compose.dev.yml exec scraper python cli.py --scrape

    # Check current status
    docker-compose -f docker-compose.dev.yml exec scraper python cli.py --status

Source access note

Developer/source access is granted on request and requires maintainer approval. Public dashboards are free; API/export access requires signup/auth, an API key, or contributor login. Public outputs, issues, source-backed corrections, and methodology review remain available under the controlled public-use terms.

Access Policy

DevCannabis keeps the public site and dashboards free while protecting programmatic access and the core platform. API access, exports, bulk downloads, and automated use require signup/auth, an API key, or contributor login.

Allowed without permission

Non-commercial public-interest use of public dashboards and published notes with attribution to DevCannabis and original regulatory sources.

Written permission required

Bulk export, commercial use, resale/republication, model training, competing products, hosted mirrors, or using DevCannabis as a backend.

Approved contributor access

Developer/source access is reviewed manually. Approved roles include viewer, data_contributor, reviewer, moderator, developer, partner, and admin.

Request statuses should move through submitted, needs_source, under_review, approved, rejected, and published. API-key infrastructure should track key owner, rate limits, request logs, revoke/disable state, and terms acceptance timestamp before broad export access is enabled.
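The request lifecycle and API-key bookkeeping described above can be sketched with plain Python types. This is a minimal illustration, not the project's implementation: the status names come from the text, but the allowed transitions, the `ApiKeyRecord` class, and its field names are assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import List, Optional


class RequestStatus(str, Enum):
    SUBMITTED = "submitted"
    NEEDS_SOURCE = "needs_source"
    UNDER_REVIEW = "under_review"
    APPROVED = "approved"
    REJECTED = "rejected"
    PUBLISHED = "published"


# Hypothetical transition table: the docs list the statuses, not the edges.
ALLOWED_TRANSITIONS = {
    RequestStatus.SUBMITTED: {RequestStatus.NEEDS_SOURCE, RequestStatus.UNDER_REVIEW},
    RequestStatus.NEEDS_SOURCE: {RequestStatus.UNDER_REVIEW, RequestStatus.REJECTED},
    RequestStatus.UNDER_REVIEW: {
        RequestStatus.NEEDS_SOURCE,
        RequestStatus.APPROVED,
        RequestStatus.REJECTED,
    },
    RequestStatus.APPROVED: {RequestStatus.PUBLISHED},
    RequestStatus.REJECTED: set(),
    RequestStatus.PUBLISHED: set(),
}


def advance(current: RequestStatus, target: RequestStatus) -> RequestStatus:
    """Return target if the move is allowed, otherwise raise ValueError."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target


@dataclass
class ApiKeyRecord:
    """Hypothetical record holding the fields the policy says to track."""
    owner: str
    rate_limit_per_minute: int
    terms_accepted_at: Optional[datetime] = None
    revoked: bool = False
    request_log: List[str] = field(default_factory=list)

    def can_export(self) -> bool:
        # Export is only allowed for active keys whose owner accepted the terms.
        return not self.revoked and self.terms_accepted_at is not None
```

Keeping the transitions in one table makes it straightforward to reject invalid moves (for example, publishing a rejected request) in a single place.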

Read the full Access Policy.

Architecture

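The project is organized into three main layers (the web application, the state modules, and the scraper), plus supporting scripts and data directories: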

    ommu/
    ├── app/                        # FastAPI web application
    │   ├── main.py                 # App entry point, routing
    │   ├── models/                 # SQLAlchemy ORM models
    │   ├── api/                    # REST API endpoints
    │   ├── services/               # Business logic
    │   └── templates/              # Jinja2 templates
    ├── states/                     # State-specific modules
    │   ├── florida/                # Florida config, operators, scraper settings
    │   ├── _template/              # Template for new states
    │   └── regulatory_sources.py   # Official state URLs reference
    ├── scraper/                    # Scrapy project
    │   └── spiders/                # State-specific spiders
    ├── scripts/                    # CLI tools for seeding, migration
    └── data/                       # SQLite DB, PDFs, CSVs

Data Flow

  1. Scraper downloads PDFs from state regulatory websites
  2. Parser (pdfplumber) extracts tables from PDFs
  3. Normalizer applies operator aliases and validation rules
  4. Database stores normalized data with lineage tracking
  5. API/Dashboard serves data to users
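Step 3 of the flow above (normalization) might look roughly like the following. It is a sketch under stated assumptions: the alias map, the junk-name list, the `NormalizedRow` type, and the `thc_amount` field are hypothetical placeholders, and the real values would come from a state's own configuration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical alias map and junk list, for illustration only.
OPERATOR_ALIASES = {
    "example cannabis co": "Example Cannabis Co.",
    "example cannabis company": "Example Cannabis Co.",
}
JUNK_NAMES = {"", "total", "totals"}


@dataclass(frozen=True)
class NormalizedRow:
    operator: str
    thc_amount: float
    source_url: str     # lineage: which document the row came from
    report_date: date   # lineage: which report period it belongs to


def normalize_row(
    raw_name: str, raw_thc: str, source_url: str, report_date: date
) -> Optional[NormalizedRow]:
    """Apply aliases and basic validation to one extracted table row."""
    key = " ".join(raw_name.lower().split())  # collapse case and whitespace
    if key in JUNK_NAMES:
        return None  # summary rows and blanks are not operators
    operator = OPERATOR_ALIASES.get(key, raw_name.strip())
    thc_amount = float(raw_thc.replace(",", ""))
    if thc_amount < 0:
        raise ValueError(f"negative amount for {operator}: {thc_amount}")
    return NormalizedRow(operator, thc_amount, source_url, report_date)
```

Carrying `source_url` and `report_date` on every row is one simple way to keep lineage attached to the data as it moves into the database.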

Contributor Pathways

Pick the narrowest useful path. Code/source access requires approval; source-backed public corrections are welcome from anyone.

Data Contributor

Submit Sources & Corrections

Best for official source links, provenance notes, data discrepancies, and evidence-backed corrections.

  • Submit verified regulator source URLs
  • Flag parser or data-quality issues
  • Submit data corrections
  • Improve public methodology notes
  • Cite primary sources where practical

Approved Contributor

Build, Review, or Partner

For developers, moderators/reviewers, and research partners who need deeper collaboration.

  • Request source access for adapters, tests, API, or dashboard work
  • Apply as State Lead or developer contributor
  • Review submissions as moderator/reviewer
  • Request research/partner collaboration
  • Follow licensing and contribution terms

Ready to help?

Use the contributor path to submit sources/corrections or request approved developer, reviewer, or partner access.

Add a New State

Adding a new state involves four main steps:

1. Create State Module

    # Copy the template
    cp -r states/_template states/massachusetts
    cd states/massachusetts

2. Configure the State

Edit the following files in your new state folder:

File | Purpose
operators.py | Operators, aliases (name variations), and junk names to filter out
regulatory.py | State info, license types, regulatory timeline
config.py | Data source URLs, scraper settings, PDF parsing coordinates
seed.py | Database seeding functions

3. Register the State

    # In states/__init__.py
    from . import massachusetts

    AVAILABLE_STATES = {
        "FL": florida,
        "MA": massachusetts,  # Add your state
    }

4. Build the Spider

Add a Scrapy spider in scraper/spiders/ that downloads data from your state's regulatory portal.

Check the official source registry first

Canonical sources must be verified public government or regulatory organizations. Use /api/sources/official and data/sources/official_regulatory_sources.json before adding a pipeline.

Data Sources Reference

We maintain a canonical registry of verified public government/regulatory portals. Private or unverified links may be useful as non-canonical references, but they are not source-of-truth inputs.

State | Regulator | Format | Status
Florida | OMMU | PDF | Live
Massachusetts | Cannabis Control Commission | CSV | Planned
California | Dept of Cannabis Control | HTML | Planned
Colorado | MED | CSV | Planned
Michigan | CRA | ArcGIS | Research

Canonical registry API: /api/sources/official. Registry data: data/sources/official_regulatory_sources.json.

Trust fields to preserve

Where feasible, every source-backed claim should expose source URL, date fetched, last verified date, confidence, reviewer, correction history, and a challenge/correction link.
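One way to keep these trust fields together is a small record type. The sketch below uses a plain dataclass; the class name `SourcedClaim`, the field names, and the `missing_trust_fields` helper are assumptions for illustration and do not describe the project's actual schema.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class SourcedClaim:
    """Hypothetical container for a claim and its provenance."""
    statement: str
    source_url: str
    date_fetched: date
    last_verified: Optional[date] = None
    confidence: Optional[str] = None       # scale is an assumption, e.g. "high"
    reviewer: Optional[str] = None
    correction_history: List[str] = field(default_factory=list)
    challenge_url: Optional[str] = None

    def missing_trust_fields(self) -> List[str]:
        """List optional trust fields that have not been filled in yet."""
        optional = ("last_verified", "confidence", "reviewer", "challenge_url")
        return [name for name in optional if getattr(self, name) is None]
```

A helper like `missing_trust_fields` lets a reviewer see at a glance which provenance details still need attention before a claim is published.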

API Reference

Public summary endpoints return JSON. Base URL: /api. API/export access for regular programmatic use requires signup/auth/API key or contributor login.

Response note: newer payloads may expose operator-oriented fields such as operator_count and licenses_per_operator alongside backward-compatible legacy aliases.

Route note: prefer /api/operators and /api/operator/{name} in new consumers. Legacy /api/mmtc* paths remain available as compatibility routes during the transition.

Endpoint | Method | Description
/api/summary | GET | Current KPIs and market share
/api/timeseries | GET | Historical data by metric
/api/timeseries/all | GET | All metrics in one response
/api/operators | GET | All operators with latest stats
/api/operator/{name} | GET | Single operator with full history
/api/mmtcs, /api/mmtc/{name} | GET | Legacy compatibility routes for older consumers
/api/export/csv | GET | Authenticated contributor export; bulk/commercial use requires approval

Interactive docs: /docs (Swagger UI)
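A small client for the preferred routes could be sketched as follows, using only the standard library. The host in `BASE_URL` is a placeholder, the helper names are hypothetical, and any authentication headers are left to the caller because the docs above do not specify how the API key is sent.

```python
import json
from typing import Any, Dict, Optional
from urllib.parse import quote
from urllib.request import Request, urlopen

BASE_URL = "https://example.invalid/api"  # placeholder host, not the real one


def operator_url(name: str, base_url: str = BASE_URL) -> str:
    """Build the /api/operator/{name} URL, percent-encoding the name."""
    return f"{base_url}/operator/{quote(name, safe='')}"


def fetch_json(
    url: str, headers: Optional[Dict[str, str]] = None, timeout: float = 10.0
) -> Any:
    """GET a URL and decode the JSON body; auth headers are caller-supplied."""
    request_headers = {"Accept": "application/json"}
    request_headers.update(headers or {})
    with urlopen(Request(url, headers=request_headers), timeout=timeout) as response:
        return json.load(response)
```

Operator names can contain spaces, ampersands, or slashes, so encoding the path segment with `quote(name, safe='')` avoids accidentally changing the route.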

Data Models

Key database models:

WeeklyReport

Weekly dispensing data: THC, CBD, smokable oz, patient/physician counts per operator.

Operator

Licensed operators with license info, business structure, and parent company.

OperatorLineage

Tracks acquisitions, mergers, and rebrands over time.

DispensaryLocation

Individual dispensary locations with coordinates.
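To illustrate how lineage records can be used, the sketch below resolves an operator's current name by following rebrands and acquisitions in date order. It uses plain dataclasses rather than the project's SQLAlchemy models, and the `LineageEvent` fields and `current_name` function are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class LineageEvent:
    """Hypothetical lineage record: one name change at a point in time."""
    old_name: str
    new_name: str
    effective: date
    kind: str  # e.g. "acquisition", "merger", or "rebrand"


def current_name(name: str, events: List[LineageEvent]) -> str:
    """Follow lineage events forward until no further successor exists."""
    # Later events overwrite earlier ones for the same old_name.
    successor = {
        e.old_name: e.new_name for e in sorted(events, key=lambda e: e.effective)
    }
    seen = set()
    while name in successor and name not in seen:  # guard against cycles
        seen.add(name)
        name = successor[name]
    return name
```

Resolving names this way lets historical weekly reports filed under an old name be grouped with the operator's present-day record.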

State Pipeline Status

Current status of state data pipelines:

Complete: Full data, actively maintained
Partial: Some data, gaps exist
In Progress: Pipeline being built
Planned: On roadmap, seeking lead

State | Status | Lead | Data Range
Florida | Complete | @GJManno | 2019 - Present
Massachusetts | Planned | Seeking | -
Colorado | Planned | Seeking | -
Oregon | Planned | Seeking | -

Want to lead a state? Apply to become a State Lead.