DevCannabis Docs
How DevCannabis works, how to use public API surfaces, and how to request approved contributor/source access.
Overview
DevCannabis is a transparent platform that extracts, standardizes, and publishes cannabis regulatory data from US state agencies. We turn PDFs, messy spreadsheets, and hard-to-access dashboards into clean, accessible datasets.
Tech Stack
FastAPI + SQLAlchemy + Scrapy + pdfplumber. Python-first, no Java dependencies.
Multi-State
Modular architecture. Each state is a self-contained module with its own scrapers and config.
Data Quality
Operator aliasing, lineage tracking, and validation rules ensure accurate, consistent data.
Quick Start
Public users can start with the API and docs. Local source setup is for approved contributors under project terms.
1. Public docs / dashboard path
# Start here without source access
browse public dashboards, methodology, and source notes
submit source-backed corrections or new source suggestions
# API keys and exports are signup-gated
request non-commercial API access, bulk access, or partner permissions
2. Approved contributor source setup
# After maintainer approval/source access
git clone https://github.com/GJManno/devcannabis.git
cd devcannabis/ommu # legacy app path, current implementation directory
# Start with Docker (recommended)
docker-compose -f docker-compose.dev.yml up -d
# Seed the database with all data
docker-compose -f docker-compose.dev.yml exec web python scripts/seed_all.py
3. Access the Dashboard
# Approved local contributor workflow
open http://localhost:8000
# Local API docs are for approved contributors/dev review
open http://localhost:8000/docs
4. Run the Scraper (Approved contributor workflow)
# Full scrape + process pipeline
docker-compose -f docker-compose.dev.yml exec scraper python cli.py --full
# Just download new PDFs
docker-compose -f docker-compose.dev.yml exec scraper python cli.py --scrape
# Check current status
docker-compose -f docker-compose.dev.yml exec scraper python cli.py --status
Developer/source access is granted by request and requires maintainer approval. Public dashboards are free; API/export access requires signup, auth, an API key, or contributor login. Public outputs, issues, source-backed corrections, and methodology review remain available under the controlled public-use terms.
Access Policy
DevCannabis keeps the public site and dashboards free while protecting programmatic access and the core platform. API access, exports, bulk downloads, and automated use require signup/auth, an API key, or contributor login.
Allowed without permission
Non-commercial public-interest use of public dashboards and published notes with attribution to DevCannabis and original regulatory sources.
Written permission required
Bulk export, commercial use, resale/republication, model training, competing products, hosted mirrors, or using DevCannabis as a backend.
Approved contributor access
Developer/source access is reviewed manually. Approved roles include viewer, data_contributor, reviewer, moderator, developer, partner, and admin.
Request statuses should move through submitted, needs_source, under_review, approved, rejected, and published.
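The status list above reads naturally as a small state machine. The sketch below is one possible encoding: the status values come from this page, but the transition map (`ALLOWED`) and the `advance` helper are assumptions for illustration, not the project's actual review workflow.

```python
from enum import Enum


class RequestStatus(str, Enum):
    SUBMITTED = "submitted"
    NEEDS_SOURCE = "needs_source"
    UNDER_REVIEW = "under_review"
    APPROVED = "approved"
    REJECTED = "rejected"
    PUBLISHED = "published"


# Assumed transition map; the real workflow may permit other moves.
ALLOWED = {
    RequestStatus.SUBMITTED: {RequestStatus.NEEDS_SOURCE, RequestStatus.UNDER_REVIEW},
    RequestStatus.NEEDS_SOURCE: {RequestStatus.UNDER_REVIEW, RequestStatus.REJECTED},
    RequestStatus.UNDER_REVIEW: {
        RequestStatus.NEEDS_SOURCE,
        RequestStatus.APPROVED,
        RequestStatus.REJECTED,
    },
    RequestStatus.APPROVED: {RequestStatus.PUBLISHED},
    RequestStatus.REJECTED: set(),   # terminal in this sketch
    RequestStatus.PUBLISHED: set(),  # terminal in this sketch
}


def advance(current: RequestStatus, target: RequestStatus) -> RequestStatus:
    """Return the new status, or raise if the move is not in ALLOWED."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target
```

Treating rejected and published as terminal states keeps the audit trail simple; a resubmission would then be a new request rather than a reopened one.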
API-key infrastructure should track key owner, rate limits, request logs, revoke/disable state, and terms acceptance timestamp before broad export access is enabled.
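As a rough illustration of the tracking fields listed above, an API key could be modeled as a single record. The class name, field names, and the `can_export` rule are hypothetical; only the tracked concepts (owner, rate limit, request log, revoke state, terms acceptance timestamp) come from this page.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class ApiKeyRecord:
    owner: str
    rate_limit_per_minute: int
    terms_accepted_at: Optional[datetime] = None  # None until terms are accepted
    revoked: bool = False
    request_log: List[str] = field(default_factory=list)

    def accept_terms(self) -> None:
        """Record the terms acceptance timestamp in UTC."""
        self.terms_accepted_at = datetime.now(timezone.utc)

    def can_export(self) -> bool:
        """Assumed rule: exports need an active key with accepted terms."""
        return not self.revoked and self.terms_accepted_at is not None
```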
Architecture
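The project is organized into three main layers (the web application, the state modules, and the scraper), plus supporting scripts and data directories: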
ommu/
├── app/ # FastAPI web application
│ ├── main.py # App entry point, routing
│ ├── models/ # SQLAlchemy ORM models
│ ├── api/ # REST API endpoints
│ ├── services/ # Business logic
│ └── templates/ # Jinja2 templates
├── states/ # State-specific modules
│ ├── florida/ # Florida config, operators, scraper settings
│ ├── _template/ # Template for new states
│ └── regulatory_sources.py # Official state URLs reference
├── scraper/ # Scrapy project
│ └── spiders/ # State-specific spiders
├── scripts/ # CLI tools for seeding, migration
└── data/ # SQLite DB, PDFs, CSVs
Data Flow
- Scraper downloads PDFs from state regulatory websites
- Parser (pdfplumber) extracts tables from PDFs
- Normalizer applies operator aliases and validation rules
- Database stores normalized data with lineage tracking
- API/Dashboard serves data to users
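The normalizer step in the flow above can be pictured as a lookup from raw name variations to canonical operator names, with junk rows dropped. This is a minimal sketch under stated assumptions: the operator names, `ALIASES`, `JUNK_NAMES`, and `normalize_operator` are illustrative and not taken from the DevCannabis source.

```python
import re
from typing import Optional

# Hypothetical alias table: lowercase raw variation -> canonical name.
ALIASES = {
    "acme cannabis llc": "Acme Cannabis",
    "acme cannabis": "Acme Cannabis",
    "green leaf inc.": "Green Leaf",
}

# Hypothetical junk values that can show up as rows in extracted tables.
JUNK_NAMES = {"total", "grand total", ""}


def normalize_operator(raw: str) -> Optional[str]:
    """Return the canonical operator name, or None for junk rows."""
    cleaned = re.sub(r"\s+", " ", raw).strip()  # collapse runs of whitespace
    key = cleaned.lower()
    if key in JUNK_NAMES:
        return None
    # Fall back to the cleaned raw name when no alias is registered.
    return ALIASES.get(key, cleaned)
```

Returning `None` for junk rows lets the caller filter them out before anything is written to the database.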
Contributor Pathways
Pick the narrowest useful path. Code/source access requires approval; source-backed public corrections are welcome.
Submit Sources & Corrections
Best for official source links, provenance notes, data discrepancies, and evidence-backed corrections.
- Submit verified regulator source URLs
- Flag parser or data-quality issues
- Submit data corrections
- Improve public methodology notes
- Cite primary sources where practical
Build, Review, or Partner
For developers, moderators/reviewers, and research partners who need deeper collaboration.
- Request source access for adapters, tests, API, or dashboard work
- Apply as State Lead or developer contributor
- Review submissions as moderator/reviewer
- Request research/partner collaboration
- Follow licensing and contribution terms
Use the contributor path to submit sources/corrections or request approved developer, reviewer, or partner access.
Add a New State
Adding a new state involves four main steps:
1. Create State Module
# Copy the template
cp -r states/_template states/massachusetts
cd states/massachusetts
2. Configure the State
Edit the following files in your new state folder:
| File | Purpose |
|---|---|
| operators.py | Define operators, aliases (name variations), junk names to filter |
| regulatory.py | State info, license types, regulatory timeline |
| config.py | Data source URLs, scraper settings, PDF parsing coordinates |
| seed.py | Database seeding functions |
3. Register the State
# In states/__init__.py
from . import massachusetts
AVAILABLE_STATES = {
"FL": florida,
"MA": massachusetts, # Add your state
}
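Once a state is registered, other code can resolve it by its two-letter code. The sketch below uses stand-in objects in place of the real state modules so that it runs on its own; `get_state` and the `NAME` attribute are hypothetical helpers, not part of the documented package.

```python
from types import SimpleNamespace

# Stand-ins for the real modules (states/florida, states/massachusetts).
florida = SimpleNamespace(NAME="Florida")
massachusetts = SimpleNamespace(NAME="Massachusetts")

AVAILABLE_STATES = {
    "FL": florida,
    "MA": massachusetts,
}


def get_state(code: str):
    """Look up a registered state module by code, case-insensitively."""
    key = code.strip().upper()
    if key not in AVAILABLE_STATES:
        raise KeyError(f"state {code!r} is not registered")
    return AVAILABLE_STATES[key]
```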
4. Build the Spider
Add a Scrapy spider in scraper/spiders/ that downloads data from your state's regulatory portal.
Canonical sources must come from verified public government or regulatory organizations. Check /api/sources/official and data/sources/official_regulatory_sources.json before adding a pipeline.
Data Sources Reference
We maintain a canonical registry of verified public government/regulatory portals. Private or unverified links may be useful as non-canonical references, but they are not source-of-truth inputs.
| State | Regulator | Format | Status |
|---|---|---|---|
| Florida | OMMU | | Live |
| Massachusetts | Cannabis Control Commission | CSV | Planned |
| California | Dept of Cannabis Control | HTML | Planned |
| Colorado | MED | CSV | Planned |
| Michigan | CRA | ArcGIS | Research |
Canonical registry API: /api/sources/official. Registry data: data/sources/official_regulatory_sources.json.
Where feasible, every source-backed claim should expose source URL, date fetched, last verified date, confidence, reviewer, correction history, and a challenge/correction link.
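One way to carry that provenance alongside each claim is a small record type. The field names below mirror the list above, but the class itself, the 0.0 to 1.0 confidence scale, and the 90-day re-verification window are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class SourceClaim:
    source_url: str
    date_fetched: date
    last_verified: date
    confidence: float  # assumed scale: 0.0 (unverified) to 1.0 (confirmed)
    reviewer: Optional[str] = None
    correction_history: List[str] = field(default_factory=list)
    challenge_url: Optional[str] = None


def needs_reverification(claim: SourceClaim, today: date, max_age_days: int = 90) -> bool:
    """True when the last verification is older than max_age_days."""
    return (today - claim.last_verified).days > max_age_days
```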
API Reference
Public summary endpoints return JSON. Base URL: /api. API/export access for regular programmatic use requires signup/auth/API key or contributor login.
Response note: newer payloads may expose operator-oriented fields such as operator_count and licenses_per_operator alongside backward-compatible legacy aliases.
Route note: prefer /api/operators and /api/operator/{name} in new consumers. Legacy /api/mmtc* paths remain available as compatibility routes during the transition.
| Endpoint | Method | Description |
|---|---|---|
| /api/summary | GET | Current KPIs and market share |
| /api/timeseries | GET | Historical data by metric |
| /api/timeseries/all | GET | All metrics in one response |
| /api/operators | GET | All operators with latest stats |
| /api/operator/{name} | GET | Single operator with full history |
| /api/mmtcs, /api/mmtc/{name} | GET | Legacy compatibility routes for older consumers |
| /api/export/csv | GET | Authenticated contributor export; bulk/commercial use requires approval |
Interactive docs: /docs (Swagger UI)
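For consumers written in Python, the endpoints above can be reached with the standard library alone. The sketch assumes the local contributor host from the Quick Start as the base URL and a bearer-token `Authorization` header for API keys; the real header name and auth scheme are not specified on this page, and the helper names are hypothetical.

```python
import json
from typing import Any, Optional
from urllib.parse import quote, urljoin
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:8000"  # local contributor host from the Quick Start


def api_url(path: str) -> str:
    """Join an API path such as /api/summary onto the base URL."""
    return urljoin(BASE_URL, path)


def operator_url(name: str) -> str:
    """Build /api/operator/{name}, percent-encoding spaces and slashes."""
    return api_url("/api/operator/" + quote(name, safe=""))


def fetch_json(url: str, api_key: Optional[str] = None) -> Any:
    """GET a URL and decode the JSON body."""
    headers = {"Accept": "application/json"}
    if api_key:
        # Assumption: bearer-token auth; the real header may differ.
        headers["Authorization"] = f"Bearer {api_key}"
    with urlopen(Request(url, headers=headers), timeout=10) as resp:
        return json.load(resp)
```

Against a running local instance, `fetch_json(api_url("/api/summary"))` would return the decoded summary payload.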
Data Models
Key database models:
WeeklyReport
Weekly dispensing data: THC, CBD, smokable oz, patient/physician counts per operator.
Operator
Licensed operators with license info, business structure, and parent company.
OperatorLineage
Tracks acquisitions, mergers, and rebrands over time.
DispensaryLocation
Individual dispensary locations with coordinates.
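To illustrate what OperatorLineage enables, the sketch below follows a chain of rebrands or acquisitions from a historical name to the current one. The operator names and the flat former-to-successor mapping are hypothetical; the actual model may store dates, event types, and more.

```python
from typing import Dict

# Hypothetical lineage data: former name -> successor name.
LINEAGE: Dict[str, str] = {
    "Old Leaf": "New Leaf",
    "New Leaf": "Acme Cannabis",
}


def current_name(name: str, lineage: Dict[str, str]) -> str:
    """Follow successor links until reaching a name with no successor."""
    seen = set()
    while name in lineage:
        if name in seen:
            raise ValueError(f"lineage cycle detected at {name!r}")
        seen.add(name)
        name = lineage[name]
    return name
```

Resolving names this way lets historical weekly reports filed under a former name roll up to the present-day operator.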
State Pipeline Status
Current status of state data pipelines:
| State | Status | Lead | Data Range |
|---|---|---|---|
| Florida | Complete | @GJManno | 2019 - Present |
| Massachusetts | Planned | Seeking | - |
| Colorado | Planned | Seeking | - |
| Oregon | Planned | Seeking | - |
Want to lead a state? Apply to become a State Lead.