Automating Data Ingestion

In the highly regulated cannabis industry, testing and compliance workflows are a constant challenge. They vary by state and often rely on tedious manual processes. At Kiefa, we created a solution to help cultivators collect, organize, and act on compliance data more efficiently. It reduced errors and brought more structure to a critical part of operations.

Client

Kiefa

Time

7 months

Tags

0 → 1, AI Powered

Impact

$31,000 +
Saved per quarter

1,248 hrs +
Saved per quarter

10,000s
Test results processed quarterly

My Role

As the sole designer and researcher on the team, I conducted on-site contextual inquiries, shadowing facility technicians to understand how they used our tools in the field. During one visit, I held an open feedback session and spoke with an employee whose full-time job was manually extracting data from thousands of printed lab reports. No one on our team knew this job existed, and I knew I had just uncovered something that could drive massive impact.

Problem

After each harvest, cannabis operators send samples to third-party labs for testing. This step is essential for ensuring product safety, supporting marketing claims, and meeting state compliance requirements. Once the lab results arrive, operators must manually extract key data from hundreds of reports every week, and each lab formats its reports differently. The process is slow, error-prone, and labor-intensive, leading to high operational costs and a growing risk of compliance issues.

Examples of the varying lab result formats that operators must parse to find the same information.

Solution

Working alongside engineering from the start, we mapped out a manual data entry MVP to validate the need for structured test result management. This allowed us to start ingesting data and build three features on top of it: visual analytics that highlight trends across harvests, a comparison view that helps marketing differentiate strains, and an API integration that automates compliance uploads. These features proved useful in the interim while we worked out the backend for v2.

In version two, we eliminated the biggest pain point: manual data entry. We built a parsing engine that uses AWS Textract to extract values from PDFs, ChatGPT to interpret structure and isolate key compounds, and Levenshtein distance to match those compounds against our database. This shifted the product from a basic compliance utility to a reliable, insight-driven engine that reduced time and overhead and unlocked strategic value.
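To illustrate the matching step, here is a minimal sketch of how Levenshtein distance can map a compound label read from a lab report onto a canonical name. It is not Kiefa's production code: the compound list, the function names, and the 0.34 cutoff are all hypothetical and chosen only for the example.

```python
from typing import Optional, Sequence

# Hypothetical canonical names; a real system would load these from its database.
KNOWN_COMPOUNDS = ["THCa", "Delta-9 THC", "CBD", "CBDa", "CBG", "Limonene", "Myrcene"]


def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string on the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def match_compound(
    label: str,
    known: Sequence[str] = KNOWN_COMPOUNDS,
    max_ratio: float = 0.34,  # assumed cutoff, not a value from the project
) -> Optional[str]:
    """Return the closest known compound, or None if nothing is close enough."""
    normalized = label.strip().lower()
    best_name: Optional[str] = None
    best_distance = 0
    for name in known:
        distance = levenshtein(normalized, name.lower())
        if best_name is None or distance < best_distance:
            best_name, best_distance = name, distance
    if best_name is None:
        return None
    longest = max(len(normalized), len(best_name))
    # Reject matches whose edit distance is large relative to the label length.
    if longest == 0 or best_distance / longest > max_ratio:
        return None
    return best_name
```

With these assumptions, a misspelled label such as "Limonine" resolves to "Limonene", while an unrelated label such as "Moisture" returns no match. Short names like "CBD" and "CBG" sit only one edit apart, which is one reason a human review step remains useful after automated matching.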

This page houses all COAs (certificates of analysis) that have been uploaded and lets users review the data we’ve extracted.

Once values were automatically ingested, users could double-check for accuracy. To improve discoverability and accessibility, we highlighted extracted values in pink and auto-scrolled the document to align with the selected input field.

This feature allows users to select specific strains to compare and contrast specific test result values.

Under the "Test Results" section, users see the total averages based on all uploaded test results. They can click on a specific value to view changes over time.

Learnings + Next Steps

Start simple to validate assumptions: The manual entry MVP was crucial to building out insights and reporting [Learnings]
Automation is only as good as its data: Matching logic and review flows needed constant tuning [Learnings]
Solving for compliance unlocked value for new personas: Sales and Marketing [Learnings]
Build alerting features to flag anomalies in test results [Next steps]
Further refine parsing with AI model fine-tuning [Next steps]
Explore integrations with inventory and production planning tools [Next steps]