

Unlocking Data with AI: Why Prompting Frameworks Matter

As public servants and the private sector seek to integrate AI systems, prompting frameworks can offer a foundation for consistent, effective interaction.
29 Jul 2025
Written by Priscilla Kang

As public servants and the private sector navigate ways to integrate Artificial Intelligence (AI) systems into their operations, prompting frameworks can provide a foundation for interacting with AI in consistent, effective ways. The Trump Administration is interested in accelerating widespread use of AI in the federal government, as seen in its July 2025 AI Action Plan. A key element of the plan is ensuring that agencies are equipped not only with state-of-the-art tools, but also with the workforce training needed to deploy those tools effectively.

AI offers an opportunity to reimagine how governments interact with their own data, not only by maintaining quality and integrity, but by actively upgrading legacy information systems into usable formats that support public decision-making. As AI becomes increasingly embedded in the public sector, the role of data policy must evolve from guardian of compliance to enabler of capability.

AI and AI-powered tools can minimize the tedious work of transcribing data into a malleable format that humans and machines can use to inform decision-making. AI is increasingly used to build more expansive and efficient data systems by “unlocking data,” but its effectiveness in these tasks hinges on one critical and often overlooked component: how we prompt it.

Prompting AI describes the way people ask AI tools to perform a task, typically by typing out requests or queries. Human intuition about how to phrase instructions, or design prompts, does not always align with how models interpret them. Research has demonstrated that prompt design has a significant impact on the output quality of AI tools: poorly structured or vague prompts frequently produce inconsistent or incomplete results. The practice of crafting effective prompts that elicit the desired output is known as prompt engineering, and prompt frameworks can help the average user understand and craft cohesive prompts for specific tasks.

Prompt engineering is important for many operations that rely on AI tools, like unlocking data. Unlocking data refers to extracting information buried in PDFs, scans, legal documents, or other unstructured formats(1) and converting it into machine-readable, accessible formats. AI unlocks data by automatically identifying key fields, extracting the right information, and organizing it into a structured format – freeing people to focus on analysis rather than manually transferring data fields from a PDF to a digital spreadsheet.
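To make the idea concrete, here is a minimal sketch, in Python, of the kind of field extraction an AI system automates. The document text, field names, and patterns are illustrative assumptions, not a real permit schema; a rules-based version like this is what an LLM generalizes beyond when documents vary too much for fixed patterns.

```python
import re
import json

# Hypothetical example: raw text as it might come out of a scanned
# permit PDF (field names and layout are illustrative assumptions).
raw_text = """
Permit Number: BP-2021-1024
Issue Date: 2021-09-30
Applicant: John Doe Construction
"""

# Map each target field to a simple pattern that captures its value.
FIELD_PATTERNS = {
    "permit_number": r"Permit Number:\s*(\S+)",
    "issue_date": r"Issue Date:\s*([\d-]+)",
    "applicant": r"Applicant:\s*(.+)",
}

def extract_fields(text: str) -> dict:
    """Pull known fields out of unstructured text; blank when missing."""
    record = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = re.search(pattern, text)
        record[field] = match.group(1).strip() if match else ""
    return record

record = extract_fields(raw_text)
print(json.dumps(record, indent=2))
```

Leaving missing fields blank rather than guessing mirrors the anti-hallucination instructions discussed later in this piece.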

However, Large Language Models (LLMs), like OpenAI’s ChatGPT or Anthropic’s Claude, can produce fabricated data, often referred to as “hallucinations,” and the risk intensifies when prompts are ambiguous or the task context is unclear. A poorly constructed prompt for unlocking data can yield data pulls with incorrect values, omitted data, or misaligned field labels. Such errors are easier to spot in small datasets, but because LLMs process huge amounts of data, bad prompts can create errors on a much bigger scale. If flawed outputs are reused or reinforced without correction, for example by feeding them into downstream systems or later training data, those patterns can come to be treated as valid. Poor prompting can thus initiate a negative feedback loop in which the AI tool repeats its own mistakes, making future outputs even less reliable. Proper prompts and reliable quality checks to minimize the risk of hallucinations are, therefore, important.

A prompting framework can also help reduce human error in human-computer interaction. Prompting frameworks prevent the AI from making harmful assumptions that could lead to undesirable patterns, while ensuring the generation of desired outputs. By crafting effective prompts, organizations can enhance customer experiences, streamline processes, and make data-driven decisions with greater precision.

By giving AI systems consistent, structured instructions, users not only improve the immediate output but also reinforce the patterns the model follows within a session. Over time, consistent prompting can improve the quality of outputs and reinforce norms and expectations that enhance reliability across tasks.

This iterative process, in which humans refine inputs and observe outputs and the AI adjusts to that feedback, forms the foundation for more responsive and intelligent AI behavior. For tasks with clearly defined right and wrong answers, like unlocking data, this process is especially important.


Why Unlocking Data Isn’t Plug-and-Play

Imagine a city government trying to digitize years of building permit data stored in PDFs. In an ideal world, an official could simply feed all the PDFs into an AI model and have a perfectly organized spreadsheet pop out. While this is not a very distant dream, the AI model needs more information than just the PDFs to understand what the desired spreadsheet should look like and which information to focus on. It needs context.

A good prompt spells out what the final output should look like, which information to pull and which to ignore, and how the AI model should handle unclear or missing data. With that in place, a city official could digitize a single building permit into a spreadsheet by feeding both a prompt and an example of a finished product into an LLM to “teach” the model how they want the output to look. The result: a more reliable tool for unlocking data from similar PDFs.
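This "teach by example" pattern, often called few-shot prompting, can be sketched as plain string assembly before the text is sent to any LLM. All of the instruction and example text below is illustrative, and no particular model API is assumed.

```python
# A minimal sketch of assembling a few-shot prompt: an instruction,
# one worked example ("teaching" the model the target format), and
# the new document to process. All text here is illustrative.
INSTRUCTION = (
    "Extract the permit number, issue date, and applicant from the "
    "permit below. Return one line of comma-separated values. "
    "Leave a field blank if it is missing; do not guess."
)

EXAMPLE_INPUT = "Permit No. BP-2021-1024 issued 2021-09-30 to John Doe Construction"
EXAMPLE_OUTPUT = "BP-2021-1024,2021-09-30,John Doe Construction"

def build_prompt(new_document: str) -> str:
    """Combine the instruction, a worked example, and the new input."""
    return "\n\n".join([
        INSTRUCTION,
        f"Example input:\n{EXAMPLE_INPUT}",
        f"Example output:\n{EXAMPLE_OUTPUT}",
        f"Now process this document:\n{new_document}",
    ])

prompt = build_prompt(
    "Permit No. BP-2022-0007 issued 2022-03-15 to Acme Builders"
)
print(prompt)
```

Keeping the instruction and example in reusable constants, rather than retyping them per request, is one small step toward the prompt hygiene and versioning discussed below.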

Elements of a prompting framework, applicable to beginner, intermediate, and advanced users as noted:

  1. Clear Objective Definition (intermediate and advanced): Define what the user is trying to achieve to help focus its goals. 
    • Is it extracting data into a table? Mapping to a specific taxonomy? Identifying discrepancies? 
  2. Contextual Metadata (beginner through advanced; metadata should already be included in documents): Include information like document type, date range, regulatory context, and the data's intended use. Metadata aids in resolving semantic ambiguity and supports interoperability.
  3. Output Format Requirements (beginner through advanced): Specify the desired structure (CSV, JSON, XML), taxonomy (XBRL, FHIR), and any field-level formatting expectations (dates, currency, etc.).
  4. Semantic Modeling Guidance (intermediate and advanced): Provide instructions aligned with a semantic model – not just what to extract, but what it means. For example, “date issued” and “date filed” should be recognized as distinct.
  5. Verified Assumptions and Constraints (intermediate and advanced): Preemptively include known truths or boundaries the model should verify. This creates a kind of “cognitive scaffolding” that functions as a lightweight internal verifier. (“All invoice totals should equal the sum of their line items.”)
  6. Feedback Loop with Cognitive Verification (advanced): AI should be prompted to flag uncertainties, contradictions, or potential anomalies in the data it extracts to become a collaborator in quality assurance.
  7. Examples and Templates (beginner through advanced): Showing a few sample documents alongside correct outputs can drastically improve accuracy by anchoring model behavior.
  8. Chat History and Maintenance (beginner through advanced): To maintain efficiency, dedicate each chat to a single task so that the AI model has the previous instructions and framework to refer to.
  9. Prompt Hygiene and Versioning (intermediate and advanced): Prompts should be reusable and documented so teams can learn from past errors and gradually improve prompt performance over time.

An example prompt for the city building permit data; each number corresponds to the framework element listed above:

  1. Extract structured data from the attached building permit PDF and convert it into a list.
  2. This building permit PDF is from 2021 and has the relevant metadata already included within the document.
  3. Provide output in a table that can be copy-pasted into a spreadsheet, where each key corresponds to a field in the permit. Use only the information provided in the attached PDF. Do not infer or fabricate values. Dates should be formatted as YYYY-MM-DD.
  4. Treat “issue date” and “expiration date” as separate, non-interchangeable fields.
  5. All expiration dates should be exactly one year from the issue date.
  6. If a value is missing or illegible, leave the corresponding field blank. “Zoning code” must match the terminology used in the document verbatim. Validate that the permit number follows the format BP-YYYY-####. If not, flag it. If any fields are ambiguous or conflicting, flag them in the output using “flags”.
  7. Example Reference Output (for another permit):
     Permit Number:       BP-2021-1024
     Issue Date:          2021-09-30
     Expiration Date:     2022-09-30
     Applicant Name:      John Doe Construction
     Property Address:    123 Maple St.
     Zoning Code:         R-3
     Project Description: New single-family home construction
     Estimated Cost:      $350,000
     Permit Status:       Approved
  8. This chat will be used exclusively for extracting and structuring building permit data. Please retain formatting consistency and follow the schema and instructions for all future documents in this session.
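The validation rule in line 6 of the prompt, checking that permit numbers follow BP-YYYY-####, is also easy to enforce outside the model. A short sketch, assuming "YYYY" and "####" each mean exactly four digits:

```python
import re

# A minimal sketch of the flagging rule in line 6: validate the
# BP-YYYY-#### permit-number format in code and flag mismatches,
# rather than relying on the model alone.
PERMIT_PATTERN = re.compile(r"BP-\d{4}-\d{4}")

def flag_permit_number(value: str) -> list:
    """Return a list of flags (empty when the value is well-formed)."""
    if PERMIT_PATTERN.fullmatch(value):
        return []
    return [f"permit number '{value}' does not match BP-YYYY-####"]

print(flag_permit_number("BP-2021-1024"))  # []  (well-formed)
print(flag_permit_number("2021-1024"))     # one flag
```

Running a check like this over every extracted record catches format drift that would be tedious to spot by eye across thousands of permits.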

Reference

  1. Unstructured data formats are types of information that lack a predefined data model, like images or text documents, making them difficult for machines to easily interpret or analyze.

DATA FOUNDATION