📊 Dataset Profile

Content Category • Version v1

Data collections and scientific datasets

✅ Best Practices

  • Use schema:Dataset with a descriptive name, a description, and distribution information.
  • Include dataset format, size, and access information.
  • Add license and usage rights when available.
  • Use structured metadata for better data discovery.
  • Include data quality and provenance information.
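
As a rough illustration of the practices above, the sketch below adds distribution, license, and provenance details to a Dataset using standard Schema.org properties (distribution, DataDownload, encodingFormat, contentUrl, contentSize, license, creator, isBasedOn, version). The dataset name, organization, and every example.org URL are placeholders invented for illustration. It also assumes the profile context resolves these Schema.org terms, which this page does not confirm; with plain Schema.org markup the context would be https://schema.org.

```json
{
  "@context": "https://llmprofiles.org/profiles/content/dataset/v1/",
  "@type": "Dataset",
  "name": "Example Rainfall Observations",
  "description": "Placeholder daily rainfall totals from a hypothetical station network.",
  "version": "1.0.0",
  "license": "https://creativecommons.org/licenses/by/4.0/",
  "creator": {
    "@type": "Organization",
    "name": "Example Research Group"
  },
  "isBasedOn": "https://example.org/raw-station-logs",
  "distribution": {
    "@type": "DataDownload",
    "encodingFormat": "text/csv",
    "contentUrl": "https://example.org/data/rainfall.csv",
    "contentSize": "12 MB"
  }
}
```

To publish it on a page, the object would go inside a script element of type application/ld+json, as in the Basic Implementation on this page.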

❌ Avoid These

  • Do not use for general content or individual data points.
  • Do not use for articles or blog posts about data.
  • Do not use for software applications or tools.
  • Do not use for live data feeds or real-time data.

📄 Profile Definition

JSON-LD profile definition with all properties and constraints

JSON-LD ~2KB

🔧 Page Schema

JSON Schema for validating page markup and on-page structured data

JSON Schema ~3KB

📊 Output Schema

JSON Schema for RAG/ML output validation and data extraction

JSON Schema ~4KB

🎓 Training Data

Sample training data in JSONL format for fine-tuning LLMs

JSONL ~5KB

Implementation Examples

Learn how to implement this profile with real-world examples:

Minimal Example

Basic implementation with required properties only


Rich Example

Full-featured implementation with all optional properties


Basic Implementation

<script type="application/ld+json">
{
  "@context": "https://llmprofiles.org/profiles/content/dataset/v1/",
  "@type": "Dataset",
  "name": "Example Dataset",
  "description": "This is an example Dataset implementation."
}
</script>

Schema Information

This profile is based on Schema.org Dataset and extends it with LLM-specific properties and constraints.

Key Properties:

  • @context: Profile context URL
  • @type: Dataset
  • name: Required name/title
  • description: Required description
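
The four key properties can be checked mechanically. The fragment below is a minimal JSON Schema sketch (draft 2020-12) that requires them and pins @type to Dataset. It is not the published Page Schema for this profile, only an illustration of the constraints listed above. The minLength rules are an added assumption about what "required" should mean for text values.

```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "Dataset key properties (illustrative sketch)",
  "type": "object",
  "required": ["@context", "@type", "name", "description"],
  "properties": {
    "@context": { "type": "string" },
    "@type": { "const": "Dataset" },
    "name": { "type": "string", "minLength": 1 },
    "description": { "type": "string", "minLength": 1 }
  }
}
```

The Basic Implementation object shown earlier satisfies this sketch. An object missing description, or with an @type other than Dataset, would be rejected by any validator that supports draft 2020-12.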

Validation Tools

Use these tools to validate your implementation: