EXTRACT STATEMENTS.
VISUALIZE RELATIONSHIPS.

Transform unstructured text into structured statements using our T5-Gemma 2 model. Identify subjects, objects, predicates, and entity types automatically.

Powered by corp-o-rate

Documentation

# Installation
pip install transformers torch

# Usage
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
import torch

# Load model and tokenizer
model = AutoModelForSeq2SeqLM.from_pretrained(
    "Corp-o-Rate-Community/statement-extractor",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "Corp-o-Rate-Community/statement-extractor",
    trust_remote_code=True,
)

# Move to GPU if available
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

def extract_statements(text: str) -> str:
    """Extract statements from text."""
    # Wrap text in page tags
    inputs = tokenizer(
        f"<page>{text}</page>",
        return_tensors="pt",
        max_length=4096,
        truncation=True
    ).to(device)

    # Generate
    outputs = model.generate(
        **inputs,
        max_new_tokens=2048,
        num_beams=4,
        do_sample=False,
    )

    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Example usage
text = "Apple Inc. announced a commitment to carbon neutrality by 2030."
result = extract_statements(text)
print(result)

For AI Assistants

AI Assistant Prompt

Copy this prompt for Claude Code, Cursor, or other AI assistants

## Using the Statement Extractor Model

When extracting statements from text, use the corp-o-rate statement extraction model:

**Model:** `Corp-o-Rate-Community/statement-extractor`
**Architecture:** T5-Gemma 2 (seq2seq, 540M params)

### Input Format
Wrap your text in `<page>` tags:
```
<page>Your text here...</page>
```

### Output Format
The model outputs XML with extracted statements:
```xml
<statements>
  <stmt>
    <subject type="ENTITY_TYPE">Subject Name</subject>
    <object type="ENTITY_TYPE">Object Name</object>
    <predicate>action/relationship</predicate>
    <text>Full resolved statement text</text>
  </stmt>
</statements>
```
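
The XML above can be turned into plain Python data with the standard library. The following is a minimal sketch, not part of the model's own tooling: `parse_statements` is a hypothetical helper name, the sample string is hand-written to match the documented format rather than real model output, and the code assumes the model returns well-formed XML (it returns an empty list otherwise). Type hints assume Python 3.9 or later.

```python
import xml.etree.ElementTree as ET


def _entity(stmt: ET.Element, tag: str) -> tuple[str, str]:
    """Return (name, type) for a <subject> or <object> child, or empty strings."""
    node = stmt.find(tag)
    if node is None:
        return "", ""
    return (node.text or "").strip(), node.get("type", "")


def parse_statements(xml_text: str) -> list[dict[str, str]]:
    """Parse <statements> XML into a list of dicts, one per <stmt>."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        # Malformed or truncated output: return nothing rather than guess.
        return []

    rows = []
    for stmt in root.iter("stmt"):
        subject, subject_type = _entity(stmt, "subject")
        obj, object_type = _entity(stmt, "object")
        rows.append({
            "subject": subject,
            "subject_type": subject_type,
            "object": obj,
            "object_type": object_type,
            "predicate": stmt.findtext("predicate", default="").strip(),
            "text": stmt.findtext("text", default="").strip(),
        })
    return rows


# Hand-written sample in the documented format (not real model output).
sample = """<statements>
  <stmt>
    <subject type="ORG">Apple Inc.</subject>
    <object type="EVENT">carbon neutrality by 2030</object>
    <predicate>committed to</predicate>
    <text>Apple Inc. committed to carbon neutrality by 2030.</text>
  </stmt>
</statements>"""

for row in parse_statements(sample):
    print(row["subject"], "|", row["predicate"], "|", row["object"])
```

Returning an empty list on a parse error keeps callers simple; a stricter pipeline might prefer to log or re-raise instead.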

### Python Example
```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
import torch

model = AutoModelForSeq2SeqLM.from_pretrained(
    "Corp-o-Rate-Community/statement-extractor",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "Corp-o-Rate-Community/statement-extractor",
    trust_remote_code=True,
)

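# Move to GPU if available
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

def extract_statements(text: str) -> str:
    """Wrap text in <page> tags, generate, and decode the XML output."""
    inputs = tokenizer(
        f"<page>{text}</page>",
        return_tensors="pt",
        max_length=4096,
        truncation=True,
    ).to(device)  # keep inputs on the same device as the model
    outputs = model.generate(**inputs, max_new_tokens=2048, num_beams=4, do_sample=False)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)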
```

### Entity Types
- ORG - Organizations (companies, agencies)
- PERSON - People (names, titles)
- GPE - Geopolitical entities (countries, cities)
- LOC - Locations (mountains, rivers)
- PRODUCT - Products (devices, services)
- EVENT - Events (announcements, meetings)
- WORK_OF_ART - Creative works (reports, books)
- LAW - Legal documents
- DATE - Dates and time periods
- MONEY - Monetary values
- PERCENT - Percentages
- QUANTITY - Quantities and measurements
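
When post-processing output, it may help to check the `type` attributes against the list above. This is a small sketch under the assumption that the twelve labels listed here are the complete set; `ENTITY_TYPES` and `unknown_entity_types` are illustrative names, not part of the model or any library.

```python
# The twelve entity type labels documented above.
ENTITY_TYPES = frozenset({
    "ORG", "PERSON", "GPE", "LOC", "PRODUCT", "EVENT",
    "WORK_OF_ART", "LAW", "DATE", "MONEY", "PERCENT", "QUANTITY",
})


def unknown_entity_types(types: list[str]) -> list[str]:
    """Return labels that are not in the documented list, sorted and de-duplicated."""
    return sorted(set(types) - ENTITY_TYPES)


# Labels are compared case-sensitively, so "org" would be flagged.
print(unknown_entity_types(["ORG", "PERSON", "COMPANY"]))
```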

### Tips for Best Results
1. Provide clear, well-structured text (news articles, reports work well)
2. The model handles coreference resolution (replaces pronouns with entity names)
3. Each statement includes the full resolved text for context
4. For large documents, consider chunking by paragraph or section

Add this prompt to your project's CLAUDE.md or other AI assistant configuration.
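
Tip 4 suggests chunking large documents by paragraph or section. One way this could look is sketched below: paragraphs are split on blank lines and packed greedily into chunks. The helper name `chunk_by_paragraph` and the `max_chars` budget are assumptions for illustration; the model's limit is expressed in tokens (`max_length=4096`), so a character budget is only a rough proxy.

```python
import re


def chunk_by_paragraph(text: str, max_chars: int = 4000) -> list[str]:
    """Split text on blank lines and pack paragraphs into chunks of at most max_chars.

    A single paragraph longer than max_chars is kept whole in its own chunk.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            # Adding this paragraph would exceed the budget: close the chunk.
            chunks.append(current)
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be passed to `extract_statements` separately. Coreferences that cross a chunk boundary may not resolve, so splitting at section boundaries is likely safer than splitting mid-section.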

T5-Gemma 2 Statement Extractor

This model is based on Google's T5-Gemma 2 architecture (540M parameters) and has been fine-tuned on 77,515 examples of statement extraction from corporate and news documents.

Key capabilities:

  • Extract subject-predicate-object triples from text
  • Identify entity types (ORG, PERSON, GPE, EVENT, etc.)
  • Resolve coreferences (pronouns → entity names)
  • Generate full resolved statement text
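
Subject-predicate-object triples map naturally onto a directed graph, with entities as nodes and predicates as edge labels. The sketch below shows one simple representation, an adjacency dictionary; `build_graph` is a hypothetical helper, the triples are hand-written examples, and this is not necessarily how the relationship graph on this page is built.

```python
from collections import defaultdict


def build_graph(triples: list[tuple[str, str, str]]) -> dict[str, list[tuple[str, str]]]:
    """Map each subject to a list of (predicate, object) edges."""
    graph: defaultdict[str, list[tuple[str, str]]] = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return dict(graph)


# Hand-written triples for illustration (not real model output).
triples = [
    ("Apple Inc.", "committed to", "carbon neutrality by 2030"),
    ("Apple Inc.", "announced", "a commitment"),
]
graph = build_graph(triples)
for subject, edges in graph.items():
    for predicate, obj in edges:
        print(f"{subject} --[{predicate}]--> {obj}")
```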

Training details: Final eval loss of 0.209. Outputs are generated with beam search (num_beams=4) for high-quality results.