NLP MODEL DEMO
EXTRACT STATEMENTS.
VISUALIZE RELATIONSHIPS.
Transform unstructured text into structured statements using our T5-Gemma 2 model. Identify subjects, objects, predicates, and entity types automatically.
Documentation
# Installation
pip install transformers torch
# Usage
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
import torch
# Load model and tokenizer
model = AutoModelForSeq2SeqLM.from_pretrained(
    "Corp-o-Rate-Community/statement-extractor",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "Corp-o-Rate-Community/statement-extractor",
    trust_remote_code=True,
)
# Move to GPU if available
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
def extract_statements(text: str) -> str:
    """Extract statements from text."""
    # Wrap text in page tags
    inputs = tokenizer(
        f"<page>{text}</page>",
        return_tensors="pt",
        max_length=4096,
        truncation=True,
    ).to(device)
    # Generate
    outputs = model.generate(
        **inputs,
        max_new_tokens=2048,
        num_beams=4,
        do_sample=False,
    )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
# Example usage
text = "Apple Inc. announced a commitment to carbon neutrality by 2030."
result = extract_statements(text)
print(result)
For AI Assistants
AI Assistant Prompt
Copy this prompt for Claude Code, Cursor, or other AI assistants
## Using the Statement Extractor Model
When extracting statements from text, use the corp-o-rate statement extraction model:
**Model:** `Corp-o-Rate-Community/statement-extractor`
**Architecture:** T5-Gemma 2 (seq2seq, 540M params)
### Input Format
Wrap your text in `<page>` tags:
```
<page>Your text here...</page>
```
### Output Format
The model outputs XML with extracted statements:
```xml
<statements>
<stmt>
<subject type="ENTITY_TYPE">Subject Name</subject>
<object type="ENTITY_TYPE">Object Name</object>
<predicate>action/relationship</predicate>
<text>Full resolved statement text</text>
</stmt>
</statements>
```
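To work with this output in code, the XML can be turned into plain dictionaries. The sketch below uses Python's standard `xml.etree.ElementTree` and assumes the decoded output is well-formed XML with the structure shown above. The helper name `parse_statements` and the sample XML are illustrative and are not part of the model's package or real model output.

```python
import xml.etree.ElementTree as ET


def parse_statements(xml_text: str) -> list[dict]:
    """Parse <statements> XML into a list of dictionaries (hypothetical helper)."""
    # ET.fromstring raises xml.etree.ElementTree.ParseError on malformed XML.
    root = ET.fromstring(xml_text)
    statements = []
    for stmt in root.iter("stmt"):
        subject = stmt.find("subject")
        obj = stmt.find("object")
        statements.append({
            "subject": subject.text if subject is not None else None,
            "subject_type": subject.get("type") if subject is not None else None,
            "object": obj.text if obj is not None else None,
            "object_type": obj.get("type") if obj is not None else None,
            "predicate": stmt.findtext("predicate"),
            "text": stmt.findtext("text"),
        })
    return statements


# Hand-written sample in the documented format, for illustration only.
sample = """<statements>
<stmt>
<subject type="ORG">Apple Inc.</subject>
<object type="EVENT">carbon neutrality by 2030</object>
<predicate>committed to</predicate>
<text>Apple Inc. committed to carbon neutrality by 2030.</text>
</stmt>
</statements>"""

for statement in parse_statements(sample):
    print(statement["subject"], "|", statement["predicate"], "|", statement["object"])
```

If the model ever emits truncated or malformed XML, wrapping the call in a `try`/`except ET.ParseError` block is a reasonable safeguard.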
### Python Example
```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
import torch
model = AutoModelForSeq2SeqLM.from_pretrained(
"Corp-o-Rate-Community/statement-extractor",
torch_dtype=torch.bfloat16,
trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
"Corp-o-Rate-Community/statement-extractor",
trust_remote_code=True,
)
def extract_statements(text: str) -> str:
    inputs = tokenizer(f"<page>{text}</page>", return_tensors="pt", max_length=4096, truncation=True)
    outputs = model.generate(**inputs, max_new_tokens=2048, num_beams=4)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```
### Entity Types
- ORG - Organizations (companies, agencies)
- PERSON - People (names, titles)
- GPE - Geopolitical entities (countries, cities)
- LOC - Locations (mountains, rivers)
- PRODUCT - Products (devices, services)
- EVENT - Events (announcements, meetings)
- WORK_OF_ART - Creative works (reports, books)
- LAW - Legal documents
- DATE - Dates and time periods
- MONEY - Monetary values
- PERCENT - Percentages
- QUANTITY - Quantities and measurements
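When consuming the `type` attribute downstream, it can help to check labels against the list above. This is a minimal sketch; the constant `ENTITY_TYPES` and the helper `is_known_entity_type` are hypothetical names, and the set assumes the list above is exhaustive, which the model may not guarantee.

```python
# Entity type labels copied from the list above (assumed to be complete).
ENTITY_TYPES = {
    "ORG", "PERSON", "GPE", "LOC", "PRODUCT", "EVENT",
    "WORK_OF_ART", "LAW", "DATE", "MONEY", "PERCENT", "QUANTITY",
}


def is_known_entity_type(label: str) -> bool:
    """Return True if the label matches one of the documented entity types."""
    return label.strip().upper() in ENTITY_TYPES


print(is_known_entity_type("ORG"))
print(is_known_entity_type("ANIMAL"))
```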
### Tips for Best Results
1. Provide clear, well-structured text (news articles, reports work well)
2. The model handles coreference resolution (replaces pronouns with entity names)
3. Each statement includes the full resolved text for context
4. For large documents, consider chunking by paragraph or section
Add this to your project's CLAUDE.md or AI assistant configuration
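The paragraph-chunking tip in the prompt can be sketched as follows. This is one possible approach, not the authors' method: it assumes paragraphs are separated by blank lines and uses a character budget (`max_chars`, a hypothetical parameter) as a rough stand-in for the model's token limit. A single paragraph longer than the budget is kept whole rather than split.

```python
def chunk_by_paragraph(document: str, max_chars: int = 4000) -> list[str]:
    """Group consecutive paragraphs into chunks of at most max_chars characters."""
    paragraphs = [p.strip() for p in document.split("\n\n") if p.strip()]
    chunks = []
    current = ""
    for paragraph in paragraphs:
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if current and len(candidate) > max_chars:
            # Adding this paragraph would exceed the budget: close the chunk.
            chunks.append(current)
            current = paragraph
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


document = "First paragraph.\n\nSecond paragraph.\n\nThird paragraph."
for chunk in chunk_by_paragraph(document, max_chars=40):
    print(repr(chunk))
```

Each chunk can then be passed to `extract_statements` separately. Note that coreferences spanning a chunk boundary may not be resolved when chunks are processed independently.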
ABOUT THE MODEL
T5-Gemma 2 Statement Extractor
This model is based on Google's T5-Gemma 2 architecture (540M parameters) and has been fine-tuned on 77,515 examples of statement extraction from corporate and news documents.
Key capabilities:
- Extract subject-predicate-object triples from text
- Identify entity types (ORG, PERSON, GPE, EVENT, etc.)
- Resolve coreferences (pronouns → entity names)
- Generate full resolved statement text
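Subject-predicate-object triples map naturally onto a relationship graph. The sketch below builds a simple adjacency mapping from triples. It is an illustration only: the function name `build_relationship_graph` and the example triples are hypothetical and do not describe how the demo's graph view is implemented.

```python
from collections import defaultdict


def build_relationship_graph(triples):
    """Map each subject to a list of (predicate, object) edges."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return dict(graph)


# Hand-written triples for illustration, not real model output.
triples = [
    ("Apple Inc.", "committed to", "carbon neutrality by 2030"),
    ("Apple Inc.", "announced", "commitment"),
]
for subject, edges in build_relationship_graph(triples).items():
    for predicate, obj in edges:
        print(f"{subject} --{predicate}--> {obj}")
```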
Training details: final eval loss of 0.209. Outputs are generated with beam search (num_beams=4) for higher-quality results.