Python Developer – Transition Repetition Analysis Module
Milestone 1 – Linguistic QA Validator (French Transition Rules)
🎯 Objective
Develop a Python module to validate batches of AI-generated French transition phrases. This module ensures:
1. No stylistically significant word repetition across transitions in a group
2. “Enfin” is used only in the final transition of each group
3. Grammatical stopwords (like "le", "de", "à", "et") are excluded from repetition checks
📁 Module Target
File: utils/validate_prompt_compliance.py
📚 Definitions
✅ "Repetition" Violation
Flag repeated meaningful words in a group of transitions.
Use a French stopword list to ignore non-stylistic words such as:
["le", "la", "les", "de", "des", "un", "une", "à", "et", "en", "du", "par", "que", "si", "ce", "sur"]
🛑 "Enfin" Misuse
Flag if “enfin” appears in any position other than the last transition in a group.
🧩 Required Functions
tokenize(text: str) -> List[str]: Normalize case, remove punctuation, return word tokens
check_transition_group(transitions: List[str]) -> Dict:
Example return:
{
"repetition": ["par", "direction"],
"enfin_misplaced": True
}
validate_batch(batch_outputs: List[List[str]]) -> Dict: Returns summary of violations and per-output breakdown
📤 Output Format (Example)
{
"total_outputs": 5,
"outputs_with_violations": 4,
"violations_summary": {
"repetition": {
"count": 3,
"affected_outputs": [1, 2, 4],
"violated_words": ["par", "direction", "dans"]
},
"enfin_misplaced": {
"count": 1,
"affected_outputs": [3]
}
},
"details": [
{
"output_id": 1,
"transitions": ["Par ailleurs,", "Par contre,", "Par exemple,"],
"violations": {"repetition": ["par"]}
},
{
"output_id": 2,
"transitions": ["Prenons la direction de Paris,", "Ensuite, prenons la direction de Lyon,", "Enfin, une note sur Marseille"],
"violations": {"repetition": ["prenons", "direction"]}
},
{
"output_id": 3,
"transitions": ["Enfin, une annonce importante", "Puis une autre nouvelle", "Pour conclure,"],
"violations": {"enfin_misplaced": true}
},
{
"output_id": 4,
"transitions": ["Dans un autre registre,", "Dans la même région,", "Encore dans le domaine économique,"],
"violations": {"repetition": ["dans"]}
},
{
"output_id": 5,
"transitions": ["À noter également,", "Nous terminons avec cette info :", "Pour finir,"],
"violations": {}
}
]
}
✅ Completion Criteria
- tokenize() correctly splits and lowercases all transition text
- Repetition logic excludes stopwords
- enfin_misplaced triggers only when “enfin” is not last
- All outputs match the JSON schema above
- Module is testable and cleanly structured
🧠 Skills Required
- Python 3
- Regex and tokenization
- Set logic and dictionaries
- JSON formatting
- NLP or editorial QA experience (preferred)