Identifying Verb Tenses in Python

Introduction

Identifying verb tenses in text is a common NLP task used in grammar checkers, language learning apps, and text analysis pipelines. Python's NLP libraries (NLTK, spaCy, and pattern) can detect verb forms, most commonly through part-of-speech (POS) tagging. POS tags encode the form of each individual verb, such as past tense or present participle, and mapping those tags to human-readable labels gives you a practical detector for simple tenses. Compound tenses such as "has been walking" also require looking at the auxiliaries around the main verb.

Penn Treebank Verb Tags

Most English POS taggers use the Penn Treebank tagset, which assigns specific tags to verb forms:

| Tag | Meaning | Example |
| --- | --- | --- |
| VB | Base form | "walk", "be" |
| VBD | Past tense | "walked", "was" |
| VBG | Gerund / present participle | "walking", "being" |
| VBN | Past participle | "walked", "been" |
| VBP | Present tense, non-3rd person | "walk", "are" |
| VBZ | Present tense, 3rd person singular | "walks", "is" |

Using NLTK

NLTK provides POS tagging through nltk.pos_tag() after tokenization.

```python
import nltk

# Fetch the tagger model and tokenizer data (no-ops if already present).
nltk.download("averaged_perceptron_tagger_eng", quiet=True)
nltk.download("punkt_tab", quiet=True)

text = "She walked to the store and is buying groceries."
tokens = nltk.word_tokenize(text)
tagged = nltk.pos_tag(tokens)

# Every Penn Treebank verb tag starts with "VB".
for word, tag in tagged:
    if tag.startswith("VB"):
        print(f"{word:15s} -> {tag}")

# walked          -> VBD  (past tense)
# is              -> VBZ  (present, 3rd person)
# buying          -> VBG  (gerund/present participle)
```

Mapping Tags to Tense Labels

```python
import nltk

TENSE_MAP = {
    "VB": "base/infinitive",
    "VBD": "past",
    "VBG": "present participle",
    "VBN": "past participle",
    "VBP": "present (non-3rd)",
    "VBZ": "present (3rd person)",
}

def get_verb_tenses(text):
    """Return (word, label) pairs for every token that carries a verb tag."""
    tokens = nltk.word_tokenize(text)
    tagged = nltk.pos_tag(tokens)
    return [(word, TENSE_MAP[tag]) for word, tag in tagged if tag in TENSE_MAP]

results = get_verb_tenses("He runs every day and has trained for months.")
for word, tense in results:
    print(f"{word:15s} -> {tense}")

# runs            -> present (3rd person)
# has             -> present (3rd person)
# trained         -> past participle
```
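The lookup above has no entry for modals, which the Penn Treebank tagset marks as MD, so "will walk" would be reported only as a base form. The following is a minimal sketch of one way to label the future, assuming the input is already a list of (word, tag) pairs shaped like the output of nltk.pos_tag. The helper name label_with_future, the FUTURE_MODALS set, and the hand-tagged sample are illustrative assumptions, not part of NLTK.

```python
# Hypothetical helper, not part of NLTK: it works on (word, tag) pairs
# shaped like the output of nltk.pos_tag.
TENSE_MAP = {
    "VB": "base/infinitive",
    "VBD": "past",
    "VBG": "present participle",
    "VBN": "past participle",
    "VBP": "present (non-3rd)",
    "VBZ": "present (3rd person)",
}
# Assumed tokenizer output: "'ll" for "she'll", "wo" for the stem of "won't".
FUTURE_MODALS = {"will", "shall", "'ll", "wo"}

def label_with_future(tagged):
    labels = []
    pending_future = False  # True while a future modal awaits its base verb
    for word, tag in tagged:
        if tag == "MD":
            pending_future = word.lower() in FUTURE_MODALS
        elif tag in TENSE_MAP:
            if tag == "VB" and pending_future:
                labels.append((word, "future"))
            else:
                labels.append((word, TENSE_MAP[tag]))
            pending_future = False
        elif not tag.startswith("RB"):
            # Only adverbs (RB, RBR, RBS) may sit between the modal and the verb.
            pending_future = False
    return labels

# Hand-tagged input for illustration; real tags would come from a tagger.
sample = [("She", "PRP"), ("will", "MD"), ("probably", "RB"),
          ("walk", "VB"), ("home", "RB"), (".", ".")]
print(label_with_future(sample))  # [('walk', 'future')]
```

Other modals such as "can" or "must" leave the following verb labeled as a base form, and inverted questions ("Will she walk?") are not handled, because the subject breaks the modal-verb sequence this sketch looks for.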

Using spaCy

spaCy provides richer morphological analysis, including tense as a separate feature.

```python
import spacy

# Requires the model: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")
doc = nlp("She walked to the store and is buying groceries.")

for token in doc:
    # Auxiliaries such as "is" are tagged AUX rather than VERB.
    if token.pos_ in ("VERB", "AUX"):
        morph = token.morph.to_dict()
        tense = morph.get("Tense", "unknown")
        aspect = morph.get("Aspect", "")
        print(f"{token.text:15s} tag={token.tag_:5s} Tense={tense} Aspect={aspect}")

# walked          tag=VBD   Tense=Past Aspect=
# is              tag=VBZ   Tense=Pres Aspect=
# buying          tag=VBG   Tense=Pres Aspect=Prog
```

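spaCy's morphological features separate tense from aspect on each token. A progressive such as "is walking" or a perfect such as "has walked" is spread across an auxiliary and a participle, so identifying these forms means reading the features of both tokens together. Simple forms can be read from a single finite verb.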
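Because the features describe one token at a time, it can help to turn them into a per-token label before reasoning about the whole verb phrase. The sketch below is a hypothetical helper (describe_verb_form is not a spaCy function) that takes a plain dict such as the one returned by token.morph.to_dict(). It assumes the Universal Dependencies feature values seen in the output above (Tense=Past/Pres, Aspect=Prog) plus Aspect=Perf and VerbForm=Fin/Part/Inf; check these against what your model actually emits.

```python
def describe_verb_form(morph):
    """Label a single token from a dict of morphological features."""
    verb_form = morph.get("VerbForm")
    tense = morph.get("Tense")
    aspect = morph.get("Aspect")

    if verb_form == "Inf":
        return "infinitive"
    if verb_form == "Part":
        # A participle's Tense feature names the form, not the clause tense.
        if aspect == "Prog":
            return "present participle"
        if aspect == "Perf":
            return "past participle"
        return "participle"
    if tense == "Past":
        return "finite past"
    if tense == "Pres":
        return "finite present"
    return "unknown"

# Feature dicts written by hand for illustration.
print(describe_verb_form({"Tense": "Past", "VerbForm": "Fin"}))
# finite past
print(describe_verb_form({"Aspect": "Prog", "Tense": "Pres", "VerbForm": "Part"}))
# present participle
```

With a loaded pipeline, the call would be describe_verb_form(token.morph.to_dict()) for each verb or auxiliary token.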

Detecting Compound Tenses

Simple POS tagging identifies individual verb forms but not compound tenses like "has been walking" (present perfect progressive). To detect these, analyze auxiliary chains.

```python
import spacy

nlp = spacy.load("en_core_web_sm")

def detect_compound_tense(doc):
    """Collect each main verb together with the auxiliaries attached to it."""
    results = []
    for token in doc:
        if token.pos_ == "VERB":
            # "aux" covers ordinary auxiliaries; "auxpass" marks passive ones.
            auxiliaries = [child for child in token.children
                           if child.dep_ in ("aux", "auxpass")]
            aux_text = " ".join(a.text for a in auxiliaries)
            full_verb = f"{aux_text} {token.text}".strip()
            results.append((full_verb, token.tag_, token.morph.to_dict()))
    return results

doc = nlp("She has been running for two hours.")
for verb, tag, morph in detect_compound_tense(doc):
    print(f"{verb:25s} tag={tag} morph={morph}")

# has been running          tag=VBG morph={'Aspect': 'Prog', 'Tense': 'Pres', 'VerbForm': 'Part'}
```
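The function above gathers the auxiliary chain but does not yet name the tense. One way to take that last step is a small rule table over the auxiliary words and the main verb's tag. The sketch below is an assumption-laden illustration: name_compound_tense and its word sets are hypothetical, contractions are not covered, voice is ignored (a passive "was eaten" comes out as simple past), and modals other than "will" and "shall" return "unknown".

```python
PRESENT_AUX = {"am", "is", "are", "has", "have", "do", "does"}
PAST_AUX = {"was", "were", "had", "did"}
FUTURE_AUX = {"will", "shall"}
HAVE_FORMS = {"has", "have", "had"}
BE_FORMS = {"am", "is", "are", "was", "were", "be", "been", "being"}
TAG_TIME = {"VBD": "past", "VBZ": "present", "VBP": "present"}

def name_compound_tense(aux_words, main_tag):
    """Name the tense from auxiliaries (in sentence order) and the main verb tag."""
    aux = [w.lower() for w in aux_words]

    # Time reference: set by the first auxiliary, or by the verb itself if alone.
    if not aux:
        time = TAG_TIME.get(main_tag)
    elif aux[0] in FUTURE_AUX:
        time = "future"
    elif aux[0] in PAST_AUX:
        time = "past"
    elif aux[0] in PRESENT_AUX:
        time = "present"
    else:
        time = None
    if time is None:
        return "unknown"

    # Perfect: a form of "have" followed by a past participle ("been" or the verb).
    perfect = any(w in HAVE_FORMS for w in aux) and (main_tag == "VBN" or "been" in aux)
    # Progressive: a form of "be" followed by an -ing form.
    progressive = main_tag == "VBG" and any(w in BE_FORMS for w in aux)

    if not perfect and not progressive:
        return f"simple {time}"
    parts = [time]
    if perfect:
        parts.append("perfect")
    if progressive:
        parts.append("progressive")
    return " ".join(parts)

print(name_compound_tense(["has", "been"], "VBG"))  # present perfect progressive
print(name_compound_tense(["had"], "VBN"))          # past perfect
```

Inside detect_compound_tense, the inputs would be [a.text for a in auxiliaries] and token.tag_.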

Using the pattern Library

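The pattern library can conjugate and lemmatize verbs. Its tenses() function looks up a single word form without any sentence context and returns every reading that form can have, so an ambiguous form such as "walked" yields several candidates.

```python
from pattern.en import tenses

# tenses() returns a list of tuples, one per possible reading of the word form.
for word in ("running", "walked"):
    detected = tenses(word)
    print(f"{word}: {detected}")

# Each tuple describes one reading: tense, person, number, mood, and aspect.
# "running" is reported as a present participle reading.
# "walked" yields several past-tense readings (one per person/number), among others.
# The exact tuples depend on the installed pattern version.
```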

Common Pitfalls

  • Relying on POS tags alone for compound tenses: "has been walking" tags each word separately, so you need to analyze auxiliary verb chains to determine the full tense.
  • Not downloading required NLTK data: word_tokenize and pos_tag raise a LookupError when their data is missing. Recent NLTK releases look for punkt_tab and averaged_perceptron_tagger_eng, as in the NLTK example above; older releases use punkt and averaged_perceptron_tagger.
  • Confusing past tense (VBD) with past participle (VBN): "walked" can be either, depending on context. POS taggers use surrounding context to disambiguate, but errors occur.
  • Treating gerunds (VBG) as always present tense: the VBG tag covers both gerunds and present participles. "Walking is good exercise" uses "walking" as a noun (gerund), not a present-tense verb.
  • Assuming spaCy and NLTK always agree: different models produce different tags for ambiguous cases. Pick one and be consistent.
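To see how often tagging errors or tagger disagreements affect your own text, you can compare two taggings of the same tokens position by position. The sketch below assumes both inputs are lists of (word, tag) pairs over an identical token sequence, for example hand-corrected tags against a tagger's output. The function name tag_disagreements and the sample data are illustrative and written by hand, not real tagger output.

```python
from collections import Counter

def tag_disagreements(tagged_a, tagged_b):
    """Return (word, tag_a, tag_b) for every position where two taggings differ."""
    words_a = [word for word, _ in tagged_a]
    words_b = [word for word, _ in tagged_b]
    if words_a != words_b:
        raise ValueError("both taggings must cover the same token sequence")
    return [(word, tag_a, tag_b)
            for (word, tag_a), (_, tag_b) in zip(tagged_a, tagged_b)
            if tag_a != tag_b]

# Hand-written example: the second tagging mistakes a participle for a past tense.
reference = [("The", "DT"), ("package", "NN"), ("was", "VBD"), ("delivered", "VBN")]
candidate = [("The", "DT"), ("package", "NN"), ("was", "VBD"), ("delivered", "VBD")]

diffs = tag_disagreements(reference, candidate)
print(diffs)  # [('delivered', 'VBN', 'VBD')]

# Count which tag pairs get confused most often.
confusions = Counter((tag_a, tag_b) for _, tag_a, tag_b in diffs)
print(confusions.most_common(1))  # [(('VBN', 'VBD'), 1)]
```

If the two tools tokenize differently, the token sequences need to be aligned first; this sketch raises an error instead of guessing.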

Summary

  • POS tags (VBD, VBZ, VBG, VBN, VBP, VB) encode verb tense information in the Penn Treebank tagset.
  • Use NLTK's pos_tag for quick tense detection or spaCy for richer morphological analysis.
  • Map POS tags to human-readable tense labels with a lookup dictionary.
  • Detect compound tenses by analyzing auxiliary verb chains in the dependency tree.
  • Always validate tense detection against your specific domain text — POS taggers have error rates on informal or domain-specific language.
