PorterStemmer
An implementation of the Porter Stemming algorithm by Martin Porter.
This is the Porter Stemming algorithm, ported to Ruby from the Perl version. It is easy to follow against the rules in the original paper:
Porter, 1980, An algorithm for suffix stripping, Program, Vol. 14, no. 3, pp. 130-137.
Taken from www.tartarus.org/~martin/PorterStemmer (Public Domain)
This version is based on Ray Pereda’s stemmable.rb © 2003.
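Most rules in the paper are conditioned on the "measure" m of a stem, the number of vowel-consonant sequences it contains when written as [C](VC)^m[V]. The constants MGR0, MEQ1 and MGR1 used in the listing of stem are not shown on this page; judging by their names they appear to test m > 0, m = 1 and m > 1, but that reading is an assumption. The following is a minimal sketch of the measure itself, not the library's code; the method name measure is hypothetical.

```ruby
# Hypothetical helper: computes Porter's measure m of a stem.
# A letter is a vowel if it is a, e, i, o or u, or if it is a y
# that directly follows a consonant; everything else is a consonant.
def measure(stem)
  pattern = String.new
  stem.downcase.each_char.with_index do |ch, i|
    vowel = "aeiou".include?(ch) ||
            (ch == "y" && i > 0 && pattern[i - 1] == "c")
    pattern << (vowel ? "v" : "c")
  end
  # Collapse runs of the same class, then count vowel-to-consonant transitions.
  pattern.squeeze.scan("vc").length
end

measure("tree")     # m = 0
measure("trouble")  # m = 1
measure("private")  # m = 2
```

The three sample words are taken from the examples of m = 0, 1 and 2 given in the paper.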
Constants
Public Class Methods
stem(word)
# File lib/english/porter.rb, line 100
def self.stem(word)
  # make a copy of the given object and convert it to a string.
  word = word.dup.to_str

  return word if word.length < 3

  # now map initial y to Y so that the patterns never treat it as vowel
  word[0] = 'Y' if word[0] == ?y

  # Step 1a
  if word =~ /(ss|i)es$/
    word = $` + $1
  elsif word =~ /([^s])s$/
    word = $` + $1
  end

  # Step 1b
  if word =~ /eed$/
    word.chop! if $` =~ MGR0
  elsif word =~ /(ed|ing)$/
    stem = $`
    if stem =~ VOWEL_IN_STEM
      word = stem
      case word
      when /(at|bl|iz)$/             then word << "e"
      when /([^aeiouylsz])\1$/       then word.chop!
      when /^#{CC}#{V}[^aeiouwxy]$/  then word << "e"
      end
    end
  end

  # Step 1c
  if word =~ /y$/
    stem = $`
    word = stem + "i" if stem =~ VOWEL_IN_STEM
  end

  # Step 2
  if word =~ PORTER_STEMS_RE[0]
    stem = $`
    suffix = $1
    # print "stem= " + stem + "\n" + "suffix=" + suffix + "\n"
    if stem =~ MGR0
      word = stem + PORTER_STEMS[0][suffix]
    end
  end

  # Step 3
  if word =~ PORTER_STEMS_RE[1]
    stem = $`
    suffix = $1
    if stem =~ MGR0
      word = stem + PORTER_STEMS[1][suffix]
    end
  end

  # Step 4
  if word =~ PORTER_STEMS_RE[2]
    stem = $`
    if stem =~ MGR1
      word = stem
    end
  elsif word =~ /(s|t)(ion)$/
    stem = $` + $1
    if stem =~ MGR1
      word = stem
    end
  end

  # Step 5a
  if word =~ /e$/
    stem = $`
    if (stem =~ MGR1) ||
        (stem =~ MEQ1 && stem !~ /^#{CC}#{V}[^aeiouwxy]$/)
      word = stem
    end
  end

  # Step 5b
  if word =~ /ll$/ && word =~ MGR1
    word.chop!
  end

  # and turn initial Y back to y
  word[0] = 'y' if word[0] == ?Y

  word
end
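Step 1a is the only step in the listing that depends on no constants, so it can be tried in isolation. The sketch below copies its two patterns into a standalone method; the name step_1a is hypothetical and is not part of PorterStemmer.

```ruby
# Hypothetical standalone copy of Step 1a from the listing above.
# It strips plural endings: "sses" becomes "ss", "ies" becomes "i",
# and a single trailing "s" is dropped unless the word ends in "ss".
def step_1a(word)
  if word =~ /(ss|i)es$/
    $` + $1          # text before the match plus the captured "ss" or "i"
  elsif word =~ /([^s])s$/
    $` + $1          # text before the match plus the letter preceding "s"
  else
    word
  end
end

step_1a("caresses")  # => "caress"
step_1a("ponies")    # => "poni"
step_1a("caress")    # => "caress"
step_1a("cats")      # => "cat"
```

Here $` holds the text before the match and $1 the first capture group, which is why each branch rebuilds the word from those two pieces.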