PorterStemmer

An implementation of the Porter stemming algorithm by Martin Porter.

This is the Porter stemming algorithm, ported to Ruby from the Perl version. The code is easy to follow against the rules in the original paper:

  Porter, 1980, An algorithm for suffix stripping, Program, Vol. 14,
  no. 3, pp 130-137.

Taken from www.tartarus.org/~martin/PorterStemmer (Public Domain)

This version is based on Ray Pereda’s stemmable.rb © 2003.
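
The rules in Porter's paper are conditioned on the measure m of a stem, that is, the number of vowel-consonant sequences it contains. The stem method documented on this page tests the measure through the constants MGR0, MEQ1 and MGR1, whose definitions are not reproduced here. The following is a minimal sketch of how such constants can be written, assuming definitions modeled on Pereda's stemmable.rb; the actual definitions in lib/english/porter.rb may differ, and measure_class is a hypothetical helper added only for illustration.

```ruby
# ASSUMPTION: these definitions are modeled on stemmable.rb and are not
# copied from lib/english/porter.rb, which this page does not show.
C  = "[^aeiou]"              # a consonant (y counts as one here)
V  = "[aeiouy]"              # a vowel
CC = "#{C}(?>[^aeiouy]*)"    # a run of consonants (atomic, no backtracking)
VV = "#{V}(?>[aeiou]*)"      # a run of vowels (atomic, no backtracking)

MGR0 = /^(#{CC})?#{VV}#{CC}/             # [C]VC...    means m > 0
MEQ1 = /^(#{CC})?#{VV}#{CC}(#{VV})?$/    # [C]VC[V]    means m == 1
MGR1 = /^(#{CC})?#{VV}#{CC}#{VV}#{CC}/   # [C]VCVC...  means m > 1
VOWEL_IN_STEM = /^(#{CC})?#{V}/          # the stem contains a vowel

# Hypothetical helper: classify a stem by its measure.
def measure_class(stem)
  if stem =~ MGR1
    "m > 1"
  elsif stem =~ MGR0
    "m = 1"
  else
    "m = 0"
  end
end

# Example words taken from Porter's paper.
%w[tree by trouble ivy private orrery].each do |w|
  puts "#{w}: #{measure_class(w)}"
end
# tree: m = 0
# by: m = 0
# trouble: m = 1
# ivy: m = 1
# private: m > 1
# orrery: m > 1
```

The atomic groups `(?>...)` keep a consonant or vowel run from giving characters back during backtracking, so each run is counted exactly once.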

Public Class Methods

stem(word)
     # File lib/english/porter.rb, line 100
    def self.stem(word)
      # Work on a copy, converted to a string, so the caller's object is untouched.
      word = word.dup.to_str

      # Words of one or two letters are returned unchanged.
      return word if word.length < 3

      # Map initial y to Y so that the patterns never treat it as a vowel.
      word[0] = 'Y' if word[0] == ?y

      # Step 1a: plural endings
      if word =~ /(ss|i)es$/
        word = $` + $1
      elsif word =~ /([^s])s$/
        word = $` + $1
      end

      # Step 1b: -eed, -ed and -ing
      if word =~ /eed$/
        word.chop! if $` =~ MGR0
      elsif word =~ /(ed|ing)$/
        stem = $`
        if stem =~ VOWEL_IN_STEM
          word = stem
          # Tidy up the stem left behind after removing -ed or -ing.
          case word
            when /(at|bl|iz)$/             then word << "e"
            when /([^aeiouylsz])\1$/       then word.chop!
            when /^#{CC}#{V}[^aeiouwxy]$/ then word << "e"
          end
        end
      end

      # Step 1c: final y becomes i when the stem contains a vowel
      if word =~ /y$/
        stem = $`
        word = stem + "i" if stem =~ VOWEL_IN_STEM
      end

      # Step 2: map suffixes through PORTER_STEMS[0] when m > 0
      if word =~ PORTER_STEMS_RE[0]
        stem = $`
        suffix = $1
        if stem =~ MGR0
          word = stem + PORTER_STEMS[0][suffix]
        end
      end

      # Step 3: map suffixes through PORTER_STEMS[1] when m > 0
      if word =~ PORTER_STEMS_RE[1]
        stem = $`
        suffix = $1
        if stem =~ MGR0
          word = stem + PORTER_STEMS[1][suffix]
        end
      end

      # Step 4: strip suffixes when m > 1
      if word =~ PORTER_STEMS_RE[2]
        stem = $`
        if stem =~ MGR1
          word = stem
        end
      elsif word =~ /(s|t)(ion)$/
        stem = $` + $1
        if stem =~ MGR1
          word = stem
        end
      end

      # Step 5: drop a final e, then reduce a final ll to l
      if word =~ /e$/
        stem = $`
        if (stem =~ MGR1) ||
            (stem =~ MEQ1 && stem !~ /^#{CC}#{V}[^aeiouwxy]$/)
          word = stem
        end
      end

      if word =~ /ll$/ && word =~ MGR1
        word.chop!
      end

      # Turn initial Y back to y.
      word[0] = 'y' if word[0] == YY

      word
    end
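
Each step in the listing follows the same idiom: match a suffix pattern, then rebuild the word from the pre-match `$`` and, where needed, the first capture `$1`. As a rough illustration, Step 1a can be restated on its own as below. The method name step_1a is hypothetical and is not part of English::PorterStemmer; the sample words are the ones Porter's paper uses for this step.

```ruby
# Hypothetical standalone helper: Step 1a in isolation.
def step_1a(word)
  if word =~ /(ss|i)es$/
    $` + $1          # "sses" becomes "ss", "ies" becomes "i"
  elsif word =~ /([^s])s$/
    $` + $1          # drop a single trailing "s", but leave "ss" alone
  else
    word             # no plural ending matched
  end
end

%w[caresses ponies cats caress].each do |w|
  puts "#{w} -> #{step_1a(w)}"
end
# caresses -> caress
# ponies -> poni
# cats -> cat
# caress -> caress
```

Because the second pattern captures the letter before the final s, a word ending in ss never matches it and passes through unchanged.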