snowball vs. lucene tokenizers
Thread poster: Deborah Kolosova
United States
Russian to English
Nov 3, 2011

The instructions on the OmegaT site for installing tokenizers say you should select the appropriate tokenizer from the list. For my source language, Russian, two are listed: the SnowballRussianTokenizer and the LuceneRussianTokenizer. What is the difference, and which one is better to use? Or does each have its own advantages?

 
Susan Welsh
United States
Russian to English
I use Lucene
Nov 3, 2011

My recollection of past discussions is that Lucene has a "stop word" function that Snowball does not, meaning it ignores small, frequent words like "and" and "the" when matching segments. Someone will probably correct me if I'm wrong. You can try them both and see which you like.

I translate from Russian, and Lucene works great for me.

Susan
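
To illustrate the "stop word" idea mentioned in the reply above, here is a minimal Python sketch. It is not OmegaT's or Lucene's actual code: the stop-word list, the `tokenize` and `overlap_score` functions, and the set-overlap scoring are all assumptions made for illustration, and English sentences are used only to keep the example readable. It shows how dropping small, frequent words can change how closely two segments appear to match.

```python
import re

# Hypothetical stop-word list for illustration only;
# real tokenizers ship their own language-specific lists.
STOP_WORDS = {"and", "the", "a", "of", "in"}

def tokenize(text, remove_stop_words=False):
    """Lowercase the text, split it into word tokens, optionally drop stop words."""
    tokens = re.findall(r"\w+", text.lower())
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return tokens

def overlap_score(a, b, remove_stop_words=False):
    """Share of distinct tokens the two segments have in common (Jaccard index)."""
    ta = set(tokenize(a, remove_stop_words))
    tb = set(tokenize(b, remove_stop_words))
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

segment = "the cat and the dog"
candidate = "cat dog"
print(overlap_score(segment, candidate))                          # stop words kept
print(overlap_score(segment, candidate, remove_stop_words=True))  # stop words ignored
```

With stop words kept, "the" and "and" count against the match; with them ignored, the two segments are treated as identical. Whether that is desirable depends on the texts you translate, which is why trying both tokenizers is reasonable advice.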


 







