How to convert TMX to tab-delimited? Thread poster: Hans Lenting
Hans Lenting Netherlands Member (2006) German to Dutch TOPIC STARTER
Samuel Murray wrote:
Yes, so you have to first replace all whitespace characters (except spaces, duh) with replacement characters. ... = horizontal tab

Thank you for reminding me of that one! I'll add a rule to the TextFactory.

Or just replace \n with ① and replace \t with ② throughout the file. There is no need to restrict it to segments, since you're not going to use the TMX file after this
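The placeholder approach quoted above can be sketched in a few lines of Python. This is a minimal illustration, not anyone's actual TextFactory rule: the function names are hypothetical, and it assumes ① and ② never occur in the memory itself, otherwise the reverse step cannot tell them apart from real line breaks and tabs. The reverse mapping is what you would need when turning the tab-delimited file back into a TMX.

```python
# Placeholder symbols suggested in the thread: line breaks become ①, tabs become ②,
# so every tab-delimited row stays on one line with exactly one real tab in it.
PLACEHOLDERS = {"\n": "①", "\t": "②"}


def escape_whitespace(text: str) -> str:
    """Replace newline and tab characters with their placeholder symbols."""
    for char, symbol in PLACEHOLDERS.items():
        text = text.replace(char, symbol)
    return text


def unescape_whitespace(text: str) -> str:
    """Restore the original newline and tab characters (assumes no literal ① or ② in the data)."""
    for char, symbol in PLACEHOLDERS.items():
        text = text.replace(symbol, char)
    return text


if __name__ == "__main__":
    segment = "Zeile eins\nZeile\tzwei"
    escaped = escape_whitespace(segment)
    print(escaped)  # Zeile eins①Zeile②zwei
    assert unescape_whitespace(escaped) == segment
```

A file with Windows line endings also contains \r characters, which this sketch leaves untouched; they would need a rule of their own.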
I'll use the tab-delimited file for several purposes. One of them is ... creating a cleaned and smaller TMX. That TextFactory will be pretty straightforward.

BTW: BBEdit introduces the Text Factory, which allows you to assemble a list of text transformations that will be applied in order to either the current document or selection (when invoked as a filter), or to a specified list of files and folders (when invoked via the Scripts menu).

Jean Dimitriadis | Why not use CafeTran Espresso? | Oct 27, 2022
Could you simply use CafeTran Espresso for that conversion?

1. Create or open a project with the required language pair.
2. Open or import the TMX file (or an SDLTB/TBX file, which will be automatically converted to TMX), possibly not as read-only and with fragments enabled.
3. Select the tab of the glossary you wish to import into (an empty Project Terms page will do, or create a new glossary and select its tab).
4. Memory menu > Export > Export segments to glossary. A dialog will ask you to select which memory to import segments from. If the currently selected/opened tab is not a glossary, it will first ask you to select one.

That's it.

CafeTran also includes some TM Filter options, including one called "Clean and replace foreign codes":
Some TMX files from third-party tools have unusual codes in the segments such as codes inside the curly brackets or emdash, endash, tab code. CafeTran clears or replaces them with equivalent unicode characters.
https://github.com/idimitriadis0/TheCafeTranFiles/wiki/3-TM-options#tm-filter-options

If needed, prior TMX editing (including search and replace, with or without regular expressions) can also be done from within CafeTran.
[Edited at 2022-10-27 05:50 GMT]

Hans Lenting
Jean Dimitriadis wrote:
Could you simply use CafeTran Espresso for that conversion? [...]

I am familiar with this procedure. However, it is extremely slow; it takes ages. Besides that, I'd like to have an alternative solution that I can use as a framework and possibly integrate into my workflows.
[Edited at 2022-10-27 07:18 GMT]
Dan Lucas United Kingdom Member (2014) Japanese to English
Stepan Konev wrote:
If that MacOS text editor can mark the match, you can use the following regex: to mark and then copy all segments to the clipboard

Although I've only tried it on one file, this typically clever solution from Stepan seems to work well in Notepad++ here; much appreciated. Given that we already have the regex, it looks like an obvious choice for a tiny script in the programming language of one's choice (probably just from the command line in Perl!). I have never actually needed to convert TMX to tab-delimited, but it's nice to know that it's possible. Thanks to Hans and the other contributors for the topic.

Dan
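Dan's tiny-script idea can be sketched in Python. Because the regex itself is not reproduced here, this sketch swaps in a different technique: it reads the TMX with the standard-library xml.etree.ElementTree parser rather than matching it with a regex. It assumes a TMX 1.4 file (xml:lang on each tuv element, with a fallback to the TMX 1.1 lang attribute), drops the content of inline code elements such as ph and bpt, and reuses the ① / ② placeholders suggested earlier in the thread. The script name tmx2tsv.py, the function names and the de/nl language codes are hypothetical.

```python
import sys
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang (TMX 1.4) under this expanded name;
# TMX 1.1 files use a plain "lang" attribute instead.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# TMX inline elements whose content is native formatting code, not translatable text.
CODE_TAGS = {"bpt", "ept", "it", "ph", "ut"}

# Placeholders from the thread, so each TSV row stays on one line.
PLACEHOLDERS = {"\n": "①", "\t": "②"}


def escape_whitespace(text):
    for char, symbol in PLACEHOLDERS.items():
        text = text.replace(char, symbol)
    return text


def segment_text(elem):
    """Return the plain text of a <seg>, skipping the content of inline code elements."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag not in CODE_TAGS:  # e.g. <hi> wraps translatable text
            parts.append(segment_text(child))
        parts.append(child.tail or "")  # text after a child always belongs to the segment
    return "".join(parts)


def lang_matches(code, wanted):
    """Case-insensitive match, so "de" accepts "de-DE", "DE-AT", etc."""
    code, wanted = code.lower(), wanted.lower()
    return code == wanted or code.startswith(wanted + "-")


def tmx_rows(root, src_lang, tgt_lang):
    """Yield (source, target) pairs for every <tu> that contains both languages."""
    for tu in root.iter("tu"):
        source = target = None
        for tuv in tu.findall("tuv"):
            seg = tuv.find("seg")
            if seg is None:
                continue
            code = tuv.get(XML_LANG) or tuv.get("lang") or ""
            if source is None and lang_matches(code, src_lang):
                source = escape_whitespace(segment_text(seg))
            elif target is None and lang_matches(code, tgt_lang):
                target = escape_whitespace(segment_text(seg))
        if source is not None and target is not None:
            yield source, target


def tmx_to_tsv(in_path, out_path, src_lang, tgt_lang):
    root = ET.parse(in_path).getroot()
    with open(out_path, "w", encoding="utf-8", newline="") as out:
        for source, target in tmx_rows(root, src_lang, tgt_lang):
            out.write(f"{source}\t{target}\n")


if __name__ == "__main__":
    if len(sys.argv) == 5:
        tmx_to_tsv(*sys.argv[1:])
    else:
        print("usage: python tmx2tsv.py INPUT.tmx OUTPUT.tsv SRC_LANG TGT_LANG", file=sys.stderr)
```

Usage, with hypothetical file names: python tmx2tsv.py memory.tmx memory.tsv de nl. Translation units that lack either language are skipped. The whole file is loaded into memory here; for very large memories, ET.iterparse can process one tu element at a time instead.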