Pages in topic:   < [1 2 3]
How to convert TMX to tab-delimited?
Thread poster: Hans Lenting
Hans Lenting
Netherlands
Member (2006)
German to Dutch
TOPIC STARTER
Nice Oct 16, 2022

Samuel Murray wrote:

Yes, so you have to first replace all whitespace characters (except spaces, duh) with replacement characters.
...
= horizontal tab


Thank you for reminding me of that one! I'll add a rule to the TextFactory.


Or just replace \n with ① and replace \t with ② throughout the file -- no need to restrict it to segments, since you're not going to use the TMX file after this.
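
As an illustration only, the whole-file replacement suggested in the quote could look roughly like this in Python. The function names are made up for this sketch, and normalising \r\n and \r to \n before replacing is an added assumption that the quote does not mention.

```python
# Placeholder characters taken from the quote: ① for newline, ② for tab.
PLACEHOLDERS = {"\n": "①", "\t": "②"}


def protect_whitespace(text: str) -> str:
    """Replace every newline and tab with a visible placeholder character."""
    # Assumption: treat Windows (\r\n) and old Mac (\r) line endings as \n,
    # so that no stray carriage return is left in the output.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    for char, placeholder in PLACEHOLDERS.items():
        text = text.replace(char, placeholder)
    return text


def restore_whitespace(text: str) -> str:
    """Turn the placeholders back into real newlines and tabs."""
    for char, placeholder in PLACEHOLDERS.items():
        text = text.replace(placeholder, char)
    return text
```

Note that applying protect_whitespace to a whole TMX file turns it into one long line, which is what the quote implies. That is only acceptable if the placeholders never occur in the real segment text.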


I'll use the tab-delimited file for several purposes. One of them is ... creating a cleaned and smaller TMX. That TextFactory will be pretty straightforward.
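
For readers without BBEdit, here is a minimal Python sketch of the way back from a tab-delimited file to a smaller TMX. It is not the Text Factory mentioned above. The function name, the header attribute values and the idea that "cleaned" means dropping empty and exact duplicate pairs are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# ElementTree writes this qualified name out as the attribute xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def rows_to_tmx(rows, src_lang, tgt_lang):
    """Build a TMX tree from (source, target) pairs.

    Assumption: empty pairs and exact duplicates are skipped, and the
    placeholders ① and ② stand for newline and tab respectively.
    """
    tmx = ET.Element("tmx", {"version": "1.4"})
    # Header values below are illustrative, not taken from any real tool.
    ET.SubElement(tmx, "header", {
        "creationtool": "tsv2tmx-sketch",
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "tsv",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    seen = set()
    for src, tgt in rows:
        if not src or not tgt or (src, tgt) in seen:
            continue
        seen.add((src, tgt))
        tu = ET.SubElement(body, "tu")
        for lang, text in ((src_lang, src), (tgt_lang, tgt)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            seg = ET.SubElement(tuv, "seg")
            seg.text = text.replace("①", "\n").replace("②", "\t")
    return ET.ElementTree(tmx)
```

The rows can come from splitting each line of the tab-delimited file at the first tab. The result can then be saved with tree.write("cleaned.tmx", encoding="utf-8", xml_declaration=True), where the file name is just an example.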

BTW:
BBEdit introduces the Text Factory, which allows you to assemble a list of text transformations that will be applied in order to either the current document or selection (when invoked as a filter), or to a specified list of files and folders (when invoked via the Scripts menu).


[Screenshot, 2022-10-16 07:54]


 
Hans Lenting
Netherlands
Member (2006)
German to Dutch
TOPIC STARTER
Solution for Mac Oct 27, 2022

For a solution for Mac, see: https://www.proz.com/post/2975518#2975518

 
Jean Dimitriadis
English to French
Why not use CafeTran Espresso? Oct 27, 2022

Could you simply use CafeTran Espresso for that conversion?

1. Create or open a project with the required language pair.
2. Open or Import the TMX file (or an SDLTB/TBX, which will be automatically converted to TMX), possibly not as read-only and with fragments enabled
3. Select the tab of the glossary you wish to import into (an empty Project Terms page will do, or you can create a new glossary and select its tab).
4. Memory menu > Export > Export segments to glossary. A dialog will ask you to select which memory to import segments from. And if the currently selected/opened tab is not a glossary, it will first ask you to select one.

That's it.

CafeTran also includes some TM Filter options, including one called "Clean and replace foreign codes": some TMX files from third-party tools have unusual codes in the segments, such as codes inside curly brackets or em dash, en dash and tab codes. CafeTran clears them or replaces them with the equivalent Unicode characters.

https://github.com/idimitriadis0/TheCafeTranFiles/wiki/3-TM-options#tm-filter-options

If needed, prior TMX editing (including search and replace, with or without regular expressions) can also be done from within CafeTran.

[Edited at 2022-10-27 05:50 GMT]


 
Hans Lenting
Netherlands
Member (2006)
German to Dutch
TOPIC STARTER
Too slow Oct 27, 2022

Jean Dimitriadis wrote:

Could you simply use CafeTran Espresso for that conversion?

1. Create or open a project with the required language pair.
2. Open or Import the TMX file (or an SDLTB/TBX, which will be automatically converted to TMX), possibly not as read-only and with fragments enabled
3. Select the tab of the glossary you wish to import into (an empty Project Terms page will do, or you can create a new glossary and select its tab).
4. Memory menu > Export > Export segments to glossary. A dialog will ask you to select which memory to import segments from. And if the currently selected/opened tab is not a glossary, it will first ask you to select one.

That's it.

CafeTran also includes some TM Filter options, including one called "Clean and replace foreign codes": some TMX files from third-party tools have unusual codes in the segments, such as codes inside curly brackets or em dash, en dash and tab codes. CafeTran clears them or replaces them with the equivalent Unicode characters.

https://github.com/idimitriadis0/TheCafeTranFiles/wiki/3-TM-options#tm-filter-options

If needed, prior TMX editing (including search and replace, with or without regular expressions) can also be done from within CafeTran.

[Edited at 2022-10-27 05:50 GMT]


I am familiar with this procedure. However, it is extremely slow.

[Screenshot, 2022-10-27 09:17]

This takes ages.

Besides that, I would like to have an alternative solution that I can use as a framework and possibly integrate into my workflows.



[Edited at 2022-10-27 07:18 GMT]


 
Dan Lucas
United Kingdom
Member (2014)
Japanese to English
Huh Oct 27, 2022

Stepan Konev wrote:
If that MacOS text editor can mark the match, you can use the following regex:
to mark and then copy all segments to clipboard

Although I've only tried it on one file, this typically clever solution from Stepan seems to work well in Notepad++ here - much appreciated. Given that we already have the regex, it looks like an obvious choice for a tiny script in the programming language of one's choice (probably just from the command line in Perl!). I have never actually needed to convert TMX to tab-delimited, but it's nice to know that it's possible. Thanks to Hans and other contributors for the topic.

Dan
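
A tiny script of the kind mentioned above might look like the following Python sketch. It does not use Stepan's regex, which is not reproduced in this post. Instead it reads the TMX with the standard library's XML parser. The function names are hypothetical, language codes are assumed to match exactly apart from case, and the ① and ② placeholders from earlier in the thread stand in for newlines and tabs.

```python
import xml.etree.ElementTree as ET

# TMX 1.4 uses xml:lang on <tuv>; older files may use a plain lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def seg_text(tuv):
    """Return the text of a <tuv>'s <seg>, with newlines and tabs replaced."""
    seg = tuv.find("seg")
    if seg is None:
        return ""
    # itertext() also picks up text inside inline elements such as <bpt>.
    text = "".join(seg.itertext())
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    return text.replace("\n", "①").replace("\t", "②")


def tmx_to_rows(tmx_source, src_lang, tgt_lang):
    """Yield (source, target) pairs from a TMX file name or binary file object."""
    tree = ET.parse(tmx_source)
    for tu in tree.iter("tu"):
        by_lang = {}
        for tuv in tu.findall("tuv"):
            lang = tuv.get(XML_LANG) or tuv.get("lang") or ""
            by_lang[lang.lower()] = seg_text(tuv)
        src = by_lang.get(src_lang.lower())
        tgt = by_lang.get(tgt_lang.lower())
        # Skip units where either side is missing or empty.
        if src and tgt:
            yield src, tgt


def write_tsv(rows, out):
    """Write the pairs as one tab-delimited line each to a text file object."""
    for src, tgt in rows:
        out.write(f"{src}\t{tgt}\n")
```

To use it, open the output file with encoding="utf-8" and call write_tsv(tmx_to_rows("memory.tmx", "de-DE", "nl-NL"), out). The file name and language codes here are examples only. For very large memories, a streaming parser such as ET.iterparse would probably be a better fit than loading the whole tree at once.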


 