testing a regex code
Thread poster: Lenart
Luxembourg
Jul 15, 2018

Hello everybody,

Could someone please explain to me why the regex „sa\s*(?!\w* question)“ matches the word „sa“ in the phrase „par sa question“?

The condition in the negative lookahead is fulfilled, so I would think that the word „sa“ shouldn't match in this case.

Thank you,

L


 
NeoAtlas
Spain
English to Spanish
regex engine Jul 15, 2018

First, the regex engine matches “sa”.

Then the engine tries to match whitespace with \s*, as much as possible. It matches one space and tries with that first (if that failed, it would try again with no spaces at all). So, trying with one space, note that the current position is right after the space in “sa ”, i.e. at the start of “question”.

Then the engine processes the negative lookahead. From the current position it can't match “\w* question”: \w* can take “question”, but no “ question” follows it. Since the content of the lookahead fails, the negative lookahead succeeds, and the engine returns the match “sa ” (with the space). Once the whole pattern has matched, the engine doesn't go back to try other possibilities (I mean, zero spaces).
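
A quick way to check this walkthrough is Python's built-in re module. This is an assumption on my part: the thread doesn't say which regex engine the CAT tool uses, but common backtracking engines should behave the same way here. The sketch shows that the match is „sa “ including the space, and that the zero-space variant (the fallback the engine never needs here) does not match.

```python
import re

TEXT = "par sa question"

# The pattern from the question: "sa", then as much whitespace as possible,
# then a negative lookahead that rejects "\w* question" at that position.
PATTERN = r"sa\s*(?!\w* question)"

m = re.search(PATTERN, TEXT)
# \s* took one space, so the lookahead starts at "question";
# "\w* question" cannot match there, so the negative lookahead succeeds.
print(repr(m.group()), m.span())  # 'sa ' (4, 7)

# The zero-space case: right after "sa" the text is " question", so
# "\w* question" matches (with \w* empty) and the negative lookahead fails.
print(re.search(r"sa(?!\w* question)", TEXT))  # None
```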

Note that if the string were “par sa question question”, the lookahead would match “ question” (with one space matched, that's the second “question”; with zero spaces, the first one), and because it's a negative lookahead, the same regex wouldn't match anything.
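
The same check with Python's re module (again assuming it behaves like the engine in your tool) confirms that the longer string gives no match at all:

```python
import re

PATTERN = r"sa\s*(?!\w* question)"

# One space: \w* takes the first "question", and " question" (the second one)
# follows, so the negative lookahead fails. Zero spaces: " question" follows
# "sa" directly, so it fails too. There is no other "sa", hence no match.
print(re.search(PATTERN, "par sa question question"))  # None
```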

I hope this helps,

… Jesús Prieto …


 
Lenart
TOPIC STARTER
thank you Jul 16, 2018

thank you Jesús, this was very helpful!

 
Lenart
TOPIC STARTER
another related question Jul 16, 2018

I wrote this code: (?<!EU:)EU:(?!EU:)

I would like a regex that identifies segments where the word „EU:“ appears only once. However, in the following segment „EU:“ appears three times and the regex still matches, so something is wrong. Can somebody please explain to me why the conditions of this regex are met in the following case?

source segment: „(voir, en ce sens, arrêts du 15 mai 2014, Briels e.a., C-521/12, EU:C:2014:330, points 28 et 29 ; du 21 juillet 2016, Orleans e.a., C-387/15 et C-388/15, EU:C:2016:583, point 48, ainsi que du 26 avril 2017, Commission/Allemagne, C-142/16, EU:C:2017:301, points 34 et 71).“

Thank you,

L

[Edited at 2018-07-16 12:41 GMT]



 
NeoAtlas
Your lookahead doesn't work… Jul 17, 2018

Your lookahead doesn't work for the same reason that
EU:(?=EU:)
doesn't match
EU:XXXEU:
Lookaheads don't mean “anywhere ahead”, only “immediately ahead” (starting at the current position).
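
Here is a small sketch of this point using Python's re module (assumed to behave like your tool's engine for these simple patterns). It also runs the regex from the question on the segment above: each „EU:“ is preceded by „, “ and followed by „C:“, so the lookbehind and lookahead are satisfied three times.

```python
import re

# A lookahead is tested only at the current position, not anywhere further on.
print(re.search(r"EU:(?=EU:)", "EU:XXXEU:"))  # None

# The regex from the question: (?<!EU:) and (?!EU:) only look at the three
# characters immediately before and after each "EU:".
QUESTION_PATTERN = r"(?<!EU:)EU:(?!EU:)"
SEGMENT = ("(voir, en ce sens, arrêts du 15 mai 2014, Briels e.a., C-521/12, "
           "EU:C:2014:330, points 28 et 29 ; du 21 juillet 2016, Orleans e.a., "
           "C-387/15 et C-388/15, EU:C:2016:583, point 48, ainsi que du 26 avril "
           "2017, Commission/Allemagne, C-142/16, EU:C:2017:301, points 34 et 71).")

matches = re.findall(QUESTION_PATTERN, SEGMENT)
print(len(matches))  # 3
```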

You'd need this regex:
EU:(?=.*?EU:)
to match:
EU:XXXEU:
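
The same idea in Python's re module (an assumption; your tool's engine may differ in details): with .*? inside the lookahead, the first „EU:“ matches because another „EU:“ appears somewhere after it, while the second one has nothing after it and is not matched.

```python
import re

# ".*?" lets the lookahead skip over any characters before it requires "EU:".
m = re.search(r"EU:(?=.*?EU:)", "EU:XXXEU:")
print(m.group(), m.span())  # EU: (0, 3)

# Only the first "EU:" qualifies; the second has no "EU:" after it.
print(re.findall(r"EU:(?=.*?EU:)", "EU:XXXEU:"))  # ['EU:']
```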

The same applies to lookbehind: (?<!EU:) only checks the three characters immediately before the current position.

With that explained, this regex may work:
(?<!EU:.*?)EU:(?!.*?EU:)
to match “EU:” appearing only once.

You may need to change it, but it's a good starting point.

Please let us know whether it works; if it doesn't, I'm curious to see your final regex.
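
If you want to test this regex outside the CAT tool, note that not every engine accepts it. Some engines (for example .NET) allow a variable-length lookbehind such as (?<!EU:.*?), but Python's built-in re module only accepts fixed-width lookbehind and raises re.error. The sketch below therefore swaps in a different technique: one negative lookahead anchored at the start of the segment that rejects any segment containing „EU:“ twice. The helper name has_single_eu is hypothetical, and the sketch assumes one segment per string.

```python
import re

# The regex from the thread. The lookbehind (?<!EU:.*?) has no fixed width,
# which Python's re module does not support, so compiling it fails.
THREAD_PATTERN = r"(?<!EU:.*?)EU:(?!.*?EU:)"
try:
    re.compile(THREAD_PATTERN)
    thread_pattern_compiles = True
except re.error:
    thread_pattern_compiles = False

# Alternative (not the thread's regex): from the start of the segment,
# refuse to continue if "EU:" occurs at least twice, then require one "EU:".
# DOTALL lets "." also cross line breaks inside a segment.
SINGLE_EU = re.compile(r"^(?!(?:.*?EU:){2}).*?EU:", re.DOTALL)


def has_single_eu(segment: str) -> bool:
    """Hypothetical helper: True if 'EU:' occurs exactly once in segment."""
    return SINGLE_EU.search(segment) is not None


print(thread_pattern_compiles)                                # False
print(has_single_eu("Briels e.a., C-521/12, EU:C:2014:330"))  # True
print(has_single_eu("EU:C:2014:330 ; EU:C:2016:583"))         # False
print(has_single_eu("no reference here"))                     # False
```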

… Jesús Prieto …


 
Lenart
TOPIC STARTER
thank you! Jul 27, 2018

Jesús, your code seems to be working great and I don't think I'll change anything in it.

But I am not sure I understand all of it. For example, take the last part of the code: EU:(?!.*?EU:)

Could you please tell me what the second question mark stands for?

I think what I'm really asking is: what is the difference between „.*?“ and „.*“?


 
NeoAtlas
Greedy versus Lazy quantifiers Jul 28, 2018

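.* is greedy, which is the default behaviour in regex: it matches as many characters as possible, then gives some back only if the rest of the pattern needs them.
.*? is lazy: it matches as few characters as possible and only takes more if the rest of the pattern fails. Inside a lookahead such as (?!.*?EU:), both give the same yes/no answer; the lazy one stops at the first “EU:” it finds, while the greedy one runs to the end of the segment and backtracks.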
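
A short sketch of the difference using Python's re module (assumed to follow the same greedy/lazy rules as your tool's engine; the sample text is made up for illustration):

```python
import re

TEXT = "EU:a EU:b EU:c"

# Greedy: ".*" first runs to the end, then gives characters back
# until the final "EU:" can match, so it stops at the last "EU:".
print(re.search(r"EU:.*EU:", TEXT).group())   # EU:a EU:b EU:

# Lazy: ".*?" takes as little as possible, so it stops at the next "EU:".
print(re.search(r"EU:.*?EU:", TEXT).group())  # EU:a EU:

# Inside a lookahead only success or failure matters, so both forms
# accept and reject the same positions.
greedy = [m.start() for m in re.finditer(r"EU:(?!.*EU:)", TEXT)]
lazy = [m.start() for m in re.finditer(r"EU:(?!.*?EU:)", TEXT)]
print(greedy, lazy)  # [10] [10]
```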

Glad the regex worked for you!


 
Lenart
TOPIC STARTER
same same but different Oct 5, 2018

I wrote this code: (?

 

