What is the regex for a space inside a CQL query in Sketch Engine's concordance?

I'm having trouble with a query in my corpus. I need to find every instance of a dot that is neither preceded nor followed by a space, for example in a.a, b.b and c.c. I found the regex for a space on this page https://www.sketchengine.eu/guide/regular-expressions/#toggle-id-2: it should be [[:space:]], which matches any whitespace character (space, new line, tab, carriage return).
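
As a point of reference outside Sketch Engine, here is a minimal Python sketch of the target pattern on raw, untokenized text. It uses a negative lookbehind and a negative lookahead to find a dot with no whitespace directly before or after it. Python's re module has no POSIX [[:space:]] class, so \s stands in for it, and the function name find_glued_dots is only illustrative.

```python
import re

# A dot with no whitespace directly before it (negative lookbehind)
# and no whitespace directly after it (negative lookahead).
# \s plays the role of the POSIX class [[:space:]].
GLUED_DOT = re.compile(r"(?<!\s)\.(?!\s)")


def find_glued_dots(text: str) -> list[int]:
    """Return the offsets of every dot not adjacent to whitespace."""
    return [m.start() for m in GLUED_DOT.finditer(text)]


sample = "a.a b.b c.c. end. x .y"
print(find_glued_dots(sample))
```

In this sketch a dot at the very start or end of the text counts as not being preceded or followed by whitespace, which corresponds to the (^|\S) and ($|\S) alternatives in the answer below.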

My idea was to build a CQL query in the concordance that searches for a non-space (negated with !=), then a dot, then another non-space, like this:

[lemma!="[[:space:]]"] [lemma="\."] [lemma!="[[:space:]]"]

but it is not working. Searching for the regular expression [[:space:]] on its own does not work either. I've also tried \s:

[lemma="[\s]"] [lemma="\."] [lemma="[\s]"]

and

[lemma="\s"] [lemma="\."] [lemma="\s"]

as well as </s><s>, which I know only marks the end of one sentence and the start of the next, but it would still be a start:

[lemma="</s><s>"] [lemma="\."] [lemma="</s><s>"]

None of these worked.
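
A likely reason these queries return nothing is that a CQL condition such as [lemma="..."] is tested against the value of a single token, and in a tokenized corpus whitespace is normally used to split tokens rather than stored as a token itself. The Python sketch below models that assumption. The tokenize function is a hypothetical stand-in, not Sketch Engine's actual tokenizer, and match_sequence is an illustrative helper that applies one whole-value regex per token position, in the way a CQL attribute regex is applied.

```python
import re


def tokenize(text: str) -> list[str]:
    # Hypothetical tokenizer: each run of word characters or each single
    # punctuation character becomes a token; whitespace is dropped.
    return re.findall(r"\w+|[^\w\s]", text)


def matches(token: str, pattern: str) -> bool:
    # Whole-value match, mirroring how a CQL attribute regex is applied.
    return re.fullmatch(pattern, token) is not None


def match_sequence(tokens, conditions):
    # conditions: one (pattern, negated) pair per token position,
    # e.g. (r"\s+", True) plays the role of [lemma!="[[:space:]]"].
    n = len(conditions)
    hits = []
    for i in range(len(tokens) - n + 1):
        window = tokens[i:i + n]
        if all(matches(t, p) != neg for t, (p, neg) in zip(window, conditions)):
            hits.append(window)
    return hits


tokens = tokenize("a.a and end. x")
print(tokens)  # whitespace never survives as a token
print([t for t in tokens if matches(t, r"\s+")])  # nothing matches [[:space:]]

# Analogue of [lemma!="[[:space:]]"] [lemma="\."] [lemma!="[[:space:]]"]
query = [(r"\s+", True), (r"\.", False), (r"\s+", True)]
print(match_sequence(tokens, query))
```

Under these assumptions the negated query reports both a . a and end . x, even though the dot in "end. x" is followed by a space in the original text: once whitespace is discarded, a token-level query cannot tell the two cases apart.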

I also tried another approach: requiring the dot to be preceded and followed by a letter (I don't need numbers or symbols anyway), with this query:

[lemma="[[:alpha:]]*"] [lemma="\."] [lemma="[[:alpha:]]*"]

But it does not find all the results.
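
One possible explanation for the missing results, assuming the tokenizer sometimes keeps a dotted string such as b.b or e.g. together as a single token (tokenizers may do this for abbreviations), is that a three-token query can only see dots that were split off as separate tokens. The Python sketch below uses a hypothetical token list, with [A-Za-z]* as an ASCII stand-in for [[:alpha:]]*; the helper names are illustrative only.

```python
import re

# Hypothetical token stream: the first "a.a" was split into three tokens,
# while "b.b" and "e.g." were kept whole as single tokens.
tokens = ["a", ".", "a", "b.b", "and", "e.g.", "x"]

ALPHA = r"[A-Za-z]*"  # ASCII stand-in for [[:alpha:]]*
DOT = r"\."


def three_token_hits(toks):
    # Emulates [lemma="[[:alpha:]]*"] [lemma="\."] [lemma="[[:alpha:]]*"]
    # by testing every window of three consecutive tokens.
    hits = []
    for i in range(len(toks) - 2):
        a, b, c = toks[i:i + 3]
        if (re.fullmatch(ALPHA, a) and re.fullmatch(DOT, b)
                and re.fullmatch(ALPHA, c)):
            hits.append((a, b, c))
    return hits


def dotted_tokens(toks):
    # Tokens with a dot inside them, ignoring a bare "." token.
    return [t for t in toks if "." in t and t != "."]


print(three_token_hits(tokens))  # only the split-up a . a
print(dotted_tokens(tokens))     # dotted strings the three-token query misses
```

Here the three-token query reports only a . a, while b.b and e.g. show up only when each token is searched for an internal dot, which is the idea behind the single-token regex in the answer below.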

Can someone please tell me what I'm doing wrong in my Regex/CQL?

Answer by Barmar:

Try:

[lemma=".*(^|\S)\.($|\S).*"]

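\S matches any non-whitespace character, so this pattern matches a . that is preceded by either the beginning of the string or a non-whitespace character, and followed by either the end of the string or a non-whitespace character. The .* at each end is needed because a CQL regex has to match the whole lemma, not just part of it.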
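
To check how the suggested regex behaves on individual tokens, here is a small Python sketch. Python's re module stands in for Sketch Engine's regex engine, which may differ in minor details, and the sample tokens are made up.

```python
import re

# The answer's CQL regex. fullmatch is used because a CQL attribute
# regex has to match the whole token value, hence the outer .* parts.
PATTERN = re.compile(r".*(^|\S)\.($|\S).*")

for token in ["a.a", "c.c.", "end.", ".", "dog"]:
    print(token, PATTERN.fullmatch(token) is not None)
```

If tokens never contain whitespace, the (^|\S) and ($|\S) alternatives are always satisfied, so in practice the pattern picks out every token that contains a dot, including a lone . token; only "dog" fails in the sample above.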