I have data that always starts with a substring repeated twice without a delimiter, followed by other data that I don't care about. The length of the repeated substring varies. In the example below I'm using mostly [a-z] characters for the sake of simplicity, but in the real dataset the repeated substring is mostly Unicode squiggles.
| my data | what I want to extract |
|---|---|
| johnjohnsajoalsas | john |
| peterpeteraaksoskco | peter |
| a8co.a8co.robinson | a8co. |
| robrob7s:s7 | rob |
| dkoisawks | \[null\] |
This can be done easily with a positive lookahead:
`^(.+)(?=\1)`
or by directly referencing the capture group, like this:
`^(.+)\1`
However, Google Sheets doesn't support either of these.
Any help will be greatly appreciated.



Here's one (non-regex) approach you may test out in Sheets: use `left()` and `mid()` to filter out the possible match, if any...
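To make the idea concrete, here is a minimal Python sketch of the same comparison logic. It is not the Sheets formula itself, only an illustration of the technique. The function name `repeated_prefix` and the choice to try the longest candidate first (to mirror the greedy `^(.+)\1`) are assumptions made for this example.

```python
from typing import Optional


def repeated_prefix(text: str) -> Optional[str]:
    """Return the longest prefix that is immediately repeated, or None.

    Hypothetical helper: it compares the first n characters with the
    next n characters, for every candidate length n.
    """
    # Longest candidate first, so the result matches the greedy ^(.+)\1.
    for n in range(len(text) // 2, 0, -1):
        left = text[:n]        # analogous to LEFT(text, n)
        mid = text[n:2 * n]    # analogous to MID(text, n + 1, n)
        if left == mid:
            return left
    return None  # no repeated prefix, i.e. the [null] case


if __name__ == "__main__":
    samples = [
        "johnjohnsajoalsas",
        "peterpeteraaksoskco",
        "a8co.a8co.robinson",
        "robrob7s:s7",
        "dkoisawks",
    ]
    for s in samples:
        print(s, "->", repeated_prefix(s))
```

In Sheets terms, the test performed for each candidate length `n` corresponds to `LEFT(A1, n) = MID(A1, n + 1, n)`, assuming the data sits in `A1`. How to iterate over `n` inside a single formula is left open here.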