LISP - Splitting string with delimiter also included in new list

2.4k views Asked by At

I have a list of elements following

("(aviyon" "213" "flyingman" "no))") as list

What i want is that I want to split this list containing strings using parentheses as splitter but also want to include these parentheses in a new list without breaking the order

My desired output of new list(or same list modified)

("(" "aviyon" "213" "flyingman" "no" ")" ")") 

I am coming from imperative languages and this would be 15 minute job in Java or C++. But here i'm stuck what to do. I know i have to

1- Get a element from list in a loop

I think this is done with (nth 1 '(listname) )

2- separate without removing delimiter put in to a new list

I found functions such as SPLIT-SEQUENCE but i can't do without removing it and without breaking original order.

Any help would be appreciated.

3

There are 3 answers

1
coredump On BEST ANSWER

Let's have another answer, without external libraries. Like you already did, we can split in the problem into smaller parts:

  1. define a function which builds a list of tokens from a string, all-tokens
  2. apply this function on all strings in your input list, and concatenate the result:

    (mapcan #'all-tokens strings)
    

The first part, taking a state and building a list from it, looks like an unfold operation (anamorphism).

Fold (catamorphism), called reduce in Lisp, builds a value from a list of values and a function (and optionally an initial value). The dual operation, unfold, takes a value (the state), a function, and generate a list of values. In the case of unfold, the step function accepts a state and returns new state along with the resulting list.

Here, let's define a state as 3 values: a string, a starting position in the string, and a stack of tokens parsed so far. Our step function next-token returns the next state.

 ;; definition follows below
 (declare (ftype function next-token))

The main function which gets all tokens from a string just computes a fixpoint:

(defun all-tokens (string)
  (do (;; initial start value is 0
       (start 0)
       ;; initial token stack is nil
       (tokens))

      ;; loop until start is nil, then return the reverse of tokens
      ((not start) (nreverse tokens))

    ;; advance state
    (multiple-value-setq (string start tokens)
      (next-token string start tokens))))

We need an auxiliary function:

(defun parenthesisp (c)
  (find c "()"))

The step function is defined as follows:

(defun next-token (string start token-stack)
  (let ((search (position-if #'parenthesisp string :start start)))
    (typecase search
      (number
       ;; token from start to parenthesis
       (when (> search start)
         (push (subseq string start search) token-stack))
       ;; parenthesis
       (push (subseq string search (1+ search)) token-stack)
       ;; next state
       (values string (1+ search) token-stack))
      (null
       ;; token from start to end of string
       (when (< start (1- (length string)))
         (push (subseq string start) token-stack))
       ;; next-state
       (values string nil token-stack)))))

You can try with a single string:

(next-token "(aviyon" 0 nil)
"(aviyon"
1
("(")

If you take the resulting state values and reuse them, you have:

(next-token "(aviyon" 1 '("("))
"(aviyon"
NIL
("aviyon" "(")

And here, the second return value is NIL, which ends the generation process. Finally, you can do:

(mapcan #'all-tokens '("(aviyon" "213" "flyingman" "no))"))

Which gives:

("(" "aviyon" "213" "flyingman" "no" ")" ")")

The above code is not fully generic in the sense that all-tokens knows too much about next-token: you could rewrite it to take any kind of state. You could also handle sequences of strings using the same mechanism, by keeping more information in your state variable. Also, in a real lexer you would not want to reverse the whole list of tokens, you would use a queue to feed a parser.

3
Alexander Artemenko On

You can use cl-ppcre library to do the job.

For example:

CL-USER> (ql:quickload :cl-ppcre)

CL-USER> (cl-ppcre:split "([\\(\\)])" "(aviyon" :with-registers-p t)
("" "(" "aviyon")
CL-USER> (cl-ppcre:split "([\\(\\)])" "no))" :with-registers-p t)
("no" ")" "" ")")
CL-USER> 

However, it makes empty-strings in a list. Use remove-if function to get rid of them:

CL-USER> (defun empty-string-p (s) (string= s ""))
EMPTY-STRING-P
CL-USER> (remove-if 'empty-string-p
                    (list "no" ")" "" ")"))
("no" ")" ")")

Finally, you can construct a function which does both, and run it in an imperative loop (yes, Common Lisp is not functional as many think):

CL-USER> (defun remove-empty-strings (l)
           (remove-if 'empty-string-p l))
REMOVE-EMPTY-STRINGS
CL-USER> (defun split (s)
           (cl-ppcre:split "([\\(\\)])"
                           s
                           :with-registers-p t))
SPLIT
CL-USER> (defparameter *the-list* '("(aviyon" "213" "flyingman" "no))"))
*THE-LIST*
CL-USER> (loop for item in *the-list*
               for splitted = (split item)
               for cleaned = (remove-empty-strings splitted)
               append cleaned)
("(" "aviyon" "213" "flyingman" "no" ")" ")")
6
Gwang-Jin Kim On

solution

Since you didn't understood Alexander's solution and since I anyway wrote my solution:

;; load two essential libraries for any common lisper
(ql:quickload :cl-ppcre)
(ql:quickload :alexandria)
;; see below to see how to install quicklisp for `ql:quickload` command
;; it is kind of pythons `import` and if not install `pip install`
;; in one command for common-lisp

(defun remove-empty-string (string-list) 
  (remove-if #'(lambda (x) (string= x "")) string-list))


(defun split-parantheses-and-preserve-them (strings-list)
  (remove-empty-string 
  (alexandria:flatten 
    (mapcar #'(lambda (el) (cl-ppcre:split "(\\(|\\))" 
                                           el 
                                           :with-registers-p t)) 
            strings-list))))

 ;; so now your example
 (defparameter *list* '("(aviyon" "213" "flyingman" "no))"))

 (split-parantheses-and-preserve-them *list*)
 ;; returns:
 ;; ("(" "aviyon" "213" "flyingman" "no" ")" ")")

how this works

(cl-ppcre:split "(\\(|\\))" a-string) splits the string by ( or ). Because in regex pattern ( or ) are used for capturing the match - like here too (the outer parantheses captures) - you have to escape them. \\( or \\). So with cl-ppcre:split you can split any string in common lisp by regex-pattern. Super cool and super efficient package written by Edi Weitz. He wrote several super sophisticated packages for common lisp - they are also called ediware or edicls in the community. By the way - cl-ppcre is even more efficient and faster than gold-standard for regex: the perl regex engine!

:with-regiesters-p t option then preserves the matched delimiter - which has to be captured by parentheses like this: (<pattern>) in the pattern.

mapcar this over the list to apply it on each string element in your string list.

However, what you got after that is a list of lists. (Each inner list containing the splitted result for each string-element of the list).

Flatten the list by alexandria:flatten. For many functions not in the standard of lisp, but which you think they are basic - like flatten a list - look always first in alexandria - mostly it has a function you desire - it is a huge library. That is why you need it anyway as a common lisper ;) .

But still, there will be empty strings around to be removed. That is why I wrote remove-empty-string which uses remove-if - which together with remove-if-not is the standard filtering function for lists. It takes a predicate function - here (lambda (x) (string= x "")) which gives T if string is an empty string and NIL if not. It removes all elements in the resulting flattened list in our function, which are empty strings. In other languages it will be named filter but yeah - sometimes function names in common-lisp are not very well chosen. Sometimes I think we should create alias names and move over to them and keep the old names for backward-compatibility. Clojure has nicer names for functions ... Maybe cl people should overtake clojure function names ...

quicklisp

@Alexander Artemenko wrote exactly my solution - he came first. I will add: If you are so new to common lisp, maybe you don't know how to use quicklisp. Do in terminal (linux or macos):

wget https://beta.quicklisp.org/quicklisp.lisp

Otherwise manually download in windows from the address.

I put it into ~/quicklisp folder.

Then in clisp or sbcl do:

(load "~/quicklisp/quicklisp.lisp") ;; just path to where downloaded
;; quicklisp.lisp file is!

;; then install quicklisp:
(quicklisp-quickstart:install)

;; then search for cl-ppcre
(ql:system-apropos "cl-ppcre")

;; then install cl-ppcre
(ql:quickload "cl-ppcre")

;; and to autoload everytime you start sbcl or clisp
;; in linux or mac - sorry I don't now windows that well
;; I have the opinion every programmer should us unix
;; as their OS
;; you have to let quicklisp be loaded when they start
;; by an entry into the init file
;; mostly located in ~/.sbclrc or ~/.clisprc.slip or such ...
;; respectively.
;; quicklisp does an entry automatically if you do:
(ql:add-to-init-file)

;; after installation do:
(quit)

;; If you then restart sbcl or clisp and try:
(ql:quickload :cl-ppcre)
;; it should work, - if not, you have to manually load
;; quicklisp first
(load "~/quicklisp/setup.lisp") ;; or wherever quicklisp's
;; setup.lisp file has been stored in your system!
;; and then you can do
(ql:quickload :cl-ppcre)

;; to install alexandria package then, do
(ql:quickload :alexandria) ;; or "alexandria"

;; ql:quickload installs the package from quicklisp repository,
;; if it cannot find package on your system.

;; learn more about quicklisp, since this is the package
;; manager of common lisp - like pip for python