for loop in bash command

Hi, I have a server with an old version of LXD that doesn't have automated snapshot deletion, so I created the following bash line to bulk-delete snapshots. I would like to add "pv" to it for a progress bar. How can I put the following in a loop so pv can show me how many snapshots it has deleted out of how many match the pattern?

lxc info wiki | awk '{print$1}' | grep wiki | grep --null '2022' | xargs -n 1 -I{} lxc delete wiki/{}

I added pv at various points in the pipeline, but it only shows how long it takes to bring up the grep wiki and doesn't go on to show a progress bar for all the snapshots in the to-delete pile.

Edit

Following the comments below, here is what I am looking at. This is the output after I got rid of most of the snapshots I needed to. Ideally I would like to delete snapshots by date, but the dates are not recognised as dates.

Name: wiki
Remote: unix://
Architecture: x86_64
Created: 2018
Status: Running
Type: persistent
Profiles: macvlan-default
Pid: 3170
Ips:
  eth0: inet    X
  eth0: inet6   X
  eth0: inet6   X
  lo:   inet    127.0.0.1
  lo:   inet6   ::1
Resources:
  Processes: 60
  CPU usage:
    CPU usage (in seconds): 3891
  Memory usage:
    Memory (current): 231.60MB
    Memory (peak): 893.12MB
  Network usage:
    eth0:
      Bytes received: 145.47MB
      Bytes sent: 116.42MB
      Packets received: 287345
      Packets sent: 61107
    lo:
      Bytes received: 127.05kB
      Bytes sent: 127.05kB
      Packets received: 1259
      Packets sent: 1259
Snapshots:
  wiki+28092019 (taken at 2019/09/28 14:22 UTC) (stateless)
  wiki118012023 (taken at 2023/01/18 05:37 UTC) (stateless)
  wiki119082023 (taken at 2023/08/19 02:39 UTC) (stateless)
  wiki123092023 (taken at 2023/09/23 10:21 UTC) (stateless)

There are 2 answers

Ed Morton On

I don't know what the pv aspect of your initial question was about but from your recent edit it sounds like all you really want to do is print the first field for every wiki... line between 2 dates, e.g.:

$ cat file | awk -v beg='2023/01/10' -v end='2023/08/25' '($1 ~ /^wiki[^[:alpha:]]+$/) && (beg <= $4) && ($4 <= end) {print $1}'
wiki118012023
wiki119082023

Replace cat file with lxc info wiki.
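
The filter needs no date parsing because the "taken at" dates are written as zero-padded YYYY/MM/DD, so plain string comparison already orders them chronologically. A minimal sketch that wraps the same awk test in a function; the name snapshots_between is made up for illustration, and the sample lines are copied from the question:

```shell
# Hypothetical helper (not part of lxc): reads `lxc info`-style text on stdin
# and prints the names of snapshots taken between BEG and END (inclusive).
snapshots_between() {
  # $4 is the date in "name (taken at YYYY/MM/DD HH:MM UTC)". It is compared
  # as a string, which sorts chronologically for this zero-padded format.
  awk -v beg="$1" -v end="$2" \
    '$1 ~ /^wiki[^[:alpha:]]+$/ && beg <= $4 && $4 <= end { print $1 }'
}

snapshots_between 2023/01/10 2023/08/25 <<'EOF'
Snapshots:
  wiki+28092019 (taken at 2019/09/28 14:22 UTC) (stateless)
  wiki118012023 (taken at 2023/01/18 05:37 UTC) (stateless)
  wiki119082023 (taken at 2023/08/19 02:39 UTC) (stateless)
  wiki123092023 (taken at 2023/09/23 10:21 UTC) (stateless)
EOF
```

With a real container this would presumably become lxc info wiki | snapshots_between 2023/01/10 2023/08/25.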

Socowi On

Why is pv not showing progress?

There are two problems here:

1. Real progress bars are impossible when reading from pipes

When pv reads its input from a pipe, it does not know what 100% means, because it sees the end of the stream only when reaching it. Therefore, it can only show a status, not a real progress bar.

2. Buffering

pv updates the status only while its standard input is open. With sleep 5 | pv | cat we get output like 0.00 B 0:00:05 [0.00 B/s] [<=> ], where the time part updates every second (00:00 -> 00:01 -> ... -> 00:05) even though nothing is read.
When we close its stdin after two seconds with (sleep 2; exec >/dev/null; sleep 3;) | pv | cat, we only get updates until 00:02. After that, pv no longer updates the status.

Why is stdin closed in your case?

Pipes are buffered. In a pipeline fast-generate | slow-consume the command on the left can exit before the command on the right, because it can write all its output to the buffer without waiting for it to be actually read.
Only when the buffer between those commands is full does fast-generate have to wait for slow-consume to read something from the buffer, which frees up space for writing again.

Buffers are usually big (e.g. 64kB, see How big is the pipe buffer?) and your input is small (a few snapshot names, only 14B each), small enough to be stored completely in xargs' input buffer.
pv forwards its input as fast as it can. Once it has forwarded (and therefore read) everything, its stdin is closed.
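
The effect can be reproduced without pv or lxc. The following is only an illustrative sketch (demo_pipe_buffer is a made-up name): the producer finishes at once because its few bytes fit in the pipe, while the consumer does not start reading until a second later.

```shell
# The producer writes three short lines and exits immediately. The consumer
# only starts reading a second later; until then the lines sit in the pipe.
demo_pipe_buffer() {
  { seq 3; echo "producer finished" >&2; } |
    { sleep 1; echo "consumer starts reading" >&2; cat; }
}

demo_pipe_buffer
```

"producer finished" appears about a second before the consumer reads anything, which is the situation pv is in when it sits in front of a slow xargs.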

Attempt to fix the above issues

Let's try to fix the above problems by using a temporary file.

Here, I also simplified the extraction of snapshot names using this suggestion. The API call is documented here and here. I couldn't test it myself. Please verify this step.

instance=wiki
year=2022
lxc query "/1.0/instances/$instance/snapshots" |
jq -r --arg year "$year" '.metadata[] | select(endswith($year)) | sub(".*/"; "")' > old
pv old | xargs -n1 -I{} lxc delete "$instance/{}"

This causes another problem:

Why is pv showing 100% immediately?

pv only knows what it forwarded to xargs' input buffer, but it cannot inspect how much of that buffer was actually read by xargs so far.
When the input is shorter than the buffer size (very likely in this case, see the notes on buffer size above), pv immediately writes everything to xargs' input buffer. From the perspective of pv, this means it has processed 100% of the input, even though xargs has not even started reading that input from its buffer.

Minimal example: seq 9 > f; pv f | xargs -n1 bash -c 'echo "$0"; sleep 1'

Unbuffering doesn't seem to work here

I assumed those issues could be solved by disabling buffering, but I had no luck with stdbuf, unbuffer, and script. man unbuffer even documents this:

unbuffer -p may appear to work incorrectly if a process feeding input to unbuffer exits.
Consider: process1 | unbuffer -p process2 | process3
If process1 exits, process2 may not yet have finished. It is impossible for unbuffer to know [how] long to wait for process2 and process2 may not ever finish, for example, if it is a filter. For expediency, unbuffer simply exits when it encounters an EOF from either its input or process2. In order to have a version of unbuffer that worked in all situations, an oracle would be necessary.

Workaround: Manually notify pv of progress made

Instead of running pv | xargs, we run xargs | pv and have each job print one line that is meant only for pv.

xargs -a old -I{} bash -c '
  lxc delete "$0"
  echo "$0" >&3 # inform pv that we processed another item
' "$instance/{}" 3> >(pv -l -s "$(wc -l <old)" >/dev/null)

This has the additional benefit of counting lines instead of bytes, making it accurate even if the snapshot names differ in length.

The extra file descriptor >&3/3> is only needed when lxc delete might produce output. If it doesn't write anything to stdout, you can simplify the command to

xargs -a old -I{} bash -c 'lxc delete "$0"; echo "$0"' "$instance/{}" |
pv -l -s "$(wc -l <old)" >/dev/null

Alternative to pv in this scenario

Wow... what a mess! I assume pv isn't made for inputs smaller than one buffer size. If you want a simple alternative, try GNU parallel instead of pv | xargs:

cat old | parallel --progress -j1 --bar "lxc delete $instance/{}"

cat old is just a placeholder for the command that extracts the snapshot names. No need to create a helper file here.

General note (may be counterproductive in your use-case): If you want, you can leave out the -j1 and parallel will execute multiple commands in parallel (depending on the number of available CPU cores).
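
Another option that needs neither pv nor parallel is a plain loop with a counter, which gives the "N of M" display asked for in the question. This is a minimal sketch that has not been tested against lxc; run_with_progress and delete_snapshot are made-up names, and the list file is assumed to hold one snapshot name per line, like the file old above.

```shell
# Hypothetical helper: run CMD once per line of LIST (the line is passed as
# the last argument) and report "done N of TOTAL" on stderr after each run.
run_with_progress() (
  list=$1; shift
  total=$(( $(wc -l <"$list") ))   # arithmetic strips any padding from wc
  count=0
  while IFS= read -r name; do
    "$@" "$name" </dev/null        # keep CMD from consuming the list on stdin
    count=$((count + 1))
    printf 'done %d of %d: %s\n' "$count" "$total" "$name" >&2
  done <"$list"
)

# Dry run with echo standing in for the real delete command.
list=$(mktemp)
printf '%s\n' snapA snapB snapC >"$list"
run_with_progress "$list" echo would delete
rm -f "$list"

# Usage sketch for the real case (assumption, untested):
#   delete_snapshot() { lxc delete "$instance/$1"; }
#   run_with_progress old delete_snapshot
```

Because the counter is advanced only after each command returns, the display cannot run ahead of the actual deletions the way pv does.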