Generate shell script call tree

4.6k views Asked by At

I've been handed a project that consists of several dozen (probably over 100, I haven't counted) bash scripts. Most of the scripts make at least one call to another one of the scripts. I'd like to get the equivalent of a call graph where the nodes are the scripts instead of functions.

Is there any existing software to do this?

If not, does anybody have clever ideas for how to do this?

Best plan I could come up with was to enumerate the scripts and check to see if the basenames are unique (they span multiple directories). If there are duplicate basenames, then cry, because the script paths are usually held in variable names so you may not be able to disambiguate. If they are unique, then grep the names in the scripts and use those results to build up a graph. Use some tool (suggestions?) to visualize the graph.

Suggestions?

3

There are 3 answers

1
Michael Rusch On BEST ANSWER

Here's how I wound up doing it (disclaimer: a lot of this is hack-ish, so you may want to clean up if you're going to use it long-term)...

Assumptions: - Current directory contains all scripts/binaries in question. - Files for building the graph go in subdir call_graph.

Created the script call_graph/make_tgf.sh:

#!/bin/bash
# Run from dir with scripts and subdir call_graph
# Parameters:
# $1 = sources (default is call_graph/sources.txt)
# $2 = targets (default is call_graph/targets.txt)

SOURCES=$1
if [ "$SOURCES" == "" ]; then SOURCES=call_graph/sources.txt; fi
TARGETS=$2
if [ "$TARGETS" == "" ]; then TARGETS=call_graph/targets.txt; fi

if [ ! -d call_graph ]; then echo "Run from parent dir of call_graph" >&2; exit 1; fi
(
#  cat call_graph/targets.txt
  for file in `cat $SOURCES `
  do
    for target in `grep -v -E '^ *#' $file | grep -o -F -w -f $TARGETS | grep -v -w $file | sort | uniq`
    do echo $file $target
    done
  done
)

Then, I ran the following (I wound up doing the scripts-only version):

cat /dev/null | tee call_graph/sources.txt > call_graph/targets.txt
for file in *
do
  if [ -d "$file" ]; then continue; fi
  echo $file >> call_graph/targets.txt
  if file $file | grep text >/dev/null; then echo $file >> call_graph/sources.txt; fi
done

# For scripts only:
bash call_graph/make_tgf.sh call_graph/sources.txt call_graph/sources.txt > call_graph/scripts.tgf

# For scripts + binaries (binaries will be leaf nodes):
bash call_graph/make_tgf.sh > call_graph/scripts_and_bin.tgf

I then opened the resulting tgf file in yEd, and had yEd do the layout (Layout -> Hierarchical). I saved as graphml to separate the manually-editable file from the automatically-generated one.

I found that there were certain nodes that were not helpful to have in the graph, such as utility scripts/binaries that were called all over the place. So, I removed these from the sources/targets files and regenerated as necessary until I liked the node set.

Hope this helps somebody...

2
Raphael Bossek On

Wrap the shell itself by your implementation, log who called you wrapper and exec the original shell.

Yes you have to start the scripts in order to identify which script is really used. Otherwise you need a tool with the same knowledge as the shell engine itself to support the whole variable expansion, PATHs etc -- I never heard about such a tool.

In order to visualize the calling graph use GraphViz's dot format.

1
Michael Dillon On

Insert a line at the beginning of each shell script, after the #! line, which logs a timestamp, the full pathname of the script, and the argument list.

Over time, you can mine this log to identify likely candidates, i.e. two lines logged very close together have a high probability of the first script calling the second.

This also allows you to focus on the scripts which are still actually in use.

You could use an ed script

1a
log blah blah blah
.
wq

and run it like so:

find / -perm +x -exec ed {} <edscript

Make sure you test the find command with -print instead of the exec clause. And / is probably not the path that you want to use. If you have to include bin directories then you will probably need to switch to grep in order to identify the pathnames to include, then when you have a file full of the right names, use xargs instead of find to run the script.