So, I've recently made a React app which I have posted on GitHub. However, I would like to post the output (the `build` folder after I run `npm run build`) to a Glitch application. Since all Glitch applications have a git repository, I thought that would be the best way to go about doing this. Here is my desired structure:
- My main `git` repo, which pushes to GitHub. This repository ignores the `build` folder.
- Another "sub" `git` repository, which only pushes the contents of `build` to Glitch.
I've seen people using submodules, but I can't figure out how to make my main git repo ignore the build folder and have the submodule just push the build folder.
I'm also confused about how to set up a submodule in general, so an example or explanation for that would be appreciated as well.
~ Ayush
I'm not entirely sure that you want a submodule here, but submodules will let you do what you are describing. Submodules are tricky, though. There's a reason people call them sob-modules.
Long
First, it will help a great deal if you get your definitions (actors and actions) straight:
- A repository does not push anything. It's just a collection of commits (plus some names; see the last point below).
- Git (the software suite) creates and manipulates repositories, including the commits inside them.
- The `git push` command pushes commits.
- A commit is a thingy (technically, a commit object, but people use the term pretty loosely, hence the loose "thingy" term here) with the following features:
- A repository also contains names, such as branch and tag names, that allow Git to find commits. This works by having one name store exactly one hash ID. For branch names, that stored hash ID is, by definition, the last commit in the branch. Since commits store parent hash IDs, Git can work backwards from whichever commit we decide to call "last in branch `X`": `X~1` is the second-to-last in `X`, `X~2` is the third-to-last, and so on.
The act of adding a new commit to a branch consists of the following steps:
1. You check out that commit (with `git checkout` or `git switch`) by checking out that branch (with the same command), so that this is now the current branch. This action fills in both Git's index, which holds your proposed next commit, and your working tree, where Git copies out all the files into a usable form. The internal, de-duplicated form is generally unusable to everything except Git itself.
2. You do some stuff in your working tree. Git has zero control or influence over this part, a lot of the time, since you'll be using your own editor or compiler or whatever. You can use Git commands here and then Git will be able to see what you did, but mostly, Git doesn't have to care, because we move on to step 3:
3. You run `git add`. This instructs Git to take a look at the updated working tree files. Git will copy these updated files back into Git's index (aka the staging area), in their updated form, re-compressing and de-duplicating them and generally making them ready for the next commit.
4. You run `git commit`. This packages up new metadata (your name, the current date and time, a log message, and so on) and adds the current commit's hash ID to make up the metadata for the new commit. The new commit's parent will thus be the current commit. Git then snapshots everything in the index at this time (which is why `git checkout` filled it in, in step 1, and then `git add` updated it in step 3), along with the metadata, to make the new commit. This gives the new commit its new hash ID, which is actually just a cryptographic checksum of the entire data set here.
It's at this point that the magic happens: `git commit` writes the new commit's hash ID into the current branch name. So now, the last commit on the branch is your new commit. This is how a branch grows, one commit at a time. No existing commit changes (none can change), but the new commit points back to what was the last commit, and is now the second-to-last commit. The branch name moves.
You really need to have all of these down pretty cold to make submodules work, because submodules actually use all of this stuff, but then violate some rules. Now it starts to get tricky. We also need to look more closely at `git push`, just for a moment.
`git push`: cross-connecting one Git repository with another
Making a new Git commit, in some Git repository, just makes a new snapshot-plus-metadata. The next trick is to get that commit into some other Git repository.
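
Before moving on to pushing, the four commit steps described earlier can be tried in a scratch repository. This is only a sketch: the temporary directory, the file name `file.txt`, the branch name `main`, and the demo identity are all assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: watch a branch name move as a new commit is added.
set -e
work=$(mktemp -d)
git init -q "$work/demo"
cd "$work/demo"
git symbolic-ref HEAD refs/heads/main     # name the still-empty branch "main"
git config user.name "Demo User"          # placeholder identity for the demo
git config user.email "demo@example.com"

# One starting commit, so that the branch has a "last commit".
echo "first" > file.txt
git add file.txt
git commit -q -m "first commit"
old_tip=$(git rev-parse main)             # hash ID currently stored in the name

# Step 1: check out the branch (fills the index and the working tree).
git checkout -q main
# Step 2: change something in the working tree.
echo "second" >> file.txt
# Step 3: copy the updated file into the index.
git add file.txt
# Step 4: commit; the branch name now stores the new commit's hash ID.
git commit -q -m "second commit"

new_tip=$(git rev-parse main)
parent=$(git rev-parse main~1)            # second-to-last commit on main

echo "old tip: $old_tip"
echo "new tip: $new_tip"
echo "main~1:  $parent"
```

The last two values printed should show that `main~1` is the commit that used to be the tip: nothing about the old commit changed, only the name moved.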
If we start with two otherwise-identical Git repositories, each has some set of commits and some branch names identifying the same last commit:
and the same in Repo B. But then, over in Repo A, we do:
which causes repo A to contain:
(I get lazy and don't bother drawing the commit-to-commit arrows correctly here). New commit `I` (where `I`, like `H` and `G` and `F`, stands in for some big ugly random-looking hash ID) points back to existing commit `H`. You might even make more than one new commit:
Now you run `git push origin branch-name`, to send your new commits, in your repository, back to the "origin" repo (which we were calling "repo B" before, but let's call it `origin` now).
Your Git software suite ("your Git") calls up theirs. Your Git lists out the hash ID of your latest commit, i.e., commit `J`. Their Git checks in their repository, to see if they have `J`, by hash ID. They don't (because you just made it). So their Git tells your Git: OK, gimme! Your Git is now obligated to offer `J`'s parent `I`. They check and don't have `I` either, so they ask for that one too. Your Git is now obligated to offer commit `H`. They check and (hey!) this time they do have commit `H` already, so they say: no thanks, I have that one already.
Your Git now knows not only that you must send commits `J` and `I`, but also which files they already have. They have commit `H`, so they must have commit `G` too, and commit `F`, and so on. They have all the de-duplicated files that go with those commits. So your Git software suite can now compute a minimal set of stuff to send them so that they can reconstruct commits `I-J`.
Your Git does so; that's the "counting" and "compressing" and so on that you see. Their Git receives this stuff, unpacks it, and adds the new commits to their repository. They now have:
in their Git repository. Now we hit a really tricky bit: How does a Git, in general, find a commit? The answer is always, ultimately, by its hash ID—but that just brings another question, which is: how does a Git find a hash ID? They look random.
We already said this earlier though: a Git (the software suite) often finds some specific commit in some specific repository through the use of a branch name. The branch name `branch-name`, in your repository, finds the last commit, which is now `J`. We'd like the same name in their repository to find the same last commit.
So, your Git software now asks their Git to set their repository's branch name `branch-name` to identify commit `J`. They will do this if you are allowed to do this. The "allowed" part can get arbitrarily complicated (sites like GitHub and Bitbucket add all kinds of permissions and rules here), but if we assume that it's OK, and that they'll do that, then they will end up with:
in their repository, and your Git repository and their Git repository will be in sync again, at least for this particular branch name.
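
The exchange described above can be reproduced locally. In this sketch a bare repository in a temporary directory stands in for `origin`; the directory names, file name, and demo identity are assumptions, and the commit messages `H`, `I`, `J` only label the commits (the real hash IDs will be the usual random-looking strings).

```shell
#!/bin/sh
# Sketch: push two new commits and watch the other repository's branch name move.
#
#   before the second push:   H        <-- branch-name   (origin)
#                             H--I--J  <-- branch-name   (yours)
#   after the second push:    H--I--J  <-- branch-name   (both)
set -e
work=$(mktemp -d)
cd "$work"
git init -q --bare origin.git             # stands in for the hosted repository
git init -q mine
cd mine
git symbolic-ref HEAD refs/heads/branch-name
git config user.name "Demo User"          # placeholder identity for the demo
git config user.email "demo@example.com"
git remote add origin "$work/origin.git"

# Start with both repositories ending at the same commit, "H".
echo "H" > file.txt
git add file.txt
git commit -q -m "H"
git push -q origin branch-name

# Make two new commits locally, "I" and "J".
echo "I" >> file.txt
git commit -q -am "I"
echo "J" >> file.txt
git commit -q -am "J"

theirs_before=$(git --git-dir="$work/origin.git" rev-parse branch-name)
git push -q origin branch-name            # send I and J, ask them to move the name
theirs_after=$(git --git-dir="$work/origin.git" rev-parse branch-name)
mine=$(git rev-parse branch-name)

echo "their tip before push: $theirs_before"
echo "their tip after push:  $theirs_after"
echo "my tip:                $mine"
```

After the second push, the two repositories should agree on what `branch-name` identifies, which is the "in sync again" state the text describes.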
So that's how `git push` normally works: you make new commits, adding them on to the end of your branch, and then you send your new commits to some other Git, and ask their software to add the same commits to the end of a branch of the same name in their repository. (Whew!)
Submodules
A submodule, in Git, is little more than two separate, mostly-independent Git repositories. This of course needs a lot of explanation:
First, like any repository, a submodule repository is a collection of commits, each with a unique hash ID. We—or Git at least—like to refer to one of the two repositories as the superproject and the other as the submodule. Both of these start with the letter S, which is annoying, and both words are long and klunky, so here I'll use R (in bold like this) as the superproject Repository, and S as the Submodule.
(Side note: the hash IDs in R and S are independent from each other. Git tries pretty hard—and usually succeeds—at making hash IDs globally unique across every Git repository everywhere in the universe. So there's no need to worry about "contaminating" R with S IDs or vice versa. In any case we can just treat every commit hash ID as if it's totally unique. Normally, with a normal non-R non-S repository, we don't even have to care about IDs, as we just use names. But submodules make you have to be more aware of the IDs.)
What makes R a superproject in the first place is that it lists raw hash IDs from S. It also has to list instructions: if we've done a `git clone` of R, we don't even have a clone of S yet. So R needs to contain the instructions so that your Git software can make a clone of S.
The instructions you give to `git clone` are pretty simple: `git clone <url> <path>` (where the `path` part is even optional, but here, R will always specify a path, using those forward slash path names we mentioned earlier). This set of instructions goes into a file named `.gitmodules`. The `git submodule add` command will set up this file in R for you. It's important to use it, to set up the `.gitmodules` file. Git will still make a submodule even if you don't set this up, but without the cloning instructions, the submodule won't actually work.
Note that there's no proper place to put authentication (user names and passwords) in here. That's a generic submodule issue. (You can put them in as plaintext in the `.gitmodules` file, but don't do it: it's a very bad idea, because they're not encrypted or protected.) As long as you have open access to cloning the submodule, it doesn't normally present any real problem. If you don't, you'll have to solve this problem somehow.
In any case, you will need, just once, to run `git submodule add ...` (filling in the `...` part) in what will thus become superproject R, so as to create the `.gitmodules` file. You then need to commit the resulting `.gitmodules` file, so that people who clone R and check out a commit that contains that file get that file, and their Git software can then run the `git clone` command to create S on their system.
You'll also need to put S somewhere they can clone it. This, of course, means that first you need to create a Git repository to hold S. You do this the way you make any Git repository:
or:
(locally, on your machine) along with whatever you do on whatever hosting site that creates the repository there.
Now that you have a local repository S, you need to put some commit(s) into it. What goes into these commits?
Well, you already said that you'd like your R to have a `build/` directory (folder) in it, but not actually store any of the built files in any of the commits made in R. This is where submodules actually work. A submodule, in R, for S, works by saying: create me a folder here, then clone the submodule into the folder. Or, if the submodule repository already exists (as it will when you're setting all this up in the first place, with you just now having created S), you simply put that entire repository into your working tree for R, under the name `build`.
Note that `build/.git` will exist in R's working tree at this point. That's because a Git repository hides all the Git files in the `.git` directory (folder) at the top level of the working tree. So your new, empty S repository consists of just a `.git/` containing Git files.
You can now run that `git submodule add` command in R, because now you have the submodule in place. (You might want to wait just a little bit, but you can definitely do it at this point, and this is the earliest point at which you can do it, since up until now, S didn't exist or was not in the right place yet.)
You can now fill the `build/` directory that lives in R's working tree with files, e.g., by running `npm run build`, or whatever it is that populates the `build/` directory. Then you can `cd build` and `git add .`, or equivalent, so as to add the build output in S. You can now create the first commit in S, or maybe the second commit in S if you like to create a `README.md` and `LICENSE` and such as your initial commit. You can now have branches in S as well, since you now have at least one commit in S.
Now that you're back in R though, it's time to `git add build` or, if you chose to delay it, run that first `git submodule add`. In the future you'll use `git add build`. This directs the Git that is manipulating the index / staging-area for R to enter the repository S and run `git rev-parse HEAD` to find the raw hash ID of the current commit in S.
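
The setup steps above can be sketched end to end in a scratch directory. Everything named here is an assumption for illustration: the temporary directory, the file names, the demo identity, and especially the URL, which is a placeholder for wherever S will actually be hosted. The sketch takes the "delay it" route and runs `git submodule add` only after S has its first commit.

```shell
#!/bin/sh
# Sketch: superproject R with submodule S living in build/.
set -e
work=$(mktemp -d)
cd "$work"

# Superproject R, with one source file committed.
git init -q R
cd R
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"          # placeholder identity for the demo
git config user.email "demo@example.com"
echo "console.log('app')" > index.js
git add index.js
git commit -q -m "source files"

# Submodule S, created in place inside R's working tree as build/.
git init -q build
git -C build symbolic-ref HEAD refs/heads/main
git -C build config user.name "Demo User"
git -C build config user.email "demo@example.com"
echo "<html>v1</html>" > build/index.html  # stand-in for `npm run build` output
git -C build add .
git -C build commit -q -m "build output"

# Record S in R. The URL is hypothetical; since build/ already holds a
# repository, nothing is cloned here and the URL is only written to .gitmodules.
git submodule --quiet add https://example.com/your-build-repo.git build
git commit -q -m "add build as a submodule"

cat .gitmodules
git ls-files -s build                     # mode 160000 marks a gitlink entry

# Later: rebuild, commit in S, then record the new S commit in R.
echo "<html>v2</html>" > build/index.html
git -C build commit -q -am "rebuild"
git add build
git commit -q -m "point at the new build"

recorded=$(git ls-files -s build | awk '{print $2}')
s_head=$(git -C build rev-parse HEAD)
echo "R records: $recorded"
echo "S is at:   $s_head"
```

R's commits never contain the built files themselves, only the path `build` and the hash ID of a commit in S. If R's `.gitignore` still lists `build`, `git submodule add` will likely refuse the path unless `--force` is given, so that entry would need to go.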
The superproject's Git repository's index now acquires a new gitlink entry. A gitlink entry is like a regular file, except that instead of `git checkout` checking it out as a file, it provides a raw hash ID. That's basically all it is: a pathname (in this case, `build/`) and a raw hash ID.
This gitlink is like one of those read-only, compressed, and de-duplicated files that goes in a commit. It's just that instead of storing file data, it stores a commit hash ID. That hash ID is that of some commit in S, not some commit in R itself. But now that you've updated the index (or staging area) for R, you will need to make a new commit in R. The new commit will contain any updated files, plus the right hash ID for S, as found just now by the `git add` you ran (or that `git submodule add` ran for you).
The next commit you make in R (not in S) will list the hash ID of the current commit in S. So once you've committed the built files in S, you can `git add build` in R and `git commit` in R.
The last and trickiest part
Now comes the last part, which—if you thought all of the above was complicated and tricky—is the trickiest:
1. You have to `git push` the submodule commit in S so that it's generally available. In general, you should do this first, though you don't actually have to.
2. Then you have to `git push` the superproject commit in R so that others can get it. When others get this commit from the other clone of R, they'll be able to see the right hash ID from S.
Then, if someone else (let's say your co-worker Bob) wants to get both the built files and the sources, they have to:
- `git fetch` in S so as to obtain the new S commit.
- `git checkout` the correct commit.
They can do this all at once with `git checkout --recurse-submodules`, or by setting the `submodule.recurse` configuration option. Note what can go wrong though:
- They might obtain your new R commit and check it out, but forget to update their S at all.
- Or, they might obtain your new R commit and check it out and then try to check out the new commit in S without first running `git fetch` in their clone of S, so that they don't have the new commit.
- Or, they might remember everything they should do, but someone forgot to push the new S commit to the shared repository people can get it from. They'll get an error about their submodule Git being unable to find the requested commit.
You can see how this can get pretty messy. It's very easy for the various separate commits to get de-synchronized in various ways. Once you have the procedures down, and have scripts around everything that make sure that all the steps happen at the right times, it can work pretty well. But there are many ways for things to go wrong.
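
As one example of such a script, here is a sketch of the whole round trip: push S, push R, and then update a second clone the way Bob would. The bare repositories `R.git` and `S.git` in a temporary directory are stand-ins for the real hosting sites, and all file names and the demo identity are assumptions. The `-c protocol.file.allow=always` setting is only there because this demo clones the submodule from a local path, which recent Git versions otherwise refuse; it should not be needed with real `https` or `ssh` URLs.

```shell
#!/bin/sh
# Sketch: publish S before R, then update another clone ("bob").
set -e
work=$(mktemp -d)
cd "$work"

# Two bare repositories stand in for the hosted copies of R and S.
for name in R S; do
    git init -q --bare "$name.git"
    git --git-dir="$name.git" symbolic-ref HEAD refs/heads/main
done

# Your side: superproject R, with S living in build/.
git init -q R
cd R
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"          # placeholder identity for the demo
git config user.email "demo@example.com"
git remote add origin "$work/R.git"
echo "source" > app.js
git add app.js
git commit -q -m "sources"

git init -q build
git -C build symbolic-ref HEAD refs/heads/main
git -C build config user.name "Demo User"
git -C build config user.email "demo@example.com"
git -C build remote add origin "$work/S.git"
echo "build 1" > build/index.html
git -C build add .
git -C build commit -q -m "build 1"
git submodule --quiet add "$work/S.git" build
git commit -q -m "record build 1"

git -C build push -q origin main          # push S first...
git push -q origin main                   # ...then R

# Bob's side: clone R and S together.
cd "$work"
git -c protocol.file.allow=always clone -q --recurse-submodules R.git bob

# You publish a new build: commit in S, record it in R, push S and then R.
echo "build 2" > R/build/index.html
git -C R/build commit -q -am "build 2"
git -C R add build
git -C R commit -q -m "record build 2"
git -C R/build push -q origin main
git -C R push -q origin main

# Bob updates R, then has to update S as a separate step.
git -C bob -c protocol.file.allow=always pull -q --ff-only origin main
stale=$(git -C bob/build rev-parse HEAD)  # S as it is before the update step
git -C bob -c protocol.file.allow=always submodule --quiet update --init
fresh=$(git -C bob/build rev-parse HEAD)
wanted=$(git -C R/build rev-parse HEAD)

echo "bob's S before update: $stale"
echo "bob's S after update:  $fresh"
echo "commit you published:  $wanted"
```

Here `git submodule update` does the "fetch in S, then check out the recorded commit" pair in one command. On the publishing side, `git push --recurse-submodules=check` in R is one possible guard against the third failure mode, since it is meant to refuse the push when a referenced submodule commit has not been pushed yet.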