How to create MacOS/Linux link for reading data file?


I have a huge raw data file which I do not intend to change or copy. And I have two projects in RStudio and both need to be able to access it.

I originally created the alias (in macOS) as follows: right-click the file ~/A/data.csv in Finder and click "Make Alias", then copy the alias to ~/B/ and rename it ~/B/data.csv.

Later I also tried the following command: ln -s ~/A/data.csv ~/B

For project A, I put the actual data file in A/data/data.csv.
For project B, I created an alias under B/data/.

But when I try fread('B/data/data.csv'), it complains:

sh: ./data/data.csv: Too many levels of symbolic links

Error in fread("./data/data.csv") :

File is empty: /var/folders/4h/2jg64xk52mv3fyq4sb7s371w0000gn/T//Rtmp7cWNN3/filebf3013ad9194

I think I can use a hard link to solve this issue, but just want to see if I can use alias to make it work.

I don't think it matters, but for completeness, here are my OS and R version:

platform       x86_64-apple-darwin10.8.0   
arch           x86_64                      
os             darwin10.8.0                
system         x86_64, darwin10.8.0        
status                                     
major          3                           
minor          1.0                         
year           2014                        
month          04                          
day            10                          
svn rev        65387                       
language       R                           
version.string R version 3.1.0 (2014-04-10)
nickname       Spring Dance    

There is 1 answer

Thomas Guillerme

I'm not entirely sure why you would use aliases in this specific case:

  • note that for small files (e.g. < 1 MB), the alias can take up far more disk space than the file itself. For example, for a simple text file containing "test" (echo "test" > test.txt), the alias is about 274,000 times bigger:

test.txt: 5 bytes

test.txt alias: 1372636 bytes

  • since RStudio handles absolute paths well, why not read ~/A/data.csv directly rather than going through its alias?
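The size ratio quoted above can be checked with a little shell arithmetic. This is only a sketch: the alias size is the figure reported above (not measured here), and the scratch file created with mktemp is a hypothetical stand-in for test.txt.

```shell
#!/bin/bash
# Recreate the 5-byte test file in a scratch location (hypothetical path).
tmp_file="$(mktemp)"
echo "test" > "$tmp_file"

# wc -c pads its output with spaces on macOS, so strip them.
file_bytes="$(wc -c < "$tmp_file" | tr -d ' ')"

# Alias size as quoted in the answer; not measured by this script.
alias_bytes=1372636

# Integer division gives the approximate size ratio.
ratio=$((alias_bytes / file_bytes))
echo "file: $file_bytes bytes, alias/file ratio: $ratio"
```

The integer ratio comes out at roughly 274 thousand, which matches the "274k times bigger" figure.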

Two alternative solutions (not directly answering the question) would be to (1) copy the file or (2) create a pointer file that acts as a kind of symbolic link.

Copying the file

#!/bin/bash
mkdir -p ~/B/data/
cp ~/A/data.csv ~/B/data/

Or in R, using system (on Mac):

system("mkdir -p ~/B/data/")
system("cp ~/A/data.csv ~/B/data/")

Creating a symbolic link

This can be done by simply saving the path of the file ~/A/data.csv in a text file under ~/B/data/.

In shell:

#!/bin/bash
mkdir -p ~/B/data/
echo "~/A/data.csv" > ~/B/data/data.csv

(This part can also be done in R using system(), as above.) Then, in R:

## Read the stored path from the pointer file in ~/B/data/
PATH <- scan(file = "~/B/data/data.csv", what = character())
## Open the actual file (~/A/data.csv); R expands the leading "~"
my_csv <- read.csv(PATH)
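For comparison, here is a minimal sketch of an actual symbolic link created with an absolute target. One common cause of "Too many levels of symbolic links" is a link whose relative target ends up resolving back to the link itself, and an absolute target avoids that; whether that is what happened in the question is an assumption. The scratch directory and the two-line CSV are hypothetical stand-ins for ~/A and ~/B.

```shell
#!/bin/bash
set -eu

# Hypothetical layout in a scratch directory; substitute ~/A and ~/B in practice.
root="$(mktemp -d)"
mkdir -p "$root/A" "$root/B/data"
printf 'x,y\n1,2\n' > "$root/A/data.csv"

# Absolute target: the link resolves the same way wherever it is placed.
ln -s "$root/A/data.csv" "$root/B/data/data.csv"

# Check where the link points and that the data is readable through it.
link_target="$(readlink "$root/B/data/data.csv")"
first_line="$(head -n 1 "$root/B/data/data.csv")"
echo "link points to: $link_target"
echo "first line via link: $first_line"
```

If readlink prints the path of the link itself, or a relative path that does not exist from the link's directory, the link needs to be recreated.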