Handling data with 1000~ variables, preferably using SQL

401 views Asked by At

Basically, I have tons of files with some data. each differ, some lack some variables(null) etc, classic stuff.

The part it gets somewhat interesting is that, since each file can have up to 1000 variables, and has at least 800~ values that is not null, I thought: "Hey I need 1000 columns". Another thing to mention is, they are integers, bools, text, everything. they differ by size, and type. Each variable is under 100 bytes, at all files, alth. they vary.

I found this question Work around SQL Server maximum columns limit 1024 and 8kb record size Im unfamiliar with capacities of sql servers and table design, but the thing is: people who answered that question say that they should reconsider the design, but I cant do that. I however, can convert what I already have, as long as I still have that 1000 variables.

Im willing to use any sql server, but I dont know what suits my requirements best. If doing something else is better, please tell so.

What I need to do with this data is, look, compare, and search within. I dont need the ability to modify these. I thought of just using them as they are and keeping them as plain text files and reading from, that requires "seconds" of php runtime for viewing data out of "few" of these files and that is too much. Not even considering the fact that I need to check about 1000 or more of these files to do any search.

So the question is, what is the fastest way of having 1000++ entities with 1000 variables each, and searching/comparing for any variable I wish within them, etc. ? and if its SQL, which SQL server functions best for this sort of stuff?

4

There are 4 answers

5
halfbit On BEST ANSWER

You are asking for best, I can give an answer (how I solved it), but cant say if it is the 'best' way (in your environment), I had the Problem to collect inventory data of many thousend PCs (no not NSA - kidding)

my soultion was:

One table per PC (File for you?)

Table File: one row per file, PK FILE_ID

Table File_data one row per column in file, PK FILE_ID, ATTR_ID, ATTR_NAME, ATTR_VALUE, (ATTR_TYPE)

The Table File_data, was - somehow - big (>1e6 lines) but the DB handled that fast

HTH

EDIT:

I was pretty short in my anwser, lately; I want to put some additional information to my (and still working) solution:

the table 'per info source' has more than the two fields PK, FILE_ID ie. ISOURCE, ITYPE, where ISOURCE and ITYPE dscribe from where (I had many sources) and what basic Information type it is / was. This helps to get a structure into queries. I did not need to include data from 'switches' or 'monitors', when searching for USB divices (edit: to day probably: yes)

the attributes table had more fields, too. I mention here the both fileds: ISOURCE, ITYPE, yes, the same as above, but a slightly different meaning, the same idea behind

What you would have to put into these fields, depends definitely on your data.

I am sure, that if you take a closer look, what information you have to collect, you will find some 'KEY Values' for that

0
Nate Noonen On

For storage, XML is probably the best way to go. There is really good support for XML in SQL. For queries, if they are direct SQL queries, 1000+ rows isn't a lot and XML will be plenty fast. If you're moving towards a million+ rows, you're probably going to want to take the data that is most selective out of the XML and index that separately.

Link: http://technet.microsoft.com/en-us/library/hh403385.aspx

3
Anon On

You want to use an EAV model. This is pretty common

0
Brad On

Sounds like you need a different kind of database for what you're doing. Consider a document database, such as MongoDB, or one of the other not-only-SQL database flavors that allows for manipulation of data in different ways than a traditional table structure.

I just saw the note mentioning that you're only reading as well. I've had good luck with Solr on a similar dataset.