|
What do you want to know about databases?
10-02-2008, 05:17 PM
|
What do you want to know about databases?
|
Posts: 5,662
Name: John Alexander
|
This is the companion thread to the DB Resource Sticky. That thread is meant to be like a "help file," and only moderators post in it. The good part of that is, you can trust anything you read in that thread. The bad part is, you can't just add a question, and have it answered there. So, any requests can go here, and we'll figure out the best way to address them.
To get things started, if anybody would like to suggest broad areas ( How do I tune indexes? Best way to export data? When to make backups?) they'd like to see covered, specific questions, or anything else, we'll make sure these are covered.
Last edited by Learning Newbie; 10-02-2008 at 08:51 PM..
|
|
|
|
10-02-2008, 06:09 PM
|
Re: What do you want to know about databases?
|
Posts: 1,228
|
This might not be the best question, but I'd be interested in learning more about storing data in a semantic way. Also, are EAV databases scalable and practical and how might one best use one?
|
|
|
|
10-02-2008, 07:58 PM
|
Re: What do you want to know about databases?
|
Posts: 2,389
Name: <member type="brilliant" alt="foolish">James Lewitzke</member>
Location: / public_html / Universe / Virgo_Supercluster / Local_Group / Milky_Way / Orion_Arm / Solar_System / Earth / North_America / USA / Wisconsin
|
There's a DB Resource Sticky?
Anyway, like I said on your blog John, I'm going to be starting to create my own network website of sorts, so I'll be building the MySQL DB from scratch.
In fact, I just started writing it today and I think I understand the basic concepts well (Tables, Fields, Records, etc.) But as I started to fill in the records, I came across some input columns I had no clue about (using phpmyadmin): - Collation - It looks like it's used for encoding, but I have no idea which one to pick, and why Latin-Swedish is the default choice
- Null - What in the world is this for?
- Attribute - Not sure about this column either
One more thing I'm going to be concerned with too is Database Optimization and the exact number of inserts I'm going to need, but that won't be for later.
Anyway, expect many more threads and questions from me in this forum pretty soon  .
|
|
|
|
10-02-2008, 08:01 PM
|
Re: What do you want to know about databases?
|
Posts: 5,662
Name: John Alexander
|
An EAV system can be very hard to scale. It can be done, if the data is inherently set up to allow distribution and partitioning. For everyone else, an EAV (Entity, Attribute, Value) database is generally two tables
Create Table Object (
ObjectID int identity primary key,
ObjectName nvarchar(max)
)
Create Table Attribute (
ObjectID foreign key references Object,
AttributeName nvarchar(max),
AttributeValue nvarchar(max)
)
This allows any number of objects to be stored (persisted) with any number of attributes. Each object can have its own set of attributes. The number of attributes (key value pairs) can be different for every object, however makes sense. This allows for highly unstructured data storage, where each and every object can be stored with whatever properties make sense for it, rather than having to fit it into a given schema.
The downside, though, is it gives up most of the strengths of a relational database system. If you store 1,000 objects with an average of 10 properties each, that's 10,000 rows stored in your Attribute table. That's an easy task for a DB, but say you need to store 10,000 more of them, and the average number of properties goes up to 15. Pretty quickly, this is going to get very big. Having only two tables, it will eventually get to the point that lookups start taking sizeable fractions of a second.
This can be ameliorated by using a natural key. Ideally a 16 byte hash value of the object. You might store 100 objects per table, or any number that performs well. Given an object to get properties for, you would calculate the hash value, then go to the appropriate table or database to get the real data. If you had a million objects to keep track of, but broke them into lists of 100 each, using a method like this, the seek times can remain very low.
Another downside is that, from one perspective, this is as simple as can be. But from another, it's actually kind of opaque. If you were hired to manage a database set up in this way, or to build web pages around one, it might take some wrapping your head around until you get the hang of it? I don't personally think it's any more combersome to learn (probably less!) than any other schema, but a lot of people seem to object, saying it's too complicated to see a lot of property douples, with ID values next to them.
Now, I've been planning to add a section for schema objects and schema definition, with some of the more common ones described, like a star schema, third normalized form, and such. It seems like this would be a good one to include. Honestly, for small web sites, it's not going to be much of a burden with good indexing, and it's highly flexible.
|
|
|
|
10-02-2008, 09:02 PM
|
Re: What do you want to know about databases?
|
Posts: 5,662
Name: John Alexander
|
Quote:
Originally Posted by jamestl2
There's a DB Resource Sticky?
|
There is now!  It's not a sticky, yet, tho. Still filling out the content - I don't want to get people's hopes up until the answers are there.
Quote:
Originally Posted by jamestl2
- Collation - It looks like it's used for encoding, but I have no idea which one to pick, and why Latin-Swedish is the default choice
|
How strings are sorted and compared. Really it's just the character set. This is meaningless for English, because we have no "special" characters by definition. European languages have a few accents, but enough to fit inside a UTF 8. Only a-z, A-Z, 0-9, and ~-+ all fit in UTF-7, leaving you 128 slots left, and you can decide what they represent.
Quote:
Originally Posted by jamestl2
- Null - What in the world is this for?
|
In most programming languages, a boolean is a two pronged variable, with true and false. In a database, unless you disallow it, null is the third prong. It's the absence of a value. Technically, it means "I don't know." It's treated specially, in that nothing equals null, and null plus ( or minus or times, etc) anything is null. This makes sense, because 5 + "I don't know" is still unknown.
|
|
|
|
10-02-2008, 10:41 PM
|
Re: What do you want to know about databases?
|
Posts: 2,389
Name: <member type="brilliant" alt="foolish">James Lewitzke</member>
Location: / public_html / Universe / Virgo_Supercluster / Local_Group / Milky_Way / Orion_Arm / Solar_System / Earth / North_America / USA / Wisconsin
|
Alright, thanks  .
Just realized I'm going to have to create a LOT of records for everything that people will be using on the site. Something I'm wondering about here is whether or not I'm going to be creating *too many* records, as some may cross-over into other tables. I have a feeling I won't be needing a few of them.
The thing is I don't have absolutely everything planned out for the main site right away either. So would it be better you think to concentrate on creating the DB first? Or design and program the actual site first?
I know I'll probably be coming back to the DB anyway for whatever small changes that are necessary, but I don't want to get too ahead of myself.
|
|
|
|
10-03-2008, 12:01 AM
|
Re: What do you want to know about databases?
|
Posts: 1,228
|
I think you may want to take the time to plan ahead a little. Usually my databases dictate what I program to a great degree, so if you have that part straightened out, it could save you a lot of programming. One thing you might want to look into is normalization, which I'm sure John could explain and argue for a lot better than I could, even though I try to use it in all of my database design.
|
|
|
|
10-03-2008, 03:16 AM
|
Re: What do you want to know about databases?
|
Posts: 1,228
|
Thanks, John. Once again, I'm learning a great deal about databases from you. And of course, as always, I have more questions.
Quote:
|
An EAV system can be very hard to scale. It can be done, if the data is inherently set up to allow distribution and partitioning.
|
So what would a scalable version look like? I'm having a little trouble wrapping my mind around how something like that could be effectively or logically partitioned.
Quote:
Create Table Object (
ObjectID int identity primary key,
ObjectName nvarchar(max)
)
Create Table Attribute (
ObjectID foreign key references Object,
AttributeName nvarchar(max),
AttributeValue nvarchar(max)
)
This allows any number of objects to be stored (persisted) with any number of attributes. Each object can have its own set of attributes. The number of attributes (key value pairs) can be different for every object, however makes sense. This allows for highly unstructured data storage, where each and every object can be stored with whatever properties make sense for it, rather than having to fit it into a given schema.
|
Would it perhaps be an improvement to add a third table that connects the objects with the attributes by foreign keys while removing the foreign key in the attributes? The example above seems like a many-to-many relationship, so to me, an associative table would make sense in this case.
Upon thinking it over a little more, how would you connect objects to each other? Or would you need to? I guess it depends on what the definition of an object is. So, for a use case example, suppose I have an object that represents a user id. I could attach first name, last name, email address, etc. to that object. But if that user makes a comment in a blog, is that comment an attribute or an object? If it's an attribute, how could I attach a timestamp, a vote, or a reply to that comment? If it's an object, how could attach it to the user object?
As an alternative, could you store all attributes and objects in the same table and use the second table to house two foreign keys (or hashes) that represent the same column in the first table (albeit with different key values)?
Quote:
|
The downside, though, is it gives up most of the strengths of a relational database system. If you store 1,000 objects with an average of 10 properties each, that's 10,000 rows stored in your Attribute table. That's an easy task for a DB, but say you need to store 10,000 more of them, and the average number of properties goes up to 15. Pretty quickly, this is going to get very big. Having only two tables, it will eventually get to the point that lookups start taking sizeable fractions of a second.
|
Any idea on what the upper limit might be, or at least the limit at which lookups start to bog down noticeably?
Quote:
|
This can be ameliorated by using a natural key. Ideally a 16 byte hash value of the object. You might store 100 objects per table, or any number that performs well. Given an object to get properties for, you would calculate the hash value, then go to the appropriate table or database to get the real data. If you had a million objects to keep track of, but broke them into lists of 100 each, using a method like this, the seek times can remain very low.
|
I'm a little unclear about your example here. Which table is the hash stored in and what exactly does it represent? What I'm kind of getting is that instead of having a primary key that auto-increments, you would programmatically assign a unique 16 byte hash as a unique identity for that object across all of the tables that objects are stored in. Would and could you also do the same for the attributes?
Quote:
|
Another downside is that, from one perspective, this is as simple as can be. But from another, it's actually kind of opaque. If you were hired to manage a database set up in this way, or to build web pages around one, it might take some wrapping your head around until you get the hang of it? I don't personally think it's any more combersome to learn (probably less!) than any other schema, but a lot of people seem to object, saying it's too complicated to see a lot of property douples, with ID values next to them.
|
I'm still trying to wrap my head around it too. I haven't researched it a whole lot yet (I've been buried in JavaScript lately), but it seems like it might be an interesting way to store data. I'm particularly interested in semantic data storage and retrieval and an EAV database appears like it might be fairly easy to search against, but I can definitely see some of the drawbacks as well. Have you personally used it and would you recommend using it, or which schema do you recommend?
Quote:
|
Now, I've been planning to add a section for schema objects and schema definition, with some of the more common ones described, like a star schema, third normalized form, and such. It seems like this would be a good one to include. Honestly, for small web sites, it's not going to be much of a burden with good indexing, and it's highly flexible.
|
Please do write more on the different schemas. I don't know why, but I find them fascinating for some reason.
|
|
|
|
10-03-2008, 12:43 PM
|
Re: What do you want to know about databases?
|
Posts: 2,389
Name: <member type="brilliant" alt="foolish">James Lewitzke</member>
Location: / public_html / Universe / Virgo_Supercluster / Local_Group / Milky_Way / Orion_Arm / Solar_System / Earth / North_America / USA / Wisconsin
|
Quote:
Originally Posted by VirtuosiMedia
I think you may want to take the time to plan ahead a little. Usually my databases dictate what I program to a great degree, so if you have that part straightened out, it could save you a lot of programming.
|
So you think I should write the WHOLE DB first? (Well, I mean everything that I plan to put on the main site at this point in time). Before I even touch the design or programming aspect.
Quote:
Originally Posted by VirtuosiMedia
One thing you might want to look into is normalization, which I'm sure John could explain and argue for a lot better than I could, even though I try to use it in all of my database design.
|
Alright, thanks. Yeah I have no idea what "normalization" is either.
|
|
|
|
10-03-2008, 12:56 PM
|
Re: What do you want to know about databases?
|
Posts: 5,662
Name: John Alexander
|
Quote:
Originally Posted by jamestl2
The thing is I don't have absolutely everything planned out for the main site right away either. So would it be better you think to concentrate on creating the DB first? Or design and program the actual site first?
|
The database has a very specific ( even if it's ill defined right now) job. It's providing a service to your web application. If you don't know what's needed of the DB, you're better off waiting on that, and working on the web site. You'll get to a point where your application needs to remember X, and at that point, you have a fairly well defined need. Which makes it easier to learn how to address.
|
|
|
|
10-03-2008, 03:38 PM
|
Re: What do you want to know about databases?
|
Posts: 5,662
Name: John Alexander
|
Quote:
Originally Posted by VirtuosiMedia
So what would a scalable version look like? I'm having a little trouble wrapping my mind around how something like that could be effectively or logically partitioned.
|
So this is what I was talking about in the previous post, with ( for example) hash values. Let's take a more simplified version, and assume we're responsible for a database of people. We decide to use their email address as the table's primary key, because each email has only one owner, and so is unique.
For the sake of easy math, say we have 26,000 users, and even distribution. Everybody who's email starts with a goes to a table called Person_A, everybody who's email starts with b goes to Person_B, and so on down the line. Now we have 26 tables of 1,000 each. Their attributes can also be stored this way - as Properties_A, Properties_B, etc. To search all users with no prior knowledge will take a little bit longer this way, but, to look up a specific user will be faster. The trouble is, there won't be as many email addresses starting with X, which is why a hash value might be a better idea.
Quote:
Originally Posted by VirtuosiMedia
Upon thinking it over a little more, how would you connect objects to each other? Or would you need to?
|
Generally, this should come in the form of requirements and use cases. There are endless numbers of implementations to choose from and features that can be supported - as database programmers, we need a "tie breaker" to choose. If objects need to be connected with one another, how? If not, it's usually best to avoid the complexity and overhead of representing this.
Quote:
Originally Posted by VirtuosiMedia
As an alternative, could you store all attributes and objects in the same table and use the second table to house two foreign keys (or hashes) that represent the same column in the first table (albeit with different key values)?
|
I'm fond of the fish hook relationship, myself. It's called that because of the way it looks to draw a database diagram showing a single table related to itself.
Create Table Fishhook (
EntryID int identity primary key,
ParentID int null foreign key references Fishhook,
Name nvarchar(max),
Value nvarchar(max)
)
This models a tree. It allows virtually unlimited objects and attributes to be modeled, with a hierarchical relationship. When ParentID is null, the current object is a root node, otherwise, you can follow the chain. Recursively.
The downside is, SQL isn't especially good at this sort of recursive paradigm. Think of the query required to load an unknown number of items by following the chain of parents.
Quote:
Originally Posted by VirtuosiMedia
Any idea on what the upper limit might be, or at least the limit at which lookups start to bog down noticeably?
|
That's going to depend on hardware, index structure, and the number of concurrent users firing queries. The faster the storage subsystem is, the less painful this will be, at any level. The more RAM you have, with most database servers, the more data will live in memory, and the less disc IO will be incurred, assuming you need the same rows often. Indexes that allow for seeks as opposed to scans can buy you orders of magnitude. Finally, the Enterprise Edition of SQL Server allows different transactions to share table scans. When two queries need the same data, the storage engine will return it to both, doubling performance. I'm pretty sure Oracle does the same. With all of this, you should be able to have hundreds of thousands of rows on a modest box.
|
|
|
|
10-31-2008, 03:16 PM
|
Re: What do you want to know about databases?
|
Posts: 1,228
|
Here is an interesting link from Wikipedia. I didn't realize that there were 6 forms of normalization. I only knew about the first 3.
|
|
|
|
11-08-2008, 11:18 AM
|
Re: What do you want to know about databases?
|
Posts: 23
|
I would like a list of the best visual database modeling applications, and a selection of book recommendation 
Thanks.
|
|
|
|
|
« Reply to What do you want to know about databases?
|
|
|
| Thread Tools |
Search this Thread |
|
|
|
Posting Rules
|
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts
HTML code is Off
|
|
|
|