My database security denotes the system, processes, and procedures that protect a database from unintended activity.
My security is usually enforced through access control, auditing, and encryption.
Access control ensures and restricts who can connect and what can be done to the database.
Auditing logs what action or change has been performed, when and by who.
Encryption: Since security has become a major issue in recent years, many commercial database vendors provide built-in encryption mechanism. Data is encoded natively into the tables and deciphered "on the fly" when a query comes in. Connections can also be secured and encrypted if required using DSA, MD5, SSL or legacy encryption standard.
Tuesday, July 29, 2008
Types of database
Types of Database
A lot of the sites that we visit on the web today are generated by a script of some description, and a great deal of them will use a database in one form or another. Like it or loathe it, building pages dynamically from databases is a technique that is here to stay.
There are two main types of database; flat-file and relational. Which is the best one to use for a particular job will depend on factors such as the type and the amount of data to be processed; not to mention how frequently it will be used.
Flat-File
The flat-file style of database are ideal for small amounts of data that needs to be human readable or edited by hand. Essentially all they are made up of is a set of strings in one or more files that can be parsed to get the information they store; great for storing simple lists and data values, but can get complicated when you try to replicate more complex data structures. That's not to say that it is impossible to store complex data in a flat-file database; just that doing so can be more costly in time and processing power compared to a relational database. The methods used for storing the more complex data types, are also likely to render the file unreadable and un-editable to anyone looking after the database.
The typical flat-file database is split up using a common delimiter. If the data is simple enough, this could be a comma, but more complex strings are usually split up using tabs, new lines or a combination of characters not likely to be found in the record itself.
One of the main problems with using flat files for even a semi-active database is the fact that it is very prone to corruption. There is no inherent locking mechanism that detects when a file is being used or modified, and so this has to be done on the script level. Even if care is taken to lock and unlock the file on each access, a busy script can cause a "race condition" and it is possible for a file to be wiped clean by two or more processes that are fighting for the lock; the timing of your file locks will become more and more important as a site gets busy.
Database Management (DBM)
The Database Management Layer allows script programmers to store information as a pair of strings; a key, which is used to find the associated value. Essentially, a DBM adds more functionality and better sortation during storage to the binary flat-files that it uses. There are several versions of DBMs available, but the most popular is the Berkley Database Manager; also known as the Berkley DB.
The Berkley DB is an improvement over normal flat-files, as it provides a way for programmers to use the database without having to worry about how the data is stored or how to retrieve the values. Retrieval of data using the Berkley DB is often much faster than from a flat-file, with the time savings being made by storing data in a way that speeds up the locating of a specific key-value pair.
Creating, editing and deleting data when using the Berkley DB is actually quite simple; once the database has been tied to the script you just use and manipulate the variables as normal. The problem of file locking that plagues flat-file databases is still apparent when using DBM, so you should still take care when planning scripts that utilize it.
Relational
The relational databases such as MySQL, Microsoft SQL Server and Oracle, have a much more logical structure in the way that it stores data. Tables can be used to represent real world objects, with each field acting like an attribute. For example, a table called books could have the columns title, author and ISBN, which describe the details of each book where each row in the table is a new book.
The "relation" comes from the fact that the tables can be linked to each other, for example the author of a book could be cross-referenced with the authors table (assuming there was one) to provide more information about the author. These kind of relations can be quite complex in nature, and would be hard to replicate in the standard flat-file format.
One major advantage of the relational model is that, if a database is designed efficiently, there should be no duplication of any data; helping to maintain database integrity. This can also represent a huge saving in file size, which is important when dealing with large volumes of data. Having said that, joining large tables to each other to get the data required for a query can be quite heavy on the processor; so in some cases, particularly when data is read only, it can be beneficial to have some duplicate data in a relational database.
Relational databases also have functions "built in" that help them to retrieve, sort and edit the data in many different ways. These functions save script designers from having to worry about filtering out the results that they get, and so can go quite some way to speeding up the development and production of web applications.
Database Comparisons
In most cases, you would want your database to support various types of relations; such databases, particularly if designed correctly, can dramatically improve the speed of data retrieval as well as being easier to maintain. Ideally, you will want to avoid the replication of data within a database to keep a high level of integrity, otherwise changes to one field will have to be made manually to those that are related.
While several flat-files can be combined in such a way as to be able to emulate some of the behaviours of a relational database, it can prove to be slower in practice. A single connection to a relational database can access all the tables within that database; whereas a flat file implementation of the same data would result in a new file open operation for each table.
All the sorting for flat-file databases need to be done at the script level. Relational databases have functions that can sort and filter the data so the results that are sent to the script are pretty much what you need to work with. It is often quicker to sort the results before they are returned to the script than to have them sorted via a script, few scripting languages are designed to filter data effectively and so the more functions a database supports, the less work a script has to do.
If you are only working with a small amount of data that is rarely updated then a full blown relational database solution can be considered overkill. Flat-file databases are not as scaleable as the relational model, so if you are looking for a suitable database for more frequent and heavy use then a relational database is probably more suitable.
A lot of the sites that we visit on the web today are generated by a script of some description, and a great deal of them will use a database in one form or another. Like it or loathe it, building pages dynamically from databases is a technique that is here to stay.
There are two main types of database; flat-file and relational. Which is the best one to use for a particular job will depend on factors such as the type and the amount of data to be processed; not to mention how frequently it will be used.
Flat-File
The flat-file style of database are ideal for small amounts of data that needs to be human readable or edited by hand. Essentially all they are made up of is a set of strings in one or more files that can be parsed to get the information they store; great for storing simple lists and data values, but can get complicated when you try to replicate more complex data structures. That's not to say that it is impossible to store complex data in a flat-file database; just that doing so can be more costly in time and processing power compared to a relational database. The methods used for storing the more complex data types, are also likely to render the file unreadable and un-editable to anyone looking after the database.
The typical flat-file database is split up using a common delimiter. If the data is simple enough, this could be a comma, but more complex strings are usually split up using tabs, new lines or a combination of characters not likely to be found in the record itself.
One of the main problems with using flat files for even a semi-active database is the fact that it is very prone to corruption. There is no inherent locking mechanism that detects when a file is being used or modified, and so this has to be done on the script level. Even if care is taken to lock and unlock the file on each access, a busy script can cause a "race condition" and it is possible for a file to be wiped clean by two or more processes that are fighting for the lock; the timing of your file locks will become more and more important as a site gets busy.
Database Management (DBM)
The Database Management Layer allows script programmers to store information as a pair of strings; a key, which is used to find the associated value. Essentially, a DBM adds more functionality and better sortation during storage to the binary flat-files that it uses. There are several versions of DBMs available, but the most popular is the Berkley Database Manager; also known as the Berkley DB.
The Berkley DB is an improvement over normal flat-files, as it provides a way for programmers to use the database without having to worry about how the data is stored or how to retrieve the values. Retrieval of data using the Berkley DB is often much faster than from a flat-file, with the time savings being made by storing data in a way that speeds up the locating of a specific key-value pair.
Creating, editing and deleting data when using the Berkley DB is actually quite simple; once the database has been tied to the script you just use and manipulate the variables as normal. The problem of file locking that plagues flat-file databases is still apparent when using DBM, so you should still take care when planning scripts that utilize it.
Relational
The relational databases such as MySQL, Microsoft SQL Server and Oracle, have a much more logical structure in the way that it stores data. Tables can be used to represent real world objects, with each field acting like an attribute. For example, a table called books could have the columns title, author and ISBN, which describe the details of each book where each row in the table is a new book.
The "relation" comes from the fact that the tables can be linked to each other, for example the author of a book could be cross-referenced with the authors table (assuming there was one) to provide more information about the author. These kind of relations can be quite complex in nature, and would be hard to replicate in the standard flat-file format.
One major advantage of the relational model is that, if a database is designed efficiently, there should be no duplication of any data; helping to maintain database integrity. This can also represent a huge saving in file size, which is important when dealing with large volumes of data. Having said that, joining large tables to each other to get the data required for a query can be quite heavy on the processor; so in some cases, particularly when data is read only, it can be beneficial to have some duplicate data in a relational database.
Relational databases also have functions "built in" that help them to retrieve, sort and edit the data in many different ways. These functions save script designers from having to worry about filtering out the results that they get, and so can go quite some way to speeding up the development and production of web applications.
Database Comparisons
In most cases, you would want your database to support various types of relations; such databases, particularly if designed correctly, can dramatically improve the speed of data retrieval as well as being easier to maintain. Ideally, you will want to avoid the replication of data within a database to keep a high level of integrity, otherwise changes to one field will have to be made manually to those that are related.
While several flat-files can be combined in such a way as to be able to emulate some of the behaviours of a relational database, it can prove to be slower in practice. A single connection to a relational database can access all the tables within that database; whereas a flat file implementation of the same data would result in a new file open operation for each table.
All the sorting for flat-file databases need to be done at the script level. Relational databases have functions that can sort and filter the data so the results that are sent to the script are pretty much what you need to work with. It is often quicker to sort the results before they are returned to the script than to have them sorted via a script, few scripting languages are designed to filter data effectively and so the more functions a database supports, the less work a script has to do.
If you are only working with a small amount of data that is rarely updated then a full blown relational database solution can be considered overkill. Flat-file databases are not as scaleable as the relational model, so if you are looking for a suitable database for more frequent and heavy use then a relational database is probably more suitable.
History of database
The earliest known use of the term data base was in November 1963, when the System Development Corporation sponsored a symposium under the title Development and Management of a Computer-centered Data Base[1]. Database as a single word became common in Europe in the early 1970s and by the end of the decade it was being used in major American newspapers. (The abbreviation DB, however, survives.)
The first database management systems were developed in the 1960s. A pioneer in the field was Charles Bachman. Bachman's early papers show that his aim was to make more effective use of the new direct access storage devices becoming available: until then, data processing had been based on punched cards and magnetic tape, so that serial processing was the dominant activity. Two key data models arose at this time: CODASYL developed the network model based on Bachman's ideas, and (apparently independently) the hierarchical model was used in a system developed by North American Rockwell later adopted by IBM as the cornerstone of their IMS product. While IMS along with the CODASYL IDMS were the big, high visibility databases developed in the 1960s, several others were also born in that decade, some of which have a significant installed base today. Two worthy of mention are the PICK and MUMPS databases, with the former developed originally as an operating system with an embedded database and the latter as a programming language and database for the development of healthcare systems.
The relational model was proposed by E. F. Codd in 1970. He criticized existing models for confusing the abstract description of information structure with descriptions of physical access mechanisms. For a long while, however, the relational model remained of academic interest only. While CODASYL products (IDMS) and network model products (IMS) were conceived as practical engineering solutions taking account of the technology as it existed at the time, the relational model took a much more theoretical perspective, arguing (correctly) that hardware and software technology would catch up in time. Among the first implementations were Michael Stonebraker's Ingres at Berkeley, and the System R project at IBM. Both of these were research prototypes, announced during 1976. The first commercial products, Oracle and DB2, did not appear until around 1980. The first successful database product for microcomputers was dBASE for the CP/M and PC-DOS/MS-DOS operating systems.
During the 1980s, research activity focused on distributed database systems and database machines. Another important theoretical idea was the Functional Data Model, but apart from some specialized applications in genetics, molecular biology, and fraud investigation, the world took little notice.
In the 1990s, attention shifted to object-oriented databases. These had some success in fields where it was necessary to handle more complex data than relational systems could easily cope with, such as spatial databases, engineering data (including software repositories), and multimedia data. Some of these ideas were adopted by the relational vendors, who integrated new features into their products as a result. The 1990s also saw the spread of Open Source databases, such as PostgreSQL and MySQL.
In the 2000s, the fashionable area for innovation is the XML database. As with object databases, this has spawned a new collection of start-up companies, but at the same time the key ideas are being integrated into the established relational products. XML databases aim to remove the traditional divide between documents and data, allowing all of an organization's information resources to be held in one place, whether they are highly structured or not.
The first database management systems were developed in the 1960s. A pioneer in the field was Charles Bachman. Bachman's early papers show that his aim was to make more effective use of the new direct access storage devices becoming available: until then, data processing had been based on punched cards and magnetic tape, so that serial processing was the dominant activity. Two key data models arose at this time: CODASYL developed the network model based on Bachman's ideas, and (apparently independently) the hierarchical model was used in a system developed by North American Rockwell later adopted by IBM as the cornerstone of their IMS product. While IMS along with the CODASYL IDMS were the big, high visibility databases developed in the 1960s, several others were also born in that decade, some of which have a significant installed base today. Two worthy of mention are the PICK and MUMPS databases, with the former developed originally as an operating system with an embedded database and the latter as a programming language and database for the development of healthcare systems.
The relational model was proposed by E. F. Codd in 1970. He criticized existing models for confusing the abstract description of information structure with descriptions of physical access mechanisms. For a long while, however, the relational model remained of academic interest only. While CODASYL products (IDMS) and network model products (IMS) were conceived as practical engineering solutions taking account of the technology as it existed at the time, the relational model took a much more theoretical perspective, arguing (correctly) that hardware and software technology would catch up in time. Among the first implementations were Michael Stonebraker's Ingres at Berkeley, and the System R project at IBM. Both of these were research prototypes, announced during 1976. The first commercial products, Oracle and DB2, did not appear until around 1980. The first successful database product for microcomputers was dBASE for the CP/M and PC-DOS/MS-DOS operating systems.
During the 1980s, research activity focused on distributed database systems and database machines. Another important theoretical idea was the Functional Data Model, but apart from some specialized applications in genetics, molecular biology, and fraud investigation, the world took little notice.
In the 1990s, attention shifted to object-oriented databases. These had some success in fields where it was necessary to handle more complex data than relational systems could easily cope with, such as spatial databases, engineering data (including software repositories), and multimedia data. Some of these ideas were adopted by the relational vendors, who integrated new features into their products as a result. The 1990s also saw the spread of Open Source databases, such as PostgreSQL and MySQL.
In the 2000s, the fashionable area for innovation is the XML database. As with object databases, this has spawned a new collection of start-up companies, but at the same time the key ideas are being integrated into the established relational products. XML databases aim to remove the traditional divide between documents and data, allowing all of an organization's information resources to be held in one place, whether they are highly structured or not.
What is a Data base..
A database is a structured collection of records or data. A computer database relies upon software to organize the storage of data. The software models the database structure in what are known as database models. The model in most common use today is the relational model. Other models such as the hierarchical model and the network model use a more explicit representation of relationships (see below for explanation of the various database models).
Database management systems (DBMS) are the software used to organize and maintain the database. These are categorized according to the database model that they support. The model tends to determine the query languages that are available to access the database. A great deal of the internal engineering of a DBMS, however, is independent of the data model, and is concerned with managing factors such as performance, concurrency, integrity, and recovery from hardware failures. In these areas there are large differences between products.
Database management systems (DBMS) are the software used to organize and maintain the database. These are categorized according to the database model that they support. The model tends to determine the query languages that are available to access the database. A great deal of the internal engineering of a DBMS, however, is independent of the data model, and is concerned with managing factors such as performance, concurrency, integrity, and recovery from hardware failures. In these areas there are large differences between products.
Subscribe to:
Posts (Atom)