
MySQL - MySQL SQL Syntax, Queries, Advanced Queries, Query Optimization

MySQL SQL Syntax
Naming Rules
Almost every SQL statement refers in some way to a database or its constituent elements. This section describes the syntax rules for referring to databases, tables, columns, indexes, and aliases. Names are subject to case sensitivity considerations, which are described as well.
Referring to Elements of Databases
When you use names to refer to elements of databases, you are constrained by the characters you can use and the length that names can be. The form of names also depends on the context in which you use them. Another factor that affects naming rules is that the server can be started in different naming modes.
. Legal characters in names. Unquoted names can consist of any alphanumeric characters in the server's default character set, plus the characters '_' and '$'. Names can start with any character that is legal in a name, including a digit. However, a name cannot consist entirely of digits because that would make it indistinguishable from a number. MySQL's support for names that begin with a number is somewhat unusual among database systems.
If you use such a name, be particularly careful of names containing an 'E' or 'e' because those characters can lead to ambiguous expressions. For example, the expression 23e + 14 (with spaces surrounding the '+' sign) means column 23e plus the number 14, but what about 23e+14? Does it mean the same thing, or is it a number in scientific notation?
Aliases can be fairly arbitrary, but you should quote an alias within single or double quotes if it is a SQL keyword, is entirely numeric, or contains spaces or other special characters.
As of MySQL 3.23.6, names can be quoted within backtick characters ('`'), which allows use of any character except backtick, ASCII 0, and ASCII 255. This is useful when a name contains special characters or is a reserved word. Quoting a name also allows it to be entirely numeric, something that is not true of unquoted names.
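As a brief sketch (the table and column names here are hypothetical), backtick quoting makes otherwise-illegal names usable:

```sql
-- Hypothetical names, purely for illustration: a table name containing a
-- space and an entirely numeric column name, both legal only when quoted.
CREATE TABLE `my table` (`1234` INT);
SELECT `1234` FROM `my table`;
DROP TABLE `my table`;
```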
There are also two additional constraints for database and table names, even if you quote them. First, you cannot use the '.' character because it is the separator in db_name.tbl_name and db_name.tbl_name.col_name notation.
Second, you cannot use the UNIX or Windows pathname separator characters ('/' or '\'). The separator characters are disallowed in database and table names because databases are represented on disk by directories, and tables are represented on disk by at least one file.
Consequently, these types of names must not contain characters that are illegal in directory names and filenames. The UNIX pathname separator is disallowed on Windows (and vice versa) to make it easier to transfer databases and tables between servers running on different platforms.
For example, suppose you were allowed to use a slash in a table name on Windows. That would make it impossible to move the table to UNIX, because filenames on that platform cannot contain slashes.
. Name length. Names for databases, tables, columns, and indexes can be up to 64 characters long. Alias names can be up to 256 characters long.
. Name qualifiers. Depending on context, a name may need to be qualified to make it clear what the name refers to. To refer to a database, just specify its name:
USE db_name;
To refer to a table, you have two choices. First, a fully qualified table name consists of a database name and a table name:
SHOW COLUMNS FROM db_name.tbl_name;
SELECT * FROM db_name.tbl_name;
Second, a table name by itself refers to a table in the default (current) database. If sampdb is the default database, the following statements are equivalent:
SELECT * FROM member;
SELECT * FROM sampdb.member;
If no database has been selected, naming a table without a database qualifier is illegal because the server cannot tell which database the table belongs to.
To refer to a column, there are three choices: fully qualified, partially qualified, and unqualified. A fully qualified name (written as db_name.tbl_name.col_name) is completely specified. A partially qualified name (written as tbl_name.col_name) refers to a column in the named table.
An unqualified name (written simply as col_name) refers to whatever table is indicated by the surrounding context. The following two queries refer to the same pair of column names, but the context supplied by the FROM clause of each statement indicates from which table to select the columns:
SELECT name, id FROM employee;
SELECT name, id FROM member;
It's usually unnecessary to supply fully qualified names, although it's always legal to do so if you want. If you select a database with a USE statement, that database becomes the default database and is implicit in every unqualified table reference.
If you're using a SELECT statement that refers to only one table, that table is implicit for every column reference in the statement. It's necessary to qualify names only when a table or database cannot be determined from context.
For example, if a query refers to tables from multiple databases, any table not in the default database must be referenced using the db_name.tbl_name form to let MySQL know which database to look in to find the table.
Similarly, if a query uses multiple tables and refers to a column name that is present in more than one table, it's necessary to qualify the name with a table name to make it clear which column you mean.
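A sketch of such qualification, using hypothetical employee and department tables that both contain a name column:

```sql
-- Both tables have a column called name, so each reference must be
-- qualified with its table name to avoid ambiguity.
-- (Table and column names here are illustrative assumptions.)
SELECT employee.name, department.name
FROM employee, department
WHERE employee.dept_id = department.dept_id;
```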
. Server startup mode. If the server has been started with the --ansi or --sql-mode=ANSI_QUOTES option, names can be quoted with double quotes rather than backticks (although backticks can still be used).
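For example, assuming a server started with --sql-mode=ANSI_QUOTES (the member table is a stand-in), these two statements are equivalent:

```sql
-- In ANSI_QUOTES mode, double quotes delimit identifiers rather than strings.
SELECT `name` FROM `member`;
SELECT "name" FROM "member";
```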
Case Sensitivity in SQL Statements
Case sensitivity rules in SQL statements vary for different parts of the statement and also depend on what you are referring to and on the operating system of the machine on which the server is running:
. SQL keywords and function names. Keywords and function names are not case sensitive. They can be given in any lettercase. The following statements are equivalent:
select now();
sElEcT nOw();
. Database and table names. Databases and tables in MySQL are implemented using directories and files in the underlying file system on the server host. As a result, case sensitivity of database and table names depends on the way the operating system on that host treats filenames.
Windows filenames are not case sensitive, so a server running on Windows does not treat database and table names as case sensitive. Servers running on UNIX usually treat database and table names as case sensitive because UNIX filenames are case sensitive. (An exception is that names in HFS+ file systems under Mac OS X are not case sensitive.)
You should consider lettercase issues if you create a database on a server with case-sensitive filenames and you might someday move the database to a server where filenames are not case sensitive.
For example, if you create two tables named abc and ABC on a UNIX server where those names are treated differently, you would have problems moving the tables to a Windows machine; there, abc and ABC would not be distinguishable because names are not case sensitive.
One way to avoid having case sensitivity properties become an issue is to pick a given lettercase (for example, lowercase) and always create databases and tables using names in that lettercase.
Then case of names won't be a problem if you move a database to a different server. Another approach to issues of name lettercase is to start the server with the lower_case_table_names variable set.
. Column and index names. Column and index names are not case sensitive in MySQL. The following queries are equivalent:
SELECT name FROM employee;
SELECT NAME FROM employee;
SELECT nAmE FROM employee;
. Alias names. Aliases are case sensitive. You can specify an alias in any lettercase (upper, lower, or mixed), but you must refer to it elsewhere in the query using the same case.
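A minimal sketch of the alias rule (member and the alias EmpName are stand-in names):

```sql
-- The alias must be referred to elsewhere in the same lettercase
-- in which it was defined.
SELECT name AS EmpName FROM member ORDER BY EmpName;   -- legal
-- SELECT name AS EmpName FROM member ORDER BY empname;  -- illegal: case differs
```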
Regardless of whether or not a database or table name is case sensitive on your system, you must refer to it using the same lettercase throughout a given query.
That is not true for SQL keywords, function names, or column and index names, all of which can be referred to in varying lettercase style throughout a query. Naturally, the query will be more readable if you use a consistent lettercase rather than what is called "ransom note" style (SelECt NamE FrOm ...).
Working with Databases
MySQL provides several database-level statements: USE for selecting a default database, CREATE DATABASE for creating databases, DROP DATABASE for removing them, and ALTER DATABASE for modifying global database characteristics.
Selecting Databases [use database]
The USE statement selects a database to make it the default (current) database for a given connection to the server:
USE db_name;
You must have some access privilege for the database or you cannot select it. If you do have access to a database, you can use its tables even without selecting the database explicitly by qualifying table names with the database name. For example, to retrieve the contents of the employee table in the mkdtutorials database without selecting the database first, write the query like this:
SELECT * FROM mkdtutorials.employee;
However, it's much more convenient to refer to tables without having to specify a database qualifier.
Selecting a default database doesn't mean it must be the default for the duration of the connection. You can issue any number of USE statements to switch back and forth among databases as often as you want, as long as you have access privileges to use them. Nor does selecting a database limit you to using tables only from that database. While one database is the default, you can refer to tables in other databases by qualifying their names with the appropriate database name.
When a connection to the server terminates, any notion by the server of what the default database was disappears. That is, if you connect to the server again, it doesn't remember what database you had selected previously. In fact, that's not even an idea that makes any sense, given that MySQL is multi-threaded and can handle multiple connections from a given user, which can begin and end asynchronously. In this environment, it's not clear what the meaning of "the previously selected database" might be.
Creating Databases
Creating a database is easy; just name it in a CREATE DATABASE statement:
CREATE DATABASE db_name;
The constraints on database creation are that the name must be legal, the database must not already exist, and you must have sufficient privileges to create it.
Dropping Databases
Dropping a database is just as easy as creating one, assuming you have sufficient privileges:
DROP DATABASE db_name;
However, the DROP DATABASE statement is not something you should use with wild abandon. It removes the database and all tables within it. After you drop a database, it's gone forever. In other words, don't try out this statement just to see how it works. If your administrator has been performing database backups regularly, you may be able to get the database back. But I can guarantee that no administrator will be sympathetic if you say, "Uh, I was just playing around with DROP DATABASE to see what would happen, and, uh...can you restore my database for me?"
Note that a database is represented by a directory under the data directory, and the directory is intended for storage of table data. If you drop a database but its name continues to show up when you issue a SHOW DATABASES statement, the reason is most likely that the database directory contains non-table files. DROP DATABASE will not delete such files and, as a result, will not delete the directory either. This means that the database directory will continue to exist, albeit empty of any tables. To really drop the database if this occurs, manually remove any remaining files in the database directory and the directory itself.
Altering Databases
The ALTER DATABASE statement, available as of MySQL 4.1, makes changes to a database's global characteristics or attributes. Currently, the only such characteristic is the default character set:
ALTER DATABASE db_name DEFAULT CHARACTER SET charset;
charset should be the name of a character set supported by the server, such as latin1_de or sjis. (To find out which sets your server supports, issue a SHOW CHARACTER SET statement.) charset can also be DEFAULT to indicate that the database uses the server-level character set by default.
Database attributes are stored in the db.opt file in the database directory.
Select statement
Data is extracted from the table using the SELECT SQL command.
Here is the format of a SELECT statement:
SELECT column_names from table_name [WHERE ...conditions];
The conditions part of the statement is optional (we'll go through this later). Basically, you need to know the column names and the table name from which to extract the data. For example, in order to extract the first and last names of all employees, issue the following command.
SELECT f_name, l_name from employee_data;
The statement tells MySQL to list all the rows from columns f_name and l_name.
mysql> SELECT f_name, l_name from employee_data;
On close examination, you'll find that the display is in the order in which the data was inserted. Furthermore, the last line indicates the number of rows our table has.
To display the entire table, we can either enter all the column names or use a simpler form of the SELECT statement.
SELECT * from employee_data;
Some of you might recognize the * in the above statement as the wildcard. Though we don't use that term for the character here, it serves a very similar function. The * means 'ALL columns'. Thus, the above statement lists all the rows of all columns.
Insert statement
Syntax of Insert statement:
INSERT [LOW_PRIORITY | DELAYED] [IGNORE] [INTO] tbl_name [(column_list)]
    VALUES (expr [, expr] ...) [, (...)] ...
INSERT [LOW_PRIORITY | DELAYED] [IGNORE] [INTO] tbl_name [(column_list)]
    SELECT ...
INSERT [LOW_PRIORITY | DELAYED] [IGNORE] [INTO] tbl_name
    SET col_name=expr [, col_name=expr] ...
Inserts rows into an existing table tbl_name and returns the number of rows inserted. The INTO keyword is optional as of MySQL 3.22.5.
LOW_PRIORITY causes the statement to be deferred until no clients are reading from the table. LOW_PRIORITY was introduced in MySQL 3.22.5.
DELAYED causes the rows to be placed into a queue for later insertion, and the statement returns immediately so that the client can continue on without waiting. However, in this case, LAST_INSERT_ID() will not return the AUTO_INCREMENT value for any AUTO_INCREMENT column in the table. DELAYED inserts were introduced in MySQL 3.22.15; they work only for ISAM and MyISAM tables.
If IGNORE is specified, rows that duplicate values for unique keys in existing rows are discarded. If duplicate values occur without IGNORE, an error occurs and no more rows are inserted. IGNORE was introduced in MySQL 3.22.10.
The first form of INSERT requires a VALUES() list that specifies all values to be inserted. If no column_list is given, the VALUES() list must specify one value for each column in the table. If a column_list is given consisting of one or more comma-separated column names, one value per column must be specified in the VALUES() list. Columns not named in the column list are set to their default values. As of MySQL 3.22.5, multiple value lists can be specified, allowing multiple rows to be inserted using a single INSERT statement.
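For instance, a multiple-row INSERT of the first form might look like this (the member table and its columns are stand-ins):

```sql
-- One statement inserting two rows; columns not named in the
-- column list are set to their default values.
INSERT INTO member (name, id)
VALUES ('Smith', 1), ('Jones', 2);
```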
As of MySQL 3.23.3, the column_list and VALUES() list can be empty, which can be used as follows to create a record for which all columns are set to their default values:
INSERT INTO tbl_name () VALUES();
As of MySQL 4.0.3, the word DEFAULT can be used in a VALUES() list to set a column to its default value explicitly without knowing what the default value is.
The second form of INSERT retrieves records according to the SELECT statement and inserts them into tbl_name. The SELECT statement must select as many columns as are in tbl_name or as many columns as are named in column_list if a column list is specified. When a column list is specified, any columns not named in the list are set to their default values. You cannot select records from the same table into which you are inserting them.
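A sketch of the second form, assuming a hypothetical archive table with compatible columns:

```sql
-- Copy selected rows from member into archive. The SELECT must produce
-- as many columns as are named in the column list.
INSERT INTO archive (id, name)
SELECT id, name FROM member WHERE id < 100;
```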
The third form of INSERT, available as of MySQL 3.22.10, inserts columns named in the SET clause to the values given by the corresponding expressions. Columns not named are set to their default values.
INSERT INTO associate_query (associate_name, tid, description)
VALUES ('R. k. Tiwari', NULL, 'Courseware not received');
Update statement
UPDATE [updateoptions] tablename
SET col1=value1, col2=value2, ...
[ WHERE condition ] [ ORDER BY columns ] [ LIMIT maxrecords ]
UPDATE [updateoptions] table1, table2, table3
SET table1.col1=table2.col2 ...
[ WHERE condition][ORDER BY columns ] [ LIMIT maxrecords ]
UPDATE changes individual fields of the table records specified by WHERE. Fields not named in SET remain unchanged. In the new values, you can refer to existing fields.
For example, an UPDATE command may be of the following form:
UPDATE products SET price = price+5 WHERE productID=3
Warning: Without a WHERE condition, all data records in the table will be changed. (In the above example, the prices of all products would be increased by 5.)
. updateoptions: Here the options LOW_PRIORITY and IGNORE may be given. The effect is the same as with INSERT.
. condition: This condition specifies which records are affected by the change.
. columns: With ORDER BY, you can sort the record list before making changes. This makes sense only in combination with LIMIT, for example, to change the first or last ten records (ordered according to some criterion). This possibility has existed since MySQL 4.0.
. maxrecords: With LIMIT, the maximum number of records that may be changed is specified.
Since MySQL 4.0, UPDATE commands can include more than one table. All tables included in the query must be specified after UPDATE. The only tables that are changed are those whose fields were specified by SET. The link between the tables must be set with WHERE conditions.
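A sketch of a multiple-table UPDATE (the products and suppliers tables and their columns are hypothetical):

```sql
-- Only products is changed (its fields appear in SET);
-- suppliers appears solely to establish the link in WHERE.
UPDATE products, suppliers
SET products.price = products.price * 1.10
WHERE products.supplier_id = suppliers.supplier_id;
```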
Delete statement
DELETE [deleteoptions] FROM tablename [WHERE condition] [ORDER BY ordercolumn [DESC]] [LIMIT maxrecords ]
DELETE deletes the records in a table encompassed by condition.
. deleteoptions: The LOW_PRIORITY option has the effect that the data records are deleted only when all read operations are complete. (The goal of this option is to avoid having SELECT queries unnecessarily delayed due to DELETE operations.)
The option QUICK has the effect that during deletion, an existing index is not optimized. This speeds up the DELETE command, but it can lead to a somewhat inefficient index.
. condition: This condition specifies which records are to be deleted.
. ordercolumn: With ORDER BY you can first sort the data to be deleted. This makes sense only in combination with LIMIT, in order, for example, to delete the first or last ten records (according to some sort criterion).
. maxrecords: With LIMIT, the maximum number of records that may be deleted is specified.
If DELETE is executed without conditions, then all records of the table are deleted (so be careful!). DELETE without conditions cannot be part of a transaction. If a transaction is open, it is closed with COMMIT before the DELETE command is executed. If you wish to delete large tables completely, it is more efficient to use the command TRUNCATE.
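For example, to empty a table completely (log_entries is a stand-in name), TRUNCATE is preferable to an unconditional DELETE:

```sql
-- TRUNCATE removes all rows by re-creating the table internally,
-- which is much faster than deleting row by row.
TRUNCATE TABLE log_entries;
```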
DELETE [deleteoptions] table1, table2, ...
FROM table1, table2, table3, ...
[USING columns]
WHERE conditions
This variant of DELETE (available since version 4.0) deletes records from tables table1, table2, etc., where the data of additional tables (table3, etc.) are considered in the search criteria.
After DELETE, all tables from which data are to be deleted must be specified. After FROM, all DELETE tables must appear, as well as any additional tables that serve only in formulating the search criteria.
. deleteoptions: Here you can specify options as in a usual DELETE command.
. columns: Here fields that link the tables can be specified. This assumes that the linking field has the same name in both tables.
. conditions: In addition to the usual delete criteria, here one may specify linking conditions (e.g., WHERE table1.id = table2.foreignID).
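Putting the pieces together, a multiple-table DELETE might be sketched like this (the tables and the foreignID column are hypothetical):

```sql
-- Rows are deleted from table1 only; table2 serves purely
-- to formulate the search criteria.
DELETE table1
FROM table1, table2
WHERE table1.id = table2.foreignID;
```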
Retrieving from Multiple Tables with UNION
If you want to create a result set by selecting records from multiple tables one after the other, you can do that using a UNION statement. UNION is available as of MySQL 4, although prior to that you can use a couple of workarounds (shown later).
For the following examples, assume you have three tables, table1, table2, and t3 that look like this:
mysql> SELECT * FROM table1;
| id   | value |
|    1 | red   |
|    2 | blue  |
|    3 | green |
mysql> SELECT * FROM table2;
| id   | value|
|   -1 | tan  |
|    1 | red  |
mysql> SELECT * FROM t3;
| date       | id   |
| 1904-01-01 |  100 |
| 2004-01-01 |  200 |
| 2004-01-01 |  200 |
Tables table1 and table2 have integer and character columns, and t3 has date and integer columns. To write a UNION statement that combines multiple retrievals, just write several SELECT statements and put the keyword UNION between them.
For example, to select the integer column from each table, do this:
mysql> SELECT id FROM table1 UNION SELECT id FROM table2 UNION SELECT id FROM t3;
| id   |
|    1 |
|    2 |
|    3 |
|   -1 |
|  100 |
|  200 |
UNION has the following properties:
. The names and data types for the columns of the UNION result come from the names and types of the columns in the first SELECT. The second and subsequent SELECT statements in the UNION must select the same number of columns, but they need not have the same names or types.
Columns are matched by position (not by name), which is why these two queries return different results:
mysql> SELECT id, value FROM table1 UNION SELECT id, date FROM t3;
| id   | value      |
|    1 | red        |
|    2 | blue       |
|    3 | green      |
|  100 | 1904-01-01 |
|  200 | 2004-01-01 |
mysql> SELECT id, value FROM table1 UNION SELECT date, id FROM t3;
| id   | value |
|    1 | red   |
|    2 | blue  |
|    3 | green |
| 1904 | 100   |
| 2004 | 200   |
In both cases, the columns selected from table1 (id and value) determine the names and types used in the UNION result. These columns have integer and string types, so type conversion takes place when selecting values from t3. For the first query, date is converted from date to string, which happens to result in no loss of information. For the second query, date is converted from date to integer (which does lose information), and id is converted from integer to string.
By default, UNION eliminates duplicate rows from the result set:
mysql> SELECT * FROM table1 UNION SELECT * FROM table2 UNION SELECT * FROM t3;
| id   |value  |
|    1 | red   |
|    2 | blue  |
|    3 | green |
|   -1 | tan   |
| 1904 | 100   |
| 2004 | 200   |
table1 and table2 both have a row containing values of 1 and 'red', but only one such row appears in the output. Also, t3 has two rows containing '2004-01-01' and 200, one of which has been eliminated.
If you want to preserve duplicates, follow the first UNION keyword with ALL:
mysql> SELECT * FROM table1 UNION ALL SELECT * FROM table2 UNION SELECT * FROM t3;
| id    | value|
|    1 | red   |
|    2 | blue  |
|    3 | green |
|   -1 | tan   |
|    1 | red   |
| 1904 | 100   |
| 2004 | 200   |
| 2004 | 200   |
. To sort a UNION result, add an ORDER BY clause after the last SELECT; it applies to the query result as a whole. However, because the UNION uses column names from the first SELECT, the ORDER BY should refer to those names, not the column names from the last SELECT, if they differ.
mysql> SELECT id, value FROM table1 UNION SELECT id, date FROM t3
    -> ORDER BY value;
| id   | value      |
|  100 | 1904-01-01 |
|  200 | 2004-01-01 |
|    2 | blue       |
|    3 | green      |
|    1 | red        |
You can also specify an ORDER BY clause for an individual SELECT statement within the UNION.
To do this, enclose the SELECT (including its ORDER BY) within parentheses:
mysql> (SELECT id, value FROM table1 ORDER BY id DESC)
-> UNION (SELECT id, value FROM table2 ORDER BY id);
| id   | value |
|    3 | green |
|    2 | blue  |
|    1 | red   |
|   -1 | tan   |
. LIMIT can be used in a UNION in a manner similar to that for ORDER BY.
If added to the end of the statement, it applies to the UNION result as a whole:
mysql> SELECT * FROM table1 UNION SELECT * FROM table2 UNION SELECT * FROM t3
    -> LIMIT 1;
| id   | value|
|    1 | red  |
If enclosed within parentheses as part of an individual SELECT statement, it applies only to that SELECT:
mysql> (SELECT * FROM table1 LIMIT 1)
    -> UNION (SELECT * FROM table2 LIMIT 1)
    -> UNION (SELECT * FROM t3 LIMIT 1);
| id   | value|
|    1 | red  |
|   -1 | tan  |
| 1904 | 100  |
. You need not select from different tables. You can select different subsets of the same table using different conditions. This can be useful as an alternative to running several different SELECT queries, because you get all the rows in a single result set rather than as several result sets.
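For example, using the table1 table from above, two subsets of the same table can be combined into one result set:

```sql
-- Two different conditions on the same table, returned together
-- rather than as two separate query results.
SELECT id, value FROM table1 WHERE id = 1
UNION
SELECT id, value FROM table1 WHERE value = 'green';
```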
Prior to MySQL 4, UNION is unavailable, but you can work around this difficulty by selecting rows from each table into a temporary table and then selecting the contents of that table. In MySQL 3.23 and later, you can handle this problem easily by allowing the server to create the holding table for you. Also, you can make the table a temporary table so that it will be dropped automatically when your session with the server terminates. For quicker performance, use a HEAP (in-memory) table.
CREATE TEMPORARY TABLE tmp TYPE = HEAP SELECT ... FROM table1 WHERE ... ;
INSERT INTO tmp SELECT ... FROM table2 WHERE ... ;
SELECT * FROM tmp;
Because tmp is a TEMPORARY table, the server will drop it automatically when your client session ends. (Of course, you can drop the table explicitly as soon as you're done with it to allow the server to free resources associated with it. This is a good idea if you will continue to perform further queries, particularly for HEAP tables.)
For versions of MySQL older than 3.23, the concept is similar, but the details differ because the HEAP table type and TEMPORARY tables are unavailable, as is CREATE TABLE ... SELECT. To adapt the preceding procedure, it's necessary to explicitly create the table first before retrieving any rows into it. (The only table type available will be ISAM, so you cannot use a TYPE option.) Then retrieve the records into the table. When you're done with it, you must use DROP TABLE explicitly because the server will not drop it automatically.
CREATE TABLE tmp (column1, column2, ...);
INSERT INTO tmp SELECT ... FROM table1 WHERE ... ;
INSERT INTO tmp SELECT ... FROM table2 WHERE ... ;
SELECT * FROM tmp;
DROP TABLE tmp;
If you want to run a UNION-type query on MyISAM tables that have identical structure, you may be able to set up a MERGE table and query that as a workaround for lack of UNION. (In fact, this can be useful even if you do have UNION, because a query on a MERGE table will be simpler than the corresponding UNION query.) A query on the MERGE table is similar to a UNION that selects corresponding columns from the individual tables that make up the MERGE table. That is, SELECT on a MERGE table is like UNION ALL (duplicates are not removed), and SELECT DISTINCT is like UNION (duplicates are removed).
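A sketch of that setup, assuming two identically structured MyISAM tables with hypothetical names log1 and log2:

```sql
-- The MERGE table presents log1 and log2 as a single table.
-- SELECT on it behaves like UNION ALL over the underlying tables;
-- SELECT DISTINCT behaves like UNION.
CREATE TABLE log_all (id INT, msg CHAR(64)) TYPE=MERGE UNION=(log1, log2);
SELECT * FROM log_all;
```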
Select Statement Revisited
Simple Queries (SELECT)
Here is a simple database query: SELECT * FROM tablename. This query returns all data records of the table specified. The asterisk indicates that the query is to encompass all the columns of the table.
Query Result
First SQL experiments in the MySQL monitor (mysql):
USE mkdtutorials
SELECT * FROM table1
| id    | name           | city  |
| 11000 | RAVISH TIWARI  | NOIDA |
| 11001 | AMAN KUMAR     | NOIDA |
| 11003 | VATSA          | DELHI |
SELECT can be used without reference to a database or to tables, for example, in the form SELECT 2*3, in which case SELECT returns its result in the form of a small table (one column, one row). This is used relatively often to determine the content of MySQL variables or functions (as in, say, SELECT NOW(), to determine the current time).
Determining the Number of Data Records (Lines)
Perhaps you do not have a craving to view your data records in gruesome detail, but would like merely to determine how many records there are in one of your tables.
For that, you can use the following query:
SELECT COUNT(id) FROM table1;
| count(id) |
|         4 |
In this query you could specify, instead of id, any column of the table (or * for all columns). In any case, MySQL optimizes the query and returns only the number of records, without actually reading them.
Column Restriction
Often, you are not interested in all the columns of a table. In such a case you must specify the columns explicitly (instead of using the asterisk).
SELECT name FROM table1;
| name           |
| AMAN KUMAR     |
| VATSA          |
If a query is going to return a large number of data records, you should get accustomed to the idea of specifying explicitly only those columns of interest (instead of taking the lazy person's route of simply typing an asterisk). The reason is that MySQL (unnecessary data extraction), the client program (memory usage), and the network (unnecessary data transfer) work significantly more efficiently if the data set to be processed is limited to the absolute minimum.
Limiting the Number of Resulting Records (LIMIT)
You can limit not only the number of columns in a query result, but also the number of data records. Imagine that your titles table contains the names of 100,000 books, but you would like to access only the first ten of these (for example, to display in an HTML document). To query the remaining 99,990 records would be a gross squandering of CPU time, memory, and network capacity. To avoid such a scenario, you can limit the number of records returned with LIMIT n.
The following command returns two records from table1:
select name from table1 LIMIT 2;
| name           |
| RAVISH TIWARI  |
| AMAN KUMAR     |
To return the next two records, execute a new query, but this time with LIMIT offset, n. Here offset specifies the number of the record at which access to the table is to begin. (Warning: The enumeration of data records begins with 0. An offset of n skips the first n records and, since counting begins with 0, starts processing at record number n.)
select name from table1 LIMIT 2,3;
| name       |
| VATSA      |
| aman Kumar |
Determining the Number of Records Suppressed by LIMIT (SQL_CALC_FOUND_ROWS, FOUND_ROWS)
When you execute a SELECT query with LIMIT, you obtain only a partial result. Particularly in the page-by-page display of data, it would often be helpful to know exactly how many records are available altogether.
Beginning with MySQL 4.0, you can use in a SELECT query the additional option SQL_CALC_FOUND_ROWS. Then, in a second query you can evaluate the SQL function FOUND_ROWS( ), which tells how many records the query would have returned without LIMIT.
The use of SQL_CALC_FOUND_ROWS and FOUND_ROWS() is especially useful in complex queries in which a separate SELECT query for determining the number of records would be time intensive. Note, however, that the option SQL_CALC_FOUND_ROWS prevents certain kinds of optimization that MySQL carries out in LIMIT queries. Therefore, use SQL_CALC_FOUND_ROWS only when you really wish to evaluate FOUND_ROWS().
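A minimal sketch using the table1 table from the earlier examples:

```sql
-- The first query returns only two rows; the second reports how many
-- rows the query would have returned without the LIMIT clause.
SELECT SQL_CALC_FOUND_ROWS name FROM table1 LIMIT 2;
SELECT FOUND_ROWS();
```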
Selecting Records by condition
Where keyword
Often, it is not all of the records in a table that are of interest, but only those that satisfy one or more conditions. Such conditionals are introduced with WHERE.
In our first example, we wish to display the records of only those employees who belong to the Software Development department:
select * from employee_details where dept='Software Development';
Like keyword
In our second example, we employ the operator LIKE to compare character strings. The query determines all employees whose names begin with 'R'. With the operator LIKE, the character % serves as a placeholder for an arbitrary character string.
select * from employee_details where name like 'R%';
HAVING keyword
Instead of formulating conditionals with WHERE, you could instead use HAVING. The WHERE conditionals are executed first, while HAVING conditionals are used only on intermediate results (returned by WHERE). The advantage of HAVING is that the conditions can also be applied to calculated fields (for example, to SUM(columnXy) in a GROUP BY query). An example appears in the section after next.
select * from employee_details having net_salary>8000;
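As a hedged sketch of HAVING applied to a calculated field (assuming the employee_details table has dept and net_salary columns, as in the earlier examples):

```sql
-- WHERE cannot filter on SUM(net_salary); HAVING operates on the
-- aggregated intermediate result produced by GROUP BY.
SELECT dept, SUM(net_salary) AS total_salary
FROM employee_details
GROUP BY dept
HAVING total_salary > 50000;
```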
Advanced Queries
Using JOIN
Up to now, all of our examples with SELECT have been applied to the search for records from a single table. However, with relational databases we are usually dealing with a number of related tables. Therefore, we are usually interested in applying a SELECT command to combine data from several tables. For this we will use the JOIN syntax.
The Full Join
If a SELECT statement names multiple tables in the FROM clause with the names separated by commas, MySQL performs a full join.
For example, if you join table1 and table2 as follows, each row in table1 is combined with each row in table2:
select table1.* ,table2.* from table1, table2;
A full join is also called a cross join because each row of each table is crossed with each row in every other table to produce all possible combinations. This is also known as the cartesian product. Joining tables this way has the potential to produce a very large number of rows because the possible row count is the product of the number of rows in each table. A full join between three tables that contain 100, 200, and 300 rows, respectively, could return 100x200x300 = 6 million rows. That's a lot of rows, even though the individual tables are small. In cases like this, a WHERE clause will normally be used to reduce the result set to a more manageable size.
Consider the following query, in which a WHERE clause reduces the combinations to matching rows (the id columns here are illustrative):
select table1.*, table2.* from table1, table2 where table1.id = table2.id;
The JOIN and CROSS JOIN join types are equivalent to the ',' (comma) join operator.
For example, assuming a join condition on illustrative id columns, the following statements are all the same:
SELECT table1.*, table2.* FROM table1, table2 WHERE table1.id = table2.id;
SELECT table1.*, table2.* FROM table1 JOIN table2 WHERE table1.id = table2.id;
SELECT table1.*, table2.* FROM table1 CROSS JOIN table2 WHERE table1.id = table2.id;
Normally, the MySQL optimizer considers itself free to determine the order in which to scan tables to retrieve rows most quickly. On occasion, the optimizer will make a non-optimal choice. If you find this happening, you can override the optimizer's choice using the STRAIGHT_JOIN keyword. A join performed with STRAIGHT_JOIN is like a cross join but forces the tables to be joined in the order named in the FROM clause.
STRAIGHT_JOIN can be specified at two points in a SELECT statement. You can specify it between the SELECT keyword and the selection list to have a global effect on all cross joins in the statement, or you can specify it in the FROM clause.
SELECT STRAIGHT_JOIN ... FROM table1, table2;
SELECT ... FROM table1 STRAIGHT_JOIN table2;
Left and Right Joins
An equi-join shows only rows where a match can be found in both tables. Left and right joins show matches, too, but also show rows in one table that have no match in the other table. The examples in this section use LEFT JOIN, which identifies rows in the left table that are not matched by the right table. RIGHT JOIN is the same except that the roles of the tables are reversed. (RIGHT JOIN is available only as of MySQL 3.23.25.)
A LEFT JOIN works like this:
You specify the columns to be used for matching rows in the two tables. When a row from the left table matches a row from the right table, the contents of the rows are selected as an output row. When a row in the left table has no match, it is still selected for output, but joined with a "fake" row from the right table in which all the columns have been set to NULL. In other words, a LEFT JOIN forces the result set to contain a row for every row in the left table whether or not there is a match for it in the right table. The rows with no match can be identified by the fact that all columns from the right table are NULL.
SELECT table1.*, table2.* FROM table1 LEFT JOIN table2 ON table1.id = table2.id;
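Because unmatched rows come back with all right-table columns set to NULL, a LEFT JOIN is also a convenient way to find rows in one table that have no counterpart in the other. A sketch, using illustrative id columns:

```sql
-- Rows in table1 with no matching row in table2
SELECT table1.*
FROM table1 LEFT JOIN table2 ON table1.id = table2.id
WHERE table2.id IS NULL;
```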
Performing Transactions
A transaction is a set of SQL statements that are executed as a unit without interruption. One use for transactions is to make sure that the records involved in an operation are not modified by other clients while you're working with them.
MySQL automatically performs locking for single SQL statements to keep clients from interfering with each other. (For example, two clients cannot update the same record in a table simultaneously.) But automatic single-statement locking is not always sufficient to guarantee that a database operation achieves its intended result, because some operations are performed over the course of several statements.
A transaction groups statements into a single execution unit to prevent concurrency problems that might otherwise occur in a multiple-client environment.
Transaction support also includes commit and rollback capabilities, which allow you to require that the statements execute as a unit or not at all. That is, if the transaction succeeds, you know that all the statements within it executed successfully.
If any part of the transaction fails, any statements executed up to that point within it are undone, leaving the database in the state it was in prior to the point at which the transaction began.
Transactional systems typically are characterized as providing ACID properties. ACID is an acronym for Atomic, Consistent, Isolated, and Durable, referring to four properties that transactions should have:
. Atomicity. The statements a transaction consists of form a logical unit. You can't have just some of them execute.
. Consistency. The database is consistent before and after the transaction executes. In other words, the transaction doesn't make a mess of your database.
. Isolation. One transaction has no effect on another.
. Durability. When a transaction executes successfully to completion, its effects are recorded permanently in the database.
Some of MySQL's table types are non-transactional (ISAM, MyISAM, and HEAP), and some are transactional (BDB and InnoDB). This section describes the types of problems that can occur if you don't pay attention to transactional issues, as well as how to address them using both non-transactional and transactional approaches.
Why Are Transactions Useful?
The following example illustrates how concurrency problems can occur when multiple clients attempt to make changes to a database using operations that each require several statements. Suppose you're in the footwear sales business and your cash register software automatically updates your inventory levels whenever one of your salesmen processes a sale.
The sequence of events shown here outlines the operations that take place when multiple sales occur. For the example, assume that the initial Shoes inventory level is 47.
1. Salesman A sells three pairs of shoes and registers the sale. The register software begins to update the database by selecting the current shoes count (47):
2. SELECT quantity FROM inventory WHERE item = 'shoes_M';
3. In the meantime, Salesman B has sold two pairs of shoes and registered the sale. The software at the second register also begins to update the database:
4. SELECT quantity FROM inventory WHERE item = 'shoes_M';
5. The first register computes the new inventory level to be 47-3 = 44 and updates the shoes count accordingly:
6. UPDATE inventory SET quantity = 44 WHERE item = 'shoes_M';
7. The second register computes the new inventory level to be 47-2 = 45 and updates the count:
8. UPDATE inventory SET quantity = 45 WHERE item = 'shoes_M';
At the end of this sequence of events, you've sold five pairs of shoes (that's good), but the inventory level says 45 (that's bad, because it should be 42). The problem is that if you look up the inventory level in one statement and update the value in another statement, you have a multiple-statement operation.
The action taken in the second statement is dependent on the value retrieved in the first. If separate multiple-statement operations occur during overlapping time frames, the statements from each operation intertwine and interfere with each other. To solve this problem, it's necessary that the statements for a given operation execute without interference from other operations.
A transactional system ensures this by executing each salesman's statements as a unit. As a result, Salesman B's statements won't execute until those for Salesman A have completed.
Another issue that occurs in database processing with multiple-statement operations is that, unless handled properly, an error occurring partway through the operation can leave your database in a halfway-updated (inconsistent) state.
The typical example of this involves a financial transfer where money from one account is placed into another account. Suppose Ramu writes a check to Vishw for Rs. 1000.00 and Vishw cashes the check. Ramu's account should be decremented by Rs. 1000.00 and Vishw's account incremented by the same amount:
UPDATE account SET balance = balance - 1000 WHERE name = 'Ramu' AND ac_id = 111002232;
UPDATE account SET balance = balance + 1000 WHERE name = 'Vishw' AND ac_id = 111002256;
If a crash occurs between the two statements, the operation is incomplete. If transactional capabilities are not available to you, you have to figure out the state of ongoing operations at crash time by examining the update log manually to determine how to undo them or complete them.
The rollback capabilities of transaction support allow you to handle this situation properly by undoing the effect of the statements that executed before the error occurred. (You may still have to determine which transactions weren't entered and re-issue them, but at least you don't have to worry about half-transactions making your database inconsistent.)
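With transactions available, the transfer can be sketched as a single unit (same table and column names as in the example above; this assumes a transactional table type such as InnoDB, discussed later in this section):

```sql
BEGIN;
UPDATE account SET balance = balance - 1000 WHERE name = 'Ramu' AND ac_id = 111002232;
UPDATE account SET balance = balance + 1000 WHERE name = 'Vishw' AND ac_id = 111002256;
COMMIT;
-- If either UPDATE fails, issue ROLLBACK instead of COMMIT and
-- both accounts are left untouched.
```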
Non-Transactional Approaches to Transactional Problems
In a non-transactional environment, some transactional issues can be dealt with and some cannot. The following discussion covers what you can and cannot achieve without using transactions.
Consider once again the shoes inventory scenario described earlier. To deal with the concurrency issues inherent in that situation, you can take a couple of approaches:
. Lock the tables explicitly. You can group statements and execute them as a unit by surrounding them with LOCK TABLES and UNLOCK TABLES statements. Lock all the tables that you need to use, issue your queries, and release the locks.
This prevents anyone else from changing the tables while you have them locked. Using table locking, the inventory update scenario might be handled as follows:
1. Salesman A sells three pairs of shoes and registers the sale. The register software begins the inventory process by acquiring a table lock and retrieving the current shoes count (47):
2. LOCK TABLES inventory WRITE;
3. SELECT quantity FROM inventory WHERE item = 'shoes';
A WRITE lock is necessary here because the ultimate goal of the operation is to modify the inventory table, which involves writing to it.
4. In the meantime, Salesman B has sold two pairs of shoes and registered the sale. The software at the second register also begins to update the database by acquiring a lock:
LOCK TABLES inventory WRITE;
In this case, this statement will block because Salesman A already holds a lock on the table.
5. The first register computes the new inventory level to be 47-3 = 44, updates the shoes count, and releases the lock:
UPDATE inventory SET quantity = 44 WHERE item = 'shoes';
UNLOCK TABLES;
6. When the first register releases the lock, the second register's lock request succeeds, and it can proceed to retrieve the current shoes count (44):
7. SELECT quantity FROM inventory WHERE item = 'shoes';
8. The second register computes the new inventory level to be 44-2 = 42, updates the shoes count, and releases the lock:
UPDATE inventory SET quantity = 42 WHERE item = 'shoes';
UNLOCK TABLES;
Now the statements from the two operations don't get mixed up and the inventory level is set properly.
If you're using multiple tables, you must lock all of them before you execute the grouped queries. If you only read from a particular table, however, you need only a read lock on it, not a write lock.
(This lets other clients read the tables while you're using them, but prevents clients from writing to them.) Suppose you have a set of queries in which you want to make some changes to the inventory table, and you also need to read some data from a customer table. In this case, you need a write lock on the inventory table and a read lock on the customer table:
LOCK TABLES inventory WRITE, customer READ;
... use the tables here ...
UNLOCK TABLES;
. Use relative updates, not absolute updates. For the inventory updating method that uses explicit table locking, the operation involves looking up the current inventory level with one statement, computing the new value based on the number of pairs of shoes sold, and then updating the level to the new value with another statement.
Another way to keep operations performed by multiple clients from interfering with each other is to reduce each operation to a single statement. This eliminates inter-statement dependencies that arise in multiple-statement operations.
Not every operation can be handled by a single statement, but for the inventory update scenario, this strategy works well. It's possible to perform each inventory update in one step simply by modifying the shoes count relative to its current value:
1. Salesman A sells three pairs of shoes, and the register software decrements the shoes count by three:
2. UPDATE inventory SET quantity = quantity - 3 WHERE item = 'shoes';
3. Salesman B sells two pairs of shoes, and the register software decrements the shoes count by two:
4. UPDATE inventory SET quantity = quantity - 2 WHERE item = 'shoes';
With this method, each modification to the database no longer requires multiple statements and thus eliminates concurrency issues. This means there is no need to use explicit table locks. If an operation you want to perform is similar to this, there may be no need for transactions at all.
These non-transactional approaches can be applied successfully to many types of problems, but they have certain limitations:
. Not every operation can be written in terms of relative updates. Sometimes you must use multiple statements, in which case concurrency issues have to be considered and dealt with.
. You may be able to keep clients from interfering with each other by locking tables for the duration of a multiple-statement operation, but what happens if an error occurs in the middle of the operation? In this case, you'd want the effects of the earlier statements to be undone so that the database isn't left in a half-modified and inconsistent state. Unfortunately, although table locking can help you address concurrency issues, it provides no assistance in recovering from errors.
. The locking strategy requires you to lock and unlock your tables yourself. If you revise the operation to be performed in such a way that the set of tables affected changes, you must remember to modify the LOCK TABLES statement accordingly. A database system with transaction support would determine which locks are necessary and acquire them automatically.
Transactional capabilities help you deal with all these issues. A transaction handler executes a set of statements as a unit and manages concurrency issues by preventing clients from getting in the way of each other.
It also allows rollback in the case of failure to keep half-executed operations from damaging your database, and it automatically acquires any locks that are necessary.
Using Transactions to Ensure Safe Statement Execution
To use transactions, you must use a transactional table type. The ISAM, MyISAM, and HEAP table types will not work; you must use either BDB or InnoDB tables. The BDB and InnoDB handlers first appeared in binary distributions in MySQL 3.23.17 and 3.23.29, respectively, and were added to source distributions as of MySQL 3.23.34. However, it's best to use more recent distributions if possible, to take advantage of the improvements that have been made since then.
By default, MySQL runs in auto-commit mode, which means that changes made by individual statements are committed to the database immediately to make them permanent. In effect, each statement is its own transaction. To perform transactions explicitly, disable auto-commit mode and then tell MySQL when to commit or roll back changes.
One way to perform a transaction is to issue a BEGIN statement to disable auto-commit mode, execute the statements that make up the transaction, and end the transaction with a COMMIT statement to make the changes permanent. If an error occurs during the transaction, cancel it by issuing a ROLLBACK statement instead to undo the changes. BEGIN suspends the current auto-commit mode, so after the transaction has been committed or rolled back, the mode reverts to its state prior to the BEGIN. (If auto-commit was enabled beforehand, ending the transaction puts you back in auto-commit mode. If it was disabled, ending the current transaction causes you to begin the next one.)
The following example illustrates this approach. First, create a table to use:
mysql> CREATE TABLE t (name CHAR(20), UNIQUE (name)) TYPE = INNODB;
The statement creates an InnoDB table, but you can use BDB if you like. Next, initiate a transaction with BEGIN, add a couple of rows to the table, commit the transaction, and see what the table looks like:
mysql> BEGIN;
mysql> INSERT INTO t SET name = 'William';
mysql> INSERT INTO t SET name = 'Wallace';
mysql> COMMIT;
mysql> SELECT * FROM t;
| name    |
| Wallace |
| William |
You can see that the rows have been recorded in the table. If you had started up another instance of mysql and selected the contents of t after the inserts but before the commit, the rows would not show up. They would not become visible to the other mysql process until the COMMIT statement had been issued by the first process.
If an error occurs during a transaction, you can cancel it with ROLLBACK. Using the t table again, you can see this by issuing the following statements:
mysql> BEGIN;
mysql> INSERT INTO t SET name = 'Gromit';
mysql> INSERT INTO t SET name = 'Wallace'; ERROR 1062: Duplicate entry 'Wallace' for key 1
mysql> ROLLBACK;
mysql> SELECT * FROM t;
| name    |
| Wallace |
| William |
The second INSERT attempts to place a row into the table that duplicates an existing name value. The statement fails because name has a UNIQUE index. After issuing the ROLLBACK, the table has only the two rows that it contained prior to the failed transaction. In particular, the INSERT that was performed just prior to the point of the error has been undone and its effect is not recorded in the table.
Issuing a BEGIN statement while a transaction is in process commits the current transaction implicitly before beginning a new one.
Another way to perform transactions is to manipulate the auto-commit mode directly using SET statements:
SET AUTOCOMMIT = 0;
SET AUTOCOMMIT = 1;
Setting AUTOCOMMIT to zero disables auto-commit mode. The effects of any following statements become part of the current transaction, which you end by issuing a COMMIT or ROLLBACK statement to commit or cancel it. With this method, auto-commit mode remains off until you turn it back on, so ending one transaction also begins the next one. You can also commit a transaction by re-enabling auto-commit mode.
To see how this approach works, begin with the same table as for the previous examples:
mysql> DROP TABLE t;
mysql> CREATE TABLE t (name CHAR(20), UNIQUE (name)) TYPE = INNODB;
Then disable auto-commit mode, insert some records, and commit the transaction:
mysql> SET AUTOCOMMIT = 0;
mysql> INSERT INTO t SET name = 'William';
mysql> INSERT INTO t SET name = 'Wallace';
mysql> COMMIT;
mysql> SELECT * FROM t;
| name    |
| Wallace |
| William |
At this point, the two records have been committed to the table, but auto-commit mode remains disabled. If you issue further statements, they become part of a new transaction, which may be committed or rolled back independently of the first transaction. To verify that auto-commit is still off and that ROLLBACK will cancel uncommitted statements, issue the following queries:
mysql> INSERT INTO t SET name = 'Gromit';
mysql> INSERT INTO t SET name = 'Wallace'; ERROR 1062: Duplicate entry 'Wallace' for key 1
mysql> ROLLBACK;
mysql> SELECT * FROM t;
| name    |
| Wallace |
| William |
To restore auto-commit mode, use the following statement:
SET AUTOCOMMIT = 1;
Transactions also end under the following circumstances:
. In addition to statements like SET AUTOCOMMIT, BEGIN, COMMIT, and ROLLBACK that affect transactions explicitly, certain other statements do so implicitly because they cannot be part of a transaction. If you issue any of these while a transaction is in progress, the server commits the transaction first before executing the statement. Statements that cause an implicit commit include ALTER TABLE, CREATE INDEX, DROP DATABASE, DROP TABLE, RENAME TABLE, TRUNCATE TABLE, LOCK TABLES, and UNLOCK TABLES (if tables currently are locked).
. If the client connection ends or is broken during a transaction before a commit occurs, the server rolls back the transaction automatically.
Transactions are useful in all kinds of situations. For example, suppose you're working with the score table that is part of the grade-keeping project and you discover that the grades for two students have gotten mixed up and need to be switched. The grades as entered incorrectly are as follows:
mysql> SELECT * FROM score WHERE event_id = 5 AND student_id IN (8,9);
| student_id | event_id | score |
|          8 |        5 |    18 |
|          9 |        5 |    13 |
To fix this, student 8 should be given a score of 13 and student 9 a score of 18. That can be done easily with two statements:
UPDATE score SET score = 13 WHERE event_id = 5 AND student_id = 8;
UPDATE score SET score = 18 WHERE event_id = 5 AND student_id = 9;
However, it's necessary to ensure that both statements succeed as a unit, a problem to which transactional methods can be applied. To use BEGIN, do the following:
mysql> BEGIN;
mysql> UPDATE score SET score = 13 WHERE event_id = 5 AND student_id = 8;
mysql> UPDATE score SET score = 18 WHERE event_id = 5 AND student_id = 9;
mysql> COMMIT;
To accomplish the same thing by manipulating the auto-commit mode explicitly instead, do this:
mysql> SET AUTOCOMMIT = 0;
mysql> UPDATE score SET score = 13 WHERE event_id = 5 AND student_id = 8;
mysql> UPDATE score SET score = 18 WHERE event_id = 5 AND student_id = 9;
mysql> COMMIT;
mysql> SET AUTOCOMMIT = 1;
Either way, the result is that the scores are swapped properly:
mysql> SELECT * FROM score WHERE event_id = 5 AND student_id IN (8,9);
| student_id | event_id | score |
|          8 |        5 |    13 |
|          9 |        5 |    18 |
Using FULLTEXT Searches
Versions of MySQL from 3.23.23 on include the capability for performing full text searches. The full text search engine allows you to look for words or phrases without using pattern-matching operations. This capability is enabled for a given table by creating a special kind of index and has the following characteristics:
. Full text searches are based on FULLTEXT indexes, which may be created only for MyISAM tables, and only for TEXT columns and non-BINARY CHAR and VARCHAR columns.
. FULLTEXT searches are not case sensitive. This follows as a consequence of the column types for which FULLTEXT indexes may be used.
. Common words are ignored for FULLTEXT searches, where "common" means "present in at least half the records." It's especially important to remember this when you're setting up a test table to experiment with the FULLTEXT capability. (Be sure to insert at least three records into your test table. If the table has just one or two records, every word in it will occur at least 50 percent of the time and you'll never get any results!) Certain very common words, such as "the," "after," and "other," are stop words that are always ignored. Words that are too short also are ignored. By default, "too short" is defined as less than four characters, but with a recent enough server may be set lower.
. Words are defined as sequences of characters that include letters, digits, apostrophes, and underscores. This means that a string like "full-blooded" is considered to contain two words, "full" and "blooded." Normally, a full text search matches whole words, not partial words, and the FULLTEXT engine considers a record to match a search string if it includes any of the words in the search string.
A variant form of search called a boolean full text search allows you to impose the additional constraint that all the words must be present (either in any order, or, to perform a phrase search, in exactly the order listed in the search string). With a boolean search, it's also possible to match records that do not include certain words or to add a wildcard modifier to match all words that begin with a given prefix.
. A FULLTEXT index can be created for a single column or multiple columns. If it spans multiple columns, searches based on the index look through all the columns simultaneously. The flip side of this is that when you perform a search, you must specify a column list that corresponds exactly to the set of columns that matches some FULLTEXT index. For example, if you want to search col1 sometimes, col2 sometimes, and both col1 and col2 sometimes, you should have three indexes: one for each of the columns separately, and one that includes both columns.
The following table shows the versions at which FULLTEXT features were introduced (the version numbers follow those given in the surrounding text):

Feature                     Version
Basic FULLTEXT searching    3.23.23
Configurable parameters     4.0
Boolean searches            4.0.1
Phrase searches             4.0.2
The following examples show how to use full text searching by creating FULLTEXT indexes and then performing queries on them using the MATCH operator.
A FULLTEXT index is created the same way as other indexes. That is, you can define it with CREATE TABLE when creating the table initially or add it afterward with ALTER TABLE or CREATE INDEX. Because FULLTEXT indexes require you to use MyISAM tables, you can take advantage of one of the properties of the MyISAM handler if you're creating a new table to use for FULLTEXT searches: Table loading proceeds more quickly if you populate the table and then add the indexes afterward rather than loading data into an already indexed table.
Suppose you have a data file named sayings.txt containing famous sayings and the people to whom they're attributed:
Aeschylus -Time as he grows old teaches many lessons
Alexander Graham Bell- Mr. Watson, come here. I want you!
Benjamin Franklin- It is hard for an empty bag to stand upright
Benjamin Franklin- Little strokes fell great oaks
Benjamin Franklin- Remember that time is money
Napoleon Bonaparte- Impossible is a word found in a fool's dictionary
Captain Corelli's Mandolin- Love itself is what is left over when being in love has burned away, and this is both an art and a fortunate accident. Your mother and I had it, we had roots that grew towards each other underground, and when all the pretty blossom had fallen from our branches we found that we were one tree and not two.
Miguel de Cervantes- Bell, book, and candle
Proverbs 15:1- A soft answer turneth away wrath
Theodore Roosevelt- Speak softly and carry a big stick
William Shakespeare- But, soft! what light through yonder window
If you want to search by phrase and attribution separately or together, you need to index the columns separately and also create an index that includes both columns. You can create, populate, and index a table named quotes as follows:
CREATE TABLE quotes (attribution VARCHAR(40), phrase TEXT);
LOAD DATA LOCAL INFILE 'C:\\sayings.txt' INTO TABLE quotes;
ALTER TABLE quotes
ADD FULLTEXT (phrase),
ADD FULLTEXT (attribution),
ADD FULLTEXT (phrase, attribution);
After setting up the table, perform searches on it using MATCH to name the column or columns to search and AGAINST() to specify the search string. For example:
mysql> SELECT * FROM quotes WHERE MATCH(attribution) AGAINST('roosevelt');
| attribution        | phrase                             |
| Theodore Roosevelt | Speak softly and carry a big stick |
mysql> SELECT * FROM quotes WHERE MATCH(phrase) AGAINST('time');
| attribution       | phrase                                    |
| Benjamin Franklin | Remember that time is money               |
| Aeschylus         | Time as he grows old teaches many lessons |
mysql> SELECT * FROM quotes WHERE MATCH(attribution,phrase)
    -> AGAINST('bell');
| attribution           | phrase                             |
| Alexander Graham Bell | Mr. Watson, come here. I want you! |
| Miguel de Cervantes   | Bell, book, and candle             |
In the last example, note how the query finds records that contain the search word in different columns, which demonstrates the FULLTEXT capability of searching multiple columns at once. Also note that the order of the columns as named in the query is attribution, phrase.
That differs from the order in which they were named when the index was created (phrase, attribution), which illustrates that order does not matter. What matters is that there must be some FULLTEXT index that consists of exactly the columns named.
If you just want to see how many records a search matches, use COUNT(*):
mysql> SELECT COUNT(*) FROM quotes WHERE MATCH(phrase) AGAINST('time');
| COUNT(*) |
|        2 |
By default, output rows for FULLTEXT searches are ordered by decreasing relevance when you use a MATCH expression in the WHERE clause. Relevance values are non-negative floating point values, with zero indicating "no relevance." To see these values, use a MATCH expression in the column output list:
mysql> SELECT phrase, MATCH(phrase) AGAINST('time') AS relevance
    -> FROM quotes;
(The output lists every phrase along with its relevance value; rows that do not match have a relevance of 0.)
By default, a search finds records that contain any of the search words, so a query like the following will return records with either "hard" or "soft":
mysql> SELECT * FROM quotes WHERE MATCH(phrase)
    -> AGAINST('hard soft');
| attribution    | phrase              |
+----------------+---------------------+
| B. Franklin    | It is hard. ...     |
| Proverbs 15:1  | A soft answer ...   |
| W. Shakespeare | But, soft! what ... |
Greater control over multiple-word matching can be obtained as of MySQL 4.0.1, when support was added for boolean mode FULLTEXT searches. This type of search is performed by adding IN BOOLEAN MODE after the search string in the AGAINST() function. Boolean searches have the following characteristics:
. The 50% rule is ignored; searches will find words even if they occur in more than half of the records.
. Results are not sorted by relevance.
. Modifiers can be applied to words in the search string. A leading plus or minus sign requires a word to be present or not present in matching records. A search string of 'bell' matches records that contain "bell," but a search string of '+bell -candle' in boolean mode matches only records that contain "bell" and do not contain "candle."
mysql> SELECT * FROM quotes
    -> WHERE MATCH(attribution,phrase)
    -> AGAINST('bell');
| attribution           | phrase                             |
| Alexander Graham Bell | Mr. Watson, come here. I want you! |
| Miguel de Cervantes   | Bell, book, and candle             |
mysql> SELECT * FROM quotes
    -> WHERE MATCH(attribution,phrase)
    -> AGAINST('+bell -candle' IN BOOLEAN MODE);
| attribution           | phrase                             |
| Alexander Graham Bell | Mr. Watson, come here. I want you! |
A trailing asterisk acts as a wildcard so that any record containing words beginning with the search word match. For example, 'soft*' matches "soft," "softly," "softness," and so on:
mysql> SELECT * FROM quotes WHERE MATCH(phrase)
    -> AGAINST('soft*' IN BOOLEAN MODE);
| attribution         | phrase                    |
| Proverbs 15:1       | A soft answer . . .       |
| W. Shakespeare      |  But, soft! what light .  |
| Theodore Roose      |Speak softly and carry     |
However, the wildcard feature cannot be used to match words shorter than the minimum index word length.
. Stop words are ignored just as for non-boolean searches, even if marked as required. A search for '+Alexander +the +great' will find records containing "Alexander" and "great," but will ignore the stop word "the."
. A phrase search can be performed to require all words to be present in a particular order. Phrase searching requires MySQL 4.0.2. Enclose the search string in double quotes and include punctuation and whitespace as present in the phrase you want to match. In other words, you must specify the exact phrase:
mysql> SELECT * FROM quotes
    -> WHERE MATCH(attribution,phrase)
    -> AGAINST('"bell book and candle"' IN BOOLEAN MODE);
Empty set (0.00 sec)

mysql> SELECT * FROM quotes
    -> WHERE MATCH(attribution,phrase)
    -> AGAINST('"bell, book, and candle"' IN BOOLEAN MODE);
+---------------------+------------------------+
| attribution         | phrase                 |
+---------------------+------------------------+
| Miguel de Cervantes | Bell, book, and candle |
+---------------------+------------------------+
. It's possible to perform a boolean mode full text search on columns that are not part of a FULLTEXT index, although this will be much slower than using indexed columns.
Prior to MySQL 4, FULLTEXT search parameters can be modified only by making changes to the source code and recompiling the server. MySQL 4 provides several configurable parameters that can be modified by setting server variables.
The two that are of most interest are ft_min_word_len and ft_max_word_len, which determine the shortest and longest words that will be indexed. The default values are 4 and 254; words with lengths outside that range are ignored when FULLTEXT indexes are built.
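To see what values a server is using, you can query its variables. This is a quick check, not a configuration change; the pattern matches any variable whose name begins with ft:

```sql
-- Show the current full-text parameters; ft_min_word_len and
-- ft_max_word_len control which word lengths get indexed.
SHOW VARIABLES LIKE 'ft%';
```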
Suppose you want to change the minimum word length from 4 to 3. Do so like this:
1. Start the server with the ft_min_word_len variable set to 3. To ensure that this happens whenever the server starts, it's best to place the setting in an option file such as /etc/my.cnf:

   [mysqld]
   set-variable = ft_min_word_len=3

2. For any existing tables that already have FULLTEXT indexes, you must rebuild those indexes. You can drop and add the indexes, but it's easier to rebuild them in place with a REPAIR TABLE statement.
3. Any new FULLTEXT indexes that you create after changing the parameter will use the new value automatically.
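The index rebuild mentioned above can be done with REPAIR TABLE, which rebuilds a table's indexes using the server's current full-text settings (the table name here is just an example):

```sql
-- Rebuild the indexes of a table that has a FULLTEXT index;
-- QUICK repairs only the index file, leaving the data file alone.
REPAIR TABLE quotes QUICK;
```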
Query Optimization
Advantages & Disadvantages of Indexing
Advantages of Indexing
Let's consider how an index works by beginning with a table that has no indexes. An unindexed table is simply an unordered collection of rows.
Unindexed ad table
The "Indexed Table" figure below shows the same table but with the addition of an index on the company_num column. The index contains an entry for each row in the ad table, but the index entries are sorted by company_num value. Now, instead of searching through the table row by row looking for items that match, we can use the index. Suppose we're looking for all rows for company 13.
We begin scanning the index and find three rows for that company. Then we reach the row for company 14, a value higher than the one we're looking for. Index values are sorted, so when we read the record containing 14, we know we won't find any more matches and can quit looking. Thus, one efficiency gained by using the index is that we can tell where the matching rows end and can skip the rest.
Another efficiency is that there are positioning algorithms for finding the first matching entry without doing a linear scan from the start of the index (for example, a binary search is much quicker than a scan).
That way, we can quickly position to the first matching value and save a lot of time in the search. Databases use various techniques for positioning to index values quickly, but it's not so important here what those techniques are. What's important is that they work and that indexing is a good thing.
Indexed Table
You may be asking why we don't just sort the data file and dispense with the index file. Wouldn't that produce the same type of improvement in search speed? Yes, it would--if the table had a single index. But you might want to add a second index, and you can't sort the data file two different ways at once. (For example, you might want one index on customer names and another on customer ID numbers or phone numbers.)
Using indexes as entities separate from the data file solves the problem and allows multiple indexes to be created. In addition, rows in the index are generally shorter than data rows. When you insert or delete new values, it's easier to move around shorter index values to maintain the sort order than to move around the longer data rows.
The example just described corresponds in general to the way MySQL indexes tables, although the particular details vary for different table types. For example, for a MyISAM or ISAM table, the table's data rows are kept in a data file, and index values are kept in an index file. You can have more than one index on a table; if you do, they're all stored in the same index file.
Each index in the index file consists of a sorted array of key records that are used for fast access into the data file. By contrast, the BDB and InnoDB table handlers do not separate data rows and index values in the same way, although both maintain indexes as sets of sorted values.
The BDB handler uses a single file per table to store both data and index values, and the InnoDB handler uses a single tablespace within which it manages data and index storage for all InnoDB tables.
The preceding discussion describes the benefit of an index in the context of single-table queries, where the use of an index speeds searches significantly by eliminating the need for full table scans.
However, indexes are even more valuable when you're running queries involving joins on multiple tables. In a single-table query, the number of values you need to examine per column is the number of rows in the table. In a multiple-table query, the number of possible combinations skyrockets because it's the product of the number of rows in the tables.
Suppose you have three unindexed tables, t1, t2, and t3, each containing a column c1, c2, and c3, respectively, and each consisting of 1000 rows that contain the numbers 1 through 1000. A query to find all combinations of table rows in which the values are equal looks like this:
SELECT t1.c1, t2.c2, t3.c3
FROM t1, t2, t3
WHERE t1.c1 = t2.c2 AND t1.c1 = t3.c3;
The result of this query should be 1000 rows, each containing three equal values. If we process the query in the absence of indexes, we have no idea which rows contain which values. Consequently, we must try all combinations to find the ones that match the WHERE clause.
The number of possible combinations is 1000x1000x1000 (1 billion!), which is a million times more than the number of matches. That's a lot of wasted effort, and this query is likely to be very slow, even for a database such as MySQL that is very fast. And that is with only 1000 rows per table.
What happens when you have tables with millions of rows? As tables grow, the time to process joins on those tables grows even more if no indexes are used, leading to very poor performance. If we index each table, we can speed things up considerably because indexing allows the query to be processed as follows:
. Select the first row from table t1 and see what value the row contains.
. Using the index on table t2, go directly to the row that matches the value from t1. Similarly, using the index on table t3, go directly to the row that matches the value from t1.
. Proceed to the next row of table t1 and repeat the preceding procedure until all rows in t1 have been examined.
In this case, we're still performing a full scan of table t1, but we're able to do indexed lookups on t2 and t3 to pull out rows from those tables directly. The query runs about a million times faster this way, literally.
(This example is contrived for the purpose of making a point, of course. Nevertheless, the problems it illustrates are real, and adding indexes to tables that have none often results in dramatic performance gains.)
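To index the three tables in the example above, statements like the following would do (the index names are arbitrary):

```sql
-- One index per joined column; the join can then probe t2 and t3
-- directly instead of scanning them for every row of t1.
CREATE INDEX i1 ON t1 (c1);
CREATE INDEX i2 ON t2 (c2);
CREATE INDEX i3 ON t3 (c3);
```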
MySQL uses indexes as just described to speed up searches for rows matching terms of a WHERE clause or rows that match rows in other tables when performing joins. It also uses indexes to improve the performance of other types of operations:
. For queries that use the MIN() or MAX() functions, the smallest or largest value in a column can be found quickly without examining every row if the column is indexed.
. MySQL can often use indexes to perform sorting and grouping operations quickly for ORDER BY and GROUP BY clauses.
. Sometimes MySQL can use an index to avoid reading data rows entirely. Suppose you're selecting values from an indexed numeric column in a MyISAM table and you're not selecting other columns from the table. In this case, the values you need are already present in the index, so MySQL can read them from the index file without touching the data file at all.
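As a sketch of that last point (the table, column, and index are hypothetical):

```sql
-- Given INDEX(num_col) on a MyISAM table, this query can be
-- satisfied from the index alone; the data rows are never read.
SELECT num_col FROM mytbl WHERE num_col BETWEEN 10 AND 20;
```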
Disadvantages of Indexing
In general, if MySQL can figure out how to use an index to process a query more quickly, it will. This means that, for the most part, if you don't index your tables, you're hurting yourself. You can see that I'm painting a rosy picture of the benefits of indexing. Are there disadvantages? Yes, there are. In practice, these drawbacks tend to be outweighed by the advantages, but you should know what they are.
First, an index takes up disk space, and multiple indexes take up correspondingly more space. This may cause you to reach a table size limit more quickly than if there are no indexes:
. For ISAM and MyISAM tables, indexing a table heavily may cause the index file to reach its maximum size more quickly than the data file.
. For BDB tables, which store data and index values together in the same file, adding indexes will certainly cause the table to reach the maximum file size more quickly.
. InnoDB tables all share space within the InnoDB tablespace. Adding indexes depletes storage within the tablespace more quickly. However, as long as you have additional disk space, you can expand the tablespace by adding new components to it. (Unlike files used for ISAM, MyISAM, and BDB tables, the InnoDB tablespace is not bound by your operating system's file size limit, because it can comprise multiple files.)
Second, indexes speed up retrievals but slow down inserts and deletes, as well as updates of values in indexed columns. That is, indexes slow down most operations that involve writing. This occurs because writing a record requires not only writing the data row but also updating every index on the table. The more indexes a table has, the more changes must be made, and the greater the average performance degradation.
Choosing Index Type
The syntax for creating indexes was covered earlier in the tutorial, "MySQL SQL Syntax and Use." I assume here that you've read that section. But knowing syntax doesn't in itself help you determine how your tables should be indexed.
That requires some thought about the way you use your tables. This section gives some guidelines on how to identify candidate columns for indexing and how best to set up indexes:
. Index columns that you use for searching, sorting, or grouping, not columns you display as output. In other words, the best candidate columns for indexing are the columns that appear in your WHERE clause, columns named in join clauses, or columns that appear in ORDER BY or GROUP BY clauses. Columns that appear only in the output column list following the SELECT keyword are not good candidates:
SELECT col_a                        -- not a candidate
FROM tbl1 LEFT JOIN tbl2
  ON tbl1.col_b = tbl2.col_c        -- candidates
WHERE col_d = expr;                 -- candidate
The columns that you display and the columns you use in the WHERE clause might be the same, of course. The point is that appearance of a column in the output column list is not in itself a good indicator that it should be indexed.
Columns that appear in join clauses or in expressions of the form col1 = col2 in WHERE clauses are especially good candidates for indexing. col_b and col_c in the query just shown are examples of this. If MySQL can optimize a query using joined columns, it cuts down the potential table-row combinations quite a bit by eliminating full table scans.
. Use unique indexes. Consider the spread of values in a column. Indexes work best for columns with unique values and most poorly with columns that have many duplicate values. For example, if a column contains many different age values, an index will differentiate rows readily.
An index probably will not help much for a column that is used to record sex and contains only the two values 'M' and 'F'. If the values occur about equally, you'll get about half of the rows whichever value you search for. Under these circumstances, the index may never be used at all because the query optimizer generally skips an index in favor of a full table scan if it determines that a value occurs in more than about 30 percent of a table's rows.
. Index short values.
If you're indexing a string column, specify a prefix length whenever it's reasonable to do so. For example, if you have a CHAR(200) column, don't index the entire column if most values are unique within the first 10 or 20 bytes. Indexing the first 10 or 20 bytes will save a lot of space in the index, and probably will make your queries faster as well.
A smaller index involves less disk I/O, and shorter values can be compared more quickly. More importantly, with shorter key values, blocks in the index cache hold more key values, so MySQL can hold more keys in memory at once. This improves the likelihood of locating rows without reading additional index blocks from disk.
(You want to use some common sense, of course. Indexing just the first character from a column isn't likely to be that helpful because there won't be very many distinct values in the index.)
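A prefix-index sketch (the table, column, and prefix length are hypothetical):

```sql
-- Index only the first 15 bytes of a long string column; the index
-- is much smaller, and lookups still narrow rows effectively if
-- most values differ within those bytes.
CREATE INDEX name_idx ON customers (last_name(15));
```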
. Take advantage of leftmost prefixes.
When you create an n-column composite index, you actually create n indexes that MySQL can use. A composite index serves as several indexes because any leftmost set of columns in the index can be used to match rows. Such a set is called a leftmost prefix.
Suppose you have a table with a composite index on columns named state, city, and zip. Rows in the index are sorted in state/city/zip order, so they're automatically sorted in state/city order and in state order as well.
This means that MySQL can take advantage of the index even if you specify only state values in a query or only state and city values. Thus, the index can be used to search the following combinations of columns:
state, city, zip
state, city
state
MySQL cannot use the index for searches that don't involve a leftmost prefix. For example, if you search by city or by zip, the index isn't used.
If you're searching for a given state and a particular Zip code (columns 1 and 3 of the index), the index can't be used for the combination of values, although MySQL can narrow the search using the index to find rows that match the state.
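The state/city/zip example might be set up and used like this (the table name is hypothetical):

```sql
CREATE INDEX loc_idx ON addresses (state, city, zip);

-- These WHERE clauses can use the index (leftmost prefixes):
SELECT * FROM addresses WHERE state = 'NY' AND city = 'Albany' AND zip = '12201';
SELECT * FROM addresses WHERE state = 'NY' AND city = 'Albany';
SELECT * FROM addresses WHERE state = 'NY';

-- These cannot (no leftmost prefix):
SELECT * FROM addresses WHERE city = 'Albany';
SELECT * FROM addresses WHERE zip = '12201';
```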
. Don't over-index.
Don't index everything in sight based on the assumption "the more, the better." That's a mistake. Every additional index takes extra disk space and hurts performance of write operations, as has already been mentioned.
Indexes must be updated and possibly reorganized when you modify the contents of your tables, and the more indexes you have, the longer this takes. If you have an index that is rarely or never used, you'll slow down table modifications unnecessarily. In addition, MySQL considers indexes when generating an execution plan for retrievals. Creating extra indexes creates more work for the query optimizer.
It's also possible (if unlikely) that MySQL will fail to choose the best index to use when you have too many indexes. Maintaining only the indexes you need helps the query optimizer avoid making such mistakes.
If you're thinking about adding an index to a table that is already indexed, consider whether the index you're thinking about adding is a leftmost prefix of an existing multiple-column index.
If so, don't bother adding the index because, in effect, you already have it. (For example, if you already have an index on state, city, and zip, there is no point in adding an index on state.)
. Consider the type of comparisons you perform on a column. Generally, indexes are used for <, <=, =, >=, >, and BETWEEN operations. Indexes are also used for LIKE operations when the pattern has a literal prefix.
If you use a column only for other kinds of operations, such as STRCMP(), there is no value in indexing it. For HEAP tables, indexes are hashed and are used only for equality comparisons. If you perform a range search (such as a < b) with a HEAP table, an index will not help.
. Use the slow-query log to identify queries that may be performing badly. This log can help you find queries that may benefit from indexing. Use the mysqldumpslow utility to view this log.
If a given query shows up over and over in the slow-query log, that's a clue that you've found a query that may not be written optimally. You may be able to rewrite it to make it run more quickly. Keep the following points in mind when assessing your slow-query log:
o "Slow" is measured in real time, so more queries will show up in the slow-query log on a heavily loaded server than on a lightly loaded one. You'll need to take this into account.
o If you use the --log-long-format option in addition to enabling slow-query logging, the log also will include queries that execute without using any index. These queries aren't necessarily slow. (No index may be needed for small tables, for example.)
MySQL Query Optimizer
When you issue a query that selects rows, MySQL analyzes it to see if any optimizations can be used to process the query more quickly. In this section, we'll look at how the query optimizer works. For additional information, consult the optimization chapter in the MySQL Reference Manual; it describes various optimization measures that MySQL takes.
The MySQL query optimizer takes advantage of indexes, of course, but it also uses other information. For example, if you issue the following query, MySQL will execute it very quickly, no matter how large the table is:
SELECT * FROM tbl_name WHERE 1 = 0;
In this case, MySQL looks at the WHERE clause, realizes that no rows can possibly satisfy the query, and doesn't even bother to search the table.
You can see this by issuing an EXPLAIN statement, which tells MySQL to display some information about how it would execute a SELECT query without actually executing it. To use EXPLAIN, just put the word EXPLAIN in front of the SELECT statement:
mysql> EXPLAIN SELECT * FROM tbl_name WHERE 1 = 0;
+------------------+
| Comment          |
+------------------+
| Impossible WHERE |
+------------------+
Normally, EXPLAIN returns more information than that, including information about the indexes that will be used to scan tables, the types of joins that will be used, and estimates of the number of rows that will need to be scanned from each table.
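For example, to see how the three-table join shown earlier would be processed, you could run the following; no rows are actually retrieved:

```sql
-- EXPLAIN reports, for each table, the join type, the candidate
-- keys, the key chosen, and an estimate of rows to examine.
EXPLAIN SELECT t1.c1, t2.c2, t3.c3
FROM t1, t2, t3
WHERE t1.c1 = t2.c2 AND t1.c1 = t3.c3;
```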
How the Optimizer Works
The MySQL query optimizer has several goals, but its primary aims are to use indexes whenever possible and to use the most restrictive index to eliminate as many rows as possible, as soon as possible. That last part may sound backward: after all, your goal in issuing a SELECT statement is to find rows, not to reject them.
The reason the optimizer works this way is that the faster it can eliminate rows from consideration, the more quickly the rows that do match your criteria can be found. Queries can be processed more quickly if the most restrictive tests can be done first. Suppose you have a query that tests two columns, each of which has an index on it:
SELECT col3 FROM mytable WHERE col1 = 'some value' AND col2 = 'some other value';
Suppose also that the test on col1 matches 900 rows, the test on col2 matches 300 rows, and that both tests succeed on 30 rows. Testing col1 first results in 900 rows that must be examined to find the 30 that also match the col2 value.
That's 870 failed tests. Testing col2 first results in 300 rows that must be examined to find the 30 that also match the col1 value. That's only 270 failed tests, so less computation and disk I/O is required. As a result, the optimizer will attempt to test col2 first.
You can help the optimizer take advantage of indexes by using the following guidelines.
Try to compare columns that have the same type. When you use indexed columns in comparisons, use columns that are of the same type. For example, CHAR(10) is considered the same as VARCHAR(10), but different from CHAR(12) or VARCHAR(12). INT is different from BIGINT.
Using columns of the same type is a requirement prior to MySQL 3.23, or indexes on the columns will not be used. From 3.23 on, this is not strictly necessary, but identical column types will still give you better performance than dissimilar types. If the columns you're comparing are of different types, you can use ALTER TABLE to modify one of them so that the types match.
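For example, to change a column's type so that it matches the column it's compared against (the table and column names are hypothetical):

```sql
-- Make t2.c2 the same type as t1.c1 so an index on either column
-- can be used in the join comparison.
ALTER TABLE t2 MODIFY c2 INT;
```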
Try to make indexed columns stand alone in comparison expressions. If you use a column in a function call or as part of a more complex term in an arithmetic expression, MySQL can't use the index because it must compute the value of the expression for every row. Sometimes this is unavoidable, but many times you can rewrite a query to get the indexed column to appear by itself.
The following WHERE clauses illustrate how this works. They are equivalent arithmetically, but quite different for optimization purposes. For the first line, the optimizer will simplify the expression 4/2 to the value 2 and then use an index on mycol to quickly find values less than 2. For the second expression, MySQL must retrieve the value of mycol for each row, multiply by 2, and then compare the result to 4.
In this case, no index can be used, because each value in the column must be retrieved so that the expression on the left side of the comparison can be evaluated:
WHERE mycol < 4 / 2
WHERE mycol * 2 < 4
Let's consider another example. Suppose you have an indexed column date_col. If you issue a query such as the following, the index isn't used:
SELECT * FROM mytbl WHERE YEAR(date_col) < 1990;
The expression doesn't compare an indexed column to 1990; it compares a value calculated from the column value, and that value must be computed for each row.
As a result, the index on date_col is not used because performing the query requires a full table scan. What's the fix? Just use a literal date, and the index on date_col can be used to find matching values in the columns:
WHERE date_col < '1990-01-01'
But suppose you don't have a specific date. You might be interested instead in finding records that have a date that lies within a certain number of days from today. There are several ways to express a comparison of this type-not all of which are equally good. Three possibilities are as follows:
WHERE TO_DAYS(date_col) - TO_DAYS(CURDATE()) < cutoff
WHERE TO_DAYS(date_col) < cutoff + TO_DAYS(CURDATE())
WHERE date_col < DATE_ADD(CURDATE(), INTERVAL cutoff DAY)
For the first line, no index is used because the column must be retrieved for each row so that the value of TO_DAYS(date_col) can be computed. The second line is better. Both cutoff and TO_DAYS(CURDATE()) are constants, so the right hand side of the comparison can be calculated by the optimizer once before processing the query, rather than once per row.
But the date_col column still appears in a function call, so the index isn't used. The third line is best of all. Again, the right side of the comparison can be computed once as a constant before executing the query, but now the value is a date. That value can be compared directly to date_col values, which no longer need to be converted to days. In this case, the index can be used.
Don't use wildcards at the beginning of a LIKE pattern. Sometimes people search for strings using a WHERE clause of the following form:
WHERE col_name LIKE '%string%'
That's the correct thing to do if you want to find string no matter where it occurs in the column. But don't put '%' on both sides of the string simply out of habit. If you're really looking for the string only when it occurs at the beginning of the column, leave out the first '%'.
Suppose you're looking in a column containing last names for names like MacGregor or MacDougall that begin with 'Mac'. In that case, write the WHERE clause like this:
WHERE last_name LIKE 'Mac%'
The optimizer looks at the literal initial part of the pattern and uses the index to find rows that match as though you'd written the following expression, which is in a form that allows an index on last_name to be used:
WHERE last_name >= 'Mac' AND last_name < 'Mad'
This optimization does not apply to pattern matches that use the REGEXP operator.
Help the optimizer make better estimates about index effectiveness. By default, when you are comparing values in indexed columns to a constant, the optimizer assumes that key values are distributed evenly within the index. The optimizer will also do a quick check of the index to estimate how many entries will be used when determining whether or not the index should be used for constant comparisons.
For MyISAM and BDB tables, you can tell the server to perform an analysis of key values by using ANALYZE TABLE. This provides the optimizer with better information. Another option, for MyISAM tables, is to run myisamchk --analyze (or isamchk --analyze for ISAM tables). These utilities operate directly on the table files, so two conditions must be satisfied in order to use them for key analysis:
. You must have an account on the MySQL server host that allows you write access to the table files.
. You must cooperate with the server for access to the table files, because you don't want it to be accessing the table while you're working with its files.
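If you'd rather let the server do the work, the key analysis is a single statement (the table name is hypothetical):

```sql
-- Store key distribution statistics that the optimizer can use
-- when estimating index effectiveness.
ANALYZE TABLE mytbl;
```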
Use EXPLAIN to verify optimizer operation. Check to see that indexes are being used in your query to reject rows quickly. If not, you might try using STRAIGHT_JOIN to force a join to be done using tables in a particular order.
(Run the query both with and without STRAIGHT_JOIN; MySQL may have some good reason not to use indexes in the order you think is best.) As of MySQL 3.23.12, you can also try USE INDEX or IGNORE INDEX to give the server hints about which indexes to prefer.
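Sketches of these overrides follow; the table and index names are hypothetical, and whether they help depends entirely on your data:

```sql
-- Force the join to read t1 first, then probe t2:
SELECT t1.c1, t2.c2 FROM t1 STRAIGHT_JOIN t2 ON t1.c1 = t2.c2;

-- Suggest or exclude particular indexes (MySQL 3.23.12 and up):
SELECT * FROM mytbl USE INDEX (idx_a) WHERE col_a = 1;
SELECT * FROM mytbl IGNORE INDEX (idx_b) WHERE col_b = 2;
```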
Test alternate forms of queries, but run them more than once. When testing alternate forms of a query, run it several times each way. If you run a query only once each of two different ways, you'll often find that the second query is faster just because information from the first query is still in the disk cache and need not actually be read from the disk. You should also try to run queries when the system load is relatively stable to avoid effects due to other activities on your system.
Avoid overuse of MySQL's automatic type conversion. MySQL will perform automatic type conversion, but if you can avoid conversions, you may get better performance. For example, if num_col is an integer column, the following two queries both will return the same result:
SELECT * FROM mytbl WHERE num_col = 4;
SELECT * FROM mytbl WHERE num_col = '4';
But the second query involves a type conversion. The conversion operation itself involves a small performance penalty for converting the integer and string to double to perform the comparison.
A more serious problem is that if num_col is indexed, a comparison that involves type conversion may prevent the index from being used.
Overriding Optimization
It sounds odd, but there may be times when you'll want to defeat MySQL's optimization behavior. Some of the reasons to do this are described in the following list:
To empty a table with minimal side effects. When you need to empty a table completely, it's fastest to have the server just drop the table and re-create it based on the description stored in its .frm file. To do this, use a TRUNCATE TABLE statement:
TRUNCATE TABLE tbl_name;
Prior to MySQL 4, you can achieve the same effect by using a DELETE statement with no WHERE clause:
DELETE FROM tbl_name;
The server's optimization of emptying a table by re-creating it from scratch makes the operation extremely fast because each row need not be deleted individually. However, there are some side effects that may be undesirable under certain circumstances:
. Prior to MySQL 4, DELETE with no WHERE clause may report the number of rows affected as zero, even when the table wasn't empty. TRUNCATE TABLE may do this for any version of MySQL, depending on the table type.
Most of the time this doesn't matter, although it can be puzzling if you don't expect it. But for applications that require an accurate count of the number of deleted rows, a count of zero is not acceptable.
. For MyISAM tables, AUTO_INCREMENT values normally are not reused when rows are deleted. However, emptying a table by re-creating it may reset the sequence to begin over at 1.
If you encounter these side effects and want to avoid them, use an "unoptimized" full-table DELETE statement that includes a trivially true WHERE clause:
DELETE FROM tbl_name WHERE 1 > 0;
Adding the WHERE clause forces MySQL to do a row-by-row deletion, because it must evaluate the condition for each row to determine whether or not to delete it. The query executes much more slowly, but it will return the true number of rows deleted, and it will preserve the current AUTO_INCREMENT sequence number for MyISAM tables.
To override the optimizer's table join order. Use STRAIGHT_JOIN to force the optimizer to use tables in a particular order. If you do this, you should order the tables so that the first table is the one from which the smallest number of rows will be chosen. (If you are not sure which table this is, put the table with the most rows first.)
In other words, try to order the tables to cause the most restrictive selection to come first. Queries perform better the earlier you can narrow the possible candidate rows. Make sure to try the query both ways; there may be some reason the optimizer isn't joining tables the way you think it should, and STRAIGHT_JOIN may not actually help.
Another possibility is to use the USE INDEX and IGNORE INDEX modifiers after a table name in the table list of a join to tell MySQL to use or ignore indexes. This may be helpful in cases where the optimizer doesn't make the correct choice.
To retrieve results in random order. As of MySQL 3.23.2, you can use ORDER BY RAND() to sort results randomly. Another technique, which is useful for older versions of MySQL, is to select a column of random numbers and sort on that column. However, if you try writing the query as follows, the optimizer defeats your intent:
SELECT ..., RAND() as rand_col FROM ... ORDER BY rand_col
The problem here is that MySQL sees that the column is a function call, thinks that the value of the column will be a constant, and optimizes the ORDER BY clause right out of the query! You can fool the optimizer by referring to a table column in the expression. For example, if your table has a column named age, you can write the query as follows:
SELECT ..., age*0+RAND() as rand_col FROM ... ORDER BY rand_col;
In this case, the expression value is always equivalent to RAND(). But the optimizer doesn't know that, so it no longer guesses that the column contains a constant value in each row.
To avoid an endless update loop. Prior to MySQL 3.23.2, if you update a column that is indexed, it's possible for the rows that are updated to be updated endlessly if the column is used in the WHERE clause and the update moves the index value into the part of the range that hasn't been processed yet.
Suppose the mytbl table has an integer column key_col that is indexed. Queries such as the following can cause problems:
UPDATE mytbl SET key_col = key_col+1 WHERE key_col > 0;
The solution for this is to use key_col in an expression term in the WHERE clause such that MySQL can't use the index:
UPDATE mytbl SET key_col = key_col+1 WHERE key_col+0 > 0;
Efficient Data Loading
Most of the time you'll probably be concerned about optimizing SELECT queries because they are the most common type of query and because it's not always straightforward to figure out how to optimize them. By comparison, loading data into your database is straightforward. Nevertheless, there are strategies you can use to improve the efficiency of data-loading operations.
The basic principles are as follows:
. Bulk loading is more efficient than single-row loading because the index cache need not be flushed after each record is loaded; it can be flushed at the end of the batch of records. The more you can reduce index cache flushing, the faster data loading will be.
. Loading is faster when a table has no indexes than when it is indexed. If there are indexes, not only must the record be added to the data file, but each index must also be modified to reflect the addition of the new record.
. Shorter SQL statements are faster than longer statements because they involve less parsing on the part of the server and because they can be sent over the network from the client to the server more quickly.
Some of these factors may seem minor (the last one in particular), but if you're loading a lot of data, even small efficiencies make a difference.
From the preceding general principles, several practical conclusions can be drawn about how to load data most quickly:
. LOAD DATA (all forms) is more efficient than INSERT because it loads rows in bulk. Index flushing takes place less often, and the server needs to parse and interpret one statement, not several.
. LOAD DATA is more efficient than LOAD DATA LOCAL. With LOAD DATA, the file must be located on the server and you must have the FILE privilege, but the server can read the file directly from disk. With LOAD DATA LOCAL, the client reads the file and sends it over the network to the server, which is slower.
. If you must use INSERT, use the form that allows multiple rows to be specified in a single statement:
. INSERT INTO tbl_name VALUES(...),(...),... ; The more rows you can specify in the statement, the better. This reduces the total number of statements you need and minimizes the amount of index flushing.
This may seem to contradict the earlier remark that shorter statements can be processed faster than longer statements. But there is no contradiction. The principle here is that a single INSERT statement that inserts multiple rows is shorter overall than an equivalent set of individual single-row INSERT statements, and the multiple-row statement can be processed on the server with much less index flushing.
If you use mysqldump to generate database backup files, use the --extended-insert option so that the dump file contains multiple-row INSERT statements. You can also use --opt (optimize), which turns on the --extended-insert option automatically, as well as some other options that allow the dump file to be processed more efficiently when it is reloaded.
Conversely, avoid using the --complete-insert option with mysqldump; the resulting INSERT statements will be for single rows and will be longer and require more parsing than will statements generated without --complete-insert.
. If you must use multiple INSERT statements, group them if possible to reduce index flushing. For transactional table types, do this by issuing the INSERT statements within a single transaction rather than in auto-commit mode:
. BEGIN;
. INSERT INTO tbl_name ... ;
. INSERT INTO tbl_name ... ;
. INSERT INTO tbl_name ... ;
. COMMIT;
For non-transactional table types, obtain a write lock on the table and issue the INSERT statements while the table is locked:
LOCK TABLES tbl_name WRITE;
INSERT INTO tbl_name ... ;
INSERT INTO tbl_name ... ;
INSERT INTO tbl_name ... ;
UNLOCK TABLES;
You obtain the same benefit in both cases. The index is flushed once rather than once per INSERT statement, which is what happens in auto-commit mode or if the table has not been locked.
. Use the compressed client/server protocol to reduce the amount of data going over the network. For most MySQL clients, this can be specified using the --compress command line option. Generally, this should only be used on slow networks because compression uses quite a bit of processor time.
. Let MySQL insert default values for you. That is, don't specify columns in INSERT statements that will be assigned the default value anyway. On average, your statements will be shorter, reducing the number of characters sent over the network to the server. In addition, because the statements contain fewer values, the server does less parsing and value conversion.
. If a table is indexed, you can lessen indexing overhead by using batched inserts (LOAD DATA or multiple-row INSERT statements). These minimize the impact of index updating because the index needs flushing only after all rows have been processed, rather than after each row.
. For MyISAM and ISAM tables, if you need to load a lot of data into a new table to populate it, it's faster to create the table without indexes, load the data, and then create the indexes. It's faster to create the indexes all at once rather than to modify them for each row.
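As a sketch, the create-load-index sequence for a new MyISAM table might look like the following; the table definition, column names, and data file path are hypothetical examples, not taken from the text:

```sql
-- Create the table with no indexes so rows go straight into the data file:
CREATE TABLE scores
(
    student_id INT NOT NULL,
    score      INT NOT NULL
) TYPE = MYISAM;

-- Bulk-load the data (hypothetical file path):
LOAD DATA INFILE '/tmp/scores.txt' INTO TABLE scores;

-- Build the index in a single pass after all rows are present:
ALTER TABLE scores ADD INDEX (student_id);
```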
For a table that already has indexes, data loading may be faster if you drop or deactivate the indexes beforehand and then rebuild or reactivate them afterward. These strategies do not apply to InnoDB or BDB tables, which do not have optimizations for separate index creation.
If you're considering using the strategy of dropping or deactivating indexes for loading data into MyISAM or ISAM tables, think about the overall circumstances of your situation in assessing whether any benefit is likely to be obtained. If you're loading a small amount of data into a large table, rebuilding the indexes probably will take longer than just loading the data without any special preparation.
To drop and rebuild indexes, use DROP INDEX and CREATE INDEX or the index-related forms of ALTER TABLE. To deactivate and reactivate indexes, you have two choices:
. You can use the DISABLE KEYS and ENABLE KEYS forms of ALTER TABLE:
ALTER TABLE tbl_name DISABLE KEYS;
ALTER TABLE tbl_name ENABLE KEYS;
These statements turn off and on updating of any non-unique indexes in the table.
. The myisamchk or isamchk utilities can perform index manipulation. These utilities operate directly on the table files, so to use them you must have write access to the table files.
The DISABLE KEYS and ENABLE KEYS statements are the preferred method for index deactivation and activation because the server does the work. However, they are available only as of MySQL 4. (Note that if you're using LOAD DATA to load data into an empty MyISAM table, the server performs this optimization automatically.)
To deactivate a MyISAM table's indexes "manually," make sure you've told the server to leave the table alone and then move into the appropriate database directory and run the following command:
% myisamchk --keys-used=0 tbl_name
After loading the table with data, reactivate the indexes:
% myisamchk --recover --quick --keys-used=n tbl_name
n is interpreted as a bitmask indicating which indexes to enable. Bit 0 corresponds to index 1. For example, if a table has three indexes, the value of n should be 7 (111 binary). You can determine index numbers with the --description option:
% myisamchk --description tbl_name
The commands for ISAM tables are similar except that you use isamchk rather than myisamchk, and the --keys-used value for isamchk indicates the highest-numbered index to use. (For a table with three indexes, n would be 3.)
The preceding data-loading principles also apply to mixed-query environments involving clients performing different kinds of operations. For example, you generally want to avoid long-running SELECT queries on tables that are changed (written to) frequently. This causes a lot of contention and poor performance for the writers.
A possible way around this, if your writes are mostly INSERT operations, is to add new records to an auxiliary table and then add those records to the main table periodically. This is not a viable strategy if you need to be able to access new records immediately, but if you can afford to leave them inaccessible for a short time, use of the auxiliary table will help you in two ways.
First, it reduces contention with SELECT queries that are taking place on the main table, so they execute more quickly. Second, it takes less time overall to load a batch of records from the auxiliary table into the main table than it would to load the records individually; the index cache need be flushed only at the end of each batch rather than after each individual row.
One application for this strategy is when you're logging Web page accesses from your Web server into a MySQL database. In this case, it may not be a high priority to make sure the entries get into the main table right away.
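The auxiliary-table approach might be sketched as follows; the table names (hits_pending, hits) and columns are hypothetical:

```sql
-- Fast inserts go to the auxiliary table, avoiding contention
-- with readers of the main table:
INSERT INTO hits_pending (url, t) VALUES ('/index.html', NOW());

-- Periodically, move the accumulated batch into the main table
-- in one bulk operation, then empty the auxiliary table:
LOCK TABLES hits WRITE, hits_pending WRITE;
INSERT INTO hits SELECT * FROM hits_pending;
DELETE FROM hits_pending;
UNLOCK TABLES;
```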
For MyISAM tables, another strategy for reducing index flushing is to use the DELAY_KEY_WRITE table creation option if your data are such that it's not absolutely essential that every single record be inserted in the event of abnormal system shutdown. (This might be the case if you're using MySQL for some sort of logging.)
The option causes the index cache to be flushed only occasionally rather than after each insert. If you want to use delayed index flushing on a server-wide basis, start mysqld with the --delay-key-write option. In this case, index block writes for a table are delayed until blocks must be flushed to make room for other index values, until a flush-tables command has been executed, or until the table is closed.
For a replication slave server, you might want to use --delay-key-write=ALL to delay index flushing for all MyISAM tables, regardless of how they were created originally on the master server.
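A sketch of a log table created with delayed key writes; the table definition is a hypothetical example (note that the option is spelled DELAY_KEY_WRITE in CREATE TABLE):

```sql
-- MyISAM table whose index blocks are flushed only occasionally:
CREATE TABLE access_log
(
    t   TIMESTAMP,
    msg VARCHAR(255),
    INDEX (t)
) TYPE = MYISAM DELAY_KEY_WRITE = 1;
```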
Scheduling and Locking Issues
MySQL also allows you to affect the scheduling priorities of statements, which may allow queries arriving from several clients to cooperate better so that individual clients aren't locked out for a long time. Changing the priorities can also ensure that particular kinds of queries are processed more quickly.
This section looks at MySQL's default scheduling policy and the options that are available to you for influencing this policy. It also discusses the effect that table handler locking levels have on concurrency among clients.
For the purposes of this discussion, a client performing a retrieval (a SELECT) is a reader. A client performing an operation that modifies a table (DELETE, INSERT, REPLACE, or UPDATE) is a writer.
MySQL's basic scheduling policy can be summed up as follows:
. Write requests should be processed in the order in which they arrive.
. Writes have higher priority than reads.
For MyISAM and ISAM tables, the scheduling policy is implemented with the aid of table locks. Whenever a client accesses a table, a lock for it must be acquired first. When the client is finished with a table, the lock on it can be released.
It's possible to acquire and release locks explicitly by issuing LOCK TABLES and UNLOCK TABLES statements, but normally the server's lock manager automatically acquires locks as necessary and releases them when they no longer are needed.
A client performing a write operation must have a lock for exclusive access to the table. The table is in an inconsistent state while the operation is in progress because the data record is being deleted, added, or changed, and any indexes on the table may need to be updated to match. Allowing other clients to access the table while the table is in flux causes problems.
It's clearly a bad thing to allow two clients to write to the table at the same time because that would quickly corrupt the table into an unusable mess. But it's not good to allow a client to read from an in-flux table, either, because the table might be changing right at the spot being read, and the results would be inaccurate.
A client performing a read operation must have a lock to prevent other clients from writing to the table so that the table doesn't change while the table is being read. However, the lock need not provide exclusive access for reading.
The lock can allow other clients to read the table at the same time. Reading doesn't change the table, so there is no reason readers should prevent each other from accessing the table.
MySQL allows you to influence its scheduling policy by means of several query modifiers. One of these is the LOW_PRIORITY keyword for DELETE, INSERT, LOAD DATA, REPLACE, and UPDATE statements. Another is the HIGH_PRIORITY keyword for SELECT statements. The third is the DELAYED keyword for INSERT and REPLACE statements.
The LOW_PRIORITY keyword affects scheduling as follows. Normally, if a write operation for a table arrives while the table is being read, the writer blocks until the reader is done because once a query has begun it will not be interrupted.
If another read request arrives while the writer is waiting, the reader blocks, too, because the default scheduling policy is that writers have higher priority than readers. When the first reader finishes, the writer proceeds, and when the writer finishes, the second reader proceeds.
If the write request is a LOW_PRIORITY request, the write is not considered to have a higher priority than reads. In this case, if a second read request arrives while the writer is waiting, the second reader is allowed to slip in ahead of the writer.
Only when there are no more readers is the writer allowed to proceed. One implication of this scheduling modification is that theoretically, it's possible for LOW_PRIORITY writes to be blocked forever. As long as additional read requests arrive while previous ones are still in progress, the new requests will be allowed to get in ahead of the LOW_PRIORITY write.
The HIGH_PRIORITY keyword for SELECT queries is similar. It allows a SELECT to slip in ahead of a waiting write, even if the write normally has higher priority.
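For illustration, the two modifiers are written as follows; mytbl and its columns are hypothetical:

```sql
-- This writer yields to any waiting or arriving readers:
UPDATE LOW_PRIORITY mytbl SET counter = counter + 1;

-- This reader slips in ahead of waiting writers:
SELECT HIGH_PRIORITY * FROM mytbl WHERE id = 42;
```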
The DELAYED modifier for INSERT acts as follows:
When an INSERT DELAYED request arrives for a table, the server puts the rows in a queue and returns a status to the client immediately so that the client can proceed even before the rows have been inserted. If readers are reading from the table, the rows in the queue are held. When there are no readers, the server begins inserting the rows in the delayed-row queue.
Every now and then, the server checks whether any new read requests have arrived and are waiting. If so, the delayed-row queue is suspended and the readers are allowed to proceed. When there are no readers left, the server begins inserting delayed rows again. This process continues until the queue is empty.
LOW_PRIORITY and DELAYED are similar in the sense that both allow row insertion to be deferred, but they are quite different in how they affect client operation. LOW_PRIORITY forces the client to wait until the rows can be inserted. DELAYED allows the client to continue and the server buffers the rows until it has time to process them.
INSERT DELAYED is useful if other clients may be running lengthy SELECT statements and you don't want to block waiting for completion of the insertion. The client issuing the INSERT DELAYED can proceed more quickly because the server simply queues the row to be inserted.
However, you should be aware of certain other differences between normal INSERT and INSERT DELAYED behavior. The client gets back an error if the INSERT DELAYED statement contains a syntax error, but other information that would normally be available is not. For example, you can't rely on getting the AUTO_INCREMENT value when the statement returns.
You also won't get a count for the number of duplicates on unique indexes. This happens because the insert operation returns a status before the operation actually has been completed. Another implication is that if rows from INSERT DELAYED statements are queued while waiting to be inserted, and the server crashes or is killed with kill -9, the rows are lost. This is not true for a normal kill -TERM kill; in that case, the server inserts the rows before exiting.
The MyISAM handler does allow an exception to the general principle that readers block writers. This occurs under the condition that a MyISAM table has no holes in it (that is, it has no deleted rows), in which case, any INSERT statements must necessarily add rows at the end of the table rather than in the middle.
Under such circumstances, clients are allowed to add rows to the table even while other clients are reading from it. These are known as concurrent inserts because they can proceed concurrently with retrievals without being blocked.
If you use this feature, note the following:
. Do not use the LOW_PRIORITY modifier with your INSERT statements. It causes INSERT always to block for readers and thus prevents concurrent inserts from being performed.
. Readers that need to lock the table explicitly but still want to allow concurrent inserts should use LOCK TABLES ... READ LOCAL rather than LOCK TABLES ... READ. The LOCAL keyword allows you to acquire a lock that allows concurrent inserts to proceed, because it applies only to existing rows in the table and does not block new rows from being added to the end.
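A reader that wants an explicit lock without blocking concurrent inserts might use a statement like this sketch (mytbl is hypothetical):

```sql
-- READ LOCAL covers only existing rows, so other clients may still
-- append new rows to the end of the table while the lock is held:
LOCK TABLES mytbl READ LOCAL;
SELECT COUNT(*) FROM mytbl;
UNLOCK TABLES;
```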
The scheduling modifiers did not appear in MySQL all at once. The following table lists the statements that allow modifiers and the version of MySQL in which each appeared. You can use the table to determine which capabilities your server has.
[Table: Statement Type / Version of Initial Appearance — the table's rows are missing from this copy.]
Locking Levels and Concurrency
The scheduling modifiers just discussed allow you to influence the default scheduling policy. For the most part, they were introduced to deal with issues that arise from the use of table-level locks, which is what the MyISAM and ISAM handlers use to manage table contention.
MySQL now has BDB and InnoDB tables, which implement locking at different levels and thus have differing performance characteristics in terms of contention management. The BDB handler uses page-level locks. The InnoDB handler uses row-level locks, but only as necessary. (In many cases, such as when only reads are done, InnoDB may use no locks at all).
The locking level used by a table handler has a significant effect on concurrency among clients. Suppose two clients each want to update a row in a given table. To perform the update, each client requires a write lock.
For a MyISAM table, the handler will acquire a table lock for the first client, which causes the second client to block until the first one has finished. With a BDB table, greater concurrency can be achieved because the updates can proceed simultaneously, as long as both rows are not located within the same page.
With an InnoDB table, concurrency is even higher; both updates can happen at the same time as long as both clients aren't updating the same row.
The general principle is that table locking at a finer level allows better concurrency, because more clients can be using a table at the same time if they use different parts of it. The practical implication is that different table types will be better suited for different query mixes:
. ISAM and MyISAM are extremely fast for retrievals. However, the use of table-level locks can be a problem in environments with mixed retrievals and updates, especially if the retrievals tend to be long running. Under these conditions, updates may need to wait a long time before they can proceed.
. BDB and InnoDB tables can provide better performance when there are many updates. Because locking is done at the page or row level rather than at the table level, the extent of the table that is locked is smaller. This reduces lock contention and improves concurrency.
Table locking does have an advantage over finer levels of locking in terms of deadlock prevention. With table locks, deadlock never occurs: the server can determine which tables are needed by looking at the query and can lock them all ahead of time.
With InnoDB and BDB tables, deadlock can occur because the handlers do not acquire all necessary locks at the beginning of a transaction. Instead, locks are acquired as they are determined to be necessary during the course of processing the transaction. It's possible that two queries will acquire locks and then try to acquire further locks that each depend on already-held locks being released.
As a result, each client holds a lock that the other needs before it can continue. This results in deadlock, and the server must abort one of the transactions. For BDB tables, you may be able to help prevent deadlock by using LOCK TABLES to acquire table locks explicitly because the BDB handler sees such locks. That doesn't work for InnoDB, because the handler is not aware of locks set by LOCK TABLES.
Optimization for Administration
There are certain optimizations that can be performed only by administrators who have control of the MySQL server or the machine on which it runs.
Some server parameters pertain to query processing and can be tuned, and certain hardware configuration issues have a direct effect on query processing speed. In general, the primary principles to keep in mind when performing administrative optimizations are as follows:
. Accessing data in memory is faster than accessing data from disk.
. Keeping data in memory as long as possible reduces the amount of disk activity.
. Retaining information from an index is more important than retaining contents of data records.
Increase the size of the server's caches. The server has many parameters (variables) that you can change to affect its operation, several of which directly affect the speed of query processing. The most important parameters you can change are the sizes of the table cache and the caches used by the table handlers for indexing operations.
If you have memory available, allocating it to the server's cache buffers will allow information to be held in memory longer and reduce disk activity. This is good, because it's much faster to access information from memory than to read it from disk.
. The table cache is used to hold information about open tables. Its size is controlled by the table_cache server variable. If the server accesses lots of tables, this cache fills up and the server must close tables that haven't been used for a while to make room for opening new tables. You can assess how effective the table cache is by checking the Opened_tables status indicator:
SHOW STATUS LIKE 'Opened_tables';
Opened_tables indicates the number of times a table had to be opened because it wasn't already open. (This value is also displayed as the Opens value in the output of the mysqladmin status command.) If the number remains stable or increases slowly, it's probably set to about the right value. If the number grows at a high rate, it means the cache is full a lot and that tables have to be closed to make room to open other tables. If you have file descriptors available, increasing the table cache size will reduce the number of table opening operations.
. The key buffer is used by the MyISAM and ISAM table handlers to hold index blocks for index-related operations. Its size is controlled by the key_buffer_size server variable. Larger values allow MySQL to hold more index blocks in memory at once, which increases the likelihood of finding key values in memory without having to read a new block from disk.
The default size of the key buffer is 8MB. If you have lots of memory, that's a very conservative value and you should be able to increase it substantially and see a considerable improvement in performance for index-based retrievals and for index creation and modification operations.
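As an illustration, both caches might be enlarged in the server's option file using the same set-variable syntax shown elsewhere in this chapter; the values here are arbitrary assumptions, not recommendations for any particular system:

```ini
[mysqld]
set-variable = table_cache=256
set-variable = key_buffer_size=64M
```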
. The InnoDB and BDB handlers have their own caches for buffering data and index values. The sizes are controlled by the innodb_buffer_pool_size and bdb_cache_size variables. The InnoDB handler also maintains a log buffer, the size of which is controlled by the innodb_log_buffer_size variable.
When you change parameter values, adhere to the following guidelines:
. Change one parameter at a time. Otherwise, you're varying multiple independent variables and it becomes more difficult to assess the effect of each change.
. Increase server variable values incrementally. If you increase a variable by a huge amount on the theory that more is always better, you may run your system out of resources, causing it to thrash or slow to a crawl because you've set the value too high.
. To get an idea of the kinds of parameter variables that are likely to be appropriate for your system, take a look at the my-small.cnf, my-medium.cnf, my-large.cnf, and my-huge.cnf option files included with MySQL distributions.
(You can find them under the support-files directory in source distributions and under the share directory in binary distributions.) These files will give you some idea of which parameters are best to change for servers that receive different levels of use and also some representative values to use for those parameters.
Other strategies you can adopt to help the server operate more efficiently include the following:
. Disable table handlers that you don't need. The server won't allocate any memory for disabled handlers, allowing you to devote it elsewhere.
. Keep grant table permissions simple. Although the server caches grant table contents in memory, if you have any rows in the tables_priv or columns_priv tables, the server must check table- and column-level privileges for every query.
. If you build MySQL from source, configure it to use static libraries rather than shared libraries. Dynamic binaries that use shared libraries save on disk space, but static binaries are faster.
The Query Cache
As of MySQL 4.0.1, the server can use a query cache to speed up processing of SELECT statements that are executed repeatedly. The resulting performance improvement often is dramatic. The query cache works as follows:
. The first time a given SELECT statement is executed, the server remembers the text of the query and the results that it returns.
. The next time the server sees that query, it doesn't bother to execute it again. Instead, the server pulls the query result directly from the cache and returns it to the client.
. Query caching is based on the literal text of query strings as they are received by the server. Queries are considered the same if the text of the queries is exactly the same. Queries are considered different if they differ in lettercase or come from clients that are using different character sets or communication protocols.
They also are considered different if they are otherwise identical but do not actually refer to the same tables (for example, if they refer to identically named tables in different databases).
. When a table is updated, any cached queries that refer to it become invalid and are discarded. This prevents the server from returning out-of-date results.
Support for the query cache is built in by default. If you don't want to use the cache and want to avoid incurring even the minimal overhead that it involves, you can build the server without it by running the configure script with the --without-query-cache option.
For servers that include query cache support, cache operation is based on the values of three variables:
. query_cache_size determines the size of the query cache. A value of zero disables the cache, which is the default setting. (In other words, the cache is not used unless you turn it on explicitly.) To enable the cache, set query_cache_size to the desired size of the cache, in bytes. For example, to allocate 16MB, use the following setting in an option file:
. [mysqld]
. set-variable = query_cache_size=16M
. query_cache_limit sets the maximum result set size that will be cached; query results larger than this value are never cached.
. query_cache_type determines the operating mode of the query cache. The possible mode values are as follows:
0 Don't cache
1 Cache queries except those that begin with SELECT SQL_NO_CACHE
2 Cache on demand only those queries that begin with SELECT SQL_CACHE
Individual clients begin with query caching behavior in the state indicated by the server's default caching mode. A client may change how its queries are cached by the server by using the following statement:
SET query_cache_type = val;
val can be 0, 1, or 2, which have the same meaning as for the query_cache_type variable. The symbolic values OFF, ON, and DEMAND are synonyms for 0, 1, and 2.
A client can also control caching of individual queries by adding a modifier following the SELECT keyword. SELECT SQL_CACHE causes the query result to be cached if the cache is operating in demand mode. SELECT SQL_NO_CACHE causes the result not to be cached.
Suppression of caching can be useful for queries that retrieve information from a constantly changing table. In that case, the cache is unlikely to be of much use. Suppose you're logging Web server requests to a table in MySQL, and also that you periodically run a set of summary queries on the table.
For a reasonably busy Web server, new rows will be inserted into the table frequently and thus any query results cached for the table become invalidated quickly. The implication is that although you might issue the summary queries repeatedly, it's unlikely that the query cache will be of any value for them.
Under such circumstances, it makes sense to issue the queries using the SQL_NO_CACHE modifier to tell the server not to bother caching their results.
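For example, a summary query against such a table might be written as follows; the table and column names are hypothetical:

```sql
-- Results would be invalidated almost immediately anyway,
-- so tell the server not to cache them:
SELECT SQL_NO_CACHE COUNT(*)
FROM hits
WHERE t > DATE_SUB(NOW(), INTERVAL 1 HOUR);
```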
Hardware Issues
The earlier part of this chapter discusses techniques that help improve your server's performance regardless of your hardware configuration. You can of course get better hardware to make your server run faster. But not all hardware-related changes are equally valuable.
When assessing what kinds of hardware improvements you might make, the most important principles are the same as those that apply to server parameter tuning. Put as much information in fast storage as possible, and keep it there as long as possible.
Several aspects of your hardware configuration can be modified to improve server performance:
. Install more memory into your machine. This enables you to increase the server's cache and buffer sizes, which allows it to keep data in memory longer and with less need to fetch information from disk.
. Reconfigure your system to remove all disk swap devices if you have enough RAM to do all swapping into a memory file system. Otherwise, some systems will continue to swap to disk even if you have sufficient RAM for swapping.
. Add faster disks to improve I/O latency. Seek time is typically the primary determinant of performance here. It's slow to move the heads laterally; when the heads are positioned, reading blocks off the track is fast by comparison.
However, if you have a choice between adding more memory and getting faster disks, add more memory. Memory is always faster than your disks, and adding memory allows you to use larger caches, which reduces disk activity.
. Take advantage of parallelism by redistributing disk activity across physical devices. If you can split reading or writing across multiple physical devices, it will be quicker than reading and writing everything from the same device.
For example, if you store databases on one device and logs on another, writing to both devices at once will be faster than if databases and logs share the same device. Use of RAID devices can give you some advantages of parallelism as well.
. Use multi-processor hardware. For a multi-threaded application like the MySQL server, multi-processor hardware can execute multiple threads at the same time.


