I have a fairly large database table that I need to split into smaller tables for various reasons.
The handling happens in PHP, roughly like this:
// Note: It's an example and not working code - the actual function is much larger
function split_db()
{
$results = "
SELECT *
FROM big_table
";
foreach ( $results as $result )
{
// Here I split the big_tables contents and ...
$some_val = $result->SomeVal;
// ...
$another_val = $result->AnotherVal;
// ... here I insert the contents in the different tables
$sql = "
INSERT
INTO first_small_table
// ...
VALUES
// ...
";
}
}
Problem: The query inserts only 255 rows, whether I run it in my local environment or on the test server.
Question: Why? What am I doing wrong, or what am I missing? And how can I avoid this?
Info about MySQL-Client-Version:
Dev-Server: 5.0.32,
Local Dev-Env.: 5.1.41
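I'm no MySQL hero, so any help and explanation is appreciated; Google turned up nothing meaningful (to me). Thanks!
I bet your primary key is an unsigned TINYINT, which has a maximum value of 255. With an auto-increment key, inserts start failing once the counter reaches 255.
So change it to INT (MODIFY replaces the whole column definition, so append any attributes the column already has, such as UNSIGNED, NOT NULL or AUTO_INCREMENT):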
ALTER TABLE first_small_table MODIFY id INT;
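The 255 ceiling is simply the largest value an unsigned one-byte column can hold. A quick sketch of the arithmetic for MySQL's smaller integer types follows; the helper name max_unsigned is made up for illustration, and a 64-bit PHP build is assumed.

```php
<?php
// Largest value an unsigned integer column of the given byte width can store.
// Hypothetical helper, for illustration only (assumes 64-bit PHP integers).
function max_unsigned(int $bytes): int
{
    return (1 << (8 * $bytes)) - 1;
}

// Storage sizes in bytes of MySQL's smaller integer types.
$types = ['TINYINT' => 1, 'SMALLINT' => 2, 'MEDIUMINT' => 3, 'INT' => 4];

foreach ($types as $name => $bytes) {
    printf("%-9s UNSIGNED max: %d\n", $name, max_unsigned($bytes));
}
```

If the key column is signed rather than unsigned, the usable positive range is roughly half of each of these values.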
I can't say why you're limited to 255 rows, but you can copy the rows from your big table into your small table with a single INSERT INTO ... SELECT query:
INSERT INTO first_small_table (col1, col2, col3)
SELECT col1, col2, col3
FROM big_table;
If you don't need PHP for this, then by all means don't use it. Doing it in SQL alone is usually faster, since the rows never leave the database server.
Getting the Value:
I've got the levenshtein_ratio function, from here, queued up in my MySQL database. I run it in the following way:
$stmt = $db->prepare("SELECT r_id, val FROM table WHERE levenshtein_ratio(:input, someval) > 70");
$stmt->execute(array('input' => $input));
$result = $stmt->fetchAll();
if(count($result)) {
foreach($result as $row) {
$out .= $row['r_id'] . ', ' . $row['val'];
}
}
And it works a treat, exactly as expected. But I was wondering, is there a nice way to also get the value that levenshtein_ratio() calculates?
I've tried:
$stmt = $db->prepare("SELECT levenshtein_ratio(:input, someval), r_id, val FROM table WHERE levenshtein_ratio(:input, someval) > 70");
$stmt->execute(array('input' => $input));
$result = $stmt->fetchAll();
if(count($result)) {
foreach($result as $row) {
$out .= $row['r_id'] . ', ' . $row['val'] . ', ' . $row[0];
}
}
and it does technically work (I get the percentage from the $row[0]), but the query is a bit ugly, and I can't use a proper key to get the value, like I can for the other two items.
Is there a way to somehow get a nice reference for it?
I tried:
$stmt = $db->prepare("SELECT r_id, val SET output=levenshtein_ratio(:input, someval) FROM table WHERE levenshtein_ratio(:input, someval) > 70");
modelling it after something I found online, but it didn't work, and ends up ruining the whole query.
Speeding It Up:
I'm running this query for an array of values:
foreach($parent as $input){
$stmt = ...
$stmt->execute...
$result = $stmt->fetchAll();
... etc
}
But it ends up being remarkably slow. Like 20s slow, for an array of only 14 inputs and a DB with about 350 rows, which is expected to be in the 10,000's soon. I know that putting queries inside loops is naughty business, but I'm not sure how else to get around it.
EDIT 1
When I use
$stmt = $db->prepare("SELECT r_id, val SET output=levenshtein_ratio(:input, someval) FROM table WHERE levenshtein_ratio(:input, someval) > 70");
surely that's costing twice the time as if I only calculated it once? Similar to having $i < sizeof($arr); in a for loop?
To clean up the column names, you can use AS to give the function's result an alias. You can also speed things up by filtering on that alias in a HAVING clause, so the function only appears once in the query.
$stmt = $db->prepare("SELECT r_id, val, levenshtein_ratio(:input, someval) AS lev_ratio FROM `table` HAVING lev_ratio > 70");
If it is still too slow, you might consider a C library like https://github.com/juanmirocks/Levenshtein-MySQL-UDF
Doh, I forgot to switch "where" to "having", as spencer7593 noted.
I'm assuming that `someval` is an unqualified reference to a column in the table. While you may understand that without looking at the table definition, someone else reading the SQL statement can't tell. As an aid to future readers, consider qualifying your column references with the name of the table or (preferably) a short alias assigned to the table in the statement.
SELECT t.r_id
, t.val
FROM `table` t
WHERE levenshtein_ratio(:input, t.someval) > 70
That function in the WHERE clause has to be evaluated for every row in the table. There's no way to get MySQL to build an index on that. So there's no way to get MySQL to perform an index range scan operation.
It might be possible to get MySQL to use an index for the query, for example, if the query had an ORDER BY t.val clause, or if there is a "covering index" available.
But that doesn't get around the issue of needing to evaluate the function for every row. (If the query had other predicates that excluded rows, then the function wouldn't necessarily need to be evaluated for the excluded rows.)
Adding the expression to the SELECT list really shouldn't be too expensive if the function is declared to be DETERMINISTIC. A second call to a DETERMINISTIC function with the same arguments can, in principle, reuse the value returned by the previous call, although whether the optimizer actually does so is not guaranteed. (Declaring a function DETERMINISTIC essentially means that the function is guaranteed to return the same result when given the same argument values. That is, the return value depends only on the argument values, and doesn't depend on anything else.)
SELECT t.r_id
, t.val
, levenshtein_ratio(:input, t.someval) AS lev_ratio
FROM `table` t
WHERE levenshtein_ratio(:input2, t.someval) > 70
(Note: I used a distinct bind placeholder name for the second reference because PDO doesn't handle "duplicate" bind placeholder names as we'd expect. The PDO documentation states that a named placeholder may appear only once in a statement unless emulated prepares are enabled; if you need two references to the same value, use two different placeholder names and bind the same value to both.)
If you don't want to repeat the expression, you could move the condition from the WHERE clause to the HAVING, and refer to the expression in the SELECT list by the alias assigned to the column.
SELECT t.r_id
, t.val
, levenshtein_ratio(:input, t.someval) AS lev_ratio
FROM `table` t
HAVING lev_ratio > 70
The big difference between WHERE and HAVING is that the predicates in the WHERE clause are evaluated when the rows are accessed. The HAVING clause is evaluated much later, after the rows have been accessed. (That's a brief explanation of why the HAVING clause can reference columns in the SELECT list by their alias, but the WHERE clause can't do that.)
If that's a large table, and a large number of rows are being excluded, there might be a significant performance difference when using the HAVING clause, because a much larger intermediate set may be created.
To get an "index used" for the query, a covering index is the only option I see.
ON `table` (r_id, val, someval)
With that, MySQL can satisfy the query from the index, without needing to lookup pages in the underlying table. All of the column values the query needs are available from the index.
FOLLOWUP
To get an index created, we would need to create a column, e.g.
lev_ratio_foo FLOAT
and pre-populate with the result from the function
UPDATE `table` t
SET t.lev_ratio_foo = levenshtein_ratio('foo', t.someval)
;
Then we could create an index, e.g.
... ON `table` (lev_ratio_foo, val, r_id)
And re-write the query
SELECT t.r_id
, t.val
, t.lev_ratio_foo
FROM `table` t
WHERE t.lev_ratio_foo > 70
With that query, MySQL can make use of an index range scan operation on an index with lev_ratio_foo as the leading column.
Likely, we would want to add BEFORE INSERT and BEFORE UPDATE triggers to maintain the value, when a new row is added to the table, or the value of the someval column is modified.
That pattern could be extended: additional columns could be added for values other than 'foo', e.g. 'bar'.
UPDATE `table` t
SET t.lev_ratio_bar = levenshtein_ratio('bar', t.someval)
Obviously that approach isn't going to be scalable for a broad range of input values.
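If the table stays small enough to pull into PHP, one alternative (a swapped-in technique, not something proposed in the answers above) is to fetch the rows once and score them with PHP's built-in levenshtein() for each input, instead of running one query per input. The ratio formula below is an assumption, since the definition of levenshtein_ratio() is not shown in the question; the function names and sample rows are likewise made up. Note that before PHP 8.0, levenshtein() only accepted strings of up to 255 characters.

```php
<?php
// Assumed definition: similarity as a percentage of the longer string's length.
function lev_ratio(string $a, string $b): float
{
    $max = max(strlen($a), strlen($b));
    if ($max === 0) {
        return 100.0; // two empty strings are identical
    }
    return (1 - levenshtein($a, $b) / $max) * 100;
}

// Score every row against every input; keep matches above the threshold.
function match_inputs(array $inputs, array $rows, float $threshold = 70.0): array
{
    $matches = [];
    foreach ($inputs as $input) {
        foreach ($rows as $row) {
            $ratio = lev_ratio($input, $row['someval']);
            if ($ratio > $threshold) {
                $matches[$input][] = $row + ['lev_ratio' => $ratio];
            }
        }
    }
    return $matches;
}

// In real use $rows would come from a single fetchAll() on
// "SELECT r_id, val, someval FROM `table`"; sample data is used here.
$rows = [
    ['r_id' => 1, 'val' => 'A', 'someval' => 'kitten'],
    ['r_id' => 2, 'val' => 'B', 'someval' => 'sitting'],
];
print_r(match_inputs(['kitte', 'mitten'], $rows));
```

This trades one query per input for a single query plus work in PHP; it still compares every input with every row, so it only helps while the table fits comfortably in memory.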
I am using a Postgres 9.4 database and have PHP as my front end.
A general query I may run would look like this:
PHP :
$query = "select * from some_table";
pg_prepare($connection,"some_query",$query);
$result = pg_execute($connection,"some_query",array());
while ($row = pg_fetch_array($result,null,PGSQL_ASSOC)) {
echo $row['some_field'];
echo $row['some_field_1'];
echo $row['some_field_2'];
}
I am working with a front end that needs to know the datatype of each column that comes back. Specifically, I need to know when the echoed database field is a timestamp column.
Obviously I can tell integers and string, however timestamp is a bit of a different thing.
I suppose I could see if strtotime() returns false, however that seems a little dirty to me.
So my question is:
Is there a PHP built-in function that can return a multi-dimensional array of the database row with not only $key=>$value pair but also the datatype?
Any help on this would be appreciated - thank you!
You can query from information_schema.columns and fetch just like any other query:
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_name='some_table'
Or after your query use pg_field_type():
$type = pg_field_type($result, 0);
But you need to know the position of the column in the result, so you should list the columns explicitly (a best practice anyway). In the above case, using 0 would give the type of col1 in the query below:
SELECT col1, col2, col3 FROM some_table
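To get something close to the "row with datatypes" array the question asks for, the type lookup can be done once per result and then merged into each row. This is a minimal sketch: field_types() and annotate_row() are hypothetical helper names, and field_types() needs the pgsql extension and a live result, so it is only defined here while the demonstration runs on sample data.

```php
<?php
// Build a column-name => type map from a result using pg_num_fields(),
// pg_field_name() and pg_field_type(). Defined but not called below,
// because it needs a real PostgreSQL result.
function field_types($result): array
{
    $types = [];
    for ($i = 0; $i < pg_num_fields($result); $i++) {
        $types[pg_field_name($result, $i)] = pg_field_type($result, $i);
    }
    return $types;
}

// Pair each value in an associative row with its column type.
function annotate_row(array $row, array $types): array
{
    $out = [];
    foreach ($row as $name => $value) {
        $out[$name] = ['value' => $value, 'type' => $types[$name] ?? null];
    }
    return $out;
}

// Sample data standing in for field_types($result) and pg_fetch_array(...).
$types = ['id' => 'int4', 'created_at' => 'timestamp'];
$row   = ['id' => '7', 'created_at' => '2015-03-01 12:00:00'];

$annotated = annotate_row($row, $types);
echo $annotated['created_at']['type'], "\n";
```

Looking the types up by name rather than by position avoids having to track column order in the SELECT list.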
Using the PHP SQLSRV driver to connect to SQL Server 2000, is there a way I could match all of these rows using this piece of data: 5553442524?
555-344-2524
(555) 344-2524
555.344.2524
1-555-344-2524
I imagine this would be done through a specific query probably using a stored procedure?
Thank you.
For SQL 2000 the only way I can think of would be using the REPLACE function.
declare @SearchTerm bigint
Set @SearchTerm = 5553442524
Select * From dbo.Table
Where Replace(Replace(Replace(Replace(Replace(Col1,'-',''), '(',''),')',''),'.',''),' ','')
= @SearchTerm
The problem with this would be it wouldn't cater for the leading 1.
A better way would be to wrap all this logic into a function.
e.g.
Create Function dbo.fn_FormatTelephoneNumber(@input varchar(100))
returns bigint
as begin
declare @temp bigint
Set @temp = Replace(Replace(Replace(Replace(Replace(@input ,'-',''), '(',''),')',''),'.',''),' ','')
If Len(@temp) = 11
begin
Set @temp = Right(@temp, 10)
end
return @temp
End
To call the function you would use it like so:
Select *,
dbo.fn_FormatTelephoneNumber(YourColumnName) as [FormattedTelephoneNumber]
From dbo.YourTable
Or to use it in a WHERE clause:
Select *
From dbo.YourTable
Where dbo.fn_FormatTelephoneNumber(YourColumnName) = 5553442524
Obviously the best thing here would be to clean up the data that is stored in the columns and restrict any further "bad" data from being inserted. Although in my experience that is easier said than done.
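One way to stop further "bad" data from being stored is to normalize numbers on the PHP side before they reach the database. A minimal sketch, assuming North American numbers like those in the question; normalize_phone is a hypothetical helper name.

```php
<?php
// Reduce a phone number to its 10 significant digits.
// Hypothetical helper; assumes North American numbers as in the question.
function normalize_phone(string $raw): string
{
    $digits = preg_replace('/\D+/', '', $raw); // strip everything but 0-9
    if (strlen($digits) === 11 && $digits[0] === '1') {
        $digits = substr($digits, 1); // drop the leading country code
    }
    return $digits;
}

$samples = ['555-344-2524', '(555) 344-2524', '555.344.2524', '1-555-344-2524'];
foreach ($samples as $sample) {
    echo normalize_phone($sample), "\n";
}
```

Storing the normalized value in its own column would let the lookup become a plain equality comparison instead of a function call on every row.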
Problem
I have a table tbl_student_courses that joins two tables, student and courses. Each inserted row is a combination of two ids, course_id and student_id. I want to make sure this combination is never duplicated in tbl_student_courses.
Code
foreach($_POST['sel_course'] as $val) {
$query_std_course = "
INSERT INTO
`tbl_student_courses`
SET
`course_id` = '".$val."',
`std_id` = '".$_POST['std']."',
WHERE NOT EXISTS (
SELECT * FROM `tbl_student_courses` WHERE course_id=$val AND std_id=$std
)";
}
Help
This query gives an SQL syntax error.
Can anybody help me?
Thanks in advance.
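INSERT ... SET does not accept a WHERE clause (and there is a stray comma before WHERE), which is what causes the syntax error. The inner query values are also missing quotes. To insert only when the combination does not exist yet, use INSERT ... SELECT instead.
Your SQL query should look like this ($std holds $_POST['std']):
$sql = "
INSERT INTO
`tbl_student_courses` (`course_id`, `std_id`)
SELECT
'".$val."', '".$std."'
FROM DUAL
WHERE NOT EXISTS (
SELECT * FROM `tbl_student_courses` WHERE course_id='".$val."' AND std_id='".$std."'
)";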
NOTE: Concatenating request data such as $_POST['std'] straight into the SQL string is bad practice, because it opens the door to SQL injection and is therefore a potential security hole. Consider using PDO prepared statements, or at least escape the data yourself.
UPDATE: Try INSERT ... ON DUPLICATE KEY UPDATE or INSERT IGNORE INTO instead. Both rely on a UNIQUE index (or primary key) covering course_id and std_id.
You can find more information on the NOT EXISTS approach at http://bogdan.org.ua/2007/10/18/mysql-insert-if-not-exists-syntax.html
and read about the proposed alternative at http://dev.mysql.com/doc/refman/5.1/en/insert-on-duplicate.html
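The INSERT IGNORE route could look roughly like the sketch below, which builds one parameterized statement for all selected courses. It assumes a UNIQUE index on (std_id, course_id), without which nothing is ignored; build_enrollment_insert is a hypothetical helper name, and the PDO call is only shown as a comment because no connection is created here.

```php
<?php
// Build one multi-row INSERT IGNORE plus its bound parameters.
// Assumes a UNIQUE index on (std_id, course_id) exists on the table.
function build_enrollment_insert(int $stdId, array $courseIds): array
{
    if ($courseIds === []) {
        throw new InvalidArgumentException('No courses selected');
    }

    // One "(?, ?)" group per course.
    $placeholders = implode(', ', array_fill(0, count($courseIds), '(?, ?)'));
    $sql = "INSERT IGNORE INTO `tbl_student_courses` (`std_id`, `course_id`) VALUES $placeholders";

    $params = [];
    foreach ($courseIds as $courseId) {
        $params[] = $stdId;
        $params[] = (int) $courseId;
    }
    return [$sql, $params];
}

[$sql, $params] = build_enrollment_insert(12, ['3', '5']);
echo $sql, "\n";

// With an existing PDO connection $db (not created here):
// $db->prepare($sql)->execute($params);
```

Because the values are bound rather than concatenated, this also addresses the SQL injection concern raised in the note above.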
The SQL syntax you seek is the MERGE statement, or its equivalent on your platform
http://en.wikipedia.org/wiki/Merge_(SQL)
I often have large arrays, or large amounts of dynamic data in PHP that I need to run MySQL queries to handle.
Is there a better way to run many processes like INSERT or UPDATE without looping through the information to be INSERT-ed or UPDATE-ed?
Example (I didn't use a prepared statement, for brevity's sake):
$myArray = array('apple','orange','grape');
foreach($myArray as $arrayFruit) {
$query = "INSERT INTO `Fruits` (`FruitName`) VALUES ('" . $arrayFruit . "')";
mysql_query($query, $connection);
}
OPTION 1
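You can send several statements in one call, but not with mysql_query(), which accepts only a single statement per call. With mysqli, use mysqli_multi_query():
$queries = '';
foreach ($myArray as $fruit) {
$queries .= "INSERT INTO `Fruits` (`FruitName`) VALUES ('" . mysqli_real_escape_string($connection, $fruit) . "');"; // notice the semicolon
}
mysqli_multi_query($connection, $queries); // $connection must be a mysqli link
This saves round trips to the server, although each statement is still parsed and executed separately.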
OPTION 2
If your insert is that simple for the same table, you can do multiple inserts in ONE query
$fruits = "('".implode("'), ('", $fruitsArray)."')";
mysql_query("INSERT INTO Fruits (Fruit) VALUES $fruits", $connection);
The query ends up looking something like this:
$query = "INSERT INTO Fruits (Fruit)
VALUES
('Apple'),
('Pear'),
('Banana')";
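This is probably the way you want to go, as long as the values are escaped (or bound as parameters) before they are imploded into the query.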
If you have the mysqli class, you can iterate over the values to insert using a prepared statement.
$sth = $dbh->prepare("INSERT INTO Fruits (Fruit) VALUES (?)");
foreach($fruits as $fruit)
{
$sth->reset(); // make sure we are fresh from the previous iteration
$sth->bind_param('s', $fruit); // bind one or more variables to the query
$sth->execute(); // execute the query
}
One thing to note about your original solution compared with jerebear's implosion method (which I have used before, and love) is that it is easier to read. The implosion takes more programmer brain cycles to understand, which can be more expensive than processor cycles. Premature optimisation, blah, blah, blah... :)
One thing to note about jerebear's answer with multiple VALUE-blocks in one INSERT:
It can be rather dangerous for really large amounts of data, because most DBMSs have an upper limit on the size of the commands they can handle. If you exceed that with too many VALUE-blocks, your insert will fail. On MySQL, for example, the limit is set by max_allowed_packet, which defaulted to 1 MB in older versions.
So you should figure out what the maximum size is (ideally at runtime; on MySQL, SHOW VARIABLES LIKE 'max_allowed_packet' reports it), and make sure you don't exceed it by spreading your lists of values over several INSERTs.
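Spreading the values over several INSERTs can be sketched as a small batching function that starts a new statement whenever the next value would push the current one over a byte budget. The function name batch_inserts and the tiny budget are made up for illustration, and the values are assumed to be already escaped.

```php
<?php
// Split values into batches so that each generated INSERT stays under
// $maxBytes. Values are assumed to be already escaped for SQL.
// A single value larger than the budget is still emitted on its own.
function batch_inserts(array $escapedValues, int $maxBytes): array
{
    $prefix = 'INSERT INTO Fruits (Fruit) VALUES ';
    $statements = [];
    $current = '';

    foreach ($escapedValues as $value) {
        $tuple = "('" . $value . "')";
        $candidate = $current === '' ? $prefix . $tuple : $current . ', ' . $tuple;

        if ($current !== '' && strlen($candidate) > $maxBytes) {
            $statements[] = $current;      // flush the full batch
            $candidate = $prefix . $tuple; // start a new one
        }
        $current = $candidate;
    }

    if ($current !== '') {
        $statements[] = $current; // flush the remainder
    }
    return $statements;
}

// A deliberately tiny budget to make the splitting visible.
$statements = batch_inserts(['Apple', 'Pear', 'Banana'], 55);
foreach ($statements as $sql) {
    echo $sql, "\n";
}
```

In practice the budget would be set somewhat below the server's reported limit to leave room for protocol overhead.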
I was inspired by jerebear's answer to build something like his second option for one of my current projects. Because of the sheer volume of records, I couldn't collect all the data and save it at once. So I built this to do imports. You add your data, and then call a method when each record is done. After a certain, configurable number of records, the data in memory is saved with a mass insert like jerebear's second option.
// CREATE TABLE example ( Id INT, Field1 INT, Field2 INT, Field3 INT);
$import=new DataImport($dbh, 'example', 'Id, Field1, Field2, Field3');
foreach ($whatever as $row) {
// add data in the order of your column definition
$import->addValue($Id);
$import->addValue($Field1);
$import->addValue($Field2);
$import->addValue($Field3);
$import->nextRow();
}
$import->lastRow();
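The DataImport class itself is not shown above, so the following is only a hypothetical sketch of how such a buffering importer might work, not the author's implementation. The class name BufferedImport and the flush callback are assumptions; a real callback would build and run the mass INSERT.

```php
<?php
// Hypothetical sketch of a buffering importer; not the DataImport class above.
class BufferedImport
{
    private array $rows = [];
    private array $currentRow = [];
    private $flush;
    private int $batchSize;

    // $flush receives the buffered rows and is expected to run one mass INSERT.
    public function __construct(callable $flush, int $batchSize = 500)
    {
        $this->flush = $flush;
        $this->batchSize = $batchSize;
    }

    public function addValue($value): void
    {
        $this->currentRow[] = $value;
    }

    // Close the current record; write the buffer out once it is full.
    public function nextRow(): void
    {
        $this->rows[] = $this->currentRow;
        $this->currentRow = [];
        if (count($this->rows) >= $this->batchSize) {
            $this->lastRow();
        }
    }

    // Write out whatever is still buffered.
    public function lastRow(): void
    {
        if ($this->rows !== []) {
            ($this->flush)($this->rows);
            $this->rows = [];
        }
    }
}

// Demo: record the size of each flushed batch instead of touching a database.
$batches = [];
$import = new BufferedImport(function (array $rows) use (&$batches) {
    $batches[] = count($rows);
}, 2);

foreach ([[1, 10], [2, 20], [3, 30]] as $record) {
    foreach ($record as $value) {
        $import->addValue($value);
    }
    $import->nextRow();
}
$import->lastRow();
print_r($batches);
```

Passing the write step in as a callback keeps the buffering logic independent of the database layer, which also makes it easy to try out without a connection.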