Optimize a SELECT query on a table with millions of rows - php

I am developing a website with the self-hosted WordPress CMS.
On one of the pages, I run a function that queries the WordPress database to check whether a post has already been posted. I compare the title to check this.
Here is my query:
$wpdb->get_row("SELECT id FROM wp_posts WHERE post_title = '" . $title . "'", 'ARRAY_A');
So I am checking whether $title has already been posted, but I am afraid that if the number of posts grows to, let's say, 1 million, this will become very slow.
Any suggestions on how to make this query faster? I have heard about CREATE INDEX and MySQL caching, but I don't understand how to implement them. Any explanations and references would be highly appreciated.

Try this:
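CREATE INDEX IX_wp_posts_post_title ON wp_posts (post_title(191));
The prefix length is needed because post_title is a TEXT column in the stock WordPress schema, and MySQL refuses to index a TEXT column without one; 191 characters is a common choice that stays within InnoDB's older 767-byte key limit under utf8mb4. Creating the index can take a long time on a large table, but afterward your queries should be close to instant.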
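To see what the index changes, here is a minimal sketch in Python using the standard-library sqlite3 module as a stand-in for MySQL, so it runs without a server. The two-column table is a simplified assumption, not the real wp_posts schema; the point is only that the planner switches from scanning every row to an index lookup.

```python
import sqlite3

# SQLite stands in for MySQL; the table is a cut-down, assumed version of wp_posts.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wp_posts (id INTEGER PRIMARY KEY, post_title TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO wp_posts (post_title) VALUES (?)",
    [(f"Post number {n}",) for n in range(10_000)],
)


def plan(sql, params):
    """Return the planner's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)


query = "SELECT id FROM wp_posts WHERE post_title = ?"
before = plan(query, ("Post number 42",))  # no index yet: every row is scanned

conn.execute("CREATE INDEX IX_wp_posts_post_title ON wp_posts (post_title)")
after = plan(query, ("Post number 42",))   # the lookup now goes through the index

print("before:", before)
print("after: ", after)
```

In MySQL the equivalent check is to put EXPLAIN in front of the SELECT and look at which key it reports using.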

Create indexes on your tables for the columns you query most often, such as post_title here.
Additionally, because you build the SELECT statement on the fly by concatenating $title into it, you are wide open to SQL injection attacks. You should escape the string, or preferably use parameterized query calls.
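A small sketch of the difference, again in Python with sqlite3 standing in for MySQL (the table is a simplified assumption). In WordPress itself the corresponding tool is $wpdb->prepare() with a %s placeholder; the idea is the same: the title travels as a bound value and is never pasted into the SQL text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wp_posts (id INTEGER PRIMARY KEY, post_title TEXT NOT NULL)")
conn.execute("INSERT INTO wp_posts (post_title) VALUES ('Hello world')")


def exists_unsafe(title):
    """Concatenates the title into the SQL, like the query in the question."""
    sql = "SELECT id FROM wp_posts WHERE post_title = '" + title + "'"
    return conn.execute(sql).fetchone() is not None


def exists_safe(title):
    """Binds the title as a parameter, so quotes in it cannot change the query."""
    sql = "SELECT id FROM wp_posts WHERE post_title = ? LIMIT 1"
    return conn.execute(sql, (title,)).fetchone() is not None


hostile = "x' OR '1'='1"
print(exists_unsafe(hostile))  # the injected OR clause matches every row
print(exists_safe(hostile))    # treated as an odd title that nobody posted
```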

Not quite sure what you are trying to achieve here. It doesn't seem like the end of the world to have two posts with the same title.
More concerning though is your code is totally sql-injectable. Read up on that, and use parameterised queries.
Creating an index is easy.
create index myindex on mytable ( columnname );
This will help selects, but if you really do have millions of rows, you might be better off getting some proper database advice. You may need to partition your data.

If you are checking to see if a particular post is already in your database, you should test on the post's id instead of its title. First, because it is a guaranteed unique identifier (assuming it is the primary key), and second, because the query will be able to find it far, far faster.

Related

Search tags in mysql table with PHP

I have a table with some submissions. The table has a tags field, and I need to search in it.
The data is saved in JSON format in the table, like this: ["basic","example","html","chart"]
I'm trying to find a way to search the tags field across all rows, but I'm not sure how best to do it when the data is in this format.
The user submits a tag to search for, like html, and then I need to search all rows for that tag without too much overhead.
I know most people will ask: what have you tried yourself?
Well, nothing. I have no clue how to do this. I know how to search in SQL and all that, but I have never tried it with data stored like this.
There is no "best way" to search in this format. There is no way at all.
No wonder you have no clue how to do that. I'll tell you more: no one else knows either. Tags should never be stored in JSON format. It is as if you built a car with the wheels on the roof and then came asking how to drive it.
You have to learn database basics first, and then create your tables the proper way: make a separate table for tags and store each tag on a separate row. After that you will be able to search for a tag the usual way, using a JOIN query to attach the corresponding records to the result.
$sql = "SELECT a.* FROM articles a, tags t WHERE aid=a.id AND tag=?";
$stmt = $pdo->prepare($sql);
$stmt->execute(array($tag));
$data = $stmt->fetchAll();
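As an illustration of that layout, here is a sketch in Python with sqlite3 (standing in for MySQL) that moves JSON tag lists into a separate tags table and then searches it with a join. The articles and tags names and the aid and tag columns follow the query above; the title column, the sample rows, and the index name are assumptions.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE tags (
        aid INTEGER NOT NULL REFERENCES articles(id),
        tag TEXT NOT NULL,
        PRIMARY KEY (aid, tag)
    );
    CREATE INDEX idx_tags_tag ON tags (tag);  -- makes "find by tag" an index lookup
""")

# Rows as they might look in the old one-column JSON layout (sample data).
legacy_rows = [
    (1, "Chart demo", '["basic","example","html","chart"]'),
    (2, "Styling", '["css","xhtml"]'),
]
for article_id, title, tags_json in legacy_rows:
    conn.execute("INSERT INTO articles (id, title) VALUES (?, ?)", (article_id, title))
    conn.executemany(
        "INSERT INTO tags (aid, tag) VALUES (?, ?)",
        [(article_id, tag) for tag in json.loads(tags_json)],  # one row per tag
    )


def articles_with_tag(tag):
    """Titles of all articles carrying exactly this tag."""
    rows = conn.execute(
        "SELECT a.title FROM articles a JOIN tags t ON t.aid = a.id "
        "WHERE t.tag = ? ORDER BY a.id",
        (tag,),
    ).fetchall()
    return [row[0] for row in rows]


print(articles_with_tag("html"))
```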
You should create another table, tag, with the fields name and post_id.
I believe that is the best solution for a search feature.
If you do not have permission to create a database table, it depends on how many posts you have: a few, hundreds, or even more? If the post table does not have a huge number of rows, you can fetch all of them, decode the tags into a PHP array, and then use string comparison.
Or you could give up on the database approach and just use a cache file. You only need to rewrite the cache when a user creates or modifies a post.
You can also use the unreliable way: the LIKE operator in MySQL.
You should take a look at the MySQL fulltext index.
Take a look at the manual and this Zend Developer article.
But you shouldn't use fulltext searching across many columns.
In one of my projects I worked around that by concatenating the columns to be searched into a single TEXT column and putting the fulltext index on it.
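It's simple; you can try using a LIKE query:
SELECT * FROM `post` WHERE `tags` LIKE '%html%';
With a PHP variable:
$tag = "html";
// mysql_query() was removed in PHP 7, and the original quoting sent the literal text '.$tag.' to MySQL, so bind the pattern instead.
$stmt = $pdo->prepare("SELECT * FROM `post` WHERE `tags` LIKE ?");
$stmt->execute(array('%' . $tag . '%'));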
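One caveat with the LIKE approach is that a bare substring match also hits longer tags that contain the search term. The sketch below, in Python with sqlite3 standing in for MySQL, uses two made-up rows to compare a loose pattern, a pattern that includes the JSON double quotes, and the fetch-and-decode fallback mentioned in another answer. The post table and tags column follow the query above; everything else is an assumption.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE post (id INTEGER PRIMARY KEY, tags TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO post (id, tags) VALUES (?, ?)",
    [
        (1, '["basic","example","html","chart"]'),
        (2, '["css","xhtml"]'),
    ],
)


def ids_like(pattern):
    """Ids of posts whose raw tags string matches the LIKE pattern."""
    rows = conn.execute("SELECT id FROM post WHERE tags LIKE ? ORDER BY id", (pattern,))
    return [row[0] for row in rows]


def ids_decoded(tag):
    """Fetch every row, decode the JSON, and compare whole tags in application code."""
    rows = conn.execute("SELECT id, tags FROM post ORDER BY id")
    return [post_id for post_id, tags in rows if tag in json.loads(tags)]


tag = "html"
loose = ids_like("%" + tag + "%")     # also matches "xhtml"
quoted = ids_like('%"' + tag + '"%')  # the surrounding quotes pin down the whole tag
exact = ids_decoded(tag)

print(loose, quoted, exact)
```

Either LIKE form starts with a wildcard, so an ordinary index on tags cannot help it; every row is still read.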

preferred way to make a unique column id in sql database

Hi all, I wanted to know if there is a preferred way of creating a unique id within a SQL table. So far I have tried an auto-increment number, and this started to cause problems in future planning. I have also thought about using PHP's rand(), turning the result into a string, and inserting that into the database, but with the amount of data that will be in the database there is a higher chance of two numbers being the same.
I was just wondering if there are any suggestions for a preferred way to create a unique column_id.
I'm open to all suggestions. There are just a couple of requirements: the id needs to be easy to use in other tables in inner joins and left outer joins, and it will also be used as a file name for a download section and an upload section.
Use a GUID:
string com_create_guid ( void )
<?php
echo uniqid();
?>
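For comparison, a sketch of a random 128-bit identifier in Python, using the standard uuid module; this illustrates the GUID idea and is not a translation of either PHP call above. Note that PHP's manual states uniqid() is time-based and does not guarantee uniqueness. The new_id and upload_filename helpers are hypothetical names.

```python
import uuid


def new_id():
    """Random 128-bit id as 32 lowercase hex characters (fits a CHAR(32) column)."""
    return uuid.uuid4().hex


def upload_filename(record_id, extension):
    """Hex digits are safe in file names, so the id can be reused as-is."""
    return f"{record_id}.{extension}"


first = new_id()
second = new_id()
print(first, upload_filename(first, "zip"))
```

Because the value is a fixed-length string, it can be used as a key in joins like any other column, at the cost of a wider index than an integer.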

Loop through MySQL database until field = 'specified value'

I need some help please! Basically I have a system that has an unlimited amount of categories and the way in which it works is through unique IDs. So basically the system will find the root folder and match all subfolders based on its parent's UID. An endless loop...
But now I want to do the opposite of that in a single MySQL statement (if possible).
Basically I want it to do this.. (By the way this isn't my actual code, it's just how I want it to work)
SELECT UID FROM Table
WHERE UID = 'value'
--AND ALSO:
SELECT * FROM SameTable
WHERE UID = The Parent UID just fetched...
And do this until the UID = 'Specified Value'.
I seriously hope that makes sense!
Is it even possible? I could do it using multiple queries in a PHP loop I know, but that just feels like a long way around, and bad practice.
What you have is called "hierarchical data". You should read up on it. In short, there are three main ways to represent it in a two-dimensional table:
Adjacency list (what you have). You can hardly walk it with a single query.
Materialized path (my favorite). Natural and readable, though not so efficient.
Nested set. The most complicated, yet the most powerful.
You can choose any system you like or stick to your current one. A single query is not a Holy Grail to pursue at any cost.
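If the database supports recursive common table expressions (MySQL does from version 8.0, and SQLite does too), an adjacency list can be walked upward in one statement. The sketch below uses Python with sqlite3; the categories table, its uid and parent_uid columns, and the sample rows are assumptions, since the question does not show the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE categories (
        uid TEXT PRIMARY KEY,
        parent_uid TEXT REFERENCES categories(uid),
        name TEXT NOT NULL
    );
    INSERT INTO categories VALUES
        ('root', NULL, 'Root'),
        ('a', 'root', 'Electronics'),
        ('b', 'a', 'Phones'),
        ('c', 'b', 'Cases');
""")


def path_up(start_uid, stop_uid):
    """Follow parent links from start_uid until stop_uid (or the root) is reached."""
    rows = conn.execute(
        """
        WITH RECURSIVE chain(uid, parent_uid, depth) AS (
            SELECT uid, parent_uid, 0 FROM categories WHERE uid = :start
            UNION ALL
            SELECT c.uid, c.parent_uid, chain.depth + 1
            FROM categories c
            JOIN chain ON c.uid = chain.parent_uid
            WHERE chain.uid <> :stop      -- stop climbing once the target is in the chain
        )
        SELECT uid FROM chain ORDER BY depth
        """,
        {"start": start_uid, "stop": stop_uid},
    ).fetchall()
    return [row[0] for row in rows]


print(path_up("c", "a"))
```

On older MySQL versions without recursive CTEs, the PHP loop or one of the other two representations remains the practical option.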

MySQL headache, should I or should I not?

I have a classifieds website.
I am using SOLR for indexing and storing data. I also have a MySQL db with some more information about each classified, which I don't store or index in SOLR.
Now, I have a pretty normalized db with 4 tables.
Whenever ads are searched on the website, SOLR does the searching and returns an array of ID_numbers which will then be used to query mysql.
So solr returns id:s, which are then used to get all ads from the mysql db with THOSE id:s.
Now, all the JOIN and relations between my tables gives me a headache.
Apart from ease of maintenance, what do I gain from having a normalized db?
I could, you know, store all the info in one table with some 50 columns.
So instead of this for finding one ad and displaying it:
SELECT
category_option.option_name,
option_values.value
FROM classified, category_option, option_values
WHERE classified.classified_id=?id
AND classified.cat_id=category_option.cat_id
AND option_values.option_id=category_option.option_id
I could use this:
SELECT * FROM table_name WHERE classified_id = $classified_id
Isn't the last one actually faster?
Or does a normalized db perform faster?
Thanks
I would advise against denormalizing in your situation. You'll get better with joins as you use them more and they start to become clearer in your head, and maintenance ease is a good benefit for the future.
Here's a pretty good link about normalization (and denormalization). Here's a question about denormalization. One answer suggests creating a view using joins to get the data you need, and using that like your SELECT * FROM table_name WHERE classified_id = $classified_id query. A normalized DB will likely be slower, but it's unlikely you'll want to denormalize for that reason. I hope this provides some help.
Whenever you do denormalization you usually gain reading speed and lose write speed, because you have to write the same value many times. Additionally, extra care should be taken to maintain data integrity.
How many times will the query be executed?
Is this a high traffic application?
Can you add a cache?
The query using a JOIN is trivial as far as MySQL joins are concerned. I see no need to denormalize this.
I would however suggest rewriting it to not be such a PITA to read:
SELECT
category_option.option_name,
option_values.value
FROM classified
JOIN category_option USING (cat_id)
JOIN option_values USING (option_id)
WHERE classified.classified_id = ?
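The view idea mentioned in an earlier answer can be sketched like this, in Python with sqlite3 standing in for MySQL. The table names come from the question, but the column lists, the classified_id column on option_values, the view name classified_options, and the sample rows are all assumptions made so the example is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE classified (classified_id INTEGER PRIMARY KEY, cat_id INTEGER NOT NULL);
    CREATE TABLE category_option (
        option_id INTEGER PRIMARY KEY, cat_id INTEGER NOT NULL, option_name TEXT NOT NULL
    );
    CREATE TABLE option_values (
        classified_id INTEGER NOT NULL, option_id INTEGER NOT NULL, value TEXT NOT NULL
    );

    -- The join is written once here; callers query the view like a flat table.
    CREATE VIEW classified_options AS
    SELECT c.classified_id, co.option_name, ov.value
    FROM classified c
    JOIN category_option co ON co.cat_id = c.cat_id
    JOIN option_values ov
      ON ov.option_id = co.option_id AND ov.classified_id = c.classified_id;

    INSERT INTO classified VALUES (1, 10), (2, 10);
    INSERT INTO category_option VALUES (100, 10, 'Mileage'), (101, 10, 'Fuel');
    INSERT INTO option_values VALUES (1, 100, '42000'), (1, 101, 'Diesel'), (2, 100, '9000');
""")


def options_for(classified_id):
    """All (option_name, value) pairs for one ad, read through the view."""
    return conn.execute(
        "SELECT option_name, value FROM classified_options "
        "WHERE classified_id = ? ORDER BY option_name",
        (classified_id,),
    ).fetchall()


print(options_for(1))
```

The tables stay normalized for writes, while reads get the one-line query the question was hoping for.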

PHP join help with two tables

I am just learning php as I go along, and I'm completely lost here. I've never really used join before, and I think I need to here, but I don't know. I'm not expecting anyone to do it for me but if you could just point me in the right direction it would be amazing, I've tried reading up on joins but there are like 20 different methods and I'm just lost.
Basically, I hand coded a forum, and it works fine but is not efficient.
I have board_posts (for posts) and board_forums (for forums, the categories as well as the sections).
The part I'm redoing is how I get the information about the last post for the index page. To avoid using joins, I set it up to store the latest post's info in the board_forums table. So if there is a section called "Off Topic", it has fields for "forum_lastpost_username/userid/posttitle/posttime", which I update when a user posts, etc. But this is bad; I'm trying to grab it all dynamically and get rid of those fields.
Right now my query is just like:
SELECT * FROM board_forums WHERE forum_parent='$forum_id'
And then I have the stuff where I grab the info for that forum (name, description, etc) and all the data for the last post is there:
$last_thread_title = $forumrow["forum_lastpost_title"];
$last_thread_time = $forumrow["forum_lastpost_time"];
$lastpost_username = $forumrow["forum_lastpost_username"];
$lastpost_threadid = $forumrow["forum_lastpost_threadid"];
But I need to get rid of that and get it from board_posts. The way it's set up in board_posts is that if a post is a thread, post_parentpost is NULL; if it's a reply, that field holds the id of the thread (the first post of the topic). So I need to grab the latest post_date, see which user posted it, THEN check whether post_parentpost is NULL. If it's NULL, the last post is a new thread, so I can get the title and user right there. If it's not, I need to get the info (title, id) of the first post in that thread, which can be found by taking the post_parentpost value, looking up that ID, and reading the title from it.
Does that make any sense? If so please help me out :(
Any help is greatly appreciated!!!!
Updating board_forums whenever a post or a reply is inserted is, as far as performance goes, not the worst idea. To display the index page you only have to select data from one table, board_forums. This is definitely much faster than querying a second table to get the last post's information, even with a clever join.
You are better off just updating the stats on each action, New Post, Delete Post etc.
The other instances would not likely require any stats update (deletion of a thread would trigger a forum update, to show one less topic in the topic count).
Think about all the actions a user can take. In most cases you don't need to update any stats, so computing the counts on the fly on every page view is very inefficient, and you are right to think so.
It looks like you've already done the right thing.
If you were to join, you'd do it like this:
SELECT * FROM board_forums
JOIN board_posts ON board_posts.forum_id = board_forums.id
WHERE forum_parent = '$forum_id'
The problem with that, is that it gets you every post, which is not useful (and very slow). What you would want to do is something like this
SELECT * FROM board_forums
JOIN board_posts ON board_posts.forum_id = board_forums.id ORDER BY board_posts.id desc LIMIT 1
WHERE forum_parent = '$forum_id'
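except SQL doesn't work like that: ORDER BY and LIMIT come after WHERE and apply to the whole result, not to a single join. Picking only the newest post per forum takes a subquery or derived table keyed on each forum's highest post id, which is more machinery than the columns you already maintain.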
In short, don't worry. Use joins for the actual case where you do want to load all forums and all posts in one hit.
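For completeness, the "newest post per forum" lookup can be expressed as one query by joining each forum to the post with its highest id and then, for replies, to the parent thread. This is a sketch in Python with sqlite3 standing in for MySQL; the column names forum_name, forum_id, post_title and post_username are assumptions based on the question's description, and it treats the highest post id as the latest post.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE board_forums (id INTEGER PRIMARY KEY, forum_parent INTEGER, forum_name TEXT);
    CREATE TABLE board_posts (
        id INTEGER PRIMARY KEY,
        forum_id INTEGER NOT NULL,
        post_parentpost INTEGER,          -- NULL for a thread, thread id for a reply
        post_title TEXT,
        post_username TEXT
    );
    INSERT INTO board_forums VALUES
        (1, 0, 'General'), (2, 1, 'Off Topic'), (3, 1, 'Help'), (4, 1, 'Empty');
    INSERT INTO board_posts VALUES
        (1, 2, NULL, 'First thread', 'alice'),
        (2, 2, 1, 'Re: First thread', 'bob'),
        (3, 3, NULL, 'Need help', 'carol');
""")

LAST_POST_SQL = """
    SELECT f.forum_name,
           p.post_username,
           COALESCE(t.post_title, p.post_title) AS thread_title,
           COALESCE(t.id, p.id) AS thread_id
    FROM board_forums f
    LEFT JOIN board_posts p
           ON p.id = (SELECT MAX(id) FROM board_posts WHERE forum_id = f.id)
    LEFT JOIN board_posts t ON t.id = p.post_parentpost   -- parent thread, if a reply
    WHERE f.forum_parent = ?
    ORDER BY f.id
"""


def last_posts(parent_forum_id):
    """One row per child forum: name, last poster, thread title, thread id."""
    return conn.execute(LAST_POST_SQL, (parent_forum_id,)).fetchall()


for row in last_posts(1):
    print(row)
```

Whether this beats the stored forum_lastpost_* columns depends on traffic; it needs an index on board_posts.forum_id to stay cheap, and the denormalized columns still avoid the extra lookups entirely.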
The simple solution will result in numerous queries, some of them optional, as you've already discovered.
The classic approach to this is to cache the results, and only retrieve it once in a while. The cache doesn't have to live long; even two or three seconds on a busy site will make a significant difference.
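A short-lived cache of this kind can be sketched in a few lines. The example is in Python; the TTLCache class and the load_last_posts function are hypothetical names, and the loader stands in for whatever queries build the index page data. A fake clock is injected so the expiry behaviour is visible without waiting.

```python
import time


class TTLCache:
    """Keep each computed value for a fixed number of seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (stored_at, value)

    def get(self, key, loader):
        """Return the cached value, calling loader() only on a miss or after expiry."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        value = loader()
        self._entries[key] = (now, value)
        return value


calls = []


def load_last_posts():
    """Stand-in for the expensive queries; records how often it runs."""
    calls.append(1)
    return {"Off Topic": "First thread"}


fake_now = [0.0]
cache = TTLCache(ttl_seconds=3, clock=lambda: fake_now[0])

cache.get("index", load_last_posts)  # miss: runs the loader
fake_now[0] = 2.0
cache.get("index", load_last_posts)  # still fresh: served from the cache
fake_now[0] = 5.5
cache.get("index", load_last_posts)  # older than 3 seconds: runs the loader again
print(len(calls))
```

In a PHP deployment the store would need to be shared between requests (a cache file, APCu, or memcached, for instance) rather than held in a per-process dictionary.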
De-normalizing the data into a table you're already reading anyway will help. This approach saves you figuring out optional queries and can be a bit of a cheap win because it's just one more update when an insert is already happening. But it shifts some data integrity to the application.
As an aside, you might be running into the recursive-query problem with your threads. Relational databases do not store hierarchical data all that well if you use a "simple" algorithm. A better way is something sometimes called 'set trees'. It's a bit hard to Google, unfortunately, so here are some links.
