PHP and MySQL: data management for a classified-ads website

A visitor opens a URL, for example
/transport/cars/audi/a6
or
/real-estate/flats/some-city/city-district
I plan a separate table for cars and real estate (a separate table for each top-level category).
Based on the URL (PHP explode creates an array):
$array[0] - from this value I know which table to SELECT from,
and so on with $array[1], $array[2] ...
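A minimal sketch of that routing step (the table names in the whitelist are assumptions, not from the question):

<?php
// Map the first URL segment to a table; anything not whitelisted is rejected.
// The whitelist also prevents SQL injection through the table name.
$path = trim(parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH), '/');
$segments = explode('/', $path); // e.g. ['real-estate', 'flats', 'some-city', 'city-district']

$tables = [
    'transport'   => 'Transport',
    'real-estate' => 'RealEstate',
];

if (!isset($tables[$segments[0]])) {
    http_response_code(404);
    exit;
}
$table = $tables[$segments[0]];
// The remaining segments become WHERE-clause values, bound as parameters.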
For example, the RealEstate table may look like:
IdOfAd | RealEstateType | Location1 | Location2     | TextOfAd | and so on
---------------------------------------------------------------------------
1      | flat           | City1     | CityDistrict1 | text..   |
2      | land           | City2     | CityDistrict2 | text..   |
And the MySQL query to display ads would be like:
SELECT `TextOfAd`, `and so on...`
FROM `RealEstate`
WHERE RealEstateType = ? AND Location1 = ? AND Location2 = ?
// and possibly additional AND .. AND
LIMIT $start, $limit
Thinking about performance: hopefully, after some time, the number of active ads will be high. (I also plan not to delete expired ads, just to change a column value to 0 so they are excluded from the SELECT, but still display when visited directly from a search engine.)
What do I need to do (change the database design, or SELECT in some other way) if, for example, the number of rows in the table grows to 100,000 or millions?
I'm also thinking about moving expired ads to another table (for which performance is not important). For example, a user comes from a search engine to a URL with an expired ad. First select from the main table; if nothing is found, then select from the expired-ads table. Is this some kind of solution?
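A minimal sketch of that fallback lookup, assuming PDO and hypothetical table names RealEstate and RealEstateArchive:

<?php
// Try the live table first; fall back to the archive for expired ads
// reached directly from a search engine. $pdo is an existing PDO handle.
function findAd(PDO $pdo, int $adId): ?array
{
    foreach (['RealEstate', 'RealEstateArchive'] as $table) {
        $stmt = $pdo->prepare("SELECT * FROM `$table` WHERE IdOfAd = ?");
        $stmt->execute([$adId]);
        $ad = $stmt->fetch(PDO::FETCH_ASSOC);
        if ($ad !== false) {
            return $ad;
        }
    }
    return null; // not found in either table
}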

Two hints:
Use ENGINE=InnoDB when creating your table. InnoDB uses row-level locking, which is MUCH better for bigger tables, as this allows rows to be read much faster, even when you're updating some of them.
ADD INDEX on suitable columns. Indexing big tables can reduce search times by several orders of magnitude. Indexes are easy to forget and a pain to debug! More than once I've investigated a slow query, realised I'd forgotten a suitable INDEX, added one, and seen immediate results on a query that used to take 15 seconds to run.
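A short sketch of both hints together, using the question's RealEstate table (the exact index columns are an assumption based on the WHERE clause above):

-- InnoDB for row-level locking; index the columns the WHERE clause filters on.
ALTER TABLE RealEstate ENGINE=InnoDB;
ALTER TABLE RealEstate
    ADD INDEX idx_type_loc (RealEstateType, Location1, Location2);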

Related

Three simultaneous SQL queries in one table

I have the following table structure:
-----------------------------------
ID | Acces | Group | User | Gate  |
-----------------------------------
1  | 1     | TR    | tsv  | TL-23 |
-----------------------------------
And I have a page with 3 functions:
Select group to see all gates where selected group has access.
Select gate to see all groups which have access to selected gate.
Select group to see all users that belong to selected group.
So basically:
SELECT Gate WHERE Group = 'TR'
SELECT Group WHERE Gate = 'TL-23'
SELECT User WHERE Group = 'TR'
What I am trying to achieve is: the user should be able to run all three queries in any order without the results of the former queries disappearing.
Now, I know multi-threading is no longer possible in PHP, but there must be a way to temporarily save the results of a specific query until the same query is made again.
Any help and suggestions would be appreciated.
Firstly, PHP has never done multi-threading out of the box, but it can (still) with the use of the pcntl extensions.
That said, that isn't the sauce you seek.
If you simply want the user on the front end to interact three separate times, without having to hit the DB once the first time and twice the second (to redo both the first and the new query), you may benefit from caching the results of each call in the user's session.
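A minimal sketch of that session-caching idea, assuming PDO (the table name access in the usage comment is hypothetical):

<?php
session_start();

// Cache each query's result under a key derived from the SQL + parameters,
// so re-running the same query is served from the session, not the DB.
function cachedQuery(PDO $pdo, string $sql, array $params): array
{
    $key = 'qcache_' . md5($sql . serialize($params));
    if (!isset($_SESSION[$key])) {
        $stmt = $pdo->prepare($sql);
        $stmt->execute($params);
        $_SESSION[$key] = $stmt->fetchAll(PDO::FETCH_ASSOC);
    }
    return $_SESSION[$key];
}

// Usage: each lookup keeps its results between page loads.
// $gates  = cachedQuery($pdo, 'SELECT Gate FROM access WHERE `Group` = ?', ['TR']);
// $groups = cachedQuery($pdo, 'SELECT `Group` FROM access WHERE Gate = ?', ['TL-23']);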
If you actually want to make the 3 queries in a relational way at one exact time, try JOINs.
If you simply want to make all 3 separate queries at (what is essentially) the same time, look into TRANSACTIONs.
Hope that helps.

Restructured database, using SQL in phpmyadmin to move data around

I've recently been working on normalizing and restructuring my database to make it more effective in the long run. Currently I have around 500 records, and obviously I don't want to lose the users' data.
I assume SQL through phpMyAdmin is the easiest way to do this?
So let me give you guys an example
In my old table I would have something like this
records //this table has misc fields, but they are unimportant right now
id | unit |
1 | g |
With my new one, I have it split apart 3 different tables.
records
id
1
units
id | unit
1 | g
record_units
id | record_id | unit_id
1 | 1 | 1
Just to be clear, I am not adding anything into the units table. The table is there as a reference for which id to store in the record_units table.
As you can see, it's pretty simple. What changed in the new layout is that I started using a lookup table to hold my units, since they would be repeated quite often. I then store that unit id and the paired record id in the record_units table, so I can later retrieve the fields.
I am not incredibly experienced with SQL, though I'd say my knowledge is average. I know this operation would be quite simple to do with my CakePHP setup, because all my associations are already set up, but I can't do that.
If I understand correctly, you want to copy related records from your old table to the new tables, in which case you can use something like this:
UPDATE units u
INNER JOIN records r ON u.id = r.id
SET u.unit = r.unit
This will copy the unit type from your old table to the matching id in the new units table, and then you can do something similar for your third table.
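If the goal is instead to fill the new lookup tables from scratch, a hedged sketch (assuming the old records table still holds its unit column and units.id is AUTO_INCREMENT):

-- Populate the lookup table with each distinct unit once.
INSERT INTO units (unit)
SELECT DISTINCT unit FROM records;

-- Link every record to its unit's new id.
INSERT INTO record_units (record_id, unit_id)
SELECT r.id, u.id
FROM records r
JOIN units u ON u.unit = r.unit;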

mysql table rows limit?

I'm new to MySQL and I have coded my first PHP-MySQL application.
I have a list of 30 devices from which I want users to add their preferred devices to their accounts.
So I created MySQL table "A", where device IDs are stored under a particular UserID like this:
UserID | Device Id
1 | 33
1 | 21
1 | 52
2 | 12
2 | 45
3 | 22
3 | 08
1 | 5
more.....
Say I have 5,000 user IDs and 30 device IDs,
and each user has 15 device IDs on average under his account records.
Then there will be 5,000 × 15 = 75,000 records in table "A".
So, my question: is there any limit on how many records we can store in a MySQL table?
Is my approach for storing records as mentioned above correct? Will it affect query performance as more users are added?
Or is there a better way to do this?
It's very unlikely that you will approach the limitations of a MySQL table with two columns that are only integers.
If you're really concerned about query performance, you can just go ahead and put an index on both columns. The cost of inserting/updating your table will likely be negligible even with an index on the device ID. If your database gets huge, an index can speed up queries such as "which users prefer this device"; your queries asking "which devices does this user prefer" will also be fast with an index on the user column.
I would just say make this table a simple two-column table with a two-part composite key (indexed). This way it will be as atomic as possible and won't require any of the tomfoolery that some may suggest to "increase performance".
Keep it atomic and normal -- your performance will be fine and you won't exceed any limitations of your DBMS.
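A minimal sketch of that two-column table with a composite primary key (the table and index names are assumptions):

CREATE TABLE user_devices (
    user_id   INT NOT NULL,
    device_id INT NOT NULL,
    PRIMARY KEY (user_id, device_id),  -- serves "devices for a user"
    KEY idx_device (device_id)         -- serves "users for a device"
) ENGINE=InnoDB;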
I don't see anything wrong with it. If you are looking to save some server space, don't worry about it; let your database do the underlying job. Index your database properly, with an ID: int(10), primary, auto-increment. Think about scalability when it is needed. Your first target should be to complete the application you are making, then test it. If you find that it causes any lag or problem, then start worrying about how to solve it. Don't bother yourself with things that you probably might not even face.
But considering the scale of your application (75,000 to 100,000 records), it shouldn't be much of a task. Alternatively, you can have a schema like this for your users:
(device_table)
device_id
23
45
56

user_id | device_id
-------------------------
1       | 23,45,67,45,23
2       | 45,67,23,45
That is, store the device_ids as a comma-separated list, then get the device_ids for a particular user as:
$device_for_user = explode(',', $device_id);
where of course $device_id is retrieved from the MySQL database.
So you'll have:
$device_for_user[0] = 23
$device_for_user[1] = 45
and so on.
But this method isn't a very good design or approach; just for your information, it is one way of doing it.

High-performance multi-tier tag filtering

I have a large database of artists, albums, and tracks. Each of these items may have one or more tags assigned via glue tables (track_attributes, album_attributes, artist_attributes). There are several thousand (or even hundred thousand) tags applicable to each item type.
I am trying to accomplish two tasks, and I'm having a very hard time getting the queries to perform acceptably.
Task 1) Get all tracks that have any given tags (if provided) by artists that have any given tags (if provided) on albums with any given tags (if provided). Any set of tags may be absent (e.g. only a track tag is active, no artist or album tags).
Variation: the results are also presentable by artist or by album rather than by track.
Task 2) Get a list of tags that are applied to the results from the previous filter, along with a count of how many tracks have each given tag.
What I am after is some general guidance on approach. I have tried temp tables, inner joins, IN(); all my efforts thus far result in slow responses. A good example of the results I am after can be seen here: http://www.yachtworld.com/core/listing/advancedSearch.jsp, except they only have one tier of tags; I am dealing with three.
Table structures:
Table: attribute_tag_groups
Column | Type |
------------+-----------------------------+
id | integer |
name | character varying(255) |
type | enum (track, album, artist) |
Table: attribute_tags
Column | Type |
--------------------------------+-----------------------------+
id | integer |
attribute_tag_group_id | integer |
name | character varying(255) |
Table: track_attribute_tags
Column | Type |
------------+-----------------------------+
track_id | integer |
tag_id | integer |
Table: artist_attribute_tags
Column | Type |
------------+-----------------------------+
artist_id | integer |
tag_id | integer |
Table: album_attribute_tags
Column | Type |
------------+-----------------------------+
album_id | integer |
tag_id | integer |
Table: artists
Column | Type |
------------+-----------------------------+
id | integer |
name | varchar(350) |
Table: albums
Column | Type |
------------+-----------------------------+
id | integer |
artist_id | integer |
name | varchar(300) |
Table: tracks
Column | Type |
-------------+-----------------------------+
id | integer |
artist_id | integer |
album_id | integer |
compilation | boolean |
name | varchar(300) |
EDIT: I am using PHP, and I am not opposed to doing any sorting or other hijinks in script; my #1 concern is speed of return.
If you want speed, I would suggest you look into Solr/Lucene. You can store your data and have very speedy lookups by calling Solr and parsing the result from PHP. As an added benefit you get faceted searches as well (which is task 2 of your question, if I interpret it correctly). The downside is of course that you might have redundant information (once stored in the DB, once in the Solr document store). And it does take a while to set up (well, you could learn a lot from the Drupal Solr integration).
Just check out the PHP reference docs for Solr.
Here's an article on how to use Solr with PHP, just in case: http://www.ibm.com/developerworks/opensource/library/os-php-apachesolr/.
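A minimal sketch with the PECL Solr extension (the host, core path, and field names are assumptions):

<?php
// Query Solr for tracks tagged 'rock' and ask for a tag facet,
// which covers the "count of tracks per tag" part of task 2.
$client = new SolrClient([
    'hostname' => 'localhost',
    'port'     => 8983,
    'path'     => '/solr/tracks', // hypothetical core
]);

$query = new SolrQuery();
$query->setQuery('track_tags:rock');
$query->setStart(0);
$query->setRows(20);
$query->setFacet(true);
$query->addFacetField('track_tags');

$response = $client->query($query)->getResponse();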
You probably should try to denormalize your data. Your structure is optimised for insert/update load, but not for queries. As I understand it, you will have many more SELECT queries than INSERT/UPDATE queries.
For example, you can do something like this:
Store your data in the normalized structure.
Create an aggregate table like this:
track_id | artist_tags | album_tags | track_tags
1        | jazz/pop/   | jazz/rock  | /heavy-metal/
or
track_id | artist_tags | album_tags | track_tags
1        | 1/2/        | 1/3        | 4/
To speed up search, you probably should create a FULLTEXT index on the *_tags columns.
Query this table with SQL like:
SELECT * FROM aggregate WHERE MATCH (album_tags) AGAINST ('rock')
Rebuild this table incrementally once a day.
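A sketch of that aggregate table (column sizes are assumptions; note that InnoDB only gained FULLTEXT in MySQL 5.6, so older versions would need MyISAM here):

CREATE TABLE aggregate (
    track_id    INT NOT NULL PRIMARY KEY,
    artist_tags TEXT,
    album_tags  TEXT,
    track_tags  TEXT,
    FULLTEXT KEY ft_artist (artist_tags),  -- one FULLTEXT index per column,
    FULLTEXT KEY ft_album  (album_tags),   -- so each can be matched alone
    FULLTEXT KEY ft_track  (track_tags)
) ENGINE=InnoDB;

-- Example lookup: tracks whose album carries the 'rock' tag.
SELECT track_id FROM aggregate
WHERE MATCH (album_tags) AGAINST ('rock');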
I think the answer greatly depends on how much money you wish to spend on your project - some tasks are even theoretically impossible to accomplish under strict conditions (for example, that you must use only one weak server). I will assume that you are ready to upgrade your system.
First of all, your table structure forces JOINs - I think you should avoid them if possible when writing high-performance applications. I don't know what "attribute_tag_groups" is, so I propose a table structure: tag (varchar 255), id (int), id_type (enum (track, album, artist)). Id can be artist_id, track_id or album_id depending on id_type. This way you will be able to look up all your data in one table, but of course it will use much more memory.
Next, you should consider using several databases. It will help even more if each database contains only part of your data (each lookup will be faster). Deciding how to spread your data between databases is usually a rather hard task: I suggest you gather some statistics about tag length, find ranges of length that give similar track/artist result counts, and hard-code them into your lookup code.
Of course you should consider MySQL tuning (I am sure you did that, but just in case) - all your tables should reside in RAM; if that is impossible, try to get SSD discs, RAIDs, etc. Proper indexing and database types/settings are really important too (MySQL may even show some bottlenecks in its internal statistics).
This suggestion may sound mad, but sometimes it is good to let PHP do some calculations that MySQL could do itself. MySQL databases are much harder to scale, while a server for PHP processing can be added in a matter of minutes. And different PHP threads can run on different CPU cores, which MySQL has problems with. You can increase your PHP performance by using some advanced modules (you can even write them yourself - profile your PHP scripts and hard-code the bottlenecks in fast C code).
Last, but I think most important: you must use some type of caching. I know that it is really hard, but I don't think there has been any big project without a really good caching system. In your case some tags will surely be much more popular than others, so caching should greatly increase performance. Caching is a form of art - depending on how much time you can spend on it and the resources available, you can make 99% of all requests use the cache.
Using other databases/indexing tools may help, but you should always consider the theoretical query-speed comparison (O(n), O(n log n), ...) to understand whether they can really help you - sometimes these tools give you a low performance gain (like a constant 20%), yet they complicate your application design, and most of the time that is not worth it.
From my experience, most 'slow' MySQL databases lack correct indexes and/or queries. So I would check these first:
Make sure every data table's id field is a primary key. Just in case.
For all data tables, create an index on the foreign-key id fields and then the id, so that MySQL can use it in searches.
For your glue tables, set a primary key on the two fields, first the subject, then the tag - this is for normal browsing. Then create a normal index on the tag id - this is for searching.
Still slow? Are you using MyISAM for your tables? It is designed for quick queries.
If it's still slow, run EXPLAIN on a slow query and post both the query and the result in the question, preferably with an importable SQL dump of your complete database structure.
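A short sketch of the glue-table indexing this answer describes, using track_attribute_tags from the question:

-- Composite primary key for browsing by track; secondary index for
-- searching by tag.
ALTER TABLE track_attribute_tags
    ADD PRIMARY KEY (track_id, tag_id),
    ADD INDEX idx_tag (tag_id);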
Things you may give a try:
Use a query analyzer to explore the bottlenecks of your queries. (Most of the time the underlying DBMS does quite an amazing job of optimizing.)
Your table structure is well normalized, but personal experience has shown me that you can achieve much greater performance with structures that let you avoid joins and subqueries. For your case I would suggest storing the tag information in one field. (This requires support from the underlying DBMS.)
So far.
Check your indices, and whether they are used correctly. Maybe MySQL isn't up to the task; PostgreSQL should be similar to use but has better performance in complex situations.
On a completely different track, google "map-reduce" and use one of these fancy new NoSQL databases for really, really large data sets. They can do distributed search on multiple servers in parallel.

How do I add "friends" function in PHP?

Hello Stack Overflow, I need help with this problem.
I have a flat-file database in PHP; it records users, hobbies, favourite movies and all that.
Now I want to add a buddy system so they can have friends and send messages to each other in PHP, without SQL.
Consider having another table (er, flat file?) that maintains links. "Mark" and "John" are buddies if there exists a row in this table (ff?) that links "Mark" and "John". I'd recommend using some sort of index (you know, like a primary key).
Suppose you have a users table (or flat file, whatever, it doesn't matter that much) that contains users and some data, it looks like this:
UID | Username | Hobbies
------------------------
1 | Mark | Swimming, Sailing, Skiing
2 | John | Biking, Paragliding
3 | Suzie | Flying, Skiing
And you have this other friends table (again, flat file, whatever...):
Pair ID | A | B
----------------
1 | 1 | 2
2 | 2 | 3
We've encoded in this friends table that Mark and John are friends, and that John and Suzie are friends, but with the absence of a relation between 1 and 3, we know that Mark and Suzie are not friends (at least according to our records).
Note that if you want to get all of John's friends, you have to find all rows in your table (or file) that have John's UID (here, 2) in either column A or column B.
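A minimal flat-file sketch of that two-sided lookup (the file format - one "A|B" pair per line - is an assumption):

<?php
// friends.txt holds one pair per line, e.g. "1|2".
// A user's friends are the other halves of every pair containing them.
function friendsOf(int $uid, string $file = 'friends.txt'): array
{
    $friends = [];
    foreach (file($file, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) as $line) {
        [$a, $b] = array_map('intval', explode('|', $line));
        if ($a === $uid) {
            $friends[] = $b;
        } elseif ($b === $uid) {
            $friends[] = $a;
        }
    }
    return $friends;
}

// friendsOf(2) -> [1, 3]  (John is friends with Mark and Suzie)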
Well, I am afraid there's no magical answer or magical PHP function you can call to enable this behaviour.
If we are to help you at all, we really need some more to work with.
If you really, for mysterious reasons, decide to stick without an SQL database, then I would probably still "tilt" towards an SQL-like way of storing it. Assuming you're currently storing each user as a row in a file, with each "field" separated by some character, simply add another "field" to the file containing the "id" of every user they're friends with (id being whatever you use for that; it could be a name, as long as it is unique).
As for the messages, another flat file describing the message itself, the sender and the recipient would probably be the way to go.
Now the real question is: why so eager to avoid using an SQL database? If it is because of having to install a database, try SQLite.
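A minimal sketch of that SQLite route via PDO (the file name and schema are assumptions); no server install is needed, since the database is a single file:

<?php
$db = new PDO('sqlite:' . __DIR__ . '/buddies.db');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// One row per friendship pair, mirroring the flat-file layout above.
$db->exec('CREATE TABLE IF NOT EXISTS friends (
    a INTEGER NOT NULL,
    b INTEGER NOT NULL,
    PRIMARY KEY (a, b)
)');

// Record that users 1 and 2 are buddies.
$stmt = $db->prepare('INSERT OR IGNORE INTO friends (a, b) VALUES (?, ?)');
$stmt->execute([1, 2]);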
