Searching a huge social Database

I am implementing search for my social networking website and have run into a problem.
Say user "A" searches for "John". If John is a friend of user "A", or a friend of a friend, or a friend of a friend of a friend (the recursion goes up to 100 levels), the search will find him.
But what if user "A" has just signed up, has no friends, and searches for "John", so that I have nothing to filter the records by? I would have to search my entire database for "John" (of course, I limit the result to 5 rows with the MySQL LIMIT clause).
Is this method efficient, or is there something else I can do to avoid the problem?
Also, First Name, Middle Name and Last Name cannot be set as an index since they are not unique. So I am searching a non-indexed column (without pre-indexing, which I think Google does) using MySQL LIKE. What should I do to improve performance?
I am using MySQL and PHP. Thanks in advance.

Here are some of the problems I see with what you are doing.
With LIMIT 5 on your query, you will always show the first or last 5 Johns that the query finds (depending on the query). I would ask new users to add their first few friends by email (which is unique) or by first and last name (not unique).
Once the new user has several friends, you will be able to provide better search results and avoid always showing the same people to everyone who searches for the same keyword or name.
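On the indexing point raised in the question: an ordinary index does not require unique values, and in MySQL a B-tree index can serve a LIKE pattern that has no leading wildcard ('John%'), though not one that starts with '%'. Below is a minimal sketch of that idea, written in Python with the standard sqlite3 module standing in for MySQL so it can be run as-is. The table layout, the index name and the search_by_prefix helper are assumptions made for illustration, not something from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)"
)
conn.executemany(
    "INSERT INTO users (first_name, last_name) VALUES (?, ?)",
    [("John", "Smith"), ("John", "Doe"), ("Johnny", "Ray"), ("Alice", "Johnson")],
)

# A plain (non-unique) index: duplicate first names are perfectly fine here.
conn.execute("CREATE INDEX idx_users_first_name ON users (first_name)")


def search_by_prefix(prefix, limit=5):
    """Return up to `limit` users whose first name starts with `prefix`."""
    rows = conn.execute(
        "SELECT id, first_name, last_name FROM users "
        "WHERE first_name LIKE ? ORDER BY id LIMIT ?",
        (prefix + "%", limit),  # wildcard only at the end of the pattern
    )
    return rows.fetchall()


matches = search_by_prefix("John")
for row in matches:
    print(row)
```

Whether the engine actually uses the index for a given query is worth checking with EXPLAIN on the real MySQL server rather than taking this sketch's word for it.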

Related

Optimizing the friend-relationship storage in MySQL

Assumptions
If A is a friend of B, B is also a friend of A.
I searched for this question and there are already lots of questions on Stack Overflow. But all of them suggest the same approach.
They create a table friend with three columns: from, to and status. This serves both purposes: it records who sent the friend request, and who are friends once the status is accepted.
But this means that if there are m users and each user has n friends, I will have mn rows in the friends table.
What I was thinking is to store the friends list in a text column. Each user would have a single row with a friends column holding the IDs of all accepted friends, separated by a character, say |, which I can explode to get the full friends list. Similarly, I would have another column named pending requests. When a request is accepted, the ID moves from the pending requests column to the friends column.
This should significantly reduce the number of entries in the table and the search time.
The only overhead is deleting a friend: I would have to retrieve the friend string, find the ID of the friend to be deleted, remove it and update the column. However, this is almost negligible if I assume a user cannot have more than 2000 friends.
I assume I am forgetting some situations or that this approach has pitfalls, so please correct me if so.

The answer is NO! Do not try to implement this idea; it is a complete disaster.
Here is why, in more detail:
Relations. You are storing just keys separated with |. What if you want to display a list of your friends' names? You will have to fetch the list, explode it and make another n queries to the DB. With a relation table (from | to | status) you can do that with one JOIN.
Deletions. Just horrible.
Inserts. For every insert you will need a SELECT + UPDATE instead of a single INSERT.
Types. You should keep items in the DB as they are, so integers as integers. Converting ints to strings and back can introduce errors and bugs.
No ORM support. In the future you will probably leave plain PHP for some framework. Keep in mind that none of them will support your idea.
Search time?
Please do some tests. A search with WHERE on a PRIMARY KEY is very fast.
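To illustrate the "one JOIN" point, here is a minimal sketch in Python using the standard sqlite3 module in place of MySQL. The column names from_id and to_id (instead of from and to, which are awkward as SQL identifiers), the sample users and the friend_names and unfriend helpers are all assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE friends (
    from_id INTEGER NOT NULL REFERENCES users(id),  -- who sent the request
    to_id   INTEGER NOT NULL REFERENCES users(id),  -- who received it
    status  TEXT NOT NULL,                          -- 'pending' or 'accepted'
    PRIMARY KEY (from_id, to_id)
);
INSERT INTO users (id, name) VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy'), (4, 'Dee');
INSERT INTO friends VALUES (1, 2, 'accepted'), (3, 1, 'accepted'), (1, 4, 'pending');
""")


def friend_names(user_id):
    """Names of all accepted friends, fetched with a single JOIN."""
    rows = conn.execute(
        """
        SELECT u.name
        FROM friends AS f
        JOIN users AS u
          ON u.id = CASE WHEN f.from_id = :uid THEN f.to_id ELSE f.from_id END
        WHERE f.status = 'accepted'
          AND (f.from_id = :uid OR f.to_id = :uid)
        ORDER BY u.name
        """,
        {"uid": user_id},
    )
    return [name for (name,) in rows]


def unfriend(a, b):
    """Deleting a friendship is one DELETE, whichever side sent the request."""
    conn.execute(
        "DELETE FROM friends WHERE (from_id = :a AND to_id = :b) "
        "OR (from_id = :b AND to_id = :a)",
        {"a": a, "b": b},
    )


print(friend_names(1))
```

Because the friendship is symmetric, the query checks both columns; the pending row for Dee is left out by the status filter.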

MySQL : For big storage, should I use a single heavy column or a table with thousand of rows?

I am building a like system for a website and I'm facing a dilemma.
I have a table where all the items which can be liked are stored. Call it the "item table".
To keep the server fast, should I:
add a column to the item table?
That means I have to search (with a regex in my PHP) inside a string holding the IDs of all the users who have liked the item, each time a user likes an item, to check whether that user has already liked it. If so, I show a different button in my HTML.
Problem > If an item gets (by chance) 3000 likes, I fear the string will become very big and heavy to regex each time there is a like on it...
add a new table (LikedBy) and record each like separately, with the ID of the liker, the name of the item and the state of the like (liked or not)?
Problem > In this case, I fear for the MySQL server, which has thousands of rows to analyze each time a new user likes a popular item...
Server version: 5.5.36-cll-lve MySQL Community Server (GPL) by Atomicorp
Should I put the load on the PHP script or on the MySQL database? Which performs better (and scales)?
If for some reason my question does not make sense, could anyone tell me the right way to do this?
Thanks.

You have to create another table, call it likes_table, containing id_user int and id_item int. That's how it should be done; if you go with your first proposed solution, your database won't be normalized and you'll face many issues in the future.
To get the like count for an item:
SELECT COUNT(*) FROM likes_table WHERE id_item='id_item_you_are_looking_for';
To get what a given user liked:
SELECT id_item FROM likes_table WHERE id_user='id_user_you_are_looking_for';
No regex needed, and your database is well normalized so data can be found easily. You can tell MySQL to index id_user and id_item, making the pair unique in likes_table; that way your queries will run much faster.

With MySQL you can set the user ID and the item ID as a unique pair. This should improve performance by a lot.
Your table would have these 2 columns: item id, and user id. Every row would be a like.
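A minimal sketch of the unique-pair table described in the answers above, using Python's standard sqlite3 module as a stand-in for MySQL. The helper names (like, has_liked, like_count) and the extra index on id_item are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE likes_table (
    id_user INTEGER NOT NULL,
    id_item INTEGER NOT NULL,
    PRIMARY KEY (id_user, id_item)   -- one row per (user, item) pair
)""")
# Secondary index so that counting the likes of one item does not scan the table.
conn.execute("CREATE INDEX idx_likes_item ON likes_table (id_item)")


def like(id_user, id_item):
    """Record a like; a repeated like by the same user is silently ignored.

    INSERT OR IGNORE is SQLite syntax; MySQL spells it INSERT IGNORE.
    """
    conn.execute(
        "INSERT OR IGNORE INTO likes_table (id_user, id_item) VALUES (?, ?)",
        (id_user, id_item),
    )


def has_liked(id_user, id_item):
    """Primary-key lookup that decides which button to show."""
    row = conn.execute(
        "SELECT 1 FROM likes_table WHERE id_user = ? AND id_item = ?",
        (id_user, id_item),
    ).fetchone()
    return row is not None


def like_count(id_item):
    row = conn.execute(
        "SELECT COUNT(*) FROM likes_table WHERE id_item = ?", (id_item,)
    ).fetchone()
    return row[0]


like(1, 10)
like(1, 10)   # duplicate, ignored because of the unique pair
like(2, 10)
print(like_count(10), has_liked(1, 10), has_liked(3, 10))
```

The "has this user already liked it" check becomes an indexed lookup on the pair, so its cost does not depend on how many other users liked the same item.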

MySql and PHP - Get the top value in a "tree", based on a lower value

I really don't know how to word this, so the "similar questions" and searching aren't doing much good.
I've got a table of Users, and each User is assigned a "level" based on where they are in the org chart. It's sales-based, so you have "Sellers", "Sales Managers", "Regional Managers", etc. The way I set up the database is to have a join table with the user_id and the manager_user_id paired up. So a Seller with id 4 is paired with a Manager whose id is 12, and 12 is matched with a Regional Manager whose id is 34, and so on.
What I'd like to do is get the topmost Manager's information based on the lower Seller's id. So in the example above, I input id #4 and get the info for user_id 34.
I know I could do a loop, and keep returning the manager_user_id until it returns no rows... but that would mean 5 separate queries. Is there a single query that could accomplish this?
If a similar question has been asked before, that would be great too. Thanks!

Not with the database structure you are describing. More full-featured DBMSs allow you to do recursive queries, but MySQL (before version 8.0) does not have this capability. You could write a stored procedure that does it, which lets you get the topmost manager with a single SQL statement issued from your application (the logic for determining the topmost manager being handled by the DBMS).
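For readers whose DBMS does accept recursive queries (SQLite does, and MySQL added WITH RECURSIVE in 8.0), the walk up the chain can be sketched as a recursive common table expression. The example below runs against SQLite through Python's standard sqlite3 module. The table name user_managers and the top_manager helper are assumptions, and the sketch assumes the chain has no cycles, since a cycle would make the recursion run forever.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_managers (
    user_id         INTEGER PRIMARY KEY,
    manager_user_id INTEGER NOT NULL
);
-- Seller 4 reports to manager 12, who reports to regional manager 34.
INSERT INTO user_managers VALUES (4, 12), (12, 34);
""")

TOP_MANAGER_SQL = """
WITH RECURSIVE chain (user_id, depth) AS (
    SELECT ?, 0                                   -- start at the given user
    UNION ALL
    SELECT um.manager_user_id, chain.depth + 1    -- step up one level
    FROM chain
    JOIN user_managers AS um ON um.user_id = chain.user_id
)
SELECT user_id FROM chain ORDER BY depth DESC LIMIT 1
"""


def top_manager(user_id):
    """Return the id at the top of the chain above `user_id`."""
    return conn.execute(TOP_MANAGER_SQL, (user_id,)).fetchone()[0]


print(top_manager(4))
```

A user with no manager row is treated as their own top, because the recursion stops as soon as no parent is found.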

Trending Search Help

My site acts like a search engine where people enter search queries on the main page. I want to add a trending / recent feature where each query gets recorded in a MySQL database; from that data, the site works out which searches are made the most and displays them back on the page, labeled as trending searches. Under that, I would like "recent searches", which simply displays the last 5 or so searches.
Honestly, I have no experience with MySQL. I don't even know how to move data from my site to MySQL. Any help would be appreciated. I searched and searched these questions and Google, but didn't find anything. Thanks!

First of all, you need to CREATE a DATABASE, in which you want a table with a timestamp and the keyword that's been searched. (CREATE TABLE)
Then you want to store each keyword access into this table (INSERT INTO ... VALUES ...)
Then you can select the top keywords with a SELECT query that uses "GROUP BY keyword" and ORDERs by COUNT(*) (the number of occurrences of each keyword)
This is a bit vague, but you'll need to go through a number of steps so I've uppercased the terms you'd need to google for each step. Do come back if you run into complications in any of those steps!
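The steps above (CREATE TABLE, INSERT, then SELECT with GROUP BY and ORDER BY COUNT(*)) can be sketched as follows, using Python's standard sqlite3 module in place of MySQL so the example is self-contained. The table name searches, its columns and the three helper functions are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE searches (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    keyword     TEXT NOT NULL,
    searched_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
)""")


def record_search(keyword):
    """Store one search; keywords are normalized so 'PHP ' and 'php' count together."""
    conn.execute(
        "INSERT INTO searches (keyword) VALUES (?)", (keyword.strip().lower(),)
    )


def trending(limit=5):
    """Most frequent keywords first; ties are broken alphabetically."""
    rows = conn.execute(
        "SELECT keyword, COUNT(*) AS hits FROM searches "
        "GROUP BY keyword ORDER BY hits DESC, keyword LIMIT ?",
        (limit,),
    )
    return rows.fetchall()


def recent(limit=5):
    """Latest searches; ordered by id because timestamps can tie within a second."""
    rows = conn.execute(
        "SELECT keyword FROM searches ORDER BY id DESC LIMIT ?", (limit,)
    )
    return [keyword for (keyword,) in rows]


for term in ["php", "mysql", "php", "index", "PHP "]:
    record_search(term)

print(trending(2))
print(recent(3))
```

For a real "trending" list you would probably add a WHERE clause on searched_at to count only the last day or week; that part is left out of the sketch.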

Database structure for saving search results

I currently work for a social networking website.
My boss recently had the idea to show search results in random order instead of the normal order (registration date). The problem with that is simple and obvious: if you go from one page to another, you see different results each time, because the list is re-randomized on every request.
I had the idea to store the results in the database plus cookies, something like this:
A cookie containing a serialized version of the $_POST request (needed if we want to do a re-sort)
A table which would serve as the base for the search id => searches (id, user_id, creation_date)
A table which would store the results and their order => searches_results (search_id, order, user_id)
The flow would look something like this:
After each search I store the "where" in a cookie or session
Then I erase the previous search in "searches"
Then I delete the previous results in "searches_results"
Then I insert a row into "searches" for the key
Then I insert each user row into "searches_results"
And finally I redirect the user to something like ?search_id=[search_key]
There is a big flaw here: performance. This could definitely make the system either go down or become very slow.
Any idea what the best way to structure this would be?

What if instead of ordering randomly, you ordered by some function where the order is known and repeatable, just non-obvious? You could seed such a function with some data from the search query to make it be even less obvious that it repeats. This way, you can page back and forth through your results and always get what you expect. Music players use this sort of function for their shuffle feature (so that if you click back, you get the previous song, and if you click next again, you're back where you started). I'm sure you can divine some function to accomplish this... bitwise XORing ID values with some constant (from the query) and then sorting by the resulting number might be sufficient. I chose XOR arbitrarily because it's a trivially simple function that will get you repeatable and non-obvious results.
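A minimal sketch of that idea in Python, under a few assumptions: the seed is derived from the search query with zlib.crc32 (any stable hash would do), and the function names and page size are made up for the example. XOR here is the bitwise operation on integers, so it does produce a new number to sort by: for instance 6 ^ 3 is 5 (binary 110 XOR 011 = 101).

```python
import zlib


def seed_from_query(query):
    """Stable integer derived from the query text (same query, same seed)."""
    return zlib.crc32(query.encode("utf-8"))


def shuffled_page(ids, query, page, per_page=3):
    """Return one page of ids in a repeatable, non-obvious order.

    Each id is XORed with the seed and the ids are sorted by that result.
    XOR with a constant maps distinct ids to distinct keys, so the order
    is fully determined and every id lands on exactly one page.
    """
    seed = seed_from_query(query)
    ordered = sorted(ids, key=lambda user_id: user_id ^ seed)
    start = page * per_page
    return ordered[start:start + per_page]


ids = list(range(1, 11))
first_visit = shuffled_page(ids, "john", page=0)
second_visit = shuffled_page(ids, "john", page=0)
print(first_visit, first_visit == second_visit)
```

Nothing has to be stored between requests: paging back and forth recomputes the same order from the query alone. In MySQL the same sort key could be written with the ^ operator in an ORDER BY, though such an ordering cannot use an index.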

Hmm, maybe, but doesn't the XOR operator only tell you whether an exclusive OR holds? I mean, there is no mathematical operation here, as far as I know.

Sorry, I know this doesn't help, but I don't understand why your boss would want this?
I know that if I search for a person on a social network, then I want the results to be ordered by relevance and relevance only. I think that randomized results would just be frustrating for the user, but maybe that's just me.
For example, if I search for "John Smith", then the first batch of results had better be people named "John Smith". Then show me similar names near the end of the results. I don't want to search for "John Smith" and get "Jon Smithers" as my second result.

Well, I'm with Matt in asking "Why?"
I think rmeador has a good suggestion as well. You could sort by a different field or by some sort of algorithm, for example by cycling through the DESC / ASC permutations on last updated or some other result field.
Another option would be to do an initial search the first time, return only the matching IDs, store the full string of IDs in the database, and make each subsequent page a lookup against those IDs.
My two cents.
I can see a scenario where a randomized result set is useful but not for searching but for browsing profiles or artists or local events. It offers more exposure to those that wouldn't show up in a traditionally directed search.
